Mailing List Archive: 49091 messages

[REBOL] Re: Hungarian Alphabet Sort (was Re: Collation sequence - proper and eff

From: gscottjones:mchsi at: 12-May-2002 21:11

From: "Geza Lakner MD"
<snip>
> The right order for Hungarian vowels: actually the diaeresis characters
> come first and then the double acute ones (only o and u have double
> accents in the Hungarian alphabet):
> oO
> uU
This was easy to fix.
> Unfortunately the case-insensitiveness does not work. Look:
> hungarian-sort ["alom" "lom" "lom" "llam"]
> == ["alom" "lom" "llam" "lom"]
>
> Though it should read:
> alom llam lom lom.
Yes, this is a problem. My current algorithm will not easily accommodate this change. I can even remember thinking last year that the approach might cause a problem, but the test samples presented at the time apparently did not expose it. Hmmm. Time to go back to the drawing board. I already have an idea, but it may take a while before I have time to write the new algorithm.
> - The /case refinement results in the same result as the one without
> it :-( :
> >> hungarian-sort/case ["alom" "lom" "lom" "llam"]
> == ["alom" "lom" "llam" "lom"]
>
> The case-sensitive collation sequence IMHO would be a bit different than
> you have defined, namely:
> aA...eE...
>
> Your order was:
> aA...eE...
There end up being two issues at work here. Having the order as aA...eE... was not my intention. What I was aiming for was a..e..A..E..., which may also not seem correct to you. This mirrors REBOL's default behavior for the /case switch, except that it places the small letters before the capital letters. Petr K. said that this was the more usual method in Eastern Europe (the Czech language in his case), so I was trying to reflect that pattern, but I did make the one ordering error.

The /case switch of REBOL's 'sort groups all the words first by whether the initial letter is capital or not. In fact, REBOL places all the words that begin with capital letters _before_ the words that begin with small letters (because of the ASCII numbers assigned to the letters).

Maybe we need an additional switch that allows for the Eastern European preference to have smalls before capitals, and to interleave the two as you suggest. Sometimes it would be handy to have these options here in the US too. We just need a clever name or names for these switches (or paths, in REBOLese). Any ideas are welcome.
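To make the three orderings concrete, here is a minimal sketch in Python (not REBOL, and not the hungarian-sort code itself). It assumes plain ASCII letters only and ignores accented characters entirely; the function names smalls_first_key and interleaved_key and the sample words are hypothetical, chosen only for illustration.

```python
def smalls_first_key(word):
    # Swapping case before a code-point comparison makes every lowercase
    # letter sort ahead of every uppercase one: the a..e..A..E pattern.
    return word.swapcase()


def interleaved_key(word):
    # Primary key: the letters with case ignored.
    primary = [ch.lower() for ch in word]
    # Tie-breaker: 0 for a small letter, 1 for a capital, so "a" sorts
    # just ahead of "A": the aA..eE pattern.
    tiebreak = [1 if ch.isupper() else 0 for ch in word]
    return (primary, tiebreak)


words = ["Eva", "alma", "Alma", "eva"]

# Plain code-point order: capitals come first, as with ASCII values.
by_code_point = sorted(words)

# Smalls before capitals, but still grouped by case.
smalls_first = sorted(words, key=smalls_first_key)

# Case-insensitive primary order, with small before capital on ties.
interleaved = sorted(words, key=interleaved_key)

print(by_code_point)  # ['Alma', 'Eva', 'alma', 'eva']
print(smalls_first)   # ['alma', 'eva', 'Alma', 'Eva']
print(interleaved)    # ['alma', 'Alma', 'eva', 'Eva']
```

The interleaved variant is a two-level comparison: case only matters when two words are otherwise identical letter for letter. A Hungarian version would presumably need the primary key to follow the Hungarian alphabet rather than the lowercase code points.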
> - and so on for all affected special accented chars.
And so on for life in general! :-)
I'll repost after I have a chance to develop the new algorithm that I have in mind. "Stay tuned."
Thanks for your feedback!
--Scott Jones