[REBOL] Re: Hungarian Alphabet Sort (was Re: Collation sequence - proper and eff

From: gscottjones:mchsi at: 18-May-2002 6:51

From: "Geza Lakner MD"
> > In this case, the pattern has a bit more information:
> >
> >   a=á<b<c<cs<d<e=é<f<g<gy<h<i=í...<z<zs
> >
> > where "a" can be told to sort the same as "a with acute", both of these
> > before "b" ... and "zs" sorts after "z"
>
> Actually a<>á and e<>é ... more clearly a<á and e<é. In some relaxed
> situations the equivalence could be stated but the Hungarian grammar
> is much more complex that I could be an "ex cathedra" judge about it.

How do you like Carl's representation?
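
For anyone who has not seen the notation in use, here is a minimal sketch of how such a pattern could drive a sort. It is written in Python rather than REBOL and is not Carl's actual parser. The names `parse_pattern` and `sort_key` are hypothetical, and the pattern is a hand-picked toy subset of letters, not the full Hungarian alphabet. It assumes "<" starts a new rank, "=" shares a rank, and that the longest matching grapheme wins when reading a word.

```python
def parse_pattern(pattern):
    """Map each grapheme to a rank: '<' starts a new rank, '=' shares one."""
    ranks = {}
    for rank, group in enumerate(pattern.split("<")):
        for grapheme in group.split("="):
            ranks[grapheme] = rank
    return ranks


def sort_key(word, ranks):
    """Turn a word into a list of ranks, trying the longest grapheme first."""
    longest = max(len(g) for g in ranks)
    lowered = word.lower()
    key = []
    i = 0
    while i < len(lowered):
        for size in range(longest, 0, -1):
            chunk = lowered[i:i + size]
            if chunk in ranks:
                key.append(ranks[chunk])
                i += len(chunk)
                break
        else:
            # Assumption: a character missing from the pattern sorts
            # after every grapheme that the pattern does list.
            key.append(len(ranks) + ord(lowered[i]))
            i += 1
    return key


# Toy pattern only (an assumption for illustration, not a real alphabet).
PATTERN = "a=á<b<c<cs<d<e=é<g<gy<i=í<k<o<r<u<z<zs"
RANKS = parse_pattern(PATTERN)

words = ["zseb", "cukor", "ág", "csiga", "zab", "ad"]
print(sorted(words, key=lambda w: sort_key(w, RANKS)))
```

With this toy pattern, "cukor" sorts before "csiga" because "cs" is read as one grapheme ranked after "c", and "ág" lands next to "ad" instead of after "z". Switching to Geza's a<á is just a change to the pattern string. One known weakness of the greedy matching is that it cannot tell a true digraph from two separate letters that happen to sit next to each other.
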

<snip>

> Back to these di-graphemes: they are important, fundamental parts of
> our language but personally I can live without sorting them correctly
> in a computer program. :-)

That was the final opinion of the Hungarian author (Péter Szigetvári) of the website I was using as a reference. By the way, he offers a number of format conversion tools that are Hungarian friendly. They are written in Perl.

I almost have the ISO-8859-2 character set (for Central Europe) mapped based on the various sort orders that we discussed earlier. (I just remembered that I forgot Petr K's Czech "ch" -- darn!) If you would like to use Carl R's nifty sorting parser, I can transform the various sorting orders into patterns for easy use (that was a very clever idea).

What I do not have is any authoritative resource that tells me the best order that covers "all" the bases. My fear is that the letters with diacritics may sort differently in the various languages covered by the ISO-8859-2 character set: Albanian, Bosnian, Croatian, Czech, English, Finnish, Hungarian, Irish, German, Polish, Romanian, Serbian (Latin transcription), Slovak, Slovenian, and Sorbian (Lusatian). My master table can now handle any permutation, but it is the actual orders that are so hard to come across.

Thanks for the feedback on the "multi-letter graphemes."

--Scott Jones