[REBOL] Re: Hungarian Alphabet Sort (was Re: Collation sequence - proper and eff
From: gscottjones:mchsi at: 18-May-2002 6:51
From: "Geza Lakner MD"
> > In this case, the pattern has a bit more information:
> > a=á<b<c<cs<d<e=é<f<g<gy<h<i=í...<z<zs
> > where "a" can be told to sort the same as "a with acute", both of these sort
> > before "b" ... and "zs" sorts after "z"
> Actually a<>á and e<>é ... more clearly a<á and e<é. In some relaxed
> situations the equivalence could be stated, but Hungarian grammar
> is far too complex for me to be an "ex cathedra" judge of it.
How do you like Carl's representation?
<snip>
> Back to these di-graphemes: they are important, fundamental parts of
> our language but personally I can live without sorting them correctly
> in a computer program. :-)
That was the final opinion of the Hungarian author (Péter Szigetvári) of the
website I was using as a reference. By the way, he offers a number of
format conversion tools that are Hungarian-friendly. They are written in
Perl.
http://budling.nytud.hu/~szigetva/etcetera/Hungarian.html
I almost have the ISO-8859-2 character set (for Central Europe) mapped, based
on the various sort orders that we discussed earlier. (I just remembered
that I forgot Petr K's Czech "ch" -- darn!) If you would like to use Carl
R's nifty sorting parser, I can transform the various sorting orders into
patterns for easy use (that was a very clever idea). What I do not have
is any authoritative resource that tells me the best order that covers "all"
the bases. My fear is that the letters with diacritics may sort differently
in the various languages covered by the ISO-8859-2 character set: Albanian,
Bosnian, Croatian, Czech, English, Finnish, Hungarian, Irish, German,
Polish, Romanian, Serbian (Latin transcription), Slovak, Slovenian, and
Sorbian (Lusatian). My master table can now handle any permutation, but it
is the actual orders that are so hard to come across.
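To make the pattern idea concrete, here is a minimal sketch of how a string such as "a=á<b<c<cs" might be turned into a sort key. It is written in Python rather than REBOL and is not Carl's parser. The function names (parse_pattern, make_key) and the sample pattern are my own assumptions, and the pattern is deliberately incomplete: it contains only the graphemes quoted above, not the full Hungarian alphabet. "<" starts a new primary rank, "=" keeps the same primary rank, and the position inside an "=" group is used only as a tie-breaker, which would give a<á when two words are otherwise identical.

```python
import re

def parse_pattern(pattern):
    """Map each grapheme to (primary, secondary) ranks.

    '<' separates primary ranks; '=' joins graphemes that share one.
    """
    ranks = {}
    for primary, group in enumerate(pattern.split("<")):
        for secondary, grapheme in enumerate(group.split("=")):
            ranks[grapheme] = (primary, secondary)
    return ranks

def make_key(pattern):
    """Build a key function for sorted() from a pattern string."""
    ranks = parse_pattern(pattern)
    # Try longer graphemes first so that "cs" wins over "c".
    graphemes = sorted(ranks, key=len, reverse=True)
    tokenizer = re.compile(
        "|".join(re.escape(g) for g in graphemes) + "|.", re.DOTALL
    )
    unknown = len(ranks)  # characters not in the pattern sort after it

    def key(word):
        tokens = tokenizer.findall(word.lower())
        pairs = [ranks.get(t, (unknown + ord(t[0]), 0)) for t in tokens]
        primaries = tuple(p for p, _ in pairs)
        secondaries = tuple(s for _, s in pairs)
        # Primary ranks decide first; secondaries only break ties.
        return (primaries, secondaries)

    return key

# Incomplete sample pattern: only graphemes quoted in this thread.
SAMPLE = "a=á<b<c<cs<d<e=é<f<g<gy<h<i=í<z<zs"

words = ["zseb", "egy", "csiga", "ág", "zab", "cica", "agy", "ég", "bab"]
print(sorted(words, key=make_key(SAMPLE)))
```

With a scheme like this, each language covered by ISO-8859-2 would presumably get its own pattern string passed to make_key, so the table itself would not need to change. One caveat: the greedy tokenizer treats every "zs" or "gy" as a single letter, which can be wrong at compound boundaries (the z and s in "egészség" are separate letters), so it should be read as an approximation only.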
Thanks for the feedback on the "multi-letter graphemes."
--Scott Jones