Mailing List Archive: 49091 messages

[REBOL] Re: Hungarian Alphabet Sort (was Re: Collation sequence - proper and eff

From: geza67:freestart:hu at: 17-May-2002 22:59

Hello Scott
> In this case, the pattern has a bit more information:
> a=á<b<c<cs<d<e=é<f<g<gy<h<i=í...<z<zs
> where "a" can be told to sort the same as "a with acute", both of these sort
> before "b" ... and "zs" sorts after "z"
Actually a<>á and e<>é ... more precisely, a<á and e<é. In some relaxed situations the equivalence could be stated, but Hungarian grammar is far too complex for me to be an "ex cathedra" judge of it.
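
To make the difference between the two readings concrete, here is a minimal Python sketch. The tiny alphabet, the sample words and the function names are assumptions made up for illustration; this is not a full Hungarian collation. The strict key treats "á" as its own letter right after "a" (a<á), while the relaxed key folds the accented vowel onto its base letter (a=á).

```python
# Hypothetical mini-alphabet: only enough letters for the sample words.
STRICT_ORDER = ["a", "á", "b", "c", "d", "e", "é", "f"]
STRICT_RANK = {letter: i for i, letter in enumerate(STRICT_ORDER)}

# Relaxed reading: the accented vowel sorts exactly like its base letter.
BASE_LETTER = {"á": "a", "é": "e"}


def strict_key(word):
    """Sort key for a<á, e<é: every letter has its own rank."""
    return [STRICT_RANK[ch] for ch in word]


def relaxed_key(word):
    """Sort key for a=á, e=é: accents are folded away before ranking."""
    return [STRICT_RANK[BASE_LETTER.get(ch, ch)] for ch in word]


words = ["áb", "ad", "ab"]
strict_sorted = sorted(words, key=strict_key)
relaxed_sorted = sorted(words, key=relaxed_key)
print(strict_sorted)
print(relaxed_sorted)
```

Under the strict key "áb" lands after "ad", because "á" outranks "a" in the first position. Under the relaxed key "áb" and "ab" compare equal, so Python's stable sort simply keeps them in input order.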
> Geza, how important are these "multi-letter graphemes" (cs, dz, dzs, gy, ly,
> ny, sz, ty and zs) in a sort algorithm? At the same site, Péter Szigetvári
dz and dzs are good for transliteration, i.e. for transcribing foreign words. E.g. dzs stands for the English "j" (Hungarian spelling is more phonetically oriented than that of any Indo-European or Latin-legacy language family). cs, gy, ly, ny, sz, ty and zs are "inborn" Hungarian specialities; many words have them as components.

How important are they? That's a very hard question, because in a mixed-language text (e.g. a Hungarian medical report interspersed with medical Latin terminology) one has to understand the word itself to determine its sorting order. E.g. in a Hungarian word "ly" is a single letter (it sounds roughly like the English "y"; in Hungarian "j" is phonetically equivalent to "ly", but orthographically some words use one and some the other), whereas in a foreign word the same two characters may be two separate letters. If you don't know the word you cannot even decide its hyphenation, as you wrote:
> "Unfortunately, the task is not trivial: some sequences that look like
> multi-letter graphemes are in fact not, e.g., bércsík may be ranked before
> or after bérczerge depending on its morphology: bér+csík (after bérczerge)
> or bérc+sík (before bérczerge). This can be decided only with a
bér-csík or bérc-sík: different sorting order and even different hyphenation. (Just to satisfy your presumed curiosity about what these words mean: the first one could be translated as "payment stripe" [not a logical word combination], the second one as a geographical plain [a correct Hungarian word].) Without a dictionary, no program can get through this, not even a semantic parser.

Back to these di-graphemes: they are important, fundamental parts of our language, but personally I can live without sorting them correctly in a computer program. :-)

-- 
Best regards,
Geza
mailto:[geza67--freestart--hu]
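
The bércsík ambiguity discussed in this message can be sketched in a few lines of Python. Everything here is an assumption for illustration only: the lowercase letter order (with long vowels ranked as separate letters, following a<á), the greedy longest-match tokenizer, and above all the `|` marker, which is a hypothetical stand-in for the morpheme boundary that only a dictionary could supply.

```python
# Assumed lowercase Hungarian letter order; accented vowels are ranked as
# separate letters right after their base vowel (the a<á reading).
ALPHABET = [
    "a", "á", "b", "c", "cs", "d", "dz", "dzs", "e", "é", "f", "g", "gy",
    "h", "i", "í", "j", "k", "l", "ly", "m", "n", "ny", "o", "ó", "ö", "ő",
    "p", "q", "r", "s", "sz", "t", "ty", "u", "ú", "ü", "ű", "v", "w", "x",
    "y", "z", "zs",
]
RANK = {letter: i for i, letter in enumerate(ALPHABET)}

# Multi-letter graphemes, longest first, so "dzs" is tried before "dz".
MULTI = sorted((g for g in ALPHABET if len(g) > 1), key=len, reverse=True)


def tokenize(word):
    """Split a word into letters, never joining characters across '|'.

    '|' is a hypothetical morpheme-boundary marker standing in for a
    dictionary lookup; without it the match is purely greedy.
    """
    tokens = []
    for morpheme in word.split("|"):
        i = 0
        while i < len(morpheme):
            for grapheme in MULTI:
                if morpheme.startswith(grapheme, i):
                    tokens.append(grapheme)
                    i += len(grapheme)
                    break
            else:
                # No multi-letter grapheme starts here: take one character.
                tokens.append(morpheme[i])
                i += 1
    return tokens


def sort_key(word):
    """Rank each letter by its position in ALPHABET."""
    return [RANK[t] for t in tokenize(word)]


greedy_tokens = tokenize("bércsík")
marked_tokens = tokenize("bérc|sík")
ordered = sorted(["bér|csík", "bérczerge", "bérc|sík"], key=sort_key)
print(greedy_tokens)
print(marked_tokens)
print(ordered)
```

With no boundary information the greedy tokenizer reads "cs" as one letter, which is the bér+csík reading and sorts after bérczerge. Only the marked form bérc|sík yields separate "c" and "s" and sorts before it, so the hard part is deciding where the marker goes, and that still needs a dictionary.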