Mailing List Archive: 49091 messages

[REBOL] Re: Hungarian Alphabet Sort (was Re: Collation sequence - proper and eff

From: carl:cybercraft at: 16-May-2002 19:35

On 16-May-02, G. Scott Jones wrote:
> Hi, Carl,
>
> The idea did look promising, even for the "multi-letter graphemes"
> (like the czech "ch"), but then I believe we run into a limitation
> of 'parse. The longer phrase rule needs to come before the shorter
> one, so that:
>
> rule-4: pattern-rule ["a" "A" "b" "B" "c" "C" "h" "H" "ch" "Ch"]
>
> will not correctly sort:
>
> >> pattern-sort ["c" "ch" "h"] rule-4
> == ["ch" "c" "h"]
> ;should be "c" "h" "ch"
>
> At least one other person has mused over the desire to have a
> pattern sort (in this case under the gnu Linux sort) (look near the
> bottom):
>
> http://budling.nytud.hu/~szigetva/etcetera/converters/README
>
> In this case, the pattern has a bit more information:
>
> a==E1<b<c<cs<d<e==E9<f<g<gy<h<i==ED...<z<zs
>
> where "a" can be told to sort the same as "a with acute", both of
> these sort before "b" ... and "zs" sorts after "z"
>
> Breaking apart this information might allow a parse rule to set-up
> the sequence to allow the longer phrase rules to come before the
> shorter ones. At least I think it would work.
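The "breaking apart" step suggested in the quote can be sketched roughly as follows. This is a Python illustration rather than REBOL, and not the README's or the thread's actual code. It assumes that "<" starts the next rank, that "==" keeps the same rank, and that E1 in the README's notation stands for a-acute. The function names parse_collation and longest_first and the short spec string are made up for the example, since the full sequence is elided in the message.

```python
def parse_collation(spec):
    """Turn a spec such as 'a==á<b<c<cs' into (pattern, rank) pairs.

    Assumed notation: '<' starts the next rank, '==' keeps the same rank.
    """
    pairs = []
    for rank, group in enumerate(spec.split("<"), start=1):
        for pattern in group.split("=="):
            pairs.append((pattern, rank))
    return pairs


def longest_first(pairs):
    """Order the alternatives so that 'cs' is tried before 'c'.

    sorted() is stable, so patterns of equal length keep their spec order.
    """
    return sorted(pairs, key=lambda pair: -len(pair[0]))


# Hypothetical fragment only; the README's full sequence is not reproduced.
pairs = parse_collation("a==á<b<c<cs<d")
alternatives = longest_first(pairs)

print(pairs)         # ranks follow the spec: a and á share rank 1
print(alternatives)  # 'cs' now comes first but keeps rank 4
```

The point of keeping the rank next to each pattern is that the alternatives can be reordered freely for matching without disturbing the collation order itself.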
My first thoughts are that it'd work too, but then we're talking about my coding here. (; Anyway, only the order of the parse rules should need to be changed. I.e., this is what's currently generated...
>> probe rule-4
[some ["a" (r 1) | "A" (r 2) | "b" (r 3) | "B" (r 4) | "c" (r 5) |
    "C" (r 6) | "h" (r 7) | "H" (r 8) | "ch" (r 9) | "Ch" (r 10) |
    skip (r 11)]]

Moving the "ch"s to the front of the rule gives us this...

rule-5: [some ["ch" (r 9) | "Ch" (r 10) | "a" (r 1) | "A" (r 2) |
    "b" (r 3) | "B" (r 4) | "c" (r 5) | "C" (r 6) | "h" (r 7) |
    "H" (r 8) | skip (r 11)]]

Using that fixes your error above...
>> pattern-sort ["c" "ch" "h"] rule-5
== ["c" "h" "ch"]

though it screws up string sorting big-time...
>> pattern-sort "cchh" rule-5
== "bch"

(: Anyway, I'll see if I can get it to behave, and I'll try out the speed improvements I thought of as well.

--
Carl Read
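The effect of moving the "ch" alternatives to the front can be modelled with a small first-match-wins tokenizer. This is a Python sketch of the idea only, not pattern-rule or pattern-sort, whose source is not shown in this message. The names tokenize, sort_key and sort_graphemes are hypothetical, and the last function assumes that sorting inside a single string should keep each multi-letter token intact, which the thread does not confirm.

```python
# Collation list from the thread; rank = 1-based position, as in (r 1)..(r 10).
ORDER = ["a", "A", "b", "B", "c", "C", "h", "H", "ch", "Ch"]


def tokenize(text, order, longest_first):
    """Split text into (rank, token) pairs, trying alternatives in order."""
    ranked = [(pattern, i + 1) for i, pattern in enumerate(order)]
    if longest_first:
        # Stable sort: "ch" and "Ch" move to the front, as in rule-5.
        ranked.sort(key=lambda pair: -len(pair[0]))
    fallback = len(order) + 1  # plays the role of "skip (r 11)"
    tokens, pos = [], 0
    while pos < len(text):
        for pattern, rank in ranked:
            if text.startswith(pattern, pos):  # first match wins
                tokens.append((rank, pattern))
                pos += len(pattern)
                break
        else:
            # No alternative matched: consume one unknown character.
            tokens.append((fallback, text[pos]))
            pos += 1
    return tokens


def sort_key(text):
    """Sort key for whole strings: the list of ranks, longest match first."""
    return [rank for rank, _ in tokenize(text, ORDER, longest_first=True)]


def sort_graphemes(text):
    """Sort inside one string, keeping each multi-letter token intact."""
    tokens = sorted(tokenize(text, ORDER, longest_first=True))
    return "".join(token for _, token in tokens)


print(tokenize("ch", ORDER, longest_first=False))  # [(5, 'c'), (7, 'h')]
print(tokenize("ch", ORDER, longest_first=True))   # [(9, 'ch')]
print(sorted(["c", "ch", "h"], key=sort_key))      # ['c', 'h', 'ch']
print(sort_graphemes("cchh"))                      # chch
```

With the original ordering, "ch" is consumed as "c" followed by "h" and the rank-9 alternative is never reached. Sorting a string is the harder case because the result has to be rebuilt from the matched tokens rather than from single characters.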