[REBOL] Re: Hungarian Alphabet Sort (was Re: Collation sequence - proper and eff
From: carl:cybercraft at: 16-May-2002 19:35
On 16-May-02, G. Scott Jones wrote:
> Hi, Carl,
> The idea did look promising, even for the "multi-letter graphemes"
> (like the Czech "ch"), but then I believe we run into a limitation
> of 'parse. The longer phrase rule needs to come before the shorter
> one, so that:
> rule-4: pattern-rule ["a" "A" "b" "B" "c" "C" "h" "H" "ch" "Ch"]
> will not correctly sort:
>>> pattern-sort ["c" "ch" "h"] rule-4
> == ["ch" "c" "h"]
> ;should be "c" "h" "ch"
> At least one other person has mused over the desire to have a
> pattern sort (in this case under the GNU/Linux sort) (look near the
> bottom):
> http://budling.nytud.hu/~szigetva/etcetera/converters/README
> In this case, the pattern has a bit more information:
> a==E1<b<c<cs<d<e==E9<f<g<gy<h<i==ED...<z<zs
> where "a" can be told to sort the same as "a with acute", both of
> these sort before "b" ... and "zs" sorts after "z"
> Breaking apart this information might allow a parse rule to set up
> the sequence to allow the longer phrase rules to come before the
> shorter ones. At least I think it would work.
My first thoughts are that it'd work too, but then we're talking about
my coding here. (;
Anyway, only the order of the parse rules should need to change.
I.e., this is what's currently generated...
>> probe rule-4
[some ["a" (r 1) | "A" (r 2) | "b" (r 3) | "B" (r 4) | "c" (r 5) |
"C" (r 6) | "h" (r 7) | "H" (r 8) | "ch" (r 9) | "Ch" (r 10) |
skip (r 11)]]
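The trouble with that ordering is that parse tries the alternatives
left to right and takes the first one that matches, so "c" always wins
before "ch" gets a look-in. Here's a minimal sketch of that behaviour,
kept separate from pattern-sort itself (tokenize, short-first and
long-first are names made up just for this illustration)...

```rebol
REBOL []

; Hypothetical helper, not part of pattern-sort: collects the tokens
; that the given alternatives pick out of STR, in order.
tokenize: func [str [string!] alternatives [block!] /local out tok] [
    out: copy []
    ; parse takes the first alternative that matches at each position
    parse/all/case str [some [copy tok alternatives (append out tok)]]
    out
]

short-first: ["c" | "h" | "ch"]   ; "ch" can never be reached
long-first:  ["ch" | "c" | "h"]   ; digraph is tried before its parts

probe tokenize "chc" short-first  ; == ["c" "h" "c"]
probe tokenize "chc" long-first   ; == ["ch" "c"]
```

With the short tokens first, "ch" is read as a "c" followed by an "h",
which is exactly what goes wrong in the sort.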
Moving the "ch"s to the front of the rule gives us this...
rule-5: [some ["ch" (r 9) | "Ch" (r 10) | "a" (r 1) |
"A" (r 2) | "b" (r 3) | "B" (r 4) | "c" (r 5) |
"C" (r 6) | "h" (r 7) | "H" (r 8) | skip (r 11)
]]
Using that fixes your error above...
>> pattern-sort ["c" "ch" "h"] rule-5
== ["c" "h" "ch"]
though it screws up string sorting big-time...
>> pattern-sort "cchh" rule-5
== "bch"
(: Anyway, I'll see if I can get it to behave, and I'll try out the
speed improvements I thought of as well.
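One way the reordering might be automated is to sort the pattern by
token length before the rule is built, while remembering each token's
original position as its rank. This is only a rough sketch under that
assumption: longest-first is a made-up name, it isn't wired into
pattern-rule, and it does nothing about the string-sorting problem...

```rebol
REBOL []

; Hypothetical sketch: pair each token with its rank in the collation
; pattern, then order the pairs so the longest tokens come first.
longest-first: func [pattern [block!] /local pairs rank] [
    pairs: copy []
    rank: 0
    foreach token pattern [
        rank: rank + 1
        append/only pairs reduce [token rank]   ; e.g. ["ch" 9]
    ]
    ; longer tokens sort ahead of shorter ones; ranks travel with them
    sort/compare pairs func [a b] [(length? first a) > (length? first b)]
]

probe longest-first ["a" "A" "b" "B" "c" "C" "h" "H" "ch" "Ch"]
```

The two-letter tokens ("ch" with rank 9, "Ch" with rank 10) should come
out ahead of the single letters, so a rule generated in that order
would try the digraphs first but still report the original ranks.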
--
Carl Read