
[REBOL] Re: Hungarian Alphabet Sort (was Re: Collation sequence - proper and eff

From: carl:cybercraft at: 16-May-2002 20:15

On 16-May-02, Volker Nitsch wrote:
> Carl, my volley pass works similar to yours, except i made it more
> complicated :)
Simple things should be simple. (So I can understand them;)
> your pattern-rule "aAbBcC" would look like [+ "a" + "A" + "b" + "B"
> + "c" + "C"] because i use blocks with strings, i can also map
> multi-char-codes. so [+ "CH"] maps to one char.
I allowed for that. It accepts a string or a block of strings. Except it's bugged as it stands ): But it may be able to be fixed...
> IIRC "CH" is handled
> like one char? also i have [+ "CH" = "Ch"]. this says, "CH" and "Ch"
> are the same. (add a new code-number for "CH" and use the same
> number for "Ch").
This I didn't allow for. Currently my rule-blocks look like this...

    ["a" "b" "c" "ch" "CH"]

and words with "ch" in them would always precede ones with "CH" in them after sorting. Placing equal ones in blocks would seem a nice solution...

    ["a" "b" "c" ["ch" "CH"]]
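A rough sketch of how the nested-block idea could assign codes, written in Python rather than REBOL purely as an illustration. The function name `assign_codes` is hypothetical and is not part of pattern-rule; the only thing taken from the thread is the rule layout, where a nested block marks strings that should compare as equal.

```python
def assign_codes(rule):
    """Give each rule entry a collation code; strings inside a nested
    list all share the code of that entry, so they compare as equal."""
    codes = {}
    for code, entry in enumerate(rule, start=1):
        # A plain string is treated as a group of one.
        group = entry if isinstance(entry, list) else [entry]
        for pattern in group:
            codes[pattern] = code
    return codes


codes = assign_codes(["a", "b", "c", ["ch", "CH"]])
print(codes)  # {'a': 1, 'b': 2, 'c': 3, 'ch': 4, 'CH': 4}
```

With the flat rule `["a", "b", "c", "ch", "CH"]` the same function would hand out 4 and 5 instead, which is why "ch" words always land ahead of "CH" words there.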
> And in my telephone-book "ö" is handled like
> "oe", so one char expands to two. So i need this kind of commands?
I can't see why we would, as we're sorting something in just the one format, not changing the format. (I hope:)
> (Scott, i found no "ss" in this book, because "ss" has always two
> chars before it and is rarely used. sorry..
> http://www.uni-koeln.de/phil-fak/spinfo/lehre/java/kap23/collating.htm#3.3
> duden 73: "ß" like "ss", by same words before (argh!) since 96
> changed: "ß" after "ss". and "ä" the same as "a".
> telephone-book is wrong? or duden? hmm.. back to script.)
> )
> first the block is initialized with ascii-codes [.. + "@" + "A" ..]
> then i could move whole char-blocks around, to mix "aAbB". then
> comes
> customize-ascii: [
>     at "h" [+ "ch"]
>     at "H" [+ "CH" = "Ch"]
> ]
> which says {find "h" in block and insert [+ "ch"] behind}, same for
> "H". now i have [.. + "h" + "ch" + "i"].
> in a second pass i give numbers to the strings in this order, in a
> third i create the parse-rule, which translates a string to the
> sort-encoding.
> for sorting i mix strings and their translations like [translation1
> string1 translation2 string2] sort with sort/skip 2,
I should've used sort/skip - it's one of the ways I'm hoping to speed things up.
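The [translation1 string1 translation2 string2] layout sorted with sort/skip 2 amounts to pairing each string with its sort key, sorting on the key, and then pulling the strings back out. A minimal Python sketch of that pattern follows; it is not REBOL and not code from either script. `ORDER` reuses the "aAbBcC" pattern from earlier in the thread, while `translate` and `pattern_sort_pairs` are hypothetical helpers that only handle single characters.

```python
ORDER = "aAbBcC"  # collation pattern borrowed from the thread's example


def translate(word):
    """Hypothetical translation: each character becomes its position in
    ORDER; characters not listed sort after everything that is listed."""
    return [ORDER.index(ch) if ch in ORDER else len(ORDER) for ch in word]


def pattern_sort_pairs(words):
    # Pair each word with its translation (the "translation string" records).
    pairs = [(translate(w), w) for w in words]
    # Compare on the translation only, as sort/skip 2 compares on the
    # first value of each two-value record.
    pairs.sort(key=lambda pair: pair[0])
    # Extract the strings again, now in collation order.
    return [w for _, w in pairs]


print(pattern_sort_pairs(["Cab", "abc", "Abc", "bca"]))
# ['abc', 'Abc', 'bca', 'Cab']
```

The point of the pairing is that each string is translated once, rather than on every comparison during the sort.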
> and extract the strings back to the original block.
> hmm, somehow i like your string more. if it could deal with
> multi-chars.
It can - just not correctly. (; 'rule-3 showed how it's meant to work...
>> >> rule-3: pattern-rule ["a" "A" "b" "B" "ch" "c" "C"]
>> == [some ["a" (r 1) | "A" (r 2) | "b" (r 3) | "B" (r 4) | "ch" (r 5) |
>> "c" (r 6) | "C" (r 7) | skip (r 8)]]
>> >> pattern-sort "abcABCchCbA" rule-3
>> == "aAAbbBchcCC"
>> >> pattern-sort ["AabA" "chab" "chAB" "cchc" "achA"] rule-3
>> == ["achA" "AabA" "chab" "chAB" "cchc"]
But, as Scott pointed out, it doesn't get this right...

---8<---
rule-4: pattern-rule ["a" "A" "b" "B" "c" "C" "h" "H" "ch" "Ch"]
will not correctly sort:

>> pattern-sort ["c" "ch" "h"] rule-4
== ["ch" "c" "h"] ;should be "c" "h" "ch"
---8<---

Back to the drawing-board...

-- 
Carl Read
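One possible way to get the order Scott expects is to separate two concerns: match the longest pattern first when splitting a string into tokens, but take each token's code from its position in the rule. The Python sketch below illustrates that idea only. It is not a diagnosis of what pattern-sort actually does wrong, and `make_translator` is a hypothetical name rather than anything from the thread.

```python
def make_translator(rule):
    """Build a function that turns a string into a list of collation codes."""
    # A pattern's code is its 1-based position in the rule.
    codes = {pattern: i for i, pattern in enumerate(rule, start=1)}
    unknown = len(rule) + 1  # anything not in the rule sorts last
    # Try longer patterns first so "ch" wins over "c" followed by "h".
    by_length = sorted(rule, key=len, reverse=True)

    def translate(word):
        out, pos = [], 0
        while pos < len(word):
            for pattern in by_length:
                if word.startswith(pattern, pos):
                    out.append(codes[pattern])
                    pos += len(pattern)
                    break
            else:
                # No pattern matched at this position: skip one character.
                out.append(unknown)
                pos += 1
        return out

    return translate


rule_4 = ["a", "A", "b", "B", "c", "C", "h", "H", "ch", "Ch"]
translate = make_translator(rule_4)
print(sorted(["c", "ch", "h"], key=translate))  # ['c', 'h', 'ch']
```

Under this scheme rule-3 from above still gives the same result as before, since "ch" is tokenised as one unit either way.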