[REBOL] Re: Hungarian Alphabet Sort (was Re: Collation sequence - proper and eff
From: carl:cybercraft at: 16-May-2002 20:15
On 16-May-02, Volker Nitsch wrote:
> Carl, my volley pass works similar to yours, except i made it more
> complicated :)
Simple things should be simple.
(So I can understand them;)
> your pattern-rule "aAbBcC" would look like [+ "a" + "A" + "b" + "B"
> + "c" + "C"] because i use blocks with strings, i can also map
> multi-char-codes. so [+ "CH"] maps to one char.
I allowed for that. It accepts a string or a block of strings.
Except it's bugged as it stands ): But it may be able to be fixed...
> IIRC "CH" is handled
> like one char? also i have [+ "CH" = "Ch"]. this says, "CH" and "Ch"
> are the same. (add a new code-number for "CH" and use the same
> number for "Ch").
This I didn't allow for. Currently my rule-blocks look like this...
["a" "b" "c" "ch" "CH"]
and words with "ch" in would always precede ones with "CH" in after
sorting. Placing equal ones in blocks would seem a nice solution...
["a" "b" "c" ["ch" "CH"]]
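For illustration, a minimal sketch of that nested-block idea, written in Python rather than REBOL so it is easy to run. The name rank_table is hypothetical and is not part of pattern-rule; the only point is that entries grouped in an inner list share one rank.

```python
def rank_table(rule):
    """Map each token in the rule to a rank; tokens in a nested list share a rank."""
    ranks = {}
    for position, entry in enumerate(rule, start=1):
        # A bare string is treated as a group of one.
        group = entry if isinstance(entry, list) else [entry]
        for token in group:
            ranks[token] = position
    return ranks

table = rank_table(["a", "b", "c", ["ch", "CH"]])
print(table)  # "ch" and "CH" both map to rank 4
```

With equal ranks, a stable sort would leave "ch" and "CH" words in their original relative order.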
> And in my telephone-book "ö" is handled like
> "oe", so one char expands to two. So i need this kind of commands?
I can't see why we would, as we're sorting something in just the one
format, not changing the format. (I hope:)
> (Scott, i found no "ss" in this book, because "ss" has always two
> chars before it and is rarely used. sorry..
> http://www.uni-koeln.de/phil-fak/spinfo/lehre/java/kap23/collating.htm#3.3
> duden 73: "ß" like "ss", by same words before (argh!) since 96
> changed: "ß" after "ss". and "ä" the same as "a".
> telephone book is wrong? or duden? hmm.. back to script.)
> first the block is initialized with ascii-codes [.. + "@" + "A" ..]
> then i could move whole char-blocks around, to mix "aAbB". then
> comes
> customize-ascii: [
> at "h" [+ "ch"]
> at "H" [+ "CH" = "Ch"]
> ]
> which says {find "h" in block and insert [+ "ch"] behind}, same for
> "H". now i have [.. + "h" + "ch" + "i"].
> in a second pass i give numbers to the strings in this order, in a
> third i create the parse-rule, which translates a string to the
> sort-encoding.
> for sorting i mix strings and their translations like [translation1
> string1 translation2 string2] sort with sort/skip 2,
I should've used sort/skip - it's one of the ways I'm hoping to speed
things up.
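The translate, sort/skip, extract sequence is a decorate-sort-undecorate pattern. A rough Python sketch of it follows (Python rather than REBOL, so it can be run as-is). The encode function here is a hypothetical stand-in that ranks single characters by a tiny "aAbBcC" pattern; the real translation would come from the generated parse rule.

```python
def encode(word):
    # Stand-in translation (an assumption): rank each character by its
    # position in "aAbBcC"; characters not in the pattern sort last.
    order = "aAbBcC"
    return [order.index(ch) if ch in order else len(order) for ch in word]

def sort_by_translation(words):
    pairs = [(encode(w), w) for w in words]  # translation next to its string
    pairs.sort(key=lambda pair: pair[0])     # compare translations only
    return [w for _, w in pairs]             # extract the strings back

result = sort_by_translation(["Cab", "abc", "Abc", "bAc"])
print(result)  # ['abc', 'Abc', 'bAc', 'Cab']
```

Each string is translated once up front instead of on every comparison, which is where the speed-up would come from.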
> and extract the strings back to the original block.
> hmm, somehow i like your string more. if it could deal with
> multi-chars.
It can - just not correctly. (; 'rule-3 showed how it's meant to
work...
>>>> rule-3: pattern-rule ["a" "A" "b" "B" "ch" "c" "C"]
>> == [some ["a" (r 1) | "A" (r 2) | "b" (r 3) | "B" (r 4) | "ch" (r 5) |
>> "c" (r 6) | "C" (r 7) | skip (r 8)]]
>>>> pattern-sort "abcABCchCbA" rule-3
>> == "aAAbbBchcCC"
>>>> pattern-sort ["AabA" "chab" "chAB" "cchc" "achA"] rule-3
>> == ["achA" "AabA" "chab" "chAB" "cchc"]
But, as Scott pointed out, it doesn't get this right...
---8<---
rule-4: pattern-rule ["a" "A" "b" "B" "c" "C" "h" "H" "ch" "Ch"]
will not correctly sort:
>> pattern-sort ["c" "ch" "h"] rule-4
== ["ch" "c" "h"]
;should be "c" "h" "ch"
---8<---
Back to the drawing-board...
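One possible direction, sketched in Python rather than REBOL, and not a fix to pattern-rule itself: rank each token by its position in the rule, and when reading a word try the longest tokens first so that "ch" is taken as one unit wherever it sits in the rule. The names make_sort_key and sort_key are hypothetical. Characters missing from the rule get a rank past the end, on the assumption that this matches the trailing skip in the generated rule.

```python
def make_sort_key(rule):
    ranks = {token: i for i, token in enumerate(rule, start=1)}
    fallback = len(rule) + 1                          # for characters not in the rule
    by_length = sorted(rule, key=len, reverse=True)   # try longest tokens first

    def sort_key(word):
        key, pos = [], 0
        while pos < len(word):
            for token in by_length:
                if word.startswith(token, pos):
                    key.append(ranks[token])
                    pos += len(token)
                    break
            else:
                # No token matched here: consume one character with the fallback rank.
                key.append(fallback)
                pos += 1
        return key

    return sort_key

rule_4 = ["a", "A", "b", "B", "c", "C", "h", "H", "ch", "Ch"]
sorted_4 = sorted(["c", "ch", "h"], key=make_sort_key(rule_4))
print(sorted_4)  # ['c', 'h', 'ch']

rule_3 = ["a", "A", "b", "B", "ch", "c", "C"]
sorted_3 = sorted(["AabA", "chab", "chAB", "cchc", "achA"], key=make_sort_key(rule_3))
print(sorted_3)  # ['achA', 'AabA', 'chab', 'chAB', 'cchc']
```

With rule-4 this gives the order Scott expected, and with rule-3 it reproduces the result shown earlier.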
--
Carl Read