Mailing List Archive: Re: Hungarian Alphabet Sort (was Re: Collation sequence

[REBOL] Re: Hungarian Alphabet Sort (was Re: Collation sequence - proper and eff

From: carl:cybercraft at: 16-May-2002 20:15


On 16-May-02, Volker Nitsch wrote:

> Carl, my volley pass works similar to yours, except i made it more
> complicated :)

Simple things should be simple.
  (So I can understand them;)

> your pattern-rule "aAbBcC" would look like [+ "a" + "A" + "b" + "B"
> + "c" + "C"] because i use blocks with strings, i can also map
> multi-char-codes. so [+ "CH"] maps to one char.

I allowed for that.  It accepts a string or a block of strings.
Except it's buggy as it stands ):  But it may be fixable...

> IIRC "CH" is handled
> like one char? also i have [+ "CH" = "Ch"]. this says, "CH" and "Ch"
> are the same. (add a new code-number for "CH" and use the same
> number for "Ch").

This I didn't allow for.  Currently my rule-blocks look like this...

    ["a" "b" "c" "ch" "CH"]

and words containing "ch" would always precede ones containing "CH"
after sorting.  Placing equal ones in blocks would seem a nice solution...

    ["a" "b" "c" ["ch" "CH"]]
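
A rough sketch of how such a nested rule-block could be turned into
rank numbers, written in Python rather than REBOL. The name
build_ranks is made up for illustration and is not part of either
script discussed here; the only idea taken from the thread is that
strings sharing a sub-block get the same number.

```python
def build_ranks(rule):
    """Map each token to a rank; tokens in a nested list share one rank."""
    ranks = {}
    for position, entry in enumerate(rule, start=1):
        # A plain string is treated as a group of one.
        group = entry if isinstance(entry, list) else [entry]
        for token in group:
            ranks[token] = position
    return ranks


ranks = build_ranks(["a", "b", "c", ["ch", "CH"]])
print(ranks)  # {'a': 1, 'b': 2, 'c': 3, 'ch': 4, 'CH': 4}
```

With equal ranks, "ch" and "CH" no longer force an order between two
words; a stable sort would then keep them in their original order.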

> And in my telephone-book "ö" is handled like
> "oe", so one char expands to two. So i need this kind of commands?

I can't see why we would, as we're sorting something in just the one
format, not changing the format. (I hope:)

> (Scott, i found no "ss" in this book, because "ss" has always two
> chars before it and is rarely used. sorry..
>
> http://www.uni-koeln.de/phil-fak/spinfo/lehre/java/kap23/collating.htm#3.3
> duden 73: "ß" like "ss", by same words before (argh!) since 96
> changed: "ß" after "ss". and "ä" the same as "a".
> telephonbook is wrong? or duden? hmm.. back to script.)
> )
> first the block is initialized with ascii-codes [.. + "@" + "A" ..]
> then i could move whole char-blocks around, to mix "aAbB". then
> comes
>   customize-ascii: [
>        at "h" [+ "ch"]
>        at "H" [+ "CH" = "Ch"]
>    ]
> which says {find "h" in block and insert[+ "ch"] behind},same for
> "H". now i have [.. + "h" + "ch" + "i"].

> in a second pass i give numbers to the strings in this order, in a
> third i create the parse-rule, which translates a string to the
> sort-encoding.
> for sorting i mix strings and their translations like [translation1
> string1 translation2 string2] sort with sort/skip 2,

I should've used sort/skip - it's one of the ways I'm hoping to speed
things up.
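
The translate-then-sort approach Volker describes can be sketched
like this, in Python rather than REBOL. Each string is paired with
its translation, the pairs are sorted on the translation only, and
the strings are pulled back out, which is roughly what sorting a
[translation1 string1 translation2 string2] block with sort/skip 2
would achieve. Both function names are hypothetical, and
toy_translate is only a stand-in that handles the six letters of
"aAbBcC".

```python
def sort_by_translation(strings, translate):
    """Sort strings by a precomputed key, one key per string."""
    # Build (translation, string) records, the analogue of the
    # interleaved [translation1 string1 translation2 string2] block.
    records = [(translate(s), s) for s in strings]
    # Compare on the translation only, as a record sort on the
    # first field would.
    records.sort(key=lambda record: record[0])
    # Extract the strings back out in their new order.
    return [s for _, s in records]


def toy_translate(s):
    """Hypothetical translation: position of each char in "aAbBcC"."""
    order = "aAbBcC"
    return [order.index(ch) for ch in s]


print(sort_by_translation(["Ca", "b", "A", "ab"], toy_translate))
# ['ab', 'A', 'b', 'Ca']
```

The possible speed-up comes from translating each string once up
front instead of re-parsing it inside every comparison.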

> and extract the strings back to the original block.

> hmm, somehow i like your string more. if it could deal with
> multi-chars.

It can - just not correctly. (;  'rule-3 showed how it's meant to
work...

>>>> rule-3: pattern-rule ["a" "A" "b" "B" "ch" "c" "C"]

>> == [some ["a" (r 1) | "A" (r 2) | "b" (r 3) | "B" (r 4) | "ch" (r 5) |
>> "c" (r 6) | "C" (r 7) | skip (r 8)]]

>>>> pattern-sort "abcABCchCbA" rule-3

>> == "aAAbbBchcCC"

>>>> pattern-sort ["AabA" "chab" "chAB" "cchc" "achA"]  rule-3

>> == ["achA" "AabA" "chab" "chAB" "cchc"]
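
To make the rule-3 behaviour concrete, here is a small Python sketch
(not the REBOL pattern-sort itself, whose source isn't shown in this
post). It assumes the generated rule is read the way the alternation
above suggests: at each position the tokens are tried in rule order,
the first match wins, and anything unmatched gets the last number,
like skip (r 8). The name pattern_key is made up for illustration.

```python
def pattern_key(text, rule):
    """Translate text into a list of ranks, trying tokens in rule order."""
    key = []
    i = 0
    while i < len(text):
        # Tokens are assumed to be non-empty strings.
        for rank, token in enumerate(rule, start=1):
            if text.startswith(token, i):
                key.append(rank)
                i += len(token)
                break
        else:
            # Nothing matched: consume one character and give it the
            # catch-all rank, one past the last token.
            key.append(len(rule) + 1)
            i += 1
    return key


rule_3 = ["a", "A", "b", "B", "ch", "c", "C"]
words = ["AabA", "chab", "chAB", "cchc", "achA"]
print(sorted(words, key=lambda w: pattern_key(w, rule_3)))
# ['achA', 'AabA', 'chab', 'chAB', 'cchc']
```

This only works for "ch" because it sits ahead of "c" in rule-3, so
it is tried first.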

But, as Scott pointed out, it doesn't get this right...

---8<---

rule-4: pattern-rule ["a" "A" "b" "B" "c" "C" "h" "H" "ch" "Ch"]

will not correctly sort:

>> pattern-sort ["c" "ch" "h"]  rule-4
== ["ch" "c" "h"]
;should be "c" "h" "ch"

---8<---

Back to the drawing-board...
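
One possible direction, sketched in Python rather than REBOL and not
a fix for the actual pattern-sort code: keep each token's rank tied
to its position in the rule, but try the longest tokens first when
matching, so "ch" is recognised even though it is listed after "c"
and "h". The name longest_match_key is hypothetical.

```python
def longest_match_key(text, rule):
    """Translate text into ranks, preferring the longest matching token."""
    # Rank comes from the token's position in the rule...
    ranks = {token: rank for rank, token in enumerate(rule, start=1)}
    # ...but matching is attempted longest token first.
    by_length = sorted(rule, key=len, reverse=True)
    key = []
    i = 0
    while i < len(text):
        for token in by_length:
            if text.startswith(token, i):
                key.append(ranks[token])
                i += len(token)
                break
        else:
            # Unknown character: catch-all rank, one past the last token.
            key.append(len(rule) + 1)
            i += 1
    return key


rule_4 = ["a", "A", "b", "B", "c", "C", "h", "H", "ch", "Ch"]
print(sorted(["c", "ch", "h"], key=lambda w: longest_match_key(w, rule_4)))
# ['c', 'h', 'ch']
```

Under this assumption the order of the rule decides only the sort
order, not which token gets matched.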

--
Carl Read