[REBOL] Re: Hungarian Alphabet Sort (was Re: Collation sequence - proper and eff
From: nitsch-lists:netcologne at: 15-May-2002 18:58
Hi Carl, Scott, Gesa,
On Wednesday, 15 May 2002 14:30, Carl Read wrote:
> On 15-May-02, G. Scott Jones wrote:
> > From: "Volker Nitsch"
> > ...
> >
> >> not sure if this helps, but since i spended some time to it,
> >> i post ;)
> >
> > <snipped code>
> >
> > Hi, Volker,
> >
> > Neat idea. Kind of like a good cut of beef, I'm going to have to
> > chew on it a bit to fully understand its potential. Thanks for the
> > trans-atlantic volley ball pass.
>
> Glad you could work it out, as I couldn't make head nor tail of it. (:
>
Carl, my volley pass works similarly to yours, except I made it more
complicated :)
Your pattern-rule "aAbBcC" would look like
[+ "a" + "A" + "b" + "B" + "c" + "C"]
Because I use blocks of strings, I can also map multi-char codes,
so [+ "CH"] maps to one char. IIRC "CH" is handled like one char?
I also have [+ "CH" = "Ch"]. This says "CH" and "Ch" are the same
(add a new code number for "CH" and use the same number for "Ch").
And in my telephone book "ö" is handled like "oe", so one char
expands to two. So I need these kinds of commands?
(Scott, I found no "ss" in this book, because "ss" always has two chars
before it and is rarely used. Sorry..
http://www.uni-koeln.de/phil-fak/spinfo/lehre/java/kap23/collating.htm#3.3
Duden 73: "ß" sorts like "ss", and before it when the words are otherwise the same (argh!)
Changed since 96: "ß" after "ss".
And "ä" the same as "a". Is the telephone book wrong? Or Duden? Hmm..
Back to the script.)
First the block is initialized with the ascii codes [.. + "@" + "A" ..].
Then I can move whole char-blocks around, to mix "aAbB".
Then comes
customize-ascii: [
at "h" [+ "ch"]
at "H" [+ "CH" = "Ch"]
]
which says {find "h" in the block and insert [+ "ch"] behind it}, and the same for "H".
Now I have [.. + "h" + "ch" + "i"].
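As a rough illustration of that step (a sketch only, not the original script: the function name customize and the toy starting block are made up here), the at commands could be applied like this. It uses find/case, because a plain find is not case-sensitive and would stop at "H" when looking for "h".

```rebol
REBOL [Title: "customize sketch"]

; toy stand-in for the full ascii block
codes: copy [+ "H" + "g" + "h" + "i"]

; hypothetical helper: spec holds triples of  at <string> <block to insert>
customize: func [codes [block!] spec [block!] /local pos][
    foreach [cmd key extra] spec [
        ; find the key string and splice the extra entries in behind it
        if pos: find/case codes key [insert next pos extra]
    ]
    codes
]

customize codes [
    at "h" [+ "ch"]
    at "H" [+ "CH" = "Ch"]
]
probe codes
```

insert without /only splices the contents of the extra block into codes, so the result stays one flat list of operator/string pairs.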
In a second pass I give numbers to the strings in this order,
and in a third I create the parse rule, which translates a string to the
sort encoding.
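The numbering pass could look roughly like this. Again only a sketch with a made-up function name (number-codes), assuming the block is a flat list of operator/string pairs: "+" takes the next code number, "=" reuses the current one.

```rebol
REBOL [Title: "code-number sketch"]

; hypothetical helper: turn [+ "a" + "CH" = "Ch"] into [string number ...]
number-codes: func [
    pattern [block!] "Flat block of operator/string pairs."
    /local codes n
][
    codes: copy []
    n: 0
    foreach [op str] pattern [
        ; first [+] is just the word + taken as data, without evaluating it
        if op = first [+] [n: n + 1]
        append codes reduce [str n]
    ]
    codes
]

probe number-codes [+ "a" + "A" + "CH" = "Ch"]
; prints ["a" 1 "A" 2 "CH" 3 "Ch" 3]
```

The one-to-two expansion ("ö" like "oe") is left out here; it would need a further kind of command.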
for sorting i mix strings and their translations like
[translation1 string1 translation2 string2]
sort with sort/skip 2,
and extract the strings back to the original block.
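A small sketch of that sorting step, under loud assumptions: sort-key is a hypothetical stand-in for the generated parse rule, it handles single chars only, and the order string is a toy collation. The translation is a string of code chars, so sort/case is used to compare them by value.

```rebol
REBOL [Title: "sort/skip sketch"]

order: "aAbB"   ; toy collation order, single chars only

; hypothetical stand-in for the real translation rule
sort-key: func [str [string!] /local out pos][
    out: copy ""
    foreach ch str [
        pos: find/case order ch
        ; chars missing from the order get code 255, so they sort last
        append out to-char either pos [index? pos][255]
    ]
    out
]

names: copy ["Ba" "ab" "Ab" "ba"]

; mix: [translation1 string1 translation2 string2 ...]
mixed: copy []
foreach s names [append mixed reduce [sort-key s s]]

; treat the block as records of two values, ordered by the translation
sort/case/skip mixed 2

; extract the strings back into the original block
clear names
foreach [key s] mixed [append names s]

probe names
; prints ["ab" "Ab" "ba" "Ba"]
```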
Hmm, somehow I like your string more, if only it could deal with multi-chars.
> Anyway, I've played around with my idea for sorting according to a
> pattern, and while I'm not sure if the following code's very fast (or
> bug-free:), like Volker, I post.
>
Good idea :-)
> There's two functions: One to take a pattern for creating a rule from
> and another to use the rule to sort strings or blocks of strings
> with. First, the functions...
>
> pattern-rule: func [
> "Create a rule for use by pattern-sort."
> pattern [string! block!] "An ordered pattern."
> /local rule n
> ][
> rule: copy []
> n: 1
> forall pattern [
> append rule reduce [pattern/1 to-paren reduce ['r n] '|]
> n: n + 1
> ]
> append rule reduce ['skip to-paren reduce ['r n]]
> reduce ['some rule]
> ]
>
> pattern-sort: func [
> {Sort a string or block of strings based on a rule created
> by pattern-rule.}
> series [string! block!] "Series to sort."
> rule [block!] "Pattern rule."
> /reverse "Reverse sort order."
> /local ptrs blk r pos val
> ][
> ptrs: copy []
> blk: copy []
> r: func [n][append/only blk n]
> bind rule 'r
> either string? series [
> parse/case series rule
> pos: 1
> foreach n blk [
> append/only ptrs reduce [
> n pick rule/2 (n - 1) * 3 + 1
> ]
> val: next first back tail ptrs
> if 'skip = val/1 [change val pick series pos]
> pos: pos + either char? val/1 [1][length? val/1]
> ]
> ][
> forall series [
> clear blk
> parse/case series/1 rule
> append/only ptrs copy blk
> append last ptrs series/1
> ]
> ]
> either reverse [sort/reverse ptrs][sort ptrs]
> clear series
> forall ptrs [append series last ptrs/1]
> series
> ]
>
> And some examples of use...
>
> >> rule-1: pattern-rule "aAbBcC"
>
> == [some [#"a" (r 1) | #"A" (r 2) | #"b" (r 3) | #"B" (r 4) | #"c" (r
> 5) | #"C" (r 6) | skip (r 7)]]
>
> >> pattern-sort "AacCBb" rule-1
>
> == "aAbBcC"
>
> >> pattern-sort ["Abc" "abc" "aBC" "ABC"] rule-1
>
> == ["abc" "aBC" "Abc" "ABC"]
>
> >> pattern-sort/reverse ["Abc" "abc" "aBC" "ABC"] rule-1
>
> == ["ABC" "Abc" "aBC" "abc"]
>
> >> rule-2: pattern-rule "AaBbCc"
>
> == [some [#"A" (r 1) | #"a" (r 2) | #"B" (r 3) | #"b" (r 4) | #"C" (r
> 5) | #"c" (r 6) | skip (r 7)]]
>
> >> pattern-sort "AacCBb" rule-2
>
> == "AaBbCc"
>
> >> pattern-sort ["Abc" "abc" "aBC" "ABC"] rule-2
>
> == ["ABC" "Abc" "aBC" "abc"]
>
> >> rule-3: pattern-rule ["a" "A" "b" "B" "ch" "c" "C"]
>
> == [some ["a" (r 1) | "A" (r 2) | "b" (r 3) | "B" (r 4) | "ch" (r 5) |
> "c" (r 6) | "C" (r 7) | skip (r 8)]]
>
> >> pattern-sort "abcABCchCbA" rule-3
>
> == "aAAbbBchcCC"
>
> >> pattern-sort ["AabA" "chab" "chAB" "cchc" "achA"] rule-3
>
> == ["achA" "AabA" "chab" "chAB" "cchc"]
>
> It seems to work and might be of some use, but I'd test it well before
> trusting it. It's had no real-world tests at all...
greetings
volker