Mailing List Archive: Re: Hungarian Alphabet Sort (was Re: Collation sequence

[REBOL] Re: Hungarian Alphabet Sort (was Re: Collation sequence - proper and eff

From: gscottjones:mchsi at: 16-May-2002 7:44


From: "Carl Read"
> On 16-May-02, G. Scott Jones wrote:
>
> > This is extremely promising. I drew from the ISO-8859-2 character
> > set to make a rule, and it initially seems to sort correctly. The
> > time through is roughly the same as my hack (but I've not really
> > set up a clean timing test).
>
> I've thoughts about how to speed it up - will be testing them out.

Great!

> > The only problem so far occurs when
> > I run my word sample list through more than once. It seems to
> > magically have kept the original sort/s and continues to append new
> > results to the block. I cannot seem to find where the problem is
> > occurring.
>
> It's in here...
>
>         forall series [
>             clear blk
>             parse/case series/1 rule
>             append/only ptrs copy blk
>             append last ptrs series/1
>         ]
>
> 'forall leaves 'series at its tail, so the 'clear that follows
> doesn't clear it.  Change it to the following and it should fix that
> problem.  (Though not your other one.  See my other post about that.)
>
>         foreach s series [
>             clear blk
>             parse/case s rule
>             append/only ptrs copy blk
>             append last ptrs s
>         ]

Yep, needless to say, that fixed it.
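
For anyone following along, here is a minimal sketch of why the position of the word matters. It does not rely on how any particular interpreter version implements 'forall; it just takes a reference to the tail by hand and shows that 'clear removes nothing from there. The names data and at-tail are made up for illustration.

```rebol
REBOL []

data: copy ["ember" "alma"]

; A reference to the tail of the block, like the one 'forall
; is described as leaving behind in the quoted explanation
at-tail: tail data
clear at-tail               ; nothing lies after the tail, so nothing is removed
print length? data          ; 2

; 'foreach never moves the word, so DATA still refers to the head
foreach s data [print s]
clear data                  ; clears from the head
print length? data          ; 0
```

'clear works from the current position to the tail, so a word left at the tail keeps the whole block intact, and the next run appends to the old results.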

rule-4: pattern-rule { !"#$%&'()*+,-./0123456789:;<=>?
@Aa鍍§邇曩辭BbCc✽⺾ルDd砏唦Ee扙抏呠毞Ff
GgHhIi俀昵JjKkLl驚扔ㄢMmNn桍秪Oo郠崞淴翊Pp
QqRr濬惉Ss朱往的粲t姣稓Uu湀絟嫈跅VvWwXxYy
毻Zz狩紊Zz[\]^_`{|}~}

Here is the rule pattern I **generated** from my table for the ISO-8859-2
character set.  Currently, it sorts big-uns before little-uns (uppercase
before lowercase).  If a character looks totally out of place, that is
because this representation uses the ISO-8859-1 set implicit in REBOL.
Which brings me to the next "problem": there is no way to display a
character properly if it isn't already contained in the standard character
set.  So some sorted results will not appear to be correct until they are
displayed in the correct character set.  This does not appear to be a
problem in Hungarian (so far), but will be in other languages.  Hmmm.....
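
A small sketch of the display issue, assuming byte-oriented strings as in REBOL/Core: the rule only stores byte values, so the position of a character in the collation string is the same whichever character set is used to display it. The fragment below is hypothetical (it is not the generated rule above), and the byte values are taken from the ISO-8859-2 and ISO-8859-1 code tables.

```rebol
REBOL []

; Hypothetical fragment of a collation string, written as byte values
; so that the source does not depend on how an editor displays them.
; In ISO-8859-2: 211/243 = O/o acute, 214/246 = O/o diaeresis,
; 213/245 = O/o double acute (ISO-8859-1 shows 213/245 as O/o tilde).
order: rejoin [
    "Oo" to-char 211 to-char 243
    to-char 214 to-char 246
    to-char 213 to-char 245
]

; The sort position depends only on the byte value, not on the glyph
print index? find/case order to-char 245    ; 8
print index? find/case order #"o"           ; 2
```

Building the string from byte values is only one option; typing the characters directly works too, as long as the file is saved in the character set the rule was written for.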

Keep up the good work.  I look forward to seeing your ideas regarding the
multi-letter graphemes in the separate post (which has not yet arrived
here ... whoops ... just arrived!).

Later...
--Scott Jones