
[REBOL] Re: Hungarian Alphabet Sort (was Re: Collation sequence - proper and eff

From: gscottjones:mchsi at: 16-May-2002 7:44

From: "Carl Read"
> On 16-May-02, G. Scott Jones wrote:
>
> > This is extremely promising. I drew from the ISO-8859-2 character
> > set to make a rule, and it initially seems to sort correctly. The
> > time through is roughly the same as my hack (but I've not really
> > set-up a clean time condition).
>
> I've thoughts about how to speed it up - will be testing them out.

Great!

> > The only problem so far occurs when
> > I run my word sample list through more than once. It seems to
> > magically have kept the original sort/s and continues to append new
> > results to the block. I cannot seem to find where the problem is
> > occurring.
>
> It's in here...
>
>     forall series [
>         clear blk
>         parse/case series/1 rule
>         append/only ptrs copy blk
>         append last ptrs series/1
>     ]
>
> 'forall leaving 'series at its tail, so the 'clear that follows
> doesn't clear it. Change it to the following and it should fix that
> problem. (Though not your other one. See my other post about that.)
>
>     foreach s series [
>         clear blk
>         parse/case s rule
>         append/only ptrs copy blk
>         append last ptrs s
>     ]

Yep, needless to say, that fixed it.

rule-4: pattern-rule { !"#$%&'()*+,-./0123456789:;<=>? @AaÁá¡±ÂâÄäÃãBbCcÆæÈèÇçDdÏïÐðEeÉéÊêËëÌìFf GgHhIiÍíÎîJjKkLlÅå¥µ£³MmNnÑñÒòOoÓóÔôÕõÖöPp QqRrÀàØøSs¦¶©¹ªºßTt«»ÞþUuÙùÚúÜüÛûVvWwXxYy ÝýZz¬¼¯¿Zz[\]^_`{|}~}

Here is the rule pattern I **generated** from my table for the
ISO-8859-2 character set. Currently, this is sorted big-uns before
little-uns. If a character looks totally out of place, it is because
this representation uses the ISO-8859-1 set implicit in REBOL.

Which brings me to the next "problem": there will be no way to generate
a proper character that isn't already contained in the standard
character set. So some sorted solutions will not appear to be correct
until the result is displayed in the correct character set. This does
not appear to be a problem in Hungarian (so far), but will be in other
languages. Hmmm.....

Keep up the good work. I look forward to seeing your ideas regarding
the multi-letter graphemes in the separate post (which has not yet
arrived here ... whoops .. just arrived!).

Later...
--Scott Jones
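
Carl's point about 'forall versus 'foreach can be seen in isolation with a small sketch. The names `words` and `seen` and the sample strings are made up for illustration and are not from the thread's sort code. The tail behaviour is the one described in the quoted reply; some later REBOL releases appear to restore the series position after 'forall, so the reset at the end is written to be harmless either way.

```rebol
REBOL [Title: "forall vs foreach sketch"]

words: ["alma" "eper" "dinnye"]   ; hypothetical sample data
seen: copy []

; foreach hands each item to the temporary word 'w;
; the position of 'words itself never changes.
foreach w words [append seen w]
print index? words                ; still at position 1

; forall advances 'words itself, so the body reads the
; current item through the series word.
forall words [append seen first words]

; On the releases discussed in this thread, 'words is left at
; its tail here. Resetting by hand covers that case and does
; nothing on releases that restore the position themselves.
if tail? words [words: head words]
print length? words               ; all three items visible again
```

Running a second pass over `words` without that reset (or without switching to 'foreach) would find nothing left to iterate on the older behaviour, which is the kind of surprise the quoted fix avoids.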