Mailing List Archive: 49091 messages

Czech Sort (was Re: I miss ..., bug in to-local-file ...)

 [1/5] from: gjones05::mail::orion::org at: 9-Jun-2001 7:23


Hi, PeKr, Cyphre, (Ladislav, others?), Volker,

I was distracted last night trying to figure out the plugin problem for IE. Many hours spent; nothing to show for it.

Anyway, this morning I completed my reference table. Some of the letters were very tricky to find. I basically scanned PeKr's company's pages for the letters, then looked at the source to determine the single-byte code/char representation. I think I can read Czech now .... not! ;-)

I used a language list reference to get the whole alphabet. This list did not list "ch" as a separate character. I tried to put the letters in order based on the references I found and PeKr's instructions. The list follows the questions. Please look it over to check for completeness and accuracy of order (hopefully I got the byte codes correct).

First, my questions:
1) Does the uppercase version of "ch" look like "Ch" or "CH"?
2) A sort/case in REBOL normally puts the uppercase versions before the lowercase versions. Would this be acceptable? PeKr mentioned that "lowercase letters first, uppercase follows". Does this mean that a sort/case should ideally put the lowercase version before the uppercase version?

Now, for my reference table. The column abbreviations may be obvious (-:, but here is the translation:
Dec: decimal value of the single-byte (computer) representation of the letter
UnD: decimal value of the Unicode code point (2 bytes) for the letter (here for future use, if needed)
L: ASCII notation linguists use to refer to the letter (' = acute, < = caron, 0 = ring above)
UnHx: Unicode code point in hex
Description: Unicode name of the letter

Reference table version 0.1 (I have a lot of confidence, don't I? ;-) (best viewed with a fixed-width font)

Dec  UnD  L    UnHx  Description
 97   97  a    0061  LATIN SMALL LETTER A
225  225  a'   00e1  LATIN SMALL LETTER A WITH ACUTE
 98   98  b    0062  LATIN SMALL LETTER B
 99   99  c    0063  LATIN SMALL LETTER C
232  269  c<   010d  LATIN SMALL LETTER C WITH CARON
100  100  d    0064  LATIN SMALL LETTER D
239  271  d<   010f  LATIN SMALL LETTER D WITH CARON
101  101  e    0065  LATIN SMALL LETTER E
233  233  e'   00e9  LATIN SMALL LETTER E WITH ACUTE
236  283  e<   011b  LATIN SMALL LETTER E WITH CARON
102  102  f    0066  LATIN SMALL LETTER F
103  103  g    0067  LATIN SMALL LETTER G
104  104  h    0068  LATIN SMALL LETTER H
          ch         special character combination
105  105  i    0069  LATIN SMALL LETTER I
237  237  i'   00ed  LATIN SMALL LETTER I WITH ACUTE
106  106  j    006a  LATIN SMALL LETTER J
107  107  k    006b  LATIN SMALL LETTER K
108  108  l    006c  LATIN SMALL LETTER L
109  109  m    006d  LATIN SMALL LETTER M
110  110  n    006e  LATIN SMALL LETTER N
242  328  n<   0148  LATIN SMALL LETTER N WITH CARON
111  111  o    006f  LATIN SMALL LETTER O
243  243  o'   00f3  LATIN SMALL LETTER O WITH ACUTE
112  112  p    0070  LATIN SMALL LETTER P
113  113  q    0071  LATIN SMALL LETTER Q
114  114  r    0072  LATIN SMALL LETTER R
248  345  r<   0159  LATIN SMALL LETTER R WITH CARON
115  115  s    0073  LATIN SMALL LETTER S
185  353  s<   0161  LATIN SMALL LETTER S WITH CARON
116  116  t    0074  LATIN SMALL LETTER T
187  357  t<   0165  LATIN SMALL LETTER T WITH CARON
117  117  u    0075  LATIN SMALL LETTER U
249  367  u0   016f  LATIN SMALL LETTER U WITH RING ABOVE
250  250  u'   00fa  LATIN SMALL LETTER U WITH ACUTE
118  118  v    0076  LATIN SMALL LETTER V
119  119  w    0077  LATIN SMALL LETTER W
120  120  x    0078  LATIN SMALL LETTER X
121  121  y    0079  LATIN SMALL LETTER Y
253  253  y'   00fd  LATIN SMALL LETTER Y WITH ACUTE
122  122  z    007a  LATIN SMALL LETTER Z
190  382  z<   017e  LATIN SMALL LETTER Z WITH CARON
 65   65  A    0041  LATIN CAPITAL LETTER A
193  193  A'   00c1  LATIN CAPITAL LETTER A WITH ACUTE
 66   66  B    0042  LATIN CAPITAL LETTER B
 67   67  C    0043  LATIN CAPITAL LETTER C
200  268  C<   010c  LATIN CAPITAL LETTER C WITH CARON
 68   68  D    0044  LATIN CAPITAL LETTER D
207  270  D<   010e  LATIN CAPITAL LETTER D WITH CARON
 69   69  E    0045  LATIN CAPITAL LETTER E
201  201  E'   00c9  LATIN CAPITAL LETTER E WITH ACUTE
204  282  E<   011a  LATIN CAPITAL LETTER E WITH CARON
 70   70  F    0046  LATIN CAPITAL LETTER F
 71   71  G    0047  LATIN CAPITAL LETTER G
 72   72  H    0048  LATIN CAPITAL LETTER H
          Ch         special character combination
 73   73  I    0049  LATIN CAPITAL LETTER I
205  205  I'   00cd  LATIN CAPITAL LETTER I WITH ACUTE
 74   74  J    004a  LATIN CAPITAL LETTER J
 75   75  K    004b  LATIN CAPITAL LETTER K
 76   76  L    004c  LATIN CAPITAL LETTER L
 77   77  M    004d  LATIN CAPITAL LETTER M
 78   78  N    004e  LATIN CAPITAL LETTER N
210  327  N<   0147  LATIN CAPITAL LETTER N WITH CARON
 79   79  O    004f  LATIN CAPITAL LETTER O
211  211  O'   00d3  LATIN CAPITAL LETTER O WITH ACUTE
 80   80  P    0050  LATIN CAPITAL LETTER P
 81   81  Q    0051  LATIN CAPITAL LETTER Q
 82   82  R    0052  LATIN CAPITAL LETTER R
216  344  R<   0158  LATIN CAPITAL LETTER R WITH CARON
 83   83  S    0053  LATIN CAPITAL LETTER S
169  352  S<   0160  LATIN CAPITAL LETTER S WITH CARON
 84   84  T    0054  LATIN CAPITAL LETTER T
171  356  T<   0164  LATIN CAPITAL LETTER T WITH CARON
 85   85  U    0055  LATIN CAPITAL LETTER U
217  366  U0   016e  LATIN CAPITAL LETTER U WITH RING ABOVE
218  218  U'   00da  LATIN CAPITAL LETTER U WITH ACUTE
 86   86  V    0056  LATIN CAPITAL LETTER V
 87   87  W    0057  LATIN CAPITAL LETTER W
 88   88  X    0058  LATIN CAPITAL LETTER X
 89   89  Y    0059  LATIN CAPITAL LETTER Y
221  221  Y'   00dd  LATIN CAPITAL LETTER Y WITH ACUTE
 90   90  Z    005a  LATIN CAPITAL LETTER Z
174  381  Z<   017d  LATIN CAPITAL LETTER Z WITH CARON

I look forward to hearing from you and to the next step. (By the way, Volker, I only briefly looked through your code last night, for the reason I've already explained; I will look more carefully when I go to the next step. Thanks!)

--Scott Jones
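To double-check the byte codes Scott asked about, here is a small Python sketch (not part of the original thread) that decodes a sample of the Dec values as ISO-8859-2 (Latin-2) and compares them with the UnD/UnHx columns. That the Dec column is ISO-8859-2 is an assumption inferred from the values themselves; `ROWS` and `check_rows` are illustrative names only.

```python
# Check a sample of rows from the reference table: decoding the "Dec" byte
# with the given single-byte encoding should give the code point in "UnD"/"UnHx".
ROWS = [
    # (Dec, UnD, UnHx, L notation)
    (225, 225, 0x00E1, "a'"),
    (232, 269, 0x010D, "c<"),
    (239, 271, 0x010F, "d<"),
    (236, 283, 0x011B, "e<"),
    (242, 328, 0x0148, "n<"),
    (248, 345, 0x0159, "r<"),
    (185, 353, 0x0161, "s<"),
    (187, 357, 0x0165, "t<"),
    (249, 367, 0x016F, "u0"),
    (253, 253, 0x00FD, "y'"),
    (190, 382, 0x017E, "z<"),
    (169, 352, 0x0160, "S<"),
    (171, 356, 0x0164, "T<"),
    (217, 366, 0x016E, "U0"),
    (174, 381, 0x017D, "Z<"),
]


def check_rows(rows, encoding="iso-8859-2"):
    """Return the L notations of rows whose byte does not decode to UnD/UnHx."""
    bad = []
    for dec, und, unhx, notation in rows:
        char = bytes([dec]).decode(encoding)
        if ord(char) != und or und != unhx:
            bad.append(notation)
    return bad


print(check_rows(ROWS))            # every sampled row is consistent with Latin-2
print(check_rows(ROWS, "cp1250"))  # Windows-1250 puts some letters (e.g. s-caron) elsewhere
```

The contrast with Windows-1250 is the useful part: that code page places s-caron at byte 154 (0x9A) rather than 185, so a table built from Latin-2 pages will not match text saved as Windows-1250.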

 [2/5] from: petr:krenzelok:trz:cz at: 10-Jun-2001 1:19


----- Original Message -----
From: "GS Jones" <[gjones05--mail--orion--org]>
To: <[rebol-list--rebol--com]>
Sent: Saturday, June 09, 2001 2:23 PM
Subject: [REBOL] Czech Sort (was Re: I miss ..., bug in to-local-file ...)
> Hi, PeKr, Cyphre, (Ladislav, others?), Volker,
>
> I was distracted last night trying to figure out the plugin problem for IE.
> Many hours spent; nothing to show for it.
>
> Anyway, this morning I completed my reference table. Some of the letters were
> very tricky to find. I basically scanned PeKr's companies pages for the
> letters, then looked at the source to determine the single byte code/char
> representation. I think I can read Czech now .... not! ;-)
>
> I used a language list reference to get all the alphabet. This list did not
> list "ch" as a separate character. I tried to put the letters in order based on
> the references I found and PeKr's instructions. This list follows the
> questions. Please look over the list to check for completeness and accuracy of
> order (hopefully I got the byte codes correct).
I am sorry I have not found free time to respond earlier!

"Ch" is not a single character! It is still TWO characters - the sort sequence is, however, different from the English one:

a b c d e f g h "ch" i j k l m n .....

so the above is a bit of a Czech special case ....

a b c c^ d d^ ..... I don't know how to express myself in English, but imagine ^ placed over "c" and rotated by 180 degrees (the háček, or caron) :-) .. maybe a little composed image could help? http://www.rebol.cz/~can/rebol-view/czech-alphabet.png
> First, my questions:
> 1) Does the uppercase version of "ch" look like "Ch" or "CH"?
Ch - still two letters ... just sorted as being one ...
> 2) A sort/case in REBOL normally puts the uppercase versions before the lower
> case versions. Would this be acceptable? PeKr mentioned that "lowercase
> letters first, uppercase follows". Does this mean that a sort/case should
> ideally put the lower case version before an upper case version?
I am not sure about that one ....
> Now, for my reference table. The column abbreviations may be obvious (-:, but
> here is the translation:
> Dec: Decimal representation of the single computer byte representation of the letter
> UnD: The decimal representation of the Unicode word (meaning 2 bytes) for the
> letter (here for future use, if needed)
> L: ASCII letters linguist use to refer to letters
<<quoted lines omitted: 88>>
> 174 381 Z< 017d LATIN CAPITAL LETTER Z WITH CARON
>
> I look forward to hearing from you and the next step (by the way, Volker, I only
> briefly looked through your code last night for the reason I've already
> explained; I will look more carefully when I go to the next step. Thanks!)

I will check once I get to work, and look at some of our dbase char sorting. However - I was just asking about sorting. I didn't necessarily need you to spend all of your free time solving my problems :-) Anyway - thanks a lot for taking care!

Cheers,
-pekr-

 [3/5] from: gjones05:mail:orion at: 9-Jun-2001 19:38


Hi, PeKr,
> From Scott Jones
...
> > I tried to put the letters in order based on
> > the references I found and PeKr's instructions.
> > This list follows the questions. Please look
> > over the list to check for completeness and
> > accuracy of order (hopefully I got the byte
> > codes correct).

From: "Petr Krenzelok"

> I am sorry I have not found free time to respond
> earlier!

No problem (as we say ;-).

> "Ch" is not single character! It is still TWO
> characters - the sort sequence is, however,
> different from english one:
>
> a b c d e f g h "ch" i j k l m n .....
>
> so above is a little bit Czech special case ....

Yes, this is what I understood from your earlier post. I just wanted to make sure it was not a blended character like in some languages.

> a b c c^ d d^ ..... don't know how to express myself in english, but imagine
> "^" being upon "c", and rotated in 180 degrees :-)
>
> .. maybe a little composed image could help?
> http://www.rebol.cz/~can/rebol-view/czech-alphabet.png

Nice picture. It helps to confirm what I thought you meant.

> I will check once I get at my work to look to some of our
> dbase char sorting.

Excellent idea!

> However - I just asked about sorting. I not necessarily
> needed you spend all of your free time solving my
> problems :-) Anyway - thanks a lot for taking care!
Oh, I know you were just asking in order to point out a "shortcoming". As I have explained before, I look for little projects that will help me to improve my skills. Lately, I've been working on schemes, dialects, parsing and custom sorts, so this problem, like the http thing, has been perfect!

I roughed out the main mechanics of the sort this morning, and it works better than expected. I have already sorted lists of hundreds of words taken from your work's website (don't worry, it was the Czech version :-), and it works great. All I really have left to do is to map out the punctuation, and to create a generic way to manage different languages with their special characters. It will then be easy for a person to map their own language for a custom version of 'sort.

I originally thought I would be using the /compare refinement, but a much more natural solution has evolved (or so it appears to me). Maybe by Monday, if you get a chance to see how dbase manages sorts, I can put the finishing touches on it.

By the way, I believe I see a way to use the /custom refinement on http to get more seamless file retrieval through read or open. It will be able to use all the great functionality that RT put into http, while allowing a bit more control. That and cookie management will make a great addition to the http functionality.

Have a nice Sunday (Nedele ?).

--Scott Jones

 [4/5] from: petr:krenzelok:trz:cz at: 10-Jun-2001 18:18


Hi Scott, so here goes my Nedele's reply :-))
> > a b c c^ d d^ ..... don't know how to express myself in english, but imagine
> > "^" being upon "c", and rotated in 180 degrees :-)
> >
> > .. maybe a little composed image could help?
> > http://www.rebol.cz/~can/rebol-view/czech-alphabet.png
>
> Nice picture. It helps to confirm what I thought you meant.
Generated by one line of rebol code of course :-)
> > I will check once I get at my work to look to some of our
> > dbase char sorting.
>
> Excellent idea!
I called my friend who does more coding than me :-) and he confirmed that lowercase ("a") goes first, followed by uppercase ("A"). So I hope you now have enough clues to finish a first alpha of Czech alphabet sorting.

Maybe we should think more in international terms while working on rebol concepts. I just wonder why no one from RT replied to my email re system/locale usage. It could easily be used in mezzanines like 'to-idate, if e.g. system/locale were extended with something like months-abv: ["Jan" "Feb" .....]. Maybe it would be handy? Date is also used in more robust GUI components like 'request-date, but it seems to me that it is solved there:

print mold req-funcs ; beware! - use only if you start View directly to console ...
....
md: func [date][join pick system/locale/months date/month [" " date/year]]
> > However - I just asked about sorting. I not necessarily
> > needed you spend all of your free time solving my
> > problems :-) Anyway - thanks a lot for taking care!
>
> Oh, I know you were just asking in order to point out a "short-coming". As I
> have explained before, I look for little projects that will help me to improve
> my skills. Lately, I've been working on schemes, dialects, parsing and custom
> sorts, so this probem, like the http thing, has been perfect!
Yes, I can feel my skills are improving too with such "small", but practical examples ...
> I roughed out the main mechanics of the sort this morning, and it works better
> than expected. I have already sorted lists of hundreds of words taken from your
> work's website (don't worry, it was the Czech version :-), and it works great.

What is the solution based upon? 'sort/compare? Or are you remapping chars according to some table first?
> All I really have left to do is to map out the punctuation, and to create a
> generic way to manage different languages, with their special characters. It
> will then be easy for a person to map their own language for a custom version of
> the 'sort.
>
> I originally thought I would be using the /compare refinement, but a much more
> natural solution has evolved (or so it appears to me). Maybe by Monday, if you
> get a chance to see how dbase manages sorts, I can put the finishing touches on
> it.
Looking forward to your solution! :-)
> By the way, I believe I see a way to use the /custom refinement on the http to
> get a more seamless file retrieval through read or open.
It would be cool. I could scrap my previous solution. But hey, it works, so I'd better not touch it for some time :-)
> It will be able to use all the great functionality that RT put into http, only
> allow a bit more control. That and cookie management will make a great addition
> to the http functionality.
It would probably be good if RT considered making some of these nice additions part of the official distribution. Of course, the question is how async protocols will change the current schemes, because the word was - they will have to be reimplemented ... But maybe Holger could step in and tell us what is, and what is not, safe to do to scheme code?
> Have a nice Sunday (Nedele ?).
:-) Right http://www.rebol.cz/~can/rebol-view/cz-weekdays.png -pekr-

 [5/5] from: gjones05:mail:orion at: 10-Jun-2001 15:45


Hi, PeKr,

I sent this message in ISO-8859-2 - Central European (Windows) so that you could see the characters. It is getting late on your Nedele [ :-) ], but I thought I would give you an update just for the "heck of it."

PK> I called my friend who does more coding them me :-)
PK> and he confirmed that first goes lowercase ("a")
PK> followed by uppercase ("A") letter. So I hope now
PK> you have enough indicies to finish first alpha of Czech
PK> alphabet sorting.

Yes, thank you. I am trying to keep the Czech Sort similar to the "regular" 'sort, so the default is a case-insensitive sort. The /case refinement will allow for a case-sensitive sort, and for Czech, the lowercase letters will come before the uppercase ones (the opposite of the regular 'sort). Here is a sample output from:
http://www.trz.cz/web/tz2000cz.nsf/_madsn6t42dkg6kobbdtpn8q8_?OpenPage

I hope Třinecké železárny doesn't mind the exposure. :-O

***Here is the regular sort

1991 1993 2000 40 6.1 9000 9001 9001 9001 a a a a a a auditem. auditu automobilový breznu bylo certifikacním certifikát certifikát certifikáty cinnosti Certifikovaný CSN CSN CSN dávají definovanými dobu dodavatelé dodávky dvou evropských EN fáze hutních hutních ISO ISO ISO ISO jako jakosti jakosti jakosti jakosti je jsou jsou kdy který let mezinárodními Moravia na na našim nároky nebo než norem norem normami normami normy o o obhájily odpovídá Od pak parametry plne po podle podle podle podle podmínkami. podmínky pokrývá povýrobní požadavku požadavkum požadavky. predpisy. predvýrobní prípravná pro pro probíhala prodlužovacího provedeného provozech. prumysl. Poskytuje rady ríjnu roku roku rozhodnuto s se shode shodnost specifickými splnují stanovené systém systém systému systému Steel technické tedy tom Trinecké Trineckých Trineckých užitné úspešným v v v v ve více všechny vybudování vytvoreny výrobku výrobní VDA Výrobkové zakoncena zákazníkem zákazníkum základe zároven záruku záruky získaly zpusobilé že že železáren železárnách železárny Zavedený

***Here is the case sensitive sort

1991 1993 2000 40 6.1 9000 9001 9001 9001 a a a a a a auditem. auditu automobilový breznu bylo certifikacním certifikát certifikát certifikáty cinnosti dávají definovanými dobu dodavatelé dodávky dvou evropských fáze hutních hutních jako jakosti jakosti jakosti jakosti je jsou jsou kdy který let mezinárodními na na našim nároky nebo než norem norem normami normami normy o o obhájily odpovídá pak parametry plne po podle podle podle podle podmínkami. podmínky pokrývá povýrobní požadavku požadavkum požadavky. predpisy. predvýrobní prípravná pro pro probíhala prodlužovacího provedeného provozech. prumysl. rady ríjnu roku roku rozhodnuto s se shode shodnost specifickými splnují stanovené systém systém systému systému technické tedy tom užitné úspešným v v v v ve více všechny vybudování vytvoreny výrobku výrobní zakoncena zákazníkem zákazníkum základe zároven záruku záruky získaly zpusobilé že že železáren železárnách železárny Certifikovaný CSN CSN CSN EN ISO ISO ISO ISO Moravia Od Poskytuje Steel Trinecké Trineckých Trineckých Výrobkové VDA Zavedený

Does this output look like what you would expect?

PK> Maybe we should more think in international
PK> terms while working on rebol concepts.

I think that you are right, because this will give the program the broadest appeal.

PK> I just wonder why noone from RT replied
PK> to my email re system/locale usage.

Did you also send the email to feedback, or just to the list?

PK> It could be easily used in mezzanines like 'to-idate,
PK> if e.g. system/locale would get extended of some
PK> months-abv: ["Jan" "Feb" .....]. Maybe it would
PK> be handy?

I was thinking the same thing when I was working on ftp with Thorsten. A number of things could be localized.

PK> What is the solution based upon? 'sort/compare?
PK> Or are you remaping chars according to some table first?

Originally, I thought a sort/compare would work, but I could find no easy way to then do other kinds of sorts, like case-sensitive sorts, reverse, etc. So I went to a character map, similar to what I sent the other day. I substitute the index of each character for the character, then sort the block of indexes! Then I remap back to the Czech alphabet.

The major tasks left are:
1) to find the most efficient way to manage the "ch" substitution (whether to do it while substituting the other characters, or before I start)
2) to pass through the rest of the 'sort refinements:
   /case - done
   /skip - haven't thought much about this one, but hopefully it will be simple
   /compare - should be simple
   /part - should be simple
   /all - haven't thought much about this one
   /reverse - should be simple
3) to create a language localization file format for the characters, special characters, dates and days
4) to wrap it in a function or object that allows for easy language changes (e.g. with a refinement /cz, /de, etc.) and avoids namespace collisions.

My biggest problem is that my brain is a little fuzzy today, so I haven't worked very efficiently. I should be able to post an alpha by tomorrow morning (*my* morning, that is :-).

PK> Of course, the question is, how async protocols
PK> will change current schemes, because the word
PK> was - they will have to be reimplemented

Uh, oh, I know nothing about this topic, except that I have only a very general idea about synchronous communications. Maybe Holger will write an article for the new e-zine!!!!!!! (Hi, Holger :-)

Let me know if the sort samples above are way off.

--Scott Jones
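Scott's description (substitute each character's index from a character map, sort the indexes, then remap back) can be sketched in Python as below. This is not his REBOL code: `czech_sort`, `encode`, `decode` and the table layout are hypothetical. The table roughly follows the reference table from the first message (lowercase block, then uppercase block), with u-acute placed before u-ring as in the usual Czech alphabet listing. "CH" in all capitals is not handled, and digits and punctuation simply keep their code-point order ahead of the letters.

```python
# Sketch of "substitute indexes, sort, remap back". Every listed letter is
# treated as a separate letter, which is simpler than official Czech collation.
LOWER = ["a", "á", "b", "c", "č", "d", "ď", "e", "é", "ě", "f", "g", "h", "ch",
         "i", "í", "j", "k", "l", "m", "n", "ň", "o", "ó", "p", "q", "r", "ř",
         "s", "š", "t", "ť", "u", "ú", "ů", "v", "w", "x", "y", "ý", "z", "ž"]
UPPER = [unit[0].upper() + unit[1:] for unit in LOWER]  # "ch" -> "Ch"; "CH" not handled
TABLE = LOWER + UPPER
N = len(LOWER)
INDEX = {unit: pos for pos, unit in enumerate(TABLE)}
OFFSET = 0x110000  # characters outside TABLE get negative indexes (sort before letters)


def encode(word):
    """Substitute each table unit (single letter or "ch"/"Ch") with its index."""
    out, i = [], 0
    while i < len(word):
        pair = word[i:i + 2]
        if len(pair) == 2 and pair in INDEX:
            out.append(INDEX[pair])
            i += 2
        elif word[i] in INDEX:
            out.append(INDEX[word[i]])
            i += 1
        else:  # digits, punctuation, etc.: keep their code-point order
            out.append(ord(word[i]) - OFFSET)
            i += 1
    return tuple(out)


def decode(indexes):
    """Remap a tuple of indexes back to the original string."""
    return "".join(TABLE[i] if i >= 0 else chr(i + OFFSET) for i in indexes)


def fold_case(indexes):
    """Map uppercase indexes onto their lowercase counterparts."""
    return tuple(i - N if i >= N else i for i in indexes)


def czech_sort(words, case=False, reverse=False):
    """case=True puts every lowercase unit before any uppercase one."""
    encoded = [encode(w) for w in words]
    if case:
        encoded.sort(reverse=reverse)
    else:
        # case-insensitive first, then lowercase before uppercase on ties
        encoded.sort(key=lambda t: (fold_case(t), t), reverse=reverse)
    return [decode(t) for t in encoded]


sample = ["ihned", "Čaj", "chata", "hrad", "cesta", "40", "6.1", "A", "a"]
print(czech_sort(sample))
print(czech_sort(sample, case=True))
```

The case-sensitive mode gives the layout of the /case sample above (all lowercase-initial words, then the capitalized ones), while the default folds case first and uses "lowercase before uppercase" only to break ties, as PeKr's colleague described. Because the sort works on whole index tuples, /reverse comes almost for free, which fits Scott's reason for preferring a character map over sort/compare.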

Notes
  • Quoted lines have been omitted from some messages.