[REBOL] Re: Czech Sort - First Alpha
From: gjones05:mail:orion at: 11-Jun-2001 8:11
Hi, PeKr, et al,
Here is the first alpha. I bundled all the files together for ease of testing.
I left tabs in this format, which should be preserved once copied to a file.
Also, feel free to suggest improvements to the regression testing examples (you
will see what I mean when you see one of the examples for h and ch!). I had
limited easy access to correct versions of names.
--Scott Jones
;========================
REBOL [
Title: "Czech Language Sort Function"
Date: 11-Jun-2001
Version: 0.0.1
Author: "G. Scott Jones, M.D."
File: %czech-sort.r
Purpose: {Sort support for Czech alphabet}
Comment: {This is the first alpha release. Petr Krenzelok supplied
the inspiration and initial language technical support. It is
mainly alpha because no Czech speaker have actually reviewed the
results! For these early versions, I've rolled the character
sort list into this file for convenience. The routine is
currently hard coded for Czech language only, but will readily
be made more generic for other languages. The code is heavily
commented for easy interpretation by others. The routine could
also be rewritten to be a wrapper for REBOL 'sort, with a path
refinement allowing for alternative language support. The
to-do list is so long as to make it pointless for me to list
at this stage. ;-) Now, I'll post to the list for review.
USAGE: czech-sort series /case /reverse
}
History: [
0.0.1 [11-Jun-2001 {First released for alpha review} "GSJ"]
]
]
char-list: {32 1 32
33 2 33 !
34 3 34 "
35 4 35 #
36 5 36 $
37 6 37 %
38 7 38 &
39 8 39 '
40 9 40 (
41 10 41 )
42 11 42 *
43 12 43 +
44 13 44 ,
45 14 45 -
46 15 46 .
47 16 47 /
48 17 48 0
49 18 49 1
50 19 50 2
51 20 51 3
52 21 52 4
53 22 53 5
54 23 54 6
55 24 55 7
56 25 56 8
57 26 57 9
58 27 58 :
59 28 59 ;
60 29 60 <
61 30 61 62 31 62 >
63 32 63 ?
64 33 64 @
97 34 97 a 0061 LATIN SMALL LETTER A
225 35 225 a' 00e1 LATIN SMALL LETTER A WITH ACUTE
65 77 65 A 0041 LATIN CAPITAL LETTER A
193 78 193 A' 00c1 LATIN CAPITAL LETTER A WITH ACUTE
98 36 98 b 0062 LATIN SMALL LETTER B
66 79 66 B 0042 LATIN CAPITAL LETTER B
99 37 99 c 0063 LATIN SMALL LETTER C
232 38 269 c< 010d LATIN SMALL LETTER C WITH CARON
67 80 67 C 0043 LATIN CAPITAL LETTER C
200 81 268 C< 010c LATIN CAPITAL LETTER C WITH CARON
100 39 100 d 0064 LATIN SMALL LETTER D
239 40 271 d< 010f LATIN SMALL LETTER D WITH CARON
68 82 68 D 0044 LATIN CAPITAL LETTER D
207 83 270 D< 010e LATIN CAPITAL LETTER D WITH CARON
101 41 101 e 0065 LATIN SMALL LETTER E
233 42 233 e' 00e9 LATIN SMALL LETTER E WITH ACUTE
236 43 283 e< 011b LATIN SMALL LETTER E WITH CARON
69 84 69 E 0045 LATIN CAPITAL LETTER E
201 85 201 E' 00c9 LATIN CAPITAL LETTER E WITH ACUTE
204 86 282 E< 011a LATIN CAPITAL LETTER E WITH CARON
102 44 102 f 0066 LATIN SMALL LETTER F
70 87 70 F 0046 LATIN CAPITAL LETTER F
103 45 103 g 0067 LATIN SMALL LETTER G
71 88 71 G 0047 LATIN CAPITAL LETTER G
104 46 104 h 0068 LATIN SMALL LETTER H
72 89 72 H 0048 LATIN CAPITAL LETTER H
127 47 127 ch special character combination
128 48 128 cH special character combination
129 90 129 Ch special character combination
130 91 130 CH special character combination
105 49 105 i 0069 LATIN SMALL LETTER I
237 50 237 i' 00ed LATIN SMALL LETTER I WITH ACUTE
73 92 73 I 0049 LATIN CAPITAL LETTER I
205 93 205 I' 00cd LATIN CAPITAL LETTER I WITH ACUTE
106 51 106 j 006a LATIN SMALL LETTER J
74 94 74 J 004a LATIN CAPITAL LETTER J
107 52 107 k 006b LATIN SMALL LETTER K
75 95 75 K 004b LATIN CAPITAL LETTER K
108 53 108 l 006c LATIN SMALL LETTER L
76 96 76 L 004c LATIN CAPITAL LETTER L
109 54 109 m 006d LATIN SMALL LETTER M
77 97 77 M 004d LATIN CAPITAL LETTER M
110 55 110 n 006e LATIN SMALL LETTER N
242 56 328 n< 0148 LATIN SMALL LETTER N WITH CARON
78 98 78 N 004e LATIN CAPITAL LETTER N
210 99 327 N< 0147 LATIN CAPITAL LETTER N WITH CARON
111 57 111 o 006f LATIN SMALL LETTER O
243 58 243 o' 00f3 LATIN SMALL LETTER O WITH ACUTE
79 100 79 O 004f LATIN CAPITAL LETTER O
211 101 211 O' 00d3 LATIN CAPITAL LETTER O WITH ACUTE
112 59 112 p 0070 LATIN SMALL LETTER P
80 102 80 P 0050 LATIN CAPITAL LETTER P
113 60 113 q 0071 LATIN SMALL LETTER Q
81 103 81 Q 0051 LATIN CAPITAL LETTER Q
114 61 114 r 0072 LATIN SMALL LETTER R
248 62 345 r< 0159 LATIN SMALL LETTER R WITH CARON
82 104 82 R 0052 LATIN CAPITAL LETTER R
216 105 344 R< 0158 LATIN CAPITAL LETTER R WITH CARON
115 63 115 s 0073 LATIN SMALL LETTER S
154 64 353 s< 0161 LATIN SMALL LETTER S WITH CARON
83 106 83 S 0053 LATIN CAPITAL LETTER S
169 107 352 S< 0160 LATIN CAPITAL LETTER S WITH CARON
116 65 116 t 0074 LATIN SMALL LETTER T
187 66 357 t< 0165 LATIN SMALL LETTER T WITH CARON
84 108 84 T 0054 LATIN CAPITAL LETTER T
171 109 356 T< 0164 LATIN CAPITAL LETTER T WITH CARON
117 67 117 u 0075 LATIN SMALL LETTER U
249 68 367 u0 016f LATIN SMALL LETTER U WITH RING ABOVE
250 69 250 u' 00fa LATIN SMALL LETTER U WITH ACUTE
85 110 85 U 0055 LATIN CAPITAL LETTER U
217 111 366 U0 016e LATIN CAPITAL LETTER U WITH RING ABOVE
218 112 218 U' 00da LATIN CAPITAL LETTER U WITH ACUTE
118 70 118 v 0076 LATIN SMALL LETTER V
86 113 86 V 0056 LATIN CAPITAL LETTER V
119 71 119 w 0077 LATIN SMALL LETTER W
87 114 87 W 0057 LATIN CAPITAL LETTER W
120 72 120 x 0078 LATIN SMALL LETTER X
88 115 88 X 0058 LATIN CAPITAL LETTER X
121 73 121 y 0079 LATIN SMALL LETTER Y
253 74 253 y' 00fd LATIN SMALL LETTER Y WITH ACUTE
89 116 89 Y 0059 LATIN CAPITAL LETTER Y
221 117 221 Y' 00dd LATIN CAPITAL LETTER Y WITH ACUTE
122 75 122 z 007a LATIN SMALL LETTER Z
158 76 382 z< 017e LATIN SMALL LETTER Z WITH CARON
90 118 90 Z 005a LATIN CAPITAL LETTER Z
142 119 381 Z< 017d LATIN CAPITAL LETTER Z WITH CARON
91 120 91 [
92 121 92 \
93 122 93 ]
94 123 94 ^
95 124 95 _
96 125 96 `
123 126 123 {
124 127 124 |
125 128 125 }
126 129 126 ~}
special: reduce [
to-string to-char 127 "ch"
to-string to-char 128 "cH"
to-string to-char 129 "Ch"
to-string to-char 130 "CH"
]
;;;;set up sort data structures
data: copy []
data: parse/all char-list "^/"
;make regular sort map
cz-reg: copy data
forall cz-reg [cz-reg/1: to-integer first parse cz-reg/1 none]
cz-reg: head cz-reg
;make case-sensitive sort map
cz-case: copy data
mysort: func [a b] [
(to-integer pick parse a none 2) < (to-integer pick parse b none 2)
]
;rearrange the list based on second field
sort/compare cz-case :mysort
forall cz-case [cz-case/1: to-integer first parse cz-case/1 none]
cz-case: head cz-case
;;;;new sort function
;not all 'sort refinements yet supported
;local words have not been specified
;error condition roll-back of block to original not yet added
czech-sort: func [:blk /case /reverse][
either case [order: cz-case][order: cz-reg]
;backup for future error checking and roll-back
blk-backup: copy blk
forall blk [
;change double special characters to single arbitrary character
foreach [sc oc] special [
while [loc: find/case blk/1 oc][change/part loc sc length? oc]
]
;swap index position for characters
temp: copy []
foreach b blk/1 [
t: find order to-integer b
append temp index? t
]
blk/1: temp
]
blk: head blk
;sort through REBOL 'sort
either reverse [
sort/reverse blk
][
sort blk
]
forall blk [
temp: copy []
;change index integer back to characters
foreach b blk/1 [append temp to-char order/:b]
;make a word out of characters
blk/1: copy rejoin temp
;swap special character sequences back in for single char
foreach [sc oc] special [
while [loc: find blk/1 sc][remove loc insert loc oc]
]
]
;reset head and block returns changed
blk: head blk
]
;;;;now for some testing
;these may not be official spellings - it is just what I had available
months: ["Leden" "Únor" "Brezen" "Duben" "Kveten" "Cerven"
Cervenec
"Srpen" "Zárí" "Ríjen" "Listopad" "Prosinec"]
czech-sort months
print ["Check month sort/case: " equal? months ["Brezen" "Cerven"
Cervenec
"Duben" "Kveten" "Leden" "Listopad" "Prosinec" "Ríjen"
Srpen
"Únor" "Zárí"]]
;foreach m months [print m]
czech-sort/case months
print ["Check month sort/case: " equal? months ["Brezen" "Cerven"
Cervenec
"Duben" "Kveten" "Leden" "Listopad" "Prosinec" "Ríjen"
Srpen
"Únor" "Zárí"]]
;foreach m months [print m]
days: ["Pondelí" "Úterý" "Streda" "Ctvrtek" "Pátek" "Sobota" "Nedele"]
czech-sort days
print ["Check day sort: " equal? days ["Ctvrtek" "Nedele" "Pátek"
Pondelí
"Sobota" "Streda" "Úterý"]]
;foreach d days [print d]
czech-sort/case days
print ["Check day sort/case: " equal? days ["Ctvrtek" "Nedele" "Pátek"
Pondelí
"Sobota" "Streda" "Úterý"]]
;foreach d days [print d]
;look moe closely at h and ch orders
pseudo-h-ch: ["chechen" "Czech" "HacHeCh" "heChum" "CHOM"]
czech-sort pseudo-h-ch
print ["Check pseudo word h-ch: " equal? pseudo-h-ch ["Czech" "heChum"
HacHeCh
"chechen" "CHOM"]]
;foreach d pseudo-h-ch [print d]
pseudo-h-ch: ["chechen" "Czech" "HacHeCh" "heChum" "CHOM"]
czech-sort/case pseudo-h-ch
print ["Check pseudo word h-ch: " equal? pseudo-h-ch ["heChum" "chechen"
Czech
"HacHeCh" "CHOM"]]
;foreach d pseudo-h-ch [print d]