Mailing List Archive: 49091 messages
  • Home
  • Script library
  • AltME Archive
  • Mailing list
  • Articles Index
  • Site search
 

[REBOL] Re: Czech Sort - First Alpha

From: gjones05:mail:orion at: 11-Jun-2001 8:11

Hi, PeKr, et al, Here is the first alpha. I bundled all the files together for ease of testing. I left tabs in this format, which should be preserved once copied to a file. Also, feel free to suggest improvements to the regression testing examples (you will see what I mean when you see one of the examples for h and ch!). I had limited easy access to correct versions of names. --Scott Jones ;======================== REBOL [ Title: "Czech Language Sort Function" Date: 11-Jun-2001 Version: 0.0.1 Author: "G. Scott Jones, M.D." File: %czech-sort.r Purpose: {Sort support for Czech alphabet} Comment: {This is the first alpha release. Petr Krenzelok supplied the inspiration and initial language technical support. It is mainly alpha because no Czech speaker have actually reviewed the results! For these early versions, I've rolled the character sort list into this file for convenience. The routine is currently hard coded for Czech language only, but will readily be made more generic for other languages. The code is heavily commented for easy interpretation by others. The routine could also be rewritten to be a wrapper for REBOL 'sort, with a path refinement allowing for alternative language support. The to-do list is so long as to make it pointless for me to list at this stage. ;-) Now, I'll post to the list for review. USAGE: czech-sort series /case /reverse } History: [ 0.0.1 [11-Jun-2001 {First released for alpha review} "GSJ"] ] ] char-list: {32 1 32 33 2 33 ! 34 3 34 " 35 4 35 # 36 5 36 $ 37 6 37 % 38 7 38 & 39 8 39 ' 40 9 40 ( 41 10 41 ) 42 11 42 * 43 12 43 + 44 13 44 , 45 14 45 - 46 15 46 . 47 16 47 / 48 17 48 0 49 18 49 1 50 19 50 2 51 20 51 3 52 21 52 4 53 22 53 5 54 23 54 6 55 24 55 7 56 25 56 8 57 26 57 9 58 27 58 : 59 28 59 ; 60 29 60 < 61 30 61 62 31 62 > 63 32 63 ? 64 33 64 @ 97 34 97 a 0061 LATIN SMALL LETTER A 225 35 225 a' 00e1 LATIN SMALL LETTER A WITH ACUTE 65 77 65 A 0041 LATIN CAPITAL LETTER A 193 78 193 A' 00c1 LATIN CAPITAL LETTER A WITH ACUTE 98 36 98 b 0062 LATIN SMALL LETTER B 66 79 66 B 0042 LATIN CAPITAL LETTER B 99 37 99 c 0063 LATIN SMALL LETTER C 232 38 269 c< 010d LATIN SMALL LETTER C WITH CARON 67 80 67 C 0043 LATIN CAPITAL LETTER C 200 81 268 C< 010c LATIN CAPITAL LETTER C WITH CARON 100 39 100 d 0064 LATIN SMALL LETTER D 239 40 271 d< 010f LATIN SMALL LETTER D WITH CARON 68 82 68 D 0044 LATIN CAPITAL LETTER D 207 83 270 D< 010e LATIN CAPITAL LETTER D WITH CARON 101 41 101 e 0065 LATIN SMALL LETTER E 233 42 233 e' 00e9 LATIN SMALL LETTER E WITH ACUTE 236 43 283 e< 011b LATIN SMALL LETTER E WITH CARON 69 84 69 E 0045 LATIN CAPITAL LETTER E 201 85 201 E' 00c9 LATIN CAPITAL LETTER E WITH ACUTE 204 86 282 E< 011a LATIN CAPITAL LETTER E WITH CARON 102 44 102 f 0066 LATIN SMALL LETTER F 70 87 70 F 0046 LATIN CAPITAL LETTER F 103 45 103 g 0067 LATIN SMALL LETTER G 71 88 71 G 0047 LATIN CAPITAL LETTER G 104 46 104 h 0068 LATIN SMALL LETTER H 72 89 72 H 0048 LATIN CAPITAL LETTER H 127 47 127 ch special character combination 128 48 128 cH special character combination 129 90 129 Ch special character combination 130 91 130 CH special character combination 105 49 105 i 0069 LATIN SMALL LETTER I 237 50 237 i' 00ed LATIN SMALL LETTER I WITH ACUTE 73 92 73 I 0049 LATIN CAPITAL LETTER I 205 93 205 I' 00cd LATIN CAPITAL LETTER I WITH ACUTE 106 51 106 j 006a LATIN SMALL LETTER J 74 94 74 J 004a LATIN CAPITAL LETTER J 107 52 107 k 006b LATIN SMALL LETTER K 75 95 75 K 004b LATIN CAPITAL LETTER K 108 53 108 l 006c LATIN SMALL LETTER L 76 96 76 L 004c LATIN CAPITAL LETTER L 109 54 109 m 006d LATIN SMALL LETTER M 77 97 77 M 004d LATIN CAPITAL LETTER M 110 55 110 n 006e LATIN SMALL LETTER N 242 56 328 n< 0148 LATIN SMALL LETTER N WITH CARON 78 98 78 N 004e LATIN CAPITAL LETTER N 210 99 327 N< 0147 LATIN CAPITAL LETTER N WITH CARON 111 57 111 o 006f LATIN SMALL LETTER O 243 58 243 o' 00f3 LATIN SMALL LETTER O WITH ACUTE 79 100 79 O 004f LATIN CAPITAL LETTER O 211 101 211 O' 00d3 LATIN CAPITAL LETTER O WITH ACUTE 112 59 112 p 0070 LATIN SMALL LETTER P 80 102 80 P 0050 LATIN CAPITAL LETTER P 113 60 113 q 0071 LATIN SMALL LETTER Q 81 103 81 Q 0051 LATIN CAPITAL LETTER Q 114 61 114 r 0072 LATIN SMALL LETTER R 248 62 345 r< 0159 LATIN SMALL LETTER R WITH CARON 82 104 82 R 0052 LATIN CAPITAL LETTER R 216 105 344 R< 0158 LATIN CAPITAL LETTER R WITH CARON 115 63 115 s 0073 LATIN SMALL LETTER S 154 64 353 s< 0161 LATIN SMALL LETTER S WITH CARON 83 106 83 S 0053 LATIN CAPITAL LETTER S 169 107 352 S< 0160 LATIN CAPITAL LETTER S WITH CARON 116 65 116 t 0074 LATIN SMALL LETTER T 187 66 357 t< 0165 LATIN SMALL LETTER T WITH CARON 84 108 84 T 0054 LATIN CAPITAL LETTER T 171 109 356 T< 0164 LATIN CAPITAL LETTER T WITH CARON 117 67 117 u 0075 LATIN SMALL LETTER U 249 68 367 u0 016f LATIN SMALL LETTER U WITH RING ABOVE 250 69 250 u' 00fa LATIN SMALL LETTER U WITH ACUTE 85 110 85 U 0055 LATIN CAPITAL LETTER U 217 111 366 U0 016e LATIN CAPITAL LETTER U WITH RING ABOVE 218 112 218 U' 00da LATIN CAPITAL LETTER U WITH ACUTE 118 70 118 v 0076 LATIN SMALL LETTER V 86 113 86 V 0056 LATIN CAPITAL LETTER V 119 71 119 w 0077 LATIN SMALL LETTER W 87 114 87 W 0057 LATIN CAPITAL LETTER W 120 72 120 x 0078 LATIN SMALL LETTER X 88 115 88 X 0058 LATIN CAPITAL LETTER X 121 73 121 y 0079 LATIN SMALL LETTER Y 253 74 253 y' 00fd LATIN SMALL LETTER Y WITH ACUTE 89 116 89 Y 0059 LATIN CAPITAL LETTER Y 221 117 221 Y' 00dd LATIN CAPITAL LETTER Y WITH ACUTE 122 75 122 z 007a LATIN SMALL LETTER Z 158 76 382 z< 017e LATIN SMALL LETTER Z WITH CARON 90 118 90 Z 005a LATIN CAPITAL LETTER Z 142 119 381 Z< 017d LATIN CAPITAL LETTER Z WITH CARON 91 120 91 [ 92 121 92 \ 93 122 93 ] 94 123 94 ^ 95 124 95 _ 96 125 96 ` 123 126 123 { 124 127 124 | 125 128 125 } 126 129 126 ~} special: reduce [ to-string to-char 127 "ch" to-string to-char 128 "cH" to-string to-char 129 "Ch" to-string to-char 130 "CH" ] ;;;;set up sort data structures data: copy [] data: parse/all char-list "^/" ;make regular sort map cz-reg: copy data forall cz-reg [cz-reg/1: to-integer first parse cz-reg/1 none] cz-reg: head cz-reg ;make case-sensitive sort map cz-case: copy data mysort: func [a b] [ (to-integer pick parse a none 2) < (to-integer pick parse b none 2) ] ;rearrange the list based on second field sort/compare cz-case :mysort forall cz-case [cz-case/1: to-integer first parse cz-case/1 none] cz-case: head cz-case ;;;;new sort function ;not all 'sort refinements yet supported ;local words have not been specified ;error condition roll-back of block to original not yet added czech-sort: func [:blk /case /reverse][ either case [order: cz-case][order: cz-reg] ;backup for future error checking and roll-back blk-backup: copy blk forall blk [ ;change double special characters to single arbitrary character foreach [sc oc] special [ while [loc: find/case blk/1 oc][change/part loc sc length? oc] ] ;swap index position for characters temp: copy [] foreach b blk/1 [ t: find order to-integer b append temp index? t ] blk/1: temp ] blk: head blk ;sort through REBOL 'sort either reverse [ sort/reverse blk ][ sort blk ] forall blk [ temp: copy [] ;change index integer back to characters foreach b blk/1 [append temp to-char order/:b] ;make a word out of characters blk/1: copy rejoin temp ;swap special character sequences back in for single char foreach [sc oc] special [ while [loc: find blk/1 sc][remove loc insert loc oc] ] ] ;reset head and block returns changed blk: head blk ] ;;;;now for some testing ;these may not be official spellings - it is just what I had available months: ["Leden" "Únor" "Brezen" "Duben" "Kveten" "Cerven" Cervenec "Srpen" "Zárí" "Ríjen" "Listopad" "Prosinec"] czech-sort months print ["Check month sort/case: " equal? months ["Brezen" "Cerven" Cervenec "Duben" "Kveten" "Leden" "Listopad" "Prosinec" "Ríjen" Srpen "Únor" "Zárí"]] ;foreach m months [print m] czech-sort/case months print ["Check month sort/case: " equal? months ["Brezen" "Cerven" Cervenec "Duben" "Kveten" "Leden" "Listopad" "Prosinec" "Ríjen" Srpen "Únor" "Zárí"]] ;foreach m months [print m] days: ["Pondelí" "Úterý" "Streda" "Ctvrtek" "Pátek" "Sobota" "Nedele"] czech-sort days print ["Check day sort: " equal? days ["Ctvrtek" "Nedele" "Pátek" Pondelí "Sobota" "Streda" "Úterý"]] ;foreach d days [print d] czech-sort/case days print ["Check day sort/case: " equal? days ["Ctvrtek" "Nedele" "Pátek" Pondelí "Sobota" "Streda" "Úterý"]] ;foreach d days [print d] ;look moe closely at h and ch orders pseudo-h-ch: ["chechen" "Czech" "HacHeCh" "heChum" "CHOM"] czech-sort pseudo-h-ch print ["Check pseudo word h-ch: " equal? pseudo-h-ch ["Czech" "heChum" HacHeCh "chechen" "CHOM"]] ;foreach d pseudo-h-ch [print d] pseudo-h-ch: ["chechen" "Czech" "HacHeCh" "heChum" "CHOM"] czech-sort/case pseudo-h-ch print ["Check pseudo word h-ch: " equal? pseudo-h-ch ["heChum" "chechen" Czech "HacHeCh" "CHOM"]] ;foreach d pseudo-h-ch [print d]