I miss ..., bug in to-local-file ...

[1/9] from: petr::krenzelok::trz::cz at: 8-Jun-2001 13:22

Hi,

1) Today I worked on a short script that analyses a virus user log and produces .html output. While I was able to use Czech language characters, I have the following two observations:

a) Special characters are not sorted correctly, e.g. S^ does not follow "S" but is placed right after "Z" ... not sure if this is currently solvable ...

b) We have system/locale settings, but even to-idate ignores them:

->> source to-idate
to-idate: func [
    "Returns a standard Internet date string."
    date [date!]
    /local str
][
    str: form date/zone
    remove find str ":"
    if (first str) <> #"-" [insert str #"+"]
    if (length? str) <= 4 [insert next str #"0"]
    head insert str reform [
        pick ["Mon," "Tue," "Wed," "Thu," "Fri," "Sat," "Sun,"] date/weekday
        date/day
        pick [
            "Jan" "Feb" "Mar" "Apr" "May" "Jun"
            "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
        ] date/month
        date/year
        date/time
        ""
    ]
]

I would like to suggest adding a to-local-date mezzanine that returns the date in the local format, if possible ... Currently I am just using my own solution:

system/locale/months: ["Leden" "Únor" "Březen" "Duben" "Květen" "Červen" "Červenec" "Srpen" "Září" "Říjen" "Listopad" "Prosinec"]
system/locale/days: ["Pondělí" "Úterý" "Středa" "Čtvrtek" "Pátek" "Sobota" "Neděle"]

months: [
    "Jan" "01" "Feb" "02" "Mar" "03" "Apr" "04" "May" "05" "Jun" "06"
    "Jul" "07" "Aug" "08" "Sep" "09" "Oct" "10" "Nov" "11" "Dec" "12"
]

to-local-date: func [rebdate][
    tmp: parse to-string rebdate "-"
    return join "" [
        either (to-integer first tmp) < 10 [join "0" first tmp][first tmp]
        "." select months second tmp "." third tmp
    ]
]

... but isn't there a general solution to the problem? (Maybe a native function that reads the OS locale setting?)

2)

->> join to-local-file %/C/Work/ "ble.txt"
== %C:\Workble.txt

Cheers,
-pekr-
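For comparison, here is a minimal sketch of the same idea as the to-local-date helper above, written in Python rather than REBOL. The names `to_local_date`, `to_local_date_long` and `CZECH_MONTHS` are hypothetical and exist only for this illustration. The first function reproduces the DD.MM.YYYY output of the helper; the second shows a lookup in a locale table, which is what to-idate does not do.

```python
from datetime import date

# Czech month names, as in the system/locale/months block above
# (nominative forms only; an assumption for illustration).
CZECH_MONTHS = ["Leden", "Únor", "Březen", "Duben", "Květen", "Červen",
                "Červenec", "Srpen", "Září", "Říjen", "Listopad", "Prosinec"]

def to_local_date(d: date) -> str:
    """Format a date as DD.MM.YYYY with zero-padded day and month."""
    return f"{d.day:02d}.{d.month:02d}.{d.year}"

def to_local_date_long(d: date, months=CZECH_MONTHS) -> str:
    """Format a date with the month name looked up in a locale table."""
    return f"{d.day}. {months[d.month - 1]} {d.year}"

print(to_local_date(date(2001, 6, 8)))       # 08.06.2001
print(to_local_date_long(date(2001, 6, 8)))  # 8. Červen 2001
```

Note that a plain table lookup is only a starting point: written Czech dates normally use the genitive form of the month name, which this table does not contain.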

[2/9] from: gjones05:mail:orion at: 8-Jun-2001 7:32

From: "Petr Krenzelok"

> 1) today I worked on some short virus user log analysing script
> producing .html file output, and while I was able to use Czech language
> characters, I have two following observations:
>
> a) Special characters are not correctly sorted, e.g. S^ is not following
> "S", but is placed right after "Z" ... not sure if it is currently
> solvable ...

...

Hi, Petr,

At least in the interim, a proper sort may be able to be accomplished with a custom sort. I'm trying to find the right code page that covers Czech. The following looks to be it.

http://czyborra.com/charsets/iso8859.html#ISO-8859-2

Given that this is correct, next I need to confirm that the ordering is correct. For example, there are a number of what we call capital U's with the various diacritical marks. Does this table show them to be in the correct sort order? Does a "C" come before a "C" with the diacritical marks?

--Scott Jones

[3/9] from: petr:krenzelok:trz:cz at: 8-Jun-2001 15:28

GS Jones wrote:

> From: "Petr Krenzelok"
> > 1) today I worked on some short virus user log analysing script

<<quoted lines omitted: 14>>

> diacritical marks. Does this table show them to be in the correct sort order?
> Does a "C" come before a "C" with the diacritical marks?

Yes, it does ... What is more, lowercase letters come first and uppercase letters follow ...

One special case, however :-) "ch" is regarded as a special character combination, and it follows "h" :-)

Maybe Ladislav or Richard (aka Cyphre :-) could comment?

-pekr-
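The rules confirmed here (accented letters follow their base letter, "ch" follows "h") can be sketched as a custom comparison function, similar in spirit to a custom sort. The sketch is in Python rather than REBOL, and it rests on assumptions: the `ALPHABET` table is a simplified ordering in which every accented letter is a separate letter, case is ignored, and the names `letters` and `czech_compare` are hypothetical. It is not a full implementation of Czech collation.

```python
from functools import cmp_to_key

# Simplified Czech letter order (an assumption for illustration): each accented
# letter directly follows its base letter, and "ch" follows "h".
ALPHABET = ["a", "á", "b", "c", "č", "d", "ď", "e", "é", "ě", "f", "g", "h",
            "ch", "i", "í", "j", "k", "l", "m", "n", "ň", "o", "ó", "p", "q",
            "r", "ř", "s", "š", "t", "ť", "u", "ú", "ů", "v", "w", "x", "y",
            "ý", "z", "ž"]
RANK = {letter: i for i, letter in enumerate(ALPHABET)}

def letters(word):
    """Split a word into letters, treating "ch" as a single letter."""
    word = word.lower()
    out, i = [], 0
    while i < len(word):
        if word[i:i + 2] == "ch":
            out.append("ch")
            i += 2
        else:
            out.append(word[i])
            i += 1
    return out

def czech_compare(a, b):
    """Three-way comparison by letter rank; returns -1, 0 or 1."""
    ka = [RANK.get(ch, len(RANK)) for ch in letters(a)]
    kb = [RANK.get(ch, len(RANK)) for ch in letters(b)]
    return (ka > kb) - (ka < kb)

words = ["zebra", "šála", "chata", "sál", "cihla", "hora", "čaj"]
print(sorted(words))  # code-point order: "čaj" and "šála" end up after "zebra"
print(sorted(words, key=cmp_to_key(czech_compare)))
# ['cihla', 'čaj', 'hora', 'chata', 'sál', 'šála', 'zebra']
```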

[4/9] from: agem:crosswinds at: 8-Jun-2001 16:28

>>>>>>>>>>>>>>>>>> Original Message <<<<<<<<<<<<<<<<<<

On 08.06.01, 14:28:19, Petr Krenzelok <[Petr--Krenzelok--trz--cz]> wrote on the subject [REBOL] Re: I miss ..., bug in to-local-file ...:

> GS Jones wrote:
> > From: "Petr Krenzelok"
> > > 1) today I worked on some short virus user log analysing script
> > > producing .html file output, and while I was able to use Czech language
> > > characters, I have two following observations:
> > >
> > > a) Special characters are not correctly sorted, e.g. S^ is not following
> > > "S", but is placed right after "Z" ... not sure if it is currently
> > > solvable ...

<<quoted lines omitted: 3>>

> >
> > At least in the interim, a proper sort may be able to be accomplished with a
> > custom sort. I'm trying to find the right code page that covers Czech. The
> > following looks to be it.
> >
> > http://czyborra.com/charsets/iso8859.html#ISO-8859-2
> >
> > Given that this is correct, next I need to confirm that the ordering is correct.
> > For example, there are a number of what we call capital U's with the various
> > diacritical marks. Does this table show them to be in the correct sort order?
> > Does a "C" come before a "C" with the diacritical marks?
> Yes, it does ..., what is more - lowercase letters first, uppercase follows ...
> One special case however :-) "ch" is regarded being special char combination, and it
> follows "h" :-)

just a suggestion: instead of using a customized sort, it may be useful to use a customized encoding. you would translate the original strings to a "sort-form" and sort on that. of course, keep the original! it could look like

parse string [any [
    "a" (append sort-form to char! "a")
    | "ch" (append sort-form to char! 1 + #"h")
]]

with some (a lot of?) clever organisation.

> Maybe Ladislav or Richard (aka Cyphre :-) could comment?
> -pekr-

-Volker

[5/9] from: cyphre:volny:cz at: 8-Jun-2001 18:01

Hello Pekr, Scott

> Given that this is correct, next I need to confirm that the ordering is correct.
> For example, there are a number of what we call capital U's with the various
> diacritical marks. Does this table show them to be in the correct sort order?
> Does a "C" come before a "C" with the diacritical marks?
>
> Yes, it does ..., what is more - lowercase letters first, uppercase follows ...
> One special case however :-) "ch" is regarded being special char combination, and it
> follows "h" :-)
> Maybe Ladislav or Richard (aka Cyphre :-) could comment?
>

Pekr is right. That's our strange alphabet :-)

Regards,
Cyphre

[6/9] from: gjones05:mail:orion at: 8-Jun-2001 11:17

From: "Volker Nitsch" ...

> just a suggestion: instead of using a customized sort
> it may be usefull to use customized encoding.

<<quoted lines omitted: 8>>

> ]
> with some (a lot?) clever organisation.

Hi, Volker,

(BTW, sorry about jumping in on Ladislav's challenge to you the other day...)

I am feeling like an idiot today (which is not necessarily *that* unusual), but I am not sure that I understand what you are suggesting. It appears as though you are setting up a kind of map, where the Czech special character S< 0160 LATIN CAPITAL LETTER S WITH CARON maps to the spot following the "less special" character S 0053 LATIN CAPITAL LETTER S. If that is a correct characterization, then I believe I understand that part (and was already working on a type of mapping). But I don't understand how the sort-form is sorted. I feel as though I am missing the point (I am looking at the sheet of music and not hearing the song again).

Do not feel as though you have to spend a lot of time trying to give "clue" to the "clueless", but if you have a very brief way to explain further, or a brief partial example, I would appreciate it. I believe that I already have a solution worked out, but I'm still collecting a mapping with any free moments that I have.

Thanks for your input. You frequently have great ideas, so I wouldn't want to overlook a better method!!!

--Scott Jones

[7/9] from: gjones05:mail:orion at: 8-Jun-2001 11:32

As I mentioned in my note to Volker, I am putting together a complete map of the characters, their hex representation, and, for completeness, their Unicode version, and I am trying to put them in order. When I am finished, I'll submit it for our Czech friends' inspection.

Just a quick question, though: looking through Petr's company's pages, the 'ch' appears as two characters, right? It is not like the blended letters that I have seen that are represented as one character.

Now, off to lunch for me.

--Scott Jones
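A map of this kind (character, byte value in the code page, Unicode code point) can be generated rather than typed by hand. The following is a small sketch in Python, not REBOL; the helper name `char_map` and the `CZECH_SPECIALS` string are assumptions made for this illustration, and the code page is the ISO-8859-2 table linked earlier in the thread.

```python
# Czech letters with diacritics, upper- and lowercase (illustrative list).
CZECH_SPECIALS = "ÁČĎÉĚÍŇÓŘŠŤÚŮÝŽáčďéěíňóřšťúůýž"

def char_map(chars, codec="iso8859_2"):
    """Return (char, byte value in the code page, Unicode code point) rows."""
    return [(ch, ch.encode(codec)[0], ord(ch)) for ch in chars]

for ch, byte, cp in char_map(CZECH_SPECIALS):
    print(f"{ch}  0x{byte:02X}  U+{cp:04X}")

# Every one of these byte values is above "Z" (0x5A), so a plain byte-wise
# sort places all of them after "Z".
print(all(byte > ord("Z") for _, byte, _ in char_map(CZECH_SPECIALS)))  # True
```

The last line shows why a byte-wise sort produces the ordering reported at the start of the thread: S with caron has a higher byte value than "Z".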

[8/9] from: agem:crosswinds at: 8-Jun-2001 18:48

>>>>>>>>>>>>>>>>>> Original Message <<<<<<<<<<<<<<<<<<

On 08.06.01, 17:17:56, "GS Jones" <[gjones05--mail--orion--org]> wrote on the subject [REBOL] Re: I miss ..., bug in to-local-file ...:

> From: "Volker Nitsch"
> ...

<<quoted lines omitted: 13>>

> Hi, Volker,
> (BTW, sorry about jumping in on Ladislav's challenge to you the other day...)

*grin* no, it was funny :) i checked mail, ok, place for me, copied it from the pad, wanted to send, and there was your message, with my text (very close). the "unfair" should be something like "comedy angry" (oh yes, english..). it was a compliment in a way ;-)

> I am feeling like an idiot today (which is not necessarily *that* unusual), but
> I am not sure that I understand what you are suggesting. It appears as though
> you are setting up a kind of map, where Czech special character S< 0160 LATIN
> CAPITAL LETTER S WITH CARON maps to the spot following the "less special"
> character S 0053 LATIN CAPITAL LETTER S. If that is a correct characterization,
> then I believe I understand that part (and was already working on a type of
> mapping).

yes, it's correct. the confusing, poor "snippet" above was meant to suggest doing the mapping with parse. the mysterious [| "ch" ] was meant to suggest mapping "ch" to a single "char" too.

> But I don't understand how the sort-form is sorted.

sorry for the confusion. it's not really needed my way, i think. i would map the strings before sorting, add the mapped strings to the data, and sort on these strings instead of the originals. this should be faster than sort/custom, which redoes the mapping on every compare. but performance is not the issue here, i think, so sort/custom may be simpler. and somehow i feel you have thought of this already.. ;-)

> I feel as though
> I am missing the point (I am looking at the sheet of music and not hearing the
> song again). Do not feel as though you have to spend in a lot of time trying to
> give "clue" to the "clueless", but if you have a very brief way to explain
> further, or a brief partial example, I would appreciate it. I believe that I
> already have a solution worked out, but I'm still collecting a mapping with any
> free moments that I have.

i think i have an idea for a "mapping tool" - have to play with it. hmm...
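The trade-off described in this message (map every string once before sorting, versus remapping inside the comparison) can be made visible by counting how often the mapping runs. This is a sketch in Python rather than REBOL; `sort_form` is a hypothetical toy mapping that only handles "ch", and the counter exists purely for the demonstration.

```python
from functools import cmp_to_key

calls = 0  # counts how often the mapping function runs

def sort_form(s):
    """Toy mapping: doubled code points leave a free slot for "ch" after "h"."""
    global calls
    calls += 1
    s = s.lower()
    codes, i = [], 0
    while i < len(s):
        if s[i:i + 2] == "ch":
            codes.append(2 * ord("h") + 1)  # slot between "h" and "i"
            i += 2
        else:
            codes.append(2 * ord(s[i]))
            i += 1
    return tuple(codes)

def compare(a, b):
    """Comparator that remaps both strings on every call."""
    ka, kb = sort_form(a), sort_form(b)
    return (ka > kb) - (ka < kb)

words = ["chata", "hora", "cihla", "ahoj", "ucho", "uhel", "duha", "duch"]

calls = 0
by_compare = sorted(words, key=cmp_to_key(compare))
calls_compare = calls

calls = 0
by_key = sorted(words, key=sort_form)  # each string is mapped exactly once
calls_key = calls

print(by_key)
print("same result:", by_compare == by_key)
print("mapping calls with comparator:", calls_compare, "precomputed:", calls_key)
```

Both approaches give the same order; the precomputed variant runs the mapping once per string, while the comparator variant runs it twice per comparison.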

[9/9] from: agem:crosswinds at: 8-Jun-2001 21:28

{
Hi Scott!
not really sure if this is useful (now, after coding..).
but to illustrate the idea..
if you (all) think it's worth some more work, tell me..
-Volker
}
[rebol [title: "sort-mapper"
    file: %czech0.r
    purpose: {
        to make it easier to generate sort-mappings
        for foreign languages. protoprototype.
    }
    usage: {
        look for
        ;CHANGE BASE-MAP.
        add your mapping/ordering changes.
        ;PATCH PARSE-ORDER
        to tell parse first look for "ch", then for "c"
        (longer before shorter)
        ;SORT-DEMO
        look here to sort a bit
    }
    comment: {
        slow. encodes ~15k/sec on 350mhz (10 * 3k)
        (without map generation)
        speedup with charsets possible?
        simple lookup-table for single chars should be
        best for speed i expect.
        - upper/lowercase makes problems with "ch" and that
        ("ch" "Ch" "cH" "CH" ..)
        need some kind of partly-mixed-case-parse..
        at least for ä ..
        - chars > 127 throw error, except chars explicit
        inserted. can be changed.
    }
]

;protect-system
???: func ['word value] [
    print [mold :word " : " mold :value] word :value
]

;generate base map
;---
base-map: copy [] ;some[+ "" | = "" ]
repeat i 127 [
    append base-map compose [+ (to string! to char! i)]
]

;CHANGE BASE-MAP.
;---
; todo: swap somehow upper & lowercase..
p: find/case/tail base-map "h"
; '= means same priority, '+ means new char
insert p [+ "ch" = "CH" = "Ch" = "cH"]
p: find/case/tail base-map "a"
;some german ;)
insert p [+ "ä" = "ae"]
;? base-map

;calculate codes
;---
explicit-map: copy [] ;some[find-string new-code]
next-char: to char! 0
forskip base-map 2 [
    if '+ = base-map/1 [next-char: next-char + 1]
    repend explicit-map [base-map/2 next-char]
]

;and restructuring. we need to be able to parse
;"ch" before "c", "ae" before "a"..
;without changing code
;---
to-tail: func ["move string to end of explicit-map" s /local p
] [
    p: find/case explicit-map s
    append p copy/part p 2
    remove/part p 2
]

;PATCH PARSE-ORDER
;---
to-tail "c" to-tail "C" to-tail "a"
to-tail "&"
;? explicit-map

;generate rule
;---
step-rule: copy []
forskip explicit-map 2 [
    append step-rule compose [ ;hacky line
        | (explicit-map/1) (
            to paren! compose [ append out (explicit-map/2)])
    ]
]
step-rule: next step-rule
;? step-rule

;encode-func
;---
out: none
encode: func [s] [
    out: copy ""
    if not parse/all/case s [some [here: step-rule]] [
        throw make error! mold copy/part skip here -10 100
    ]
    out
]

;SORT-DEMO
;---
strings: [
    {achz} x
    {ch} x
    {ahz} x
    {aiz} x
]
s: strings forskip s 2 [s/2: encode s/1]
? strings
sort/skip/compare strings 2 2
? strings

;---benchmark
;---
print "---benchmarking encode-rate"
dat: read %czech0.r
probe length? dat
start: now/precise/time
loop 10 [encode dat]
probe time: now/precise/time - start
print [10 * (length? dat) / to decimal! time "bytes/sec"]
print "done"
]
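The core of the script is its priority scheme: in the base map, `+` starts a new sort code, `=` reuses the previous code, and longer strings such as "ch" have to be matched before shorter ones such as "c". The following is a loose Python sketch of that scheme, not a translation of the REBOL script. The names `build_codes` and `encode` are chosen for this illustration, and longest-match lookup stands in for the script's reordering of its parse rule.

```python
def build_codes(base_map):
    """Assign codes from (marker, string) pairs: "+" starts a new code,
    "=" reuses the previous one (same sort priority)."""
    codes, code = {}, 0
    for marker, text in base_map:
        if marker == "+":
            code += 1
        codes[text] = code
    return codes

def encode(s, codes):
    """Greedy longest-match encoding, so "ch" is tried before "c"."""
    by_length = sorted(codes, key=len, reverse=True)
    out, i = [], 0
    while i < len(s):
        for text in by_length:
            if s.startswith(text, i):
                out.append(codes[text])
                i += len(text)
                break
        else:
            raise ValueError(f"no mapping for {s[i]!r} at position {i}")
    return tuple(out)

# Base map: ASCII 1..127 in order, then "ch" (all case variants share one
# code) inserted directly after "h".
base_map = [("+", chr(i)) for i in range(1, 128)]
pos = base_map.index(("+", "h")) + 1
base_map[pos:pos] = [("+", "ch"), ("=", "CH"), ("=", "Ch"), ("=", "cH")]

codes = build_codes(base_map)
strings = ["achz", "ch", "ahz", "aiz"]
print(sorted(strings, key=lambda s: encode(s, codes)))
# ['ahz', 'achz', 'aiz', 'ch']
```

The four demo strings are the ones from the SORT-DEMO section; with "ch" placed between "h" and "i", "achz" sorts between "ahz" and "aiz".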

>>>>>>>>>>>>>>>>>> Original Message <<<<<<<<<<<<<<<<<<

On 08.06.01, 19:48:03, Volker Nitsch <[agem--crosswinds--net]> wrote on the subject [REBOL] Re: I miss ..., bug in to-local-file ...:

Notes

Quoted lines have been omitted from some messages.