Mailing List Archive: 49091 messages

I miss ..., bug in to-local-file ...

 [1/9] from: petr:krenzelok:trz:cz at: 8-Jun-2001 13:22


Hi,

1) Today I worked on a short script that analyses a virus user log and produces .html output. While I was able to use Czech language characters, I have the following two observations:

a) Special characters are not sorted correctly, e.g. S^ (Š, S with caron) does not follow "S" but is placed right after "Z" ... not sure if it is currently solvable ...

b) We have system/locale settings, but even to-idate ignores them:

    >> source to-idate
    to-idate: func [
        "Returns a standard Internet date string."
        date [date!]
        /local str
    ][
        str: form date/zone
        remove find str ":"
        if (first str) <> #"-" [insert str #"+"]
        if (length? str) <= 4 [insert next str #"0"]
        head insert str reform [
            pick ["Mon," "Tue," "Wed," "Thu," "Fri," "Sat," "Sun,"] date/weekday
            date/day
            pick ["Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"] date/month
            date/year
            date/time
            ""
        ]
    ]

I would like to suggest adding a to-local-date mezzanine that returns the date formatted according to the locale settings, if possible. Currently I am just using my own solution:

    system/locale/months: ["Leden" "Únor" "Březen" "Duben" "Květen" "Červen"
        "Červenec" "Srpen" "Září" "Říjen" "Listopad" "Prosinec"]
    system/locale/days: ["Pondělí" "Úterý" "Středa" "Čtvrtek" "Pátek" "Sobota" "Neděle"]
    months: [
        Jan "01" Feb "02" Mar "03" Apr "04" May "05" Jun "06"
        Jul "07" Aug "08" Sep "09" Oct "10" Nov "11" Dec "12"
    ]
    to-local-date: func [rebdate][
        tmp: parse to-string rebdate "-"
        return join "" [
            either (to-integer first tmp) < 10 [join "0" first tmp][first tmp]
            "." select months second tmp "." third tmp
        ]
    ]

... but isn't there a general solution to the problem (maybe a native function that reads the OS locale settings)?

2) to-local-file drops the trailing slash of a directory, so the file name gets glued onto it:

    >> join to-local-file %/C/Work/ "ble.txt"
    == %C:\Workble.txt

Cheers,
-pekr-
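For comparison, here is roughly how the two behaviours Petr asks about might look in Python. This is only a sketch. `to_local_date` and `to_local_long_date` are hypothetical helpers that mirror his to-local-date (zero-padded day.month.year), and the day names are copied from his system/locale/days table. `PureWindowsPath` stands in for to-local-file: turning the directory into a native path string drops the trailing separator, so plain concatenation reproduces the `C:\Workble.txt` result, while joining as paths keeps the separator.

```python
import datetime
from pathlib import PureWindowsPath

# Day names copied from Petr's system/locale/days (Monday first).
CZ_DAYS = ["Pondělí", "Úterý", "Středa", "Čtvrtek", "Pátek", "Sobota", "Neděle"]


def to_local_date(d):
    """Hypothetical helper: zero-padded DD.MM.YYYY, like Petr's to-local-date."""
    return f"{d.day:02d}.{d.month:02d}.{d.year}"


def to_local_long_date(d):
    """Prefix the numeric date with the Czech day name (weekday() is 0 for Monday)."""
    return f"{CZ_DAYS[d.weekday()]} {to_local_date(d)}"


# Point 2: the native form of C:/Work/ has no trailing separator,
# so naive string concatenation glues the names together.
work_dir = PureWindowsPath("C:/Work/")
naive = str(work_dir) + "ble.txt"    # the reported result
joined = str(work_dir / "ble.txt")   # what was intended

if __name__ == "__main__":
    day = datetime.date(2001, 6, 8)
    print(to_local_date(day))       # 08.06.2001
    print(to_local_long_date(day))  # Pátek 08.06.2001
    print(naive)                    # C:\Workble.txt
    print(joined)                   # C:\Work\ble.txt
```

The path case shows why a join on directory values needs to know about separators. Concatenating strings cannot recover a separator that the conversion already removed.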

 [2/9] from: gjones05:mail:orion at: 8-Jun-2001 7:32


From: "Petr Krenzelok"
> 1) today I worked on some short virus user log analysing script
> producing .html file output, and while I was able to use Czech language
> characters, I have two following observations:
>
> a) Special characters are not correctly sorted, e.g. S^ is not following
> "S", but is placed right after "Z" ... not sure if it is currently
> solvable ...
...

Hi, Petr,

At least in the interim, a proper sort can probably be achieved with a custom sort. I'm trying to find the right code page that covers Czech. The following looks to be it:

http://czyborra.com/charsets/iso8859.html#ISO-8859-2

Given that this is correct, next I need to confirm that the ordering is correct. For example, there are a number of what we call capital U's with various diacritical marks. Does this table show them in the correct sort order? Does a "C" come before a "C" with diacritical marks (such as Č)?

--Scott Jones
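The code-page table on its own cannot give the Czech order, because a plain sort compares code values. A quick Python check makes this concrete. The word list is invented for illustration, and the desired Czech order would be Cesta, Čaj, Sova, Šaty, Zima.

```python
words = ["Zima", "Šaty", "Sova", "Čaj", "Cesta"]

# A plain sort compares Unicode code points: Č is U+010C and Š is U+0160,
# both far above Z (U+005A), so they land after every unaccented letter.
by_code_point = sorted(words)

# Sorting ISO-8859-2 bytes has the same problem (Š = 0xA9, Č = 0xC8),
# and it even puts Š before Č, because that is how the code page is laid out.
by_latin2_bytes = sorted(words, key=lambda w: w.encode("iso-8859-2"))

if __name__ == "__main__":
    print(by_code_point)    # ['Cesta', 'Sova', 'Zima', 'Čaj', 'Šaty']
    print(by_latin2_bytes)  # ['Cesta', 'Sova', 'Zima', 'Šaty', 'Čaj']
```

So the code page helps identify the characters, but the sort order has to come from a separate table or key.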

 [3/9] from: petr:krenzelok:trz:cz at: 8-Jun-2001 15:28


GS Jones wrote:
> From: "Petr Krenzelok"
> > 1) today I worked on some short virus user log analysing script
<<quoted lines omitted: 14>>
> diacritical marks. Does this table show them to be in the correct sort order?
> Does a "C" come before a "C" with the diacritical marks?
Yes, it does. What is more, lowercase letters come first and uppercase follows ...

There is one special case, however :-) "ch" is treated as a special character combination (a letter of its own), and it follows "h" :-)

Maybe Ladislav or Richard (aka Cyphre :-) could comment?

-pekr-
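Petr's rules can be written down as a two-level sort key. The following Python sketch is a simplified model, not a full implementation of Czech collation. The hypothetical `PRIMARY` table covers only the unaccented letters plus č, ř, š, ž and the digraph "ch", and any other character is pushed behind them. In real Czech collation, accents such as those on á or ů are only a secondary difference, which this sketch does not model. Ties on the letters are broken lowercase-first, as Petr describes.

```python
# Simplified Czech letter order (hypothetical table): "ch" is one letter
# between "h" and "i"; č, ř, š, ž follow their base letters.
PRIMARY = ["a", "b", "c", "č", "d", "e", "f", "g", "h", "ch", "i", "j", "k",
           "l", "m", "n", "o", "p", "q", "r", "ř", "s", "š", "t", "u", "v",
           "w", "x", "y", "z", "ž"]
RANK = {letter: rank for rank, letter in enumerate(PRIMARY)}


def letters(word):
    """Split word into letters, treating "ch" in any case as a single letter."""
    out, i = [], 0
    while i < len(word):
        step = 2 if word[i:i + 2].lower() == "ch" else 1
        out.append(word[i:i + step])
        i += step
    return out


def czech_key(word):
    parts = letters(word)
    # Level 1: position in the alphabet; unknown characters go after "ž",
    # ordered among themselves by their lowercase text.
    primary = [(RANK[p.lower()], "") if p.lower() in RANK else (len(PRIMARY), p.lower())
               for p in parts]
    # Level 2: on equal letters, lowercase (0) sorts before uppercase (1).
    case = [0 if p.islower() else 1 for p in parts]
    return primary, case


if __name__ == "__main__":
    words = ["Chata", "ihned", "chata", "hrad", "čaj", "cesta", "Čaj"]
    print(sorted(words, key=czech_key))
```

Comparing the whole letter sequence before looking at case is what makes "čaj" and "Čaj" end up next to each other rather than separated by other words.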

 [4/9] from: agem:crosswinds at: 8-Jun-2001 16:28


>>>>>>>>>>>>>>>>>> Original Message <<<<<<<<<<<<<<<<<<
On 08.06.01, 14:28:19, Petr Krenzelok <[Petr--Krenzelok--trz--cz]> wrote on the subject [REBOL] Re: I miss ..., bug in to-local-file ...:
> GS Jones wrote:
> > From: "Petr Krenzelok"
> > > 1) today I worked on some short virus user log analysing script
> > > producing .html file output, and while I was able to use Czech language
> > > characters, I have two following observations:
> > >
> > > a) Special characters are not correctly sorted, e.g. S^ is not following
> > > "S", but is placed right after "Z" ... not sure if it is currently
> > > solvable ...
<<quoted lines omitted: 3>>
> >
> > At least in the interim, a proper sort may be able to be accomplished with a
> > custom sort. I'm trying to find the right code page that covers Czech. The
> > following looks to be it.
> >
> > http://czyborra.com/charsets/iso8859.html#ISO-8859-2
> >
> > Given that this is correct, next I need to confirm that the ordering is correct.
> > For example, there are a number of what we call capital U's with the various
> > diacritical marks. Does this table show them to be in the correct sort order?
> > Does a "C" come before a "C" with the diacritical marks?
> Yes, it does ..., what is more - lowercase letters first, uppercase follows ...
> One special case however :-) "ch" is regarded being special char combination, and it
> follows "h" :-)
Just a suggestion: instead of using a customized sort, it may be useful to use a customized encoding. You would translate the original strings to a »sort-form« and sort with that (of course, keep the original!). The translation could look like

    sort-form: copy ""
    parse string [any [
          "a" (append sort-form to char! "a")
        | "ch" (append sort-form to char! 1 + #"h")
    ]]

with some (a lot of?) clever organisation.
> Maybe Ladislav or Richard (aka Cyphre :-) could comment?
> -pekr-
-Volker
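Here is Volker's idea sketched in Python rather than with REBOL parse. Each alphabet unit is replaced by one character whose code reflects its position, and plain string comparison of these sort-forms then gives the intended order. The `ORDER` table is a hypothetical, simplified Czech alphabet. The text is case-folded, so words differing only in case stay in input order. As in Volker's parse rule, the longer unit "ch" is tried before "c".

```python
# Hypothetical hand-written order; each unit becomes one character of the sort-form.
ORDER = ["a", "b", "c", "č", "d", "e", "f", "g", "h", "ch", "i", "j", "k", "l",
         "m", "n", "o", "p", "q", "r", "ř", "s", "š", "t", "u", "v", "w", "x",
         "y", "z", "ž"]
CODE = {unit: chr(1 + rank) for rank, unit in enumerate(ORDER)}
# Try longer units first, so "ch" is matched before "c".
UNITS = sorted(CODE, key=len, reverse=True)


def sort_form(s):
    """Encode s (case-folded) as one character per alphabet unit."""
    text = s.lower()
    out = []
    i = 0
    while i < len(text):
        for unit in UNITS:
            if text.startswith(unit, i):
                out.append(CODE[unit])
                i += len(unit)
                break
        else:
            raise ValueError(f"no mapping for {text[i]!r} in {s!r}")
    return "".join(out)


if __name__ == "__main__":
    words = ["Zima", "chata", "Šaty", "hrad", "Sova", "Čaj", "Cesta", "ihned"]
    # sorted() keeps the original strings; only the comparison uses the sort-form.
    print(sorted(words, key=sort_form))
```

Characters missing from the table raise an error instead of silently sorting somewhere arbitrary. This is similar to the "chars > 127 throw error" behaviour Volker mentions for his own script.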

 [5/9] from: cyphre:volny:cz at: 8-Jun-2001 18:01


Hello Pekr, Scott
> Given that this is correct, next I need to confirm that the ordering is correct.
> For example, there are a number of what we call capital U's with the various
> diacritical marks. Does this table show them to be in the correct sort order?
> Does a "C" come before a "C" with the diacritical marks?
>
> Yes, it does ..., what is more - lowercase letters first, uppercase follows ...
> One special case however :-) "ch" is regarded being special char combination, and it
> follows "h" :-)
> Maybe Ladislav or Richard (aka Cyphre :-) could comment?
>
Pekr is right. That's our strange alphabet :-)

Regards,
Cyphre

 [6/9] from: gjones05:mail:orion at: 8-Jun-2001 11:17


From: "Volker Nitsch" ...
> just a suggestion: instead of using a customized sort
> it may be useful to use customized encoding.
<<quoted lines omitted: 8>>
> ]
> with some (a lot?) clever organisation.
Hi, Volker,

(BTW, sorry about jumping in on Ladislav's challenge to you the other day...)

I am feeling like an idiot today (which is not necessarily *that* unusual), but I am not sure that I understand what you are suggesting. It appears as though you are setting up a kind of map, where the Czech special character S< (Š, 0160 LATIN CAPITAL LETTER S WITH CARON) maps to the spot following the "less special" character S (0053 LATIN CAPITAL LETTER S). If that is a correct characterization, then I believe I understand that part (and I was already working on a type of mapping). But I don't understand how the sort-form is sorted. I feel as though I am missing the point (I am looking at the sheet of music and not hearing the song again).

Do not feel as though you have to spend a lot of time trying to give "clue" to the "clueless", but if you have a very brief way to explain further, or a brief partial example, I would appreciate it. I believe that I already have a solution worked out, but I'm still collecting a mapping in any free moments that I have.

Thanks for your input. You frequently have great ideas, so I wouldn't want to overlook a better method!!!

--Scott Jones

 [7/9] from: gjones05:mail:orion at: 8-Jun-2001 11:32


As I mentioned in my note to Volker, I am putting together a complete map of the characters, their hex representation and, for completeness, their Unicode code points, and I am trying to put them in order. When I am done, I'll submit it for our Czech friends' inspection.

Just a quick question, though: looking through the pages of Petr's company, "ch" appears as two characters, right? It is not like the blended letters (ligatures) I have seen that are represented as one character.

Now, off to lunch for me.

--Scott Jones
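Scott's table can be generated mechanically from a codec instead of being compiled by hand. The Python sketch below assumes Python's built-in `iso-8859-2` codec and the standard `unicodedata` module as the source of truth; `latin2_table` is a hypothetical helper. It lists the upper half of ISO-8859-2 with the hex byte, the Unicode code point and the character name. The rows come out in code order, so arranging them in Czech sort order remains a manual step. The last line bears on Scott's question: "ch" in running text is simply two ordinary characters.

```python
import unicodedata


def latin2_table(start=0xA0, end=0xFF):
    """Rows of (byte, character, code point, Unicode name) for ISO-8859-2."""
    rows = []
    for byte in range(start, end + 1):
        char = bytes([byte]).decode("iso-8859-2")
        rows.append((byte, char, ord(char), unicodedata.name(char, "<unnamed>")))
    return rows


if __name__ == "__main__":
    for byte, char, code_point, name in latin2_table():
        print(f"0x{byte:02X}  {char}  U+{code_point:04X}  {name}")
    # "ch" in running text is two ordinary characters:
    print([f"U+{ord(c):04X}" for c in "ch"])  # ['U+0063', 'U+0068']
```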

 [8/9] from: agem:crosswinds at: 8-Jun-2001 18:48


>>>>>>>>>>>>>>>>>> Original Message <<<<<<<<<<<<<<<<<<
On 08.06.01, 17:17:56, "GS Jones" <[gjones05--mail--orion--org]> wrote on the subject [REBOL] Re: I miss ..., bug in to-local-file ...:
> From: "Volker Nitsch"
> ...
<<quoted lines omitted: 13>>
> Hi, Volker,
> (BTW, sorry about jumping in on Ladislav's challenge to you the other day...)

*grin* No, it was funny :) I checked mail (OK, a place for me), copied my answer from the notepad, wanted to send it, and there was your message, very close to my own text. The »unfair« should be read as something like »comedy angry« (oh yes, English..); it was a compliment in a way ;-)
> I am feeling like an idiot today (which is not necessarily *that* unusual), but
> I am not sure that I understand what you are suggesting. It appears as though
> you are setting up a kind of map, where Czech special character S< 0160 LATIN
> CAPITAL LETTER S WITH CARON maps to the spot following the "less special"
> character S 0053 LATIN CAPITAL LETTER S. If that is a correct characterization,
> then I believe I understand that part (and was already working on a type of
> mapping).
Yes, it's correct. The confusing, poor »snippet« above was meant to suggest doing the mapping with parse. The mysterious [| "ch"] was meant to suggest mapping "ch" to a single char too.
> But I don't understand how the sort-form is sorted.
Sorry for the confusion. It's not really necessary to do it my way, I think. I would map the strings before sorting, add the mapped forms to the data, and sort on them instead of on the originals. This should be faster than sort/compare with a custom function, which has to redo the mapping on every comparison. But performance is not the issue here, I think, so sort/compare may be simpler. And somehow I feel you have thought of this already.. ;-)
> I feel as though
> I am missing the point (I am looking at the sheet of music and not hearing the
> song again). Do not feel as though you have to spend a lot of time trying to
> give "clue" to the "clueless", but if you have a very brief way to explain
> further, or a brief partial example, I would appreciate it. I believe that I
> already have a solution worked out, but I'm still collecting a mapping with any
> free moments that I have.
I think I have an idea for a »mapping tool«; I have to play with it. Hmm...
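Volker's performance argument can be checked with a small Python experiment. Here `functools.cmp_to_key` plays the role of REBOL's sort/compare. `CountingMapper` is a hypothetical toy mapping for lowercase ASCII text: "ch" becomes "h" followed by the DEL character, which sorts after every letter but before "i". The mapper also counts how often it runs. The comparator variant maps two strings per comparison, while the precomputed variant maps each string exactly once and keeps it next to its original.

```python
from functools import cmp_to_key


class CountingMapper:
    """Toy sort-form mapping for lowercase ASCII that counts how often it runs.

    "ch" becomes "h" + DEL (0x7F): DEL is above every ASCII letter, so "ch"
    sorts after any "h" + letter but still before "i".
    """

    def __init__(self):
        self.calls = 0

    def __call__(self, s):
        self.calls += 1
        return s.replace("ch", "h\x7f")


def sort_with_comparator(strings, mapper):
    """sort/compare style: both strings are re-mapped on every comparison."""
    def compare(a, b):
        fa, fb = mapper(a), mapper(b)
        return (fa > fb) - (fa < fb)
    return sorted(strings, key=cmp_to_key(compare))


def sort_with_precomputed_forms(strings, mapper):
    """Volker's variant: map each string once and keep it next to its original."""
    pairs = [(mapper(s), s) for s in strings]
    pairs.sort(key=lambda pair: pair[0])
    return [original for _, original in pairs]


if __name__ == "__main__":
    words = ["achz", "ch", "ahz", "aiz", "hz", "cz"]
    for sorter in (sort_with_comparator, sort_with_precomputed_forms):
        mapper = CountingMapper()
        print(sorter.__name__, sorter(words, mapper), mapper.calls)
```

Both variants return the same order. The only difference is how many times the mapping runs, which grows with the number of comparisons in one case and with the number of strings in the other.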

 [9/9] from: agem:crosswinds at: 8-Jun-2001 21:28


{
Hi Scott!
Not really sure if this is useful (now, after coding..),
but it illustrates the idea..
If you (all) think it's worth some more work, tell me..
-Volker
}
[rebol [
    title: "sort-mapper"
    file: %czech0.r
    purpose: {
        to make it easier to generate sort-mappings for foreign languages.
        protoprototype.
    }
    usage: {
        look for
        ;CHANGE BASE-MAP  add your mapping/ordering changes.
        ;PATCH PARSE-ORDER  to tell parse to look first for "ch", then for "c"
            (longer before shorter)
        ;SORT-DEMO  look here to sort a bit
    }
    comment: {
        slow. encodes ~15k/sec on 350mhz (10 * 3k) (without map generation)
        speedup with charsets possible?
        a simple lookup-table for single chars should be best for speed, i expect.
        - upper/lowercase makes problems with "ch" and the like ("ch" "Ch" "cH" "CH" ..),
          needs some kind of partly-mixed-case parse..
          at least for &auml; ..
        - chars > 127 throw an error, except explicitly inserted chars. can be changed.
    }
]

;protect-system

???: func ['word value] [
    print [mold :word " : " mold :value]
    word :value
]

;generate base map
;---
base-map: copy [] ;some[+ "" | = "" ]
repeat i 127 [
    append base-map compose [+ (to string! to char! i)]
]

;CHANGE BASE-MAP.
;---
; todo: swap somehow upper & lowercase..
p: find/case/tail base-map "h"
; '= means same priority, '+ means new char
insert p [+ "ch" = "CH" = "Ch" = "cH"]
p: find/case/tail base-map "a"
;some german ;)
insert p [+ "&auml;" = "ae"]
;? base-map

;calculate codes
;---
explicit-map: copy [] ;some[find-string new-code]
next-char: to char! 0
forskip base-map 2 [
    if '+ = base-map/1 [next-char: next-char + 1]
    repend explicit-map [base-map/2 next-char]
]

;and restructuring. we need to be able to parse
;"ch" before "c", "ae" before "a"..
;without changing code
;---
to-tail: func [
    "move string to end of explicit-map"
    s
    /local p
][
    p: find/case explicit-map s
    append p copy/part p 2
    remove/part p 2
]

;PATCH PARSE-ORDER
;---
to-tail "c"
to-tail "C"
to-tail "a"
to-tail "&"
;? explicit-map

;generate rule
;---
step-rule: copy []
forskip explicit-map 2 [
    append step-rule compose [
        ;hacky line
        | (explicit-map/1) (to paren! compose [append out (explicit-map/2)])
    ]
]
step-rule: next step-rule
;? step-rule

;encode-func
;---
out: none
encode: func [s] [
    out: copy ""
    if not parse/all/case s [some [here: step-rule]] [
        throw make error! mold copy/part skip here -10 100
    ]
    out
]

;SORT-DEMO
;---
strings: [
    {achz} x
    {ch} x
    {ahz} x
    {aiz} x
]
s: strings
forskip s 2 [s/2: encode s/1]
? strings
sort/skip/compare strings 2 2
? strings

;---benchmark
;---
print "---benchmarking encode-rate"
dat: read %czech0.r
probe length? dat
start: now/precise/time
loop 10 [encode dat]
probe time: now/precise/time - start
print [10 * (length? dat) / to decimal! time "bytes/sec"]
print "done"
]
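The distinctive part of Volker's script is that the order is written as a spec of `+` (new position) and `=` (same position as the previous entry) pairs, from which the codes are computed. The Python sketch below reproduces that step. The names `build_explicit_map`, `insert_after` and `encode` are hypothetical, and printable ASCII serves as the base alphabet instead of codes 1 to 127. Sorting units longest-first replaces the script's manual to-tail patches. As in the script, uppercase still sorts before lowercase, which is the "swap upper & lowercase" todo.

```python
def build_explicit_map(spec):
    """Compute {unit: code char} from (op, unit) pairs.

    "+" moves to the next sort position, "=" reuses the current one,
    like the '+ / '= words in Volker's base-map.
    """
    explicit, code = {}, 0
    for op, unit in spec:
        if op == "+":
            code += 1
        elif op != "=":
            raise ValueError(f"unknown op {op!r}")
        explicit[unit] = chr(code)
    return explicit


def insert_after(spec, unit, extra):
    """Return a copy of spec with extra pairs inserted right after unit."""
    pos = next(i for i, (_, u) in enumerate(spec) if u == unit) + 1
    return spec[:pos] + extra + spec[pos:]


# Base spec: printable ASCII, each character on its own position.
BASE_SPEC = [("+", chr(i)) for i in range(32, 127)]
# "ch" gets a new position right after "h"; its case variants share it.
SPEC = insert_after(BASE_SPEC, "h", [("+", "ch"), ("=", "CH"), ("=", "Ch"), ("=", "cH")])
EXPLICIT = build_explicit_map(SPEC)
# Parse order: longer units first, so "ch" is tried before "c".
ORDER = sorted(EXPLICIT, key=len, reverse=True)


def encode(s):
    """Greedy longest-match encoding of s into its sort-form."""
    out, i = [], 0
    while i < len(s):
        unit = next((u for u in ORDER if s.startswith(u, i)), None)
        if unit is None:
            raise ValueError(f"no mapping near {s[i:i + 10]!r}")
        out.append(EXPLICIT[unit])
        i += len(unit)
    return "".join(out)


if __name__ == "__main__":
    strings = ["achz", "ch", "ahz", "aiz"]
    print(sorted(strings, key=encode))  # ['ahz', 'achz', 'aiz', 'ch']
```

The linear scan over `ORDER` at every position is slow. Following Volker's own comment, a lookup table for single characters plus a short list of multi-character units would be the obvious speedup.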
>>>>>>>>>>>>>>>>>> Original Message <<<<<<<<<<<<<<<<<<
On 08.06.01, 19:48:03, Volker Nitsch <[agem--crosswinds--net]> wrote on the subject [REBOL] Re: I miss ..., bug in to-local-file ...:

Notes
  • Quoted lines have been omitted from some messages.