World: r4wp
[Rebol School] REBOL School
Pekr 20-Jun-2012 [395x3] | I am somehow not able to load one Czech text properly .... |
I mean, the text I need to put into the resulting file (UTF-8) is ANSI. I do print to-string read %text-slider.html, and in the R3 console the Czech text is not correct .... | |
I'll try some version other than the rather old view.exe | |
Kaj 20-Jun-2012 [398] | The console may be broken. How about the actual text, in an editor? |
Pekr 20-Jun-2012 [399] | In the editor, it's correct. Simply put: I read Czech text from an ANSI file, and it is distorted in the console, and likewise when writing it back to a file, of course .... |
Kaj 20-Jun-2012 [400x2] | When you cut and paste it from the console, or when you write it with REBOL? |
So you're saying the input file is not UTF-8? | |
Pekr 20-Jun-2012 [402x2] | Yes, ANSI. I solved it by re-saving the same source file as UTF-8 instead of ANSI. Still an annoying complication, as Windows sets Notepad to ANSI by default, so it is a bit inconvenient ... |
I am surprised R3 is not able to properly read/decode ANSI file with Czech alphabet ... | |
Endo 20-Jun-2012 [404x4] | Giuseppe: "I am not able to understand the use of Break. Why is it useful?" I'll try to explain:
>> parse/all "http://a.txthttp://b.dat" [any [to "http://" copy x any [".txt" | ".dat" | skip] (print x)]]
http://a.txthttp://b.dat
; it prints just one line, from the first http:// to the last .dat
>> parse/all "http://a.txthttp://b.dat" [any [to "http://" copy x any [".txt" break | ".dat" break | skip] (print x)]]
http://a.txt
; now it works as expected, from http:// to .txt and breaks
http://b.dat
; and from the next http:// to .dat |
Giuseppe: "Could it be written as: ..." TO ANY doesn't work, but ANY [TO "..." BREAK | TO "..." BREAK] works. Just be careful using ANY and TO together, because neither of them is guaranteed to advance the series pointer. So you can easily put the console into an infinite loop (the Escape key doesn't work either). | |
But there is still a problem in your example. I'll try to explain:
>> parse/all "http://a.txthttp://b.dat" [any [to "http://" copy x any [thru ".txt" (print 1) break | thru ".dat" (print 2) break | skip (print 3)] (print x)]]
1
http://a.txt
2
http://b.dat
It looks correct, but it actually depends on which one comes first (.txt or .dat). Here is the problem:
>> parse/all "http://a.txthttp://b.dat" [any [to "http://" copy x [thru ".dat" (print 1) | thru ".txt" (print 2) | skip (print 3)] (print x)]]
1
http://a.txthttp://b.dat | |
Hmm.. links look weird in AltME; select all the text, then copy and paste it into a text editor to see it correctly. | |
BrianH 20-Jun-2012 [408] | Petr, R3 can't decode any 8bit encodings with its built-in code, just ASCII (which is 7bit) and UTF-8. However, its binary handling is better so it should be easy to write your own converters. For R2, I would suggest looking at Gabriele's PowerMezz package; it has some great text converters. Of course you lose out on R3's PARSE if you use R2. |
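A converter of the kind BrianH mentions could look roughly like the R3 sketch below. It is only an illustration: the names cp1250 and decode-cp1250 and the output file name are hypothetical, the table covers only the Czech letters that differ from Latin-1, and it assumes the input really is Windows-1250 (the usual "ANSI" code page on Czech Windows).

```rebol
REBOL [Title: "Sketch: Windows-1250 to UTF-8 in R3"]

; Partial Windows-1250 -> Unicode table (byte value, code point).
; Czech letters only; a complete converter would list all 128 high bytes.
cp1250: make map! [
    138 352  154 353    ; Š š
    141 356  157 357    ; Ť ť
    142 381  158 382    ; Ž ž
    200 268  232 269    ; Č č
    204 282  236 283    ; Ě ě
    207 270  239 271    ; Ď ď
    210 327  242 328    ; Ň ň
    216 344  248 345    ; Ř ř
    217 366  249 367    ; Ů ů
]

decode-cp1250: func [
    "Decode Windows-1250 bytes into a Unicode string (hypothetical helper)."
    bin [binary!]
    /local out
][
    out: make string! length? bin
    foreach byte bin [
        ; Bytes missing from the table are taken as Latin-1, which is right
        ; for ASCII and for á é í ó ú ý, but not for every high byte.
        append out to char! any [select cp1250 byte  byte]
    ]
    out
]

text: decode-cp1250 read %text-slider.html      ; READ returns binary! in R3
write %text-slider-utf8.html to binary! text    ; TO BINARY! encodes as UTF-8
```

The table-driven approach keeps the decoder independent of the code page: swapping in a different map would, under the same assumptions, handle another 8-bit encoding.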
Rebolek 21-Jun-2012 [409] | Pekr, look for Oldes' UTF8 package for Rebol 2. I believe it's on rebol.org. It can convert anything to/from UTF-8 (it supports downloading code pages from the net). It really saved me a lot of time when I was working on translations for Windows Vista. |
Pekr 21-Jun-2012 [410x2] | Rebolek - thanks, I forgot about it. I needed it only once in the past, and so I used iconv command line tool via CALL .... |
Recently I switched to R3, as I don't need the gui, just a script to do some webpage source code post-processing .... | |
GiuseppeC 21-Jun-2012 [412x3] | Endo, I have tried. After exchanging the positions of .txt and .dat I get only a single line. How could it be solved? |
Also, what is the purpose of the SKIP as the third option? | |
Also, in Ladislav's example, why is there only one END? | |
Arnold 21-Jun-2012 [415] | On my Mac, a script I made on Windows that uses a couple of international characters also displays those characters wrong: "Nederlands" "English" "Deutsch" "Français" "Español" "Italiano" "Português". I hoped saving it as UTF-8 would resolve my problems, but then REBOL complained that my script had no REBOL header. :-( |
Ladislav 21-Jun-2012 [416x11] | "In Ladislav's examples I am not able to understand the use of Break. Why is it useful?" - in any [</div> break ...] the BREAK keyword stops searching for the terminator once one is found (</div>). If you don't use BREAK, you simply don't stop searching even if you have already found the terminator. |
"Also in the second example why isn't there an "end:" before "</div> break"?" - it is because the first END: was already used and the position is remembered. (However, you can use end: twice if you like.) | |
You should just remember that end: never fails, so the expression end: </div> break | </br> break | ... is equivalent to end: [</div> break | </br> break ...], i.e., the end: part is known for all alternatives. | |
"Could it be written as: parse/all s1 [any [to "http://" copy link TO any [</br> break | </div> break | skip] (print link)]]" - no, since TO ANY is not supported, and even if it were supported it would not do what you want (you want to find the first terminator, whatever it is, while TO ANY would find the </div> if it were in the input text even when a </br> is closer). | |
"Or parse/all s1 [any [to "http://" copy link any [TO </br> break | TO </div> break | skip] (print link)]]" - this *is* supported, but it does not do what you want; it finds the </br> even if </div> occurs sooner. | |
"Finally, what is the purpose of the SKIP keyword in this context?" - that is the easiest question. The expression any [end: </div> break | </br> break | skip] simply checks whether it "sees" the </div> terminator. If it does, the search for the terminator is over. If it does not, we immediately check whether we "see" the second possible terminator. However, if we are not at a terminator, both alternatives fail and the third alternative has to advance to the next position to be able to finally find the terminator. | |
(if you do not advance you cannot expect to find the terminator) | |
This may be a simpler/more understandable description of the idea:
terminator: [</div> | </br>]
find-terminator: [start: any [end: terminator break | skip] (contents: copy/part start end)] | |
The code is a simplification anyway. It does not work well when the rule is expected to fail at the tail of the input if the terminator was not found. That would require the REJECT keyword or a more complicated expression. | |
The simplest way to write FIND-TERMINATOR would be recursive:
find-terminator: [terminator | skip find-terminator]
However, because this version is recursive, it fails when the search is "long" enough to exceed the available stack size. | |
By "it fails" I mean that the recursive expression would not be able to find the terminator even if it were present when the length of the search would exceed the available stack size. | |
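Applied to Endo's test string, the same idea might be sketched as follows (REBOL 2). This is an illustration under assumptions, not Ladislav's exact code: the words links and x are hypothetical, the terminators are the ".txt" and ".dat" suffixes from Endo's example rather than tags, and the copy here includes the terminator.

```rebol
REBOL [Title: "Sketch: order-independent terminator search"]

; Both terminators are tested at every position, so their order
; inside this rule should not change which one is found first.
terminator: [".txt" | ".dat"]

; ANY tries the terminator at the current position and breaks on a match;
; otherwise SKIP advances by one character and ANY tries again.
find-terminator: [any [terminator break | skip]]

links: copy []
parse/all "http://a.txthttp://b.dat" [
    any [to "http://" copy x find-terminator (append links x)]
]
probe links
```

With this input the block is expected to collect http://a.txt and http://b.dat as two separate links, whichever suffix is listed first. As noted above, the rule still succeeds at the tail of the input when no terminator is present, so a stricter version would need extra handling for that case.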
PeterWood 21-Jun-2012 [427] | Arnold: I believe that Rebol/View uses Windows code pages under Windows, MacRoman on OS X and ISO-8859-1 on Linux. Sadly this means it only really supports true ASCII characters cross-platform unless you manage the encoding yourself. |
GiuseppeC 22-Jun-2012 [428] | Ladislav, some questions are still open. I am currently remotely connected to my machine. I'll study your "lesson" tomorrow and I'll reply. |
Arnold 22-Jun-2012 [429x3] | Peter, seeing your conversion routine on rebol.org, you know the ins and outs ;-) All the special ones I need are in the next 127. On Windows they show up. That's one of the points to be taken into account for Red development. |
And knowing that even this small community has fewer members than the diacritics they use in everyday life, it is a requirement to deal with UTF-8, UCS or other encodings. | |
Why isn't there to-dir when there is to-file? Would adding to-dir: :dirize do the trick of making a to-dir function? Why isn't it implemented as a standard? | |
BrianH 22-Jun-2012 [432x2] | Because all of the to-somedatatypeword functions are specifically only for datatype conversion, and we don't have a dir! or directory! type. |
It doesn't really matter though; these aren't keywords. If you want to-dir: :dirize in your own code, put it in. Or put it in your rebol.r file. | |
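As a minimal sketch of that suggestion (the file name %docs is just an example):

```rebol
REBOL [Title: "Sketch: to-dir as an alias for dirize"]

to-dir: :dirize        ; plain alias; no dir! datatype is involved

probe to-dir %docs     ; DIRIZE appends a trailing slash when it is missing
probe to-dir %docs/    ; a path that already ends in a slash is left as it is
```

Because to-dir is only another name for DIRIZE, it does not convert between datatypes the way the standard to-* functions do.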
Evgeniy Philippov 22-Jun-2012 [434] | I've decided to start hacking on a GPL REBOL written in Squeak Smalltalk. The goal is to start crafting a copy of REBOL that is as close as possible to the original REBOL2. Another goal is to study REBOL2 :) |
GrahamC 22-Jun-2012 [435x2] | You're kidding! |
We're going to have more forks than users | |
Evgeniy Philippov 22-Jun-2012 [437x2] | Children's adventures :) |
Who is writing another strict rebol copy? I could join in | |
PeterWood 22-Jun-2012 [439] | Brian (Hostile Fork) was interested in developing a strict REBOL clone, though his proposal was to use a "modern" cross-platform approach such as Qt/C++11. |
Evgeniy Philippov 22-Jun-2012 [440x3] | That is an OK approach. I could join Brian if he is working on that. |
Brian, are you here? | |
What is your status regarding the strict REBOL clone, and what are your plans for it? | |
GrahamC 22-Jun-2012 [443] | Brian left after a disagreement ... |
Evgeniy Philippov 22-Jun-2012 [444] | I am currently reading his blog. I also like Robert Pirsig's books, and that's a fantastic thing about Brian... |