World: r4wp
[Rebol School] REBOL School
GrahamC 19-Jun-2012 [376] | you'd have to use compose on mystring |
sqlab 19-Jun-2012 [377] | It works with global variables. Maybe your variable is not visible to parse:
    >> mys: "ac"
    == "ac"
    >> parse " abacbac ba" [any [to mys copy s to "b" (probe s)]]
    ac
    ac
    == false |
GiuseppeC 19-Jun-2012 [378] | How do I parse the same data with different endings? I could have http://myfile.txt</BR> or http://myfile.txt</DIV>. I need something like:
    PARSE mystring [copy link to [</BR> | </DIV>]] |
Endo 19-Jun-2012 [379] | TO doesn't accept a block, so put "to" inside the block:
    s1: {http://myfile.txt</br>}
    s2: {http://myfile.txt</div>}
    parse s1 [copy link [to </br> | to </div>] (print link)] ; works
    parse s2 [copy link [to </br> | to </div>] (print link)] ; works too |
GiuseppeC 19-Jun-2012 [380] | I need something like the following:
    parse s1 [any [to "http://" copy link [to </br> | to </div>]]]
Is it possible? |
Ladislav 19-Jun-2012 [381x3] | Of course it is possible. If I understand well what you want, it is:
    s1: "a http://xxx</div>b http://yyy</br>"
    parse/all s1 [any [to "http://" copy link any [</br> break | </div> break | skip] (print link)]] |
If you do want to leave out the </br> and </div> substrings, the simplest way probably is:
    s1: "a http://xxx</div>b http://yyy</br>"
    parse/all s1 [any [to "http://" start: any [end: </br> break | </div> break | skip] (print copy/part start end)]] | |
Also note that it is easy to replace the </br> or </div> subrule by a more complicated subrule | |
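A laid-out sketch of the second rule, assuming REBOL 2 (the interpreter used in this part of the discussion). The `links` block is an illustrative addition, not part of the original code; it collects the results instead of printing them.

```rebol
REBOL [Title: "Collect links up to the nearest terminator (sketch)"]

s1: "a http://xxx</div>b http://yyy</br>"
links: copy []                              ; hypothetical collector, added for illustration

parse/all s1 [
    any [
        to "http://" start:                 ; jump to the next link and remember where it begins
        any [
            end: </br> break                ; remember the position, stop if a terminator is here
            | </div> break
            | skip                          ; otherwise advance one character and try again
        ]
        (append links copy/part start end)  ; the link text without its terminator
    ]
]

probe links
```

Each collected item should be the text from "http://" up to, but not including, whichever terminator comes first.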
GiuseppeC 19-Jun-2012 [384] | Thanks God ! ;-) |
Endo 20-Jun-2012 [385x2] | There is no documentation about BREAK in PARSE (for R2), so it is always difficult for me to remember. Thanks Ladislav. |
Giuseppe: if you didn't read this before, here is a very good article: http://www.codeconscious.com/rebol/parse-tutorial.html The other articles are also great, take a look at them all. | |
GiuseppeC 20-Jun-2012 [387x2] | In Ladislav's examples I am not able to understand the use of BREAK. Why is it useful? Also, in the second example, why isn't there an "end:" before "</div> break"? |
Also:
    parse/all s1 [any [to "http://" copy link any [</br> break | </div> break | skip] (print link)]]
Could it be written as:
    parse/all s1 [any [to "http://" copy link TO any [</br> break | </div> break | skip] (print link)]]
Or:
    parse/all s1 [any [to "http://" copy link any [TO </br> break | TO </div> break | skip] (print link)]]
Finally, what is the purpose of the SKIP keyword in this context? | |
Pekr 20-Jun-2012 [389x3] | I use Artisteer to prototype web pages, and it saves content in UTF-8. Later on, I need to make a few adaptations to such generated pages, so I opened one in R2, reparsed it, inserted some stuff, deleted other stuff, but it did not work out .... |
What are my options, apart from doing it in R3? | |
Use some external tool to convert it to ANSI, do the adaptations, and convert it back to UTF-8? | |
Kaj 20-Jun-2012 [392] | Why don't you want to do it in R3? That's the obvious solution |
Pekr 20-Jun-2012 [393] | I am trying now. I somehow lost interest in R3, as it is an unfinished and dead product. But it is probably still easier than using iconv together with R2, although I did it that way in the past, using CALL |
Kaj 20-Jun-2012 [394] | One of the few advantages of R3 is processing Unicode. It fixed the Russian Syllable website |
Pekr 20-Jun-2012 [395x3] | I am somehow not able to load one Czech text properly .... |
I mean - the text I need to put into the resulting file (UTF-8) is ANSI. I do print to-string read %text-slider.html, and in the R3 console, the Czech text is not correct .... | |
I'll try with some version other than the rather old view.exe | |
Kaj 20-Jun-2012 [398] | The console may be broken. How about the actual text, in an editor? |
Pekr 20-Jun-2012 [399] | In the editor, it's correct. Simply put - I read Czech text from an ANSI file, and it is distorted in the console, ditto when writing it back to a file of course .... |
Kaj 20-Jun-2012 [400x2] | When you cut and paste it from the console, or when you write it with REBOL? |
So you're saying the input file is not UTF-8? | |
Pekr 20-Jun-2012 [402x2] | Yes, ANSI. I solved it by re-saving the same source file as UTF-8 instead of ANSI. Still a bad complication, as by default, Windows sets Notepad to ANSI, so it is a bit inconvenient ... |
I am surprised R3 is not able to properly read/decode ANSI file with Czech alphabet ... | |
Endo 20-Jun-2012 [404x4] | Giuseppe: "I am not able to understand the use of Break. Why it is useful ?" I'll try to explain:
    >> parse/all "http://a.txthttp://b.dat" [any [to "http://" copy x any [".txt" | ".dat" | skip] (print x)]]
    http://a.txthttp://b.dat    ; it prints just one line, from the first http:// to the last .dat
    >> parse/all "http://a.txthttp://b.dat" [any [to "http://" copy x any [".txt" break | ".dat" break | skip] (print x)]]
    http://a.txt    ; now it works as expected, from http:// to .txt and breaks
    http://b.dat    ; and from the next http:// to .dat |
Giuseppe: "Could it be written as: ..." TO ANY doesn't work, but ANY [TO "..." BREAK | TO "..." BREAK] works. Just be careful using ANY and TO together, because both can succeed without advancing the series pointer. So you can easily put the console in an infinite loop (the escape key also doesn't work). | |
But still there is a problem in your example. Here I'll try to explain:
    >> parse/all "http://a.txthttp://b.dat" [any [to "http://" copy x any [thru ".txt" (print 1) break | thru ".dat" (print 2) break | skip (print 3)] (print x)]]
    1
    http://a.txt
    2
    http://b.dat
It looks correct, but actually it depends on which one comes first (.txt or .dat). Here is the problem:
    >> parse/all "http://a.txthttp://b.dat" [any [to "http://" copy x [thru ".dat" (print 1) | thru ".txt" (print 2) | skip (print 3)] (print x)]]
    1
    http://a.txthttp://b.dat | |
hmm.. links look weird in AltME, select all text, copy and paste to a text editor to see it correctly. | |
BrianH 20-Jun-2012 [408] | Petr, R3 can't decode any 8bit encodings with its built-in code, just ASCII (which is 7bit) and UTF-8. However, its binary handling is better so it should be easy to write your own converters. For R2, I would suggest looking at Gabriele's PowerMezz package; it has some great text converters. Of course you lose out on R3's PARSE if you use R2. |
Rebolek 21-Jun-2012 [409] | Pekr, look for Oldes' UTF8 package for Rebol 2, I believe it's on rebol.org. It can convert anything (it supports downloading code pages from the net) to/from UTF8. It really saved me a lot of time when I was working on translations for Windows Vista. |
Pekr 21-Jun-2012 [410x2] | Rebolek - thanks, I forgot about it. I needed it only once in the past, and so I used iconv command line tool via CALL .... |
Recently I switched to R3, as I don't need the gui, just a script to do some webpage source code post-processing .... | |
GiuseppeC 21-Jun-2012 [412x3] | Endo, I have tried. Exchanging the positions of .txt and .dat, I get only a single line. How could it be solved? |
Also, what is the purpose of the SKIP as the third option? | |
Also, in Ladislav's example, why is there only one END: ? | |
Arnold 21-Jun-2012 [415] | On my Mac, the international characters in a script I made on Windows are also displayed wrong: "Nederlands" "English" "Deutsch" "Français" "Español" "Italiano" "Português". When I saved it as UTF-8 I hoped my problems would be resolved, but then REBOL complained that my script had no REBOL header. :-( |
Ladislav 21-Jun-2012 [416x10] | "In Ladislav's examples I am not able to understand the use of Break. Why it is useful ?" - in any [</div> break ...] the BREAK keyword stops the search for the terminator once one (</div>) has been found. If you don't use BREAK, you simply don't stop searching even if you have already found the terminator. |
"Also in the second example why there isn't a "end:" before "</div> break" ?" - it is because the first END: was already used and the position is remembered. (However, you can use end: twice if you like.) | |
You should just remember that end: never fails, so the expression: end: </div> break | </br> break | ... is equivalent to: end: [</div> break | </br> break ...] , i.e., the end: part is known for all alternatives | |
'Could it be written as: parse/all s1 [any [to "http://" copy link TO any [</br> break | </div> break | skip] (print link)]]' - no, since:
    - TO ANY is not supported
    - if it were supported, it would not do what you want (you want to find the first terminator, whatever it is, while TO ANY would find the </div> if it were in the input text even when a </br> occurred "closer") | |
'Or parse/all s1 [any [to "http://" copy link any [TO </br> break | TO </div> break | skip] (print link)]]' - this *is* supported, but it does not do what you want; it finds the </br> even if </div> occurs "sooner" | |
"Finally, which is the purpose of the SKIP keyword in this context ?" - that is the easiest question. The expression any [end: </div> break | </br> break | skip] simply checks whether it "sees" the </div> terminator. If it does, the search for the terminator is over. If it does not, we immediately check whether we "see" the second possible terminator. However, if we are not at a terminator, both alternatives fail and the third alternative has to advance to the next position to be able to finally find the terminator. | |
(if you do not advance you cannot expect to find the terminator) | |
This may be a simpler/more understandable description of the idea:
    terminator: [</div> | </br>]
    find-terminator: [start: any [end: terminator break | skip] (contents: copy/part start end)] | |
The code is a simplification anyway. It does not work well when the rule is expected to fail at the tail of the input if the terminator was not found. That would require the REJECT keyword or a more complicated expression. | |
The simplest way to write FIND-TERMINATOR would be recursive:
    find-terminator: [terminator | skip find-terminator]
However, because this version is recursive, it fails when the search is "long", exceeding the available stack size. | |
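The difference between ordered TO alternatives and the position-by-position search discussed above can be sketched as follows, assuming REBOL 2. The input string `s` and the word `far` are made up for illustration; `terminator` and `find-terminator` are the non-recursive definitions from Ladislav's message.

```rebol
REBOL [Title: "Nearest terminator vs. ordered TO alternatives (sketch)"]

; Hypothetical input: </div> occurs sooner than </br>
s: "http://xxx</div> more text</br>"

; Ordered alternatives: the first TO that succeeds wins,
; so the search runs to </br> although </div> is closer.
parse/all s [copy far [to </br> | to </div>] to end]
probe far

; Non-recursive FIND-TERMINATOR: at each position try every
; terminator, and advance one character only if none matches.
terminator: [</div> | </br>]
find-terminator: [
    start:
    any [end: terminator break | skip]
    (contents: copy/part start end)
]
parse/all s [find-terminator to end]
probe contents
```

With this input, `far` should still contain the </div> and the text after it, while `contents` should stop right before the first terminator. As noted above, this version does not fail when no terminator is found; it simply runs to the tail of the input.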