World: r4wp

[Rebol School] REBOL School

Pekr
20-Jun-2012
[395x3]
I am somehow not able to load a Czech text properly ....
I mean - the text I need to put into the resulting file (UTF-8) is 
ANSI. I do print to-string read %text-slider.html, and in the R3 console 
the Czech text is not correct ....
I'll try with some other version than the rather old view.exe
Kaj
20-Jun-2012
[398]
The console may be broken. How about the actual text, in an editor?
Pekr
20-Jun-2012
[399]
In the editor it's correct. Simply put - I read Czech text from an ANSI 
file, and it is distorted in the console, ditto when writing it back 
to a file, of course ....
Kaj
20-Jun-2012
[400x2]
When you cut and paste it from the console, or when you write it 
with REBOL?
So you're saying the input file is not UTF-8?
Pekr
20-Jun-2012
[402x2]
Yes, ANSI. I solved it by re-saving the same source file as UTF-8 
instead of ANSI. Still a bad complication, as by default, Windows 
sets Notepad to ANSI, so it is a bit inconvenient ...
I am surprised R3 is not able to properly read/decode ANSI file with 
Czech alphabet ...
Endo
20-Jun-2012
[404x4]
Giuseppe: "I am not able to understand the use of Break. Why is it 
useful?"
I'll try to explain:

>> parse/all "http://a.txthttp://b.dat" [any [to "http://" copy x any [".txt" | ".dat" | skip] (print x)]]

http://a.txthttp://b.dat ; it prints just one line, from the first http:// to the last .dat


>> parse/all "http://a.txthttp://b.dat" [any [to "http://" copy x any [".txt" break | ".dat" break | skip] (print x)]]

http://a.txt ; now it works as expected, from http:// to .txt and breaks
http://b.dat ; and from the next http:// to .dat
Giuseppe: "Could it be written as: ..."
TO ANY doesn't work.
but ANY [TO "..." BREAK | TO "..." BREAK] works.

Just be careful using ANY and TO together, because neither is guaranteed 
to advance the series pointer. So you can easily put the console in 
an infinite loop (the escape key also doesn't work).
But still there is a problem in your example. Here I'll try to explain:


>> parse/all "http://a.txthttp://b.dat" [any [to "http://" copy x any [thru ".txt" (print 1) break | thru ".dat" (print 2) break | skip (print 3)] (print x)]]
1
http://a.txt
2
http://b.dat


it looks correct. but actually it depends on which one is first (.txt 
or .dat)

here is the problem:

>> parse/all "http://a.txthttp://b.dat" [any [to "http://" copy x [thru ".dat" (print 1) | thru ".txt" (print 2) | skip (print 3)] (print x)]]
1
http://a.txthttp://b.dat
hmm.. links look weird in AltME, select all text, copy and paste 
to a text editor to see it correctly.
BrianH
20-Jun-2012
[408]
Petr, R3 can't decode any 8-bit encodings with its built-in code, 
just ASCII (which is 7-bit) and UTF-8. However, its binary handling 
is better, so it should be easy to write your own converters. For 
R2, I would suggest looking at Gabriele's PowerMezz package; it has 
some great text converters. Of course you lose out on R3's PARSE 
if you use R2.
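A converter of the kind mentioned here could be sketched for R3 roughly as follows. This is only an illustration under stated assumptions: the function name latin1-to-string is made up, and it handles ISO-8859-1 only, where each byte value equals its Unicode code point. Czech ANSI text is normally Windows-1250, which would additionally need a lookup table for bytes 128-255; that table is not shown.

```rebol
; Hypothetical helper, not part of R3: decode ISO-8859-1 bytes to a string.
latin1-to-string: func [
    "Decode ISO-8859-1 bytes into a Unicode string (R3)."
    data [binary!]
    /local out
][
    out: make string! length? data
    ; In ISO-8859-1 every byte value is also the Unicode code point,
    ; so no table is needed. Windows-1250 would need one for 128-255.
    foreach byte data [append out to char! byte]
    out
]

; Assumed usage: READ returns binary! in R3, and WRITE of a string!
; saves it as UTF-8.
; write %text-slider-utf8.html latin1-to-string read %text-slider.html
```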
Rebolek
21-Jun-2012
[409]
Pekr, look for Oldes' UTF8 package for Rebol 2, I believe it's on 
rebol.org. It can convert anything (it supports downloading code 
pages from the net) to/from UTF8. It really saved me a lot of time when 
I was working on translations for Windows Vista.
Pekr
21-Jun-2012
[410x2]
Rebolek - thanks, I forgot about it. I needed it only once in the 
past, and so I used the iconv command line tool via CALL ....
Recently I switched to R3, as I don't need the gui, just a script 
to do some webpage source code post-processing ....
GiuseppeC
21-Jun-2012
[412x3]
Endo, I have tried. After exchanging the positions of .txt and .dat I get 
only a single line. How could it be solved?
Also, what is the purpose of the SKIP as the third option?
Also, in Ladislav's example, why is there only one END?
Arnold
21-Jun-2012
[415]
On my Mac, a script I made on Windows that uses a couple of international 
characters also displays them wrong: "Nederlands" "English" 
"Deutsch" "Français" "Español" "Italiano" "Português". When I saved it 
as UTF-8 I hoped my problems would be resolved, but then REBOL complained 
that my script had no REBOL header. :-(
Ladislav
21-Jun-2012
[416x11]
"In Ladislav's examples I am not able to understand the use of Break. 
Why is it useful?"

 - in any [</div> break ...] the BREAK keyword stops searching for 
 the terminator when one was found (</div>). If you don't use BREAK, 
 you simply don't stop searching even if you already found the terminator.
"Also in the second example why isn't there an "end:" before "</div> break"?" 
- it is because the first END: was already used and the position is 
remembered. (However, you can use end: twice if you like.)
You should just remember that end: never fails, so the expression:

    end: </div> break | </br> break | ...

is equivalent to:

    end: [</div> break | </br> break ...]

, i.e., the end: part is known for all alternatives
"Could it be written as:

parse/all s1 [any [to "http://" copy link TO any [</br> break | </div> break | skip] (print link)]]" - no, since:

- TO ANY is not supported

- if it were supported it would not do what you want (you want to 
find the first terminator, whatever it is, while TO ANY would find 
the </div> if it were in the input text even when a </br> 
were closer)
"Or
parse/all s1 [any [to "http://" copy link any [TO </br> break | TO </div> break | skip] (print link)]]" 
- this *is* supported, but it does not do what you 
want; it finds the </br> even if </div> occurs sooner
"Finally, what is the purpose of the SKIP keyword in this context?"
 - that is the easiest question. The expression

    any [end: </div> break | </br> break | skip]


simply checks whether it "sees" the </div> terminator. If it does 
then the search for the terminator is over. If it does not then we 
check immediately whether we do not "see" the second possible terminator. 
However, if we are not at the terminator, both alternatives fail 
and the third alternative has to advance to the next position to 
be able to finally find the terminator.
(if you do not advance you cannot expect to find the terminator)
This may be a simpler/more understandable description of the idea:

    terminator: [</div> | </br>]

    find-terminator: [start: any [end: terminator break | skip] (contents: copy/part start end)]
The code is a simplification anyway. It does not work well when the 
rule is expected to fail at the tail of the input if the terminator 
was not found. That would require the REJECT keyword or a more complicated 
expression.
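Put together for Rebol 2, the loop-based search might look like the sketch below. The sample string s1 and the links block are made up for illustration; the rule itself follows the any [end: terminator break | skip] pattern described above, including its limitation that a link with no terminator after it is still collected up to the tail.

```rebol
s1: "http://a.txt</br>http://b.dat</div>"   ; sample input, made up for this sketch
terminator: [</div> | </br>]
links: copy []

parse/all s1 [
    any [
        to "http://" start:
        ; walk forward one character at a time until a terminator is seen
        any [end: terminator break | skip]
        ; END still points just before the terminator that was matched
        (append links copy/part start end)
    ]
]

probe links
```

Because both terminators are tested at every position before SKIP advances, the order of </div> and </br> inside TERMINATOR should not matter here.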
The simplest way to write FIND-TERMINATOR would be recursive:

find-terminator: [terminator | skip find-terminator]


However, this version is recursive, which means that it fails when 
the search is "long" exceeding the available stack size.
By "it fails" I mean that the recursive expression would not be able 
to find the terminator even if it were present when the length of 
the search would exceed the available stack size.
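A small Rebol 2 check of the recursive rule, using made-up input strings, could look like this. Unlike the loop version, it fails cleanly when no terminator is present, because SKIP cannot advance past the tail.

```rebol
terminator: [</div> | </br>]
find-terminator: [terminator | skip find-terminator]

; Succeeds: a terminator is reached after skipping three characters.
probe parse/all "abc</br>" [find-terminator to end]    ; true

; Fails: at the tail TERMINATOR does not match and SKIP cannot advance.
probe parse/all "abc" [find-terminator to end]         ; false
```

Each skipped character adds one level of recursion, which is why a long search can run out of stack.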
PeterWood
21-Jun-2012
[427]
Arnold: I believe that Rebol/View uses Windows Codepages under Windows, 
MacRoman on OS X and ISO-8859-1 on Linux. Sadly this means it only 
really supports true ASCII characters cross-platform unless you 
manage the encoding yourself.
GiuseppeC
22-Jun-2012
[428]
Ladislav, some questions are still open. I am currently remotely 
connected to my machine. I'll study your "lesson" tomorrow and I'll 
reply.
Arnold
22-Jun-2012
[429x3]
Peter, seeing your conversion routine on rebol.org, you know the 
ins and outs ;-)

All the special ones I need are in the next 127. On Windows they 
show up. That's one of the points to be taken into account for Red 
development.
And knowing that even this small community has fewer members than the 
diacritics they use in everyday life, it is a requirement to deal with 
UTF-8, UCS or other encodings.
Why isn't there to-dir when there is to-file? Would adding to-dir: 
:dirize do the trick of making a to-dir function? Why isn't it implemented 
as a standard?
BrianH
22-Jun-2012
[432x2]
Because all of the to-somedatatypeword functions are specifically 
only for datatype conversion, and we don't have a dir! or directory! 
type.
It doesn't really matter though; these aren't keywords. If you want 
to-dir: :dirize in your own code, put it in. Or put it in your rebol.r 
file.
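As a quick illustration of that suggestion (TO-DIR is not a standard function, it is only the alias proposed above):

```rebol
; TO-DIR is not built in; this simply gives DIRIZE a second name.
to-dir: :dirize

probe to-dir %docs     ; %docs/
probe to-dir %docs/    ; %docs/ (already ends in a slash, so unchanged)
```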
Evgeniy Philippov
22-Jun-2012
[434]
I've decided to start hacking on a GPL REBOL written in Squeak 
Smalltalk. The goal is to start crafting a copy of REBOL that is as 
close as possible to the original REBOL2. Another goal is to study REBOL2 :)
GrahamC
22-Jun-2012
[435x2]
You're kidding!
We're going to have more forks than users
Evgeniy Philippov
22-Jun-2012
[437x2]
Children's adventures :)
Who is writing another strict rebol copy? I could join in
PeterWood
22-Jun-2012
[439]
Brian (Hostile Fork) was interested in developing a strict REBOL 
clone though his proposal was to use a "modern" cross platform approach 
such as QT/C++ 11.
Evgeniy Philippov
22-Jun-2012
[440x3]
That is an OK approach. I could join Brian if he is working on that.
Brian, are you here?
What is your status on the strict REBOL clone, and what are your plans for it?
GrahamC
22-Jun-2012
[443]
Brian left after a disagreement ...
Evgeniy Philippov
22-Jun-2012
[444]
I am currently reading his blog. I am also a fan of Robert Pirsig's books, 
and that's fantastic about Brian...