World: r3wp

[Rebol School] Rebol School

Geomol 26-Jun-2007 [472]	You could also go for a combination with one small string that you change (by putting in the number) and print.
PatrickP61 26-Jun-2007 [473]	I'll see what I can do with it...
Geomol 26-Jun-2007 [474x3]	I think Volker meant you should make one large ruler of 125 chars.
	This is a way without copies: str: "----+-----" for Count 10 125 10 [change skip str either Count < 100 [8][7] Count prin str]
	Do you follow the code?
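A minimal sketch of Geomol's one-segment idea with comments added, assuming REBOL 2 semantics. Unlike the one-liner, it appends each segment to a string (the name `ruler` is only illustrative) instead of printing it with PRIN, so the result can be reused or checked. As in the original, the FOR loop's last label is 120, the last multiple of 10 that does not exceed 125.

```rebol
REBOL [Title: "Ruler from one reused segment (sketch)"]

str: copy "----+-----"   ; one 10-character segment, overwritten on every pass
ruler: copy ""           ; illustrative accumulator for the finished ruler
for Count 10 125 10 [
    ; Write the label into the tail of the segment:
    ; two digits start at offset 8, three digits at offset 7
    change skip str either Count < 100 [8][7] Count
    append ruler str     ; APPEND copies the segment's characters into ruler
]
print ruler
```

Using COPY on the literal segment keeps repeated runs in the same session from starting with a segment that is already modified.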
PatrickP61 26-Jun-2007 [477]	Hi Geomol, I just signed on. Will try out the code later -- many thanks
Gabriele 27-Jun-2007 [478x2]	Use prin ["---" count], not print "---" count.
	prin will insert a space though, so you may want to do print join "---" count instead.
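A quick check of the difference Gabriele describes, assuming REBOL 2: PRINT and PRIN reduce a block and separate its values with a space, while JOIN concatenates with no separator. The value of Count here is only an example.

```rebol
REBOL [Title: "print/prin vs. join (sketch)"]

Count: 10
print ["---" Count]      ; block is reduced and formed with spaces: --- 10
print join "---" Count   ; JOIN converts Count and appends it directly: ---10
```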
PatrickP61 27-Jun-2007 [480]	Hi All, Have any Rebolers dealt with Unicode files? Here is my situation. I work on an IBM AS400 that can "port" files over to the PC. Notebook opens it up just fine, but Rebol doesn't see it the same way. If I cut & paste the contents of the file into an empty notebook and save it, Rebol can see it just fine. Upon further study, I noticed at the bottom of the SAVE AS window that Encoding was set to UNICODE for the AS400 file, while the cut & paste one was set to ANSI. Does Rebol want ANSI text files only, or can it read UNICODE files too?
Geomol 27-Jun-2007 [481]	I guess you have to convert it. I once built a RebXML format that could be transferred to/from XML. It can handle UTF-8. You can find code to convert from UTF-8 here: http://home.tiscali.dk/john.niclasen/rebxml/xml2rebxml.r (search for unicode). The other way can be found here: http://home.tiscali.dk/john.niclasen/rebxml/rebxml2xml.r (search for iso2utf-8).
PatrickP61 27-Jun-2007 [482]	Thanks Geomol, Since I am a newbie, I can easily resave the files as ANSI instead of UNICODE and avoid the conversion problem, at least in the short term. Once I get my "Convert to Table" program working, then I can look at your links to convert from UNICODE.
Gregg 27-Jun-2007 [483x2]	rejoin extract my-unicode-string 2
	Obviously simplistic, just throwing away the extra byte for each char.
PatrickP61 27-Jun-2007 [485]	Hi Gregg -- So should I do something like this: InText: rejoin extract Read InFile 2
Gregg 27-Jun-2007 [486]	Try it in the console and see what you get. The console is your friend. :-)
PatrickP61 27-Jun-2007 [487]	It works!!!! Code to convert UNICODE to ANSI: InFile: %"Test In unicode.txt" InText: rejoin extract Read InFile 2 write OutFile InText
Geomol 27-Jun-2007 [488]	I'm not too much into unicode. Is that UTF-16, where every char is 2 bytes? I think my scripts can only handle UTF-8.
PatrickP61 27-Jun-2007 [489x2]	When you try to save a document under Notebook, the encoding choices include UTF-8, UNICODE and ANSI, among others. UNICODE may be the same as UTF-16, because it does look like every single character is saved as two bytes. The code (rejoin extract read InFile 2) does eliminate the double characters, but I noticed that the entire file is still double spaced, as if the newline is coded twice and not removed by the rejoin. But that extra newline may be more of an annoyance than anything else.
	Hello my teachers. Is there a more elegant way to create a ruler in rebol than this? Str7: Str8: "" Ruler: rejoin [ for Count 10 90 10 [ Str8: rejoin [ Str8 "....+..." Count ] ] for Count 100 250 10 [ Str7: rejoin [ Str7 "....+.." Count ] ] ] print Ruler
Gregg 28-Jun-2007 [491x3]	I don't know about more elegant, but here's a func, just for fun.
	make-ruler: func [count /local res str-ct offset] [ res: head insert/dup copy "" "....+....+" count repeat ct count [ str-ct: form ct * 10 offset: subtract length? str-ct 1 change at res ct * 10 - offset str-ct ] res ]
	To match your ruler, do: make-ruler 25
PhilB 28-Jun-2007 [494]	Patrick ... on your AS400 problem ... how is the data transferred to the PC? Is it directly from an AS400 file via the built-in data transfer utility, or is it a file from the IFS? (I have used Rebol to read data transferred from an AS400 and didn't get the data as unicode.)
PatrickP61 28-Jun-2007 [495]	Hi PhilB -- The formatted text report is generated on the AS400 into the work spool area. I then use the INavigator software on the PC to connect to it and drag and drop it onto the PC, where I can look at it via Word or Notebook. I'm not sure where the encoding to Unicode is happening. I suspect the INavigator software, but it may not be an issue, since Rebol can convert it to readable text. Even with the extra newline between each line, I'm sure that annoyance can be overcome too.
Anton 28-Jun-2007 [496x3]	Patrick, on the double newlines: can you inspect the result of read InFile? How many newlines are present at that point? Useful rebol words: NEWLINE (the newline character that rebol uses), CR (carriage return character), LF (linefeed character), CRLF (both CR and LF in a string).
	There is READ and READ/BINARY. READ is text mode and automatically translates line terminators from the target system into rebol's format, which is the same as Unix (LF).
	I don't think EXTRACT is at fault, it does a very simple job, getting every second character.
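Following Anton's hint about READ/BINARY, one way to get rid of the doubled newlines is to read the raw bytes, so that no line-end translation happens before the bytes are paired up, and then normalize CR LF yourself. In the dump posted later in this thread the file shows 0A00 0A00 where Notepad most likely wrote CR LF (0D00 0A00). That would fit text-mode READ turning the separated CR and LF each into a newline, but this is an inference, not something verified here. A minimal sketch, assuming a UTF-16 little-endian file whose characters are all below 256; `utf16le-to-text` is a hypothetical name:

```rebol
REBOL [Title: "UTF-16LE bytes to text (sketch)"]

utf16le-to-text: func [
    "Hypothetical helper: naive UTF-16LE decode keeping only the low byte of each unit"
    data [binary!] "Raw file bytes, e.g. from READ/BINARY"
    /local out
][
    out: make string! length? data
    ; Skip the FF FE byte order mark if the data starts with one
    if #{FFFE} == copy/part data 2 [data: skip data 2]
    ; The low byte comes first in little-endian order; assumes every char < 256
    while [1 < length? data] [
        append out to char! first data
        data: skip data 2
    ]
    ; READ/BINARY does no line-end translation, so turn CR LF into NEWLINE here
    replace/all out crlf newline
    out
]

; Two lines as Notepad would save them as "Unicode": BOM, then CR LF between lines
probe utf16le-to-text #{FFFE4C0069006E00650031000D000A004C0069006E0065003200}
```

With the thread's file this would be InText: utf16le-to-text read/binary %"Test In unicode.txt". A text-mode write OutFile InText should then produce ordinary single line ends.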
PatrickP61 28-Jun-2007 [499x4]	Hi Anton -- This is my simulated input for a unicode text file: Line1...10....+...20....+...30....+...40....+...50 Line2...10....+...20....+...30....+...40....+...50 If I run this code: InFile: %"Small In unicode.txt" InText: rejoin extract read InFile 2 ; Convert from UNICODE to ANSI but keeps double spacing. OutFile: %"Test Out.txt" write OutFile InText print InText I get these results: ÿLine1...10....+...20....+...30....+...40....+...50 Line2...10....+...20....+...30....+...40....+...50 I get them in the output file when I use the Rebol editor, in notebook (when I open the file), and in the console when I PRINT InText.
	Notice the Spanish y at the beginning of the output.
	At first, I thought it might just be some stray bytes coming from the AS400, but I was able to re-create a file using Notebook and get the same results. Any of you should be able to test this out by: 1. Open Notebook 2. Type in some text 3. Save the file with Encoding set to UNICODE
	Anton, Is it possible that Rebol is interpreting the CRLF as newline newline when dealing with unicode files?
Gregg 28-Jun-2007 [503]	Look at the binary/ascii value of those chars; what are they?
PatrickP61 28-Jun-2007 [504]	Gregg -- I don't know how to reveal the binary/ascii values of the file, but the Spanish y looks like it may be hex FF. Do you have rebol code that can convert the characters into hex?
Gregg 28-Jun-2007 [505x3]	By default, REBOL shows binary values as hex, but you can change to other bases. Check out enbase/debase also. >> system/options/binary-base == 16 >> s: "Gregg" == "Gregg" >> as-binary s == #{4772656767}
	>> system/options/binary-base: 2 == 2 >> as-binary s == 2#{0100011101110010011001010110011101100111} >> system/options/binary-base: 64 == 64 >> as-binary s == 64#{R3JlZ2c=}
	Notice the leading base value at the head of the binary! value.
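For Patrick's hex question, ENBASE/BASE gives the same digits as a plain string without changing system/options/binary-base, and DEBASE/BASE goes back. A short sketch, assuming REBOL 2:

```rebol
REBOL [Title: "Hex view with ENBASE (sketch)"]

s: "Gregg"
; ENBASE/BASE returns the encoded bytes as a string; 16 gives hex digits
print enbase/base s 16                          ; 4772656767
; DEBASE/BASE decodes the digits back into a binary! value
print to string! debase/base "4772656767" 16    ; Gregg
```

Applied to the thread's sample file, print enbase/base read/binary %"Small In unicode.txt" 16 would list every byte, including the FF FE at the start.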
PatrickP61 28-Jun-2007 [508x3]	Ok -- I think I have it: my sample input is a two-line text file in UNICODE, like Line1 Line2. as-binary InText shows #{FFFE4C0069006E00650031000A000A004C0069006E0065003200}
	#{FFFE_4C00_6900_6E00_6500_3100_0A00_0A00_4C00_6900_6E00_6500_3200} reads as FFFE = ÿ, 4C00 = L, 6900 = i, 6E00 = n, 6500 = e, 3100 = 1, 0A00 = ?, 0A00 = ?, 4C00 = L, 6900 = i, 6E00 = n, 6500 = e, 3200 = 2. What are those question marks?
	#{FF_4C_69_6E_65_31_0A_0A_4C_69_6E_65_32} <-- this is what I get when I use the extract routine for InText: FF = ÿ, 4C = L, 69 = i, 6E = n, 65 = e, 31 = 1, 0A = ?, 0A = ?, 4C = L, 69 = i, 6E = n, 65 = e, 32 = 2. The extract is clearly NOT skipping the newline. What do you think?
Sunanda 28-Jun-2007 [511]	FFFE is a "byte order mark" -- something that has been slipped in at the beginning of the file to indicate that the file is in UTF-16, little-endian format. If it started FEFF, you'd have to extract the other set of bytes. Looks like the original file (or whatever did the EBCDIC to UTF-16 conversion on the AS400) is using 0A0A to mean newline. You may need to clean those up by hand:
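Sunanda's byte order mark rule can be written down directly: FF FE means little-endian (the character byte comes first in each pair), FE FF means big-endian (it comes second). A minimal sketch under the same assumption as the rest of this thread, that every character is below 256; `utf16-low-bytes` is a hypothetical name, and it leaves line endings untouched:

```rebol
REBOL [Title: "BOM-aware UTF-16 byte picking (sketch)"]

utf16-low-bytes: func [
    "Hypothetical helper: keeps the character byte of each UTF-16 unit, chosen by the BOM"
    data [binary!]
    /local bom pos out
][
    out: make string! length? data
    pos: 1                                   ; little-endian or no BOM: character byte first
    bom: copy/part data 2
    if bom == #{FEFF} [pos: 2]               ; big-endian: character byte second
    if any [bom == #{FFFE} bom == #{FEFF}] [data: skip data 2]   ; drop the BOM itself
    while [1 < length? data] [
        append out to char! pick data pos
        data: skip data 2
    ]
    out
]

print utf16-low-bytes #{FFFE48006900}   ; little-endian "Hi"
print utf16-low-bytes #{FEFF00480069}   ; big-endian "Hi"
```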
PatrickP61 28-Jun-2007 [512]	Hi Sunanda, -- Thanks for your input on the byte order mark. Aside from that, would you have any idea why the extract will not remove the second 0A? See notes above. Here is Gregg's suggested code to convert UTF-16: InText: rejoin extract Read InFile 2 ; gets rid of every other byte except newline.
Sunanda 28-Jun-2007 [513]	If I'm reading it right: Your input has _0A00_0A00_ -- two new lines and your output has: _0A_0A_ -- two new lines Extract won't affect that -- it simply takes every second byte of the input string, regardless of whether they are newlines or not.
Tomc 28-Jun-2007 [514x4]	>> system/options/binary-base: 16 == 16 >> as-binary "foo" == #{666F6F} >> system/options/binary-base: 4 == 4 >> as-binary "foo" == #{666F6F} >>
	you cannot set any binary base ... no nibbles
	nor bases higher than 16 ...
	sigh
PatrickP61 28-Jun-2007 [518]	Sunanda -- Now I see what you are saying. Out of the 4 bytes 0A 00 0A 00, Extract did its job right by returning 0A 0A and getting rid of the two 00s!
Anton 29-Jun-2007 [519x2]	That's how it looks.
	What's this "notebook" program? Do you mean "notepad" (which does have an option to save as unicode)?
Gregg 29-Jun-2007 [521]	nor bases higher than 16 ... -- Except base64. I have some old base conversion code, and I think Sunanda has some posted on REBOL.org as well, if you really need to convert to intermediate bases.
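Where an intermediate base really is needed, a plain divide-and-remainder loop on integers is enough. This is not Gregg's or Sunanda's code (neither is shown in the thread), just a sketch; `to-base` is a hypothetical name, and it only handles non-negative integers in bases 2 through 16. Applied byte by byte, it gives the base-4 view Tomc was trying to get from binary-base.

```rebol
REBOL [Title: "Integer to arbitrary base (sketch)"]

to-base: func [
    "Hypothetical helper: digits of a non-negative integer in base 2..16"
    n [integer!]
    base [integer!]
    /local digits out
][
    digits: "0123456789ABCDEF"
    if n = 0 [return copy "0"]
    out: copy ""
    while [n > 0] [
        ; the remainder is the next digit, building the result from the right
        insert out pick digits (n // base) + 1
        ; exact division after removing the remainder keeps n an integer
        n: to integer! (n - (n // base)) / base
    ]
    out
]

print to-base 255 4    ; 3333
print to-base 76 4     ; 1030 (byte 4C, the "L" in the thread's dump)
```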