r3wp [groups: 83 posts: 189283]
World: r3wp

[Rebol School] Rebol School

Gregg
28-Jun-2007
[503]
Look at the binary/ascii value of those chars; what are they?
PatrickP61
28-Jun-2007
[504]
Gregg -- I don't know how to reveal the binary/ascii values of the 
file, but the Spanish y looks like it may be hex FF.  Do you have 
rebol code that can convert the characters into hex?
Gregg
28-Jun-2007
[505x3]
By default, REBOL shows binary values as hex, but you can change 
to other bases. Check out enbase/debase also.

>> system/options/binary-base
== 16


>> s: "Gregg"
== "Gregg"
>> as-binary s
== #{4772656767}
>> system/options/binary-base: 2
== 2
>> as-binary s
== 2#{0100011101110010011001010110011101100111}
>> system/options/binary-base: 64
== 64
>> as-binary s
== 64#{R3JlZ2c=}
Notice the leading base value at the head of the binary! value.
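
A minimal sketch of the enbase/debase route mentioned above, assuming REBOL 2. It converts the same string without touching system/options/binary-base; the variable name s is just the one from the console session.

```rebol
REBOL []

s: "Gregg"

; enbase returns the encoded text as a string, in the base you ask for
print enbase/base s 16    ; 4772656767
print enbase/base s 2     ; 0100011101110010011001010110011101100111
print enbase s            ; base 64 is the default: R3JlZ2c=

; debase goes the other way and returns a binary! value
print to-string debase/base "4772656767" 16    ; Gregg
```

Because the base is passed per call, nothing has to be reset afterwards.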
PatrickP61
28-Jun-2007
[508x3]
Ok -- I think I have it:   my sample input is a two line text field 
in UNICODE like
Line1
Line2

as-binary InText shows #{FFFE4C0069006E00650031000A000A004C0069006E0065003200}
#{FFFE_4C00_6900_6E00_6500_3100_0A00_0A00_4C00_6900_6E00_6500_3200}

_ ___y____L___i_____n____e____1____?____?____L____i_____n____e____2
What are those question marks?
#{FF_4C_69_6E_65_31_0A_0A_4C_69_6E_65_32}  <-- this is what I get 
when I use the extract routine for InText

__y__L___i___n__e__1__?__?__L__i___n__e__2   <-- The extract is clearly 
NOT skipping the newline.
What do you think?
Sunanda
28-Jun-2007
[511]
FFFE is a "byte order mark" -- something that has been slipped in 
at the beginning of the file to indicate the file is in UTF-16, little 
endian format....If it started FEFF (big endian) you'd have to extract 
the other byte of each pair instead. 

Looks like the original file (or whatever did the EBCDIC to UTF-16 
conversion on the AS400)  is using 0A00 0A00 to mean newline. You may 
need to clean those up by hand.
PatrickP61
28-Jun-2007
[512]
Hi Sunanda,  -- Thanks for your input on byte order mark.  Aside 
from that would you have any idea as to why the extract will not 
remove the second 0A?  See notes above -- here is Gregg's suggested 
code to convert UTF-16:

 InText: rejoin extract Read InFile 2    ; gets rid of every other byte except newline.
Sunanda
28-Jun-2007
[513]
If I'm reading it right:
Your input has 
   _0A00_0A00_   -- two new lines
and your output has:
  _0A_0A_  -- two new lines

Extract won't affect that -- it simply takes every second byte of 
the input string, regardless of whether they are newlines or not.
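
A rough sketch of the whole conversion, assuming REBOL 2 and text that stays in the ASCII range. It skips the FFFE mark first and then keeps the low byte of each pair, newlines included. The binary value is the sample posted above; the word names raw and text are only for illustration.

```rebol
REBOL []

raw: #{FFFE4C0069006E00650031000A000A004C0069006E0065003200}

; Skip the two-byte byte order mark if it is present.
if #{FFFE} = copy/part raw 2 [raw: skip raw 2]

; Keep the first (low) byte of every little-endian pair.
text: copy ""
while [not tail? raw] [
    append text to-char first raw
    raw: skip raw 2
]

probe text
print length? text    ; 12: five letters, two newlines, five letters
```

The two newlines survive because they are real characters in the input, not because the loop treats them specially.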
Tomc
28-Jun-2007
[514x4]
>> system/options/binary-base: 16
== 16
>> as-binary "foo"
== #{666F6F}
>> system/options/binary-base: 4
== 4
>> as-binary "foo"
== #{666F6F}
>>
you cannot set any binary base ... no nibbles
nor bases higher than 16 ...
sigh
PatrickP61
28-Jun-2007
[518]
Sunanda -- Now I see what you are saying  -- Out of the 4 bytes 0A 
00 0A 00, Extract did its job right by returning 0A 0A and got rid 
of the two 00!
Anton
29-Jun-2007
[519x2]
That's how it looks.
What's this "notebook" program ? You mean "notepad" (which does have 
option to save to unicode) ?
Gregg
29-Jun-2007
[521]
nor bases higher than 16 ...

 -- Except base64. I have some old base conversion code, and I think 
 Sunanda has some posted on REBOL.org as well, if you really need 
 to convert to intermediate bases.
Sunanda
29-Jun-2007
[522]
I have indeed:

http://www.rebol.org/cgi-bin/cgiwrap/rebol/view-script.r?script=base-convert.r
Will handle integer <--> base conversions.
Up to base 36 out of the box
Up to base 255 if you adjust the configurable parameters:

http://www.rebol.org/cgi-bin/cgiwrap/rebol/documentation.r?script=base-convert.r#toc-19
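
For a feel of what an integer-to-base conversion involves, here is a small sketch assuming REBOL 2. It is not the base-convert.r script linked above; the function name to-base and its digit table are made up for this example, and it only handles non-negative integers in bases 2 to 36.

```rebol
REBOL []

; Hypothetical helper, not part of REBOL or of base-convert.r
to-base: func [n [integer!] base [integer!] /local digits out] [
    digits: "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    if n = 0 [return "0"]
    out: copy ""
    while [n > 0] [
        ; the remainder picks the next digit, least significant first
        insert out pick digits (n // base) + 1
        n: to-integer n / base
    ]
    out
]

print to-base 255 16    ; FF
print to-base 10 4      ; 22
print to-base 35 36     ; Z
```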
PatrickP61
29-Jun-2007
[523]
my mistake  -- I mean Notepad  -- not Notebook
Anton
29-Jun-2007
[524]
:) ok
PatrickP61
2-Jul-2007
[525x4]
Question to all:   

If I have a block of data inside of In-text like this:
	Line A
	Line B
	Line C

How can I print the line number (position in the block) along with 
the contents of the line?               I tried this but it didn't 
work:
foreach Line In-text [ print rejoin	[ Count:	Count + 1	]	Line ]
Now that I think of it, I probably do not need to manipulate a Count 
variable  -- I can probably use INDEX right?
I tried this out but not getting the results I wanted:

	Data:	head In-text
	while	[not tail? Data] [
			print	[index? Data Data ]
			Data:	next Data		]

I'm getting this:
	1 Line A Line B Line C
	2 Line B Line C
	3 Line C
Any suggestions?
Give me enough time, and I will figure it out --- :-)

Data:	head In-text
while	[not tail? Data] [
		print	[index? Data first Data ]
		Data:	next Data		]

Is there a better way to code this kind of thing?
Sunanda
2-Jul-2007
[529]
One way:
data: [a b c]
for n 1 length? data 1 [print [n data/:n]]
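
Two other ways to pair each item with its position, sketched for REBOL 2 with made-up sample data. repeat counts from 1 for you, and forall steps the series itself so index? gives the line number.

```rebol
REBOL []

data: ["Line A" "Line B" "Line C"]

; repeat counts n from 1 up to the length of the block
repeat n length? data [print [n pick data n]]

; forall moves the series position, so index? is the line number
forall data [print [index? data first data]]
data: head data    ; older builds leave the word at the tail after forall
```

Both loops print "1 Line A", "2 Line B" and "3 Line C".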
Brock
2-Jul-2007
[530x3]
>> blk: ["First line of data" "Second line of data" "Third line of data"]
>> while [not tail? blk][ print [index? blk first blk] blk: next blk]
1 First line of data
2 Second line of data
3 Third line of data
blk: [
first line
second line
third line
]

while [not tail? blk][ print [index? blk first blk] blk: next blk]
Your first answer seems to work for me
PatrickP61
2-Jul-2007
[533x3]
My first attempt had   print [index? Data Data]   while the second 
attempt has   print [index? Data first Data]
The second one got the right part of the series
Sunanda  --  I like to see how to solve the same problem in different 
ways   thanks for the reply
Ashley
3-Jul-2007
[536]
for n 1 length? data 1
 -> "repeat n length? data"
PatrickP61
5-Jul-2007
[537]
Situation:	I want to read in an input file and parse it for some 
strings

Current:	My test code will do the parsing correctly IF the input 
block contains each line as a string

Problem:	When I try to run my code against the test file, It treats 
the contents of the file as a single string.

Question:	How do I have Rebol read in a file as one string per line 
instead of one string?
In-text:	[	"Line 1                        Page     1"
		"Line 2    Name      String-2"
		"Line 4    Member    String-3 on 12/23/03"
		"Line 5    SEQNBR    abcdef               "
		"Line 6       600    Desc 1 text         12/23/03"
		"Line 7      5400    Desc 2    Page 4    12/23/03"
		"Line 8    Number of records searched"	]
 Get-page:		[thru "     Page "	copy Page-id to end]
 Get-file:		[thru "Name  "		copy Name-id to end]
 Get-member:	[thru "Member  "	copy Member-id to end]

 Page-id:	Name-id:	Member-id:	"-"

 for N 1 length? In-text 1 [
	parse In-text/:N	Get-page
	parse In-text/:N	Get-file
	parse In-text/:N	Get-member
	] 
 print	[ "Page"	Page-id		]
 print	[ "Name"	Name-id		]
 print	[ "Member"	Member-id	]
Sunanda
5-Jul-2007
[538]
Try
  in-text: read/lines %file-name
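
A small sketch of the difference, assuming REBOL 2. The file name %demo-lines.txt is a throwaway used only for this example.

```rebol
REBOL []

; Create a three-line scratch file to read back
write %demo-lines.txt "Line 1^/Line 2^/Line 3"

whole: read %demo-lines.txt          ; one string holding the whole file
lines: read/lines %demo-lines.txt    ; a block with one string per line

print string? whole    ; true
print length? lines    ; 3
print first lines      ; Line 1

delete %demo-lines.txt
```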
PatrickP61
5-Jul-2007
[539x2]
Thank you Sunanda -- That did work, but I thought Read/Lines would 
return a single line  -- no maybe that is Read/Line without the s 
 -- is that right?
In my example above, I have three parse rules defined.  I need to 
add several more.

Does the PARSE process the string once per rule?  i.e. Does it scan 
the string for Get-page, then Get-file, then Get-member (scan the 
string 3 times),  Or can I structure the parse rules together to process 
against the string once?
Tomc
5-Jul-2007
[541x2]
if your page, name & member always exist and are in that order ...

digit: charset "0123456789"	; string parsing cannot match integer! directly
parse/all read %file [
 some [ 
	thru "Page " any " " copy token some digit (print ["Page" token]) 
	thru "Name " copy token to newline (print ["Name" token])
	thru "Member " copy token to newline (print ["Member" token])
	]
]
and the keywords only exist as keywords
PatrickP61
5-Jul-2007
[543x2]
Tomc  -- This version means that I need to have the entire file read 
in as a string -- Not with Read/Lines -- Because the newline will 
be the "delimiter" within the string while the Read/Lines will delimit 
each newline to a separate string inside a block.  Do I have that 
right?
My Page, Name, & Member is always in the same order on separate pages 
within a file.  like so:
Line 1     Page 1
Line 2     Name
Line 3     Member
Line n...  Member
Line 50  Member
Line 51  Page   2
Line 52  Name
Line 53  Member
Line 54  Member
...
Sunanda
6-Jul-2007
[545]
Not sure this is a case for parse......You seem to have four types 
of line:
-- those with "page" in a specific location on the line
-- those with "name" in a specific location on the line
-- those with "member" in a specific location on the line

-- others which are to be ignored .... eg your original line 6 "Line 
6       600    Desc 1 text         12/23/03"

What I would do is:
* use read/lines to get a block

* for each line in the block, identify what record type it is by 
the fixed literal .... something like: if "page" = copy/part skip 
line 25 4 [....]

* perhaps use parse to extract the items I need, once I know the 
line type
***

If you just use parse in the way you propose, you run the risk of 
mis-identifying lines when there is a member called "page" or "name"
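
A sketch of the fixed-position test, assuming a REBOL 2 build that has case. The offset 10 and width 6 are guesses taken from the sample lines posted earlier, and line-type is a made-up helper name; a real report needs its own columns.

```rebol
REBOL []

; Assumption: the record keyword starts at offset 10 and is at most 6 wide
line-type: func [line [string!] /local key] [
    key: trim copy/part skip line 10 6
    case [
        key = "Name"   ['name]
        key = "Member" ['member]
        true           ['other]
    ]
]

foreach line [
    "Line 2    Name      String-2"
    "Line 4    Member    String-3 on 12/23/03"
    "Line 7      5400    Desc 2    Page 4    12/23/03"
][
    print [line-type line "-" line]
]
```

The last sample line contains the word Page, but not in the keyword column, so it is classed as other rather than mistaken for a header.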
PatrickP61
6-Jul-2007
[546x4]
Thank you Sunanda -- I will give that a try.


Just to let you know -- My goal is to convert a printable report 
that is in a file into a spreadsheet.
Some fields will only appear once per page like PAGE.

Some fields could appear in a new section of the page multiple times 
like NAME in my example.
And some fields could appear many times per section like MEMBER:
_______________________
Page header          PAGE     1
Section header     NAME1.1
Detail lines            MEMBER1.1.1
Detail lines            MEMBER1.1.2
Section header     NAME1.2
Detail lines            MEMBER1.2.1
Detail lines            MEMBER1.2.2
Page header         PAGE    2
(repeat of above)____________


I want to create a spreadsheet that takes different capturable fields 
and place them on the same line as the detail lines like so...
______________________
Page   Name       Member
1          NAME1.1  MEMBER1.1.1
1          NAME1.1  MEMBER1.1.2
1          NAME1.2  MEMBER1.2.1
1          NAME1.2  MEMBER1.2.2

2          NAME2.1  MEMBER2.1.1  ...    (the version numbers are 
simply a way to relay which captured field I am referring to (Page, 
Name, Member)


Anyway -- that is my goal.  I have figured out how to do the looping, 
and can identify the record types, but you are right about the possibility 
of mis-identifying lines.
This is my pseudocode approach:


New page is identified by a page header text that is the same on 
each page and the word PAGE at the end of the line

New section is identified by a section header text that is the same 
within the page and the text "NAME . . . . :"

Member lines do not have an identifying mark on the line but are 
always preceded by the NAME line.

Member lines continue until a new page is found, or the words "END 
OF NAME" are found (which I didn't show in my example above).


Initialize capture fields to -null-     like PAGE, NAME
Initialize OUTPUT-FLAG to OFF.

Loop through each line of the input file until end of file EOF.
/|\	If at a New-page line
 |	or at end of Name section
 |		Set OUTPUT-FLAG  OFF
 |	If OUTPUT-FLAG  ON

 |		Format output record from captured fields and current line (MEMBER)
 |		Write output record
 |	IF at New Name line
 |		Set OUTPUT-FLAG ON
 |	IF OUTPUT-FLAG OFF
 |		Get capture fields like PAGE-NUMBER when at a PAGE line
 |		Get NAME when at a NAME line.
 |____	Next line in the file.
Note to all -- Please realize this is a simplified version of the 
real report -- There are many more fields and other things to code 
for, but they are all similar items to the example PAGE, NAME, and 
MEMBER fields.
Oops -- I should put the IF at New Name line at the end of the loop, 
or put the capture of the name in that part.
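
The pseudocode above could be sketched roughly like this, assuming a REBOL 2 build that has case. The report lines are invented sample data in the layout described, and the words page, name, in-section and rows are names chosen for this example only.

```rebol
REBOL []

; Invented sample report: header text, NAME lines, member lines
report: [
    "REPORT HEADER        PAGE     1"
    "NAME . . . . : ALPHA"
    "MEMBER-A1"
    "MEMBER-A2"
    "END OF NAME"
    "NAME . . . . : BETA"
    "MEMBER-B1"
    "REPORT HEADER        PAGE     2"
    "NAME . . . . : GAMMA"
    "MEMBER-C1"
    "END OF NAME"
]

page: name: none
in-section: false    ; plays the role of OUTPUT-FLAG
rows: copy []

foreach line report [
    case [
        find/case line " PAGE " [
            ; new page: capture the number, stop output
            page: trim copy find/case/tail line " PAGE "
            in-section: false
        ]
        find/case line "END OF NAME" [in-section: false]
        find/case line "NAME . . . . :" [
            ; new section: capture the name, start output
            name: trim copy find/case/tail line "NAME . . . . :"
            in-section: true
        ]
        in-section [
            ; any other line inside a section is a member line
            append rows rejoin [page tab name tab trim copy line]
        ]
    ]
]

foreach row rows [print row]
```

Checking the header and end markers before the member case is what keeps those lines out of the output; each member row carries the most recently captured page and name, separated by tabs.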
Tomc
7-Jul-2007
[550]
Yes Patrick you have it right. The rules I gave would fail 
since you have multiple names/members

I would try to get away from the line by line mentality 
and try to break it into your conceptual record groupings
file, pages, sections, and details...

One trick I use is to replace a string delimiter for a record 
with a single char so parse returns a block of that record type. 

This is good because then when you work on each item in the block 
in turn, you know any fields you find do belong to this record and 
that you have not accidentally skipped to a similar field in a later 
record.

something like this 


pages: read %file
replace/all/case pages "PAGE" "^L"
pages: parse/all pages "^L"

foreach page pages [
	eol: any [find page newline  tail page]
	p: trim copy/part page eol	; rest of the PAGE line, e.g. the page number
	page: eol
	replace/all/case page "NAME" "^L"
	sections: parse/all page "^L"
	foreach sec sections [
		eol: any [find sec newline  tail sec]
		s: trim copy/part sec eol	; rest of the NAME line
		sec: eol
		parse/all sec [
			any [thru "Member" copy detail to newline 
				newline (print [p tab s tab detail])
			]
		]
	]
]
PatrickP61
18-Jul-2007
[551x2]
I am a little confused about PORTS.  I want to control how much information 
is loaded into a block but I am not sure how to determine if data 
remains in a port.  Example:
In-port:	open/lines In-file
 while [not tail? In-port] [
	print In-port
	In-port:	next In-port
	]

 close In-port