areas, text, and read/binary
[1/10] from: reboler::programmer::net at: 11-Feb-2002 14:02
I am reading file contents with 'read/binary, because the file may be text or an image.
If the file is text then I display those contents in an 'area with 'to-string contents.
I can edit those contents, and then would like to save the changed contents (my-area/text).
Here is the problem: the line terminators in the saved text seem to multiply.
I.e. with each new save of the text, an additional line terminator is added to the end
of each line.
Is there a way to avoid this?
Is it something specific to the way an 'area handles text, maybe to do with the 'wrap? setting?
I have tried various combinations of wrap?: true/false, write/string etc., and trim/auto
etc., with no success.
Keep in mind that for reasons not mentioned here I would always like to 'read/binary
to get the file contents, whether text file or image file. (I know this problem disappears
if I just 'read text files, but it makes the script very much more complicated in several
places).
Problem in short: Can I read a file as binary, edit its contents with 'to-string in
an area/text, and then write it without REBOL adding extraneous line terminators?
[2/10] from: brett:codeconscious at: 12-Feb-2002 11:24
> I am reading file contents with 'read/binary, because the file may be text or an image.
> If the file is text then I display those contents in an 'area with 'to-string contents.
> I can edit those contents, and then would like to save the changed contents (my-area/text).
> Here is the problem: the line terminators in the saved text seem to multiply.
> i.e. with each new save of the text an additional line terminator is added to the end of each line.
I couldn't duplicate your problem. What operating system are you using, and what is the
normal line termination for your text files?
> Keep in mind that for reasons not mentioned here I would always like to 'read/binary to get the file contents, whether text file or image file. (I know this problem disappears if I just 'read text files, but it makes the script very much more complicated in several places).
READ without the /binary refinement is doing real work for you. If you choose
to bypass this work using the /binary refinement, you'll just have to do it
yourself elsewhere. The Core user guide says:
"When a file is read as text, all line terminators are converted to newline (line feed)
characters. Line feeds (used as line terminators on Amiga, Linux, and UNIX systems),
carriage returns (used as line terminators on Macintosh), and the CR/LF combination
(PC and Internet) are all converted to the equivalent newline characters."
"Using a standard line terminator within scripts allows them to operate in a
machine-independent fashion."
So REBOL defines newline (LF) as the line terminator. Let's say your OS uses
CR/LF as the line terminator. What, then, does AREA get to work on? The code in
REBOL expects just an LF, so how will it treat the CR? I doubt that it will
treat it as part of the line termination; it will be treated as another
ordinary text character. In that case you will end up editing some of those CRs
out as you change lines in AREA, so your text is now inconsistent when you go
to save it, whether via /binary or not.
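If you use /binary when you write, you will have broken line termination
(missing CRs on the lines you edited). If you don't use /binary, you will have
extra CRs (the ones you didn't edit out), because a normal WRITE turns every LF
back into CR/LF and the old CR is still sitting in front of it.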
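The multiplying terminators can be reproduced with a small Python model. It assumes two things: that READ/BINARY followed by TO-STRING keeps every byte (CRs included), and that a text-mode WRITE on Windows writes each LF as CR/LF. The helper names are hypothetical. Each round trip through mismatched modes adds one CR per line, while matching the modes (binary in, binary out) leaves the bytes alone.

```python
def read_binary_to_string(data: bytes) -> str:
    """Model of READ/BINARY + TO-STRING: bytes are kept as-is, CRs included."""
    return data.decode("latin-1")


def write_text_mode(text: str) -> bytes:
    """Model of a text-mode WRITE on Windows: each LF is written as CR/LF."""
    return text.replace("\n", "\r\n").encode("latin-1")


def write_binary_mode(text: str) -> bytes:
    """Model of WRITE/BINARY: characters are written unchanged."""
    return text.encode("latin-1")


if __name__ == "__main__":
    original = b"line 1\r\nline 2\r\n"
    data = original
    for _ in range(3):
        data = write_text_mode(read_binary_to_string(data))
        print(len(data), data)
    # Each pass adds one CR per line: 18, 20, 22 bytes.
    print(write_binary_mode(read_binary_to_string(original)) == original)  # True
```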
Worse, your application will not work across platforms even if you get it
working on one.
My understanding is that you are doing this to avoid the work of treating
text and binary files differently when calling READ.
I had the same problem when uploading my site to the web using FTP. My
solution was to define a function that decided what was appropriate, /binary
or not, depending ultimately on the extension of the file. I actually
defined my own scheme for mapping file extensions to mime types:
txt --> text/plain
htm --> text/html
html --> text/html
jpg --> image/jpeg
jpeg --> image/jpeg
etc.
It would be nice if this mapping facility came with Rebol.
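That kind of per-extension decision can be sketched in a few lines, shown here in Python. The table and the names mime_type and wants_binary are hypothetical; only the entries listed above are included, and unknown extensions default to binary so that no file is ever rewritten with converted line terminators by mistake.

```python
from pathlib import PurePath

# Hypothetical extension -> MIME type table, mirroring the list above.
MIME_TYPES = {
    "txt": "text/plain",
    "htm": "text/html",
    "html": "text/html",
    "jpg": "image/jpeg",
    "jpeg": "image/jpeg",
}


def mime_type(filename: str) -> str:
    """Look up the MIME type from the file extension (case-insensitive)."""
    ext = PurePath(filename).suffix.lstrip(".").lower()
    return MIME_TYPES.get(ext, "application/octet-stream")


def wants_binary(filename: str) -> bool:
    """Text types are read/written in text mode; everything else as binary."""
    return not mime_type(filename).startswith("text/")


if __name__ == "__main__":
    for name in ["index.html", "photo.JPG", "notes.txt", "data.bin"]:
        print(name, mime_type(name), "binary" if wants_binary(name) else "text")
```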
Brett.
[3/10] from: joel:neely:fedex at: 11-Feb-2002 20:53
Hi, Alan,
alan parman wrote:
> Here is the problem: the line terminators in the saved text seem
> to multiply. i.e. with each new save of the text an additional
> line terminator is added to the end of each line.
>
> Is there a way to avoid this?
Have you tried WRITE/BINARY to put the text back to a file?
-jn-
[4/10] from: reboler:programmer at: 12-Feb-2002 12:39
Thanks, Brett and Joel, for your responses.
Here is a script to play around with so you can see what I mean.
You may need to look at this with a text viewer (such as NotePad) to see the changes.
Watch the file size change as you read/write.
;***** cut and paste the following into the console
write %test.txt {
test text
line 2,
line 3
line 4
write and read this file several times
while adding some text to the middle and end of lines
and then open in a text editor like NotePad
line 8
.
line 10
}
rebol []
file: %test.txt
main: layout [
across
my-area: area
return
text "Change what the Read button does, here"
return
read-field: field 400 "append clear my-area/text to-string contents: read/binary file"
button "Read File" [
do read-field/text
show my-area
]
return
text "Change what the Write button does, here"
return
write-field: field 400 "write/string file my-area/text" ;*****
button "Write File" [
do write-field/text
clear my-area/text
show my-area
sz: length? read file
append clear st/text rejoin ["size of file " length? read file]
show st
]
return
st: text rejoin ["size of file " length? read file]
return
button "Convert Lines" [write file read file]
text {This button does "write file read file"}
]
view main
[5/10] from: joel:neely:fedex at: 12-Feb-2002 13:02
Hi, Alan,
alan parman wrote:
> Thanks, Brett and Joel, for your responses.
>
> Here is a script to play around with so you can see what I mean.
>
As I asked in my earlier note, have you tried WRITE/BINARY
instead of WRITE/STRING ? After making that substitution in your
demo, I could repeatedly write and read the file without change.
The simple rule (AFAICT) is to "match" the modes of your reading
and writing (binary vs non-binary).
Hope this helps!
-jn-
[6/10] from: reboler:programmer at: 12-Feb-2002 14:55
Thanks, Joel.
Yes I have tried write/binary.
Try _changing_ some of the text, then write/binary, then look at the text in a NON-REBOL
text editor (try Notepad).
You will see that the lines you changed end differently than the unchanged lines (at
least on a Windoze box).
I have found that if I write/binary and then convert-lines it is ok.
I am wondering if the read portion could be changed. Perhaps 'to-string is not the best
way to go?
[7/10] from: joel:neely:fedex at: 12-Feb-2002 15:36
Hi, Alan,
alan parman wrote:
> Thanks, Joel.
> Yes I have tried write/binary.
> Try _changing_ some of the text then write/binary then look at
> the text in a NON-REBOL text editor (try Notepad)
>
I did.
> You will see that the lines you changed end differently than the
> unchanged lines (at least on a Windoze box).
>
They didn't (on w2000).
> I have found that if I write/binary and then convert-lines it is ok.
>
> I am wondering if the read portion could be changed. Perhaps
> 'to-string is not the best way to go?
>
OK. I'm officially puzzled.
-jn-
[8/10] from: brett:codeconscious at: 13-Feb-2002 17:14
Hi Alan, Joel,
Alan, your demo script proves, to me at least, what I said before.
> I have found that if I write/binary and then convert-lines it is ok.
Actually, no, it is not. If you read/binary, add some text after the new
space, save it with write/binary, convert-lines, then read it in again,
you'll see that your new text has dropped to the next line.
To demonstrate what is happening add these two functions to your demo
script:
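tag-crlf: func[ data [string! binary!]][
    ; make terminators visible: each CR -> "<CR>", each LF -> "<LF>"
    replace/all replace/all data CR "<CR>" NEWLINE "<LF>"
]
detag-crlf: func[ data [string! binary!]][
    ; undo tag-crlf: turn the visible tags back into real CR and LF
    replace/all replace/all data "<CR>" CR "<LF>" NEWLINE
]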
Then add these two buttons:
button "tag text" [tag-crlf my-area/text show my-area]
button "detag text" [detag-crlf my-area/text show my-area]
Play with these buttons and see how you are dealing with Carriage Return
characters (CR) and Line Feed (LF), with the various write, read and edit
combinations.
The point to understand is that Linefeed / LF / NEWLINE is the delimiter
between lines in REBOL. Text files in Windows and MS-DOS are delimited by two
characters, CR+LF.
The following statements apply to the Windows platform (I haven't used
Win2000, XP, or CE).
1) When Notepad sees CRLF in some text it breaks the text at that point and
puts the following text on the next line.
2) When a REBOL AREA sees a LF it does the same.
3) AREA ignores CR.
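Because AREA hides the CRs, it can help to count the terminators in the raw bytes, in the same spirit as tag-crlf. A small Python sketch follows; count_terminators is a hypothetical name. A file in which both CR/LF and bare LF show up is the mixed state Alan reported (retyped lines lost their CR, untouched lines kept it), and a lone CR directly before a CR/LF indicates the doubled terminators from the original question.

```python
import re
from collections import Counter


def count_terminators(data: bytes) -> Counter:
    """Count CR/LF pairs, bare LF and bare CR line terminators in raw bytes."""
    counts = Counter()
    # The alternation tries CR/LF first, so a pair is never counted as CR + LF.
    for match in re.finditer(rb"\r\n|\n|\r", data):
        kind = {b"\r\n": "CRLF", b"\n": "LF", b"\r": "CR"}[match.group()]
        counts[kind] += 1
    return counts


if __name__ == "__main__":
    # Lines 1 and 3 untouched (CR/LF kept); line 2 retyped in the AREA (LF only).
    sample = b"line 1\r\nline 2 edited\nline 3\r\n"
    print(count_terminators(sample))  # Counter({'CRLF': 2, 'LF': 1})
```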
Try this code in your console - note that 0D is the hex representation for
CR and 0A for LF:
possibilities: [#{0a} #{0d} #{0a0d} #{0d0a}]
repeat p possibilities [
print "**********************************"
print ["Possibility: " mold p]
print ["write/binary %test.dat"]
write/binary %test.dat p
print ["write %test.txt"]
write %test.txt p
print [
"Read/binary %test.dat: "
mold r: read/binary %test.dat
either equal? p r [""]["<-- Changed"]
]
print ["Read/binary %test.txt: "
mold r: read/binary %test.txt
either equal? p r [""]["<-- Changed"]
]
print [
"Read %test.dat: " mold
r: to-binary read %test.dat
either equal? p r [""]["<-- Changed"]
]
print [
"Read %test.txt: "
mold r: to-binary read %test.txt
either equal? p r [""]["<-- Changed"]
]
]
Regards,
Brett.
[9/10] from: al:bri:xtra at: 13-Feb-2002 21:47
> 1) When Notepad sees CRLF in some text it breaks the text at that point and puts the following text on the next line.
Note that Notepad behaves oddly with just LF as the line terminator.
Andrew Martin
ICQ: 26227169 http://valley.150m.com/
[10/10] from: reboler:programmer at: 13-Feb-2002 9:34
Thanks all (esp. Brett) for your input.
I think I understand enough now to better ask the proper question:
Is there a way to process a file that has been opened with 'read/binary, as if you had
opened it as plain 'read?
I was hoping that 'to-string-ing the file would do it, but it doesn't.
------ some background ----- (only for the truly interested :) )
My script reads all types of files and displays them in their appropriate format (images
as pictures, .txt and .r as text, etc)
or allows you to view them in other formats (hex, base-64, picture-as-text, rot-13, etc).
It also allows you to edit/compress/encrypt them, and view them normally even while compressed/encrypted.
You can also copy/move/delete. And that is where the problem lies: I would like to do
these things with the option of NOT changing the original operating system format. In
other words, at the time of the 'read I don't know what I am going to do with the file: I
may or may not change it, and I may or may not move it somewhere else. For example, I might
want to edit and move a text file WITHOUT changing its line terminators.
Or I might want to move an encrypted text file without changing anything (so it can still
be decrypted).
Also, because of the editing capabilities of the script (and for speed), I am trying
to avoid doing extra 'reads.
If I could 'read/binary all files, and then process them at the 'write and pre-write stages,
it would make the script much simpler.
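One way to get what this message asks for is to keep the bytes from READ/BINARY, and only when a file is actually edited as text, normalize its terminators to LF for the AREA while remembering the original terminator so it can be restored before WRITE/BINARY. Below is a Python sketch under those assumptions. The names detect_terminator, open_for_edit and save_edit are hypothetical, and text is assumed to be single-byte (Latin-1) so decoding cannot lose data. Files that are only moved, copied or encrypted never go through these functions, so their bytes are untouched.

```python
from typing import Tuple


def detect_terminator(text: str) -> str:
    """Heuristic: CR/LF if present anywhere, else a lone CR, else LF."""
    if "\r\n" in text:
        return "\r\n"
    if "\r" in text:
        return "\r"
    return "\n"


def open_for_edit(data: bytes) -> Tuple[str, str]:
    """Like a text-mode READ: normalize terminators to LF, remember the original."""
    text = data.decode("latin-1")
    terminator = detect_terminator(text)
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    return normalized, terminator


def save_edit(text: str, terminator: str) -> bytes:
    """Convert LF back to the file's original terminator before WRITE/BINARY."""
    return text.replace("\n", terminator).encode("latin-1")


if __name__ == "__main__":
    original = b"line 1\r\nline 2\r\n"
    text, term = open_for_edit(original)
    text = text.replace("line 2", "line 2, edited")  # simulate an AREA edit
    print(save_edit(text, term))  # b'line 1\r\nline 2, edited\r\n'
    # Opening and saving again without changes gives identical bytes: no growth.
    again, _ = open_for_edit(save_edit(text, term))
    print(save_edit(again, term) == save_edit(text, term))  # True
```

A file that already mixes terminators would come back with a single kind throughout; that is a design choice of this sketch, not something REBOL dictates.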