Mailing List Archive: 49091 messages

areas, text, and read/binary

 [1/10] from: reboler::programmer::net at: 11-Feb-2002 14:02


I am reading file contents with 'read/binary, because the file may be text or an image. If the file is text then I display those contents in an 'area with 'to-string contents. I can edit those contents, and then would like to save the changed contents (my-area/text).

Here is the problem: the line terminators in the saved text seem to multiply, i.e. with each new save of the text an additional line terminator is added to the end of each line.

Is there a way to avoid this? Is it something unique to the way 'areas handle text, maybe to do with 'wrap? I have tried various combinations of wrap?: true/false, write/string etc., and trim/auto etc., with no success.

Keep in mind that for reasons not mentioned here I would always like to 'read/binary to get the file contents, whether text file or image file. (I know this problem disappears if I just 'read text files, but it makes the script very much more complicated in several places.)

Problem in short: Can I read a file as binary, edit its contents with 'to-string in an area/text, and then write it without REBOL adding extraneous line terminators?

 [2/10] from: brett:codeconscious at: 12-Feb-2002 11:24


> I am reading file contents with 'read/binary, because the file may be text
> or an image.
> If the file is text then I display those contents in an 'area with
> 'to-string contents.
> I can edit those contents, and then would like to save the changed
> contents (my-area/text).
> Here is the problem: the line terminators in the saved text seem to
> multiply.
> i.e. with each new save of the text an additional line terminator is added
> to the end of each line.

I couldn't duplicate your problem. What operating system are you using, and what is the normal line termination for your text files?

> Keep in mind that for reasons not mentioned here I would always like to
> 'read/binary to get the file contents, whether text file or image file.
> (I know this problem disappears if I just 'read text files, but it makes
> the script very much more complicated in several places).

READ without the binary refinement is doing real work for you. If you choose to bypass this work using the /binary refinement you'll just have to do it yourself elsewhere.

The Core user guide says:

"When a file is read as text, all line terminators are converted to newline (line feed) characters. Line feeds (used as line terminators on Amiga, Linux, and UNIX systems), carriage returns (used as line terminators on Macintosh), and the CR/LF combination (PC and Internet) are all converted to the equivalent newline characters."

"Using a standard line terminator within scripts allows them to operate in a machine-independent fashion."

So REBOL defines newline as the line terminator. Let's say your OS uses CR/LF as the line terminator. What then does AREA get to work on? The code in REBOL is expecting just a LF, so how will it treat CR? I doubt that it will treat it as part of the line termination; it will be treated as just another ordinary text character. In that case you will end up editing some CRs out when using AREA.

So now your text is inconsistent when you go to save, via /binary or not. If you use /binary when you write, you will have broken line termination (missing CRs); if you don't use /binary, you will have extra CRs (the ones you didn't edit out). Worse, your application will not work across platforms even if you get it working on one.

My understanding is that you are doing this to avoid the work of treating text and binary files differently when calling READ. I had the same problem when uploading my site to the web using FTP. My solution was to define a function that decided what was appropriate, /binary or not, depending ultimately on the extension of the file.

I actually defined my own scheme for mapping file extensions to mime types:

    txt  --> text/plain
    htm  --> text/html
    html --> text/html
    jpg  --> image/jpeg
    jpeg --> image/jpeg
    etc.

It would be nice if this mapping facility came with Rebol.

Brett.
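A minimal sketch of the kind of function Brett describes, assuming the mapping is kept in a block keyed by file suffix. The names mime-map, mime-of and read-by-type are hypothetical and are not taken from Brett's own code; only the extension-to-mime pairs come from his list.

```rebol
REBOL []

; Hypothetical mapping from file suffix to mime type (pairs from the post above).
mime-map: [
    %.txt  "text/plain"
    %.htm  "text/html"
    %.html "text/html"
    %.jpg  "image/jpeg"
    %.jpeg "image/jpeg"
]

; Hypothetical helper: return the mime type for a file, or none if unknown.
mime-of: func [file [file!] /local ext] [
    ext: find/last file "."          ; position of the last ".", e.g. %.txt
    either ext [select mime-map ext] [none]
]

; Hypothetical helper: read as text only for "text/..." types,
; otherwise fall back to READ/BINARY so the bytes are left untouched.
read-by-type: func [file [file!] /local mime] [
    mime: mime-of file
    either all [mime find/match mime "text/"] [
        read file           ; line terminators converted to newline
    ] [
        read/binary file    ; bytes returned unchanged
    ]
]
```

With this arrangement the decision is made in one place, so the rest of a script can call read-by-type without caring which kind of file it is. Unknown suffixes are treated as binary here, which is an assumption rather than something stated in the thread.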

 [3/10] from: joel:neely:fedex at: 11-Feb-2002 20:53


Hi, Alan,

alan parman wrote:

> Here is the problem: the line terminators in the saved text seem
> to multiply. i.e. with each new save of the text an additional
> line terminator is added to the end of each line.
>
> Is there a way to avoid this?

Have you tried WRITE/BINARY to put the text back to a file?

-jn-

--
; sub REBOL {}; sub head ($) {@_[0]} REBOL []
# despam: func [e] [replace replace/all e ":" "." "#" "@"]
; sub despam {my ($e) = @_; $e =~ tr/:#/.@/; return "\n$e"}
print head reverse despam "moc:xedef#yleen:leoj" ;

 [4/10] from: reboler:programmer at: 12-Feb-2002 12:39


Thanks, Brett and Joel, for your responses.

Here is a script to play around with so you can see what I mean. You may need to look at this with a text viewer (such as NotePad) to see the changes. Watch the file size change as you read/write.

;***** cut and paste the following into the console
write %test.txt {
test text
line 2,
line 3
line 4
write and read this file several times
while adding some text to the middle and end of lines
and then open in a text editor like NotePad
line 8
.
line 10
}

rebol []
file: %test.txt
main: layout [
    across
    my-area: area
    return
    text "Change what the Read button does, here"
    return
    read-field: field 400 "append clear my-area/text to-string contents: read/binary file"
    button "Read File" [
        do read-field/text
        show my-area
    ]
    return
    text "Change what the Write button does, here"
    return
    write-field: field 400 "write/string file my-area/text" ;*****
    button "Write File" [
        do write-field/text
        clear my-area/text
        show my-area
        sz: length? read file
        append clear st/text rejoin ["size of file " length? read file]
        show st
    ]
    return
    st: text rejoin ["size of file " length? read file]
    return
    button "Convert Lines" [write file read file]
    text {This button does "write file read file"}
]
view main

 [5/10] from: joel:neely:fedex at: 12-Feb-2002 13:02


Hi, Alan,

alan parman wrote:

> Thanks, Brett and Joel, for your responses.
>
> Here is a script to play around with so you can see what I mean.
>

As I asked in my earlier note, have you tried WRITE/BINARY instead of WRITE/STRING? After making that substitution in your demo, I could repeatedly write and read the file without change.

The simple rule (AFAICT) is to "match" the modes of your reading and writing (binary vs non-binary).

Hope this helps!

-jn-

 [6/10] from: reboler:programmer at: 12-Feb-2002 14:55


Thanks, Joel.

Yes, I have tried write/binary. Try _changing_ some of the text, then write/binary, then look at the text in a NON-REBOL text editor (try NotePad). You will see that the lines you changed end differently than the unchanged lines (at least on a Windoze box).

I have found that if I write/binary and then convert-lines it is ok.

I am wondering if the read portion could be changed. Perhaps 'to-string is not the best way to go?

 [7/10] from: joel:neely:fedex at: 12-Feb-2002 15:36


Hi, Alan,

alan parman wrote:

> Thanks, Joel.
> Yes I have tried write/binary.
> Try _changing_ some of the text then write/binary then look at
> the text in a NON-REBOL text editor (try NotePad)
>

I did.

> You will see that the lines you changed end differently than the
> unchanged lines (at least on a Windoze box).
>

They didn't (on w2000).

> I have found that if I write/binary and then convert-lines it is ok.
>
> I am wondering if the read portion could be changed. Perhaps
> 'to-string is not the best way to go?
>

OK. I'm officially puzzled.

-jn-

 [8/10] from: brett:codeconscious at: 13-Feb-2002 17:14


Hi Alan, Joel,

Alan, your demo script proves, to me at least, what I said before.

> I have found that if I write/binary and then convert-lines it is ok.

Actually, no, it is not. If you read/binary, add some text after the new space, save it with write/binary, convert-lines, then read it in again, you'll see that your new text has dropped to the next line.

To demonstrate what is happening, add these two functions to your demo script:

tag-crlf: func [data [string! binary!]] [
    replace/all replace/all data CR "<CR>" NEWLINE "<LF>"
]
detag-crlf: func [data [string! binary!]] [
    replace/all replace/all data "<CR>" CR "<LF>" NEWLINE
]

Then add these two buttons:

button "tag text" [tag-crlf my-area/text show my-area]
button "detag text" [detag-crlf my-area/text show my-area]

Play with these buttons and see how you are dealing with Carriage Return (CR) and Line Feed (LF) characters under the various write, read and edit combinations.

The point to understand is that Linefeed / LF / NEWLINE is the delimiter between lines in REBOL. Text files in Windows and MS-DOS are delimited by two characters, CR+LF.

The following statements apply to the Windows platform (I haven't used Win2000, XP or CE):

1) When Notepad sees CRLF in some text it breaks the text at that point and puts the following text on the next line.
2) When a REBOL AREA sees a LF it does the same.
3) AREA ignores CR.

Try this code in your console. Note that 0D is the hex representation for CR and 0A for LF:

possibilities: [#{0a} #{0d} #{0a0d} #{0d0a}]
repeat p possibilities [
    print "**********************************"
    print ["Possibility: " mold p]
    print ["write/binary %test.dat"]
    write/binary %test.dat p
    print ["write %test.txt"]
    write %test.txt p
    print [
        "Read/binary %test.dat: " mold r: read/binary %test.dat
        either equal? p r [""]["<-- Changed"]
    ]
    print [
        "Read/binary %test.txt: " mold r: read/binary %test.txt
        either equal? p r [""]["<-- Changed"]
    ]
    print [
        "Read %test.dat: " mold r: to-binary read %test.dat
        either equal? p r [""]["<-- Changed"]
    ]
    print [
        "Read %test.txt: " mold r: to-binary read %test.txt
        either equal? p r [""]["<-- Changed"]
    ]
]

Regards,
Brett.

 [9/10] from: al:bri:xtra at: 13-Feb-2002 21:47


> 1) When Notepad sees CRLF in some text it breaks the text at that point
> and puts the following text on the next line.

Note that Notepad behaves oddly with just a LF as line terminators.

Andrew Martin
ICQ: 26227169
http://valley.150m.com/

 [10/10] from: reboler:programmer at: 13-Feb-2002 9:34


Thanks all (esp. Brett) for your input.

I think I understand enough now to better ask the proper question: Is there a way to process a file that has been opened with 'read/binary as if you had opened it with a plain 'read? I was hoping that 'to-string-ing the file would do it, but it does not.

------ some background ----- (only for the truly interested :) )

My script reads all types of files and displays them in their appropriate format (images as pictures, .txt and .r as text, etc.) or allows you to view them in other formats (hex, base-64, picture-as-text, rot-13, etc.). It also allows you to edit/compress/encrypt them, and view them normally even while compressed/encrypted. You can also copy/move/delete.

And there is where the problem is: I would like to do these things with the option of NOT changing the original operating system format. In other words, at the time of the 'read I don't know what I am going to do with the file; I may or may not change it, I may or may not move it somewhere else. For example, I might want to edit and move a text file WITHOUT changing its line terminators. Or I might want to move an encrypted text file without changing anything (so it can still be decrypted).

Also, because of the editing capabilities of the script (and for speed), I am trying to avoid doing extra 'reads. If I could 'read/binary all files, and then process them at the 'write and pre-write stages, it would make the script very much simpler.
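One possible way to approach the question above is to do by hand the conversion that, per the Core user guide passage quoted earlier in the thread, a plain READ performs. The sketch below is not from any of the posters; the helper names to-newlines and to-crlf are hypothetical, and it assumes the original file used CR/LF line terminators.

```rebol
REBOL []

; Hypothetical helper: convert CR/LF and lone CR to newline (LF),
; working on a string copy so the original binary data is kept intact.
to-newlines: func [data [binary! string!] /local text] [
    text: to-string data
    replace/all text "^M^/" "^/"    ; CR/LF -> LF
    replace/all text "^M" "^/"      ; any remaining lone CR -> LF
    text
]

; Hypothetical helper: turn newlines back into CR/LF and return binary,
; ready for WRITE/BINARY. Assumes the file originally used CR/LF.
to-crlf: func [text [string!]] [
    to-binary replace/all copy text "^/" "^M^/"
]

print mold to-newlines #{610D0A620D63}    ; "a^/b^/c"
print mold to-crlf "a^/b"                 ; #{610D0A62}
```

In the demo script this would mean filling my-area/text with to-newlines applied to the result of read/binary, and saving with write/binary file to-crlf my-area/text. The bytes from read/binary stay available unchanged for the copy, move and encrypt cases, and only the edited text path goes through the conversion.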