[REBOL] Speed and Memory Management in REBOL Re:(2)
From: joel:neely:fedex at: 10-Sep-2000 22:49
Hi, Bo...
Not to be a complainer ;-) but the venerable new-csv.r also
shows what appears to be a bug in parse (or at least a discrepancy
between documented and actual behavior...)
Consider the "csv" file below, created by a Well-Known Spreadsheet
from a Large Redmond Company:
1,1/1/00,"""Moo!"" the cow uttered."
2,1/2/00,"He said ""That's no lady! That's my podiatrist!"""
3,1/3/00,"This ""bad"" data brought to you by MICROS~1."
4,1/4/00,"I'm now crying ""Uncle!"""
Notice that the third field of each line is enclosed in quotation
marks and that embedded quotation marks are rendered as *pairs* of
quotation marks. This can be demonstrated by looking at another
copy of the same file, but saved in ".prn" format -- plain printable
text.
1 1/1/00 "Moo!" the cow uttered.
2 1/2/00 He said "That's no lady! That's my podiatrist!"
3 1/3/00 This "bad" data brought to you by MICROS~1.
4 1/4/00 I'm now crying "Uncle!"
The problem is that
parse/all some-string simple-string
is described as breaking some-string into a block of sub-strings
delimited by occurrences of simple-string (and as ignoring any
special significance of characters in some-string -- such as
whitespace -- based on the presence of the /all refinement).
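To make the documented reading concrete, here is a rough sketch in
Python (not REBOL, and not how parse is implemented) of splitting
purely on the delimiter, with quotation marks treated as ordinary
data. The name split_literal is hypothetical, chosen only for this
illustration.

```python
def split_literal(line, delimiter=","):
    """Split on every occurrence of the delimiter and nothing else.

    Quotation marks and whitespace get no special treatment; they are
    kept in the fields exactly as they appear in the input.
    """
    return line.split(delimiter)


line = '1,1/1/00,"""Moo!"" the cow uttered."'
fields = split_literal(line)
print(len(fields))
for field in fields:
    print(field)
```

Under that reading each sample record comes out as exactly three
fields, with the quotation marks still present in the third one.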
However, notice that the output of new-csv.r is the file:
["1" "1/1/00" "" "Moo!" " the cow uttered."]
["2" "1/2/00" "He said " "That's no lady! That's my podiatrist!" ""]
["3" "1/3/00" "This " "bad" " data brought to you by MICROS~1."]
["4" "1/4/00" "I'm now crying " "Uncle!" ""]
The wrong number of fields per record indicates that
parse/all is still special-casing the quotation marks, rather than
just treating them as plain data characters. As a result of this
glitch in the behavior of parse/all, new-csv.r didn't produce the
expected result of
["1" "1/1/00" {"Moo!" the cow uttered.}]
["2" "1/2/00" {He said "That's no lady! That's my podiatrist!"}]
["3" "1/3/00" {This "bad" data brought to you by MICROS~1.}]
["4" "1/4/00" {I'm now crying "Uncle!"}]
(or its equivalent) with three fields per record, the most natural
interpretation of the original three-column spreadsheet.
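For reference, the quoting convention that yields those three-field
records can be sketched as a small scanner. This is a minimal
illustration in Python rather than REBOL, assuming a comma delimiter
and fields that never span lines; split_csv_line is a hypothetical
name and is not part of new-csv.r.

```python
def split_csv_line(line, delimiter=","):
    """Split one CSV record, honoring quoted fields.

    Inside a quoted field, a pair of quotation marks stands for one
    literal quotation mark, and delimiters are ordinary data.
    """
    fields = []
    field = []
    in_quotes = False
    i = 0
    while i < len(line):
        ch = line[i]
        if in_quotes:
            if ch == '"':
                if i + 1 < len(line) and line[i + 1] == '"':
                    field.append('"')   # doubled quote: keep one
                    i += 1              # and skip its partner
                else:
                    in_quotes = False   # closing quote of the field
            else:
                field.append(ch)
        elif ch == '"':
            in_quotes = True            # opening quote of a field
        elif ch == delimiter:
            fields.append("".join(field))
            field = []
        else:
            field.append(ch)
        i += 1
    fields.append("".join(field))       # last field has no delimiter
    return fields


for record in (
    '1,1/1/00,"""Moo!"" the cow uttered."',
    '4,1/4/00,"I\'m now crying ""Uncle!"""',
):
    print(split_csv_line(record))
```

The same two-state idea (inside quotes or outside them) should carry
over to a hand-written parse rule, though that translation is not
attempted here.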
-jn-
[bo--rebol--com] wrote: