Mailing List Archive: 49091 messages

[REBOL] Bug report (was: parse or) Re:

From: joel::neely::fedex::com at: 20-Sep-2000 16:40

Hello, all...

[rryost--home--com] wrote:
> Hi Ryan: Here's a one liner that may help:
>
> >> st: "abcdef"
> == "abcdef"
> >> parse/all st "ed"
> == ["abc" "" "f"] ; An inclusive OR, I guess.
>
If by "inclusive OR" you mean that any of the characters in the delimiter string will terminate a field, then I agree. Consider this example:
>> stuff: {absdri.sdfoiwg,jfhwi,asdjfow,.wihl}
== "absdri.sdfoiwg,jfhwi,asdjfow,.wihl"
>> parse/all stuff ",."
== ["absdri" "sdfoiwg" "jfhwi" "asdjfow" "" "wihl"]

The second (string!) argument supplies a list of delimiter characters, any of which will serve as a boundary between elements in the output block. Thus, either comma or period will cause a break. Notice, BTW, that the comma-period sequence in stuff creates a zero-length item in the output block.
> >> parse/all st "gh"
> == ["abcdef"] ; No splitting as neither "g" nor "h" is present.
> >> parse/all st "gb"
> == ["a" "cdef"] ; Split at the single char that matched.
>
HOWEVER, USE THIS PARSING OPTION WITH CAUTION: I believe there is a subtle bug in parse, as illustrated by:
>> parse/all {0:1:2:3} ":"    == ["0" "1" "2" "3"]
>> parse/all {:1:2:3} ":"     == ["" "1" "2" "3"]
>> parse/all {0::2:3} ":"     == ["0" "" "2" "3"]
>> parse/all {0:1::3} ":"     == ["0" "1" "" "3"]
>> parse/all {0:1:2:} ":"     == ["0" "1" "2"]
(Input and output have been reformatted into parallel columns for ease of reading.)

Notice that an empty (zero-length) field can appear anywhere in the input string EXCEPT at the end. I believe this to be a bug (or at least SOME sort of invertebrate!) because:

1) It's inconsistent: In all other cases, the last field is the content between the last delimiter and the end of the string. Other fields (between the beginning of the string and the first delimiter, or between consecutive delimiters) are allowed to have zero length. Why not the last?

2) It's inconvenient: A common use for the above type of parsing is to process "delimited ASCII" files, where each line represents a record, with the segments (before the first delimiter, between consecutive delimiters, and after the last delimiter) representing data fields. It is entirely possible (and not uncommon, in my own experience) for the last field to be an empty string. This parsing bug requires one either to write test-and-repair code for the case of an empty last field, or to write out the full parsing rules (example given below). Either way, it's extra coding work to deal with a common case. It's not hard to code, just a nuisance...
>> noncolon: complement charset {:}
== make bitset! #{ FFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF }
>> pfld: [copy _fld any noncolon (append _rec any [_fld ""])]
== [copy _fld any noncolon (append _rec any [_fld ""])]
>> prec: [(_rec: copy []) pfld any [{:} pfld]]
== [(_rec: copy []) pfld any [":" pfld]]

Then, (back to 2-column layout)
>> parse/all {0:1:2:3} prec _rec    == ["0" "1" "2" "3"]
>> parse/all {:1:2:3} prec _rec     == ["" "1" "2" "3"]
>> parse/all {0::2:3} prec _rec     == ["0" "" "2" "3"]
>> parse/all {0:1::3} prec _rec     == ["0" "1" "" "3"]
>> parse/all {0:1:2:} prec _rec     == ["0" "1" "2" ""]
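
The other option mentioned above, test-and-repair code, might look like the following sketch. It assumes the behavior shown earlier in this message, namely that parse/all drops only an empty field at the very end of the string. The name split-fields is a hypothetical helper, not part of REBOL, and the function has only been reasoned through against the examples above.

```rebol
split-fields: func [
    "Split str on any character in delims, keeping an empty last field"
    str [string!]
    delims [string!]
    /local result
][
    ; Let parse/all do the ordinary splitting.
    result: parse/all str delims
    ; Repair step: if the input ends with a delimiter character,
    ; parse/all has dropped the trailing empty field, so put it back.
    ; The empty? test guards against calling LAST on an empty string.
    if all [not empty? str  find delims last str] [
        append result copy ""
    ]
    result
]
```

Given the results shown above, split-fields {0:1:2:} ":" would be expected to return four items, the last being an empty string, while input that does not end in a delimiter passes through unchanged. Whether an entirely empty input string should produce one empty field is left open here.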
Hope this is useful! -jn-