r3wp [groups: 83 posts: 189283]
  • Home
  • Script library
  • AltME Archive
  • Mailing list
  • Articles Index
  • Site search
 

World: r3wp

[I'm new] Ask any question, and a helpful person will try to answer.

PatrickP61
19-Jul-2007
[603x2]
Getting better, but still no cigar.


Here is my test code for Mold-CSV function in CSV.r script:  ( I 
hope this formats correctly on Altme)

In-block:	[					; (want in csv file)	; (want in excel) 
["Col A1"		"Col B1"		]	; Col A1,Col B1	; Col A1	Col B1
[2			3			]	; 2,3			;      2	     3
["'3"			"'4"			]	; '3,'4			; '3		'4
["4a"			"4b"			]	; 4a,4b			; 4a		4b

["^"5a^"^"^"^""		"^"^"^"5b^"^""		]	; "5a""""","""5b"""	; 5a""		"5b" 
["6a x"			"6b y"			]	; 6a x,6b y		; 6a x		6b y
["7a, x"			"7b,^"^" y"		]	; "7a, x","7b,"" y"""	; 7a, x		7b," y"

["^"8a ^"^",x"		"^"8b ^"^"^"^"y^""	]	; "8a "",x","8b """" y"	; 8a 
",x	8b "" y

["^"^"^"9a^"^" x"		"^"9b ^"^"y^"^"^""	]	; """9a"" x","9b ""y"""	; 
"9a" x	9b "y" 
]

 Out-block:	Mold-CSV In-block
 write %Book2.csv Out-block
____________

In the above, I have 3 "views" if you will of what I am after.

The first view is the In-block that I would like Mold-CSV to run 
against.

The second commented view is what I need Mold-CSV to generate to 
put into the csv file

The third commented view is what Microsoft Excel will generate when 
I open the CSV file.


Mold-CSV works fine for the first 6 lines, then it gives me this 
for lines 7,8 and 9:

7a, x
,{7b,"" y}		<-- Where did the braces come for for 7b?
{"8a "",x},"8b """"y"	<-- Same quest for 8a?

9a
 x
,"9b ""y"""	ok

Any ideas on how to solve this?
Correction:	Where did the braces come from for 7b?
Anton
19-Jul-2007
[605x6]
Rebol molds long strings, or strings which contain a quote " inside 
{ } instead of " ".
Watch this:
>> "{^""
== {^{"}
The string I typed above contains an open brace and a double-quote 
(which I've escaped).

Rebol probably saw that the string contains an escaped quote and 
decided to mold it with { } to avoid having to escape it.
However, now the opening brace { needs to be escaped.
In short, Rebol's molding of strings can be in two different formats 
depending on the input string, and depending on where you get your 
input string from, it can be hard to guess which one it is.

I would suggest to make your own Excel/CSV-string-molding function 
to ensure you have double-quotes as expected.

Other people have come to this exact same problem before, by the 
way.
I don't understand the exact logic of the CSV file quote formatting, 
but you could use a function similar to this:
	enquote: func [string][rejoin [{"} string {"}]]
>> print enquote "hel^"lo"
hel
lo"

and you could take it further by replacing single instances of " 
in the string with two instances, etc.
Hmm.. 

http://www.rebol.org/cgi-bin/cgiwrap/rebol/view-script.r?script=csv.r

Mold-CSV just uses MOLD, so you should replace it with your own MOLD-CSV-STRING 
function, similar to the above enquote.
PatrickP61
19-Jul-2007
[611]
Anton, Is MOLD-CSV-STRING different than the MOLD-CSV in the above 
link?  I don't see it inside of that script.
Anton
19-Jul-2007
[612x5]
mold-csv-string: func [string][rejoin [{"} replace/all copy string 
{"} {""} {"}]]
With the link I was just verifying that we were talking about the 
same script in rebol.org.
MOLD-CSV uses rebol's built-in MOLD to mold the string. I am proposing 
to replace MOLD with the above one-liner MOLD-CSV-STRING function.
Just replace the two instances of MOLD with MOLD-CSV-STRING.
Wait a minute ! MOLD handles any value, while MOLD-CSV-STRING only 
handles series at the moment... hang on.
PatrickP61
19-Jul-2007
[617]
Anton,  What do you think of this approach -- I'm just thinking it 
through and am not sure if I have covered all the bases.

Since my input can contain any number of symbols, commas, single 
and double quotes, and rarely, but possibly braces, what if I attack 
the problem a different way.  Whenever an embedded comma, or double 
quote or something like that occurs within an spreadsheet cell, it 
will require some kind of "extra" formatting like two quotes or the 
like.  It may even be that there are unique combinations of such 
symbols, rare as that would be, to have complex formatting of an 
input block for rebol to convert it properly for CSV.


What if I shift gears and look at a TAB delimited file instead.  
I know that I will never have TAB embedded in my cells, and that 
I deal with the entire block as a series instead.  I could embed 
TAB wherever needed to separate the columns and leave the remaining 
string the way it is.  Would that work, or would I still need to 
do some formatting to handle it.  I think I'll open an excel spreadsheet, 
and work in reverse to see what it needs for TAB delimited file. 
 Any comments?
Anton
19-Jul-2007
[618x2]
That may alleviate the string quoting problem. Worth a try.
I think I got it.

mold-csv-string: func [value][append insert replace/all either any-string? 
:value [copy :value][mold :value] {"} {""} {"} {"}]
PatrickP61
20-Jul-2007
[620x4]
Anton,  I tried the mold-csv-string and got the following:
[
    [
Col A1
 
Col B1
] 
    [2 3] 
    [
'3
 
'4
] 
    [
4a
 
4b
] 
    [{
5a



} {



5b
}] 
    [{

6a x

} 
6b y
] 
    [
7a, x
 {7b,

 y

}] 
    [{
8a 

,x} {
8b 



y
}] 
    [{


9a

 x
} {
9b 

y


}]
]

Which is not usable for excel  -- Maybe I mis-understood how to use 
it...
I also checked out tab delimited and the rules for that is much much 
simpler.

If a quote or comma is embedded in the string and does not start 
with a quote, then I can leave the value as it is.

The only thing I need to handle is if a value starts with a quote, 
then I need to add additional " around it, which I should be able 
to do!
For those of you monitoring and want to see what I mean:

Out-string:	[		; (want in tab file)		; (want in excel)
	{Col A1^-Col B1^/}	; Col A1	Col B1		; Col A1	Col B1 
	{2^(tab)3^(line)}	; 2		3		;      2		     3
	{'3^-'4^/}		; '3		'4		; '3		'4
	{4a^-4b^/}		; 4a		4b		; 4a		4b
	{5a""^-"""""5b"^/}	; 5a""		"""""5b"		; 5a""		""5b
	{"""6a x"""^-6b y^/}	; """6a x"""	6b y		; "6a x"		6b y
	{7a, x^-7b," y"^/}	; 7a, x		7b," y"		; 7a, x		7b," y"
	{8a ",x^-8b "" y^/}	; 8a ",x		8b "" y		; 8a ",x		8b "" y
	{"""9a"" x"^-9b "y"^/}	; """9a"" x"	9b "y"		; "9a" x		9b "y"	
]

 write %Book2.txt Out-string 


Aside from just need to insert a ^(tab) or ^- at the appropriate 
places to separate a cell from each other, and a ^(line) or ^/ in 
a string, I will also need to check the first character of each "cell". 
 If it starts with a ", then I need to add another set around it. 
 See 5b and 9a above to see what I mean.
-- (I wish AltMe would show the same spacings / tabs that I type 
into the New Message box as it would appear on the postings )
Henrik
20-Jul-2007
[624x2]
I'm not really sure what you should use there, because it seems to 
me that rebol always reduces ^" to ", unless you enter them by hand.
also I don't know if I understand your problem correctly. it's a 
bit hard to follow. :-)
Gregg
20-Jul-2007
[626]
Just skimmed here, so no new advice to add, except to say that this 
is one of those cases where it's impossible for REBOL to get it "right", 
because everybody has different rules about how it should work. I 
agree that the Excel model is a good one, and I would like to see 
it support that. REBOL also treats things differently based on whether 
there are spaces adjacent to the quotes or not, making it even more 
fun.

	http://www.rebol.net/cookbook/recipes/0018.html
PatrickP61
20-Jul-2007
[627x2]
My end goal is to be able to take some formatted text of some kind, 
something that is generated by a utility of some kind, and generate 
a spreadsheet from it.  The formatted text can be of any type including 
" and the like.


I'm working in reverse, by creating a spreadsheet in MS excel with 
various kinds of data that I've shown above.  Some data with just 
alpha, just numbers, combinatins, leading quotes, trailing quotes, 
embedded quotes, embedded commas, spaces etc.  Then I saved the spreadsheet 
as CSV and another version as Tab delimited.


Then by looking at those files via notepad or other editor, I can 
see how the data must be in order for MS excel to accept it.  I initially 
had problems with the CSV model because embedded qutoes needs other 
qutoes added to that "cell" if you will.  The Tab delimited model 
has less restrictions on it.  The only thing that needs attention 
is when a "cell" starts with a quote, which needs additional quotes 
added to it.  Embedded qutoes or trailing qutoes don't need any modification.


Long story short -- I'm going with Tab delimited model and figuring 
out a rebol script to take data from an IBM utility dump (with rules 
on what data to capture), and model that info into an excel spreadsheet 
via Tab delimited file.
Hi Gregg -- The cookbook recipe is a good one for reading and processing 
CSV's as input.  My main issue is NOT the CSV part itself.  It is 
pretty simple really.  But as usual MS has some additional formatting 
rules whenever certain characters are embedded, and that is the part 
I'm having trouble with in order for a CSV file to be loaded as a 
spreadsheet. 


You don't happen to have one that lets you write CSV files as output 
for excel (with all the special rules etc)???   :-)
Gregg
21-Jul-2007
[629x5]
I don't , but let's see how close this gets us. First, here is a 
support func for building delimited series. In addition, you'll need 
my COLLECT func from REBOL.org, or something similar.
delimit: func [
        "Insert a delimiter between series values."
        series [series!] "Series to delimit. Will be modified."
        value            "The delimiter to insert between items."

        /skip   ;<-- be sure to use system/words/skip in this func

            size [integer!] "The number of items between delimiters. Default 
            is 1."
    ][
        ; By default, delimiters go between each item.
        ; MAX catches zero and negative sizes.
        size: max 1 any [size 1]

        ; If we aren't going to insert any delimiters, just return the series.

        ; This check means FORSKIP should always give us a series result,
        ; rather than NONE, so we can safely inline HEAD with it.
        if size + 1 > length? series [return series]
        ; We don't want a delimiter at the beginning.
        series: system/words/skip series size

        ; Use size+n because we're inserting a delimiter on each pass,

        ; and need to skip over that as well. If we're inserting a

        ; series into a string, we have to skip the length of that

        ; series. i.e. the delimiter value is more than a single item
        ; we need to skip.
        size: size + any [

            all [list? series  0] ; lists behave differently; no need to skip 
            dlm.

            all [any-string? series  series? value  length? value]
            all [any-string? series  length? form value]
            1
        ]
        head forskip series size [insert/only series value]
    ]
fmt-for-Excel: func [blk dlm] [
	rejoin delimit collect fld [
		foreach val blk [
			val: form val
			replace/all val {"} {""}
			if find val #"," [val: rejoin [{"} val {"}]]
			fld: val
		]
	] dlm
]

fmt-for-Excel [{"A"} "B" "C,C" {"4,4"}] #","
If this output looks correct, then you just need to test it some 
more, with other possible scenarios.

>> fmt-for-Excel [{"A"} "B" "C,C" {"4,4"}] #","
== {""A"",B,"C,C","""4,4"""}
The case I'm not sure about is the last one there, where you end 
up with triple quotes, because they're doubled first, then an outer 
pair is added because of the embedded delimiter.
PatrickP61
23-Jul-2007
[634]
Hey Gregg,  Thanks for the code.  I tried it out and while there 
is a few hicups, I am planning on using that code when I create an 
Excel CSV version in addition to the Tab delimited version.   Thank 
you!
Gregg
23-Jul-2007
[635]
Let me know what the hiccups are, if you can, so we can get a solid 
version out there for people to use. Thanks.
RobertS
1-Aug-2007
[636x7]
can you tell me why to-file does not care if a word holding the value 
of a file name is presented as
to-file :filename
or  to-file filename
source to-file is very simple  to-file: func [value] [ to file! :value 
]  ; which seems to apply : to a get-word! in the first case
Put another way, what is the rule for when a word must explicitly 
bear the sigil prefix of  :  ? I.e., when is a get-word! required 
and when does any word suffice?  ::word is an error but   to file! 
:myString  in a func is no different from  to file! myString   and 
how do you pur carriage-returns into this message-post box!  ;-)
ed: func [/file filename [string! file!] " afile name" /local fn 
] 
	[either file 
		[either exists? fn: to file! :filename
			[editor fn]
			[fn: ask "file name:  " editor to file! :fn] ]
		[editor {}]
	]
comment { this works the same

ed: func [/file filename [string! file!] " afile name" /local fn 
] 
	[either file 
		[either exists? fn: to file! filename
			[editor fn]
			[fn: ask "file name:  " editor to file! fn] ]
		[editor {}]
	]}
btiffin
1-Aug-2007
[643x3]
Robert.  get-words are "unevaluated", so no code will execute getting 
to the value.  Most datatypes will return the same value for get 
as for evaluate.  But getting functions will return the function, 
not the result of evaluating the function.  Umm, that's probably 
not a Ladislav level answer, but it's how I think about it.
I'm not completely clued in, but I think get-words can be faster 
as well, as the lexical scanner can skip the evaluation,  In your 
case; evaluating a filename, returns a filename, (and I only assume) 
is an extra (nearly empty?)step than just getting the filename.
For instance.  a: now  gives you a date time field that won't change 
whenever you reference a or get a and it's type is date!   a: :now 
gives you an a that will be the current time whenever it is evaluated., 
but if you get-word a with :a or get 'a you get back the native, 
not the datetime, so a's type reports as native!  It's funky and 
fun.
Gregg
1-Aug-2007
[646x3]
Carriage returns - click the pencil icon to change to ctrl+s as the 
send key.
>> logname: does [rejoin [now/date ".log"]]
>> to-file logname
== %1-Aug-2007.log
>> to-file :logname
== %?function?
Brian's explanation is good; it's something to play around with in 
the console, to get a feel for things.
RobertS
1-Aug-2007
[649]
what seems a little spooky is the way the behavior Gregg illustrates 
disappears when I define to-file as
to-file: func [value] [to file! value] ; cool - or spooky
btiffin
1-Aug-2007
[650x2]
REBOL is both and more.  :)   But in that last example, although 
you may have passed the "unevaluated" logname, the function by using 
 the  value  reference, evaluates it.  :)
Try func [:val]  and func ['val]  for even more fun
RobertS
1-Aug-2007
[652]
wilco ;-)