read bug?

[1/18] from: gchiu::compkarori::co::nz at: 1-May-2001 22:51

I've been trying to ascertain why my vidwiki script croaks with larger submissions.

To make page changes it converts the page into part of the url which it then reads. It dies at 4109 chars.

I saved such a file, and tested as follows

    test: read %longurl.r
    read to-url test

and Rebol gives an error.

    ** Near: read to-url test

--
Graham Chiu

[2/18] from: joel:neely:fedex at: 1-May-2001 7:07

Graham Chiu wrote:

> I've been trying to ascertain why my vidwiki script croaks
> with larger submissions.
>
> To make page changes it converts the page into part of the
> url which it then reads. It dies at 4109 chars.
>

Graham,

This sounds very much like the problem I described several months ago with POST. There was apparently a buffer-size-limit problem with reading from standard input. A single read from stdin would get only one buffer full, which meant that longer submissions would be broken.

I was never able to get a solution that would reliably fetch all of the input, and was forced to rewrite the project in Perl as a consequence. (I was building a wiki-like site to support a software development team. As reliability/robustness were critical factors for the application, I had no choice.)

Compared to the elegant way that REBOL abstracts out the tedious implementation details of file i/o, http, ftp, etc. and allows one simply to READ from a data source, the handling of cgi looks very much like an afterthought/kludge. I'd really hope that a future version would offer a unified scheme for reading cgi and environment data that didn't require so much voodoo.

-jn-

[3/18] from: gchiu:compkarori at: 2-May-2001 11:04

Joel Neely <[joel--neely--fedex--com]> wrote:

> This sounds very much like the problem I described > several

<<quoted lines omitted: 3>>

> from > stdin would get only one buffer full, which meant that

Hi Joel, What was the character limit you reached?

> I was never able to get a solution that would reliably > fetch

Chris posted some code the other day ... I must relook at that.

> one simply to READ from a data source, the handling of > cgi looks > very much like an afterthought/kludge. I'd really hope > that a

I recall when Rebol first started out, many of the RT team of that time didn't even know what cgi was :-) -- Graham Chiu

[4/18] from: joel:neely:fedex at: 1-May-2001 19:00

Hi, Graham, Graham Chiu wrote:

> Joel Neely <[joel--neely--fedex--com]> wrote:
> > ... There was apparently a buffer-size-limit
> > problem with reading from standard input. A
> > single read from stdin would get only one buffer...
>
> Hi Joel,
>
> What was the character limit you reached?
>

I don't remember the exact number but it was close enough to a "round number" of 2k or 4k that it rang my bell when I finally figured out where the breakage was happening.

> Chris posted some code the other day ... I must relook at > that. >

Yes, by all means! If you get it to work consistently, under stress testing, please let me/us know. I'd love to be able to use it, but just don't have time right now to do all the research and testing...

> I recall when Rebol first started out, many of the RT team > of that time didn't even know what cgi was :-) >

Clearly WAAAAAYYYYY before my time ;-) I've never had cause to fret over the knowledge or skills at RT, just the fact that they didn't get issued 48 hours in each day to implement and document all of their great ideas! -jn-

[5/18] from: carl:rebol at: 2-May-2001 0:49

I know of no such buffer limit. Generally speaking, REBOL uses expandable series for just about everything. If there is a buffer limit, then it's quite by accident somewhere, and we should track it down.

Also, have you tried using read/custom with POST data as described in the manual? It allows you to keep the POST data out of the URL and put it into the body of the http packet where it belongs. Much better for large amounts of data, for example if you wanted to post an image or file or large REBOL block.

The reason that CGI uses the REBOL CGI object with all of its fields is to make it similar to standard CGI interface found on most systems.

CGI all around is generally a kludge from the start. This is not unique to REBOL. It was added as an afterthought to http.

-Carl

[6/18] from: joel:neely:fedex at: 2-May-2001 8:13

Carl Sassenrath wrote:

> I know of no such buffer limit. Generally speaking, REBOL
> uses expandable series for just about everything. If there
> is a buffer limit, then it's quite by accident somewhere,
> and we should track it down.
>

A few months back I posted to the list a collection of tests I had performed with various ways to try to obtain the user data posted by a form. I encountered the problem trying to use Andrew Grossman's wiki.r script as a quick-and-dirty project notebook. The relevant portion of that script is

8<------------------------------------------------------------
if system/options/cgi/request-method = "GET" [
    do decode-cgi system/options/cgi/query-string]
if system/options/cgi/request-method = "POST" [
    post: make string! input do decode-cgi post]
8<------------------------------------------------------------

The call to INPUT only gets a portion of the input if the user has typed more than a certain amount of data into the text area of the original form. This contrasts with the behavior of READ which gives you all of the data from its source, regardless of length.

The tests I posted (late November of last year) documented a variety of attempted workarounds. I'll be glad to try to locate that original post and resubmit, but can't do it at the moment.

> Also, have you tried using read/custom with POST data
> as described in the manual? It allows you to keep the
> POST data out of the URL and put it into the body of
> the http packet where it belongs. Much better for large
> amounts of data, for example if you wanted to post an
> image or file or large REBOL block.
>

I'll be happy to stand corrected if I'm wrong, but I looked at the manual again just to be sure... It appears to me that READ/CUSTOM has to do with *sending* the POST data from client side to the server. What I thought we were dealing with in this thread was the problem of *receiving* long POST data on the server side. At least that's what I was referring to in my comments...

> The reason that CGI uses the REBOL CGI object with all of its
> fields is to make it similar to standard CGI interface
> found on most systems.
>

No offense, but... ;-)

The details of writing an FTP client are a bit complicated on most systems, but REBOL abstracts away from all that complexity and lets me say

    read ftp://my.server.com/foo/gorp.data

without all the complexity. I had thought that similarly abstracting away from implementation details to give a tidier way to write CGI scripts would be The REBOL Way.

The REBOL CGI interface is far more cumbersome to use than (for example) the Perl CGI interface. In that interface:

1) The param() function with no arguments gives back a hash containing all form parameters (e.g. a block! of key/value pairs or an object!).

2) All other relevant knowledge (headers, etc.) are in the environment hash, keyed by the header field name.

This is not optimal, but is easy to keep straight because all user data are available via param() and everything else (system and protocol data) are available in %ENV .

OTOH, I haven't yet figured out a simple scheme for keeping up with which data are top-level in

    SYSTEM/OPTIONS/CGI

and which have to be looked up in

    SYSTEM/OPTIONS/CGI/OTHER-HEADERS

Of course, this is relatively minor compared to the differences in handling GET vs. POST, which the Perl CGI interface renders completely invisible.

Again, please let me stress that all of these comments are in the context of my overall high regard and appreciation for REBOL, yourself, and your entire team. I wouldn't be whining about the fabric on the seat covers if the engine and gearbox weren't working so wonderfully well! ;-)

-jn-

[7/18] from: gchiu:compkarori at: 3-May-2001 8:14

On Wed, 02 May 2001 08:13:37 -0500 Joel Neely <[joel--neely--fedex--com]> wrote:

> > is a buffer limit, then it's quite by accident > somewhere, > > and we should track it down.

I've sent feedback so that it gets entered into the bug investigation system

> I'll be happy to stand corrected if I'm wrong, but I > looked

<<quoted lines omitted: 6>>

> this thread was the problem of *receiving* long POST data > on

Actually, Carl is correct. My problem is that reading a very long url ( cgi by url method ) kills 'read. I haven't been able to post a large enough amount of data to encounter the bug that you were experiencing. I was looking forward to it though :-) -- Graham Chiu

[8/18] from: holger:rebol at: 2-May-2001 14:14

On Thu, May 03, 2001 at 08:14:22AM +1200, Graham Chiu wrote:

> Actually, Carl is correct. My problem is that reading a
> very long url ( cgi by url method ) kills 'read. I haven't
> been able to post a large enough amount of data to encounter
> the bug that you were experiencing. I was looking forward
> to it though :-)

Are you sure this is a REBOL-specific bug, i.e. does it work correctly with Netscape, when using with GET (not POST) ?

My experience is that using very long URLs causes problems with some web servers and web browsers (e.g. Netscape 4.x, IIRC). We recently switched one of our internal CGI-based systems over from GET to POST because with GET the combination of Netscape and Apache caused problems with long URLs. As a rule, FORMs should only use GET for small numbers of fields and limited field content. Otherwise use POST.

About the other problem, i.e. getting data from a POST request within a REBOL CGI script: keep in mind that read-io is a very-low-level read request that returns as soon as the OS returns something. The amount of data returned is not necessarily what was requested. It can be less. This is not a bug, it is by design. If you see a limit of around 4096 bytes then this is caused by how the OS clusters its data. What you need to do in a CGI script is loop until read-io returns 0, e.g.

    cgi-str: make string! 100000
    while [0 < read-io system/ports/input cgi-str 100000] []

--
Holger Kruse
[holger--rebol--com]
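Holger's loop can be folded into a small helper that hides the GET/POST difference from the rest of a script. What follows is only a sketch under stated assumptions: `cgi-params` and `fields` are hypothetical names, not part of REBOL, and the 100000-byte buffer is simply the size used in the loop above.

```rebol
REBOL [Title: "cgi-params (sketch)"]

; Hypothetical helper, not a REBOL built-in: return the decoded form
; fields as a block of set-word/value pairs for either request method.
cgi-params: func [/local cgi buf] [
    cgi: system/options/cgi
    either cgi/request-method = "POST" [
        ; Buffer size copied from the loop above; a larger post
        ; would need a larger buffer.
        buf: make string! 100000
        ; read-io may return less than was asked for,
        ; so keep reading until it returns 0.
        while [0 < read-io system/ports/input buf 100000] []
        decode-cgi buf
    ][
        ; Assumes the server supplied a query string.
        decode-cgi cgi/query-string
    ]
]

; Collect the fields in an object instead of setting global words.
fields: make object! cgi-params
```

With this in place, a form field called `name` would be reachable as `fields/name`, whichever method the form used.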

[9/18] from: sterling:rebol at: 2-May-2001 14:36

It seems that this thread has wandered from POST to GET problems. I'll address the issue with GET as best I can.

People see that a long URL used with read:

    read http://foo.com/cgi-bin/cgi.r?name=value-of-really-long-data

gets truncated or otherwise fails to read.

I ran some tests on my machine here (Linux w/ Apache) using Core 2.5 on both the client and CGI end. I made a really big URL of over 7K which looks like the example above. It made it through just fine.

However, I have seen a truncation of sorts happen in some situations here though most have been using a browser going to a REBOL CGI script. Have you tried the same read using a browser?

One possibility is that the receiving system truncates the GET data. Remember that when using a GET method, all CGI data is passed to the program on the server using environment variables. If any of the client browser, webserver, or system env. vars limits the length then the CGI script on the other end will not get all the data.

So take REBOL out of the picture and see what happens:

    #!/usr/bin/perl
    print "Content-type: text/html\n\n";
    while (($key, $val) = each %ENV) {
        print "$key = $val<BR>\n";
    }

There's a little perl (yuck) cgi that'll print the env. vars for you. Try your request with REBOL or a browser against that and see what you get.

Also, with REBOL, turn 'trace/net on' and do the read. What you see printed to the console is EXACTLY what is being sent into the TCP port to the server.

Let me know what you find out.

Sterling
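The same check can also be made from the REBOL side. The script below is a minimal sketch, assuming the placeholder interpreter path and the `-cs` switch that other CGI scripts in this thread use; `qs` is just an illustrative name. It reports how many characters of the query string reached the script, which can be compared with the length of the URL that was sent.

```rebol
#!/path/to/rebol -cs
REBOL [Title: "Echo CGI request details (sketch)"]

; A CGI response needs a header followed by a blank line.
print "Content-type: text/plain^/"

cgi: system/options/cgi

; Fall back to an empty string if no query string was supplied.
qs: any [cgi/query-string ""]

print ["Request method:" cgi/request-method]
print ["Query string length:" length? qs]
print ["Query string:" qs]
```

If the reported length is shorter than what the client sent, the data was cut somewhere between the client and the script.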

[10/18] from: gchiu:compkarori at: 3-May-2001 20:42

On Wed, 2 May 2001 14:36:27 -0700 [sterling--rebol--com] wrote:

> Let me know what you find out. > > Sterling

Hi,

Here's a script to illustrate the problem. I'm not sure that my understanding of what 'read should be doing is correct here.

The script dies when it attempts to do a read at 4109 chars at pass 10.

I get an error message:

    ** User Error: Error. ..... the url is printed out .. and then could not be retrieved. Server response: HTTP/1.0 400 Bad Request
    ** Near: res: read join cgi urltest

Since 'read is I thought just returning what the web server is sending, why should 'read die if the remote cgi script dies? Should it not just return the web server error message?

    rebol []

    url-encode: func [
        {URL-encode a string}
        data "String to encode"
        /local new-data normal-char c
    ] compose [
        new-data: make string! ""
        normal-char: (charset [
            #"A" - #"Z" #"a" - #"z"
            #"@" #"." #"*" #"-" #"_"
            #"0" - #"9"
        ])
        if not string? data [return new-data]
        parse data [some [
            copy c normal-char (append new-data c) |
            copy c skip (append new-data reduce ["%" skip tail (to-hex 0 + first c) -2])
        ]]
        new-data
    ]

    test: read http://www.compkarori.com/rwiki/test.r
    cgi: http://www.compkarori.com/cgi-local/vidwiki2.r?wiki=test2.r&action=submit&wiki-text
    lencgi: length? cgi
    print "length of test string should be 2529 chars"
    print join "length of test is " length? test
    counter: 0
    test: join test copy/part test 1712
    forever [
        counter: counter + 1
        print join "at pass " counter
        test: join test "A"
        urltest: url-encode test
        res: read join cgi urltest
        print join "past chars = " ( lencgi + length? urltest )
    ]

--
Graham Chiu

[11/18] from: gchiu:compkarori at: 3-May-2001 20:48

On Wed, 2 May 2001 14:14:46 -0700 Holger Kruse <[holger--rebol--com]> wrote:

> a > rule, FORMs should only use GET for small numbers of > fields and > limited field content. Otherwise use POST.

Isn't there a problem with POST in that ISP's caching servers can then hit the same form and keep reposting the data? That's what happened to one of my forms - and my form which was set up to mail to a list started mailbombing everyone in the list.

I tracked the problem down with web server logs, and asked my ISP to stop caching my pages as well as taking other measures.

--
Graham Chiu

[12/18] from: gchiu::compkarori::co::nz at: 3-May-2001 23:14

Re: read bug? probably not!

On Wed, 2 May 2001 14:14:46 -0700 Holger Kruse <[holger--rebol--com]> wrote:

> My experience is that using very long URLs causes > problems

<<quoted lines omitted: 4>>

> combination > of Netscape and Apache caused problems with long URLs. As

I've done some more testing, and here's a test script at http://www.compkarori.com/cgi-local/testcgi.r

    #!/path/to/rebol -cs
    REBOL [
        Title: "testcgi.r"
        File: %testcgi.r
        Date: 3-May-2001
        Author: [ "Graham Chiu" ]
        Email: [ [gchiu--compkarori--co--nz] ]
        category: 'web
    ]

    if system/options/cgi/request-method = "GET" [
        prin {Content-type: text/html^/^/<HTML>^/<HEAD>^/ <TITLE>Rebol GET test</TITLE>^/<BODY>server responds to GET</BODY>^/</HTML>^/}
        quit
    ]

    prin {Content-type: text/html^/^/<HTML>^/<HEAD>^/ <TITLE>Rebol GET test</TITLE>^/<BODY>Not a GET request</BODY>^/</HTML>^/}

Now, as you can see, this script just prints a message. It doesn't actually do anything with the input.

However, if you feed it the same script as I used before, it kills this script when the url length is greater than 4108 characters with "Server response: HTTP/1.0 400 Bad Request"

The webserver is Apache.

--
Graham Chiu

[13/18] from: sterling:rebol at: 3-May-2001 10:23

Re: read bug?

Well, it worked flawlessly for me using both Core 2.5 and View/Pro 1.1. It didn't fail on either through pass 52 (4150 chars) where I stopped it.

Are you, by chance, going through any proxies? Anything in between you and the web server has the opportunity to truncate the URL.

You are right that read should not fail, and it isn't failing. Notice the error you get back is not a URL Error or any other REBOL script error. It is a 400 response from the webserver telling you "Bad Request."

Sterling

[14/18] from: brian:hawley at: 3-May-2001 14:57

Re: read bug? probably not!

Graham Chiu wrote:

>I've done some more testing, and here's a test script >at http://www.compkarori.com/cgi-local/testcgi.r

<<quoted lines omitted: 22>>

>with "Server response: HTTP/1.0 400 Bad Request" >The webserver is Apache.

Well, that may be an error in Apache. It should return a 414 (Request-URI Too Long). I suppose it's possible that the excessively long URL got mangled in transit, though.

The HTTP standard warns against relying on URLs longer than 255 characters to work consistently. Something about older routers, proxies and servers (read, most ones out there) not being able to handle them. They suggest posting instead.

Brian Hawley

[15/18] from: gchiu:compkarori at: 4-May-2001 12:27

Re: read bug?

On Thu, 3 May 2001 10:23:21 -0700 [sterling--rebol--com] wrote:

> Well it worked flawlessly for me using both Core 2.5 and > View/Pro

<<quoted lines omitted: 5>>

> truncate the > URL.

And it works for me on another ISP - so it looks as though my ISP is mangling the URL with older routers etc. Thanks for testing it for me. -- Graham Chiu

[16/18] from: gchiu:compkarori at: 4-May-2001 22:31

On Wed, 02 May 2001 08:13:37 -0500 Joel Neely <[joel--neely--fedex--com]> wrote:

> if system/options/cgi/request-method = "GET" [
>     do decode-cgi system/options/cgi/query-string]
> if system/options/cgi/request-method = "POST" [
>     post: make string! input do decode-cgi post]

Hi Joel,

I changed my code to use 'read/custom instead of just 'read and this has stopped the url mangling that I was experiencing.

I then hit the POST character limit that you described above.

Using Holger's/Chris' code --

    if system/options/cgi/request-method = "POST" [
        len_post: ( 20 + load system/options/cgi/content-length )
        post: make string! len_post
        while [0 < read-io system/ports/input post len_post ] []
        do decode-cgi post
    ]

seems to have fixed it. My vidwikibeta script can now post over 17k without loss. If you want to stress it, the script is on my site.

--
Graham Chiu
http://www.compkarori.co.nz/index.r
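The client-side change is not shown above. A minimal sketch of what a POST through read/custom might look like follows; the URL, field names and values are placeholders for illustration, not the actual vidwiki ones.

```rebol
REBOL [Title: "POST with read/custom (sketch)"]

; Hypothetical endpoint, for illustration only.
target: http://www.example.com/cgi-bin/wiki.r

; Field values must already be URL-encoded; this one is a
; hand-encoded placeholder standing in for the page text.
encoded-text: "Hello%20wiki"

; Form-style body: name=value pairs joined with "&".
body: rejoin ["wiki=test.r&action=submit&wiki-text=" encoded-text]

; The data travels in the request body, so the URL stays short.
result: read/custom target reduce ['post body]
print length? result
```

Because the page text no longer rides in the URL, its length is no longer bounded by whatever URL limit a proxy or server enforces.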

[17/18] from: allenk:powerup:au at: 4-May-2001 21:19

While we are looking at cgi stuff...

View/Pro 1.1.0.3.1

plus-to-space func is set into the global context after using decode-cgi (this func is nested inside decode-cgi and should be made local to it, or since it is a useful func it should be made a mezzanine func so its always accessible)

    >> ? plus-t
    No information on plus-t (word has no value)
    >> decode-cgi "this=3"
    == [this: "3"]
    >> ? plus-t
    Found these words:
        plus-to-space (function)

Cheers,

Allen K

[18/18] from: joel:neely:fedex at: 4-May-2001 6:33

Thanks, Graham! Graham Chiu wrote:

...

> Using Holger's/Chris' code -- > if system/options/cgi/request-method = "POST" [

<<quoted lines omitted: 6>>

> over 17k without loss. If you want to stress it, the script > is on my site.

This day is already overfulltorunningover ;-) but I'll try to look at it over the weekend. One of the things I had planned to do was merge wiki-like functionality with a REBOL-based mini-http-server script to provide a lightweight interactive note-taking tool for my laptop. Of course, it would also need a means to synch the content with other boxes (home box, desk box at work, big server at work, etc...) -jn-

Notes

Quoted lines have been omitted from some messages.
View the message alone to see the lines that have been omitted