read bug?
[1/18] from: gchiu::compkarori::co::nz at: 1-May-2001 22:51
I've been trying to ascertain why my vidwiki script croaks
with larger submissions.
To make page changes it converts the page into part of the
url which it then reads. It dies at 4109 chars.
I saved such a file, and tested as follows
test: read %longurl.r
read to-url test
and Rebol gives an error.
** Near: read to-url test
--
Graham Chiu
[2/18] from: joel:neely:fedex at: 1-May-2001 7:07
Graham Chiu wrote:
> I've been trying to ascertain why my vidwiki script croaks
> with larger submissions.
>
> To make page changes it converts the page into part of the
> url which it then reads. It dies at 4109 chars.
>
Graham,
This sounds very much like the problem I described several
months ago with POST. There was apparently a buffer-size-limit
problem with reading from standard input. A single read from
stdin would get only one buffer full, which meant that longer
submissions would be broken.
I was never able to get a solution that would reliably fetch
all of the input, and was forced to rewrite the project in
Perl as a consequence. (I was building a wiki-like site to
support a software development team. As reliability/robustness
were critical factors for the application, I had no choice.)
Compared to the elegant way that REBOL abstracts out the tedious
implementation details of file i/o, http, ftp, etc. and allows
one simply to READ from a data source, the handling of cgi looks
very much like an afterthought/kludge. I'd really hope that a
future version would offer a unified scheme for reading cgi and
environment data that didn't require so much voodoo.
-jn-
[3/18] from: gchiu:compkarori at: 2-May-2001 11:04
Joel Neely <[joel--neely--fedex--com]> wrote:
> This sounds very much like the problem I described
> several
<<quoted lines omitted: 3>>
> from
> stdin would get only one buffer full, which meant that
Hi Joel,
What was the character limit you reached?
> I was never able to get a solution that would reliably
> fetch
Chris posted some code the other day ... I must relook at
that.
> one simply to READ from a data source, the handling of
> cgi looks
> very much like an afterthought/kludge. I'd really hope
> that a
I recall when Rebol first started out, many of the RT team
of that time didn't even know what cgi was :-)
--
Graham Chiu
[4/18] from: joel:neely:fedex at: 1-May-2001 19:00
Hi, Graham,
Graham Chiu wrote:
> Joel Neely <[joel--neely--fedex--com]> wrote:
> > ... There was apparently a buffer-size-limit
> > problem with reading from standard input. A
> > single read from stdin would get only one buffer...
>
> Hi Joel,
>
> What was the character limit you reached?
>
I don't remember the exact number but it was close enough
to a "round number" of 2k or 4k that it rang my bell when
I finally figured out where the breakage was happening.
> Chris posted some code the other day ... I must relook at
> that.
>
Yes, by all means! If you get it to work consistently,
under stress testing, please let me/us know. I'd love to
be able to use it, but just don't have time right now to
do all the research and testing...
> I recall when Rebol first started out, many of the RT team
> of that time didn't even know what cgi was :-)
>
Clearly WAAAAAYYYYY before my time ;-) I've never had cause
to fret over the knowledge or skills at RT, just the fact
that they didn't get issued 48 hours in each day to
implement and document all of their great ideas!
-jn-
[5/18] from: carl:rebol at: 2-May-2001 0:49
I know of no such buffer limit. Generally speaking, REBOL
uses expandable series for just about everything. If there
is a buffer limit, then it's quite by accident somewhere,
and we should track it down.
Also, have you tried using read/custom with POST data
as described in the manual? It allows you to keep the
POST data out of the URL and put it into the body of
the http packet where it belongs. Much better for large
amounts of data, for example if you wanted to post an
image or file or large REBOL block.
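A minimal sketch of the read/custom form mentioned above, with the form data carried in the request body instead of the URL. The host, script path and field names are hypothetical, and the values are assumed to be URL-encoded already.

```rebol
REBOL []

; Hypothetical endpoint, for illustration only.
target: http://www.example.com/cgi-bin/wiki.r

; Form-encoded body; each value is assumed to be URL-encoded already.
body: "wiki=test2.r&action=submit&wiki-text=Hello%20world"

; The string following the word POST in the /custom block is sent as
; the body of the request, so the URL itself stays short.
result: read/custom target reduce ['post body]

print ["server replied with" length? result "chars"]
```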
The reason that CGI uses the REBOL CGI object with all of its
fields is to make it similar to standard CGI interface
found on most systems.
CGI all around is generally a kludge from the start. This
is not unique to REBOL. It was added as an afterthought to
http.
-Carl
[6/18] from: joel:neely:fedex at: 2-May-2001 8:13
Carl Sassenrath wrote:
> I know of no such buffer limit. Generally speaking, REBOL
> uses expandable series for just about everything. If there
> is a buffer limit, then it's quite by accident somewhere,
> and we should track it down.
>
A few months back I posted to the list a collection of tests
I had performed with various ways to try to obtain the user
data posted by a form. I encountered the problem trying to
use Andrew Grossman's wiki.r script as a quick-and-dirty
project notebook. The relevant portion of that script is
8<------------------------------------------------------------
if system/options/cgi/request-method = "GET" [
    do decode-cgi system/options/cgi/query-string
]
if system/options/cgi/request-method = "POST" [
    post: make string! input
    do decode-cgi post
]
8<------------------------------------------------------------
The call to INPUT only gets a portion of the input if the user
has typed more than a certain amount of data into the text area
of the original form. This contrasts with the behavior of
READ which gives you all of the data from its source, regardless
of length. The tests I posted (late November of last year)
documented a variety of attempted workarounds. I'll be glad to
try to locate that original post and resubmit, but can't do it
at the moment.
> Also, have you tried using read/custom with POST data
> as described in the manual? It allows you to keep the
> POST data out of the URL and put it into the body of
> the http packet where it belongs. Much better for large
> amounts of data, for example if you wanted to post an
> image or file or large REBOL block.
>
I'll be happy to stand corrected if I'm wrong, but I looked
at the manual again just to be sure... It appears to me that
READ/CUSTOM has to do with *sending* the POST data from client
side to the server. What I thought we were dealing with in
this thread was the problem of *receiving* long POST data on
the server side. At least that's what I was referring to in
my comments...
> The reason that CGI uses the REBOL CGI object with all of its
> fields is to make it similar to standard CGI interface
> found on most systems.
>
No offense, but... ;-)
The details of writing an FTP client are a bit complicated on
most systems, but REBOL abstracts away from all that complexity
and lets me say
read ftp://my.server.com/foo/gorp.data
without all the complexity. I had thought that similarly
abstracting away from implementation details to give a tidier
way to write CGI scripts would be The REBOL Way.
The REBOL CGI interface is far more cumbersome to use than
(for example) the Perl CGI interface. In that interface:
1) The param() function with no arguments gives back the
names of all form parameters, and param('name') gives the
value for one of them (much like a block! of key/value
pairs or an object!).
2) All other relevant information (headers, etc.) is in the
environment hash, keyed by the header field name.
This is not optimal, but is easy to keep straight because
all user data are available via param() and everything else
(system and protocol data) are available in %ENV .
OTOH, I haven't yet figured out a simple scheme for keeping
up with which data are top-level in SYSTEM/OPTIONS/CGI and
which have to be looked up in SYSTEM/OPTIONS/CGI/OTHER-HEADERS.
Of course, this is relatively minor compared to the differences
in handling GET vs. POST, which the Perl CGI interface renders
completely invisible.
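A rough sketch of what such a wrapper could look like in REBOL. The name cgi-params is hypothetical (it is not part of REBOL), and the read loop assumes, as the read-io examples elsewhere in this thread do, that read-io appends to the buffer it is given. Building an object from the decoded block, rather than DOing it, also keeps form fields from being set as global words.

```rebol
REBOL []

cgi-params: func [
    "Return the submitted form fields as an object, for GET or POST"
    /local cgi len data
] [
    cgi: system/options/cgi
    either cgi/request-method = "POST" [
        ; Content-Length arrives as a string; treat a missing value as 0.
        len: to-integer any [cgi/content-length "0"]
        data: make string! len + 20
        ; read-io may return fewer bytes than requested, so keep reading
        ; until the whole body has arrived or the input is exhausted.
        while [len > length? data] [
            if 0 >= read-io system/ports/input data len - length? data [break]
        ]
    ] [
        ; GET: the data is already in the query string (possibly none).
        data: any [cgi/query-string copy ""]
    ]
    make object! decode-cgi data
]
```

A script would call cgi-params once and then look fields up in the returned object, whichever method the form used.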
Again, please let me stress that all of these comments are in
the context of my overall high regard and appreciation for
REBOL, yourself, and your entire team. I wouldn't be whining
about the fabric on the seat covers if the engine and gearbox
weren't working so wonderfully well! ;-)
-jn-
[7/18] from: gchiu:compkarori at: 3-May-2001 8:14
On Wed, 02 May 2001 08:13:37 -0500
Joel Neely <[joel--neely--fedex--com]> wrote:
> > is a buffer limit, then it's quite by accident
> somewhere,
> > and we should track it down.
I've sent feedback so that it gets entered into the bug
investigation system
> I'll be happy to stand corrected if I'm wrong, but I
> looked
<<quoted lines omitted: 6>>
> this thread was the problem of *receiving* long POST data
> on
Actually, Carl is correct. My problem is that reading a
very long url ( cgi by url method ) kills 'read. I haven't
been able to post a large enough amount of data to encounter
the bug that you were experiencing. I was looking forward
to it though :-)
--
Graham Chiu
[8/18] from: holger:rebol at: 2-May-2001 14:14
On Thu, May 03, 2001 at 08:14:22AM +1200, Graham Chiu wrote:
> Actually, Carl is correct. My problem is that reading a
> very long url ( cgi by url method ) kills 'read. I haven't
> been able to post a large enough amount of data to encounter
> the bug that you were experiencing. I was looking forward
> to it though :-)
Are you sure this is a REBOL-specific bug, i.e. does it work
correctly with Netscape when used with GET (not POST)?
My experience is that using very long URLs causes problems
with some web servers and web browsers (e.g. Netscape 4.x,
IIRC). We recently switched one of our internal CGI-based
systems over from GET to POST because with GET the combination
of Netscape and Apache caused problems with long URLs. As a
rule, FORMs should only use GET for small numbers of fields and
limited field content. Otherwise use POST.
About the other problem, i.e. getting data from a POST request
within a REBOL CGI script: keep in mind that read-io is a
very-low-level read request that returns as soon as the OS
returns something. The amount of data returned is not necessarily
what was requested. It can be less. This is not a bug, it is by
design. If you see a limit of around 4096 bytes then this is caused
by how the OS clusters its data. What you need to do in a CGI
script is loop until read-io returns 0, e.g.
cgi-str: make string! 100000
while [0 < read-io system/ports/input cgi-str 100000] []
--
Holger Kruse
[holger--rebol--com]
[9/18] from: sterling:rebol at: 2-May-2001 14:36
It seems that this thread has wandered from POST to GET problems.
I'll address the issue with GET as best I can.
People see that a long URL used with read:
read http://foo.com/cgi-bin/cgi.r?name=value-of-really-long-data
gets truncated or otherwise fails to read. I ran some tests on my
machine here (Linux w/ Apache) using Core 2.5 on both the client and
CGI end. I made a really big URL of over 7K which looks like the
example above. It made it through just fine.
However, I have seen a truncation of sorts happen in some situations
here though most have been using a browser going to a REBOL CGI
script. Have you tried the same read using a browser?
One possibility is that the receiving system truncates the GET data.
Remember that when using a GET method, all CGI data is passed to the
program on the server using environment variables. If any of the
client browser, webserver, or system env. vars limits the length then
the CGI script on the other end will not get all the data.
So take REBOL out of the picture and see what happens:
#!/usr/bin/perl
print "Content-type: text/html\n\n";
while (($key, $val) = each %ENV) {
print "$key = $val<BR>\n";
}
There's a little perl (yuck) cgi that'll print the env. vars for you.
Try your request with REBOL or a browser against that and see what you
get. Also, with REBOL, turn 'trace/net on' and do the read. What you
see printed to the console is EXACTLY what is being sent into the TCP
port to the server.
Let me know what you find out.
Sterling
[10/18] from: gchiu:compkarori at: 3-May-2001 20:42
On Wed, 2 May 2001 14:36:27 -0700
[sterling--rebol--com] wrote:
> Let me know what you find out.
>
> Sterling
Hi,
Here's a script to illustrate the problem. I'm not sure
that my understanding of what 'read should be doing is
correct here.
The script dies when it attempts to do a read at 4109 chars
at pass 10. I get an error message:
** User Error: Error. ..... the url is printed out .. and
then could not be retrieved. Server response: HTTP/1.0 400
Bad Request
** Near: res: read join cgi urltest
Since I thought 'read just returns what the web server
is sending, why should 'read die if the remote cgi script
dies? Should it not just return the web server error
message?
rebol []
url-encode: func [
    {URL-encode a string}
    data "String to encode"
    /local new-data normal-char c
] compose [
    new-data: make string! ""
    normal-char: (charset [
        #"A" - #"Z" #"a" - #"z"
        #"@" #"." #"*" #"-" #"_"
        #"0" - #"9"
    ])
    if not string? data [return new-data]
    parse data [some [
        copy c normal-char
        (append new-data c) |
        copy c skip
        (append new-data reduce ["%" skip tail (to-hex 0 + first c) -2])
    ]]
    new-data
]
test: read http://www.compkarori.com/rwiki/test.r
cgi: http://www.compkarori.com/cgi-local/vidwiki2.r?wiki=test2.r&action=submit&wiki-text
lencgi: length? cgi
print "length of test string should be 2529 chars"
print join "length of test is " length? test
counter: 0
test: join test copy/part test 1712
forever [
counter: counter + 1
print join "at pass " counter
test: join test "A"
urltest: url-encode test
res: read join cgi urltest
print join "past chars = " ( lencgi + length? urltest )
]
--
Graham Chiu
[11/18] from: gchiu:compkarori at: 3-May-2001 20:48
On Wed, 2 May 2001 14:14:46 -0700
Holger Kruse <[holger--rebol--com]> wrote:
> a
> rule, FORMs should only use GET for small numbers of
> fields and
> limited field content. Otherwise use POST.
Isn't there a problem with POST in that ISP's caching
servers can then hit the same form and keep reposting the
data? That's what happened to one of my forms - and my
form which was set up to mail to a list started mailbombing
everyone in the list. I tracked the problem down with web
server logs, and asked my ISP to stop caching my pages as
well as taking other measures.
--
Graham Chiu
[12/18] from: gchiu::compkarori::co::nz at: 3-May-2001 23:14
Re: read bug? probably not!
On Wed, 2 May 2001 14:14:46 -0700
Holger Kruse <[holger--rebol--com]> wrote:
> My experience is that using very long URLs causes
> problems
<<quoted lines omitted: 4>>
> combination
> of Netscape and Apache caused problems with long URLs. As
I've done some more testing, and here's a test script
at http://www.compkarori.com/cgi-local/testcgi.r
#!/path/to/rebol -cs
REBOL [
Title: "testcgi.r"
File: %testcgi.r
Date: 3-May-2001
Author: [ "Graham Chiu" ]
Email: [ [gchiu--compkarori--co--nz] ]
category: 'web
]
if system/options/cgi/request-method = "GET" [
prin {Content-type: text/html^/^/<HTML>^/<HEAD>^/
<TITLE>Rebol GET test</TITLE>^/<BODY>server responds to
GET</BODY>^/</HTML>^/}
quit
]
prin {Content-type: text/html^/^/<HTML>^/<HEAD>^/
<TITLE>Rebol GET test</TITLE>^/<BODY>Not a GET
request</BODY>^/</HTML>^/}
Now, as you can see, this script just prints a message. It
doesn't actually do anything with the input. However, if
you feed it the same script as I used before, it kills this
script when the url length is greater than 4108 characters
with "Server response: HTTP/1.0 400 Bad Request"
The webserver is Apache.
--
Graham Chiu
[13/18] from: sterling:rebol at: 3-May-2001 10:23
Re: read bug?
Well it worked flawlessly for me using both Core 2.5 and View/Pro
1.1. It didn't fail on either through pass 52 (4150 chars) where I
stopped it. Are you, by chance, going through any proxies? Anything
in between you and the web server has the opportunity to truncate the
URL.
You are right that read should not fail, but it isn't failing. Notice the
error you get back is not a URL Error or any other REBOL script
error. It is a 400 response from the webserver telling you "Bad
Request."
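A minimal sketch of trapping that condition in a script, so that a 400 from the server can be reported without halting. The URL is hypothetical. It assumes, as the error text quoted earlier in the thread suggests, that read raises the server's status line as a REBOL error value, which try can catch and disarm can turn into an inspectable object.

```rebol
REBOL []

; Hypothetical URL, for illustration only.
target: http://www.example.com/cgi-bin/testcgi.r?wiki-text=AAAA

either error? result: try [read target] [
    ; DISARM converts the error value into a plain object.
    err: disarm result
    print ["read failed, error id:" err/id]
    ; For errors carrying a message, the text is held in arg1.
    probe err/arg1
] [
    print ["read returned" length? result "chars"]
]
```

Running the same read with trace/net on shows the request line that was actually sent, which helps tell a mangled URL from a server-side rejection.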
Sterling
[14/18] from: brian:hawley at: 3-May-2001 14:57
Re: read bug? probably not!
Graham Chiu wrote:
>I've done some more testing, and here's a test script
>at http://www.compkarori.com/cgi-local/testcgi.r
<<quoted lines omitted: 22>>
>with "Server response: HTTP/1.0 400 Bad Request"
>The webserver is Apache.
Well, that may be an error in Apache. It should return a
414 (Request-URI Too Long). I suppose it's possible that
the excessively long URL got mangled in transit, though.
The HTTP standard warns against relying on URLs longer than
255 characters to work consistently. Something about older
routers, proxies and servers (read, most ones out there) not
being able to handle them. They suggest posting instead.
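One way to act on that advice is to pick the method from the length of the finished URL. This is only a sketch: send-form is a hypothetical helper, the 255-character threshold is simply the figure quoted above, and the query string is assumed to be URL-encoded before it is passed in.

```rebol
REBOL []

send-form: func [
    "Submit form data by GET when the URL is short, otherwise by POST"
    base [url!] "Script URL without a query string"
    query [string!] "Already URL-encoded name=value pairs"
    /local full
] [
    ; Build the URL a GET request would have to carry.
    full: join base ["?" query]
    either 255 >= length? full [
        read full
    ] [
        ; Too long for a URL: send the same data as the request body.
        read/custom base reduce ['post query]
    ]
]
```

The receiving script has to accept both methods for this to work, since the same fields may arrive either in the query string or in the request body.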
Brian Hawley
[15/18] from: gchiu:compkarori at: 4-May-2001 12:27
Re: read bug?
On Thu, 3 May 2001 10:23:21 -0700
[sterling--rebol--com] wrote:
> Well it worked flawlessly for me using both Core 2.5 and
> View/Pro
<<quoted lines omitted: 5>>
> truncate the
> URL.
And it works for me on another ISP - so it looks as though
my ISP is mangling the URL with older routers etc.
Thanks for testing it for me.
--
Graham Chiu
[16/18] from: gchiu:compkarori at: 4-May-2001 22:31
On Wed, 02 May 2001 08:13:37 -0500
Joel Neely <[joel--neely--fedex--com]> wrote:
> if system/options/cgi/request-method = "GET" [
> do decode-cgi system/options/cgi/query-string]
> if system/options/cgi/request-method = "POST" [
> post: make string! input do decode-cgi post]
Hi Joel,
I changed my code to use 'read/custom instead of just 'read
and this has stopped the url mangling that I was
experiencing. I then hit the POST character limit that you
described above.
Using Holger's/Chris' code --
if system/options/cgi/request-method = "POST" [
len_post: ( 20 + load system/options/cgi/content-length )
post: make string! len_post
while [0 < read-io system/ports/input post len_post ] []
do decode-cgi post
]
seems to have fixed it. My vidwikibeta script can now post
over 17k without loss. If you want to stress it, the script
is on my site.
--
Graham Chiu
http://www.compkarori.co.nz/index.r
[17/18] from: allenk:powerup:au at: 4-May-2001 21:19
While we are looking at cgi stuff...
View/Pro 1.1.0.3.1
plus-to-space func is set into the global context after using decode-cgi
(this func is nested inside decode-cgi and should be made local to it, or
since it is a useful func it should be made a mezzanine func so it's always
accessible)
>> ? plus-t
No information on plus-t (word has no value)
>> decode-cgi "this=3"
== [this: "3"]
>> ? plus-t
Found these words:
plus-to-space (function)
Cheers,
Allen K
[18/18] from: joel:neely:fedex at: 4-May-2001 6:33
Thanks, Graham!
Graham Chiu wrote:
...
> Using Holger's/Chris' code --
> if system/options/cgi/request-method = "POST" [
<<quoted lines omitted: 6>>
> over 17k without loss. If you want to stress it, the script
> is on my site.
This day is already overfulltorunningover ;-) but I'll try to
look at it over the weekend. One of the things I had planned
to do was merge wiki-like functionality with a REBOL-based
mini-http-server script to provide a lightweight interactive
note-taking tool for my laptop. Of course, it would also need
a means to synch the content with other boxes (home box, desk
box at work, big server at work, etc...)
-jn-