CGI: reading POST-method data with read-io

[1/11] from: alex::pini::mclink::it at: 24-Aug-2000 0:52

>- Open Your Mind -<

(Mr. Sassenrath should receive a CC of this message as a comment on the Network Protocols chapter, Second Revision, Draft 1)

I've read the User's Guide, the old FAQs, the how-to and the recent Networking chapter, but I still lack insight into the inner workings of read-io. I can make conjectures, and they may even work, but I don't like that: in the long run, I could accumulate all sorts of mistakes.

---------- Buffer ----------

We need a buffer. If we need to read 2000 bytes, the buffer is made 2002 bytes long with

data1: make string! 2002

I guess the extra 2 bytes are needed to store ancillary information, but their content is not my concern, for now. Can we make the buffer as in

data2: make string! 2000

or even

data3: copy ""

since strings can (usually?) be extended as needed? Will read-io work with a buffer like data3? Is data1 initialized like that only for better performance, or is it *required* by the inner workings?

---------- Correct buffer-length ----------

According to the quasi-official CGI specs, once I've checked that the message body is in URL-encoded format, I must read no more than CONTENT_LENGTH bytes from system/ports/input. So if I want to be elegant and read *exactly* CONTENT_LENGTH bytes, I make my buffer exactly 2 bytes longer than that and use read-io to read exactly CONTENT_LENGTH bytes, so I don't waste memory on a prudent 128 MB buffer, right?

---------- Data readiness ----------

It has been pointed out (but I can't remember where) that the message body could be transported very slowly to the CGI script, due to heavy traffic on the net, so it is possible that the message body is not *fully* there when read-io is issued. On the other hand, the length you ask read-io for is a *maximum* length. IIRC, Jan posted a reading cycle on the list to take care of this (I will look for it as soon as I can). If CONTENT_LENGTH is, say, 22000 bytes, what happens if I request my 22000 bytes but only 1000 are available? Will read-io return the 1000 bytes immediately (which means I have to do the reading cycle), or will it wait until all 22000 bytes have arrived and return them all? What about timeouts?

---------- In the end ----------

Will read-io be fully described somewhere? ((-:

TIA.

Alessandro Pini ([alex--pini--mclink--it])

Now I'm mumblin' and I'm screamin' / And I don't know what I'm singin' (Weird Al Yankovic)

[2/11] from: al:bri:xtra at: 24-Aug-2000 19:58

Alessandro wrote:

> Will read-io be fully described somewhere? ((-:

I was under the impression that 'read-io was a temporary solution while a better networking solution is being developed. I wouldn't rely on it existing in future Rebol versions. But then I could be wrong. Does anyone from Rebol Tech want to confirm or deny?

Andrew Martin
ICQ: 26227169
http://members.xoom.com/AndrewMartin/

[3/11] from: petr:krenzelok:trz:cz at: 24-Aug-2000 10:34

[Al--Bri--xtra--co--nz] wrote:

> Alessandro wrote:
> > Will read-io be fully described somewhere? ((-:
>
> I was under the impression that 'read-io was a temporary solution, while a
> better networking solution is being developed. I wouldn't rely on it to
> exist in future Rebol versions.
>
> But then I could be wrong.
>

IIRC Carl once said it was meant to be just a temporary solution, but looking at the new networking docs at http://www.rebol.com/docs/network.html, you can see that read-io is still there...

-pekr-

[4/11] from: g:santilli:tiscalinet:it at: 24-Aug-2000 19:37

Hello [alex--pini--mclink--it]!

On 24-Aug-00, you wrote:

a> I've read the User's Guide, the old FAQs, the how-to and the
a> recent Networking chapter, but I still lack insight on the
a> inner workings of read-io. I can make conjectures, they may
a> even work, but I don't like that: in the long run, I could
a> accumulate all sorts of mistakes.

I hope I can help...

a> ---------- Buffer ----------
a> We need a buffer. If we need to read 2000 bytes, the buffer is
a> made 2002 bytes long with
a> data1: make string! 2002

(This is probably a bug. I think it was going to be fixed...)

a> I guess the extra 2 bytes are needed to store ancillary
a> information, but their content is not my concern, for now. Can
a> we make the buffer as in
a> data2: make string! 2000

You should be able to, but it creates problems if the length is a multiple of 16. I had a lot of trouble because of this... :-)

a> or even
a> data3: copy ""

READ-IO is very low-level, and does not extend the string. Anyway, since the new experimental release of Core has asynchronous TCP ports, READ-IO should no longer be needed.

a> ---------- Correct buffer-length ----------
a> According to CGI quasi-official specs, once I've checked the
a> message body is in URL-encoded format, I must read no more
a> than CONTENT_LENGTH bytes from system/ports/input. So if I
a> want to go elegant and read *exactly* CONTENT_LENGTH bytes I
a> make my buffer exactly 2 bytes longer than that and use
a> read-io to read exactly CONTENT_LENGTH bytes, so I don't waste
a> memory in a prudent 128 MB buffer, right?

Yup.

a> ---------- Data readiness ----------
a> [...]
a> CONTENT_LENGTH is, say, 22000 bytes, what happens if I request
a> my 22000 bytes but only 1000 are available? Will read-io
a> return the 1000 bytes immediately (which means I have to do

Yes.

Regards,
Gabriele.
--
Gabriele Santilli <[giesse--writeme--com]> - Amigan - REBOL programmer
Amiga Group Italia sez. L'Aquila -- http://www.amyresource.it/AGI/
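Gabriele's "Yes" means a CGI script needs the reading cycle Alessandro asked about: read-io may hand back fewer bytes than requested, so the script keeps calling it until CONTENT_LENGTH bytes have arrived. The following is a minimal sketch for REBOL/Core 2.x, not code posted in this thread. It assumes read-io appends to the tail of the buffer and returns the number of bytes read (zero or less at end of input or on error); the helper name read-post-body is hypothetical, and the two spare bytes follow the buffer sizing discussed above.

```rebol
REBOL [Title: "Sketch: reading a CGI POST body with a read-io loop"]

; Hypothetical helper, not a REBOL built-in.
; Assumes read-io appends to the tail of BUFFER and returns the number
; of bytes it read, which may be fewer than requested if the data has
; not all arrived yet.
read-post-body: func [
    "Read up to LEN bytes from PORT, looping over partial reads"
    port [port!] len [integer!]
    /local buffer got
][
    ; Two spare bytes, as in the buffer examples discussed in this thread
    buffer: make string! len + 2
    while [len > 0] [
        got: read-io port buffer len
        ; Zero or less: end of input or an error, so stop instead of spinning
        if got <= 0 [break]
        len: len - got
    ]
    buffer
]

; Typical CGI use: content-length is a string, or none if absent
if system/options/cgi/request-method = "POST" [
    post-data: read-post-body system/ports/input
        to-integer any [system/options/cgi/content-length "0"]
]
```

Because each call only asks for the bytes still missing, the buffer never needs to grow past its initial allocation, which matters since read-io does not extend the string.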

[5/11] from: petr:krenzelok:trz:cz at: 25-Aug-2000 1:01

[g--santilli--tiscalinet--it] wrote:

> READ-IO is very low-level, and does not extend the string. Anyway,
> since the new experimental release of core has asynchronous TCP
> ports, READ-IO should no longer be needed.

Could you be more specific, please? How would you replace read-io's read-in-buffer-sized-parts functionality?

Thanks,
-pekr-

[6/11] from: g:santilli:tiscalinet:it at: 25-Aug-2000 20:03

Hello [petr--krenzelok--trz--cz]!

On 25-Aug-00, you wrote:

p> Could you be more specific, please? How do you want to
p> substitute read-io read-by-buffer-size-parts functionality?

You need to use READ-IO when you a) want the raw data and b) don't want to wait for the remote end to close the port. COPY/PART had similar functionality on /BINARY TCP ports, but it still waited if there was no data in the port. Now you can just use COPY or COPY/PART, because they no longer wait for data.

Imagine a client-server communication. With the new experimental Core:

Asynchronous communication:

Server:

>> listen: open tcp://:10000
>> conn: first listen

Client:

>> conn: open tcp://localhost:10000
>> insert conn "hello server!"

Server:

>> copy conn

== "hello server!"

>> insert conn "hello client!"

Client:

>> copy conn

== "hello client!"

and so on.

Synchronous communication:

Server:

>> listen: open/wait tcp://:10000
>> conn: first listen

Client:

>> conn: open tcp://localhost:10000
>> insert conn "1234567890"

Server:

>> copy/part conn 20

(waits...)

Client:

>> insert conn "1234567890"

Server:

(ends waiting)
== "12345678901234567890"

You don't even need to use a loop if you use synchronous operations (which you can do if you don't need to handle multiple connections or other events).

Regards,
Gabriele.
--
Gabriele Santilli <[giesse--writeme--com]> - Amigan - REBOL programmer
Amiga Group Italia sez. L'Aquila -- http://www.amyresource.it/AGI/

[7/11] from: holger:rebol at: 25-Aug-2000 16:23

On Fri, Aug 25, 2000 at 08:03:31PM +0200, [g--santilli--tiscalinet--it] wrote:

> Hello [petr--krenzelok--trz--cz]!
>
> You need to use READ-IO when you a) want the raw data and b) don't
> want to wait for the remote part to close the port. COPY/PART had
> a similar functionality on /BINARY TCP ports, but it still waited
> if there were no data in the port. Now you can just use COPY or
> COPY/PART, because it no more waits for data.

Please wait for the next experimental build (i.e. a version AFTER Core 2.4.34, View 0.10.25 or Command 0.6.12) before publicly releasing any scripts that use the new non-blocking features in REBOL, because we need to change the behavior (again), for better compatibility with older scripts and to avoid future legacy problems. This change will break scripts that rely on the behavior in the current exp-build.

The changes are: copy on TCP ports opened without refinements will work the way it originally did (i.e. block if there is no data). If you don't want copy to block, use /no-wait on the open call. This means that instead of a /wait refinement there will be a /no-wait refinement, and the default behavior will be the reverse of the current exp-build, i.e. the same as in Core <= 2.3 (blocking). The main reason is that with this change, scripts which loop on copy until it returns none will continue to work on new versions of REBOL without being adapted.

Also, if there is no data, copy will return an empty series, not none as it does in current exp-builds. This allows you to distinguish the "no data available yet, would block" case from an end of stream, i.e. the peer closing the connection (for which copy will still return none).

There will also be some changes to wait to make it more useful, e.g. wait [0 port] will poll the port (i.e. return the port if there is data, and none otherwise). This also works with multiple ports. Old versions of Core already did this on some platforms, somewhat unofficially, but current versions don't. It will become an official feature on all platforms starting with the next experimental build.

There will also be a new /all refinement to wait, which causes wait to return a block of all ports that have data waiting (instead of returning just one port). This allows you to write your own scheduler (e.g. round-robin) for handling incoming data on a multiplexed server written in REBOL.
We are also planning other enhancements to wait and to ports in general, to make it easier to handle interactive, asynchronous connections, to support asynchronous sending, asynchronous connecting, asynchronous accepting of connections and asynchronous UDP operation, and to simplify the handling of multiple simultaneous connections, e.g. for background downloads and for multiplexed web/ftp servers. Lots of good stuff ahead :).

Please avoid using read-io whenever possible. It is a very low-level function that exposes your script to operating-system-dependent oddities. For instance, the amount of data typically returned may vary with the operating system, making testing more difficult for you. You also lose the convenience of line feed conversion etc., which may cause unexpected problems when moving your script between Windows and Unix machines. "Normal" port functions in REBOL (copy, insert etc.) do these things automatically for you.

We realize that in the past some shortcomings in the "normal" port functions (in particular copy blocking) have prevented you from doing some useful things, and sometimes read-io seemed to help, but these issues should be resolved in the next exp-builds, and then the use of read-io will be discouraged even more than it is now.

And just in case you are wondering, those new port features, together with work on some additional enhancements in VID, are the reasons why the new View exp-build we promised earlier is not out yet. Sorry for the delay. RSN, really...

--
Holger Kruse
[holger--rebol--com]
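Holger's planned semantics can be illustrated with a minimal loopback sketch. It assumes a REBOL/Core build that implements the behavior he describes (open/no-wait; copy returning an empty string when no data is ready and none after the peer closes; wait [0 port] as a poll), so it will not behave this way on the exp-build current at the time of this thread. Port 10000 is borrowed from Gabriele's examples; the 5-second timeout and all variable names are arbitrary.

```rebol
REBOL [Title: "Sketch: non-blocking reads on a /no-wait TCP port"]

; Assumes the /no-wait behavior described above: on such a port COPY
; returns "" when no data has arrived yet and NONE once the peer has
; closed the connection. Port 10000 and the timeout are arbitrary.
listen: open/no-wait tcp://:10000
conn: open/no-wait tcp://localhost:10000

; Server side: accept the loopback connection, reply, then close it
wait listen
server-side: first listen
insert server-side "hello client!"
close server-side

; Client side: collect data until the peer closes the connection
received: copy ""
forever [
    ; wait [0 conn] only polls: it returns CONN if data is waiting, else NONE
    either wait [0 conn] [
        chunk: copy conn
        if none? chunk [break]        ; NONE: the peer closed the connection
        append received chunk         ; may be "" if nothing was ready yet
    ][
        ; No data yet: block for up to 5 seconds rather than busy-loop
        if none? wait [conn 5] [break]
    ]
]
close conn
close listen
print received
```

The empty-string/none distinction is what lets the loop tell "nothing yet" apart from "connection closed". A multiplexed server would instead pass all of its ports to wait/all and service each returned port in turn.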

[8/11] from: galtbarber:mailandnews at: 25-Aug-2000 20:08

Expletives of Joy!! I must say that I am extremely delighted! You guys are right on the bull's-eye!!! This is the kind of excellence Rebol Tech is famous for! These are just what we needed, and well thought out, too! Hip hip hooray!!!

-Galt

p.s. Could we also get more than 63 ports per process on Windows? It would help when you are writing an HTTP or proxy server or a load-tester. Seriously, I already have a bitchin' REBOL app that works, but it hits that limit! Real servers have hundreds or thousands of connections going.

[9/11] from: tim:johnsons-web at: 25-Aug-2000 16:51

Hello:

When Holger speaks, I pay attention. I have the following code for reading POST input from CGI:

;=======================================
either tmp > 0 [
    buffer: make string! (tmp + 10) ; allocate storage space
    while [tmp > 0] ; and read
    ; I think Holger suggests this is not good:
    [tmp: tmp - read-io system/ports/input buffer tmp]
    return buffer
]

What would be a safe alternative, please?

TIA
-Tim

[holger--rebol--com] wrote:

[10/11] from: holger:rebol at: 28-Aug-2000 16:37

On Fri, Aug 25, 2000 at 04:51:17PM -0800, [tim--johnsons-web--com] wrote:

> Hello:
> When Holger speaks, I pay attention. I have the following line

<<quoted lines omitted: 8>>

> ]
> What would be a safe alternative, please?

For stdin/out you may still have to use read-io to be fully interoperable across platforms. That's because REBOL does not yet fully support non-socket streams at the port level, e.g. within a 'wait block. That is likely to change in the future, though. My point about not using read-io mostly referred to TCP streams.

--
Holger Kruse
[holger--rebol--com]

[11/11] from: tim:johnsons-web at: 28-Aug-2000 16:20

Thank you Holger!

[holger--rebol--com] wrote:

> > of code for reading POST input from CGI as follows:
> > [tmp: tmp - read-io system/ports/input buffer tmp]
> > What would be a safe alternative, please?
>
> For stdin/out you may still have to use read-io to be fully
> interoperable across platforms.

I'll try a couple of alternatives and run them by some REBOL gurus, but thanks for your answer.

Tim

Notes

Quoted lines have been omitted from some messages.