[REBOL] Re: Writing a protocol -- a mini intro

From: holger:rebol at: 6-Feb-2001 10:51

On Tue, Feb 06, 2001 at 08:45:40AM -0800, [jeff--rebol--net] wrote:
> Hi. I'm not even sure what the core pdf says on this topic,
> but here is a brief document I put together to help anyone
> who is interested in writing their own protocol. It covers
> the concepts and basics. I'll probably be unprepared for
> any real deep questions, but if you play with some of the
> stuff below you should be well on your way to hacking custom
> protocols like a pro!

Thanks, Jeff, good stuff. Some clarifications and additional info follow. A lot of this only applies to current experimentals.

> ;-- ( voodoo )
> port-flags: system/standard/port-flags/pass-thru

Ports currently work in one of three (in the future: four) modes:

1. normal (default). In this mode ports are stream-oriented (byte streams or line streams). When the port is opened for reading, the port mechanism reads all data from the port until EOF, does any line feed conversions, divides data into lines if the /lines refinement is used, etc., and then presents the port contents to the user as a series. Users can do all (well, most...) operations on this type of port that can be done on a series, e.g. insert, append, pick, remove and positioning of the index. Once the port is closed, if it was opened for writing, its contents are formatted back into a serial stream and written to the handler. The port handler is only accessed during opening and closing. All other operations are done within the port mechanism itself, on the buffered series.

2. direct. In this mode ports are stream-oriented (byte streams or line streams), just like in the default mode, but data is not buffered, except for a few characters at a time for reading (in /lines mode to accumulate lines, or for crlf conversion). Any copy or pick operation reads data from the port handler, and any insert or append writes data to the port handler. There is no notion of a "file position" in this mode, i.e. you cannot "seek" (access data randomly). It is possible to skip forward, but this is really implemented by reading data from the port handler and then throwing that data away.

Also note that pick is a destructive operation in this mode, i.e. subsequent pick operations read subsequent data items in a stream (as opposed to series or ports in default mode, where pick on a particular item always reads that same item). The reason for this is that the model presented by the port handler to the port mechanism is that of a sequential data stream, not that of a collection of bytes. It also means that the port mechanism (and the user) does not know how much data is available (so length? and index? will not report useful data).

Direct mode is also the only one of the stream-oriented modes that allows a port to be used in the 'wait native (although not all direct ports support wait -- mostly because of OS limitations).

3. In the future: no-cache (not implemented yet, name subject to change). This mode is a mixture of default and direct. The model the port handler presents to the port mechanism is one of a collection of bytes (e.g. a file on hard disk), but the data is not buffered. Instead, any operation on the contents accesses the port handler directly. This mode supports arbitrary 'skip and 'at operations, forward and backward, without having to read the whole file. The user model is similar to the default mode, except that there are some limitations. For instance, you cannot insert data in the middle of a file, only at the end. You cannot remove data unless you remove everything until the end (which really is a 'clear). You can change data only if its length stays the same, etc. This is the mode you will want to use if you want, e.g., to implement your own database in REBOL and need random access to a very large file on hard disk.

All of the modes described above impose a byte stream model or line stream model (with the /lines refinement) on the port. The common code for that (buffering where necessary, crlf conversion, splitting data into lines etc.) is done by the general port mechanism.

The API between the port mechanism and the port handler is very narrow. It basically consists only of 'open, 'close, 'read and 'write. The 'read and 'write functions here correspond to the read-io and write-io natives, i.e. they read and write sequences of bytes. They are NOT related to the 'read and 'write natives. The 'read and 'write natives are basically just shorthand for open/copy/close and open/insert/close, respectively, whereas the 'read and 'write functions in port handlers are for low-level read and write operations, corresponding to read-io and write-io.

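
To make the difference between the default and direct modes concrete, here is a toy model in Python (not REBOL, and not how the interpreter is actually implemented). All class and method names (ListHandler, DefaultPort, DirectPort, pick_next) are made up for illustration. The handler exposes only low-level read and write; the two port classes show when those get called in each mode.

```python
class ListHandler:
    """Hypothetical port handler: only low-level read/write, as in the narrow API."""

    def __init__(self, data):
        self.source = list(data)   # items the "device" will deliver
        self.written = []          # items the "device" has received
        self.reads = 0             # how many times read() was called

    def read(self, n):
        self.reads += 1
        chunk, self.source = self.source[:n], self.source[n:]
        return chunk

    def write(self, items):
        self.written.extend(items)


class DefaultPort:
    """Toy default mode: the handler is touched only at open and close."""

    def __init__(self, handler):
        self.handler = handler
        self.buffer = []
        while True:                      # "open": read everything until EOF
            chunk = handler.read(4)
            if not chunk:
                break
            self.buffer.extend(chunk)

    def pick(self, index):               # 1-based and non-destructive
        return self.buffer[index - 1]

    def append(self, item):              # changes the buffer only
        self.buffer.append(item)

    def close(self):                     # "close": flush buffer to the handler
        self.handler.write(self.buffer)


class DirectPort:
    """Toy direct mode: every operation goes straight to the handler."""

    def __init__(self, handler):
        self.handler = handler

    def pick_next(self):                 # destructive: consumes one item
        chunk = self.handler.read(1)
        return chunk[0] if chunk else None

    def skip(self, n):                   # "skip" = read and throw away
        self.handler.read(n)

    def insert(self, item):              # written immediately, no buffering
        self.handler.write([item])


buffered = DefaultPort(ListHandler("abcdef"))
print(buffered.pick(1), buffered.pick(1))    # same item both times

direct = DirectPort(ListHandler("abcdef"))
print(direct.pick_next(), direct.pick_next())  # two successive items
```

In the model, DefaultPort never calls the handler between open and close, while DirectPort cannot answer "how long is the data?" at all, which mirrors why length? and index? are not useful on direct ports.
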
All higher-level operations (insert, remove, pick, skip, at, copy, ...) are handled within the port mechanism and then translated into read-io and write-io in a way defined by the particular port mode. Port handlers in one of these three modes therefore never "see" any of these high-level operations, and cannot override them. Ports in default mode may have their 'read function called during opening and their 'write function during closing. Ports in other modes have 'read and 'write called when users call insert, copy, pick etc.

These three modes are user-selectable. A port handler always starts up in one of these modes. It is possible for a port handler to force direct mode by setting port-flags: system/standard/port-flags/direct in the protocol definition. This is used, e.g., by SMTP to ensure that every insert (every email being sent) is forwarded directly to the port handler. Without it, the port mechanism would buffer all outgoing data and only present it to the port handler once the port is closed. Having the handler override the mode takes the burden of adding the /direct refinement to 'open or 'write off the user.

4. pass-thru. This mode gives a port handler complete control over all operations. The port layer then performs no buffering or conversion whatsoever, but passes all user requests on to the port handler. This includes init, open, close, pick, copy, find, select, remove, change, insert, update. Some other functions are translated in straightforward ways, e.g. first becomes pick 1. Read and write are not called for these ports by the port mechanism itself, but can still be called if a user calls read-io or write-io directly. Pass-thru is not selectable by the user. The developer always has to set it in the port definition.

Pass-thru is typically used for "strange" port handlers that want to use the port mechanism as the API model, but which don't conform to the typical byte or line stream model provided by the buffering in the general port mechanism. For instance, ports that operate on "records" often use this model. This includes odbc, oracle, udp, directory reading, and a few others, and some ports implemented in REBOL, such as pop, imap, ftp (for directory reading only) and nntp. Sometimes ports that use a byte stream model but require unusual processing (like 'crypt ports in Express) use pass-thru mode as well.

In addition to the requests described above, there are a few more that a port handler can support. These include 'get-sub-port, 'awake, 'get-modes and 'set-modes. For the moment please do not touch these. There will still be some slight changes to the way they work in the next versions.
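
As a rough illustration of pass-thru, the following Python toy model (again not REBOL; PassThruPort and RecordHandler are hypothetical names) shows a port layer that does no buffering or conversion and simply forwards each request to a record-oriented handler. The only translation it performs is the one mentioned above, first becoming pick 1.

```python
class RecordHandler:
    """Hypothetical record-oriented handler, e.g. a tiny mailbox."""

    def __init__(self, records):
        self.records = list(records)
        self.calls = []                      # log of requests the handler saw

    def pick(self, index):
        self.calls.append(("pick", index))
        return self.records[index - 1]       # 1-based, like a series

    def insert(self, value):
        self.calls.append(("insert", value))
        self.records.append(value)

    def copy(self):
        self.calls.append(("copy",))
        return list(self.records)


class PassThruPort:
    """Toy port layer in pass-thru mode: forwards every request unchanged."""

    def __init__(self, handler):
        self.handler = handler

    def pick(self, index):
        return self.handler.pick(index)

    def first(self):
        # The one translation done by the port layer: first -> pick 1.
        return self.handler.pick(1)

    def insert(self, value):
        return self.handler.insert(value)

    def copy(self):
        return self.handler.copy()


mailbox = RecordHandler(["msg one", "msg two"])
port = PassThruPort(mailbox)
port.first()
port.insert("msg three")
print(mailbox.calls)   # the handler sees the high-level requests themselves
```

Contrast this with the stream modes, where the handler would only ever see low-level reads and writes and could not tell a pick from a copy.
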

> PORT/STATE/FLAGS: this is a setting for the port which determines how
> your port is called. You OR in your settings to what you find there
> at the end of the OPEN function.

Yes, either that, or you set the default when you define the port, as SMTP does.

> The two major types are PASS-THRU
> ports and DIRECT ports.

And default ports. Their port handler API is identical to direct ports, but read and write get called at open/close time, instead of during pick/copy/insert. See above.

> For now, stick with using PASS-THRU. Most of
> the network protocol handlers are written as PASS-THRU ports with the
> exception of the SMTP:// protocol handler.

Record-oriented ports are typically pass-thru. Stream-oriented ports (HTTP, Daytime, Finger etc.) don't override port-flags at all, i.e. they operate in default or direct mode, as selected by the user. FTP overrides the mode only for directory reading (record mode).

> This is a preferred way to get your tcp port started because OPEN-PROTO
> will also take care of handling proxy connections for you. Afterwards,
> the actual live tcp port is found inside port/sub-port. This is the
> port you can pass to READ-IO and WRITE-IO. SUB-PORTS are opened, by
> default, with the /lines refinement.

Yes, the main reason why read-io and write-io exist is that, until recently, there was really no clean way to change a port's mode after the port had been opened. The HTTP handler wanted to use the underlying TCP port in /lines mode when parsing the HTTP header, but then wanted to transfer all other data transparently. This was only possible by having the HTTP handler bypass the port mechanism and call the 'read and 'write functions of the TCP sub-port directly by calling read-io and write-io.

Now that we have get-modes and set-modes it is possible to change some of the port modes at any time, so read-io and write-io do not have to be used as often. They can still be useful for passing 'read and 'write through from a higher-level port to a lower-level port, because passing requests through that way means that data does not have to be copied between buffers as the request traverses port layers.
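
The HTTP case described above (line-oriented header, then transparent body) can be sketched by analogy in Python. This is not REBOL and does not use read-io or set-modes; read_response is a hypothetical helper and the in-memory stream stands in for the TCP sub-port. It only shows the idea of switching from line-at-a-time reading to raw byte reading on the same stream.

```python
import io


def read_response(stream):
    """Read a status line and header lines, then the rest as raw bytes."""
    # Line-oriented phase: one line at a time, like a port in /lines mode.
    status = stream.readline().decode("ascii").strip()
    headers = {}
    while True:
        line = stream.readline()
        if line in (b"\r\n", b"\n", b""):    # blank line (or EOF) ends the header
            break
        name, _, value = line.decode("ascii").partition(":")
        headers[name.strip().lower()] = value.strip()
    # Transparent phase: remaining bytes are passed through untouched,
    # with no line splitting and no crlf conversion.
    body = stream.read()
    return status, headers, body


raw = b"HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\n\r\nline1\r\nline2\x00\xff"
status, headers, body = read_response(io.BytesIO(raw))
print(status, headers, len(body))
```

Note that the body keeps its CR LF pairs and binary bytes exactly as sent, which is the point of leaving line mode before reading it.
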

> Right now the protocols are synchronous. Every thing is written like
> "Send this, then wait for the result and see if it's this other
> thing". Asynchronous protocols (which are coming soon to a REBOL near
> you) are going to be like "Send this, and let me know some time in the
> near or distant future when you've got something for me to look at".
> It'll be very different approach and the protocols will all have to be
> revamped.

Async ports should not break any existing port handlers, neither our own nor properly written third-party handlers. However, for a protocol to support asynchronous operation it would have to be rewritten. All of the support functions implemented in the root protocol (open-proto, confirm etc.) will get versions that operate in asynchronous mode, for use by asynchronous protocols.

--
Holger Kruse [holger--rebol--com]