[REBOL] Re: Writing a protocol -- a mini intro
From: holger:rebol at: 6-Feb-2001 10:51
On Tue, Feb 06, 2001 at 08:45:40AM -0800, [jeff--rebol--net] wrote:
> Hi. I'm not even sure what the core pdf says on this topic,
> but here is a brief document I put together to help anyone
> who is interested in writing their own protocol. It covers
> the concepts and basics. I'll probably be unprepared for
> any real deep questions, but if you play with some of the
> stuff below you should be well on your way to hacking custom
> protocols like a pro!
Thanks, Jeff, good stuff. Some clarifications and additional info follows.
A lot of this applies only to the current experimental versions.
> ;-- ( voodoo )
> port-flags: system/standard/port-flags/pass-thru
Ports currently work in one of three (in the future: four) modes:
1. normal (default). In this mode ports are stream-oriented (byte streams
or line streams). When the port is opened for reading, the port mechanism
reads all data from the port until EOF, does any line feed conversions,
divides data into lines if the /lines refinement is used, etc., and then
presents the port contents to the user as a series. Users can do all
(well, most...) operations on this type of port that can be done on a
series, e.g. insert, append, pick, remove and positioning of the index.
When the port is closed, if it was opened for writing, the buffered series
is formatted back into a serial stream and written to the handler. The port
handler is only accessed during opening and closing. All other operations are
done within the port mechanism itself, on the buffered series.
2. direct. In this mode ports are stream-oriented (byte streams or line
streams), just like in the default mode, but data is not buffered, except
for a few characters at a time for reading (in /lines mode to accumulate lines
or for crlf conversion). Any copy or pick operation reads data from the port handler,
and any insert or append writes data to the port handler. There is no notion
of a "file position" in this mode, i.e. you cannot "seek" (access data randomly).
It is possible to skip forward, but this is really implemented by reading data from
the port handler and then throwing that data away. Also note that pick is a
destructive operation, i.e. subsequent pick operations read subsequent data
items in a stream (as opposed to series or ports in default mode, where pick
on a particular item always reads that same item). The reason for this is that
the model presented by the port handler to the port mechanism is that of a
sequential data stream, not that of a collection of bytes. It also means that
the port mechanism (and the user) does not know how much data is available
(so length? and index? will not report useful data). Direct mode is also the only
one of the stream-oriented modes that allows a port to be used in the 'wait native
(although not all direct ports support wait -- mostly because of OS limitations).
3. In the future: no-cache (not implemented yet, name subject to change). This mode
is a mixture of default and direct. The model the port handler presents to the port
mechanism is one of a collection of bytes (e.g. a file on hard disk), but the data is
not buffered. Instead any operation on the contents accesses the port handler directly.
This mode supports arbitrary 'skip and 'at operations, forward and backward, without
having to read the whole file. The user model is similar to the default mode,
except that there are some limitations. For instance, you cannot insert data in the
middle of a file, only at the end. You cannot remove data unless you remove everything
up to the end (which really is a 'clear). You can change data only if the change
leaves its length unchanged, etc. This is the mode you will want if, e.g., you want
to implement your own database in REBOL and need random access to a very large
file on hard disk.
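To make the contrast between default and direct mode concrete, here is a toy model in Python (not REBOL). Handler, DefaultPort and DirectPort are hypothetical names. The handler exposes only open, close, read and write on raw bytes. The default port buffers everything and touches the handler only at open and close. The direct port forwards every pick, skip and append to the handler. No-cache mode is not modeled. This is a sketch of the behavior described above under those assumptions, not how the REBOL port mechanism is implemented.

```python
# Toy model of default vs. direct port modes. Not REBOL API: every name
# here (Handler, DefaultPort, DirectPort) is hypothetical.

class Handler:
    """The narrow handler API: open, close, read, write on raw bytes."""

    def __init__(self, source: bytes = b""):
        self.source = source      # bytes the device can deliver
        self.pos = 0              # handlers only offer sequential access
        self.sink = bytearray()   # bytes the device has received
        self.calls = []           # log of handler calls, for inspection

    def open(self):
        self.calls.append("open")

    def close(self):
        self.calls.append("close")

    def read(self, n: int) -> bytes:
        self.calls.append("read")
        chunk = self.source[self.pos:self.pos + n]
        self.pos += len(chunk)
        return chunk              # b"" signals EOF

    def write(self, data: bytes):
        self.calls.append("write")
        self.sink += data


class DefaultPort:
    """Default mode: buffer everything; use the handler only at open/close."""

    def __init__(self, handler: Handler, for_write: bool = False):
        self.h = handler
        self.for_write = for_write
        self.h.open()
        self.buf = bytearray()
        while chunk := self.h.read(4096):   # read everything until EOF
            self.buf += chunk

    def pick(self, i: int) -> int:
        return self.buf[i - 1]              # 1-based and non-destructive

    def append(self, data: bytes):
        self.buf += data                    # buffered; handler not called

    def close(self):
        if self.for_write:
            self.h.write(bytes(self.buf))   # flushed back as one stream
        self.h.close()


class DirectPort:
    """Direct mode: no buffer; every operation goes to the handler."""

    def __init__(self, handler: Handler):
        self.h = handler
        self.h.open()

    def pick(self):
        chunk = self.h.read(1)              # destructive: consumes the byte
        return chunk[0] if chunk else None

    def skip(self, n: int):
        self.h.read(n)                      # "skip" = read and discard

    def append(self, data: bytes):
        self.h.write(data)                  # written immediately

    def close(self):
        self.h.close()
```

Picking the same index twice on a DefaultPort returns the same byte. Each pick on a DirectPort advances through the stream.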
All of the modes described above impose a byte stream model or line stream model
(with the /lines refinement) on the port. The common code for that (buffering where
necessary, crlf conversion, splitting data into lines etc.) is done by the general
port mechanism. The API between the port mechanism and the port handler is very narrow.
It consists basically of only 'open, 'close, 'read and 'write.
The 'read and 'write functions here correspond to the read-io and write-io natives,
i.e. they read and write sequences of bytes. They are NOT related to the 'read and
'write natives. The 'read and 'write natives are basically just shorthand for
open/copy/close and open/insert/close, respectively, whereas the 'read and 'write
functions in port handlers are for low-level read and write operations, corresponding
to read-io and write-io.
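As a rough Python analogy for the distinction between the 'read native and a handler's 'read function: high_level_read below does open/copy/close with line-feed conversion, while read_io moves one chunk of raw bytes. Both function names are made up for this sketch, and io.BytesIO stands in for a real device.

```python
import io


def read_io(device: io.BufferedIOBase, n: int) -> bytes:
    """Low level (hypothetical): one call, at most n raw bytes, no conversion."""
    return device.read(n)


def high_level_read(open_device) -> str:
    """Hypothetical shorthand for open/copy/close, like the 'read native."""
    dev = open_device()                      # open
    try:
        data = b""
        while chunk := read_io(dev, 4096):   # copy: repeat read-io until EOF
            data += chunk
    finally:
        dev.close()                          # close
    # Line-feed conversion belongs to the high-level layer, not to read-io.
    return data.decode("latin-1").replace("\r\n", "\n")
```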
All higher-level operations (insert, remove, pick, skip, at, copy, ...) are handled
within the port mechanism and then translated into read-io and write-io in a way
defined by the particular port mode. Port handlers in one of these three modes
therefore never "see" any of these high-level operations, and cannot override them.
Ports in default mode may have their 'read function called during opening and their
'write function during closing. Ports in other modes have 'read and 'write called
when users call insert, copy, pick etc.
These three modes are user-selectable. A port handler always starts up in one of
these modes. It is possible for a port handler to force direct mode by setting

    port-flags: system/standard/port-flags/direct

in the protocol definition. This is used, e.g., by SMTP to ensure that every insert
(every email being sent) is forwarded directly to the port handler. Without it
the port mechanism would buffer all outgoing data and only present it to the
port handler once the port is closed. Having the handler override the mode relieves
the user of having to add the /direct refinement to 'open or 'write.
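The effect of forcing direct mode can be sketched in Python. The DIRECT bit value, effective_mode and send_all are hypothetical. The sketch shows only that, with the flag set, each insert reaches the handler as its own write. In default mode, the buffered data arrives as one stream at close time.

```python
from enum import IntFlag


class PortFlags(IntFlag):
    # Hypothetical bit value; REBOL's system/standard/port-flags has its own.
    DIRECT = 1


def effective_mode(user_direct: bool, handler_flags: PortFlags) -> str:
    """Direct if the user asked for /direct or the handler forces it."""
    if user_direct or handler_flags & PortFlags.DIRECT:
        return "direct"
    return "default"


def send_all(messages, user_direct, handler_flags):
    """Simulate inserts; return handler writes after each insert and at close."""
    mode = effective_mode(user_direct, handler_flags)
    writes, pending, snapshots = [], [], []
    for msg in messages:
        if mode == "direct":
            writes.append(msg)            # forwarded to the handler at once
        else:
            pending.append(msg)           # buffered by the port mechanism
        snapshots.append(list(writes))
    if pending:
        writes.append("".join(pending))   # default mode: one stream at close
    return snapshots, writes
```

With DIRECT set, the handler sees "mail1" as soon as it is inserted. Without it, the handler sees nothing until close, and then receives both mails as a single stream.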
4. pass-thru. This mode gives a port handler complete control over all operations.
The port layer then performs no buffering or conversion whatsoever, but passes
all user requests on to the port handler. This includes init, open, close, pick,
copy, find, select, remove, change, insert, update. Some other functions are
translated in straightforward ways, e.g. first becomes pick 1. Read and write
are not called for these ports by the port mechanism itself, but can still be
called if a user calls read-io or write-io directly.
Pass-thru is not selectable by the user. The developer always has to set it in the
port definition. Pass-thru is typically used for "strange" port handlers that want
to use the port mechanism as the API model, but which don't conform to the typical
byte or line stream model provided by the buffering mechanism in the general port
mechanism. For instance ports that operate on "records" often use this model.
This includes odbc, oracle, udp, directory reading, and a few others, and
some ports implemented in REBOL, such as pop, imap, ftp (for directory reading
only) and nntp. Sometimes ports that use a byte stream model but require unusual
processing (like 'crypt ports in Express) use pass-thru mode as well.
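A pass-thru port can be pictured as a thin forwarding layer. In this Python sketch, Mailbox is a hypothetical record-oriented handler (loosely POP-like). PassThruPort, also hypothetical, forwards each request unchanged and translates first into pick 1. There is no buffering, no line splitting and no call to read or write.

```python
class Mailbox:
    """Hypothetical record-oriented handler: each item is a whole message."""

    def __init__(self, messages):
        self.messages = list(messages)

    def pick(self, i):
        return self.messages[i - 1] if 1 <= i <= len(self.messages) else None

    def copy(self):
        return list(self.messages)

    def remove(self, i):
        del self.messages[i - 1]


class PassThruPort:
    """Hypothetical pass-thru port: requests go straight to the handler."""

    def __init__(self, handler):
        self.handler = handler

    def pick(self, i):
        return self.handler.pick(i)

    def first(self):
        return self.handler.pick(1)   # first is translated to pick 1

    def copy(self):
        return self.handler.copy()

    def remove(self, i):
        self.handler.remove(i)
```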
In addition to the requests described above there are a few more that a port
handler can support. This includes 'get-sub-port, 'awake, 'get-modes and
'set-modes. For the moment please do not touch these. There will still be some
slight changes to the way they work in the next versions.
> PORT/STATE/FLAGS: this is a setting for the port which determines how
> your port is called. You OR in your settings to what you find there
> at the end of the OPEN function.
Yes, either that, or you set the default when you define the port,
as SMTP does.
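OR-ing the handler's flags into port/state/flags, rather than assigning them, keeps whatever bits the port layer has already set. Below is a minimal Python sketch with made-up bit values. The real values live in system/standard/port-flags and are not reproduced here, and handler_open is a hypothetical name.

```python
from dataclasses import dataclass, field

# Hypothetical bit values, for illustration only.
PASS_THRU = 1 << 0
ALREADY_SET = 1 << 5   # stands for any flag the port layer set before open


@dataclass
class State:
    flags: int = 0


@dataclass
class Port:
    state: State = field(default_factory=State)


def handler_open(port: Port, port_flags: int) -> None:
    """End of a hypothetical handler's open: OR in its flags, never overwrite."""
    # (connecting, logging in, etc. would happen before this line)
    port.state.flags |= port_flags
```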
> The two major types are PASS-THRU
> ports and DIRECT ports.
And default ports. Their port handler API is identical to direct ports, but
read and write get called at open/close time, instead of during pick/copy/insert.
See above.
> For now, stick with using PASS-THRU. Most of
> the network protocol handlers are written as PASS-THRU ports with the
> exception of the SMTP:// protocol handler.
Record-oriented ports are typically pass-thru. Stream-oriented ports
(HTTP, Daytime, Finger etc.) don't override port-flags at all, i.e.
they operate in default or direct mode, as selected by the user. FTP
overrides the mode only for directory reading (record mode).
> This is a preferred way to get your tcp port started because OPEN-PROTO
> will also take care of handling proxy connections for you. Afterwards,
> the actual live tcp port is found inside port/sub-port. This is the
> port you can pass to READ-IO and WRITE-IO. SUB-PORTS are opened, by
> default, with the /lines refinement.
Yes, the main reason read-io and write-io exist is that until
recently there was really no clean way to change a port's mode after
the port had been opened. The HTTP handler wanted to use the underlying
TCP port in /lines mode when parsing the HTTP header, but then wanted
to transfer all other data transparently. This was only possible by
having the HTTP handler bypass the port mechanism and call the 'read
and 'write functions of the TCP sub-port directly by calling read-io
and write-io.
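The HTTP case can be mimicked in Python, where a single buffered stream already allows switching from line-oriented to raw reads. read_http_response is a hypothetical helper, and io.BytesIO stands in for the TCP sub-port. The header is parsed line by line. The body is then read as raw bytes, so embedded CRLFs pass through untouched. The sketch ignores chunked encoding and other HTTP details.

```python
import io


def read_http_response(sock: io.BufferedIOBase):
    """Hypothetical helper: header in line mode, then body as raw bytes."""
    status = sock.readline().rstrip(b"\r\n")
    headers = {}
    while (line := sock.readline().rstrip(b"\r\n")):  # blank line ends header
        name, _, value = line.partition(b":")
        headers[name.strip().lower()] = value.strip()
    length = int(headers.get(b"content-length", b"0"))
    body = sock.read(length)   # raw: no line splitting, no CRLF conversion
    return status, headers, body
```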
Now that we have get-modes and set-modes it is possible to change some of
the port modes at any time, so read-io and write-io do not have to be used
as often. They can still be useful for passing 'read and 'write through from
a higher-level port to a lower-level port, because passing requests
through that way means that data does not have to be copied between
buffers as the request traverses port layers.
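The no-copy benefit of passing reads through can be sketched with Python's readinto, where each layer hands the caller's buffer down instead of allocating its own. LowerPort and UpperPort are hypothetical classes, and io.BytesIO stands in for the TCP sub-port.

```python
import io


class LowerPort:
    """Hypothetical stand-in for a TCP sub-port: fills a caller's buffer."""

    def __init__(self, data: bytes):
        self.stream = io.BytesIO(data)

    def read_into(self, buf: memoryview) -> int:
        return self.stream.readinto(buf)   # number of bytes stored in buf


class UpperPort:
    """Hypothetical higher-level port that passes read requests straight down."""

    def __init__(self, lower: LowerPort):
        self.lower = lower

    def read_into(self, buf: memoryview) -> int:
        # No intermediate buffer here: the lower layer writes directly
        # into the caller's memory, so this layer makes no copy of its own.
        return self.lower.read_into(buf)
```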
> Right now the protocols are synchronous. Everything is written like
> "Send this, then wait for the result and see if it's this other
> thing". Asynchronous protocols (which are coming soon to a REBOL near
> you) are going to be like "Send this, and let me know some time in the
> near or distant future when you've got something for me to look at".
> It'll be a very different approach and the protocols will all have to be
> revamped.
Async ports should not break any existing port handlers, either our
own or properly written third-party handlers. However, for a protocol
to support asynchronous operation it would have to be rewritten. All
of the support functions implemented in the root protocol (open-proto,
confirm etc.) will get versions that operate in asynchronous mode, for
use by asynchronous protocols.
--
Holger Kruse
[holger--rebol--com]