Writing a protocol -- a mini intro

[1/4] from: brett::codeconscious::com at: 7-Feb-2001 9:09

Thank you Jeff and Holger! This information is exactly what I was looking for.

> What this all goes to show is that the port mechanism doesn't
> necessarily have to be used in network protocols, but can be used for
> various things you would like to behave like a port. A port is a
> generalization of series, an interface to an external series

And this was what I suspected - a great treasure not yet plundered. Now I've got quite a bit of studying to do. :) Brett.

[2/4] from: jeff:rebol at: 6-Feb-2001 8:45

Hi. I'm not even sure what the core pdf says on this topic, but here is a brief document I put together to help anyone who is interested in writing their own protocol. It covers the concepts and basics. I'll probably be unprepared for any real deep questions, but if you play with some of the stuff below you should be well on your way to hacking custom protocols like a pro!

-jeff

===Writing your own Protocol Handler

Below is a simple port handler. I've labeled a few places Voodoo, which means don't worry about what that part really does; for now, just think of those as magic incantations you have to do to make this example work properly:

    make root-protocol [
        ;-- ( Voodoo )
        port-flags: system/standard/port-flags/pass-thru

        open: func [port][
            print "OPEN!"
            ;-- When you get to open
            ;   the URL has been parsed:
            print [
                "user is" port/user newline
                "pass is" port/pass newline
                "host is" port/host newline
                "port-id is" port/port-id newline
                "path is" port/path newline
                "target is" port/target newline
            ]
            ;-- Say how big we are
            port/state/tail: 2000
            port/state/index: 0
            ;-- Take our flags and merge
            ;   them into the port/state/flags ( Voodoo )
            port/state/flags: port/state/flags or port-flags
        ]
        copy: func [port][print "COPY" random 1000]
        insert: func [port data][print ["INSERT data:" mold data] "OK"]
        read: func [port][print "READ"]
        close: func [port][print "CLOSE"]
        pick: func [port][print ["PICK:" port/state/index] port/state/index ** 2]

        ;-- This installs the handler ( Voodoo )
        ;   args: scheme name, this object, protocol port-id (80 for http, 25 smtp, etc..)
        net-utils/net-install simple self 0
    ]

Put the above in a script and DO it, then try these operations at the console:

    x: open simple://usr:pass@host.dom/path/target
    pick x 3
    pick x 4
    x: skip x 4
    pick x 2
    insert x ["foo" "bar"]
    close x
    x: read simple://usr:pass@host.dom/path/target

What you should observe is that OPENING this port handler and doing PICKs and INSERTs on the port does a rather different thing than doing a READ on this port handler. PICK will print out the port/state/index and return that index to the power of 2. READ calls OPEN, COPY and CLOSE. COPY returns a random number between 1 and 1000.

What this all goes to show is that the port mechanism doesn't necessarily have to be used in network protocols, but can be used for various things you would like to behave like a port. A port is a generalization of series, an interface to an external series. So if you think about the REBOL Protocol handlers, they are an attempt to bend the Internet protocols into looking as much like external series as possible.

---OPEN PICK INSERT COPY READ CLOSE in PORT CONTEXT

Keep in mind that we have defined OPEN, PICK, INSERT, COPY, READ, and CLOSE inside the port's context. If you need to use the global versions of these REBOL functions, you must use explicit paths to SYSTEM/WORDS, i.e.:

    system/words/pick "12345" 3

Weird looking errors can result if you forget to use the global definitions and instead refer to one of these locally defined functions within your protocol handler. If you need to use these functions repeatedly in your protocol handler, it helps to make some equivalences at the top of your protocol handler, i.e.:

    s-pick: get in system/words 'pick
    ...
    pick: func [port][
        s-pick "12345" port/state/index
    ]

---SOME VOODOO

PORT/STATE/FLAGS: this is a setting for the port which determines how your port is called. You OR in your settings to what you find there at the end of the OPEN function. The two major types are PASS-THRU ports and DIRECT ports.
For now, stick with using PASS-THRU. Most of the network protocol handlers are written as PASS-THRU ports with the exception of the SMTP:// protocol handler.

NET-UTILS/NET-INSTALL is the function responsible for turning your port handler into one of the globally accessible protocol schemes. You pass NET-INSTALL the scheme name, which is a word, SELF (the object that defines your protocol handler) and the port-id. We used 0 because we're not actually opening a network port. If you were opening a network port, you would set the appropriate port-id, and usually you would let the root-protocol OPEN function handle opening the low-level tcp port for you. This is usually accomplished by first calling OPEN-PROTO from within your open function:

    open: func [port][
        open-proto port
        ...

This is a preferred way to get your tcp port started because OPEN-PROTO will also take care of handling proxy connections for you. Afterwards, the actual live tcp port is found inside port/sub-port. This is the port you can pass to READ-IO and WRITE-IO. SUB-PORTS are opened, by default, with the /lines refinement.

---ROOT-PROTOCOL BIG MAMA

At the base of all the protocols is a protocol called the Root-Protocol. This thing has the phases of a network transaction defined: INIT, OPEN, READ, WRITE, PICK, and others. (To see, print mold root-protocol -- keep in mind that you'll see the function OPEN and OPEN-PROTO. They are actually the same function.) Other protocols make themselves out of the root protocol, changing what they need but letting the root-protocol take care of a lot of the repetitive work. For instance, the INIT function in the root-protocol takes care of parsing your url into pieces so that when you get to OPEN you have everything you need. Most of the time your protocol does not need to define an INIT function; it just takes the one derived from the root-protocol.

The root-protocol is largely a functioning protocol handler in its own right.
You can see the basic network functionality of the root-protocol handler by looking at WHOIS and FINGER. Here is how both WHOIS and FINGER are created in REBOL:

    make Root-Protocol [
        open-check: [[any [port/user ""]] none]
        net-utils/net-install Finger self 79
        net-utils/net-install Whois make self [] 43
    ]

The open-check thing means "send this, expect that". The root-protocol will take care of doing that for you if you have an open-check, close-check or a write-check (they are NONE by default in the root-protocol). So above, FINGER and WHOIS are almost the same thing as the root-protocol itself: just open this port, send the port/user and return the result.

The DAYTIME protocol is entirely the root-protocol. Here is how it is defined:

    make Root-Protocol [
        net-utils/net-install Daytime self 13
    ]

There is no open check, just connect to tcp port 13, and read what is there.

You can create a clone of any protocol handler very easily. Here we create a clone of http:

    make system/schemes/http/handler [
        net-utils/net-install web self 80
    ]

Now you can do:

    read web://www.rebol.com

You could also clone a handler and make a change to one of the protocol phase functions, redefine CLOSE, or READ, etc.

It might all seem a little bizarre, but it really is a pretty simple system. A port-handler has certain functions that are expected to take care of the different parts of a port transaction. Define the ones you need, and let the Root-Protocol take care of the common cases.

---NET-UTILS

Net-utils is an object containing utility functions for the protocols. There are a number of them, so they're all bundled together to prevent crowding the global name space. For example, in the case of FINGER and WHOIS, the root protocol OPEN-PROTO function will pass open-check to NET-UTILS/CONFIRM along with the sub-port (which is the actual tcp level port).
NET-UTILS/CONFIRM takes care of inserting the send data (if any) into the port and confirming the response back from the port -- firing an error if they don't match. A NONE for the expect part means it doesn't look for a response, and a NONE for the send part means it doesn't send anything initially. These checks can also look for multiple responses:

    open-check: [none ["HELLO" "HOWDY"]]

The above open-check would cause open-proto to send nothing but read from the newly opened tcp port looking for either "HELLO" or "HOWDY". If it doesn't find either of those it will generate an error.

NET-UTILS also contains NET-LOG, which is called from CONFIRM to help trace network activity, the URL-parser which is invoked in root-protocol/INIT, and a bunch of proxy stuff. NET-UTILS is a whole pocket knife of protocol handler utilities.

---SYNCHRONICITY

Right now the protocols are synchronous. Everything is written like "Send this, then wait for the result and see if it's this other thing". Asynchronous protocols (which are coming soon to a REBOL near you) are going to be like "Send this, and let me know some time in the near or distant future when you've got something for me to look at". It'll be a very different approach and the protocols will all have to be revamped.
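
To make the "expect" half of a check such as open-check a little more concrete, here is a minimal sketch in plain REBOL. CHECK-RESPONSE is a hypothetical helper written only for this illustration; it is not NET-UTILS/CONFIRM, and the rule that a response matches when it starts with one of the expected strings is an assumption, not something stated in the thread.

```rebol
; Hypothetical helper, NOT the real NET-UTILS/CONFIRM.
; EXPECTED mirrors the second item of a check block:
; NONE, a single string, or a block of alternative strings.
check-response: func [
    response [string!]
    expected [none! string! block!]
][
    ; NONE means: do not look for a response at all
    if none? expected [return true]
    ; Treat a single string as a one-item block of alternatives
    if string? expected [expected: reduce [expected]]
    ; Assumption: a response matches when it starts with an expected string
    foreach item expected [
        if find/match response item [return true]
    ]
    false
]

print check-response "HOWDY partner" ["HELLO" "HOWDY"]
print check-response "GO AWAY" ["HELLO" "HOWDY"]
print check-response "anything" none
```

The sketch returns FALSE on a mismatch so it can be tried safely at the console; CONFIRM, as described above, fires an error instead.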

[3/4] from: holger:rebol at: 6-Feb-2001 10:51

On Tue, Feb 06, 2001 at 08:45:40AM -0800, [jeff--rebol--net] wrote:

> Hi. I'm not even sure what the core pdf says on this topic,
> but here is a brief document I put together to help anyone

<<quoted lines omitted: 3>>

> stuff below you should be well on your way to hacking custom
> protocols like a pro!

Thanks, Jeff, good stuff. Some clarifications and additional info follow. A lot of this only applies to current experimentals.

> ;-- ( voodoo )
> port-flags: system/standard/port-flags/pass-thru

Ports currently work in one of three (in the future: four) modes:

1. normal (default). In this mode ports are stream-oriented (byte streams or line streams). When the port is opened for reading the port mechanism reads all data from the port until EOF, does any line feed conversions, divides data into lines if the /lines refinement is used etc., and then presents the port contents to the user as a series. Users can do all (well, most...) operations on this type of port that can be done on a series, e.g. insert, append, pick, remove and positioning of the index. Once the port is closed, if it was opened for writing it is formatted back into a serial stream and written to the handler. The port handler is only accessed during opening and closing. All other operations are done within the port mechanism itself, on the buffered series.

2. direct. In this mode ports are stream-oriented (byte streams or line streams), just like in the default mode, but data is not buffered, except for a few characters at a time for reading (in /lines mode to accumulate lines or for crlf conversion). Any copy or pick operation reads data from the port handler, and any insert or append writes data to the port handler. There is no notion of a "file position" in this mode, i.e. you cannot "seek" (access data randomly). It is possible to skip forward, but this is really implemented by reading data from the port handler and then throwing that data away. Also note that pick is a destructive operation, i.e. subsequent pick operations read subsequent data items in a stream (as opposed to series or ports in default mode, where pick on a particular item always reads that same item). The reason for this is that the model presented by the port handler to the port mechanism is that of a sequential data stream, not that of a collection of bytes. It also means that the port mechanism (and the user) does not know how much data is available (so length? and index? will not report useful data).
Direct mode is also the only one of the stream-oriented modes that allows a port to be used in the 'wait native (although not all direct ports support wait -- mostly because of OS limitations).

3. In the future: no-cache (not implemented yet, name subject to change). This mode is a mixture of default and direct. The model the port handler presents to the port mechanism is one of a collection of bytes (e.g. a file on hard disk), but the data is not buffered. Instead any operation on the contents accesses the port handler directly. This mode supports arbitrary 'skip and 'at operations, forward and backward, without having to read the whole file. The user model is similar to the default mode, except that there are some limitations. For instance you cannot insert data in the middle of a file, only at the end. You cannot remove data unless you remove everything until the end (which really is a 'clear). You can change data only if the length of the changed data does not change etc. This is the mode you will want to use if you, e.g., want to implement your own database in REBOL, and need random access to a very large file on hard disk.

All of the modes described above impose a byte stream model or line stream model (with the /lines refinement) on the port. The common code for that (buffering where necessary, crlf conversion, splitting data into lines etc.) is done by the general port mechanism.

The API between the port mechanism and the port handler is very narrow. It basically only consists of 'open, 'close, 'read and 'write. The 'read and 'write functions here correspond to the read-io and write-io natives, i.e. they read and write sequences of bytes. They are NOT related to the 'read and 'write natives. The 'read and 'write natives are basically just shorthand for open/copy/close and open/insert/close, respectively, whereas the 'read and 'write functions in port handlers are for low-level read and write operations, corresponding to read-io and write-io.
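
The "shorthand" relationship between the 'read native and open/copy/close can be tried on an ordinary file port. This is only a sketch: the scratch file name %port-demo.txt is an arbitrary choice made for the illustration, and the comparison assumes a plain text file opened in the default (buffered) mode.

```rebol
; Scratch file; the name is arbitrary (hypothetical)
write %port-demo.txt "hello port"

; The long way: open the port, copy its contents, close it
p: open %port-demo.txt
long-way: copy p
close p

; The shorthand: READ does the open, copy and close in one call
short-way: read %port-demo.txt

; Compare what the two approaches returned
print equal? long-way short-way

delete %port-demo.txt
```

Neither form touches the handler's own 'read function directly; that is the low-level operation corresponding to read-io.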
All higher-level operations (insert, remove, pick, skip, at, copy, ...) are handled within the port mechanism and then translated into read-io and write-io in a way defined by the particular port mode. Port handlers in one of these three modes therefore never "see" any of these high-level operations, and cannot override them. Ports in default mode may have their 'read function called during opening and their 'write function during closing. Ports in other modes have 'read and 'write called when users call insert, copy, pick etc.

These three modes are user-selectable. A port handler always starts up in one of these modes. It is possible for a port handler to force direct mode by setting

    port-flags: system/standard/port-flags/direct

in the protocol definition. This is used, e.g., by SMTP to ensure that every insert (every email being sent) is forwarded directly to the port handler. Without it the port mechanism would buffer all outgoing data and only present it to the port handler once the port is closed. Having the handler override the mode takes the burden of adding the /direct refinement to 'open or 'write off the user.

4. pass-thru. This mode gives a port handler complete control over all operations. The port layer then performs no buffering or conversion whatsoever, but passes all user requests on to the port handler. This includes init, open, close, pick, copy, find, select, remove, change, insert, update. Some other functions are translated in straightforward ways, e.g. first becomes pick 1. Read and write are not called for these ports by the port mechanism itself, but can still be called if a user calls read-io or write-io directly. Pass-thru is not selectable by the user. The developer always has to set it in the port definition.
Pass-thru is typically used for "strange" port handlers that want to use the port mechanism as the API model, but which don't conform to the typical byte or line stream model provided by the buffering mechanism in the general port mechanism. For instance ports that operate on "records" often use this model. This includes odbc, oracle, udp, directory reading, and a few others, and some ports implemented in REBOL, such as pop, imap, ftp (for directory reading only) and nntp. Sometimes ports that use a byte stream model but require unusual processing (like 'crypt ports in Express) use pass-thru mode as well.

In addition to the requests described above there are a few more that a port handler can support. This includes 'get-sub-port, 'awake, 'get-modes and 'set-modes. For the moment please do not touch these. There will still be some slight changes to the way they work in the next versions.

> PORT/STATE/FLAGS: this is a setting for the port which determines how
> your port is called. You OR in your settings to what you find there
> at the end of the OPEN function.

Yes, either that, or you set the default when you define the port, as SMTP does.

> The two major types are PASS-THRU
> ports and DIRECT ports.

And default ports. Their port handler API is identical to direct ports, but read and write get called at open/close time, instead of during pick/copy/insert. See above.

> For now, stick with using PASS-THRU. Most of
> the network protocol handlers are written as PASS-THRU ports with the
> exception of the SMTP:// protocol handler.

Record-oriented ports are typically pass-thru. Stream-oriented ports (HTTP, Daytime, Finger etc.) don't override port-flags at all, i.e. they operate in default or direct mode, as selected by the user. FTP overrides the mode only for directory reading (record mode).
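
For stream-oriented ports, "as selected by the user" means choosing refinements on 'open. The sketch below uses a scratch file (the name %mode-demo.txt is arbitrary) to contrast the default buffered mode with direct mode. The second half relies on the direct-mode behaviour of pick described above, which, as noted, may only apply to some REBOL versions, so treat it as something to experiment with and not as a guarantee.

```rebol
; Scratch file with three lines; the name is arbitrary (hypothetical)
write %mode-demo.txt "one^/two^/three^/"

; Default (buffered) mode with /lines: the port looks like a series of lines
p: open/lines %mode-demo.txt
print length? p
print pick p 2
print pick p 2      ; same item again, nothing is consumed
close p

; Direct mode with /lines: according to the description above,
; each PICK reads the next line from the stream
p: open/direct/lines %mode-demo.txt
print pick p 1
print pick p 1
close p

delete %mode-demo.txt
```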

> This is a preferred way to get your tcp port started because OPEN-PROTO
> will also take care of handling proxy connections for you. Afterwards,
> the actual live tcp port is found inside port/sub-port. This is the
> port you can pass to READ-IO and WRITE-IO. SUB-PORTS are opened, by
> default, with the /lines refinement.

Yes, the main reason why read-io and write-io exist is that, until recently there was really no clean way to change a port's mode after the port had been opened. The HTTP handler wanted to use the underlying TCP port in /lines mode when parsing the HTTP header, but then wanted to transfer all other data transparently. This was only possible by having the HTTP handler bypass the port mechanism and call the 'read and 'write functions of the TCP sub-port directly by calling read-io and write-io. Now that we have get-modes and set-modes it is possible to change some of the port modes at any time, so read-io and write-io do not have to be used as often. It can still be useful in passing 'read and 'write through from a higher-level port to a lower-level port, because passing requests through that way means that data does not have to be copied between buffers as the request traverses port layers.

> Right now the protocols are synchronous. Every thing is written like
> "Send this, then wait for the result and see if it's this other

<<quoted lines omitted: 3>>

> It'll be very different approach and the protocols will all have to be
> revamped.

Async ports should not break any existing port handlers, neither our own, nor properly written third-party handlers. However, for a protocol to support asynchronous operation it would have to be rewritten. All of the support functions implemented in the root protocol (open-proto, confirm etc.) will get versions that operate in asynchronous mode, for use by asynchronous protocols.

--
Holger Kruse [holger--rebol--com]

[4/4] from: jeff:rebol at: 6-Feb-2001 13:44

Applause to Holger for the additional clarifications and deeper information on port handlers.

> Pass-thru is typically used for "strange" port handlers that
> want to use the port mechanism as the API model, but which
> don't conform to the typical byte or line stream model
> provided by the buffering mechanism in the general port
> mechanism. For instance ports that operate on "records"
> often use this model.

I recommended starting out with pass-thru ports for pedagogical reasons, because you can define each of the protocol phases and see them in action-- which lets you see the engine operation, so to speak. :) As the custom protocol writer gets deeper into it all, deciding what port paradigm is most fitting helps determine the appropriate port type from those outlined by Mr. Kruse. Getting the whole port mechanism to do most of the work for you is the name of the game. -jeff

Notes

Quoted lines have been omitted from some messages.