Mailing List Archive: 49091 messages

[REBOL] Re: Rugby / TCP woes

From: holger:rebol at: 27-Nov-2001 8:35

On Tue, Nov 27, 2001 at 02:14:47PM +0100, Petr Krenzelok wrote:
> 1) Yes, it should not be slower in any way imo! TCP connection is no magic - just
> raw packets on network. In reality, it should be just reverse - to set up a TCP
> connection, machine requesting connection sends SYN packet, remote machine sends
> SYN, ACK packet, acknowledging connection acceptance, then first machine once
> again confirms by ACK packet - so, actually setting up a TCP connection is a
> three-way process, while sending packet containing data means sending PSH, ACK
> with data, while other side is confirming with ACK, or PSH, ACK, if sending data
> too ... or something like that ...
More or less, although there are some subtle differences between TCP implementations, e.g. in the delayed-ACK strategy, in buffer sizes, and in the precise behavior of the Nagle algorithm. Also, the PSH flag is implemented very inconsistently across platforms. All of these issues can affect performance, in particular for asynchronous, full-duplex communication.

One thing you can try is "error? try [set-modes port [no-delay: true]]". This disables Nagle and in some situations can improve performance. Don't use it for high-volume streaming, though.

For more information on this and other "unexplainable" performance differences between platforms, ask the Samba developers. They could tell you some stories :-).

The REBOL network adaptation layer does not vary much by platform, certainly not enough to explain those differences. It is more likely that the performance differences are the result of a problem in the script which, together with high CPU use, interacts with differences in the paging and scheduling algorithms of different operating systems. Other software that combines high CPU use with a lot of I/O, e.g. compilers, often shows similar performance variations.
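The no-delay mode mentioned above corresponds to the standard TCP_NODELAY socket option found on BSD-style sockets. A minimal Python sketch, assuming a plain socket rather than a REBOL port; the helper name `try_disable_nagle` is hypothetical. Like the `error? try [...]` idiom, it swallows the error if a platform refuses the option:

```python
import socket


def try_disable_nagle(sock: socket.socket) -> bool:
    """Disable Nagle's algorithm on a TCP socket (hypothetical helper).

    Returns True if TCP_NODELAY was set, False if the call failed.
    Like REBOL's `error? try [...]`, failures are ignored, not raised.
    """
    try:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        return True
    except OSError:
        return False


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    ok = try_disable_nagle(s)
    # getsockopt reports a non-zero value while the option is enabled
    print(ok, s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0)
    s.close()
```

With Nagle on, small writes are held back and coalesced while earlier data is still unacknowledged. With it off, each small write can go out as its own segment. That helps latency for chatty request/response traffic but wastes bandwidth for bulk streaming, which is why the advice above excludes high-volume streams.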
> > Some other news: Rebol seems to be inconsistent in its network behaviour. I
> > tested on Linux 2.4.x libc6, but Petr runs on 2.2.16 and observes CPU eating.
> > Shouldn't the same script run the same on all platforms?
Yes, it should, for the same input. If there are indeed differences, then chances are that the input to the script (perhaps its timing) is different, and the script might have a race condition that triggers the difference in behavior. Incorrectly handled errors (e.g. from ports being closed in a different order due to different timing) could explain such problems.

Also, AFAIK Maarten uses the (undocumented) async-modes field in ports to implement async I/O. That field was never intended for use by anyone outside RT (anyone other than me, actually), and incorrect use may very well lead to undefined behavior, or to behavior that varies by platform. The reason it is undocumented is that it is very tricky to use correctly, in particular if you want asynchronous behavior in all situations (accepting a connection, connecting, reading, writing).

"CPU eating" can easily be explained by having async-modes in the wrong state for a port that is part of system/wait-list or the argument to 'wait. In particular, watch out for errors created by the other end (e.g. a closed connection), and how they are handled. An error handler which is too "global", fails to properly clean up after a port error, and leaves such a port in wait-list with async-modes in the wrong state could easily explain busy-looping.

If you have reproducible performance differences on different platforms, then the best way to track down the cause is to run tcpdump. That might also reveal why reusing a TCP connection slows things down. In our experience, reusing TCP connections significantly improves performance, and we make use of that in Express.

-- 
Holger Kruse
[holger--rebol--com]
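The point about port errors and busy-looping is not specific to REBOL. The sketch below uses Python's `selectors` module as a stand-in for waiting on a list of ports. This is an assumption for illustration only; it does not model REBOL's async-modes. The function name `serve_until_idle` is hypothetical. It shows the per-connection cleanup described above: an EOF or read error on one connection removes that connection from the wait set and closes it right there, instead of being caught by a broad handler that leaves it registered.

```python
import selectors
import socket


def serve_until_idle(sel: selectors.BaseSelector, timeout: float = 0.5) -> list:
    """Hypothetical event loop: read from registered sockets until none remain.

    A socket whose peer has closed stays permanently "readable" (recv
    returns b""). If it were left registered, select() would return
    immediately on every pass and the loop would spin: the busy-loop
    described above. So EOF and errors trigger unregister + close.
    """
    received = []
    while sel.get_map():  # stop once nothing is registered
        events = sel.select(timeout)
        if not events:
            break  # nothing happened within the timeout
        for key, _mask in events:
            sock = key.fileobj
            try:
                data = sock.recv(4096)
            except BlockingIOError:
                continue  # spurious wakeup: not an error, keep the socket
            except OSError:
                data = b""  # treat a real socket error like a closed peer
            if data:
                received.append(data)
            else:
                # Local cleanup: take it out of the wait set, then close it.
                sel.unregister(sock)
                sock.close()
    return received


if __name__ == "__main__":
    sel = selectors.DefaultSelector()
    a, b = socket.socketpair()
    b.setblocking(False)
    sel.register(b, selectors.EVENT_READ)
    a.sendall(b"hello")
    a.close()  # the peer closes: b sees the data, then EOF
    print(b"".join(serve_until_idle(sel)))
    sel.close()
```

If the `unregister`/`close` lines are dropped, the closed connection is reported readable on every call, `recv` keeps returning `b""`, and the loop never exits while using 100% CPU, which is the "CPU eating" symptom from the thread.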