Mailing List Archive: Re: Rugby / TCP woes

[REBOL] Re: Rugby / TCP woes

From: holger:rebol at: 27-Nov-2001 8:35


On Tue, Nov 27, 2001 at 02:14:47PM +0100, Petr Krenzelok wrote:
> 1) Yes, it should not be slower in any way, IMO! A TCP connection is no magic, just
> raw packets on the network. In reality, it should be just the reverse. To set up a TCP
> connection, the machine requesting the connection sends a SYN packet, the remote machine
> replies with a SYN, ACK packet acknowledging acceptance, and the first machine then
> confirms with an ACK packet. So setting up a TCP connection is a three-way
> process, while sending a packet containing data means sending PSH, ACK with the
> data, and the other side confirms with ACK, or with PSH, ACK if it is sending data too
> ... or something like that ...

More or less, although TCP stacks differ subtly from platform to platform,
e.g. in their delayed-ACK strategy, in buffer sizes and in the precise
behavior of the Nagle algorithm. Also, the PSH flag is handled very
inconsistently across platforms. All of these issues can affect
performance, in particular for asynchronous, full-duplex communication.
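To see some of these per-platform differences directly, you can query the
defaults of a fresh socket. This is a minimal sketch in Python rather than
REBOL, used only to illustrate the underlying OS socket options (SO_SNDBUF,
SO_RCVBUF, TCP_NODELAY); the function name tcp_defaults is hypothetical, and
the numbers it reports will vary by OS and kernel version.

```python
import socket

def tcp_defaults():
    """Return this platform's default settings for a new TCP socket.

    These values differ between operating systems and kernel versions,
    which is one source of cross-platform performance differences.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        return {
            # Kernel send/receive buffer sizes, in bytes.
            "SO_SNDBUF": s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF),
            "SO_RCVBUF": s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF),
            # 0 means the Nagle algorithm is enabled (the usual default).
            "TCP_NODELAY": s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY),
        }

if __name__ == "__main__":
    for name, value in tcp_defaults().items():
        print(f"{name}: {value}")
```

Running it on two machines and comparing the output is a quick first check
before reaching for tcpdump.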

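One thing you can try is "error? try [set-modes port [no-delay: true]]"
(the error? try wrapper simply discards the error if the mode cannot be
set). This disables the Nagle algorithm and in some situations, e.g. many
small request/response messages, can improve performance.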
Don't use it for high-volume streaming though. For more information on
this and other "unexplainable" performance differences between
platforms, ask the Samba developers. They could tell you some stories
:-).
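At the socket level, disabling Nagle is what the standard TCP_NODELAY option
does, so no-delay presumably maps onto it. The following is a minimal Python
sketch for comparison, not REBOL's implementation; the helper name
set_no_delay is hypothetical, and it swallows a failure the way the
error? try wrapper does.

```python
import socket

def set_no_delay(sock):
    """Try to disable the Nagle algorithm on a TCP socket.

    Like wrapping set-modes in error? try, a failure is swallowed:
    the function returns False instead of raising.
    """
    try:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        return True
    except OSError:
        return False

if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        print("no-delay set:", set_no_delay(s))
```

As with the REBOL setting, leave Nagle enabled for bulk transfers, where
coalescing small writes into full segments is exactly what you want.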

The REBOL network adaptation layer does not vary much by platform, and
certainly not enough to explain those differences. It is more likely that
they come from a problem in the script which, combined with high CPU use,
interacts with differences in the paging and scheduling algorithms of the
various operating systems. Other software that combines high CPU use with
a lot of I/O, e.g. compilers, often shows similar performance variations.

> > Some other news: Rebol seems to be inconsistent in its network behaviour. I
> > tested on Linux 2.4.x libc6, but Petr runs on 2.2.16 and observes CPU eating.
> > Shouldn't the same script run the same on all platforms?

Yes, it should, for the same input. If there are indeed differences, then
chances are that the input to the script (perhaps its timing) is
different, and the script might have a race condition that triggers the
difference in behavior. Incorrectly handled errors (e.g. from ports
being closed in a different order due to different timing) could explain
such problems.
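A generic illustration of the timing point, assuming a plain Python socket
rather than a REBOL port: depending on timing and platform, a closed
connection can surface either as an orderly EOF or as an exception, so
robust code treats both outcomes the same way. read_chunk is a hypothetical
helper name.

```python
import socket

def read_chunk(sock, size=4096):
    """Read up to `size` bytes, or return None if the peer has gone away.

    A closed connection may show up as EOF (recv returns b"") or as a
    ConnectionError such as ConnectionResetError, depending on timing.
    Mapping both to None makes the caller's cleanup path the same
    whichever one a given run happens to produce.
    """
    try:
        data = sock.recv(size)
    except ConnectionError:
        return None
    return data if data else None

if __name__ == "__main__":
    a, b = socket.socketpair()
    b.sendall(b"hello")
    print(read_chunk(a))  # data from the peer
    b.close()
    print(read_chunk(a))  # None: the peer has closed
    a.close()
```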

Also, AFAIK Maarten uses the (undocumented) async-modes field in ports
to implement async I/O. That field was never intended for use by
anyone outside RT (anyone other than me, actually), and incorrect
use may very well lead to undefined behavior, or to behavior that varies
by platform. It is undocumented because it is very tricky to use
correctly, in particular if you want asynchronous behavior in all
situations (accepting a connection, connecting, reading, writing).
"CPU eating" can easily be explained by having async-modes in the wrong
state for a port that is part of system/wait-list or of the argument to
'wait. In particular, watch out for errors created by the other end
(e.g. a closed connection), and how they are handled. An error handler
that is too "global" (one that fails to properly clean up after a port
error and leaves such a port in wait-list with async-modes in the wrong
state) could easily explain busy-looping.
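The busy-looping failure mode has a close analogue in any select-based
loop. This Python sketch uses the standard selectors module as a stand-in
for system/wait-list (an assumption for illustration only; it does not model
async-modes). A socket whose peer has closed stays permanently readable, so
the per-socket handler must unregister and close it; serve_once is a
hypothetical name.

```python
import selectors
import socket

def serve_once(sel, handle_data):
    """Run one pass of a select-based event loop.

    A socket that reports EOF or a connection error is unregistered and
    closed right here, per socket, instead of by a catch-all handler
    around the whole loop. Left registered, it would stay "readable"
    forever, every select() call would return immediately, and the
    loop would spin at full CPU.
    """
    for key, _events in sel.select(timeout=1.0):
        sock = key.fileobj
        try:
            data = sock.recv(4096)
        except BlockingIOError:
            continue  # spurious wakeup: nothing to read yet
        except ConnectionError:
            data = b""  # treat a reset like an orderly close
        if data:
            handle_data(sock, data)
        else:
            # Peer closed or errored: clean up this one socket only.
            sel.unregister(sock)
            sock.close()
```

The design choice is that cleanup lives next to the read that detected the
problem, so no error path can leave a dead socket in the selector.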

If you have reproducible performance differences on different platforms,
then the best way to track down the cause is to run tcpdump. That might
also reveal why reusing a TCP connection slows things down in your case.
In our experience, reusing TCP connections significantly improves
performance, and we make use of that in Express.
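A minimal sketch of connection reuse, in Python with a throwaway local echo
server (LineEchoHandler and send_requests are hypothetical names, and this
says nothing about how Express does it): all requests travel over one TCP
connection, so the three-way handshake happens only once. A tcpdump capture
on the loopback interface while it runs should show a single SYN exchange
followed only by data segments and ACKs.

```python
import socket
import socketserver
import threading

class LineEchoHandler(socketserver.StreamRequestHandler):
    """Echo each newline-terminated request until the client disconnects."""
    def handle(self):
        for line in self.rfile:
            self.wfile.write(line)

def send_requests(address, requests):
    """Send every request over a single, reused TCP connection.

    The handshake happens once, in create_connection(); each request
    after that costs only its data segments and the ACKs.
    """
    replies = []
    with socket.create_connection(address) as sock:
        with sock.makefile("rwb") as f:
            for req in requests:
                f.write(req + b"\n")
                f.flush()
                replies.append(f.readline().rstrip(b"\n"))
    return replies

if __name__ == "__main__":
    server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), LineEchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        print(send_requests(server.server_address, [b"one", b"two", b"three"]))
    finally:
        server.shutdown()
        server.server_close()
```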

--
Holger Kruse
[holger--rebol--com]