Mailing List Archive: Re: Rugby / TCP woes

[REBOL] Re: Rugby / TCP woes

From: koopmans:itr:ing:nl at: 27-Nov-2001 17:49


On Tuesday 27 November 2001 17:35, you wrote:
> On Tue, Nov 27, 2001 at 02:14:47PM +0100, Petr Krenzelok wrote:
> > 1) Yes, it should not be slower in any way imo! A TCP connection is no
> > magic - just raw packets on the network. In reality, it should be just
> > the reverse - to set up a TCP connection, the machine requesting it sends
> > a SYN packet, the remote machine sends a SYN, ACK packet acknowledging
> > connection acceptance, then the first machine confirms once again with
> > an ACK packet - so setting up a TCP connection is actually a three-way
> > process, while sending a packet containing data means sending PSH, ACK
> > with data, with the other side confirming with ACK, or PSH, ACK if it
> > is sending data too ... or something like that ...
>
> More or less, although there are some subtle differences, e.g. in the
> ACK delay strategy, in buffer sizes and in the precise behavior of the
> Nagle algorithm. Also, the PSH flag is implemented very inconsistently
> across platforms. All of these issues can affect performance, in
> particular for asynchronous, full-duplex communication.
>
> One thing you can try is "error? try [set-modes port [no-delay: true]]".
> This disables Nagle and in some situations can improve performance.
> Don't use it for high-volume streaming though. For more information on
> this and other "unexplainable" performance differences between
> platforms, ask the Samba developers. They could tell you some stories :-).
>
> The REBOL network adaptation layer does not vary much by platform, not
> enough to explain those differences. It is more likely that the
> performance differences are the result of a problem in the script which,
> together with high CPU use, interacts with differences in the paging and
> scheduling algorithms of different operating systems, leading to
> different performance. Other software which combines high CPU use with a
> lot of I/O, e.g. compilers, often show similar performance variations.
>

Aha. And I tested under high CPU load. Now I get it.
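
[Editor's illustration] The no-delay suggestion quoted above maps onto the standard TCP_NODELAY socket option. As a minimal sketch in Python rather than REBOL, and assuming only the standard library, the snippet below disables Nagle on a fresh TCP socket and reads the option back. The helper names `make_nodelay_socket` and `nodelay_enabled` are made up for this example and are not part of REBOL or Rugby.

```python
import socket


def make_nodelay_socket():
    """Create a TCP socket with Nagle's algorithm disabled (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY = 1 asks the stack to send small writes immediately
    # instead of coalescing them while waiting for outstanding ACKs.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


def nodelay_enabled(sock):
    """Return True if TCP_NODELAY is currently set on the socket."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
```

As the quoted advice says, this tends to help small request/response exchanges and is usually the wrong choice for high-volume streaming.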

> > > Some other news: Rebol seems to be inconsistent in its network
> > > behaviour. I tested on Linux 2.4.x libc6, but Petr runs on 2.2.16 and
> > > observes CPU eating. Shouldn't the same script run the same on all
> > > platforms?
>
> Yes, it should, for the same input. If there are indeed differences then
> chances are that the input to the script (perhaps its timing) is
> different, and the script might have a race condition that triggers the
> difference in behavior. Incorrectly handled errors (e.g. from ports
> being closed in different order due to different timing) could explain
> such problems.

There are no errors there, otherwise I'd see them? Or do you mean race
conditions internally in Rebol?

> Also, AFAIK Maarten uses the (undocumented) async-modes field in ports
> to implement async i/o. That field was never intended for use by
> anyone outside RT (anyone other than me, actually), and incorrect
> use may very well lead to undefined behavior, or to behavior that varies
> by platform. The reason it's undocumented is that it is very
> tricky to use correctly, in particular if you want asynchronous behavior
> for all situations (accepting a connection, connecting, reading,
> writing). "CPU eating" can easily be explained by having async-modes
> in the wrong state for a port that is part of system/wait-list or the
> argument to 'wait. In particular watch out for errors created by the
> other end (e.g. a closed connection), and how they are handled. An
> error handler which is too "global", fails to properly clean up
> after a port error, and leaves such a port in wait-list with async-modes
> in a wrong state, could easily explain busy-looping.
>

No. I use non-blocking connections; I dumped async-modes after version 1
(currently 4.3). So it has to do with using non-buffered, non-blocking
connections. I noticed that not reading a buffer completely on any side of a
connection may result in CPU eating on some platforms, but that is fixed now.
It feels like a state machine gone mad. No proof of course ;-)
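
[Editor's illustration] One plausible mechanism for the "CPU eating" described above is that readiness notification is level-triggered. While unread bytes remain in a socket's receive buffer, a wait/select call keeps returning immediately. The following is a minimal Python sketch (not REBOL, and not Rugby's actual code) of draining a non-blocking socket until the kernel reports that nothing is left. `drain` and `demo` are hypothetical names, and the use of `socket.socketpair()` as a stand-in for a real connection is an assumption made to keep the example self-contained.

```python
import select
import socket


def drain(sock, chunk_size=4096):
    """Read everything currently buffered on a non-blocking socket.

    Returns (data, closed); closed is True if the peer shut the connection.
    """
    chunks = []
    closed = False
    while True:
        try:
            chunk = sock.recv(chunk_size)
        except BlockingIOError:
            # Receive buffer is empty: safe to go back to waiting.
            break
        if not chunk:
            # Zero-length read means the other end closed the connection.
            closed = True
            break
        chunks.append(chunk)
    return b"".join(chunks), closed


def demo():
    """Send 10000 bytes over a local socket pair and drain the reader."""
    writer, reader = socket.socketpair()
    reader.setblocking(False)
    writer.sendall(b"x" * 10000)
    readable, _, _ = select.select([reader], [], [], 1.0)
    data, closed = drain(reader) if readable else (b"", False)
    # With the buffer drained, a zero-timeout select no longer reports the
    # socket as readable, so an event loop would block instead of spinning.
    still_readable, _, _ = select.select([reader], [], [], 0)
    writer.close()
    reader.close()
    return len(data), closed, bool(still_readable)
```

If the handler read only one chunk per wakeup and then dropped the port from its bookkeeping, or never read at all, the wait call would return again at once, which looks like a busy loop from the outside.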

> If you have reproducible performance differences on different platforms
> then the best way to track down the cause is to run tcpdump. That might
> also reveal the reason why reusing a TCP connection slows things down.
> In our experience reusing TCP connections significantly improves
> performance, and we make use of that in Express.

Except under high load on an intranet. 330/sec versus 80/sec between two
1 GHz Linux 2.4.14 machines (ReiserFS, Fast Ethernet) is a noticeable
difference.

--Maarten