
[REBOL] Re: Rugby / TCP woes

From: koopmans:itr:ing:nl at: 27-Nov-2001 17:49

On Tuesday 27 November 2001 17:35, you wrote:
> On Tue, Nov 27, 2001 at 02:14:47PM +0100, Petr Krenzelok wrote:
> > 1) Yes, it should not be slower in any way imo! TCP connection is no
> > magic - just raw packets on network. In reality, it should be just
> > reverse - to set-up TCP connection, machine requesting connection sends
> > SYN packet, remote machine sends SYN, ACK packet, acknowledging
> > connection acceptance, then first machine once again confirms by ACK
> > packet - so, actually setting-up tcp connection is three way process,
> > while sending packet containing data means sending PSH, ACK with data,
> > while other side is confirming with ACK, or PUSH, ACK, if sending data
> > too ... or something like that ...
>
> More or less, although there are some subtle differences, e.g. in the
> ACK delay strategy, in buffer sizes and in the precise behavior of the
> Nagle algorithm. Also, the PSH flag is implemented very inconsistently
> across platforms. All of these issues can affect performance, in
> particular for asynchronous, full-duplex communication.
>
> One thing you can try is "error? try [set-modes port [no-delay: true]]".
> This disables Nagle and in some situations can improve performance.
> Don't use it for high-volume streaming though. For more information on
> this and other "unexplainable" performance differences between
> platforms, ask the Samba developers. They could tell you some stories
> :-).
>
> The REBOL network adaptation layer does not vary much by platform, not
> enough to explain those differences. It is more likely that the
> performance differences are the result of a problem in the script which,
> together with high CPU use, interacts with differences in the paging and
> scheduling algorithms of different operating systems, leading to
> different performance. Other software which combine high CPU use with a
> lot of I/O, e.g. compilers, often show similar performance variations.
Aha. And I tested under high CPU load. Now I get it.
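For anyone who wants to try the no-delay suggestion outside REBOL, here is a minimal Python sketch of the same idea at the socket level. It is an illustration only, not REBOL code and not what Rugby does: the helper name disable_nagle and the loopback setup are made up for the example. Like the quoted REBOL line, it wraps the call so that a platform that refuses the option does not raise an error.

```python
import socket

def disable_nagle(sock):
    """Hypothetical helper: try to set TCP_NODELAY on a TCP socket.

    Returns True on success and False if the platform refuses the option,
    mirroring the "error? try [...]" wrapper in the quoted REBOL line.
    """
    try:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    except OSError:
        return False
    return True

# Loopback demo: a listener on an OS-chosen port and one client connection.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)

client = socket.create_connection(listener.getsockname())
server_side, _ = listener.accept()

nagle_disabled = disable_nagle(client)
# Read the option back to see what the kernel actually applied.
nodelay_flag = client.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)

client.close()
server_side.close()
listener.close()
```

As the quoted advice says, this is worth trying for small request/response exchanges and should be avoided for high-volume streaming.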
> > > Some other news: Rebol seems to be inconsistent in its network
> > > behaviour. I tested on Linux 2.4.x libc6, but Petr runs on 2.2.16 and
> > > observes CPU eating. Shouldn't the same script run the same on all
> > > platforms?
>
> Yes, it should, for the same input. If there are indeed differences then
> chances are that the input to the script (perhaps its timing) is
> different, and the script might have a race condition that triggers the
> difference in behavior. Incorrectly handled errors (e.g. from ports
> being closed in different order due to different timing) could explain
> such problems.
There are no errors there, otherwise I'd see them? Or do you mean race conditions internally in Rebol?
> Also, AFAIK Maarten uses the (undocumented) async-modes field in ports
> to implement async i/o. That field was never intended for use by
> anyone outside RT (anyone other than me, actually), and incorrect
> use may very well lead to undefined behavior, or to behavior that varies
> by platform. The reason why it's undocumented is because it is very
> tricky to use correctly, in particular if you want asynchronous behavior
> for all situations (accepting a connection, connecting, reading,
> writing). "CPU eating" can easily be explained by having async-modes
> in the wrong state for a port that is part of system/wait-list or the
> argument to 'wait. In particular watch out for errors created by the
> other end (e.g. a closed connection), and how they are handled. An
> error handler which is too "global", fails to properly clean up
> after a port error, and leaves such a port in wait-list with async-modes
> in a wrong state, could easily explain busy-looping.
No. I use non-blocking connections; I dumped async-modes after version 1 (currently 4.3). So it has to do with using non-buffered, non-blocking connections. I noticed that not reading a buffer completely on either side of a connection may result in CPU eating on some platforms, but that is fixed now. It feels like a state machine gone mad. No proof of course ;-)
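The point about reading a buffer completely maps onto a common pattern for non-blocking sockets: once a connection is reported readable, keep reading until the OS says nothing is left, and treat an empty read as the peer having closed. Below is a minimal Python sketch of that pattern, under stated assumptions: it is not Rugby's code, the function name drain is hypothetical, and a local socketpair stands in for a real network connection.

```python
import select
import socket

def drain(sock, chunk_size=4096):
    """Hypothetical helper: read everything currently buffered on a
    non-blocking socket.

    Returns (data, peer_closed). peer_closed is True once recv() returns
    an empty result, which signals that the other end closed the connection.
    """
    chunks = []
    while True:
        try:
            chunk = sock.recv(chunk_size)
        except BlockingIOError:
            # Nothing more to read right now; go back to waiting.
            return b"".join(chunks), False
        if not chunk:
            return b"".join(chunks), True
        chunks.append(chunk)

# Demo on a local socket pair: one side sends and closes, the other drains.
a, b = socket.socketpair()
b.setblocking(False)

payload = b"x" * 10000
a.sendall(payload)
a.close()

received = b""
closed = False
while not closed:
    select.select([b], [], [])          # block until b is readable
    data, closed = drain(b, chunk_size=1024)
    received += data
b.close()
```

The loop stops waiting on the socket as soon as the peer's close is seen. Leaving a closed or half-read connection in the wait set is one way an event loop can end up spinning, which matches the busy-looping scenario described in the quoted text.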
> If you have reproducible performance differences on different platforms
> then the best way to track down the cause is to run tcpdump. That might
> also reveal the reason why reusing a TCP connection slows things down.
> In our experience reusing TCP connections significantly improves
> performance, and we make use of that in Express.
Except under high load on an Intranet. 330/sec or 80/sec between two 1 GHz Linux 2.4.14 machines (ReiserFS, Fast Ethernet) is a noticeable difference.

--Maarten