Mailing List Archive: 49091 messages

Rugby / TCP woes

 [1/11] from: koopmans:itr:ing:nl at: 27-Nov-2001 12:57


Hi all, I have had some discussions about adding persistent connections to Rugby. It may be good to know that I tested this feature for 4.3, but on an Ethernet the setup time for /no-wait/direct TCP ports is so short that reusing connections is actually four times slower. I have no clue why! Some other news: Rebol seems to be inconsistent in its network behaviour. I tested on Linux 2.4.x with libc6, but Petr runs 2.2.16 and observes CPU eating. Shouldn't the same script run the same on all platforms? Under NT 4 I managed to eat up all the CPU if I didn't read all bytes on a server before closing a port. This is fixed in 4.3. My (wild) guess: Rebol puts a thin wrapper over the TCP/IP stacks of different OSes, with more and more advanced features such as non-blocking I/O. As some platforms/kernels have (different) bugs, you see this in Rebol scripts... --Maarten

 [2/11] from: petr:krenzelok:trz:cz at: 27-Nov-2001 14:14


Maarten Koopmans wrote:
> Hi all, > > I have had some discussions about adding persistent connections to Rugby. > It may be good to know that I tested this feature for 4.3 but that on an > Ethernet the setup time for /no-wait/direct TCP ports is so short that > reusing connections is actually four times slower. I have no clue why!
1) Yes, it should not be slower in any way, IMO! A TCP connection is no magic, just raw packets on the network. In reality it should be just the reverse. To set up a TCP connection, the machine requesting it sends a SYN packet, the remote machine answers with SYN, ACK to accept the connection, and the first machine confirms once more with ACK. So setting up a TCP connection is a three-way process, while sending data means sending PSH, ACK with the data, which the other side confirms with ACK (or with PSH, ACK if it is sending data too)... or something like that.

2) As for keeping the connection "alive": I thought you were already doing that with the /deferred type of Rugby connection, no? The only "problem" is that you close the port after you call get-result on the ticket. Another problem is that you have to explicitly poll the server every n seconds to see whether the result is available yet. If I understand it correctly, you use HTTP tunneling. Although I don't fully understand what is going on with HTTP tunneling, isn't it really possible to reuse the already opened channel for transfers from the server side to the client side? Scenario:
- connection to the Rugby server
- server stores the port in a block of ports
- client stores the port in a block of ports
Is that right so far? And now, what is the problem with, e.g., a chat server redistributing messages (inserting them into each port) to every registered client port, instead of letting the clients poll the server? And BTW, what does polling mean here? Is the server contacted via a new connection? I ask because I just looked at the 'http-result-available? function, and it seems to me that you only do 'copy on the port. Or is there really a reconnection to the server happening?
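Petr's push alternative can be sketched as follows. This is a Python analogue, not Rugby code: `ChatHub`, `add` and `broadcast` are hypothetical names. The server keeps a list of connected sockets (the "block of ports") and writes each new message to all of them, so clients never have to poll.

```python
import socket


class ChatHub:
    """Hypothetical push-style hub (not part of Rugby): the server keeps every
    client connection and writes each new message to all of them."""

    def __init__(self) -> None:
        self.clients = []  # the server-side "block of ports"

    def add(self, conn: socket.socket) -> None:
        self.clients.append(conn)

    def broadcast(self, message: bytes) -> int:
        """Send `message` to every client and drop clients whose write failed.
        Returns the number of clients still registered."""
        alive = []
        for conn in self.clients:
            try:
                conn.sendall(message)
                alive.append(conn)
            except OSError:
                # Peer has gone away. Over TCP the first write after the peer
                # closes may still succeed; the failure then shows up later.
                conn.close()
        self.clients = alive
        return len(alive)
```

Because `sendall` blocks, one slow client can stall the whole broadcast; a real server would more likely use non-blocking writes or a per-client queue.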
> Some other news: Rebol seems to be inconsistent in its network behaviour. I > tested on Linux 2.4.x libc6, but Petr runs on 2.2.16 and observes CPU eating. > Shouldn't the same script run the same on all platforms? >
What is more, there seems to be one strange thing happening. On the Czech versions of W95 and W98, on various setups ranging from P300 to P650, the result of Rugby communication is always the same: 40-44 seconds for a loop of 100 'echo calls. W2K (Czech) is OK. I am really curious what is slowing the communication down... -pekr-

 [3/11] from: dockimbel:free at: 27-Nov-2001 15:29


Hi Pekr, I'm just curious: could you show us your client test script? What value do you pass to 'echo? -Doc Petr Krenzelok wrote: [...]

 [4/11] from: koopmans:itr:ing:nl at: 27-Nov-2001 15:17


Hey, See below....
> Maarten Koopmans wrote: > > Hi all,
<<quoted lines omitted: 12>>
> confirming with ACK, or PUSH, ACK, if sending data too ... or something > like that ...
Exactly, although behaviour may differ on the Internet. It *is* strange. But you can see that I currently have no reason to do persistent connections in Rugby.
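The cost of the three-way handshake versus a reused connection can be checked directly on a given machine. The sketch below is a Python analogue, not Rugby: it starts a throwaway echo server on 127.0.0.1 and times `n` echo calls, first with a new TCP connection per call, then over one reused connection. `compare` and the helper names are hypothetical, and loopback timings say nothing definitive about a real Ethernet link or about REBOL's port layer.

```python
import socket
import threading
import time


def _echo_conn(conn: socket.socket) -> None:
    # Echo bytes back until the client closes its side.
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:
                return
            conn.sendall(data)


def _accept_loop(listener: socket.socket) -> None:
    # One handler thread per accepted connection (daemon: dies with the process).
    while True:
        conn, _ = listener.accept()
        threading.Thread(target=_echo_conn, args=(conn,), daemon=True).start()


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    # TCP is a byte stream: keep reading until the whole reply is in.
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("server closed the connection early")
        buf += chunk
    return buf


def compare(n: int = 100, msg: bytes = b"lala"):
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen()
    addr = listener.getsockname()
    threading.Thread(target=_accept_loop, args=(listener,), daemon=True).start()

    # 1) New connection per call: SYN / SYN-ACK / ACK every time.
    start = time.perf_counter()
    fresh = []
    for _ in range(n):
        with socket.create_connection(addr) as s:
            s.sendall(msg)
            fresh.append(_recv_exact(s, len(msg)))
    t_fresh = time.perf_counter() - start

    # 2) One persistent connection: the handshake is paid only once.
    start = time.perf_counter()
    reused = []
    with socket.create_connection(addr) as s:
        for _ in range(n):
            s.sendall(msg)
            reused.append(_recv_exact(s, len(msg)))
    t_reused = time.perf_counter() - start

    return fresh, reused, t_fresh, t_reused


if __name__ == "__main__":
    _, _, t_fresh, t_reused = compare()
    print(f"new connection per call: {t_fresh:.4f}s, reused: {t_reused:.4f}s")
```

Running both measurements on the machines in question shows whether connection setup or something else dominates the round-trip time.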
> 2) as for keeping connection "alive". I thought you are doing so with > /deferred type of Rugby connection, no? The only one "problem" is - you
<<quoted lines omitted: 7>>
> - server stores port in a block of port > - client stores port in a block of port
Yes.
> Is that right so far? And now - what is the problem for e.g. for chat > server, to redistribute (insert-to-port) messages to each client port
<<quoted lines omitted: 3>>
> 'http-result-available? function, and it seems to me, that you only do > 'copy on port, or is there really reconnection happening to the server?
Yes. result-available? reads data from the client port (i.e. from the *client* TCP stack) and does *not* poll the server. Once it sees that it has all the data (= the return data), it closes the underlying port when you read the result in the application. So all requests use the same port for request and return. In fact, wait-for-result is just: until [result-available? index get-result index] Then the mysterious httpr (r for Rugby) is just a copy of RT's http protocol that does not wait for the first two lines of the return header. Truly non-blocking. Drawback: you lose automatic HTTP redirects and such. You get the complete HTTP response back, so you can implement that yourself.
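The non-blocking check Maarten describes (read whatever the local TCP stack already holds, never contact the server, and report whether the whole reply has arrived) could look roughly like this in Python. `PendingResult`, `result_available` and `wait_for_result` are hypothetical names loosely modeled on the Rugby functions mentioned above, and using Content-Length to decide when an HTTP reply is complete is an assumption, not a description of httpr.

```python
import select
import socket
import time


class PendingResult:
    """Hypothetical stand-in for one outstanding request (not Rugby's API)."""

    def __init__(self, sock: socket.socket) -> None:
        sock.setblocking(False)
        self.sock = sock
        self.buf = b""
        self.closed = False  # True once the peer has finished sending

    def result_available(self) -> bool:
        # Copy whatever is already buffered locally; this never contacts the server.
        while not self.closed:
            try:
                chunk = self.sock.recv(4096)
            except BlockingIOError:
                break  # nothing more buffered right now
            if not chunk:
                self.closed = True
                break
            self.buf += chunk
        return self._complete()

    def _complete(self) -> bool:
        head, sep, body = self.buf.partition(b"\r\n\r\n")
        if not sep:
            return False  # header not finished yet
        for line in head.split(b"\r\n")[1:]:  # skip the status line
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                return len(body) >= int(value.strip())
        return self.closed  # no length given: complete when the peer closes

    def get_result(self) -> bytes:
        self.sock.close()  # done with this port once the result is read
        return self.buf


def wait_for_result(p: PendingResult, timeout: float = 5.0) -> bytes:
    deadline = time.monotonic() + timeout
    while not p.result_available():
        if p.closed:
            raise ConnectionError("peer closed before the full reply arrived")
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError("no complete reply before the deadline")
        # Block in select() until more bytes arrive instead of spinning the CPU.
        select.select([p.sock], [], [], remaining)
    return p.get_result()
```

The `select()` call in the loop is the important design choice: a loop that only calls a non-blocking check, with no wait in between, keeps the CPU busy for as long as the reply takes to arrive.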
> > Some other news: Rebol seems to be inconsistent in its network behaviour. > > I tested on Linux 2.4.x libc6, but Petr runs on 2.2.16 and observes CPU
<<quoted lines omitted: 4>>
> 100 'echos. W2Kcz are OK. I am really curious, what is slowing the > communication down ...
Yes... platform-dependent behaviour. I don't get it either. --Maarten

 [5/11] from: cyphre:seznam:cz at: 27-Nov-2001 16:42


Hi Nenad and all,

I have found the same problem on my WIN98SE (Czech) configuration. Here is how I tested it. Open two REBOL consoles, for example SERVER and CLIENT. Type in the SERVER console:

>> do %rugby.r
>> serve/with [echo] tcp://:9001

This should run a Rugby server on localhost port 9001 with the 'echo service available. Then type in the CLIENT console:

>> do %rugby.r
>> do get-rugby-service tcp://localhost:9001 ;echo is available locally
>> s: now/time/precise loop 100 [echo "lala"] now/time/precise - s
== 0:00:40.15

This is my result on a 633 MHz Celeron running WIN98SE (Czech) + REBOL/View 1.2.1.3.1. I can't say whether Rugby is faster on other systems, since I don't have other systems available to test. But I believe Pekr's and Maarten's reports that the result on W2K and Linux is about 100 echoes per second or so. I would like to know the results on other platforms, so please try this little test on your machines and let us know... regards, Cyphre

 [6/11] from: holger:rebol at: 27-Nov-2001 8:35


On Tue, Nov 27, 2001 at 02:14:47PM +0100, Petr Krenzelok wrote:
> 1) Yes, it should not be slower in any way imo! TCP connection is no magic - just > raw packets on network. In reality, it should be just reverse - to set-up TCP
<<quoted lines omitted: 4>>
> data, while other side is confirming with ACK, or PUSH, ACK, if sending data too > ... or something like that ...
More or less, although there are some subtle differences, e.g. in the ACK delay strategy, in buffer sizes and in the precise behavior of the Nagle algorithm. Also, the PSH flag is implemented very inconsistently across platforms. All of these issues can affect performance, in particular for asynchronous, full-duplex communication. One thing you can try is "error? try [set-modes port [no-delay: true]]". This disables Nagle and in some situations can improve performance. Don't use it for high-volume streaming, though. For more information on this and other "unexplainable" performance differences between platforms, ask the Samba developers. They could tell you some stories :-). The REBOL network adaptation layer does not vary much by platform, not enough to explain those differences. It is more likely that the performance differences result from a problem in the script which, together with high CPU use, interacts with differences in the paging and scheduling algorithms of different operating systems. Other software that combines high CPU use with a lot of I/O, e.g. compilers, often shows similar performance variations.
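At the socket level, disabling Nagle means setting the TCP_NODELAY option. The Python sketch below assumes that this is what REBOL's `no-delay` mode corresponds to; it is not a description of REBOL internals, and `open_no_delay` and `nagle_disabled` are hypothetical helper names. Like the `error? try [...]` wrapper above, it ignores a failure to set the option.

```python
import socket


def open_no_delay(host: str, port: int, timeout: float = 5.0) -> socket.socket:
    """Connect and disable the Nagle algorithm on the new socket."""
    sock = socket.create_connection((host, port), timeout=timeout)
    try:
        # Small writes go out immediately instead of being held back
        # while earlier data is still unacknowledged.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    except OSError:
        pass  # best effort, like `error? try [...]`
    return sock


def nagle_disabled(sock: socket.socket) -> bool:
    # Some platforms report a flag value other than 1, so test for non-zero.
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
```

The trade-off is latency versus packet count: with Nagle off, many tiny writes become many tiny packets, which is why it suits small request/response exchanges rather than high-volume streaming. One well-known pattern worth looking for in a tcpdump trace is a request written in two pieces on a reused connection, where Nagle holds back the second piece until the peer's (possibly delayed) ACK arrives.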
> > Some other news: Rebol seems to be inconsistent in its network behaviour. I > > tested on Linux 2.4.x libc6, but Petr runs on 2.2.16 and observes CPU eating. > > Shouldn't the same script run the same on all platforms?
Yes, it should, for the same input. If there are indeed differences, then chances are that the input to the script (perhaps its timing) is different, and the script might have a race condition that triggers the difference in behavior. Incorrectly handled errors (e.g. from ports being closed in a different order due to different timing) could explain such problems. Also, AFAIK Maarten uses the (undocumented) async-modes field in ports to implement async I/O. That field was never intended for use by anyone outside RT (anyone other than me, actually), and incorrect use may very well lead to undefined behavior, or to behavior that varies by platform. The reason it is undocumented is that it is very tricky to use correctly, in particular if you want asynchronous behavior in all situations (accepting a connection, connecting, reading, writing). "CPU eating" can easily be explained by having async-modes in the wrong state for a port that is part of system/wait-list or of the argument to 'wait. In particular, watch out for errors created by the other end (e.g. a closed connection) and how they are handled. An error handler which is too "global", fails to properly clean up after a port error, and leaves such a port in wait-list with async-modes in a wrong state could easily explain busy-looping. If you have reproducible performance differences on different platforms, then the best way to track down the cause is to run tcpdump. That might also reveal why reusing a TCP connection slows things down. In our experience, reusing TCP connections significantly improves performance, and we make use of that in Express. -- Holger Kruse [holger--rebol--com]
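The busy-looping scenario is easy to reproduce with any readiness-based event loop: a socket whose peer has closed stays permanently "readable" (every read returns end-of-file), so if it is left registered, the wait returns immediately on every pass. Below is a Python `selectors` sketch of the cleanup described above, where the registered sockets play a role similar to system/wait-list (an analogy, not REBOL's implementation) and `poll_once` is a hypothetical name. Errors are handled per port, and a closed or reset port is unregistered before it is closed.

```python
import selectors
import socket


def poll_once(sel: selectors.BaseSelector, on_data, timeout: float = 1.0) -> int:
    """Handle one round of ready sockets; return how many were ready."""
    events = sel.select(timeout=timeout)
    for key, _ in events:
        sock = key.fileobj
        try:
            data = sock.recv(4096)
        except ConnectionError:
            data = b""  # a reset peer is handled like a closed one, locally
        if data:
            on_data(sock, data)
        else:
            # Peer closed: a socket at EOF is reported readable on every call,
            # so leaving it registered turns the loop into a busy loop.
            sel.unregister(sock)
            sock.close()
    return len(events)
```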

 [7/11] from: koopmans:itr:ing:nl at: 27-Nov-2001 17:49


On Tuesday 27 November 2001 17:35, you wrote:
> On Tue, Nov 27, 2001 at 02:14:47PM +0100, Petr Krenzelok wrote: > > 1) Yes, it should not be slower in any way imo! TCP connection is no
<<quoted lines omitted: 24>>
> different performance. Other software which combine high CPU use with a > lot of I/O, e.g. compilers, often show similar performance variations.
Aha. And I tested under high CPU load. Now I get it.
> > > Some other news: Rebol seems to be inconsistent in its network > > > behaviour. I tested on Linux 2.4.x libc6, but Petr runs on 2.2.16 and
<<quoted lines omitted: 6>>
> being closed in different order due to different timing) could explain > such problems.
There are no errors there; otherwise I'd see them, wouldn't I? Or do you mean race conditions inside Rebol itself?
> Also, AFAIK Maarten uses the (undocumented) async-modes field in ports > to implement async i/o. That field was never intended for use by
<<quoted lines omitted: 10>>
> after a port error, and leaves such a port in wait-list with async-modes > in a wrong state, could easily explain busy-looping.
No. I use non-blocking connections; I dropped async-modes after version 1 (the current version is 4.3). So it has to do with using non-buffered, non-blocking connections. I noticed that not reading a buffer completely on either side of a connection may result in CPU eating on some platforms, but that is fixed now. It feels like a state machine gone mad. No proof, of course ;-)
> If you have reproducible performance differences on different platforms > then the best way to track down the cause is to run tcpdump. That might > also reveal the reason why reusing a TCP connection slows things down. > In our experience reusing TCP connections significantly improves > performance, and we make use of that in Express.
Except under high load on an intranet: 330/sec versus 80/sec between two 1 GHz Linux 2.4.14 machines (ReiserFS, Fast Ethernet) is a noticeable difference. --Maarten

 [8/11] from: petr:krenzelok:trz:cz at: 27-Nov-2001 18:06


[holger--rebol--com] wrote:
>On Tue, Nov 27, 2001 at 02:14:47PM +0100, Petr Krenzelok wrote: >>1) Yes, it should not be slower in any way imo! TCP connection is no magic - just
<<quoted lines omitted: 53>>
>In our experience reusing TCP connections significantly improves >performance, and we make use of that in Express.
Holger, guru stuff. Very interesting. I ran the Ethereal packet monitor but saw nothing, just some strange packets sent to the DNS server. You described Rebol internals without describing them at all :-) The nicest part is the port for Holger (tm) :-) It made my day... BTW: has anything changed in the networking code in IOS yet, compared to, e.g., View? Or will we have to wait until 3.0? -pekr-

 [9/11] from: petr:krenzelok:trz:cz at: 27-Nov-2001 18:08


Maarten Koopmans wrote:
>Hey, >See below....
<<quoted lines omitted: 17>>
>non-blocking. Drawback: you lose automatic http redirects and such. You get >the complete http response back, so you can implement that yourself.
Ehm, then you do reuse already opened ports. So how does your approach here differ from the persistent connections you had in mind? Thanks, -pekr-

 [10/11] from: greggirwin:mindspring at: 27-Nov-2001 11:13


Hi Cyphre, << I would like to know the results on other platforms so please try this little test on your machines and let us know... >> W2K on a P900 yields:
>> s: now/time/precise loop 100 [echo "lala"] now/time/precise - s
== 0:00:00.982
>> s: now/time/precise loop 100 [echo "lala"] now/time/precise - s
== 0:00:00.991
>> s: now/time/precise loop 100 [echo "lala"] now/time/precise - s
== 0:00:00.951
--Gregg

 [11/11] from: ammonjohnson:y:ahoo at: 27-Nov-2001 15:47


I didn't see the start of this thread, but jumping in the middle, I get:
>> s: now/time/precise loop 100 [echo "lala"] now/time/precise - s
** Script Error: echo expected target argument of type: file none ** Where: do-boot ** Near: echo "lala"
From REBOL/Link 0.9.7.3.1
Enjoy!! Ammon

Notes
  • Quoted lines have been omitted from some messages.