Rugby / TCP woes
[1/11] from: koopmans::itr::ing::nl at: 27-Nov-2001 12:57
Hi all,
I have had some discussions about adding persistent connections to Rugby.
It may be good to know that I tested this feature for 4.3, but on an
Ethernet the setup time for /no-wait/direct TCP ports is so short that
reusing connections is actually four times slower. I have no clue why!
Some other news: Rebol seems to be inconsistent in its network behaviour. I
tested on Linux 2.4.x libc6, but Petr runs on 2.2.16 and observes CPU eating.
Shouldn't the same script run the same on all platforms?
Under NT 4 I managed to eat up all the CPU if I didn't read all bytes on a
server before closing a port. This is fixed in 4.3.
My (wild) guess: Rebol puts a thin wrapper around the TCP/IP stacks of the
different OSes, with more and more advanced features such as non-blocking
I/O. As some platforms/kernels have (different) bugs, you see this in Rebol
scripts...
--Maarten
[2/11] from: petr:krenzelok:trz:cz at: 27-Nov-2001 14:14
Maarten Koopmans wrote:
> Hi all,
>
> I have had some discussions about adding persistent connections to Rugby.
> It may be good to know that I tested this feature for 4.3 but that on an
> Ethernet the setup time for /no-wait/direct TCP ports is so short that
> reusing connections is actually four times slower. I have no clue why!
1) Yes, it should not be slower in any way, IMO! A TCP connection is no magic,
just raw packets on the network. In reality it should be just the reverse. To
set up a TCP connection, the machine requesting the connection sends a SYN
packet, the remote machine replies with SYN, ACK to acknowledge that it accepts
the connection, and then the first machine confirms once again with an ACK. So
setting up a TCP connection is a three-way process, while sending data means
sending a PSH, ACK packet carrying the data, which the other side confirms with
an ACK (or with PSH, ACK if it is sending data too)... or something like that...
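To make the comparison concrete, here is a minimal Python sketch (not Rugby or REBOL code) that starts a local echo server and performs the same number of round trips twice: once opening a new TCP connection per request, so every request pays the SYN / SYN-ACK / ACK handshake, and once reusing a single connection. The helper names, the OS-assigned port and the "lala" payload are illustrative assumptions. Timings will vary with platform and load, which is exactly what this thread is about, so the sketch only prints them rather than predicting a winner.

```python
import socket
import threading
import time

def start_echo_server():
    """Start a threaded echo server on an OS-assigned localhost port; return the port."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", 0))
    srv.listen(16)

    def handle(conn):
        with conn:
            while True:
                data = conn.recv(4096)
                if not data:              # peer closed the connection
                    return
                conn.sendall(data)

    def accept_loop():
        while True:
            conn, _ = srv.accept()
            threading.Thread(target=handle, args=(conn,), daemon=True).start()

    threading.Thread(target=accept_loop, daemon=True).start()
    return srv.getsockname()[1]

def recv_exact(sock, n):
    """Read exactly n bytes (TCP may deliver them in several pieces)."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed early")
        buf += chunk
    return buf

def echo_new_connection_each_time(port, msg, n):
    """Full three-way handshake (and teardown) for every request."""
    replies = []
    for _ in range(n):
        with socket.create_connection(("127.0.0.1", port)) as s:
            s.sendall(msg)
            replies.append(recv_exact(s, len(msg)))
    return replies

def echo_reused_connection(port, msg, n):
    """One handshake, then n request/reply round trips on the same connection."""
    replies = []
    with socket.create_connection(("127.0.0.1", port)) as s:
        for _ in range(n):
            s.sendall(msg)
            replies.append(recv_exact(s, len(msg)))
    return replies

if __name__ == "__main__":
    port = start_echo_server()
    for fn in (echo_new_connection_each_time, echo_reused_connection):
        start = time.perf_counter()
        fn(port, b"lala", 100)
        print(fn.__name__, round(time.perf_counter() - start, 3), "s")
```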
2) As for keeping the connection "alive": I thought you were doing that with the
/deferred type of Rugby connection, no? The only "problem" is that you close the
port after get-result with the ticket. Another problem is that you have to
explicitly poll the server every n seconds to see whether the result is already
available. If I understand it correctly, you use HTTP tunneling. Although I don't
fully understand what is going on with HTTP tunneling, isn't it possible to reuse
the already opened channel for transfers from the server side to the client side?
Scenario:
- connection to Rugby server
- server stores port in a block of ports
- client stores port in a block of ports
Is that right so far? And now, what is the problem for e.g. a chat server to
redistribute (insert-to-port) messages to each registered client port, instead
of letting the clients poll the server? And btw, what does polling mean here? Is
the server contacted with a new connection? Because I just looked at the
'http-result-available? function, and it seems to me that you only do 'copy on
the port, or is there really a reconnection to the server happening?
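The push model in the scenario above (the server keeps a "block of ports" and inserts each message into every one of them, instead of clients polling) can be sketched in a few lines of Python. `BroadcastHub` and its methods are hypothetical names for illustration, not part of Rugby; connections that fail on send are closed and dropped so a dead client does not stay in the block forever.

```python
import socket

class BroadcastHub:
    """Hypothetical chat hub: keeps the server side of every client connection
    (the 'block of ports') and pushes each message to all of them."""

    def __init__(self):
        self.clients = []

    def register(self, conn):
        """Remember a connected client's port."""
        self.clients.append(conn)

    def broadcast(self, message):
        """Insert the message into every registered port; drop ports that fail.
        Returns how many clients the message was handed to."""
        alive = []
        for conn in self.clients:
            try:
                conn.sendall(message)
                alive.append(conn)
            except OSError:
                conn.close()              # peer went away; clean it up
        self.clients = alive
        return len(alive)

if __name__ == "__main__":
    # socketpair() stands in for accepted TCP connections in this demo.
    pairs = [socket.socketpair() for _ in range(3)]
    hub = BroadcastHub()
    for server_side, _ in pairs:
        hub.register(server_side)
    hub.broadcast(b"hello\n")
    for _, client_side in pairs:
        print(client_side.recv(64))
```

With this design the clients simply read from their own already-open ports; nothing reconnects to the server.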
> Some other news: Rebol seems to be inconsistent in its network behaviour. I
> tested on Linux 2.4.x libc6, but Petr runs on 2.2.16 and observes CPU eating.
> Shouldn't the same script run the same on all platforms?
>
What is more, there seems to be one strange thing happening. On the Czech
versions of W95 and W98, on various set-ups ranging from P300 to P650, the result
of Rugby communication is always the same: 40-44 sec for a loop of 100 'echos.
W2Kcz is OK. I am really curious what is slowing the communication down ...
-pekr-
[3/11] from: dockimbel:free at: 27-Nov-2001 15:29
Hi Pekr,
I'm just curious, could you show us your client test script? What value do you pass
to 'echo?
-Doc
Petr Krenzelok wrote:
[...]
[4/11] from: koopmans:itr:ing:nl at: 27-Nov-2001 15:17
Hey,
See below....
> Maarten Koopmans wrote:
> > Hi all,
<<quoted lines omitted: 12>>
> confirming with ACK, or PUSH, ACK, if sending data too ... or something
> like that ...
Exactly, although behaviour may differ on the Internet. It *is* strange. But
you see that I currently have no reason to add persistent connections to Rugby.
> 2) as for keeping connection "alive". I thought you are doing so with
> /deferred type of Rugby connection, no? The only one "problem" is - you
<<quoted lines omitted: 7>>
> - server stores port in a block of port
> - client stores port in a block of port
Yes.
> Is that right so far? And now - what is the problem for e.g. for chat
> server, to redistribute (insert-to-port) messages to each client port
<<quoted lines omitted: 3>>
> 'http-result-available? function, and it seems to me, that you only do
> 'copy on port, or is there really reconnection happening to the server?
Yes. result-available? reads data from the client port (the *client* TCP
stack) and does *not* poll the server. If it sees that it has all the data
(i.e. the return data), it closes the underlying port once you have read it in
the application.
So: all requests use the same port for request/return. In fact,
wait-for-result is just:
until [result-available? index get-result index]
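The idea Maarten describes, checking only what has already arrived in the local TCP stack and never contacting the server again, can be sketched in Python with a non-blocking socket. Rugby's wire format is not described in this thread, so the sketch assumes a hypothetical 4-byte big-endian length prefix to decide when a reply is complete; `PendingResult` and `wait_for_result` are illustrative names. Unlike a bare spin loop, `wait_for_result` sleeps in `select()` between checks so it does not eat the CPU.

```python
import select
import socket
import struct

class PendingResult:
    """Hypothetical client-side ticket for one outstanding request.
    Assumes replies are framed as a 4-byte big-endian length plus payload."""

    def __init__(self, sock):
        sock.setblocking(False)
        self.sock = sock
        self.buf = b""
        self.closed = False

    def result_available(self):
        """Copy whatever is already in the local TCP receive buffer.
        Never blocks and never sends anything to the server."""
        while not self.closed:
            try:
                chunk = self.sock.recv(4096)
            except BlockingIOError:
                break                      # nothing more buffered right now
            if not chunk:
                self.closed = True         # server closed its side
            else:
                self.buf += chunk
        if len(self.buf) < 4:
            return False
        (length,) = struct.unpack(">I", self.buf[:4])
        return len(self.buf) >= 4 + length

    def get_result(self):
        """Return the payload and close this request's port."""
        (length,) = struct.unpack(">I", self.buf[:4])
        self.sock.close()
        return self.buf[4:4 + length]

def wait_for_result(ticket, timeout=5.0):
    """Check result_available, sleeping in select() between checks."""
    while not ticket.result_available():
        if ticket.closed:
            raise ConnectionError("server closed before the full reply arrived")
        ready, _, _ = select.select([ticket.sock], [], [], timeout)
        if not ready:
            raise TimeoutError("no reply within the timeout")
    return ticket.get_result()

if __name__ == "__main__":
    server, client = socket.socketpair()   # stands in for a real TCP connection
    payload = b"echo: lala"
    server.sendall(struct.pack(">I", len(payload)) + payload)
    print(wait_for_result(PendingResult(client)))
```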
Then the mysterious httpr (r for rugby) is just a copy of RT's http protocol
that does not wait for the first two lines of the return header. Truly
non-blocking. Drawback: you lose automatic http redirects and such. You get
the complete http response back, so you can implement that yourself.
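Implementing redirects yourself from the complete raw response mostly means splitting off the status line and headers and looking for a 3xx status with a `Location` header. A minimal Python sketch, assuming a complete HTTP/1.x response with CRLF line endings (function names are hypothetical, not httpr's):

```python
def parse_http_response(raw):
    """Split a complete raw HTTP/1.x response into (status, headers, body).
    Header names are lower-cased; assumes CRLF line endings."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    status = int(lines[0].split()[1])      # "HTTP/1.0 302 Found" -> 302
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status, headers, body

def redirect_target(raw):
    """Return the Location to follow for a 3xx response, else None."""
    status, headers, _ = parse_http_response(raw)
    if 300 <= status < 400:
        return headers.get("location")
    return None
```

A caller would open a new request to the returned URL and repeat, ideally with a cap on the number of hops.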
> > Some other news: Rebol seems to be inconsistent in its network behaviour.
> > I tested on Linux 2.4.x libc6, but Petr runs on 2.2.16 and observes CPU
<<quoted lines omitted: 4>>
> 100 'echos. W2Kcz are OK. I am really curious, what is slowing the
> communication down ...
Yes... platform-dependent behaviour. I don't get it either.
--Maarten
[5/11] from: cyphre:seznam:cz at: 27-Nov-2001 16:42
Hi Nenad and all,
I have found the same problem on my WIN98SEcz configuration. So here is the
way I tested it:
Open two Rebol consoles, for example SERVER and CLIENT.
Type in the SERVER console:
do %rugby.r
serve/with [echo] tcp://:9001
This should run a Rugby server on localhost port 9001 with the 'echo service
available...
Then type in the CLIENT console:
do %rugby.r
do get-rugby-service tcp://localhost:9001 ;echo is available locally
s: now/time/precise loop 100 [echo "lala"] now/time/precise - s
== 0:00:40.15 ;this is my result on a Celeron 633MHz running WIN98SEcz +
REBOL/View 1.2.1.3.1
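The console one-liner above is a plain wall-clock benchmark. For comparing machines it helps to turn the elapsed time into calls per second: 100 calls in 40.15 s is about 2.5 calls/s, while 100 calls in roughly one second is about 100 calls/s. A small Python analogue of the harness (here `fn` stands in for a remote `echo` call; the helper name is hypothetical):

```python
import time

def time_loop(n, fn, *args):
    """Run fn(*args) n times; return (elapsed seconds, calls per second).
    Analogue of: s: now/time/precise loop 100 [echo "lala"] now/time/precise - s"""
    start = time.perf_counter()
    for _ in range(n):
        fn(*args)
    elapsed = time.perf_counter() - start
    rate = n / elapsed if elapsed > 0 else float("inf")
    return elapsed, rate
```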
I can't confirm that Rugby is faster on other systems, since I don't have
other systems available to test.
But I believe Pekr's and Maarten's reports that on W2K and Linux the result
is about 100 echos per second or so.
I would like to know the results on other platforms, so please try this
little test on your machines and let us know...
regards,
Cyphre
[6/11] from: holger:rebol at: 27-Nov-2001 8:35
On Tue, Nov 27, 2001 at 02:14:47PM +0100, Petr Krenzelok wrote:
> 1) Yes, it should not be slower in any way imo! TCP connection is no magic - just
> raw packets on network. In reality, it should be just reverse - to set-up TCP
<<quoted lines omitted: 4>>
> data, while other side is confirming with ACK, or PUSH, ACK, if sending data too
> ... or something like that ...
More or less, although there are some subtle differences, e.g. in the
ACK delay strategy, in buffer sizes and in the precise behavior of the
Nagle algorithm. Also, the PSH flag is implemented very inconsistently
across platforms. All of these issues can affect performance, in
particular for asynchronous, full-duplex communication.
One thing you can try is "error? try [set-modes port [no-delay: true]]".
This disables Nagle and in some situations can improve performance.
Don't use it for high-volume streaming though. For more information on
this and other "unexplainable" performance differences between
platforms, ask the Samba developers. They could tell you some stories
:-).
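For readers outside REBOL, the equivalent of Holger's `error? try [set-modes port [no-delay: true]]` in Python is setting the standard `TCP_NODELAY` socket option, wrapped so that a platform that refuses it is tolerated rather than fatal. A minimal sketch (`disable_nagle` is a hypothetical helper name):

```python
import socket

def disable_nagle(sock):
    """Best-effort TCP_NODELAY (turns off the Nagle algorithm).
    Returns True if the option was set, False if the platform refused it."""
    try:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        return True
    except OSError:
        return False
```

As Holger notes, this suits small request/reply exchanges; for bulk streaming, leaving Nagle on lets the stack coalesce small writes.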
The REBOL network adaptation layer does not vary much by platform, not
enough to explain those differences. It is more likely that the
performance differences are the result of a problem in the script which,
together with high CPU use, interacts with differences in the paging and
scheduling algorithms of different operating systems, leading to
different performance. Other software which combines high CPU use with a
lot of I/O, e.g. compilers, often shows similar performance variations.
> > Some other news: Rebol seems to be inconsistent in its network behaviour. I
> > tested on Linux 2.4.x libc6, but Petr runs on 2.2.16 and observes CPU eating.
> > Shouldn't the same script run the same on all platforms?
Yes, it should, for the same input. If there are indeed differences then
chances are that the input to the script (perhaps its timing) is
different, and the script might have a race condition that triggers the
difference in behavior. Incorrectly handled errors (e.g. from ports
being closed in different order due to different timing) could explain
such problems.
Also, AFAIK Maarten uses the (undocumented) async-modes field in ports
to implement async i/o. That field was never intended for use by
anyone outside RT (anyone other than me, actually), and incorrect
use may very well lead to undefined behavior, or to behavior that varies
by platform. The reason why it's undocumented is that it is very
tricky to use correctly, in particular if you want asynchronous behavior
for all situations (accepting a connection, connecting, reading,
writing). "CPU eating" can easily be explained by having async-modes
in the wrong state for a port that is part of system/wait-list or the
argument to 'wait. In particular watch out for errors created by the
other end (e.g. a closed connection), and how they are handled. An
error handler which is too "global", fails to properly clean up
after a port error, and leaves such a port in wait-list with async-modes
in a wrong state, could easily explain busy-looping.
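Since async-modes is undocumented, the following is not REBOL code; it shows the general failure mode Holger describes using Python's `select()`, which plays a role similar to `wait` on a list of ports. A socket whose peer has closed stays "ready" forever, so if an error handler swallows the error but leaves the socket in the wait list, every pass returns immediately and the loop spins. The sketch handles errors per port: a socket that reports EOF or an error is closed and removed from the list. `serve_once` is a hypothetical name.

```python
import select
import socket

def serve_once(wait_list, handle, timeout=1.0):
    """One pass of a select() loop over wait_list.
    A socket that reports EOF or an error is closed AND removed from wait_list;
    left in place, select() would report it ready on every pass (a busy loop)."""
    readable, _, errored = select.select(wait_list, [], wait_list, timeout)
    for sock in set(readable) | set(errored):
        try:
            data = sock.recv(4096)
        except OSError:
            data = b""                 # treat a port error like a closed peer
        if data:
            handle(sock, data)
        else:
            wait_list.remove(sock)     # clean up this port only
            sock.close()

if __name__ == "__main__":
    a, b = socket.socketpair()
    c, d = socket.socketpair()
    wait_list = [a, c]
    b.sendall(b"ping")
    d.close()                          # the other end goes away
    serve_once(wait_list, lambda s, data: print("got", data))
    print(len(wait_list), "socket(s) still waited on")
```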
If you have reproducible performance differences on different platforms
then the best way to track down the cause is to run tcpdump. That might
also reveal the reason why reusing a TCP connection slows things down.
In our experience reusing TCP connections significantly improves
performance, and we make use of that in Express.
--
Holger Kruse
[holger--rebol--com]
[7/11] from: koopmans:itr:ing:nl at: 27-Nov-2001 17:49
On Tuesday 27 November 2001 17:35, you wrote:
> On Tue, Nov 27, 2001 at 02:14:47PM +0100, Petr Krenzelok wrote:
> > 1) Yes, it should not be slower in any way imo! TCP connection is no
<<quoted lines omitted: 24>>
> different performance. Other software which combine high CPU use with a
> lot of I/O, e.g. compilers, often show similar performance variations.
Aha. And I tested under high CPU load. Now I get it.
> > > Some other news: Rebol seems to be inconsistent in its network
> > > behaviour. I tested on Linux 2.4.x libc6, but Petr runs on 2.2.16 and
<<quoted lines omitted: 6>>
> being closed in different order due to different timing) could explain
> such problems.
There are no errors there, otherwise I'd see them? Or do you mean race
conditions internally in Rebol?
> Also, AFAIK Maarten uses the (undocumented) async-modes field in ports
> to implement async i/o. That field was never intended for use by
<<quoted lines omitted: 10>>
> after a port error, and leaves such a port in wait-list with async-modes
> in a wrong state, could easily explain busy-looping.
No. I use non-blocking connections; I dropped async-modes after version 1
(Rugby is currently at 4.3). So it has to do with using non-buffered, non-blocking
connections. I noticed that not reading a buffer completely on any side of a
connection may result in CPU eating on some platforms, but that is fixed now.
It feels like a state machine gone mad. No proof of course ;-)
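The workaround Maarten mentions, reading everything still buffered before closing, looks roughly like this in Python. It is a hypothetical helper, not Rugby's code. One possibly related detail: on many TCP stacks, closing a socket while unread data sits in its receive buffer sends a RST instead of a normal FIN, so draining first also keeps the shutdown orderly. The sketch assumes the peer has already finished sending; otherwise it returns as soon as nothing more is buffered.

```python
import socket

def drain_and_close(sock, chunk_size=4096):
    """Read and discard anything still buffered, then close the socket.
    Returns the number of bytes discarded."""
    sock.setblocking(False)
    discarded = 0
    try:
        while True:
            chunk = sock.recv(chunk_size)
            if not chunk:
                break                     # orderly EOF from the peer
            discarded += len(chunk)
    except BlockingIOError:
        pass                              # nothing more buffered right now
    finally:
        sock.close()
    return discarded

if __name__ == "__main__":
    a, b = socket.socketpair()
    b.sendall(b"unread reply data")
    b.close()
    print(drain_and_close(a), "bytes discarded")
```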
> If you have reproducible performance differences on different platforms
> then the best way to track down the cause is to run tcpdump. That might
> also reveal the reason why reusing a TCP connection slows things down.
> In our experience reusing TCP connections significantly improves
> performance, and we make use of that in Express.
Except under high load on an intranet: 330/sec versus 80/sec between two 1 GHz
Linux 2.4.14 machines (ReiserFS, Fast Ethernet) is a noticeable difference.
--Maarten
[8/11] from: petr:krenzelok:trz:cz at: 27-Nov-2001 18:06
[holger--rebol--com] wrote:
>On Tue, Nov 27, 2001 at 02:14:47PM +0100, Petr Krenzelok wrote:
>>1) Yes, it should not be slower in any way imo! TCP connection is no magic - just
<<quoted lines omitted: 53>>
>In our experience reusing TCP connections significantly improves
>performance, and we make use of that in Express.
Holger, guru stuff. Very interesting. I ran the Ethereal packet monitor, but
saw nothing, just some strange packets sent to the DNS server. You described
Rebol internals without describing them at all :-) The nicest part is
the port for Holger (tm) :-) It made my day ....
btw: has anything changed in the networking code in IOS already, compared
to View for example? Or will we have to wait until 3.0?
-pekr-
[9/11] from: petr:krenzelok:trz:cz at: 27-Nov-2001 18:08
Maarten Koopmans wrote:
>Hey,
>See below....
<<quoted lines omitted: 17>>
>non-blocking. Drawback: you loose automatic http redirects and such. You get
>the complete http response back, so you can implement that yourself.
Ehm, then you do reuse already opened ports. So how does your approach
here differ from the persistent connections you had in mind???
Thanks,
-pekr-
[10/11] from: greggirwin:mindspring at: 27-Nov-2001 11:13
Hi Cyphre,
<< I would like to know the results on other platforms so please try this
little test on your machines and let us know... >>
W2K on a P900 yields:
>> s: now/time/precise loop 100 [echo "lala"] now/time/precise - s
== 0:00:00.982
>> s: now/time/precise loop 100 [echo "lala"] now/time/precise - s
== 0:00:00.991
>> s: now/time/precise loop 100 [echo "lala"] now/time/precise - s
== 0:00:00.951
--Gregg
[11/11] from: ammonjohnson::yahoo at: 27-Nov-2001 15:47
I didn't see the first of this thread, but jumping in the middle, I get:
>> s: now/time/precise loop 100 [echo "lala"] now/time/precise - s
** Script Error: echo expected target argument of type: file none
** Where: do-boot
** Near: echo "lala"
From REBOL/Link 0.9.7.3.1
Enjoy!!
Ammon
Notes
- Quoted lines have been omitted from some messages.