[REBOL] Re: P2P Q&A
From: holger:rebol at: 3-Oct-2001 8:29
On Wed, Oct 03, 2001 at 09:03:51AM +0200, Maarten Koopmans wrote:
> It works well but you are switching between threads which is overhead you
> can save if you do it yourself within one process. Unless you have native
> threads, SMP, and multiple processors, because you are not switching then.
Actually, you are, if the threads are tightly coupled, and you may end
up with less performance than if you just forgot about SMP support in
your threading layer altogether. Sun learned that the hard way in their
attempt to support Solaris SMP in Java.
There are basically three types of threading support:
1. Single CPU, non-preemptive, switching completely under application
control (in the case of an interpreter "application" is the interpreter,
not the script).
2. Single CPU, preemptive (at least partially, e.g. by having no control
over switching during OS calls, or by having Unix signals trigger
switching).
3. Multiple CPUs. Whether switching is preemptive on each CPU or not
does not matter (for the purpose of this analysis).
Intuitively 3 appears to perform better than 2, and 2 better than 1.
The rule is: if your threads are loosely coupled (do not have many
resource dependencies) then the intuition is correct, and 3 is
preferable. However if threads are tightly coupled then it is the
other way around: 1 performs best and 3 worst. Here is why:
With tightly coupled threads only one thread can physically run
at any time anyway, because it needs to lock all (low-level)
resources. For interpreters you would have to, e.g., lock symbol
tables, name-value lookup tables (for REBOL: contexts) and a
lot of other things. Almost every statement in a script makes
changes in some way, so you need to lock exclusively.
Locking at a smaller granularity than "everything" is
usually impossible, because there is no natural hierarchy
to the data structures, and multiple locks would thus inevitably
either lead to deadlocks, or require such an enormous number of
lock layers and individual locks to avoid those deadlocks that
the overall performance gets even worse than
by just using one single global lock.
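The single-global-lock situation can be sketched in a few lines of Python (purely an illustration of the concept, not actual REBOL or Java internals; names like `symbol_table` are made up): every interpreter step takes the one exclusive lock, so even with many OS threads only one can make progress at a time.

```python
import threading

# One global lock guards ALL shared interpreter state (symbol tables,
# contexts, ...). Since almost every statement mutates something,
# every step must hold it exclusively -- threads are serialized.
global_lock = threading.Lock()
symbol_table = {}

def interp_step(name, value):
    # A single "interpreted statement": read-modify-write of shared state,
    # only safe while holding the global lock.
    with global_lock:
        symbol_table[name] = symbol_table.get(name, 0) + value

def worker(steps):
    for _ in range(steps):
        interp_step("counter", 1)

threads = [threading.Thread(target=worker, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(symbol_table["counter"])  # 40000
```

Four threads ran, but at no point did two of them execute an interpreter step in parallel: the lock turned the SMP machine back into a single-CPU one, with all the switching overhead on top.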
Knowing that, you can pretty much forget about SMP benefits, because
only one thread and thus one CPU can run at any time anyway. That
means cases 2 and 3 become very similar in performance. 3 is actually
worse, because when switching from one thread to another in case 2
you only have ONE THREAD switch. In case 3 you have TWO TASK switches
(because each CPU switches between one thread in your overall task
and some OTHER task). MUCH more overhead. By going from 2 to 3 you
are trading light-weight switching for heavy-weight switching, without
any net benefit from parallelism: bad idea.
And 1 performs better than 2 and 3 because it allows implicit locking.
If the runtime system controls thread switching then it does not need
to lock resources that are shared among threads at all. The active
thread owns them by convention while it executes. Of course this only
applies to low-level resources, within the interpreter, which are
accessed "below" the switching layer. It says nothing about high-level
resources which are accessed "above" the switching layer, e.g.
non-atomic data structures within your script. You will still have
to provide semaphore locks for those, but that is true for all three
cases.
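The implicit locking of case 1 can be sketched with cooperative threads, here written as Python generators (an illustration of the idea, not any particular interpreter's design): switches happen only at explicit yield points chosen by the runtime, so the running thread owns all shared low-level state by convention and no lock object exists anywhere.

```python
from collections import deque

# Shared "low-level resource" -- note: no lock, by design.
shared = {"count": 0}

def thread_fn(n):
    for _ in range(n):
        # Safe read-modify-write: no other thread can run until we
        # explicitly yield control back to the scheduler.
        shared["count"] += 1
        yield  # the only switch point, under runtime control

def run(threads):
    # Minimal non-preemptive round-robin scheduler.
    ready = deque(threads)
    while ready:
        t = ready.popleft()
        try:
            next(t)          # resume thread until its next yield
            ready.append(t)  # still alive: reschedule
        except StopIteration:
            pass             # thread finished

run([thread_fn(1000), thread_fn(1000)])
print(shared["count"])  # 2000
```

The increment is a multi-step operation, yet the result is always exact, because the scheduler guarantees it can never be interrupted between the read and the write.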
It is interesting to see that the natural reaction of a lot of people
is to assume that preemptive switching is always superior to
non-preemptive switching. This is actually not the case. The intuition
probably comes from the experience with Windows, where the change
to preemptive switching provided so many advantages. However that is
only because a lot of Windows applications were derived from MS-DOS
or modeled after existing MS-DOS applications, and were therefore
not written in a multitasking-friendly manner to start with. With
non-preemptive switching a run-away application could lock up the
whole system. With preemptive switching it would only result in a
single task not responding, keeping the system alive. If all tasks "played
by the rules" then you would not see any difference between preemptive
and non-preemptive switching at all. Of course for an operating
system you would still want preemptive switching at the task level,
but that's a different matter...
Things are a lot different if each thread runs within an interpreter
(regardless of whether it is a "real" interpreter like REBOL or a
bytecode interpreter like Java). In that situation the interpreter
itself can "play nicely" with the threading engine, within the
interpreter loop, making run-away threads impossible, i.e. even
with something like "forever []" in a non-preemptive threading
environment you would still not lock up other threads, because the
interpreter itself would preempt threads, even if the underlying
threading engine is non-preemptive. This means in this kind of
environment preemptive threading has no advantages over
non-preemptive threading, but it does have one disadvantage:
higher overhead and thus worse performance because of locking
requirements. The same is true for SMP threading (preemptive or
non-preemptive).
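The interpreter-level preemption described above can be sketched like this (a toy model, not REBOL's implementation): each "thread" is a stream of interpreted instructions, and the interpreter loop forces a switch after a fixed slice of instructions, so a runaway loop cannot starve its siblings even though the threading engine underneath is non-preemptive.

```python
from collections import deque
from itertools import count

SLICE = 5  # instructions executed before the interpreter forces a switch

def run(threads, max_total):
    ready = deque(threads)
    log, total = [], 0
    while ready and total < max_total:
        tid, t = ready.popleft()
        alive = True
        for _ in range(SLICE):
            try:
                log.append((tid, next(t)))  # execute one instruction
                total += 1
            except StopIteration:
                alive = False  # thread finished normally
                break
        if alive:
            ready.append((tid, t))  # preempted: back of the run queue

    return log

# A runaway thread (the analogue of `forever []`) next to a finite one:
log = run([("forever", (0 for _ in count())),
           ("finite", iter(range(7)))], max_total=40)
finite_steps = [v for tid, v in log if tid == "finite"]
print(finite_steps)  # the finite thread still completes: [0, 1, 2, 3, 4, 5, 6]
```

The infinite thread keeps getting slices, but the finite one runs to completion anyway: preemption lives in the interpreter loop, not in the threading engine.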
It is no surprise that, from my experience, most people running Java
in Solaris SMP environments choose not to enable Java's native SMP
support, but stick with Java's single-CPU "green-threads" layer
instead, which is basically a non-preemptive single-CPU threading
engine with wrappers around system functions to allow "controlled
preemption".
Same thing for REBOL: if/when we do support multi-threading it is
very likely going to be a solution that falls into category one,
supporting single CPUs only (per REBOL process), with implicit
internal locking and some primitives at the REBOL language layer
to manipulate threads and priorities, and provide mutual exclusion.
At the script layer the engine would appear preemptive, but under
the hood it would not be, thus no SMP.
> > We need multithread to make really serious apps with Rebol, especially for
> > serious server
> > apps! IIRC, /Serve uses multithreading internally.
> >
> > [...]
> Async I/O I thought ;-) Jeff, Holger?
Yes, async i/o. No real multithreading. Of course single-threading and
async i/o combined with a state machine per pending task leads to
pretty much exactly the same performance (sometimes even better) as
multi-threading with blocking i/o per thread. One advantage of the
single-threading solution is that load balancing, fairness, resource
management, sequencing etc. usually become much easier, because the
information required to manage that is explicitly available in the
data structures, not hidden within the threading environment. The main
drawback is that writing programs that way takes some getting used to.
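A minimal sketch of that single-threaded style, using Python's `selectors` module and a socketpair as a stand-in for real network peers (the class and state names are illustrative): each pending task is an explicit state machine driven by one event loop, with no threads anywhere.

```python
import selectors
import socket

sel = selectors.DefaultSelector()
results = {}

class Echo:
    """Tiny per-connection state machine: READ -> WRITE -> DONE."""

    def __init__(self, sock, name):
        self.sock, self.name, self.buf = sock, name, b""
        sock.setblocking(False)
        sel.register(sock, selectors.EVENT_READ, self)

    def handle(self, mask):
        if mask & selectors.EVENT_READ:
            self.buf = self.sock.recv(1024)           # state: READ
            sel.modify(self.sock, selectors.EVENT_WRITE, self)
        elif mask & selectors.EVENT_WRITE:
            self.sock.sendall(self.buf.upper())       # state: WRITE
            results[self.name] = self.buf
            sel.unregister(self.sock)                 # state: DONE
            self.sock.close()

# Two "connections": one event loop interleaves them, no threads needed.
pairs = [socket.socketpair() for _ in range(2)]
for i, (server_end, client_end) in enumerate(pairs):
    Echo(server_end, "conn%d" % i)
    client_end.sendall(b"hello %d" % i)

while len(results) < 2:
    for key, mask in sel.select(timeout=1):
        key.data.handle(mask)

print(sorted(results.values()))  # [b'hello 0', b'hello 1']
```

Everything the scheduler needs (which connection is in which state, what is buffered) sits in plain data structures, which is exactly why load balancing and sequencing are easier to manage here than inside a hidden threading layer.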
Sometimes people refer to the first technique as "multi-threading" as
well, simply because the end result is the same, and to benefit from
the "buzz" that the term multi-threading creates among the less
technical crowd :-).
For instance a lot of so-called "multi-threaded filesystems"
or "multi-threaded TCP/IP stacks" don't really use multi-threading
at all, but actually use single-threading with async i/o and state
machines. The user never knows the difference, neither do benchmarks.
You only recognize it once you see the source code.
--
Holger Kruse
[holger--rebol--com]