Mailing List Archive: Re: About CONTINUATIONS

[REBOL] Re: About CONTINUATIONS

From: koopmans:itr:ing:nl at: 25-Feb-2002 10:10


OK, my 2 eurocents ;-)

> Ok, since you are not going to stop complaining until someone from RT makes
> a statement, I hereby do so :-).
>
> Continuations were not "removed" by us. They became a victim of a complete
> reimplementation of REBOL, by necessity. There was never an option of
> keeping continuations and still advance REBOL in the way it has happened,
> and there is still no such option.
>

What is interesting is the necessity. Speed? Features? Below you suggest that
certain features would be impossible otherwise, such as View or library.
We'd take that as the necessity for now.

> In order to understand how continuations fit into a language interpreter we
> first need to examine the different ways an interpreter can be constructed.
> There are three fundamental ways of doing this:
>
> 1) Using a Stack Machine (SM). This is what REBOL 2.x does. In a stack
> machine-based interpreter the calling path through the interpreted program
> somewhat corresponds to the calling path within the interpreter, i.e.
> whenever a function is called in the program, the interpreter also calls a
> function within the interpreter, to evaluate the function. Similarly for
> expressions etc. Recursive interpretation is handled by recursively calling
> interpreter functions, i.e. such functions have to be reentrant.
>
> This means that in an SM-based interpreter a significant portion of the
> "state" of the interpreter is kept implicitly, on the stack or in local
> variables, CPU registers etc., i.e. it is not available at the language
> level. Because of that continuations (which require all state information
> to be explicit, to allow it to be manipulated at the language level) cannot
> be implemented in a purely SM-based interpreter.
>

OTOH REBOL exposes a lot of 'external' information to be fully reflexive.
This gets you close to the point where you want to be in most cases.
Especially combined with catch/throw.

> 2) Using a Finite State Machine (FSM). This is what REBOL 1.x did. In an
> FSM-based interpreter all interpreter state is stored explicitly. The CPU
> stack is hardly used at all, and the interpreter does not recurse on
> itself.  Rather recursive evaluation is performed by extending the size of
> the state structure. This can allow the interpreter stack to become a
> first-class datatype at the language level.
>
> 3) A combination. See below.
>
> The distinction between FSM-based and SM-based systems is not unique to
> interpreters. You will find a similar distinction in other places.  For
> example in an application (top-down or event-driven, i.e. bottom-up), and
> in different pieces of operating systems (filesystem, network stack, GUI
> engine etc.), so anyone who has some non-trivial experience in software
> development should be familiar with the significance of this (and the
> problems that occur when both models "clash" in some way).
>

Well said.  But the clashing is often solvable (otherwise you wouldn't have
GUI engines, and VI ruled ;-)

> So what are the advantages and disadvantages of both methods, when applied
> to interpreters ?
>
> FSM-based interpreters tend to be more flexible. Some features, such as
> continuations, are only available in FSM-based interpreters, because they
> require explicit access to the interpreter state. Other features, such as
> threading and coroutines are much easier to implement in FSM-based
> interpreters.
>
> FSM-based interpreters integrate better with environments which are
> FSM-based as well. For instance if the OS is FSM-based, such as PalmOS,
> then having an FSM-based interpreter tends to allow better integration,
> smoother user interaction etc.
>

Yes. But better is not the same as impossible! Harder and with different
trade-offs, yes. More or less feasible, yes. Impossible, no. See below.

> On the other hand SM-based interpreters integrate better with operating
> systems that are SM-based as well, which pretty much covers all modern
> operating systems, with the exception of parts of the GUI engine. In
> particular operating systems which rely heavily on a "top down with
> callbacks" model, like Windows, integrate better with SM-based
> interpreters, if those callbacks are supposed to be exposed at the language
> level in some way.
>
> FSM-based interpreters usually cannot adequately expose OS features that
> are SM-based at the language level. For instance, REBOL exposes OS library
> calls in /Pro and /Command, with callbacks in the soon-to-be-released new
> View/Pro and Command versions. This would be pretty much impossible to do
> with an FSM-based environment. To understand why, just imagine that within
> the callback a new continuation chain branches off, the main branch
> returns, through the OS function, and then the program decides to continue
> within the continuation, and finally return through the OS function again.
> Obviously that is impossible, because the stack frame of the OS function no
> longer exists. A large number of features of contemporary operating systems
> rely on a stack frame, in one way or another, i.e. exposing those features
> at the language level is only possible by interpreters which correlate
> stack frame to interpreter state. An FSM-based interpreter does not do
> that, so it cannot expose those features.
>

First, I like the soon-to-be-released part ;-)

But then... there are abstraction layers possible (at a cost, yes). MzScheme
clearly shows that you can have all sorts of OS features available in a
continuation-based language (proof by counter-example ;-)  This is not to say
integration is easy, of course. Or desirable.

*Cannot* is perhaps a bit too ....  stated? Desirable with regards to all
kinds of cost factors is a different thing. What I read between the lines is
that given the RT constraints (little manpower, lots of real-life features,
agressive cross-platformness) FSM is not the best choice.

> FSM-based interpreters are inherently slower and usually require more
> memory: C compilers, CPUs etc. have been designed with SMs in mind and
> highly optimize SM-related operations, which are for the most part
> standardized for each platform these days (stack frames, calling
> conventions etc.). On the other hand FSM designs always require explicit
> FSM code, more elaborate memory management etc., and cannot make use of the
> existing resources as well. For instance where an SM-based interpreter can
> implicitly store the result of a condition in a CPU flag and act on it
> immediately (conditional jump), an FSM-based interpreter needs to allocate
> a data structure for it and store the result as a flag, then loop back in
> the interpreter, examine the state, and finally examine the flag, remove it
> from the state, and act on it. That is an order of magniture more work.
>

Yes. But most problems to day can be solved pretty fast with lots of memory.
Of course this is a recurring argument, as hardware always gets faster, and
there 's is a turnaround point for some kinds of application every x years.

For FSM based interpreters (possibly with C-code generation) that point has
passed. That is not to deny that efficiency is a good thing and that a more
efficient environment is always desirable. Especially if you have a product
like Express coming, which has to be able to run on PDA's (probably) as well.

Different cost metrics, again.

> FSM-based interpreters are usually much more difficult to write and
> maintain, just like event-driven programs tend to be more difficult to work
> with than procedural programs (with few exceptions, like class-based GUI
> engines). That is probably what Joe Marshall's comments regarding
> qualifications and requirements for developers comes from. The side effect
> of this is that FSM-based interpreters tend to be less agile: development
> and testing take more planning and effort. Changes happen more slowly.
>

So... higher cost (I know, I keep repeating)....

> 3) Finally, here is that third method I mentioned above: It is possible for
> an interpreter to use a "hybrid" method of FSM and SM, by using a stack
> machine most of the time, and capturing pending information within an
> on-the-fly continuation structure. The main advantage of this method is
> that most of the time, performance is about as good as you can expect from
> an SM-based interpreter.  The problem is that this is usually not
> practical, at least not in non-trivial interpreters. That's because there
> are only two ways of doing it: explicitly, by unrolling and recreating
> function calls, or by "hacking it in", using assembly language stubs that
> analyze and synthesize stack frames. Even though some languages have
> successfully used both methods in some limited scope and on selected
> platforms, they are not suitable for REBOL.  Explicit unrolling is
> infeasible because it would have to be supported in too many places.
> REBOL/Core has around 100 natives. Add to that actions and ops, and it
> becomes several hundred. Adding code to serialize the state to all of those
> functions would take months to implement and probably years to test. The
> "hack" solution is easier to do on some platforms, but very
> platform-specific, i.e. not suitable for REBOL. Besides, even though
> continuation could be supported either way, this would still not solve the
> problem that continuations are fundamentally incompatible with stack-based
> OS features, such as library calls with callbacks.
>
> > Continuations make loads of intersting and advanced things possible
> > without detracting anything away from the user / programmer who doesn't
> > use or understand them.
>
> Yes, and they also make loads of interesting and advanced things
> IMpossible. In an error!-based error handling scheme such as REBOL it is
> possible to safely handle error conditions even with libraries and
> callbacks, and with event-driven programs. With continuations this would be
> impossible. You would end up with an environment which offers multiple
> features, but without the ability to use them in combination. From a
> modularity point of view that is much worse than offering a smaller set of
> features with full orthogonality.
>
> > Scheme and Common Lisp both have them and so does Stackless Python and
> > ML. There is loads of interesting Comp Sci literature on continuations
> > and they are a well understood and used feature for advanced programming
> > in the above languages, especially in the field of modelling concurrency
> > and parallel programming.
>
> Yes, that's the point, really. Continuations are a somewhat theoretical
> concept.  They severely clash with how operating systems and runtime
> systems are designed. This is why they are only implemented in languages
> which are academic in nature. When a language matures to the point that it
> becomes interesting beyond the academic realm, other considerations, such
> as performance, integration with existing environments etc. become more
> important. This kind of maturing is exactly what happened in the transition
> from REBOL 1.x to 2.x.
>
> > From what I can see they didn't fully explore the continuation model to
> > it's full extent, Joe Marshall states as such. Iam sure there is a LOT
> > they could have learned from advanced Scheme & ML implementations which
> > STILL have these features. These languages are not SO fundamentally
> > different from REBOL which is at it's core a prefix functional language.
>
> That's completely besides the point. Continuations have nothing to do with
> the language model, only with the interpreter model.
>
> > This is all theoretical but raises some interesting issues regarding
> > language design and implementation.
>
> Yes, exactly. Implementation more than design. The only aspect interesting
> for design is whether continuations should be first class.
>
> > I was a bit disturbed that Joe Marshall stated that with his
> > implementation "nobody else in the company could extend the
> > interpreter.." SURELY NOT? What about Carl Sassenrath? Also "You need a
> > degree in Comp. Sci from a few select places to really understand all
> > this.." IS THAT A PROBLEM? Is there anything wrong with that?
>
> Don't take these comments too literally :-). Joe was just trying to make a
> point. The current interpreter is an order of magnitude easier to maintain,
> extend and test.
>
> > The solutions *ARE* there if you take the time to learn and understand
> > them.
>
> Sort of. Solutions are not universal. They only apply to certain domains.
> For instance if you sacrifice OS interaction, or graphics, or networking or
> certain other things, then this may allow you to make certain
> implementation changes that allow the implementation of other features.
> Java demonstrates this nicely: it has one interesting and important
> feature: operational safety and isolation of the virtual machine from its
> environment. It pays a high price though: no pointers, no unions, no arrays
> of structures, very limited casting, extremely high numbers of run-time
> tests, and thus overall bad performance. Some say, in retrospect, that this
> was too high a price.
>
> What this illustrates is that just because a solution is theoretically
> possible and even has been shown to work well in one domain does not mean
> that is is even applicable in another one, and some choices are mutually
> exclusive. That's why there IS more than one language you can choose from.
> Keeping continuations in REBOL would have prevented REBOL from entering
> certain domains. For instance /View would have been impossible, for
> performance reasons. There is the library/callback issue. There is agility,
> integration with existing software packages and libraries. There are other
> issues...
>
> And to answer the inevitable second question, regarding tail recursion
> optimization (or more generally, tail call optimization):
>
> First of all, we need to distinguish between the interpreter and compiler
> case, because the way they deal with tail call optimization is quite
> different:
>
> In a compiler the optimization is not performed at run time, but at compile
> time, and it involves unrolling the stack frame before making the final
> call. Sometimes that is as easy as replacing "jsr...; rts" with "jmp", but
> arguments make it more tricky. In any case, there is generally NO reason
> NOT to do it. The only price you pay is, perhaps, slightly longer
> compilation time. The resulting code is usually shorter, quicker and needs
> less memory, i.e. it is "better".
>
> In an interpreter the situation is different. The interpreter needs to
> determine at run time whether a call is a "tail call" and allows
> optimization. Because of that there is a trade-off: the benefit of
> successful tail call optimization has to be weighed against the additional
> effort to determine, dynamically, whether such optimization is possible. If
> that determination on the average takes up more CPU time or memory than the
> actual optimization saves, then such an optimization should not be
> implemented. Compilers obviously do not have that problem.
>
> I do not have any hard data regarding that particular trade-off, but my
> impression is that in "normal" programs (i.e. programs that were not
> written particularly to prove a point, demonstrate a concept etc., in the
> academic domain), it is very rare to have tail calls that benefit from
> optimization, so I tend to suspect that adding tail call optimization may,
> in most cases, have a detrimental effect.
>
> It would certainly be possible to add tail call optimization back in,
> although it is somewhat more difficult to do in an SM-based interpreter
> than an FSM-based interpreter (because it is not enough to just prune the
> state structure -- the stack has to be unrolled, too). However I could
> think of probably dozens of feature requests for REBOL that I would
> consider more important, benefitting more users, and resulting in more
> significant improvements, so don't count on tail call optimization to be
> very high on our priority list :-).

And there is something else to realize: let's take REBOL and Scheme as
examples. Both allow for easy semantic prototyping of other languages,
sublanguages etc.  The important thing is "easy" h