
World: r3wp

[World] For discussion of World language

Janko
2-Dec-2011
[156x2]
thanks for the explanations to both of you. So it's more of a backend 
similarity
I wasn't aware of the byte-code VM specifics, so thanks for that too
Andreas
2-Dec-2011
[158]
Thanks for the comprehensive Q&A, John.
Geomol
2-Dec-2011
[159x2]
@Janko, look at "Countdown: 1" tomorrow.
You're welcome, Andreas.
Andreas
2-Dec-2011
[161x2]
I also think that 256-bit VM insn size sounds a bit wasteful. That'll 
thrash the data cache easily.
I'm not sure that holding complex values as immediates is worth it, 
even for complex-heavy code.
BrianH
2-Dec-2011
[163]
REBOL code is interpreted, but not its source. The slow part of a 
source interpreter is parsing the source into the intermediate code, 
the AST. REBOL is an AST evaluator. The advantage to that relative 
to a bytecode VM is that you can extend the runtime with more fast 
operations without breaking the bytecode encoding, but the disadvantage 
is that the interpreter overhead is larger so if you want your operations 
to be efficient you have to use larger ones. This is why C-like code 
is slow in REBOL, but high-level code can be fast.
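

For illustration, here is an AST evaluator in miniature, for a toy expression 
language (a sketch only, not REBOL's actual internals). Dispatch happens on 
node tags plus recursion, so adding a new, arbitrarily large operation is just 
another case; no instruction encoding has to change:

    /* Toy AST evaluator, illustrative only (not REBOL internals). */
    #include <stdio.h>

    typedef enum { N_NUM, N_ADD, N_MUL } NodeKind;

    typedef struct Node {
        NodeKind kind;
        double   num;                /* valid when kind == N_NUM   */
        struct Node *left, *right;   /* valid for binary operators */
    } Node;

    static double eval(const Node *n) {
        switch (n->kind) {
        case N_NUM: return n->num;
        case N_ADD: return eval(n->left) + eval(n->right);
        case N_MUL: return eval(n->left) * eval(n->right);
        }
        return 0.0;
    }

    int main(void) {
        Node two   = { N_NUM, 2.0, 0, 0 };
        Node three = { N_NUM, 3.0, 0, 0 };
        Node four  = { N_NUM, 4.0, 0, 0 };
        Node mul   = { N_MUL, 0.0, &three, &four };
        Node add   = { N_ADD, 0.0, &two,   &mul  };
        printf("%g\n", eval(&add));  /* 2 + 3 * 4 = 14 */
        return 0;
    }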


If you want to get the advantages of a bytecode VM with the extensibility 
advantages of REBOL's model you could go with an address-threaded 
interpreter. Address-threaded interpreters have more data going through 
the processor than bytecode interpreters do, but if you need to support 
higher-level operations they are more efficient overall. However, 
if you don't need to support higher-level operations and only need 
to support a tiny number of low-level operations then bytecode can 
be encoded in a much smaller amount of space. If your language is, 
for instance, a spreadsheet formula evaluator then you might even 
be able to have 4-bit bytecodes, with two operations per byte, and 
have an interpreter that fits entirely in the instruction cache of 
a processor. Bytecodes can be much faster then.
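

And a minimal sketch of address threading, using the GCC/Clang computed-goto 
extension and a made-up instruction set (not World's): each cell of the 
"program" is either a handler address or an operand, so dispatch is one 
indirect jump per instruction, and adding a higher-level operation is just 
adding another label:

    /* Direct (address-) threaded dispatch via the GCC/Clang &&label
       extension. Opcode set and instruction layout are made up. */
    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        double regs[4] = { 0.0, 2.0, 3.0, 0.0 };

        void *ADD   = &&op_add;      /* handler addresses */
        void *PRINT = &&op_print;
        void *HALT  = &&op_halt;

        /* Threaded "program": handler addresses interleaved with operands. */
        void *prog[] = {
            ADD,   (void *)(intptr_t)0, (void *)(intptr_t)1, (void *)(intptr_t)2,  /* r0 = r1 + r2 */
            PRINT, (void *)(intptr_t)0,                                            /* print r0     */
            HALT
        };
        void **ip = prog;

        goto **ip++;                          /* start the thread */

    op_add: {
            intptr_t d = (intptr_t)*ip++;
            intptr_t a = (intptr_t)*ip++;
            intptr_t b = (intptr_t)*ip++;
            regs[d] = regs[a] + regs[b];
            goto **ip++;                      /* one indirect jump to the next handler */
        }
    op_print: {
            intptr_t r = (intptr_t)*ip++;
            printf("%g\n", regs[r]);
            goto **ip++;
        }
    op_halt:
        return 0;
    }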


Still, Lua's bytecode VM, as efficient as it is, has been running 
into performance limits as well. Fortunately, a bytecode model that 
maps well enough to the native code model (remember what I said earlier 
about C-like bytecode VMs?) can have the bytecodes translated to 
native code at runtime and then execute the native code. For C-like 
code that is usually even faster than address-threading. This is 
why LuaJIT has been doing so well when compared to Lua's bytecode 
VM.


World being Lua-like means that it can improve using methods similar 
to the ones that Lua has been using to improve. That's definitely 
a good thing, since it means that Geomol doesn't have to work from 
scratch :)
Andreas
2-Dec-2011
[164]
As you talk about registers in context of your VM, I assume that 
your VM is register-based, right? (Not stack-based.)
BrianH
2-Dec-2011
[165]
If you want to compare to a source interpreter, try old versions 
of TCL before it switched to bytecode interpretation. That was *slow*, 
not like REBOL at all.
Andreas
2-Dec-2011
[166x3]
If so, I'd probably try splitting immediate complex-loads into two 
insns. Then reduce insn size to 128-bit (if possible) and check the 
effect on code size.
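Concretely, such a split might look something like this, with made-up opcode 
names and a hypothetical fixed-width layout (not World's actual encoding):

    /* Hypothetical instruction formats, for illustration only. */
    #include <stdint.h>

    typedef struct {        /* one 256-bit insn carrying the full complex immediate */
        uint32_t opcode;
        uint32_t dst;
        uint64_t pad;
        double   re;        /* complex immediate, real part      */
        double   im;        /* complex immediate, imaginary part */
    } Insn256;

    typedef struct {        /* versus two 128-bit insns, each carrying half of it;   */
        uint32_t opcode;    /* e.g. LOADC_RE / LOADC_IM (made-up names), paired by   */
        uint32_t dst;       /* the decoder when they target the same register        */
        double   imm;
    } Insn128;

    int main(void) {
        /* sanity-check the sketched sizes */
        return (sizeof(Insn256) == 32 && sizeof(Insn128) == 16) ? 0 : 1;
    }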
Just a thought, though. You (John) probably already tried that :)
Looking very much forward to playing with World.
Geomol
2-Dec-2011
[169]
Uh, I'm not 100% on the technical terms. Registers are on a stack, 
and a register pointer is an offset. So you can have lots of registers, 
and the VM can access each of them very fast. Values can also float 
in memory (not on the stack), and then we have a real pointer to them. 
See blog tomorrow.
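In rough C terms, that layout is something like the following sketch (just 
the idea, not World's actual code): one big value array, with each frame 
treating a window of slots as its registers, addressed as register pointer 
plus offset:

    /* Sketch of "registers on a stack": a single value array, with a frame's
       registers addressed as offsets from a register pointer. Hypothetical
       layout -- World's actual slots are 192-bit values. */
    #include <stdio.h>

    typedef struct { int tag; double payload; } Value;   /* stand-in for a tagged slot */

    #define STACK_SLOTS 1024
    static Value stack[STACK_SLOTS];

    int main(void) {
        size_t rp = 0;                 /* register pointer: offset of the current frame */

        /* "Call": give the new frame 4 registers by bumping the offset. */
        size_t caller_rp = rp;
        rp += 4;

        stack[rp + 0].payload = 2.0;   /* direct, indexed access to any register ... */
        stack[rp + 1].payload = 3.0;
        stack[rp + 2].payload = stack[rp + 0].payload + stack[rp + 1].payload;

        printf("%g\n", stack[rp + 2].payload);   /* ... no pushing or popping needed   */

        rp = caller_rp;                /* "return": restore the caller's window        */
        return 0;
    }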
Andreas
2-Dec-2011
[170]
Ah, then it's stack-based with deep access into the stack.
Geomol
2-Dec-2011
[171]
ok
BrianH
2-Dec-2011
[172]
Can you access registers that aren't on the top of the stack? Direct 
addressing, rather than stack operations?
Geomol
2-Dec-2011
[173x2]
Is that good? :)
Brian, yes.
Andreas
2-Dec-2011
[175]
That's not why I was asking :)
BrianH
2-Dec-2011
[176]
That will make the JIT easier then :)
Andreas
2-Dec-2011
[177x2]
But it's an even stronger incentive to keep instruction size small 
:)
I assume that your stack slots/registers are also 256-bit wide, then?
BrianH
2-Dec-2011
[179]
The JITs for the JVM model have been notoriously difficult because 
they were strictly stack-based, which only made sense on the stack-based 
hardware the JVM was based on, not the register-based hardware it 
was being JIT-compiled to. One of the reasons LuaJIT is so good is 
that Lua's model is register-based.
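To make the contrast concrete, here is the same statement, a = b + c, in the 
two encoding styles, with made-up mnemonics. The stack form keeps operands 
implicit on an evaluation stack, which a JIT must first untangle before it 
can map values onto hardware registers; the register form names its operands 
outright, which is close to what a register allocator wants to see:

    /* a = b + c in the two styles (made-up mnemonics, illustration only). */

    enum { PUSH, ADD_TOS, STORE,      /* stack-style opcodes   */
           ADD3 };                    /* register-style opcode */

    /* Stack-based: four instructions, operands implicit on the stack. */
    static const int stack_code[] = {
        PUSH, 1,        /* push b (slot 1)       */
        PUSH, 2,        /* push c (slot 2)       */
        ADD_TOS,        /* pop two, push the sum */
        STORE, 0        /* pop into a (slot 0)   */
    };

    /* Register-based: one instruction, explicit operands. */
    static const int reg_code[] = {
        ADD3, 0, 1, 2   /* a (r0) = b (r1) + c (r2) */
    };

    int main(void) { (void)stack_code; (void)reg_code; return 0; }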
Geomol
2-Dec-2011
[180]
Stack slots are 192 bits = 24 bytes.
Andreas
2-Dec-2011
[181x2]
Ok.
(As long as it's a multiple of 64 bits, that's fine.)
Geomol
2-Dec-2011
[183]
yup
BrianH
2-Dec-2011
[184x2]
Is your bytecode polymorphic, or is it statically typed? A polymorphic 
VM like REBOL's wouldn't have problems with higher-level series like 
unicode!, but to support that on a statically typed VM you would need 
either a lot of opcodes or a lot of compiled code.
REBOL's actions are a polymorphic VM, btw.
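Roughly what that kind of polymorphic dispatch looks like, sketched in C 
(illustrative only, not REBOL's or World's actual code): one generic 
operation whose work goes through a per-datatype table, so a new series type 
like unicode! needs a new table row rather than new opcodes:

    /* Sketch of a polymorphic "action": one generic INSERT operation that
       dispatches on the value's datatype tag at run time. */
    #include <stdio.h>

    typedef enum { T_BLOCK, T_STRING, T_UNICODE, T_MAX } Type;

    typedef struct { Type type; void *series; } Value;

    typedef void (*ActionFn)(Value *target, const Value *arg);

    static void insert_block  (Value *t, const Value *a) { (void)t; (void)a; puts("insert into block!");   }
    static void insert_string (Value *t, const Value *a) { (void)t; (void)a; puts("insert into string!");  }
    static void insert_unicode(Value *t, const Value *a) { (void)t; (void)a; puts("insert into unicode!"); }

    /* One row per datatype; a new series type adds a row, not an opcode. */
    static const ActionFn insert_action[T_MAX] = {
        [T_BLOCK]   = insert_block,
        [T_STRING]  = insert_string,
        [T_UNICODE] = insert_unicode,
    };

    static void do_insert(Value *target, const Value *arg) {
        insert_action[target->type](target, arg);   /* dispatch on the tag */
    }

    int main(void) {
        Value str = { T_STRING, 0 }, uni = { T_UNICODE, 0 }, item = { T_BLOCK, 0 };
        do_insert(&str, &item);
        do_insert(&uni, &item);
        return 0;
    }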
Andreas
2-Dec-2011
[186]
(And to not evade that question completely: there is some compelling 
evidence that register-based VMs enable faster-performing VM implementations.)
Geomol
2-Dec-2011
[187]
Polymorphic.
Andreas
2-Dec-2011
[188]
(That's only when you use interpretation for the VM implementation. 
If your VM implementation is a JIT-compiler, register-based VMs match 
typical target architectures more closely, as Brian mentioned.)
Geomol
2-Dec-2011
[189]
I have only done very little in compiler optimisation, so it can 
only be better, I guess.
BrianH
2-Dec-2011
[190x2]
Andreas: The compelling evidence being Lua, which is the main register-based 
VM in popular use for which that is true. However, that depends on 
a number of other factors, not the least of which is the target architecture, 
the instruction-set design, and how well the VM's register model maps to 
the underlying hardware's register model. It might be noted that there are not 
that many hardware platforms with 192-bit registers, so that might 
affect things.
Geomol: Are series operations included in your polymorphic opcodes, 
like they are in REBOL? Or are you sticking to the C-like opcodes?
Andreas
2-Dec-2011
[192]
Brian: the compelling evidence being http://dl.acm.org/citation.cfm?id=1328195.1328197
Geomol
2-Dec-2011
[193]
That you will need to figure out yourself within a few days. It's 
more fun that way. :)
BrianH
2-Dec-2011
[194x3]
Andreas: So on register-based hardware architectures with a lot of 
64-bit registers (AMD64, PowerPC, Alpha), for a value space that rarely 
has direct values larger than 64 bits (the JVM model), and when implementing 
a variable-based procedural language rather than a stack-based one, a 
register-based VM model is faster than a stack-based model. Yup.
I'm surprised that in that case the register-based VM is only slightly 
faster than the stack-based VM. Their register-based VM must have been 
crappy.
Andreas
2-Dec-2011
[197x2]
Or their stack VM rather good. But we're getting seriously off-topic 
here.
(And I'm not surprised. The target architecture's characteristics 
you mentioned above (register count, width) are far less important than 
you seem to assume when implementing the VM using an interpreter.)
BrianH
2-Dec-2011
[199]
Back on topic: Do World's compiled functions implement the modifiable 
immediate block semantics of REBOL?
Steeve
2-Dec-2011
[200]
Well, the claimed speed improvement is confusing me.
R3 slower than R2 on Geomol's computer, huh!

And sorry but I also think that the memory footprint of the bytecodes 
is outrageous :-)
Andreas
2-Dec-2011
[201]
I faintly remember that R3 was slower than R2 for many things I benchmarked 
as well.
BrianH
2-Dec-2011
[202]
I guess it's better than R3's 128-bit bytecodes :)
Steeve
2-Dec-2011
[203]
On a Mac again ?
Andreas
2-Dec-2011
[204x2]
On Linux.
128-bit < 256-bit, so no :)