dxf <dxforth@gmail.com> writes:
> Would you agree 'nest-sys' are peculiar to colon definitions. That
> EXECUTE is a different class of function. It's not doing a 'call'
> as such and not leaving anything on the 'return stack'?
That's certainly the case for threaded-code implementations.
For native-code implementations the implementation of EXECUTE is
usually an indirect call; sometimes an indirect tail-call, i.e. a
jump.
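Here is a toy model that makes the threaded-code case concrete. It is a hypothetical Python sketch, not how any particular Forth system is written. The colon definition's docol action pushes the caller's instruction pointer as the nest-sys. EXECUTE only dispatches the xt and pushes nothing. The names `ThreadedVM` and `rdepth` are made up for illustration.

```python
class ThreadedVM:
    """Toy threaded-code Forth model (hypothetical, for illustration only)."""

    def __init__(self):
        self.ds = []      # data stack
        self.rs = []      # return stack; holds nest-sys values (saved ips)
        self.ip = None    # (body, index) of the next cell to interpret
        self.words = {}   # name -> ("prim", fn) or ("colon", body)
        # EXECUTE: pop an xt and dispatch it; it pushes no nest-sys itself
        self.prim("execute", lambda vm: vm.execute(vm.ds.pop()))
        # RDEPTH (made up): push the current return-stack depth
        self.prim("rdepth", lambda vm: vm.ds.append(len(vm.rs)))
        self.prim("+", lambda vm: vm.ds.append(vm.ds.pop() + vm.ds.pop()))

    def prim(self, name, fn):
        self.words[name] = ("prim", fn)

    def colon(self, name, body):
        self.words[name] = ("colon", body)

    def execute(self, xt):
        kind, payload = self.words[xt]
        if kind == "prim":
            payload(self)            # primitives leave the return stack alone
        else:
            self.rs.append(self.ip)  # docol: push the caller's ip (nest-sys)
            self.ip = (payload, 0)

    def run(self, xt):
        self.ip = None               # a nest-sys of None means "back to host"
        self.execute(xt)
        while self.ip is not None:
            body, i = self.ip
            if i == len(body):       # end of the body acts as EXIT
                self.ip = self.rs.pop()
            else:
                self.ip = (body, i + 1)
                self.execute(body[i])


vm = ThreadedVM()
vm.colon("foo", ["execute"])         # : foo execute ;
vm.colon("bar", ["rdepth"])          # : bar rdepth ;
vm.ds = ["rdepth"]
vm.run("foo")
print(vm.ds)  # [1]: only FOO's nest-sys; EXECUTE added nothing
vm.ds = ["bar"]
vm.run("foo")
print(vm.ds)  # [2]: BAR, a colon definition, pushed its own nest-sys
```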
In VFX64 5.43:
: foo execute ; ok
see foo
FOO
( 0050A250 488BD3 ) MOV RDX, RBX
( 0050A253 488B5D00 ) MOV RBX, [RBP]
( 0050A257 488D6D08 ) LEA RBP, [RBP+08]
( 0050A25B 48FFD2 ) CALL RDX
( 0050A25E C3 ) RET/NEXT
However:
see execute
EXECUTE
( 004211B0 53 ) PUSH RBX
( 004211B1 488B5D00 ) MOV RBX, [RBP]
( 004211B5 488D6D08 ) LEA RBP, [RBP+08]
( 004211B9 C3 ) RET/NEXT
( 10 bytes, 4 instructions )
The push-ret combination is an extremely slow form of an indirect
jump; so where is the return address (nest-sys) here? It's the return
address of the surrounding call. E.g., if you do
' + ' execute foo
it's the call in FOO.
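A small machine model shows where the nest-sys lives when VFX's out-of-line EXECUTE (PUSH RBX ... RET) is itself EXECUTEd, as in `' + ' execute foo`. This is a hypothetical Python sketch. The addresses, the opcode names and the `peek` instruction (which records the top of the hardware stack) are invented. The data stack is a plain list rather than RBX/RBP.

```python
def run_vfx(program, pc, ds):
    """Tiny x86-like machine: hs is the hardware stack holding return
    addresses; ds is the Forth data stack. Returns (ds, seen)."""
    hs, seen = [], []
    while True:
        op, *arg = program[pc]
        pc += 1
        if op == "call":          # CALL imm: push return address, jump
            hs.append(pc)
            pc = arg[0]
        elif op == "call_tos":    # inline EXECUTE in FOO: CALL RDX
            hs.append(pc)
            pc = ds.pop()
        elif op == "push_tos":    # PUSH RBX: push the xt like a return address
            hs.append(ds.pop())
        elif op == "ret":         # RET: pop an address and jump to it
            pc = hs.pop()
        elif op == "add":
            ds.append(ds.pop() + ds.pop())
        elif op == "peek":        # record the top of the hardware stack
            seen.append(hs[-1])
        elif op == "halt":
            return ds, seen


EXECUTE, PLUS, FOO = 100, 200, 300   # hypothetical code addresses
program = {
    0: ("call", FOO), 1: ("halt",),                   # host calls FOO
    EXECUTE: ("push_tos",), EXECUTE + 1: ("ret",),    # VFX-style EXECUTE
    PLUS: ("peek",), PLUS + 1: ("add",), PLUS + 2: ("ret",),
    FOO: ("call_tos",), FOO + 1: ("ret",),            # : foo execute ;
}

# 2 3 ' + ' execute foo
ds, seen = run_vfx(program, 0, [2, 3, PLUS, EXECUTE])
print(ds, seen)   # [5] [301]: + sees the return address of FOO's CALL
```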
SwiftForth 4.0.0-RC89:
see foo
4519B7 4028CB ( EXECUTE ) JMP E90F0FFBFF ok
That's a tail-call to EXECUTE. When EXECUTE is not tail-called, the
code of EXECUTE is invoked with call:
: bar execute . ; ok
see bar
4519D3 4028CB ( EXECUTE ) CALL E8F30EFBFF
4519D8 40B043 ( . ) JMP E96696FBFF ok
see execute
4028CB RBX RCX MOV 488BCB
4028CE 0 [RBP] RBX MOV 488B5D00
4028D2 8 [RBP] RBP LEA 488D6D08
4028D6 4028DD JRCXZ E305
4028D8 RDI RCX ADD 4801F9
4028DB RCX JMP FFE1
4028DD RET C3 ok
This special-cases the 0 EXECUTE case as NOOP, and also adds an offset
(the image start?) to the xt before performing the indirect jump, but
if you ignore those parts, this EXECUTE does the same things as VFX's,
except that it uses the much faster indirect jmp rather than push-ret.
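The SwiftForth EXECUTE above can be paraphrased as follows. This hypothetical Python sketch assumes, as the guess above suggests, that RDI holds the image start and that an xt is an offset from it. `code` is a made-up dictionary standing in for memory.

```python
def sf_execute(ds, base, code):
    """Paraphrase of the SwiftForth EXECUTE disassembly (hypothetical model).

    ds:   data stack (a list) with the xt on top
    base: assumed image start (the value the code adds from RDI)
    code: maps absolute addresses to Python callables taking ds
    """
    xt = ds.pop()        # MOV RCX, RBX and reload TOS: pop the xt
    if xt == 0:          # JRCXZ to RET: 0 EXECUTE behaves like NOOP
        return
    code[base + xt](ds)  # ADD RCX, RDI ; JMP RCX: jump to image start + xt


def dup(ds):
    ds.append(ds[-1])


BASE = 0x400000                    # hypothetical image start
code = {BASE + 0x40: dup}          # hypothetical: DUP's code at offset 0x40
ds = [7, 0x40]
sf_execute(ds, BASE, code)         # like 7 ['] dup execute
print(ds)                          # [7, 7]
ds = [7, 0]
sf_execute(ds, BASE, code)         # 0 EXECUTE: nothing happens
print(ds)                          # [7]
```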
lxf 1.7-172-983:
see foo
8692BC4 8050E6E 11 88C8000 5 normal FOO
8050E6E 8BC3 mov eax , ebx
8050E70 8B5D00 mov ebx , [ebp]
8050E73 8D6D04 lea ebp , [ebp+4h]
8050E76 FFD0 call eax
8050E78 C3 ret near
Here the EXECUTE is compiled inline and essentially implemented as an
indirect call. lxf does not perform tail-call optimization.
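The difference between these systems comes down to one code-generation choice: whether a call in tail position becomes a jump. Below is a hypothetical sketch of that choice for a colon definition whose body is a list of word names. EXECUTE is inlined using VFX's register assignment. SwiftForth calls its out-of-line EXECUTE instead of inlining it. The `tail_calls` flag and the output strings are illustrative only.

```python
INLINE_EXECUTE = ["mov rdx, rbx", "mov rbx, [rbp]", "lea rbp, [rbp+8]"]


def compile_colon(body, tail_calls=True):
    """Emit pseudo-x86 for ': name <body> ;' (hypothetical code generator)."""
    code = []
    for i, word in enumerate(body):
        if word == "execute":
            code += INLINE_EXECUTE       # pop the xt into rdx, reload TOS
            target = "rdx"
        else:
            target = f"( {word} )"       # call a word by name, SwiftForth-style
        if tail_calls and i == len(body) - 1:
            code.append(f"jmp {target}")  # tail call: the callee's ret returns
            return code
        code.append(f"call {target}")
    code.append("ret")
    return code


print(compile_colon(["execute"], tail_calls=False)[-2:])  # ['call rdx', 'ret']
print(compile_colon(["execute"])[-1])                      # jmp rdx
print(compile_colon(["execute", "."])[-2:])                # ['call rdx', 'jmp ( . )']
```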
see execute
868E2FC 88D6B47 11 88D475B 92 prim EXECUTE
88D6B47 8BC3 mov eax , ebx
88D6B49 8B5D00 mov ebx , [ebp]
88D6B4C 8D6D04 lea ebp , [ebp+4h]
88D6B4F FFD0 call eax
88D6B51 C3 ret near
The same code as FOO; after all, both words do the same thing.
iForth 5.1-mini (I think):
FORTH> ' foo idis
$10226000  : foo                        488BC04883ED088F4500  H.@H.m..E.
$1022600A  pop           rbx            5B                    [
$1022600B  or            rbx, rbx       4809DB                H.[
$1022600E  je            $10226016 offset NEAR
                                        0F8402000000          ......
$10226014  call          rbx            FFD3                  .S
$10226016  ;                            488B45004883C508FFE0  H.E.H.E..`  ok
The use of call here is interesting, because iForth uses RSP as
data-stack pointer (e.g., the "pop rbx" moves the xt into rbx) and rbp
as return-stack pointer. Note the 10 bytes at the start of foo that
are not disassembled. If I disassemble that code (into AT&T syntax), it
looks as follows:
0x10226000: mov %rax,%rax
0x10226003: sub $0x8,%rbp
0x10226007: pop 0x0(%rbp)
0x1022600a: pop %rbx
0x1022600b: or %rbx,%rbx
0x1022600e: je 0x10226016
0x10226014: call *%rbx
0x10226016: mov 0x0(%rbp),%rax
0x1022601a: add $0x8,%rbp
0x1022601e: jmp *%rax
So here we see the first three and the last three instructions
disassembled (which "idis" does not do). The third instruction moves the return
address from the RSP stack to the RBP stack, and the second
instruction adjusts RBP for that. Note that this invocation via call
is not the usual way to invoke a colon definition from compiled code
in iForth. E.g.:
FORTH> : x . . ;
FORTH> ' x idis
$10226940  : x                          488BC04883ED088F4500  H.@H.m..E.
$1022694A  lea           rbp, [rbp -8 +] qword
                                        488D6DF8              H.mx
$1022694E  mov           [rbp 0 +] qword, $1022695B d#
                                        48C745005B692210      HGE.[i".
$10226956  jmp           .+A ( $1013888A ) offset NEAR
                                        E92F1FF1FF            i/.q.
$1022695B  jmp           .+A ( $1013888A ) offset NEAR
                                        E92A1FF1FF            i*.q.
$10226960  ;                            488B45004883C508FFE0  H.E.H.E..`
Note that both calls to "." jump to ".+A", i.e., they skip the first
three instructions. The first invocation of "." pushes the return
address explicitly in the instructions at $1022694A and $1022694E, the
second invocation is a tail-call.
Back to EXECUTE: This means that iForth implements EXECUTE as pushing
the return address (in a convoluted way).
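The two ways of entering a colon definition in iForth can be modeled as below. This is a hypothetical Python sketch. `sp` stands for the RSP stack (iForth's data stack, which CALL also uses) and `rp` for the RBP stack (the return stack). The opcode names and addresses are invented. A word's full entry point starts with the prologue that moves CALL's return address from `sp` to `rp`. Compiled code pushes the return address onto `rp` itself and enters one instruction later, the way the calls to "." enter at .+A.

```python
def run_iforth(program, pc, sp):
    """sp: the RSP stack (iForth's data stack, also used by CALL);
    rp: the RBP stack (iForth's return stack). Returns (sp, rp, seen)."""
    rp, seen = [], []
    while True:
        op, *arg = program[pc]
        pc += 1
        if op == "call_tos":      # pop rbx; or rbx, rbx; je ...; call rbx
            xt = sp.pop()
            if xt != 0:           # FOO skips the call when the xt is 0
                sp.append(pc)     # CALL pushes the return address on RSP
                pc = xt
        elif op == "prologue":    # sub $0x8,%rbp; pop 0x0(%rbp)
            rp.append(sp.pop())   # move the return address RSP -> RBP stack
        elif op == "push_ra":     # lea rbp, [rbp -8 +]; mov [rbp 0 +], ra
            rp.append(arg[0])
        elif op == "jmp":
            pc = arg[0]
        elif op == "exit":        # mov 0x0(%rbp),%rax; add $0x8,%rbp; jmp *%rax
            pc = rp.pop()
        elif op == "peek_rp":     # record the top of the return stack
            seen.append(rp[-1])
        elif op == "halt":
            return sp, rp, seen


X, FOO = 100, 200                 # hypothetical code addresses
program = {
    # : x ... ;   X is the full entry, X + 1 the entry compiled code uses
    X: ("prologue",), X + 1: ("peek_rp",), X + 2: ("exit",),
    # : foo execute ;
    FOO: ("prologue",), FOO + 1: ("call_tos",), FOO + 2: ("exit",),
    # compiled call of FOO: push the return address, skip the prologue
    0: ("push_ra", 2), 1: ("jmp", FOO + 1), 2: ("halt",),
    # compiled call of X
    10: ("push_ra", 12), 11: ("jmp", X + 1), 12: ("halt",),
}

print(run_iforth(program, 0, [X]))   # ([], [], [202]): X sees FOO's return address
print(run_iforth(program, 10, []))   # ([], [], [12])
```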
In the general case (non-tail EXECUTE) in all these native-code systems
a compiled EXECUTE pushes the return address.
This is not a problem for standard code, because colon definitions and
DOES>-following code are not allowed to inspect stuff on the return
stack that they did not push there, and because other words either don't
access the return stack, or ticking them is non-standard (e.g., ' R@
is non-standard).
Could it be done without call? How would the return to the code after
the EXECUTE happen? One way to do it would be as follows:
The code for general (non-tail) EXECUTE:
... stack adjustments
mov rax, ra
jmp rdx # execute the xt
ra:
and for a constant the xt code would be:
... stack adjustment
mov rbx, const
jmp rax
while for a colon definition the xt code would be:
push rax
entry: # entry point for compiled code
... code of the colon definition
ret
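The call-free scheme above can be simulated to check that both kinds of xt code get back to `ra`. This is a hypothetical Python sketch. `rax` carries the return address as in the pseudo-code, and `hs` models the hardware stack used by push, call and ret. The addresses and opcode names are made up. K is a constant 7, and C is a colon definition computing 40 2 +.

```python
def run_scheme(program, pc, ds):
    """Simulate the proposed scheme; returns (ds, hs) at halt."""
    rax = rdx = None
    hs = []                      # hardware stack used by push/call/ret
    while True:
        op, *arg = program[pc]
        pc += 1
        if op == "pop_xt":       # "... stack adjustments": xt -> rdx
            rdx = ds.pop()
        elif op == "mov_rax":    # mov rax, ra
            rax = arg[0]
        elif op == "jmp_rdx":    # jmp rdx: execute the xt
            pc = rdx
        elif op == "jmp_rax":    # a constant returns through rax
            pc = rax
        elif op == "push_rax":   # colon xt code: turn rax into a return address
            hs.append(rax)
        elif op == "call":       # compiled code calls the entry point
            hs.append(pc)
            pc = arg[0]
        elif op == "ret":
            pc = hs.pop()
        elif op == "lit":        # mov rbx, const (plus stack adjustment)
            ds.append(arg[0])
        elif op == "add":
            ds.append(ds.pop() + ds.pop())
        elif op == "halt":
            return ds, hs


K, C = 100, 200                  # hypothetical code addresses
program = {
    # general (non-tail) EXECUTE; ra is address 3
    0: ("pop_xt",), 1: ("mov_rax", 3), 2: ("jmp_rdx",), 3: ("halt",),
    # compiled code calling C through its entry point
    10: ("call", C + 1), 11: ("halt",),
    # xt code of the constant K
    K: ("lit", 7), K + 1: ("jmp_rax",),
    # xt code of the colon definition C; C + 1 is "entry:"
    C: ("push_rax",),
    C + 1: ("lit", 40), C + 2: ("lit", 2), C + 3: ("add",), C + 4: ("ret",),
}

print(run_scheme(program, 0, [K]))   # ([7], [])
print(run_scheme(program, 0, [C]))   # ([42], [])
print(run_scheme(program, 10, []))   # ([42], [])
```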
The disadvantage of the scheme is that it does not pair the ret with a
call, but with a push, which leads to slow branch mispredictions. It
seems to me that if you want to use ret for EXIT and call for compiled
colon definitions, having a call for a non-tail EXECUTE is the most
efficient way to go.
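A simplified model of a return-stack-buffer predictor shows why an unpaired ret hurts. In this model CALL pushes its return address onto the predictor's stack as well as the real stack, RET predicts the most recent predictor entry, and PUSH updates only the real stack. This is a hypothetical sketch; real predictors differ in size and in what they do when the buffer is empty. The same model also covers VFX's PUSH RBX ... RET form of EXECUTE.

```python
def count_mispredictions(trace, rsb_size=16):
    """Count mispredicted returns in a trace of ("call", ra), ("push", v)
    and ("ret",) events, using a simplified return-stack-buffer model."""
    stack, rsb, misses = [], [], 0
    for op, *arg in trace:
        if op == "call":           # real stack and predictor both get ra
            stack.append(arg[0])
            rsb.append(arg[0])
            if len(rsb) > rsb_size:
                rsb.pop(0)         # oldest prediction falls off
        elif op == "push":         # only the real stack changes
            stack.append(arg[0])
        elif op == "ret":
            actual = stack.pop()
            predicted = rsb.pop() if rsb else None
            if predicted != actual:
                misses += 1
    return misses


paired = [("call", 0x10), ("call", 0x20), ("ret",), ("ret",)]
push_ret = [("call", 0x10), ("push", 0x20), ("ret",), ("ret",)]
print(count_mispredictions(paired))    # 0
print(count_mispredictions(push_ret))  # 2: the push also desyncs later rets
```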
- anton