Sysop: | Amessyroom |
---|---|
Location: | Fayetteville, NC |
Users: | 27 |
Nodes: | 6 (0 / 6) |
Uptime: | 38:00:19 |
Calls: | 631 |
Calls today: | 2 |
Files: | 1,187 |
D/L today: |
22 files (29,767K bytes) |
Messages: | 173,681 |
I wanted to generate a random sequence consisting of a number, NOOP,
and DROP. When EVALUATEing the sequence, starting with an empty
stack, it should end with an empty stack, and should not underflow the
stack.
I implemented this twice. The first attempt just generated a random word/number in the normal case, and a specific kind of word/number in
the boundary cases:
: genseq {: u -- :}
0 u 0 do
case
dup 0= ?of 1+ ." 9 " endof
dup i + u u>= ?of 1- ." drop " endof
dup i + 1+ u u>= ?of ." noop " endof
rnd 3 um* nip 1- dup case
-1 of ." drop " endof
0 of ." noop " endof
1 of ." 8 " endof
endcase
+
0 endcase
loop
drop ;
When I do 16 GENSEQ with seed 1234, I get the following sequence:
9 drop 9 8 8 noop drop drop 8 noop drop noop noop 8 drop drop
The requirements are distributed across several places in this word,
which I did not like. I also did not like the non-randomness of the
sequence of DROPs at the end, and the non-randomness of the 9 whenever
the stack is empty. I also found that I needed to add a clause for
when the stack depth is one less than the number of remaining words.
So I thought about other ways to do it.
Eventually I came up with the following approach: Starts with counters
for the different kinds of words/numbers. Whenever one word/number is generated, its counter is reduced. At each step the probability of generating a kind of word/number is proportional to the remaining
counter for that word/number. The same-depth requirement is
implemented by having the same initial counter for the number and
DROP. The no-underflow requirement is implemented by skipping the
output of a DROP if the count of remaining DROPs = the count of the
remaining numbers.
Here's the code:
create counts 3 cells allot
: initcounts ( u -- )
dup 3 / dup counts ! dup counts 2 th ! 2* - counts cell+ ! ;
: .counts ( -- )
counts 3 cell array>mem mem+do
i @ 4 .r
loop ;
: countsth-- ( u -- )
-1 counts rot th +! ;
: select1 ( u -- )
\ u is the index of the word
dup case
0 of ." 9 " countsth-- endof
1 of ." noop " countsth-- endof
2 of counts @ counts 2 th @ u< if
." drop " countsth--
else
drop endif
endof
endcase ;
: sum ( addr u -- u2 )
\ addr u is an array with u elements, u2 is the sum of the elements
0 -rot cell array>mem mem+do
i @ +
loop ;
: genseq ( u -- )
initcounts
begin
counts 3 sum dup while
rnd um* nip ( sum )
3 0 u+do ( sum1 )
counts i th @ 2dup u< if
i select1 then
- loop
drop
repeat
drop ;
The code is much longer, but I like it better. The requirements are concentrated in two places.
16 GENSEQ generates
9 9 noop noop drop noop 9 drop noop drop noop 9 drop 9 noop drop
for seed 123. A disadvantage of this approach is that it does not
generate a word/number on each iteration through the main loop. I
have thought about how to achieve this, but the result would be quite
a bit more complex; it's also only a minor issue. E.g., for
generating 1000 words/numbers with seed 123, the main loop has 1036 iterations.
The second approach takes quite a bit more time; for generating a
million words/numbers into a string on a Zen 4, 192_876_892 cycles
compared to 146_644_156 cycles.
The original goal was to produce a sequence where various parts of the
text interpreter have branch mispredictions; the less predictable the sequence, the better. In this respect the second sequence is
significantly better. For a sequence of 1000 words/numbers,
text-interpreted 1000 times, the number of mispredictions is:
728_322 first approach
821_813 second approach
(with 247_125 mispredictions coming from (mainly) Gforth startup).
I wanted to generate a random sequence consisting of a number, NOOP,
and DROP. When EVALUATEing the sequence, starting with an empty
stack, it should end with an empty stack, and should not underflow the
stack.
I implemented this twice. The first attempt just generated a random word/number in the normal case, and a specific kind of word/number in
the boundary cases:
: genseq {: u -- :}
0 u 0 do
case
dup 0= ?of 1+ ." 9 " endof
dup i + u u>= ?of 1- ." drop " endof
dup i + 1+ u u>= ?of ." noop " endof
rnd 3 um* nip 1- dup case
-1 of ." drop " endof
0 of ." noop " endof
1 of ." 8 " endof
endcase
+
0 endcase
loop
drop ;
When I do 16 GENSEQ with seed 1234, I get the following sequence:
9 drop 9 8 8 noop drop drop 8 noop drop noop noop 8 drop drop
The requirements are distributed across several places in this word,
which I did not like. I also did not like the non-randomness of the
sequence of DROPs at the end, and the non-randomness of the 9 whenever
the stack is empty. I also found that I needed to add a clause for
when the stack depth is one less than the number of remaining words.
So I thought about other ways to do it.
Eventually I came up with the following approach: Starts with counters
for the different kinds of words/numbers. Whenever one word/number is generated, its counter is reduced. At each step the probability of generating a kind of word/number is proportional to the remaining
counter for that word/number. The same-depth requirement is
implemented by having the same initial counter for the number and
DROP. The no-underflow requirement is implemented by skipping the
output of a DROP if the count of remaining DROPs = the count of the
remaining numbers.
Here's the code:
create counts 3 cells allot
: initcounts ( u -- )
dup 3 / dup counts ! dup counts 2 th ! 2* - counts cell+ ! ;
: .counts ( -- )
counts 3 cell array>mem mem+do
i @ 4 .r
loop ;
: countsth-- ( u -- )
-1 counts rot th +! ;
: select1 ( u -- )
\ u is the index of the word
dup case
0 of ." 9 " countsth-- endof
1 of ." noop " countsth-- endof
2 of counts @ counts 2 th @ u< if
." drop " countsth--
else
drop endif
endof
endcase ;
: sum ( addr u -- u2 )
\ addr u is an array with u elements, u2 is the sum of the elements
0 -rot cell array>mem mem+do
i @ +
loop ;
: genseq ( u -- )
initcounts
begin
counts 3 sum dup while
rnd um* nip ( sum )
3 0 u+do ( sum1 )
counts i th @ 2dup u< if
i select1 then
- loop
drop
repeat
drop ;
The code is much longer, but I like it better. The requirements are concentrated in two places.
16 GENSEQ generates
9 9 noop noop drop noop 9 drop noop drop noop 9 drop 9 noop drop
for seed 123. A disadvantage of this approach is that it does not
generate a word/number on each iteration through the main loop. I
have thought about how to achieve this, but the result would be quite
a bit more complex; it's also only a minor issue. E.g., for
generating 1000 words/numbers with seed 123, the main loop has 1036 iterations.
The second approach takes quite a bit more time; for generating a
million words/numbers into a string on a Zen 4, 192_876_892 cycles
compared to 146_644_156 cycles.
The original goal was to produce a sequence where various parts of the
text interpreter have branch mispredictions; the less predictable the sequence, the better. In this respect the second sequence is
significantly better. For a sequence of 1000 words/numbers,
text-interpreted 1000 times, the number of mispredictions is:
728_322 first approach
821_813 second approach
(with 247_125 mispredictions coming from (mainly) Gforth startup).
- anton
Am 30.09.2025 um 18:33 schrieb Anton Ertl:...
Eventually I came up with the following approach: Starts with counters
for the different kinds of words/numbers. Whenever one word/number is
generated, its counter is reduced. At each step the probability of
generating a kind of word/number is proportional to the remaining
counter for that word/number. The same-depth requirement is
implemented by having the same initial counter for the number and
DROP. The no-underflow requirement is implemented by skipping the
output of a DROP if the count of remaining DROPs = the count of the
remaining numbers.
Here's the code:
create counts 3 cells allot
: initcounts ( u -- )
dup 3 / dup counts ! dup counts 2 th ! 2* - counts cell+ ! ;
: .counts ( -- )
counts 3 cell array>mem mem+do
i @ 4 .r
loop ;
: countsth-- ( u -- )
-1 counts rot th +! ;
: select1 ( u -- )
\ u is the index of the word
dup case
0 of ." 9 " countsth-- endof
1 of ." noop " countsth-- endof
2 of counts @ counts 2 th @ u< if
." drop " countsth--
else
drop endif
endof
endcase ;
: sum ( addr u -- u2 )
\ addr u is an array with u elements, u2 is the sum of the elements
0 -rot cell array>mem mem+do
i @ +
loop ;
: genseq ( u -- )
initcounts
begin
counts 3 sum dup while
rnd um* nip ( sum )
3 0 u+do ( sum1 )
counts i th @ 2dup u< if
i select1 then
- loop
drop
repeat
drop ;
How about
1) quick generation of a suboptimal list
2) create an array of pointers to the list elements
3) random shuffle the array e.g. >https://en.wikipedia.org/wiki/Fisher-Yates_shuffle
I used something similar - but with a whole slew of stack operations.
Very handy thingy. I use it until this day.
( abc -- abcabc)-a-a >r over over r@ rot rot r> \ >r 2dup r@ -rot r>
Am 01.10.2025 um 17:11 schrieb Hans Bezemer:
I used something similar - but with a whole slew of stack operations. Very handy thingy. I use it until this day.
( abc -- abcabc)-a-a >r over over r@ rot rot r> \ >r 2dup r@ -rot r>
Different and probably not as efficient as your code generator:
MinForth 3.6
# : 3DUP { a b c == a b c a b c } ;
On 2/10/2025 4:42 am, minforth wrote:
Am 01.10.2025 um 17:11 schrieb Hans Bezemer:Very handy thingy. I use it until this day.
I used something similar - but with a whole slew of stack operations.
( abc -- abcabc)-a-a >r over over r@ rot rot r> \ >r 2dup r@ -rot r>
Different and probably not as efficient as your code generator:
MinForth 3.6
# : 3DUP { a b c == a b c a b c } ;
For 3DUP I believe this is the one to beat:
: 3DUP ( a b c -- a b c a b c ) dup 2over rot ;
With NTF/LFX the locals version will break even. For others, well, it may
be better not to look. For a straight-forward example of 'stack juggling', >locals handle it rather poorly.
For 3DUP I believe this is the one to beat:
: 3DUP ( a b c -- a b c a b c ) dup 2over rot ;
With NTF/LFX the locals version will break even.
For others, well, it may
be better not to look. For a straight-forward example of 'stack juggling', >locals handle it rather poorly.
r 1->0 third 1->2 >l >l 1->1 dup 1->1mov -$08[r14],r13 mov r15,$10[r10] >l 1->1 mov [r10],r13
2->1 add r14,$08 mov rax,rbp mov rbx,[r14]mov -$08[r10],r15 mov rax,[rbx] lea rbp,-$08[rbp] add r14,$08
In article <68de4aaa@news.ausics.net>, dxf <dxforth@gmail.com> wrote:
On 2/10/2025 4:42 am, minforth wrote:
Am 01.10.2025 um 17:11 schrieb Hans Bezemer:Very handy thingy. I use it until this day.
I used something similar - but with a whole slew of stack operations.
( abc -- abcabc)-a-a >r over over r@ rot rot r> \ >r 2dup r@ -rot r>
Different and probably not as efficient as your code generator:
MinForth 3.6
# : 3DUP { a b c == a b c a b c } ;
For 3DUP I believe this is the one to beat:
: 3DUP ( a b c -- a b c a b c ) dup 2over rot ;
CODE 3DUP
PUSH RSP[3*CELL_SIZE]
PUSH RSP[3*CELL_SIZE]
PUSH RSP[3*CELL_SIZE]
NEXT,
...
dxf <dxforth@gmail.com> writes:
For 3DUP I believe this is the one to beat:
: 3DUP ( a b c -- a b c a b c ) dup 2over rot ;
With NTF/LFX the locals version will break even.
As we already discussed in the thread including ><2021Sep11.083507@mips.complang.tuwien.ac.at>, NTF/LXF produces the
same (optimal for the calling convention used by NTF/LXF) code for
3DUP versions using the data stack, return stack, or locals. That's
because the actual data flow is always the same, and NTF/LXF can see
this data flow in all three cases.
- anton--
In article <2025Oct2.224440@mips.complang.tuwien.ac.at>,
Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
dxf <dxforth@gmail.com> writes:
For 3DUP I believe this is the one to beat:
: 3DUP ( a b c -- a b c a b c ) dup 2over rot ;
With NTF/LFX the locals version will break even.
As we already discussed in the thread including
<2021Sep11.083507@mips.complang.tuwien.ac.at>, NTF/LXF produces the
same (optimal for the calling convention used by NTF/LXF) code for
3DUP versions using the data stack, return stack, or locals. That's
because the actual data flow is always the same, and NTF/LXF can see
this data flow in all three cases.
The problem with 3DUP is that it is actually used in context.
What is the data that is going to 3DUP ped? In my view this
amounts to double use of data that is in registers (32 in the riscv)
anyway, after an optimiser does his thing.
Am 03.10.2025 um 11:02 schrieb albert@spenarnc.xs4all.nl:
The problem with 3DUP is that it is actually used in context.
What is the data that is going to 3DUP ped? In my view this
amounts to double use of data that is in registers (32 in the riscv)
anyway, after an optimiser does his thing.
Code inlining will mend it.
r 1->0 third 1->1 >l >l 1->1 dup 1->1mov -8[r14],r13 mov [r10],r13 >l 1->1 mov [r10],r13
2->3 mov [r15],r13 mov rax,rbp + 3->2mov r9,[r14] ;s 0->1 lea rbp,-$08[rbp] add r15,r9
Locals-haters, come to Gforth, where locals are implemented inefficiently:-). The code for 3DUP.2 is actually optimal for
Gforth's calling convention.
dxf <dxforth@gmail.com> writes:
For 3DUP I believe this is the one to beat:
: 3DUP ( a b c -- a b c a b c ) dup 2over rot ;
With NTF/LFX the locals version will break even.
As we already discussed in the thread including <2021Sep11.083507@mips.complang.tuwien.ac.at>, NTF/LXF produces the
same (optimal for the calling convention used by NTF/LXF) code for
3DUP versions using the data stack, return stack, or locals. That's
because the actual data flow is always the same, and NTF/LXF can see
this data flow in all three cases.
For others, well, it may
be better not to look. For a straight-forward example of 'stack juggling', >> locals handle it rather poorly.
Other Forth systems implement locals poorly. LXF/NTF demonstrates
that this is not due to some natural law, however.
There have been some improvements in Gforth since that time. Let's
see how the versions used in that thread look on today's gforth-fast.
Here are the versions of 3DUP:
: 3dup.1 ( a b c -- a b c a b c ) >r 2dup r@ -rot r> ;
: 3dup.2 ( a b c -- a b c a b c ) 2 pick 2 pick 2 pick ;
: 3dup.3 {: a b c :} a b c a b c ;
: 3dup.4 ( a b c -- a b c a b c ) dup 2over rot ;
And here's the gforth-fast code on AMD64:
3dup.1 3dup.2 3dup.3 3dup.4
r 1->0 third 1->2 >l >l 1->1 dup 1->1mov -$08[r14],r13 mov r15,$10[r10] >l 1->1 mov [r10],r13
sub r14,$08 third 2->3 mov -$08[rbp],r13 sub r10,$08 2dup 0->2 mov r9,$08[r10] mov rdx,$08[r10] 2over 1->3
mov r13,$10[r10] third 3->1 mov rax,rbp mov r15,$18[r10
mov r15,$08[r10] mov [r10],r13 add r10,$10 mov r9,$10[r10] i 2->3 sub r10,$18 lea rbp,-$10[rbp] rot 3->1
mov r9,[r14] mov $10[r10],r15 mov -$10[rax],rdx mov [r10],r15 -rot 3->2 mov $08[r10],r9 mov r13,[r10] sub r10,$10
mov [r10],r9 ;s 1->1 >l @local0 1->1 mov $08[r10],r9
sub r10,$08 mov rbx,[r14] @local0 1->1 ;s 1->1
2->1 add r14,$08 mov rax,rbp mov rbx,[r14]mov -$08[r10],r15 mov rax,[rbx] lea rbp,-$08[rbp] add r14,$08
sub r10,$10 jmp eax mov -$08[rax],r13 mov rax,[rbx]
mov $10[r10],r13 @local1 1->2 jmp eax
mov r13,[r14] mov r15,$08[rbp]
add r14,$08 @local2 2->1
;s 1->1 mov -$08[r10],r15
mov rbx,[r14] sub r10,$10
add r14,$08 mov $10[r10],r13
mov rax,[rbx] mov r13,$10[rbp]
jmp eax @local0 1->2
mov r15,$00[rbp]
@local1 2->3
mov r9,$08[rbp]
@local2 3->1
mov -$10[r10],r9
sub r10,$18
mov $10[r10],r15
mov $18[r10],r13
mov r13,$10[rbp]
lit 1->2
#24
mov r15,$50[rbx]
lp+! 2->1
add rbp,r15
;s 1->1
mov rbx,[r14]
add r14,$08
mov rax,[rbx]
jmp eax
- anton
I wanted to generate a random sequence consisting of a number, NOOP,
and DROP. When EVALUATEing the sequence, starting with an empty
stack, it should end with an empty stack, and should not underflow the
stack.
I implemented this twice. The first attempt just generated a random word/number in the normal case, and a specific kind of word/number in
the boundary cases:
: genseq {: u -- :}
0 u 0 do
case
dup 0= ?of 1+ ." 9 " endof
dup i + u u>= ?of 1- ." drop " endof
dup i + 1+ u u>= ?of ." noop " endof
rnd 3 um* nip 1- dup case
-1 of ." drop " endof
0 of ." noop " endof
1 of ." 8 " endof
endcase
+
0 endcase
loop
drop ;
When I do 16 GENSEQ with seed 1234, I get the following sequence:
9 drop 9 8 8 noop drop drop 8 noop drop noop noop 8 drop drop
The requirements are distributed across several places in this word,
which I did not like. I also did not like the non-randomness of the
sequence of DROPs at the end, and the non-randomness of the 9 whenever
the stack is empty. I also found that I needed to add a clause for
when the stack depth is one less than the number of remaining words.
So I thought about other ways to do it.
Eventually I came up with the following approach: Starts with counters
for the different kinds of words/numbers. Whenever one word/number is generated, its counter is reduced. At each step the probability of generating a kind of word/number is proportional to the remaining
counter for that word/number. The same-depth requirement is
implemented by having the same initial counter for the number and
DROP. The no-underflow requirement is implemented by skipping the
output of a DROP if the count of remaining DROPs = the count of the
remaining numbers.
Here's the code:
create counts 3 cells allot
: initcounts ( u -- )
dup 3 / dup counts ! dup counts 2 th ! 2* - counts cell+ ! ;
: .counts ( -- )
counts 3 cell array>mem mem+do
i @ 4 .r
loop ;
: countsth-- ( u -- )
-1 counts rot th +! ;
: select1 ( u -- )
\ u is the index of the word
dup case
0 of ." 9 " countsth-- endof
1 of ." noop " countsth-- endof
2 of counts @ counts 2 th @ u< if
." drop " countsth--
else
drop endif
endof
endcase ;
: sum ( addr u -- u2 )
\ addr u is an array with u elements, u2 is the sum of the elements
0 -rot cell array>mem mem+do
i @ +
loop ;
: genseq ( u -- )
initcounts
begin
counts 3 sum dup while
rnd um* nip ( sum )
3 0 u+do ( sum1 )
counts i th @ 2dup u< if
i select1 then
- loop
drop
repeat
drop ;
The code is much longer, but I like it better. The requirements are concentrated in two places.
16 GENSEQ generates
9 9 noop noop drop noop 9 drop noop drop noop 9 drop 9 noop drop
for seed 123. A disadvantage of this approach is that it does not
generate a word/number on each iteration through the main loop. I
have thought about how to achieve this, but the result would be quite
a bit more complex; it's also only a minor issue. E.g., for
generating 1000 words/numbers with seed 123, the main loop has 1036 iterations.
The second approach takes quite a bit more time; for generating a
million words/numbers into a string on a Zen 4, 192_876_892 cycles
compared to 146_644_156 cycles.
The original goal was to produce a sequence where various parts of the
text interpreter have branch mispredictions; the less predictable the sequence, the better. In this respect the second sequence is
significantly better. For a sequence of 1000 words/numbers,
text-interpreted 1000 times, the number of mispredictions is:
728_322 first approach
821_813 second approach
(with 247_125 mispredictions coming from (mainly) Gforth startup).