Some people are throwing large amounts of computing power at trying
to disprove the Collatz conjecture. Graphics cards have sped this
up enormously.
(A reminder, if any is needed: From a starting number n, the
transformation, recursively applied,
f(n) = 3*n+1, if n is odd; =n/2 if n is even.
is conjectured to always reach 1 for any positive integer n).
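For reference, the map as stated can be sketched directly in software (a plain, unoptimized version; the function name is mine):

```rust
/// Plain reference implementation of the Collatz map: number of
/// applications of f needed to reach 1 (the total stopping time).
fn collatz_steps(mut n: u64) -> u32 {
    assert!(n > 0);
    let mut steps = 0;
    while n != 1 {
        n = if n % 2 == 1 { 3 * n + 1 } else { n / 2 };
        steps += 1;
    }
    steps
}
```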
All the work is done on general-purpose hardware, which has many
capabilities that are not needed, and which consumes area and power
that special-purpose hardware would not. Also, the hardware is
limited to 64-bit integers, while the range of tested numbers is
now up to 2^71, so wider arithmetic is needed.
What could specialized hardware look like?
The central part could be a unit which is handed a block of numbers
to check, for example a range of 2^32 numbers. The current records
for number of iterations and maximum number reached would also be
stored in that unit.
In that unit, the calculation would be performed by a combination
of adder and shifter. The input would always be odd (so the last
bit could be dropped), and it would calculate (3*n+1)>>ctz(3*n+1).
The adder would compute (3*n+1)/2, and would be a specialized 128-bit
adder. Why specialized? The logic is far less complex than that of a
general-purpose adder. For example, a two-bit adder has the formula
y1 = (!a1&a0) | (!c_in&a1&!a0);
y0 = (c_in&!a1&!a0) | (!c_in&a1&!a0) | (a1&a0);
and its propagate and generate bits are
p = (a1);
g = (a1&a0);
The maximum number of trailing zeros after the (3*n+1)/2 step is
three and can be determined cheaply from bits <3:1> of the input,
so the input to the shifter (two levels only) can be provided
in parallel. The machine would be designed so that adding and
shifting can be done in a single cycle.
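In software terms, one round of that unit would look like this (a sketch of the (3*n+1)>>ctz(3*n+1) step; the hardware would fold the add and the shift into one cycle):

```rust
/// One round of the proposed unit: for odd n, compute 3n+1 and strip
/// all trailing zeros, yielding the next odd trajectory value.
fn collatz_round(n: u128) -> u128 {
    debug_assert!(n & 1 == 1);
    let m = 3 * n + 1;          // adder output, always even for odd n
    m >> m.trailing_zeros()     // shifter strips the trailing zeros
}
```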
The number of iterations would be tracked; if larger than a previous
record, the calculation would terminate with a corresponding signal
to the outside world.
Also, each cycle, the result from the previous round would be
compared against the input value. If it is lower, the next number
would be chosen. Finally, the number reached would be checked against
the previous, pre-set maximum, which would also signal to the outside
world.
Which numbers to choose? It is well known that only certain
numbers with remainders modulo 2^n are interesting. The counts
can be found in OEIS A076227. For example, modulo 2^10, only 64
numbers are "interesting", or 6.25% (but only eight bits would
be needed to store them because the last two bits are always one).
Modulo 2^16 there would be 2114 numbers, or 3.22%, so this would
cut the amount of numbers to be checked by half again. One would
use a small ROM, I presume. The number of bits would be a tradeoff
between area and amount of computation. My current guess would be
that 2^10 could be a sweet spot. One third of the numbers would
further be excluded by checking for divisibility by three.
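The residue sieve can be reproduced in software. The sketch below is my own formulation, not the proposed hardware: write n = 2^k*m + r and track the residue r through k steps of the compressed map T(n) = (3n+1)/2 resp. n/2; the whole residue class is eliminated as soon as its symbolic coefficient 3^t/2^j drops below 1, since every member has then fallen below its starting value.

```rust
/// Count the residues modulo 2^k that survive k steps of the map
/// T(n) = (3n+1)/2 (n odd), n/2 (n even) without provably dropping
/// below their starting value; this matches OEIS A076227.
fn surviving_residues(k: u32) -> u64 {
    let mut count = 0u64;
    for r in 0..(1u64 << k) {
        let mut b = r;            // trajectory of the residue itself
        let mut pow3 = 1u128;     // 3^t, t = number of odd steps so far
        let mut pow2 = 1u128;     // 2^j after j steps
        let mut survives = true;
        for _ in 0..k {
            if b & 1 == 1 {
                b = (3 * b + 1) / 2;
                pow3 *= 3;
            } else {
                b /= 2;
            }
            pow2 *= 2;
            if pow3 < pow2 {      // class certainly dropped below start
                survives = false;
                break;
            }
        }
        if survives {
            count += 1;
        }
    }
    count
}
```

For k = 10 this returns 64, matching the 6.25% figure above.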
The problem is embarrassingly parallel. One could choose a size so
that each parcel is calculated in a few seconds or even minutes, to
avoid a large amount of communication with the outside.
Because all units would be working all the time, and because a smaller
unit could simply be replicated, I suspect that power would be the
main limiting factor. An adder design that might not be very fast,
but uses low power at low voltage and low frequency, would be
beneficial.
Comments?
Thomas Koenig <tkoenig@netcologne.de> posted:
Some people are throwing large amounts of computing power at trying
to disprove the Collatz conjecture. Graphics cards have sped this
up enormously.
(A reminder, if any is needed: From a starting number n, the
transformation, recursively applied,
f(n) = 3*n+1, if n is odd; =n/2 if n is even.
is conjectured to always reach 1 for any positive integer n).
While n/2 remains even, n/2 is guaranteed to converge to 1.
So, only the odd cases are interesting.
3*n+1 is always even (since n was odd, and odd*odd is odd and
odd+1 is even). So, a double iteration is (3*n+1)/2.
All the work is done on general-purpose hardware, which has many
capabilities that are not needed, and consume area and power that
special-purpose hardware would not need. Also, the hardware
is limited to 64-bit integers, and the range of tested numbers
is now up to 2^71, so
Induction would say it has been proven by then.
Comments?
Interesting problem,
waste of power.
On 14/02/2026 19:48, MitchAlsup wrote:
Thomas Koenig <tkoenig@netcologne.de> posted:
Some people are throwing large amounts of computing power at trying
to disprove the Collatz conjecture. Graphics cards have sped this
up enormously.
(A reminder, if any is needed: From a starting number n, the
transformation, recursively applied,
f(n) = 3*n+1, if n is odd; =n/2 if n is even.
is conjectured to always reach 1 for any positive integer n).
While n/2 remains even, n/2 is guaranteed to converge to 1.
So, only the odd cases are interesting.
3*n+1 is always even (since n was odd, and odd*odd is odd and
odd+1 is even). So, a double iteration is (3*n+1)/2.
Yes, that is often a micro-optimisation used.
All the work is done on general-purpose hardware, which has many
capabilities that are not needed, and consume area and power that
special-purpose hardware would not need. Also, the hardware
is limited to 64-bit integers, and the range of tested numbers
is now up to 2^71, so
Induction would say it has been proven by then.
Would you like to re-think that? "Induction" says nothing of the sort.
The conjecture has been proven true for all numbers up to 2 ^ 71
(assuming that is the current figure) - it has most definitely /not/
been proven to be true for all numbers. There are other things proven
about it, such as the asymptotic rarity of numbers that are exceptions
to the pattern, but no one has any notion of how to prove it in general.
Being true up to 2 ^ 71 makes it reasonable to believe it is always
true, but that is very different from being /proven/. There are plenty
of mathematical conjectures that are true up to very large numbers, but
fail later on.
Comments?
Interesting problem,
waste of power.
It is a mathematically very interesting problem. Whether there is
anything to be gained by continuing the search for exceptions by brute
force is a different matter - but /if/ it is worth doing, then doing so
more efficiently is a good idea.
David Brown <david.brown@hesbynett.no> posted:
On 14/02/2026 19:48, MitchAlsup wrote:
Thomas Koenig <tkoenig@netcologne.de> posted:
Some people are throwing large amounts of computing power at trying
to disprove the Collatz conjecture. Graphics cards have sped this
up enormously.
(A reminder, if any is needed: From a starting number n, the
transformation, recursively applied,
f(n) = 3*n+1, if n is odd; =n/2 if n is even.
is conjectured to always reach 1 for any positive integer n).
While n/2 remains even, n/2 is guaranteed to converge to 1.
So, only the odd cases are interesting.
3*n+1 is always even (since n was odd, and odd*odd is odd and
odd+1 is even). So, a double iteration is (3*n+1)/2.
Yes, that is often a micro-optimisation used.
All the work is done on general-purpose hardware, which has many
capabilities that are not needed, and consume area and power that
special-purpose hardware would not need. Also, the hardware
is limited to 64-bit integers, and the range of tested numbers
is now up to 2^71, so
Induction would say it has been proven by then.
Would you like to re-think that? "Induction" says nothing of the sort.
The conjecture has been proven true for all numbers up to 2 ^ 71
(assuming that is the current figure) - it has most definitely /not/
been proven to be true for all numbers. There are other things proven
about it, such as the asymptotic rarity of numbers that are exceptions
to the pattern, but no one has any notion of how to prove it in general.
By proving it up to 2^71, you simultaneously show that it cannot be
proven by computer-like arithmetic processes. Even if the bound were
2^71,000,000, you still would not have proven it. Thereby, it is not
provable by simple computer arithmetic processes.
On 2026-02-14 4:57 a.m., Thomas Koenig wrote:
Some people are throwing large amounts of computing power at trying
to disprove the Collatz conjecture.
big snip

I wonder how difficult it would be to implement a parallel tester in
an FPGA. It looks simple enough and there are hundreds of DSP blocks available to use. Could test in blocks of hundreds of numbers at a
time. Running at 100 MHz * 200 tests at a time = how long to get to
2^71?
I wonder if the calculation can be broken down further. Numbers are
all just defined by rules. I wonder if some calculus may apply. Limit
of function approaches one for a finite number of steps. There may be
a whole range of algorithms, is there one for the number two? Then
the number three?
Occasionally I have made useless machinery. Great puzzle work. Made
the game of life in hardware.
On 15/02/2026 09:58, Robert Finch wrote:
I wonder how difficult it would be to implement a parallel tester in an
FPGA. It looks simple enough and there are hundreds of DSP blocks
available to use. Could test in blocks of hundreds of numbers at a time.
Running at 100 MHz * 200 tests at a time = how long to get to 2^71?
One of the nice things about this is that you don't need DSP blocks - it
all works well with very simple operations well-suited to an FPGA. Your
calculation blocks would hold an n-bit number (wider than the number you
are testing, since rounds of the Collatz function can grow a lot)
and a counter. Initialise the block to the number you want to test.
For each round, check the LSB. If it is 0, right-shift your register.
If it is 1, the number is odd, so replace "x" with "(x << 1) + x + 1".
That's just some simple logic - no DSP blocks involved. If your register
or your counter overflows, you've found an "interesting" number and can
pass that out to a PC to test it fully.
There's lots of scope for both high-level work (the housekeeping and tracking of operations for each of your worker elements) and low-level
work (getting the carry operations to work at high speed).
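A software model of one such worker element might look like this (a sketch only; the register width, the overflow-as-"interesting" reporting, and all names are my assumptions):

```rust
#[derive(Debug, PartialEq)]
enum Outcome {
    Done,             // trajectory reached 1
    Interesting(u64), // register or step counter overflowed
}

/// One worker element: an oversized register plus a step counter.
fn run_worker(start: u64, max_steps: u32) -> Outcome {
    let mut x = start as u128; // register, wider than the tested number
    let mut steps = 0;
    while x != 1 {
        if x & 1 == 0 {
            x >>= 1; // LSB 0: right-shift
        } else {
            // LSB 1: x <- (x << 1) + x + 1, i.e. 3x + 1
            match x.checked_mul(3).and_then(|y| y.checked_add(1)) {
                Some(y) => x = y,
                None => return Outcome::Interesting(start), // register overflow
            }
        }
        steps += 1;
        if steps > max_steps {
            return Outcome::Interesting(start); // counter overflow
        }
    }
    Outcome::Done
}
```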
"Unknown" is a correct theoretically answer, but I can provide a very educated guess that on average it would take ~500 steps and the worst
case would not exceed 2000 steps.
So, bad news are that overall it will take more than a million years on
your machine. And good news are that it'll take less than 2 million
years.
Just thinking about a way to replace ~N iterations by a MUL, ADD and SHR
Even with 128-bit numbers, I would only need two MULs, two ADD/ADC pairs
and a SHRD/SHR, so maybe 8-10 cycles?
Robert Finch <robfi680@gmail.com> posted:
On 2026-02-15 7:14 a.m., David Brown wrote:
On 15/02/2026 09:58, Robert Finch wrote:
Yeah, I figured that out after coding. 3n+1 = n+n+n+1
I wonder how difficult it would be to implement a parallel tester in
an FPGA. It looks simple enough and there are hundreds of DSP blocks
available to use. Could test in blocks of hundreds of numbers at a time.
Running at 100 MHz * 200 tests at a time = how long to get to 2^71?
One of the nice things about this is that you don't need DSP blocks - it
3n+1 = (n<<1)+n+1
one add with carry-in set: n gets wired in directly and also shifted
up by 1 (no logic shifter needed).
Terje Mathisen <terje.mathisen@tmsw.no> schrieb:
Just thinking about a way to replace ~N iterations by a MUL, ADD and SHR
Even with 128-bit numbers, I would only need two MULs, two ADD/ADC pairs
and a SHRD/SHR, so maybe 8-10 cycles?
https://link.springer.com/article/10.1007/s11227-025-07337-0 gives
an improved algorithm, which they are using for the supercomputer
version. They are doing:
n = n0
repeat
n = n + 1
alpha = ctz(n)
n = n * 3^alpha / 2^alpha
n = n - 1
beta = ctz(n)
n = n / 2^beta
until n < n_0
They are also playing some strange tricks with parallel solving that
are not described clearly, or I am not clever enough to understand
what they are doing. I would have to look at their code, I guess.
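The paper's formulation can be checked against the plain map in a few lines (my own sketch: 3^alpha is computed directly rather than from a table, and overflow is ignored for these small inputs):

```rust
/// One step of the compressed Collatz map T.
fn plain_step(n: u128) -> u128 {
    if n & 1 == 1 { (3 * n + 1) / 2 } else { n / 2 }
}

/// The paper's loop: first trajectory value below the odd start n0.
fn fast_descent(n0: u128) -> u128 {
    let mut n = n0;
    loop {
        n += 1;
        let alpha = n.trailing_zeros();
        n = (n >> alpha) * 3u128.pow(alpha); // n * 3^alpha / 2^alpha
        n -= 1;
        n >>= n.trailing_zeros();            // beta halvings
        if n < n0 {
            return n;
        }
    }
}
```

Starting from 27, for example, both formulations first drop below the start at the same value, 23.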
Thomas Koenig wrote:
Terje Mathisen <terje.mathisen@tmsw.no> schrieb:
Just thinking about a way to replace ~N iterations by a MUL, ADD and SHR
Even with 128-bit numbers, I would only need two MULs, two ADD/ADC pairs
and a SHRD/SHR, so maybe 8-10 cycles?
https://link.springer.com/article/10.1007/s11227-025-07337-0 gives
an improved algorithm, which they are using for the supercomputer
version. They are doing:
n = n0
repeat
n = n + 1
alpha = ctz(n)
n = n * 3^alpha / 2^alpha
Strange syntax...
ctz(n) must count trailing zeroes, right?
If so, then ctz(n+1) will return 1 or more for odd n, so that gives
3n+3, which after the division by 2 returns a value which is one too high.
n = n - 1
but this is corrected here!
The key must be that the multiplication by a power of three also works
for n with multiple trailing 1 bits.
The "x/2^alpha" part is of course just x>>alpha.
For an even n they still do a multiplication by 1 and a SHR 0, but the
code is branchless.
beta = ctz(n)
n = n / 2^beta
until n < n_0
I am guessing that the best way to do the n*3^alpha parts is a table
of powers of 3, so

let mut n = n0;
loop {
    n += 1;
    let alpha = ctz(n);
    n = (n * power_of_three[alpha]) >> alpha;
    n -= 1;
    let beta = ctz(n);
    n >>= beta;
    if n <= n0 { break };
}
:-)
They are also playing some strange tricks with parallel solving that
are not described clearly, or I am not clever enough to understand
what they are doing. I would have to look at their code, I guess.
Terje
Michael S <already5chosen@yahoo.com> schrieb:
"Unknown" is a correct theoretically answer, but I can provide a
very educated guess that on average it would take ~500 steps and
the worst case would not exceed 2000 steps.
So, bad news are that overall it will take more than a million
years on your machine. And good news are that it'll take less than
2 million years.
Not as bad: many simplifications are possible.
For one, you can discount all even numbers as starting values.
You can go further: If you look at residues modulo 2^n of your
starting numbers, there are only A076227(n) "interesting" remainders
that you need to look at.
If you take n=39, only 3611535862 of the 2^39 residues need to be
looked at, a reduction by a factor of around 150. You can also
do some more reductions (only 2/3 of those numbers need checking
if you can calculate the remainder by 3 fast). There are different
tradeoffs for different n, of course (a factor of 16 for n=10,
for example).
The main key is parallelization.
On 2026-02-15 4:59 p.m., Terje Mathisen wrote:
big snip

How many iterations are done in one loop? I think it would need to be
at least 10 on average to beat the simple approach. The simple approach
only taking one clock per iteration. The 128-bit multiply and 128-bit
count trailing zeros are slow in an FPGA.

And, can the approach consume a more limited number of iterations?
For example counting only the 6 LSB zeros?

I found a mistake in my code handling groups. It can only do about a
24x24=576 matrix of calculations in parallel @ 200MHz. So that's 1/4
million years?
On 2026-02-15 4:59 p.m., Terje Mathisen wrote:
big snip

How many iterations are done in one loop? I think it would need to be
at least 10 on average to beat the simple approach. The simple approach
only taking one clock per iteration. The 128-bit multiply and 128-bit
count trailing zeros are slow in an FPGA.

Right.

And, can the approach consume a more limited number of iterations? For
example counting only the 6 LSB zeros?

It should work directly with a max(ctz,N) wrapper.

I found a mistake in my code handling groups. It can only do about a
24x24=576 matrix of calculations in parallel @ 200MHz. So that's 1/4
million years?

FPGAs typically run at much, much lower frequencies than modern CPUs, so
if the code above can average 3-5 iterations per loop, and do it in
On 2/14/2026 1:57 AM, Thomas Koenig wrote:
Some people are throwing large amounts of computing power at trying
to disprove the Collatz conjecture. Graphics cards have sped this
up enormously.
big snip
The number of iterations would be tracked; if larger than a previous
record, the calculation would terminate with a corresponding signal
to the outside world.
Also, each cycle, the result from the previous round would be
compared against the input value. If it is lower, the next number
would be chosen. Finally, the number reached would be checked
against the previous, pre-set maximum, which would also signal to
the outside world.
Hold these thoughts
snip
The problem is embarrassingly parallel.
Sort of. You could do it that way, but you would be giving up some performance. Specifically, as you say, you want to check the number
against the previous input value and the preset maximum. But in a
parallel implementation, you want to check against the largest value
checked *across all the parallel threads*. In order to do that, the
threads need to communicate, thus violating the "embarrassingly
parallel" rule. I don't know how much performance you would give up
so it may not make much difference,
nor do I know the cost of keeping
some "global" largest value checked so far. There is a tradeoff here.
On Sun, 15 Feb 2026 21:53:10 -0500
Robert Finch <robfi680@gmail.com> wrote:
On 2026-02-15 4:59 p.m., Terje Mathisen wrote:
Thomas Koenig wrote:
Terje Mathisen <terje.mathisen@tmsw.no> schrieb:
Just thinking about a way to replace ~N iterations by a MUL, ADD
and SHR
Even with 128-bit numbers, I would only need two MULs, two
ADD/ADC pairs and a SHRD/SHR, so maybe 8-10 cycles?
https://link.springer.com/article/10.1007/s11227-025-07337-0
gives an improved algorithm, which they are using for the
supercomputer version. They are doing:
n = n0
repeat
    n = n + 1
    alpha = ctz(n)
    n = n * 3^alpha / 2^alpha
Strange syntax...
ctz(n) must count trailing zeroes, right?
If so, then ctz(n+1) will return 1 or more for odd n, so that
gives 3n+3, which after the division by 2 returns a value which
is one too high.
    n = n - 1
but this is corrected here!
The key must be that the multiplication by a power of three also
works for n with multiple trailing 1 bits.
The "x/2^alpha" part is of course just x>>alpha.
For an even n they still do a multiplication by 1 and a SHR 0, but
the code is branchless.
    beta = ctz(n)
    n = n / 2^beta
until n < n0
I am guessing that the best way to do the n*3^alpha parts is a
table of powers of 3, so
let mut n = n0;
loop {
    n += 1;
    let alpha = ctz(n);
    n = (n * power_of_three[alpha]) >> alpha;
    n -= 1;
    let beta = ctz(n);
    n >>= beta;
    if n <= n0 { break };
}
:-)
They are also playing some strange tricks with parallel solving
that are not described clearly, or I am not clever enough to
understand what they are doing. I would have to look at their
code, I guess.
Terje
How many iterations are done in one loop?
Worst case - a few thousand, or less. I didn't yet encounter 1000.
Mean - 6 or 7. Maybe 8, but that's unlikely.
I think it would need to be at least 10 on average to beat the simple
approach, which takes only one clock per iteration. The 128-bit
multiply and 128-bit count trailing zeros are slow in an FPGA.
And can the approach be made to consume a more limited number of
iterations? For example, counting only the 6 LSB zeros?
I found a mistake in my code handling groups. It can only do about
a 24x24=576 matrix of calculations in parallel @ 200MHz. So that's
1/4 million years?
Much less.
My first estimate was for a more naive loop that always counts down to 1.
On Sun, 15 Feb 2026 19:19:16 -0000 (UTC)
Thomas Koenig <tkoenig@netcologne.de> wrote:
Michael S <already5chosen@yahoo.com> schrieb:
"Unknown" is a correct theoretically answer, but I can provide a
very educated guess that on average it would take ~500 steps and
the worst case would not exceed 2000 steps.
So, the bad news is that overall it will take more than a million
years on your machine. And the good news is that it'll take less than
2 million years.
Not as bad, very many simplifications are possible.
For one, you can discount all even numbers as starting values.
O.k.
You can go further: If you look at residues modulo 2^n of your
starting numbers, there are only A076227(n) "interesting" remainders
that you need to look at.
Can not say that I understood.
If you take n=39, only 3611535862 cases
need to be looked at, so only 3611535862/2^39 of all cases need to
be looked at, a reduction by a factor of around 150. You can also
do some more reductions (only 2/3 of those numbers need checking
if you can calculate the remainder modulo 3 fast). There are
different tradeoffs for different n, of course (a factor of 16 for
n=10, for example).
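For reference, the residue sieve behind A076227 can be sketched as follows:
write a starting value as 2^k*m + r and apply k combined Collatz steps
symbolically, so the value takes the form 3^t * 2^(k-j) * m + b after j
steps; once 3^t < 2^j, the value is below the start for every m and residue
r can be filtered out. A minimal Rust sketch (the function name and the
exact descent criterion are my assumptions, not taken from the paper):

```rust
// Sieve of "interesting" residues r (mod 2^k), cf. A076227: a residue
// survives if the symbolic coefficient 3^t * 2^(k-j) never guarantees a
// drop below the starting value within k combined steps.
fn surviving_residues(k: u32) -> Vec<u64> {
    let mut survivors = Vec::new();
    for r in (1..(1u64 << k)).step_by(2) {   // even residues never survive
        let mut b = r;
        let mut t = 0u32;
        let mut alive = true;
        for j in 1..=k {
            if b & 1 == 1 {
                b = (3 * b + 1) / 2;         // combined odd step
                t += 1;
            } else {
                b /= 2;
            }
            if 3u64.pow(t) < (1u64 << j) {   // guaranteed descent: filter out
                alive = false;
                break;
            }
        }
        if alive {
            survivors.push(r);
        }
    }
    survivors
}
```

For example, modulo 8 only the residues 3 and 7 survive, matching the
factor-of-4 reduction over testing every number.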
On Mon, 16 Feb 2026 12:31:31 +0200
Michael S <already5chosen@yahoo.com> wrote:
On Sun, 15 Feb 2026 19:19:16 -0000 (UTC)
Thomas Koenig <tkoenig@netcologne.de> wrote:
big snip
Hopefully now I have figured out what you are talking about, except
for the 2/3 part and the meaning of A076227(n).
I experimented a little with these ideas and it seems that reduction
in number of steps is nowhere near the reduction in number of tested
cases. My test vectors were 2**32 consecutive points starting at
2**52 or 2**53. I tried different modulo filters in range from 2**5
to 2**31. Two modes of filtering were used.
In "simple" mode I just filtered out "non-interesting" inputs, but
did nothing special about "interesting" inputs.
In "advanced" mode I accelerated processing of "interesting" inputs
by means of starting their testing from the point
x0 = (x/M)*A[x%M] + B[x%M],
where the tables A[] and B[] were pre-calculated during the filter
build step. Obviously, "advanced" mode filtering requires ~3 times
more memory, plus a multiplier plus some logic.
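If I understand the scheme correctly, the tables can be built by running k
combined steps symbolically: 2^k*m + r becomes A[r]*m + B[r] with
A[r] = 3^t, where t is the number of odd steps taken by residue r. A hedged
Rust sketch (the names `build_jump_tables`, `a_tbl`, `b_tbl` are mine):

```rust
// Hypothetical build of the "advanced" jump tables for modulus M = 2^k:
// after k combined steps, 2^k*m + r becomes a_tbl[r]*m + b_tbl[r], so the
// testing of x can start at x0 = (x / M) * a_tbl[x % M] + b_tbl[x % M].
fn build_jump_tables(k: u32) -> (Vec<u64>, Vec<u64>) {
    let m = 1usize << k;
    let mut a_tbl = vec![0u64; m];
    let mut b_tbl = vec![0u64; m];
    for r in 0..m as u64 {
        let mut b = r;
        let mut t = 0u32;
        for _ in 0..k {
            if b & 1 == 1 {
                b = (3 * b + 1) / 2; // combined odd step
                t += 1;
            } else {
                b /= 2;
            }
        }
        a_tbl[r as usize] = 3u64.pow(t); // coefficient after k steps
        b_tbl[r as usize] = b;           // offset after k steps
    }
    (a_tbl, b_tbl)
}
```

As a sanity check, with M = 4 and x = 19 = 4*4 + 3 the tables give
x0 = 4*9 + 8 = 44, the same value that two direct combined steps
(19 -> 29 -> 44) produce.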
Here are my results:
$ ./nsteps2 4503599627370496 0x100000000
Initial value = 4503599627370496 (0010000000000000).
#values = 4294967296 (0000000100000000)
Unfiltered : 22501638882 steps. 4294967296 tests. 5.239 steps/iter
n*4+3 : 17132929762 steps. 1073741824 tests. 15.956 steps/iter
n*2**5+m : 15924970210 steps. 939524096 tests. 16.950 steps/iter
n*2**5+m *  : 8408777442 steps. 939524096 tests. 8.950 steps/iter
n*2**10+m   : 11948770018 steps. 503316480 tests. 23.740 steps/iter
n*2**10+m * : 7088288929 steps. 503316480 tests. 14.083 steps/iter
n*2**15+m   : 9003713250 steps. 289013760 tests. 31.153 steps/iter
n*2**15+m * : 4818894105 steps. 289013760 tests. 16.674 steps/iter
n*2**20+m   : 6973854434 steps. 181379072 tests. 38.449 steps/iter
n*2**20+m * : 3496505674 steps. 181379072 tests. 19.277 steps/iter
n*2**25+m   : 5574414306 steps. 125809408 tests. 44.308 steps/iter
n*2**25+m * : 2543845543 steps. 125809408 tests. 20.220 steps/iter
n*2**29+m   : 4660796778 steps. 95710352 tests. 48.697 steps/iter
n*2**29+m * : 2008917951 steps. 95710352 tests. 20.990 steps/iter
n*2**30+m   : 4493358006 steps. 89757100 tests. 50.061 steps/iter
n*2**30+m * : 1915335033 steps. 89757100 tests. 21.339 steps/iter
n*2**31+m   : 4212222570 steps. 81419608 tests. 51.735 steps/iter
n*2**31+m * : 1780554493 steps. 81419608 tests. 21.869 steps/iter
$ ./nsteps2 0x20000000000000 0x100000000
Initial value = 9007199254740992 (0020000000000000).
#values = 4294967296 (0000000100000000)
Unfiltered : 22501540484 steps. 4294967296 tests. 5.239 steps/iter
n*4+3 : 17132831364 steps. 1073741824 tests. 15.956 steps/iter
n*2**5+m : 15924871812 steps. 939524096 tests. 16.950 steps/iter
n*2**5+m *  : 8408679044 steps. 939524096 tests. 8.950 steps/iter
n*2**10+m   : 11948671620 steps. 503316480 tests. 23.740 steps/iter
n*2**10+m * : 7088271089 steps. 503316480 tests. 14.083 steps/iter
n*2**15+m   : 9003614852 steps. 289013760 tests. 31.153 steps/iter
n*2**15+m * : 4818766141 steps. 289013760 tests. 16.673 steps/iter
n*2**20+m   : 6973756036 steps. 181379072 tests. 38.449 steps/iter
n*2**20+m * : 3496390679 steps. 181379072 tests. 19.277 steps/iter
n*2**25+m   : 5574315908 steps. 125809408 tests. 44.308 steps/iter
n*2**25+m * : 2543741535 steps. 125809408 tests. 20.219 steps/iter
n*2**29+m   : 4660698380 steps. 95710352 tests. 48.696 steps/iter
n*2**29+m * : 2008935493 steps. 95710352 tests. 20.990 steps/iter
n*2**30+m   : 4493259608 steps. 89757100 tests. 50.060 steps/iter
n*2**30+m * : 1915241920 steps. 89757100 tests. 21.338 steps/iter
n*2**31+m   : 4212124172 steps. 81419608 tests. 51.734 steps/iter
n*2**31+m * : 1780581916 steps. 81419608 tests. 21.869 steps/iter
'Advanced' mode marked with '*'
As you can see, 2**10 filter reduces # of processed points by factor
of 8.5, but number of processing steps reduced only 1.9x in simple
mode and 3.2x in advanced mode.
For 2**20 filter ratios are, respectively, 23.7, 3.2 and 6.4
For 2**31 filter, 52.7, 5.3 and 12.6
Note that for implementation in an FPGA even filtering with
modulo 2**29 (~12M entries) is impractical. Assuming advanced mode,
12M entries occupy 1000 Mbits. In a non-expensive FPGA it simply does
not fit. In a very expensive FPGA it fits, but barely.
Let's take, for example, Altera Stratix-10 GX 2800. By the standards
of the previous decade, it is a real behemoth with 2.75M Logic
Elements (LEs). Assuming that each tester unit occupies 500 LEs,
there is room for 5500 units. Looking into my tables above, you want
1 filter per ~20 tester units, so 275 filters are needed. And how
many are available? Just two!
For modulo 2**20 it's a little better. Each filter consists of
~45,000 entries and occupies ~2.7 Mbits. So you can fit 80+ filters
in GX 2800. But you want 5500 / 19 = 290, so still unbalanced.
For modulo 2**15 (2205 entries, ~100 Kbits) there is enough memory to
fit the desired number of replicas. But a small filter like that is
significantly less effective, reducing # of steps (in advanced mode)
only by 4.7x.
It is possible that there exist smarter algorithms for partitioning
work between parallel tester units that can greatly reduce the need
for access to the filtering tables relative to the simple scheme I
assumed in the above estimates. But I think the point stands - this
sort of filtering is not a silver bullet. It can improve throughput,
but it's unlikely to improve it by more than one order of magnitude.
I have coded a version that tests up to 8192 numbers simultaneously.
It is supposed to be able to run at over 200 MHz. Some of the numbers
would take a large number of iterations though.
I found something strange: the code to count for more numbers does
not increase the size of the core very much. (But it takes longer to
build.) I am not sure if this occurs because there is a mistake
somewhere, or if the tools are able to figure out how to share all
the logic. I am thinking the logic for adders is shared somehow.
128-bit arithmetic is being used.
On 2/14/2026 1:57 AM, Thomas Koenig wrote:
Some people are throwing large amounts of computing power at trying
to disprove the Collatz conjecture. Graphics cards have sped this
up enormously.
big snip
The number of iterations would be tracked; if larger than a previous
record, the calculation would terminate with a corresponding signal
to the outside world.
Also, each cycle, the result from the previous round would be
compared against the input value. If it is lower, the next number
would be chosen. Finally, the number reached would be checked against
the previous, pre-set maximum, which would also signal to the outside
world.
Hold these thoughts
snip
The problem is embarrassingly parallel.
Sort of. You could do it that way, but you would be giving up some
performance. Specifically, as you say, you want to check the number
against the previous input value and the preset maximum. But in a
parallel implementation, you want to check against the largest value
checked *across all the parallel threads*. In order to do that, the
threads need to communicate, thus violating the "embarrassingly
parallel" rule. I don't know how much performance you would give up,
so it may not make much difference, nor do I know the cost of keeping
some "global" largest value checked so far. There is a tradeoff here.
KISS.
On Sun, 15 Feb 2026 10:56:51 -0500
Robert Finch <robfi680@gmail.com> wrote:
I have coded a version that tests up to 8192 numbers simultaneously.
It is supposed to be able to run at over 200 MHz. Some of the numbers
would take a large number of iterations though.
I found something strange: the code to count for more numbers does
not increase the size of the core very much. (But it takes longer to
build.) I am not sure if this occurs because there is a mistake
somewhere, or if the tools are able to figure out how to share all
the logic. I am thinking the logic for adders is shared somehow.
128-bit arithmetic is being used.
There is a mistake in your code. No other explanation is possible.
You should look for 3 sorts of mistakes
1. Core itself.
The most likely mistake is failing to look for one of the ending
conditions. You have to look for 3 conditions:
a) - value produced at current step is smaller than initial value.
That is successful ending. Here you signal to the scheduler that you
are ready to accept the next value.
b) - overflow. Next value exceeds the width of register.
c) - timeout. You were running for more than N steps. N=8192 looks
like the reasonable value.
b) and c) are suspected failures.
The handling for b) and c) is similar - you raise alert signal and stop.
Your test unit has to provide the supervisor circuitry with the means
to fetch the initial value that caused the suspected failure and to
clear the alert condition.
After alert is cleared your unit returns to the same state as after
success - waiting for next initial value.
2. Scheduler.
In proof-of-concept design the scheduler does not have to be very
realistic. For example, it does not have to be efficiently pipelined.
But it should be able to monitor ready signals of all test units and to supply different initial data to different units.
Supplying the same data to all units simultaneously is a sort of
mistake that explains your observed behavior.
3. Supervisor
Again, in proof-of-concept design the supervisor does not have to be
very realistic. But it should be able to selectively exercise all
core features related to alert, including fetching of initial data
and clearing of alert.
After you have coded everything correctly, do not even try to compile
a test with 8000 units. That is totally unrealistic. Even if you find
a huge FPGA in which such a number of testers could fit (and I am not
sure that such an FPGA exists) the compilation will take many hours,
possibly days.
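The three ending conditions above can be sketched in software as follows:
this is a Rust model of one tester, not the actual HDL, and the names
(`Outcome`, `test_value`) are illustrative only:

```rust
// Model of one tester with the three ending conditions: a) descent below
// the starting value (success), b) overflow of the 128-bit register,
// c) a step-count timeout.  b) and c) are the suspected failures.
#[derive(Debug, PartialEq)]
enum Outcome {
    Converged(u32), // steps until the value dropped below the start
    Overflow,       // suspected failure: value exceeded 128 bits
    Timeout,        // suspected failure: ran for more than max_steps
}

fn test_value(n0: u128, max_steps: u32) -> Outcome {
    let mut n = n0;
    for step in 1..=max_steps {
        if n & 1 == 1 {
            // combined odd step (3n+1)/2, with explicit overflow detection
            match n.checked_mul(3).and_then(|m| m.checked_add(1)) {
                Some(m) => n = m / 2,
                None => return Outcome::Overflow, // b) raise alert and stop
            }
        } else {
            n /= 2;
        }
        if n < n0 {
            return Outcome::Converged(step); // a) success, request next value
        }
    }
    Outcome::Timeout // c) raise alert and stop
}
```

For example, starting at 3 the trajectory 3 -> 5 -> 8 -> 4 -> 2 drops below
the start after four combined steps.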
On 2026-02-18 11:22 a.m., Michael S wrote:
On Sun, 15 Feb 2026 10:56:51 -0500
Robert Finch <robfi680@gmail.com> wrote:
I have coded a version that tests up to 8192 numbers simultaneously.
It is supposed to be able to run at over 200 MHz. Some of the
numbers would take a large number of iterations though.
I found something strange: the code to count for more numbers does
not increase the size of the core very much. (But it takes longer
to build.) I am not sure if this occurs because there is a mistake
somewhere, or if the tools are able to figure out how to share all
the logic. I am thinking the logic for adders is shared somehow.
128-bit arithmetic is being used.
There is a mistake in your code. No other explanation is possible.
You should look for 3 sorts of mistakes
1. Core itself.
The most likely mistake is failing to look for one of the ending conditions. You have to look for 3 conditions:
a) - value produced at current step is smaller than initial value.
That is successful ending. Here you signal to the scheduler that
you are ready to accept the next value.
b) - overflow. Next value exceeds the width of register.
c) - timeout. You were running for more than N steps. N=8192 looks
like the reasonable value.
b) and c) are suspected failures.
The handling for b) and c) is similar - you raise an alert signal and
stop. Your test unit has to provide the supervisor circuitry with the
means to fetch the initial value that caused the suspected failure
and to clear the alert condition.
After alert is cleared your unit returns to the same state as after
success - waiting for next initial value.
I found the mistake. Core was working, but groups of cores were not.
Done signals were missing.
2. Scheduler.
In proof-of-concept design the scheduler does not have to be very
realistic. For example, it does not have to be efficiently
pipelined. But it should be able to monitor ready signals of all
test units and to supply different initial data to different units.
Supplying the same data to all units simultaneously is a sort of
mistake that explains your observed behavior.
There is really simple scheduling ATM. There is just a bunch of done
signals that are gated together, meaning the slowest calculation of
the group will slow everything else down. Once all the dones are true
the next set of numbers are loaded.
3. Supervisor
Again, in proof-of-concept design the supervisor does not have to be
very realistic. But it should be able to selectively exercise all
core feature related to alert, including fetching of initial data
and clearing of alert.
After you coded everything correctly, do not even try to compile
test with 8000 units. That is totally unrealistic. Even if you find
huge FPGA in which such number of testers could fit (and I am not
sure that such FPGA exists) the compilation will take many hours,
possibly days.
Yeah, 8,000 was definitely out of the picture. I have since coded a
24x24 matrix (576) and although the logic would fit, it turned out
not to be able to route. So, it is down to 20x20. 20x20 with no real
scheduling fits.
I tried displaying progress as an on-screen bitmap, but that just
appeared noisy. I have been working on an SCP so may be able to use
that to manage the scheduling with some custom logic.
Robert Finch wrote:
On 2026-02-18 11:22 a.m., Michael S wrote:
On Sun, 15 Feb 2026 10:56:51 -0500
Robert Finch <robfi680@gmail.com> wrote:
big snip
I found the mistake. Core was working, but groups of cores were
not. Done signals were missing.
2. Scheduler.
In proof-of-concept design the scheduler does not have to be very
realistic. For example, it does not have to be efficiently
pipelined. But it should be able to monitor ready signals of all
test units and to supply different initial data to different units.
Supplying the same data to all units simultaneously is a sort of
mistake that explains your observed behavior.
There is really simple scheduling ATM. There is just a bunch of done
signals that are gated together, meaning the slowest calculation of
the group will slow everything else down. Once all the dones are true
the next set of numbers are loaded.
The obvious choice is to have a work-to-do list, each work item
containing a start-end set of numbers to test.
Each worker thread would simply finish its current task, store
the/any results, and then ask for the next available task.
A single producer - multiple consumers queue like this is very simple
to get right with minimal admin: just a single LOCK XADD to update
the queue back index, grabbing the corresponding task.
I used this exact setup for my wire speed multi-threaded ntpd daemon
about two decades ago. (I did need a few more XADD instructions to
handle queue full/empty situations which are not applicable for the
current problem.)
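The grab-next-task step can be sketched with a scoped-thread Rust model,
where `fetch_add` is the portable equivalent of LOCK XADD (the function name
and the per-task "work" are stand-ins of mine, not the ntpd code):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

// Single-producer / multiple-consumer work list: all tasks are created up
// front, and each worker claims the next one with a single fetch_add,
// which x86 compiles to LOCK XADD.
fn run_workers(tasks: &[(u64, u64)], nthreads: usize) -> u64 {
    let next = AtomicUsize::new(0);  // queue back index
    let total = AtomicUsize::new(0); // stand-in for collected results
    thread::scope(|s| {
        for _ in 0..nthreads {
            s.spawn(|| loop {
                let i = next.fetch_add(1, Ordering::Relaxed); // claim a task
                if i >= tasks.len() {
                    break; // queue drained
                }
                let (start, end) = tasks[i];
                // "test the range [start, end)" - here we only count numbers
                total.fetch_add((end - start) as usize, Ordering::Relaxed);
            });
        }
    });
    total.load(Ordering::Relaxed) as u64
}
```

Because the back index is the only shared mutable state, no further locking
is needed once the task list is built.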
Terje
You should look for 3 sorts of mistakes
1. Core itself.
The most likely mistake is failing to look for one of the ending
conditions. You have to look for 3 conditions:
a) - value produced at current step is smaller than initial value.
That is successful ending. Here you signal to the scheduler that you
are ready to accept the next value.
b) - overflow. Next value exceeds the width of register.
c) - timeout. You were running for more than N steps. N=8192 looks
like the reasonable value.
I must say I find this whole thread baffling: when searching for
a counterexample and we're already in the order of 2^70, it seems
clear that constant-factor improvements are completely irrelevant
(even with large constants).
I guess it all boils down to the "useless" part of the title?
=== Stefan
Oops! :-)
On Thu, 19 Feb 2026 11:51:03 +0100
Terje Mathisen <terje.mathisen@tmsw.no> wrote:
Robert Finch wrote:
big snip
The obvious choice is to have a work-to-do list, each work item
containing a start-end set of numbers to test.
Each worker thread would simply finish its current task, store
the/any results, and then ask for the next available task.
A single producer - multiple consumers queue like this is very simple
to get right with minimal admin: just a single LOCK XADD to update
the queue back index, grabbing the corresponding task.
There is no XADD in an FPGA, just gates, multipliers and memory blocks.
Scaling does not depend on the number of threads, only on the frequency.
I used this exact setup for my wire speed multi-threaded ntpd daemon
about two decades ago. (I did need a few more XADD instructions to
handle queue full/empty situations which are not applicable for the
current problem.)
Did your ntpd daemon scale to 300-400 threads?
300 is the # of test units that fits in FPGAs that appears to be the
most cost-effective for this sort of job, e.g. 10CX220YU484E5G.
On Wed, 18 Feb 2026 22:56:19 -0500
Robert Finch <robfi680@gmail.com> wrote:
On 2026-02-18 11:22 a.m., Michael S wrote:
big snip
3. Supervisor
Again, in proof-of-concept design the supervisor does not have to be
very realistic. But it should be able to selectively exercise all
core features related to alert, including fetching of initial data
and clearing of alert.
After you have coded everything correctly, do not even try to compile
a test with 8000 units. That is totally unrealistic. Even if you find
a huge FPGA in which such a number of testers could fit (and I am not
sure that such an FPGA exists) the compilation will take many hours,
possibly days.
Yeah, 8,000 was definitely out of the picture. I have since coded a
24x24 matrix (576) and although the logic would fit, it turned out
not to be able to route. So, it is down to 20x20. 20x20 with no real
scheduling fits.
I tried displaying progress as an on-screen bitmap, but that just
appeared noisy. I have been working on an SCP so may be able to use
that to manage the scheduling with some custom logic.
Does 20x20 mean 400 cores with something like 80-bit initial values
and 128-bit processing?
Yes. 128-bit processing.
If you didn't make further mistakes then it should be rather big, order
of 300K LEs. It would not fit in the biggest low-end Altera device
(Cyclone 10GX 10CX220).
On the Xilinx side, it would not fit in any Spartan and in most Artix
devices. Maybe it would fit in the biggest Artix UltraScale+ (AU25P).
But more realistically for 400 test units one would want Arria 10 on
the Altera side or Kintex on the Xilinx side. Both are not
prohibitively expensive, but not cheap.
What device are you using?
If it is smaller than Kintex KU5P then I'd strongly suspect that you
didn't clean out all mistakes.
I am not using a too inexpensive device. I have a Kintex-325 with about
On 2026-02-19 5:10 a.m., Michael S wrote:
On Wed, 18 Feb 2026 22:56:19 -0500
Robert Finch <robfi680@gmail.com> wrote:
On 2026-02-18 11:22 a.m., Michael S wrote:
On Sun, 15 Feb 2026 10:56:51 -0500
Robert Finch <robfi680@gmail.com> wrote:
I have coded a version that tests up to 8192 number
simultaneously. It is supposed to be able to run at over 200
MHz. Some of the number would take a large number of iterations
though.
I found something strange, the code to count for more numbers
does not increase the size of the core very much. (But it takes
longer to build.) I am not sure if this occurs because there is
a mitsake somewhere, or if the tools are able to figure out how
to share all the logic. I am thinking the logic for adders is
shared somehow. 128-bit arithmetic is being used.
There is a mistake in your code. No other explanation is possible.
You should look for 3 sorts of mistakes
1. Core itself.
The most likely mistake is failing to look for one of the ending
conditions. You have to look for 3 conditions:
a) - value produced at current step is smaller than initial value.
That is successful ending. Here you signal to the scheduler
that you are ready to accept the next value.
b) - overflow. Next value exceeds the width of register.
c) - timeout. You were running for more than N steps. N=8192 looks
like the reasonable value.
b) and c) are suspected failures.
The handling for b) and c) is similar - you raise an alert signal and
stop. Your test unit has to provide the supervisor circuitry with the
means to fetch the initial value that caused the suspected failure and
to clear the alert condition.
After alert is cleared your unit returns to the same state as
after success - waiting for next initial value.
I found the mistake. Core was working, but groups of cores were
not. Done signals were missing.
2. Scheduler.
In proof-of-concept design the scheduler does not have to be very
realistic. For example, it does not have to be efficiently
pipelined. But it should be able to monitor ready signals of all
test units and to supply different initial data to different
units. Supplying the same data to all units simultaneously is a
sort of mistake that explains your observed behavior.
There is really simple scheduling ATM. There is just a bunch of
done signals that are gated together, meaning the slowest
calculation of the group will slow everything else down. Once all
the dones are true the next set of numbers is loaded.
3. Supervisor.
Again, in proof-of-concept design the supervisor does not have to
be very realistic. But it should be able to selectively exercise
all core features related to alert, including fetching of initial
data and clearing of alert.
After you have coded everything correctly, do not even try to
compile a test with 8000 units. That is totally unrealistic. Even
if you find a huge FPGA in which such a number of testers could fit
(and I am not sure that such an FPGA exists) the compilation will
take many hours, possibly days.
Yeah, 8,000 was definitely out of the picture. I had since coded a
24x24 matrix (576) and although the logic would fit, it turned out
not to be routable. So, it is down to 20x20. 20x20 with no real
scheduling fits.
I tried displaying progress as an on-screen bitmap, but that just
appeared noisy. I have been working on an SCP so may be able to use
that to manage the scheduling with some custom logic.
Does 20x20 mean 400 cores with something like 80-bit initial values
and 128-bit processing?
Yes. 128-bit processing.
If you didn't make further mistakes then it should be rather big,
order of 300K LEs. It would not fit in the biggest low-end Altera
device (Cyclone 10GX 10CX220).
IIRC it is about 150k LUTs. It uses the simplest approach, which is
not very fast iteration-wise, but it allows a lot of cores and a fast
clock cycle time. One core is very small and has a fast iteration (1
clock cycle). But it takes a lot more iterations than can be done
with a more brainiac approach.
Using a more brainiac approach would likely cut the performance in
half and use a lot more LUTs.
On the Xilinx side, it would not fit in any Spartan and in most
Artix devices. Maybe it would fit in the biggest Artix
UltraScale+ (AU25P). But more realistically for 400 test units one
would want Arria 10 on Altera side or Kintex on Xilinx side. Both
are not prohibitively expensive, but not cheap.
What device are you using?
I am not using too inexpensive a device. I have a Kintex-325 with
about 200k LUTs.
If it is smaller than Kintex KU5P then I'd strongly suspect that you
didn't clean out all mistakes.
There could be more mistakes for sure. But I am sure the lowest level
core works. It seemed to in simulation. I made a slight mod to it
since I tested it last, checking n < startn.
Counts are not used so the logic would be stripped out, but it's good
for verification.
1 Core is:
module Collatz_conjecture(rst, clk, startn, count, done);
input rst;
input clk;
input [127:0] startn;
output reg [127:0] count;   // iteration count, for verification only
output reg done;

reg [127:0] n;

always_ff @(posedge clk)
if (rst) begin
    n <= startn;
    count <= 0;
    done <= 1'b0;   // literal used here; the original relied on FALSE/TRUE macros
end
else begin
    if (!done) begin
        if (~n[0])
            n <= n >> 1;            // even: halve
        else
            n <= n + n + n + 1;     // odd: 3*n+1 without a multiplier
        if (n < startn)
            done <= 1'b1;   // value fell below the start: sequence known to converge
        count <= count + 1;
    end
end
endmodule
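For cross-checking the core in simulation, a small software model of the same update rule may help. This is an illustrative sketch, not part of the posted design (the name collatz_steps is made up, and uint64_t stands in for the module's 128-bit registers); it counts single Collatz steps, one per "cycle", until the value drops below its starting point, mirroring the n < startn check:

```c
#include <assert.h>
#include <stdint.h>

/* Software model of one test core: apply single Collatz steps
   (n/2 or 3n+1) until the value drops below the starting value,
   and return how many steps that took. Assumes startn >= 2;
   startn == 1 would loop forever (1 -> 4 -> 2 -> 1 ...). */
static unsigned collatz_steps(uint64_t startn)
{
    uint64_t n = startn;
    unsigned count = 0;
    do {
        n = (n & 1) ? 3 * n + 1 : n >> 1;
        count++;
    } while (n >= startn);
    return count;
}
```

For example, starting from 7 the trajectory runs 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, so collatz_steps(7) returns 11. Note the hardware registers the n < startn comparison, so its done flag lags this model's exit by one cycle.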
On 14/02/2026 19:48, MitchAlsup wrote:
Thomas Koenig <tkoenig@netcologne.de> posted:
Some people are throwing large amounts of computing power at trying
to disprove the Collatz conjecture. Graphics cards have sped this
up enormously.
(A reminder, if any is needed: From a starting number n, the
transformation, recursively applied,
    f(n) = 3*n+1, if n is odd; = n/2, if n is even,
is conjectured to always reach 1 for any positive integer n.)
While n/2 remains even, n/2 is guaranteed to converge to 1.
So, only the odd cases are interesting.
3*n+1 is always even (since n was odd, and odd*odd is odd and
odd+1 is even). So, a double iteration is (3*n+1)/2.
Yes, that is often a micro-optimisation used.
All the work is done on general-purpose hardware, which has many
capabilities that are not needed, and consumes area and power that
special-purpose hardware would not need. Also, the hardware
is limited to 64-bit integers, and the range of tested numbers
is now up to 2^71, so
Induction would say it has been proven by then.
Would you like to re-think that? "Induction" says nothing of the sort.
The conjecture has been proven true for all numbers up to 2^71
(assuming that is the current figure) - it has most definitely /not/
been proven to be true for all numbers. There are other things proven
about it, such as the asymptotic rarity of numbers that are exceptions
to the pattern, but no one has any notion of how to prove it in general.
Being true up to 2^71 makes it reasonable to believe it is always
true, but that is very different from being /proven/. There are plenty
of mathematical conjectures that are true up to very large numbers, but
fail later on.
Comments?
Interesting problem,
waste of power.
It is a mathematically very interesting problem. Whether there is
anything to be gained by continuing the search for exceptions by brute
force is a different matter - but /if/ it is worth doing, then doing so
more efficiently is a good idea.
On 14/02/2026 22:22, MitchAlsup wrote:
David Brown <david.brown@hesbynett.no> posted:
On 14/02/2026 19:48, MitchAlsup wrote:
Thomas Koenig <tkoenig@netcologne.de> posted:
Some people are throwing large amounts of computing power at trying
to disprove the Collatz conjecture. Graphics cards have sped this
up enormously.
(A reminder, if any is needed: From a starting number n, the
transformation, recursively applied,
    f(n) = 3*n+1, if n is odd; = n/2, if n is even,
is conjectured to always reach 1 for any positive integer n.)
While n/2 remains even, n/2 is guaranteed to converge to 1.
So, only the odd cases are interesting.
3*n+1 is always even (since n was odd, and odd*odd is odd and
odd+1 is even). So, a double iteration is (3*n+1)/2.
Yes, that is often a micro-optimisation used.
All the work is done on general-purpose hardware, which has many
capabilities that are not needed, and consumes area and power that
special-purpose hardware would not need. Also, the hardware
is limited to 64-bit integers, and the range of tested numbers
is now up to 2^71, so
Induction would say it has been proven by then.
Would you like to re-think that? "Induction" says nothing of the sort.
The conjecture has been proven true for all numbers up to 2^71
(assuming that is the current figure) - it has most definitely /not/
been proven to be true for all numbers. There are other things proven
about it, such as the asymptotic rarity of numbers that are exceptions
to the pattern, but no one has any notion of how to prove it in general.
By proving up to 2^71, you simultaneously prove that it cannot be proven
by computer-like arithmetic processes. What if the number was 2^71,000,000?
You still have not proven it. Thereby, it is not provable by simple
computer arithmetic processes.
All that has been proven is that if the conjecture is not true, then the
smallest exception to the pattern is greater than 2^71. Maybe it is
2^71 + 1. Current computer technology could take the search a lot
further than just 2^71 - if someone wanted to pay for it! Future
computer technology could perhaps greatly increase the bounds. Maybe
Thomas's "useless machine" could lead to far greater bounds without
excessive costs - and maybe research into that could lead to useful new
technology or techniques with applications beyond an interesting maths
problem.
There are lots of examples of conjectures where the first exception is a
high number, and all testing of low numbers fitted the conjectures.
Sometimes such counter-examples are far beyond the reach of any
brute-force computational testing, but sometimes not.
We have not proven that the Collatz Conjecture can be disproven by
brute-force testing - we have merely shown that doing so would be
difficult, and seems unlikely to be successful. And we have known all
along that computational methods can never prove a conjecture like this
to be true - that much should be obvious.
Testing conjectures like this can be helpful to mathematicians.
Sometimes they can show patterns that can be used to get more insights -
maybe there is a pattern in the numbers that lead to peaks of growth of
the Collatz function before dropping towards 1. Such patterns can
inspire proofs - or more targeted searches for exceptions.
Ergo, you need a different means.
You don't imagine that the only thing mathematicians have done with the Collatz Conjecture is test it to large numbers, do you?
Being true up to 2^71 makes it reasonable to believe it is always
true, but that is very different from being /proven/. There are plenty
of mathematical conjectures that are true up to very large numbers, but
fail later on.
You cannot even /prove/ it using BigNum computer arithmetic in finite
time.
Surely everyone who has done high-school mathematics already realises
that a conjecture like this cannot be proven by trial and error alone?
Brute-force testing is about looking for exceptions to /disprove/ the
conjecture, or gaining insights that can lead to its proof or disproof.
On 2/15/2026 5:50 AM, David Brown wrote:
On 14/02/2026 22:22, MitchAlsup wrote:
David Brown <david.brown@hesbynett.no> posted:
On 14/02/2026 19:48, MitchAlsup wrote:
Thomas Koenig <tkoenig@netcologne.de> posted:
Some people are throwing large amounts of computing power at trying >>>>>> to disprove the Collatz conjecture.|e-a Graphics cards have sped this >>>>>> up enormously.
(A reminder, if any is needed:|e-a From a starting number n, the
transformation, recursively applied,
|e-a|e-a|e-a|e-a f(n) = 3*n+1, if n is odd; =n/2 if n is even.
is conjectured to always reach 1 for any positive integer 1).
While n/2 remains even, n/2 is guaranteed to converge to 1.
So, only the odd cases are interesting.
3*n+1 is always even (since n was odd, and odd*odd is odd and
odd+1 is even). So, a double iteration is (3*n+1)/2.
Yes, that is often a micro-optimisation used.
All the work is done on general-purpose hardware, which has many
capabilities that are not needed, and consume area and power that
special-purpose hardware would not need.|e-a Also, the hardware
is limited to 64-bit integers, and the range of tested numbers
is now up to 2^71, so
Induction would say it has been proven by then.
Would you like to re-think that?|e-a "Induction" says nothing of the
sort.
The conjecture has been proven true for all numbers up to 2 ^ 71
(assuming that is the current figure) - it has most definitely /not/>>>> been proven to be true for all numbers.|e-a There are other things proven
about it, such as the asymptotic rarity of numbers that are exceptions >>>> to the pattern, but no one has any notion of how to prove it in
general.
By proving up to 2^71, you simultaneously prove that it cannot be proven >>> by computer-like arithmetic processes. What if the number was
2^71,000,000
you still have not proven it? Thereby, it is not provable by simple
computer
arithmetic processes.
All that has been proven is that if the conjecture is not true, then
the smallest exception to the pattern is greater than 2 ^ 71.|e-a Maybe
it is 2 ^ 71 + 1.|e-a Current computer technology could take the search
a lot further than just 2 ^ 71 - if someone wanted to pay for it!|e
Future computer technology could perhaps greatly increase the
bounds.|e-a Maybe Thomas's "useless machine" could lead to far greater
bounds without excessive costs - and maybe research into that could
lead to useful new technology or techniques with applications beyond
an interesting maths problem.
There are lots of examples of conjectures where the first exception is
a high number, and all testing of low numbers fitted the conjectures. >> Sometimes such counter-examples are far beyond the reach of any brute-
force computational testing, but sometimes not.
We have not proven that the Collatz Conjecture can be disproven by
brute-force testing - we have merely shown that doing so would be
difficult, and seems unlikely to be successful.|e-a And we have known
all along that computational methods can never prove a conjecture like
this to be true - that much should be obvious.
Testing conjectures like this can be helpful to mathematicians.
Sometimes they can show patterns that can be used to get more insights
- maybe there is a pattern in the numbers that lead to peaks of growth
of the Collatz function before dropping towards 1.|e-a Such patterns can
inspire proofs - or more targeted searches for exceptions.
Ergo, you need a different means.
You don't imagine that the only thing mathematicians have done with
the Collatz Conjecture is test it to large numbers, do you?
In theory, large numbers could disprove it, if an exception were found.> However, it fails to prove it either; and if no value is found (or would
be found) it doesn't prove anything.
You do have to detect an infinite loop, but that's easy to do:Being true up to 2 ^ 71 makes it reasonable to believe it is always
true, but that is very different from being /proven/.|e-a There are
plenty
of mathematical conjectures that are true up to very large numbers, but >>>> fail later on.
You need cannot even /prove/ it using BigNum computer arithmetic in
finite
time.
Surely everyone who has done high-school mathematics already realises >> that a conjecture like this cannot be proven by trail and error alone?
Brute-force testing is about looking for exceptions to /disprove/ the >> conjecture, or gaining insights that can lead to its proof or disproof.
Yeah, pretty much...
My own thinking (mostly since looking at this thread, earlier today) has
led me to realize first that the probability of any number containing a divergent path is less than the reciprocal of the value (related to
prior post in this thread).
Then, further realized that, since one doesn't just need *one* number
with the probability of a divergent path, one needs an infinite series > (since any non-divergent path immediately collapses back into a
convergent path).
BGB wrote:
You do have to detect an infinite loop, but that's easy to do:
Even if you don't employ the classic one-step/two-step approach, simply
stopping after 8k iterations and then letting the supervisor check it
properly would be fine.
Terje
Robert Finch wrote:
On 2026-02-19 5:10 a.m., Michael S wrote:
If it is smaller than Kintex KU5P then I'd strongly suspect that
you didn't clean out all mistakes.
There could be more mistakes for sure. But I am sure the lowest
level core works. It seemed to in simulation. I made a slight mod
to it since I tested it last, checking n < startn.
Counts are not used so the logic would be stripped out, but it's
good for verification.
1 Core is:
module Collatz_conjecture(rst, clk, startn, count, done);
input rst;
input clk;
input [127:0] startn;
output reg [127:0] count;
output reg done;
reg [127:0] n;
always_ff @(posedge clk)
if (rst) begin
n <= startn;
count <= 0;
done <= FALSE;
end
else begin
if (!done) begin
if (~n[0])
n <= n >> 2'd1;
else
n <= n+n+n+1;
if (n < startn)
done <= TRUE;
count <= count + 1;
end
end
endmodule
I have minimal knowledge of Verilog and may be wrong but...
Using a more brainiac approach would likely cut the performance in
half and use a lot more LUTs.
According to my experiments, combining 2 steps has a very small
negative effect on achievable clock and increases area by ~1.8x. So,
to me it looks like a win.
That's the combined step I am talking about:
x = n/4;
switch (x % 4) {
case 0: n = x; break;
case 1: n = 3*x+1; break;
case 2: n = 2*x+2; break;
case 3: n = 9*x+8; break;
}
On 2/21/2026 2:40 PM, Terje Mathisen wrote:
You do have to detect an infinite loop, but that's easy to do:
Even if you don't employ the classic one-step/two-step approach,
simply stopping after 8k iterations and then letting the supervisor
check it properly would be fine.
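The "classic one-step/two-step approach" mentioned here is Floyd's cycle
detection. A minimal C sketch of it (my own illustration, using 64-bit
values that a real tester would have to widen considerably):

```c
#include <stdint.h>

/* One full Collatz step; 64-bit for illustration only, a real tester
 * needs wider arithmetic. */
static uint64_t step(uint64_t n) {
    return (n & 1) ? 3 * n + 1 : n / 2;
}

/* Floyd's one-step/two-step cycle detection: advance a slow pointer by
 * one step and a fast pointer by two steps per round.  If the
 * trajectory enters a cycle, the two must eventually meet inside it. */
static int reaches_one(uint64_t n) {
    uint64_t slow = n, fast = n;
    do {
        slow = step(slow);
        fast = step(step(fast));
        if (slow == 1 || fast == 1)
            return 1;   /* reached the known 1-4-2 cycle */
    } while (slow != fast);
    return 0;           /* met in a cycle that does not contain 1 */
}
```

Since all values before a cycle is entered are distinct, the two pointers
can only meet inside a cycle, so no iteration bound is needed at all.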
Yeah, assuming it is being tested.
Another option could be to detect an overflow from whatever size of
integer, assuming the starting point is not within a region where
near-immediate overflow is likely.
Seemingly the definitions on Wikipedia don't appear to exclude n==0,
which could lead to an infinite loop: 0 is even, and 0/2==0. We can
probably assume 0 is excluded, since including it would make the
conjecture pointlessly trivial (and by the usual convention "positive
integer" already excludes 0, unlike "non-negative integer", which
includes it).
But, yeah, it would be false if the input set included 0.
Terje
On Sat, 21 Feb 2026 20:36:51 +0200
Michael S <already5chosen@yahoo.com> wrote:
Using a more brainiac approach would likely cut the performance in
half and use a lot more LUTs.
According to my experiments, combining 2 steps has a very small
negative effect on achievable clock and increases area by ~1.8x. So,
to me it looks like a win.
That's the combined step I am talking about:
x = n/4;
switch (x % 4) {
case 0: n = x; break;
case 1: n = 3*x+1; break;
case 2: n = 2*x+2; break;
case 3: n = 9*x+8; break;
}
Mistake in C above. Should be
switch (n % 4) {
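With that correction applied, and noting that the n%4==2 case of two
combined compressed steps works out to 3*x+2 (the 2*x+2 in the snippet
looks like a further typo: e.g. n=6 gives 6 -> 3 -> 5 = 3*1+2), a
self-contained C sketch of the combined step, 64-bit for illustration:

```c
#include <stdint.h>

/* One "compressed" Collatz step T: n/2 for even n, (3n+1)/2 for odd n.
 * 64-bit for illustration; the real search needs wider arithmetic. */
static uint64_t T(uint64_t n) {
    return (n & 1) ? (3 * n + 1) / 2 : n / 2;
}

/* Two compressed steps at once, selected by the low two bits of n.
 * Derivation with n = 4x + r:
 *   r=0: 4x   -> 2x   -> x
 *   r=1: 4x+1 -> 6x+2 -> 3x+1
 *   r=2: 4x+2 -> 2x+1 -> 3x+2   (not 2x+2; see note above)
 *   r=3: 4x+3 -> 6x+5 -> 9x+8 */
static uint64_t double_step(uint64_t n) {
    uint64_t x = n / 4;
    switch (n % 4) {
    case 0:  return x;
    case 1:  return 3 * x + 1;
    case 2:  return 3 * x + 2;
    default: return 9 * x + 8;
    }
}
```

For any n, double_step(n) == T(T(n)); e.g. n = 7 gives 9*1+8 = 17,
matching 7 -> 11 -> 17.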
On 2/15/2026 5:50 AM, David Brown wrote:
On 14/02/2026 22:22, MitchAlsup wrote:
David Brown <david.brown@hesbynett.no> posted:
On 14/02/2026 19:48, MitchAlsup wrote:
Thomas Koenig <tkoenig@netcologne.de> posted:
Some people are throwing large amounts of computing power at trying
to disprove the Collatz conjecture. Graphics cards have sped this
up enormously.
(A reminder, if any is needed: From a starting number n, the
transformation, recursively applied,
    f(n) = 3*n+1, if n is odd; = n/2 if n is even.
is conjectured to always reach 1 for any positive integer n).
While n/2 remains even, n/2 is guaranteed to converge to 1.
So, only the odd cases are interesting.
3*n+1 is always even (since n was odd, and odd*odd is odd and
odd+1 is even). So, a double iteration is (3*n+1)/2.
Yes, that is often a micro-optimisation used.
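Taking the micro-optimisation one step further, all trailing zeros can
be stripped in one go, as the thread's hardware discussion also does. A
sketch in C relying on the GCC/Clang builtin __builtin_ctzll (my own
illustration, 64-bit only):

```c
#include <stdint.h>

/* For odd n, compute 3n+1 and strip *all* trailing zero bits at once,
 * jumping straight to the next odd value of the trajectory.  Relies on
 * the GCC/Clang builtin __builtin_ctzll (count trailing zeros); 64-bit
 * for illustration only. */
static uint64_t next_odd(uint64_t n) {
    uint64_t m = 3 * n + 1;          /* always even when n is odd */
    return m >> __builtin_ctzll(m);  /* remove every factor of 2 */
}
```

E.g. next_odd(7) is 22 >> 1 = 11, and next_odd(5) is 16 >> 4 = 1.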
All the work is done on general-purpose hardware, which has many
capabilities that are not needed, and consume area and power that
special-purpose hardware would not need. Also, the hardware
is limited to 64-bit integers, and the range of tested numbers
is now up to 2^71, so
Induction would say it has been proven by then.
Would you like to re-think that? "Induction" says nothing of the sort.
The conjecture has been proven true for all numbers up to 2 ^ 71
(assuming that is the current figure) - it has most definitely /not/
been proven to be true for all numbers. There are other things proven
about it, such as the asymptotic rarity of numbers that are exceptions
to the pattern, but no one has any notion of how to prove it in
general.
By proving up to 2^71, you simultaneously prove that it cannot be
proven by computer-like arithmetic processes. What if the number was
2^71,000,000? You still have not proven it. Thereby, it is not provable
by simple computer arithmetic processes.
All that has been proven is that if the conjecture is not true, then
the smallest exception to the pattern is greater than 2 ^ 71. Maybe
it is 2 ^ 71 + 1. Current computer technology could take the search a
lot further than just 2 ^ 71 - if someone wanted to pay for it!
Future computer technology could perhaps greatly increase the bounds.
Maybe Thomas's "useless machine" could lead to far greater bounds
without excessive costs - and maybe research into that could lead to
useful new technology or techniques with applications beyond an
interesting maths problem.
There are lots of examples of conjectures where the first exception is
a high number, and all testing of low numbers fitted the conjectures.
Sometimes such counter-examples are far beyond the reach of any
brute-force computational testing, but sometimes not.
We have not proven that the Collatz Conjecture can be disproven by
brute-force testing - we have merely shown that doing so would be
difficult, and seems unlikely to be successful. And we have known all
along that computational methods can never prove a conjecture like
this to be true - that much should be obvious.
Testing conjectures like this can be helpful to mathematicians.
Sometimes they can show patterns that can be used to get more insights
- maybe there is a pattern in the numbers that lead to peaks of growth
of the Collatz function before dropping towards 1. Such patterns can
inspire proofs - or more targeted searches for exceptions.
Ergo, you need a different means.
You don't imagine that the only thing mathematicians have done with
the Collatz Conjecture is test it to large numbers, do you?
In theory, large numbers could disprove it, if an exception were found.
However, it cannot prove it either; if no counter-example is found, it
doesn't prove anything.
Being true up to 2 ^ 71 makes it reasonable to believe it is always
true, but that is very different from being /proven/. There are plenty
of mathematical conjectures that are true up to very large numbers, but
fail later on.
You cannot even /prove/ it using BigNum computer arithmetic in finite
time.
Surely everyone who has done high-school mathematics already realises
that a conjecture like this cannot be proven by trial and error alone?
Brute-force testing is about looking for exceptions to /disprove/ the
conjecture, or gaining insights that can lead to its proof or disproof.
Yeah, pretty much...
My own thinking (mostly since looking at this thread, earlier today)
has led me to realize first that the probability of any number
containing a divergent path is less than the reciprocal of the value
(related to a prior post in this thread).
Then, further realized that, since one doesn't just need *one* number
with the probability of a divergent path, one needs an infinite series
(since any non-divergent path immediately collapses back into a
convergent path).
Effectively, one has to multiply an infinite series of near-0
probabilities. Or, effectively, for all values of n, the effective
probability of a non-zero probability converges to 1/Inf, or, 0...
Probably doesn't count as a sufficient proof by standards of
mathematical rigor, but at least for me, I can say that it seems true
enough that it always converges (and, will always continue to converge
no matter how big the numbers get).
While in some hand-wavy sense one could try to push the probability to
infinity to defeat the infinite reciprocal, infinity does not exist
within the set of positive integers...
On 2/14/2026 1:18 PM, David Brown wrote:
On 14/02/2026 19:48, MitchAlsup wrote:
FWIW:
Thomas Koenig <tkoenig@netcologne.de> posted:
Some people are throwing large amounts of computing power at trying
to disprove the Collatz conjecture. Graphics cards have sped this
up enormously.
(A reminder, if any is needed: From a starting number n, the
transformation, recursively applied,
    f(n) = 3*n+1, if n is odd; = n/2 if n is even.
is conjectured to always reach 1 for any positive integer n).
While n/2 remains even, n/2 is guaranteed to converge to 1.
So, only the odd cases are interesting.
3*n+1 is always even (since n was odd, and odd*odd is odd and
odd+1 is even). So, a double iteration is (3*n+1)/2.
Yes, that is often a micro-optimisation used.
All the work is done on general-purpose hardware, which has many
capabilities that are not needed, and consume area and power that
special-purpose hardware would not need. Also, the hardware
is limited to 64-bit integers, and the range of tested numbers
is now up to 2^71, so
Induction would say it has been proven by then.
Would you like to re-think that? "Induction" says nothing of the sort.
The conjecture has been proven true for all numbers up to 2 ^ 71
(assuming that is the current figure) - it has most definitely /not/
been proven to be true for all numbers. There are other things proven
about it, such as the asymptotic rarity of numbers that are exceptions
to the pattern, but no one has any notion of how to prove it in general.
Being true up to 2 ^ 71 makes it reasonable to believe it is always
true, but that is very different from being /proven/. There are
plenty of mathematical conjectures that are true up to very large
numbers, but fail later on.
Comments?
Interesting problem,
waste of power.
It is a mathematically very interesting problem. Whether there is
anything to be gained by continuing the search for exceptions by brute
force, is a different matter - but /if/ it is worth doing, then doing
so more efficiently is a good idea.
Is there any number likely to disprove it? No.
Would running it out to some arbitrary precision prove it? No.
Granted, finding a value where it failed would disprove it, but this is unlikely to happen.
But, can note:
  For all positive even numbers, n/2 is less than n;
  For all positive odd numbers, n*3+1 is even;
    So, it becomes (n*3+1)/2
  For all positive even numbers:
    Probability of n/2 being an integer is 100%.
    Probability of n/2 being even is 50%.
  For (n*3+1)/2:
    Probability of this being even is 50%.
      So, 50% value gets bigger, 50% it gets smaller.
    Probability of a double odd: 25%
    Probability of a triple odd: 12.5%
    ...
OK, so it would seem like the probability for N of finding a path that
diverges becomes 1/2^k, where k=log2(n).
Where the paths that get smaller can't include a path that gets bigger
unless the current path itself can get bigger, where the probability of
this happening quickly approaches 0.
If there were any place likely to find a divergence, it would be among
small integers, and this is empirically known to be false.
Granted, saying that the probability approaches 0 is not the same as
saying that the probability is 0.
The probability of it being 0 seems directly proportional to the
reciprocal:
  If p>(1/n): A divergence would be expected.
  If p<(1/n): A divergence would be impossible.
  But, p=1/n, as (1/n)=(1/(2**log2(n)))
Though, since n*3+1 is always even, it is p=1/(2**ceil(log2(n))).
1/(2**ceil(log2(n))) < (1/n) for all non-even integers (would only be
equal for powers of 2, but powers of 2 always converge).
So, it would seem like probability of divergence converges on 0.
Dunno, though, this is just how it seems to me at the moment.
Michael S wrote:
On Sat, 21 Feb 2026 20:36:51 +0200
Michael S <already5chosen@yahoo.com> wrote:
Using a more brainiac approach would likely cut the performance in
half and use a lot more LUTs.
According to my experiments, combining 2 steps has a very small
negative effect on achievable clock and increases area by ~1.8x. So,
to me it looks like a win.
That's the combined step I am talking about:
x = n/4;
switch (x % 4) {
case 0: n = x; break;
case 1: n = 3*x+1; break;
case 2: n = 2*x+2; break;
case 3: n = 9*x+8; break;
}
Mistake in C above. Should be
switch (n % 4) {
I did notice that. :-)
The one thing that worries me (sw on a 64-bit platform) about the
code is the 9* on 128-bit variables:
9*x+8 =>
Do we use SHLD + SHL here or something else?
How about MUL & LEA?
;; Input in r10:r9, output in rdx:rax
mov rax,r9
mul rbx              ;; RBX == 9; one-operand form, RDX:RAX = RAX*RBX
lea r10,[r10+r10*8]  ;; r10*9
add rdx,r10
That looks like 5-6 clock cycles, so the branch misses from the
switch statement would probably dominate unless you do as I suggested
and use lookup tables instead:
let bot2 = n & 3;
let x = n >> 2;
n = x*multab[bot2] + addtab[bot2];
but if we do that, then (at least for a sw implementation) it would
be better to pick a lot more of the LS bits, at least 8-12?
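The table idea generalizes: K compressed steps applied to
n = (x << K) + b always give x*3^e + c for an exponent e and offset c
depending only on b, so both tables can be precomputed. A sketch of
this (my own, with K=8 as suggested; 64-bit arithmetic for
illustration, a real search needs wider values plus overflow checks):

```c
#include <stdint.h>

#define K 8                     /* low bits consumed per table step */
#define TSIZE (1u << K)

static uint64_t multab[TSIZE], addtab[TSIZE];

/* Build tables so that K compressed Collatz steps applied to
 * n = (x << K) + b equal x*multab[b] + addtab[b].  We track the affine
 * form mul*x + add symbolically: for the first K steps the coefficient
 * mul stays even, so the parity of the value is decided by add alone. */
static void build_tables(void) {
    for (uint64_t b = 0; b < TSIZE; b++) {
        uint64_t mul = TSIZE, add = b;
        for (int i = 0; i < K; i++) {
            if (add & 1) { mul = 3 * mul / 2; add = (3 * add + 1) / 2; }
            else         { mul /= 2;          add /= 2; }
        }
        multab[b] = mul;        /* always a power of 3 */
        addtab[b] = add;        /* K compressed steps applied to b */
    }
}

/* One table-driven step = K compressed Collatz steps at once. */
static uint64_t table_step(uint64_t n) {
    uint64_t b = n & (TSIZE - 1);
    return (n >> K) * multab[b] + addtab[b];
}
```

With K=8 each table has 256 entries, and every table step replaces
eight adder/shifter iterations with one multiply and one add.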
Terje
It is a mathematically very interesting problem. Whether there is
anything to be gained by continuing the search for exceptions by brute
force, is a different matter - but /if/ it is worth doing, then doing
so more efficiently is a good idea.
On 21/02/2026 20:48, BGB wrote:
On 2/15/2026 5:50 AM, David Brown wrote:
On 14/02/2026 22:22, MitchAlsup wrote:
David Brown <david.brown@hesbynett.no> posted:
On 14/02/2026 19:48, MitchAlsup wrote:
Thomas Koenig <tkoenig@netcologne.de> posted:
Some people are throwing large amounts of computing power at trying
to disprove the Collatz conjecture. Graphics cards have sped this
up enormously.
(A reminder, if any is needed: From a starting number n, the
transformation, recursively applied,
    f(n) = 3*n+1, if n is odd; = n/2 if n is even.
is conjectured to always reach 1 for any positive integer n).
While n/2 remains even, n/2 is guaranteed to converge to 1.
So, only the odd cases are interesting.
3*n+1 is always even (since n was odd, and odd*odd is odd and
odd+1 is even). So, a double iteration is (3*n+1)/2.
Yes, that is often a micro-optimisation used.
All the work is done on general-purpose hardware, which has many
capabilities that are not needed, and consume area and power that
special-purpose hardware would not need. Also, the hardware
is limited to 64-bit integers, and the range of tested numbers
is now up to 2^71, so
Induction would say it has been proven by then.
Would you like to re-think that? "Induction" says nothing of the
sort.
The conjecture has been proven true for all numbers up to 2 ^ 71
(assuming that is the current figure) - it has most definitely /not/ >>>>> been proven to be true for all numbers.-a There are other things proven >>>>> about it, such as the asymptotic rarity of numbers that are exceptions >>>>> to the pattern, but no one has any notion of how to prove it in
general.
By proving up to 2^71, you simultaneously prove that it cannot be
proven by computer-like arithmetic processes. What if the number was
2^71,000,000? You still have not proven it. Thereby, it is not provable
by simple computer arithmetic processes.
All that has been proven is that if the conjecture is not true, then
the smallest exception to the pattern is greater than 2 ^ 71. Maybe
it is 2 ^ 71 + 1. Current computer technology could take the search
a lot further than just 2 ^ 71 - if someone wanted to pay for it!
Future computer technology could perhaps greatly increase the bounds.
Maybe Thomas's "useless machine" could lead to far greater bounds
without excessive costs - and maybe research into that could lead to
useful new technology or techniques with applications beyond an
interesting maths problem.
There are lots of examples of conjectures where the first exception
is a high number, and all testing of low numbers fitted the
conjectures. Sometimes such counter-examples are far beyond the reach
of any brute-force computational testing, but sometimes not.
We have not proven that the Collatz Conjecture can be disproven by
brute-force testing - we have merely shown that doing so would be
difficult, and seems unlikely to be successful. And we have known
all along that computational methods can never prove a conjecture
like this to be true - that much should be obvious.
Testing conjectures like this can be helpful to mathematicians.
Sometimes they can show patterns that can be used to get more
insights - maybe there is a pattern in the numbers that lead to peaks
of growth of the Collatz function before dropping towards 1. Such
patterns can inspire proofs - or more targeted searches for exceptions.
Ergo, you need a different means.
You don't imagine that the only thing mathematicians have done with
the Collatz Conjecture is test it to large numbers, do you?
In theory, large numbers could disprove it, if an exception were found.
However, it fails to prove it either; and if no value is found (or
would be found) it doesn't prove anything.
Being true up to 2 ^ 71 makes it reasonable to believe it is always
true, but that is very different from being /proven/. There are plenty
of mathematical conjectures that are true up to very large numbers, but
fail later on.
You cannot even /prove/ it using BigNum computer arithmetic in finite
time.
Surely everyone who has done high-school mathematics already realises
that a conjecture like this cannot be proven by trial and error alone?
Brute-force testing is about looking for exceptions to /disprove/ the
conjecture, or gaining insights that can lead to its proof or disproof.
Yeah, pretty much...
My own thinking (mostly since looking at this thread, earlier today)
has led me to realize first that the probability of any number
containing a divergent path is less than the reciprocal of the value
(related to prior post in this thread).
Mathematicians don't consider someone's gut feelings as proof. At best,
it can be an inspiration about what to focus on. Random guesses about
what someone thinks is the probability of something are rarely of any
use to anyone.
It does not even make sense to talk about "probability" here - a number
either leads to a divergent path, a convergent path ending in 1, or a
different cycle. It is not a random process.
You can make statements about the density of numbers that do not lead
to 1, or limits to the density. There has been a fair amount of work
done on the Collatz Conjecture in this way, putting limits on the
proportion of numbers that are divergent or lead to other cycles (it
turns out they are very rare, if they exist at all). But those limits
do not come from guesses, they come from proofs.
Then, further realized that, since one doesn't just need *one* number
with the probability of a divergent path, one needs an infinite series
(since any non-divergent path immediately collapses back into a
convergent path).
You can't prove that a number leads to a divergent path by trial and
error, even if you find such a number. But if someone found a number
that appeared to lead to a divergent path, it may be possible to prove
that it diverges by other methods. If alternative cycles exist, then
they could be found by trial and error (though no one is hopeful that
this will be practically possible).
Effectively, one has to multiply an infinite series of near 0
probabilities. Or, effectively, for all values of n, the effective
probability of a non-zero probability converges to 1/Inf, or, 0...
Such arguments are invariably nonsense unless you qualify them far
better - especially when "probability" has no meaning here.
Probably doesn't count as a sufficient proof by standards of
mathematical rigor, but at least for me, I can say that it seems true
enough that it always converges (and, will always continue to converge
no matter how big the numbers get).
You are correct that it doesn't count as proof.
While in some hand-wavy sense, one could try to push the probability
to infinity to defeat the infinite reciprocal, infinity does not exist
within the set of positive integers...
Let me give another argument in the same line as yours - with the same
level of "hand-wavy sense" and the same level of completely invalid
mathematics.
With a ruler (unmarked) and compass, you can construct lots of things
by algorithmic sequences. You can, for example, bisect an arbitrary
angle. You can construct an exact 65537-sided polygon. So if we take a
random sequence of N steps, maybe it can be used to trisect an
arbitrary angle. It's unlikely, but it is obvious that it is plausible.
So we can say that there is a probability "p(N)" that any sequence of
length N will trisect an arbitrary angle. No one has found such a
trisector algorithm yet, so we know p(N) is small. However, it is also
obvious that as the number of steps grows, so does this probability
(always limited by 1).
So now consider q(N) = 1 - p(N), the probability that no sequence of
length N can trisect an angle. As we have reasoned, q(N) will start at
1 (you can't trisect an angle in one step), but will drop below 1 and
decrease. The probability that /no/ sequence of steps will be able to
trisect an angle is thus the product of all q(N). It's a product of a
decreasing sequence of numbers between 1 and 0 - clearly it converges
to 0. Thus at some point, there has to be a sequence of steps that can
trisect an angle - we have proved its existence, even though we have
not figured out how to do it.
The problem with this argument is that it is bollocks. Real mathematics
has proved that trisection of arbitrary angles by ruler and compass is
impossible.
We don't yet know if the Collatz Conjecture is true or not. But your
"probability argument" is of no more worth there than it is for
trisecting angles.
On Sat, 21 Feb 2026 15:39:42 -0500
EricP <ThatWouldBeTelling@thevillage.com> wrote:
Robert Finch wrote:
On 2026-02-19 5:10 a.m., Michael S wrote:
I have a minimal knowledge of Verilog and may be wrong but...
If it is smaller than Kintex KU5P then I'd strongly suspect that
you didn't clean out all mistakes.
There could be more mistakes for sure. But I am sure the lowest
level core works. It seemed to in simulation. I made a slight mod
to it since I tested it last, checking n < startn.
Counts are not used so the logic would be stripped out, but its
good for verification.
1 Core is:
module Collatz_conjecture(rst, clk, startn, count, done);
input rst;
input clk;
input [127:0] startn;
output reg [127:0] count;
output reg done;
reg [127:0] n;
always_ff @(posedge clk)
if (rst) begin
n <= startn;
count <= 0;
done <= FALSE;
end
else begin
if (!done) begin
if (~n[0])
n <= n >> 2'd1;
else
n <= n+n+n+1;
if (n < startn)
done <= TRUE;
count <= count + 1;
end
end
endmodule
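For cross-checking a core like this in simulation, a small software
model helps. A hypothetical C model (my own sketch: 64-bit instead of
the module's 128 bits, and ignoring the one-cycle latency of the
registered comparison against startn):

```c
#include <stdint.h>

/* Software model of the core above: each "clock" either halves an even
 * n or applies 3n+1 to an odd n, counts iterations, and stops once n
 * drops below the starting value (the smaller value is known to have
 * been tested already).  64-bit stand-in for the 128-bit registers. */
static uint64_t model_core(uint64_t startn) {
    uint64_t n = startn, count = 0;
    if (startn <= 1)            /* guard: 0 and 1 never drop below */
        return 0;
    for (;;) {
        n = (n & 1) ? 3 * n + 1 : n >> 1;
        count++;
        if (n < startn)
            return count;       /* iterations until dropping below */
    }
}
```

Feeding the same start values to this model and to the HDL testbench,
and comparing counts, would catch most functional mistakes.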
I am not a Verilog expert either (use VHDL exclusively), but to me the
code of Robert Finch looks o.k. from the HDL perspective (functionality
is something else, I addressed it in my reply below).
His Verilog appears to be an exact equivalent of VHDL code below,
which is perfectly legal synthesizable code under VHDL-2008 rules.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity Collatz_conjecture is
port (
rst : in boolean;
clk : in std_logic;
startn : in unsigned(127 downto 0);
count : out unsigned(127 downto 0);
done : out boolean
);
end entity Collatz_conjecture;
architecture a of Collatz_conjecture is
signal n : unsigned(127 downto 0);
begin
process (clk)
begin
if rising_edge(clk) then
if rst then
n <= startn;
count <= (others => '0');
done <= false;
elsif not done then
if n(0)='0' then
n <= '0' & n(n'high downto 1);
else
n <= n+n+n+1;
end if;
if n < startn then
done <= true;
end if;
count <= count + 1;
end if;
end if;
end process;
end architecture a;
Michael S wrote:
On Sat, 21 Feb 2026 15:39:42 -0500
EricP <ThatWouldBeTelling@thevillage.com> wrote:
Robert Finch wrote:
On 2026-02-19 5:10 a.m., Michael S wrote:
I have a minimal knowledge of Verilog and may be wrong but...
If it is smaller than Kintex KU5P then I'd strongly suspect that
you didn't clean out all mistakes.
There could be more mistakes for sure. But I am sure the lowest
level core works. It seemed to in simulation. I made a slight mod
to it since I tested it last, checking n < startn.
Counts are not used so the logic would be stripped out, but its
good for verification.
1 Core is:
module Collatz_conjecture(rst, clk, startn, count, done);
input rst;
input clk;
input [127:0] startn;
output reg [127:0] count;
output reg done;
reg [127:0] n;
always_ff @(posedge clk)
if (rst) begin
n <= startn;
count <= 0;
done <= FALSE;
end
else begin
if (!done) begin
if (~n[0])
n <= n >> 2'd1;
else
n <= n+n+n+1;
if (n < startn)
done <= TRUE;
count <= count + 1;
end
end
endmodule
I am not a Verilog expert either (use VHDL exclusively), but to me
the code of Robert Finch looks o.k. from the HDL perspective
(functionality is something else, I addressed it in my reply
below). His Verilog appears to be an exact equivalent of VHDL code
below, which is perfectly legal synthesizable code under VHDL-2008
rules.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity Collatz_conjecture is
port (
rst : in boolean;
clk : in std_logic;
startn : in unsigned(127 downto 0);
count : out unsigned(127 downto 0);
done : out boolean
);
end entity Collatz_conjecture;
architecture a of Collatz_conjecture is
signal n : unsigned(127 downto 0);
begin
process (clk)
begin
if rising_edge(clk) then
if rst then
n <= startn;
count <= (others => '0');
done <= false;
elsif not done then
if n(0)='0' then
n <= '0' & n(n'high downto 1);
else
n <= n+n+n+1;
end if;
if n < startn then
done <= true;
end if;
count <= count + 1;
end if;
end if;
end process;
end architecture a;
Well then I just don't get this at all.
I have a VHDL textbook that explicitly states that IN ports are input
only, OUT ports are output only, and INOUT are input and output
(tri-state).
On Sat, 21 Feb 2026 01:56:03 -0500
Robert Finch <robfi680@gmail.com> wrote:
On 2026-02-19 5:10 a.m., Michael S wrote:
On Wed, 18 Feb 2026 22:56:19 -0500
Robert Finch <robfi680@gmail.com> wrote:
Yes. 128-bit processing.
On 2026-02-18 11:22 a.m., Michael S wrote:
On Sun, 15 Feb 2026 10:56:51 -0500
Robert Finch <robfi680@gmail.com> wrote:
I have coded a version that tests up to 8192 numbers
simultaneously. It is supposed to be able to run at over 200
MHz. Some of the numbers would take a large number of iterations
though.
I found something strange, the code to count for more numbers
does not increase the size of the core very much. (But it takes
longer to build.) I am not sure if this occurs because there is
a mistake somewhere, or if the tools are able to figure out how
to share all the logic. I am thinking the logic for adders is
shared somehow. 128-bit arithmetic is being used.
There is a mistake in your code. No other explanation is possible.
You should look for 3 sorts of mistakes
1. Core itself.
The most likely mistake is failing to look for one of the ending
conditions. You have to look for 3 conditions:
a) - value produced at current step is smaller than initial value.
That is successful ending. Here you signal to the scheduler
that you are ready to accept the next value.
b) - overflow. Next value exceeds the width of register.
c) - timeout. You were running for more than N steps. N=8192 looks
like the reasonable value.
a) and b) are suspected failures.
The handling for b) and c) is similar - you raise alert signal and
stop. Your test unit has to provide supervisor circuity with the
mean to fetch the initial value that caused suspected failure and
to clear alert condition.
After alert is cleared your unit returns to the same state as
after success - waiting for next initial value.
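The three ending conditions a), b), c) can be sketched as a C
reference loop (the status names and MAX_STEPS constant are my own
inventions; 64-bit values stand in for the real 128-bit registers, and
the caller is assumed to supply startn >= 2):

```c
#include <stdint.h>

enum status { DROPPED_BELOW, OVERFLOW_ALERT, TIMEOUT_ALERT };

#define MAX_STEPS 8192          /* condition c): timeout bound */

/* Run one starting value and report which ending condition fired. */
static enum status run_value(uint64_t startn, uint64_t *out_n) {
    uint64_t n = startn;
    for (uint64_t steps = 0; steps < MAX_STEPS; steps++) {
        if (n & 1) {
            if (n > (UINT64_MAX - 1) / 3) {     /* b): would overflow */
                *out_n = n;
                return OVERFLOW_ALERT;
            }
            n = 3 * n + 1;
        } else {
            n >>= 1;
        }
        if (n < startn) {                       /* a): success */
            *out_n = n;
            return DROPPED_BELOW;
        }
    }
    *out_n = n;
    return TIMEOUT_ALERT;                       /* c): suspected failure */
}
```

On OVERFLOW_ALERT or TIMEOUT_ALERT the supervisor would re-run the
reported value with wider arithmetic, mirroring the alert protocol
described above.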
I found the mistake. Core was working, but groups of cores were
not. Done signals were missing.
2. Scheduler.
In proof-of-concept design the scheduler does not have to be very
realistic. For example, it does not have to be efficiently
pipelined. But it should be able to monitor ready signals of all
test units and to supply different initial data to different
units. Supplying the same data to all units simultaneously is a
sort of mistake that explains your observed behavior.
There is really simple scheduling ATM. There is just a bunch of
done signals that are gated together. Meaning the slowest
calculation of the group will slow everything else down. Once all
the dones are true the next set of numbers are loaded.
3. Supervisor
Again, in proof-of-concept design the supervisor does not have to
be very realistic. But it should be able to selectively exercise
all core features related to alerts, including fetching of initial
data and clearing of alerts.
After you coded everything correctly, do not even try to compile a
test with 8000 units. That is totally unrealistic. Even if you
find a huge FPGA in which such a number of testers could fit (and I
am not sure that such an FPGA exists) the compilation will take many
hours, possibly days.
Yeah, 8,000 was definitely out of the picture. I had since coded a
24x24 matrix (576) and although the logic would fit, that turned
out not to be able to route. So, it is down to 20x20. 20x20 with
no real scheduling fits.
I tried displaying progress as an on-screen bitmap, but that just
appeared noisy. I have been working on an SCP so may be able to use
that to manage the scheduling with some custom logic.
Does 20x20 means 400 cores with something like 80-bit initial values
and 128-bit processing?
If you didn't make further mistakes then it should be rather big,
order of 300K LEs. It would not fit in the biggest low-end Altera
device (Cyclone 10GX 10CX220).
IIRC it is about 150k LUTs. It uses the simplest approach, which is
not very fast iteration-wise, but it allows a lot of cores and a fast
clock cycle time. One core is very small and has a fast iteration (1
clock cycle). But it takes a lot more iterations than can be done
with a more brainiac approach.
Using a more brainiac approach would likely cut the performance in
half and use a lot more LUTs.
According to my experiments, combining 2 steps has a very small negative
effect on achievable clock and increases area by ~1.8x. So, to me it
looks like a win.
That's the combined step I am talking about:
x = n/4;
switch (x % 4) {
case 0: n = x; break;
case 1: n = 3*x+1; break;
case 2: n = 2*x+2; break;
case 3: n = 9*x+8; break;
}
On the Xilinx side, it would not fit in any Spartan and in most
Artix devices. Maybe it would fit in the biggest Artix
UltraScale+ (AU25P). But more realistically for 400 test units one
would want Arria 10 on Altera side or Kintex on Xilinx side. Both
are not prohibitively expensive, but not cheap.
What device are you using?
I am not using a too-inexpensive device. I have a Kintex-325 with
about 200k LUTs.
According to Xilinx marketing materials XC7K325T has 326080 equivalents
of traditional LEs.
That explains why your design fits.
Pay attention that while Kintex evaluation boards could be
non-expensive, right now buying devices for actual production work is
rather costly, especially in small quantities.
The lowest price I see on the digikey site is 1147 USD. That's a bad
value relative to many other FPGAs.
If it is smaller than Kintex KU5P then I'd strongly suspect that you
didn't clean out all mistakes.
There could be more mistakes for sure. But I am sure the lowest level
core works. It seemed to in simulation. I made a slight mod to it
since I tested it last, checking n < startn.
Counts are not used so the logic would be stripped out, but its good
for verification.
1 Core is:
module Collatz_conjecture(rst, clk, startn, count, done);
input rst;
input clk;
input [127:0] startn;
output reg [127:0] count;
output reg done;
reg [127:0] n;
always_ff @(posedge clk)
if (rst) begin
n <= startn;
count <= 0;
done <= FALSE;
end
else begin
if (!done) begin
if (~n[0])
n <= n >> 2'd1;
else
n <= n+n+n+1;
if (n < startn)
done <= TRUE;
count <= count + 1;
end
end
endmodule
Ok.
1. Your core misses one critical part - an overflow check.
Assuming 80-bit initial values, the 128-bit register will overflow
eventually. It would happen very rarely, but it would happen.
The check for overflow does not have to be precise. A conservative
approach, like looking at the 2 MS bits and one LS bit and signaling
overflow not when it really happened but when it *could* happen, i.e.
when the MS bits > 0 and the LS bit = '1', is good enough. But the
check can't be omitted.
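As a C sketch of that conservative predicate (my own illustration, on a
64-bit word standing in for the 128-bit register):

```c
#include <stdint.h>

/* Conservative overflow predicate in the spirit described above: if
 * either of the two most significant bits is set AND the value is odd
 * (so the next step roughly triples it), flag a *possible* overflow.
 * For odd n below 2^62, 3n+1 < 2^64, and an even n only shrinks, so
 * the check may fire a little early but never late.  The supervisor
 * then re-checks flagged values with wider arithmetic. */
static int may_overflow(uint64_t n) {
    uint64_t ms2 = n >> 62;     /* top two bits of the register */
    return (ms2 != 0) && (n & 1);
}
```

In hardware this is just a 3-input gate on bits 127, 126 and 0, so it
costs essentially nothing on the critical path.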
2. I see that instead of calculating n = (n*3+1)/2 as one step you are
doing it in two separate steps. I am not sure that it really makes
your core smaller or faster. And it certainly increases the # of
iterations by 1/3rd.
3.
A 128-bit count is a pointless loss of resources. 12 or 13 bits are
sufficient. If the count exceeded that size then you have already found
something of high interest and it is the job of the supervisor to find
out what exactly had happened.
128-bit counter actually not just too big, it is also pointless,
because if we pretend to believe that our purpose is to disprove
Collatz's conjecture then supervisor would have to implement stop
condition with some sort of timeout that will effectively duplicate functionality of the counter.
4.
Your module does not latch the value of startn. It means that caller
module has to do it on it's behalf. That is the most likely place where
one can make a mistake.
5.
Coding style comment. Normally, name rst is used for asynchronous
reset signal. In your design rst is a synchronous signal. For the
benefit of casual reader I suggest to find different name.
I must say I find this whole thread baffling: when searching for
a counter example and we're already in the order of 2^70, it seems clear
that constant-factor improvements are completely irrelevant (even with
large constants).
I guess it all boils down to the "useless" part of the title?
On 2026-02-21 1:36 p.m., Michael S wrote:
On Sat, 21 Feb 2026 01:56:03 -0500
Robert Finch <robfi680@gmail.com> wrote:
On 2026-02-19 5:10 a.m., Michael S wrote:
On Wed, 18 Feb 2026 22:56:19 -0500
Robert Finch <robfi680@gmail.com> wrote:
On 2026-02-18 11:22 a.m., Michael S wrote:
On Sun, 15 Feb 2026 10:56:51 -0500
Robert Finch <robfi680@gmail.com> wrote:
I have coded a version that tests up to 8192 numbers
simultaneously. It is supposed to be able to run at over 200
MHz. Some of the numbers would take a large number of iterations
though.
I found something strange: the code to account for more numbers
does not increase the size of the core very much. (But it takes
longer to build.) I am not sure if this occurs because there is
a mistake somewhere, or if the tools are able to figure out how
to share all the logic. I am thinking the logic for adders is
shared somehow. 128-bit arithmetic is being used.
There is a mistake in your code. No other explanation is possible.
You should look for 3 sorts of mistakes
1. Core itself.
The most likely mistake is failing to look for one of the ending
conditions. You have to look for 3 conditions:
a) - value produced at current step is smaller than initial value.
That is the successful ending. Here you signal to the scheduler
that you are ready to accept the next value.
b) - overflow. Next value exceeds the width of the register.
c) - timeout. You were running for more than N steps. N=8192 looks
like a reasonable value.
b) and c) are suspected failures.
The handling for b) and c) is similar - you raise an alert signal and
stop. Your test unit has to provide the supervisor circuitry with the
means to fetch the initial value that caused the suspected failure and
to clear the alert condition.
After alert is cleared your unit returns to the same state as
after success - waiting for next initial value.
I found the mistake. Core was working, but groups of cores were
not. Done signals were missing.
There is really simple scheduling ATM.
2. Scheduler.
In proof-of-concept design the scheduler does not have to be very
realistic. For example, it does not have to be efficiently
pipelined. But it should be able to monitor ready signals of all
test units and to supply different initial data to different
units. Supplying the same data to all units simultaneously is a
sort of mistake that explains your observed behavior.
There is just a bunch of done signals that are gated together,
meaning the slowest calculation of the group will slow everything
else down. Once all the dones are true, the next set of numbers is
loaded.
3. Supervisor
Again, in proof-of-concept design the supervisor does not have to
be very realistic. But it should be able to selectively exercise
all core features related to alert, including fetching of initial
data and clearing of alert.
After you coded everything correctly, do not even try to compile
test with 8000 units. That is totally unrealistic. Even if you
find huge FPGA in which such number of testers could fit (and I
am not sure that such FPGA exists) the compilation will take many
hours, possibly days.
Yeah, 8,000 was definitely out of the picture. I had since coded a
24x24 matrix (576) and although the logic would fit that turned
out not to be able to route. So, it is down to 20x20. 20x20 with
no real scheduling fits.
I tried displaying progress as an on-screen bitmap, but that just
appeared noisy. I have been working on an SCP so may be able to use
that to manage the scheduling with some custom logic.
Does 20x20 means 400 cores with something like 80-bit initial values
and 128-bit processing?
Yes. 128-bit processing.
If you didn't make further mistakes then it should be rather big,
order of 300K LEs. It would not fit in the biggest low-end Altera
device (Cyclone 10GX 10CX220).
IIRC it is about 150k LUTs. It uses the simplest approach, which is
not very fast iteration-wise, but it allows a lot of cores and a fast
clock cycle time. One core is very small and has a fast iteration (1
clock cycle). But it takes a lot more iterations than can be done
with a more brainiac approach.
Using a more brainiac approach would likely cut the performance in
half and use a lot more LUTs.
According to my experiments, combining 2 steps has a very small negative
effect on achievable clock and increases area by ~1.8x. So, to me it
looks like a win.
That's the combined step I am talking about:
    x = n / 4;
    switch (n % 4) {
        case 0: n = x;     break;
        case 1: n = 3*x+1; break;
        case 2: n = 3*x+2; break;
        case 3: n = 9*x+8; break;
    }
On the Xilinx side, it would not fit in any Spartan and in most
Artix devices. May be, it would fit in the biggest Artix
UltraScale+ (AU25P). But more realistically for 400 test units one
would want Arria 10 on Altera side or Kintex on Xilinx side. Both
are not prohibitively expensive, but not cheap.
What device are you using?
I am not using a too-inexpensive device. I have a Kintex-325 with
about 200k LUTs.
According to Xilinx marketing materials XC7K325T has 326080 equivalents
of traditional LEs.
That explains why your design fits.
Pay attention that while Kintex evaluation boards could be
non-expensive, right now buying devices for actual production work is
rather costly, especially in small quantities.
The lowest price I see on the digikey site is 1147 USD. That's a bad
value relative to many other FPGAs.
If it is smaller than Kintex KU5P then I'd strongly suspect that you
didn't clean out all mistakes.
There could be more mistakes for sure. But I am sure the lowest-level
core works. It seemed to in simulation. I made a slight mod to it
since I tested it last, checking n < startn.
Counts are not used so the logic would be stripped out, but it's good
for verification.
1 Core is:
import const_pkg::*;	// provides TRUE/FALSE
module Collatz_conjecture(rst, clk, startn, count, done);
input rst;
input clk;
input [127:0] startn;
output reg [127:0] count;
output reg done;
reg [127:0] n;
always_ff @(posedge clk)
if (rst) begin
    n <= startn;
    count <= 0;
    done <= FALSE;
end
else begin
    if (!done) begin
        if (~n[0])
            n <= n >> 1;
        else
            n <= n+n+n+1;
        if (n < startn)
            done <= TRUE;
        count <= count + 1;
    end
end
endmodule
Ok.
1. Your core misses one critical part - an overflow check.
Assuming an 80-bit initial value, the 128-bit register will overflow
eventually. It would happen very rarely, but it would happen.
The check for overflow does not have to be precise. A conservative
approach, like looking at the 2 MS bits and one LS bit and signaling
overflow not when it really happened but when it *could* happen, i.e.
when MS bits > 0 and LS bit='1', is good enough. But the check can't be
omitted.
2. I see that instead of calculating n = (n*3+1)/2 as one step you are
doing it in two separate steps. I am not sure that it really makes
your core smaller or faster. And it certainly increases the # of
iterations by 1/3rd.
3.
A 128-bit count is a pointless loss of resources. 12 or 13 bits are
sufficient. If the count exceeded that size then you already found
something of high interest and it is the job of the supervisor to find
out what exactly had happened.
The 128-bit counter is actually not just too big, it is also pointless,
because if we pretend to believe that our purpose is to disprove
Collatz's conjecture, then the supervisor would have to implement a stop
condition with some sort of timeout that would effectively duplicate the
functionality of the counter.
4.
Your module does not latch the value of startn. It means that the caller
module has to do it on its behalf. That is the most likely place where
one can make a mistake.
5.
Coding style comment. Normally, the name rst is used for an asynchronous
reset signal. In your design rst is a synchronous signal. For the
benefit of the casual reader I suggest finding a different name.
Re-wrote the core using Michael's suggestion of combining iterations.
Looks something like the below code.
Renamed the rst signal to ld to better reflect its usage.
Timing is close to the same, but not as many cores can fit. It is a
trade-off: fewer cores, but they process more iterations in one go. The
net result is more numbers processed in the same length of time.
I am using 16x16 cores now. Overflow check using 130-bit numbers.
import const_pkg::*;
module Collatz_conjecture(clk, ld, startn, done, overflow);
input clk;
input ld;
input [127:0] startn;
output reg done;
output reg overflow;
reg [129:0] n;
always_ff @(posedge clk)
if (ld)
    n <= 130'(startn);
else begin
    if (!done)
        case(n[1:0])
        2'd0:	n <= n >> 2'd2;
        2'd1:	n <= n[129:2]+n[129:2]+n[129:2]+2'd1;
        2'd2:	n <= n[129:2]+n[129:2]+n[129:2]+2'd2;
        2'd3:	n <= {n[129:2],3'd0}+n[129:2]+4'd8;
        endcase
end
always_ff @(posedge clk)
if (ld)
    done <= FALSE;
else begin
    if (!done) begin
        if (n < startn || (|n[129:128]))
            done <= TRUE;
    end
end
always_ff @(posedge clk)
if (ld)
    overflow <= FALSE;
else begin
    if (!done) begin
        if (|n[129:128])
            overflow <= TRUE;
    end
end
endmodule
I was initially seeing it more like a very large hash table.
But, I have since realized it is not a hash table, it is a more
organized pattern.
But, yeah, not a math person, and I will make no claim of having
"proven" anything...
Rather, my aim was mostly to try to start organizing stuff enough to
where I could get a sense of why it works, and why it doesn't explode
off into infinite numbers.
But, will note that this whole area is one major reason I am not really
into math or physics. As a practical tool, they are useful, or can be
seen as interesting detours.
But, proofs kinda ruin the whole experience.
But, yeah, at least for me, now seeing it as a sort of cone of
power-of-2 rings, with all paths forming into spirals which end at 1,
each odd step effectively going half-way around a spiral formed from the
number line but then ending up at an incremented position, etc., seems
like a way to see why it does what it does.
Even if it doesn't count for anything, it isn't much worse than trying
to test it empirically, which as noted, also does not count (it could
only count if an exception could be found, but an exception won't be
found no matter how big the search space becomes, so...).
But, no proofs to be found here; I will not make this claim, in any case.
It is a mathematically very interesting problem. Whether there is
anything to be gained by continuing the search for exceptions by brute
force is a different matter - but /if/ it is worth doing, then doing so
more efficiently is a good idea.
AFAIK, in such games "more efficiently" tends to be largely irrelevant
if it's more efficient by a (non-astronomical) constant factor.
Instead, you need it to be algorithmically more efficient.
On 2/19/2026 2:27 PM, Stefan Monnier wrote:
I must say I find this whole thread baffling: when searching for
a counter example and we're already in the order of 2^70, it seems clear
that constant-factor improvements are completely irrelevant (even with
large constants).
I guess it all boils down to the "useless" part of the title?
Yeah, basically.
It is useless in the sense that no matter how big you go, an exception
won't be found, rendering further testing essentially meaningless (it
doesn't prove anything unless an exception is found, and an exception
won't be found if none can exist).
Or, the seeming futility of trying to send humans on interstellar trips:
basically, there is no way to send full-sized humans interstellar within
any plausible timescale or resource/energy budget.
Sure - algorithmic improvements usually (but not always in practice)
lead to the biggest efficiency gains. But if you are doing a "test
every number up to N" check of a conjecture, /any/ efficiency
improvements are helpful.
BGB <cr88192@gmail.com> schrieb:
On 2/19/2026 2:27 PM, Stefan Monnier wrote:
I must say I find this whole thread baffling: when searching for
a counter example and we're already in the order of 2^70, it seems clear
that constant-factor improvements are completely irrelevant (even with
large constants).
I guess it all boils down to the "useless" part of the title?
Yeah, basically.
It is useless in the sense that no matter how big you go, an exception
won't be found, so rendering further testing essentially meaningless
(doesn't prove anything unless an exception is found, an exception won't
be found if none can exist).
You are assuming that the Collatz conjecture is true. But this
is unknown, and is the whole point of the exercise.
Do I personally think that exhaustive search will find something? No.
Does my personal belief matter in the slightest? No.
[...]
Or, the seeming futility of trying to send humans on interstellar trips.
Basically, no way to send full sized humans interstellar within any
plausible timescale or resource/energy budget.
My favorite candidate for that is the nuclear salt water rocket.
Which will be difficult to build, or so my son (who just received
his master's degree in nuclear engineering) tells me because
it needs to be prompt critical.
My own looking into this led me to realize that the number-space is
modulo (relative to a cone with a spiral that repeats with every power-of-2), and that all the odd steps only ever increment the phase
angle by 1, or fall down to a lower level (whenever the phase angle is
not incremented).
In effect, each odd step jumps to the opposite side of the cone, but slightly increments the angle. Since the phase-angle path can never move backwards, it can't touch any previous paths, thus, no loops can happen
this way.
The only way a cycle could exist is if (somehow) there were an
impassable wall where pretty much *no* value could descend unless it
hits the power-of-2 descent path, and then somehow miss this path as well.
But, since "3*n+1" results in the minimum forward step, this means that
one of the sides *has* to hit the power-of-2 descent cases at some point.
BGB <cr88192@gmail.com> schrieb:
My own looking into this led me to realize that the number-space is
modulo (relative to a cone with a spiral that repeats with every
power-of-2), and that all the odd steps only ever increment the phase
angle by 1, or fall down to a lower level (whenever the phase angle is
not incremented).
In effect, each odd step jumps to the opposite side of the cone, but
slightly increments the angle. Since the phase-angle path can never move
backwards, it can't touch any previous paths, thus, no loops can happen
this way.
The only way a cycle could exist is if (somehow) there were an
impassable wall where pretty much *no* value could descend unless it
hits the power-of-2 descent path, and then somehow miss this path as well.
But, since "3*n+1" results in the minimum forward step, this means that
one of the sides *has* to hit the power-of-2 descent cases at some point.
The same argument would hold for the 3n-1 problem, where there is
more than one cycle.
[...]
On 2/24/2026 12:44 AM, Thomas Koenig wrote:
BGB <cr88192@gmail.com> schrieb:
My own looking into this led me to realize that the number-space is
modulo (relative to a cone with a spiral that repeats with every
power-of-2), and that all the odd steps only ever increment the phase
angle by 1, or fall down to a lower level (whenever the phase angle is
not incremented).
In effect, each odd step jumps to the opposite side of the cone, but
slightly increments the angle. Since the phase-angle path can never move
backwards, it can't touch any previous paths, thus, no loops can happen
this way.
The only way a cycle could exist is if (somehow) there were an
impassable wall where pretty much *no* value could descend unless it
hits the power-of-2 descent path, and then somehow miss this path as
well.
But, since "3*n+1" results in the minimum forward step, this means that
one of the sides *has* to hit the power-of-2 descent cases at some
point.
The same argument would hold for the 3n-1 problem, where there is
more than one cycle.
At least going by "the power of imagination" (would require testing to confirm):
"3*n-1" would be different in that it neither reliably advances the
phase angle, nor reliably moves it backwards. So, one is likely to end
up with pairs of odd-steps that loop back to the same position, creating infinite cycles (so, it is not the same geometry).
"3*n-3" might be stable (may appear to behave similarly to 3*n+1), but
would instead advance the phase angle in the opposite direction.
"3*n+3" should initially appear similar to "3*n+1", but would have paths that ascend to infinity (paths would exist which manage to miss all of
the even steps and also miss the magic "power-of-2 auto-win" step).
If the adjustment offset is even, then the "(3*n+1)/2" trick breaks, and also all paths that hit an odd number will ascend to infinity.
Granted, may make sense to check/confirm, if these don't match up I
might need to revisit my mental model.
On 2/24/2026 1:38 AM, BGB wrote:
On 2/24/2026 12:44 AM, Thomas Koenig wrote:
BGB <cr88192@gmail.com> schrieb:
My own looking into this led me to realize that the number-space is
modulo (relative to a cone with a spiral that repeats with every
power-of-2), and that all the odd steps only ever increment the phase
angle by 1, or fall down to a lower level (whenever the phase angle is
not incremented).
In effect, each odd step jumps to the opposite side of the cone, but
slightly increments the angle. Since the phase-angle path can never
move
backwards, it can't touch any previous paths, thus, no loops can happen
this way.
The only way a cycle could exist is if (somehow) there were an
impassable wall where pretty much *no* value could descend unless it
hits the power-of-2 descent path, and then somehow miss this path as
well.
But, since "3*n+1" results in the minimum forward step, this means that
one of the sides *has* to hit the power-of-2 descent cases at some
point.
The same argument would hold for the 3n-1 problem, where there is
more than one cycle.
Then, went and took a shower, realized there was more to this:
At least going by "the power of imagination" (would require testing to
confirm):
"3*n-1" would be different in that it neither reliably advances the
phase angle, nor reliably moves it backwards. So, one is likely to end
up with pairs of odd-steps that loop back to the same position,
creating infinite cycles (so, it is not the same geometry).
Imagination is mostly saying there will be a lot of fairly short loops
here.
There should probably be a quite significant number of short loops in
this case.
"3*n-3" might be stable (may appear to behave similarly to 3*n+1), but
would instead advance the phase angle in the opposite direction.
But, "3*n-3" will change the ending state, so it will not land on 1
anymore.
Rather:
  Around 84% of the time it will land on 3;
  Around 16% of the time it will land on 0.
Give or take... ("imagination" showing around 84% or so of paths landing
on 3).
Will probably still converge, just ending up at a different end state.
All other negative offset paths would lead to underflow.
So, can probably exclude cases where the value space goes negative.
Though... "looks like" if these were included there is another cone on
the other side, if one goes this way.
"3*n+3" should initially appear similar to "3*n+1", but would have
paths that ascend to infinity (paths would exist which manage to miss
all of the even steps and also miss the magic "power-of-2 auto-win"
step).
If the adjustment offset is even, then the "(3*n+1)/2" trick breaks,
and also all paths that hit an odd number will ascend to infinity.
Further assertion 1:
All other positive offset paths should contain paths to infinity (along
with jostling around the ending states for those paths which do converge).
So, the specific end-state behavior seems unique.
Further assertion 2:
  "3*n+1" may be unique in this space.
Or, possibly the only one that reliably converges, lands on 1, and does
not contain cycles.
Granted, may make sense to check/confirm, if these don't match up I
might need to revisit my mental model.
Still TODO, but for me, shower seemed like a bigger immediate priority.
So, yeah, I guess I may report back later as to whether or not the
actual math behaves the same way as it does in my imagination.
BGB <cr88192@gmail.com> schrieb:
"3*n-1" would be different in that it neither reliably advances the
phase angle, nor reliably moves it backwards. So, one is likely to end
up with pairs of odd-steps that loop back to the same position, creating
infinite cycles (so, it is not the same geometry).
What is this phase angle? I am only familiar with that term for
complex numbers.
On 22/02/2026 19:38, BGB wrote:
I was initially seeing it more like a very large hash table.
But, I have since realized it is not a hash table, it is a more
organized pattern.
As far as we know, the result of applying the Collatz function (calculating
the number of steps to reach 1, if it ever does) follows a pattern statistically but not any clear pattern for individual values. There
are a lot of mathematical functions like that - a good example being the number of prime factors in a number. You can draw graphs and visually
see nice patterns - some of which can be proven, others are still
unproven and may not apply over a big enough scale. Sometimes there are tricks or hints to ways to improve your chances when guessing which
values might be outliers.
David Brown <david.brown@hesbynett.no> posted:
On 22/02/2026 19:38, BGB wrote:
I was initially seeing it more like a very large hash table.
But, I have since realized it is not a hash table, it is a more
organized pattern.
As far as we know, the result of applying the Collatz function (calculating
the number of steps to reach 1, if it ever does) follows a pattern
statistically but not any clear pattern for individual values. There
are a lot of mathematical functions like that - a good example being the
number of prime factors in a number. You can draw graphs and visually
see nice patterns - some of which can be proven, others are still
unproven and may not apply over a big enough scale. Sometimes there are
tricks or hints to ways to improve your chances when guessing which
values might be outliers.
Given by definition: even: n = n/2
odd: n = 3*n+1 {which is even}
The most one can get out of an iteration is n*1.5 {for large n} while
the most one can lose from an iteration is n/2.
So, there will be vanishingly small numbers of n which see the odd
iteration enough times to outweigh the even iterations. Thus one
would expect the conjecture is true.
MitchAlsup wrote:
Robert Finch <robfi680@gmail.com> posted:
On 2026-02-15 7:14 a.m., David Brown wrote:
On 15/02/2026 09:58, Robert Finch wrote:
I wonder how difficult it would be to implement a parallel tester in
an FPGA. It looks simple enough and there are hundreds of DSP blocks
available to use. Could test in blocks of hundreds of numbers at a time.
Running at 100 MHz * 200 tests at a time = how long to get to 2^71?
One of the nice things about this is that you don't need DSP blocks.
Yeah, I figured that out after coding. 3n+1 = n+n+n+1
3n+1 = (n<<1)+n+1
one add with carry-in, and n gets wired in directly and also shifted
up by 1 (no logic shifter).
Even better:
(3n+1)/2 = n + (n>>1) + 1
I.e., two iterations for the cost of one.