• A useless machine

    From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Sat Feb 14 09:57:13 2026
    From Newsgroup: comp.arch

    Some people are throwing large amounts of computing power at trying
    to disprove the Collatz conjecture. Graphics cards have sped this
    up enormously.

    (A reminder, if any is needed: From a starting number n, the
    transformation, recursively applied,

    f(n) = 3*n+1 if n is odd; f(n) = n/2 if n is even

    is conjectured to always reach 1 for any positive integer n).
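
    For concreteness, the iteration can be sketched in a few lines of
    Python (mine, purely illustrative):

```python
def collatz_steps(n):
    """Count iterations of f until n reaches 1, and track the peak value."""
    steps = 0
    peak = n
    while n != 1:
        n = 3 * n + 1 if n % 2 else n // 2
        peak = max(peak, n)
        steps += 1
    return steps, peak

# The well-known record-setter below 100:
print(collatz_steps(27))   # 27 needs 111 steps and peaks at 9232
```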

    All the work is done on general-purpose hardware, which has many
    capabilities that are not needed and which consume area and power
    that special-purpose hardware would not. Also, the hardware is
    limited to 64-bit integers, while the range of tested numbers is
    now up to 2^71, so multi-word arithmetic is required.

    What could specialized hardware look like?

    The central part could be a unit which is handed a block of numbers
    to check, for example a range of 2^32 numbers. The current records
    for number of iterations and maximum number reached would also be
    stored in that unit.

    In that unit, the calculation would be performed by a combination
    of adder and shifter. The input would always be odd (so the last
    bit could be dropped), and it would calculate (3*n+1)>>ctz(3*n+1).
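
    In software terms, one such combined odd-to-odd step is (a Python
    sketch; ctz = count trailing zeros):

```python
def next_odd(n):
    """One hardware iteration: odd n -> (3*n+1) >> ctz(3*n+1), again odd."""
    assert n % 2 == 1
    m = 3 * n + 1          # always even, so at least one shift follows
    while m % 2 == 0:      # equivalent to m >> ctz(m)
        m //= 2
    return m

print(next_odd(7))   # 3*7+1 = 22 -> 11
print(next_odd(27))  # 3*27+1 = 82 -> 41
```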

    The adder would compute (3*n+1)/2, and would be a specialized
    128-bit adder. Why specialized? The logic is far less complex than a
    general-purpose adder. For example, a two-bit adder has the formula

    y1 = (!a1&a0) | (!c_in&a1&!a0);
    y0 = (c_in&!a1&!a0) | (!c_in&a1&!a0) | (a1&a0);

    and its propagate and generate bits are

    p = (a1);
    g = (a1&a0);

    The maximum number of trailing zeros after the (3*n+1)/2 step is
    three and can be determined cheaply from bits <3:1> of the input,
    so the input to the shifter (two levels only) can be provided
    in parallel. The machine would be designed so that adding and
    shifting can be done in a single cycle.
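
    That the shift amount can be looked up in parallel is easy to
    check: capped at three, it depends only on n mod 16, i.e. on bits
    <3:1> of the odd input. A quick Python check (my sketch):

```python
# min(ctz((3*n+1)//2), 3) is a function of n mod 16 alone, which is
# what lets the shift amount be looked up in parallel with the add.
def shift_amount(n):                           # n odd
    m = (3 * n + 1) // 2
    return min((m & -m).bit_length() - 1, 3)   # ctz(m), capped at 3

table = {r: shift_amount(r) for r in range(1, 16, 2)}
for n in range(1, 100001, 2):
    assert shift_amount(n) == table[n % 16]
```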

    The number of iterations would be tracked; if larger than a previous
    record, the calculation would terminate with a corresponding signal
    to the outside world.

    Also, each cycle, the result from the previous round would be
    compared against the input value. If it is lower, the next number
    would be chosen. Finally, the number reached would be checked against
    the previous, pre-set maximum, which would also signal to the outside
    world.

    Which numbers to choose? It is well known that only certain
    residues modulo 2^n are interesting; the counts are given in
    OEIS A076227. For example, modulo 2^10, only 64 residues are
    "interesting", or 6.25% (and only eight bits would be needed to
    store each, because the last two bits are always one).
    Modulo 2^16 it would be 2114 residues, or 3.22%, which would cut
    the amount of numbers to be checked roughly in half again. One
    would use a small ROM, I presume. The number of bits would be a
    tradeoff between area and amount of computation. My current guess
    is that 2^10 could be a sweet spot. A further third of the numbers
    could be excluded by checking for divisibility by three.
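
    The residue sieve is easy to reproduce (my sketch of the standard
    k-step sieve; by this count, 64 residues survive modulo 2^10 and
    2114, about 3.22%, survive modulo 2^16):

```python
# A residue r mod 2^k "survives" if its first k steps of
# T(x) = x/2 (x even), (3x+1)/2 (x odd) never provably drop below the
# starting value, i.e. 3^(odd steps so far) never falls below
# 2^(steps so far).
def survivors(k):
    count = 0
    for r in range(1 << k):
        x, odd = r, 0
        for j in range(1, k + 1):
            if x % 2:
                x, odd = (3 * x + 1) // 2, odd + 1
            else:
                x //= 2
            if 3 ** odd < 2 ** j:     # provably below start: eliminated
                break
        else:
            count += 1
    return count

print(survivors(10))   # 64 residues survive mod 2^10 (6.25%)
```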

    The problem is embarrassingly parallel. One could choose a size so
    that each parcel is calculated in a few seconds or even minutes,
    to avoid a large amount of communication with the outside.

    Because all units would be working all the time, and if one has
    a smaller unit one could simply build more of them, I suspect that
    power would be the main limiting factor. An adder design that is
    not very fast, but runs at low power, low voltage and low
    frequency, would be beneficial.

    Comments?
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Sat Feb 14 18:48:39 2026
    From Newsgroup: comp.arch


    Thomas Koenig <tkoenig@netcologne.de> posted:

    Some people are throwing large amounts of computing power at trying
    to disprove the Collatz conjecture. Graphics cards have sped this
    up enormously.

    (A reminder, if any is needed: From a starting number n, the
    transformation, recursively applied,


    f(n) = 3*n+1, if n is odd; =n/2 if n is even.

    is conjectured to always reach 1 for any positive integer n).

    While n/2 remains even, n/2 is guaranteed to converge to 1.
    So, only the odd cases are interesting.

    3*n+1 is always even (since n was odd, and odd*odd is odd and
    odd+1 is even). So, a double iteration is (3*n+1)/2.

    All the work is done on general-purpose hardware, which has many
    capabilities that are not needed, and consume area and power that special-purpose hardware would not need. Also, the hardware
    is limited to 64-bit integers, and the range of tested numbers
    is now up to 2^71, so

    Induction would say it has been proven by then.

    What could specialized hardware look like?

    The central part could be a unit which is handed a block of numbers
    to check, for example a range of 2^32 numbers. The current records
    for number of iterations and maximum number reached would also be
    stored in that unit.

    In that unit, the calculation would be performed by a combination
    of adder and shifter. The input would always be odd (so the last
    bit could be dropped), and it would calculate (3*n+1)>>ctz(3*n+1).

    The adder would compute (3*n+1)/2, and would be a specialized 128-bit
    adder. Why specialized? The logic is far less complex than a general-purpose adder. For example, a two-bit adder has the formula

    y1 = (!a1&a0) | (!c_in&a1&!a0);
    y0 = (c_in&!a1&!a0) | (!c_in&a1&!a0) | (a1&a0);

    and its propagate and generate bits are

    p = (a1);
    g = (a1&a0);

    The maximum number of trailing zeros after the (3*n+1)/2 step is
    three and can be determined cheaply from bits <3:1> of the input,
    so the input to the shifter (two levels only) can be provided
    in parallel. The machine would be designed that adding and
    shifting can be done in a single cycle.

    One can also prove that certain kinds of carry chains would never
    be a speed path in this calculation and guarantee that both shifter
    and adder are 1 cycle.

    Comments?

    Interesting problem,
    waste of power.
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From David Brown@david.brown@hesbynett.no to comp.arch on Sat Feb 14 20:18:26 2026
    From Newsgroup: comp.arch

    On 14/02/2026 19:48, MitchAlsup wrote:

    Thomas Koenig <tkoenig@netcologne.de> posted:

    Some people are throwing large amounts of computing power at trying
    to disprove the Collatz conjecture. Graphics cards have sped this
    up enormously.

    (A reminder, if any is needed: From a starting number n, the
    transformation, recursively applied,


    f(n) = 3*n+1, if n is odd; =n/2 if n is even.

    is conjectured to always reach 1 for any positive integer n).

    While n/2 remains even, n/2 is guaranteed to converge to 1.
    So, only the odd cases are interesting.

    3*n+1 is always even (since n was odd, and odd*odd is odd and
    odd+1 is even). So, a double iteration is (3*n+1)/2.

    Yes, that is often a micro-optimisation used.


    All the work is done on general-purpose hardware, which has many
    capabilities that are not needed, and consume area and power that
    special-purpose hardware would not need. Also, the hardware
    is limited to 64-bit integers, and the range of tested numbers
    is now up to 2^71, so

    Induction would say it has been proven by then.

    Would you like to re-think that? "Induction" says nothing of the sort.

    The conjecture has been proven true for all numbers up to 2 ^ 71
    (assuming that is the current figure) - it has most definitely /not/
    been proven to be true for all numbers. There are other things proven
    about it, such as the asymptotic rarity of numbers that are exceptions
    to the pattern, but no one has any notion of how to prove it in general.

    Being true up to 2 ^ 71 makes it reasonable to believe it is always
    true, but that is very different from being /proven/. There are plenty
    of mathematical conjectures that are true up to very large numbers, but
    fail later on.


    Comments?

    Interesting problem,
    waste of power.

    It is a mathematically very interesting problem. Whether there is
    anything to be gained by continuing the search for exceptions by brute
    force, is a different matter - but /if/ it is worth doing, then doing so
    more efficiently is a good idea.


    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Sat Feb 14 21:22:15 2026
    From Newsgroup: comp.arch


    David Brown <david.brown@hesbynett.no> posted:

    On 14/02/2026 19:48, MitchAlsup wrote:

    Thomas Koenig <tkoenig@netcologne.de> posted:


    All the work is done on general-purpose hardware, which has many
    capabilities that are not needed, and consume area and power that
    special-purpose hardware would not need. Also, the hardware
    is limited to 64-bit integers, and the range of tested numbers
    is now up to 2^71, so

    Induction would say it has been proven by then.

    Would you like to re-think that? "Induction" says nothing of the sort.

    The conjecture has been proven true for all numbers up to 2 ^ 71
    (assuming that is the current figure) - it has most definitely /not/
    been proven to be true for all numbers. There are other things proven
    about it, such as the asymptotic rarity of numbers that are exceptions
    to the pattern, but no one has any notion of how to prove it in general.

    By proving it up to 2^71, you simultaneously prove that it cannot be
    proven by computer-like arithmetic processes: if the bound were
    2^71,000,000, you still would not have proven it. Thereby, it is not
    provable by simple computer arithmetic processes.

    Ergo, you need a different means.

    Being true up to 2 ^ 71 makes it reasonable to believe it is always
    true, but that is very different from being /proven/. There are plenty
    of mathematical conjectures that are true up to very large numbers, but
    fail later on.

    You cannot even /prove/ it using BigNum computer arithmetic in finite time.


    Comments?

    Interesting problem,
    waste of power.

    It is a mathematically very interesting problem. Whether there is
    anything to be gained by continuing the search for exceptions by brute force, is a different matter - but /if/ it is worth doing, then doing so more efficiently is a good idea.


    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Stephen Fuld@sfuld@alumni.cmu.edu.invalid to comp.arch on Sat Feb 14 13:30:36 2026
    From Newsgroup: comp.arch

    On 2/14/2026 1:22 PM, MitchAlsup wrote:

    David Brown <david.brown@hesbynett.no> posted:

    On 14/02/2026 19:48, MitchAlsup wrote:

    Thomas Koenig <tkoenig@netcologne.de> posted:


    All the work is done on general-purpose hardware, which has many
    capabilities that are not needed, and consume area and power that
    special-purpose hardware would not need. Also, the hardware
    is limited to 64-bit integers, and the range of tested numbers
    is now up to 2^71, so

    Induction would say it has been proven by then.

    Would you like to re-think that? "Induction" says nothing of the sort.

    The conjecture has been proven true for all numbers up to 2 ^ 71
    (assuming that is the current figure) - it has most definitely /not/
    been proven to be true for all numbers. There are other things proven
    about it, such as the asymptotic rarity of numbers that are exceptions
    to the pattern, but no one has any notion of how to prove it in general.

    By proving up to 2^71, you simultaneously prove that it cannot be proven
    by computer-like arithmetic processes. What if the number was 2^71,000,000 you still have not proven it? Thereby, it is not provable by simple computer arithmetic processes.

    While it is certainly true that you can't prove it that way, no
    matter how large a number you get up to, you may be able to
    disprove it, which would be important in and of itself.
    --
    - Stephen Fuld
    (e-mail address disguised to prevent spam)
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.arch on Sat Feb 14 21:19:53 2026
    From Newsgroup: comp.arch

    Thomas Koenig <tkoenig@netcologne.de> writes:

    Comments?

    Amusing. I predict that ultimately it will be a waste of
    effort because the Collatz conjecture is true even though
    currently unproven. Also it seems likely that no mathematical
    insights will result from any sort of brute force approach to
    disprove it, even if the "brute force" scheme is selective.

    If anyone is interested in looking into questions regarding the
    Collatz conjecture, I suggest considering a generalization

    f[k](n) = n/2 if n is even, 3n+k if n is odd, for any fixed odd k

    Empirically you should quickly find

    There is always a cycle starting at k. (Duh!)

    There is only one cycle if k is a power of 3. (This result
    follows from the Collatz conjecture.)

    If k is not a power of 3, there are always at least two cycles,
    and can be more.
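
    These observations are easy to check empirically (a Python sketch;
    `find_cycles` and its start range and overflow cutoff are mine):

```python
# Iterate T(x) = x/2 (x even), 3x+k (x odd) from small starting values
# and collect the distinct cycles reached, identified by their minimum
# element. Trajectories exceeding `limit` are simply skipped.
def find_cycles(k, starts=range(1, 200), limit=10**6):
    cycles = set()
    for n in starts:
        seen = set()
        x = n
        while x not in seen and x < limit:
            seen.add(x)
            x = 3 * x + k if x % 2 else x // 2
        if x in seen:                       # walked into a cycle
            cyc = [x]
            y = 3 * x + k if x % 2 else x // 2
            while y != x:
                cyc.append(y)
                y = 3 * y + k if y % 2 else y // 2
            cycles.add(min(cyc))
    return sorted(cycles)

print(find_cycles(1))   # [1] -- the usual 1 -> 4 -> 2 -> 1 cycle
print(find_cycles(5))   # several cycles, e.g. with minima 1, 5, 19, 23
```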
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Robert Finch@robfi680@gmail.com to comp.arch on Sun Feb 15 03:58:28 2026
    From Newsgroup: comp.arch

    On 2026-02-14 4:57 a.m., Thomas Koenig wrote:
    Comments?

    I wonder how difficult it would be to implement a parallel tester in an
    FPGA. It looks simple enough and there are hundreds of DSP blocks
    available to use. Could test in blocks of hundreds of numbers at a time. Running at 100 MHz * 200 tests at a time = how long to get to 2^71?

    I wonder if the calculation can be broken down further. Numbers are all
    just defined by rules. I wonder if some calculus may apply. Limit of
    function approaches one for a finite number of steps. There may be a
    whole range of algorithms, is there one for the number two? Then the
    number three?

    Occasionally I have made useless machinery. Great puzzle work. Made the
    game of life in hardware.

    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From David Brown@david.brown@hesbynett.no to comp.arch on Sun Feb 15 13:14:21 2026
    From Newsgroup: comp.arch

    On 15/02/2026 09:58, Robert Finch wrote:

    I wonder how difficult it would be to implement a parallel tester in an FPGA. It looks simple enough and there are hundreds of DSP blocks
    available to use. Could test in blocks of hundreds of numbers at a time. Running at 100 MHz * 200 tests at a time = how long to get to 2^71?


    One of the nice things about this is that you don't need DSP blocks -
    it all works well with very simple operations well-suited to an FPGA.
    Your calculation blocks would hold an n-bit number (wider than the
    numbers you are testing, since rounds of the Collatz function can
    grow a lot) and a counter. Initialise the block to the number you
    want to test. For each round, check the LSB. If it is 0, right-shift
    your register. If it is 1, replace "x" with "(x << 1) + x + 1".
    That's just some simple logic - no DSP blocks involved. If your
    register or your counter overflows, you've found an "interesting"
    number and can pass that out to a PC to test it fully.
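
    The per-round update can be modelled behaviourally in a few lines
    (a Python sketch of such a worker block, using 3x+1 for the odd
    case; width and iteration cap are arbitrary choices of mine):

```python
# One worker block: an n-bit register plus an iteration counter; report
# the number as "interesting" (None) if either would overflow.
def run_block(start, width=128, max_iter=1 << 16):
    x, count = start, 0
    while x != 1:
        if x & 1:
            x = (x << 1) + x + 1      # odd: 3x + 1, as shift-and-add
        else:
            x >>= 1                   # even: right-shift
        count += 1
        if x >> width or count >= max_iter:
            return None               # overflow: hand off to the PC
    return count

print(run_block(27))   # 111 iterations, no overflow
```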

    There's lots of scope for both high-level work (the housekeeping and
    tracking of operations for each of your worker elements) and low-level
    work (getting the carry operations to work at high speed).

    I wonder if the calculation can be broken down further. Numbers are all
    just defined by rules. I wonder if some calculus may apply. Limit of function approaches one for a finite number of steps. There may be a
    whole range of algorithms, is there one for the number two? Then the
    number three?

    Occasionally I have made useless machinery. Great puzzle work. Made the
    game of life in hardware.


    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.arch on Sun Feb 15 15:35:26 2026
    From Newsgroup: comp.arch

    On Sun, 15 Feb 2026 03:58:28 -0500
    Robert Finch <robfi680@gmail.com> wrote:

    On 2026-02-14 4:57 a.m., Thomas Koenig wrote:
    Comments?

    I wonder how difficult it would be to implement a parallel tester in
    an FPGA. It looks simple enough and there are hundreds of DSP blocks available to use. Could test in blocks of hundreds of numbers at a
    time. Running at 100 MHz * 200 tests at a time = how long to get to
    2^71?


    I am not totally sure what exactly you are asking, but if you are
    asking how much time it takes at that rate to do 2**71 iterations,
    then the answer is ~3743 years.
    But you probably already figured it out yourself.
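
    The figure checks out (a quick sanity check, my arithmetic, using
    365-day years):

```python
# 100 MHz times 200 parallel tests, one iteration per cycle each.
rate = 100e6 * 200                    # iterations per second
seconds = 2.0**71 / rate
years = seconds / (365 * 24 * 3600)
print(round(years))                   # ~3744, matching the figure above
```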

    Of course, a brute-force demonstration of the correctness of the
    conjecture for all numbers in the range 1 to 2**71 will take more
    steps than 2**71. And it is unknown how much more.

    I wonder if the calculation can be broken down further. Numbers are
    all just defined by rules. I wonder if some calculus may apply. Limit
    of function approaches one for a finite number of steps. There may be
    a whole range of algorithms, is there one for the number two? Then
    the number three?

    Occasionally I have made useless machinery. Great puzzle work. Made
    the game of life in hardware.



    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Sun Feb 15 15:46:27 2026
    From Newsgroup: comp.arch

    David Brown <david.brown@hesbynett.no> schrieb:
    On 15/02/2026 09:58, Robert Finch wrote:

    I wonder how difficult it would be to implement a parallel tester in an
    FPGA. It looks simple enough and there are hundreds of DSP blocks
    available to use. Could test in blocks of hundreds of numbers at a time.
    Running at 100 MHz * 200 tests at a time = how long to get to 2^71?


    One of the nice things about this is that you don't need DSP blocks - it
    all works well with very simple operations well-suited to an FPGA. Your calculation blocks would hold an n-bit number (wider than the number you
    are testing for, since rounds of the Collatz function can grow a lot)
    and a counter. Initialise the block to the number you want to test.
    For each round, check the LSB. If it is 0, right-shift your register.
    If it is 1, replace "x" with "(x << 1) + x + 1". That's just some
    simple logic - no DSP blocks involved. If your register or your counter
    overflow, you've found an "interesting" number and can pass that out to
    a PC to test it fully.
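    In software, one round of such a calculation block might look like this
    (a sketch; the checked arithmetic stands in for the register overflow
    detection described above):

```rust
/// One round of a calculation block: an n-bit register plus an
/// iteration counter. Returns false when the value would overflow,
/// i.e. an "interesting" number to hand to a PC for full testing.
/// (Software sketch; the hardware version is just a shifter and an
/// adder plus overflow detection.)
fn round(x: &mut u128, count: &mut u32) -> bool {
    *count += 1;
    if *x & 1 == 0 {
        *x >>= 1;                              // even: right-shift
        true
    } else {
        // odd: x = 3x + 1 = (x << 1) + x + 1, with overflow check
        match x.checked_mul(3).and_then(|y| y.checked_add(1)) {
            Some(y) => { *x = y; true }
            None => false,
        }
    }
}

/// Run rounds until the value reaches 1; returns the step count.
fn steps_to_one(mut x: u128) -> u32 {
    let mut count = 0;
    while x != 1 {
        assert!(round(&mut x, &mut count), "overflowed u128");
    }
    count
}

fn main() {
    assert_eq!(steps_to_one(27), 111);  // the classic long trajectory
    assert_eq!(steps_to_one(6), 8);
}
```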

    It is even simpler than that - using the optimization of always
    calculating (3*n+1)/2 is ideal for look-up tables. If you have
    six-bit LUTs, you divide your number into five-bit blocks and
    set up your LUTs for generating (3*n+1+carry_in)/2 - five LUTs
    for five result bits. You then also need two LUTs for generating
    the propagate and generate bits for the five-bit groups.

    You can then generate group generate and propagate bits for three
    five-bit groups with two LUTs (15 bits) and so on. Three levels
    of P and G would give you 135 bits, more than enough.
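    A behavioural model of the grouped generate/propagate idea (illustrative
    only, not the exact LUT mapping above): for odd n, (3n+1)/2 is the
    addition n + (n>>1) + 1, split into 5-bit groups whose carries are
    resolved separately from per-group P and G bits.

```rust
/// For odd n, compute (3n+1)/2 as the addition n + (n>>1) + 1 in
/// 5-bit groups: each group precomputes its sum for both possible
/// carry-ins plus generate/propagate bits; the group carries are
/// then resolved on their own.
fn grouped_add(n: u64) -> u64 {
    assert!(n & 1 == 1 && n < 1u64 << 58);
    let (a, b) = (n, n >> 1);
    let groups = 12;                    // 12 * 5 = 60 result bits
    let (mut s0, mut s1) = ([0u64; 12], [0u64; 12]);
    let (mut gen, mut prop) = ([false; 12], [false; 12]);
    for g in 0..groups {
        let ag = (a >> (5 * g)) & 0x1f;
        let bg = (b >> (5 * g)) & 0x1f;
        let t0 = ag + bg;               // group sum, carry-in = 0
        let t1 = ag + bg + 1;           // group sum, carry-in = 1
        s0[g] = t0 & 0x1f;
        s1[g] = t1 & 0x1f;
        gen[g] = t0 >> 5 == 1;          // carries out regardless of cin
        prop[g] = t1 >> 5 == 1;         // carries out when cin = 1
    }
    // Resolve group carries (rippled here; hardware would use a tree).
    let mut y = 0u64;
    let mut carry = true;               // the "+1" enters at bit 0
    for g in 0..groups {
        y |= (if carry { s1[g] } else { s0[g] }) << (5 * g);
        carry = gen[g] || (prop[g] && carry);
    }
    y
}

fn main() {
    for n in (1u64..100_000).step_by(2) {
        assert_eq!(grouped_add(n), (3 * n + 1) / 2);
    }
}
```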

    The comparators could be more expensive; if this is done with LUTs,
    you could only compare three bits each. Maybe this is better
    with the DSP blocks, I don't know.

    There's lots of scope for both high-level work (the housekeeping and tracking of operations for each of your worker elements) and low-level
    work (getting the carry operations to work at high speed).

    Yep.
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Sun Feb 15 19:19:16 2026
    From Newsgroup: comp.arch

    Michael S <already5chosen@yahoo.com> schrieb:

    "Unknown" is a theoretically correct answer, but I can provide a very
    educated guess that on average it would take ~500 steps and the worst
    case would not exceed 2000 steps.
    So, the bad news is that overall it will take more than a million
    years on your machine. And the good news is that it'll take less
    than 2 million years.

    Not as bad as that; many simplifications are possible.

    For one, you can discount all even numbers as starting values.
    You can go further: if you look at residues modulo 2^n of your
    starting numbers, there are only A076227(n) "interesting" remainders
    that you need to look at. If you take n=39, only 3611535862 of the
    2^39 residues need to be looked at, a reduction by a factor of
    around 150. You can do some more reductions (only 2/3 of those
    numbers need checking if you can calculate the remainder modulo 3
    fast). There are different tradeoffs for different n, of course
    (a factor of 16 for n=10, for example).
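    The residue sieve behind those counts is easy to reproduce (a sketch;
    the small counts below were checked by hand, and the n=10 case shows
    the factor-16 reduction mentioned above):

```rust
/// Count residues r mod 2^n that survive n steps of the map T
/// (odd: (3x+1)/2, even: x/2), tracking x = 2^n * k + r as the
/// affine form a*k + b (the parity is fixed by b). A residue is
/// discarded as soon as a < 2^n: from then on x sits below its
/// starting value for all large k, so smaller numbers cover it.
fn surviving_residues(n: u32) -> u64 {
    let m = 1u64 << n;
    let mut count = 0;
    for r in 0..m {
        let (mut a, mut b) = (m, r);
        let mut survives = true;
        for _ in 0..n {
            if b & 1 == 1 {
                a = 3 * a / 2;          // a stays even until the last step
                b = (3 * b + 1) / 2;
            } else {
                a /= 2;
                b /= 2;
            }
            if a < m {
                survives = false;
                break;
            }
        }
        if survives { count += 1 }
    }
    count
}

fn main() {
    // First few counts: 1, 1, 2, 3, 4, 8, ... (cf. OEIS A076227)
    let counts: Vec<u64> = (1..=6).map(surviving_residues).collect();
    assert_eq!(counts, vec![1, 1, 2, 3, 4, 8]);
    // n = 10: 64 of 1024 residues survive -- the factor-16 reduction.
    assert_eq!(surviving_residues(10), 64);
}
```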

    The main key is parallelization.
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Sun Feb 15 19:42:47 2026
    From Newsgroup: comp.arch

    Terje Mathisen <terje.mathisen@tmsw.no> schrieb:

    Just thinking about a way to replace ~N iterations by a MUL, ADD and SHR

    Even with 128-bit numbers, I would only need two MULs, two ADD/ADC pairs
    and a SHRD/SHR, so maybe 8-10 cycles?

    https://link.springer.com/article/10.1007/s11227-025-07337-0 gives
    an improved algorithm, which they are using for the supercomputer
    version. They are doing:

    n = n0
    repeat
        n = n + 1
        alpha = ctz(n)
        n = n * 3^alpha / 2^alpha
        n = n - 1
        beta = ctz(n)
        n = n / 2^beta
    until n < n_0

    They are also playing some strange tricks with parallel solving that
    are not described clearly, or I am not clever enough to understand
    what they are doing. I would have to look at their code, I guess.
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Terje Mathisen@terje.mathisen@tmsw.no to comp.arch on Sun Feb 15 22:40:22 2026
    From Newsgroup: comp.arch

    MitchAlsup wrote:

    Robert Finch <robfi680@gmail.com> posted:

    On 2026-02-15 7:14 a.m., David Brown wrote:
    On 15/02/2026 09:58, Robert Finch wrote:

    I wonder how difficult it would be to implement a parallel tester in
    an FPGA. It looks simple enough and there are hundreds of DSP blocks
    available to use. Could test in blocks of hundreds of numbers at a time.
    Running at 100 MHz * 200 tests at a time = how long to get to 2^71?


    One of the nice things about this is that you don't need DSP blocks - it
    Yeah, I figured that out after coding. 3n+1 = n+n+n+1

    3n+1 = (n<<1)+n+1

    one add with carry in and n gets wired in directly and also shifted
    up by 1 (no logic shifter).

    Even better:

    (3n+1)/2 = n + (n>>1) + 1

    I.e. two iterations for the cost of one.
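    A quick exhaustive check of that identity for odd n:

```rust
// For odd n, one 3n+1 step plus the guaranteed halving fuse into a
// single add: (3n+1)/2 == n + (n >> 1) + 1.
fn fused_step(n: u64) -> u64 {
    debug_assert!(n & 1 == 1);
    n + (n >> 1) + 1
}

fn main() {
    for n in (1u64..1_000_000).step_by(2) {
        assert_eq!(fused_step(n), (3 * n + 1) / 2);
    }
}
```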

    Terje
    --
    - <Terje.Mathisen at tmsw.no>
    "almost all programming can be viewed as an exercise in caching"
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Terje Mathisen@terje.mathisen@tmsw.no to comp.arch on Sun Feb 15 22:59:19 2026
    From Newsgroup: comp.arch

    Thomas Koenig wrote:
    Terje Mathisen <terje.mathisen@tmsw.no> schrieb:

    Just thinking about a way to replace ~N iterations by a MUL, ADD and SHR

    Even with 128-bit numbers, I would only need two MULs, two ADD/ADC pairs
    and a SHRD/SHR, so maybe 8-10 cycles?

    https://link.springer.com/article/10.1007/s11227-025-07337-0 gives
    an improved algorithm, which they are using for the supercomputer
    version. They are doing:

    n = n0
    repeat
    n = n + 1
    alpha = ctz(n)
    n = n * 3^alpha / 2^alpha

    Strange syntax...

    ctz(n) must count trailing zeroes, right?
    If so, then ctz(n+1) will return 1 or more for odd n, so that gives
    3n+3, which after the division by 2 returns a value which is one too high.

    n = n - 1

    but this is corrected here!

    The key must be that the multiplication by a power of three also works
    for n with multiple trailing 1 bits.

    The "x/2^alpha" part is of course just x>>alpha.

    For an even n they still do a multiplication by 1 and a SHR 0, but the
    code is branchless.

    beta = ctz(n)
    n = n / 2^beta
    until n < n_0

    I am guessing that the best way to do the n*3^alpha parts is a table of
    powers of 3, so

    let mut n = n0;
    loop {
        n += 1;
        let alpha = ctz(n);
        n = (n * power_of_three[alpha]) >> alpha;
        n -= 1;
        let beta = ctz(n);
        n >>= beta;
        if n < n0 { break };
    }
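    A runnable version of that sketch (with ctz spelled as Rust's
    trailing_zeros) can be checked against the naive iteration. One subtle
    point: the two loops do not stop on exactly the same value, because the
    accelerated form shifts out all trailing zeros, so it lands on the odd
    value at the end of the halving run in which the naive loop first drops
    below n0.

```rust
/// One pass of the paper's accelerated descent: a run of odd steps
/// (via the trailing zeros of n+1 and a power-of-three multiply)
/// followed by a run of halvings, until n drops below the start n0.
fn accelerated_min(n0: u128) -> u128 {
    let mut pow3 = [1u128; 81];                  // 3^0 .. 3^80 fits in u128
    for i in 1..81 { pow3[i] = 3 * pow3[i - 1]; }
    let mut n = n0;
    loop {
        n += 1;
        let alpha = n.trailing_zeros() as usize;
        n = (n >> alpha) * pow3[alpha];          // n * 3^alpha / 2^alpha
        n -= 1;
        n >>= n.trailing_zeros();
        if n < n0 { return n; }
    }
}

/// Naive reference: apply the Collatz map until the value drops below n0.
fn naive_min(n0: u128) -> u128 {
    let mut n = n0;
    loop {
        n = if n & 1 == 1 { 3 * n + 1 } else { n / 2 };
        if n < n0 { return n; }
    }
}

fn main() {
    for n0 in (3u128..20_001).step_by(2) {
        let v = naive_min(n0);                   // may still be even
        assert_eq!(accelerated_min(n0), v >> v.trailing_zeros());
    }
}
```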


    They are also playing some strange tricks with parallel solving that
    are not described clearly, or I am not clever enough to understand
    what they are doing. I would have to look at their code, I guess.

    :-)

    Terje
    --
    - <Terje.Mathisen at tmsw.no>
    "almost all programming can be viewed as an exercise in caching"
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Sun Feb 15 22:41:40 2026
    From Newsgroup: comp.arch

    Robert Finch <robfi680@gmail.com> writes:
    I wonder how difficult it would be to implement a parallel tester in an
    FPGA. It looks simple enough and there are hundreds of DSP blocks
    available to use. Could test in blocks of hundreds of numbers at a time.
    Running at 100 MHz * 200 tests at a time = how long to get to 2^71?

    OTOH, with a PC with 8 cores running at 4GHZ, even with scalar
    operations you might outperform that. And with SIMD probably even
    more. The main problem is that support for 128-bit integers is not so
    great for scalar instructions and worse for SIMD instructions.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Robert Finch@robfi680@gmail.com to comp.arch on Sun Feb 15 21:53:10 2026
    From Newsgroup: comp.arch

    On 2026-02-15 4:59 p.m., Terje Mathisen wrote:
    Thomas Koenig wrote:
    Terje Mathisen <terje.mathisen@tmsw.no> schrieb:

    Just thinking about a way to replace ~N iterations by a MUL, ADD and SHR

    Even with 128-bit numbers, I would only need two MULs, two ADD/ADC pairs
    and a SHRD/SHR, so maybe 8-10 cycles?

    https://link.springer.com/article/10.1007/s11227-025-07337-0 gives
    an improved algorithm, which they are using for the supercomputer
    version. They are doing:

    n = n0
    repeat
        n = n + 1
        alpha = ctz(n)
        n = n * 3^alpha / 2^alpha

    Strange syntax...

    ctz(n) must count trailing zeroes, right?
    If so, then ctz(n+1) will return 1 or more for odd n, so that gives
    3n+3, which after the division by 2 returns a value which is one too high.

        n = n - 1

    but this is corrected here!

    The key must be that the multiplication by a power of three also works
    for n with multiple trailing 1 bits.

    The "x/2^alpha" part is of course just x>>alpha.

    For an even n they still do a multiplication by 1 and a SHR 0, but the
    code is branchless.

        beta = ctz(n)
        n = n / 2^beta
    until n < n_0

    I am guessing that the best way to do the n*3^alpha parts is a table of powers of 3, so

    let mut n = n0;
    loop {
        n += 1;
        let alpha = ctz(n);
        n = (n * power_of_three[alpha]) >> alpha;
        n -= 1;
        let beta = ctz(n);
        n >>= beta;
        if n < n0 { break };
    }


    They are also playing some strange tricks with parallel solving that
    are not described clearly, or I am not clever enough to understand
    what they are doing. I would have to look at their code, I guess.

    :-)

    Terje

    How many iterations are done in one loop? I think it would need to be
    at least 10 on average to beat the simple approach, since the simple
    approach takes only one clock per iteration. The 128-bit multiply and
    128-bit count trailing zeros are slow in an FPGA.

    And, can the approach consume a more limited number of iterations? For
    example counting only the 6 LSB zeros?

    I found a mistake in my code handling groups. It can only do about a
    24x24=576 matrix of calculations in parallel @ 200MHz. So that's 1/4
    million years?




    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.arch on Mon Feb 16 12:31:31 2026
    From Newsgroup: comp.arch

    On Sun, 15 Feb 2026 19:19:16 -0000 (UTC)
    Thomas Koenig <tkoenig@netcologne.de> wrote:

    Michael S <already5chosen@yahoo.com> schrieb:

    "Unknown" is a correct theoretically answer, but I can provide a
    very educated guess that on average it would take ~500 steps and
    the worst case would not exceed 2000 steps.
    So, bad news are that overall it will take more than a million
    years on your machine. And good news are that it'll take less than
    2 million years.

    Not as bad, very many simplifications are possible.

    For one, you can discount all even numbers as starting values.

    O.k.

    You can go further: If you look at residues modulo 2^n of your
    starting numbers, there are only A076227(n) "interesting" remainders
    that you need to look at.

    Can not say that I understood.

    If you take n=39, only 3611535862 cases
    need to be looked at, so only 3611535862/2^39 of all cases need to
    be looked at, a reduction of a factor around 150. You can also
    do some more reductions (only 2/3 of those numbers need checking
    if you can calculate the remainder by 3 fast). There are different
    tradeoffs for different n, of course (a factor of 16 for n=10,
    for example).



    The estimate above was done for a loop that finishes at 1.
    For a brute-force disproof of the Collatz conjecture that is
    unnecessary: we can stop as soon as n < n0. I think you mentioned it
    in your original post, but I didn't pay attention.
    Due to the early exit, the mean number of iterations is cut from
    ~500 to 7, or at worst 8.

    The main key is parallelization.

    And here is a problem.
    If one wants to finish in, say, 1 month, one would need a LOT of
    parallelization. Even after all improvements we are talking about
    several million testing nodes. At such a scale I can think of two
    main problems:
    1. The dispatcher becomes a bottleneck.
    2. Some nodes will fail. That has to be detected and their jobs
    re-allocated, which further complicates the scheduler.

    It does not sound particularly hard if somebody pays for the
    engineering, especially if the whole job is done in one dedicated
    data center. Otherwise it does not sound easy.

    The worst part is that there is nothing special about 2**71.
    Or about 2**80, for that matter. Tim Reintch already said that better
    than I can. The machine is not merely useless, it is pointless.

    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.arch on Mon Feb 16 12:40:54 2026
    From Newsgroup: comp.arch

    On Sun, 15 Feb 2026 21:53:10 -0500
    Robert Finch <robfi680@gmail.com> wrote:
    On 2026-02-15 4:59 p.m., Terje Mathisen wrote:
    Thomas Koenig wrote:
    Terje Mathisen <terje.mathisen@tmsw.no> schrieb:

    Just thinking about a way to replace ~N iterations by a MUL, ADD
    and SHR

    Even with 128-bit numbers, I would only need two MULs, two
    ADD/ADC pairs and a SHRD/SHR, so maybe 8-10 cycles?

    https://link.springer.com/article/10.1007/s11227-025-07337-0 gives
    an improved algorithm, which they are using for the supercomputer
    version. They are doing:

    n = n0
    repeat
        n = n + 1
        alpha = ctz(n)
        n = n * 3^alpha / 2^alpha

    Strange syntax...

    ctz(n) must count trailing zeroes, right?
    If so, then ctz(n+1) will return 1 or more for odd n, so that gives
    3n+3, which after the division by 2 returns a value which is one
    too high.
        n = n - 1

    but this is corrected here!

    The key must be that the multiplication by a power of three also
    works for n with multiple trailing 1 bits.

    The "x/2^alpha" part is of course just x>>alpha.

    For an even n they still do a multiplication by 1 and a SHR 0, but
    the code is branchless.

        beta = ctz(n)
        n = n / 2^beta
    until n < n_0

    I am guessing that the best way to do the n*3^alpha parts is a
    table of powers of 3, so

    let mut n = n0;
    loop {
        n += 1;
        let alpha = ctz(n);
        n = (n * power_of_three[alpha]) >> alpha;
        n -= 1;
        let beta = ctz(n);
        n >>= beta;
        if n < n0 { break };
    }


    They are also playing some strange tricks with parallel solving
    that are not described clearly, or I am not clever enough to
    understand what they are doing. I would have to look at their
    code, I guess.
    :-)

    Terje

    How many iterations are done in one loop?
    Worst case: a few thousand, or less; I haven't yet encountered 1000.
    Mean: 6 or 7. Maybe 8, but that's unlikely.
    I think it would need to be
    at least 10 on average to beat the simple approach. The simple
    approach only taking one clock per iteration. The 128-bit multiply
    and 128-bit count trailing zeros are slow in an FPGA.

    And, can the approach consume a more limited number of iterations?
    For example counting only the 6 LSB zeros?

    I found a mistake in my code handling groups. It can only do about a 24x24=576 matrix of calculations in parallel @ 200MHz. So that's 1/4
    million years?

    Much less.
    My 1st estimate was for a more stupid loop that always counts down to 1.
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Terje Mathisen@terje.mathisen@tmsw.no to comp.arch on Mon Feb 16 16:08:35 2026
    From Newsgroup: comp.arch

    Robert Finch wrote:
    On 2026-02-15 4:59 p.m., Terje Mathisen wrote:
    Thomas Koenig wrote:
    Terje Mathisen <terje.mathisen@tmsw.no> schrieb:

    Just thinking about a way to replace ~N iterations by a MUL, ADD and
    SHR

    Even with 128-bit numbers, I would only need two MULs, two ADD/ADC
    pairs
    and a SHRD/SHR, so maybe 8-10 cycles?

    https://link.springer.com/article/10.1007/s11227-025-07337-0 gives
    an improved algorithm, which they are using for the supercomputer
    version. They are doing:

    n = n0
    repeat
        n = n + 1
        alpha = ctz(n)
        n = n * 3^alpha / 2^alpha

    Strange syntax...

    ctz(n) must count trailing zeroes, right?
    If so, then ctz(n+1) will return 1 or more for odd n, so that gives
    3n+3, which after the division by 2 returns a value which is one too
    high.

        n = n - 1

    but this is corrected here!

    The key must be that the multiplication by a power of three also works
    for n with multiple trailing 1 bits.

    The "x/2^alpha" part is of course just x>>alpha.

    For an even n they still do a multiplication by 1 and a SHR 0, but the
    code is branchless.

        beta = ctz(n)
        n = n / 2^beta
    until n < n_0

    I am guessing that the best way to do the n*3^alpha parts is a table
    of powers of 3, so

    let mut n = n0;
    loop {
        n += 1;
        let alpha = ctz(n);
        n = (n * power_of_three[alpha]) >> alpha;
        n -= 1;
        let beta = ctz(n);
        n >>= beta;
        if n < n0 { break };
    }


    They are also playing some strange tricks with parallel solving that
    are not described clearly, or I am not clever enough to understand
    what they are doing. I would have to look at their code, I guess.

    :-)

    Terje

    How many iterations are done in one loop? I think it would need to be
    at least 10 on average to beat the simple approach, since the simple
    approach takes only one clock per iteration. The 128-bit multiply and
    128-bit count trailing zeros are slow in an FPGA.
    Right.
    The first part will handle trailing strings of 1's, while the second
    part does the same for all trailing zeroes after that initial mul_by_power_of_three.

    And, can the approach consume a more limited number of iterations? For
    example counting only the 6 LSB zeros?
    It should work directly if you cap the shift count, i.e. with a
    min(ctz, N) wrapper.

    I found a mistake in my code handling groups. It can only do about a
    24x24=576 matrix of calculations in parallel @ 200MHz. So that's 1/4
    million years?
    FPGAs typically run at much, much lower frequencies than modern CPUs, so if the code above can average 3-5 iterations per loop, and do it in
    around 10 clock cycles, then the FPGA will be hard pressed to keep up?
    Terje
    --
    - <Terje.Mathisen at tmsw.no>
    "almost all programming can be viewed as an exercise in caching"
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Stephen Fuld@sfuld@alumni.cmu.edu.invalid to comp.arch on Mon Feb 16 07:50:49 2026
    From Newsgroup: comp.arch

    On 2/14/2026 1:57 AM, Thomas Koenig wrote:
    Some people are throwing large amounts of computing power at trying
    to disprove the Collatz conjecture. Graphics cards have sped this
    up enormously.

    big snip


    The number of iterations would be tracked; if larger than a prevous
    record, the calculation would terminate with a corresponding signal
    to the outside world.

    Also, each cycle, the result from the previous round would be
    compared against the input value. If it is lower, the next number
    would be chosen. Finally, the number reached would be checked against
    the previous, pre-set maximum, with would also signal to the outside
    world.

    Hold these thoughts

    snip

    The problem is embarrassingly parallel.


    Sort of. You could do it that way, but you would be giving up some performance. Specifically, as you say, you want to check the number
    against the previous input value and the preset maximum. But in a
    parallel implementation, you want to check against the largest value
    checked *across all the parallel threads*. In order to do that, the
    threads need to communicate, thus violating the "embarrassingly
    parallel" rule. I don't know how much performance you would give up so
    it may not make much difference, nor do I know the cost of keeping some "global" largest value checked so far. There is a tradeoff here.
    --
    - Stephen Fuld
    (e-mail address disguised to prevent spam)
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.arch on Mon Feb 16 18:31:46 2026
    From Newsgroup: comp.arch

    On Mon, 16 Feb 2026 07:50:49 -0800
    Stephen Fuld <sfuld@alumni.cmu.edu.invalid> wrote:

    On 2/14/2026 1:57 AM, Thomas Koenig wrote:
    Some people are throwing large amounts of computing power at trying
    to disprove the Collatz conjecture. Graphics cards have sped this
    up enormously.

    big snip


    The number of iterations would be tracked; if larger than a prevous
    record, the calculation would terminate with a corresponding signal
    to the outside world.

    Also, each cycle, the result from the previous round would be
    compared against the input value. If it is lower, the next number
    would be chosen. Finally, the number reached would be checked
    against the previous, pre-set maximum, with would also signal to
    the outside world.

    Hold these thoughts

    snip

    The problem is embarrassingly parallel.


    Sort of. You could do it that way, but you would be giving up some performance. Specifically, as you say, you want to check the number
    against the previous input value and the preset maximum. But in a
    parallel implementation, you want to check against the largest value
    checked *across all the parallel threads*. In order to do that, the
    threads need to communicate, thus violating the "embarrassingly
    parallel" rule. I don't know how much performance you would give up
    so it may not make much difference,

    If the initial values N0 are above 2**64, and the number of tests NP
    that run in parallel is a few million or even a few hundred million,
    then the performance gain achieved by comparing against N0+NP instead
    of against N0 is negligible.

    nor do I know the cost of keeping
    some "global" largest value checked so far. There is a tradeoff here.


    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.arch on Mon Feb 16 20:34:44 2026
    From Newsgroup: comp.arch

    On Mon, 16 Feb 2026 12:40:54 +0200
    Michael S <already5chosen@yahoo.com> wrote:
    On Sun, 15 Feb 2026 21:53:10 -0500
    Robert Finch <robfi680@gmail.com> wrote:

    On 2026-02-15 4:59 p.m., Terje Mathisen wrote:
    Thomas Koenig wrote:
    Terje Mathisen <terje.mathisen@tmsw.no> schrieb:

    Just thinking about a way to replace ~N iterations by a MUL, ADD
    and SHR

    Even with 128-bit numbers, I would only need two MULs, two
    ADD/ADC pairs and a SHRD/SHR, so maybe 8-10 cycles?

    https://link.springer.com/article/10.1007/s11227-025-07337-0
    gives an improved algorithm, which they are using for the
    supercomputer version. They are doing:

    n = n0
    repeat
        n = n + 1
        alpha = ctz(n)
        n = n * 3^alpha / 2^alpha

    Strange syntax...

    ctz(n) must count trailing zeroes, right?
    If so, then ctz(n+1) will return 1 or more for odd n, so that
    gives 3n+3, which after the division by 2 returns a value which
    is one too high.
        n = n - 1

    but this is corrected here!

    The key must be that the multiplication by a power of three also
    works for n with multiple trailing 1 bits.

    The "x/2^alpha" part is of course just x>>alpha.

    For an even n they still do a multiplication by 1 and a SHR 0, but
    the code is branchless.

        beta = ctz(n)
        n = n / 2^beta
    until n < n_0

    I am guessing that the best way to do the n*3^alpha parts is a
    table of powers of 3, so

    let mut n = n0;
    loop {
        n += 1;
        let alpha = ctz(n);
        n = (n * power_of_three[alpha]) >> alpha;
        n -= 1;
        let beta = ctz(n);
        n >>= beta;
        if n < n0 { break };
    }


    They are also playing some strange tricks with parallel solving
    that are not described clearly, or I am not clever enough to
    understand what they are doing. I would have to look at their
    code, I guess.
    :-)

    Terje

    How many iterations are done in one loop?

    Worst case: a few thousand, or less; I haven't yet encountered 1000.
    Mean: 6 or 7. Maybe 8, but that's unlikely.

    The above mean value is for completely unfiltered inputs.
    If we apply a very basic filter, only looking at inputs of the form
    4*x+3, then the average # of steps is bigger - somewhere in the
    range [15:20]. So, for filtered inputs, higher-order steps start to
    make sense. Or not. It strongly depends on the specific FPGA model.
    I think it would need to be
    at least 10 on average to beat the simple approach. The simple
    approach only taking one clock per iteration. The 128-bit multiply
    and 128-bit count trailing zeros are slow in an FPGA.

    And, can the approach consume a more limited number of iterations?
    For example counting only the 6 LSB zeros?

    I found a mistake in my code handling groups. It can only do about
    a 24x24=576 matrix of calculations in parallel @ 200MHz. So that's
    1/4 million years?


    Much less.
    My 1st estimate was for a more stupid loop that always counts down to 1.



    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.arch on Tue Feb 17 15:41:23 2026
    From Newsgroup: comp.arch

    On Mon, 16 Feb 2026 12:31:31 +0200
    Michael S <already5chosen@yahoo.com> wrote:

    On Sun, 15 Feb 2026 19:19:16 -0000 (UTC)
    Thomas Koenig <tkoenig@netcologne.de> wrote:

    Michael S <already5chosen@yahoo.com> schrieb:

    "Unknown" is a correct theoretically answer, but I can provide a
    very educated guess that on average it would take ~500 steps and
    the worst case would not exceed 2000 steps.
    So, bad news are that overall it will take more than a million
    years on your machine. And good news are that it'll take less than
    2 million years.

    Not as bad, very many simplifications are possible.

    For one, you can discount all even numbers as starting values.

    O.k.

    You can go further: If you look at residues modulo 2^n of your
    starting numbers, there are only A076227(n) "interesting" remainders
    that you need to look at.

    Can not say that I understood.

    If you take n=39, only 3611535862 cases
    need to be looked at, so only 3611535862/2^39 of all cases need to
    be looked at, a reduction of a factor around 150. You can also
    do some more reductions (only 2/3 of those numbers need checking
    if you can calculate the remainder by 3 fast). There are different tradeoffs for different n, of course (a factor of 16 for n=10,
    for example).



    Hopefully now I have figured out what you are talking about, except
    for the 2/3 part and the meaning of A076227(n).
    I experimented a little with these ideas and it seems that the
    reduction in the number of steps is nowhere near the reduction in
    the number of tested cases.
    My test vectors were 2**32 consecutive points starting at 2**52 or
    2**53. I tried different modulo filters in the range from 2**5 to
    2**31. Two modes of filtering were used.
    In "simple" mode I just filtered out "non-interesting" inputs, but
    did nothing special about "interesting" inputs.
    In "advanced" mode I accelerated processing of "interesting" inputs
    by starting their testing from the point x0 = (x/M)*A[x%M] + B[x%M],
    where the tables A[] and B[] were pre-calculated during the filter
    build step. Obviously, "advanced" mode filtering requires ~3 times
    more memory, plus a multiplier, plus some logic.
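    A sketch of such a filter build, under the assumption that A[] and B[]
    have the natural affine-sieve semantics consistent with the x0 formula
    above: for each surviving residue m mod M = 2^n, n steps of the map T
    send M*k + m to A[m]*k + B[m], and eliminated residues get no entry.

```rust
/// Build the assumed "advanced" filter tables: for each residue r mod
/// M = 2^n, track M*k + r as the affine form a*k + b through n steps
/// of T (odd: (3x+1)/2, even: x/2). Residues whose coefficient drops
/// below M are filtered out (None); survivors get their (A, B) pair.
fn build_filter(n: u32) -> Vec<Option<(u64, u64)>> {
    let m = 1u64 << n;
    (0..m).map(|r| {
        let (mut a, mut b) = (m, r);
        for _ in 0..n {
            if b & 1 == 1 {
                a = 3 * a / 2;
                b = (3 * b + 1) / 2;
            } else {
                a /= 2;
                b /= 2;
            }
            if a < m { return None }    // drops below start: filtered
        }
        Some((a, b))
    }).collect()
}

fn t_step(x: u64) -> u64 {
    if x & 1 == 1 { (3 * x + 1) / 2 } else { x / 2 }
}

fn main() {
    let n = 10;
    let m = 1u64 << n;
    let filter = build_filter(n);
    // Only the "interesting" residues get a table entry.
    assert_eq!(filter.iter().flatten().count(), 64);
    // For surviving residues, the table jump equals n naive T-steps.
    for x in (1u64 << 40)..(1u64 << 40) + 4096 {
        if let Some((a, b)) = filter[(x % m) as usize] {
            assert_eq!((x / m) * a + b, (0..n).fold(x, |v, _| t_step(v)));
        }
    }
}
```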

    Here are my results:
    $ ./nsteps2 4503599627370496 0x100000000
    Initial value = 4503599627370496 (0010000000000000).
    #values = 4294967296 (0000000100000000)
    Unfiltered : 22501638882 steps. 4294967296 tests. 5.239 steps/iter
    n*4+3 : 17132929762 steps. 1073741824 tests. 15.956 steps/iter
    n*2**5+m : 15924970210 steps. 939524096 tests. 16.950 steps/iter
    n*2**5+m * : 8408777442 steps. 939524096 tests. 8.950 steps/iter
    n*2**10+m : 11948770018 steps. 503316480 tests. 23.740 steps/iter
    n*2**10+m * : 7088288929 steps. 503316480 tests. 14.083 steps/iter
    n*2**15+m : 9003713250 steps. 289013760 tests. 31.153 steps/iter
    n*2**15+m * : 4818894105 steps. 289013760 tests. 16.674 steps/iter
    n*2**20+m : 6973854434 steps. 181379072 tests. 38.449 steps/iter
    n*2**20+m * : 3496505674 steps. 181379072 tests. 19.277 steps/iter
    n*2**25+m : 5574414306 steps. 125809408 tests. 44.308 steps/iter
    n*2**25+m * : 2543845543 steps. 125809408 tests. 20.220 steps/iter
    n*2**29+m : 4660796778 steps. 95710352 tests. 48.697 steps/iter
    n*2**29+m * : 2008917951 steps. 95710352 tests. 20.990 steps/iter
    n*2**30+m : 4493358006 steps. 89757100 tests. 50.061 steps/iter
    n*2**30+m * : 1915335033 steps. 89757100 tests. 21.339 steps/iter
    n*2**31+m : 4212222570 steps. 81419608 tests. 51.735 steps/iter
    n*2**31+m * : 1780554493 steps. 81419608 tests. 21.869 steps/iter

    $ ./nsteps2 0x20000000000000 0x100000000
    Initial value = 9007199254740992 (0020000000000000).
    #values = 4294967296 (0000000100000000)
    Unfiltered : 22501540484 steps. 4294967296 tests. 5.239 steps/iter
    n*4+3 : 17132831364 steps. 1073741824 tests. 15.956 steps/iter
    n*2**5+m : 15924871812 steps. 939524096 tests. 16.950 steps/iter
    n*2**5+m * : 8408679044 steps. 939524096 tests. 8.950 steps/iter
    n*2**10+m : 11948671620 steps. 503316480 tests. 23.740 steps/iter
    n*2**10+m * : 7088271089 steps. 503316480 tests. 14.083 steps/iter
    n*2**15+m : 9003614852 steps. 289013760 tests. 31.153 steps/iter
    n*2**15+m * : 4818766141 steps. 289013760 tests. 16.673 steps/iter
    n*2**20+m : 6973756036 steps. 181379072 tests. 38.449 steps/iter
    n*2**20+m * : 3496390679 steps. 181379072 tests. 19.277 steps/iter
    n*2**25+m : 5574315908 steps. 125809408 tests. 44.308 steps/iter
    n*2**25+m * : 2543741535 steps. 125809408 tests. 20.219 steps/iter
    n*2**29+m : 4660698380 steps. 95710352 tests. 48.696 steps/iter
    n*2**29+m * : 2008935493 steps. 95710352 tests. 20.990 steps/iter
    n*2**30+m : 4493259608 steps. 89757100 tests. 50.060 steps/iter
    n*2**30+m * : 1915241920 steps. 89757100 tests. 21.338 steps/iter
    n*2**31+m : 4212124172 steps. 81419608 tests. 51.734 steps/iter
    n*2**31+m * : 1780581916 steps. 81419608 tests. 21.869 steps/iter
    'Advanced' mode marked with '*'

    As you can see, the 2**10 filter reduces the # of processed points
    by a factor of 8.5, but the number of processing steps is reduced
    only 1.9x in simple mode and 3.2x in advanced mode.
    For the 2**20 filter the ratios are, respectively, 23.7, 3.2 and 6.4.
    For the 2**31 filter, 52.7, 5.3 and 12.6.

    Note that for implementation in an FPGA, even filtering modulo 2**29
    (~12M entries) is impractical. Assuming advanced mode, 12M entries
    occupy 1000 Mbits. In an inexpensive FPGA that simply does not fit.
    In a very expensive FPGA it fits, but barely.

    Let's take, for example, the Altera Stratix-10 GX 2800. By the
    standards of the previous decade it is a real behemoth, with 2.75M
    Logic Elements (LEs). Assuming that each tester unit occupies 500
    LEs, there is room for 5500 units. Looking at my tables above, you
    want 1 filter per ~20 tester units, so 275 filters are needed. And
    how many are available? Just two!

    For modulo 2**20 it's a little better. Each filter consists of
    ~45,000 entries and occupies ~2.7 Mbits, so you can fit 80+ filters
    in the GX 2800. But you want 5500 / 19 = 290, so it is still
    unbalanced.

    For modulo 2**15 (2205 entries, ~100 Kbits) there is enough memory
    to fit the desired number of replicas. But a small filter like that
    is significantly less effective, reducing the number of steps (in
    advanced mode) only by 4.7x.

    It is possible that there exist smarter algorithms for partitioning
    the work between parallel tester units that could greatly reduce the
    need for access to the filtering tables relative to the simple
    scheme I assumed in the estimates above. But I think the point
    stands - this sort of filtering is not a silver bullet. It can
    improve throughput, but it is unlikely to improve it by more than
    one order of magnitude.









    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.arch on Tue Feb 17 18:17:10 2026
    From Newsgroup: comp.arch

    On Tue, 17 Feb 2026 15:41:23 +0200
    Michael S <already5chosen@yahoo.com> wrote:

    On Mon, 16 Feb 2026 12:31:31 +0200
    Michael S <already5chosen@yahoo.com> wrote:

    On Sun, 15 Feb 2026 19:19:16 -0000 (UTC)
    Thomas Koenig <tkoenig@netcologne.de> wrote:

    Michael S <already5chosen@yahoo.com> schrieb:

    "Unknown" is a correct theoretically answer, but I can provide a
    very educated guess that on average it would take ~500 steps and
    the worst case would not exceed 2000 steps.
    So, the bad news is that overall it will take more than a million
    years on your machine. And the good news is that it'll take less
    than 2 million years.

    Not that bad - very many simplifications are possible.

    For one, you can discount all even numbers as starting values.

    O.k.

    You can go further: if you look at residues modulo 2^n of your
    starting numbers, there are only A076227(n) "interesting"
    remainders that you need to look at.

    I can't say that I understood.

    If you take n=39, only 3611535862 cases need to be looked at, so
    only 3611535862/2^39 of all cases need to be considered, a
    reduction by a factor of around 150. You can also do some more
    reductions (only 2/3 of those numbers need checking if you can
    calculate the remainder modulo 3 fast). There are different
    tradeoffs for different n, of course (a factor of 16 for n=10, for
    example).



    Hopefully I have now figured out what you are talking about, except
    for the 2/3 part and the meaning of A076227(n).
    I experimented a little with these ideas, and it seems that the
    reduction in the number of steps is nowhere near the reduction in
    the number of tested cases. My test vectors were 2**32 consecutive
    points starting at 2**52 or 2**53. I tried different modulo filters
    in the range from 2**5 to 2**31. Two modes of filtering were used.
    In "simple" mode I just filtered out "non-interesting" inputs, but
    did nothing special about "interesting" inputs.
    In "advanced" mode I accelerated processing of "interesting" inputs
    by starting their testing from the point x0 = (x/M)*A[x%M] + B[x%M],
    where the tables A[] and B[] were pre-calculated during the filter
    build step. Obviously, "advanced" mode filtering requires ~3 times
    more memory, plus a multiplier plus some logic.
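    As a sketch of how such a filter could be built (illustrative
    Python of my own, not the actual nsteps2 code): write a starting
    value as x = q*M + m with M = 2**k, and apply k steps of the compact
    map T(x) = x/2 (x even), (3x+1)/2 (x odd). After j steps containing
    c odd steps, x becomes q*(2**(k-j))*(3**c) + T^j(m), so a residue m
    can be dropped as soon as 3**c < 2**j (the value has fallen below
    its start for every large enough q); each surviving residue gets
    A[m] = 3**c and B[m] = T^k(m).

    ```python
    def build_filter(k):
        """Return {m: (A, B)} for the surviving residues modulo M = 2**k.
        A start value q*M + m reaches q*A + B after k steps of the compact
        map T(x) = x/2 (x even), (3x+1)/2 (x odd)."""
        M = 1 << k
        table = {}
        for m in range(M):
            v, c = m, 0                  # v = T^j(m), c = odd steps so far
            for j in range(1, k + 1):
                if v & 1:
                    v, c = (3 * v + 1) // 2, c + 1
                else:
                    v //= 2
                if 3 ** c < 2 ** j:      # q-coefficient shrank: the value
                    break                # already dropped below its start
            else:
                table[m] = (3 ** c, v)   # A[m] = 3**c, B[m] = T^k(m)
        return table
    ```

    For k = 2 this leaves only the residue 3 with (A, B) = (9, 8) - the
    "n*4+3" row in the tables - and for k = 10 it leaves 64 of 1024
    residues, i.e. a 16x point reduction, with 1295 residues for k = 15.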

    Here are my results:
    $ ./nsteps2 4503599627370496 0x100000000
    Initial value = 4503599627370496 (0010000000000000).
    #values = 4294967296 (0000000100000000)
    Unfiltered : 22501638882 steps. 4294967296 tests. 5.239 steps/iter
    n*4+3 : 17132929762 steps. 1073741824 tests. 15.956 steps/iter
    n*2**5+m : 15924970210 steps. 939524096 tests. 16.950 steps/iter
    n*2**5+m * : 8408777442 steps. 939524096 tests. 8.950 steps/iter
    n*2**10+m : 11948770018 steps. 503316480 tests. 23.740 steps/iter
    n*2**10+m * : 7088288929 steps. 503316480 tests. 14.083 steps/iter
    n*2**15+m : 9003713250 steps. 289013760 tests. 31.153 steps/iter
    n*2**15+m * : 4818894105 steps. 289013760 tests. 16.674 steps/iter
    n*2**20+m : 6973854434 steps. 181379072 tests. 38.449 steps/iter
    n*2**20+m * : 3496505674 steps. 181379072 tests. 19.277 steps/iter
    n*2**25+m : 5574414306 steps. 125809408 tests. 44.308 steps/iter
    n*2**25+m * : 2543845543 steps. 125809408 tests. 20.220 steps/iter
    n*2**29+m : 4660796778 steps. 95710352 tests. 48.697 steps/iter
    n*2**29+m * : 2008917951 steps. 95710352 tests. 20.990 steps/iter
    n*2**30+m : 4493358006 steps. 89757100 tests. 50.061 steps/iter
    n*2**30+m * : 1915335033 steps. 89757100 tests. 21.339 steps/iter
    n*2**31+m : 4212222570 steps. 81419608 tests. 51.735 steps/iter
    n*2**31+m * : 1780554493 steps. 81419608 tests. 21.869 steps/iter

    $ ./nsteps2 0x20000000000000 0x100000000
    Initial value = 9007199254740992 (0020000000000000).
    #values = 4294967296 (0000000100000000)
    Unfiltered : 22501540484 steps. 4294967296 tests. 5.239 steps/iter
    n*4+3 : 17132831364 steps. 1073741824 tests. 15.956 steps/iter
    n*2**5+m : 15924871812 steps. 939524096 tests. 16.950 steps/iter
    n*2**5+m * : 8408679044 steps. 939524096 tests. 8.950 steps/iter
    n*2**10+m : 11948671620 steps. 503316480 tests. 23.740 steps/iter
    n*2**10+m * : 7088271089 steps. 503316480 tests. 14.083 steps/iter
    n*2**15+m : 9003614852 steps. 289013760 tests. 31.153 steps/iter
    n*2**15+m * : 4818766141 steps. 289013760 tests. 16.673 steps/iter
    n*2**20+m : 6973756036 steps. 181379072 tests. 38.449 steps/iter
    n*2**20+m * : 3496390679 steps. 181379072 tests. 19.277 steps/iter
    n*2**25+m : 5574315908 steps. 125809408 tests. 44.308 steps/iter
    n*2**25+m * : 2543741535 steps. 125809408 tests. 20.219 steps/iter
    n*2**29+m : 4660698380 steps. 95710352 tests. 48.696 steps/iter
    n*2**29+m * : 2008935493 steps. 95710352 tests. 20.990 steps/iter
    n*2**30+m : 4493259608 steps. 89757100 tests. 50.060 steps/iter
    n*2**30+m * : 1915241920 steps. 89757100 tests. 21.338 steps/iter
    n*2**31+m : 4212124172 steps. 81419608 tests. 51.734 steps/iter
    n*2**31+m * : 1780581916 steps. 81419608 tests. 21.869 steps/iter
    'Advanced' mode marked with '*'

    As you can see, the 2**10 filter reduces the number of processed
    points by a factor of 8.5, but the number of processing steps is
    reduced only 1.9x in simple mode and 3.2x in advanced mode.
    For the 2**20 filter the ratios are, respectively, 23.7, 3.2 and 6.4.
    For the 2**31 filter: 52.7, 5.3 and 12.6.



    There was a bug in my testing code. In reality the filtering is
    somewhat more efficient. Here are the corrected results:

    Initial value = 4503599627370496 (0010000000000000).
    #values = 4294967296 (0000000100000000)
    Unfiltered : 22501638882 steps. 4294967296 tests. 5.239 steps/iter
    n*4+3 : 17132929762 steps. 1073741824 tests. 15.956 steps/iter
    n*2**5+m : 13374833378 steps. 536870912 tests. 24.913 steps/iter
    n*2**5+m * : 8408777442 steps. 536870912 tests. 15.663 steps/iter
    n*2**10+m : 9935504098 steps. 268435456 tests. 37.013 steps/iter
    n*2**10+m * : 5187551970 steps. 268435456 tests. 19.325 steps/iter
    n*2**15+m : 7857750754 steps. 169738240 tests. 46.293 steps/iter
    n*2**15+m * : 3449537250 steps. 169738240 tests. 20.323 steps/iter
    n*2**20+m : 6242468578 steps. 111935488 tests. 55.768 steps/iter
    n*2**20+m * : 2410906338 steps. 111935488 tests. 21.538 steps/iter
    n*2**25+m : 4838650338 steps. 73364736 tests. 65.953 steps/iter
    n*2**25+m * : 1716934242 steps. 73364736 tests. 23.403 steps/iter
    n*2**29+m : 3856366938 steps. 51085096 tests. 75.489 steps/iter
    n*2**29+m * : 1338119106 steps. 51085096 tests. 26.194 steps/iter
    n*2**30+m : 3856366938 steps. 51085096 tests. 75.489 steps/iter
    n*2**30+m * : 1261491462 steps. 51085096 tests. 24.694 steps/iter
    n*2**31+m : 3666319938 steps. 47284156 tests. 77.538 steps/iter
    n*2**31+m * : 1184863818 steps. 47284156 tests. 25.058 steps/iter

    Initial value = 9007199254740992 (0020000000000000).
    #values = 4294967296 (0000000100000000)
    Unfiltered : 22501540484 steps. 4294967296 tests. 5.239 steps/iter
    n*4+3 : 17132831364 steps. 1073741824 tests. 15.956 steps/iter
    n*2**5+m : 13374734980 steps. 536870912 tests. 24.912 steps/iter
    n*2**5+m * : 8408679044 steps. 536870912 tests. 15.662 steps/iter
    n*2**10+m : 9935405700 steps. 268435456 tests. 37.012 steps/iter
    n*2**10+m * : 5187453572 steps. 268435456 tests. 19.325 steps/iter
    n*2**15+m : 7857652356 steps. 169738240 tests. 46.293 steps/iter
    n*2**15+m * : 3449438852 steps. 169738240 tests. 20.322 steps/iter
    n*2**20+m : 6242370180 steps. 111935488 tests. 55.768 steps/iter
    n*2**20+m * : 2410807940 steps. 111935488 tests. 21.537 steps/iter
    n*2**25+m : 4838551940 steps. 73364736 tests. 65.952 steps/iter
    n*2**25+m * : 1716835844 steps. 73364736 tests. 23.401 steps/iter
    n*2**29+m : 3856268540 steps. 51085096 tests. 75.487 steps/iter
    n*2**29+m * : 1338020708 steps. 51085096 tests. 26.192 steps/iter
    n*2**30+m : 3856268540 steps. 51085096 tests. 75.487 steps/iter
    n*2**30+m * : 1261393064 steps. 51085096 tests. 24.692 steps/iter
    n*2**31+m : 3666221540 steps. 47284156 tests. 77.536 steps/iter
    n*2**31+m * : 1184765420 steps. 47284156 tests. 25.056 steps/iter

    The 2**10 filter reduces the number of processed points by a factor
    of 16.0, but the number of processing steps is reduced only 2.3x in
    simple mode and 4.3x in advanced mode.
    For the 2**20 filter the ratios are, respectively, 38.4, 3.6 and 9.3.
    For the 2**31 filter: 90.8, 6.1 and 19.0.
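    These factors can be re-derived directly from the corrected table;
    a quick check in plain Python (numbers copied from the first data
    set above, initial value 2**52):

    ```python
    # Reduction factors for the corrected run starting at 2**52.
    # Per filter: (tests, steps in simple mode, steps in advanced mode).
    unfiltered_steps = 22501638882
    total_points = 4294967296            # 2**32 starting values

    table = {
        10: (268435456, 9935504098, 5187551970),
        20: (111935488, 6242468578, 2410906338),
        31: (47284156, 3666319938, 1184863818),
    }

    for k, (tests, simple, adv) in table.items():
        print(f"2**{k}: points /{total_points / tests:.1f}, "
              f"simple steps /{unfiltered_steps / simple:.1f}, "
              f"advanced steps /{unfiltered_steps / adv:.1f}")
    ```

    The 2**20 row reproduces the 38.4 / 3.6 / 9.3 figures quoted above.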


    Note that for an FPGA implementation, even filtering with modulo
    2**29 (~12M entries) is impractical. Assuming advanced mode, 12M
    entries occupy ~1000 Mbits. In an inexpensive FPGA that simply does
    not fit; in a very expensive FPGA it fits, but barely.

    Let's take, for example, the Altera Stratix-10 GX 2800. By the
    standards of the previous decade it is a real behemoth, with 2.75M
    Logic Elements (LEs). Assuming that each tester unit occupies 500
    LEs, there is room for 5500 units. Looking at my tables above, you
    want 1 filter per ~20 tester units, so 275 filters are needed. And
    how many are available? Just two!


    And here I made an even bigger mistake. The size of the internal
    memory of the GX 2800 is 229 Mbit, so the 2**29 filter does not fit
    at all.

    For modulo 2**20 it's a little better. Each filter consists of
    ~45,000 entries and occupies ~2.7 Mbits, so you can fit 80+ filters
    in the GX 2800. But you want 5500 / 19 = 290, so it is still
    unbalanced.


    Here the correct numbers are as follows:
    Each filter consists of 27,328 entries and occupies ~1.7 Mbits, so
    you can fit 130+ filters in the GX 2800. But you want 5500 / 19 =
    290, so it is still unbalanced.

    For modulo 2**15 (2205 entries, ~100 Kbits) there is enough memory
    to fit the desired number of replicas. But a small filter like that
    is significantly less effective, reducing the number of steps (in
    advanced mode) only by 4.7x.


    Here it should be:
    For modulo 2**15 (1295 entries, ~60 Kbits) there is enough memory
    to fit the desired number of replicas. But a small filter like that
    is significantly less effective, reducing the number of steps (in
    advanced mode) only by 6.5x.

    It is possible that there exist smarter algorithms for partitioning
    the work between parallel tester units that could greatly reduce the
    need for access to the filtering tables relative to the simple
    scheme I assumed in the estimates above. But I think the point
    stands - this sort of filtering is not a silver bullet. It can
    improve throughput, but it is unlikely to improve it by more than
    one order of magnitude.


    Despite the corrected mistakes, my conclusions are the same.


    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.arch on Wed Feb 18 18:22:51 2026
    From Newsgroup: comp.arch

    On Sun, 15 Feb 2026 10:56:51 -0500
    Robert Finch <robfi680@gmail.com> wrote:

    I have coded a version that tests up to 8192 numbers simultaneously.
    It is supposed to be able to run at over 200 MHz. Some of the
    numbers would take a large number of iterations though.

    I found something strange: the code to account for more numbers does
    not increase the size of the core very much. (But it takes longer to
    build.) I am not sure if this occurs because there is a mistake
    somewhere, or if the tools are able to figure out how to share all
    the logic. I am thinking the logic for adders is shared somehow.
    128-bit arithmetic is being used.


    There is a mistake in your code. No other explanation is possible.

    You should look for 3 sorts of mistakes.
    1. Core itself.
    The most likely mistake is failing to look for one of the ending
    conditions. You have to look for 3 conditions:
    a) - the value produced at the current step is smaller than the
    initial value. That is the successful ending. Here you signal to
    the scheduler that you are ready to accept the next value.
    b) - overflow. The next value exceeds the width of the register.
    c) - timeout. You were running for more than N steps. N=8192 looks
    like a reasonable value.
    b) and c) are suspected failures.
    The handling for b) and c) is similar - you raise an alert signal
    and stop. Your test unit has to provide the supervisor circuitry
    with the means to fetch the initial value that caused the suspected
    failure and to clear the alert condition.
    After the alert is cleared, your unit returns to the same state as
    after success - waiting for the next initial value.
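    As a software sanity model of a single tester unit (illustrative
    Python; WIDTH and MAX_STEPS are my assumptions, not Robert's actual
    parameters), the loop with the three ending conditions would look
    like:

    ```python
    WIDTH = 128          # assumed register width of the tester unit
    MAX_STEPS = 8192     # assumed timeout N

    def test_one(n0):
        """Run one test; return ('ok' | 'overflow' | 'timeout', steps)."""
        limit = 1 << WIDTH
        n, steps = n0, 0
        while True:
            nxt = 3 * n + 1 if n & 1 else n >> 1
            steps += 1
            if nxt < n0:              # a) dropped below start: success
                return ('ok', steps)
            if nxt >= limit:          # b) overflow of the WIDTH-bit register
                return ('overflow', steps)
            if steps >= MAX_STEPS:    # c) timeout after N steps
                return ('timeout', steps)
            n = nxt
    ```

    Conditions b) and c) would raise the alert to the supervisor;
    condition a) signals the scheduler for the next value.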

    2. Scheduler.
    In a proof-of-concept design the scheduler does not have to be very
    realistic. For example, it does not have to be efficiently
    pipelined. But it should be able to monitor the ready signals of
    all test units and to supply different initial data to different
    units.
    Supplying the same data to all units simultaneously is the sort of
    mistake that would explain your observed behavior.

    3. Supervisor
    Again, in a proof-of-concept design the supervisor does not have to
    be very realistic. But it should be able to selectively exercise
    all core features related to alerts, including fetching of the
    initial data and clearing of the alert.

    After you have coded everything correctly, do not even try to
    compile a test with 8000 units. That is totally unrealistic. Even
    if you find a huge FPGA in which that number of testers could fit
    (and I am not sure that such an FPGA exists), the compilation will
    take many hours, possibly days.



    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Terje Mathisen@terje.mathisen@tmsw.no to comp.arch on Wed Feb 18 21:56:42 2026
    From Newsgroup: comp.arch

    Stephen Fuld wrote:
    On 2/14/2026 1:57 AM, Thomas Koenig wrote:
    Some people are throwing large amounts of computing power at trying
    to disprove the Collatz conjecture. Graphics cards have sped this
    up enormously.

    big snip


    The number of iterations would be tracked; if larger than a previous
    record, the calculation would terminate with a corresponding signal
    to the outside world.

    Also, each cycle, the result from the previous round would be
    compared against the input value. If it is lower, the next number
    would be chosen. Finally, the number reached would be checked
    against the previous, pre-set maximum, which would also signal to
    the outside world.

    Hold these thoughts

    snip

    The problem is embarrassingly parallel.


    Sort of. You could do it that way, but you would be giving up some
    performance. Specifically, as you say, you want to check the number
    against the previous input value and the preset maximum. But in a
    parallel implementation, you want to check against the largest
    value checked *across all the parallel threads*. In order to do
    that, the threads need to communicate, thus violating the
    "embarrassingly parallel" rule. I don't know how much performance
    you would give up, so it may not make much difference, nor do I
    know the cost of keeping some "global" largest value checked so
    far. There is a tradeoff here.
    KISS.
    In this case that means "don't care". It is perfectly OK to only
    update the "global max" each time a thread has finished a large
    block of candidates, so just checking against the starting value is
    fine.
    Assuming we also want to track the largest intermediate value seen,
    that could be updated the same way.
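    The policy is easy to model in software (illustrative Python; the
    names are mine, and in hardware the lock would correspond to the
    supervisor reading each unit's record register once per finished
    block):

    ```python
    from threading import Lock

    global_max = 0          # shared record, touched once per block
    global_lock = Lock()

    def collatz_peak(n):
        """Largest value reached on the trajectory of n down to 1."""
        peak = n
        while n != 1:
            n = 3 * n + 1 if n & 1 else n >> 1
            peak = max(peak, n)
        return peak

    def scan_block(start, count):
        """Scan a block of candidates with a purely local record, then
        merge into the global record exactly once."""
        global global_max
        local_max = 0
        for n in range(start, start + count):
            local_max = max(local_max, collatz_peak(n))
        with global_lock:              # one update per block, not per number
            global_max = max(global_max, local_max)
        return local_max
    ```

    Each worker contends for the shared record once per block, so the
    synchronization cost is amortized over the whole block.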
    Terje
    --
    - <Terje.Mathisen at tmsw.no>
    "almost all programming can be viewed as an exercise in caching"
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Robert Finch@robfi680@gmail.com to comp.arch on Wed Feb 18 22:56:19 2026
    From Newsgroup: comp.arch

    On 2026-02-18 11:22 a.m., Michael S wrote:
    On Sun, 15 Feb 2026 10:56:51 -0500
    Robert Finch <robfi680@gmail.com> wrote:

    I have coded a version that tests up to 8192 numbers simultaneously.
    It is supposed to be able to run at over 200 MHz. Some of the
    numbers would take a large number of iterations though.

    I found something strange: the code to account for more numbers does
    not increase the size of the core very much. (But it takes longer to
    build.) I am not sure if this occurs because there is a mistake
    somewhere, or if the tools are able to figure out how to share all
    the logic. I am thinking the logic for adders is shared somehow.
    128-bit arithmetic is being used.


    There is a mistake in your code. No other explanation is possible.

    You should look for 3 sorts of mistakes.
    1. Core itself.
    The most likely mistake is failing to look for one of the ending
    conditions. You have to look for 3 conditions:
    a) - the value produced at the current step is smaller than the
    initial value. That is the successful ending. Here you signal to
    the scheduler that you are ready to accept the next value.
    b) - overflow. The next value exceeds the width of the register.
    c) - timeout. You were running for more than N steps. N=8192 looks
    like a reasonable value.
    b) and c) are suspected failures.
    The handling for b) and c) is similar - you raise an alert signal
    and stop. Your test unit has to provide the supervisor circuitry
    with the means to fetch the initial value that caused the suspected
    failure and to clear the alert condition.
    After the alert is cleared, your unit returns to the same state as
    after success - waiting for the next initial value.

    I found the mistake. The core was working, but groups of cores were
    not; done signals were missing.


    2. Scheduler.
    In a proof-of-concept design the scheduler does not have to be very
    realistic. For example, it does not have to be efficiently
    pipelined. But it should be able to monitor the ready signals of
    all test units and to supply different initial data to different
    units.
    Supplying the same data to all units simultaneously is the sort of
    mistake that would explain your observed behavior.

    There is really simple scheduling ATM: just a bunch of done signals
    that are gated together, meaning the slowest calculation of the
    group will slow everything else down. Once all the dones are true,
    the next set of numbers is loaded.

    3. Supervisor
    Again, in a proof-of-concept design the supervisor does not have to
    be very realistic. But it should be able to selectively exercise
    all core features related to alerts, including fetching of the
    initial data and clearing of the alert.

    After you have coded everything correctly, do not even try to
    compile a test with 8000 units. That is totally unrealistic. Even
    if you find a huge FPGA in which that number of testers could fit
    (and I am not sure that such an FPGA exists), the compilation will
    take many hours, possibly days.



    Yeah, 8,000 was definitely out of the picture. I have since coded a
    24x24 matrix (576 units), and although the logic would fit, it
    turned out not to route. So it is down to 20x20; 20x20 with no real
    scheduling fits.

    I tried displaying progress as an on-screen bitmap, but that just
    appeared noisy. I have been working on an SCP, so I may be able to
    use that to manage the scheduling with some custom logic.



    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.arch on Thu Feb 19 12:10:41 2026
    From Newsgroup: comp.arch

    On Wed, 18 Feb 2026 22:56:19 -0500
    Robert Finch <robfi680@gmail.com> wrote:

    On 2026-02-18 11:22 a.m., Michael S wrote:
    On Sun, 15 Feb 2026 10:56:51 -0500
    Robert Finch <robfi680@gmail.com> wrote:

    I have coded a version that tests up to 8192 numbers simultaneously.
    It is supposed to be able to run at over 200 MHz. Some of the
    numbers would take a large number of iterations though.

    I found something strange: the code to account for more numbers does
    not increase the size of the core very much. (But it takes longer to
    build.) I am not sure if this occurs because there is a mistake
    somewhere, or if the tools are able to figure out how to share all
    the logic. I am thinking the logic for adders is shared somehow.
    128-bit arithmetic is being used.


    There is a mistake in your code. No other explanation is possible.

    You should look for 3 sorts of mistakes.
    1. Core itself.
    The most likely mistake is failing to look for one of the ending
    conditions. You have to look for 3 conditions:
    a) - the value produced at the current step is smaller than the
    initial value. That is the successful ending. Here you signal to
    the scheduler that you are ready to accept the next value.
    b) - overflow. The next value exceeds the width of the register.
    c) - timeout. You were running for more than N steps. N=8192 looks
    like a reasonable value.
    b) and c) are suspected failures.
    The handling for b) and c) is similar - you raise an alert signal
    and stop. Your test unit has to provide the supervisor circuitry
    with the means to fetch the initial value that caused the suspected
    failure and to clear the alert condition.
    After the alert is cleared, your unit returns to the same state as
    after success - waiting for the next initial value.

    I found the mistake. The core was working, but groups of cores were
    not; done signals were missing.


    2. Scheduler.
    In a proof-of-concept design the scheduler does not have to be very
    realistic. For example, it does not have to be efficiently
    pipelined. But it should be able to monitor the ready signals of
    all test units and to supply different initial data to different
    units.
    Supplying the same data to all units simultaneously is the sort of
    mistake that would explain your observed behavior.

    There is really simple scheduling ATM: just a bunch of done signals
    that are gated together, meaning the slowest calculation of the
    group will slow everything else down. Once all the dones are true,
    the next set of numbers is loaded.

    3. Supervisor
    Again, in a proof-of-concept design the supervisor does not have to
    be very realistic. But it should be able to selectively exercise
    all core features related to alerts, including fetching of the
    initial data and clearing of the alert.

    After you have coded everything correctly, do not even try to
    compile a test with 8000 units. That is totally unrealistic. Even
    if you find a huge FPGA in which that number of testers could fit
    (and I am not sure that such an FPGA exists), the compilation will
    take many hours, possibly days.



    Yeah, 8,000 was definitely out of the picture. I have since coded a
    24x24 matrix (576 units), and although the logic would fit, it
    turned out not to route. So it is down to 20x20; 20x20 with no real
    scheduling fits.

    I tried displaying progress as an on-screen bitmap, but that just
    appeared noisy. I have been working on an SCP, so I may be able to
    use that to manage the scheduling with some custom logic.




    Does 20x20 mean 400 cores with something like 80-bit initial values
    and 128-bit processing?
    If you didn't make further mistakes then it should be rather big,
    on the order of 300K LEs. It would not fit in the biggest low-end
    Altera device (Cyclone 10 GX 10CX220).
    On the Xilinx side, it would not fit in any Spartan or in most
    Artix devices. Maybe it would fit in the biggest Artix UltraScale+
    (AU25P). But more realistically, for 400 test units one would want
    Arria 10 on the Altera side or Kintex on the Xilinx side. Neither
    is prohibitively expensive, but they are not cheap.

    What device are you using?
    If it is smaller than a Kintex KU5P then I'd strongly suspect that
    you didn't clean out all the mistakes.



    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Terje Mathisen@terje.mathisen@tmsw.no to comp.arch on Thu Feb 19 11:51:03 2026
    From Newsgroup: comp.arch

    Robert Finch wrote:
    On 2026-02-18 11:22 a.m., Michael S wrote:
    On Sun, 15 Feb 2026 10:56:51 -0500
    Robert Finch <robfi680@gmail.com> wrote:

    I have coded a version that tests up to 8192 numbers simultaneously.
    It is supposed to be able to run at over 200 MHz. Some of the
    numbers would take a large number of iterations though.

    I found something strange: the code to account for more numbers does
    not increase the size of the core very much. (But it takes longer to
    build.) I am not sure if this occurs because there is a mistake
    somewhere, or if the tools are able to figure out how to share all
    the logic. I am thinking the logic for adders is shared somehow.
    128-bit arithmetic is being used.


    There is a mistake in your code. No other explanation is possible.

    You should look for 3 sorts of mistakes.
    1. Core itself.
    The most likely mistake is failing to look for one of the ending
    conditions. You have to look for 3 conditions:
    a) - the value produced at the current step is smaller than the
    initial value. That is the successful ending. Here you signal to
    the scheduler that you are ready to accept the next value.
    b) - overflow. The next value exceeds the width of the register.
    c) - timeout. You were running for more than N steps. N=8192 looks
    like a reasonable value.
    b) and c) are suspected failures.
    The handling for b) and c) is similar - you raise an alert signal
    and stop. Your test unit has to provide the supervisor circuitry
    with the means to fetch the initial value that caused the suspected
    failure and to clear the alert condition.
    After the alert is cleared, your unit returns to the same state as
    after success - waiting for the next initial value.

    I found the mistake. The core was working, but groups of cores were
    not; done signals were missing.


    2. Scheduler.
    In a proof-of-concept design the scheduler does not have to be very
    realistic. For example, it does not have to be efficiently
    pipelined. But it should be able to monitor the ready signals of
    all test units and to supply different initial data to different
    units.
    Supplying the same data to all units simultaneously is the sort of
    mistake that would explain your observed behavior.

    There is really simple scheduling ATM: just a bunch of done signals
    that are gated together, meaning the slowest calculation of the
    group will slow everything else down. Once all the dones are true,
    the next set of numbers is loaded.
    The obvious choice is to have a work-to-do list, each work item
    containing a start-end range of numbers to test.
    Each worker thread would simply finish its current task, store
    the/any results, and then ask for the next available task.
    A single-producer, multiple-consumer queue like this is very simple
    to get right with minimal admin: just a single LOCK XADD to update
    the queue back index, grabbing the corresponding task.
    I used this exact setup for my wire-speed multi-threaded ntpd
    daemon about two decades ago. (I did need a few more XADD
    instructions to handle queue full/empty situations, which are not
    applicable for the current problem.)
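    In software the scheme is only a few lines (illustrative Python of
    my own; the Lock stands in for the LOCK XADD atomic, and all names
    are mine):

    ```python
    from threading import Lock

    class TaskQueue:
        """Single-producer, multiple-consumer work list of (start, end)
        ranges; consumers grab tasks via an atomic fetch-and-increment."""

        def __init__(self, tasks):
            self.tasks = list(tasks)   # filled once by the producer
            self.back = 0
            self._lock = Lock()        # models the LOCK XADD atomic

        def grab(self):
            """Fetch-and-increment the back index; return a task or None."""
            with self._lock:
                i = self.back
                self.back += 1
            return self.tasks[i] if i < len(self.tasks) else None
    ```

    Each worker simply loops on grab() until it returns None; the
    full/empty handling mentioned above is deliberately omitted here.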
    Terje
    --
    - <Terje.Mathisen at tmsw.no>
    "almost all programming can be viewed as an exercise in caching"
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Stefan Monnier@monnier@iro.umontreal.ca to comp.arch on Thu Feb 19 15:27:07 2026
    From Newsgroup: comp.arch

    I must say I find this whole thread baffling: when searching for a
    counterexample and we're already in the order of 2^70, it seems
    clear that constant-factor improvements are completely irrelevant
    (even with large constants).

    I guess it all boils down to the "useless" part of the title?


    === Stefan
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.arch on Thu Feb 19 23:12:41 2026
    From Newsgroup: comp.arch

    On Thu, 19 Feb 2026 11:51:03 +0100
    Terje Mathisen <terje.mathisen@tmsw.no> wrote:
    Robert Finch wrote:
    On 2026-02-18 11:22 a.m., Michael S wrote:
    On Sun, 15 Feb 2026 10:56:51 -0500
    Robert Finch <robfi680@gmail.com> wrote:

    I have coded a version that tests up to 8192 numbers simultaneously.
    It is supposed to be able to run at over 200 MHz. Some of the
    numbers would take a large number of iterations though.

    I found something strange: the code to account for more numbers does
    not increase the size of the core very much. (But it takes longer to
    build.) I am not sure if this occurs because there is a mistake
    somewhere, or if the tools are able to figure out how to share all
    the logic. I am thinking the logic for adders is shared somehow.
    128-bit arithmetic is being used.


    There is a mistake in your code. No other explanation is possible.

    You should look for 3 sorts of mistakes.
    1. Core itself.
    The most likely mistake is failing to look for one of the ending
    conditions. You have to look for 3 conditions:
    a) - the value produced at the current step is smaller than the
    initial value. That is the successful ending. Here you signal to
    the scheduler that you are ready to accept the next value.
    b) - overflow. The next value exceeds the width of the register.
    c) - timeout. You were running for more than N steps. N=8192 looks
    like a reasonable value.
    b) and c) are suspected failures.
    The handling for b) and c) is similar - you raise an alert signal
    and stop. Your test unit has to provide the supervisor circuitry
    with the means to fetch the initial value that caused the suspected
    failure and to clear the alert condition.
    After the alert is cleared, your unit returns to the same state as
    after success - waiting for the next initial value.

    I found the mistake. The core was working, but groups of cores were
    not; done signals were missing.


    2. Scheduler.
    In a proof-of-concept design the scheduler does not have to be very
    realistic. For example, it does not have to be efficiently
    pipelined. But it should be able to monitor the ready signals of
    all test units and to supply different initial data to different
    units.
    Supplying the same data to all units simultaneously is the sort of
    mistake that would explain your observed behavior.

    There is really simple scheduling ATM: just a bunch of done signals
    that are gated together, meaning the slowest calculation of the
    group will slow everything else down. Once all the dones are true,
    the next set of numbers is loaded.

    The obvious choice is to have a work-to-do list, each work item
    containing a start-end set of numbers to test.

    Each worker thread would simply finish its current task, store
    the/any results, and then ask for the next available task.

    A single producer - multiple consumers queue like this is very simple
    to get right with minimal admin: just a single LOCK XADD to update
    the queue back index, grabbing the corresponding task.

    There is no XADD in FPGA, just gates, multipliers and memory blocks.
    I used this exact setup for my wire speed multi-threaded ntpd daemon
    about two decades ago. (I did need a few more XADD instructions to
    handle queue full/empty situations which are not applicable for the
    current problem.)

    Terje

    Did your ntpd daemon scale to 300-400 threads?
    300 is the # of test units that fits in the FPGAs that appear to be
    the most cost-effective for this sort of job, e.g. 10CX220YU484E5G.
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Fri Feb 20 08:01:47 2026
    From Newsgroup: comp.arch

    Michael S <already5chosen@yahoo.com> schrieb:

    You should look for 3 sorts of mistakes
    1. Core itself.
    The most likely mistake is failing to look for one of the ending
    conditions. You have to look for 3 conditions:
    a) - value produced at current step is smaller than initial value.
    That is successful ending. Here you signal to the scheduler that you
    are ready to accept the next value.
    b) - overflow. Next value exceeds the width of register.
    c) - timeout. You were running for more than N steps. N=8192 looks
    like a reasonable value.

    Looking at the links in https://oeis.org/A006878 , anything above
    2442 would be a new record, so 8192 would be plenty.
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
  • From Michael S@already5chosen@yahoo.com to comp.arch on Fri Feb 20 11:41:53 2026
    From Newsgroup: comp.arch

    On Thu, 19 Feb 2026 15:27:07 -0500
    Stefan Monnier <monnier@iro.umontreal.ca> wrote:

    I must say I find this whole thread baffling: when searching for
    a counterexample and we're already in the order of 2^70, it seems
    clear that constant-factor improvements are completely irrelevant
    (even with large constants).

    I guess it all boils down to the "useless" part of the title?


    === Stefan


    As I wrote in an earlier post, "useless" sounds too optimistic.
    "Useless" in my book is proving or disproving 99% of the conjectures
    known in number theory, both obscure and extremely famous (Fermat's
    Last Theorem). And then 1% of them end up very useful.

    But Collatz's conjecture is believed to be true. Trying to disprove it
    by experiment is futile.
    Finding that all sequences with initial value in the range
    [2**71:2**72] converge to 1, which is by far the most likely outcome
    of the experiment, leaves us in exactly the same place where we
    started, i.e. the conjecture is believed to be true, but unproven.
    That's not good enough to warrant the title of "useless". "Pointless"
    sounds more proper, but maybe we shall find something more derogatory.

    However, it's fun.
    My current estimate, after considering how "smart" filtering by modulo
    2**20 can be made practical, is that a single FPGA that costs 300 USD
    would be able to test 2**71 values in ~500 years. Which means that the
    company I am working for can be convinced to build gear capable of
    doing calculations in 1 year for something like 1M USD or a little more.
    I'd guess that smaller and hungrier engineering firms would accept the
    job for 1/3rd of that sum.
    So, who has pocket money to spend?


  • From Terje Mathisen@terje.mathisen@tmsw.no to comp.arch on Fri Feb 20 15:11:26 2026
    From Newsgroup: comp.arch

    Michael S wrote:
    On Thu, 19 Feb 2026 11:51:03 +0100
    Terje Mathisen <terje.mathisen@tmsw.no> wrote:

    Robert Finch wrote:
    On 2026-02-18 11:22 a.m., Michael S wrote:
    On Sun, 15 Feb 2026 10:56:51 -0500
    Robert Finch <robfi680@gmail.com> wrote:

    I have coded a version that tests up to 8192 numbers
    simultaneously. It is supposed to be able to run at over 200 MHz.
    Some of the numbers would take a large number of iterations though.

    I found something strange: the code to account for more numbers does
    not increase the size of the core very much. (But it takes longer
    to build.) I am not sure if this occurs because there is a mistake
    somewhere, or if the tools are able to figure out how to share all
    the logic. I am thinking the logic for adders is shared somehow.
    128-bit arithmetic is being used.


    There is a mistake in your code. No other explanation is possible.

    You should look for 3 sorts of mistakes:
    1. Core itself.
    The most likely mistake is failing to look for one of the ending
    conditions. You have to look for 3 conditions:
    a) - value produced at the current step is smaller than the initial
    value. That is a successful ending. Here you signal to the scheduler
    that you are ready to accept the next value.
    b) - overflow. Next value exceeds the width of the register.
    c) - timeout. You were running for more than N steps. N=8192 looks
    like a reasonable value.
    b) and c) are suspected failures.
    The handling for b) and c) is similar - you raise an alert signal and
    stop. Your test unit has to provide the supervisor circuitry with the
    means to fetch the initial value that caused the suspected failure and
    to clear the alert condition.
    After the alert is cleared your unit returns to the same state as after
    success - waiting for the next initial value.

    I found the mistake. Core was working, but groups of cores were
    not. Done signals were missing.


    2. Scheduler.
    In proof-of-concept design the scheduler does not have to be very
    realistic. For example, it does not have to be efficiently
    pipelined. But it should be able to monitor ready signals of all
    test units and to supply different initial data to different units.
    Supplying the same data to all units simultaneously is a sort of
    mistake that explains your observed behavior.

    There is really simple scheduling ATM. There is just a bunch of
    done signals that are gated together. Meaning the slowest
    calculation of the group will slow everything else down. Once all
    the dones are true the next set of numbers are loaded.

    The obvious choice is to have a work-to-do list, each work item
    containing a start-end set of numbers to test.

    Each worker thread would simply finish its current task, store
    the/any results, and then ask for the next available task.

    A single producer - multiple consumers queue like this is very simple
    to get right with minimum admins: Just a single LOCK XADD to update
    the queue back index, grabbing the corresponding task.


    There is no XADD in FPGA, just gate, multipliers and memory blocks.
    Oops! :-)

    I used this exact setup for my wire speed multi-threaded ntpd deamon
    about two decades ago. (I did need a few more XADD instructions to
    handle queue full/empty situations which are not applicable for the
    current problem.)


    Did your ntpd deamon scale to 300-400 threads?
    300 is the # of test units that fits in FPGAs that appears to be the
    most cost-effective for this sort of job, e.g. 10CX220YU484E5G.
    Scaling does not depend on the number of threads, only on the frequency
    of pulls from the queue: My code was very latency sensitive so I could
    never batch up multiple requests, while this particular
    "Gedankenexperiment" can very easily use arbitrarily large work lists,
    right?
    Terje
    --
    - <Terje.Mathisen at tmsw.no>
    "almost all programming can be viewed as an exercise in caching"
  • From Robert Finch@robfi680@gmail.com to comp.arch on Sat Feb 21 01:56:03 2026
    From Newsgroup: comp.arch

    On 2026-02-19 5:10 a.m., Michael S wrote:
    On Wed, 18 Feb 2026 22:56:19 -0500
    Robert Finch <robfi680@gmail.com> wrote:

    On 2026-02-18 11:22 a.m., Michael S wrote:
    On Sun, 15 Feb 2026 10:56:51 -0500
    Robert Finch <robfi680@gmail.com> wrote:

    I have coded a version that tests up to 8192 numbers simultaneously.
    It is supposed to be able to run at over 200 MHz. Some of the
    numbers would take a large number of iterations though.

    I found something strange: the code to account for more numbers does
    not increase the size of the core very much. (But it takes longer
    to build.) I am not sure if this occurs because there is a mistake
    somewhere, or if the tools are able to figure out how to share all
    the logic. I am thinking the logic for adders is shared somehow.
    128-bit arithmetic is being used.


    There is a mistake in your code. No other explanation is possible.

    You should look for 3 sorts of mistakes:
    1. Core itself.
    The most likely mistake is failing to look for one of the ending
    conditions. You have to look for 3 conditions:
    a) - value produced at the current step is smaller than the initial
    value. That is a successful ending. Here you signal to the scheduler
    that you are ready to accept the next value.
    b) - overflow. Next value exceeds the width of the register.
    c) - timeout. You were running for more than N steps. N=8192 looks
    like a reasonable value.
    b) and c) are suspected failures.
    The handling for b) and c) is similar - you raise an alert signal and
    stop. Your test unit has to provide the supervisor circuitry with the
    means to fetch the initial value that caused the suspected failure and
    to clear the alert condition.
    After the alert is cleared your unit returns to the same state as after
    success - waiting for the next initial value.

    I found the mistake. Core was working, but groups of cores were not.
    Done signals were missing.


    2. Scheduler.
    In proof-of-concept design the scheduler does not have to be very
    realistic. For example, it does not have to be efficiently
    pipelined. But it should be able to monitor ready signals of all
    test units and to supply different initial data to different units.
    Supplying the same data to all units simultaneously is a sort of
    mistake that explains your observed behavior.

    There is really simple scheduling ATM. There is just a bunch of done
    signals that are gated together. Meaning the slowest calculation of
    the group will slow everything else down. Once all the dones are true
    the next set of numbers are loaded.

    3. Supervisor
    Again, in proof-of-concept design the supervisor does not have to be
    very realistic. But it should be able to selectively exercise all
    core feature related to alert, including fetching of initial data
    and clearing of alert.

    After you coded everything correctly, do not even try to compile
    test with 8000 units. That is totally unrealistic. Even if you find
    huge FPGA in which such number of testers could fit (and I am not
    sure that such FPGA exists) the compilation will take many hours,
    possibly days.



    Yeah, 8,000 was definitely out of the picture. I had since coded a
    24x24 matrix (576) and although the logic would fit, it turned out
    not to be able to route. So, it is down to 20x20. 20x20 with no real
    scheduling fits.

    I tried displaying progress as an on-screen bitmap, but that just
    appeared noisy. I have been working on an SCP so may be able to use
    that to manage the scheduling with some custom logic.




    Does 20x20 mean 400 cores with something like 80-bit initial values
    and 128-bit processing?
    Yes. 128-bit processing.

    If you didn't make further mistakes then it should be rather big, order
    of 300K LEs. It would not fit in the biggest low-end Altera device
    (Cyclone 10GX 10CX220).

    IIRC it is about 150k LUTs. It uses the simplest approach, which is
    not very fast iteration-wise, but it allows a lot of cores and a fast
    clock cycle time. One core is very small and has a fast iteration (1
    clock cycle). But it takes a lot more iterations than can be done
    with a more brainiac approach.

    Using a more brainiac approach would likely cut the performance in half
    and use a lot more LUTs.

    On the Xilinx side, it would not fit in any Spartan and in most Artix
    devices. Maybe it would fit in the biggest Artix UltraScale+ (AU25P).
    But more realistically for 400 test units one would want Arria 10 on
    Altera side or Kintex on Xilinx side. Both are not prohibitively
    expensive, but not cheap.

    What device are you using?
    I am not using a particularly inexpensive device. I have a Kintex-325
    with about 200k LUTs.

    If it is smaller than Kintex KU5P then I'd strongly suspect that you
    didn't clean out all mistakes.



    There could be more mistakes for sure. But I am sure the lowest-level
    core works. It seemed to in simulation. I made a slight mod to it
    since I tested it last, checking n < startn.
    Counts are not used so the logic would be stripped out, but it's good
    for verification.

    1 Core is:

    module Collatz_conjecture(rst, clk, startn, count, done);
    input rst;
    input clk;
    input [127:0] startn;
    output reg [127:0] count;
    output reg done;

    localparam TRUE  = 1'b1;
    localparam FALSE = 1'b0;

    reg [127:0] n;

    always_ff @(posedge clk)
    if (rst) begin
      n <= startn;
      count <= 0;
      done <= FALSE;
    end
    else begin
      if (!done) begin
        // even: halve; odd: 3*n+1
        if (~n[0])
          n <= n >> 1;
        else
          n <= n + n + n + 1;
        // finished once the value drops below the starting value
        if (n < startn)
          done <= TRUE;
        count <= count + 1;
      end
    end

    endmodule


  • From Michael S@already5chosen@yahoo.com to comp.arch on Sat Feb 21 19:23:02 2026
    From Newsgroup: comp.arch

    On Sat, 21 Feb 2026 01:56:03 -0500
    Robert Finch <robfi680@gmail.com> wrote:

    On 2026-02-19 5:10 a.m., Michael S wrote:
    On Wed, 18 Feb 2026 22:56:19 -0500
    Robert Finch <robfi680@gmail.com> wrote:

    On 2026-02-18 11:22 a.m., Michael S wrote:
    On Sun, 15 Feb 2026 10:56:51 -0500
    Robert Finch <robfi680@gmail.com> wrote:

    I have coded a version that tests up to 8192 numbers
    simultaneously. It is supposed to be able to run at over 200
    MHz. Some of the numbers would take a large number of iterations
    though.

    I found something strange: the code to account for more numbers
    does not increase the size of the core very much. (But it takes
    longer to build.) I am not sure if this occurs because there is
    a mistake somewhere, or if the tools are able to figure out how
    to share all the logic. I am thinking the logic for adders is
    shared somehow. 128-bit arithmetic is being used.


    There is a mistake in your code. No other explanation is possible.

    You should look for 3 sorts of mistakes:
    1. Core itself.
    The most likely mistake is failing to look for one of the ending
    conditions. You have to look for 3 conditions:
    a) - value produced at the current step is smaller than the initial
    value. That is a successful ending. Here you signal to the scheduler
    that you are ready to accept the next value.
    b) - overflow. Next value exceeds the width of the register.
    c) - timeout. You were running for more than N steps. N=8192 looks
    like a reasonable value.
    b) and c) are suspected failures.
    The handling for b) and c) is similar - you raise an alert signal and
    stop. Your test unit has to provide the supervisor circuitry with the
    means to fetch the initial value that caused the suspected failure and
    to clear the alert condition.
    After the alert is cleared your unit returns to the same state as after
    success - waiting for the next initial value.

    I found the mistake. Core was working, but groups of cores were
    not. Done signals were missing.


    2. Scheduler.
    In proof-of-concept design the scheduler does not have to be very
    realistic. For example, it does not have to be efficiently
    pipelined. But it should be able to monitor ready signals of all
    test units and to supply different initial data to different
    units. Supplying the same data to all units simultaneously is a
    sort of mistake that explains your observed behavior.

    There is really simple scheduling ATM. There is just a bunch of
    done signals that are gated together. Meaning the slowest
    calculation of the group will slow everything else down. Once all
    the dones are true the next set of numbers are loaded.

    3. Supervisor
    Again, in proof-of-concept design the supervisor does not have to
    be very realistic. But it should be able to selectively exercise
    all core feature related to alert, including fetching of initial
    data and clearing of alert.

    After you coded everything correctly, do not even try to compile
    test with 8000 units. That is totally unrealistic. Even if you
    find huge FPGA in which such number of testers could fit (and I
    am not sure that such FPGA exists) the compilation will take many
    hours, possibly days.



    Yeah 8,000 was definitely out of the picture. I had since coded
    24x24 matrix (576) and although the logic would fit that turned
    out not to be able to route. So, it is down to 20x20. 20x20 with
    no real scheduling fits.

    I tried displaying progress as an on-screen bitmap, but that just
    appeared noisy. I have been working on an SCP so may be able to use
    that to manage the scheduling with some custom logic.




    Does 20x20 mean 400 cores with something like 80-bit initial values
    and 128-bit processing?
    Yes. 128-bit processing.

    If you didn't make further mistakes then it should be rather big,
    order of 300K LEs. It would not fit in the biggest low-end Altera
    device (Cyclone 10GX 10CX220).

    IIRC it is about 150k LUTs. It uses the simplest approach, which is
    not very fast iteration wise, but it allows a lot of cores and a fast
    clock cycle time. One core is very small and has a fast iteration (1
    clock cycle). But it takes a lot more iterations than can be done
    with a more brainiac approach.

    Using a more brainiac approach would likely cut the performance in
    half and use a lot more LUTs.

    On the Xilinx side, it would not fit in any Spartan and in most
    Artix devices. Maybe it would fit in the biggest Artix
    UltraScale+ (AU25P). But more realistically for 400 test units one
    would want Arria 10 on Altera side or Kintex on Xilinx side. Both
    are not prohibitively expensive, but not cheap.

    What device are you using?
    I am not using a particularly inexpensive device. I have a Kintex-325
    with about 200k LUTs.

    If it is smaller than Kintex KU5P then I'd strongly suspect that you
    didn't clean out all mistakes.



    There could be more mistakes for sure. But I am sure the lowest level
    core works. It seemed to in simulation. I made a slight mod to it
    since I tested it last, checking n < startn.
    Counts are not used so the logic would be stripped out, but it's good
    for verification.

    1 Core is:

    module Collatz_conjecture(rst, clk, startn, count, done);
    input rst;
    input clk;
    input [127:0] startn;
    output reg [127:0] count;
    output reg done;

    localparam TRUE  = 1'b1;
    localparam FALSE = 1'b0;

    reg [127:0] n;

    always_ff @(posedge clk)
    if (rst) begin
      n <= startn;
      count <= 0;
      done <= FALSE;
    end
    else begin
      if (!done) begin
        // even: halve; odd: 3*n+1
        if (~n[0])
          n <= n >> 1;
        else
          n <= n + n + n + 1;
        // finished once the value drops below the starting value
        if (n < startn)
          done <= TRUE;
        count <= count + 1;
      end
    end

    endmodule




  • From BGB@cr88192@gmail.com to comp.arch on Sat Feb 21 12:27:38 2026
    From Newsgroup: comp.arch

    On 2/14/2026 1:18 PM, David Brown wrote:
    On 14/02/2026 19:48, MitchAlsup wrote:

    Thomas Koenig <tkoenig@netcologne.de> posted:

    Some people are throwing large amounts of computing power at trying
    to disprove the Collatz conjecture. Graphics cards have sped this
    up enormously.

    (A reminder, if any is needed: From a starting number n, the
    transformation, recursively applied,


    f(n) = 3*n+1, if n is odd; = n/2 if n is even.

    is conjectured to always reach 1 for any positive integer n).

    While n/2 remains even, n/2 is guaranteed to converge to 1.
    So, only the odd cases are interesting.

    3*n+1 is always even (since n was odd, and odd*odd is odd and
    odd+1 is even). So, a double iteration is (3*n+1)/2.

    Yes, that is often a micro-optimisation used.

    All the work is done on general-purpose hardware, which has many
    capabilities that are not needed, and consume area and power that
    special-purpose hardware would not need. Also, the hardware
    is limited to 64-bit integers, and the range of tested numbers
    is now up to 2^71, so

    Induction would say it has been proven by then.

    Would you like to re-think that? "Induction" says nothing of the sort.

    The conjecture has been proven true for all numbers up to 2^71
    (assuming that is the current figure) - it has most definitely /not/
    been proven to be true for all numbers. There are other things proven
    about it, such as the asymptotic rarity of numbers that are exceptions
    to the pattern, but no one has any notion of how to prove it in
    general.

    Being true up to 2^71 makes it reasonable to believe it is always
    true, but that is very different from being /proven/. There are
    plenty of mathematical conjectures that are true up to very large
    numbers, but fail later on.


    Comments?

    Interesting problem,
    waste of power.

    It is a mathematically very interesting problem. Whether there is
    anything to be gained by continuing the search for exceptions by
    brute force is a different matter - but /if/ it is worth doing, then
    doing so more efficiently is a good idea.

    FWIW:
    Is there any number likely to disprove it? No.
    Would running it out to some arbitrary precision prove it? No.

    Granted, finding a value where it failed would disprove it, but this is unlikely to happen.

    But, can note:
    For all positive even numbers, n/2 is less than n;
    For all positive odd numbers, n*3+1 is even;
    So, it becomes (n*3+1)/2
    For all positive even numbers:
    Probability of n/2 being an integer is 100%.
    Probability of n/2 being even is 50%.
    For (n*3+1)/2:
    Probability of this being even is 50%.
    So, 50% of the time the value gets bigger, 50% it gets smaller.
    Probability of a double odd: 25%
    Probability of a triple odd: 12.5%
    ...

    OK, so it would seem like the probability, for n, of finding a path
    that diverges becomes 1/2^k, where k=log2(n).

    Where, the paths that get smaller can't include a path that gets bigger
    unless the current path itself can get bigger, where the probability of
    this happening quickly approaches 0.

    If there were any place likely to find a divergence, it would be for
    small integers; this is empirically known to be false.


    Granted, saying that the probability approaches 0 is not the same as
    saying that the probability is 0.

    The probability of it being 0 seems directly proportional to the reciprocal:
    If p>(1/n): A divergence would be expected.
    If p<(1/n): A divergence would be impossible.
    But, p=1/n, as (1/n)=(1/(2**log2(n)))

    Though, since n*3+1 is always even, it is p=1/(2**ceil(log2(n))).

    1/(2**ceil(log2(n))) < (1/n) for all n that are not powers of 2 (it
    would only be equal for powers of 2, but powers of 2 always converge).

    So, it would seem like probability of divergence converges on 0.

    Dunno, though, this is just how it seems to me at the moment.


  • From Michael S@already5chosen@yahoo.com to comp.arch on Sat Feb 21 20:36:51 2026
    From Newsgroup: comp.arch

    On Sat, 21 Feb 2026 01:56:03 -0500
    Robert Finch <robfi680@gmail.com> wrote:

    On 2026-02-19 5:10 a.m., Michael S wrote:
    On Wed, 18 Feb 2026 22:56:19 -0500
    Robert Finch <robfi680@gmail.com> wrote:

    On 2026-02-18 11:22 a.m., Michael S wrote:
    On Sun, 15 Feb 2026 10:56:51 -0500
    Robert Finch <robfi680@gmail.com> wrote:

    I have coded a version that tests up to 8192 numbers
    simultaneously. It is supposed to be able to run at over 200
    MHz. Some of the numbers would take a large number of iterations
    though.

    I found something strange: the code to account for more numbers
    does not increase the size of the core very much. (But it takes
    longer to build.) I am not sure if this occurs because there is
    a mistake somewhere, or if the tools are able to figure out how
    to share all the logic. I am thinking the logic for adders is
    shared somehow. 128-bit arithmetic is being used.


    There is a mistake in your code. No other explanation is possible.

    You should look for 3 sorts of mistakes:
    1. Core itself.
    The most likely mistake is failing to look for one of the ending
    conditions. You have to look for 3 conditions:
    a) - value produced at the current step is smaller than the initial
    value. That is a successful ending. Here you signal to the scheduler
    that you are ready to accept the next value.
    b) - overflow. Next value exceeds the width of the register.
    c) - timeout. You were running for more than N steps. N=8192 looks
    like a reasonable value.
    b) and c) are suspected failures.
    The handling for b) and c) is similar - you raise an alert signal and
    stop. Your test unit has to provide the supervisor circuitry with the
    means to fetch the initial value that caused the suspected failure and
    to clear the alert condition.
    After the alert is cleared your unit returns to the same state as after
    success - waiting for the next initial value.

    I found the mistake. Core was working, but groups of cores were
    not. Done signals were missing.


    2. Scheduler.
    In proof-of-concept design the scheduler does not have to be very
    realistic. For example, it does not have to be efficiently
    pipelined. But it should be able to monitor ready signals of all
    test units and to supply different initial data to different
    units. Supplying the same data to all units simultaneously is a
    sort of mistake that explains your observed behavior.

    There is really simple scheduling ATM. There is just a bunch of
    done signals that are gated together. Meaning the slowest
    calculation of the group will slow everything else down. Once all
    the dones are true the next set of numbers are loaded.

    3. Supervisor
    Again, in proof-of-concept design the supervisor does not have to
    be very realistic. But it should be able to selectively exercise
    all core feature related to alert, including fetching of initial
    data and clearing of alert.

    After you coded everything correctly, do not even try to compile
    test with 8000 units. That is totally unrealistic. Even if you
    find huge FPGA in which such number of testers could fit (and I
    am not sure that such FPGA exists) the compilation will take many
    hours, possibly days.



    Yeah 8,000 was definitely out of the picture. I had since coded
    24x24 matrix (576) and although the logic would fit that turned
    out not to be able to route. So, it is down to 20x20. 20x20 with
    no real scheduling fits.

    I tried displaying progress as an on-screen bitmap, but that just
    appeared noisy. I have been working on an SCP so may be able to use
    that to manage the scheduling with some custom logic.




    Does 20x20 mean 400 cores with something like 80-bit initial values
    and 128-bit processing?
    Yes. 128-bit processing.

    If you didn't make further mistakes then it should be rather big,
    order of 300K LEs. It would not fit in the biggest low-end Altera
    device (Cyclone 10GX 10CX220).

    IIRC it is about 150k LUTs. It uses the simplest approach, which is
    not very fast iteration wise, but it allows a lot of cores and a fast
    clock cycle time. One core is very small and has a fast iteration (1
    clock cycle). But it takes a lot more iterations than can be done
    with a more brainiac approach.

    Using a more brainiac approach would likely cut the performance in
    half and use a lot more LUTs.


    According to my experiments, combining 2 steps has a very small
    negative effect on achievable clock and increases area by ~1.8x. So,
    to me it looks like a win.
    That's the combined step I am talking about:
    x = n/4;
    switch (n % 4) {
    case 0: n = x; break;
    case 1: n = 3*x+1; break;
    case 2: n = 3*x+2; break;
    case 3: n = 9*x+8; break;
    }



    On the Xilinx side, it would not fit in any Spartan and in most
    Artix devices. Maybe it would fit in the biggest Artix
    UltraScale+ (AU25P). But more realistically for 400 test units one
    would want Arria 10 on Altera side or Kintex on Xilinx side. Both
    are not prohibitively expensive, but not cheap.

    What device are you using?
    I am not using a particularly inexpensive device. I have a Kintex-325
    with about 200k LUTs.


    According to Xilinx marketing materials XC7K325T has 326080 equivalents
    of traditional LEs.
    That explains why your design fits.

    Pay attention that while Kintex evaluation boards can be
    inexpensive, right now buying devices for actual production work is
    rather costly, especially in small quantities.
    The lowest price I see on the digikey site is 1147 USD. That's a bad
    value relative to many other FPGAs.


    If it is smaller than Kintex KU5P then I'd strongly suspect that you
    didn't clean out all mistakes.



    There could be more mistakes for sure. But I am sure the lowest level
    core works. It seemed to in simulation. I made a slight mod to it
    since I tested it last, checking n < startn.
    Counts are not used so the logic would be stripped out, but it's good
    for verification.

    1 Core is:

    module Collatz_conjecture(rst, clk, startn, count, done);
    input rst;
    input clk;
    input [127:0] startn;
    output reg [127:0] count;
    output reg done;

    localparam TRUE  = 1'b1;
    localparam FALSE = 1'b0;

    reg [127:0] n;

    always_ff @(posedge clk)
    if (rst) begin
      n <= startn;
      count <= 0;
      done <= FALSE;
    end
    else begin
      if (!done) begin
        // even: halve; odd: 3*n+1
        if (~n[0])
          n <= n >> 1;
        else
          n <= n + n + n + 1;
        // finished once the value drops below the starting value
        if (n < startn)
          done <= TRUE;
        count <= count + 1;
      end
    end

    endmodule



    Ok.
    1. Your core misses one critical part - an overflow check.
    Assuming 80-bit initial values, the 128-bit register will overflow
    eventually. It would happen very rarely, but it would happen.
    The check for overflow does not have to be precise. A conservative
    approach, like looking at the 2 MS bits and one LS bit and signaling
    overflow not when it really happened but when it *could* happen, i.e.
    when the MS bits > 0 and the LS bit = '1', is good enough. But the
    check can't be omitted.

    2. I see that instead of calculating n = (n*3+1)/2 as one step you
    are doing it in two separate steps. I am not sure that it really
    makes your core smaller or faster. And it certainly increases the #
    of iterations by 1/3rd.

    3.
    A 128-bit count is a pointless loss of resources. 12 or 13 bits are
    sufficient. If the count exceeded that size then you already found
    something of high interest and it is the job of the supervisor to
    find out what exactly had happened.
    A 128-bit counter is actually not just too big, it is also pointless,
    because if we pretend to believe that our purpose is to disprove
    Collatz's conjecture then the supervisor would have to implement a
    stop condition with some sort of timeout that will effectively
    duplicate the functionality of the counter.

    4.
    Your module does not latch the value of startn. That means the caller
    module has to do it on its behalf, which is the most likely place where
    one can make a mistake.

    5.
    A coding style comment. Normally, the name rst is used for an
    asynchronous reset signal. In your design rst is synchronous. For the
    benefit of the casual reader I suggest finding a different name.


    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From BGB@cr88192@gmail.com to comp.arch on Sat Feb 21 13:48:02 2026
    From Newsgroup: comp.arch

    On 2/15/2026 5:50 AM, David Brown wrote:
    On 14/02/2026 22:22, MitchAlsup wrote:

    David Brown <david.brown@hesbynett.no> posted:

    On 14/02/2026 19:48, MitchAlsup wrote:

    Thomas Koenig <tkoenig@netcologne.de> posted:

    Some people are throwing large amounts of computing power at trying
    to disprove the Collatz conjecture. Graphics cards have sped this
    up enormously.

    (A reminder, if any is needed: From a starting number n, the
    transformation, recursively applied,

        f(n) = 3*n+1, if n is odd; = n/2 if n is even.

    is conjectured to always reach 1 for any positive integer n).

    While n/2 remains even, n/2 is guaranteed to converge to 1.
    So, only the odd cases are interesting.

    3*n+1 is always even (since n was odd, and odd*odd is odd and
    odd+1 is even). So, a double iteration is (3*n+1)/2.

    Yes, that is often a micro-optimisation used.

    All the work is done on general-purpose hardware, which has many
    capabilities that are not needed, and consumes area and power that
    special-purpose hardware would not need. Also, the hardware
    is limited to 64-bit integers, and the range of tested numbers
    is now up to 2^71, so

    Induction would say it has been proven by then.

    Would you like to re-think that? "Induction" says nothing of the sort.

    The conjecture has been proven true for all numbers up to 2 ^ 71
    (assuming that is the current figure) - it has most definitely /not/
    been proven to be true for all numbers. There are other things proven
    about it, such as the asymptotic rarity of numbers that are exceptions
    to the pattern, but no one has any notion of how to prove it in general.

    By proving it up to 2^71, you simultaneously prove that it cannot be
    proven by computer-like arithmetic processes. What if the number was
    2^71,000,000 - you still have not proven it. Thereby, it is not
    provable by simple computer arithmetic processes.


    All that has been proven is that if the conjecture is not true, then the
    smallest exception to the pattern is greater than 2 ^ 71. Maybe it is
    2 ^ 71 + 1. Current computer technology could take the search a lot
    further than just 2 ^ 71 - if someone wanted to pay for it! Future
    computer technology could perhaps greatly increase the bounds. Maybe
    Thomas's "useless machine" could lead to far greater bounds without
    excessive costs - and maybe research into that could lead to useful new
    technology or techniques with applications beyond an interesting maths
    problem.

    There are lots of examples of conjectures where the first exception is a
    high number, and all testing of low numbers fitted the conjectures.
    Sometimes such counter-examples are far beyond the reach of any
    brute-force computational testing, but sometimes not.

    We have not proven that the Collatz Conjecture can be disproven by
    brute-force testing - we have merely shown that doing so would be
    difficult, and seems unlikely to be successful. And we have known all
    along that computational methods can never prove a conjecture like this
    to be true - that much should be obvious.

    Testing conjectures like this can be helpful to mathematicians.
    Sometimes they can show patterns that can be used to get more insights -
    maybe there is a pattern in the numbers that lead to peaks of growth of
    the Collatz function before dropping towards 1. Such patterns can
    inspire proofs - or more targeted searches for exceptions.

    Ergo, you need a different means.

    You don't imagine that the only thing mathematicians have done with the Collatz Conjecture is test it to large numbers, do you?


    In theory, large numbers could disprove it, if an exception were found.
    However, testing fails to prove it either: if no counter-example is
    found, that doesn't prove anything.


    Being true up to 2 ^ 71 makes it reasonable to believe it is always
    true, but that is very different from being /proven/. There are plenty
    of mathematical conjectures that are true up to very large numbers, but
    fail later on.

    You cannot even /prove/ it using BigNum computer arithmetic in finite
    time.

    Surely everyone who has done high-school mathematics already realises
    that a conjecture like this cannot be proven by trial and error alone?
    Brute-force testing is about looking for exceptions to /disprove/ the
    conjecture, or gaining insights that can lead to its proof or disproof.


    Yeah, pretty much...

    My own thinking (mostly since looking at this thread, earlier today) has
    led me to realize first that the probability of any number containing a
    divergent path is less than the reciprocal of the value (related to a
    prior post in this thread).

    Then, I further realized that one doesn't just need *one* number with
    a divergent path, one needs an infinite series of them (since any
    non-divergent path immediately collapses back into a convergent path).

    Effectively, one has to multiply an infinite series of near-zero
    probabilities; for all values of n, the effective probability converges
    to 1/Inf, or 0...


    Probably doesn't count as a sufficient proof by standards of
    mathematical rigor, but at least for me, I can say that it seems true
    enough that it always converges (and will always continue to converge
    no matter how big the numbers get).

    While in some hand-wavy sense one could try to push the probability to
    infinity to defeat the infinite reciprocal, infinity does not exist
    within the set of positive integers...


    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From EricP@ThatWouldBeTelling@thevillage.com to comp.arch on Sat Feb 21 15:39:42 2026
    From Newsgroup: comp.arch

    Robert Finch wrote:
    On 2026-02-19 5:10 a.m., Michael S wrote:

    If it is smaller than Kintex KU5P then I'd strongly suspect that you
    didn't clean out all mistakes.



    There could be more mistakes for sure. But I am sure the lowest level
    core works. It seemed to in simulation. I made a slight mod to it since
    I tested it last, checking n < startn.
    Counts are not used, so that logic would be stripped out, but it's good
    for verification.

    1 Core is:

    module Collatz_conjecture(rst, clk, startn, count, done);
    input rst;
    input clk;
    input [127:0] startn;
    output reg [127:0] count;
    output reg done;
    reg [127:0] n;

    always_ff @(posedge clk)
    if (rst) begin
        n <= startn;
        count <= 0;
        done <= FALSE;
    end
    else begin
        if (!done) begin
            if (~n[0])
                n <= n >> 2'd1;
            else
                n <= n+n+n+1;
            if (n < startn)
                done <= TRUE;
            count <= count + 1;
        end
    end

    endmodule

    I have a minimal knowledge of Verilog and may be wrong but...

    - it looks to me in the code as written that startn, count and done args
    are coded either input or output, but are actually both read and written.
    The compiler should have given an error as that implies multiple drivers
    for those wires (the caller module and the assignments).
    Possibly it didn't detect an error because it defaults to data type
    "logic" which has 4 states, one of which is tri-state 'z'.

    Maybe setting the data types to 'bit' would allow it to detect the error.

    Changing them to "inout" would turn them all into tri-state buses which
    is probably not what you want.
    Possibly have separate args for input and output:

    input bit [127:0] iCount;
    output bit [127:0] oCount;

    - I don't know how smart your compiler is but the >> shift operator can
    be replaced by a bit slice operation that only routes wires:
    if (~n[0])
    n <= {1'b0, n[127:1]};

    - it can eliminate a 128 bit adder by shifting the wires left:
    else
    n <= {n[126:0], 1'b0} + n + 1;




    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Terje Mathisen@terje.mathisen@tmsw.no to comp.arch on Sat Feb 21 21:40:26 2026
    From Newsgroup: comp.arch

    BGB wrote:
    My own thinking (mostly since looking at this thread, earlier today) has
    led me to realize first that the probability of any number containing a
    divergent path is less than the reciprocal of the value (related to a
    prior post in this thread).

    Then, I further realized that one doesn't just need *one* number with
    a divergent path, one needs an infinite series of them (since any
    non-divergent path immediately collapses back into a convergent path).
    You do have to detect an infinite loop, but that's easy to do:
    Even if you don't employ the classic one-step/two-step approach, simply
    stopping after 8k iterations and then letting the supervisor check it
    properly would be fine.
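    A minimal sketch of that division of labor, in C (`check_with_cap` and
    the particular cap value are names and choices invented here, not from
    the thread): the fast path gives up after a fixed number of iterations
    and escalates the case to a supervisor instead of looping forever.

    ```c
    #include <stdint.h>

    /* Result of the fast checker: either the value dropped below its
     * starting point (all smaller values were already verified), or we
     * hit the iteration cap and the case is handed to a supervisor for
     * a proper big-integer, cycle-detecting check. */
    enum result { CONVERGED, NEEDS_SUPERVISOR };

    static enum result check_with_cap(uint64_t startn, unsigned cap)
    {
        uint64_t n = startn;
        for (unsigned i = 0; i < cap; i++) {
            n = (n & 1) ? 3*n + 1 : n >> 1;   /* one Collatz step */
            if (n < startn)
                return CONVERGED;
        }
        return NEEDS_SUPERVISOR;   /* cap hit: escalate, don't loop */
    }
    ```

    The cap only needs to be generous enough that escalations are rare;
    anything the fast path gives up on is re-checked properly anyway, so
    the cap cannot cause a counter-example to be missed.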
    Terje
    --
    - <Terje.Mathisen at tmsw.no>
    "almost all programming can be viewed as an exercise in caching"
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From BGB@cr88192@gmail.com to comp.arch on Sat Feb 21 15:17:54 2026
    From Newsgroup: comp.arch

    On 2/21/2026 2:40 PM, Terje Mathisen wrote:
    You do have to detect an infinite loop, but that's easy to do:
    Even if you don't employ the classic one-step/two-step approach, simply
    stopping after 8k iterations and then letting the supervisor check it
    properly would be fine.


    Yeah, assuming it is being tested.

    Another option could be to detect an overflow from whatever size of
    integer is used, assuming the starting point is not within a region
    where near-immediate overflow is likely.


    Seemingly the definitions on Wikipedia don't appear to exclude n==0,
    which would lead to an infinite loop: 0 is even, and 0/2==0. One can
    safely assume 0 is excluded, both because including it would make the
    conjecture trivially false and because they say "positive integer",
    which excludes 0 (unlike "natural number", which in some conventions
    includes it).

    But, yeah, it would be false if the input set included 0.






    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.arch on Sun Feb 22 03:16:58 2026
    From Newsgroup: comp.arch

    On Sat, 21 Feb 2026 15:39:42 -0500
    EricP <ThatWouldBeTelling@thevillage.com> wrote:

    Robert Finch wrote:
    On 2026-02-19 5:10 a.m., Michael S wrote:

    If it is smaller than Kintex KU5P then I'd strongly suspect that
    you didn't clean out all mistakes.



    There could be more mistakes for sure. But I am sure the lowest
    level core works. It seemed to in simulation. I made a slight mod
    to it since I tested it last, checking n < startn.
    Counts are not used so the logic would be stripped out, but its
    good for verification.

    1 Core is:

    module Collatz_conjecture(rst, clk, startn, count, done);
    input rst;
    input clk;
    input [127:0] startn;
    output reg [127:0] count;
    output reg done;
    reg [127:0] n;

    always_ff @(posedge clk)
    if (rst) begin
        n <= startn;
        count <= 0;
        done <= FALSE;
    end
    else begin
        if (!done) begin
            if (~n[0])
                n <= n >> 2'd1;
            else
                n <= n+n+n+1;
            if (n < startn)
                done <= TRUE;
            count <= count + 1;
        end
    end

    endmodule

    I have a minimal knowledge of Verilog and may be wrong but...

    I am not a Verilog expert either (I use VHDL exclusively), but to me the
    code of Robert Finch looks OK from the HDL perspective (functionality
    is something else; I addressed it in my earlier reply).
    His Verilog appears to be an exact equivalent of the VHDL code below,
    which is perfectly legal synthesizable code under VHDL-2008 rules.


    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    entity Collatz_conjecture is
      port (
        rst    : in  boolean;
        clk    : in  std_logic;
        startn : in  unsigned(127 downto 0);
        count  : out unsigned(127 downto 0);
        done   : out boolean
      );
    end entity Collatz_conjecture;

    architecture a of Collatz_conjecture is
      signal n : unsigned(127 downto 0);
    begin
      process (clk)
      begin
        if rising_edge(clk) then
          if rst then
            n     <= startn;
            count <= (others => '0');
            done  <= false;
          elsif not done then
            if n(0)='0' then
              n <= '0' & n(n'high downto 1);
            else
              n <= n+n+n+1;
            end if;

            if n < startn then
              done <= true;
            end if;

            count <= count + 1;
          end if;
        end if;
      end process;
    end architecture a;


    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.arch on Sun Feb 22 10:46:00 2026
    From Newsgroup: comp.arch

    On Sat, 21 Feb 2026 20:36:51 +0200
    Michael S <already5chosen@yahoo.com> wrote:


    Using a more brainiac approach would likely cut the performance in
    half and use a lot more LUTs.


    According to my experiments, combining 2 steps has a very small
    negative effect on achievable clock and increases area by ~1.8x. So,
    to me it looks like a win.
    This is the combined step I am talking about:

    x = n/4;
    switch (x % 4) {
    case 0: n = x; break;
    case 1: n = 3*x+1; break;
    case 2: n = 3*x+2; break;
    case 3: n = 9*x+8; break;
    }




    Mistake in C above. Should be
    switch (n % 4) {
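    With that correction, the combined map can be checked against two
    applications of the usual shortcut step (n -> n/2 for even n, and
    n -> (3n+1)/2 for odd n). Note that for n = 4x+2 the two-step result
    works out to 3x+2: the value halves to 2x+1, and (3*(2x+1)+1)/2 = 3x+2.
    In this sketch, `two_steps` and `shortcut_step` are names invented
    here, not from the thread:

    ```c
    #include <stdint.h>

    /* One "shortcut" Collatz step: n/2 for even n, (3n+1)/2 for odd n
     * (3n+1 is always even when n is odd, so the halving is free). */
    static uint64_t shortcut_step(uint64_t n)
    {
        return (n & 1) ? (3*n + 1) / 2 : n / 2;
    }

    /* The combined two-step map, switching on the two low bits of n. */
    static uint64_t two_steps(uint64_t n)
    {
        uint64_t x = n / 4;
        switch (n % 4) {
        case 0:  return x;          /* even, even: two halvings */
        case 1:  return 3*x + 1;    /* odd, then even */
        case 2:  return 3*x + 2;    /* even, then odd */
        default: return 9*x + 8;    /* odd, then odd  */
        }
    }
    ```

    Checking two_steps(n) == shortcut_step(shortcut_step(n)) over a range
    of n confirms each case of the table.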

    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From BGB@cr88192@gmail.com to comp.arch on Sun Feb 22 04:50:11 2026
    From Newsgroup: comp.arch

    On 2/21/2026 3:17 PM, BGB wrote:
    Yeah, assuming it is being tested.

    Another option could be to detect an overflow from whatever size of
    integer is used, assuming the starting point is not within a region
    where near-immediate overflow is likely.

    Seemingly the definitions on Wikipedia don't appear to exclude n==0,
    which would lead to an infinite loop: 0 is even, and 0/2==0. One can
    safely assume 0 is excluded, both because including it would make the
    conjecture trivially false and because they say "positive integer",
    which excludes 0 (unlike "natural number", which in some conventions
    includes it).

    But, yeah, it would be false if the input set included 0.


    I realized a detail that I missed earlier:
    What you were likely trying to point out was not whether cases exist
    which shoot towards +Inf, but rather whether one can prove that no
    paths exist which loop back on themselves.

    I guess the latter is harder to prove.


    I can see something visually:
    In this case, the number line can be seen as a spiral along a cone,
    where each turn represents a power of 2 (each turn has twice the
    circumference of the prior turn).

    So, in this case, dividing by 2 drops down 1 turn (or, if it reaches
    the point of the cone, it reaches 1).

    3*n+1 effectively moves 1.5+1 turns up, but then immediately drops down
    1 turn, so in effect it moves by 0.5 turn, with the +1 moving it to the
    next position within this turn.

    This means that for a cycle to exist, there would need to exist a turn
    where the loop proceeds through every position within that turn. But
    this in turn can only happen if the next level up advances through
    every position in that level (forming a zigzag pattern across the
    levels), with this pattern extending towards +Inf. That pattern could
    only exist if *no* paths converge, so in effect such a path does not
    exist.


    Or, in effect, since it is achieving a 1/2 turn + 1 in phase space,
    there will not exist any closed loops.

    Essentially it is the same basic reason why no two adjacent spots can
    collide on the same index within modulo indexing.


    In effect, the whole space turns into non-intersecting spiral paths
    headed toward 1 (and, when paths merge, all roads lead to 1).


    Probably doesn't really count as a proof though...








    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Terje Mathisen@terje.mathisen@tmsw.no to comp.arch on Sun Feb 22 12:45:44 2026
    From Newsgroup: comp.arch

    Michael S wrote:
    On Sat, 21 Feb 2026 20:36:51 +0200
    Michael S <already5chosen@yahoo.com> wrote:


    Using a more brainiac approach would likely cut the performance in
    half and use a lot more LUTs.


    According to my experiments, combining 2 steps has a very small
    negative effect on achievable clock and increases area by ~1.8x. So,
    to me it looks like a win.
    That's the combined step I am talking about:
    x = n/4;
    switch (x % 4) {
    case 0: n = x; break;
    case 1: n = 3*x+1; break;
    case 2: n = 2*x+2; break;
    case 3: n = 9*x+8; break;
    }




    Mistake in C above. Should be
    switch (n % 4) {

    I did notice that. :-)

    The one thing that worries me (sw on a 64-bit platform) about the code
    is the 9* on 128-bit variables:

    9*x+8 =>

    Do we use SHLD + SHL here or something else?

    How about MUL & LEA?

    ;; Input in r10:r9, output in rdx:rax
    mov rax,r9
    mul rbx ;; RBX == 9, RDX:RAX = RAX*9
    lea r10,[r10+r10*8]
    add rdx,r10

    That looks like 5-6 clock cycles, so the branch misses from the switch statement would probably dominate unless you do as I suggested and use
    lookup tables instead:

    let bot2 = n & 3;
    let x = n >> 2;
    n = x*multab[bot2] + addtab[bot2];

    but if we do that, then (at least for a sw implementation) it would be
    better to pick a lot more of the LS bits, at least 8-12?

    Terje
    --
    - <Terje.Mathisen at tmsw.no>
    "almost all programming can be viewed as an exercise in caching"
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From David Brown@david.brown@hesbynett.no to comp.arch on Sun Feb 22 14:43:11 2026
    From Newsgroup: comp.arch

    On 21/02/2026 20:48, BGB wrote:
    On 2/15/2026 5:50 AM, David Brown wrote:
    On 14/02/2026 22:22, MitchAlsup wrote:

    David Brown <david.brown@hesbynett.no> posted:

    On 14/02/2026 19:48, MitchAlsup wrote:

    Thomas Koenig <tkoenig@netcologne.de> posted:

    Some people are throwing large amounts of computing power at trying
    to disprove the Collatz conjecture.  Graphics cards have sped this
    up enormously.

    (A reminder, if any is needed:  From a starting number n, the
    transformation, recursively applied,


         f(n) = 3*n+1, if n is odd; = n/2 if n is even.

    is conjectured to always reach 1 for any positive integer n).

    While n/2 remains even, n/2 is guaranteed to converge to 1.
    So, only the odd cases are interesting.

    3*n+1 is always even (since n was odd, and odd*odd is odd and
    odd+1 is even). So, a double iteration is (3*n+1)/2.

    Yes, that is often a micro-optimisation used.

    All the work is done on general-purpose hardware, which has many
    capabilities that are not needed, and consume area and power that
    special-purpose hardware would not need.  Also, the hardware
    is limited to 64-bit integers, and the range of tested numbers
    is now up to 2^71, so

    Induction would say it has been proven by then.

    Would you like to re-think that?  "Induction" says nothing of the sort.
    The conjecture has been proven true for all numbers up to 2 ^ 71
    (assuming that is the current figure) - it has most definitely /not/
    been proven to be true for all numbers.  There are other things proven
    about it, such as the asymptotic rarity of numbers that are exceptions
    to the pattern, but no one has any notion of how to prove it in
    general.

    By proving up to 2^71, you simultaneously prove that it cannot be proven
    by computer-like arithmetic processes. What if the number was
    2^71,000,000? You still have not proven it. Thereby, it is not provable
    by simple computer arithmetic processes.


    All that has been proven is that if the conjecture is not true, then
    the smallest exception to the pattern is greater than 2 ^ 71.  Maybe
    it is 2 ^ 71 + 1.  Current computer technology could take the search a
    lot further than just 2 ^ 71 - if someone wanted to pay for it!
    Future computer technology could perhaps greatly increase the bounds.
    Maybe Thomas's "useless machine" could lead to far greater bounds
    without excessive costs - and maybe research into that could lead to
    useful new technology or techniques with applications beyond an
    interesting maths problem.

    There are lots of examples of conjectures where the first exception is
    a high number, and all testing of low numbers fitted the conjectures.
    Sometimes such counter-examples are far beyond the reach of any brute-
    force computational testing, but sometimes not.

    We have not proven that the Collatz Conjecture can be disproven by
    brute-force testing - we have merely shown that doing so would be
    difficult, and seems unlikely to be successful.  And we have known all
    along that computational methods can never prove a conjecture like
    this to be true - that much should be obvious.

    Testing conjectures like this can be helpful to mathematicians.
    Sometimes they can show patterns that can be used to get more insights
    - maybe there is a pattern in the numbers that lead to peaks of growth
    of the Collatz function before dropping towards 1.  Such patterns can
    inspire proofs - or more targeted searches for exceptions.

    Ergo, you need a different means.

    You don't imagine that the only thing mathematicians have done with
    the Collatz Conjecture is test it to large numbers, do you?


    In theory, large numbers could disprove it, if an exception were found. However, it fails to prove it either; and if no value is found (or would
    be found) it doesn't prove anything.


    Being true up to 2 ^ 71 makes it reasonable to believe it is always
    true, but that is very different from being /proven/.  There are plenty
    of mathematical conjectures that are true up to very large numbers, but
    fail later on.

    You cannot even /prove/ it using BigNum computer arithmetic in
    finite time.

    Surely everyone who has done high-school mathematics already realises
    that a conjecture like this cannot be proven by trial and error alone?
    Brute-force testing is about looking for exceptions to /disprove/ the
    conjecture, or gaining insights that can lead to its proof or disproof.


    Yeah, pretty much...

    My own thinking (mostly since looking at this thread, earlier today) has
    led me to realize first that the probability of any number containing a divergent path is less than the reciprocal of the value (related to
    prior post in this thread).


    Mathematicians don't consider someone's gut feelings as proof. At best,
    it can be an inspiration about what to focus on. Random guesses about
    what someone thinks is the probability of something are rarely of any
    use to anyone.

    It does not even make sense to talk about "probability" here - a number
    either leads to a divergent path, a convergent path ending in 1, or a different cycle. It is not a random process.

    You can make statements about the density of numbers that do not lead to
    1, or limits to the density. There has been a fair amount of work done
    on the Collatz Conjecture in this way, putting limits on the proportion
    of numbers that are divergent or lead to other cycles (it turns out they
    are very rare, if they exist at all). But those limits do not come from guesses, they come from proofs.

    Then, further realized that, since one doesn't just need *one* number
    with the probability of a divergent path, one needs an infinite series (since any non-divergent path immediately collapses back into a
    convergent path).

    You can't prove that a number leads to a divergent path by trial and
    error, even if you find such a number. But if someone found a number
    that appeared to lead to a divergent path, it may be possible to prove
    that it diverges by other methods. If alternative cycles exist, then
    they could be found by trial and error (though no one is hopeful that
    this will be practically possible).


    Effectively, one has to multiply an infinite series of near 0
    probabilities. Or, effectively, for all values of n, the effective probability of a non-zero probability converges to 1/Inf, or, 0...


    Such arguments are invariably nonsense unless you qualify them far
    better - especially when "probability" has no meaning here.


    Probably doesn't count as a sufficient proof by standards of
    mathematical rigor, but at least for me, I can say that it seems true
    enough that it always converges (and, will always continue to converge
    no matter how big the numbers get).


    You are correct that it doesn't count as proof.

    While in some hand wavy sense, one could try to push the probability to infinity to defeat the infinite reciprocal, infinity does not exist
    within the set of positive integers...



    Let me give another argument in the same line as yours - with the same
    level of "hand-wavy sense" and the same level of completely invalid mathematics.

    With a ruler (unmarked) and compass, you can construct lots of things by algorithmic sequences. You can, for example, bisect an arbitrary angle.
    You can construct an exact 65537-sided polygon.  So if we take a
    random sequence of N steps, maybe it can be used to trisect an arbitrary angle. It's unlikely, but it is obvious that it is plausible. So we
    can say that there is a probability "p(N)" that any sequence of length N
    will trisect an arbitrary angle. No one has found such a trisector
    algorithm yet, so we know p(N) is small. However, it is also obvious
    that as the number of steps grows, so does this probability (always
    limited by 1).

    So now consider q(N) = 1 - p(N), the probability that no sequence of
    length N can trisect an angle.  As we have reasoned, q(N) will start at 1
    (you can't trisect an angle in one step), but will drop below 1 and
    decrease. The probability that /no/ sequence of steps will be able to
    trisect an angle is thus the product of all q(N). It's a product of a decreasing sequence of numbers between 1 and 0 - clearly it converges to
    0. Thus at some point, there has to be a sequence of steps that can
    trisect an angle - we have proved its existence, even though we have not figured out how to do it.


    The problem with this argument is that it is bollocks. Real mathematics
    has proved that trisection of arbitrary angles by ruler and compass is impossible.

    We don't yet know if the Collatz Conjecture is true or not. But your "probability argument" is of no more worth there than it is for
    trisecting angles.





    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From David Brown@david.brown@hesbynett.no to comp.arch on Sun Feb 22 15:14:31 2026
    From Newsgroup: comp.arch

    On 21/02/2026 19:27, BGB wrote:
    On 2/14/2026 1:18 PM, David Brown wrote:
    On 14/02/2026 19:48, MitchAlsup wrote:

    Thomas Koenig <tkoenig@netcologne.de> posted:

    Some people are throwing large amounts of computing power at trying
    to disprove the Collatz conjecture.  Graphics cards have sped this
    up enormously.

    (A reminder, if any is needed:  From a starting number n, the
    transformation, recursively applied,


         f(n) = 3*n+1, if n is odd; = n/2 if n is even.

    is conjectured to always reach 1 for any positive integer n).

    While n/2 remains even, n/2 is guaranteed to converge to 1.
    So, only the odd cases are interesting.

    3*n+1 is always even (since n was odd, and odd*odd is odd and
    odd+1 is even). So, a double iteration is (3*n+1)/2.

    Yes, that is often a micro-optimisation used.

    All the work is done on general-purpose hardware, which has many
    capabilities that are not needed, and consume area and power that
    special-purpose hardware would not need.  Also, the hardware
    is limited to 64-bit integers, and the range of tested numbers
    is now up to 2^71, so

    Induction would say it has been proven by then.

    Would you like to re-think that?  "Induction" says nothing of the sort.

    The conjecture has been proven true for all numbers up to 2 ^ 71
    (assuming that is the current figure) - it has most definitely /not/
    been proven to be true for all numbers.  There are other things proven
    about it, such as the asymptotic rarity of numbers that are exceptions
    to the pattern, but no one has any notion of how to prove it in general.

    Being true up to 2 ^ 71 makes it reasonable to believe it is always
    true, but that is very different from being /proven/.  There are
    plenty of mathematical conjectures that are true up to very large
    numbers, but fail later on.


    Comments?

    Interesting problem,
    waste of power.

    It is a mathematically very interesting problem.  Whether there is
    anything to be gained by continuing the search for exceptions by brute
    force, is a different matter - but /if/ it is worth doing, then doing
    so more efficiently is a good idea.

    FWIW:
    Is there any number likely to disprove it? No.

    What justification do you have for that claim?

    There's a big difference between saying that tests on sample numbers
    have not given any sign of a way to find exceptions to the conjecture,
    and talking about what is "likely" or not.

    If a number is found that has a different cycle, or a number is found
    that can be proven to lead to a divergent path, then that would disprove
    the conjecture.


    Would running it out to some arbitrary precision prove it? No.

    Correct.


    Granted, finding a value where it failed would disprove it, but this is unlikely to happen.


    Again, you can't talk about "likely" or "unlikely" - you are giving quantitative (albeit vague) probabilities to something that is not
    random, so you should be careful about what you mean. There /have/ been proofs made about limits to the densities of possible exceptions, but
    that only gives you limits to the probability of finding an exception by arbitrarily picking a number within a given range and testing it.

    But, can note:
      For all positive even numbers, n/2 is less than n;
      For all positive odd numbers, n*3+1 is even;
        So, it becomes (n*3+1)/2
      For all positive even numbers:
        Probability of n/2 being an integer is 100%.
        Probability of n/2 being even is 50%.
      For (n*3+1)/2:
        Probability of this being even is 50%.
          So, 50% value gets bigger, 50% it gets smaller.
        Probability of a double odd: 25%
        Probability of a triple odd: 12.5%
        ...

    OK, so it would seem like the probability, for N, of finding a path
    that diverges becomes 1/2^k, where k=log2(n).


    No. You are mixing statistical patterns and probabilities. The
    statistics are that most people who play in a lottery, lose out - but
    the probability that someone wins is high. And once the lottery is
    drawn, probabilities make no sense for individuals - either they won, or
    they lost. (The "Collatz lottery" has already been drawn, even though
    we don't know the results.)

    Your argument for the statistics is fair, and can be seen as a visual
    pattern if you draw a graph of the path length against the starting
    value. It has been proven that this pattern continues - there is a
    bound to the density of exceptions that are outside that pattern. But
    that does not tell you anything about probabilities of the path-length
    or cycle for any given case.

    Where, the paths that get smaller can't include a path that gets bigger unless the current path itself can get bigger, where the probability of
    this happening quickly approaches 0.

    If there were any place likely to find a divergence, it would be for
    small integers; this is empirically known to be false.


    Granted, saying that the probability approaches 0 is not the same as
    saying that the probability is 0.

    The probability of it being 0 seems directly proportional to the
    reciprocal:
    -a If p>(1/n): A divergence would be expected.
    -a If p<(1/n): A divergence would be impossible.
    -a But, p=1/n, as (1/n)=(1/(2**log2(n)))

    Though, since n*3+1 is always even, it is p=1/(2**ceil(log2(n))).

    1/(2**ceil(log2(n))) < (1/n) for all non-even integers (would only be
    equal for powers of 2, but powers of 2 always converge).

    So, it would seem like probability of divergence converges on 0.

    Dunno, though, this is just how it seems to me at the moment.



    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.arch on Sun Feb 22 16:35:02 2026
    From Newsgroup: comp.arch

    On Sun, 22 Feb 2026 12:45:44 +0100
    Terje Mathisen <terje.mathisen@tmsw.no> wrote:

    Michael S wrote:
    On Sat, 21 Feb 2026 20:36:51 +0200
    Michael S <already5chosen@yahoo.com> wrote:


    Using a more brainiac approach would likely cut the performance in
    half and use a lot more LUTs.


    According to my experiments, combining 2 steps has a very small
    negative effect on achievable clock and increases area by ~1.8x. So,
    to me it looks like a win.
    That's the combined step I am talking about:
    x = n/4;
    switch (x % 4) {
    case 0: n = x; break;
    case 1: n = 3*x+1; break;
    case 2: n = 2*x+2; break;
    case 3: n = 9*x+8; break;
    }




    Mistake in C above. Should be
    switch (n % 4) {

    I did notice that. :-)

    The one thing that worries me (sw on a 64-bit platform) about the
    code is the 9* on 128-bit variables:

    9*x+8 =>

    Do we use SHLD + SHL here or something else?

    How about MUL & LEA?

    ;; Input in r10:r9, output in rdx:rax
    mov rax,r9
    mul rbx ;; RBX == 9, RDX:RAX = RAX*9
    lea r10,[r10+r10*8]
    add rdx,r10

    That looks like 5-6 clock cycles, so the branch misses from the
    switch statement would probably dominate unless you do as I suggested
    and use lookup tables instead:

    let bot2 = n & 3;
    let x = n >> 2;
    n = x*multab[bot2] + addtab[bot2];

    but if we do that, then (at least for a sw implementation) it would
    be better to pick a lot more of the LS bits, at least 8-12?

    Terje


    This variant is not intended for software implementation.
    For software it is obviously sub-optimal for the reasons you suggested.
    OTOH, for FPGA it seems to work a little better than both simpler and
    more complicated steps.

    I presented the code in C only because C is lingua franca of
    comp.arch.

    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Stefan Monnier@monnier@iro.umontreal.ca to comp.arch on Sun Feb 22 11:41:34 2026
    From Newsgroup: comp.arch

    It is a mathematically very interesting problem. Whether there is anything to be gained by continuing the search for exceptions by brute force, is
    a different matter - but /if/ it is worth doing, then doing so more efficiently is a good idea.

    AFAIK, in such games "more efficiently" tends to be largely irrelevant
    if it's more efficient by a (non-astronomical) constant factor.
    Instead, you need it to be algorithmically more efficient.


    === Stefan
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From BGB@cr88192@gmail.com to comp.arch on Sun Feb 22 12:38:31 2026
    From Newsgroup: comp.arch

    On 2/22/2026 7:43 AM, David Brown wrote:
    On 21/02/2026 20:48, BGB wrote:
    On 2/15/2026 5:50 AM, David Brown wrote:
    On 14/02/2026 22:22, MitchAlsup wrote:

    David Brown <david.brown@hesbynett.no> posted:

    On 14/02/2026 19:48, MitchAlsup wrote:

    Thomas Koenig <tkoenig@netcologne.de> posted:

    Some people are throwing large amounts of computing power at trying
    to disprove the Collatz conjecture.  Graphics cards have sped this
    up enormously.

    (A reminder, if any is needed:  From a starting number n, the
    transformation, recursively applied,


         f(n) = 3*n+1, if n is odd; = n/2 if n is even.

    is conjectured to always reach 1 for any positive integer n).

    While n/2 remains even, n/2 is guaranteed to converge to 1.
    So, only the odd cases are interesting.

    3*n+1 is always even (since n was odd, and odd*odd is odd and
    odd+1 is even). So, a double iteration is (3*n+1)/2.

    Yes, that is often a micro-optimisation used.

    All the work is done on general-purpose hardware, which has many
    capabilities that are not needed, and consume area and power that
    special-purpose hardware would not need.  Also, the hardware
    is limited to 64-bit integers, and the range of tested numbers
    is now up to 2^71, so

    Induction would say it has been proven by then.

    Would you like to re-think that?  "Induction" says nothing of the
    sort.

    The conjecture has been proven true for all numbers up to 2 ^ 71
    (assuming that is the current figure) - it has most definitely /not/
    been proven to be true for all numbers.  There are other things proven
    about it, such as the asymptotic rarity of numbers that are exceptions
    to the pattern, but no one has any notion of how to prove it in
    general.

    By proving up to 2^71, you simultaneously prove that it cannot be proven
    by computer-like arithmetic processes. What if the number was
    2^71,000,000? You still have not proven it. Thereby, it is not provable
    by simple computer arithmetic processes.


    All that has been proven is that if the conjecture is not true, then
    the smallest exception to the pattern is greater than 2 ^ 71.  Maybe
    it is 2 ^ 71 + 1.  Current computer technology could take the search
    a lot further than just 2 ^ 71 - if someone wanted to pay for it!
    Future computer technology could perhaps greatly increase the bounds.
    Maybe Thomas's "useless machine" could lead to far greater bounds
    without excessive costs - and maybe research into that could lead to
    useful new technology or techniques with applications beyond an
    interesting maths problem.

    There are lots of examples of conjectures where the first exception
    is a high number, and all testing of low numbers fitted the
    conjectures. Sometimes such counter-examples are far beyond the reach
    of any brute- force computational testing, but sometimes not.

    We have not proven that the Collatz Conjecture can be disproven by
    brute-force testing - we have merely shown that doing so would be
    difficult, and seems unlikely to be successful.  And we have known
    all along that computational methods can never prove a conjecture
    like this to be true - that much should be obvious.

    Testing conjectures like this can be helpful to mathematicians.
    Sometimes they can show patterns that can be used to get more
    insights - maybe there is a pattern in the numbers that lead to peaks
    of growth of the Collatz function before dropping towards 1.  Such
    patterns can inspire proofs - or more targeted searches for exceptions.

    Ergo, you need a different means.

    You don't imagine that the only thing mathematicians have done with
    the Collatz Conjecture is test it to large numbers, do you?


    In theory, large numbers could disprove it, if an exception were found.
    However, it fails to prove it either; and if no value is found (or
    would be found) it doesn't prove anything.


    Being true up to 2 ^ 71 makes it reasonable to believe it is always
    true, but that is very different from being /proven/.  There are plenty
    of mathematical conjectures that are true up to very large numbers, but
    fail later on.

    You cannot even /prove/ it using BigNum computer arithmetic in
    finite time.

    Surely everyone who has done high-school mathematics already realises
    that a conjecture like this cannot be proven by trial and error
    alone? Brute-force testing is about looking for exceptions to
    /disprove/ the conjecture, or gaining insights that can lead to its
    proof or disproof.


    Yeah, pretty much...

    My own thinking (mostly since looking at this thread, earlier today)
    has led me to realize first that the probability of any number
    containing a divergent path is less than the reciprocal of the value
    (related to prior post in this thread).


    Mathematicians don't consider someone's gut feelings as proof.  At best,
    it can be an inspiration about what to focus on.  Random guesses about
    what someone thinks is the probability of something are rarely of any
    use to anyone.

    It does not even make sense to talk about "probability" here - a number
    either leads to a divergent path, a convergent path ending in 1, or a
    different cycle.  It is not a random process.

    You can make statements about the density of numbers that do not lead to
    1, or limits to the density.  There has been a fair amount of work done
    on the Collatz Conjecture in this way, putting limits on the proportion
    of numbers that are divergent or lead to other cycles (it turns out they
    are very rare, if they exist at all).  But those limits do not come
    from guesses, they come from proofs.

    Then, further realized that, since one doesn't just need *one* number
    with the probability of a divergent path, one needs an infinite series
    (since any non-divergent path immediately collapses back into a
    convergent path).

    You can't prove that a number leads to a divergent path by trial and
    error, even if you find such a number.  But if someone found a number
    that appeared to lead to a divergent path, it may be possible to prove
    that it diverges by other methods.  If alternative cycles exist, then
    they could be found by trial and error (though no one is hopeful that
    this will be practically possible).


    Effectively, one has to multiply an infinite series of near 0
    probabilities. Or, effectively, for all values of n, the effective
    probability of a non-zero probability converges to 1/Inf, or, 0...


    Such arguments are invariably nonsense unless you qualify them far
    better - especially when "probability" has no meaning here.


    Probably doesn't count as a sufficient proof by standards of
    mathematical rigor, but at least for me, I can say that it seems true
    enough that it always converges (and, will always continue to converge
    no matter how big the numbers get).


    You are correct that it doesn't count as proof.

    While in some hand wavy sense, one could try to push the probability
    to infinity to defeat the infinite reciprocal, infinity does not exist
    within the set of positive integers...



    Let me give another argument in the same line as yours - with the same
    level of "hand-wavy sense" and the same level of completely invalid mathematics.

    With a ruler (unmarked) and compass, you can construct lots of things by
    algorithmic sequences.  You can, for example, bisect an arbitrary angle.
    You can construct an exact 65537-sided polygon.  So if we take a
    random sequence of N steps, maybe it can be used to trisect an arbitrary
    angle.  It's unlikely, but it is obvious that it is plausible.  So we
    can say that there is a probability "p(N)" that any sequence of length N
    will trisect an arbitrary angle.  No one has found such a trisector
    algorithm yet, so we know p(N) is small.  However, it is also obvious
    that as the number of steps grows, so does this probability (always
    limited by 1).

    So now consider q(N) = 1 - p(N), the probability that no sequence of
    length N can trisect an angle.  As we have reasoned, q(N) will start at 1
    (you can't trisect an angle in one step), but will drop below 1 and
    decrease.  The probability that /no/ sequence of steps will be able to
    trisect an angle is thus the product of all q(N).  It's a product of a
    decreasing sequence of numbers between 1 and 0 - clearly it converges to
    0.  Thus at some point, there has to be a sequence of steps that can
    trisect an angle - we have proved its existence, even though we have not
    figured out how to do it.


    The problem with this argument is that it is bollocks.  Real mathematics
    has proved that trisection of arbitrary angles by ruler and compass is
    impossible.

    We don't yet know if the Collatz Conjecture is true or not.  But your
    "probability argument" is of no more worth there than it is for
    trisecting angles.


    I was initially seeing it more like a very large hash table.
    But, I have since realized it is not a hash table, it is a more
    organized pattern.

    But, yeah, not a math person, and I will make no claim of having
    "proven" anything...

    Rather, my aim was mostly to try to start organizing stuff enough to
    where I could get a sense of why it works, and doesn't explode off into infinite numbers.


    But, will note that this whole area is one major reason I am not really
    into math or physics. As a practical tool, they are useful, or can be
    seen as interesting detours.

    But, proofs kinda ruin the whole experience.


    But, yeah, at least for me, now seeing it as a sort of cone of
    power-of-2 rings with all paths forming into spirals which end at 1,
    each odd step effectively going half-way around a spiral formed from the number line but then ending up at an incremented position, etc, seems
    like a way to see why it does what it does.

    Even if it doesn't count for anything, it isn't much worse than trying
    to test it empirically, which as noted, also does not count (it could
    only count if an exception could be found, but an exception won't be
    found no matter how big the search space becomes, so...).

    But, no proofs to be found here, I will not make this claim, in any case.

    ...

    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From EricP@ThatWouldBeTelling@thevillage.com to comp.arch on Sun Feb 22 14:48:13 2026
    From Newsgroup: comp.arch

    Michael S wrote:
    On Sat, 21 Feb 2026 15:39:42 -0500
    EricP <ThatWouldBeTelling@thevillage.com> wrote:

    Robert Finch wrote:
    On 2026-02-19 5:10 a.m., Michael S wrote:

    If it is smaller than Kintex KU5P then I'd strongly suspect that
    you didn't clean out all mistakes.



    There could be more mistakes for sure. But I am sure the lowest
    level core works. It seemed to in simulation. I made a slight mod
    to it since I tested it last, checking n < startn.
    Counts are not used so the logic would be stripped out, but it's
    good for verification.

    1 Core is:

    module Collatz_conjecture(rst, clk, startn, count, done);
    input rst;
    input clk;
    input [127:0] startn;
    output reg [127:0] count;
    output reg done;
    reg [127:0] n;

    always_ff @(posedge clk)
      if (rst) begin
        n <= startn;
        count <= 0;
        done <= FALSE;
      end
      else begin
        if (!done) begin
          if (~n[0])
            n <= n >> 2'd1;
          else
            n <= n+n+n+1;
          if (n < startn)
            done <= TRUE;
          count <= count + 1;
        end
      end

    endmodule
    I have a minimal knowledge of Verilog and may be wrong but...

    I am not a Verilog expert either (use VHDL exclusively), but to me the
    code of Robert Finch looks o.k from the HDL perspective (functionality
    is something else, I addressed it in my reply below).
    His Verilog appears to be an exact equivalent of VHDL code below,
    which is perfectly legal synthesizable code under VHDL-2008 rules.


    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    entity Collatz_conjecture is
      port (
        rst    : in  boolean;
        clk    : in  std_logic;
        startn : in  unsigned(127 downto 0);
        count  : out unsigned(127 downto 0);
        done   : out boolean
      );
    end entity Collatz_conjecture;

    architecture a of Collatz_conjecture is
      signal n : unsigned(127 downto 0);
    begin
      process (clk)
      begin
        if rising_edge(clk) then
          if rst then
            n <= startn;
            count <= (others => '0');
            done <= false;
          elsif not done then
            if n(0)='0' then
              n <= '0' & n(n'high downto 1);
            else
              n <= n+n+n+1;
            end if;

            if n < startn then
              done <= true;
            end if;

            count <= count + 1;
          end if;
        end if;
      end process;
    end architecture a;



    Well then I just don't get this at all.
    I have a VHDL textbook that explicitly states that IN ports are input only,
    OUT ports are output only, and INOUT are input and output (tri-state).

    The language spec SystemVerilog IEEE_Std-1800-2017
    (it's a horrid language - the spec is 1315 pages long,
    and so complex that no compiler implements all of it,
    and some compilers still use the pre-2005 language spec)

    says in section "23.3.3.2 Port connection rules for variables" that
    "Assignments to variables declared as input ports shall be illegal." and
    "Procedural or continuous assignments to a variable connected to the
    output port of an instance shall be illegal."

    but it also says "An output port can be connected to a net (or a
    concatenation) of a compatible data type. In this case, multiple drivers
    shall be permitted on the net"

    and in "23.3.3.1 Port coercion
    A port that is declared as input (output) but used as an output (input)
    or inout may be coerced to inout. If not coerced to inout, a warning
    shall be issued."

    but never defines what "coerced" means or how to do it.

    Verilog-2017 also has the "uwire" net type which is a uni-driver type
    that only allows a single driver for the wire.
    (But some compilers don't support uwire)

    "6.6.2 Unresolved nets
    The uwire net is an unresolved or unidriver wire and is used to model nets
    that allow only a single driver. The uwire type can be used to enforce
    this restriction. It shall be an error to connect any bit of a uwire net
    to more than one driver. It shall be an error to connect a uwire net to a
    bidirectional terminal of a bidirectional pass switch.

    The port connection rule in 23.3.3.6 enforces this restriction across
    the net hierarchy or shall issue a warning if not."

    "23.3.3.6 Single source nets (uwire)
    If the net on either side of a port has the net type uwire,
    a warning shall be issued if the nets are not merged
    into a single net, as described in 23.3.3.7."

    and the spec contradicts itself as to whether it is an error or a warning.

    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.arch on Sun Feb 22 23:25:47 2026
    From Newsgroup: comp.arch

    On Sun, 22 Feb 2026 14:48:13 -0500
    EricP <ThatWouldBeTelling@thevillage.com> wrote:

    Michael S wrote:
    On Sat, 21 Feb 2026 15:39:42 -0500
    EricP <ThatWouldBeTelling@thevillage.com> wrote:

    Robert Finch wrote:
    On 2026-02-19 5:10 a.m., Michael S wrote:

    If it is smaller than Kintex KU5P then I'd strongly suspect that
    you didn't clean out all mistakes.



    There could be more mistakes for sure. But I am sure the lowest
    level core works. It seemed to in simulation. I made a slight mod
    to it since I tested it last, checking n < startn.
    Counts are not used so the logic would be stripped out, but it's
    good for verification.

    1 Core is:

    module Collatz_conjecture(rst, clk, startn, count, done);
    input rst;
    input clk;
    input [127:0] startn;
    output reg [127:0] count;
    output reg done;
    reg [127:0] n;

    always_ff @(posedge clk)
    if (rst) begin
    n <= startn;
    count <= 0;
    done <= FALSE;
    end
    else begin
    if (!done) begin
    if (~n[0])
    n <= n >> 2'd1;
    else
    n <= n+n+n+1;
    if (n < startn)
    done <= TRUE;
    count <= count + 1;
    end
    end

    endmodule
    I have a minimal knowledge of Verilog and may be wrong but...

    I am not a Verilog expert either (I use VHDL exclusively), but to me
    the code of Robert Finch looks o.k. from the HDL perspective
    (functionality is something else; I addressed it in my reply
    below). His Verilog appears to be an exact equivalent of the VHDL code
    below, which is perfectly legal synthesizable code under VHDL-2008
    rules.


    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    entity Collatz_conjecture is
    port (
    rst : in boolean;
    clk : in std_logic;
    startn : in unsigned(127 downto 0);
    count : out unsigned(127 downto 0);
    done : out boolean
    );
    end entity Collatz_conjecture;

    architecture a of Collatz_conjecture is
    signal n : unsigned(127 downto 0);
    begin
    process (clk)
    begin
    if rising_edge(clk) then
    if rst then
    n <= startn;
    count <= (others => '0');
    done <= false;
    elsif not done then
    if n(0)='0' then
    n <= '0' & n(n'high downto 1);
    else
    n <= n+n+n+1;
    end if;

    if n < startn then
    done <= true;
    end if;

    count <= count + 1;
    end if;
    end if;
    end process;
    end architecture a;



    Well then I just don't get this at all.
    I have a VHDL textbook that explicitly states that IN ports are input
    only, OUT ports are output only, and INOUT are input and output
    (tri-state).


    Sounds like an old textbook, not updated with the VHDL-2008 changes
    related to what is legal for 'out' ports.
    See, for example, here: https://docs.amd.com/r/en-US/ug901-vivado-synthesis/Reading-Output-Ports


    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Robert Finch@robfi680@gmail.com to comp.arch on Sun Feb 22 19:12:49 2026
    From Newsgroup: comp.arch

    On 2026-02-21 1:36 p.m., Michael S wrote:
    On Sat, 21 Feb 2026 01:56:03 -0500
    Robert Finch <robfi680@gmail.com> wrote:

    On 2026-02-19 5:10 a.m., Michael S wrote:
    On Wed, 18 Feb 2026 22:56:19 -0500
    Robert Finch <robfi680@gmail.com> wrote:

    On 2026-02-18 11:22 a.m., Michael S wrote:
    On Sun, 15 Feb 2026 10:56:51 -0500
    Robert Finch <robfi680@gmail.com> wrote:

    I have coded a version that tests up to 8192 number
    simultaneously. It is supposed to be able to run at over 200
    MHz. Some of the number would take a large number of iterations
    though.

    I found something strange, the code to count for more numbers
    does not increase the size of the core very much. (But it takes
    longer to build.) I am not sure if this occurs because there is
    a mistake somewhere, or if the tools are able to figure out how
    to share all the logic. I am thinking the logic for adders is
    shared somehow. 128-bit arithmetic is being used.


    There is a mistake in your code. No other explanation is possible.

    You should look for 3 sorts of mistakes
    1. Core itself.
    The most likely mistake is failing to look for one of the ending
    conditions. You have to look for 3 conditions:
    a) - value produced at current step is smaller than initial value.
    That is successful ending. Here you signal to the scheduler
    that you are ready to accept the next value.
    b) - overflow. Next value exceeds the width of register.
    c) - timeout. You were running for more than N steps. N=8192 looks
    like the reasonable value.
    b) and c) are suspected failures.
    The handling for b) and c) is similar - you raise an alert signal and
    stop. Your test unit has to provide supervisor circuitry with the
    means to fetch the initial value that caused the suspected failure and
    to clear the alert condition.
    After alert is cleared your unit returns to the same state as
    after success - waiting for next initial value.

    I found the mistake. Core was working, but groups of cores were
    not. Done signals were missing.


    2. Scheduler.
    In proof-of-concept design the scheduler does not have to be very
    realistic. For example, it does not have to be efficiently
    pipelined. But it should be able to monitor ready signals of all
    test units and to supply different initial data to different
    units. Supplying the same data to all units simultaneously is a
    sort of mistake that explains your observed behavior.

    There is really simple scheduling ATM. There is just a bunch of
    done signals that are gated together. Meaning the slowest
    calculation of the group will slow everything else down. Once all
    the dones are true the next set of numbers are loaded.

    3. Supervisor
    Again, in proof-of-concept design the supervisor does not have to
    be very realistic. But it should be able to selectively exercise
    all core feature related to alert, including fetching of initial
    data and clearing of alert.

    After you coded everything correctly, do not even try to compile
    test with 8000 units. That is totally unrealistic. Even if you
    find huge FPGA in which such number of testers could fit (and I
    am not sure that such FPGA exists) the compilation will take many
    hours, possibly days.



    Yeah 8,000 was definitely out of the picture. I had since coded
    24x24 matrix (576) and although the logic would fit that turned
    out not to be able to route. So, it is down to 20x20. 20x20 with
    no real scheduling fits.

    I tried displaying progress as an on-screen bitmap, but that just
    appeared noisy. I have been working on an SCP so may be able to use
    that to manage the scheduling with some custom logic.




    Does 20x20 mean 400 cores with something like 80-bit initial values
    and 128-bit processing?
    Yes. 128-bit processing.

    If you didn't make further mistakes then it should be rather big,
    order of 300K LEs. It would not fit in the biggest low-end Altera
    device (Cyclone 10GX 10CX220).

    IIRC it is about 150k LUTs. It uses the simplest approach, which is
    not very fast iteration wise, but it allows a lot of cores and a fast
    clock cycle time. One core is very small and has a fast iteration (1
    clock cycle). But it takes a lot more iterations than can be done
    with a more brainiac approach.

    Using a more brainiac approach would likely cut the performance in
    half and use a lot more LUTs.


    According to my experiments, combining 2 steps has a very small
    negative effect on achievable clock and increases area by ~1.8x. So,
    to me it looks like a win.
    That's the combined step I am talking about:
    x = n/4;
    switch (n % 4) {
    case 0: n = x; break;
    case 1: n = 3*x+1; break;
    case 2: n = 3*x+2; break;
    case 3: n = 9*x+8; break;
    }



    On the Xilinx side, it would not fit in any Spartan or in most
    Artix devices. Maybe it would fit in the biggest Artix
    UltraScale+ (AU25P). But more realistically for 400 test units one
    would want Arria 10 on Altera side or Kintex on Xilinx side. Both
    are not prohibitively expensive, but not cheap.

    What device are you using?
    I am not using a particularly inexpensive device. I have a Kintex-325 with
    about 200k LUTs.


    According to Xilinx marketing materials XC7K325T has 326080 equivalents
    of traditional LEs.
    That explains why your design fits.

    Pay attention that while Kintex evaluation boards could be
    non-expensive, right now buying devices for actual production work is
    rather costly, especially in small quantities.
    The lowest price I see on the Digikey site is 1147 USD. That's a bad
    value relative to many other FPGAs.


    If it is smaller than Kintex KU5P then I'd strongly suspect that you
    didn't clean out all mistakes.



    There could be more mistakes for sure. But I am sure the lowest level
    core works. It seemed to in simulation. I made a slight mod to it
    since I tested it last, checking n < startn.
    Counts are not used so the logic would be stripped out, but it's good
    for verification.

    1 Core is:

    module Collatz_conjecture(rst, clk, startn, count, done);
    input rst;
    input clk;
    input [127:0] startn;
    output reg [127:0] count;
    output reg done;
    reg [127:0] n;

    always_ff @(posedge clk)
    if (rst) begin
    n <= startn;
    count <= 0;
    done <= FALSE;
    end
    else begin
    if (!done) begin
    if (~n[0])
    n <= n >> 2'd1;
    else
    n <= n+n+n+1;
    if (n < startn)
    done <= TRUE;
    count <= count + 1;
    end
    end

    endmodule



    Ok.
    1. Your core misses one critical part - an overflow check.
    Assuming an 80-bit initial value, the 128-bit register will overflow
    eventually. It would happen very rarely, but it would happen.
    The check for overflow does not have to be precise. A conservative
    approach, like looking at the 2 MS bits and one LS bit and signaling
    overflow not when it really happened but when it *could* happen, i.e.
    when MS bits > 0 and LS bit = '1', is good enough. But the check can't
    be omitted.

    2. I see that instead of calculating n = (n*3+1)/2 as one step you are
    doing it in two separate steps. I am not sure that it really makes
    your core smaller or faster. And it certainly increases the # of
    iterations by 1/3rd.

    3.
    A 128-bit count is a pointless loss of resources. 12 or 13 bits are
    sufficient. If the count exceeded that size then you already found
    something of high interest and it is the job of the supervisor to find
    out what exactly had happened.
    The 128-bit counter is actually not just too big, it is also pointless,
    because if we pretend to believe that our purpose is to disprove
    Collatz's conjecture then the supervisor would have to implement a stop
    condition with some sort of timeout that would effectively duplicate
    the functionality of the counter.

    4.
    Your module does not latch the value of startn. It means that the caller
    module has to do it on its behalf. That is the most likely place where
    one can make a mistake.

    5.
    Coding style comment. Normally, the name rst is used for an asynchronous
    reset signal. In your design rst is a synchronous signal. For the
    benefit of the casual reader I suggest finding a different name.


    Re-wrote the core using Michael's suggested combined iteration. It looks
    something like the code below.
    Renamed the rst signal to ld to better reflect its usage.
    Timing is close to the same, but not as many cores can fit. It is a
    trade-off: fewer cores, but they process more iterations in one go. The
    net result is more numbers processed in the same length of time.
    I am using 16x16 cores now. Overflow check using 130-bit numbers.

    import const_pkg::*;

    module Collatz_conjecture(clk, ld, startn, done, overflow);
    input clk;
    input ld;
    input [127:0] startn;
    output reg done;
    output reg overflow;

    reg [129:0] n;

    always_ff @(posedge clk)
    if (ld)
    n <= 130'(startn);
    else begin
    if (!done)
    case(n[1:0])
    2'd0: n <= n >> 2'd2;
    2'd1: n <= n[129:2]+n[129:2]+n[129:2]+2'd1;
    2'd2: n <= n[129:2]+n[129:2]+n[129:2]+2'd2;
    2'd3: n <= {n[129:2],3'd0}+n[129:2]+4'd8;
    endcase
    end

    always_ff @(posedge clk)
    if (ld)
    done <= FALSE;
    else begin
    if (!done) begin
    if (n < startn || (|n[129:128]))
    done <= TRUE;
    end
    end

    always_ff @(posedge clk)
    if (ld)
    overflow <= FALSE;
    else begin
    if (!done) begin
    if (|n[129:128])
    overflow <= TRUE;
    end
    end

    endmodule


    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From BGB@cr88192@gmail.com to comp.arch on Sun Feb 22 18:41:50 2026
    From Newsgroup: comp.arch

    On 2/19/2026 2:27 PM, Stefan Monnier wrote:
    I must say I find this whole thread baffling: when searching for
    a counter example and we're already in the order of 2^70, it seems clear
    that constant-factor improvements are completely irrelevant (even with
    large constants).

    I guess it all boils down to the "useless" part of the title?


    Yeah, basically.

    It is useless in the sense that no matter how big you go, an exception
    won't be found, rendering further testing essentially meaningless
    (it doesn't prove anything unless an exception is found, and an
    exception won't be found if none can exist).



    I did feel tempted to write a program to draw a graph...

    Well, then did so, and posted a partial graph on X: https://x.com/cr88192/status/2025698159519817849

    Though, a linear/log mismatch makes it look wonky, and I got annoyed
    fiddling with trying to make the code that draws the graph not make it
    look like a mess.

    It is like, now realizing how it works and the fairly obvious nature of
    it being true, but also realizing that I am not in a position to be
    able to prove it as such, is making me feel stupid.

    But, alas, there are reasons I am not really a math or physics person...



    Well, at least excluding the guys who go around promoting wacky/crazy
    ideas like recursive compression and perpetual motion machines and
    similar (or, all the "quantum consciousness" stuff and similar),
    wouldn't really want to be associated with this crowd either...

    Actually, this latter camp I don't get:
    If you can easily test things and see that they don't work, what is the
    point of continuing to promote them as-if they do?...



    Or, like the whole fusion thing:
    Can someone make it happen? Yes.
    Can one get net positive energy output? Not so much.

    Or, the seeming futility of trying to send humans on interstellar trips. Basically, no way to send full sized humans interstellar within any
    plausible timescale or resource/energy budget. That said, not entirely convinced it is outside the reach of machines (so, my sci-fi stuff
    wouldn't involve sending humans, rather machines with enough to try to bootstrap the process).

    Ship can get to target system, find some asteroids or similar to begin consuming, building itself up into a more significant structure, then
    begins building other machinery it needs; building whatever it needs to initiate the colonization process. Even sending a full-set of organic
    samples is a bit of a stretch, so more likely it would preferentially bootstrap high priority species from actual cell lines, but may need
    backup strategies (recreating the whole biosphere from yeast or similar
    if need-be). In some sense, the main "payload" not likely being lots of organic matter, but enough dense information storage to be able to
    initiate the bootstrap process (and a machine just smart enough to be
    able to figure out how to find suitable asteroids and start the process).

    Galactic civilizations? No. Any such colonies being unlikely to realize
    the others exist, much less have any real way to communicate with them (light-speed limits effectively making any sort of large scale
    "civilization" all but impossible).

    Etc.



    Well, except in cases where I am feeling half tempted to do stories that
    just throw realism entirely out the window, "yeah, errm, how about we
    have wooden pirate ships, in space, fighting giant robots?..." maybe
    take the "solar wind" part a little too literally: have air in space and
    the ships just sorta tack-sailing on the wind ("moment of inertia be
    damned", things still move like they were floating on water, but in
    space). Stuff basically works like in 1980s cartoons (like, you know,
    the ones that mostly existed as glorified toy commercials). Then the
    worlds' physics are based less on realism and more a mix of "rule of
    cool" and "rule of funny".

    Say, also:
    Ships battle off giant robots by shooting at them with old style canons;
    People can expand to 50ft tall to fist-fight said robots;
    Rocket flatulence (can fly around by essentially farting fire);
    Trinket based costume changes;
    Giant robots vs dinosaurs (maybe also in space, dinosaurs also having
    rocket flatulence, and fire breath, ...);
    Obvious plot armor;
    People can escape restraints by burning them off with eye lasers;
    ...

    But, say:
    Characters are not just going around like Superman or something (because
    then there are no stakes), and thus only able to use their
    higher-powered abilities when the plot demands it.

    Did recently throw a fragment of this sort of idea at Grok Imagine, but
    while it didn't give exactly what I was imagining, it was in the same
    general area:
    https://x.com/cr88192/status/2025021520507011292

    But, alas in this case the video generation only gives a few seconds
    (for much more than this, would cost money...).


    Because, alas, reality is just kinda depressing sometimes...


    Well, except a while back, when I did write a story set in a sort of reverse-urban-fantasy setting:
    As opposed to the typical "fantasy stuff in an urban setting";
    More "stereotypical urban stuff in a fantasy setting".

    So, more rap-battling gangsta elves and stuff...


    Then again, "pirate ships in space" wouldn't exactly be too far out of
    place in a fantasy world. Could pass it off as "sci-fi" in a loose sense
    in that while it is fairly obviously a fantasy setting, many of the
    characters are not themselves aware that there is anything unusual with
    them living within a fantasy-world magic system. But, then may as well
    borrow ideas from "Rainbow Brite" or something, and have the ships
    powered by "Star Sprinkles" or such (well, and puts a limit on how much characters can do: The more powerful magic they use, the more rapidly
    they use up their supply of star sprinkles...).

    ...



    But, Alas...


    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Robert Finch@robfi680@gmail.com to comp.arch on Sun Feb 22 19:47:32 2026
    From Newsgroup: comp.arch

    On 2026-02-22 7:12 p.m., Robert Finch wrote:
    On 2026-02-21 1:36 p.m., Michael S wrote:
    On Sat, 21 Feb 2026 01:56:03 -0500
    Robert Finch <robfi680@gmail.com> wrote:

    On 2026-02-19 5:10 a.m., Michael S wrote:
    On Wed, 18 Feb 2026 22:56:19 -0500
    Robert Finch <robfi680@gmail.com> wrote:
    On 2026-02-18 11:22 a.m., Michael S wrote:
    On Sun, 15 Feb 2026 10:56:51 -0500
    Robert Finch <robfi680@gmail.com> wrote:
    I have coded a version that tests up to 8192 number
    simultaneously. It is supposed to be able to run at over 200
    MHz. Some of the number would take a large number of iterations
    though.

    I found something strange, the code to count for more numbers
    does not increase the size of the core very much. (But it takes
    longer to build.) I am not sure if this occurs because there is
    a mistake somewhere, or if the tools are able to figure out how
    to share all the logic. I am thinking the logic for adders is
    shared somehow. 128-bit arithmetic is being used.

    There is a mistake in your code. No other explanation is possible.
    You should look for 3 sorts of mistakes
    1. Core itself.
    The most likely mistake is failing to look for one of the ending
    conditions. You have to look for 3 conditions:
    a) - value produced at current step is smaller than initial value.
    That is successful ending. Here you signal to the scheduler
    that you are ready to accept the next value.
    b) - overflow. Next value exceeds the width of register.
    c) - timeout. You were running for more than N steps. N=8192 looks
    like the reasonable value.
    b) and c) are suspected failures.
    The handling for b) and c) is similar - you raise an alert signal and
    stop. Your test unit has to provide supervisor circuitry with the
    means to fetch the initial value that caused the suspected failure and
    After alert is cleared your unit returns to the same state as
    after success - waiting for next initial value.

    I found the mistake. Core was working, but groups of cores were
    not. Done signals were missing.

    2. Scheduler.
    In proof-of-concept design the scheduler does not have to be very
    realistic. For example, it does not have to be efficiently
    pipelined. But it should be able to monitor ready signals of all
    test units and to supply different initial data to different
    units. Supplying the same data to all units simultaneously is a
    sort of mistake that explains your observed behavior.
    There is really simple scheduling ATM. There is just a bunch of
    done signals that are gated together. Meaning the slowest
    calculation of the group will slow everything else down. Once all
    the dones are true the next set of numbers are loaded.
    3. Supervisor
    Again, in proof-of-concept design the supervisor does not have to
    be very realistic. But it should be able to selectively exercise
    all core feature related to alert, including fetching of initial
    data and clearing of alert.

    After you coded everything correctly, do not even try to compile
    test with 8000 units. That is totally unrealistic. Even if you
    find huge FPGA in which such number of testers could fit (and I
    am not sure that such FPGA exists) the compilation will take many
    hours, possibly days.


    Yeah 8,000 was definitely out of the picture. I had since coded
    24x24 matrix (576) and although the logic would fit that turned
    out not to be able to route. So, it is down to 20x20. 20x20 with
    no real scheduling fits.

    I tried displaying progress as an on-screen bitmap, but that just
    appeared noisy. I have been working on an SCP so may be able to use
    that to manage the scheduling with some custom logic.



    Does 20x20 mean 400 cores with something like 80-bit initial values
    and 128-bit processing?
    Yes. 128-bit processing.

    If you didn't make further mistakes then it should be rather big,
    order of 300K LEs. It would not fit in the biggest low-end Altera
    device (Cyclone 10GX 10CX220).

    IIRC it is about 150k LUTs. It uses the simplest approach, which is
    not very fast iteration wise, but it allows a lot of cores and a fast
    clock cycle time. One core is very small and has a fast iteration (1
    clock cycle). But it takes a lot more iterations than can be done
    with a more brainiac approach.

    Using a more brainiac approach would likely cut the performance in
    half and use a lot more LUTs.


    According to my experiments, combining 2 steps has a very small negative
    effect on achievable clock and increases area by ~1.8x. So, to me it
    looks like a win.
    That's the combined step I am talking about:
      x = n/4;
      switch (n % 4) {
        case 0: n = x; break;
        case 1: n = 3*x+1; break;
        case 2: n = 3*x+2; break;
        case 3: n = 9*x+8; break;
      }



    On the Xilinx side, it would not fit in any Spartan or in most
    Artix devices. Maybe it would fit in the biggest Artix
    UltraScale+ (AU25P). But more realistically for 400 test units one
    would want Arria 10 on Altera side or Kintex on Xilinx side. Both
    are not prohibitively expensive, but not cheap.

    What device are you using?
    I am not using a particularly inexpensive device. I have a Kintex-325 with
    about 200k LUTs.


    According to Xilinx marketing materials XC7K325T has 326080 equivalents
    of traditional LEs.
    That explains why your design fits.

    Pay attention that while Kintex evaluation boards could be
    non-expensive, right now buying devices for actual production work is
    rather costly, especially in small quantities.
    The lowest price I see on the Digikey site is 1147 USD. That's a bad
    value relative to many other FPGAs.


    If it is smaller than Kintex KU5P then I'd strongly suspect that you
    didn't clean out all mistakes.


    There could be more mistakes for sure. But I am sure the lowest level
    core works. It seemed to in simulation. I made a slight mod to it
    since I tested it last, checking n < startn.
    Counts are not used so the logic would be stripped out, but it's good
    for verification.

    1 Core is:

    module Collatz_conjecture(rst, clk, startn, count, done);
    input rst;
    input clk;
    input [127:0] startn;
    output reg [127:0] count;
    output reg done;
    reg [127:0] n;

    always_ff @(posedge clk)
    if (rst) begin
        n <= startn;
        count <= 0;
        done <= FALSE;
    end
    else begin
        if (!done) begin
            if (~n[0])
                n <= n >> 2'd1;
            else
                n <= n+n+n+1;
            if (n < startn)
                done <= TRUE;
            count <= count + 1;
        end
    end

    endmodule



    Ok.
    1. Your core misses one critical part - an overflow check.
    Assuming an 80-bit initial value, the 128-bit register will overflow
    eventually. It would happen very rarely, but it would happen.
    The check for overflow does not have to be precise. A conservative
    approach, like looking at the 2 MS bits and one LS bit and signaling
    overflow not when it really happened but when it *could* happen, i.e.
    when MS bits > 0 and LS bit = '1', is good enough. But the check can't
    be omitted.

    2. I see that instead of calculating n = (n*3+1)/2 as one step you are
    doing it in two separate steps. I am not sure that it really makes
    your core smaller or faster. And it certainly increases the # of
    iterations by 1/3rd.

    3.
    A 128-bit count is a pointless loss of resources. 12 or 13 bits are
    sufficient. If the count exceeded that size then you already found
    something of high interest and it is the job of the supervisor to find
    out what exactly had happened.
    The 128-bit counter is actually not just too big, it is also pointless,
    because if we pretend to believe that our purpose is to disprove
    Collatz's conjecture then the supervisor would have to implement a stop
    condition with some sort of timeout that would effectively duplicate
    the functionality of the counter.

    4.
    Your module does not latch the value of startn. It means that the caller
    module has to do it on its behalf. That is the most likely place where
    one can make a mistake.

    5.
    Coding style comment. Normally, the name rst is used for an asynchronous
    reset signal. In your design rst is a synchronous signal. For the
    benefit of the casual reader I suggest finding a different name.


    Re-wrote the core using Michael's suggested combined iteration. It looks
    something like the code below.
    Renamed the rst signal to ld to better reflect its usage.
    Timing is close to the same, but not as many cores can fit. It is a
    trade-off: fewer cores, but they process more iterations in one go. The
    net result is more numbers processed in the same length of time.
    I am using 16x16 cores now. Overflow check using 130-bit numbers.

    import const_pkg::*;

    module Collatz_conjecture(clk, ld, startn, done, overflow);
    input clk;
    input ld;
    input [127:0] startn;
    output reg done;
    output reg overflow;

    reg [129:0] n;

    always_ff @(posedge clk)
    if (ld)
        n <= 130'(startn);
    else begin
        if (!done)
            case(n[1:0])
            2'd0:   n <= n >> 2'd2;
            2'd1:   n <= n[129:2]+n[129:2]+n[129:2]+2'd1;
            2'd2:   n <= n[129:2]+n[129:2]+n[129:2]+2'd2;
            2'd3:   n <= {n[129:2],3'd0}+n[129:2]+4'd8;
            endcase
    end

    always_ff @(posedge clk)
    if (ld)
        done <= FALSE;
    else begin
        if (!done) begin
            if (n < startn || (|n[129:128]))
                done <= TRUE;
        end
    end

    always_ff @(posedge clk)
    if (ld)
        overflow <= FALSE;
    else begin
        if (!done) begin
            if (|n[129:128])
                overflow <= TRUE;
        end
    end

    endmodule


    Just noticed overflow needs a couple of extra bits, since it is multiplying by 9.
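As a sanity check on the case arms, here is a quick Python sketch (mine, not part of the module) that re-derives what each arm of case(n[1:0]) should produce and checks it against plain Collatz steps. Note that for n = 4k+2 the halve / odd-step / halve sequence lands on 3k+2:

```python
def collatz_step(n):
    # one plain Collatz step
    return 3 * n + 1 if n & 1 else n // 2

def two_bit_step(n):
    # what each arm of case(n[1:0]) should compute, with k = n >> 2:
    #   n=4k:   two halvings      -> k
    #   n=4k+1: (3n+1)/4          -> 3k+1
    #   n=4k+2: (3(n/2)+1)/2      -> 3k+2
    #   n=4k+3: (3((3n+1)/2)+1)/2 -> 9k+8
    k = n >> 2
    return (k, 3 * k + 1, 3 * k + 2, 9 * k + 8)[n & 3]

# every arm result must appear on the plain Collatz trajectory of n
for n in range(4, 5000):
    m, target = n, two_bit_step(n)
    for _ in range(8):
        m = collatz_step(m)
        if m == target:
            break
    assert m == target, n
```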

    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From David Brown@david.brown@hesbynett.no to comp.arch on Mon Feb 23 10:51:12 2026
    From Newsgroup: comp.arch

    On 22/02/2026 19:38, BGB wrote:

    I was initially seeing it more like a very large hash table.
    But, I have since realized it is not a hash table, it is a more
    organized pattern.

    As far as we know, the result of applying the Collatz function
    (calculating the number of steps to reach 1, if it ever does) follows a
    statistical pattern, but no clear pattern for individual values. There
    are a lot of mathematical functions like that - a good example being the
    number of prime factors in a number. You can draw graphs and visually
    see nice patterns - some of which can be proven, while others are still
    unproven and may not hold at a big enough scale. Sometimes there are
    tricks or hints that improve your chances when guessing which values
    might be outliers.

    There /is/ a pattern here, in that the Collatz function is fully
    deterministic and thus generates the pattern. But it is not clear if
    there is any other way to determine the individual values.


    But, yeah, not a math person, and I will make no claim of having
    "proven" anything...

    Rather, my aim was mostly to try to start organizing stuff enough to
    where I could get a sense of why it works, and doesn't explode off into infinite numbers.


    If you can explain that, that would be a huge breakthrough! The maths
    showing the average growth rates is clear enough (you already showed
    it), and it has been formalised to show that for most numbers, there are limits on their growth. But that doesn't say anything about whether or
    not there are some numbers for which the Collatz function diverges, or
    some numbers for which they form a cycle other than the 4, 2, 1 cycle. (Remember, there are two possible ways for numbers to fail the Collatz Conjecture - not just diverging ones.)


    But, will note that this whole area is one major reason I am not really
    into math or physics. As a practical tool, they are useful, or can be
    seen as interesting detours.

    But, proofs kinda ruin the whole experience.


    I don't think any mathematicians see the Collatz Conjecture as a
    practical tool! But sometimes practical tools come out of working on
    such problems. And often it is the proofs involved in such maths that
    are the attractive aspects. (Proofs along the way, gradually pushing
    the limits of parts of a problem, are typically less interesting to the layman.)

    You can get a lot of appreciation for these kinds of problems by doing
    some of the calculations yourself. This thread has shown that it can be
    a fun coding challenge (for cpus, gpus, fpgas, or whatever). But you
    can also draw some graphs and see how the function behaves. And there
    are many ways to extend the idea to other Collatz-like functions, or
    other domains than just the positive integers.
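    For anyone wanting to try the "do the calculations yourself" suggestion, here is a minimal Python sketch (the names are mine; plot the resulting list with whatever tool you like):

```python
def stopping_time(n, limit=10000):
    """Number of Collatz steps from n down to 1 (None if limit exceeded)."""
    steps = 0
    while n != 1:
        n = 3 * n + 1 if n & 1 else n // 2
        steps += 1
        if steps > limit:
            return None
    return steps

# stopping times for the first few hundred integers; graphing these shows
# the statistical pattern, plus outliers such as n = 27 (111 steps)
times = [stopping_time(n) for n in range(1, 500)]
```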


    But, yeah, at least for me, now seeing it as a sort of cone of
    power-of-2 rings with all paths forming into spirals which end at 1,
    each odd step effectively going half-way around a spiral formed from the
    number line but then ending up at an incremented position, etc, seems
    like a way to see why it does what it does.

    Even if it doesn't count for anything, it isn't much worse than trying
    to test it empirically, which as noted, also does not count (it could
    only count if an exception could be found, but an exception won't be
    found no matter how big the search space becomes, so...).

    But, no proofs to be found here, I will not make this claim, in any case.


    Fair enough.

    If you are interested in the problem and want to see some videos about
    it, I can recommend:

    Numberphile (good fun maths channel, without requiring much knowledge) <https://www.youtube.com/watch?v=5mFpVDpKX70>

    Veritasium (great science and fact channel) <https://www.youtube.com/watch?v=094y1Z2wpJg>


    Other fun maths proof videos to make you appreciate proofs more!

    Vsauce (science and maths channel) <https://www.youtube.com/watch?v=s86-Z-CbaHA>

    3Blue1Brown (science and maths channel) <https://www.youtube.com/watch?v=yuVqxCSsE7c>

    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From David Brown@david.brown@hesbynett.no to comp.arch on Mon Feb 23 10:53:30 2026
    From Newsgroup: comp.arch

    On 22/02/2026 17:41, Stefan Monnier wrote:
    It is a mathematically very interesting problem. Whether there is
    anything to be gained by continuing the search for exceptions by brute
    force is a different matter - but /if/ it is worth doing, then doing so
    more efficiently is a good idea.

    AFAIK, in such games "more efficiently" tends to be largely irrelevant
    if it's more efficient by a (non-astronomical) constant factor.
    Instead, you need it to be algorithmically more efficient.


    Sure - algorithmic improvements usually (but not always in practice)
    lead to the biggest efficiency gains. But if you are doing a "test
    every number up to N" check of a conjecture, /any/ efficiency
    improvements are helpful.


    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Mon Feb 23 11:10:01 2026
    From Newsgroup: comp.arch

    BGB <cr88192@gmail.com> schrieb:
    On 2/19/2026 2:27 PM, Stefan Monnier wrote:
    I must say I find this whole thread baffling: when searching for
    a counter example and we're already in the order of 2^70, it seems clear
    that constant-factor improvements are completely irrelevant (even with
    large constants).

    I guess it all boils down to the "useless" part of the title?


    Yeah, basically.

    It is useless in the sense that no matter how big you go, an exception
    won't be found, rendering further testing essentially meaningless (it
    doesn't prove anything unless an exception is found, and an exception
    won't be found if none can exist).

    You are assuming that the Collatz conjecture is true. But this
    is unknown, and that is the whole point of the exercise.

    Do I personally think that exhaustive search will find something? No.

    Does my personal belief matter in the slightest? No.

    [...]

    Or, the seeming futility of trying to send humans on interstellar trips. Basically, no way to send full sized humans interstellar within any plausible timescale or resource/energy budget.

    My favorite candidate for that is the nuclear salt water rocket.
    Which will be difficult to build, or so my son (who just received
    his master's degree in nuclear engineering) tells me because
    it needs to be prompt critical.
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Stefan Monnier@monnier@iro.umontreal.ca to comp.arch on Mon Feb 23 10:52:15 2026
    From Newsgroup: comp.arch

    David Brown [2026-02-23 10:53:30] wrote:
    Sure - algorithmic improvements usually (but not always in practice)
    lead to the biggest efficiency gains. But if you are doing a "test
    every number up to N" check of a conjecture, /any/ efficiency
    improvements are helpful.

    But once you get to 2^70, the exponential explosion means that
    constant-factor improvements are vanishingly helpful.


    === Stefan
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From BGB@cr88192@gmail.com to comp.arch on Mon Feb 23 17:24:49 2026
    From Newsgroup: comp.arch

    On 2/23/2026 5:10 AM, Thomas Koenig wrote:
    BGB <cr88192@gmail.com> schrieb:
    On 2/19/2026 2:27 PM, Stefan Monnier wrote:
    I must say I find this whole thread baffling: when searching for
    a counter example and we're already in the order of 2^70, it seems clear
    that constant-factor improvements are completely irrelevant (even with
    large constants).

    I guess it all boils down to the "useless" part of the title?


    Yeah, basically.

    It is useless in the sense that no matter how big you go, an exception
    won't be found, rendering further testing essentially meaningless (it
    doesn't prove anything unless an exception is found, and an exception
    won't be found if none can exist).

    You are assuming that the Collatz conjecture is true. But this
    is unknown, and that is the whole point of the exercise.

    Do I personally think that exhaustive search will find something? No.

    Does my personal belief matter in the slightest? No.



    My own looking into this led me to realize that the number-space is
    modulo (relative to a cone with a spiral that repeats with every
    power-of-2), and that all the odd steps only ever increment the phase
    angle by 1, or fall down to a lower level (whenever the phase angle is
    not incremented).

    In effect, each odd step jumps to the opposite side of the cone, but
    slightly increments the angle. Since the phase-angle path can never move backwards, it can't touch any previous paths, thus, no loops can happen
    this way.


    The only way a cycle could exist is if (somehow) there were an
    impassable wall where pretty much *no* value could descend unless it
    hits the power-of-2 descent path, and then somehow miss this path as well.

    But, since "3*n+1" results in the minimum forward step, this means that
    one of the sides *has* to hit the power-of-2 descent cases at some point.

    Even if you did get very unlucky and hit odd-paths at nearly every step,
    it would eventually hit the power-of-2 escape case which is basically an immediate descent down to 1.

    Or, as I see it, the geometry of the number space itself prevents the possibility of a loop.


    Like, while I can't write a proof for it, I can still see that it is
    true in a geometric/mechanistic sense.

    Like, my confidence that it is true isn't so much because of a "no
    exception has been found, so I believe it is true" sense, but more
    because I can see the geometry of the number space well enough to see
    that no non-descending paths can exist through the space.


    [...]

    Or, the seeming futility of trying to send humans on interstellar trips.
    Basically, no way to send full sized humans interstellar within any
    plausible timescale or resource/energy budget.

    My favorite candidate for that is the nuclear salt water rocket.
    Which will be difficult to build, or so my son (who just received
    his master's degree in nuclear engineering) tells me because
    it needs to be prompt critical.


    Yeah, I have floated a few ideas...


    One I used in my own sci-fi was using cyclotrons and a long linac
    (linear accelerator) stage to try to push atomic nuclei to relativistic
    speeds (and generating thrust mostly from the energy required to
    accelerate the nuclei).

    Comparably, thrust would be modest, but the amount of propellant used
    could be very small. Harder part is powering the thing...

    Main candidates are nuclear and fusion power. With fusion, one could
    separate out the He nuclei and use these. With nuclear, CO2 makes more
    sense as a working fluid and propellant (after first decomposing it into
    C and O nuclei).

    While fusion would be the preferable option, nuclear would be easier to
    build in practice (probably a molten-salt breeder reactor with CO2 as
    the working fluid).

    No light water reactors though, these would just suck...


    For high-speed travel, would likely need to have a lot of ablative
    shielding. My current leaning here basically being to put a very big
    chunk of graphite on the front of the ship.

    Could use magnetic deflection, but this is likely to result in more
    energy cost and drag than essentially using a big chunk of graphite and
    maybe some permanent magnets (to slightly reduce ablation rate). Likely
    if using permanent magnets, there would be an iron structure inside the
    big chunk of graphite.

    Say, this to deal with the front of the ship being hit with a stream of hydrogen atoms potentially traveling at relativistic speeds, etc.

    Would likely be pointy (sort of like the tip of a pencil) to reduce drag.

    It is possible that the iron structure (inside the graphite) could
    in part be magnetized using part of the field strength of the magnets in
    the cyclotrons (say, for example, using big neodymium magnets or similar
    to limit energy cost).




    Comparably, variations of nuclear thermal rockets have tradeoffs:
    Orders of magnitude more thrust;
    But, lower maximum exhaust velocity (so, more propellant is needed);
    ...

    So, these are more likely to be able to get going good and fast, within
    a more reasonable timeframe, but the problem will then become how far
    you can get before running out of fuel. Assuming human passengers
    though, time and human lifespan become major limiting factors.

    Also, for humans, you need a whole lot of stuff to keep them alive and comfortable, etc. So, you need a whole lot of thrust to get things
    moving quickly.

    So, in this sense, nuclear thermal makes more sense.



    Vs, say, trying to design the engine around the assumption of a
    "continuous burn" likely measured in centuries (for a ship with a
    comparably modest payload).

    Ion engines are also possible, but downside of ion engines is that they
    would have higher propellant use rate than a cyclotron+linac while also
    giving significantly less thrust relative to the amount of propellant used.


    Downside of cyclotron+linac though would be that it would be more energy hungry and necessarily significantly larger/bulkier. Such a ship would
    be impractical for ground launch so would need to be assembled in space.

    Well, unless a very small version were built, likely with the
    (comparably long) linac tail in a folded up form, which then unpacks and assembles itself once in space.

    Pretty much no one else that I am aware of is pushing for a
    cyclotron+linac based propulsion system though...


    But, then again:
    Less viability for manned interplanetary travel (nuclear powered plasma thrusters make more sense for this);
    Less viability for unmanned interplanetary travel (domain likely better
    suited to ion engines);
    Not viable for manned interstellar travel (limited payload, using it for
    a manned mission would likely result in everyone expiring long before it reached the destination).

    So, it is mostly (at best) left as an option for unmanned interstellar
    travel.


    Well, there are other ideas around, like using very small probes, and accelerating them using huge lasers, but downside is that these would
    have no way to decelerate.

    With other strategies, one basically needing to flip the ship around and spending the latter half of the trip decelerating (say, with the "butt"
    end of the ship having the ablative graphite shield in a more concave
    shape in an attempt to maximize the drag profile).




    One traditional idea has also been "generation ships", but the
    practicality of this is debatable IMO.

    Wrote a bit on this, but decided to leave this out.

    Seems like whatever approach to generation ships is taken, it still
    needs to be able to reach the target within a few centuries to have much
    of any real hope of success (1k years, very low probability; after 10k
    years, "inescapable problems" would make longer-term survival almost impossible, *1).

    *1: Running out of usable fissile materials; running out of nitrogen
    (would need to initially stock up on nitrogen in the form of C-14, which
    could decay to replace that lost over time, but by 10k years, much of
    the C14 would be gone; and long term survival is kinda gone if nitrogen
    runs out), ... Stocking up on more thorium and C-14 and similar would ultimately only delay the issues.


    Granted, one could reach further out via "system hopping", say, if each
    colony can spot another system with potentially habitable target worlds,
    say, within 25ly .. 50ly or similar, could spread pretty far over time.

    Still none of the "galactic civilization" stuff (as I see it, the
    formation of a cohesive civilization spanning multiple star systems is
    most likely essentially impossible; absent the development of FTL travel
    and communication, etc, which is itself most likely impossible).

    ...


    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Tue Feb 24 06:44:31 2026
    From Newsgroup: comp.arch

    BGB <cr88192@gmail.com> schrieb:
    My own looking into this led me to realize that the number-space is
    modulo (relative to a cone with a spiral that repeats with every power-of-2), and that all the odd steps only ever increment the phase
    angle by 1, or fall down to a lower level (whenever the phase angle is
    not incremented).

    In effect, each odd step jumps to the opposite side of the cone, but slightly increments the angle. Since the phase-angle path can never move backwards, it can't touch any previous paths, thus, no loops can happen
    this way.


    The only way a cycle could exist is if (somehow) there were an
    impassable wall where pretty much *no* value could descend unless it
    hits the power-of-2 descent path, and then somehow miss this path as well.

    But, since "3*n+1" results in the minimum forward step, this means that
    one of the sides *has* to hit the power-of-2 descent cases at some point.

    The same argument would hold for the 3n-1 problem, where there is
    more than one cycle.
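    (A quick empirical illustration of this, in Python - my sketch, not Thomas's. Under 3n-1 the known cycles have minima 1, 5, and 17, and small starting values all fall into one of them:)

```python
def step(n):
    # the "3n-1" variant: 3n-1 on odd, n/2 on even
    return 3 * n - 1 if n & 1 else n // 2

def cycle_min(n):
    # iterate until some value repeats, then report the smallest
    # member of the cycle that was entered
    seen = []
    while n not in seen:
        seen.append(n)
        n = step(n)
    return min(seen[seen.index(n):])

# all starts below 200 land in one of the three known cycles
mins = {cycle_min(n) for n in range(1, 200)}
```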

    [...]
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From BGB@cr88192@gmail.com to comp.arch on Tue Feb 24 01:38:05 2026
    From Newsgroup: comp.arch

    On 2/24/2026 12:44 AM, Thomas Koenig wrote:
    BGB <cr88192@gmail.com> schrieb:
    My own looking into this led me to realize that the number-space is
    modulo (relative to a cone with a spiral that repeats with every
    power-of-2), and that all the odd steps only ever increment the phase
    angle by 1, or fall down to a lower level (whenever the phase angle is
    not incremented).

    In effect, each odd step jumps to the opposite side of the cone, but
    slightly increments the angle. Since the phase-angle path can never move
    backwards, it can't touch any previous paths, thus, no loops can happen
    this way.


    The only way a cycle could exist is if (somehow) there were an
    impassable wall where pretty much *no* value could descend unless it
    hits the power-of-2 descent path, and then somehow miss this path as well.
    But, since "3*n+1" results in the minimum forward step, this means that
    one of the sides *has* to hit the power-of-2 descent cases at some point.

    The same argument would hold for the 3n-1 problem, where there is
    more than one cycle.


    At least going by "the power of imagination" (would require testing to confirm):

    "3*n-1" would be different in that it neither reliably advances the
    phase angle, nor reliably moves it backwards. So, one is likely to end
    up with pairs of odd-steps that loop back to the same position, creating infinite cycles (so, it is not the same geometry).


    "3*n-3" might be stable (may appear to behave similarly to 3*n+1), but
    would instead advance the phase angle in the opposite direction.

    "3*n+3" should initially appear similar to "3*n+1", but would have paths
    that ascend to infinity (paths would exist which manage to miss all of
    the even steps and also miss the magic "power-of-2 auto-win" step).

    If the adjustment offset is even, then the "(3*n+1)/2" trick breaks, and
    also all paths that hit an odd number will ascend to infinity.
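    The even-offset case is easy to check directly: 3*odd + even is odd again, so once the sequence hits an odd number the halving rule never fires and the value climbs forever. A tiny Python illustration (the offset and start value here are arbitrary picks):

```python
def step(n, c):
    # generalized map: 3n+c on odd, n/2 on even
    return 3 * n + c if n & 1 else n // 2

n, c = 7, 2          # any odd start, any even offset
trail = [n]
for _ in range(20):
    n = step(n, c)
    trail.append(n)
# trail is all-odd and strictly increasing: the path ascends to infinity
```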


    Granted, may make sense to check/confirm, if these don't match up I
    might need to revisit my mental model.

    ...


    [...]


    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From BGB@cr88192@gmail.com to comp.arch on Tue Feb 24 03:16:04 2026
    From Newsgroup: comp.arch

    On 2/24/2026 1:38 AM, BGB wrote:
    On 2/24/2026 12:44 AM, Thomas Koenig wrote:
    BGB <cr88192@gmail.com> schrieb:
    My own looking into this led me to realize that the number-space is
    modulo (relative to a cone with a spiral that repeats with every
    power-of-2), and that all the odd steps only ever increment the phase
    angle by 1, or fall down to a lower level (whenever the phase angle is
    not incremented).

    In effect, each odd step jumps to the opposite side of the cone, but
    slightly increments the angle. Since the phase-angle path can never move
    backwards, it can't touch any previous paths, thus, no loops can happen
    this way.


    The only way a cycle could exist is if (somehow) there were an
    impassable wall where pretty much *no* value could descend unless it
    hits the power-of-2 descent path, and then somehow miss this path as
    well.

    But, since "3*n+1" results in the minimum forward step, this means that
    one of the sides *has* to hit the power-of-2 descent cases at some
    point.

    The same argument would hold for the 3n-1 problem, where there is
    more than one cycle.




    Then, went and took a shower, realized there was more to this:

    At least going by "the power of imagination" (would require testing to confirm):

    "3*n-1" would be different in that it neither reliably advances the
    phase angle, nor reliably moves it backwards. So, one is likely to end
    up with pairs of odd-steps that loop back to the same position, creating infinite cycles (so, it is not the same geometry).


    Imagination is mostly saying there will be a lot of fairly short loops here.

    There should probably be a quite significant number of short loops in
    this case.



    "3*n-3" might be stable (may appear to behave similarly to 3*n+1), but
    would instead advance the phase angle in the opposite direction.


    But, "3*n-3" will change the ending state, so it will not land on 1 anymore.

    Rather:
    Around 84% of the time it will land on 3;
    Around 16% of the time it will land on 0.
    Give or take... ("imagination" showing around 84% or so of paths landing
    on 3).

    Will probably still converge, just ending up at a different end state.


    All other negative offset paths would lead to underflow.

    So, can probably exclude cases where the value space goes negative.
    Though... "looks like" if these were included there is another cone on
    the other side, if one goes this way.



    "3*n+3" should initially appear similar to "3*n+1", but would have paths that ascend to infinity (paths would exist which manage to miss all of
    the even steps and also miss the magic "power-of-2 auto-win" step).

    If the adjustment offset is even, then the "(3*n+1)/2" trick breaks, and also all paths that hit an odd number will ascend to infinity.


    Further assertion 1:
    All other positive offset paths should contain paths to infinity (along
    with jostling around the ending states for those paths which do converge).

    So, the specific end-state behavior seems unique.

    Further assertion 2:
    "3*n+1" may be unique in this space.

    Or, possibly the only one that reliably converges, lands on 1, and does
    not contain cycles.



    Granted, may make sense to check/confirm, if these don't match up I
    might need to revisit my mental model.


    Still TODO, but for me, shower seemed like a bigger immediate priority.


    So, yeah, I guess I may report back later as to whether or not the
    actual math behaves the same way as it does in my imagination.

    ...


    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From BGB@cr88192@gmail.com to comp.arch on Tue Feb 24 04:52:14 2026
    From Newsgroup: comp.arch

    On 2/24/2026 3:16 AM, BGB wrote:
    On 2/24/2026 1:38 AM, BGB wrote:
    On 2/24/2026 12:44 AM, Thomas Koenig wrote:
    BGB <cr88192@gmail.com> schrieb:
    My own looking into this led me to realize that the number-space is
    modulo (relative to a cone with a spiral that repeats with every
    power-of-2), and that all the odd steps only ever increment the phase
    angle by 1, or fall down to a lower level (whenever the phase angle is
    not incremented).

    In effect, each odd step jumps to the opposite side of the cone, but
    slightly increments the angle. Since the phase-angle path can never
    move
    backwards, it can't touch any previous paths, thus, no loops can happen
    this way.


    The only way a cycle could exist is if (somehow) there were an
    impassable wall where pretty much *no* value could descend unless it
    hits the power-of-2 descent path, and then somehow miss this path as
    well.

    But, since "3*n+1" results in the minimum forward step, this means that
    one of the sides *has* to hit the power-of-2 descent cases at some
    point.

    The same argument would hold for the 3n-1 problem, where there is
    more than one cycle.






    Now, after actually testing (to confirm or disprove my imagination):

    Then, went and took a shower, realized there was more to this:

    At least going by "the power of imagination" (would require testing to
    confirm):

    "3*n-1" would be different in that it neither reliably advances the
    phase angle, nor reliably moves it backwards. So, one is likely to end
    up with pairs of odd-steps that loop back to the same position,
    creating infinite cycles (so, it is not the same geometry).


    Imagination is mostly saying there will be a lot of fairly short loops
    here.

    There should probably be a quite significant number of short loops in
    this case.


    Quite a lot of loops, but they are not quite matching the expected
    pattern (the typical loop period is longer than imagination had implied).




    "3*n-3" might be stable (may appear to behave similarly to 3*n+1), but
    would instead advance the phase angle in the opposite direction.


    But, "3*n-3" will change the ending state, so it will not land on 1
    anymore.

    Rather:
    Around 84% of the time it will land on 3;
    Around 16% of the time it will land on 0.
    Give or take... ("imagination" showing around 84% or so of paths landing
    on 3).

    Will probably still converge, just ending up at a different end state.


    Disproven:
    "3*n-3" is nowhere near stable.

    It does land on 3, but is prone to loops (imagination did not predict prevalent looping for this one).



    All other negative offset paths would lead to underflow.

    So, can probably exclude cases where the value space goes negative.
    Though... "looks like" if these were included there is another cone on
    the other side, if one goes this way.


    Not testing for this ATM.



    "3*n+3" should initially appear similar to "3*n+1", but would have
    paths that ascend to infinity (paths would exist which manage to miss
    all of the even steps and also miss the magic "power-of-2 auto-win"
    step).


    Does appear similar, but with a different ending state (lands on 3).

    But, thus far the infinity-paths are a no-show.

    Thus far, otherwise, 3*n+3 appears stable; at least for the range
    tested. This part does match mental prediction so far (but appears more
    stable than expected).

    Mental model expected existence of divergent paths to exist in this
    case, but thus far I am not seeing any.



    If the adjustment offset is even, then the "(3*n+1)/2" trick breaks,
    and also all paths that hit an odd number will ascend to infinity.


    Confirmed: Even offsets do fire off towards infinity.



    Further assertion 1:
    All other positive offset paths should contain paths to infinity (along
    with jostling around the ending states for those paths which do converge).

    So, the specific end-state behavior seems unique.


    Non-match:
    Loops (or long cycles) are more prevalent than expected;
    Paths to infinity are not.

    Testing +5 and +7, seeing a whole lot of loops, not so many divergent paths.

    Thus far, divergent paths have only appeared in one of the expected
    scenarios.


    Granted, it is possible that what my imagination was predicting as
    divergent paths are instead manifesting as long-period cycles (probably because they are hitting too many even-numbered spots to achieve
    "sufficient escape velocity").

    May need to add something to try to differentiate loops from "failed
    escape paths" where it just skips around, but doesn't represent an
    actual loop.


    Further assertion 2:
    "3*n+1" may be unique in this space.

    Or, possibly the only one that reliably converges, lands on 1, and does
    not contain cycles.


    Thus far, it is looking like "3*n+1" is likely fairly unique here.




    Granted, may make sense to check/confirm, if these don't match up I
    might need to revisit my mental model.


    Still TODO, but for me, shower seemed like a bigger immediate priority.


    So, yeah, I guess I may report back later as to whether or not the
    actual math behaves the same way as it does in my imagination.


    Not quite as close of a match as I would like.


    So, seems my imagination model is still subject to doubt in this case...

    Given my seeming inability to predict the behavior of adjusting the
    offset to a sufficient degree of accuracy, it is possible my
    interpretation of the "3*n+1" case may not be sufficiently accurate.

    Nothing wildly different from my imagination happened, but it is still
    not quite close enough it would seem.

    ...


    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Tue Feb 24 11:16:40 2026
    From Newsgroup: comp.arch

    BGB <cr88192@gmail.com> schrieb:

    "3*n-1" would be different in that it neither reliably advances the
    phase angle, nor reliably moves it backwards. So, one is likely to end
    up with pairs of odd-steps that loop back to the same position, creating infinite cycles (so, it is not the same geometry).

    What is this phase angle? I am only familiar with that term for
    complex numbers.
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From BGB@cr88192@gmail.com to comp.arch on Tue Feb 24 13:19:59 2026
    From Newsgroup: comp.arch

    On 2/24/2026 5:16 AM, Thomas Koenig wrote:
    BGB <cr88192@gmail.com> schrieb:

    "3*n-1" would be different in that it neither reliably advances the
    phase angle, nor reliably moves it backwards. So, one is likely to end
    up with pairs of odd-steps that loop back to the same position, creating
    infinite cycles (so, it is not the same geometry).

    What is this phase angle? I am only familiar with that term for
    complex numbers.


    I had noted that it follows a pattern, but primarily if one splits it up
    into a spiral that itself resembles a floating point number.

    Though, splitting it into real and imaginary components could also make
    sense as well, but I was thinking of something more like polar
    coordinates though.


    So, say, level=exponent, phase=mantissa. It differs some in that, in
    this context, it is more continuous (like a log-scale number), but some
    of the behavioral patterns are more like those of a floating-point number.

    So, say:
    level ~= log2(n)
    phase ~= n/(2**floor(level))
    So, say, imagine it represents a value between 1.0 and 1.9999...

    Each crossing of a power of 2 is an immediate drop to the "win".
    Each odd step will either land on a spot that is equal (and drop a
    level) or increment the phase (roughly like incrementing the ULP in
    a floating point number, if the size of the mantissa were proportional
    to the value of the exponent).

    It seems, the behavior only manifests in the desired ways if:
    Relative phase increases for each odd step;
    Increases by the minimum possible amount
    (what results from the +1 adjustment).
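A minimal sketch of my reading of this level/phase model (the function names and the odd-steps-only traversal are my own, not from the post): level is floor(log2(n)), phase is n scaled into [1.0, 2.0), tracked after each odd step with trailing halvings stripped.

```python
# Sketch (assumptions: names and structure are illustrative, not BGB's code).
def level_phase(n):
    level = n.bit_length() - 1        # floor(log2(n))
    phase = n / (1 << level)          # in [1.0, 2.0), the "mantissa"
    return level, phase

def odd_steps(n, limit=20):
    """Record (level, phase) after each odd step, stripping even steps."""
    steps = []
    while n != 1 and len(steps) < limit:
        n = 3 * n + 1                 # odd step
        while n % 2 == 0:             # trailing even steps (halvings)
            n //= 2
        steps.append(level_phase(n))
    return steps
```

Crossing a power of 2 shows up here as the level dropping while the phase wraps past 1.0.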

    As noted, the phase effectively exists on two points on the circle at
    any moment:
    phaseA = phase
    phaseB = phase + 0.5

    Or, if one wants to see it as a circle in radians, could be like:
    angle=(phase-1.0)*(2*PI);
    Or, in degrees:
    angle=(phase-1.0)*(360);
    phaseA and phaseB are always at 180 degrees relative to each other.

    And, at each odd step, these two points move very slightly on the
    circle. The angle of each point, measured from its starting point,
    typically remains small (relative to the starting value).

    Overall alignment of the two points can be almost anywhere in the
    circle, except that it can't cross the point at 0 degrees (or
    powers-of-2) because these drop straight to 1.

    Or, in polar coordinate terms:
    rho = level
    phi = angle


    My previous attempt at a graph is wonky, though, in that the numbers
    follow a log spacing around the circle rather than a linear spacing.
    May need to fix this, since the explanation of the two points being
    180 degrees out of phase looks wonky if the diagram doesn't show
    these points as 180 degrees apart.



    Failing to increase leads to loops (leading to the same spot).
    Trying to go backwards leads to loops and instabilities (divergent
    paths, or, "more backwards" leads into negative-land).
    Trying to go forwards more than the minimum amount also leads to
    instabilities (it results in it stepping over downwards paths).


    I initially thought these instabilities would head towards infinity, but
    it seems more like they just head to a larger orbit and get stuck
    bouncing around the circle. I would need to do more looking into it to
    figure out whether they get stuck in a loop, or continue to climb (but
    more gradually). My mental model implies that these paths should
    continue to climb, but that increasing the constant bias possibly causes
    them to ascend faster.

    If they are more prone to forming actual closed loops (rather than
    climbing indefinitely), this would be inconsistent with my mental
    model, but it seems I would need to do more testing here.

    Though, loops can form if the ascent is slow enough that it manages
    to complete a full rotation of the circle without escaping to a
    higher orbit (if it manages to escape the original orbit without
    forming a loop, it should trend upwards to infinity).


    Once the relative phase (from the starting point) exceeds 180 degrees
    (or the two points covering a full circle), if the path has not ascended
    to a higher level, a closed loop will form.

    But, as noted, with the +1 step, the above scenario (for loop forming)
    becomes impossible.




    So, in this case, the "3*n+1" step would appear to be uniquely stable in
    this particular configuration.


    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Thu Feb 26 01:27:54 2026
    From Newsgroup: comp.arch


    David Brown <david.brown@hesbynett.no> posted:

    On 22/02/2026 19:38, BGB wrote:

    I was initially seeing it more like a very large hash table.
    But, I have since realized it is not a hash table, it is a more
    organized pattern.

    As far as we know, the result of applying the Collatz function
    (calculating the number of steps to reach 1, if it ever does) follows
    a pattern statistically, but not any clear pattern for individual
    values. There are a lot of mathematical functions like that - a good
    example being the number of prime factors in a number. You can draw
    graphs and visually see nice patterns - some of which can be proven,
    others are still unproven and may not apply over a big enough scale.
    Sometimes there are tricks or hints to ways to improve your chances
    when guessing which values might be outliers.

    Given by definition: even: n = n/2
    odd: n = 3*n+1 {which is even}

    The most one can get out of an iteration is n*1.5 {for large n} while
    the most one can lose from an iteration is n/2

    So, there will be a vanishingly small number of n which see the odd
    iteration enough times to outweigh the even iterations. Thus one
    would expect the conjecture is true.
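A quick empirical look at this heuristic (my own sketch, not Mitch's): since an odd step gains at most a factor of about 3 before the mandatory halving, and each even step loses a factor of 2, any n that reaches 1 must see more than log2(3) ~ 1.585 even steps per odd step.

```python
# Sketch (assumption: step_counts is an illustrative name, not from the post).
def step_counts(n):
    """Count odd (3n+1) and even (n/2) steps until n reaches 1."""
    odd = even = 0
    while n != 1:
        if n % 2:
            n = 3 * n + 1
            odd += 1
        else:
            n //= 2
            even += 1
    return odd, even

odd, even = step_counts(27)           # a famously long trajectory
# even must exceed odd * log2(3) for the trajectory to have shrunk to 1
```

For n = 27 this gives 41 odd and 70 even steps, and 70 > 41 * 1.585, matching the hand-waving argument.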
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From David Brown@david.brown@hesbynett.no to comp.arch on Thu Feb 26 09:21:11 2026
    From Newsgroup: comp.arch

    On 26/02/2026 02:27, MitchAlsup wrote:

    David Brown <david.brown@hesbynett.no> posted:

    On 22/02/2026 19:38, BGB wrote:

    I was initially seeing it more like a very large hash table.
    But, I have since realized it is not a hash table, it is a more
    organized pattern.

    As far as we know, the result of applying the Collatz function (calculating
    the number of steps to reach 1, if it ever does) follows a pattern
    statistically but not any clear pattern for individual values. There
    are a lot of mathematical functions like that - a good example being the
    number of prime factors in a number. You can draw graphs and visually
    see nice patterns - some of which can be proven, others are still
    unproven and may not apply over a big enough scale. Sometimes there are
    tricks or hints to ways to improve your chances when guessing which
    values might be outliers.

    Given by definition: even: n = n/2
    odd: n = 3*n+1 {which is even}

    The most one can get out of an iteration is n*1.5 {for large n} while
    the most one can lose from an iteration is n/2

    So, there will be a vanishingly small number of n which see the odd
    iteration enough times to outweigh the even iterations. Thus one
    would expect the conjecture is true.

    Hand-waving arguments like that are far from mathematical proofs. But
    certainly that is the logic that suggests it makes more sense to work
    towards proving the conjecture, rather than disproving it - most
    mathematicians familiar with the work done on it expect that it is
    true. Working towards disproofs - trying to find counter-examples,
    or a proof of the existence of counter-examples - can still be
    helpful. Figuring out why the attempt to disprove it failed can lead
    to insights. (Simple pass/fail trial and error in itself is rarely
    helpful in this manner, but some ideas could come from looking at
    which numbers lead to unusually large growth before dropping to 1.)

    Some of the interesting proofs in this area are formalisations of
    what you said here - demonstrating that /if/ there are exceptions to
    the conjecture, then they are "vanishingly rare". That is, bounds
    can be found that limit the maximum number of counter-examples in
    any given interval.

    Still, there is no proof that the conjecture is true. It may be one of
    these statements that is true but unprovable - though of course we won't
    be able to prove that it is unprovable.

    And it may turn out not to be true, but counter-examples are very
    large. This is not unknown in number theory conjectures. Mertens'
    conjecture (which you can look up on Wikipedia - don't ask me to
    explain it, because it is beyond me) is a disproven conjecture about
    natural numbers where it is known that the first counter-example is
    between 10 ^ 16 and

    Certainly we can be confident that no one here is likely to find a
    counter-example by testing numbers beyond 2 ^ 71. But that does not
    mean people can't have fun trying to do so, or thinking up new and
    efficient ways to run the tests!

    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Thu Feb 26 19:31:20 2026
    From Newsgroup: comp.arch

    Terje Mathisen <terje.mathisen@tmsw.no> schrieb:
    MitchAlsup wrote:

    Robert Finch <robfi680@gmail.com> posted:

    On 2026-02-15 7:14 a.m., David Brown wrote:
    On 15/02/2026 09:58, Robert Finch wrote:

    I wonder how difficult it would be to implement a parallel tester in
    an FPGA. It looks simple enough and there are hundreds of DSP blocks
    available to use. Could test in blocks of hundreds of numbers at a
    time. Running at 100 MHz * 200 tests at a time = how long to get to 2^71?


    One of the nice things about this is that you don't need DSP blocks - it

    Yeah, I figured that out after coding. 3n+1 = n+n+n+1

    3n+1 = (n<<1)+n+1

    one add with carry in and n gets wired in directly and also shifted
    up by 1 (no logic shifter).

    Even better:

    (3n+1)/2 = n + (n>>1) + 1

    I.e two iterations for the cost of one.

    For b = n + (n>>1) + 1, for odd n, a bit of a micro-optimization is
    possible because the lowest-order bit of the result equals bit
    number 1 of n.
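A quick sanity check of the two identities in this subthread (my own sketch): for odd n, one adder computing n + (n>>1) + 1 performs two Collatz iterations at once, and the low bit of the result is just bit 1 of n, so the next odd/even decision is available without waiting for the add.

```python
# Verify, for a range of odd n:
#   n + (n>>1) + 1 == (3n+1)/2        (two iterations for one add)
#   bit 0 of result == bit 1 of n     (Thomas Koenig's micro-optimization)
for n in range(1, 100000, 2):         # odd n only
    b = n + (n >> 1) + 1
    assert b == (3 * n + 1) // 2
    assert (b & 1) == ((n >> 1) & 1)  # low result bit = bit 1 of n
```

The second identity holds because, for odd n, b = (3n+1)/2 = n + (n-1)/2 + 1, and the carry into bit 0 of that sum is fixed by the known low bits of n.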
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
    --- Synchronet 3.21b-Linux NewsLink 1.2