On Wed, 17 May 2023 10:55:27 -0400
Russ Cox <rsc@swtch.com> wrote:
When BitKeeper came along, we added this:
--longest Restrict the deltas to those on the longest line
between the two range endpoints. Unlike a range, the
lower bound is included in the output.
because BitKeeper history is a lattice, not a straight line. So your
test points became
bk changes -nd:REV: > REVS
and binary search over those. That gets you the longest "straight" line
in the graph.
Any partial ordering can be extended to a total ordering. Why can't you
just do that instead of using 'the longest "straight" line' ?
From: Larry McVoy <lm@mcvoy.com>
Date: Mon, May 15, 2023 at 1:57 PM
Subject: Re: history of source control binary search for debugging
Good to hear from you Russ.
I'm pretty sure I "invented" that idea, which means I hadn't seen it before. All I was doing was using the fact that binary search is log(n). And giving non-kernel people a way to do some grunt work and then get it close and
then hand that off to the kernel people.
But note that the BUG-HUNTING was using snapshots, not source management. BitKeeper hadn't been invented yet and the other source management systems sucked because they were reproducible only at tagged points. At least
that was true of CVS.
Kaz Kylheku <864-11...@kylheku.com> wrote:
Oh, stepping doubles with ++ is, I would say, not *that* uncommon.
I doubt that if such a bug were introduced as an easter egg into GCC,
it would go very long without being discovered by the FOSS distros and other downstream users.
It would very likely be caught by regression testing. GCC has an
extensive test suite, and it is a requirement that this is run
before submitting a patch.
In 2015, Keith Randall was working on a new SSA backend for the Go
compiler. The old and new backends coexisted, and we could use either
for any given function in the compilation. Keith introduced an
environment variable GOSSAHASH that specifies the last few binary
digits of the hash of function names that should use the new backend.
So GOSSAHASH=0110 means compile only those functions whose names hash
to a value with last four bits 0110. When a test is failing with the
new backend, you try GOSSAHASH=0 and GOSSAHASH=1, and then use binary
search to progressively refine the pattern, narrowing the failure down
until only a single function is being compiled with the new backend.
This was invaluable for approaching failures in large real-world tests
(tests for libraries or production code, not for the compiler) that we
had not written and did not understand. ...
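The mechanism can be sketched in Python. This is only an illustration, not the Go compiler's actual implementation: the function names, the choice of SHA-256 as the hash, and the `test_fails` oracle are all invented for the example; the real GOSSAHASH logic lives inside the compiler.

```python
import hashlib

def bits_of(name, n):
    """Last n binary digits of a hash of the function name."""
    h = int(hashlib.sha256(name.encode()).hexdigest(), 16)
    return format(h, "b")[-n:] if n else ""

def find_culprit(functions, test_fails):
    """Refine a GOSSAHASH-style bit pattern until it selects one function.

    test_fails(pattern) reports whether the test fails when every
    function whose hash ends in `pattern` is compiled with the new
    backend.  Assumes a single culprit function.
    """
    pattern = ""
    while True:
        selected = [f for f in functions if bits_of(f, len(pattern)) == pattern]
        if len(selected) <= 1:
            return selected[0] if selected else None
        # Try extending the pattern with a 0 bit; if that set still
        # fails, the culprit's next hash bit is 0, otherwise it is 1.
        if test_fails("0" + pattern):
            pattern = "0" + pattern
        else:
            pattern = "1" + pattern

# Example: pretend compiling "parse" with the new backend breaks the test.
funcs = ["parse", "lex", "emit", "optimize", "link"]
fails = lambda pat: bits_of("parse", len(pat)) == pat
print(find_culprit(funcs, fails))  # -> parse
```

Each extension of the pattern roughly halves the set of functions using the new backend, so the search takes about log2(N) runs of the failing test.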
It seems like the idea of binary search to narrow down a buggy
software change is so natural that there must have been many earlier
uses for debugging before 1999, especially in compilers.
Hi Russ, that's very interesting. You should consider submitting a report of the technique to CGO! The next deadline is on the 19th, but then there will be
another deadline on September 1st.
As I mentioned at the top, I am interested to hear about earlier uses
of approaches like this, or any other ideas for situations where the
approach might be applicable. Clearly the general problem has overlaps
with group testing [1], and in particular Hwang's binary search-based approach (1972) [2].
Setup:
We can (somehow) enumerate the N entities with integers, which we
express using a pure binary enumeration. Then, we can discover the
number of an erroneously processed element, one bit at a time.
Procedure:
First we treat all elements numbered ..XXXXXX0 using the new technique, and all others using the old technique. If the
system fails, we know that it's the ..XXXXX0 set which has one or more
badly processed elements. Otherwise it's the other set, whose
enumerations end in a 1. Either way, we have discovered the identity of
the last bit.
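Assuming a single bad element, the bit-at-a-time procedure above can be sketched as follows (the `fails` oracle here is invented for illustration):

```python
def find_bad_index(n_bits, system_fails):
    """Recover the index of the single erroneously processed element,
    one bit at a time, least significant bit first.

    system_fails(bit, value) reports whether the system fails when the
    new technique is applied to all elements whose `bit`-th binary
    digit equals `value` (and the old technique to the rest).
    """
    index = 0
    for bit in range(n_bits):
        # Treat the ...0 set with the new technique; if the system
        # fails, the bad element's bit is 0, otherwise it is 1.
        if not system_fails(bit, 0):
            index |= 1 << bit
    return index

# Example: element 13 (binary 1101) out of 16 is the bad one.
bad = 13
fails = lambda bit, value: ((bad >> bit) & 1) == value
print(find_bad_index(4, fails))  # -> 13
```

One run of the system per bit suffices, so N elements need only ceil(log2(N)) trials.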
Russ Cox <rsc@swtch.com> wrote:
As I mentioned at the top, I am interested to hear about earlier uses
of approaches like this, or any other ideas for situations where the approach might be applicable. Clearly the general problem has overlaps
with group testing [1], and in particular Hwang's binary search-based approach (1972) [2].
For GCC regression-hunting, two techniques are routinely used.
One of them is the use of "git bisect", described at https://git-scm.com/docs/git-bisect . If there is a failing test
case, this can (at the cost of some CPU time) usually pinpoint
the offending commit.
For reducing test cases, at least for C and C++, cvise https://github.com/marxin/cvise is now the method of choice; it is
brutally efficient at reducing test cases to an absolute minimum.
Because it understands C and C++, it can make language-aware
transformations of the program.
Both are tools external to the compiler.
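The search that git bisect performs over a linear history can be sketched like this (the commit names and the `is_bad` predicate are invented for illustration; the real tool also handles branchy DAG history, which this sketch does not):

```python
def bisect_commits(commits, is_bad):
    """Binary search over a linear commit history, git-bisect style.

    Assumes commits[0] is good, commits[-1] is bad, and there is a
    single good-to-bad transition.  Returns the first bad commit.
    """
    lo, hi = 0, len(commits) - 1          # commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):          # e.g. build and run the failing test
            hi = mid
        else:
            lo = mid
    return commits[hi]

# Example: the regression landed at commit "e" of an invented history.
history = list("abcdefgh")
print(bisect_commits(history, lambda c: c >= "e"))  # -> e
```

Each probe halves the suspect range, so even a history of thousands of commits needs only a dozen or so test runs.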
On Sunday, May 14, 2023 at 9:20:54 AM UTC-7, Kaz Kylheku wrote:
(snip)
Setup:
We can (somehow) enumerate the N entities with integers, which we
express using a pure binary enumeration. Then, we can discover the
number of an erroneously processed element, one bit at a time.
Procedure:
First we treat all elements numbered ..XXXXXX0 using the new
technique, and all others using the old technique. If the
system fails, we know that it's the ..XXXXX0 set which has one or more
badly processed elements. Otherwise it's the other set, whose
enumerations end in a 1. Either way, we have discovered the identity of
the last bit.
There is the assumption that only one of the N is in error.
Say we have a basket of apples with a rotten smell emanating from it.
We can subdivide it, and smell one half and the other. If both halves
smell, we know we have two or more rotten apples and they ended up
in different halves. This doesn't matter. We just pick one half and
keep subdividing.
As long as we stay on the trail of the rotten scent, we will
get down to one rotten apple, and we can use that apple to analyze further: what kind of mould or bacterium has infected it and so on. Probably
the other rotten apples have the same root cause. If they have different
causes, we can do another search after fixing the one root cause we have found.
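The apples argument can be sketched directly: even with several bad items, following any one failing half is guaranteed to corner at least one of them. (The `smells` oracle below is invented for illustration.)

```python
def find_one_bad(items, smells):
    """Narrow a set containing at least one bad item down to a single bad item.

    smells(subset) reports whether the subset contains any bad item.
    With several bad items we simply follow one failing half, so the
    search still terminates at some bad item.
    """
    assert smells(items)
    while len(items) > 1:
        half = items[:len(items) // 2]
        # Keep whichever half still smells; if both do, either will do.
        items = half if smells(half) else items[len(items) // 2:]
    return items[0]

# Example: apples 3 and 9 are rotten; the search corners one of them.
apples = list(range(12))
rotten = {3, 9}
smells = lambda subset: any(a in rotten for a in subset)
print(find_one_bad(apples, smells))
```

This is the same one-failing-half invariant that makes git bisect and GOSSAHASH-style searches work when more than one defect is present: you find one root cause, fix it, and search again.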
Thanks to everyone for the replies and pointers.
As far as 'git bisect' is concerned, we have found 'git bisect'
incredibly useful too, of course. It is clearly another instance of
binary search to find bugs, but it is binary search over time, while
the system I described is binary search over the program being
compiled. "What change broke the compiler?" vs "What code does that
change miscompile?". The two are independent and complementary.
On Monday, May 15, 2023 at 6:54:08 PM UTC-7, Kaz Kylheku wrote:
(snip)
Say we have a basket of apples with a rotten smell emanating from it.
We can subdivide it, and smell one half and the other. If both halves
smell, we know we have two or more rotten apples and they ended up
in different halves. This doesn't matter. We just pick one half and
keep subdividing.
But the algorithm described, at least as I remember it, doesn't test both halves.
From: Brian Ness <bcn@bubblecreek.com>
than relocatables. Cray had its own version control system called USM
(Unicos Source Manager) which had support for atomic changes, meaning
that a set of related changes in multiple files could be encapsulated
and treated as a single thing. We called those atomic change sets "mods". As far as I know, there were no other version control systems supporting that at the time.
As to the problem of debugging by miscompiled code, this reminds
me of stories of debugging compilers by feeding them cards from
the card recycling bin. It seems that cards were especially high grade
paper and so worth keeping separate for recycling. I never actually
saw anyone do that, though.
In my younger days, some said that I was especially good at finding
compiler errors. Maybe not as good now.
I would try things that were
allowed, but that others might not have thought of doing.
The only one I remember now is a C compiler that miscompiled the ++
operator on a double variable. Allowed in C, but maybe not often used.