• Nice application of that hash bisection debugging approach.

    From Kaz Kylheku@643-408-1753@kylheku.com to comp.compilers on Fri Jul 4 08:27:04 2025
    From Newsgroup: comp.compilers

    I'm doing compiler work at the moment and have one of those bugs
    where an optimization miscompiles something in the compiler which
    then shows up as a breakage elsewhere.

    Because this is Lisp, it took literally two or three minutes to whip up the hash bisection technique.

    In the Lisp compiler, the basic unit of file compilation is the
    top-level form.

    I defined three dynamically scoped variables in the compiler,
    which are shown with the final values:

    (defvar *cmask* #b1111111)
    (defvar *cval* #b0101100)
    (defvar *chash*)

    *chash* is bound to the equal hash calculated over the top-level form
    being compiled.

    E.g. if the form were (+ 2 2), the hash would be calculated
    as:

    2> (hash-equal '(+ 2 2))
    192681593

    and that would be the value of *chash*.

    Then whenever the expression

    (eql (logand *chash* *cmask*) *cval*)

    is true, the optimization is enabled. Also, when that same expression
    is true, a portion of top-level form is printed: enough to identify it.

    I quickly arrived at the above values. One form is printed,
    and that's the one mistranslated:

    $ make
    TXR stdlib/optimize.tl -> stdlib/optimize.tlo
    TXR stdlib/compiler.tl -> stdlib/compiler.tlo
    (defun compile-file-conditionally
    (in-path out-path
    test-fn))
    TXR stdlib/asm.tl -> stdlib/asm.tlo
    ./txr: unhandled exception of type error:
    ./txr: slot: #<struct-type assembler> has no slot named asm
    make: failing command:
    ./txr --in-package=sys --compile=stdlib/asm.tl:stdlib/asm.tlo.tmp
    Makefile:195: recipe for target 'stdlib/asm.tlo' failed
    make: *** [stdlib/asm.tlo] Error 1

    So the function compile-file-condtionally is the mistranslated one.

    When we flip the top masked bit in *cval* from 0 to 1:

    (defvar *cmask* #b1111111)
    (defvar *cval* #b1101100)
    (defvar *chash*)

    then a different top-level form is identified and targeted for
    the optimization, a function called rewrite in the optimizer, and the compilation succeeds:

    $ make
    TXR stdlib/optimize.tl -> stdlib/optimize.tlo
    (defun rewrite
    (fun list out))
    TXR stdlib/compiler.tl -> stdlib/compiler.tlo
    TXR stdlib/asm.tl -> stdlib/asm.tlo

    It's so easy to do it's not worth keeping the changes; next time I need
    the technique, I will whip it up from scratch again.

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Kaz Kylheku@643-408-1753@kylheku.com to comp.compilers on Sat Jul 5 00:52:15 2025
    From Newsgroup: comp.compilers

    On 2025-07-04, Kaz Kylheku <643-408-1753@kylheku.com> wrote:
    $ make
    TXR stdlib/optimize.tl -> stdlib/optimize.tlo
    TXR stdlib/compiler.tl -> stdlib/compiler.tlo
    (defun compile-file-conditionally
    (in-path out-path
    test-fn))
    TXR stdlib/asm.tl -> stdlib/asm.tlo
    ./txr: unhandled exception of type error:
    ./txr: slot: #<struct-type assembler> has no slot named asm

    What was the bug, you might want to ask? It was a little
    anticlimactic: an issue I was already anticipating in the development
    plan, but which I neglected to suspect I would need to solve at this
    stage already.

    I'm working on tail calls. Tail calls cannot occur in a scope where
    dynamic bindings (bindings of dynamically scoped variables) have been established, but my code was doing them anyway.

    Tail calls are being represented in the VM by a prefix instruction
    called "tail". It has no inputs or outputs, and is placed before
    one of several kinds of call/apply instructions.

    (Self tail calls are handled by a backward jmp. I label it with
    a "tjmp" pseudo-instruction in the compiler intermediate code;
    it gets mapped to a jmp instruction at some point.)

    Curently, the tail prefix instruction is a no-op; I'm just
    getting all else ready before implementing it.

    In the virtual machine, there are frame instructions that push new
    display frames which represent lexical variables. Maching end
    instructions dispose of them. (The compiler optimizes away frames
    very well, but cannot always do so, like when there are closures
    capturing them.)

    A tail call cannot jump out from the middle of one or more frames.
    What happens it that the compiler at first naively generates the
    tail call (tjmp or tail prefix), disregarding its location within
    frames.

    A post-processing step is applied to the code which moves the
    tail calls out of frames. It basically parses the code and inserts
    end instructions before each tail call: enough end instructions
    to terminate all the frames.

    Now the problem is that there is a frame type "dframe" for dynamic
    variables; the logic will happily move the tail call out of that one,
    too. This means that if you have

    ;; at the end of a function body:
    (let ((*dynamic-var* 42))
    ...
    (fun arg)) ;; tail call

    The *dynamic-var* binding is wrongly torn down /before/ the tail call;
    fun will not see a value of 42 of the dynamic variable *dynamic-var*, as
    if it were from this source code:

    ;; /not/ at the end of a function body:
    (let ((*dynamic-var* 42))
    ...)

    ;; at the end of function body:
    (fun arg) ;; tail call

    So that's the bug. As soon as I viewed the disassembly listing
    for the mistranslated function and saw the dframe close to before
    a tail instruction, I knew what was going on.

    The fix is simply to switch off the recursive tail position indicator in
    a context where one or more dynamic variables are bound, so no tail
    calls will be compiled there.

    (I think, it is possible to have tail calls under dynamic binding, but
    not with the deep binding strategy in this Lisp dialect/implementation.
    Dynamic binding pushes new environment objects onto an environment stack (separately from the VM dframe mechanism I talked about above) and so if
    tail recursion were perpetrated anyway, that stack would keep growing;
    it wouldn't be proper stackless recursion.)

    BTW, this bug affects the existing self-tail-calls also, a released
    feature. But there are no instances anywhere where a function binds a
    special variable and then calls itself, expecting the recursive
    self-call to see that binding. (Or no instances that break the
    bootstrapping and test suite, in any case!) It was already in my
    list of things to fix.

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca
    --- Synchronet 3.21a-Linux NewsLink 1.2