• Bug#1105019: sbuild: source.changes includes binary build info

    From Guillem Jover@21:1/5 to Holger Levsen on Wed May 14 11:10:01 2025
    XPost: linux.debian.maint.dpkg

    Hi!

    On Tue, 2025-05-13 at 12:58:30 +0000, Holger Levsen wrote:
    On Tue, May 13, 2025 at 02:24:38PM +0200, Guillem Jover wrote:
    We have had reproducible source packages (barring OpenPGP signatures in
    the .dsc files) since pretty much the same time dpkg-deb gained support

    have you actually tried that?

    Sure, I'd like to assume at the time this got implemented :), and also
    as part of every dpkg release:

    https://git.dpkg.org/cgit/dpkg/dpkg.git/tree/build-aux/gen-release#n147

    Also ISTM that reproducibility of source packages is easier to prove
    (at least from the toolchain PoV) than for binary packages, because
    most of the generation is driven by the toolchain itself (as seen from
    the commits I referenced in dpkg). The only variable and/or potentially
    problematic part is the «debian/rules clean» and whether it has side
    effects that could affect that generation.

    A current test could be something like:

    ,---
    $ apt source dpkg
    $ sq verify --cleartext dpkg_1.22.18.dsc | head -n-1 > dpkg-orig.dsc
    $ cd dpkg-1.22.18
    $ dpkg-buildpackage -us -uc -S
    $ cd ..
    $ diff -u dpkg-orig.dsc dpkg_1.22.18.dsc && echo reproduced source
    reproduced source
    `---
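
    Where sq is not at hand, the unsigned .dsc for such a comparison can
    be approximated by stripping the cleartext armor by hand. This is only
    a minimal sketch: the helper name strip_clearsign is hypothetical, it
    does NOT verify the signature, and dropping one trailing empty line is
    an assumption meant to mirror the head -n-1 step above.

```shell
# Hypothetical helper: print the body of a cleartext-signed file.
# It does NOT verify the signature; use a real OpenPGP tool for that.
# An unsigned input file is passed through unchanged.
strip_clearsign() {
    awk '
        # Start of the signed message: armor headers follow until a blank line.
        /^-----BEGIN PGP SIGNED MESSAGE-----$/ && !signed {
            signed = 1; header = 1; n = 0; next
        }
        # Start of the signature block: stop reading (END still runs).
        /^-----BEGIN PGP SIGNATURE-----$/ && signed { exit }
        # Skip armor headers such as "Hash: SHA512" and the blank separator.
        header { if ($0 == "") header = 0; next }
        # Body line: undo dash-escaping when inside a signed message.
        { if (signed) sub(/^- /, ""); lines[++n] = $0 }
        END {
            # Assumption: the signed body carries one extra trailing empty line.
            if (signed && n > 0 && lines[n] == "") n--
            for (i = 1; i <= n; i++) print lines[i]
        }
    ' "$1"
}
```

    It could be used as «strip_clearsign dpkg_1.22.18.dsc > dpkg-orig.dsc»
    when authenticity has already been checked by other means.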

    why do you think they are important?

    For QA alone this seems important (test suites for example), but in a
    security context, to me this seems like a rather important part TBH,
    the foundation on which binary package reproducibility is sitting. More
    so in scenarios such as the xz attack for example. Reviewing diffoscope
    differences is very helpful, but in the end we need to review and modify
    the sources, from which the binaries get derived. :)

    obviously I agree that being able to reproduce the content would be nice,
    however in our tests years ago, not even that was possible, let alone
    bit by bit (thus including timestamps).

    If you recall the specifics, I'd be curious to hear them!

    I guess someone would need to actually investigate some hundred packages
    today, to see how things really are.

    Perhaps my statements were sloppy though. When I said reproducible, I
    meant that the toolchain can produce them, assuming the source package
    itself does not get in the way via «debian/rules clean». I didn't mean
    we have 100% coverage on the Debian archive for example, where as you
    point out we (well someone :) would need to practically check whether
    that's the case. My assumption is that most would do, but I think it's
    realistic to expect that we might find a number of packages where
    «debian/rules clean» affects the source generation.

    I think whether we can reproduce the same source after a full build
    (so the equivalent of a twice in a row build) might perhaps be more
    challenging (and I'd expect less reproducibility there), but for a
    single download source + full build, we are only concerned about the
    «clean» target, as the source generation is performed as the first
    thing.

    OTOH, I think the current reproducible infra has probably all the
    data, and it might just be a matter of checking whether the unsigned
    *.dsc (from build-a and build-b) match? :)
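
    The check suggested above can be sketched roughly as follows. The
    directory layout (one directory per build, each holding the unsigned
    .dsc files) and the function name compare_dsc_dirs are assumptions
    made for illustration, not a description of how the reproducible
    builds infrastructure actually stores its data.

```shell
# Hypothetical layout: $1 and $2 are directories holding the unsigned
# .dsc files from the first and second build of each package.
compare_dsc_dirs() {
    dir_a=$1
    dir_b=$2
    status=0
    for dsc_a in "$dir_a"/*.dsc; do
        # Skip the literal pattern when the directory has no .dsc files.
        [ -e "$dsc_a" ] || continue
        name=${dsc_a##*/}
        dsc_b=$dir_b/$name
        if [ ! -e "$dsc_b" ]; then
            echo "missing: $name"
            status=1
        elif cmp -s "$dsc_a" "$dsc_b"; then
            # Byte-for-byte identical source control files.
            echo "reproduced: $name"
        else
            echo "differs: $name"
            status=1
        fi
    done
    # Non-zero when at least one .dsc is missing or differs.
    return "$status"
}
```

    A call such as «compare_dsc_dirs build-a build-b» prints one line per
    source package and returns non-zero if any of them did not match.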

    Thanks,
    Guillem

  • From Holger Levsen@21:1/5 to Guillem Jover on Fri May 16 14:10:02 2025
    XPost: linux.debian.maint.dpkg

    hi,

    On Wed, May 14, 2025 at 10:56:41AM +0200, Guillem Jover wrote:
    Sure, I'd like to assume at the time this got implemented :), and also
    as part of every dpkg release:
    https://git.dpkg.org/cgit/dpkg/dpkg.git/tree/build-aux/gen-release#n147

    oh nice!

    I guess someone would need to actually investigate some hundred packages
    today, to see how things really are.
    Perhaps my statements were sloppy though. When I said reproducible, I
    meant that the toolchain can produce them, assuming the source package
    itself does not get in the way via «debian/rules clean». I didn't mean
    we have 100% coverage on the Debian archive for example, where as you
    point out we (well someone :) would need to practically check whether
    that's the case. My assumption is that most would do, but I think it's
    realistic to expect that we might find a number of packages where
    «debian/rules clean» affects the source generation.

    I've just checked devscripts and developers-reference, and much to my
    surprise their source packages indeed built bit by bit identical:

    $ diffoscope p1/developers-reference_13.19_source.changes p2/developers-reference_13.19_source.changes
    --- p1/developers-reference_13.19_source.changes
    +++ p2/developers-reference_13.19_source.changes
    ├── Files
    │ @@ -1,4 +1,4 @@

    │ 6c2a48c479ecd9d4710b64549f8ef44a 1644 doc optional developers-reference_13.19.dsc
    │ 283e1516834500ab48daf62c74714af2 575920 doc optional developers-reference_13.19.tar.xz
    │ - 3afde36f59e56164068ad521f11bc60a 6057 doc optional developers-reference_13.19_source.buildinfo
    │ + e3d438ba597ef522c68b9a730a7b32d4 6057 doc optional developers-reference_13.19_source.buildinfo
    ├── developers-reference_13.19_source.buildinfo
    │ ├── Build-Date
    │ │ @@ -1 +1 @@
    │ │ -Fri, 16 May 2025 11:54:47 +0000
    │ │ +Fri, 16 May 2025 11:55:12 +0000
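
    Going by the output above, the two .changes files differ only in the
    checksum of the .buildinfo file, which in turn differs only in its
    Build-Date. A minimal sketch of a comparison that ignores those
    entries follows; the function name is hypothetical, and it assumes
    unsigned .changes files in which every line referring to the
    .buildinfo file ends in ".buildinfo".

```shell
# Hypothetical helper: compare two unsigned .changes files while ignoring
# the checksum lines that refer to the .buildinfo file.
changes_match_ignoring_buildinfo() {
    tmp_a=$(mktemp) || return 2
    tmp_b=$(mktemp) || { rm -f "$tmp_a"; return 2; }
    # Drop every line whose last field is a .buildinfo file name.
    grep -v '\.buildinfo$' "$1" > "$tmp_a"
    grep -v '\.buildinfo$' "$2" > "$tmp_b"
    result=0
    cmp -s "$tmp_a" "$tmp_b" || result=$?
    rm -f "$tmp_a" "$tmp_b"
    # 0 when the remaining content is byte-for-byte identical.
    return "$result"
}
```

    With the two files above this would be called as
    «changes_match_ignoring_buildinfo p1/developers-reference_13.19_source.changes p2/developers-reference_13.19_source.changes».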


    I think whether we can reproduce the same source after a full build
    (so the equivalent of a twice in a row build) might perhaps be more
    challenging (and I'd expect less reproducibility there),

    yes, me too, but that's not how source packages are built for real. :)

    but for a
    single d