• RFH: manually rebuild pytorch-cuda on ppc64el and upload binaries

    From M. Zhou@21:1/5 to All on Sun Jun 15 03:00:01 2025
    Hi folks,

    I let my access to ppc64el expire, since CUDA (>> 12.4) no longer supports
    ppc64el for the trixie+1 cycle. But the recent CUDA 12.2->12.4 transition
    requires a rebuild of pytorch-cuda, and I have already lost access.

    The help I need is pretty simple -- manually rebuild pytorch-cuda and
    upload the resulting binaries. Note that the build process
    involves two major non-free dependencies:

    (1) nvidia-cuda-toolkit: from non-free section
    (2) nvidia-cudnn: my installation script, which downloads the binary
    blobs during postinst.

    They are the direct reason why XS-Autobuild and porterbox do not work.

    Steps
    =====

    1. get the source of pytorch-cuda, and make sure the version is 2.6.0+dfsg-7

    apt source pytorch-cuda

    2. do the manual binNMU with sbuild

    sbuild --no-clean -c unstable-ppc64el-sbuild \
    --build=ppc64el --arch=ppc64el \
    --binNMU=1 --make-binNMU="Rebuild against CUDA 12.4." \
    -m "your name <your email>" \
    pytorch-cuda_2.6.0+dfsg-7.dsc -d sid

    3. sign the built packages and upload

    debsign pytorch-cuda_2.6.0+dfsg-7+b1_ppc64el.changes
    dput ftp-master pytorch-cuda_2.6.0+dfsg-7+b1_ppc64el.changes
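
    Before signing, it can help to confirm that the build produced the file
    named in step 3. The following is only a sketch: changes_name is a
    hypothetical helper (not part of devscripts or sbuild), and it assumes the
    usual <source>_<version>+b<N>_<arch>.changes naming for a binNMU.

```shell
#!/bin/sh
# Hypothetical helper: build the .changes file name expected for a binNMU.
# $1 = source package, $2 = source version, $3 = binNMU number, $4 = architecture
changes_name() {
    ver=${2#*:}   # drop an epoch, if any; it is not part of the file name
    echo "${1}_${ver}+b${3}_${4}.changes"
}

# Look for the file in the current directory (assumed to be where sbuild
# wrote its output).
f=$(changes_name pytorch-cuda 2.6.0+dfsg-7 1 ppc64el)
if [ -f "$f" ]; then
    echo "found $f"
else
    echo "missing $f"
fi
```

    If the file is reported missing, check the binNMU number and the
    directory before running debsign and dput.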


    Parallelism and RAM
    ===================

    On amd64/ppc64el, building pytorch-cuda needs 4GB of RAM per job to avoid
    OOM during the parallel link. On arm64 it requires 8GB per job. It is
    OK to allocate a large swap, as it is mostly used to absorb the
    RAM spikes during parallel linker invocations.
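
    As a rough illustration of the arithmetic, the sketch below derives a job
    count from RAM plus swap. jobs_for_mem and total_gb are hypothetical
    helper names, /proc/meminfo is assumed to be available (Linux), and
    whether the package honours the resulting job count (for example via
    sbuild's -j option) is an assumption, not something verified here.

```shell
#!/bin/sh
# Hypothetical helper: how many parallel jobs fit in a given amount of memory.
# $1 = RAM + swap in GB, $2 = GB needed per job (4 on amd64/ppc64el, 8 on arm64)
jobs_for_mem() {
    jobs=$(( $1 / $2 ))
    if [ "$jobs" -lt 1 ]; then
        jobs=1   # always allow at least one job
    fi
    echo "$jobs"
}

# RAM + swap of this machine in whole GB, summed from /proc/meminfo (values in kB).
total_gb() {
    awk '/^(MemTotal|SwapTotal):/ { kb += $2 } END { print int(kb / 1024 / 1024) }' /proc/meminfo
}

per_job_gb=4   # use 8 on arm64
if [ -r /proc/meminfo ]; then
    echo "suggested jobs: $(jobs_for_mem "$(total_gb)" "$per_job_gb")"
fi
```

    For example, 16GB of RAM plus swap at 4GB per job gives 4 jobs, while the
    same machine at 8GB per job gives 2.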

    I have already done the amd64 rebuild: https://buildd.debian.org/status/package.php?p=pytorch%2dcuda

    My arm64 rebuild is on the way, but it will take roughly one day
    on my Raspberry Pi 5. If you have a stronger arm64 device,
    feel free to do the rebuild and upload before I do.
    Note that arm64 needs roughly 8GB of RAM/swap per job to avoid OOM.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From M. Zhou@21:1/5 to M. Zhou on Wed Jun 18 05:30:01 2025
    This is no longer needed. The ppc64el nvidia driver is gone from testing,
    which means the dependency libcuda1 can no longer be satisfied.

    On Sat, 2025-06-14 at 20:51 -0400, M. Zhou wrote:

    The help I need is pretty simple -- manually rebuild pytorch-cuda and
    upload the resulting binaries. Note the building process
    involves two major non-free dependencies:
