• Bug#1091858: zstd: -9 SIGILLs on mips64el

    From Peter Pentchev@21:1/5 to All on Fri Jan 3 00:30:01 2025
    On Thu, Jan 02, 2025 at 09:06:11PM +0100, Mateusz Jończyk wrote:
    > Hello,
    >
    > I also hit this bug on 32-bit mipsel, on the Malta platform in QEMU.
    >
    > I used images from https://ftp.debian.org/debian/dists/Debian12.8/main/installer-mipsel/current/images/malta/netboot/
    >
    > Command line:
    >
    > qemu-system-mipsel -cpu 24Kc -M malta -m 512 \
    >     -kernel debian12.8/installer-mipsel/malta/vmlinuz-6.1.0-27-4kc-malta \
    >     -initrd debian12.8/installer-mipsel/malta/initrd.gz \
    >     -hda /media/1T-data/virtual_machines/debian_mips/hda.raw \
    >     -append "root=/dev/sda1 nokaslr" -nographic
    [snip]

    > I ran gdb: the offending instruction is in the ZSTD_RowFindBestMatch function, and it is the "prefx" instruction.
    >
    >    0x555b02f8 <+1160>:    addiu    v0,v0,31
    >    0x555b02fc <+1164>:    andi    v0,v0,0xf
    >    0x555b0300 <+1168>:    sll    v0,v0,0x2
    >    0x555b0304 <+1172>:    addu    v0,a3,v0
    >    0x555b0308 <+1176>:    lw    v0,0(v0)
    >    0x555b030c <+1180>:    b    0x555b0368 <ZSTD_RowFindBestMatch_noDict_5_4+1272>
    >    0x555b0310 <+1184>:    move    t7,v0
    > => 0x555b0314 <+1188>:    prefx    0x6,t7(s5)
    >    0x555b0318 <+1192>:    subu    v0,a1,v0
    >    0x555b031c <+1196>:    sw    t7,0(a2)
    >    0x555b0320 <+1200>:    addiu    t9,a0,-1
    >    0x555b0324 <+1204>:    and    a1,a1,v0
    >    0x555b0328 <+1208>:    and    a0,a0,t9
    >
    > It appears that this instruction requires a floating-point coprocessor and is a CP1X instruction.
    > It is used to prefetch memory locations.
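    For readers unfamiliar with CP1X: PREFX lives in the COP1X opcode space, the same major opcode as the indexed FPU loads and stores (lwxc1, swxc1 and friends), which is presumably why a CPU without a floating-point unit traps on it. The C sketch below packs and unpacks a PREFX word under an assumed MIPS IV / MIPS32r2 layout: major opcode 0x13 in bits 31..26, base in 25..21, index in 20..16, hint in 15..11, zeros in 10..6, function 0x0F in 5..0. It also assumes the o32 register numbers t7 = $15 and s5 = $21. prefx_encode() and prefx_decode() are hypothetical helpers written only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field layout of "prefx hint, index(base)", a COP1X-class word. */
#define OP_COP1X   0x13u /* major opcode, bits 31..26 */
#define FUNC_PREFX 0x0Fu /* function field, bits 5..0 */

/* Hypothetical decoded form of a PREFX instruction. */
struct prefx_fields {
    unsigned base; /* GPR number holding the base address */
    unsigned idx;  /* GPR number holding the index added to the base */
    unsigned hint; /* prefetch hint */
};

/* Build the 32-bit instruction word for "prefx hint, idx(base)". */
static uint32_t prefx_encode(unsigned hint, unsigned idx, unsigned base)
{
    assert(hint < 32 && idx < 32 && base < 32);
    return (OP_COP1X << 26) | ((uint32_t)base << 21) | ((uint32_t)idx << 16)
         | ((uint32_t)hint << 11) | FUNC_PREFX;
}

/* Return 1 and fill *out if the word matches the assumed PREFX layout,
 * 0 otherwise. */
static int prefx_decode(uint32_t insn, struct prefx_fields *out)
{
    if ((insn >> 26) != OP_COP1X || (insn & 0x3Fu) != FUNC_PREFX
        || ((insn >> 6) & 0x1Fu) != 0)
        return 0;
    out->base = (insn >> 21) & 0x1Fu;
    out->idx = (insn >> 16) & 0x1Fu;
    out->hint = (insn >> 11) & 0x1Fu;
    return 1;
}
```

Under those assumptions, the instruction in the gdb listing, prefx 0x6,t7(s5), would encode as prefx_encode(6, 15, 21) == 0x4EAF300F. The thread does not show the raw instruction bytes, so this is a worked example rather than a dump.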

    Thanks a lot for your analysis! (and to наб for reporting the problem with lots of
    detail, too)

    So, hm, if this is a prefetch instruction, is it possible that it would correspond to
    one of the PREFETCH_L1() invocations, either in ZSTD_RowFindBestMatch itself, or in
    ZSTD_row_prefetch()?

    https://sources.debian.org/src/libzstd/1.5.4%2Bdfsg2-5/lib/compress/zstd_lazy.c/#L1139

    https://sources.debian.org/src/libzstd/1.5.4%2Bdfsg2-5/lib/compress/zstd_lazy.c/#L823

    If I'm reading the conditional compilation directives right, for GCC >= 4
    the PREFETCH_L1() macro would be left as an exercise to the compiler... erm,
    I mean, would be defined as __builtin_prefetch():

    https://sources.debian.org/src/libzstd/1.5.4%2Bdfsg2-5/lib/common/compiler.h/#L119
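    The pattern described above can be sketched as follows, assuming GCC. SKETCH_PREFETCH_L1 and sum_with_prefetch() are hypothetical names, not zstd's definitions: the macro forwards to __builtin_prefetch(ptr, 0, 3), i.e. a read with the highest temporal locality, and the loop prefetches a table slot a few entries ahead. The C source contains no inline assembly, so the instruction that actually comes out (on MIPS, possibly an indexed prefetch such as the prefx in the gdb listing) is entirely the compiler's choice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a PREFETCH_L1()-style macro: with GCC it
 * forwards to __builtin_prefetch(addr, rw = 0 (read), locality = 3
 * (keep in all cache levels)); elsewhere it only evaluates the pointer. */
#if defined(__GNUC__)
#  define SKETCH_PREFETCH_L1(ptr) __builtin_prefetch((ptr), 0, 3)
#else
#  define SKETCH_PREFETCH_L1(ptr) ((void)(ptr))
#endif

/* Sum a table, prefetching the entry `distance` slots ahead of the current
 * one. The prefetch is only a hint: it never changes the result. */
static uint32_t sum_with_prefetch(const uint32_t *table, size_t n, size_t distance)
{
    uint32_t sum = 0;
    assert(table != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (i + distance < n)
            SKETCH_PREFETCH_L1(&table[i + distance]); /* address = base + index */
        sum += table[i];
    }
    return sum;
}
```

Compiling a file like this with gcc -O2 -S for the target in question and searching the assembly for pref or prefx would show which prefetch form a given compiler and -march setting pick.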

    And in the build log for libzstd-1.5.4+dfsg2-5 on mips64el, it seems that this particular file, zstd_lazy.c, was not compiled with any special flags:

    https://buildd.debian.org/status/fetch.php?pkg=libzstd&arch=mips64el&ver=1.5.4%2Bdfsg2-5&stamp=1679182427&raw=0

    CC obj/conf_d0b7c101029993bfb103f90ea5393d0b/zstd_lazy.o
    mips64el-linux-gnuabi64-gcc -g -O2 -ffile-prefix-map=/<<PKGBUILDDIR>>=.
    -fstack-protector-strong -Wformat -Werror=format-security -DBACKTRACE_ENABLE=0
    -Wa,--noexecstack -Wdate-time -D_FORTIFY_SOURCE=2 -DXXH_NAMESPACE=ZSTD_ -DDEBUGLEVEL=0
    -DZSTD_LEGACY_SUPPORT=5 -DZSTD_MULTITHREAD -DZSTD_GZCOMPRESS -DZSTD_GZDECOMPRESS
    -DZSTD_LZMACOMPRESS -DZSTD_LZMADECOMPRESS -DZSTD_LZ4COMPRESS -DZSTD_LZ4DECOMPRESS
    -DZSTD_LEGACY_SUPPORT=5 -c -MT obj/conf_d0b7c101029993bfb103f90ea5393d0b/zstd_lazy.o
    -MMD -MP -MF obj/conf_d0b7c101029993bfb103f90ea5393d0b/zstd_lazy.d
    -o obj/conf_d0b7c101029993bfb103f90ea5393d0b/zstd_lazy.o
    ../lib/compress/zstd_lazy.c

    [snip]
    > Indeed, when using the 24Kf variant (qemu-system-mipsel -cpu 24Kf), zstd -9 works.
    >
    > The question is why the Linux kernel's math-emu module (which is compiled in and enabled) didn't catch and emulate it.
    >
    >     root@mateusz-debian-mips:/sys/kernel/debug/mips# cat fpuemustats_clear
    >     root@mateusz-debian-mips:/sys/kernel/debug/mips# zstd -9 </etc/fstab >/dev/null
    >     Caught SIGILL signal, printing stack:
    >     Illegal instruction
    >     root@mateusz-debian-mips:/sys/kernel/debug/mips# grep -r . fpuemustats/*
    >     fpuemustats/branches:11
    >     fpuemustats/cp1ops:44
    >     fpuemustats/cp1xops:1
    >     fpuemustats/ds_emul:1
    >     fpuemustats/emulated:535
    >     fpuemustats/errors:0
    >     fpuemustats/ieee754_inexact:4
    >     fpuemustats/ieee754_invalidop:0
    >     fpuemustats/ieee754_overflow:0
    >     fpuemustats/ieee754_underflow:0
    >     fpuemustats/ieee754_zerodiv:0
    >     [...]
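    The procedure above (clear the counters, run the failing command, dump them) lends itself to a small helper. The C sketch below covers only the parsing step, assuming the path:value lines that grep -r . prints; parse_stat_line() is a hypothetical helper. Reading the real files under /sys/kernel/debug/mips needs debugfs mounted there and, typically, root.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: split a line such as "fpuemustats/cp1xops:1"
 * (as printed by `grep -r . fpuemustats/*`) into the counter name after
 * the last '/' and its decimal value. Returns 0 on success, -1 on
 * malformed input or if the name does not fit in `name`. */
static int parse_stat_line(const char *line, char *name, size_t name_size,
                           unsigned long *value)
{
    const char *colon;
    const char *start;
    char *end;
    size_t len;

    assert(line != NULL && name != NULL && value != NULL);
    colon = strrchr(line, ':');
    if (colon == NULL || name_size == 0)
        return -1;

    /* The counter name starts after the last '/' before the colon. */
    start = line;
    for (const char *p = line; p < colon; p++)
        if (*p == '/')
            start = p + 1;
    len = (size_t)(colon - start);
    if (len == 0 || len >= name_size)
        return -1;
    memcpy(name, start, len);
    name[len] = '\0';

    /* Counters are non-negative: require a digit right after the colon. */
    if (colon[1] < '0' || colon[1] > '9')
        return -1;
    errno = 0;
    *value = strtoul(colon + 1, &end, 10);
    if (errno != 0 || (*end != '\0' && *end != '\n'))
        return -1;
    return 0;
}
```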

    So yeah, does this look like a qemu bug, or a compiler bug? I'm still
    not completely sure, but from the source code it does not really
    seem to be a libzstd bug - it leaves __builtin_prefetch() to GCC...

    Would it be possible for somebody with access to real MIPS hardware to
    test this there, outside of qemu?

    G'luck,
    Peter

    --
    Peter Pentchev roam@ringlet.net roam@debian.org peter@morpheusly.com
    PGP key: https://www.ringlet.net/roam/roam.key.asc
    Key fingerprint 2EE7 A7A5 17FC 124C F115 C354 651E EFB0 2527 DF13

  • From Mateusz Jończyk@21:1/5 to All on Fri Jan 3 08:00:01 2025
    On 3 January 2025 00:17:11 CET, Peter Pentchev wrote:

    > So yeah, does this look like a qemu bug, or a compiler bug? I'm still
    > not completely sure, but from the source code it does not really
    > seem to be a libzstd bug - it leaves __builtin_prefetch() to GCC...

    More than anything, I suspect a kernel bug. GCC already emits other floating-point instructions on this architecture, and the kernel emulates them when no FPU is available.

    There is some mention of prefx in the code:

    <https://elixir.bootlin.com/linux/v6.12.6/source/arch/mips/math-emu/cp1emu.c#L1664>
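    The reasoning behind this suspicion can be illustrated with a hypothetical trap-and-emulate handler: since a prefetch is only a hint with no architectural effect, an emulator that recognises PREFX can retire it as a no-op by stepping the program counter past it. The sketch assumes the PREFX layout (COP1X major opcode 0x13, function 0x0F, bits 10..6 zero); emulate_cop1x(), struct user_ctx and the EMU_* codes are invented for this illustration and are not the code in cp1emu.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical outcome codes for a software FPU emulator. */
enum emu_result {
    EMU_HANDLED, /* instruction emulated, resume after it */
    EMU_SIGILL   /* not recognised: deliver SIGILL to the process */
};

/* Hypothetical minimal slice of saved user context: the program counter. */
struct user_ctx {
    uint32_t pc;
};

/* Emulate one trapped COP1X-class word. Only PREFX is recognised here; it
 * is a pure hint, so "emulating" it means skipping it. A real emulator
 * would also handle the other COP1X operations and branch delay slots. */
static enum emu_result emulate_cop1x(uint32_t insn, struct user_ctx *ctx)
{
    assert(ctx != NULL);
    if ((insn >> 26) != 0x13u)           /* not a COP1X-class instruction */
        return EMU_SIGILL;
    if ((insn & 0x3Fu) == 0x0Fu && ((insn >> 6) & 0x1Fu) == 0) {
        ctx->pc += 4;                    /* PREFX: no effect, step over it */
        return EMU_HANDLED;
    }
    return EMU_SIGILL;                   /* other COP1X ops: not in this sketch */
}
```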

    Greetings,
    Mateusz
