• innd SERVER cant malloc 18446744073699012160 bytes

    From Jesse Rehmer@jesse.rehmer@blueworldhosting.com to news.software.nntp on Mon Nov 17 08:55:42 2025
    From Newsgroup: news.software.nntp

    I have been working with Billy G., having him push articles from his pugleaf instance to an INN instance running on an OmniOS VM. Tonight innd crashed with the following:

    Nov 16 22:33:33 omnios-inn innd: [ID 608925 news.crit] SERVER cant malloc 18446744073699012160 bytes at interface.c line 622: Not enough space

    What condition could have caused innd to attempt to allocate that much memory?

    The VM has 64GB of RAM, uses CNFS and ovsqlite, and at the time of the crash was handling ~32 inbound connections. There are no newsfeeds and Perl/Python filtering are disabled.

    I found a core dump, but I don't really know what to do with it or how to use mdb effectively:

    root@omnios-inn:/usr/local/news/spool/articles# mdb core
    Loading modules: [ libc.so.1 libuutil.so.1 libnvpair.so.1 libavl.so.1 libsmbios.so.1 libumem.so.1 ld.so.1 ]
    $G
    C++ symbol demangling enabled
    $r
    %rax = 0x0000000000000000 %r8 = 0x0000000000000000
    %rbx = 0xfffffc7fef2e7c7d %r9 = 0xfffffc7fed982cb0
    %rcx = 0xfffffc7feec6ef6b %r10 = 0xfffffc7feec6ef6b
    %rdx = 0xfffffe59676e3800 %r11 = 0x0000000000000246
    %rsi = 0x0000000000000006 %r12 = 0x0000000000000006
    %rdi = 0x0000000000000001 %r13 = 0xfffffc7fef23fdd0
    %r14 = 0x000000000000026e
    %r15 = 0xfffffc7fef2593c0

    %cs = 0x0053 %fs = 0x0000 %gs = 0x0000
    %ds = 0x004b %es = 0x004b %ss = 0x004b

    %rip = 0xfffffc7feec6e51a libc.so.1`_lwp_kill+0xa
    %rbp = 0xfffffc7fffdf4950
    %rsp = 0xfffffc7fffdf4938

    %rflags = 0x00000282
    id=0 vip=0 vif=0 ac=0 vm=0 rf=0 nt=0 iopl=0x0
    status=<of,df,IF,tf,SF,zf,af,pf,cf>

    %gsbase = 0x0000000000000000
    %fsbase = 0xfffffc7fed982a40
    %trapno = 0xe
    %err = 0xfb8005f8
    $C
    fffffc7fffdf4950 libc.so.1`_lwp_kill+0xa()
    fffffc7fffdf4980 libc.so.1`raise+0x22(6)
    fffffc7fffdf49d0 libc.so.1`abort+0x58()
    fffffc7fffdf4a10 ~xmalloc_abort+0x75()
    fffffc7fffdf4a40 libinn.so.9.0.2`x_malloc+0x4f()
    fffffc7fffdf4ab0 libinnstorage.so.3.1.3`SMgetsub+0x1b8()
    fffffc7fffdf4b70 libinnstorage.so.3.1.3`SMstore+0x87()
    fffffc7fffdf4e40 ARTstore+0x43e()
    fffffc7fffdf5090 ARTpost+0x242a()
    fffffc7fffdf5430 NCproc+0x3db()
    fffffc7fffdf94e0 CHANreadloop+0x5aa()
    fffffc7fffdf9660 main+0x1089()
    fffffc7fffdf9690 _start_crt+0x87()
    fffffc7fffdf96a0 _start+0x18()
    ::stacks
    THREAD STATE SOBJ COUNT
    1 UNPARKED <NONE> 1
    libc.so.1`raise+0x22
    libc.so.1`abort+0x58
    0x41bd46
    libinn.so.9.0.2`x_malloc+0x4f
    libinnstorage.so.3.1.3`SMgetsub+0x1b8
    libinnstorage.so.3.1.3`SMstore+0x87
    ARTstore+0x43e
    ARTpost+0x242a
    NCproc+0x3db
    CHANreadloop+0x5aa
    main+0x1089
    _start_crt+0x87
    _start+0x18


    root@omnios-inn:/usr/local/news/spool/articles# pmap core
    core 'core' of 1248: /usr/local/news/bin/innd
    0000000000400000 236K r-x-- /usr/local/news/bin/innd
    000000000044A000 12K rw--- /usr/local/news/bin/innd
    000000000044D000 112K rw--- /usr/local/news/bin/innd
    0000000000E4E000 97024K rw--- [ heap ]
    FFFFFC7600000000 17578128K rw---* [ anon ]
    FFFFFC7A37200000 31264K rw---* [ anon ]
    FFFFFC7A3BA00000 31264K rw---* [ anon ]
    FFFFFC7A40000000 23437500K rw---* [ anon ]
    FFFFFC7FD7800000 31264K rw---* [ anon ]
    FFFFFC7FDA400000 31264K rw---* [ anon ]
    FFFFFC7FDC400000 31264K rw---* [ anon ]
    FFFFFC7FDEA00000 31264K rw---* [ anon ]
    FFFFFC7FE1400000 31264K rw---* [ anon ]
    FFFFFC7FE3400000 31264K rw---* [ anon ]
    FFFFFC7FE6E40000 4K rwx-- [ anon ]
    FFFFFC7FE7200000 31264K rw---* [ anon ]
    FFFFFC7FE9200000 31264K rw---* [ anon ]
    FFFFFC7FEBBB0000 4K rwx-- [ anon ]
    FFFFFC7FEC8F0000 4K rwx-- [ anon ]
    FFFFFC7FECC00000 2508K rw---* [ anon ]
    FFFFFC7FED390000 60K r-x-- /lib/amd64/libvarpd.so.1
    FFFFFC7FED3AF000 4K rw--- /lib/amd64/libvarpd.so.1
    FFFFFC7FED6A0000 4K r----* [ anon ]
    FFFFFC7FED710000 64K rwx-- [ anon ]
    FFFFFC7FED820000 128K rwx-- [ anon ]
    FFFFFC7FED860000 484K r-x-- /lib/amd64/libumem.so.1
    FFFFFC7FED8E9000 136K rw--- /lib/amd64/libumem.so.1
    FFFFFC7FED90B000 52K rw--- /lib/amd64/libumem.so.1
    FFFFFC7FED930000 64K rwx-- [ anon ]
    FFFFFC7FED950000 64K rwx-- [ anon ]
    FFFFFC7FED970000 4K rw--- [ anon ]
    FFFFFC7FED980000 24K rwx-- [ anon ]
    FFFFFC7FED990000 4K rwx-- [ anon ]
    FFFFFC7FED9A0000 12K r-x-- /usr/lib/amd64/librename.so.1
    FFFFFC7FED9B3000 4K rw--- /usr/lib/amd64/librename.so.1
    FFFFFC7FED9C0000 4K rwx-- [ anon ]
    FFFFFC7FED9E0000 8K r-x-- /usr/lib/amd64/libidspace.so.1
    FFFFFC7FED9F2000 4K rw--- /usr/lib/amd64/libidspace.so.1
    FFFFFC7FEDA00000 5012K r-x-- /usr/lib/amd64/libpython3.13.so.1.0
    FFFFFC7FEDEF4000 840K rw--- /usr/lib/amd64/libpython3.13.so.1.0
    FFFFFC7FEDFC6000 456K rw--- /usr/lib/amd64/libpython3.13.so.1.0
    FFFFFC7FEE050000 376K r-x-- /lib/amd64/libm.so.2
    FFFFFC7FEE0BE000 20K rw--- /lib/amd64/libm.so.2
    FFFFFC7FEE0E0000 4K rwx-- [ anon ]
    FFFFFC7FEE100000 8K r-x-- /lib/amd64/libsendfile.so.1
    FFFFFC7FEE112000 4K rw--- /lib/amd64/libsendfile.so.1
    FFFFFC7FEE120000 4K rwx-- [ anon ]
    FFFFFC7FEE140000 4K r-x-- /lib/amd64/libintl.so.1
    FFFFFC7FEE150000 4K rwx-- [ anon ]
    FFFFFC7FEE170000 4K rwx-- [ anon ]
    FFFFFC7FEE190000 4K r-x-- /lib/amd64/libdl.so.1
    FFFFFC7FEE1A0000 4K rw--- [ anon ]
    FFFFFC7FEE1B0000 4K rwx-- [ anon ]
    FFFFFC7FEE1D0000 1404K r-x-- /lib/amd64/libxml2.so.2.13.8
    FFFFFC7FEE33E000 48K rw--- /lib/amd64/libxml2.so.2.13.8
    FFFFFC7FEE34A000 4K rw--- /lib/amd64/libxml2.so.2.13.8
    FFFFFC7FEE360000 4K rwx-- [ anon ]
    FFFFFC7FEE370000 32K r-x-- /lib/amd64/librcm.so.1
    FFFFFC7FEE388000 4K rw--- /lib/amd64/librcm.so.1
    FFFFFC7FEE389000 4K rw--- /lib/amd64/librcm.so.1
    FFFFFC7FEE3A0000 4K rwx-- [ anon ]
    FFFFFC7FEE3C0000 4K rwx-- [ anon ]
    FFFFFC7FEE3D0000 24K r-x-- /usr/lib/amd64/libexacct.so.1
    FFFFFC7FEE3E6000 4K rw--- /usr/lib/amd64/libexacct.so.1
    FFFFFC7FEE400000 4048K r-x-- /usr/perl5/5.40/lib/i86pc-solaris-thread-multi-64/CORE/libperl.so
    FFFFFC7FEE803000 80K rw--- /usr/perl5/5.40/lib/i86pc-solaris-thread-multi-64/CORE/libperl.so
    FFFFFC7FEE817000 24K rw--- /usr/perl5/5.40/lib/i86pc-solaris-thread-multi-64/CORE/libperl.so
    FFFFFC7FEE830000 4K rwx-- [ anon ]
    FFFFFC7FEE840000 220K r-x-- /lib/amd64/libscf.so.1
    FFFFFC7FEE887000 8K rw--- /lib/amd64/libscf.so.1
    FFFFFC7FEE8A0000 4K rwx-- [ anon ]
    FFFFFC7FEE8C0000 4K rwx-- [ anon ]
    FFFFFC7FEE8D0000 204K r-x-- /usr/lib/amd64/liblzma.so.5.8.1
    FFFFFC7FEE912000 4K rw--- /usr/lib/amd64/liblzma.so.5.8.1
    FFFFFC7FEE930000 4K rw--- [ anon ]
    FFFFFC7FEE950000 4K rwx-- [ anon ]
    FFFFFC7FEE970000 132K r-x-- /usr/lib/amd64/libpool.so.1
    FFFFFC7FEE9A1000 8K rw--- /usr/lib/amd64/libpool.so.1
    FFFFFC7FEE9C0000 112K r-x-- /usr/lib/amd64/libsmbios.so.1
    FFFFFC7FEE9EC000 4K rw--- /usr/lib/amd64/libsmbios.so.1
    FFFFFC7FEEA00000 556K r-x-- /lib/amd64/libnsl.so.1
    FFFFFC7FEEA9B000 12K rw--- /lib/amd64/libnsl.so.1
    FFFFFC7FEEA9E000 32K rw--- /lib/amd64/libnsl.so.1
    FFFFFC7FEEAB0000 12K r-x-- /lib/amd64/libavl.so.1
    FFFFFC7FEEAC3000 4K rw--- /lib/amd64/libavl.so.1
    FFFFFC7FEEAD0000 32K r-x-- /lib/amd64/libgen.so.1
    FFFFFC7FEEAE8000 4K rw--- /lib/amd64/libgen.so.1
    FFFFFC7FEEAF0000 4K rwx-- [ anon ]
    FFFFFC7FEEB10000 4K rwx-- [ anon ]
    FFFFFC7FEEB30000 4K rwx-- [ anon ]
    FFFFFC7FEEB50000 1524K r-x-- /lib/amd64/libc.so.1
    FFFFFC7FEECDD000 48K rw--- /lib/amd64/libc.so.1
    FFFFFC7FEECE9000 16K rw--- /lib/amd64/libc.so.1
    FFFFFC7FEED00000 296K r-x-- /lib/amd64/libdladm.so.1
    FFFFFC7FEED5A000 24K rw--- /lib/amd64/libdladm.so.1
    FFFFFC7FEED70000 4K rwx-- [ anon ]
    FFFFFC7FEED90000 4K rwx-- [ anon ]
    FFFFFC7FEEDB0000 20K r-x-- /lib/amd64/libinetutil.so.1
    FFFFFC7FEEDC5000 4K rw--- /lib/amd64/libinetutil.so.1
    FFFFFC7FEEDD0000 8K r-x-- /lib/amd64/libkstat.so.1
    FFFFFC7FEEDE2000 4K rw--- /lib/amd64/libkstat.so.1
    FFFFFC7FEEE00000 4K rwx-- [ anon ]
    FFFFFC7FEEE10000 4K rwx-- [ anon ]
    FFFFFC7FEEE30000 28K r-x-- /lib/amd64/libdlpi.so.1
    FFFFFC7FEEE47000 4K rw--- /lib/amd64/libdlpi.so.1
    FFFFFC7FEEE50000 4K rwx-- [ anon ]
    FFFFFC7FEEE70000 100K r-x-- /lib/amd64/libnvpair.so.1
    FFFFFC7FEEE99000 4K rw--- /lib/amd64/libnvpair.so.1
    FFFFFC7FEEEA0000 4K rwx-- [ anon ]
    FFFFFC7FEEEC0000 24K r-x-- /lib/amd64/libsecdb.so.1
    FFFFFC7FEEED6000 4K rw--- /lib/amd64/libsecdb.so.1
    FFFFFC7FEEEF0000 4K rw--- [ anon ]
    FFFFFC7FEEF10000 20K r-x-- /lib/amd64/libmp.so.2
    FFFFFC7FEEF25000 4K rw--- /lib/amd64/libmp.so.2
    FFFFFC7FEEF40000 4K rwx-- [ anon ]
    FFFFFC7FEEF60000 4K rwx-- [ anon ]
    FFFFFC7FEEF80000 4K rwx-- [ anon ]
    FFFFFC7FEEFA0000 68K r-x-- /lib/amd64/libmd.so.1
    FFFFFC7FEEFC1000 4K rw--- /lib/amd64/libmd.so.1
    FFFFFC7FEEFE0000 4K rwx-- [ anon ]
    FFFFFC7FEF000000 48K r-x-- /lib/amd64/libtsol.so.2
    FFFFFC7FEF01C000 4K rw--- /lib/amd64/libtsol.so.2
    FFFFFC7FEF020000 4K rwx-- [ anon ]
    FFFFFC7FEF030000 4K rwx-- [ anon ]
    FFFFFC7FEF050000 4K rw--- [ anon ]
    FFFFFC7FEF060000 84K r-x-- /lib/amd64/libsec.so.1
    FFFFFC7FEF085000 24K rw--- /lib/amd64/libsec.so.1
    FFFFFC7FEF08B000 12K rw--- /lib/amd64/libsec.so.1
    FFFFFC7FEF0A0000 4K rwx-- [ anon ]
    FFFFFC7FEF0B0000 76K r-x-- /usr/lib/amd64/libidmap.so.1
    FFFFFC7FEF0D3000 4K rw--- /usr/lib/amd64/libidmap.so.1
    FFFFFC7FEF0F0000 96K r-x-- /lib/amd64/libz.so.1.3.1
    FFFFFC7FEF117000 4K rw--- /lib/amd64/libz.so.1.3.1
    FFFFFC7FEF130000 4K rwx-- [ anon ]
    FFFFFC7FEF140000 160K r-x-- /lib/amd64/libdevinfo.so.1
    FFFFFC7FEF178000 4K rw--- /lib/amd64/libdevinfo.so.1
    FFFFFC7FEF180000 4K rwx-- [ anon ]
    FFFFFC7FEF190000 44K r-x-- /lib/amd64/libuutil.so.1
    FFFFFC7FEF1AB000 4K rw--- /lib/amd64/libuutil.so.1
    FFFFFC7FEF1C0000 76K r-x-- /lib/amd64/libsocket.so.1
    FFFFFC7FEF1E3000 4K rw--- /lib/amd64/libsocket.so.1
    FFFFFC7FEF1F0000 4K rwx-- [ anon ]
    FFFFFC7FEF210000 196K r-x-- /usr/local/news/lib/libinnstorage.so.3.1.3
    FFFFFC7FEF250000 4K rw--- /usr/local/news/lib/libinnstorage.so.3.1.3
    FFFFFC7FEF251000 72K rw--- /usr/local/news/lib/libinnstorage.so.3.1.3
    FFFFFC7FEF280000 4K rwx-- [ anon ]
    FFFFFC7FEF290000 4K rwx-- [ anon ]
    FFFFFC7FEF2B0000 4K rwx-- [ anon ]
    FFFFFC7FEF2C0000 160K r-x-- /usr/local/news/lib/libinn.so.9.0.2
    FFFFFC7FEF2F7000 20K rw--- /usr/local/news/lib/libinn.so.9.0.2
    FFFFFC7FEF310000 4K rwx-- [ anon ]
    FFFFFC7FEF320000 4K rwx-- [ anon ]
    FFFFFC7FEF337000 4K rwx-- [ anon ]
    FFFFFC7FEF340000 4K rwx-- [ anon ]
    FFFFFC7FEF350000 4K rwx-- [ anon ]
    FFFFFC7FEF360000 28K r-x-- /usr/local/news/lib/libinnhist.so.3.0.9
    FFFFFC7FEF376000 4K rw--- /usr/local/news/lib/libinnhist.so.3.0.9
    FFFFFC7FEF377000 4K rw--- /usr/local/news/lib/libinnhist.so.3.0.9
    FFFFFC7FEF37A000 80K r---- [ anon ]
    FFFFFC7FEF390000 4K r----* [ anon ]
    FFFFFC7FEF399000 324K r-x-- /lib/amd64/ld.so.1
    FFFFFC7FEF3FA000 12K rwx-- /lib/amd64/ld.so.1
    FFFFFC7FEF3FD000 8K rwx-- /lib/amd64/ld.so.1
    FFFFFC7FFFDF1000 60K rw--- [ stack ]
    total 41447036K
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Russ Allbery@eagle@eyrie.org to news.software.nntp on Mon Nov 17 11:24:12 2025
    From Newsgroup: news.software.nntp

    Jesse Rehmer <jesse.rehmer@blueworldhosting.com> writes:

    I have been working with Billy G., having him push articles from his
    pugleaf instance to an INN instance running on an OmniOS VM. Tonight
    innd crashed with the following:

    Nov 16 22:33:33 omnios-inn innd: [ID 608925 news.crit] SERVER cant malloc 18446744073699012160 bytes at interface.c line 622: Not enough space

    What condition could have caused innd to attempt to allocate that much memory?

    There is some sort of memory or data structure corruption going on inside SMgetsub. article.groupslen is nonsense (it's a large negative number). I
    don't immediately see how that can happen, though. It could be through
    some sort of stack overwrite from a different part of the program.

    I don't know what mdb is (I've never used it). The output that it
    generated for you is pretty useless. I would use gdb to inspect the
    article data structure up the call chain and see if that provides any
    clues.
    --
    Russ Allbery (eagle@eyrie.org) <https://www.eyrie.org/~eagle/>

    Please post questions rather than mailing me directly.
    <https://www.eyrie.org/~eagle/faqs/questions.html> explains why.
  • From Jesse Rehmer@jesse.rehmer@blueworldhosting.com to news.software.nntp on Tue Nov 18 01:25:30 2025
    From Newsgroup: news.software.nntp

    On Nov 17, 2025 at 1:24:12 PM CST, "Russ Allbery" <eagle@eyrie.org> wrote:

    Jesse Rehmer <jesse.rehmer@blueworldhosting.com> writes:

    I have been working with Billy G., having him push articles from his
    pugleaf instance to an INN instance running on an OmniOS VM. Tonight
    innd crashed with the following:

    Nov 16 22:33:33 omnios-inn innd: [ID 608925 news.crit] SERVER cant malloc
    18446744073699012160 bytes at interface.c line 622: Not enough space

    What condition could have caused innd to attempt to allocate that much
    memory?

    There is some sort of memory or data structure corruption going on inside SMgetsub. article.groupslen is nonsense (it's a large negative number). I don't immediately see how that can happen, though. It could be through
    some sort of stack overwrite from a different part of the program.

    I don't know what mdb is (I've never used it). The output that it
    generated for you is pretty useless. I would use gdb to inspect the
    article data structure up the call chain and see if that provides any
    clues.

    mdb is the default debugger on Solaris-related systems. What you see are just
    a few common commands I found; I have no real idea how to use mdb or what to look for. I can install gdb, but I don't know what to do with it to be useful either.
  • From Julien ÉLIE@iulius@nom-de-mon-site.com.invalid to news.software.nntp on Tue Nov 18 15:44:13 2025
    From Newsgroup: news.software.nntp

    Hi Jesse,

    I have been working with Billy G., having him push articles from his pugleaf instance to an INN instance running on an OmniOS VM. Tonight innd crashed with
    the following:

    Nov 16 22:33:33 omnios-inn innd: [ID 608925 news.crit] SERVER cant malloc 18446744073699012160 bytes at interface.c line 622: Not enough space

    Is the INN instance running in slave mode?
    Do you happen to have the culprit article which made INN crash?

    As Russ noted, article.groupslen is corrupted. Digging a bit at where
    it can be set, I see in innd/art.c:

    if (innconf->storeonxref) {
        arth.groups = data->Replic;
        arth.groupslen = data->ReplicLength;
    }

    I am wondering whether ARTxrefslave does the right thing if the Xref
    header field is badly formatted.
    For instance "Xref: group:10\r\n" without a server name.

    if (!HDR_FOUND(HDR__XREF))
        return false;
    /* skip server name */
    if ((p = strpbrk(HDR(HDR__XREF), " \t\r\n")) == NULL)
        return false;
    /* in case Xref is folded */
    while (*++p == ' ' || *p == '\t' || *p == '\r' || *p == '\n')
        ;
    if (*p == '\0')
        return false;
    data->Replic = p;
    data->ReplicLength = HDR_LEN(HDR__XREF) - (p - HDR(HDR__XREF));

    I think p would point at the beginning of the following header field, or
    the beginning of the body, and then:
    data->ReplicLength = strlen("group:10") - strlen("group:10\r\n") = -2

    How p is set should be more robust, or we could test whether data->ReplicLength is negative, and return false in that case.
    Any thoughts or confirmation the current parsing looks wrong?


    If the INN instance is not running in slave mode, well, there's another
    issue to find...
    --
    Julien ÉLIE

    « Whenever you set out to do something, something else must be done
    first. » (Murphy's Fourth Corollary)

  • From Russ Allbery@eagle@eyrie.org to news.software.nntp on Tue Nov 18 11:08:39 2025
    From Newsgroup: news.software.nntp

    Julien ÉLIE <iulius@nom-de-mon-site.com.invalid> writes:

    I am wondering whether ARTxrefslave does the right thing if the Xref
    header field is badly formatted.
    For instance "Xref: group:10\r\n" without a server name.

    if (!HDR_FOUND(HDR__XREF))
        return false;
    /* skip server name */
    if ((p = strpbrk(HDR(HDR__XREF), " \t\r\n")) == NULL)
        return false;
    /* in case Xref is folded */
    while (*++p == ' ' || *p == '\t' || *p == '\r' || *p == '\n')
        ;
    if (*p == '\0')
        return false;
    data->Replic = p;
    data->ReplicLength = HDR_LEN(HDR__XREF) - (p - HDR(HDR__XREF));

    I think p would point at the beginning of the following header field, or
    the beginning of the body, and then:
    data->ReplicLength = strlen("group:10") - strlen("group:10\r\n") = -2

    Oh, good catch! I was staring at exactly that piece of code because it
    looked rather suspicious to me on first glance with all the pointer math,
    but then I couldn't see how it failed.

    I think the problem you identify could be fixed with a check like:

    if (p > HDR(HDR__XREF) + HDR_LEN(HDR__XREF))
        return false;

    after the strpbrk line. This still assumes that the full article being processed is nul-terminated, but I think that is indeed the case.

    I am still a bit confused, though, because the value of groupslen doesn't
    seem to be something sensible like -2. If I'm doing the math correctly,
    it's something more like -10539456, which is a hard value to justify if
    the problem is the above bug. strpbrk would terminate at the first
    newline, and that's a lot of characters to go without encountering a
    newline. Such an article is very syntactically invalid and I would have expected it to fail other checks.
    --
    Russ Allbery (eagle@eyrie.org) <https://www.eyrie.org/~eagle/>

  • From Jesse Rehmer@jesse.rehmer@blueworldhosting.com to news.software.nntp on Tue Nov 18 19:41:39 2025
    From Newsgroup: news.software.nntp

    On Nov 18, 2025 at 8:44:13 AM CST, "Julien ÉLIE" <iulius@nom-de-mon-site.com.invalid> wrote:

    Hi Jesse,

    I have been working with Billy G., having him push articles from his pugleaf instance to an INN instance running on an OmniOS VM. Tonight innd crashed with
    the following:

    Nov 16 22:33:33 omnios-inn innd: [ID 608925 news.crit] SERVER cant malloc
    18446744073699012160 bytes at interface.c line 622: Not enough space

    Is the INN instance running in slave mode?
    Do you happen to have the culprit article which made INN crash?

    If the INN instance is not running in slave mode, well, there's another
    issue to find...

    It is not running in slave mode. Not sure if this would point to anything meaningful, but these are two parameters that I've modified in inn.conf:

    datamovethreshold: 1048576
    icdsynccount: 300000

    For reasons I haven't dug deeper on, INN is *painfully* slow to write to the history file on ZFS filesystems (tested on FreeBSD and illumos distros; a brief test on Linux with XFS was much better).

    Leaving icdsynccount at the default of 10 results in horrid throughput. No matter the number of connections, throughput doesn't go much above 1-3 Mbps. I played around *a lot* with various values and found the best throughput above 100000. At the current value, we get well over 100 Mbps throughput. Obviously this is 'risky', but in this use case we're just playing around.

    We've run through a few cycles of Billy feeding over 1 billion articles to
    this instance, and this was the first time it crashed. That said, this round I had him add some additional newsgroups, so it's possible there could be a very strangely formatted article.
  • From richard@richard@cogsci.ed.ac.uk (Richard Tobin) to news.software.nntp on Tue Nov 18 20:04:27 2025
    From Newsgroup: news.software.nntp

    In article <87tsyrf9ag.fsf@hope.eyrie.org>,
    Russ Allbery <eagle@eyrie.org> wrote:

    I am still a bit confused, though, because the value of groupslen doesn't seem to be something sensible like -2. If I'm doing the math correctly,
    it's something more like -10539456, which is a hard value to justify if
    the problem is the above bug.

    The number in hex is 0xffffffffff5f2e40, and the non-f part
    corresponds to the printable characters _.@ which is hardly conclusive
    but might be due to being overwritten by a malformed mail address.

    Are there any other examples?

    -- Richard
  • From Jesse Rehmer@jesse.rehmer@blueworldhosting.com to news.software.nntp on Tue Nov 18 20:12:07 2025
    From Newsgroup: news.software.nntp

    On Nov 18, 2025 at 2:04:27 PM CST, "Richard Tobin" <Richard Tobin> wrote:

    In article <87tsyrf9ag.fsf@hope.eyrie.org>,
    Russ Allbery <eagle@eyrie.org> wrote:

    I am still a bit confused, though, because the value of groupslen doesn't
    seem to be something sensible like -2. If I'm doing the math correctly,
    it's something more like -10539456, which is a hard value to justify if
    the problem is the above bug.

    The number in hex is 0xffffffffff5f2e40, and the non-f part
    corresponds to the printable characters _.@ which is hardly conclusive
    but might be due to being overwritten by a malformed mail address.

    Are there any other examples?

    -- Richard

    If it could be helpful to anyone who wants to take a look, I've placed the
    core dump that was generated at https://usenet.blueworldhosting.com/core

    # du -A core
    119533 core

    # sha256sum core
    04fd0e53ca39252d456d81fd128609fbb8b667987e2c6b358c3fa0b17fb7ee31 core

    Unfortunately with the buffered 'news' logfile, I don't think I can determine exactly which article may have caused the crash.
  • From richard@richard@cogsci.ed.ac.uk (Richard Tobin) to news.software.nntp on Tue Nov 18 21:07:25 2025
    From Newsgroup: news.software.nntp

    In article <10fijun$ac8$1@nnrp.usenet.blueworldhosting.com>,
    Jesse Rehmer <jesse.rehmer@blueworldhosting.com> wrote:

    If it could be helpful to anyone who wants to take a look, I've placed the core dump that was generated at https://usenet.blueworldhosting.com/core

    I started to download it, but the connection is very slow and it would
    take over an hour! Can you gzip it? (Core files often compress very
    well.)

    -- Richard
  • From Jesse Rehmer@jesse.rehmer@blueworldhosting.com to news.software.nntp on Tue Nov 18 23:13:12 2025
    From Newsgroup: news.software.nntp

    On Nov 18, 2025 at 3:07:25 PM CST, "Richard Tobin" <Richard Tobin> wrote:

    In article <10fijun$ac8$1@nnrp.usenet.blueworldhosting.com>,
    Jesse Rehmer <jesse.rehmer@blueworldhosting.com> wrote:

    If it could be helpful to anyone who wants to take a look, I've placed the core dump that was generated at https://usenet.blueworldhosting.com/core

    I started to download it, but the connection is very slow and it would
    take over an hour! Can you gzip it? (Core files often compress very
    well.)

    -- Richard

    I've been having provider issues and awaiting an appointment to swap equipment so the connection has been unreliable lately.

    Try: https://usenet.blueworldhosting.com/core.gz
  • From Richard Kettlewell@invalid@invalid.invalid to news.software.nntp on Tue Nov 18 23:19:01 2025
    From Newsgroup: news.software.nntp

    richard@cogsci.ed.ac.uk (Richard Tobin) writes:
    Russ Allbery <eagle@eyrie.org> wrote:
    I am still a bit confused, though, because the value of groupslen doesn't seem to be something sensible like -2. If I'm doing the math correctly, it's something more like -10539456, which is a hard value to justify if
    the problem is the above bug.

    The number in hex is 0xffffffffff5f2e40, and the non-f part
    corresponds to the printable characters _.@ which is hardly conclusive
    but might be due to being overwritten by a malformed mail address.

    The proximate code is:

    static bool
    MatchGroups(const char *g, int len, const char *pattern, bool exactmatch)
    {
        char *group, *groups, *q;
        int i, lastwhite;
        enum uwildmat matched;
        bool wanted = false;

        q = groups = xmalloc(len + 1);

    This is consistent with len = -10539457.

    The length comes from ARTstore:

    if (innconf->storeonxref) {
        arth.groups = data->Replic;
        arth.groupslen = data->ReplicLength;
    } else {
        arth.groups = HDR(HDR__NEWSGROUPS);
        arth.groupslen = HDR_LEN(HDR__NEWSGROUPS);
    }

    I don't think Jesse's said what storeonxref is set to but the default is true, so I'm going to assume that's what it is until I hear otherwise.

    ReplicLength is set in ARTassignnumbers, which is the kind of ad-hoc
    string builder that tends to be full of bugs. It has a number of issues
    but the most relevant is:

    * If the collection of group names (and article numbers etc) adds up to
    more than 2GB then len will overflow to negative. This could explain
    the outcome seen, although it would need a pathologically large input
    article.

    Other observations about ARTassignnumbers, which I don't think can
    explain the behavior seen:

    * A pathhost bigger than the default xref buffer size (2049 bytes) would
    generate a buffer overflow.

    * An article number longer than ARTNUMPRINTSIZE could fail to expand the
    buffer when needed. Currently you get ten digits, and the starting
    buffer size is 2049 bytes, so this is probably not an issue in
    real life.

    * If there are no groups at all then the buffer expansion never happens
    at all, so nothing explicitly accounts for the CRLF at the end -
    i.e. the commented assumption rCychecked during the reallocation aboverCO
    is violated. But the default buffer size is more than enough in that
    case.

    * p and len are tracked separately, which is pointless and invites
    error, but I don't see any way to get them out of sync apart from the
    overflow issue described above.

    * ReplicLength is set in an unnecessarily complicated way, because
    len - (q + 1 - data->Xref) == Path.used - 1
    but I don't see any realistic way for that to end up negative or for
    q+1 to run past the end of the buffer.
    --
    https://www.greenend.org.uk/rjk/
  • From Jesse Rehmer@jesse.rehmer@blueworldhosting.com to news.software.nntp on Wed Nov 19 00:05:38 2025
    From Newsgroup: news.software.nntp

    On Nov 18, 2025 at 5:19:01 PM CST, "Richard Kettlewell" <invalid@invalid.invalid> wrote:

    richard@cogsci.ed.ac.uk (Richard Tobin) writes:
    Russ Allbery <eagle@eyrie.org> wrote:
    I am still a bit confused, though, because the value of groupslen doesn't seem to be something sensible like -2. If I'm doing the math correctly,
    it's something more like -10539456, which is a hard value to justify if
    the problem is the above bug.

    The number in hex is 0xffffffffff5f2e40, and the non-f part
    corresponds to the printable characters _.@ which is hardly conclusive
    but might be due to being overwritten by a malformed mail address.

    The proximate code is:

    static bool
    MatchGroups(const char *g, int len, const char *pattern, bool exactmatch)
    {
        char *group, *groups, *q;
        int i, lastwhite;
        enum uwildmat matched;
        bool wanted = false;

        q = groups = xmalloc(len + 1);

    This is consistent with len = -10539457.

    The length comes from ARTstore:

    if (innconf->storeonxref) {
        arth.groups = data->Replic;
        arth.groupslen = data->ReplicLength;
    } else {
        arth.groups = HDR(HDR__NEWSGROUPS);
        arth.groupslen = HDR_LEN(HDR__NEWSGROUPS);
    }

    I don't think Jesse's said what storeonxref is set to but the default is true, so I'm going to assume that's what it is until I hear otherwise.

    You are correct, storeonxref is true and xrefslave is false.

    ReplicLength is set in ARTassignnumbers, which is the kind of ad-hoc
    string builder that tends to be full of bugs. It has a number of issues
    but the most relevant is:

    * If the collection of group names (and article numbers etc) adds up to
    more than 2GB then len will overflow to negative. This could explain
    the outcome seen, although it would need a pathologically large input
    article.

    Well, I do have icdsynccount set to 300000, and we are feeding this machine over a billion articles from another machine, so the rate/throughput of incoming articles is very high.
  • From Richard Kettlewell@invalid@invalid.invalid to news.software.nntp on Wed Nov 19 08:22:11 2025
    From Newsgroup: news.software.nntp

    Jesse Rehmer <jesse.rehmer@blueworldhosting.com> writes:
    "Richard Kettlewell" <invalid@invalid.invalid> wrote:
    ReplicLength is set in ARTassignnumbers, which is the kind of ad-hoc
    string builder that tends to be full of bugs. It has a number of issues
    but the most relevant is:

    * If the collection of group names (and article numbers etc) adds up to
    more than 2GB then len will overflow to negative. This could explain
    the outcome seen, although it would need a pathologically large input
    article.

    Well, I do have icdsynccount set to 300000, and we are feeding this
    machine over a billion articles from another machine, so the
    rate/throughput of incoming articles is very high.

    To trigger the issue in ARTassignnumbers you'd need a single article
    where the collection of group names plus article numbers added up to
    more than 2GB (in fact nearly 4GB for the specific value seen here).
    It'd take a crosspost to millions of groups and the article header alone would be multiple gigabytes.

    If your peer does have such a large article then that's probably the
    cause. It doesn't seem likely but you never know l-)

    If not then further analysis is needed. The coredump posted might shed
    light on it but I don't have a debugger that understands Solaris
    coredumps (and if finding one is harder than apt-get install something,
    it's probably not happening).
    --
    https://www.greenend.org.uk/rjk/
  • From Jesse Rehmer@jesse.rehmer@blueworldhosting.com to news.software.nntp on Wed Nov 19 11:28:19 2025
    From Newsgroup: news.software.nntp

    On Nov 19, 2025 at 2:22:11 AM CST, "Richard Kettlewell" <invalid@invalid.invalid> wrote:

    To trigger the issue in ARTassignnumbers you'd need a single article
    where the collection of group names plus article numbers added up to
    more than 2GB (in fact nearly 4GB for the specific value seen here).
    It'd take a crosspost to millions of groups and the article header alone would be multiple gigabytes.

    If your peer does have such a large article then that's probably the
    cause. It doesn't seem likely but you never know l-)

    The articles are coming from a newly written NNTP stack and the transfer tool has had some bugs we've worked out together. It's possible we hit a bug on the sending side and received some garbage.

    > If not then further analysis is needed. The coredump posted might shed
    > light on it but I don't have a debugger that understands Solaris
    > coredumps (and if finding one is harder than apt-get install something, it's probably not happening).

    Argh, I didn't think about that. I doubt mdb or other compatible debuggers are available on Linux. If anyone can tell me what to do with the core file
    using mdb or gdb, I'm happy to do so.

    That said, OmniOS is easy to install and OpenIndiana has a Live-CD. (I don't expect anyone to go that far to help, but I like pushing people away from Linux.)
  • From Julien ÉLIE@iulius@nom-de-mon-site.com.invalid to news.software.nntp on Tue Nov 25 07:22:09 2025
    From Newsgroup: news.software.nntp

    Hi Jesse,

    > If it could be helpful to anyone who wants to take a look, I've placed the core dump that was generated at https://usenet.blueworldhosting.com/core

    Thanks for the core file. Unfortunately, I have not managed to get
    anything out of it. I have successfully opened it with mdb on Solaris
    11.4 but do not actually know how mdb works.
    I did not manage to find helpful examples in the online documentation either.

    $c
    libinn.so.9.0.2`x_malloc+0x4f()
    libinnstorage.so.3.1.3`SMgetsub+0x1b8()
    libinnstorage.so.3.1.3`SMstore+0x87()

    Does anyone know whether it is possible to get some insight into the variables and their contents in the SMgetsub and x_malloc calls?
    --
    Julien ÉLIE

    « Tuto, cito, iucunde. » (Esculape)

  • From Jesse Rehmer@jesse.rehmer@blueworldhosting.com to news.software.nntp on Tue Nov 25 12:55:13 2025
    From Newsgroup: news.software.nntp

    On Nov 25, 2025 at 12:22:09 AM CST, "Julien ÉLIE" <iulius@nom-de-mon-site.com.invalid> wrote:

    > Hi Jesse,

    >> If it could be helpful to anyone who wants to take a look, I've placed the core dump that was generated at https://usenet.blueworldhosting.com/core

    > Thanks for the core file. Unfortunately, I have not managed to get
    > anything out of it. I have successfully opened it with mdb on Solaris
    > 11.4 but do not actually know how mdb works.
    > I did not manage to find helpful examples in the online documentation either.

    > $c
    > libinn.so.9.0.2`x_malloc+0x4f()
    > libinnstorage.so.3.1.3`SMgetsub+0x1b8()
    > libinnstorage.so.3.1.3`SMstore+0x87()

    > Does anyone know whether it is possible to get some insight into the variables and their contents in the SMgetsub and x_malloc calls?

    gdb should be available on Solaris 11.4, if that's more comfortable. I don't currently have Solaris 11.4 installed, but I see gdb listed in Oracle's online man pages.

    I'm swimming in the middle of the ocean without a raft when it comes to this stuff. :-)
  • From Jesse Rehmer@jesse.rehmer@blueworldhosting.com to comp.unix.solaris,news.software.nntp on Tue Nov 25 14:51:44 2025
    From Newsgroup: news.software.nntp

    On Nov 25, 2025 at 6:55:13 AM CST, "Jesse Rehmer" <jesse.rehmer@blueworldhosting.com> wrote:

    > On Nov 25, 2025 at 12:22:09 AM CST, "Julien ÉLIE" <iulius@nom-de-mon-site.com.invalid> wrote:

    >> Hi Jesse,

    >>> If it could be helpful to anyone who wants to take a look, I've placed the core dump that was generated at https://usenet.blueworldhosting.com/core

    >> Thanks for the core file. Unfortunately, I have not managed to get
    >> anything out of it. I have successfully opened it with mdb on Solaris
    >> 11.4 but do not actually know how mdb works.
    >> I did not manage to find helpful examples in the online documentation either.

    >> $c
    >> libinn.so.9.0.2`x_malloc+0x4f()
    >> libinnstorage.so.3.1.3`SMgetsub+0x1b8()
    >> libinnstorage.so.3.1.3`SMstore+0x87()

    >> Does anyone know whether it is possible to get some insight into the
    >> variables and their contents in the SMgetsub and x_malloc calls?

    > gdb should be available on Solaris 11.4, if that's more comfortable. I don't currently have Solaris 11.4 installed, but I see gdb listed in Oracle's online man pages.

    > I'm swimming in the middle of the ocean without a raft when it comes to this stuff. :-)

    Cross-posting to comp.unix.solaris (probably should have done that from the start).

    If anyone checking comp.unix.solaris is interested in helping us debug an
    issue with INN on OmniOS, we could use some pointers.

    I have a core file from a crash that we'd like to get more details on, but
    none of us are very familiar with debugging on illumos or Solaris.