$G
C++ symbol demangling enabled
$r
%rax = 0x0000000000000000 %r8 = 0x0000000000000000
$C
fffffc7fffdf4950 libc.so.1`_lwp_kill+0xa()
::stacks
THREAD STATE SOBJ COUNT
I have been working with Billy G., having him push articles from his
pugleaf instance to an INN instance running on an OmniOS VM. Tonight
innd crashed with the following:
Nov 16 22:33:33 omnios-inn innd: [ID 608925 news.crit] SERVER cant malloc 18446744073699012160 bytes at interface.c line 622: Not enough space
What condition could have caused innd to attempt to allocate that much memory?
Jesse Rehmer <jesse.rehmer@blueworldhosting.com> writes:
> I have been working with Billy G., having him push articles from his
> pugleaf instance to an INN instance running on an OmniOS VM. Tonight
> innd crashed with the following:
> Nov 16 22:33:33 omnios-inn innd: [ID 608925 news.crit] SERVER cant malloc
> 18446744073699012160 bytes at interface.c line 622: Not enough space
> What condition could have caused innd to attempt to allocate that much
> memory?
There is some sort of memory or data structure corruption going on inside SMgetsub. article.groupslen is nonsense (it's a large negative number). I don't immediately see how that can happen, though. It could be through
some sort of stack overwrite from a different part of the program.
I don't know what mdb is (I've never used it). The output that it
generated for you is pretty useless. I would use gdb to inspect the
article data structure up the call chain and see if that provides any
clues.
> I have been working with Billy G., having him push articles from his pugleaf
> instance to an INN instance running on an OmniOS VM. Tonight innd crashed with
> the following:
> Nov 16 22:33:33 omnios-inn innd: [ID 608925 news.crit] SERVER cant malloc
> 18446744073699012160 bytes at interface.c line 622: Not enough space
I am wondering whether ARTxrefslave does the right thing if the Xref
header field is badly formatted.
For instance "Xref: group:10\r\n" without a server name.
    if (!HDR_FOUND(HDR__XREF))
        return false;
    /* skip server name */
    if ((p = strpbrk(HDR(HDR__XREF), " \t\r\n")) == NULL)
        return false;
    /* in case Xref is folded */
    while (*++p == ' ' || *p == '\t' || *p == '\r' || *p == '\n')
        ;
    if (*p == '\0')
        return false;
    data->Replic = p;
    data->ReplicLength = HDR_LEN(HDR__XREF) - (p - HDR(HDR__XREF));
I think p would point at the beginning of the following header field, or
the beginning of the body, and then:
data->ReplicLength = strlen("group:10") - strlen("group:10\r\n") = -2
Hi Jesse,
> I have been working with Billy G., having him push articles from his pugleaf
> instance to an INN instance running on an OmniOS VM. Tonight innd crashed with
> the following:
> Nov 16 22:33:33 omnios-inn innd: [ID 608925 news.crit] SERVER cant malloc
> 18446744073699012160 bytes at interface.c line 622: Not enough space
Is the INN instance running in slave mode?
Do you happen to have the culprit article which made INN crash?
If the INN instance is not running in slave mode, well, there's another
issue to find...
In article <87tsyrf9ag.fsf@hope.eyrie.org>,
Russ Allbery <eagle@eyrie.org> wrote:
> I am still a bit confused, though, because the value of groupslen doesn't
> seem to be something sensible like -2. If I'm doing the math correctly,
> it's something more like -10539456, which is a hard value to justify if
> the problem is the above bug.
The number in hex is 0xffffffffff5f2e40, and the non-f part
corresponds to the printable characters _.@ which is hardly conclusive
but might be due to being overwritten by a malformed mail address.
Are there any other examples?
-- Richard
In article <10fijun$ac8$1@nnrp.usenet.blueworldhosting.com>,
Jesse Rehmer <jesse.rehmer@blueworldhosting.com> wrote:
> If it could be helpful to anyone who wants to take a look, I've placed the
> core dump that was generated at https://usenet.blueworldhosting.com/core
I started to download it, but the connection is very slow and it would
take over an hour! Can you gzip it? (Core files often compress very
well.)
-- Richard
richard@cogsci.ed.ac.uk (Richard Tobin) writes:
> Russ Allbery <eagle@eyrie.org> wrote:
>> I am still a bit confused, though, because the value of groupslen doesn't
>> seem to be something sensible like -2. If I'm doing the math correctly,
>> it's something more like -10539456, which is a hard value to justify if
>> the problem is the above bug.
> The number in hex is 0xffffffffff5f2e40, and the non-f part
> corresponds to the printable characters _.@ which is hardly conclusive
> but might be due to being overwritten by a malformed mail address.
The proximate code is:
    static bool
    MatchGroups(const char *g, int len, const char *pattern, bool exactmatch)
    {
        char *group, *groups, *q;
        int i, lastwhite;
        enum uwildmat matched;
        bool wanted = false;
        q = groups = xmalloc(len + 1);
This is consistent with len = -10539457.
The length comes from ARTstore:
    if (innconf->storeonxref) {
        arth.groups = data->Replic;
        arth.groupslen = data->ReplicLength;
    } else {
        arth.groups = HDR(HDR__NEWSGROUPS);
        arth.groupslen = HDR_LEN(HDR__NEWSGROUPS);
    }
I don't think Jesse's said what storeonxref is set to but the default is
true, so I'm going to assume that's what it is until I hear otherwise.
ReplicLength is set in ARTassignnumbers, which is the kind of ad-hoc
string builder that tends to be full of bugs. It has a number of issues
but the most relevant is:
* If the collection of group names (and article numbers etc) adds up to
more than 2GB then len will overflow to negative. This could explain
the outcome seen, although it would need a pathologically large input
article.
"Richard Kettlewell" <invalid@invalid.invalid> wrote:
> ReplicLength is set in ARTassignnumbers, which is the kind of ad-hoc
> string builder that tends to be full of bugs. It has a number of issues
> but the most relevant is:
> * If the collection of group names (and article numbers etc) adds up to
>   more than 2GB then len will overflow to negative. This could explain
>   the outcome seen, although it would need a pathologically large input
>   article.
Well, I do have icdsynccount set to 300000, and we are feeding this
machine over a billion articles from another machine, so the
rate/throughput of incoming articles is very high.
To trigger the issue in ARTassignnumbers you'd need a single article
where the collection of group names plus article numbers added up to
more than 2GB (in fact nearly 4GB for the specific value seen here).
It'd take a crosspost to millions of groups and the article header alone
would be multiple gigabytes.
If your peer does have such a large article then that's probably the
cause. It doesn't seem likely but you never know l-)
If not then further analysis is needed. The coredump posted might shed
light on it but I don't have a debugger that understands Solaris
coredumps (and if finding one is harder than apt-get install something,
it's probably not happening).
Hi Jesse,
> If it could be helpful to anyone who wants to take a look, I've placed the
> core dump that was generated at https://usenet.blueworldhosting.com/core
Thanks for the core file. Unfortunately, I have not managed to get
anything useful out of it. I have successfully opened it with mdb on
Solaris 11.4 but do not actually know how mdb works.
I did not manage to find helpful examples in online documentation either.
$c
libinn.so.9.0.2`x_malloc+0x4f()
libinnstorage.so.3.1.3`SMgetsub+0x1b8()
libinnstorage.so.3.1.3`SMstore+0x87()
Does anyone know whether it is possible to inspect the variables and their
contents in the SMgetsub and x_malloc frames?
On Nov 25, 2025 at 12:22:09 AM CST, "Julien ÉLIE" <iulius@nom-de-mon-site.com.invalid> wrote:
> Hi Jesse,
>> If it could be helpful to anyone who wants to take a look, I've placed the
>> core dump that was generated at https://usenet.blueworldhosting.com/core
> Thanks for the core file. Unfortunately, I do not manage to get
> anything from it. I have successfully opened it with mdb on Solaris
> 11.4 but do not actually know how mdb works.
> I did not manage to find helpful examples in online documentation too.
> $c
> libinn.so.9.0.2`x_malloc+0x4f()
> libinnstorage.so.3.1.3`SMgetsub+0x1b8()
> libinnstorage.so.3.1.3`SMstore+0x87()
> Does someone know whether it is possible to have some insights about the
> variables and their contents in the SMgetsub and x_malloc calls?
gdb should be available on Solaris 11.4, if that's more comfortable. I don't currently have Solaris 11.4 installed, but see gdb in Oracle's online man pages.
I'm swimming in the middle of the ocean without a raft when it comes to this stuff. :-)