• webv4 portal login/logout stalls 90 seconds at "Initializing User Obje

    From xbit ops@1:103/705 to GitLab issue in main/sbbs on Tue Jun 23 09:51:52 2026
    open https://gitlab.synchro.net/main/sbbs/-/issues/1169

    ## Summary

    Logging in or out of the webv4 portal takes exactly 90 seconds before completing. The stall is consistent and reproducible on every login and logout attempt, on both local and remote connections.

    Confirmed by multiple sysops on IRC (2026-06-23).

    ## Environment

    - Windows 10 32-bit
    - Synchronet 3.21e-Win32
    - webv4 (current master)
    - Local disk data directory (not SMB-mounted)

    ## Symptom

    The web server log shows the stall always occurs at the same point:

    HTTPS [x.x.x.x] <<unknown user>> logged-in
    HTTPS [x.x.x.x] JavaScript: Initializing User Objects
    -- 90 second gap --
    HTTPS [x.x.x.x] Adding query value username=...

    The JS script itself executes in 0.02 seconds once it starts -- the
    entire delay is in the native "Initializing User Objects" step.

    ## Investigation

    - Hostname Lookup disabled in web server config -- no change
    - Wireshark capture during stall shows zero outbound TCP connections
    timing out from the BBS server -- no network activity during the stall
    - Stall is exactly 90 seconds every time, consistent with a lock timeout
    rather than a DNS or TCP timeout
    - System is idle during stall (low CPU/memory)

    ## Possible relation

    This may be related to #1153 (Windows xp_lockfile() exclusive locking serializing user.tab reads). The "Initializing User Objects" step reads
    the user database and would be affected by exclusive lock contention on Windows even under light load.

    ## Impact

    Every webv4 login and logout takes 90 seconds. Affects all users on
    Windows installs regardless of load.
    --- SBBSecho 3.37-Linux
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
  • From Rob Swindell@1:103/705 to GitLab note in main/sbbs on Tue Jun 23 11:05:02 2026
    https://gitlab.synchro.net/main/sbbs/-/issues/1169#note_9424

    Issue #1153 has been fixed in the current development code, so if it is in fact related to #1153, then upgrading to v3.22a would make the problem go away. That would be a good data point.
    --- SBBSecho 3.37-Linux
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
  • From xbit ops@1:103/705 to GitLab note in main/sbbs on Tue Jun 23 12:53:22 2026
    https://gitlab.synchro.net/main/sbbs/-/issues/1169#note_9425

    i was on an older 3.22a but just updated to the nightly. 90-second stall at Initializing User Objects remains. i can turn on debug logging and/or Wireshark logs if needed.
    --- SBBSecho 3.37-Linux
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
  • From Rob Swindell@1:103/705 to GitLab note in main/sbbs on Tue Jun 23 13:31:08 2026
    https://gitlab.synchro.net/main/sbbs/-/issues/1169#note_9426

    Good to know. So it's unrelated to issue #1153. Does the problem happen on Windows and Linux or just Windows?

    If it happens on Linux, it should be easy to capture a backtrace at the time of the delay and root cause it:
    `gdb -p <pid> -batch -ex 'thread apply all bt'`
    --- SBBSecho 3.37-Linux
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
  • From Rob Swindell@1:103/705 to GitLab note in main/sbbs on Tue Jun 23 13:40:31 2026
    https://gitlab.synchro.net/main/sbbs/-/issues/1169#note_9429

    Confirmed unrelated to #1153 — good. The stall surviving a nightly that already has the #1153 fix rules out the Windows exclusive-`user.tab`-lock theory.

    Tracing the webv4 login path, the "Initializing User Objects" native step (`js_CreateUserObjects`) and its msg/file/xtrn area-object creators only build lazy JS object skeletons and take no `user.tab` lock — and the stalling request is the *anonymous* one (`<<unknown user>>`, credentials still in the POST body), which writes no user record at all. So the delay is probably **not** in that native step despite where the log line lands; the next log line just happens to be the next thing that logs.

    To pin it down empirically, I've added six `LOG_DEBUG` timing probes that bisect the gap between `Initializing User Objects` and `Adding query value` (commit `f6d382c13`, on master — they'll be in the next nightly). With debug logging on, a login should now emit:

    ```
    Initializing User Objects
    User Objects initialized ← if the 90s is BEFORE this, the native user-objects step is the culprit
    Authorization check complete ← else if here: rest of check_ars (access/ARS checks)
    Responding to request (dynamic=2) ← else if here: check_request tail / thread dispatch
    exec_ssjs: beginning JS request: … ← else if here: respond()->exec_ssjs setup
    exec_ssjs: initializing request properties ← if the 90s is between these two: JS_BEGINREQUEST is blocking (GC/runtime)
    Adding query value username=… ← else if here: request-property setup / query parse
    ```

    @xbit — once you're on a nightly with this commit, please enable web-server debug logging, reproduce one login, and paste the run of lines from `Initializing User Objects` through `Adding query value` (with timestamps). Whichever adjacent pair straddles the 90s gap localizes it.

    Two more data points that would help:
    - **Does it reproduce on Linux, or only Windows?** If Linux, a `gdb -p <pid> -batch -ex 'thread apply all bt'` during the stall would name the blocking call outright.
    - The Windows equivalent if it's Windows-only: a `cdb` attach (or WER minidump) of the stalled web thread.

    — *Authored by Claude (Claude Code), on behalf of @rswindell*
    --- SBBSecho 3.37-Linux
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
  • From xbit ops@1:103/705 to GitLab note in main/sbbs on Tue Jun 23 14:40:16 2026
    https://gitlab.synchro.net/main/sbbs/-/issues/1169#note_9430

    no issues on the linux side - logs in is **VERY** fast. will update to the nightly once its out.
    --- SBBSecho 3.37-Linux
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
  • From xbit ops@1:103/705 to GitLab note in main/sbbs on Tue Jun 23 15:32:43 2026
    https://gitlab.synchro.net/main/sbbs/-/issues/1169#note_9433

    i should add that i'm running Sync v3.21f on the linux side
    --- SBBSecho 3.37-Linux
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
  • From Rob Swindell@1:103/705 to GitLab issue in main/sbbs on Tue Jun 23 22:21:41 2026
    close https://gitlab.synchro.net/main/sbbs/-/issues/1169
    --- SBBSecho 3.37-Linux
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
  • From Rob Swindell@1:103/705 to GitLab note in main/sbbs on Tue Jun 23 22:31:29 2026
    https://gitlab.synchro.net/main/sbbs/-/issues/1169#note_9440

    ## Root-caused and fixed

    This is **not** the native "Initializing User Objects" step, and it's **not** related to #1153. It's a regression in the TLS body-read path, introduced by `50258e70b` (*notion-31-talks*, "detect TLS client disconnect", #1155).

    ### Mechanism

    A webv4 login/logout is an HTTPS `POST /api/auth.ssjs` carrying a body (the credentials). That body frequently arrives **in the same TLS record as the headers**, so once the headers are parsed it sits decrypted-but-unread inside the TLS layer (`tls_pending`), with **nothing left on the raw socket**.

    `read_post_data()` → `recvbufsocket()` gates each read on `session_check()`. Before `50258e70b`, `session_check()` short-circuited to "readable" on `tls_pending`. That commit changed the short-circuit to `peeked_valid` (a single peeked byte) to detect a bare-FIN client disconnect. The body-read path has no peeked byte, so `session_check()` now falls through to `socket_check()` on the **raw socket** — which never becomes readable, because the data is already in the TLS buffer — and blocks for the full `MaxInactivity` before the buffered body is finally read.

    Header reads were unaffected: `sockreadline()` kept its own `tls_pending` guard, so only the **body** read (i.e. only POST = login/logout) regressed.

    This matches every reported symptom:
    - **Login/logout only** — those are the POST-with-body requests; GETs never call `read_post_data()`.
    - **Stall == `MaxInactivity`** — 60s where it's `1m`, 90s where it's `1m30s`. (The "exactly 90s every time" is just the configured timeout.)
    - **Idle system, zero wire traffic in Wireshark** — it's a `select()` on the raw socket waiting on data that already arrived and is buffered in TLS.
    - **"JS runs in 0.02s"** — correct; the block is in the native body read, before the script.
    - The pre-probe builds logged the gap right after `Initializing User Objects` simply because that was the nearest preceding log line; the probes (`f6d382c13`) place it precisely between `Authorization check complete` and `Responding to request`, i.e. inside `read_post_data()`.

    ### It's the version, not the platform

    The only reason this looked OS-related is that @xbit compared **v3.21f on Linux** (no stall) against **v3.22a on Windows** (stall). The determinant is the version:

    - **v3.21f and earlier** — no regression (`session_check()` still short-circuits on `tls_pending`); unaffected. That's why @xbit's Linux box was fast.
    - **v3.22a** (post-`50258e70b`) — affected on **all platforms**. Reproduced on **Linux v3.22a** by @deathr0w_ and @nelgin, and on **Windows v3.22a** (vert and @xbit's Windows box).

    (A single hot-git web.synchro.net login didn't surface it in one test — most likely TLS record/segment timing or a low `MaxInactivity` — but the defect is present there too.)

    ### Fix

    `b0f02c4e6` (*pink-27-boss*) on master: guard the `recvbufsocket()` wait with `tls_pending` exactly the way `sockreadline()` already does — when TLS data is already buffered, read it directly instead of waiting on the raw socket. `session_check()` / the #1155 disconnect detection is untouched.

    **Verified on vert (v3.22a):** the auth-POST `Authorization check complete` → `Responding to request` gap went from **60s → 0s**; `auth.ssjs` then runs in ~0.5s and login is instant.

    @deathr0w_ / @nelgin — a rebuild of `libwebsrvr.so` at `b0f02c4e6` should confirm the fix on Linux v3.22a.

    *(Separately, the post-login portal page render is slow on heavily-loaded installs with a network-mounted data dir — that's a different I/O issue, not this stall.)*

    — *Authored by Claude (Claude Code), on behalf of @rswindell*
    --- SBBSecho 3.37-Linux
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
  • From Nigel Reed@1:103/705 to GitLab note in main/sbbs on Tue Jun 23 22:35:57 2026
    https://gitlab.synchro.net/main/sbbs/-/issues/1169#note_9442

    Confirmed that the issue appears fixed on Linux. Thank you.
    --- SBBSecho 3.37-Linux
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
  • From xbit ops@1:103/705 to GitLab note in main/sbbs on Thu Jun 25 01:56:11 2026
    https://gitlab.synchro.net/main/sbbs/-/issues/1169#note_9461

    Confirmed issue fixed on Windows.
    --- SBBSecho 3.37-Linux
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)