From b71705bdcd480adacd7ef4aa9528a613fd16e2b3 Mon Sep 17 00:00:00 2001
From: goodboy
Date: Thu, 23 Apr 2026 15:21:41 -0400
Subject: [PATCH] Refine `subint_forkserver` nested-cancel hang diagnosis
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Major rewrite of
`subint_forkserver_test_cancellation_leak_issue.md`
after empirical investigation revealed the earlier
"descendant-leak + missing tree-kill" diagnosis
conflated two unrelated symptoms:

1. **5-zombie leak holding `:1616`** — turned out to be
   a self-inflicted cleanup bug: `pkill`-ing a bg pytest
   task (SIGTERM/SIGKILL, no SIGINT) skipped the SC
   graceful cancel cascade entirely. Codified the real
   fix — SIGINT-first ladder w/ bounded wait before
   SIGKILL — in e5e2afb5 (`run-tests` SKILL) and
   `feedback_sc_graceful_cancel_first.md`.
2. **`test_nested_multierrors[subint_forkserver]` hangs
   indefinitely** — the actual backend bug, and it's a
   deadlock not a leak.

Deats,
- new diagnosis: all 5 procs are kernel-`S` in
  `do_epoll_wait`; pytest-main's trio-cache workers are
  in `os.waitpid` waiting for children that are
  themselves waiting on IPC that never arrives —
  graceful `Portal.cancel_actor` cascade never reaches
  its targets
- tree-structure evidence: asymmetric depth across two
  identical `run_in_actor` calls — child 1 (3 threads)
  spawns both its grandchildren; child 2 (1 thread)
  never completes its first nursery `run_in_actor`.
  Smells like a race on fork-inherited state landing
  differently per spawn ordering
- new hypothesis: `os.fork()` from a subactor inherits
  the ROOT parent's IPC listener FDs transitively.
  Grandchildren end up with three overlapping FD sets
  (own + direct-parent + root), so IPC routing becomes
  ambiguous.
  Predicts bug scales with fork depth — matches
  reality: single-level spawn works, multi-level hangs
- ruled out: `_ForkedProc.kill()` tree-kill (never
  reaches hard-kill path), `:1616` contention (fixed
  by `reg_addr` fixture wiring), GIL starvation (each
  subactor has its own OS process+GIL), child-side KBI
  absorption (`_trio_main` only catches KBI at
  `trio.run()` callsite, reached only on trio-loop
  exit)
- four fix directions ranked: (1) blanket post-fork
  `closerange()`, (2) `FD_CLOEXEC` + audit, (3)
  targeted FD cleanup via `actor.ipc_server` handle,
  (4) `os.posix_spawn` w/ `file_actions`. Vote: (3) —
  surgical, doesn't break the "no exec" design of
  `subint_forkserver`
- standalone repro added (`spawn_and_error(breadth=2,
  depth=1)` under `trio.fail_after(20)`)
- stopgap: skip `test_nested_multierrors` +
  multi-level-spawn tests under the backend via
  `@pytest.mark.skipon_spawn_backend(...)` until fix
  lands

Killing the "tree-kill descendants" fix-direction
section: it addressed a bug that didn't exist.
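For reference, the SIGINT-first ladder reads roughly like the
following POSIX-sh helper (illustrative sketch only; the
canonical wording lives in the `run-tests` SKILL and the
feedback doc, and `graceful_kill` is a made-up name, not an
existing script):

```shell
# Sketch of the SIGINT-first ladder: graceful cancel, bounded
# wait, SIGKILL only as a last resort.
graceful_kill() {
    pid=$1
    timeout=${2:-10}
    # 1) graceful: SIGINT so the SC cancel cascade can run
    kill -INT "$pid" 2>/dev/null
    # 2) bounded wait (NB: `kill -0` also sees zombies, so this
    #    loop can spin to the deadline even after a clean exit)
    i=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$i" -ge "$timeout" ]; then
            # 3) deadline hit: hard-kill fallback
            kill -KILL "$pid" 2>/dev/null
            break
        fi
        sleep 1
        i=$((i + 1))
    done
    # reap so the pid can't linger as a zombie (when it's ours)
    wait "$pid" 2>/dev/null
}
```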
(this patch was generated in some part by
[`claude-code`][claude-code-gh])

[claude-code-gh]: https://github.com/anthropics/claude-code
---
 ...forkserver_test_cancellation_leak_issue.md | 436 ++++++++++++------
 1 file changed, 302 insertions(+), 134 deletions(-)

diff --git a/ai/conc-anal/subint_forkserver_test_cancellation_leak_issue.md b/ai/conc-anal/subint_forkserver_test_cancellation_leak_issue.md
index 987b2b52..4ca722cc 100644
--- a/ai/conc-anal/subint_forkserver_test_cancellation_leak_issue.md
+++ b/ai/conc-anal/subint_forkserver_test_cancellation_leak_issue.md
@@ -1,165 +1,333 @@
-# `subint_forkserver` backend leaks subactor descendants in `test_cancellation.py`
+# `subint_forkserver` backend: `test_cancellation.py` multi-level cancel cascade hang
 
 Follow-up tracker: surfaced while wiring the new
 `subint_forkserver` spawn backend into the full tractor
-test matrix (step 2 of the post-backend-lands plan;
-see also
-`ai/conc-anal/subint_forkserver_orphan_sigint_hang_issue.md`).
+test matrix (step 2 of the post-backend-lands plan).
+See also
+`ai/conc-anal/subint_forkserver_orphan_sigint_hang_issue.md`
+— sibling tracker for a different forkserver-teardown
+class which probably shares the same fundamental root
+cause (fork-FD-inheritance across nested spawns).
 
 ## TL;DR
 
-Running `tests/test_cancellation.py` under
-`--spawn-backend=subint_forkserver` reproducibly leaks
-**exactly 5 `subint-forkserv` comm-named child processes**
-after the pytest session exits. Both previously-run
-sessions produced the same 5-process signature — not a
-flake. Each leaked process holds a `LISTEN` on the
-default registry TCP addr (`127.0.0.1:1616`), which
-poisons any subsequent tractor test session that
-defaults to that addr.
+`tests/test_cancellation.py::test_nested_multierrors[subint_forkserver]`
+hangs indefinitely under our new backend.
+The hang is
+**inside the graceful IPC cancel cascade** — every actor
+in the multi-level tree parks in `epoll_wait` waiting
+for IPC messages that never arrive. Not a hard-kill /
+tree-reap issue (we don't reach the hard-kill fallback
+path at all).
 
-## Stopgap (not the real fix)
+Working hypothesis (unverified): **`os.fork()` from a
+subactor inherits the root parent's IPC listener socket
+FDs**. When a first-level subactor forkserver-spawns a
+grandchild, that grandchild inherits both its direct
+spawner's FDs AND the root's FDs — IPC message routing
+becomes ambiguous (or silently sends to the wrong
+channel), so the cancel cascade can't reach its target.
 
-Multiple tests in `test_cancellation.py` were calling
-`tractor.open_nursery()` **without** passing
-`registry_addrs=[reg_addr]`, i.e. falling back on the
-default `:1616`. The commit accompanying this doc wires
-the `reg_addr` fixture through those tests so each run
-gets a session-unique port — leaked zombies can no
-longer poison **other** tests (they hold their own
-unique port instead).
+## Corrected diagnosis vs. earlier draft
 
-Tests touched (in `tests/test_cancellation.py`):
+An earlier version of this doc claimed the root cause
+was **"forkserver teardown doesn't tree-kill
+descendants"** (SIGKILL only reaches the direct child,
+grandchildren survive and hold TCP `:1616`). That
+diagnosis was **wrong**, caused by conflating two
+observations:
 
-- `test_cancel_infinite_streamer`
-- `test_some_cancels_all`
-- `test_nested_multierrors`
-- `test_cancel_via_SIGINT`
-- `test_cancel_via_SIGINT_other_task`
+1. *5-zombie leak holding :1616* — happened in my own
+   workflow when I aborted a bg pytest task with
+   `pkill` (SIGTERM/SIGKILL, not SIGINT). The abrupt
+   kill skipped the graceful `ActorNursery.__aexit__`
+   cancel cascade entirely, orphaning descendants to
+   init.
+   **This was my cleanup bug, not a forkserver
+   teardown bug.** Codified the fix (SIGINT-first +
+   bounded wait before SIGKILL) in
+   `feedback_sc_graceful_cancel_first.md` +
+   `.claude/skills/run-tests/SKILL.md`.
+2. *`test_nested_multierrors` hangs indefinitely* —
+   the real, separate, forkserver-specific bug
+   captured by this doc.
 
-This is a **suite-hygiene fix** — it doesn't close the
-actual leak; it just stops the leak from blast-radiusing.
-Zombie descendants still accumulate per run.
+The two symptoms are unrelated. The tree-kill / setpgrp
+fix direction proposed earlier would not help (1) (SC-
+graceful-cleanup is the right answer there) and would
+not help (2) (the hang is in the cancel cascade, not
+in the hard-kill fallback).
 
-## The real bug (unfixed)
+## Symptom
 
-`subint_forkserver_proc`'s teardown — `_ForkedProc.kill()`
-(plain `os.kill(SIGKILL)` to the direct child pid) +
-`proc.wait()` — does **not** reap grandchildren or
-deeper descendants. When a cancellation test causes a
-multi-level actor tree to tear down, the direct child
-dies but its own children survive and get reparented to
-init (PID 1), where they stay running with their
-inherited FDs (including the registry listen socket).
+Reproducer (py3.14, clean env):
 
-**Symptom on repro:**
+```sh
+# preflight: ensure clean env
+ss -tlnp 2>/dev/null | grep ':1616' && echo 'FOUL — cleanup first!' || echo 'clean'
-
-```
-$ ss -tlnp 2>/dev/null | grep ':1616'
-LISTEN 0 4096 127.0.0.1:1616 0.0.0.0:* \
-    users:(("subint-forkserv",pid=211595,fd=17),
-           ("subint-forkserv",pid=211585,fd=17),
-           ("subint-forkserv",pid=211583,fd=17),
-           ("subint-forkserv",pid=211576,fd=17),
-           ("subint-forkserv",pid=211572,fd=17))
-
-$ for p in 211572 211576 211583 211585 211595; do
-    cat /proc/$p/cmdline | tr '\0' ' '; echo; done
 ./py314/bin/python -m pytest --spawn-backend=subint_forkserver \
-    tests/test_cancellation.py --timeout=30 --timeout-method=signal \
-    --tb=no -q --no-header
-... (x5, all same cmdline — inherited from fork)
+    'tests/test_cancellation.py::test_nested_multierrors[subint_forkserver]' \
+    --timeout=30 --timeout-method=thread --tb=short -v
 ```
-All 5 share the pytest cmdline because `os.fork()`
-without `exec()` preserves the parent's argv. Their
-comm-name (`subint-forkserv`) is the `thread_name` we
-pass to the fork-worker thread in
-`tractor.spawn._subint_forkserver.fork_from_worker_thread`.
+Expected: `pytest-timeout` fires at 30s with a thread-
+dump banner, but the process itself **remains alive
+after timeout** and doesn't unwedge on subsequent
+SIGINT. Requires SIGKILL to reap.
 
-## Why 5?
+## Evidence (tree structure at hang point)
 
-Not confirmed; guess is 5 = the parametrize cardinality
-of one of the leaky tests (e.g. `test_some_cancels_all`
-has 5 parametrize cases). Each param-case spawns a
-nested tree; each leaks exactly one descendant. Worth
-verifying by running each parametrize-case individually
-and counting leaked procs per case.
+All 5 processes are kernel-level `S` (sleeping) in
+`do_epoll_wait` (trio's event loop waiting on I/O):
 
-## Ruled out
-
-- **`:1616` collision from a different repo** (e.g.
-  piker): `/proc/$pid/cmdline` + `cwd` both resolve to
-  the tractor repo's `py314/` venv for all 5. These are
-  definitively spawned by our test run.
-- **Parent-side `_ForkedProc.wait()` regressed**: the
-  direct child's teardown completes cleanly (exit-code
-  captured, `waitpid` returns); the 5 survivors are
-  deeper-descendants whose parent-side shim has no
-  handle on them. So the bug isn't in
-  `_ForkedProc.wait()` — it's in the lack of tree-
-  level descendant enumeration + reaping during nursery
-  teardown.
-
-## Likely fix directions
-
-1. **Process-group-scoped spawn + tree kill.** Put each
-   forkserver-spawned subactor into its own process
-   group (`os.setpgrp()` in the fork child), then on
-   teardown `os.killpg(pgid, SIGKILL)` to reap the
-   whole tree atomically. Simplest, most surgical.
-2. **Subreaper registration.** Use
-   `PR_SET_CHILD_SUBREAPER` on the tractor root so
-   orphaned grandchildren reparent to the root rather
-   than init — then we can `waitpid` them from the
-   parent-side nursery teardown. More invasive.
-3. **Explicit descendant enumeration at teardown.**
-   In `subint_forkserver_proc`'s finally block, walk
-   `/proc//task/*/children` before issuing SIGKILL
-   to build a descendant-pid set; then kill + reap all
-   of them. Fragile (Linux-only, proc-fs-scan race).
-
-Vote: **(1)** — clean, POSIX-standard, aligns with how
-`subprocess.Popen` (and by extension `trio.lowlevel.
-open_process`) handle tree-kill semantics on
-kwargs-supplied `start_new_session=True`.
-
-## Reproducer
-
-```sh
-# before: ensure clean env
-ss -tlnp 2>/dev/null | grep ':1616' || echo 'clean'
-
-# run the leaky tests
-./py314/bin/python -m pytest \
-    --spawn-backend=subint_forkserver \
-    tests/test_cancellation.py \
-    --timeout=30 --timeout-method=signal --tb=no -q --no-header
-
-# observe: 5 leaked children now holding :1616
-ss -tlnp 2>/dev/null | grep ':1616'
 ```
+PID     PPID    THREADS  NAME             ROLE
+333986  1       2        subint-forkserv  pytest main (the test body)
+333993  333986  3        subint-forkserv  "child 1" spawner subactor
+ 334003 333993  1        subint-forkserv  grandchild errorer under child-1
+ 334014 333993  1        subint-forkserv  grandchild errorer under child-1
+333999  333986  1        subint-forkserv  "child 2" spawner subactor (NO grandchildren!)
 ```
-Expected output: `subint-forkserv` processes listed as
-listeners on `:1616`. Cleanup:
-
-```sh
-pkill -9 -f \
-    "$(pwd)/py314/bin/python.*pytest.*spawn-backend=subint_forkserver"
 ```
+### Asymmetric tree depth
+
+The test's `spawn_and_error(breadth=2, depth=3)` should
+have BOTH direct children spawning 2 grandchildren
+each, going 3 levels deep. Reality:
+
+- Child 1 (333993, 3 threads) DID spawn its two
+  grandchildren as expected — fully booted trio
+  runtime.
+- Child 2 (333999, 1 thread) did NOT spawn any
+  grandchildren — clearly never completed its
+  nursery's first `run_in_actor`. Its 1-thread state
+  suggests the runtime never fully booted (no trio
+  worker threads for `waitpid`/IPC).
+
+This asymmetry is the key clue: the two direct
+children started identically but diverged. Probably a
+race around fork-inherited state (listener FDs,
+subactor-nursery channel state) that happens to land
+differently depending on spawn ordering.
+
+### Parent-side state
+
+Thread-dump of pytest main (333986) at the hang:
+
+- Main trio thread — parked in
+  `trio._core._io_epoll.get_events` (epoll_wait on
+  its event loop). Waiting for IPC from children.
+- Two trio-cache worker threads — each parked in
+  `outcome.capture(sync_fn)` calling
+  `os.waitpid(child_pid, 0)`. These are our
+  `_ForkedProc.wait()` off-loads. They're waiting for
+  the direct children to exit — but children are
+  stuck in their own epoll_wait waiting for IPC from
+  the parent.
+
+**It's a deadlock, not a leak:** the parent is
+correctly running `soft_kill(proc, _ForkedProc.wait,
+portal)` (graceful IPC cancel via
+`Portal.cancel_actor()`), but the children never
+acknowledge the cancel message (or the message never
+reaches them through the tangled post-fork IPC).
+
+## What's NOT the cause (ruled out)
+
+- **`_ForkedProc.kill()` only SIGKILLs direct pid /
+  missing tree-kill**: doesn't apply — we never reach
+  the hard-kill path. The deadlock is in the graceful
+  cancel cascade.
+- **Port `:1616` contention**: ruled out after the
+  `reg_addr` fixture-wiring fix; each test session
+  gets a unique port now.
+- **GIL starvation / SIGINT pipe filling** (class-A,
+  `subint_sigint_starvation_issue.md`): doesn't apply
+  — each subactor is its own OS process with its own
+  GIL (not legacy-config subint).
+- **Child-side `_trio_main` absorbing KBI**: grep
+  confirmed; `_trio_main` only catches KBI at the
+  `trio.run()` callsite, which is reached only if the
+  trio loop exits normally. The children here never
+  exit trio.run() — they're wedged inside.
+
+## Hypothesis: FD inheritance across nested forks
+
+`subint_forkserver_proc` calls
+`fork_from_worker_thread()` which ultimately does
+`os.fork()` from a dedicated worker thread. Standard
+Linux/POSIX fork semantics: **the child inherits ALL
+open FDs from the parent**, including listener
+sockets, epoll fds, trio wakeup pipes, and the
+parent's IPC channel sockets.
+
+At root-actor fork-spawn time, the root's IPC server
+listener FDs are open in the parent. Those get
+inherited by child 1. Child 1 then forkserver-spawns
+its OWN subactor (grandchild). The grandchild
+inherits FDs from child 1 — but child 1's address
+space still contains **the root's IPC listener FDs
+too** (inherited at first fork). So the grandchild
+has THREE sets of FDs:
+
+1. Its own (created after becoming a subactor).
+2. Its direct parent child-1's.
+3. The ROOT's (grandparent's) — inherited transitively.
+
+IPC message routing may be ambiguous in this tangled
+state. Or a listener socket that the root thinks it
+owns is actually open in multiple processes, and
+messages sent to it go to an arbitrary one. That
+would exactly match the observed "graceful cancel
+never propagates".
+
+This hypothesis predicts the bug **scales with fork
+depth**: single-level forkserver spawn
+(`test_subint_forkserver_spawn_basic`) works
+perfectly, but any test that spawns a second level
+deadlocks. Matches observations so far.
+
+## Fix directions (to validate)
+
+### 1. `close_fds=True` equivalent in `fork_from_worker_thread()`
+
+`subprocess.Popen` / `trio.lowlevel.open_process` have
+`close_fds=True` by default on POSIX — they
+enumerate open FDs in the child post-fork and close
+everything except stdio + any explicitly-passed FDs.
+Our raw `os.fork()` doesn't. Adding the equivalent to
+our `_worker` prelude would isolate each fork
+generation's FD set.
+
+Implementation sketch in
+`tractor.spawn._subint_forkserver.fork_from_worker_thread._worker`:
+
+```python
+def _worker() -> None:
+    pid: int = os.fork()
+    if pid == 0:
+        # CHILD: close inherited FDs except stdio + the
+        # pid-pipe we just opened.
+        keep: set[int] = {0, 1, 2, rfd, wfd}
+        import resource
+        soft, _ = resource.getrlimit(resource.RLIMIT_NOFILE)
+        # blunt range scan; or enumerate /proc/self/fd
+        for fd in range(3, soft):
+            if fd not in keep:
+                try:
+                    os.close(fd)
+                except OSError:
+                    pass
+        # ... then child_target() as before
+```
+
+Problem: overly aggressive — closes FDs the
+grandchild might legitimately need (e.g. its parent's
+IPC channel for the spawn-spec handshake, if we rely
+on that). Needs thought about which FDs are
+"inheritable and safe" vs. "inherited by accident".
+
+### 2. Cloexec on tractor's own FDs
+
+Set `FD_CLOEXEC` on tractor-created sockets (listener
+sockets, IPC channel sockets, pipes). This flag
+causes automatic close on `execve`, but since we
+`fork()` without `exec()`, this alone doesn't help.
+BUT — combined with a child-side explicit close-
+non-cloexec loop, it gives us a way to mark "my
+private FDs" vs. "safe to inherit". Most robust, but
+requires tractor-wide audit.
+
+### 3. Explicit FD cleanup in `_ForkedProc`/`_child_target`
+
+Have `subint_forkserver_proc`'s `_child_target`
+closure explicitly close the parent-side IPC listener
+FDs before calling `_actor_child_main`. Requires
+being able to enumerate "the parent's listener FDs
+that the child shouldn't keep" — plausible via
+`Actor.ipc_server`'s socket objects.
+
+### 4. Use `os.posix_spawn` with explicit `file_actions`
+
+Instead of raw `os.fork()`, use `os.posix_spawn()`
+which supports explicit file-action specifications
+(close this FD, dup2 that FD). Cleaner semantics, but
+probably incompatible with our "no exec" requirement
+(subint_forkserver is a fork-without-exec design).
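Whichever direction wins, the premise they all rest on is plain
POSIX behavior and easy to sanity-check: a listener opened
pre-fork stays open in the `os.fork()` child regardless of
cloexec/inheritable flags (those only matter across `exec()`),
until something closes it explicitly. A stdlib-only sketch
(standalone demo, not tractor code):

```python
import os
import socket

# Demo: a listening socket created pre-fork is still open in the
# os.fork() child, and a targeted os.close() of exactly that FD
# (the core move of fix direction 3) cleanly removes it.
lsock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
lsock.bind(('127.0.0.1', 0))
lsock.listen()
fd: int = lsock.fileno()

pid: int = os.fork()
if pid == 0:
    # CHILD: the inherited FD is alive..
    os.fstat(fd)  # would raise OSError were it closed
    os.close(fd)  # ..targeted cleanup of just this listener FD
    try:
        os.fstat(fd)
        os._exit(1)  # still open?! demo failed
    except OSError:
        os._exit(0)  # gone, as intended

# PARENT: reap + verify the child saw both states.
_, status = os.waitpid(pid, 0)
lsock.close()
assert os.waitstatus_to_exitcode(status) == 0
print('inherited listener visible in child; targeted close worked')
```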
+
+**Likely correct answer: (3) — targeted FD cleanup
+via `actor.ipc_server` handle.** (1) is too blunt,
+(2) is too wide-ranging, (4) changes the spawn
+mechanism.
+
+## Reproducer (standalone, no pytest)
+
+```python
+# save as /tmp/forkserver_nested_hang_repro.py (py3.14+)
+import trio, tractor
+
+async def assert_err():
+    assert 0
+
+async def spawn_and_error(breadth: int = 2, depth: int = 1):
+    async with tractor.open_nursery() as n:
+        for i in range(breadth):
+            if depth > 0:
+                await n.run_in_actor(
+                    spawn_and_error,
+                    breadth=breadth,
+                    depth=depth - 1,
+                    name=f'spawner_{i}_{depth}',
+                )
+            else:
+                await n.run_in_actor(
+                    assert_err,
+                    name=f'errorer_{i}',
+                )
+
+async def _main():
+    # NB: `trio.fail_after()` must run *inside* the trio
+    # runtime, so it wraps the nursery here rather than
+    # the `trio.run()` call below.
+    with trio.fail_after(20):
+        async with tractor.open_nursery() as n:
+            for i in range(2):
+                await n.run_in_actor(
+                    spawn_and_error,
+                    name=f'top_{i}',
+                    breadth=2,
+                    depth=1,
+                )
+
+if __name__ == '__main__':
+    from tractor.spawn._spawn import try_set_start_method
+    try_set_start_method('subint_forkserver')
+    trio.run(_main)
+```
+
+Expected (current): wedges inside the
+`trio.fail_after(20)` scope in `_main` — children
+never ack the error-propagation cancel cascade.
+Pattern: top 2 direct children, 4 grandchildren, 1
+errorer deadlocks while trying to unwind through its
+parent chain.
+
+After fix: completes without `trio.TooSlowError`; the
+root's `open_nursery` receives the
+`BaseExceptionGroup` containing the `AssertionError`
+from the errorer and unwinds cleanly.
+
+## Stopgap (landed)
+
+Until the fix lands, `test_nested_multierrors` +
+related multi-level-spawn tests can be skip-marked
+under `subint_forkserver` via
+`@pytest.mark.skipon_spawn_backend('subint_forkserver',
+reason='...')`. Cross-ref this doc.
+
 ## References
 
-- `tractor/spawn/_subint_forkserver.py::_ForkedProc`
-  — the current teardown shim; PID-scoped, not tree-
-  scoped.
+- `tractor/spawn/_subint_forkserver.py::fork_from_worker_thread`
+  — the primitive whose post-fork FD hygiene is
+  probably the culprit.
 - `tractor/spawn/_subint_forkserver.py::subint_forkserver_proc`
+  — the backend function that orchestrates the
+  graceful cancel path hitting this bug.
+- `tractor/spawn/_subint_forkserver.py::_ForkedProc`
+  — the `trio.Process`-compatible shim; NOT the
+  failing component (confirmed via thread-dump).
+- `tests/test_cancellation.py::test_nested_multierrors`
+  — the test that surfaced the hang.
 - `ai/conc-anal/subint_forkserver_orphan_sigint_hang_issue.md`
-  — sibling tracker for a different forkserver-teardown
-  class (orphaned child doesn't respond to SIGINT); may
-  share root cause with this one once the fix lands.
+  — sibling hang class; probably same underlying
+  fork-FD-inheritance root cause.
 - tractor issue #379 — subint backend tracking.