Compare commits


No commits in common. "0cd0b633f1793b17f94ed0ba07c6540e44c60606" and "e5e2afb5f4890cb0c8f7f3bb38ad12429f4e0b59" have entirely different histories.

2 changed files with 140 additions and 375 deletions


@@ -1,333 +1,165 @@
# `subint_forkserver` backend: `test_cancellation.py` multi-level cancel cascade hang
# `subint_forkserver` backend leaks subactor descendants in `test_cancellation.py`
Follow-up tracker: surfaced while wiring the new
`subint_forkserver` spawn backend into the full tractor
test matrix (step 2 of the post-backend-lands plan).
See also
`ai/conc-anal/subint_forkserver_orphan_sigint_hang_issue.md`
— sibling tracker for a different forkserver-teardown
class which probably shares the same fundamental root
cause (fork-FD-inheritance across nested spawns).
test matrix (step 2 of the post-backend-lands plan;
see also
`ai/conc-anal/subint_forkserver_orphan_sigint_hang_issue.md`).
## TL;DR
`tests/test_cancellation.py::test_nested_multierrors[subint_forkserver]`
hangs indefinitely under our new backend. The hang is
**inside the graceful IPC cancel cascade** — every actor
in the multi-level tree parks in `epoll_wait` waiting
for IPC messages that never arrive. Not a hard-kill /
tree-reap issue (we don't reach the hard-kill fallback
path at all).
Running `tests/test_cancellation.py` under
`--spawn-backend=subint_forkserver` reproducibly leaks
**exactly 5 `subint-forkserv` comm-named child processes**
after the pytest session exits. Both previously-run
sessions produced the same 5-process signature — not a
flake. Each leaked process holds a `LISTEN` on the
default registry TCP addr (`127.0.0.1:1616`), which
poisons any subsequent tractor test session that
defaults to that addr.
Working hypothesis (unverified): **`os.fork()` from a
subactor inherits the root parent's IPC listener socket
FDs**. When a first-level subactor forkserver-spawns a
grandchild, that grandchild inherits both its direct
spawner's FDs AND the root's FDs — IPC message routing
becomes ambiguous (or silently sends to the wrong
channel), so the cancel cascade can't reach its target.
## Stopgap (not the real fix)
## Corrected diagnosis vs. earlier draft
Multiple tests in `test_cancellation.py` were calling
`tractor.open_nursery()` **without** passing
`registry_addrs=[reg_addr]`, i.e. falling back on the
default `:1616`. The commit accompanying this doc wires
the `reg_addr` fixture through those tests so each run
gets a session-unique port — leaked zombies can no
longer poison **other** tests (they hold their own
unique port instead).
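The session-unique-port idea can be sketched with a
tiny stdlib helper (hypothetical name — the real
`reg_addr` fixture presumably does the equivalent):

```python
import socket

# Hypothetical helper sketching the fixture's job: bind port 0
# and let the OS hand back an unused ephemeral port, so each
# pytest session registers on its own addr instead of the
# default :1616.
def unused_tcp_port(host: str = '127.0.0.1') -> int:
    with socket.socket() as s:
        s.bind((host, 0))
        return s.getsockname()[1]

reg_addr: tuple[str, int] = ('127.0.0.1', unused_tcp_port())
print(reg_addr[1] > 1024)  # OS-assigned, outside the privileged range
```

Note the socket is closed before the port is used — fine
for test fixtures, where a rare reuse race is tolerable.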
An earlier version of this doc claimed the root cause
was **"forkserver teardown doesn't tree-kill
descendants"** (SIGKILL only reaches the direct child,
grandchildren survive and hold TCP `:1616`). That
diagnosis was **wrong**, caused by conflating two
observations:
Tests touched (in `tests/test_cancellation.py`):
1. *5-zombie leak holding :1616* — happened in my own
workflow when I aborted a bg pytest task with
`pkill` (SIGTERM/SIGKILL, not SIGINT). The abrupt
kill skipped the graceful `ActorNursery.__aexit__`
cancel cascade entirely, orphaning descendants to
init. **This was my cleanup bug, not a forkserver
teardown bug.** Codified the fix (SIGINT-first +
bounded wait before SIGKILL) in
`feedback_sc_graceful_cancel_first.md` +
`.claude/skills/run-tests/SKILL.md`.
2. *`test_nested_multierrors` hangs indefinitely* —
the real, separate, forkserver-specific bug
captured by this doc.
- `test_cancel_infinite_streamer`
- `test_some_cancels_all`
- `test_nested_multierrors`
- `test_cancel_via_SIGINT`
- `test_cancel_via_SIGINT_other_task`
The two symptoms are unrelated. The tree-kill / setpgrp
fix direction proposed earlier would not help (1)
(SC-graceful-cleanup is the right answer there) and
would not help (2) (the hang is in the cancel cascade,
not in the hard-kill fallback).
This is a **suite-hygiene fix** — it doesn't close the
actual leak, it just contains its blast radius. Zombie
descendants still accumulate per run.
## Symptom
## The real bug (unfixed)
Reproducer (py3.14, clean env):
`subint_forkserver_proc`'s teardown — `_ForkedProc.kill()`
(plain `os.kill(SIGKILL)` to the direct child pid) +
`proc.wait()` — does **not** reap grandchildren or
deeper descendants. When a cancellation test causes a
multi-level actor tree to tear down, the direct child
dies but its own children survive and get reparented to
init (PID 1), where they stay running with their
inherited FDs (including the registry listen socket).
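The PID-scoped-kill gap is easy to demo in isolation — a
minimal POSIX sketch, independent of tractor:

```python
import os
import signal
import time

# Minimal POSIX demo of the teardown gap: SIGKILL-ing the
# direct child does nothing to the child's own child (our
# "grandchild"), which reparents to init and keeps running
# with its inherited FDs.
r, w = os.pipe()
pid = os.fork()
if pid == 0:
    gpid = os.fork()
    if gpid == 0:
        # GRANDCHILD: report our pid to the grandparent, linger.
        os.write(w, str(os.getpid()).encode())
        time.sleep(30)
        os._exit(0)
    time.sleep(30)
    os._exit(0)

os.close(w)
gpid = int(os.read(r, 16))     # grandchild's pid
os.kill(pid, signal.SIGKILL)   # PID-scoped kill: direct child only
os.waitpid(pid, 0)
os.kill(gpid, 0)               # raises ProcessLookupError if dead
survived: bool = True
print('grandchild survived:', survived)
os.kill(gpid, signal.SIGKILL)  # manual cleanup (init reaps it)
```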
**Symptom on repro:**
```
$ ss -tlnp 2>/dev/null | grep ':1616'
LISTEN 0 4096 127.0.0.1:1616 0.0.0.0:* \
users:(("subint-forkserv",pid=211595,fd=17),
("subint-forkserv",pid=211585,fd=17),
("subint-forkserv",pid=211583,fd=17),
("subint-forkserv",pid=211576,fd=17),
("subint-forkserv",pid=211572,fd=17))
$ for p in 211572 211576 211583 211585 211595; do
cat /proc/$p/cmdline | tr '\0' ' '; echo; done
./py314/bin/python -m pytest --spawn-backend=subint_forkserver \
tests/test_cancellation.py --timeout=30 --timeout-method=signal \
--tb=no -q --no-header
... (x5, all same cmdline — inherited from fork)
```
All 5 share the pytest cmdline because `os.fork()`
without `exec()` preserves the parent's argv. Their
comm-name (`subint-forkserv`) is the `thread_name` we
pass to the fork-worker thread in
`tractor.spawn._subint_forkserver.fork_from_worker_thread`.
## Why 5?
Not confirmed; guess is 5 = the parametrize cardinality
of one of the leaky tests (e.g. `test_some_cancels_all`
has 5 parametrize cases). Each param-case spawns a
nested tree; each leaks exactly one descendant. Worth
verifying by running each parametrize-case individually
and counting leaked procs per case.
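That per-case count could be scripted with a tiny parser
over the `users:(...)` field that `ss -tlnp` prints
(hypothetical helper, not part of the test suite):

```python
import re

# Hypothetical helper for the per-case leak count: parse the
# `users:(("subint-forkserv",pid=NNN,fd=17),...)` field from
# `ss -tlnp` output and count distinct holder pids.
def count_listener_pids(ss_output: str) -> int:
    return len(set(re.findall(r'pid=(\d+)', ss_output)))

sample = (
    'LISTEN 0 4096 127.0.0.1:1616 0.0.0.0:* '
    'users:(("subint-forkserv",pid=211595,fd=17),'
    '("subint-forkserv",pid=211572,fd=17))'
)
print(count_listener_pids(sample))  # 2
```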
## Ruled out
- **`:1616` collision from a different repo** (e.g.
piker): `/proc/$pid/cmdline` + `cwd` both resolve to
the tractor repo's `py314/` venv for all 5. These are
definitively spawned by our test run.
- **Parent-side `_ForkedProc.wait()` regressed**: the
direct child's teardown completes cleanly (exit-code
captured, `waitpid` returns); the 5 survivors are
deeper-descendants whose parent-side shim has no
handle on them. So the bug isn't in
`_ForkedProc.wait()` — it's in the lack of tree-level
descendant enumeration + reaping during nursery
teardown.
## Likely fix directions
1. **Process-group-scoped spawn + tree kill.** Put each
forkserver-spawned subactor into its own process
group (`os.setpgrp()` in the fork child), then on
teardown `os.killpg(pgid, SIGKILL)` to reap the
whole tree atomically. Simplest, most surgical.
2. **Subreaper registration.** Use
`PR_SET_CHILD_SUBREAPER` on the tractor root so
orphaned grandchildren reparent to the root rather
than init — then we can `waitpid` them from the
parent-side nursery teardown. More invasive.
3. **Explicit descendant enumeration at teardown.**
In `subint_forkserver_proc`'s finally block, walk
`/proc/<pid>/task/*/children` before issuing SIGKILL
to build a descendant-pid set; then kill + reap all
of them. Fragile (Linux-only, proc-fs-scan race).
Vote: **(1)** — clean, POSIX-standard, and aligned with
how `subprocess.Popen` (and by extension
`trio.lowlevel.open_process`) handles tree-kill
semantics when passed `start_new_session=True`.
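A dependency-free sketch of direction (1) — assumed
shape only, not the actual backend change:

```python
import os
import signal
import time

# Sketch of fix direction (1): the fork-child starts its own
# process group, so one `killpg()` at teardown takes out the
# whole descendant tree instead of just the direct child.
pid = os.fork()
if pid == 0:
    os.setpgrp()            # child now leads a fresh pgrp
    if os.fork() == 0:      # grandchild inherits that pgrp
        time.sleep(30)
        os._exit(0)
    time.sleep(30)
    os._exit(0)

time.sleep(0.2)             # give the child time to setpgrp
os.killpg(pid, signal.SIGKILL)  # pgid == child pid after setpgrp
_, status = os.waitpid(pid, 0)
tree_killed: bool = (
    os.WIFSIGNALED(status)
    and os.WTERMSIG(status) == signal.SIGKILL
)
print('tree killed:', tree_killed)
```

The 0.2s sleep papers over the setpgrp race for the demo;
a real backend would synchronize on a pipe instead.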
## Reproducer
```sh
# preflight: ensure clean env
ss -tlnp 2>/dev/null | grep ':1616' && echo 'FOUL — cleanup first!' || echo 'clean'
# before: ensure clean env
ss -tlnp 2>/dev/null | grep ':1616' || echo 'clean'
./py314/bin/python -m pytest --spawn-backend=subint_forkserver \
'tests/test_cancellation.py::test_nested_multierrors[subint_forkserver]' \
--timeout=30 --timeout-method=thread --tb=short -v
# run the leaky tests
./py314/bin/python -m pytest \
--spawn-backend=subint_forkserver \
tests/test_cancellation.py \
--timeout=30 --timeout-method=signal --tb=no -q --no-header
# observe: 5 leaked children now holding :1616
ss -tlnp 2>/dev/null | grep ':1616'
```
Expected: `pytest-timeout` fires at 30s with a
thread-dump banner, but the process itself **remains
alive after timeout** and doesn't unwedge on subsequent
SIGINT. Requires SIGKILL to reap.
## Evidence (tree structure at hang point)
All 5 processes are kernel-level `S` (sleeping) in
`do_epoll_wait` (trio's event loop waiting on I/O):
Expected output: `subint-forkserv` processes listed as
listeners on `:1616`. Cleanup:
```sh
pkill -9 -f \
"$(pwd)/py314/bin/python.*pytest.*spawn-backend=subint_forkserver"
```
```
PID     PPID    THREADS  NAME             ROLE
333986  1       2        subint-forkserv  pytest main (the test body)
333993  333986  3        subint-forkserv  "child 1" spawner subactor
334003  333993  1        subint-forkserv  grandchild errorer under child-1
334014  333993  1        subint-forkserv  grandchild errorer under child-1
333999  333986  1        subint-forkserv  "child 2" spawner subactor (NO grandchildren!)
```
### Asymmetric tree depth
The test's `spawn_and_error(breadth=2, depth=3)` should
have BOTH direct children spawning 2 grandchildren
each, going 3 levels deep. Reality:
- Child 1 (333993, 3 threads) DID spawn its two
grandchildren as expected — fully booted trio
runtime.
- Child 2 (333999, 1 thread) did NOT spawn any
grandchildren — clearly never completed its
nursery's first `run_in_actor`. Its 1-thread state
suggests the runtime never fully booted (no trio
worker threads for `waitpid`/IPC).
This asymmetry is the key clue: the two direct
children started identically but diverged. Probably a
race around fork-inherited state (listener FDs,
subactor-nursery channel state) that happens to land
differently depending on spawn ordering.
### Parent-side state
Thread-dump of pytest main (333986) at the hang:
- Main trio thread — parked in
`trio._core._io_epoll.get_events` (epoll_wait on
its event loop). Waiting for IPC from children.
- Two trio-cache worker threads — each parked in
`outcome.capture(sync_fn)` calling
`os.waitpid(child_pid, 0)`. These are our
`_ForkedProc.wait()` off-loads. They're waiting for
the direct children to exit — but children are
stuck in their own epoll_wait waiting for IPC from
the parent.
**It's a deadlock, not a leak:** the parent is
correctly running `soft_kill(proc, _ForkedProc.wait,
portal)` (graceful IPC cancel via
`Portal.cancel_actor()`), but the children never
acknowledge the cancel message (or the message never
reaches them through the tangled post-fork IPC).
## What's NOT the cause (ruled out)
- **`_ForkedProc.kill()` only SIGKILLs direct pid /
missing tree-kill**: doesn't apply — we never reach
the hard-kill path. The deadlock is in the graceful
cancel cascade.
- **Port `:1616` contention**: ruled out after the
`reg_addr` fixture-wiring fix; each test session
gets a unique port now.
- **GIL starvation / SIGINT pipe filling** (class-A,
`subint_sigint_starvation_issue.md`): doesn't apply
— each subactor is its own OS process with its own
GIL (not legacy-config subint).
- **Child-side `_trio_main` absorbing KBI**: grep
confirmed; `_trio_main` only catches KBI at the
`trio.run()` callsite, which is reached only if the
trio loop exits normally. The children here never
exit trio.run() — they're wedged inside.
## Hypothesis: FD inheritance across nested forks
`subint_forkserver_proc` calls
`fork_from_worker_thread()` which ultimately does
`os.fork()` from a dedicated worker thread. Standard
Linux/POSIX fork semantics: **the child inherits ALL
open FDs from the parent**, including listener
sockets, epoll fds, trio wakeup pipes, and the
parent's IPC channel sockets.
At root-actor fork-spawn time, the root's IPC server
listener FDs are open in the parent. Those get
inherited by child 1. Child 1 then forkserver-spawns
its OWN subactor (grandchild). The grandchild
inherits FDs from child 1 — but child 1's address
space still contains **the root's IPC listener FDs
too** (inherited at first fork). So the grandchild
has THREE sets of FDs:
1. Its own (created after becoming a subactor).
2. Its direct parent child-1's.
3. The ROOT's (grandparent's) — inherited transitively.
IPC message routing may be ambiguous in this tangled
state. Or a listener socket that the root thinks it
owns is actually open in multiple processes, and
messages sent to it go to an arbitrary one. That
would exactly match the observed "graceful cancel
never propagates".
This hypothesis predicts the bug **scales with fork
depth**: single-level forkserver spawn
(`test_subint_forkserver_spawn_basic`) works
perfectly, but any test that spawns a second level
deadlocks. Matches observations so far.
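The inheritance mechanics are trivially reproducible
outside tractor (minimal POSIX sketch):

```python
import os
import socket

# Minimal demo of the hypothesis' key ingredient: after a
# plain `os.fork()` (no exec), the parent's listening socket
# is open in BOTH processes — exactly how a grandchild ends
# up holding listener FDs it never created.
lsock = socket.socket()
lsock.bind(('127.0.0.1', 0))
lsock.listen()
port: int = lsock.getsockname()[1]

r, w = os.pipe()
pid = os.fork()
if pid == 0:
    # CHILD: `lsock` was inherited wholesale; prove it by
    # reading the bound port straight off the inherited fd.
    os.write(w, str(lsock.getsockname()[1]).encode())
    os._exit(0)

os.close(w)
child_port = int(os.read(r, 16))
os.waitpid(pid, 0)
print(child_port == port)  # same LISTEN socket in two processes
```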
## Fix directions (to validate)
### 1. `close_fds=True` equivalent in `fork_from_worker_thread()`
`subprocess.Popen` / `trio.lowlevel.open_process` have
`close_fds=True` by default on POSIX — they
enumerate open FDs in the child post-fork and close
everything except stdio + any explicitly-passed FDs.
Our raw `os.fork()` doesn't. Adding the equivalent to
our `_worker` prelude would isolate each fork
generation's FD set.
Implementation sketch in
`tractor.spawn._subint_forkserver.fork_from_worker_thread._worker`:
```python
def _worker() -> None:
    pid: int = os.fork()
    if pid == 0:
        # CHILD: close inherited FDs except stdio + the
        # pid-pipe we just opened. NB: a bare
        # `os.closerange(3, soft)` would also close the
        # pid-pipe fds, so honor the keep-set explicitly.
        keep: set[int] = {0, 1, 2, rfd, wfd}
        import resource
        soft, _ = resource.getrlimit(resource.RLIMIT_NOFILE)
        for fd in range(3, soft):  # blunt; or enumerate /proc/self/fd
            if fd in keep:
                continue
            try:
                os.close(fd)
            except OSError:
                pass
        # ... then child_target() as before
```
Problem: overly aggressive — closes FDs the
grandchild might legitimately need (e.g. its parent's
IPC channel for the spawn-spec handshake, if we rely
on that). Needs thought about which FDs are
"inheritable and safe" vs. "inherited by accident".
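One way to make the prelude less blunt — a sketch
assuming Linux's `/proc/self/fd` is available, with a
hypothetical helper name:

```python
import os

# Hypothetical keep-set variant: walk /proc/self/fd so that
# explicitly kept fds (stdio + the pid-pipe) survive, instead
# of close-ranging over everything >= 3. Run it only in a
# freshly-forked child.
def close_all_but(keep: set[int]) -> int:
    closed = 0
    for name in os.listdir('/proc/self/fd'):
        fd = int(name)
        if fd in keep:
            continue
        try:
            os.close(fd)
            closed += 1
        except OSError:
            pass  # fd raced away (e.g. the listdir dirfd itself)
    return closed
```

Which fds belong in `keep` is exactly the open design
question above.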
### 2. Cloexec on tractor's own FDs
Set `FD_CLOEXEC` on tractor-created sockets (listener
sockets, IPC channel sockets, pipes). This flag
causes automatic close on `execve`, but since we
`fork()` without `exec()`, this alone doesn't help.
BUT — combined with a child-side explicit
close-everything-not-cloexec loop, it gives us a way
to mark "my private FDs" vs. "safe to inherit". Most
robust, but requires tractor-wide audit.
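The fork-vs-exec distinction in (2) is checkable
directly (stdlib-only sketch):

```python
import fcntl
import os

# FD_CLOEXEC closes an fd at execve() time, NOT at fork()
# time: after a plain fork the flagged fd is still open in
# the child — which is why cloexec alone can't help a
# fork-without-exec backend, but does work as a
# "mine, don't inherit" marker.
r, w = os.pipe()
flags = fcntl.fcntl(w, fcntl.F_GETFD)
fcntl.fcntl(w, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)

pid = os.fork()
if pid == 0:
    os.write(w, b'open')  # succeeds: fork preserved the cloexec fd
    os._exit(0)

os.waitpid(pid, 0)
os.close(w)
msg = os.read(r, 4)
print(msg)  # b'open'
```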
### 3. Explicit FD cleanup in `_ForkedProc`/`_child_target`
Have `subint_forkserver_proc`'s `_child_target`
closure explicitly close the parent-side IPC listener
FDs before calling `_actor_child_main`. Requires
being able to enumerate "the parent's listener FDs
that the child shouldn't keep" — plausible via
`Actor.ipc_server`'s socket objects.
### 4. Use `os.posix_spawn` with explicit `file_actions`
Instead of raw `os.fork()`, use `os.posix_spawn()`
which supports explicit file-action specifications
(close this FD, dup2 that FD). Cleaner semantics, but
probably incompatible with our "no exec" requirement
(subint_forkserver is a fork-without-exec design).
**Likely correct answer: (3) — targeted FD cleanup
via `actor.ipc_server` handle.** (1) is too blunt,
(2) is too wide-ranging, (4) changes the spawn
mechanism.
## Reproducer (standalone, no pytest)
```python
# save as /tmp/forkserver_nested_hang_repro.py (py3.14+)
import trio, tractor
async def assert_err():
assert 0
async def spawn_and_error(breadth: int = 2, depth: int = 1):
async with tractor.open_nursery() as n:
for i in range(breadth):
if depth > 0:
await n.run_in_actor(
spawn_and_error,
breadth=breadth,
depth=depth - 1,
name=f'spawner_{i}_{depth}',
)
else:
await n.run_in_actor(
assert_err,
name=f'errorer_{i}',
)
async def _main():
async with tractor.open_nursery() as n:
for i in range(2):
await n.run_in_actor(
spawn_and_error,
name=f'top_{i}',
breadth=2,
depth=1,
)
if __name__ == '__main__':
    from tractor.spawn._spawn import try_set_start_method
    try_set_start_method('subint_forkserver')

    async def _timed() -> None:
        # `fail_after` needs a running trio clock, so the
        # timeout must wrap `_main()` *inside* `trio.run()`,
        # not around it.
        with trio.fail_after(20):
            await _main()

    trio.run(_timed)
```
Expected (current): hangs past the 20s deadline — the
children never ack the error-propagation cancel
cascade, so the timeout's cancellation can't unwind
the nurseries. Pattern: top 2 direct children, 4
grandchildren; one errorer deadlocks while trying to
unwind through its parent chain.
After fix: `trio.TooSlowError`-free completion; the
root's `open_nursery` receives the
`BaseExceptionGroup` containing the `AssertionError`
from the errorer and unwinds cleanly.
## Stopgap (landed)
Until the fix lands, `test_nested_multierrors` +
related multi-level-spawn tests can be skip-marked
under `subint_forkserver` via
`@pytest.mark.skipon_spawn_backend('subint_forkserver',
reason='...')`. Cross-ref this doc.
## References
- `tractor/spawn/_subint_forkserver.py::fork_from_worker_thread`
— the primitive whose post-fork FD hygiene is
probably the culprit.
- `tractor/spawn/_subint_forkserver.py::subint_forkserver_proc`
— the backend function that orchestrates the
graceful cancel path hitting this bug.
- `tractor/spawn/_subint_forkserver.py::_ForkedProc`
— the `trio.Process`-compatible shim; NOT the
failing component (confirmed via thread-dump).
- `tests/test_cancellation.py::test_nested_multierrors`
— the test that surfaced the hang.
— the current teardown shim; PID-scoped, not tree-
scoped.
- `tractor/spawn/_subint_forkserver.py::subint_forkserver_proc`
— the spawn backend whose `finally` block needs the
tree-kill fix.
- `tests/test_cancellation.py` — the test module where
  the leak shows up.
- `ai/conc-anal/subint_forkserver_orphan_sigint_hang_issue.md`
— sibling hang class; probably same underlying
fork-FD-inheritance root cause.
— sibling tracker for a different forkserver-teardown
class (orphaned child doesn't respond to SIGINT); may
share root cause with this one once the fix lands.
- tractor issue #379 — subint backend tracking.


@@ -195,69 +195,6 @@ except ImportError:
_has_subints: bool = False
def _close_inherited_fds(
keep: frozenset[int] = frozenset({0, 1, 2}),
) -> int:
'''
Close every open file descriptor in the current process
EXCEPT those in `keep` (default: stdio only).
Intended as the first thing a post-`os.fork()` child runs
after closing any communication pipes it knows about. This
is the fork-child FD hygiene discipline that
`subprocess.Popen(close_fds=True)` applies by default for
its exec-based children, but which we have to implement
ourselves because our `fork_from_worker_thread()` primitive
deliberately does NOT exec.
Why it matters
--------------
Without this, a forkserver-spawned subactor inherits the
parent actor's IPC listener sockets, trio-epoll fd, trio
wakeup-pipe, peer-channel sockets, etc. If that subactor
then itself forkserver-spawns a grandchild, the grandchild
inherits the FDs transitively from *both* its direct
parent AND the root actor; IPC message routing becomes
ambiguous and the cancel cascade deadlocks. See
`ai/conc-anal/subint_forkserver_test_cancellation_leak_issue.md`
for the full diagnosis + the empirical repro.
Fresh children will open their own IPC sockets via
`_actor_child_main()`, so they don't need any of the
parent's FDs.
Returns the count of fds that were successfully closed
— useful for sanity-check logging at callsites.
'''
# Enumerate open fds via `/proc/self/fd` on Linux (the fast +
# precise path); fall back to `RLIMIT_NOFILE` range close on
# other platforms. Matches stdlib
# `subprocess._posixsubprocess.close_fds` strategy.
try:
fd_names: list[str] = os.listdir('/proc/self/fd')
candidates: list[int] = [
int(n) for n in fd_names if n.isdigit()
]
except (FileNotFoundError, PermissionError):
import resource
soft, _ = resource.getrlimit(resource.RLIMIT_NOFILE)
candidates = list(range(3, soft))
closed: int = 0
for fd in candidates:
if fd in keep:
continue
try:
os.close(fd)
closed += 1
except OSError:
# fd was already closed (race with listdir) or
# otherwise unclosable — either is fine.
pass
return closed
def _format_child_exit(
status: int,
) -> str:
@@ -365,13 +302,9 @@ def fork_from_worker_thread(
pid: int = os.fork()
if pid == 0:
# CHILD: close the pid-pipe ends (we don't use
# them here), then scrub ALL other inherited FDs
# so the child starts with a clean slate
# (stdio-only). Critical for multi-level spawn
# trees — see `_close_inherited_fds()` docstring.
# them here), run the user callable if any, exit.
os.close(rfd)
os.close(wfd)
_close_inherited_fds()
rc: int = 0
if child_target is not None:
try: