Compare commits


No commits in common. "0cd0b633f1793b17f94ed0ba07c6540e44c60606" and "e5e2afb5f4890cb0c8f7f3bb38ad12429f4e0b59" have entirely different histories.

2 changed files with 140 additions and 375 deletions


@@ -1,333 +1,165 @@
# `subint_forkserver` backend: `test_cancellation.py` multi-level cancel cascade hang
# `subint_forkserver` backend leaks subactor descendants in `test_cancellation.py`
Follow-up tracker: surfaced while wiring the new
`subint_forkserver` spawn backend into the full tractor
test matrix (step 2 of the post-backend-lands plan).
See also
`ai/conc-anal/subint_forkserver_orphan_sigint_hang_issue.md`
— sibling tracker for a different forkserver-teardown
class which probably shares the same fundamental root
cause (fork-FD-inheritance across nested spawns).
test matrix (step 2 of the post-backend-lands plan;
see also
`ai/conc-anal/subint_forkserver_orphan_sigint_hang_issue.md`).
## TL;DR
`tests/test_cancellation.py::test_nested_multierrors[subint_forkserver]`
hangs indefinitely under our new backend. The hang is
**inside the graceful IPC cancel cascade** — every actor
in the multi-level tree parks in `epoll_wait` waiting
for IPC messages that never arrive. Not a hard-kill /
tree-reap issue (we don't reach the hard-kill fallback
path at all).
Running `tests/test_cancellation.py` under
`--spawn-backend=subint_forkserver` reproducibly leaks
**exactly 5 `subint-forkserv` comm-named child processes**
after the pytest session exits. Both previously-run
sessions produced the same 5-process signature — not a
flake. Each leaked process holds a `LISTEN` on the
default registry TCP addr (`127.0.0.1:1616`), which
poisons any subsequent tractor test session that
defaults to that addr.
Working hypothesis (unverified): **`os.fork()` from a
subactor inherits the root parent's IPC listener socket
FDs**. When a first-level subactor forkserver-spawns a
grandchild, that grandchild inherits both its direct
spawner's FDs AND the root's FDs — IPC message routing
becomes ambiguous (or silently sends to the wrong
channel), so the cancel cascade can't reach its target.
## Stopgap (not the real fix)
## Corrected diagnosis vs. earlier draft
Multiple tests in `test_cancellation.py` were calling
`tractor.open_nursery()` **without** passing
`registry_addrs=[reg_addr]`, i.e. falling back on the
default `:1616`. The commit accompanying this doc wires
the `reg_addr` fixture through those tests so each run
gets a session-unique port — leaked zombies can no
longer poison **other** tests (they hold their own
unique port instead).
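The session-unique-port idea can be sketched with a
tiny stdlib helper (hypothetical name — the real
`reg_addr` fixture presumably does the equivalent):

```python
import socket

# Hypothetical helper sketching the fixture's job: bind port 0
# and let the OS hand back an unused ephemeral port, so each
# pytest session registers on its own addr instead of the
# default :1616.
def unused_tcp_port(host: str = '127.0.0.1') -> int:
    with socket.socket() as s:
        s.bind((host, 0))
        return s.getsockname()[1]

reg_addr: tuple[str, int] = ('127.0.0.1', unused_tcp_port())
print(reg_addr[1] > 1024)  # OS-assigned, outside the privileged range
```

Note the socket is closed before the port is used — fine
for test fixtures, where a rare reuse race is tolerable.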
An earlier version of this doc claimed the root cause
was **"forkserver teardown doesn't tree-kill
descendants"** (SIGKILL only reaches the direct child,
grandchildren survive and hold TCP `:1616`). That
diagnosis was **wrong**, caused by conflating two
observations:
Tests touched (in `tests/test_cancellation.py`):
1. *5-zombie leak holding :1616* — happened in my own
workflow when I aborted a bg pytest task with
`pkill` (SIGTERM/SIGKILL, not SIGINT). The abrupt
kill skipped the graceful `ActorNursery.__aexit__`
cancel cascade entirely, orphaning descendants to
init. **This was my cleanup bug, not a forkserver
teardown bug.** Codified the fix (SIGINT-first +
bounded wait before SIGKILL) in
`feedback_sc_graceful_cancel_first.md` +
`.claude/skills/run-tests/SKILL.md`.
2. *`test_nested_multierrors` hangs indefinitely* —
the real, separate, forkserver-specific bug
captured by this doc.
- `test_cancel_infinite_streamer`
- `test_some_cancels_all`
- `test_nested_multierrors`
- `test_cancel_via_SIGINT`
- `test_cancel_via_SIGINT_other_task`
The two symptoms are unrelated. The tree-kill / setpgrp
fix direction proposed earlier would not help (1)
(SC-graceful-cleanup is the right answer there) and
would not help (2) (the hang is in the cancel cascade,
not in the hard-kill fallback).
This is a **suite-hygiene fix** — it doesn't close the
actual leak, it just contains its blast radius. Zombie
descendants still accumulate per run.
## Symptom
## The real bug (unfixed)
Reproducer (py3.14, clean env):
`subint_forkserver_proc`'s teardown — `_ForkedProc.kill()`
(plain `os.kill(SIGKILL)` to the direct child pid) +
`proc.wait()` — does **not** reap grandchildren or
deeper descendants. When a cancellation test causes a
multi-level actor tree to tear down, the direct child
dies but its own children survive and get reparented to
init (PID 1), where they stay running with their
inherited FDs (including the registry listen socket).
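The PID-scoped-kill gap is easy to demo in isolation — a
minimal POSIX sketch, independent of tractor:

```python
import os
import signal
import time

# Minimal POSIX demo of the teardown gap: SIGKILL-ing the
# direct child does nothing to the child's own child (our
# "grandchild"), which reparents to init and keeps running
# with its inherited FDs.
r, w = os.pipe()
pid = os.fork()
if pid == 0:
    gpid = os.fork()
    if gpid == 0:
        # GRANDCHILD: report our pid to the grandparent, linger.
        os.write(w, str(os.getpid()).encode())
        time.sleep(30)
        os._exit(0)
    time.sleep(30)
    os._exit(0)

os.close(w)
gpid = int(os.read(r, 16))     # grandchild's pid
os.kill(pid, signal.SIGKILL)   # PID-scoped kill: direct child only
os.waitpid(pid, 0)
os.kill(gpid, 0)               # raises ProcessLookupError if dead
survived: bool = True
print('grandchild survived:', survived)
os.kill(gpid, signal.SIGKILL)  # manual cleanup (init reaps it)
```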
**Symptom on repro:**
```
$ ss -tlnp 2>/dev/null | grep ':1616'
LISTEN 0 4096 127.0.0.1:1616 0.0.0.0:* \
users:(("subint-forkserv",pid=211595,fd=17),
("subint-forkserv",pid=211585,fd=17),
("subint-forkserv",pid=211583,fd=17),
("subint-forkserv",pid=211576,fd=17),
("subint-forkserv",pid=211572,fd=17))
$ for p in 211572 211576 211583 211585 211595; do
cat /proc/$p/cmdline | tr '\0' ' '; echo; done
./py314/bin/python -m pytest --spawn-backend=subint_forkserver \
tests/test_cancellation.py --timeout=30 --timeout-method=signal \
--tb=no -q --no-header
... (x5, all same cmdline — inherited from fork)
```
All 5 share the pytest cmdline because `os.fork()`
without `exec()` preserves the parent's argv. Their
comm-name (`subint-forkserv`) is the `thread_name` we
pass to the fork-worker thread in
`tractor.spawn._subint_forkserver.fork_from_worker_thread`.
## Why 5?
Not confirmed; guess is 5 = the parametrize cardinality
of one of the leaky tests (e.g. `test_some_cancels_all`
has 5 parametrize cases). Each param-case spawns a
nested tree; each leaks exactly one descendant. Worth
verifying by running each parametrize-case individually
and counting leaked procs per case.
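That per-case count could be scripted with a tiny parser
over the `users:(...)` field that `ss -tlnp` prints
(hypothetical helper, not part of the test suite):

```python
import re

# Hypothetical helper for the per-case leak count: parse the
# `users:(("subint-forkserv",pid=NNN,fd=17),...)` field from
# `ss -tlnp` output and count distinct holder pids.
def count_listener_pids(ss_output: str) -> int:
    return len(set(re.findall(r'pid=(\d+)', ss_output)))

sample = (
    'LISTEN 0 4096 127.0.0.1:1616 0.0.0.0:* '
    'users:(("subint-forkserv",pid=211595,fd=17),'
    '("subint-forkserv",pid=211572,fd=17))'
)
print(count_listener_pids(sample))  # 2
```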
## Ruled out
- **`:1616` collision from a different repo** (e.g.
piker): `/proc/$pid/cmdline` + `cwd` both resolve to
the tractor repo's `py314/` venv for all 5. These are
definitively spawned by our test run.
- **Parent-side `_ForkedProc.wait()` regressed**: the
direct child's teardown completes cleanly (exit-code
captured, `waitpid` returns); the 5 survivors are
deeper-descendants whose parent-side shim has no
handle on them. So the bug isn't in
`_ForkedProc.wait()` — it's in the lack of tree-level
descendant enumeration + reaping during nursery
teardown.
## Likely fix directions
1. **Process-group-scoped spawn + tree kill.** Put each
forkserver-spawned subactor into its own process
group (`os.setpgrp()` in the fork child), then on
teardown `os.killpg(pgid, SIGKILL)` to reap the
whole tree atomically. Simplest, most surgical.
2. **Subreaper registration.** Use
`PR_SET_CHILD_SUBREAPER` on the tractor root so
orphaned grandchildren reparent to the root rather
than init — then we can `waitpid` them from the
parent-side nursery teardown. More invasive.
3. **Explicit descendant enumeration at teardown.**
In `subint_forkserver_proc`'s finally block, walk
`/proc/<pid>/task/*/children` before issuing SIGKILL
to build a descendant-pid set; then kill + reap all
of them. Fragile (Linux-only, proc-fs-scan race).
Vote: **(1)** — clean, POSIX-standard, and aligned with
how `subprocess.Popen` (and by extension
`trio.lowlevel.open_process`) handles tree-kill
semantics when passed `start_new_session=True`.
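A dependency-free sketch of direction (1) — assumed
shape only, not the actual backend change:

```python
import os
import signal
import time

# Sketch of fix direction (1): the fork-child starts its own
# process group, so one `killpg()` at teardown takes out the
# whole descendant tree instead of just the direct child.
pid = os.fork()
if pid == 0:
    os.setpgrp()            # child now leads a fresh pgrp
    if os.fork() == 0:      # grandchild inherits that pgrp
        time.sleep(30)
        os._exit(0)
    time.sleep(30)
    os._exit(0)

time.sleep(0.2)             # give the child time to setpgrp
os.killpg(pid, signal.SIGKILL)  # pgid == child pid after setpgrp
_, status = os.waitpid(pid, 0)
tree_killed: bool = (
    os.WIFSIGNALED(status)
    and os.WTERMSIG(status) == signal.SIGKILL
)
print('tree killed:', tree_killed)
```

The 0.2s sleep papers over the setpgrp race for the demo;
a real backend would synchronize on a pipe instead.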
## Reproducer
```sh
# preflight: ensure clean env
ss -tlnp 2>/dev/null | grep ':1616' && echo 'FOUL — cleanup first!' || echo 'clean'
# before: ensure clean env
ss -tlnp 2>/dev/null | grep ':1616' || echo 'clean'
./py314/bin/python -m pytest --spawn-backend=subint_forkserver \
'tests/test_cancellation.py::test_nested_multierrors[subint_forkserver]' \
--timeout=30 --timeout-method=thread --tb=short -v
# run the leaky tests
./py314/bin/python -m pytest \
--spawn-backend=subint_forkserver \
tests/test_cancellation.py \
--timeout=30 --timeout-method=signal --tb=no -q --no-header
# observe: 5 leaked children now holding :1616
ss -tlnp 2>/dev/null | grep ':1616'
```
Expected: `pytest-timeout` fires at 30s with a
thread-dump banner, but the process itself **remains
alive after timeout** and doesn't unwedge on subsequent
SIGINT. Requires SIGKILL to reap.
## Evidence (tree structure at hang point)
All 5 processes are kernel-level `S` (sleeping) in
`do_epoll_wait` (trio's event loop waiting on I/O):
Expected output: `subint-forkserv` processes listed as
listeners on `:1616`. Cleanup:
```sh
pkill -9 -f \
"$(pwd)/py314/bin/python.*pytest.*spawn-backend=subint_forkserver"
```
```
PID     PPID    THREADS  NAME             ROLE
333986  1       2        subint-forkserv  pytest main (the test body)
333993  333986  3        subint-forkserv  "child 1" spawner subactor
334003  333993  1        subint-forkserv  grandchild errorer under child-1
334014  333993  1        subint-forkserv  grandchild errorer under child-1
333999  333986  1        subint-forkserv  "child 2" spawner subactor (NO grandchildren!)
```
### Asymmetric tree depth
The test's `spawn_and_error(breadth=2, depth=3)` should
have BOTH direct children spawning 2 grandchildren
each, going 3 levels deep. Reality:
- Child 1 (333993, 3 threads) DID spawn its two
grandchildren as expected — fully booted trio
runtime.
- Child 2 (333999, 1 thread) did NOT spawn any
grandchildren — clearly never completed its
nursery's first `run_in_actor`. Its 1-thread state
suggests the runtime never fully booted (no trio
worker threads for `waitpid`/IPC).
This asymmetry is the key clue: the two direct
children started identically but diverged. Probably a
race around fork-inherited state (listener FDs,
subactor-nursery channel state) that happens to land
differently depending on spawn ordering.
### Parent-side state
Thread-dump of pytest main (333986) at the hang:
- Main trio thread — parked in
`trio._core._io_epoll.get_events` (epoll_wait on
its event loop). Waiting for IPC from children.
- Two trio-cache worker threads — each parked in
`outcome.capture(sync_fn)` calling
`os.waitpid(child_pid, 0)`. These are our
`_ForkedProc.wait()` off-loads. They're waiting for
the direct children to exit — but children are
stuck in their own epoll_wait waiting for IPC from
the parent.
**It's a deadlock, not a leak:** the parent is
correctly running `soft_kill(proc, _ForkedProc.wait,
portal)` (graceful IPC cancel via
`Portal.cancel_actor()`), but the children never
acknowledge the cancel message (or the message never
reaches them through the tangled post-fork IPC).
## What's NOT the cause (ruled out)
- **`_ForkedProc.kill()` only SIGKILLs direct pid /
missing tree-kill**: doesn't apply — we never reach
the hard-kill path. The deadlock is in the graceful
cancel cascade.
- **Port `:1616` contention**: ruled out after the
`reg_addr` fixture-wiring fix; each test session
gets a unique port now.
- **GIL starvation / SIGINT pipe filling** (class-A,
`subint_sigint_starvation_issue.md`): doesn't apply
— each subactor is its own OS process with its own
GIL (not legacy-config subint).
- **Child-side `_trio_main` absorbing KBI**: grep
confirmed; `_trio_main` only catches KBI at the
`trio.run()` callsite, which is reached only if the
trio loop exits normally. The children here never
exit trio.run() — they're wedged inside.
## Hypothesis: FD inheritance across nested forks
`subint_forkserver_proc` calls
`fork_from_worker_thread()` which ultimately does
`os.fork()` from a dedicated worker thread. Standard
Linux/POSIX fork semantics: **the child inherits ALL
open FDs from the parent**, including listener
sockets, epoll fds, trio wakeup pipes, and the
parent's IPC channel sockets.
At root-actor fork-spawn time, the root's IPC server
listener FDs are open in the parent. Those get
inherited by child 1. Child 1 then forkserver-spawns
its OWN subactor (grandchild). The grandchild
inherits FDs from child 1 — but child 1's address
space still contains **the root's IPC listener FDs
too** (inherited at first fork). So the grandchild
has THREE sets of FDs:
1. Its own (created after becoming a subactor).
2. Its direct parent child-1's.
3. The ROOT's (grandparent's) — inherited transitively.
IPC message routing may be ambiguous in this tangled
state. Or a listener socket that the root thinks it
owns is actually open in multiple processes, and
messages sent to it go to an arbitrary one. That
would exactly match the observed "graceful cancel
never propagates".
This hypothesis predicts the bug **scales with fork
depth**: single-level forkserver spawn
(`test_subint_forkserver_spawn_basic`) works
perfectly, but any test that spawns a second level
deadlocks. Matches observations so far.
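The inheritance mechanics are trivially reproducible
outside tractor (minimal POSIX sketch):

```python
import os
import socket

# Minimal demo of the hypothesis' key ingredient: after a
# plain `os.fork()` (no exec), the parent's listening socket
# is open in BOTH processes — exactly how a grandchild ends
# up holding listener FDs it never created.
lsock = socket.socket()
lsock.bind(('127.0.0.1', 0))
lsock.listen()
port: int = lsock.getsockname()[1]

r, w = os.pipe()
pid = os.fork()
if pid == 0:
    # CHILD: `lsock` was inherited wholesale; prove it by
    # reading the bound port straight off the inherited fd.
    os.write(w, str(lsock.getsockname()[1]).encode())
    os._exit(0)

os.close(w)
child_port = int(os.read(r, 16))
os.waitpid(pid, 0)
print(child_port == port)  # same LISTEN socket in two processes
```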
## Fix directions (to validate)
### 1. `close_fds=True` equivalent in `fork_from_worker_thread()`
`subprocess.Popen` / `trio.lowlevel.open_process` have
`close_fds=True` by default on POSIX — they
enumerate open FDs in the child post-fork and close
everything except stdio + any explicitly-passed FDs.
Our raw `os.fork()` doesn't. Adding the equivalent to
our `_worker` prelude would isolate each fork
generation's FD set.
Implementation sketch in
`tractor.spawn._subint_forkserver.fork_from_worker_thread._worker`:
```python
def _worker() -> None:
    pid: int = os.fork()
    if pid == 0:
        # CHILD: close inherited FDs except stdio + the
        # pid-pipe we just opened. NB: a bare
        # `os.closerange(3, soft)` would also close the
        # pid-pipe fds, so honor the keep-set explicitly.
        keep: set[int] = {0, 1, 2, rfd, wfd}
        import resource
        soft, _ = resource.getrlimit(resource.RLIMIT_NOFILE)
        for fd in range(3, soft):  # blunt; or enumerate /proc/self/fd
            if fd in keep:
                continue
            try:
                os.close(fd)
            except OSError:
                pass
        # ... then child_target() as before
```
Problem: overly aggressive — closes FDs the
grandchild might legitimately need (e.g. its parent's
IPC channel for the spawn-spec handshake, if we rely
on that). Needs thought about which FDs are
"inheritable and safe" vs. "inherited by accident".
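One way to make the prelude less blunt — a sketch
assuming Linux's `/proc/self/fd` is available, with a
hypothetical helper name:

```python
import os

# Hypothetical keep-set variant: walk /proc/self/fd so that
# explicitly kept fds (stdio + the pid-pipe) survive, instead
# of close-ranging over everything >= 3. Run it only in a
# freshly-forked child.
def close_all_but(keep: set[int]) -> int:
    closed = 0
    for name in os.listdir('/proc/self/fd'):
        fd = int(name)
        if fd in keep:
            continue
        try:
            os.close(fd)
            closed += 1
        except OSError:
            pass  # fd raced away (e.g. the listdir dirfd itself)
    return closed
```

Which fds belong in `keep` is exactly the open design
question above.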
### 2. Cloexec on tractor's own FDs
Set `FD_CLOEXEC` on tractor-created sockets (listener
sockets, IPC channel sockets, pipes). This flag
causes automatic close on `execve`, but since we
`fork()` without `exec()`, this alone doesn't help.
BUT — combined with a child-side explicit
close-everything-not-cloexec loop, it gives us a way
to mark "my private FDs" vs. "safe to inherit". Most
robust, but requires tractor-wide audit.
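The fork-vs-exec distinction in (2) is checkable
directly (stdlib-only sketch):

```python
import fcntl
import os

# FD_CLOEXEC closes an fd at execve() time, NOT at fork()
# time: after a plain fork the flagged fd is still open in
# the child — which is why cloexec alone can't help a
# fork-without-exec backend, but does work as a
# "mine, don't inherit" marker.
r, w = os.pipe()
flags = fcntl.fcntl(w, fcntl.F_GETFD)
fcntl.fcntl(w, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)

pid = os.fork()
if pid == 0:
    os.write(w, b'open')  # succeeds: fork preserved the cloexec fd
    os._exit(0)

os.waitpid(pid, 0)
os.close(w)
msg = os.read(r, 4)
print(msg)  # b'open'
```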
### 3. Explicit FD cleanup in `_ForkedProc`/`_child_target`
Have `subint_forkserver_proc`'s `_child_target`
closure explicitly close the parent-side IPC listener
FDs before calling `_actor_child_main`. Requires
being able to enumerate "the parent's listener FDs
that the child shouldn't keep" — plausible via
`Actor.ipc_server`'s socket objects.
### 4. Use `os.posix_spawn` with explicit `file_actions`
Instead of raw `os.fork()`, use `os.posix_spawn()`
which supports explicit file-action specifications
(close this FD, dup2 that FD). Cleaner semantics, but
probably incompatible with our "no exec" requirement
(subint_forkserver is a fork-without-exec design).
**Likely correct answer: (3) — targeted FD cleanup
via `actor.ipc_server` handle.** (1) is too blunt,
(2) is too wide-ranging, (4) changes the spawn
mechanism.
## Reproducer (standalone, no pytest)
```python
# save as /tmp/forkserver_nested_hang_repro.py (py3.14+)
import trio, tractor
async def assert_err():
assert 0
async def spawn_and_error(breadth: int = 2, depth: int = 1):
async with tractor.open_nursery() as n:
for i in range(breadth):
if depth > 0:
await n.run_in_actor(
spawn_and_error,
breadth=breadth,
depth=depth - 1,
name=f'spawner_{i}_{depth}',
)
else:
await n.run_in_actor(
assert_err,
name=f'errorer_{i}',
)
async def _main():
async with tractor.open_nursery() as n:
for i in range(2):
await n.run_in_actor(
spawn_and_error,
name=f'top_{i}',
breadth=2,
depth=1,
)
if __name__ == '__main__':
    from tractor.spawn._spawn import try_set_start_method
    try_set_start_method('subint_forkserver')

    async def _timed() -> None:
        # `fail_after` needs a running trio clock, so the
        # timeout must wrap `_main()` *inside* `trio.run()`,
        # not around it.
        with trio.fail_after(20):
            await _main()

    trio.run(_timed)
```
Expected (current): hangs past the 20s deadline — the
children never ack the error-propagation cancel
cascade, so the timeout's cancellation can't unwind
the nurseries. Pattern: top 2 direct children, 4
grandchildren; one errorer deadlocks while trying to
unwind through its parent chain.
After fix: `trio.TooSlowError`-free completion; the
root's `open_nursery` receives the
`BaseExceptionGroup` containing the `AssertionError`
from the errorer and unwinds cleanly.
## Stopgap (landed)
Until the fix lands, `test_nested_multierrors` +
related multi-level-spawn tests can be skip-marked
under `subint_forkserver` via
`@pytest.mark.skipon_spawn_backend('subint_forkserver',
reason='...')`. Cross-ref this doc.
## References
- `tractor/spawn/_subint_forkserver.py::fork_from_worker_thread`
— the primitive whose post-fork FD hygiene is
probably the culprit.
- `tractor/spawn/_subint_forkserver.py::subint_forkserver_proc`
— the backend function that orchestrates the
graceful cancel path hitting this bug.
- `tractor/spawn/_subint_forkserver.py::_ForkedProc`
— the `trio.Process`-compatible shim; NOT the
failing component (confirmed via thread-dump).
- `tests/test_cancellation.py::test_nested_multierrors`
— the test that surfaced the hang.
— the current teardown shim; PID-scoped, not tree-
scoped.
- `tractor/spawn/_subint_forkserver.py::subint_forkserver_proc`
— the spawn backend whose `finally` block needs the
tree-kill fix.
- `tests/test_cancellation.py` — the test module where
  the leak shows up.
- `ai/conc-anal/subint_forkserver_orphan_sigint_hang_issue.md`
— sibling hang class; probably same underlying
fork-FD-inheritance root cause.
— sibling tracker for a different forkserver-teardown
class (orphaned child doesn't respond to SIGINT); may
share root cause with this one once the fix lands.
- tractor issue #379 — subint backend tracking.


@@ -195,69 +195,6 @@ except ImportError:
_has_subints: bool = False
def _close_inherited_fds(
keep: frozenset[int] = frozenset({0, 1, 2}),
) -> int:
'''
Close every open file descriptor in the current process
EXCEPT those in `keep` (default: stdio only).
Intended as the first thing a post-`os.fork()` child runs
after closing any communication pipes it knows about. This
is the fork-child FD hygiene discipline that
`subprocess.Popen(close_fds=True)` applies by default for
its exec-based children, but which we have to implement
ourselves because our `fork_from_worker_thread()` primitive
deliberately does NOT exec.
Why it matters
--------------
Without this, a forkserver-spawned subactor inherits the
parent actor's IPC listener sockets, trio-epoll fd, trio
wakeup-pipe, peer-channel sockets, etc. If that subactor
then itself forkserver-spawns a grandchild, the grandchild
inherits the FDs transitively from *both* its direct
parent AND the root actor; IPC message routing becomes
ambiguous and the cancel cascade deadlocks. See
`ai/conc-anal/subint_forkserver_test_cancellation_leak_issue.md`
for the full diagnosis + the empirical repro.
Fresh children will open their own IPC sockets via
`_actor_child_main()`, so they don't need any of the
parent's FDs.
Returns the count of fds that were successfully closed
— useful for sanity-check logging at callsites.
'''
# Enumerate open fds via `/proc/self/fd` on Linux (the fast +
# precise path); fall back to `RLIMIT_NOFILE` range close on
# other platforms. Matches stdlib
# `subprocess._posixsubprocess.close_fds` strategy.
try:
fd_names: list[str] = os.listdir('/proc/self/fd')
candidates: list[int] = [
int(n) for n in fd_names if n.isdigit()
]
except (FileNotFoundError, PermissionError):
import resource
soft, _ = resource.getrlimit(resource.RLIMIT_NOFILE)
candidates = list(range(3, soft))
closed: int = 0
for fd in candidates:
if fd in keep:
continue
try:
os.close(fd)
closed += 1
except OSError:
# fd was already closed (race with listdir) or
# otherwise unclosable — either is fine.
pass
return closed
def _format_child_exit(
status: int,
) -> str:
@@ -365,13 +302,9 @@ def fork_from_worker_thread(
pid: int = os.fork()
if pid == 0:
# CHILD: close the pid-pipe ends (we don't use
# them here), then scrub ALL other inherited FDs
# so the child starts with a clean slate
# (stdio-only). Critical for multi-level spawn
# trees — see `_close_inherited_fds()` docstring.
# them here), run the user callable if any, exit.
os.close(rfd)
os.close(wfd)
_close_inherited_fds()
rc: int = 0
if child_target is not None:
try: