Third diagnostic pass on
`test_nested_multierrors[subint_forkserver]` hang.
Two prior hypotheses ruled out + a new, more
specific deadlock shape identified.
Ruled out,
- **capture-pipe fill** (`-s` flag changes test):
retested explicitly — `test_nested_multierrors`
hangs identically with and without `-s`. The
earlier observation was likely a competing
pytest process I had running in another session
holding registry state
- **stuck peer-chan recv that cancel can't
break**: pivot from the prior pass. With
`handle_stream_from_peer` instrumented at ENTER
/ `except trio.Cancelled:` / finally: 40
ENTERs, ZERO `trio.Cancelled` hits. Cancel never
reaches those tasks at all — the recvs are
fine, nothing is telling them to stop
Actual deadlock shape: multi-level mutual wait.
root blocks on spawner.wait()
spawner blocks on grandchild.wait()
grandchild blocks on errorer.wait()
errorer Actor.cancel() ran, but proc
never exits
`Actor.cancel()` fired in 12 PIDs — but NOT in
root + 2 direct spawners. Those 3 have peer
handlers stuck because their own `Actor.cancel()`
never runs, which only runs when the enclosing
`tractor.open_nursery()` exits, which waits on
`_ForkedProc.wait()` for the child pidfd to
signal, which only signals when the child
process fully exits.
Refined question: **why does an errorer process
not exit after its `Actor.cancel()` completes?**
Three hypotheses (unverified):
1. `_parent_chan_cs.cancel()` fires but the
shielded loop's recv is stuck in a way cancel
still can't break
2. `async_main`'s post-cancel unwind has other
tasks in `root_tn` awaiting something that
never arrives (e.g. outbound IPC reply)
3. `os._exit(rc)` in `_worker` never runs because
`_child_target` never returns
Next-session probes (priority order):
1. instrument `_worker`'s fork-child branch —
confirm whether `child_target()` returns /
`os._exit(rc)` is reached for errorer PIDs
2. instrument `async_main`'s final unwind — see
which await in teardown doesn't complete
3. compare under `trio_proc` backend at the
equivalent level to spot divergence
No code changes — diagnosis-only.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code