Doc ruled-out fix + capture-pipe aside

Two new sections in
`subint_forkserver_test_cancellation_leak_issue.md`
documenting continued investigation of the
`test_nested_multierrors[subint_forkserver]` peer-
channel-loop hang:

1. **"Attempted fix (DID NOT work) — hypothesis
   (3)"**: tried sync-closing peer channels' raw
   socket fds from `_serve_ipc_eps`'s finally block
   (iterate `server._peers`, calling
   `_chan._transport.stream.socket.close()`).
   Theory was that sync
   close would propagate as `EBADF` /
   `ClosedResourceError` into the stuck
   `recv_some()` and unblock it. Result: identical
   hang. Either trio holds an internal fd
   reference that survives external close, or the
   stuck recv isn't even the root blocker. Either
   way: ruled out, experiment reverted, skip-mark
   restored.
2. **"Aside: `-s` flag changes behavior for peer-
   intensive tests"**: noticed
   `test_context_stream_semantics.py` under
   `subint_forkserver` hangs with default
   `--capture=fd` but passes with `-s`
   (`--capture=no`). Working hypothesis: subactors
   inherit pytest's capture pipe (fds 1,2 — which
   `_close_inherited_fds` deliberately preserves);
   verbose subactor logging fills the buffer,
   writes block, deadlock. Fix direction (if
   confirmed): redirect subactor stdout/stderr to
   `/dev/null` or a file in `_actor_child_main`.
   Not a blocker on the main investigation;
   deserves its own mini-tracker.

Both sections are diagnosis-only — no code changes
in this commit.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
subint_forkserver_backend
Gud Boi 2026-04-23 18:10:30 -04:00
parent 76d12060aa
commit 7cd47ef7fb
1 changed files with 56 additions and 0 deletions

@@ -395,6 +395,62 @@ Candidate follow-up experiments:
re-raise means it should still exit. Unless
something higher up swallows it.
### Attempted fix (DID NOT work) — hypothesis (3)
Tried: in `_serve_ipc_eps` finally, after closing
listeners, also iterate `server._peers` and
sync-close each peer channel's underlying stream
socket fd:
```python
for _uid, _chans in list(server._peers.items()):
    for _chan in _chans:
        try:
            _stream = _chan._transport.stream if _chan._transport else None
            if _stream is not None:
                _stream.socket.close()  # sync fd close
        except (AttributeError, OSError):
            pass
```
Theory: closing the socket fd from outside the stuck
recv task would make the recv see EBADF /
ClosedResourceError and unblock.
Result: `test_nested_multierrors[subint_forkserver]`
still hangs identically. Either:
- The sync `socket.close()` doesn't propagate into
trio's in-flight `recv_some()` the way I expected
(trio may hold an internal reference that keeps the
fd open even after an external close), or
- The stuck recv isn't even the root blocker and the
peer handlers never reach the finally for some
reason I haven't understood yet.
Either way, the sync-close hypothesis is **ruled
out**. Reverted the experiment, restored the skip-
mark on the test.
### Aside: `-s` flag changes behavior for peer-intensive tests
While exploring, noticed
`tests/test_context_stream_semantics.py` under
`--spawn-backend=subint_forkserver` hangs with
pytest's default `--capture=fd` but passes with
`-s` (`--capture=no`). Hypothesis (unverified): fork
children inherit pytest's capture pipe for stdout/
stderr (fds 1,2 — we preserve these in
`_close_inherited_fds`). When subactor logging is
verbose, the capture pipe buffer fills, writes block,
child can't progress, deadlock.
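The blocking mechanism in this hypothesis can be reproduced in isolation. A minimal sketch, assuming only that the inherited stdout/stderr behaves like an ordinary OS pipe with no reader draining it (whether pytest's capture target actually does is the unverified part); `fill_pipe_until_full` is a hypothetical helper, not part of the codebase:

```python
import os


def fill_pipe_until_full(chunk: bytes = b"x" * 4096) -> int:
    """Write to a pipe nobody reads until the kernel refuses more data.

    Returns the number of bytes accepted before the pipe was full.
    Hypothetical helper, for illustration only.
    """
    r, w = os.pipe()
    # Non-blocking so a full pipe raises instead of hanging this demo;
    # a subactor writing logs to a blocking fd would stall at this point.
    os.set_blocking(w, False)
    total = 0
    try:
        while True:
            try:
                total += os.write(w, chunk)
            except BlockingIOError:
                # Pipe buffer is full and there is no reader to drain it.
                return total
    finally:
        os.close(r)
        os.close(w)


if __name__ == "__main__":
    print(f"pipe accepted {fill_pipe_until_full()} bytes before filling")
```

The accepted byte count is the pipe's capacity (typically 64 KiB by default on Linux), which is small enough for verbose logging to exhaust quickly if nothing reads the other end.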
If confirmed, fix direction: redirect subactor
stdout/stderr to `/dev/null` (or a file) in
`_actor_child_main` so subactors don't hold pytest's
capture pipe open. Not a blocker on the main
peer-chan-loop investigation; deserves its own mini-
tracker.
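The redirect itself could look roughly like the following. This is a sketch of the general `os.dup2` technique only, assuming a POSIX platform; `redirect_fds` is a hypothetical name, and where exactly it would be called inside `_actor_child_main` is not settled here:

```python
import os


def redirect_fds(fds=(1, 2), target=os.devnull) -> None:
    """Point each fd in `fds` at `target` (hypothetical helper).

    After this call, writes to those fds go to `target` and the
    process no longer holds the original stdout/stderr open.
    """
    target_fd = os.open(target, os.O_WRONLY)
    try:
        for fd in fds:
            # dup2 atomically closes `fd` (if open) and makes it a
            # duplicate of `target_fd`.
            os.dup2(target_fd, fd)
    finally:
        # Drop the temporary fd unless it happens to be one of the
        # fds we were asked to redirect.
        if target_fd not in fds:
            os.close(target_fd)
```

In a real child process, Python-level buffers (`sys.stdout`, `sys.stderr`) should be flushed before the redirect so pending output does not land in the new target. Passing a log file path as `target` instead of `os.devnull` keeps subactor output available for debugging.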
## Stopgap (landed)
`test_nested_multierrors` skip-marked under