From 7cd47ef7fb0ca9e467a87d243eb6c1c700fe3503 Mon Sep 17 00:00:00 2001
From: goodboy
Date: Thu, 23 Apr 2026 18:10:30 -0400
Subject: [PATCH] Doc ruled-out fix + capture-pipe aside
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Two new sections in
`subint_forkserver_test_cancellation_leak_issue.md`
documenting continued investigation of the
`test_nested_multierrors[subint_forkserver]`
peer-channel-loop hang:

1. **"Attempted fix (DID NOT work) — hypothesis (3)"**:
   tried sync-closing peer channels' raw socket fds
   from `_serve_ipc_eps`'s finally block (iterate
   `server._peers`,
   `_chan._transport.stream.socket.close()`). Theory
   was that sync close would propagate as `EBADF` /
   `ClosedResourceError` into the stuck `recv_some()`
   and unblock it. Result: identical hang. Either
   trio holds an internal fd reference that survives
   external close, or the stuck recv isn't even the
   root blocker. Either way: ruled out, experiment
   reverted, skip-mark restored.

2. **"Aside: `-s` flag changes behavior for
   peer-intensive tests"**: noticed
   `test_context_stream_semantics.py` under
   `subint_forkserver` hangs with default
   `--capture=fd` but passes with `-s`
   (`--capture=no`). Working hypothesis: subactors
   inherit pytest's capture pipe (fds 1,2 — which
   `_close_inherited_fds` deliberately preserves);
   verbose subactor logging fills the buffer, writes
   block, deadlock. Fix direction (if confirmed):
   redirect subactor stdout/stderr to `/dev/null` or
   a file in `_actor_child_main`. Not a blocker on
   the main investigation; deserves its own
   mini-tracker.

Both sections are diagnosis-only — no code changes
in this commit.
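For reference, the general mechanism that item 2 relies on can be
sketched with the stdlib alone. This is not tractor or pytest code:
`fill_pipe` and `write_to_devnull` are hypothetical helper names, and
whether the `--capture=fd` target is actually a pipe is part of what
remains unverified. The sketch only shows that an undrained pipe
accepts a finite number of bytes, while `/dev/null` never pushes back:

```python
import os


def fill_pipe(chunk: bytes = b"x" * 4096) -> int:
    """Write to an unread pipe until the kernel buffer is full.

    Returns the number of bytes accepted before a write would block.
    """
    r, w = os.pipe()
    # Non-blocking so that a full pipe raises instead of hanging us.
    os.set_blocking(w, False)
    written = 0
    try:
        while True:
            try:
                written += os.write(w, chunk)
            except BlockingIOError:
                # A *blocking* writer (e.g. a child logging to an
                # inherited fd 1) would stall at this point until
                # something drains the read end.
                return written
    finally:
        os.close(r)
        os.close(w)


def write_to_devnull(nbytes: int, chunk: bytes = b"x" * 4096) -> int:
    """Write at least `nbytes` to /dev/null; the kernel discards them."""
    fd = os.open(os.devnull, os.O_WRONLY)
    written = 0
    try:
        while written < nbytes:
            written += os.write(fd, chunk)
        return written
    finally:
        os.close(fd)


if __name__ == "__main__":
    capacity = fill_pipe()
    print(f"pipe accepted {capacity} bytes before a write would block")
    print(f"/dev/null accepted {write_to_devnull(capacity * 4)} bytes")
```

The exact pipe capacity is platform-dependent, so the sketch reports it
rather than assuming a value.
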
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
---
 ...forkserver_test_cancellation_leak_issue.md | 56 +++++++++++++++++++
 1 file changed, 56 insertions(+)

diff --git a/ai/conc-anal/subint_forkserver_test_cancellation_leak_issue.md b/ai/conc-anal/subint_forkserver_test_cancellation_leak_issue.md
index 8762bba7..f273a304 100644
--- a/ai/conc-anal/subint_forkserver_test_cancellation_leak_issue.md
+++ b/ai/conc-anal/subint_forkserver_test_cancellation_leak_issue.md
@@ -395,6 +395,62 @@ Candidate follow-up experiments:
 re-raise means it should still exit. Unless
 something higher up swallows it.
 
+### Attempted fix (DID NOT work) — hypothesis (3)
+
+Tried: in `_serve_ipc_eps` finally, after closing
+listeners, also iterate `server._peers` and
+sync-close each peer channel's underlying stream
+socket fd:
+
+```python
+for _uid, _chans in list(server._peers.items()):
+    for _chan in _chans:
+        try:
+            _stream = _chan._transport.stream if _chan._transport else None
+            if _stream is not None:
+                _stream.socket.close()  # sync fd close
+        except (AttributeError, OSError):
+            pass
+```
+
+Theory: closing the socket fd from outside the stuck
+recv task would make the recv see EBADF /
+ClosedResourceError and unblock.
+
+Result: `test_nested_multierrors[subint_forkserver]`
+still hangs identically. Either:
+- The sync `socket.close()` doesn't propagate into
+  trio's in-flight `recv_some()` the way I expected
+  (trio may hold an internal reference that keeps the
+  fd open even after an external close), or
+- The stuck recv isn't even the root blocker and the
+  peer handlers never reach the finally for some
+  reason I haven't understood yet.
+
+Either way, the sync-close hypothesis is **ruled
+out**. Reverted the experiment, restored the skip-
+mark on the test.
+
+### Aside: `-s` flag changes behavior for peer-intensive tests
+
+While exploring, noticed
+`tests/test_context_stream_semantics.py` under
+`--spawn-backend=subint_forkserver` hangs with
+pytest's default `--capture=fd` but passes with
+`-s` (`--capture=no`). Hypothesis (unverified): fork
+children inherit pytest's capture pipe for stdout/
+stderr (fds 1,2 — we preserve these in
+`_close_inherited_fds`). When subactor logging is
+verbose, the capture pipe buffer fills, writes block,
+child can't progress, deadlock.
+
+If confirmed, fix direction: redirect subactor
+stdout/stderr to `/dev/null` (or a file) in
+`_actor_child_main` so subactors don't hold pytest's
+capture pipe open. Not a blocker on the main
+peer-chan-loop investigation; deserves its own mini-
+tracker.
+
 ## Stopgap (landed)
 
 `test_nested_multierrors` skip-marked under