tractor

History

Gud Boi eceed29d4a Pin forkserver hang to pytest `--capture=fd` Sixth and final diagnostic pass — after all 4 cascade fixes landed (FD hygiene, pidfd wait, `_parent_chan_cs` wiring, bounded peer-clear), the actual last gate on `test_nested_multierrors[subint_forkserver]` turned out to be pytest's default `--capture=fd` stdout/stderr capture, not anything in the runtime cascade. Empirical result: `pytest -s` → test PASSES in 6.20s. Default `--capture=fd` → hangs forever. Mechanism: pytest replaces the parent's fds 1,2 with pipe write-ends it reads from. Fork children inherit those pipes (since `_close_inherited_fds` correctly preserves stdio). The error-propagation cascade in a multi-level cancel test generates 7+ actors each logging multiple `RemoteActorError` / `ExceptionGroup` tracebacks — enough output to fill Linux's 64KB pipe buffer. Writes block, subactors can't progress, processes don't exit, `_ForkedProc.wait` hangs. Self-critical aside: I earlier tested w/ and w/o `-s` and both hung, concluding "capture-pipe ruled out". That was wrong — at that time fixes 1-4 weren't all in place, so the test was failing at deeper levels long before reaching the "produce lots of output" phase. Once the cascade could actually tear down cleanly, enough output flowed to hit the pipe limit. Order-of- operations mistake: ruling something out based on a test that was failing for a different reason. Deats, - `subint_forkserver_test_cancellation_leak_issue .md`: new section "Update — VERY late: pytest capture pipe IS the final gate" w/ DIAG timeline showing `trio.run` fully returns, diagnosis of pipe-fill mechanism, retrospective on the earlier wrong ruling-out, and fix direction (redirect subactor stdout/stderr to `/dev/null` in fork-child prelude, conditional on pytest-detection or opt-in flag) - `tests/test_cancellation.py`: skip-mark reason rewritten to describe the capture-pipe gate specifically; cross-refs the new doc section - `tests/spawn/test_subint_forkserver.py`: the orphan-SIGINT test regresses back to xfail. Previously passed after the FD-hygiene fix, but the new `wait_for_no_more_peers( move_on_after=3.0)` bound in `async_main`'s teardown added up to 3s latency, pushing orphan-subactor exit past the test's 10s poll window. Real fix: faster orphan-side teardown OR extend poll window to 15s No runtime code changes in this commit — just test-mark adjustments + doc wrap-up. (this commit msg was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code	2026-04-23 23:18:14 -04:00
..
conc-anal	Pin forkserver hang to pytest `--capture=fd`	2026-04-23 23:18:14 -04:00
prompt-io	Add CPython-level `subint_fork` workaround smoketest	2026-04-23 18:48:34 -04:00

Gud Boi eceed29d4a Pin forkserver hang to pytest `--capture=fd`

Sixth and final diagnostic pass — after all 4
cascade fixes landed (FD hygiene, pidfd wait,
`_parent_chan_cs` wiring, bounded peer-clear), the
actual last gate on
`test_nested_multierrors[subint_forkserver]`
turned out to be **pytest's default
`--capture=fd` stdout/stderr capture**, not
anything in the runtime cascade.

Empirical result: `pytest -s` → test PASSES in
6.20s. Default `--capture=fd` → hangs forever.

Mechanism: pytest replaces the parent's fds 1,2
with pipe write-ends it reads from. Fork children
inherit those pipes (since `_close_inherited_fds`
correctly preserves stdio). The error-propagation
cascade in a multi-level cancel test generates
7+ actors each logging multiple `RemoteActorError`
/ `ExceptionGroup` tracebacks — enough output to
fill Linux's 64KB pipe buffer. Writes block,
subactors can't progress, processes don't exit,
`_ForkedProc.wait` hangs.

Self-critical aside: I earlier tested w/ and w/o
`-s` and both hung, concluding "capture-pipe
ruled out". That was wrong — at that time fixes
1-4 weren't all in place, so the test was
failing at deeper levels long before reaching
the "produce lots of output" phase. Once the
cascade could actually tear down cleanly, enough
output flowed to hit the pipe limit. Order-of-
operations mistake: ruling something out based
on a test that was failing for a different
reason.

Deats,
- `subint_forkserver_test_cancellation_leak_issue
  .md`: new section "Update — VERY late: pytest
  capture pipe IS the final gate" w/ DIAG timeline
  showing `trio.run` fully returns, diagnosis of
  pipe-fill mechanism, retrospective on the
  earlier wrong ruling-out, and fix direction
  (redirect subactor stdout/stderr to `/dev/null`
  in fork-child prelude, conditional on
  pytest-detection or opt-in flag)
- `tests/test_cancellation.py`: skip-mark reason
  rewritten to describe the capture-pipe gate
  specifically; cross-refs the new doc section
- `tests/spawn/test_subint_forkserver.py`: the
  orphan-SIGINT test regresses back to xfail.
  Previously passed after the FD-hygiene fix,
  but the new `wait_for_no_more_peers(
  move_on_after=3.0)` bound in `async_main`'s
  teardown added up to 3s latency, pushing
  orphan-subactor exit past the test's 10s poll
  window. Real fix: faster orphan-side teardown
  OR extend poll window to 15s

No runtime code changes in this commit — just
test-mark adjustments + doc wrap-up.

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code

2026-04-23 23:18:14 -04:00

conc-anal

Pin forkserver hang to pytest `--capture=fd`

2026-04-23 23:18:14 -04:00

prompt-io

Add CPython-level `subint_fork` workaround smoketest

2026-04-23 18:48:34 -04:00