Gud Boi 4d0555435b Narrow forkserver hang to `async_main` outer tn
Fourth diagnostic pass — instrument `_worker`'s
fork-child branch (`pre child_target()` /
`child_target RETURNED rc=N` /
`about to os._exit(rc)`) and the `_trio_main`
boundaries (`about to trio.run` /
`trio.run RETURNED NORMALLY` / `FINALLY`).
Test config: depth=1/breadth=2 = 1 root +
14 forked = 15 actors total.
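
The probes above can be sketched as a tiny helper
(a hypothetical `diag()` and fork-child wrapper;
not the actual `tractor` code):

```python
import os
import sys
import time

def diag(msg: str) -> None:
    # Pid-tagged, immediately-flushed diagnostic line: if the
    # process later wedges, everything logged so far is already
    # out on stderr.
    sys.stderr.write(f"DIAG[{os.getpid()} +{time.monotonic():.3f}] {msg}\n")
    sys.stderr.flush()

def run_forked_child(child_target, *args) -> None:
    # Mirrors the fork-child probes described above: bracket the
    # target call and the final hard exit.
    diag("pre child_target()")
    rc = child_target(*args)
    diag(f"child_target RETURNED rc={rc}")
    diag(f"about to os._exit({rc})")
    os._exit(rc)
```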

Fresh-run results:
- **9 processes complete the full flow**:
  `trio.run RETURNED NORMALLY` →
  `child_target RETURNED rc=0` → `os._exit(0)`.
  These are the tree LEAVES (errorers) plus
  their direct parents (depth-0 spawners) —
  they actually exit.
- **5 processes stuck INSIDE `trio.run(trio_main)`**:
  they hit "about to trio.run" but never log
  "trio.run RETURNED NORMALLY". These are the
  root + the top-level spawners + one
  intermediate spawner.

The deadlock is in `async_main` itself, NOT the
peer-channel loops. Specifically, the outer
`async with root_tn:` in `async_main` never exits
for the 5 stuck actors, so the cascade wedges:

    trio.run never returns
      → _trio_main finally never runs
        → _worker never reaches os._exit(rc)
          → process never dies
            → parent's _ForkedProc.wait() blocks
              → parent's nursery hangs
                → parent's async_main hangs
                  → (recurse up)

The precise new question: **what task in the 5
stuck actors' `async_main` never completes?**
Candidates:
1. the shielded parent-chan `process_messages`
   task in `root_tn` — but we only cancel it via
   `_parent_chan_cs.cancel()` in `Actor.cancel()`,
   which runs during
   `open_root_actor.__aexit__`, which itself runs
   only after `async_main`'s outer unwind — and
   that unwind never happens here. So the shielded
   task is never cancelled on this path, and it
   keeps `root_tn` open.
2. `actor_nursery._join_procs.wait()` or similar
   inline in the backend `*_proc` flow.
3. `_ForkedProc.wait()` on a grandchild that DID
   exit — but pidfd_open watch didn't fire (race
   between `pidfd_open` and the child exiting?).

Most specific next probe: add DIAG around
`_ForkedProc.wait()` enter/exit to see whether
pidfd-based wait returns for every grandchild
exit. If a stuck parent's `_ForkedProc.wait()`
never returns despite its child exiting → pidfd
mechanism has a race bug under nested forkserver.
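
The proposed probe could be a thin decorator
wrapped around `_ForkedProc.wait` (hypothetical
names; sketch only) — a wedged wait then shows an
ENTER line with no matching EXIT:

```python
import functools

def trace_wait(wait_fn):
    # Hypothetical DIAG wrapper for an async `wait()` method: an
    # actor stuck inside the wait prints ENTER but never EXIT.
    @functools.wraps(wait_fn)
    async def wrapped(self, *args, **kwargs):
        print(f"DIAG _ForkedProc.wait ENTER pid={self.pid}", flush=True)
        rc = await wait_fn(self, *args, **kwargs)
        print(f"DIAG _ForkedProc.wait EXIT pid={self.pid} rc={rc}", flush=True)
        return rc
    return wrapped
```

It could be armed from a debug branch or conftest
as `_ForkedProc.wait = trace_wait(_ForkedProc.wait)`.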

Asymmetry observed in the cascade tree: some
depth-0 spawners exit cleanly while others stick,
even though they started identically. So the hang
is not purely depth-determined — it looks like a
race in nursery teardown when multiple siblings
error simultaneously.

No code changes — diagnosis-only.

(this commit msg was generated in part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-23 21:36:19 -04:00