---
model: claude-opus-4-7[1m]
service: claude
session: subints-phase-b2-destroy-race-fix
timestamp: 2026-04-18T04:25:26Z
git_ref: 26fb820
scope: code
substantive: true
raw_file: 20260418T042526Z_26fb820_prompt_io.raw.md
---

## Prompt

Follow-up to Phase B.2 (`5cd6df58`) after the user observed intermittent mid-suite hangs when running the tractor test suite under `--spawn-backend=subint` on py3.14. The specific sequence of prompts over several turns:

1. User pointed at the `test_context_stream_semantics.py` suite as the first thing to make run clean under `--spawn-backend=subint`.
2. After a series of `timeout`-terminated runs that gave no diagnostic info, user nudged me to stop relying on `timeout` and get actual runtime diagnostics ("the suite hangs indefinitely, so i don't think this `timeout 30` is helping you at all.."). Switched to `faulthandler.dump_traceback_later(...)` and a resource-tracker fixture to rule out leaks.
3. Captured a stack pinning the hang on `_interpreters.destroy(interp_id)` in the subint teardown `finally` block.
4. Proposed dedicated-OS-thread fix. User greenlit.
5. Implemented + verified on-worktree; user needed to be pointed at the *worktree*'s `./py313` venv because bare `pytest` was picking up the main repo's venv (running un-patched `_subint.py`) and still hanging.

Running theme over the whole exchange: this patch only closes the *destroy race*. The user and I also traced through the deeper cancellation story (SIGINT can't reach subints, legacy-mode shares the GIL, portal-cancel dies when the IPC channel is already broken) and agreed the next step is a bounded hard-kill in `subint_proc`'s teardown plus a dedicated cancellation test suite. Those land as separate commits.

## Response summary

Produced the `tractor/spawn/_subint.py` patch landed as commit `26fb8206` ("Fix subint destroy race via dedicated OS thread"). One file, +110/-84 LOC.

Mechanism: swap `trio.to_thread.run_sync(_interpreters.exec, ...)` for a plain `threading.Thread(target=..., daemon=False)`. The trio thread cache recycles workers, so the OS thread that ran `_interpreters.exec()` remained alive in the cache holding a stale subint tstate, blocking `_interpreters.destroy()` in the `finally` indefinitely. A dedicated one-shot thread exits naturally after the sync target returns, releasing its tstate and unblocking destroy.

Coordination across the trio↔thread boundary:

- `trio.lowlevel.current_trio_token()` captured at `subint_proc` entry
- driver thread signals `subint_exited.set()` back to parent trio via `trio.from_thread.run_sync(..., trio_token=token)` (synchronous from the thread's POV; the call returns after trio has run `.set()`)
- `trio.RunFinishedError` swallowed in that path for the process-teardown case where parent trio already exited
- teardown `finally` off-loads the sync `driver_thread.join()` via `to_thread.run_sync` (a cache thread carries no subint tstate, so this is safe)

## Files changed

See `git diff 26fb820~1..26fb820 --stat`:

```
 tractor/spawn/_subint.py | 194 +++++++++++++++++++------------
 1 file changed, 110 insertions(+), 84 deletions(-)
```

Validation:

- `test_parent_cancels[chk_ctx_result_before_exit=True-cancel_method=ctx-child_returns_early=False]` (the specific test that was hanging for the user): passed in 1.06s.
- Full `tests/test_context_stream_semantics.py` under subint: 61 passed in 100.35s (clean-cache re-run: 100.82s).
- Trio backend regression subset: 69 passed / 1 skipped / 89.19s, with no regressions from this change.

## Non-code output

Beyond the `_subint.py` patch, the raw log also records the cancellation-semantics research that spanned this conversation but did not ship as code in *this* commit. Preserving it inline under "Non-code output" because it directly informs the Phase B.3 hard-kill impl that will follow (and any upstream CPython bug reports we end up filing).

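The thread-lifetime point behind the mechanism in the response summary can be illustrated with a stdlib-only sketch. This is not the code in `_subint.py`: `ThreadPoolExecutor` is used here purely as an assumed stand-in for trio's worker-thread cache, and all function names are hypothetical. It only shows that a pooled worker stays alive after its task returns, while a dedicated `threading.Thread` is gone once joined.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def record_current_thread(seen: list) -> None:
    """Hypothetical stand-in for the blocking call; notes which OS thread ran it."""
    seen.append(threading.current_thread())


def worker_alive_after_pooled_call() -> bool:
    """Run the target on a pooled worker; report whether the worker outlives the call."""
    seen: list = []
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        pool.submit(record_current_thread, seen).result()
        # The worker is parked waiting for more work, so it is still alive
        # (and would still hold any per-thread state it picked up).
        return seen[0].is_alive()
    finally:
        pool.shutdown(wait=True)


def worker_alive_after_dedicated_call() -> bool:
    """Run the target on a one-shot thread; report whether it outlives the join."""
    seen: list = []
    thread = threading.Thread(
        target=record_current_thread,
        args=(seen,),
        daemon=False,
    )
    thread.start()
    thread.join()
    # The one-shot thread has exited, so nothing lingers after the join.
    return seen[0].is_alive()
```

Calling `worker_alive_after_pooled_call()` and `worker_alive_after_dedicated_call()` side by side shows the difference in lifetime; the trio-specific signalling (`current_trio_token()`, `from_thread.run_sync`) is deliberately left out of this sketch.
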
## Human edits

None; committed as generated. The commit message itself was also AI-drafted via `/commit-msg` and rewrapped via the project's `rewrap.py --width 67` tooling; user landed it without edits.