---
model: claude-opus-4-7[1m]
service: claude
session: subints-phase-b2-destroy-race-fix
timestamp: 2026-04-18T04:25:26Z
git_ref: 26fb820
scope: code
substantive: true
raw_file: 20260418T042526Z_26fb820_prompt_io.raw.md
---

## Prompt

Follow-up to Phase B.2 (`5cd6df58`) after the user
observed intermittent mid-suite hangs when running
the tractor test suite under `--spawn-backend=subint`
on py3.14. The specific sequence of prompts over
several turns:

1. User pointed at the `test_context_stream_semantics.py`
   suite as the first thing to make run clean under
   `--spawn-backend=subint`.
2. After a series of `timeout`-terminated runs that
   gave no diagnostic info, user nudged me to stop
   relying on `timeout` and get actual runtime
   diagnostics ("the suite hangs indefinitely, so i
   don't think this `timeout 30` is helping you at
   all.."). Switched to
   `faulthandler.dump_traceback_later(...)` and a
   resource-tracker fixture to rule out leaks.
3. Captured a stack pinning the hang on
   `_interpreters.destroy(interp_id)` in the subint
   teardown finally block.
4. Proposed dedicated-OS-thread fix. User greenlit.
5. Implemented + verified on-worktree; user needed
   to be pointed at the *worktree*'s `./py313` venv
   because bare `pytest` was picking up the main
   repo's venv (running un-patched `_subint.py`) and
   still hanging.

Running theme over the whole exchange: this patch
only closes the *destroy race*. The user and I also
traced through the deeper cancellation story — SIGINT
can't reach subints, legacy-mode shares the GIL,
portal-cancel dies when the IPC channel is already
broken — and agreed the next step is a bounded
hard-kill in `subint_proc`'s teardown plus a
dedicated cancellation test suite. Those land as
separate commits.

## Response summary

Produced the `tractor/spawn/_subint.py` patch landed
as commit `26fb8206` ("Fix subint destroy race via
dedicated OS thread"). One file, +110/-84 LOC.

Mechanism: swap `trio.to_thread.run_sync(_interpreters
.exec, ...)` for a plain `threading.Thread(target=...
, daemon=False)`. The trio thread cache recycles
workers — so the OS thread that ran `_interpreters
.exec()` remained alive in the cache holding a
stale subint tstate, blocking
`_interpreters.destroy()` in the finally indefinitely.
A dedicated one-shot thread exits naturally after
the sync target returns, releasing tstate and
unblocking destroy.

Coordination across the trio↔thread boundary:
- `trio.lowlevel.current_trio_token()` captured at
  `subint_proc` entry
- driver thread signals `subint_exited.set()` back
  to parent trio via `trio.from_thread.run_sync(...,
  trio_token=token)` (synchronous from the thread's
  POV; the call returns after trio has run `.set()`)
- `trio.RunFinishedError` swallowed in that path for
  the process-teardown case where parent trio already
  exited
- teardown `finally` off-loads the sync
  `driver_thread.join()` via `to_thread.run_sync` (a
  cache thread carries no subint tstate — safe)

## Files changed

See `git diff 26fb820~1..26fb820 --stat`:

```
 tractor/spawn/_subint.py | 194 +++++++++++++++++++------------
 1 file changed, 110 insertions(+), 84 deletions(-)
```

Validation:
- `test_parent_cancels[chk_ctx_result_before_exit=True-
  cancel_method=ctx-child_returns_early=False]`
  (the specific test that was hanging for the user)
  — passed in 1.06s.
- Full `tests/test_context_stream_semantics.py` under
  subint — 61 passed in 100.35s (clean-cache re-run:
  100.82s).
- Trio backend regression subset — 69 passed / 1
  skipped / 89.19s — no regressions from this change.

## Files changed

Beyond the `_subint.py` patch, the raw log also
records the cancellation-semantics research that
spanned this conversation but did not ship as code
in *this* commit. Preserving it inline under "Non-
code output" because it directly informs the
Phase B.3 hard-kill impl that will follow (and any
upstream CPython bug reports we end up filing).

## Human edits

None — committed as generated. The commit message
itself was also AI-drafted via `/commit-msg` and
rewrapped via the project's `rewrap.py --width 67`
tooling; user landed it without edits.