From 797f57ce7b7fa118a3cc819a22b8e5cd07c9e7cf Mon Sep 17 00:00:00 2001
From: goodboy
Date: Wed, 22 Apr 2026 16:02:01 -0400
Subject: [PATCH] Doc `subint_fork` as blocked by CPython post-fork
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Empirical finding: the WIP `subint_fork_proc` scaffold landed in
`cf0e3e6f` does *not* work on current CPython. The `fork()` syscall
succeeds in the parent, but the CHILD aborts immediately during
`PyOS_AfterFork_Child()` → `_PyInterpreterState_DeleteExceptMain()`,
which gates on the current tstate belonging to the main interp; the
child dies with `Fatal Python error: not main interpreter`.

CPython devs acknowledge the fragility with an in-source comment
(`// Ideally we could guarantee tstate is running main.`) but expose
no user-facing hook to satisfy the precondition, so the strategy is
structurally dead until upstream changes.

Rather than delete the scaffold, reshape it into a documented dead-end
so the next person with this idea lands on the reason rather than
rediscovering the same CPython-level refusal.

Deats,
- Move `subint_fork_proc` out of `tractor.spawn._subint` into a new
  `tractor.spawn._subint_fork` dedicated module (153 LOC). Module + fn
  docstrings now describe the blockage directly; the fn body is
  trimmed to a `NotImplementedError` pointing at the analysis doc, so
  there's no more dead-code `bootstrap` sketch bloating `_subint.py`.
- `_spawn.py`: keep `'subint_fork'` in `SpawnMethodKey` + the `_methods`
  dispatch so `--spawn-backend=subint_fork` routes to a clean
  `NotImplementedError` rather than "invalid backend"; a comment calls
  out the blockage. Collapse the duplicate py3.14 feature-gate in
  `try_set_start_method()` into a combined
  `case 'subint' | 'subint_fork':` arm.
- New 337-line analysis:
  `ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md`.
  Annotated walkthrough from the user-visible fatal error down to the
  specific `Modules/posixmodule.c` + `Python/pystate.c` source lines
  enforcing the refusal, plus an upstream-report draft.

(this patch was generated in some part by [`claude-code`][claude-code-gh])

[claude-code-gh]: https://github.com/anthropics/claude-code
---
 ...fork_blocked_by_cpython_post_fork_issue.md | 337 ++++++++++++++++++
 tractor/spawn/_spawn.py                       |  32 +-
 tractor/spawn/_subint.py                      | 200 -----------
 tractor/spawn/_subint_fork.py                 | 153 ++++++++
 4 files changed, 513 insertions(+), 209 deletions(-)
 create mode 100644 ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md
 create mode 100644 tractor/spawn/_subint_fork.py

diff --git a/ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md b/ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md
new file mode 100644
index 00000000..6b2ca06d
--- /dev/null
+++ b/ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md
@@ -0,0 +1,337 @@
+# `os.fork()` from a non-main sub-interpreter aborts the child (CPython refuses post-fork cleanup)
+
+Third `subint`-class analysis in this project. Unlike its
+two siblings (`subint_sigint_starvation_issue.md`,
+`subint_cancel_delivery_hang_issue.md`), this one is not a
+hang — it's a **hard CPython-level refusal** of an
+experimental spawn strategy we wanted to try.
+
+## TL;DR
+
+An in-process sub-interpreter cannot be used as a
+"launchpad" for `os.fork()` on current CPython. The fork
+syscall succeeds in the parent, but the forked CHILD
+process is aborted immediately by CPython's post-fork
+cleanup with:
+
+```
+Fatal Python error: _PyInterpreterState_DeleteExceptMain: not main interpreter
+```
+
+This is enforced by a hard `PyStatus_ERR` gate in
+`Python/pystate.c`. The CPython devs acknowledge the
+fragility with an in-source comment (`// Ideally we could
+guarantee tstate is running main.`) but provide no
+mechanism to satisfy the precondition from user code.
+ +**Implication for tractor**: the `subint_fork` backend +sketched in `tractor.spawn._subint_fork` is structurally +dead on current CPython. The submodule is kept as +documentation of the attempt; `--spawn-backend=subint_fork` +raises `NotImplementedError` pointing here. + +## Context — why we tried this + +The motivation is issue #379's "Our own thoughts, ideas +for `fork()`-workaround/hacks..." section. The existing +trio-backend (`tractor.spawn._trio.trio_proc`) spawns +subactors via `trio.lowlevel.open_process()` → ultimately +`posix_spawn()` or `fork+exec`, from the parent's main +interpreter that is currently running `trio.run()`. This +brushes against a known-fragile interaction between +`trio` and `fork()` tracked in +[python-trio/trio#1614](https://github.com/python-trio/trio/issues/1614) +and siblings — mostly mitigated in `tractor`'s case only +incidentally (we `exec()` immediately post-fork). + +The idea was: + +1. Create a subint that has *never* imported `trio`. +2. From a worker thread in that subint, call `os.fork()`. +3. In the child, `execv()` back into + `python -m tractor._child` — same as `trio_proc` does. +4. The fork is from a trio-free context → trio+fork + hazards avoided regardless of downstream behavior. + +The parent-side orchestration (`ipc_server.wait_for_peer`, +`SpawnSpec`, `Portal` yield) would reuse +`trio_proc`'s flow verbatim, with only the subproc-spawn +mechanics swapped. + +## Symptom + +Running the prototype (`tractor.spawn._subint_fork.subint_fork_proc`, +see git history prior to the stub revert) on py3.14: + +``` +Fatal Python error: _PyInterpreterState_DeleteExceptMain: not main interpreter +Python runtime state: initialized + +Current thread 0x00007f6b71a456c0 [subint-fork-lau] (most recent call first): + File "