diff --git a/ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md b/ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md
new file mode 100644
index 00000000..6b2ca06d
--- /dev/null
+++ b/ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md
@@ -0,0 +1,337 @@
+# `os.fork()` from a non-main sub-interpreter aborts the child (CPython refuses post-fork cleanup)
+
+Third `subint`-class analysis in this project. Unlike its
+two siblings (`subint_sigint_starvation_issue.md`,
+`subint_cancel_delivery_hang_issue.md`), this one is not a
+hang — it's a **hard CPython-level refusal** of an
+experimental spawn strategy we wanted to try.
+
+## TL;DR
+
+An in-process sub-interpreter cannot be used as a
+"launchpad" for `os.fork()` on current CPython. The fork
+syscall succeeds in the parent, but the forked CHILD
+process is aborted immediately by CPython's post-fork
+cleanup with:
+
+```
+Fatal Python error: _PyInterpreterState_DeleteExceptMain: not main interpreter
+```
+
+This is enforced by a hard `PyStatus_ERR` gate in
+`Python/pystate.c`. The CPython devs acknowledge the
+fragility with an in-source comment (`// Ideally we could
+guarantee tstate is running main.`) but provide no
+mechanism to satisfy the precondition from user code.
+
+**Implication for tractor**: the `subint_fork` backend
+sketched in `tractor.spawn._subint_fork` is structurally
+dead on current CPython. The submodule is kept as
+documentation of the attempt; `--spawn-backend=subint_fork`
+raises `NotImplementedError` pointing here.
+
+## Context — why we tried this
+
+The motivation is issue #379's "Our own thoughts, ideas
+for `fork()`-workaround/hacks..." section. The existing
+trio-backend (`tractor.spawn._trio.trio_proc`) spawns
+subactors via `trio.lowlevel.open_process()` → ultimately
+`posix_spawn()` or `fork+exec`, from the parent's main
+interpreter that is currently running `trio.run()`. This
+brushes against a known-fragile interaction between
+`trio` and `fork()` tracked in
+[python-trio/trio#1614](https://github.com/python-trio/trio/issues/1614)
+and siblings — mostly mitigated in `tractor`'s case only
+incidentally (we `exec()` immediately post-fork).
+
+The idea was:
+
+1. Create a subint that has *never* imported `trio`.
+2. From a worker thread in that subint, call `os.fork()`.
+3. In the child, `execv()` back into
+   `python -m tractor._child` — same as `trio_proc` does.
+4. The fork is from a trio-free context → trio+fork
+   hazards avoided regardless of downstream behavior.
+
+The parent-side orchestration (`ipc_server.wait_for_peer`,
+`SpawnSpec`, `Portal` yield) would reuse
+`trio_proc`'s flow verbatim, with only the subproc-spawn
+mechanics swapped.
+
+## Symptom
+
+Running the prototype (`tractor.spawn._subint_fork.subint_fork_proc`,
+see git history prior to the stub revert) on py3.14:
+
+```
+Fatal Python error: _PyInterpreterState_DeleteExceptMain: not main interpreter
+Python runtime state: initialized
+
+Current thread 0x00007f6b71a456c0 [subint-fork-lau] (most recent call first):
+ File "