Doc `subint_fork` as blocked by CPython post-fork

Empirical finding: the WIP `subint_fork_proc` scaffold
landed in `cf0e3e6f` does *not* work on current CPython.
The `fork()` syscall succeeds in the parent, but the
CHILD aborts immediately during
`PyOS_AfterFork_Child()` →
`_PyInterpreterState_DeleteExceptMain()`, which gates
on the current tstate belonging to the main interp;
the child dies with `Fatal Python error:
_PyInterpreterState_DeleteExceptMain: not main interpreter`.

CPython devs acknowledge the fragility with an in-source
comment (`// Ideally we could guarantee tstate is running
main.`) but expose no user-facing hook to satisfy the
precondition — so the strategy is structurally dead until
upstream changes.

Rather than delete the scaffold, reshape it into a
documented dead-end so the next person with this idea
lands on the reason rather than rediscovering the same
CPython-level refusal.

Deats,
- Move `subint_fork_proc` out of `tractor.spawn._subint`
  into a new `tractor.spawn._subint_fork` dedicated
  module (153 LOC). Module + fn docstrings now describe
  the blockage directly; the fn body is trimmed to a
  `NotImplementedError` pointing at the analysis doc —
  no more dead-code `bootstrap` sketch bloating
  `_subint.py`.
- `_spawn.py`: keep `'subint_fork'` in `SpawnMethodKey`
  + the `_methods` dispatch so
  `--spawn-backend=subint_fork` routes to a clean
  `NotImplementedError` rather than "invalid backend";
  comment calls out the blockage. Collapse the duplicate
  py3.14 feature-gate in `try_set_start_method()` into a
  combined `case 'subint' | 'subint_fork':` arm.
- New 337-line analysis:
  `ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md`.
  Annotated walkthrough from the user-visible fatal
  error down to the specific `Modules/posixmodule.c` +
  `Python/pystate.c` source lines enforcing the refusal,
  plus an upstream-report draft.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
subint_forkserver_backend
Gud Boi 2026-04-22 16:02:01 -04:00
parent eee79a0357
commit 0f48ed2eb9
4 changed files with 513 additions and 209 deletions


@@ -0,0 +1,337 @@
# `os.fork()` from a non-main sub-interpreter aborts the child (CPython refuses post-fork cleanup)
Third `subint`-class analysis in this project. Unlike its
two siblings (`subint_sigint_starvation_issue.md`,
`subint_cancel_delivery_hang_issue.md`), this one is not a
hang — it's a **hard CPython-level refusal** of an
experimental spawn strategy we wanted to try.
## TL;DR
An in-process sub-interpreter cannot be used as a
"launchpad" for `os.fork()` on current CPython. The fork
syscall succeeds in the parent, but the forked CHILD
process is aborted immediately by CPython's post-fork
cleanup with:
```
Fatal Python error: _PyInterpreterState_DeleteExceptMain: not main interpreter
```
This is enforced by a hard `_PyStatus_ERR` gate in
`Python/pystate.c`. The CPython devs acknowledge the
fragility with an in-source comment (`// Ideally we could
guarantee tstate is running main.`) but provide no
mechanism to satisfy the precondition from user code.
**Implication for tractor**: the `subint_fork` backend
sketched in `tractor.spawn._subint_fork` is structurally
dead on current CPython. The submodule is kept as
documentation of the attempt; `--spawn-backend=subint_fork`
raises `NotImplementedError` pointing here.
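The routing described above can be sketched as a dispatch table that keeps the blocked key registered, so a lookup reaches a stub with a useful message instead of failing as an unknown backend. This is a simplified, synchronous sketch: `spawn()`, `METHODS` and the `*_stub` functions are hypothetical names for illustration, not tractor's actual (async) API.

```python
from __future__ import annotations

from typing import Callable

ANALYSIS_DOC = (
    'ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md'
)


def trio_proc_stub(name: str) -> str:
    # Hypothetical stand-in for a working backend.
    return f'spawned {name!r} via trio'


def subint_fork_proc_stub(name: str) -> str:
    # Blocked backend: fail loudly and point at the analysis.
    raise NotImplementedError(
        f'`subint_fork` is blocked at the CPython level; see {ANALYSIS_DOC}'
    )


# Hypothetical dispatch table; the blocked key stays registered.
METHODS: dict[str, Callable[[str], str]] = {
    'trio': trio_proc_stub,
    'subint_fork': subint_fork_proc_stub,
}


def spawn(backend: str, name: str) -> str:
    try:
        target = METHODS[backend]
    except KeyError:
        # Only truly unknown keys are reported as invalid.
        raise ValueError(f'invalid backend: {backend!r}') from None
    return target(name)
```

Calling `spawn('subint_fork', 'worker')` raises `NotImplementedError` naming the analysis doc, while `spawn('bogus', 'worker')` raises `ValueError`.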
## Context — why we tried this
The motivation is issue #379's "Our own thoughts, ideas
for `fork()`-workaround/hacks..." section. The existing
trio-backend (`tractor.spawn._trio.trio_proc`) spawns
subactors via `trio.lowlevel.open_process()` → ultimately
`posix_spawn()` or `fork+exec`, from the parent's main
interpreter that is currently running `trio.run()`. This
brushes against a known-fragile interaction between
`trio` and `fork()` tracked in
[python-trio/trio#1614](https://github.com/python-trio/trio/issues/1614)
and siblings — mostly mitigated in `tractor`'s case only
incidentally (we `exec()` immediately post-fork).
The idea was:
1. Create a subint that has *never* imported `trio`.
2. From a worker thread in that subint, call `os.fork()`.
3. In the child, `execv()` back into
`python -m tractor._child` — same as `trio_proc` does.
4. The fork is from a trio-free context → trio+fork
hazards avoided regardless of downstream behavior.
The parent-side orchestration (`ipc_server.wait_for_peer`,
`SpawnSpec`, `Portal` yield) would reuse
`trio_proc`'s flow verbatim, with only the subproc-spawn
mechanics swapped.
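Steps 2 and 3 can be sketched as a helper that renders the bootstrap source the launchpad subint would have executed. The flag names (`--uid`, `--parent_addr`, `--loglevel`, `--asyncio`) are taken from the removed prototype and are not re-verified against `tractor._child` here; `build_child_args()` and `render_fork_bootstrap()` are hypothetical names. The sketch only builds a string and never forks.

```python
from __future__ import annotations


def build_child_args(
    uid: tuple[str, str],
    parent_addr: tuple[str, int],
    loglevel: str | None = None,
    infect_asyncio: bool = False,
) -> list[str]:
    """Arguments passed to `python` in the child, after `sys.executable`."""
    args = [
        '-m', 'tractor._child',
        '--uid', str(uid),
        '--parent_addr', str(parent_addr),
    ]
    if loglevel:
        args += ['--loglevel', loglevel]
    if infect_asyncio:
        args.append('--asyncio')
    return args


def render_fork_bootstrap(child_args: list[str]) -> str:
    """Render source text for the launchpad; nothing is executed here."""
    return (
        'import os, sys\n'
        'pid = os.fork()\n'
        'if pid == 0:\n'
        '    # CHILD: replace the process image with a fresh Python.\n'
        f'    os.execv(sys.executable, [sys.executable, *{child_args!r}])\n'
        '# FORK-PARENT: fall through so the driver thread can exit.\n'
    )
```

Embedding the argument list via `repr()` keeps the quoting in one place; on current CPython the rendered code would still hit the post-fork abort described in the rest of this document.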
## Symptom
Running the prototype (`tractor.spawn._subint_fork.subint_fork_proc`,
see git history prior to the stub revert) on py3.14:
```
Fatal Python error: _PyInterpreterState_DeleteExceptMain: not main interpreter
Python runtime state: initialized
Current thread 0x00007f6b71a456c0 [subint-fork-lau] (most recent call first):
File "<script>", line 2 in <module>
<script>:2: DeprecationWarning: This process (pid=802985) is multi-threaded, use of fork() may lead to deadlocks in the child.
```
Key clues:
- The **`DeprecationWarning`** fires in the parent (before
fork completes) — fork *is* executing, we get that far.
- The **`Fatal Python error`** comes from the child — it
aborts during CPython's post-fork C initialization
before any user Python runs in the child.
- The thread name `subint-fork-lau[nchpad]` is ours —
confirms the fork is being called from the launchpad
subint's driver thread.
## CPython source walkthrough
### Call site — `Modules/posixmodule.c:728-793`
The post-fork-child hook CPython runs in the child process:
```c
void
PyOS_AfterFork_Child(void)
{
    PyStatus status;
    _PyRuntimeState *runtime = &_PyRuntime;
    // re-creates runtime->interpreters.mutex (HEAD_UNLOCK)
    status = _PyRuntimeState_ReInitThreads(runtime);
    ...
    PyThreadState *tstate = _PyThreadState_GET();
    _Py_EnsureTstateNotNULL(tstate);
    ...
    // Ideally we could guarantee tstate is running main.  ← !!!
    _PyInterpreterState_ReinitRunningMain(tstate);
    status = _PyEval_ReInitThreads(tstate);
    ...
    status = _PyInterpreterState_DeleteExceptMain(runtime);
    if (_PyStatus_EXCEPTION(status)) {
        goto fatal_error;
    }
    ...
fatal_error:
    Py_ExitStatusException(status);
}
```
The `// Ideally we could guarantee tstate is running
main.` comment is a flashing warning sign — the CPython
devs *know* this path is fragile when fork is called from
a non-main subint, but they've chosen to abort rather than
silently corrupt state. Arguably the right call.
### The refusal — `Python/pystate.c:1035-1075`
```c
/*
 * Delete all interpreter states except the main interpreter.  If there
 * is a current interpreter state, it *must* be the main interpreter.
 */
PyStatus
_PyInterpreterState_DeleteExceptMain(_PyRuntimeState *runtime)
{
    struct pyinterpreters *interpreters = &runtime->interpreters;
    PyThreadState *tstate = _PyThreadState_Swap(runtime, NULL);
    if (tstate != NULL && tstate->interp != interpreters->main) {
        return _PyStatus_ERR("not main interpreter");  // ← our error
    }
    HEAD_LOCK(runtime);
    PyInterpreterState *interp = interpreters->head;
    interpreters->head = NULL;
    while (interp != NULL) {
        if (interp == interpreters->main) {
            interpreters->main->next = NULL;
            interpreters->head = interp;
            interp = interp->next;
            continue;
        }
        // XXX Won't this fail since PyInterpreterState_Clear() requires
        // the "current" tstate to be set?
        PyInterpreterState_Clear(interp);  // XXX must activate?
        zapthreads(interp);
        ...
    }
    ...
}
```
The block comment above the function (`If there is a current
interpreter state, it *must* be the main interpreter.`) is
the formal API contract. The `XXX` comments further in
suggest the CPython team is already aware this function
has latent issues even in the happy path.
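To make the gate concrete, here is a toy Python model of the check at the top of `_PyInterpreterState_DeleteExceptMain()`. It mirrors only the control flow quoted above (refuse when the current interpreter is set and is not main, otherwise keep only main); `Interp` and `delete_except_main()` are hypothetical names and none of CPython's real state handling is modelled.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Interp:
    """Toy stand-in for a `PyInterpreterState`."""
    id: int


def delete_except_main(
    interps: list[Interp],
    main: Interp,
    current: Interp | None,
) -> tuple[str | None, list[Interp]]:
    """Return `(error, survivors)` following the quoted C control flow."""
    # Gate: a current interpreter, if any, must be the main one.
    if current is not None and current is not main:
        return 'not main interpreter', interps
    # Happy path: every interpreter except main is dropped.
    return None, [interp for interp in interps if interp is main]
```

With `main = Interp(0)` and `launchpad = Interp(1)`, passing `current=launchpad` yields the `'not main interpreter'` error, which is the situation the forked child is in.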
## Chain summary
1. Our launchpad subint's driver OS-thread calls
`os.fork()`.
2. `fork()` succeeds. Child wakes up with:
- The parent's full memory image (including all
subints).
- Only the *calling* thread alive (the driver thread).
- `_PyThreadState_GET()` on that thread returns the
**launchpad subint's tstate**, *not* main's.
3. CPython runs `PyOS_AfterFork_Child()`.
4. It reaches `_PyInterpreterState_DeleteExceptMain()`.
5. Gate check fails: `tstate->interp != interpreters->main`.
6. `_PyStatus_ERR("not main interpreter")` → `fatal_error`
goto → `Py_ExitStatusException()` → child aborts.
Parent-side consequence: `os.fork()` in the subint
bootstrap returned successfully with the child's PID, but
the child died before connecting back. Our parent's
`ipc_server.wait_for_peer(uid)` would hang forever — the
child never gets to `_actor_child_main`.
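A parent can at least notice such a death by polling the child's PID while it waits for the connect-back. The sketch below is POSIX-only and uses a plain `os.fork()` child that calls `os.abort()` as a stand-in for the fatal error (an assumption, chosen because the failure ends in an abort); `spawn_aborting_child()`, `wait_for_exit()` and `describe_exit()` are hypothetical helpers, not tractor APIs.

```python
from __future__ import annotations

import os
import resource
import signal
import time


def spawn_aborting_child() -> int:
    """Fork a child that aborts at once; return its PID to the parent."""
    pid = os.fork()
    if pid == 0:
        try:
            # CHILD: avoid leaving a core dump behind.
            resource.setrlimit(resource.RLIMIT_CORE, (0, 0))
        finally:
            os.abort()  # raises SIGABRT in the child
    return pid


def wait_for_exit(pid: int, timeout: float = 5.0) -> int | None:
    """Poll `pid` without blocking; return its wait status or None."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        reaped, status = os.waitpid(pid, os.WNOHANG)
        if reaped == pid:
            return status
        time.sleep(0.01)
    return None


def describe_exit(status: int) -> str:
    if os.WIFSIGNALED(status):
        signame = signal.Signals(os.WTERMSIG(status)).name
        return f'killed by signal {signame}'
    return f'exited with code {os.WEXITSTATUS(status)}'


if __name__ == '__main__':
    child_status = wait_for_exit(spawn_aborting_child())
    if child_status is None:
        print('child still running')
    else:
        print(describe_exit(child_status))
```

In an async parent the same poll would run alongside the connect-back wait, so a dead child turns into an error instead of an indefinite hang.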
## Definitive answer to "Open Question 1"
From the (now-stub) `subint_fork_proc` docstring:
> Does CPython allow `os.fork()` from a non-main
> sub-interpreter under the legacy config?
**No.** Not in a usable-by-user-code sense. The fork
syscall is not blocked, but the child cannot survive
CPython's post-fork initialization. This is enforced, not
accidental, and the CPython devs have acknowledged the
fragility in-source.
## What we'd need from CPython to unblock
Any one of these would help; the last is the least invasive:
1. **A pre-fork hook mechanism** that lets user code (or
tractor itself via `os.register_at_fork(before=...)`)
swap the current tstate to main before fork runs. The
swap would need to work across the subint→main
boundary, which is the actual hard part —
`_PyThreadState_Swap()` exists but is internal.
2. **A `_PyInterpreterState_DeleteExceptFor(tstate->interp)`
variant** that cleans up all *other* subints while
preserving the calling subint's state. Lets the child
continue executing in the subint after fork; a
subsequent `execv()` clears everything at the OS
level anyway.
3. **A cleaner error** than `Fatal Python error` aborting
the child. Even without fixing the underlying
capability, a raised Python-level exception in the
parent's `fork()` call (rather than a silent child
abort) would at least make the failure mode
debuggable.
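For reference, this is what the existing Python-level hook from item 1 can do today. `os.register_at_fork()` is the real stdlib API; `fork_and_collect()` and the `events` list are illustrative only. The hooks run ordinary Python callables around the fork, and nothing in this sketch (or, per the text above, in any public Python API) swaps the calling thread's tstate to the main interpreter.

```python
from __future__ import annotations

import os

events: list[str] = []

# Hooks stay registered for the life of the process.
os.register_at_fork(
    before=lambda: events.append('before'),
    after_in_parent=lambda: events.append('after_in_parent'),
    after_in_child=lambda: events.append('after_in_child'),
)


def fork_and_collect() -> tuple[list[str], list[str]]:
    """Fork once; return `(parent_events, child_events)`."""
    events.clear()
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # CHILD: report which hooks ran here, then exit immediately.
        os.close(read_fd)
        os.write(write_fd, ','.join(events).encode())
        os._exit(0)
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        child_events = reader.read().split(',')
    os.waitpid(pid, 0)
    return list(events), child_events
```

The `before` hook runs in the forking thread just before the syscall, which is where a tstate swap would have to happen if CPython exposed one.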
## Upstream-report draft (for CPython issue tracker)
### Title
> `os.fork()` from a non-main sub-interpreter aborts the
> child with a fatal error in `PyOS_AfterFork_Child`; can
> we at least make it a clean `RuntimeError` in the
> parent?
### Body
> **Version**: Python 3.14.x
>
> **Summary**: Calling `os.fork()` from a thread currently
> executing inside a sub-interpreter causes the forked
> child process to abort during CPython's post-fork
> cleanup, with the following output in the child:
>
> ```
> Fatal Python error: _PyInterpreterState_DeleteExceptMain: not main interpreter
> ```
>
> From the **parent's** point of view the fork succeeded
> (returned a valid child PID). The failure is completely
> opaque to parent-side Python code — unless the parent
> does `os.waitpid()` it won't even notice the child
> died.
>
> **Root cause** (as I understand it from reading sources):
> `Modules/posixmodule.c::PyOS_AfterFork_Child()` calls
> `_PyInterpreterState_DeleteExceptMain()` with a
> precondition that `_PyThreadState_GET()->interp` be the
> main interpreter. When `fork()` is called from a thread
> executing inside a subinterpreter, the child wakes up
> with its tstate still pointing at the subint, and the
> gate in `Python/pystate.c:1044-1047` fails.
>
> A comment in the source
> (`Modules/posixmodule.c:753` — `// Ideally we could
> guarantee tstate is running main.`) suggests this is a
> known-fragile path rather than an intentional
> invariant.
>
> **Use case**: I was experimenting with using a
> sub-interpreter as a "fork launchpad" — have a subint
> that has never imported `trio`, call `os.fork()` from
> that subint's thread, and in the child `execv()` back
> into a fresh Python interpreter process. The goal was
> to sidestep known issues with `trio` + `fork()`
> interaction (see
> [python-trio/trio#1614](https://github.com/python-trio/trio/issues/1614))
> by guaranteeing the forking context had never been
> "contaminated" by trio's imports or globals. This
> approach would allow `trio`-using applications to
> combine `fork`-based subprocess spawning with
> per-worker `trio.run()` runtimes — a fairly common
> pattern that currently requires workarounds.
>
> **Request**:
>
> Ideally: make fork-from-subint work (e.g., by swapping
> the caller's tstate to main in the pre-fork hook), or
> provide a `_PyInterpreterState_DeleteExceptFor(interp)`
> variant that permits the caller's subint to survive
> post-fork so user code can subsequently `execv()`.
>
> Minimally: convert the fatal child-side abort into a
> clean `RuntimeError` (or similar) raised in the
> parent's `fork()` call. Even if the capability isn't
> expanded, the failure mode should be debuggable by
> user-code in the parent — right now it's a silent
> child death with an error message buried in the
> child's stderr that parent code can't programmatically
> see.
>
> **Related**: PEP 684 (per-interpreter GIL), PEP 734
> (`concurrent.interpreters` public API). The private
> `_interpreters` module is what I used to create the
> launchpad, via `_interpreters.create('legacy')`. I did
> not test `concurrent.interpreters.create()`; as far as I
> can tell its isolated default config disables fork
> (`allow_fork=0`), so `os.fork()` there should raise a
> `RuntimeError` in the parent instead of reaching the
> post-fork gate.
>
> Happy to contribute a minimal reproducer + test case if
> this is something the team wants to pursue.
## References
- `Modules/posixmodule.c:728`
[`PyOS_AfterFork_Child`](https://github.com/python/cpython/blob/main/Modules/posixmodule.c#L728)
- `Python/pystate.c:1040`
[`_PyInterpreterState_DeleteExceptMain`](https://github.com/python/cpython/blob/main/Python/pystate.c#L1040)
- PEP 684 (per-interpreter GIL):
<https://peps.python.org/pep-0684/>
- PEP 734 (`concurrent.interpreters` public API):
<https://peps.python.org/pep-0734/>
- [python-trio/trio#1614](https://github.com/python-trio/trio/issues/1614)
— the original motivation for the launchpad idea.
- tractor issue #379 — "Our own thoughts, ideas for
`fork()`-workaround/hacks..." section where this was
first sketched.
- `tractor.spawn._subint_fork`: the in-tree stub; the
attempted impl's shape is preserved in its git history.


@@ -63,6 +63,15 @@ SpawnMethodKey = Literal[
     'mp_spawn',
     'mp_forkserver',  # posix only
     'subint',  # py3.14+ via `concurrent.interpreters` (PEP 734)
+    # EXPERIMENTAL — blocked at the CPython level. The
+    # design goal was a `trio+fork`-safe subproc spawn via
+    # `os.fork()` from a trio-free launchpad sub-interpreter,
+    # but CPython's `PyOS_AfterFork_Child` → `_PyInterpreterState_DeleteExceptMain`
+    # requires fork come from the main interp. See
+    # `tractor.spawn._subint_fork` +
+    # `ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md`
+    # + issue #379 for the full analysis.
+    'subint_fork',
 ]
 _spawn_method: SpawnMethodKey = 'trio'
@@ -115,15 +124,13 @@ def try_set_start_method(
         case 'trio':
             _ctx = None
-        case 'subint':
-            # subints need no `mp.context`; feature-gate on the
-            # py3.14 public `concurrent.interpreters` wrapper
-            # (PEP 734). We actually drive the private
-            # `_interpreters` C module in legacy mode — see
-            # `tractor.spawn._subint` for why — but py3.13's
-            # vintage of that private module hangs under our
-            # multi-trio usage, so we refuse it via the public-
-            # module presence check.
+        case 'subint' | 'subint_fork':
+            # Both subint backends need no `mp.context`; both
+            # feature-gate on the py3.14 public
+            # `concurrent.interpreters` wrapper (PEP 734). See
+            # `tractor.spawn._subint` for the detailed
+            # reasoning and the distinction between the two
+            # (`subint_fork` is WIP/experimental).
             from ._subint import _has_subints
             if not _has_subints:
                 raise RuntimeError(
@@ -461,6 +468,7 @@ async def new_proc(
 from ._trio import trio_proc
 from ._mp import mp_proc
 from ._subint import subint_proc
+from ._subint_fork import subint_fork_proc
 # proc spawning backend target map
@@ -469,4 +477,10 @@ _methods: dict[SpawnMethodKey, Callable] = {
     'mp_spawn': mp_proc,
     'mp_forkserver': mp_proc,
     'subint': subint_proc,
+    # blocked at CPython level — see `_subint_fork.py` +
+    # `ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md`.
+    # Kept here so `--spawn-backend=subint_fork` routes to a
+    # clean `NotImplementedError` with pointer to the analysis,
+    # rather than an "invalid backend" error.
+    'subint_fork': subint_fork_proc,
 }


@@ -433,203 +433,3 @@ async def subint_proc(
actor_nursery._children.pop(uid, None)
# ============================================================
# WIP PROTOTYPE — `subint_fork_proc`
# ============================================================
# Experimental: use a sub-interpreter purely as a launchpad
# from which to `os.fork()`, sidestepping the well-known
# trio+fork issues (python-trio/trio#1614 etc.) by guaranteeing
# the forking interp hasn't ever imported / run `trio`.
#
# The current `tractor.spawn._trio` backend already spawns a
# subprocess and has the child connect back to the parent
# over IPC. THIS prototype only changes *how* the subproc
# comes into existence — everything downstream (parent-side
# `ipc_server.wait_for_peer()`, `SpawnSpec`, `Portal` yield,
# soft-kill) is reused verbatim.
#
# Reference: issue #379's "Our own thoughts, ideas for
# fork()-workaround/hacks..." section.
# ============================================================
async def subint_fork_proc(
name: str,
actor_nursery: ActorNursery,
subactor: Actor,
errors: dict[tuple[str, str], Exception],
# passed through to actor main
bind_addrs: list[UnwrappedAddress],
parent_addr: UnwrappedAddress,
_runtime_vars: dict[str, Any],
*,
infect_asyncio: bool = False,
task_status: TaskStatus[Portal] = trio.TASK_STATUS_IGNORED,
proc_kwargs: dict[str, any] = {},
) -> None:
'''
EXPERIMENTAL / WIP: `trio`-safe `fork()` via a pristine
sub-interpreter launchpad.
Core trick
----------
Create a fresh subint that has *never* imported `trio`.
From a worker thread, drive that subint to call
`os.fork()`. In the forked CHILD process, `exec()` back
into `python -m tractor._child` (a fresh process). In the
fork PARENT (still inside the launchpad subint), do
nothing: just let the subint's `exec` call return and
the worker thread exit. The parent-side trio task then
waits for the child process to connect back using the
same `ipc_server.wait_for_peer()` flow as `trio_proc`.
Why this matters
----------------
The existing `trio_proc` backend spawns a subprocess via
`trio.lowlevel.open_process()` which ultimately uses
`posix_spawn()` (or `fork+exec`) from the parent's main
interpreter, the one running `trio.run()`. That path is
affected by the trio+fork issues tracked in
python-trio/trio#1614 and related, some of which are
side-stepped only incidentally because we always `exec()`
immediately after fork.
By forking from a pristine subint instead, we have a
known-clean-of-trio fork parent. If we later want to try
**fork-without-exec** for faster startup and automatic
parent-`__main__` inheritance (the property `mp.fork`
gives for free), this approach could unlock that cleanly.
Relationship to the other backends
----------------------------------
- `trio_proc`: fork/exec from main interp; affected by
trio+fork issues, solved via immediate exec.
- `subint_proc`: in-process subint, no fork at all;
affected by shared-GIL abandoned-thread hazards (see
`ai/conc-anal/subint_sigint_starvation_issue.md`).
- `subint_fork_proc` (THIS): OS-level subproc (like
`trio_proc`) BUT forked from a trio-free subint;
avoids both issue-classes above, at the cost of an
extra subint create/destroy per spawn.
Status
------
**NOT IMPLEMENTED** beyond the bootstrap scaffolding
below. Open questions needing empirical validation:
1. Does CPython allow `os.fork()` from a non-main
sub-interpreter under the legacy config? The public
API is silent; there may be PEP 684 safety guards.
2. Does the forked child need to fully `exec()` or can
we stay fork-without-exec and `trio.run()` directly
from within the launchpad subint in the child? The
latter is the "interesting" mode (faster startup,
`__main__` inheritance) but opens the question of
what residual state from the parent's main interp
leaks into the child's subint.
3. How do `signal.set_wakeup_fd()`, installed signal
handlers, and other process-global state interact
when the forking thread is inside a subint? The
child presumably inherits them but a fresh
`trio.run()` resets what it cares about.
'''
if not _has_subints:
raise RuntimeError(
f'The {"subint_fork"!r} spawn backend requires '
f'Python 3.14+ (private stdlib `_interpreters` C '
f'module + tractor-usage stability).\n'
f'Current runtime: {sys.version}'
)
raise NotImplementedError(
'`subint_fork_proc` is a WIP prototype scaffold — '
'the driver thread + fork-bootstrap + connect-back '
'orchestration below is not yet wired up. See '
'issue #379 for context.\n'
'(Structure kept in-tree so the next iteration has '
'a concrete starting point rather than a blank page.)'
)
# ------------------------------------------------------------
# SKETCH (below is intentionally dead code; kept so reviewers
# can see the shape we'd plausibly build up to). Roughly
# mirrors `subint_proc` structure but WITHOUT the in-process
# subint lifetime management — the subint only lives long
# enough to call `os.fork()`.
# ------------------------------------------------------------
# Create the launchpad subint. Legacy config matches
# `subint_proc`'s reasoning (msgspec / PEP 684). For
# fork-via-subint, isolation is moot since we don't
# *stay* in the subint — we just need it trio-free.
interp_id: int = _interpreters.create('legacy')
log.runtime(
f'Created launchpad subint for fork-spawn\n'
f'(>\n'
f' |_interp_id={interp_id}\n'
)
uid: tuple[str, str] = subactor.aid.uid
loglevel: str | None = subactor.loglevel
# Bootstrap fires inside the launchpad subint on a
# worker OS-thread. Calls `os.fork()`. In the child,
# `execv` back into the existing `python -m tractor._child`
# CLI entry — which is what `trio_proc` already uses — so
# the connect-back dance is identical. In the fork-parent
# (still in the launchpad subint), return so the thread
# can exit and we can `_interpreters.destroy()` the
# launchpad.
#
# NOTE, `os.execv()` replaces the entire process image
# (all interps, all threads — CPython handles this at the
# OS level), so subint cleanup in the child is a no-op.
import shlex
uid_repr: str = repr(str(uid))
parent_addr_repr: str = repr(str(parent_addr))
bootstrap: str = (
'import os, sys\n'
'pid = os.fork()\n'
'if pid == 0:\n'
' # CHILD: full `exec` into fresh Python for\n'
' # maximum isolation. (A `fork`-without-exec\n'
' # variant would skip this and call\n'
' # `_actor_child_main` directly — see class\n'
' # docstring "Open question 2".)\n'
' os.execv(\n'
' sys.executable,\n'
' [\n'
' sys.executable,\n'
" '-m',\n"
" 'tractor._child',\n"
f' {shlex.quote("--uid")!r},\n'
f' {uid_repr},\n'
f' {shlex.quote("--parent_addr")!r},\n'
f' {parent_addr_repr},\n'
+ (
f' {shlex.quote("--loglevel")!r},\n'
f' {loglevel!r},\n'
if loglevel else ''
)
+ (
f' {shlex.quote("--asyncio")!r},\n'
if infect_asyncio else ''
)
+ ' ],\n'
' )\n'
'# FORK-PARENT branch falls through — we just want\n'
'# the launchpad subint to finish so the driver\n'
'# thread exits.\n'
)
# TODO: orchestrate driver thread (mirror `subint_proc`'s
# `_subint_target` pattern), then await
# `ipc_server.wait_for_peer(uid)` on the parent side —
# same as `trio_proc`. Soft-kill path is simpler here
# than in `subint_proc`: we're managing an OS subproc,
# not a legacy subint, so `Portal.cancel_actor()` + wait
# + OS-level `SIGKILL` fallback (like `trio_proc`'s
# `hard_kill()`) applies directly.


@@ -0,0 +1,153 @@
# tractor: structured concurrent "actors".
# Copyright 2018-eternity Tyler Goodlet.
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU Affero General Public License for more details.
# You should have received a copy of the GNU Affero General Public License
# along with this program. If not, see <https://www.gnu.org/licenses/>.
'''
`subint_fork` spawn backend: BLOCKED at CPython level.
The idea was to use a sub-interpreter purely as a launchpad
from which to call `os.fork()`, sidestepping the well-known
trio+fork issues (python-trio/trio#1614 etc.) by guaranteeing
the forking interp had never imported `trio`.
**IT DOES NOT WORK ON CURRENT CPYTHON.** The fork syscall
itself succeeds (in the parent), but the forked CHILD
process aborts immediately during CPython's post-fork
cleanup: `PyOS_AfterFork_Child()` calls
`_PyInterpreterState_DeleteExceptMain()`, which refuses to
operate when the current tstate belongs to a non-main
sub-interpreter.
Full annotated walkthrough from the user-visible error
(`Fatal Python error: _PyInterpreterState_DeleteExceptMain:
not main interpreter`) down to the specific CPython source
lines that enforce this is in
`ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md`.
We keep this submodule as dedicated documentation of the
attempt. If CPython ever lifts the restriction (e.g., via a
force-destroy primitive or a hook that swaps tstate to main
pre-fork), the structural sketch preserved in this file's
git history is a concrete starting point for a working impl.
See also: issue #379's "Our own thoughts, ideas for
`fork()`-workaround/hacks..." section.
'''
from __future__ import annotations
import sys
from typing import (
    Any,
    TYPE_CHECKING,
)

import trio
from trio import TaskStatus

from tractor.runtime._portal import Portal
from ._subint import _has_subints

if TYPE_CHECKING:
    from tractor.discovery._addr import UnwrappedAddress
    from tractor.runtime._runtime import Actor
    from tractor.runtime._supervise import ActorNursery
async def subint_fork_proc(
    name: str,
    actor_nursery: ActorNursery,
    subactor: Actor,
    errors: dict[tuple[str, str], Exception],
    bind_addrs: list[UnwrappedAddress],
    parent_addr: UnwrappedAddress,
    _runtime_vars: dict[str, Any],
    *,
    infect_asyncio: bool = False,
    task_status: TaskStatus[Portal] = trio.TASK_STATUS_IGNORED,
    proc_kwargs: dict[str, Any] = {},
) -> None:
    '''
EXPERIMENTAL: currently blocked by a CPython invariant.
Attempted design
----------------
1. Parent creates a fresh legacy-config subint.
2. A worker OS-thread drives the subint through a
bootstrap that calls `os.fork()`.
3. In the forked CHILD, `os.execv()` back into
`python -m tractor._child` (fresh process).
4. In the fork-PARENT, the launchpad subint is destroyed;
parent-side trio task proceeds identically to
`trio_proc()` (wait for child connect-back, send
`SpawnSpec`, yield `Portal`, etc.).
Why it doesn't work
-------------------
CPython's `PyOS_AfterFork_Child()` (in
`Modules/posixmodule.c`) calls
`_PyInterpreterState_DeleteExceptMain()` (in
`Python/pystate.c`) as part of post-fork cleanup. That
function requires the current `PyThreadState` belong to
the **main** interpreter. When `os.fork()` is called
from within a sub-interpreter, the child wakes up with
its tstate still pointing at the (now-stale) subint, and
this check fails with `_PyStatus_ERR("not main
interpreter")`, triggering a `fatal_error` goto and
aborting the child process.
CPython devs acknowledge the fragility with a
`// Ideally we could guarantee tstate is running main.`
comment a few lines above the call site.
See
`ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md`
for the full annotated walkthrough + upstream-report
draft.
Why we keep this stub
---------------------
- Documents the attempt in-tree so the next person who
has this idea finds the reason it doesn't work rather
than rediscovering the same CPython-level dead end.
- If CPython ever lifts the restriction (e.g., via a
force-destroy primitive or a hook that swaps tstate
to main pre-fork), this submodule's git history holds
the structural sketch of what a working impl would
look like.
    '''
    if not _has_subints:
        raise RuntimeError(
            f'The {"subint_fork"!r} spawn backend requires '
            f'Python 3.14+.\n'
            f'Current runtime: {sys.version}'
        )
    raise NotImplementedError(
        'The `subint_fork` spawn backend is blocked at the '
        'CPython level — `os.fork()` from a non-main '
        'sub-interpreter is refused by '
        '`PyOS_AfterFork_Child()` → '
        '`_PyInterpreterState_DeleteExceptMain()`, which '
        'aborts the child with '
        '`Fatal Python error: '
        '_PyInterpreterState_DeleteExceptMain: not main interpreter`.\n'
        '\n'
        'See '
        '`ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md` '
        'for the full analysis + upstream-report draft.'
    )