Doc `subint_fork` as blocked by CPython post-fork

Empirical finding: the WIP `subint_fork_proc` scaffold landed in `cf0e3e6f` does *not* work on current CPython. The `fork()` syscall succeeds in the parent, but the CHILD aborts immediately during `PyOS_AfterFork_Child()` → `_PyInterpreterState_DeleteExceptMain()`, which gates on the current tstate belonging to the main interp; the child dies with `Fatal Python error: not main interpreter`.

CPython devs acknowledge the fragility with an in-source comment (`// Ideally we could guarantee tstate is running main.`) but expose no user-facing hook to satisfy the precondition, so the strategy is structurally dead until upstream changes.

Rather than delete the scaffold, reshape it into a documented dead-end so the next person with this idea lands on the reason rather than rediscovering the same CPython-level refusal.

Deats,
- Move `subint_fork_proc` out of `tractor.spawn._subint` into a new `tractor.spawn._subint_fork` dedicated module (153 LOC). Module + fn docstrings now describe the blockage directly; the fn body is trimmed to a `NotImplementedError` pointing at the analysis doc; no more dead-code `bootstrap` sketch bloating `_subint.py`.
- `_spawn.py`: keep `'subint_fork'` in `SpawnMethodKey` + the `_methods` dispatch so `--spawn-backend=subint_fork` routes to a clean `NotImplementedError` rather than "invalid backend"; comment calls out the blockage. Collapse the duplicate py3.14 feature-gate in `try_set_start_method()` into a combined `case 'subint' | 'subint_fork':` arm.
- New 337-line analysis: `ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md`. Annotated walkthrough from the user-visible fatal error down to the specific `Modules/posixmodule.c` + `Python/pystate.c` source lines enforcing the refusal, plus an upstream-report draft.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code

subint_forkserver_backend
parent eee79a0357
commit 0f48ed2eb9

@ -0,0 +1,337 @@
# `os.fork()` from a non-main sub-interpreter aborts the child (CPython refuses post-fork cleanup)

Third `subint`-class analysis in this project. Unlike its
two siblings (`subint_sigint_starvation_issue.md`,
`subint_cancel_delivery_hang_issue.md`), this one is not a
hang: it's a **hard CPython-level refusal** of an
experimental spawn strategy we wanted to try.

## TL;DR

An in-process sub-interpreter cannot be used as a
"launchpad" for `os.fork()` on current CPython. The fork
syscall succeeds in the parent, but the forked CHILD
process is aborted immediately by CPython's post-fork
cleanup with:

```
Fatal Python error: _PyInterpreterState_DeleteExceptMain: not main interpreter
```

This is enforced by a hard `_PyStatus_ERR` gate in
`Python/pystate.c`. The CPython devs acknowledge the
fragility with an in-source comment (`// Ideally we could
guarantee tstate is running main.`) but provide no
mechanism to satisfy the precondition from user code.

**Implication for tractor**: the `subint_fork` backend
sketched in `tractor.spawn._subint_fork` is structurally
dead on current CPython. The submodule is kept as
documentation of the attempt; `--spawn-backend=subint_fork`
raises `NotImplementedError` pointing here.

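The routing described above can be sketched in a few lines. This is a much-simplified illustration, not tractor's actual code: `SPAWN_METHODS`, `spawn`, and the two `*_stub` coroutines are hypothetical names, and only the analysis-doc path comes from this commit. It shows why keeping the blocked key registered gives a pointed `NotImplementedError` instead of a generic invalid-backend error.

```python
import asyncio
from typing import Awaitable, Callable

# Path taken from this commit; every other name here is illustrative.
ANALYSIS_DOC = 'ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md'


async def trio_proc_stub(name: str) -> str:
    # Stand-in for a backend that actually spawns something.
    return f'spawned {name!r}'


async def subint_fork_proc_stub(name: str) -> str:
    # Blocked backend: stays registered, but fails loudly with a pointer.
    raise NotImplementedError(
        f'`subint_fork` is blocked at the CPython level, see {ANALYSIS_DOC}'
    )


SPAWN_METHODS: dict[str, Callable[[str], Awaitable[str]]] = {
    'trio': trio_proc_stub,
    'subint_fork': subint_fork_proc_stub,
}


async def spawn(backend: str, name: str) -> str:
    try:
        method = SPAWN_METHODS[backend]
    except KeyError:
        # Unregistered keys get the generic error instead.
        raise ValueError(f'invalid backend: {backend!r}') from None
    return await method(name)
```

With this shape, `asyncio.run(spawn('subint_fork', 'a'))` fails with the message that names the analysis doc, while an unknown key such as `'nope'` fails with the generic `ValueError`.
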
## Context: why we tried this

The motivation is issue #379's "Our own thoughts, ideas
for `fork()`-workaround/hacks..." section. The existing
trio-backend (`tractor.spawn._trio.trio_proc`) spawns
subactors via `trio.lowlevel.open_process()` → ultimately
`posix_spawn()` or `fork+exec`, from the parent's main
interpreter that is currently running `trio.run()`. This
brushes against a known-fragile interaction between
`trio` and `fork()` tracked in
[python-trio/trio#1614](https://github.com/python-trio/trio/issues/1614)
and siblings, mostly mitigated in `tractor`'s case only
incidentally (we `exec()` immediately post-fork).

The idea was:

1. Create a subint that has *never* imported `trio`.
2. From a worker thread in that subint, call `os.fork()`.
3. In the child, `execv()` back into
   `python -m tractor._child`, same as `trio_proc` does.
4. The fork is from a trio-free context → trio+fork
   hazards avoided regardless of downstream behavior.

The parent-side orchestration (`ipc_server.wait_for_peer`,
`SpawnSpec`, `Portal` yield) would reuse
`trio_proc`'s flow verbatim, with only the subproc-spawn
mechanics swapped.

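As a rough sketch of steps 2 and 3, the fork-then-exec mechanics look like the following. The helper names `build_child_argv` and `fork_then_exec` are hypothetical; the `--uid`, `--parent_addr`, `--loglevel` and `--asyncio` flags are the ones used by the prototype's bootstrap string. Note that this version forks from whichever interpreter calls it, so it illustrates the exec hand-off only, not the sub-interpreter launchpad.

```python
import os
import sys
from typing import Optional


def build_child_argv(
    uid: tuple,
    parent_addr: tuple,
    loglevel: Optional[str] = None,
    infect_asyncio: bool = False,
) -> list:
    # Same CLI entry point the prototype targets: `python -m tractor._child`.
    argv = [
        sys.executable, '-m', 'tractor._child',
        '--uid', str(uid),
        '--parent_addr', str(parent_addr),
    ]
    if loglevel:
        argv += ['--loglevel', loglevel]
    if infect_asyncio:
        argv.append('--asyncio')
    return argv


def fork_then_exec(argv: list) -> int:
    pid = os.fork()
    if pid == 0:
        # CHILD: replace the whole process image with a fresh program.
        try:
            os.execv(argv[0], argv)
        finally:
            # Only reached if `execv()` itself failed.
            os._exit(127)
    # PARENT: hand the child's PID back to the caller.
    return pid
```

The parent would then wait for the child to connect back over IPC; here the returned PID can simply be passed to `os.waitpid()`.
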
## Symptom

Running the prototype (`tractor.spawn._subint_fork.subint_fork_proc`,
see git history prior to the stub revert) on py3.14:

```
Fatal Python error: _PyInterpreterState_DeleteExceptMain: not main interpreter
Python runtime state: initialized

Current thread 0x00007f6b71a456c0 [subint-fork-lau] (most recent call first):
  File "<script>", line 2 in <module>
<script>:2: DeprecationWarning: This process (pid=802985) is multi-threaded, use of fork() may lead to deadlocks in the child.
```

Key clues:

- The **`DeprecationWarning`** fires in the parent (before
  fork completes): fork *is* executing, we get that far.
- The **`Fatal Python error`** comes from the child: it
  aborts during CPython's post-fork C initialization
  before any user Python runs in the child.
- The thread name `subint-fork-lau[nchpad]` is ours, which
  confirms the fork is being called from the launchpad
  subint's driver thread.

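A standalone reproducer might look roughly like the sketch below. It is untested here and rests on assumptions: that the private `_interpreters` module exposes `create()`, `exec()` and `destroy()` as used (the module is private and may change between releases), and that running the fork on a named worker thread matches the prototype closely enough. `FORK_SCRIPT`, `DRIVER` and `run_repro` are hypothetical names.

```python
import subprocess
import sys
import textwrap

# Runs inside the sub-interpreter. Like the traceback above, the
# `os.fork()` call sits on line 2 of the script.
FORK_SCRIPT = textwrap.dedent('''\
    import os
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: only reached if post-fork cleanup survives
    _, status = os.waitpid(pid, 0)
    print('child exit code:', os.waitstatus_to_exitcode(status))
''')

# Driver program. ASSUMPTION: the private `_interpreters` module and
# its `create()` / `exec()` / `destroy()` functions behave as used here.
DRIVER = textwrap.dedent(f'''\
    import threading
    import _interpreters

    interp_id = _interpreters.create('legacy')
    t = threading.Thread(
        target=_interpreters.exec,
        args=(interp_id, {FORK_SCRIPT!r}),
        name='subint-fork-launchpad',
    )
    t.start()
    t.join()
    _interpreters.destroy(interp_id)
''')


def run_repro() -> subprocess.CompletedProcess:
    # Run in a throwaway process so a fatal error cannot hurt the caller.
    return subprocess.run(
        [sys.executable, '-c', DRIVER],
        capture_output=True,
        text=True,
        timeout=30,
    )
```

Calling `run_repro()` and inspecting `.stdout` / `.stderr` should surface both the parent's view (the reaped exit code) and whatever the child wrote before dying.
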
## CPython source walkthrough

### Call site: `Modules/posixmodule.c:728-793`

The post-fork-child hook CPython runs in the child process:

```c
void
PyOS_AfterFork_Child(void)
{
    PyStatus status;
    _PyRuntimeState *runtime = &_PyRuntime;

    // re-creates runtime->interpreters.mutex (HEAD_UNLOCK)
    status = _PyRuntimeState_ReInitThreads(runtime);
    ...

    PyThreadState *tstate = _PyThreadState_GET();
    _Py_EnsureTstateNotNULL(tstate);

    ...

    // Ideally we could guarantee tstate is running main.  ← !!!
    _PyInterpreterState_ReinitRunningMain(tstate);

    status = _PyEval_ReInitThreads(tstate);
    ...

    status = _PyInterpreterState_DeleteExceptMain(runtime);
    if (_PyStatus_EXCEPTION(status)) {
        goto fatal_error;
    }
    ...

fatal_error:
    Py_ExitStatusException(status);
}
```

The `// Ideally we could guarantee tstate is running
main.` comment is a flashing warning sign: the CPython
devs *know* this path is fragile when fork is called from
a non-main subint, but they've chosen to abort rather than
silently corrupt state. Arguably the right call.

### The refusal: `Python/pystate.c:1035-1075`

```c
/*
 * Delete all interpreter states except the main interpreter.  If there
 * is a current interpreter state, it *must* be the main interpreter.
 */
PyStatus
_PyInterpreterState_DeleteExceptMain(_PyRuntimeState *runtime)
{
    struct pyinterpreters *interpreters = &runtime->interpreters;

    PyThreadState *tstate = _PyThreadState_Swap(runtime, NULL);
    if (tstate != NULL && tstate->interp != interpreters->main) {
        return _PyStatus_ERR("not main interpreter");  // ← our error
    }

    HEAD_LOCK(runtime);
    PyInterpreterState *interp = interpreters->head;
    interpreters->head = NULL;
    while (interp != NULL) {
        if (interp == interpreters->main) {
            interpreters->main->next = NULL;
            interpreters->head = interp;
            interp = interp->next;
            continue;
        }

        // XXX Won't this fail since PyInterpreterState_Clear() requires
        // the "current" tstate to be set?
        PyInterpreterState_Clear(interp);  // XXX must activate?
        zapthreads(interp);
        ...
    }
    ...
}
```

The block comment above the function (`If there is a current
interpreter state, it *must* be the main interpreter.`) is
the formal API contract. The `XXX` comments further in
suggest the CPython team is already aware this function
has latent issues even in the happy path.

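To make the precondition concrete, here is a toy Python model of the gate. It is not CPython code: `Interp`, `ThreadState` and `delete_except_main` are hypothetical stand-ins that mirror only the shape of the check quoted above (refuse when the current tstate belongs to a non-main interpreter, otherwise keep only main).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Interp:
    interp_id: int


@dataclass
class ThreadState:
    interp: Interp


def delete_except_main(
    interpreters: list,
    main: Interp,
    current: Optional[ThreadState],
) -> list:
    # Gate: a current tstate, if any, must belong to the main interpreter.
    if current is not None and current.interp is not main:
        raise RuntimeError('not main interpreter')
    # Happy path: every interpreter except main is dropped.
    return [interp for interp in interpreters if interp is main]
```

In this model a fork issued from the launchpad corresponds to calling `delete_except_main(..., current=ThreadState(subint))`, which is exactly the branch that refuses.
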
## Chain summary

1. Our launchpad subint's driver OS-thread calls
   `os.fork()`.
2. `fork()` succeeds. Child wakes up with:
   - The parent's full memory image (including all
     subints).
   - Only the *calling* thread alive (the driver thread).
   - `_PyThreadState_GET()` on that thread returns the
     **launchpad subint's tstate**, *not* main's.
3. CPython runs `PyOS_AfterFork_Child()`.
4. It reaches `_PyInterpreterState_DeleteExceptMain()`.
5. Gate check fails: `tstate->interp != interpreters->main`.
6. `_PyStatus_ERR("not main interpreter")` → `fatal_error`
   goto → `Py_ExitStatusException()` → child aborts.

Parent-side consequence: `os.fork()` in the subint
bootstrap returned successfully with the child's PID, but
the child died before connecting back. Our parent's
`ipc_server.wait_for_peer(uid)` would hang forever: the
child never gets to `_actor_child_main`.

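The only parent-side signal that the child is gone is its wait status. A minimal sketch of reaping and decoding it, assuming a POSIX platform and Python 3.9+ for `os.waitstatus_to_exitcode()`; the helpers `describe_child_exit` and `spawn_doomed_child` are hypothetical, and the doomed child merely stands in for one killed by the post-fork refusal, whatever its exact exit mechanism.

```python
import os
import signal


def describe_child_exit(pid: int) -> str:
    # Block until the child is gone, then decode how it went.
    _, status = os.waitpid(pid, 0)
    code = os.waitstatus_to_exitcode(status)
    if code < 0:
        # Negative values mean "terminated by signal number -code".
        return f'killed by {signal.Signals(-code).name}'
    return f'exited with code {code}'


def spawn_doomed_child(exit_code: int = 1) -> int:
    # Fork a child that dies at once, before it could ever connect back.
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)
    return pid
```

A spawn backend could race this kind of check against the connect-back wait so that a dead child surfaces as an error rather than an indefinite hang.
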
## Definitive answer to "Open Question 1"

From the (now-stub) `subint_fork_proc` docstring:

> Does CPython allow `os.fork()` from a non-main
> sub-interpreter under the legacy config?

**No.** Not in a usable-by-user-code sense. The fork
syscall is not blocked, but the child cannot survive
CPython's post-fork initialization. This is enforced, not
accidental, and the CPython devs have acknowledged the
fragility in-source.

## What we'd need from CPython to unblock

Any one of these, from least-to-most invasive:

1. **A pre-fork hook mechanism** that lets user code (or
   tractor itself via `os.register_at_fork(before=...)`)
   swap the current tstate to main before fork runs. The
   swap would need to work across the subint→main
   boundary, which is the actual hard part:
   `_PyThreadState_Swap()` exists but is internal.

2. **A `_PyInterpreterState_DeleteExceptFor(tstate->interp)`
   variant** that cleans up all *other* subints while
   preserving the calling subint's state. Lets the child
   continue executing in the subint after fork; a
   subsequent `execv()` clears everything at the OS
   level anyway.

3. **A cleaner error** than `Fatal Python error` aborting
   the child. Even without fixing the underlying
   capability, a raised Python-level exception in the
   parent's `fork()` call (rather than a silent child
   abort) would at least make the failure mode
   debuggable.

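For reference, this is what the existing `os.register_at_fork()` hooks offer today. The sketch below registers all three callbacks and reports which ones fire on each side of an ordinary fork from the main interpreter; `events` and `fork_and_collect` are hypothetical names. The hooks are plain Python callables, so nothing here can swap the thread state, which is the gap option 1 describes.

```python
import os

events: list = []

# Real stdlib API: each keyword takes a zero-argument callable.
os.register_at_fork(
    before=lambda: events.append('before'),
    after_in_parent=lambda: events.append('after_in_parent'),
    after_in_child=lambda: events.append('after_in_child'),
)


def fork_and_collect() -> tuple:
    """Fork once; return (parent_events, child_events)."""
    events.clear()
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # CHILD: report which hooks ran on this side, then exit at once.
        os.close(read_fd)
        os.write(write_fd, '\n'.join(events).encode())
        os._exit(0)
    # PARENT: read the child's report, then reap it.
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        child_events = reader.read().splitlines()
    os.waitpid(pid, 0)
    return list(events), child_events
```

The child's list starts with the `before` entry because that hook ran in the parent and the child inherits the parent's memory.
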
## Upstream-report draft (for CPython issue tracker)

### Title

> `os.fork()` from a non-main sub-interpreter aborts the
> child with a fatal error in `PyOS_AfterFork_Child`; can
> we at least make it a clean `RuntimeError` in the
> parent?

### Body

> **Version**: Python 3.14.x
>
> **Summary**: Calling `os.fork()` from a thread currently
> executing inside a sub-interpreter causes the forked
> child process to abort during CPython's post-fork
> cleanup, with the following output in the child:
>
> ```
> Fatal Python error: _PyInterpreterState_DeleteExceptMain: not main interpreter
> ```
>
> From the **parent's** point of view the fork succeeded
> (returned a valid child PID). The failure is completely
> opaque to parent-side Python code: unless the parent
> does `os.waitpid()` it won't even notice the child
> died.
>
> **Root cause** (as I understand it from reading sources):
> `Modules/posixmodule.c::PyOS_AfterFork_Child()` calls
> `_PyInterpreterState_DeleteExceptMain()` with a
> precondition that `_PyThreadState_GET()->interp` be the
> main interpreter. When `fork()` is called from a thread
> executing inside a subinterpreter, the child wakes up
> with its tstate still pointing at the subint, and the
> gate in `Python/pystate.c:1044-1047` fails.
>
> A comment in the source
> (`Modules/posixmodule.c:753`: `// Ideally we could
> guarantee tstate is running main.`) suggests this is a
> known-fragile path rather than an intentional
> invariant.
>
> **Use case**: I was experimenting with using a
> sub-interpreter as a "fork launchpad": have a subint
> that has never imported `trio`, call `os.fork()` from
> that subint's thread, and in the child `execv()` back
> into a fresh Python interpreter process. The goal was
> to sidestep known issues with `trio` + `fork()`
> interaction (see
> [python-trio/trio#1614](https://github.com/python-trio/trio/issues/1614))
> by guaranteeing the forking context had never been
> "contaminated" by trio's imports or globals. This
> approach would allow `trio`-using applications to
> combine `fork`-based subprocess spawning with
> per-worker `trio.run()` runtimes, a fairly common
> pattern that currently requires workarounds.
>
> **Request**:
>
> Ideally: make fork-from-subint work (e.g., by swapping
> the caller's tstate to main in the pre-fork hook), or
> provide a `_PyInterpreterState_DeleteExceptFor(interp)`
> variant that permits the caller's subint to survive
> post-fork so user code can subsequently `execv()`.
>
> Minimally: convert the fatal child-side abort into a
> clean `RuntimeError` (or similar) raised in the
> parent's `fork()` call. Even if the capability isn't
> expanded, the failure mode should be debuggable by
> user-code in the parent; right now it's a silent
> child death with an error message buried in the
> child's stderr that parent code can't programmatically
> see.
>
> **Related**: PEP 684 (per-interpreter GIL), PEP 734
> (`concurrent.interpreters` public API). The private
> `_interpreters` module is what I used to create the
> launchpad; behavior is the same whether using
> `_interpreters.create('legacy')` or
> `concurrent.interpreters.create()` (the latter was not
> tested but the gate is identical).
>
> Happy to contribute a minimal reproducer + test case if
> this is something the team wants to pursue.

## References

- `Modules/posixmodule.c:728`:
  [`PyOS_AfterFork_Child`](https://github.com/python/cpython/blob/main/Modules/posixmodule.c#L728)
- `Python/pystate.c:1040`:
  [`_PyInterpreterState_DeleteExceptMain`](https://github.com/python/cpython/blob/main/Python/pystate.c#L1040)
- PEP 684 (per-interpreter GIL):
  <https://peps.python.org/pep-0684/>
- PEP 734 (`concurrent.interpreters` public API):
  <https://peps.python.org/pep-0734/>
- [python-trio/trio#1614](https://github.com/python-trio/trio/issues/1614),
  the original motivation for the launchpad idea.
- tractor issue #379: "Our own thoughts, ideas for
  `fork()`-workaround/hacks..." section where this was
  first sketched.
- `tractor.spawn._subint_fork`: in-tree stub preserving
  the attempted impl's shape in git history.

@ -63,6 +63,15 @@ SpawnMethodKey = Literal[
     'mp_spawn',
     'mp_forkserver',  # posix only
     'subint',  # py3.14+ via `concurrent.interpreters` (PEP 734)
+    # EXPERIMENTAL: blocked at the CPython level. The
+    # design goal was a `trio+fork`-safe subproc spawn via
+    # `os.fork()` from a trio-free launchpad sub-interpreter,
+    # but CPython's `PyOS_AfterFork_Child` → `_PyInterpreterState_DeleteExceptMain`
+    # requires fork come from the main interp. See
+    # `tractor.spawn._subint_fork` +
+    # `ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md`
+    # + issue #379 for the full analysis.
+    'subint_fork',
 ]
 _spawn_method: SpawnMethodKey = 'trio'

@ -115,15 +124,13 @@ def try_set_start_method(
         case 'trio':
             _ctx = None

-        case 'subint':
-            # subints need no `mp.context`; feature-gate on the
-            # py3.14 public `concurrent.interpreters` wrapper
-            # (PEP 734). We actually drive the private
-            # `_interpreters` C module in legacy mode, see
-            # `tractor.spawn._subint` for why, but py3.13's
-            # vintage of that private module hangs under our
-            # multi-trio usage, so we refuse it via the public-
-            # module presence check.
+        case 'subint' | 'subint_fork':
+            # Both subint backends need no `mp.context`; both
+            # feature-gate on the py3.14 public
+            # `concurrent.interpreters` wrapper (PEP 734). See
+            # `tractor.spawn._subint` for the detailed
+            # reasoning and the distinction between the two
+            # (`subint_fork` is WIP/experimental).
             from ._subint import _has_subints
             if not _has_subints:
                 raise RuntimeError(

@ -461,6 +468,7 @@ async def new_proc(
 from ._trio import trio_proc
 from ._mp import mp_proc
 from ._subint import subint_proc
+from ._subint_fork import subint_fork_proc


 # proc spawning backend target map

@ -469,4 +477,10 @@ _methods: dict[SpawnMethodKey, Callable] = {
     'mp_spawn': mp_proc,
     'mp_forkserver': mp_proc,
     'subint': subint_proc,
+    # blocked at CPython level, see `_subint_fork.py` +
+    # `ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md`.
+    # Kept here so `--spawn-backend=subint_fork` routes to a
+    # clean `NotImplementedError` with pointer to the analysis,
+    # rather than an "invalid backend" error.
+    'subint_fork': subint_fork_proc,
 }

@ -433,203 +433,3 @@ async def subint_proc(
         actor_nursery._children.pop(uid, None)


-# ============================================================
-# WIP PROTOTYPE: `subint_fork_proc`
-# ============================================================
-# Experimental: use a sub-interpreter purely as a launchpad
-# from which to `os.fork()`, sidestepping the well-known
-# trio+fork issues (python-trio/trio#1614 etc.) by guaranteeing
-# the forking interp hasn't ever imported / run `trio`.
-#
-# The current `tractor.spawn._trio` backend already spawns a
-# subprocess and has the child connect back to the parent
-# over IPC. THIS prototype only changes *how* the subproc
-# comes into existence; everything downstream (parent-side
-# `ipc_server.wait_for_peer()`, `SpawnSpec`, `Portal` yield,
-# soft-kill) is reused verbatim.
-#
-# Reference: issue #379's "Our own thoughts, ideas for
-# fork()-workaround/hacks..." section.
-# ============================================================
-
-
-async def subint_fork_proc(
-    name: str,
-    actor_nursery: ActorNursery,
-    subactor: Actor,
-    errors: dict[tuple[str, str], Exception],
-
-    # passed through to actor main
-    bind_addrs: list[UnwrappedAddress],
-    parent_addr: UnwrappedAddress,
-    _runtime_vars: dict[str, Any],
-    *,
-    infect_asyncio: bool = False,
-    task_status: TaskStatus[Portal] = trio.TASK_STATUS_IGNORED,
-    proc_kwargs: dict[str, any] = {},
-
-) -> None:
-    '''
-    EXPERIMENTAL / WIP: `trio`-safe `fork()` via a pristine
-    sub-interpreter launchpad.
-
-    Core trick
-    ----------
-    Create a fresh subint that has *never* imported `trio`.
-    From a worker thread, drive that subint to call
-    `os.fork()`. In the forked CHILD process, `exec()` back
-    into `python -m tractor._child` (a fresh process). In the
-    fork PARENT (still inside the launchpad subint), do
-    nothing; just let the subint's `exec` call return and
-    the worker thread exit. The parent-side trio task then
-    waits for the child process to connect back using the
-    same `ipc_server.wait_for_peer()` flow as `trio_proc`.
-
-    Why this matters
-    ----------------
-    The existing `trio_proc` backend spawns a subprocess via
-    `trio.lowlevel.open_process()` which ultimately uses
-    `posix_spawn()` (or `fork+exec`) from the parent's main
-    interpreter, the one running `trio.run()`. That path is
-    affected by the trio+fork issues tracked in
-    python-trio/trio#1614 and related, some of which are
-    side-stepped only incidentally because we always `exec()`
-    immediately after fork.
-
-    By forking from a pristine subint instead, we have a
-    known-clean-of-trio fork parent. If we later want to try
-    **fork-without-exec** for faster startup and automatic
-    parent-`__main__` inheritance (the property `mp.fork`
-    gives for free), this approach could unlock that cleanly.
-
-    Relationship to the other backends
-    ----------------------------------
-    - `trio_proc`: fork/exec from main interp → affected by
-      trio+fork issues, solved via immediate exec.
-    - `subint_proc`: in-process subint, no fork at all →
-      affected by shared-GIL abandoned-thread hazards (see
-      `ai/conc-anal/subint_sigint_starvation_issue.md`).
-    - `subint_fork_proc` (THIS): OS-level subproc (like
-      `trio_proc`) BUT forked from a trio-free subint →
-      avoids both issue-classes above, at the cost of an
-      extra subint create/destroy per spawn.
-
-    Status
-    ------
-    **NOT IMPLEMENTED** beyond the bootstrap scaffolding
-    below. Open questions needing empirical validation:
-
-    1. Does CPython allow `os.fork()` from a non-main
-       sub-interpreter under the legacy config? The public
-       API is silent; there may be PEP 684 safety guards.
-    2. Does the forked child need to fully `exec()` or can
-       we stay fork-without-exec and `trio.run()` directly
-       from within the launchpad subint in the child? The
-       latter is the "interesting" mode (faster startup,
-       `__main__` inheritance) but opens the question of
-       what residual state from the parent's main interp
-       leaks into the child's subint.
-    3. How do `signal.set_wakeup_fd()`, installed signal
-       handlers, and other process-global state interact
-       when the forking thread is inside a subint? The
-       child presumably inherits them but a fresh
-       `trio.run()` resets what it cares about.
-
-    '''
-    if not _has_subints:
-        raise RuntimeError(
-            f'The {"subint_fork"!r} spawn backend requires '
-            f'Python 3.14+ (private stdlib `_interpreters` C '
-            f'module + tractor-usage stability).\n'
-            f'Current runtime: {sys.version}'
-        )
-
-    raise NotImplementedError(
-        '`subint_fork_proc` is a WIP prototype scaffold: '
-        'the driver thread + fork-bootstrap + connect-back '
-        'orchestration below is not yet wired up. See '
-        'issue #379 for context.\n'
-        '(Structure kept in-tree so the next iteration has '
-        'a concrete starting point rather than a blank page.)'
-    )
-
-    # ------------------------------------------------------------
-    # SKETCH (below is intentionally dead code; kept so reviewers
-    # can see the shape we'd plausibly build up to). Roughly
-    # mirrors `subint_proc` structure but WITHOUT the in-process
-    # subint lifetime management; the subint only lives long
-    # enough to call `os.fork()`.
-    # ------------------------------------------------------------
-
-    # Create the launchpad subint. Legacy config matches
-    # `subint_proc`'s reasoning (msgspec / PEP 684). For
-    # fork-via-subint, isolation is moot since we don't
-    # *stay* in the subint; we just need it trio-free.
-    interp_id: int = _interpreters.create('legacy')
-    log.runtime(
-        f'Created launchpad subint for fork-spawn\n'
-        f'(>\n'
-        f' |_interp_id={interp_id}\n'
-    )
-
-    uid: tuple[str, str] = subactor.aid.uid
-    loglevel: str | None = subactor.loglevel
-
-    # Bootstrap fires inside the launchpad subint on a
-    # worker OS-thread. Calls `os.fork()`. In the child,
-    # `execv` back into the existing `python -m tractor._child`
-    # CLI entry (which is what `trio_proc` already uses) so
-    # the connect-back dance is identical. In the fork-parent
-    # (still in the launchpad subint), return so the thread
-    # can exit and we can `_interpreters.destroy()` the
-    # launchpad.
-    #
-    # NOTE, `os.execv()` replaces the entire process image
-    # (all interps, all threads; CPython handles this at the
-    # OS level), so subint cleanup in the child is a no-op.
-    import shlex
-    uid_repr: str = repr(str(uid))
-    parent_addr_repr: str = repr(str(parent_addr))
-    bootstrap: str = (
-        'import os, sys\n'
-        'pid = os.fork()\n'
-        'if pid == 0:\n'
-        ' # CHILD: full `exec` into fresh Python for\n'
-        ' # maximum isolation. (A `fork`-without-exec\n'
-        ' # variant would skip this and call\n'
-        ' # `_actor_child_main` directly, see class\n'
-        ' # docstring "Open question 2".)\n'
-        ' os.execv(\n'
-        ' sys.executable,\n'
-        ' [\n'
-        ' sys.executable,\n'
-        " '-m',\n"
-        " 'tractor._child',\n"
-        f' {shlex.quote("--uid")!r},\n'
-        f' {uid_repr},\n'
-        f' {shlex.quote("--parent_addr")!r},\n'
-        f' {parent_addr_repr},\n'
-        + (
-            f' {shlex.quote("--loglevel")!r},\n'
-            f' {loglevel!r},\n'
-            if loglevel else ''
-        )
-        + (
-            f' {shlex.quote("--asyncio")!r},\n'
-            if infect_asyncio else ''
-        )
-        + ' ],\n'
-        ' )\n'
-        '# FORK-PARENT branch falls through; we just want\n'
-        '# the launchpad subint to finish so the driver\n'
-        '# thread exits.\n'
-    )
-
-    # TODO: orchestrate driver thread (mirror `subint_proc`'s
-    # `_subint_target` pattern), then await
-    # `ipc_server.wait_for_peer(uid)` on the parent side,
-    # same as `trio_proc`. Soft-kill path is simpler here
-    # than in `subint_proc`: we're managing an OS subproc,
-    # not a legacy subint, so `Portal.cancel_actor()` + wait
-    # + OS-level `SIGKILL` fallback (like `trio_proc`'s
-    # `hard_kill()`) applies directly.

@ -0,0 +1,153 @@
|
||||||
|
# tractor: structured concurrent "actors".
|
||||||
|
# Copyright 2018-eternity Tyler Goodlet.
|
||||||
|
|
||||||
|
# This program is free software: you can redistribute it and/or modify
|
||||||
|
# it under the terms of the GNU Affero General Public License as published by
|
||||||
|
# the Free Software Foundation, either version 3 of the License, or
|
||||||
|
# (at your option) any later version.
|
||||||
|
|
||||||
|
# This program is distributed in the hope that it will be useful,
|
||||||
|
# but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||||
|
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||||
|
# GNU Affero General Public License for more details.
|
||||||
|
|
||||||
|
# You should have received a copy of the GNU Affero General Public License
|
||||||
|
# along with this program. If not, see <https://www.gnu.org/licenses/>.
|
||||||
|
'''
`subint_fork` spawn backend: BLOCKED at the CPython level.

The idea was to use a sub-interpreter purely as a launchpad
from which to call `os.fork()`, sidestepping the well-known
trio+fork issues (python-trio/trio#1614 etc.) by guaranteeing
the forking interp had never imported `trio`.

**IT DOES NOT WORK ON CURRENT CPYTHON.** The fork syscall
itself succeeds (in the parent), but the forked CHILD
process aborts immediately during CPython's post-fork
cleanup: `PyOS_AfterFork_Child()` calls
`_PyInterpreterState_DeleteExceptMain()`, which refuses to
operate when the current tstate belongs to a non-main
sub-interpreter.

A full annotated walkthrough, from the user-visible error
(`Fatal Python error: _PyInterpreterState_DeleteExceptMain:
not main interpreter`) down to the specific CPython source
lines that enforce the check, is in
`ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md`.

We keep this submodule as dedicated documentation of the
attempt. If CPython ever lifts the restriction (e.g., via a
force-destroy primitive or a hook that swaps tstate to main
pre-fork), the structural sketch preserved in this file's
git history is a concrete starting point for a working impl.

See also: issue #379's "Our own thoughts, ideas for
`fork()`-workaround/hacks..." section.

'''
from __future__ import annotations
import sys
from typing import (
    Any,
    TYPE_CHECKING,
)

import trio
from trio import TaskStatus

from tractor.runtime._portal import Portal
from ._subint import _has_subints


if TYPE_CHECKING:
    from tractor.discovery._addr import UnwrappedAddress
    from tractor.runtime._runtime import Actor
    from tractor.runtime._supervise import ActorNursery


async def subint_fork_proc(
    name: str,
    actor_nursery: ActorNursery,
    subactor: Actor,
    errors: dict[tuple[str, str], Exception],

    bind_addrs: list[UnwrappedAddress],
    parent_addr: UnwrappedAddress,
    _runtime_vars: dict[str, Any],
    *,
    infect_asyncio: bool = False,
    task_status: TaskStatus[Portal] = trio.TASK_STATUS_IGNORED,
    proc_kwargs: dict[str, Any] = {},

) -> None:
    '''
    EXPERIMENTAL: currently blocked by a CPython invariant.

    Attempted design
    ----------------
    1. Parent creates a fresh legacy-config subint.
    2. A worker OS-thread drives the subint through a
       bootstrap that calls `os.fork()`.
    3. In the forked CHILD, `os.execv()` back into
       `python -m tractor._child` (fresh process).
    4. In the fork-PARENT, the launchpad subint is destroyed;
       the parent-side trio task proceeds identically to
       `trio_proc()` (wait for child connect-back, send
       `SpawnSpec`, yield `Portal`, etc.).

    Why it doesn't work
    -------------------
    CPython's `PyOS_AfterFork_Child()` (in
    `Modules/posixmodule.c`) calls
    `_PyInterpreterState_DeleteExceptMain()` (in
    `Python/pystate.c`) as part of post-fork cleanup. That
    function requires that the current `PyThreadState` belong
    to the **main** interpreter. When `os.fork()` is called
    from within a sub-interpreter, the child wakes up with
    its tstate still pointing at the (now-stale) subint, and
    this check fails with `_PyStatus_ERR("not main
    interpreter")`, triggering a `fatal_error` goto and
    aborting the child process.

    CPython devs acknowledge the fragility with a
    `// Ideally we could guarantee tstate is running main.`
    comment right above the call site.

    See
    `ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md`
    for the full annotated walkthrough + upstream-report
    draft.

    Why we keep this stub
    ---------------------
    - Documents the attempt in-tree so the next person who
      has this idea finds the reason it doesn't work rather
      than rediscovering the same CPython-level dead end.
    - If CPython ever lifts the restriction (e.g., via a
      force-destroy primitive or a hook that swaps tstate
      to main pre-fork), this submodule's git history holds
      the structural sketch of what a working impl would
      look like.

    '''
    if not _has_subints:
        raise RuntimeError(
            f'The {"subint_fork"!r} spawn backend requires '
            'Python 3.14+.\n'
            f'Current runtime: {sys.version}'
        )

    raise NotImplementedError(
        'The `subint_fork` spawn backend is blocked at the '
        'CPython level: `os.fork()` from a non-main '
        'sub-interpreter is refused by '
        '`PyOS_AfterFork_Child()` → '
        '`_PyInterpreterState_DeleteExceptMain()`, which '
        'aborts the child with `Fatal Python error: '
        '_PyInterpreterState_DeleteExceptMain: not main interpreter`.\n'
        '\n'
        'See '
        '`ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md` '
        'for the full analysis + upstream-report draft.'
    )
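
Review note: a minimal sketch of how a caller might tell the stub's two refusals apart, assuming only the two-stage guard shown in the diff above. The names `has_subints`, `subint_fork_stub` and `try_spawn` are hypothetical stand-ins rather than tractor APIs, the version check is an assumed approximation of `_has_subints`, and `asyncio` is used only to keep the sketch self-contained (the real function is driven by `trio`).

```python
import asyncio
import sys

# Hypothetical stand-in for `tractor.spawn._subint._has_subints`;
# assumed here to be a plain version check.
has_subints: bool = sys.version_info >= (3, 14)


async def subint_fork_stub(
    name: str,
    *,
    subints_ok: bool = has_subints,
) -> None:
    '''
    Mirror the stub's two-stage refusal: the runtime gate
    first, then the unconditional "blocked by CPython" error.

    '''
    if not subints_ok:
        raise RuntimeError(
            f'The {"subint_fork"!r} spawn backend requires '
            'Python 3.14+.\n'
            f'Current runtime: {sys.version}'
        )
    raise NotImplementedError(
        f'Cannot spawn {name!r}: `subint_fork` is blocked '
        'at the CPython level.'
    )


def try_spawn(
    name: str,
    *,
    subints_ok: bool = has_subints,
) -> str:
    '''
    Drive the stub to completion and report which guard fired.

    '''
    try:
        asyncio.run(subint_fork_stub(name, subints_ok=subints_ok))
    # `NotImplementedError` subclasses `RuntimeError`, so it
    # has to be caught before the broader handler.
    except NotImplementedError:
        return 'blocked'
    except RuntimeError:
        return 'unsupported-runtime'
    return 'spawned'  # unreachable while the stub always raises
```

Because `NotImplementedError` is a subclass of `RuntimeError`, any caller that catches `RuntimeError` first would lump the "blocked upstream" case in with the "runtime too old" case; ordering the handlers as above keeps the two distinguishable.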