Doc `subint_fork` as blocked by CPython post-fork

Empirical finding: the WIP `subint_fork_proc` scaffold landed in
`cf0e3e6f` does *not* work on current CPython. The `fork()` syscall
succeeds in the parent, but the CHILD aborts immediately during
`PyOS_AfterFork_Child()` → `_PyInterpreterState_DeleteExceptMain()`,
which gates on the current tstate belonging to the main interp — the
child dies with `Fatal Python error: not main interpreter`.

CPython devs acknowledge the fragility with an in-source comment
(`// Ideally we could guarantee tstate is running main.`) but expose
no user-facing hook to satisfy the precondition — so the strategy is
structurally dead until upstream changes.

Rather than delete the scaffold, reshape it into a documented
dead-end so the next person with this idea lands on the reason rather
than rediscovering the same CPython-level refusal.

Deats,
- Move `subint_fork_proc` out of `tractor.spawn._subint` into a new
  `tractor.spawn._subint_fork` dedicated module (153 LOC). Module + fn
  docstrings now describe the blockage directly; the fn body is
  trimmed to a `NotImplementedError` pointing at the analysis doc — no
  more dead-code `bootstrap` sketch bloating `_subint.py`.
- `_spawn.py`: keep `'subint_fork'` in `SpawnMethodKey` + the
  `_methods` dispatch so `--spawn-backend=subint_fork` routes to a
  clean `NotImplementedError` rather than "invalid backend"; comment
  calls out the blockage. Collapse the duplicate py3.14 feature-gate
  in `try_set_start_method()` into a combined
  `case 'subint' | 'subint_fork':` arm.
- New 337-line analysis:
  `ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md`.
  Annotated walkthrough from the user-visible fatal error down to the
  specific `Modules/posixmodule.c` + `Python/pystate.c` source lines
  enforcing the refusal, plus an upstream-report draft.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code

subint_forkserver_backend
parent eee79a0357
commit 0f48ed2eb9

@@ -0,0 +1,337 @@
# `os.fork()` from a non-main sub-interpreter aborts the child (CPython refuses post-fork cleanup)

Third `subint`-class analysis in this project. Unlike its
two siblings (`subint_sigint_starvation_issue.md`,
`subint_cancel_delivery_hang_issue.md`), this one is not a
hang — it's a **hard CPython-level refusal** of an
experimental spawn strategy we wanted to try.

## TL;DR

An in-process sub-interpreter cannot be used as a
"launchpad" for `os.fork()` on current CPython. The fork
syscall succeeds in the parent, but the forked CHILD
process is aborted immediately by CPython's post-fork
cleanup with:

```
Fatal Python error: _PyInterpreterState_DeleteExceptMain: not main interpreter
```

This is enforced by a hard `_PyStatus_ERR` gate in
`Python/pystate.c`. The CPython devs acknowledge the
fragility with an in-source comment (`// Ideally we could
guarantee tstate is running main.`) but provide no
mechanism to satisfy the precondition from user code.

**Implication for tractor**: the `subint_fork` backend
sketched in `tractor.spawn._subint_fork` is structurally
dead on current CPython. The submodule is kept as
documentation of the attempt; `--spawn-backend=subint_fork`
raises `NotImplementedError` pointing here.
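
As a rough illustration of that routing, the sketch below uses hypothetical stand-in backends (plain synchronous functions, not tractor's real async spawners) to show why keeping the key in the dispatch table matters: a known-but-blocked backend raises `NotImplementedError` with a pointer, while an unknown name fails differently. The `spawn()` helper and its `ValueError` are assumptions made for the example, not tractor API.

```python
from typing import Callable, Literal, get_args

# Hypothetical, trimmed-down mirror of the `SpawnMethodKey` literal.
SpawnMethodKey = Literal['trio', 'subint_fork']


def trio_proc(name: str) -> str:
    # Stand-in for a working backend.
    return f'{name}: spawned via trio'


def subint_fork_proc(name: str) -> str:
    # Stand-in for the blocked backend: always refuses, with a pointer.
    raise NotImplementedError(
        'subint_fork is blocked at the CPython level; see '
        'ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md'
    )


# Dispatch table keyed by backend name (illustrative only).
_methods: dict[str, Callable[[str], str]] = {
    'trio': trio_proc,
    'subint_fork': subint_fork_proc,
}


def spawn(method: str, name: str) -> str:
    # Unknown keys are rejected up front (hypothetical error type).
    if method not in get_args(SpawnMethodKey):
        raise ValueError(f'invalid backend: {method!r}')
    return _methods[method](name)
```

With this shape, selecting `'subint_fork'` surfaces the documented reason, whereas a typo such as `'subint_frok'` is reported as an invalid backend.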

## Context — why we tried this

The motivation is issue #379's "Our own thoughts, ideas
for `fork()`-workaround/hacks..." section. The existing
trio-backend (`tractor.spawn._trio.trio_proc`) spawns
subactors via `trio.lowlevel.open_process()` → ultimately
`posix_spawn()` or `fork+exec`, from the parent's main
interpreter that is currently running `trio.run()`. This
brushes against a known-fragile interaction between
`trio` and `fork()` tracked in
[python-trio/trio#1614](https://github.com/python-trio/trio/issues/1614)
and siblings — mostly mitigated in `tractor`'s case only
incidentally (we `exec()` immediately post-fork).

The idea was:

1. Create a subint that has *never* imported `trio`.
2. From a worker thread in that subint, call `os.fork()`.
3. In the child, `execv()` back into
   `python -m tractor._child` — same as `trio_proc` does.
4. The fork is from a trio-free context → trio+fork
   hazards avoided regardless of downstream behavior.

The parent-side orchestration (`ipc_server.wait_for_peer`,
`SpawnSpec`, `Portal` yield) would reuse
`trio_proc`'s flow verbatim, with only the subproc-spawn
mechanics swapped.

## Symptom

Running the prototype (`tractor.spawn._subint_fork.subint_fork_proc`,
see git history prior to the stub revert) on py3.14:

```
Fatal Python error: _PyInterpreterState_DeleteExceptMain: not main interpreter
Python runtime state: initialized

Current thread 0x00007f6b71a456c0 [subint-fork-lau] (most recent call first):
  File "<script>", line 2 in <module>
<script>:2: DeprecationWarning: This process (pid=802985) is multi-threaded, use of fork() may lead to deadlocks in the child.
```

Key clues:

- The **`DeprecationWarning`** fires in the parent (before
  fork completes) — fork *is* executing, we get that far.
- The **`Fatal Python error`** comes from the child — it
  aborts during CPython's post-fork C initialization
  before any user Python runs in the child.
- The thread name `subint-fork-lau[nchpad]` is ours —
  confirms the fork is being called from the launchpad
  subint's driver thread.
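
A minimal reproducer sketch, under several assumptions: it uses the private `_interpreters` module (`create('legacy')` as named in this document, plus `exec` and `destroy`, which are assumed to be present on the running interpreter), it forks from the calling thread rather than a dedicated driver thread, and the pipe used to hand the child's PID back to the main interpreter is an invention of this example. It returns `None` when the experiment cannot run, and otherwise reports how the child ended rather than asserting any particular exit code.

```python
import os
import signal
import time
from typing import Optional

try:
    import _interpreters  # private CPython module; availability is assumed
except ImportError:
    _interpreters = None


def fork_from_subint(timeout: float = 5.0) -> Optional[int]:
    """Fork from inside a legacy-config subint; report the child's fate.

    Returns the child's exit code (negative for a signal), or None if
    the experiment could not run on this interpreter.
    """
    if (
        _interpreters is None
        or not hasattr(_interpreters, 'exec')
        or not hasattr(os, 'fork')
    ):
        return None

    rfd, wfd = os.pipe()
    # Runs inside the subint. The forked child only reaches
    # `os._exit(0)` if it survives CPython's post-fork initialization.
    script = (
        'import os\n'
        'pid = os.fork()\n'
        'if pid == 0:\n'
        '    os._exit(0)\n'
        f'os.write({wfd}, str(pid).encode())\n'
    )
    interp_id = _interpreters.create('legacy')
    try:
        _interpreters.exec(interp_id, script)
    finally:
        _interpreters.destroy(interp_id)
        os.close(wfd)

    data = os.read(rfd, 32)
    os.close(rfd)
    if not data:
        return None  # the script failed before reporting a pid

    pid = int(data)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        reaped, status = os.waitpid(pid, os.WNOHANG)
        if reaped == pid:
            return os.waitstatus_to_exitcode(status)
        time.sleep(0.05)

    # Child did not exit in time: kill it and reap it.
    os.kill(pid, signal.SIGKILL)
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)
```

A non-zero result alongside the fatal error on stderr would match the symptom described here; a result of `0` would mean the child survived post-fork initialization on that interpreter.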

## CPython source walkthrough

### Call site — `Modules/posixmodule.c:728-793`

The post-fork-child hook CPython runs in the child process:

```c
void
PyOS_AfterFork_Child(void)
{
    PyStatus status;
    _PyRuntimeState *runtime = &_PyRuntime;

    // re-creates runtime->interpreters.mutex (HEAD_UNLOCK)
    status = _PyRuntimeState_ReInitThreads(runtime);
    ...

    PyThreadState *tstate = _PyThreadState_GET();
    _Py_EnsureTstateNotNULL(tstate);

    ...

    // Ideally we could guarantee tstate is running main.   ← !!!
    _PyInterpreterState_ReinitRunningMain(tstate);

    status = _PyEval_ReInitThreads(tstate);
    ...

    status = _PyInterpreterState_DeleteExceptMain(runtime);
    if (_PyStatus_EXCEPTION(status)) {
        goto fatal_error;
    }
    ...

fatal_error:
    Py_ExitStatusException(status);
}
```

The `// Ideally we could guarantee tstate is running
main.` comment is a flashing warning sign — the CPython
devs *know* this path is fragile when fork is called from
a non-main subint, but they've chosen to abort rather than
silently corrupt state. Arguably the right call.

### The refusal — `Python/pystate.c:1035-1075`

```c
/*
 * Delete all interpreter states except the main interpreter.  If there
 * is a current interpreter state, it *must* be the main interpreter.
 */
PyStatus
_PyInterpreterState_DeleteExceptMain(_PyRuntimeState *runtime)
{
    struct pyinterpreters *interpreters = &runtime->interpreters;

    PyThreadState *tstate = _PyThreadState_Swap(runtime, NULL);
    if (tstate != NULL && tstate->interp != interpreters->main) {
        return _PyStatus_ERR("not main interpreter");   ← our error
    }

    HEAD_LOCK(runtime);
    PyInterpreterState *interp = interpreters->head;
    interpreters->head = NULL;
    while (interp != NULL) {
        if (interp == interpreters->main) {
            interpreters->main->next = NULL;
            interpreters->head = interp;
            interp = interp->next;
            continue;
        }

        // XXX Won't this fail since PyInterpreterState_Clear() requires
        // the "current" tstate to be set?
        PyInterpreterState_Clear(interp);  // XXX must activate?
        zapthreads(interp);
        ...
    }
    ...
}
```

The function's header comment (`If there is a current
interpreter state, it *must* be the main interpreter.`) is
the formal API contract. The `XXX` comments further in
suggest the CPython team is already aware this function
has latent issues even in the happy path.
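
To make the gate concrete, here is a toy Python model of the check, written for this note and not taken from CPython. The names `Interp`, `ThreadState`, `Runtime` and `delete_except_main` are hypothetical; only the condition mirrors the C snippet above, and the cleanup loop is reduced to pruning a list.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Interp:
    name: str


@dataclass
class ThreadState:
    interp: Interp


@dataclass
class Runtime:
    main: Interp
    interpreters: list[Interp] = field(default_factory=list)


def delete_except_main(
    runtime: Runtime,
    tstate: Optional[ThreadState],
) -> Optional[str]:
    """Return an error message if the gate trips, else prune and return None."""
    # Same shape as the C check: a current tstate must belong to main.
    if tstate is not None and tstate.interp is not runtime.main:
        return 'not main interpreter'
    # Happy path: every interpreter except main is dropped.
    runtime.interpreters = [
        interp for interp in runtime.interpreters
        if interp is runtime.main
    ]
    return None
```

In this model a thread state attached to the launchpad interpreter gets the error string back and nothing is cleaned up, while a thread state attached to main leaves only the main interpreter in the list.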

## Chain summary

1. Our launchpad subint's driver OS-thread calls
   `os.fork()`.
2. `fork()` succeeds. Child wakes up with:
   - The parent's full memory image (including all
     subints).
   - Only the *calling* thread alive (the driver thread).
   - `_PyThreadState_GET()` on that thread returns the
     **launchpad subint's tstate**, *not* main's.
3. CPython runs `PyOS_AfterFork_Child()`.
4. It reaches `_PyInterpreterState_DeleteExceptMain()`.
5. Gate check fails: `tstate->interp != interpreters->main`.
6. `_PyStatus_ERR("not main interpreter")` → `fatal_error`
   goto → `Py_ExitStatusException()` → child aborts.

Parent-side consequence: `os.fork()` in the subint
bootstrap returned successfully with the child's PID, but
the child died before connecting back. Our parent's
`ipc_server.wait_for_peer(uid)` would hang forever — the
child never gets to `_actor_child_main`.
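
One way a parent could avoid waiting forever is to poll the child's status while it waits for the connect-back. The sketch below is an assumption-laden illustration using only the standard library: `poll_child`, `wait_for_child_exit` and `demo` are hypothetical helpers, and the stand-in child simply exits non-zero right after `fork()` instead of going through CPython's real post-fork failure.

```python
import os
import time
from typing import Optional


def poll_child(pid: int) -> Optional[int]:
    """Non-blocking check: exit code if `pid` has exited, else None."""
    reaped, status = os.waitpid(pid, os.WNOHANG)
    if reaped == 0:
        return None
    return os.waitstatus_to_exitcode(status)


def wait_for_child_exit(pid: int, timeout: float = 5.0) -> Optional[int]:
    """Poll until the child exits or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        code = poll_child(pid)
        if code is not None:
            return code
        time.sleep(0.01)
    return None


def demo() -> Optional[int]:
    # Stand-in child that exits non-zero right after fork, before it
    # could ever connect back to the parent.
    pid = os.fork()
    if pid == 0:
        os._exit(1)
    return wait_for_child_exit(pid)
```

A spawner built this way could race the connect-back wait against `poll_child()` and raise as soon as the child is known to be dead.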

## Definitive answer to "Open Question 1"

From the (now-stub) `subint_fork_proc` docstring:

> Does CPython allow `os.fork()` from a non-main
> sub-interpreter under the legacy config?

**No.** Not in a usable-by-user-code sense. The fork
syscall is not blocked, but the child cannot survive
CPython's post-fork initialization. This is enforced, not
accidental, and the CPython devs have acknowledged the
fragility in-source.

## What we'd need from CPython to unblock

Any one of these would help; the last is the least invasive:

1. **A pre-fork hook mechanism** that lets user code (or
   tractor itself via `os.register_at_fork(before=...)`)
   swap the current tstate to main before fork runs. The
   swap would need to work across the subint→main
   boundary, which is the actual hard part —
   `_PyThreadState_Swap()` exists but is internal.

2. **A `_PyInterpreterState_DeleteExceptFor(tstate->interp)`
   variant** that cleans up all *other* subints while
   preserving the calling subint's state. Lets the child
   continue executing in the subint after fork; a
   subsequent `execv()` clears everything at the OS
   level anyway.

3. **A cleaner error** than `Fatal Python error` aborting
   the child. Even without fixing the underlying
   capability, a raised Python-level exception in the
   parent's `fork()` call (rather than a silent child
   abort) would at least make the failure mode
   debuggable.
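
For orientation, the sketch below shows where the existing `os.register_at_fork()` hooks fire around a plain `fork()` from the main interpreter. It only records hook order; it does not swap any thread state, since this document notes no public API exists for that. The `events` list, the `fork_and_collect` helper and the pipe used to report the child's view are assumptions of the example.

```python
import os

events: list[str] = []

# Registered hooks cannot be unregistered later, so register once.
os.register_at_fork(
    before=lambda: events.append('before'),
    after_in_parent=lambda: events.append('after_in_parent'),
    after_in_child=lambda: events.append('after_in_child'),
)


def fork_and_collect() -> tuple[list[str], list[str]]:
    """Fork once; return (parent_events, child_events)."""
    events.clear()
    rfd, wfd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: report which hooks it observed, then exit immediately.
        os.close(rfd)
        os.write(wfd, ','.join(events).encode())
        os._exit(0)
    os.close(wfd)
    child_events = os.read(rfd, 1024).decode().split(',')
    os.close(rfd)
    os.waitpid(pid, 0)
    return list(events), child_events
```

The `before` hook is the only point at which user code runs ahead of the fork, which is why option 1 above frames the missing piece as something to call from there.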

## Upstream-report draft (for CPython issue tracker)

### Title

> `os.fork()` from a non-main sub-interpreter aborts the
> child with a fatal error in `PyOS_AfterFork_Child`; can
> we at least make it a clean `RuntimeError` in the
> parent?

### Body

> **Version**: Python 3.14.x
>
> **Summary**: Calling `os.fork()` from a thread currently
> executing inside a sub-interpreter causes the forked
> child process to abort during CPython's post-fork
> cleanup, with the following output in the child:
>
> ```
> Fatal Python error: _PyInterpreterState_DeleteExceptMain: not main interpreter
> ```
>
> From the **parent's** point of view the fork succeeded
> (returned a valid child PID). The failure is completely
> opaque to parent-side Python code — unless the parent
> does `os.waitpid()` it won't even notice the child
> died.
>
> **Root cause** (as I understand it from reading sources):
> `Modules/posixmodule.c::PyOS_AfterFork_Child()` calls
> `_PyInterpreterState_DeleteExceptMain()` with a
> precondition that `_PyThreadState_GET()->interp` be the
> main interpreter. When `fork()` is called from a thread
> executing inside a subinterpreter, the child wakes up
> with its tstate still pointing at the subint, and the
> gate in `Python/pystate.c:1044-1047` fails.
>
> A comment in the source
> (`Modules/posixmodule.c:753` — `// Ideally we could
> guarantee tstate is running main.`) suggests this is a
> known-fragile path rather than an intentional
> invariant.
>
> **Use case**: I was experimenting with using a
> sub-interpreter as a "fork launchpad" — have a subint
> that has never imported `trio`, call `os.fork()` from
> that subint's thread, and in the child `execv()` back
> into a fresh Python interpreter process. The goal was
> to sidestep known issues with `trio` + `fork()`
> interaction (see
> [python-trio/trio#1614](https://github.com/python-trio/trio/issues/1614))
> by guaranteeing the forking context had never been
> "contaminated" by trio's imports or globals. This
> approach would allow `trio`-using applications to
> combine `fork`-based subprocess spawning with
> per-worker `trio.run()` runtimes — a fairly common
> pattern that currently requires workarounds.
>
> **Request**:
>
> Ideally: make fork-from-subint work (e.g., by swapping
> the caller's tstate to main in the pre-fork hook), or
> provide a `_PyInterpreterState_DeleteExceptFor(interp)`
> variant that permits the caller's subint to survive
> post-fork so user code can subsequently `execv()`.
>
> Minimally: convert the fatal child-side abort into a
> clean `RuntimeError` (or similar) raised in the
> parent's `fork()` call. Even if the capability isn't
> expanded, the failure mode should be debuggable by
> user-code in the parent — right now it's a silent
> child death with an error message buried in the
> child's stderr that parent code can't programmatically
> see.
>
> **Related**: PEP 684 (per-interpreter GIL), PEP 734
> (`concurrent.interpreters` public API). The private
> `_interpreters` module is what I used to create the
> launchpad, via `_interpreters.create('legacy')`. I
> expect the same behavior with
> `concurrent.interpreters.create()`: that path was not
> tested, but the gate is identical.
>
> Happy to contribute a minimal reproducer + test case if
> this is something the team wants to pursue.

## References

- `Modules/posixmodule.c:728` —
  [`PyOS_AfterFork_Child`](https://github.com/python/cpython/blob/main/Modules/posixmodule.c#L728)
- `Python/pystate.c:1040` —
  [`_PyInterpreterState_DeleteExceptMain`](https://github.com/python/cpython/blob/main/Python/pystate.c#L1040)
- PEP 684 (per-interpreter GIL):
  <https://peps.python.org/pep-0684/>
- PEP 734 (`concurrent.interpreters` public API):
  <https://peps.python.org/pep-0734/>
- [python-trio/trio#1614](https://github.com/python-trio/trio/issues/1614)
  — the original motivation for the launchpad idea.
- tractor issue #379 — "Our own thoughts, ideas for
  `fork()`-workaround/hacks..." section where this was
  first sketched.
- `tractor.spawn._subint_fork` — in-tree stub preserving
  the attempted impl's shape in git history.

@@ -63,6 +63,15 @@ SpawnMethodKey = Literal[
     'mp_spawn',
     'mp_forkserver', # posix only
     'subint', # py3.14+ via `concurrent.interpreters` (PEP 734)
+    # EXPERIMENTAL — blocked at the CPython level. The
+    # design goal was a `trio+fork`-safe subproc spawn via
+    # `os.fork()` from a trio-free launchpad sub-interpreter,
+    # but CPython's `PyOS_AfterFork_Child` → `_PyInterpreterState_DeleteExceptMain`
+    # requires fork come from the main interp. See
+    # `tractor.spawn._subint_fork` +
+    # `ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md`
+    # + issue #379 for the full analysis.
+    'subint_fork',
 ]
 _spawn_method: SpawnMethodKey = 'trio'
 
@@ -115,15 +124,13 @@ def try_set_start_method(
         case 'trio':
             _ctx = None
 
-        case 'subint':
-            # subints need no `mp.context`; feature-gate on the
-            # py3.14 public `concurrent.interpreters` wrapper
-            # (PEP 734). We actually drive the private
-            # `_interpreters` C module in legacy mode — see
-            # `tractor.spawn._subint` for why — but py3.13's
-            # vintage of that private module hangs under our
-            # multi-trio usage, so we refuse it via the public-
-            # module presence check.
+        case 'subint' | 'subint_fork':
+            # Both subint backends need no `mp.context`; both
+            # feature-gate on the py3.14 public
+            # `concurrent.interpreters` wrapper (PEP 734). See
+            # `tractor.spawn._subint` for the detailed
+            # reasoning and the distinction between the two
+            # (`subint_fork` is WIP/experimental).
             from ._subint import _has_subints
             if not _has_subints:
                 raise RuntimeError(
@@ -461,6 +468,7 @@ async def new_proc(
 from ._trio import trio_proc
 from ._mp import mp_proc
 from ._subint import subint_proc
+from ._subint_fork import subint_fork_proc
 
 
 # proc spawning backend target map
@@ -469,4 +477,10 @@ _methods: dict[SpawnMethodKey, Callable] = {
     'mp_spawn': mp_proc,
     'mp_forkserver': mp_proc,
     'subint': subint_proc,
+    # blocked at CPython level — see `_subint_fork.py` +
+    # `ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md`.
+    # Kept here so `--spawn-backend=subint_fork` routes to a
+    # clean `NotImplementedError` with pointer to the analysis,
+    # rather than an "invalid backend" error.
+    'subint_fork': subint_fork_proc,
 }

@@ -433,203 +433,3 @@ async def subint_proc(
         actor_nursery._children.pop(uid, None)
 
 
-# ============================================================
-# WIP PROTOTYPE — `subint_fork_proc`
-# ============================================================
-# Experimental: use a sub-interpreter purely as a launchpad
-# from which to `os.fork()`, sidestepping the well-known
-# trio+fork issues (python-trio/trio#1614 etc.) by guaranteeing
-# the forking interp hasn't ever imported / run `trio`.
-#
-# The current `tractor.spawn._trio` backend already spawns a
-# subprocess and has the child connect back to the parent
-# over IPC. THIS prototype only changes *how* the subproc
-# comes into existence — everything downstream (parent-side
-# `ipc_server.wait_for_peer()`, `SpawnSpec`, `Portal` yield,
-# soft-kill) is reused verbatim.
-#
-# Reference: issue #379's "Our own thoughts, ideas for
-# fork()-workaround/hacks..." section.
-# ============================================================
-
-
-async def subint_fork_proc(
-    name: str,
-    actor_nursery: ActorNursery,
-    subactor: Actor,
-    errors: dict[tuple[str, str], Exception],
-
-    # passed through to actor main
-    bind_addrs: list[UnwrappedAddress],
-    parent_addr: UnwrappedAddress,
-    _runtime_vars: dict[str, Any],
-    *,
-    infect_asyncio: bool = False,
-    task_status: TaskStatus[Portal] = trio.TASK_STATUS_IGNORED,
-    proc_kwargs: dict[str, any] = {},
-
-) -> None:
-    '''
-    EXPERIMENTAL / WIP: `trio`-safe `fork()` via a pristine
-    sub-interpreter launchpad.
-
-    Core trick
-    ----------
-    Create a fresh subint that has *never* imported `trio`.
-    From a worker thread, drive that subint to call
-    `os.fork()`. In the forked CHILD process, `exec()` back
-    into `python -m tractor._child` (a fresh process). In the
-    fork PARENT (still inside the launchpad subint), do
-    nothing — just let the subint's `exec` call return and
-    the worker thread exit. The parent-side trio task then
-    waits for the child process to connect back using the
-    same `ipc_server.wait_for_peer()` flow as `trio_proc`.
-
-    Why this matters
-    ----------------
-    The existing `trio_proc` backend spawns a subprocess via
-    `trio.lowlevel.open_process()` which ultimately uses
-    `posix_spawn()` (or `fork+exec`) from the parent's main
-    interpreter — the one running `trio.run()`. That path is
-    affected by the trio+fork issues tracked in
-    python-trio/trio#1614 and related, some of which are
-    side-stepped only incidentally because we always `exec()`
-    immediately after fork.
-
-    By forking from a pristine subint instead, we have a
-    known-clean-of-trio fork parent. If we later want to try
-    **fork-without-exec** for faster startup and automatic
-    parent-`__main__` inheritance (the property `mp.fork`
-    gives for free), this approach could unlock that cleanly.
-
-    Relationship to the other backends
-    ----------------------------------
-    - `trio_proc`: fork/exec from main interp → affected by
-      trio+fork issues, solved via immediate exec.
-    - `subint_proc`: in-process subint, no fork at all →
-      affected by shared-GIL abandoned-thread hazards (see
-      `ai/conc-anal/subint_sigint_starvation_issue.md`).
-    - `subint_fork_proc` (THIS): OS-level subproc (like
-      `trio_proc`) BUT forked from a trio-free subint →
-      avoids both issue-classes above, at the cost of an
-      extra subint create/destroy per spawn.
-
-    Status
-    ------
-    **NOT IMPLEMENTED** beyond the bootstrap scaffolding
-    below. Open questions needing empirical validation:
-
-    1. Does CPython allow `os.fork()` from a non-main
-       sub-interpreter under the legacy config? The public
-       API is silent; there may be PEP 684 safety guards.
-    2. Does the forked child need to fully `exec()` or can
-       we stay fork-without-exec and `trio.run()` directly
-       from within the launchpad subint in the child? The
-       latter is the "interesting" mode — faster startup,
-       `__main__` inheritance — but opens the question of
-       what residual state from the parent's main interp
-       leaks into the child's subint.
-    3. How do `signal.set_wakeup_fd()`, installed signal
-       handlers, and other process-global state interact
-       when the forking thread is inside a subint? The
-       child presumably inherits them but a fresh
-       `trio.run()` resets what it cares about.
-
-    '''
-    if not _has_subints:
-        raise RuntimeError(
-            f'The {"subint_fork"!r} spawn backend requires '
-            f'Python 3.14+ (private stdlib `_interpreters` C '
-            f'module + tractor-usage stability).\n'
-            f'Current runtime: {sys.version}'
-        )
-
-    raise NotImplementedError(
-        '`subint_fork_proc` is a WIP prototype scaffold — '
-        'the driver thread + fork-bootstrap + connect-back '
-        'orchestration below is not yet wired up. See '
-        'issue #379 for context.\n'
-        '(Structure kept in-tree so the next iteration has '
-        'a concrete starting point rather than a blank page.)'
-    )
-
-    # ------------------------------------------------------------
-    # SKETCH (below is intentionally dead code; kept so reviewers
-    # can see the shape we'd plausibly build up to). Roughly
-    # mirrors `subint_proc` structure but WITHOUT the in-process
-    # subint lifetime management — the subint only lives long
-    # enough to call `os.fork()`.
-    # ------------------------------------------------------------
-
-    # Create the launchpad subint. Legacy config matches
-    # `subint_proc`'s reasoning (msgspec / PEP 684). For
-    # fork-via-subint, isolation is moot since we don't
-    # *stay* in the subint — we just need it trio-free.
-    interp_id: int = _interpreters.create('legacy')
-    log.runtime(
-        f'Created launchpad subint for fork-spawn\n'
-        f'(>\n'
-        f' |_interp_id={interp_id}\n'
-    )
-
-    uid: tuple[str, str] = subactor.aid.uid
-    loglevel: str | None = subactor.loglevel
-
-    # Bootstrap fires inside the launchpad subint on a
-    # worker OS-thread. Calls `os.fork()`. In the child,
-    # `execv` back into the existing `python -m tractor._child`
-    # CLI entry — which is what `trio_proc` already uses — so
-    # the connect-back dance is identical. In the fork-parent
-    # (still in the launchpad subint), return so the thread
-    # can exit and we can `_interpreters.destroy()` the
-    # launchpad.
-    #
-    # NOTE, `os.execv()` replaces the entire process image
-    # (all interps, all threads — CPython handles this at the
-    # OS level), so subint cleanup in the child is a no-op.
-    import shlex
-    uid_repr: str = repr(str(uid))
-    parent_addr_repr: str = repr(str(parent_addr))
-    bootstrap: str = (
-        'import os, sys\n'
-        'pid = os.fork()\n'
-        'if pid == 0:\n'
-        ' # CHILD: full `exec` into fresh Python for\n'
-        ' # maximum isolation. (A `fork`-without-exec\n'
-        ' # variant would skip this and call\n'
-        ' # `_actor_child_main` directly — see class\n'
-        ' # docstring "Open question 2".)\n'
-        ' os.execv(\n'
-        ' sys.executable,\n'
-        ' [\n'
-        ' sys.executable,\n'
-        " '-m',\n"
-        " 'tractor._child',\n"
-        f' {shlex.quote("--uid")!r},\n'
-        f' {uid_repr},\n'
-        f' {shlex.quote("--parent_addr")!r},\n'
-        f' {parent_addr_repr},\n'
-        + (
-            f' {shlex.quote("--loglevel")!r},\n'
-            f' {loglevel!r},\n'
-            if loglevel else ''
-        )
-        + (
-            f' {shlex.quote("--asyncio")!r},\n'
-            if infect_asyncio else ''
-        )
-        + ' ],\n'
-        ' )\n'
-        '# FORK-PARENT branch falls through — we just want\n'
-        '# the launchpad subint to finish so the driver\n'
-        '# thread exits.\n'
-    )
-
-    # TODO: orchestrate driver thread (mirror `subint_proc`'s
-    # `_subint_target` pattern), then await
-    # `ipc_server.wait_for_peer(uid)` on the parent side —
-    # same as `trio_proc`. Soft-kill path is simpler here
-    # than in `subint_proc`: we're managing an OS subproc,
-    # not a legacy subint, so `Portal.cancel_actor()` + wait
-    # + OS-level `SIGKILL` fallback (like `trio_proc`'s
-    # `hard_kill()`) applies directly.

@@ -0,0 +1,153 @@
# tractor: structured concurrent "actors".
# Copyright 2018-eternity Tyler Goodlet.

# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.

# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU Affero General Public License for more details.

# You should have received a copy of the GNU Affero General Public License
# along with this program. If not, see <https://www.gnu.org/licenses/>.

'''
`subint_fork` spawn backend — BLOCKED at CPython level.

The idea was to use a sub-interpreter purely as a launchpad
from which to call `os.fork()`, sidestepping the well-known
trio+fork issues (python-trio/trio#1614 etc.) by guaranteeing
the forking interp had never imported `trio`.

**IT DOES NOT WORK ON CURRENT CPYTHON.** The fork syscall
itself succeeds (in the parent), but the forked CHILD
process aborts immediately during CPython's post-fork
cleanup — `PyOS_AfterFork_Child()` calls
`_PyInterpreterState_DeleteExceptMain()` which refuses to
operate when the current tstate belongs to a non-main
sub-interpreter.

Full annotated walkthrough from the user-visible error
(`Fatal Python error: _PyInterpreterState_DeleteExceptMain:
not main interpreter`) down to the specific CPython source
lines that enforce this is in
`ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md`.

We keep this submodule as a dedicated documentation of the
attempt. If CPython ever lifts the restriction (e.g., via a
force-destroy primitive or a hook that swaps tstate to main
pre-fork), the structural sketch preserved in this file's
git history is a concrete starting point for a working impl.

See also: issue #379's "Our own thoughts, ideas for
`fork()`-workaround/hacks..." section.

'''
from __future__ import annotations
import sys
from typing import (
    Any,
    TYPE_CHECKING,
)

import trio
from trio import TaskStatus

from tractor.runtime._portal import Portal
from ._subint import _has_subints


if TYPE_CHECKING:
    from tractor.discovery._addr import UnwrappedAddress
    from tractor.runtime._runtime import Actor
    from tractor.runtime._supervise import ActorNursery


async def subint_fork_proc(
    name: str,
    actor_nursery: ActorNursery,
    subactor: Actor,
    errors: dict[tuple[str, str], Exception],

    bind_addrs: list[UnwrappedAddress],
    parent_addr: UnwrappedAddress,
    _runtime_vars: dict[str, Any],
    *,
    infect_asyncio: bool = False,
    task_status: TaskStatus[Portal] = trio.TASK_STATUS_IGNORED,
    proc_kwargs: dict[str, any] = {},

) -> None:
    '''
    EXPERIMENTAL — currently blocked by a CPython invariant.

    Attempted design
    ----------------
    1. Parent creates a fresh legacy-config subint.
    2. A worker OS-thread drives the subint through a
       bootstrap that calls `os.fork()`.
    3. In the forked CHILD, `os.execv()` back into
       `python -m tractor._child` (fresh process).
    4. In the fork-PARENT, the launchpad subint is destroyed;
       parent-side trio task proceeds identically to
       `trio_proc()` (wait for child connect-back, send
       `SpawnSpec`, yield `Portal`, etc.).

    Why it doesn't work
    -------------------
    CPython's `PyOS_AfterFork_Child()` (in
    `Modules/posixmodule.c`) calls
    `_PyInterpreterState_DeleteExceptMain()` (in
    `Python/pystate.c`) as part of post-fork cleanup. That
    function requires the current `PyThreadState` belong to
    the **main** interpreter. When `os.fork()` is called
    from within a sub-interpreter, the child wakes up with
    its tstate still pointing at the (now-stale) subint, and
    this check fails with `PyStatus_ERR("not main
    interpreter")`, triggering a `fatal_error` goto and
    aborting the child process.

    CPython devs acknowledge the fragility with a
    `// Ideally we could guarantee tstate is running main.`
    comment right above the call site.

    See
    `ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md`
    for the full annotated walkthrough + upstream-report
    draft.

    Why we keep this stub
    ---------------------
    - Documents the attempt in-tree so the next person who
      has this idea finds the reason it doesn't work rather
      than rediscovering the same CPython-level dead end.
    - If CPython ever lifts the restriction (e.g., via a
      force-destroy primitive or a hook that swaps tstate
      to main pre-fork), this submodule's git history holds
      the structural sketch of what a working impl would
      look like.

    '''
    if not _has_subints:
        raise RuntimeError(
            f'The {"subint_fork"!r} spawn backend requires '
            f'Python 3.14+.\n'
            f'Current runtime: {sys.version}'
        )

    raise NotImplementedError(
        'The `subint_fork` spawn backend is blocked at the '
        'CPython level — `os.fork()` from a non-main '
        'sub-interpreter is refused by '
        '`PyOS_AfterFork_Child()` → '
        '`_PyInterpreterState_DeleteExceptMain()`, which '
        'aborts the child with '
        '`Fatal Python error: not main interpreter`.\n'
        '\n'
        'See '
        '`ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md` '
        'for the full analysis + upstream-report draft.'
    )