Compare commits
10 Commits
e3318a5483
...
66dda9e449
| Author | SHA1 | Date |
|---|---|---|
|
|
66dda9e449 | |
|
|
43bd6a6410 | |
|
|
5bd5f957d3 | |
|
|
1d4867e51c | |
|
|
372a0f3247 | |
|
|
37cfaa202b | |
|
|
797f57ce7b | |
|
|
cf0e3e6f8b | |
|
|
99d70337b7 | |
|
|
a617b52140 |
|
|
@ -0,0 +1,337 @@
|
|||
# `os.fork()` from a non-main sub-interpreter aborts the child (CPython refuses post-fork cleanup)
|
||||
|
||||
Third `subint`-class analysis in this project. Unlike its
|
||||
two siblings (`subint_sigint_starvation_issue.md`,
|
||||
`subint_cancel_delivery_hang_issue.md`), this one is not a
|
||||
hang — it's a **hard CPython-level refusal** of an
|
||||
experimental spawn strategy we wanted to try.
|
||||
|
||||
## TL;DR
|
||||
|
||||
An in-process sub-interpreter cannot be used as a
|
||||
"launchpad" for `os.fork()` on current CPython. The fork
|
||||
syscall succeeds in the parent, but the forked CHILD
|
||||
process is aborted immediately by CPython's post-fork
|
||||
cleanup with:
|
||||
|
||||
```
|
||||
Fatal Python error: _PyInterpreterState_DeleteExceptMain: not main interpreter
|
||||
```
|
||||
|
||||
This is enforced by a hard `PyStatus_ERR` gate in
|
||||
`Python/pystate.c`. The CPython devs acknowledge the
|
||||
fragility with an in-source comment (`// Ideally we could
|
||||
guarantee tstate is running main.`) but provide no
|
||||
mechanism to satisfy the precondition from user code.
|
||||
|
||||
**Implication for tractor**: the `subint_fork` backend
|
||||
sketched in `tractor.spawn._subint_fork` is structurally
|
||||
dead on current CPython. The submodule is kept as
|
||||
documentation of the attempt; `--spawn-backend=subint_fork`
|
||||
raises `NotImplementedError` pointing here.
|
||||
|
||||
## Context — why we tried this
|
||||
|
||||
The motivation is issue #379's "Our own thoughts, ideas
|
||||
for `fork()`-workaround/hacks..." section. The existing
|
||||
trio-backend (`tractor.spawn._trio.trio_proc`) spawns
|
||||
subactors via `trio.lowlevel.open_process()` → ultimately
|
||||
`posix_spawn()` or `fork+exec`, from the parent's main
|
||||
interpreter that is currently running `trio.run()`. This
|
||||
brushes against a known-fragile interaction between
|
||||
`trio` and `fork()` tracked in
|
||||
[python-trio/trio#1614](https://github.com/python-trio/trio/issues/1614)
|
||||
and siblings — mostly mitigated in `tractor`'s case only
|
||||
incidentally (we `exec()` immediately post-fork).
|
||||
|
||||
The idea was:
|
||||
|
||||
1. Create a subint that has *never* imported `trio`.
|
||||
2. From a worker thread in that subint, call `os.fork()`.
|
||||
3. In the child, `execv()` back into
|
||||
`python -m tractor._child` — same as `trio_proc` does.
|
||||
4. The fork is from a trio-free context → trio+fork
|
||||
hazards avoided regardless of downstream behavior.
|
||||
|
||||
The parent-side orchestration (`ipc_server.wait_for_peer`,
|
||||
`SpawnSpec`, `Portal` yield) would reuse
|
||||
`trio_proc`'s flow verbatim, with only the subproc-spawn
|
||||
mechanics swapped.
|
||||
|
||||
## Symptom
|
||||
|
||||
Running the prototype (`tractor.spawn._subint_fork.subint_fork_proc`,
|
||||
see git history prior to the stub revert) on py3.14:
|
||||
|
||||
```
|
||||
Fatal Python error: _PyInterpreterState_DeleteExceptMain: not main interpreter
|
||||
Python runtime state: initialized
|
||||
|
||||
Current thread 0x00007f6b71a456c0 [subint-fork-lau] (most recent call first):
|
||||
File "<script>", line 2 in <module>
|
||||
<script>:2: DeprecationWarning: This process (pid=802985) is multi-threaded, use of fork() may lead to deadlocks in the child.
|
||||
```
|
||||
|
||||
Key clues:
|
||||
|
||||
- The **`DeprecationWarning`** fires in the parent (before
|
||||
fork completes) — fork *is* executing, we get that far.
|
||||
- The **`Fatal Python error`** comes from the child — it
|
||||
aborts during CPython's post-fork C initialization
|
||||
before any user Python runs in the child.
|
||||
- The thread name `subint-fork-lau[nchpad]` is ours —
|
||||
confirms the fork is being called from the launchpad
|
||||
subint's driver thread.
|
||||
|
||||
## CPython source walkthrough
|
||||
|
||||
### Call site — `Modules/posixmodule.c:728-793`
|
||||
|
||||
The post-fork-child hook CPython runs in the child process:
|
||||
|
||||
```c
|
||||
void
|
||||
PyOS_AfterFork_Child(void)
|
||||
{
|
||||
PyStatus status;
|
||||
_PyRuntimeState *runtime = &_PyRuntime;
|
||||
|
||||
// re-creates runtime->interpreters.mutex (HEAD_UNLOCK)
|
||||
status = _PyRuntimeState_ReInitThreads(runtime);
|
||||
...
|
||||
|
||||
PyThreadState *tstate = _PyThreadState_GET();
|
||||
_Py_EnsureTstateNotNULL(tstate);
|
||||
|
||||
...
|
||||
|
||||
// Ideally we could guarantee tstate is running main. ← !!!
|
||||
_PyInterpreterState_ReinitRunningMain(tstate);
|
||||
|
||||
status = _PyEval_ReInitThreads(tstate);
|
||||
...
|
||||
|
||||
status = _PyInterpreterState_DeleteExceptMain(runtime);
|
||||
if (_PyStatus_EXCEPTION(status)) {
|
||||
goto fatal_error;
|
||||
}
|
||||
...
|
||||
|
||||
fatal_error:
|
||||
Py_ExitStatusException(status);
|
||||
}
|
||||
```
|
||||
|
||||
The `// Ideally we could guarantee tstate is running
|
||||
main.` comment is a flashing warning sign — the CPython
|
||||
devs *know* this path is fragile when fork is called from
|
||||
a non-main subint, but they've chosen to abort rather than
|
||||
silently corrupt state. Arguably the right call.
|
||||
|
||||
### The refusal — `Python/pystate.c:1035-1075`
|
||||
|
||||
```c
|
||||
/*
|
||||
* Delete all interpreter states except the main interpreter. If there
|
||||
* is a current interpreter state, it *must* be the main interpreter.
|
||||
*/
|
||||
PyStatus
|
||||
_PyInterpreterState_DeleteExceptMain(_PyRuntimeState *runtime)
|
||||
{
|
||||
struct pyinterpreters *interpreters = &runtime->interpreters;
|
||||
|
||||
PyThreadState *tstate = _PyThreadState_Swap(runtime, NULL);
|
||||
if (tstate != NULL && tstate->interp != interpreters->main) {
|
||||
return _PyStatus_ERR("not main interpreter"); ← our error
|
||||
}
|
||||
|
||||
HEAD_LOCK(runtime);
|
||||
PyInterpreterState *interp = interpreters->head;
|
||||
interpreters->head = NULL;
|
||||
while (interp != NULL) {
|
||||
if (interp == interpreters->main) {
|
||||
interpreters->main->next = NULL;
|
||||
interpreters->head = interp;
|
||||
interp = interp->next;
|
||||
continue;
|
||||
}
|
||||
|
||||
// XXX Won't this fail since PyInterpreterState_Clear() requires
|
||||
// the "current" tstate to be set?
|
||||
PyInterpreterState_Clear(interp); // XXX must activate?
|
||||
zapthreads(interp);
|
||||
...
|
||||
}
|
||||
...
|
||||
}
|
||||
```
|
||||
|
||||
The comment in the docstring (`If there is a current
|
||||
interpreter state, it *must* be the main interpreter.`) is
|
||||
the formal API contract. The `XXX` comments further in
|
||||
suggest the CPython team is already aware this function
|
||||
has latent issues even in the happy path.
|
||||
|
||||
## Chain summary
|
||||
|
||||
1. Our launchpad subint's driver OS-thread calls
|
||||
`os.fork()`.
|
||||
2. `fork()` succeeds. Child wakes up with:
|
||||
- The parent's full memory image (including all
|
||||
subints).
|
||||
- Only the *calling* thread alive (the driver thread).
|
||||
- `_PyThreadState_GET()` on that thread returns the
|
||||
**launchpad subint's tstate**, *not* main's.
|
||||
3. CPython runs `PyOS_AfterFork_Child()`.
|
||||
4. It reaches `_PyInterpreterState_DeleteExceptMain()`.
|
||||
5. Gate check fails: `tstate->interp != interpreters->main`.
|
||||
6. `PyStatus_ERR("not main interpreter")` → `fatal_error`
|
||||
goto → `Py_ExitStatusException()` → child aborts.
|
||||
|
||||
Parent-side consequence: `os.fork()` in the subint
|
||||
bootstrap returned successfully with the child's PID, but
|
||||
the child died before connecting back. Our parent's
|
||||
`ipc_server.wait_for_peer(uid)` would hang forever — the
|
||||
child never gets to `_actor_child_main`.
|
||||
|
||||
## Definitive answer to "Open Question 1"
|
||||
|
||||
From the (now-stub) `subint_fork_proc` docstring:
|
||||
|
||||
> Does CPython allow `os.fork()` from a non-main
|
||||
> sub-interpreter under the legacy config?
|
||||
|
||||
**No.** Not in a usable-by-user-code sense. The fork
|
||||
syscall is not blocked, but the child cannot survive
|
||||
CPython's post-fork initialization. This is enforced, not
|
||||
accidental, and the CPython devs have acknowledged the
|
||||
fragility in-source.
|
||||
|
||||
## What we'd need from CPython to unblock
|
||||
|
||||
Any one of these, from least-to-most invasive:
|
||||
|
||||
1. **A pre-fork hook mechanism** that lets user code (or
|
||||
tractor itself via `os.register_at_fork(before=...)`)
|
||||
swap the current tstate to main before fork runs. The
|
||||
swap would need to work across the subint→main
|
||||
boundary, which is the actual hard part —
|
||||
`_PyThreadState_Swap()` exists but is internal.
|
||||
|
||||
2. **A `_PyInterpreterState_DeleteExceptFor(tstate->interp)`
|
||||
variant** that cleans up all *other* subints while
|
||||
preserving the calling subint's state. Lets the child
|
||||
continue executing in the subint after fork; a
|
||||
subsequent `execv()` clears everything at the OS
|
||||
level anyway.
|
||||
|
||||
3. **A cleaner error** than `Fatal Python error` aborting
|
||||
the child. Even without fixing the underlying
|
||||
capability, a raised Python-level exception in the
|
||||
parent's `fork()` call (rather than a silent child
|
||||
abort) would at least make the failure mode
|
||||
debuggable.
|
||||
|
||||
## Upstream-report draft (for CPython issue tracker)
|
||||
|
||||
### Title
|
||||
|
||||
> `os.fork()` from a non-main sub-interpreter aborts the
|
||||
> child with a fatal error in `PyOS_AfterFork_Child`; can
|
||||
> we at least make it a clean `RuntimeError` in the
|
||||
> parent?
|
||||
|
||||
### Body
|
||||
|
||||
> **Version**: Python 3.14.x
|
||||
>
|
||||
> **Summary**: Calling `os.fork()` from a thread currently
|
||||
> executing inside a sub-interpreter causes the forked
|
||||
> child process to abort during CPython's post-fork
|
||||
> cleanup, with the following output in the child:
|
||||
>
|
||||
> ```
|
||||
> Fatal Python error: _PyInterpreterState_DeleteExceptMain: not main interpreter
|
||||
> ```
|
||||
>
|
||||
> From the **parent's** point of view the fork succeeded
|
||||
> (returned a valid child PID). The failure is completely
|
||||
> opaque to parent-side Python code — unless the parent
|
||||
> does `os.waitpid()` it won't even notice the child
|
||||
> died.
|
||||
>
|
||||
> **Root cause** (as I understand it from reading sources):
|
||||
> `Modules/posixmodule.c::PyOS_AfterFork_Child()` calls
|
||||
> `_PyInterpreterState_DeleteExceptMain()` with a
|
||||
> precondition that `_PyThreadState_GET()->interp` be the
|
||||
> main interpreter. When `fork()` is called from a thread
|
||||
> executing inside a subinterpreter, the child wakes up
|
||||
> with its tstate still pointing at the subint, and the
|
||||
> gate in `Python/pystate.c:1044-1047` fails.
|
||||
>
|
||||
> A comment in the source
|
||||
> (`Modules/posixmodule.c:753` — `// Ideally we could
|
||||
> guarantee tstate is running main.`) suggests this is a
|
||||
> known-fragile path rather than an intentional
|
||||
> invariant.
|
||||
>
|
||||
> **Use case**: I was experimenting with using a
|
||||
> sub-interpreter as a "fork launchpad" — have a subint
|
||||
> that has never imported `trio`, call `os.fork()` from
|
||||
> that subint's thread, and in the child `execv()` back
|
||||
> into a fresh Python interpreter process. The goal was
|
||||
> to sidestep known issues with `trio` + `fork()`
|
||||
> interaction (see
|
||||
> [python-trio/trio#1614](https://github.com/python-trio/trio/issues/1614))
|
||||
> by guaranteeing the forking context had never been
|
||||
> "contaminated" by trio's imports or globals. This
|
||||
> approach would allow `trio`-using applications to
|
||||
> combine `fork`-based subprocess spawning with
|
||||
> per-worker `trio.run()` runtimes — a fairly common
|
||||
> pattern that currently requires workarounds.
|
||||
>
|
||||
> **Request**:
|
||||
>
|
||||
> Ideally: make fork-from-subint work (e.g., by swapping
|
||||
> the caller's tstate to main in the pre-fork hook), or
|
||||
> provide a `_PyInterpreterState_DeleteExceptFor(interp)`
|
||||
> variant that permits the caller's subint to survive
|
||||
> post-fork so user code can subsequently `execv()`.
|
||||
>
|
||||
> Minimally: convert the fatal child-side abort into a
|
||||
> clean `RuntimeError` (or similar) raised in the
|
||||
> parent's `fork()` call. Even if the capability isn't
|
||||
> expanded, the failure mode should be debuggable by
|
||||
> user-code in the parent — right now it's a silent
|
||||
> child death with an error message buried in the
|
||||
> child's stderr that parent code can't programmatically
|
||||
> see.
|
||||
>
|
||||
> **Related**: PEP 684 (per-interpreter GIL), PEP 734
|
||||
> (`concurrent.interpreters` public API). The private
|
||||
> `_interpreters` module is what I used to create the
|
||||
> launchpad — behavior is the same whether using
|
||||
> `_interpreters.create('legacy')` or
|
||||
> `concurrent.interpreters.create()` (the latter was not
|
||||
> tested but the gate is identical).
|
||||
>
|
||||
> Happy to contribute a minimal reproducer + test case if
|
||||
> this is something the team wants to pursue.
|
||||
|
||||
## References
|
||||
|
||||
- `Modules/posixmodule.c:728` —
|
||||
[`PyOS_AfterFork_Child`](https://github.com/python/cpython/blob/main/Modules/posixmodule.c#L728)
|
||||
- `Python/pystate.c:1040` —
|
||||
[`_PyInterpreterState_DeleteExceptMain`](https://github.com/python/cpython/blob/main/Python/pystate.c#L1040)
|
||||
- PEP 684 (per-interpreter GIL):
|
||||
<https://peps.python.org/pep-0684/>
|
||||
- PEP 734 (`concurrent.interpreters` public API):
|
||||
<https://peps.python.org/pep-0734/>
|
||||
- [python-trio/trio#1614](https://github.com/python-trio/trio/issues/1614)
|
||||
— the original motivation for the launchpad idea.
|
||||
- tractor issue #379 — "Our own thoughts, ideas for
|
||||
`fork()`-workaround/hacks..." section where this was
|
||||
first sketched.
|
||||
- `tractor.spawn._subint_fork` — in-tree stub preserving
|
||||
the attempted impl's shape in git history.
|
||||
|
|
@ -0,0 +1,373 @@
|
|||
#!/usr/bin/env python3
|
||||
'''
|
||||
Standalone CPython-level feasibility check for the "main-interp
|
||||
worker-thread forkserver + subint-hosted trio" architecture
|
||||
proposed as a workaround to the CPython-level refusal
|
||||
documented in
|
||||
`ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md`.
|
||||
|
||||
Purpose
|
||||
-------
|
||||
Deliberately NOT a `tractor` test. Zero `tractor` imports.
|
||||
Uses `_interpreters` (private stdlib) + `os.fork()` directly so
|
||||
the signal is unambiguous — pass/fail here is a property of
|
||||
CPython alone, independent of our runtime.
|
||||
|
||||
Run each scenario in isolation; the child's fate is observable
|
||||
only via `os.waitpid()` of the parent and the scenario's own
|
||||
status prints.
|
||||
|
||||
Scenarios (pick one with `--scenario <name>`)
|
||||
---------------------------------------------
|
||||
|
||||
- `control_subint_thread_fork` — the KNOWN-BROKEN case we
|
||||
documented in `subint_fork_blocked_by_cpython_post_fork_issue.md`:
|
||||
drive a subint from a thread, call `os.fork()` inside its
|
||||
`_interpreters.exec()`, watch the child abort. **Included as
|
||||
a control** — if this scenario DOESN'T abort the child, our
|
||||
analysis is wrong and we should re-check everything.
|
||||
|
||||
- `main_thread_fork` — baseline sanity. Call `os.fork()` from
|
||||
the process's main thread. Must always succeed; if this
|
||||
fails something much bigger is broken.
|
||||
|
||||
- `worker_thread_fork` — the architectural assertion. Spawn a
|
||||
regular `threading.Thread` (attached to main interp, NOT a
|
||||
subint), have IT call `os.fork()`. Child should survive
|
||||
post-fork cleanup.
|
||||
|
||||
- `full_architecture` — end-to-end: main-interp worker thread
|
||||
forks. In the child, fork-thread (still main-interp) creates
|
||||
a subint, drives a second worker thread inside it that runs
|
||||
a trivial `trio.run()`. Validates the "root runtime lives in
|
||||
a subint in the child" piece of the proposed arch.
|
||||
|
||||
All scenarios print a self-contained pass/fail banner. Exit
|
||||
code 0 on expected outcome (which for `control_*` means "child
|
||||
aborted", not "child succeeded"!).
|
||||
|
||||
Requires Python 3.14+.
|
||||
|
||||
Usage
|
||||
-----
|
||||
::
|
||||
|
||||
python subint_fork_from_main_thread_smoketest.py \\
|
||||
--scenario main_thread_fork
|
||||
|
||||
python subint_fork_from_main_thread_smoketest.py \\
|
||||
--scenario full_architecture
|
||||
|
||||
'''
|
||||
from __future__ import annotations
|
||||
import argparse
|
||||
import os
|
||||
import sys
|
||||
import threading
|
||||
import time
|
||||
|
||||
|
||||
# Hard-require py3.14 for the public `concurrent.interpreters`
|
||||
# API (we still drop to `_interpreters` internally, same as
|
||||
# `tractor.spawn._subint`).
|
||||
try:
|
||||
from concurrent import interpreters as _public_interpreters # noqa: F401
|
||||
import _interpreters # type: ignore
|
||||
except ImportError:
|
||||
print(
|
||||
'FAIL (setup): requires Python 3.14+ '
|
||||
'(missing `concurrent.interpreters`)',
|
||||
file=sys.stderr,
|
||||
)
|
||||
sys.exit(2)
|
||||
|
||||
|
||||
# The actual primitives this script exercises live in
|
||||
# `tractor.spawn._subint_forkserver` — we re-import them here
|
||||
# rather than inlining so the module and the validation stay
|
||||
# in sync. (Early versions of this file had them inline for
|
||||
# the "zero tractor imports" isolation guarantee; now that
|
||||
# CPython-level feasibility is confirmed, the validated
|
||||
# primitives have moved into tractor proper.)
|
||||
from tractor.spawn._subint_forkserver import (
|
||||
fork_from_worker_thread,
|
||||
run_subint_in_worker_thread,
|
||||
wait_child,
|
||||
)
|
||||
|
||||
|
||||
# ----------------------------------------------------------------
|
||||
# small observability helpers (test-harness only)
|
||||
# ----------------------------------------------------------------
|
||||
|
||||
|
||||
def _banner(title: str) -> None:
|
||||
line = '=' * 60
|
||||
print(f'\n{line}\n{title}\n{line}', flush=True)
|
||||
|
||||
|
||||
def _report(
|
||||
label: str,
|
||||
*,
|
||||
ok: bool,
|
||||
status_str: str,
|
||||
expect_exit_ok: bool,
|
||||
) -> None:
|
||||
verdict: str = 'PASS' if ok else 'FAIL'
|
||||
expected_str: str = (
|
||||
'normal exit (rc=0)'
|
||||
if expect_exit_ok
|
||||
else 'abnormal death (signal or nonzero exit)'
|
||||
)
|
||||
print(
|
||||
f'[{verdict}] {label}: '
|
||||
f'expected {expected_str}; observed {status_str}',
|
||||
flush=True,
|
||||
)
|
||||
|
||||
|
||||
# ----------------------------------------------------------------
|
||||
# scenario: `control_subint_thread_fork` (known-broken)
|
||||
# ----------------------------------------------------------------
|
||||
|
||||
|
||||
def scenario_control_subint_thread_fork() -> int:
|
||||
_banner(
|
||||
'[control] fork from INSIDE a subint (expected: child aborts)'
|
||||
)
|
||||
interp_id = _interpreters.create('legacy')
|
||||
print(f' created subint {interp_id}', flush=True)
|
||||
|
||||
# Shared flag: child writes a sentinel file we can detect from
|
||||
# the parent. If the child manages to write this, CPython's
|
||||
# post-fork refusal is NOT happening → analysis is wrong.
|
||||
sentinel = '/tmp/subint_fork_smoketest_control_child_ran'
|
||||
try:
|
||||
os.unlink(sentinel)
|
||||
except FileNotFoundError:
|
||||
pass
|
||||
|
||||
bootstrap = (
|
||||
'import os\n'
|
||||
'pid = os.fork()\n'
|
||||
'if pid == 0:\n'
|
||||
# child — if CPython's refusal fires this code never runs
|
||||
f' with open({sentinel!r}, "w") as f:\n'
|
||||
' f.write("ran")\n'
|
||||
' os._exit(0)\n'
|
||||
'else:\n'
|
||||
# parent side (inside the launchpad subint) — stash the
|
||||
# forked PID on a shareable dict so we can waitpid()
|
||||
# from the outer main interp. We can't just return it;
|
||||
# _interpreters.exec() returns nothing useful.
|
||||
' import builtins\n'
|
||||
' builtins._forked_child_pid = pid\n'
|
||||
)
|
||||
|
||||
# NOTE, we can't easily pull state back from the subint.
|
||||
# For the CONTROL scenario we just time-bound the fork +
|
||||
# check the sentinel. If sentinel exists → child ran →
|
||||
# analysis wrong. If not → child aborted → analysis
|
||||
# confirmed.
|
||||
done = threading.Event()
|
||||
|
||||
def _drive() -> None:
|
||||
try:
|
||||
_interpreters.exec(interp_id, bootstrap)
|
||||
except Exception as err:
|
||||
print(
|
||||
f' subint bootstrap raised (expected on some '
|
||||
f'CPython versions): {type(err).__name__}: {err}',
|
||||
flush=True,
|
||||
)
|
||||
finally:
|
||||
done.set()
|
||||
|
||||
t = threading.Thread(
|
||||
target=_drive,
|
||||
name='control-subint-fork-launchpad',
|
||||
daemon=True,
|
||||
)
|
||||
t.start()
|
||||
done.wait(timeout=5.0)
|
||||
t.join(timeout=2.0)
|
||||
|
||||
# Give the (possibly-aborted) child a moment to die.
|
||||
time.sleep(0.5)
|
||||
|
||||
sentinel_present = os.path.exists(sentinel)
|
||||
verdict = (
|
||||
# "PASS" for our analysis means sentinel NOT present.
|
||||
'PASS' if not sentinel_present else 'FAIL (UNEXPECTED)'
|
||||
)
|
||||
print(
|
||||
f'[{verdict}] control: sentinel present={sentinel_present} '
|
||||
f'(analysis predicts False — child should abort before '
|
||||
f'writing)',
|
||||
flush=True,
|
||||
)
|
||||
if sentinel_present:
|
||||
os.unlink(sentinel)
|
||||
|
||||
try:
|
||||
_interpreters.destroy(interp_id)
|
||||
except _interpreters.InterpreterError:
|
||||
pass
|
||||
|
||||
return 0 if not sentinel_present else 1
|
||||
|
||||
|
||||
# ----------------------------------------------------------------
|
||||
# scenario: `main_thread_fork` (baseline sanity)
|
||||
# ----------------------------------------------------------------
|
||||
|
||||
|
||||
def scenario_main_thread_fork() -> int:
|
||||
_banner(
|
||||
'[baseline] fork from MAIN thread (expected: child exits normally)'
|
||||
)
|
||||
|
||||
pid = os.fork()
|
||||
if pid == 0:
|
||||
os._exit(0)
|
||||
|
||||
return 0 if _wait_child(
|
||||
pid,
|
||||
label='main_thread_fork',
|
||||
expect_exit_ok=True,
|
||||
) else 1
|
||||
|
||||
|
||||
# ----------------------------------------------------------------
|
||||
# scenario: `worker_thread_fork` (architectural assertion)
|
||||
# ----------------------------------------------------------------
|
||||
|
||||
|
||||
def _run_worker_thread_fork_scenario(
|
||||
label: str,
|
||||
*,
|
||||
child_target=None,
|
||||
) -> int:
|
||||
'''
|
||||
Thin wrapper: delegate the actual fork to the
|
||||
`tractor.spawn._subint_forkserver` primitive, then wait
|
||||
on the child and render a pass/fail banner.
|
||||
|
||||
'''
|
||||
try:
|
||||
pid: int = fork_from_worker_thread(
|
||||
child_target=child_target,
|
||||
thread_name=f'worker-fork-thread[{label}]',
|
||||
)
|
||||
except RuntimeError as err:
|
||||
print(f'[FAIL] {label}: {err}', flush=True)
|
||||
return 1
|
||||
print(f' forked child pid={pid}', flush=True)
|
||||
ok, status_str = wait_child(pid, expect_exit_ok=True)
|
||||
_report(
|
||||
label,
|
||||
ok=ok,
|
||||
status_str=status_str,
|
||||
expect_exit_ok=True,
|
||||
)
|
||||
return 0 if ok else 1
|
||||
|
||||
|
||||
def scenario_worker_thread_fork() -> int:
|
||||
_banner(
|
||||
'[arch] fork from MAIN-INTERP WORKER thread '
|
||||
'(expected: child exits normally — this is the one '
|
||||
'that matters)'
|
||||
)
|
||||
return _run_worker_thread_fork_scenario(
|
||||
'worker_thread_fork',
|
||||
)
|
||||
|
||||
|
||||
# ----------------------------------------------------------------
|
||||
# scenario: `full_architecture`
|
||||
# ----------------------------------------------------------------
|
||||
|
||||
|
||||
_CHILD_TRIO_BOOTSTRAP: str = (
|
||||
'import trio\n'
|
||||
'async def _main():\n'
|
||||
' await trio.sleep(0.05)\n'
|
||||
' return 42\n'
|
||||
'result = trio.run(_main)\n'
|
||||
'assert result == 42, f"trio.run returned {result}"\n'
|
||||
'print(" CHILD subint: trio.run OK, result=42", '
|
||||
'flush=True)\n'
|
||||
)
|
||||
|
||||
|
||||
def _child_trio_in_subint() -> int:
|
||||
'''
|
||||
CHILD-side `child_target`: drive a trivial `trio.run()`
|
||||
inside a fresh legacy-config subint on a worker thread,
|
||||
using the `tractor.spawn._subint_forkserver.run_subint_in_worker_thread`
|
||||
primitive. Returns 0 on success.
|
||||
|
||||
'''
|
||||
try:
|
||||
run_subint_in_worker_thread(
|
||||
_CHILD_TRIO_BOOTSTRAP,
|
||||
thread_name='child-subint-trio-thread',
|
||||
)
|
||||
except RuntimeError as err:
|
||||
print(
|
||||
f' CHILD: run_subint_in_worker_thread timed out / thread '
|
||||
f'never returned: {err}',
|
||||
flush=True,
|
||||
)
|
||||
return 3
|
||||
except BaseException as err:
|
||||
print(
|
||||
f' CHILD: subint bootstrap raised: '
|
||||
f'{type(err).__name__}: {err}',
|
||||
flush=True,
|
||||
)
|
||||
return 4
|
||||
return 0
|
||||
|
||||
|
||||
def scenario_full_architecture() -> int:
|
||||
_banner(
|
||||
'[arch-full] worker-thread fork + child runs trio in a '
|
||||
'subint (end-to-end proposed arch)'
|
||||
)
|
||||
return _run_worker_thread_fork_scenario(
|
||||
'full_architecture',
|
||||
child_target=_child_trio_in_subint,
|
||||
)
|
||||
|
||||
|
||||
# ----------------------------------------------------------------
|
||||
# main
|
||||
# ----------------------------------------------------------------
|
||||
|
||||
|
||||
SCENARIOS: dict[str, Callable[[], int]] = {
|
||||
'control_subint_thread_fork': scenario_control_subint_thread_fork,
|
||||
'main_thread_fork': scenario_main_thread_fork,
|
||||
'worker_thread_fork': scenario_worker_thread_fork,
|
||||
'full_architecture': scenario_full_architecture,
|
||||
}
|
||||
|
||||
|
||||
def main() -> int:
|
||||
ap = argparse.ArgumentParser(
|
||||
description=__doc__,
|
||||
formatter_class=argparse.RawDescriptionHelpFormatter,
|
||||
)
|
||||
ap.add_argument(
|
||||
'--scenario',
|
||||
choices=sorted(SCENARIOS.keys()),
|
||||
required=True,
|
||||
)
|
||||
args = ap.parse_args()
|
||||
return SCENARIOS[args.scenario]()
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
sys.exit(main())
|
||||
|
|
@ -0,0 +1,184 @@
|
|||
# Revisit `subint_forkserver` thread-cache constraints once msgspec PEP 684 support lands
|
||||
|
||||
Follow-up tracker for cleanup work gated on the msgspec
|
||||
PEP 684 adoption upstream ([jcrist/msgspec#563](https://github.com/jcrist/msgspec/issues/563)).
|
||||
|
||||
Context — why this exists
|
||||
-------------------------
|
||||
|
||||
The `tractor.spawn._subint_forkserver` submodule currently
|
||||
carries two "non-trio" thread-hygiene constraints whose
|
||||
necessity is tangled with issues that *should* dissolve
|
||||
under PEP 684 isolated-mode subinterpreters:
|
||||
|
||||
1. `fork_from_worker_thread()` / `run_subint_in_worker_thread()`
|
||||
internally allocate a **dedicated `threading.Thread`**
|
||||
rather than using `trio.to_thread.run_sync()`.
|
||||
2. The test helper is named
|
||||
`run_fork_in_non_trio_thread()` — the
|
||||
`non_trio` qualifier is load-bearing today.
|
||||
|
||||
This doc catalogs *why* those constraints exist, which of
|
||||
them isolated-mode would fix, and what the
|
||||
audit-and-cleanup path looks like once msgspec #563 is
|
||||
resolved.
|
||||
|
||||
The three reasons the constraints exist
|
||||
---------------------------------------
|
||||
|
||||
### 1. GIL-starvation class → fixed by PEP 684 isolated mode
|
||||
|
||||
The class-A hang documented in
|
||||
`subint_sigint_starvation_issue.md` is entirely about
|
||||
legacy-config subints **sharing the main GIL**. Once
|
||||
msgspec #563 lands and tractor flips
|
||||
`tractor.spawn._subint` to
|
||||
`concurrent.interpreters.create()` (isolated config), each
|
||||
subint gets its own GIL. Abandoned subint threads can't
|
||||
contend for main's GIL → can't starve the main trio loop
|
||||
→ signal-wakeup-pipe drains normally → no SIGINT-drop.
|
||||
|
||||
This class of hazard **dissolves entirely**. The
|
||||
non-trio-thread requirement for *this reason* disappears.
|
||||
|
||||
### 2. Destroy race / tstate-recycling → orthogonal; unclear
|
||||
|
||||
The `subint_proc` dedicated-thread fix (commit `26fb8206`)
|
||||
addressed a different issue: `_interpreters.destroy(interp_id)`
|
||||
was blocking on a trio-cache worker that had run an
|
||||
earlier `interp.exec()` for that subint. Working
|
||||
hypothesis at the time was "the cached thread retains the
|
||||
subint's tstate".
|
||||
|
||||
But tstate-handling is **not specific to GIL mode** —
|
||||
`_PyXI_Enter` / `_PyXI_Exit` (the C-level machinery both
|
||||
configs use to enter/leave a subint from a thread) should
|
||||
restore the caller's tstate regardless of GIL config. So
|
||||
isolated mode **doesn't obviously fix this**. It might be:
|
||||
|
||||
- A py3.13 bug fixed in later versions — we saw the race
|
||||
first on 3.13 and never re-tested on 3.14 after moving
|
||||
to dedicated threads.
|
||||
- A genuine CPython quirk around cached threads that
|
||||
exec'd into a subint, persisting across GIL modes.
|
||||
- Something else we misdiagnosed — the empirical fix
|
||||
(dedicated thread) worked but the analysis may have
|
||||
been incomplete.
|
||||
|
||||
Only way to know: once we're on isolated mode, empirically
|
||||
retry `trio.to_thread.run_sync(interp.exec, ...)` and see
|
||||
if `destroy()` still blocks. If it does, keep the
|
||||
dedicated thread; if not, one constraint relaxed.
|
||||
|
||||
### 3. Fork-from-main-interp-tstate (the constraint in this module's helper names)
|
||||
|
||||
The fork-from-main-interp-tstate invariant — CPython's
|
||||
`PyOS_AfterFork_Child` →
|
||||
`_PyInterpreterState_DeleteExceptMain` gate documented in
|
||||
`subint_fork_blocked_by_cpython_post_fork_issue.md` — is
|
||||
about the calling thread's **current** tstate at the
|
||||
moment `os.fork()` runs. If trio's cache threads never
|
||||
enter subints at all, their tstate is plain main-interp,
|
||||
and fork from them would be fine.
|
||||
|
||||
The reason the smoke test +
|
||||
`run_fork_in_non_trio_thread` test helper
|
||||
currently use a dedicated `threading.Thread` is narrow:
|
||||
**we don't want to risk a trio cache thread that has
|
||||
previously been used as a subint driver being the one that
|
||||
picks up the fork job**. If cached tstate doesn't get
|
||||
cleared (back to reason #2), the fork's child-side
|
||||
post-init would see the wrong interp and abort.
|
||||
|
||||
In an isolated-mode world where msgspec works:
|
||||
|
||||
- `subint_proc` would use the public
|
||||
`concurrent.interpreters.create()` + `Interpreter.exec()`
|
||||
/ `Interpreter.close()` — which *should* handle tstate
|
||||
cleanly (they're the "blessed" API).
|
||||
- If so, trio's cache threads are safe to fork from
|
||||
regardless of whether they've previously driven subints.
|
||||
- → the `non_trio` qualifier in
|
||||
`run_fork_in_non_trio_thread` becomes
|
||||
*overcautious* rather than load-bearing, and the
|
||||
dedicated-thread primitives in `_subint_forkserver.py`
|
||||
can likely be replaced with straight
|
||||
`trio.to_thread.run_sync()` wrappers.
|
||||
|
||||
TL;DR
|
||||
-----
|
||||
|
||||
| constraint | fixed by isolated mode? |
|
||||
|---|---|
|
||||
| GIL-starvation (class A) | **yes** |
|
||||
| destroy race on cached worker | unclear — empirical test on py3.14 + isolated API required |
|
||||
| fork-from-main-tstate requirement on worker | **probably yes, conditional on the destroy-race question above** |
|
||||
|
||||
If #2 also resolves on py3.14+ with isolated mode,
|
||||
tractor could drop the `non_trio` qualifier from the fork
|
||||
helper's name and just use `trio.to_thread.run_sync(...)`
|
||||
for everything. But **we shouldn't do that preemptively**
|
||||
— the current cautious design is cheap (one dedicated
|
||||
thread per fork / per subint-exec) and correct.
|
||||
|
||||
Audit plan when msgspec #563 lands
|
||||
----------------------------------
|
||||
|
||||
Assuming msgspec grows `Py_mod_multiple_interpreters`
|
||||
support:
|
||||
|
||||
1. **Flip `tractor.spawn._subint` to isolated mode.** Drop
|
||||
the `_interpreters.create('legacy')` call in favor of
|
||||
the public API (`concurrent.interpreters.create()` +
|
||||
`Interpreter.exec()` / `Interpreter.close()`). Run the
|
||||
three `ai/conc-anal/subint_*_issue.md` reproducers —
|
||||
class-A (`test_stale_entry_is_deleted` etc.) should
|
||||
pass without the `skipon_spawn_backend('subint')` marks
|
||||
(revisit the marker inventory).
|
||||
|
||||
2. **Empirical destroy-race retest.** In `subint_proc`,
|
||||
swap the dedicated `threading.Thread` back to
|
||||
`trio.to_thread.run_sync(Interpreter.exec, ...,
|
||||
abandon_on_cancel=False)` and run the full subint test
|
||||
suite. If `Interpreter.close()` (or the backing
|
||||
destroy) blocks the same way as the legacy version
|
||||
did, revert and keep the dedicated thread.
|
||||
|
||||
3. **If #2 clean**, audit `_subint_forkserver.py`:
|
||||
- Rename `run_fork_in_non_trio_thread` → drop the
|
||||
`_non_trio_` qualifier (e.g. `run_fork_in_thread`) or
|
||||
inline the two-line `trio.to_thread.run_sync` call at
|
||||
the call sites and drop the helper entirely.
|
||||
- Consider whether `fork_from_worker_thread` +
|
||||
`run_subint_in_worker_thread` still warrant being
|
||||
separate module-level primitives or whether they
|
||||
collapse into a compound
|
||||
`trio.to_thread.run_sync`-driven pattern inside the
|
||||
(future) `subint_forkserver_proc` backend.
|
||||
|
||||
4. **Doc fallout.** `subint_sigint_starvation_issue.md`
|
||||
and `subint_cancel_delivery_hang_issue.md` both cite
|
||||
the legacy-GIL-sharing architecture as the root cause.
|
||||
Close them with commit-refs to the isolated-mode
|
||||
migration. This doc itself should get a closing
|
||||
post-mortem section noting which of #1/#2/#3 actually
|
||||
resolved vs persisted.
|
||||
|
||||
References
|
||||
----------
|
||||
|
||||
- `tractor.spawn._subint_forkserver` — the in-tree module
|
||||
whose constraints this doc catalogs.
|
||||
- `ai/conc-anal/subint_sigint_starvation_issue.md` — the
|
||||
GIL-starvation class.
|
||||
- `ai/conc-anal/subint_cancel_delivery_hang_issue.md` —
|
||||
sibling Ctrl-C-able hang class.
|
||||
- `ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md`
|
||||
— why fork-from-subint is blocked (this drives the
|
||||
forkserver-via-non-subint-thread workaround).
|
||||
- `ai/conc-anal/subint_fork_from_main_thread_smoketest.py`
|
||||
— empirical validation for the workaround.
|
||||
- [PEP 684 — per-interpreter GIL](https://peps.python.org/pep-0684/)
|
||||
- [PEP 734 — `concurrent.interpreters` public API](https://peps.python.org/pep-0734/)
|
||||
- [jcrist/msgspec#563 — PEP 684 support tracker](https://github.com/jcrist/msgspec/issues/563)
|
||||
- tractor issue #379 — subint backend tracking.
|
||||
|
|
@ -0,0 +1,155 @@
|
|||
---
|
||||
model: claude-opus-4-7[1m]
|
||||
service: claude
|
||||
session: subints-phase-b-hardening-and-fork-block
|
||||
timestamp: 2026-04-22T20:07:23Z
|
||||
git_ref: 797f57c
|
||||
scope: code
|
||||
substantive: true
|
||||
raw_file: 20260422T200723Z_797f57c_prompt_io.raw.md
|
||||
---
|
||||
|
||||
## Prompt
|
||||
|
||||
Session-spanning work on the Phase B `subint` spawn-backend.
|
||||
Three distinct sub-phases in one log:
|
||||
|
||||
1. **Py3.13 gate tightening** — diagnose a reproducible hang
|
||||
of subint spawn flow under py3.13 (works on py3.14), trace
|
||||
to a private `_interpreters` module vintage issue, tighten
|
||||
our feature gate from "`_interpreters` present" to "public
|
||||
`concurrent.interpreters` present" (i.e. py3.14+).
|
||||
2. **Test-harness hardening** — add `pytest-timeout` dep, put
|
||||
`@pytest.mark.timeout(30, method='thread')` on the
|
||||
three known-hanging subint tests cataloged in
|
||||
`ai/conc-anal/subint_sigint_starvation_issue.md`. Separately,
|
||||
code-review the user's in-flight `skipon_spawn_backend`
|
||||
marker implementation; find four bugs; refactor to use
|
||||
`item.iter_markers()`.
|
||||
3. **`subint_fork` prototype → CPython-block finding** — draft
|
||||
a WIP `subint_fork_proc` backend using a sub-interpreter as
|
||||
a launchpad for `os.fork()` (to sidestep trio#1614). User
|
||||
tests on py3.14, hits
|
||||
`Fatal Python error: _PyInterpreterState_DeleteExceptMain:
|
||||
not main interpreter`. Walk CPython sources (local clone at
|
||||
`~/repos/cpython/`) to pinpoint the refusal
|
||||
(`Modules/posixmodule.c:728` → `Python/pystate.c:1040`).
|
||||
Revert implementation to a `NotImplementedError` stub in a
|
||||
new `_subint_fork.py` submodule, document the finding in a
|
||||
third `conc-anal/` doc with an upstream-report draft for
|
||||
the CPython issue tracker. Finally, discuss user's proposed
|
||||
workaround architecture (main-interp worker-thread
|
||||
forkserver) and draft a standalone smoke-test script for
|
||||
feasibility validation.
|
||||
|
||||
## Response summary
|
||||
|
||||
All three sub-phases landed concrete artifacts:
|
||||
|
||||
**Sub-phase 1** — `_subint.py` + `_spawn.py` gates + error
|
||||
messages updated to require py3.14+ via the public
|
||||
`concurrent.interpreters` module presence check. Module
|
||||
docstring revised to explain the empirical reason
|
||||
(py3.13's private `_interpreters` vintage wedges under
|
||||
multi-trio-task usage even though minimal standalone
|
||||
reproducers work fine there). Test-module
|
||||
`pytest.importorskip` likewise switched.
|
||||
|
||||
**Sub-phase 2** — `pytest-timeout>=2.3` added to `testing`
|
||||
dep group. `@pytest.mark.timeout(30, method='thread')`
|
||||
applied on:
|
||||
- `tests/discovery/test_registrar.py::test_stale_entry_is_deleted`
|
||||
- `tests/test_cancellation.py::test_cancel_while_childs_child_in_sync_sleep`
|
||||
- `tests/test_cancellation.py::test_multierror_fast_nursery`
|
||||
- `tests/test_subint_cancellation.py::test_subint_non_checkpointing_child`
|
||||
|
||||
`method='thread'` documented inline as load-bearing — the
|
||||
GIL-starvation path that drops `SIGINT` would equally drop
|
||||
`SIGALRM`, so only a watchdog-thread timeout can reliably
|
||||
escape.
|
||||
|
||||
`skipon_spawn_backend` plugin refactored into a single
|
||||
`iter_markers`-driven loop in `pytest_collection_modifyitems`
|
||||
(~30 LOC replacing ~30 LOC of nested conditionals). Four
|
||||
bugs dissolved: wrong `.get()` key, module-level `pytestmark`
|
||||
suppressing per-test marks, unhandled `pytestmark = [list]`
|
||||
form, `pytest.Makr` typo. Marker help text updated to
|
||||
document the variadic backend-list + `reason=` kwarg
|
||||
surface.
|
||||
|
||||
**Sub-phase 3** — Prototype drafted (then reverted):
|
||||
|
||||
- `tractor/spawn/_subint_fork.py` — new dedicated submodule
|
||||
housing the `subint_fork_proc` stub. Module docstring +
|
||||
fn docstring explain the attempt, the CPython-level
|
||||
block, and the reason for keeping the stub in-tree
|
||||
(documentation of the attempt + starting point if CPython
|
||||
ever lifts the restriction).
|
||||
- `tractor/spawn/_spawn.py` — `'subint_fork'` registered as a
|
||||
`SpawnMethodKey` literal + in `_methods`, so
|
||||
`--spawn-backend=subint_fork` routes to a clean
|
||||
`NotImplementedError` pointing at the analysis doc rather
|
||||
than an "invalid backend" error.
|
||||
- `ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md` —
|
||||
third sibling conc-anal doc. Full annotated CPython
|
||||
source walkthrough from user-visible
|
||||
`Fatal Python error` → `Modules/posixmodule.c:728
|
||||
PyOS_AfterFork_Child()` → `Python/pystate.c:1040
|
||||
_PyInterpreterState_DeleteExceptMain()` gate. Includes a
|
||||
copy-paste-ready upstream-report draft for the CPython
|
||||
issue tracker with a two-tier ask (ideally "make it work",
|
||||
minimally "cleaner error than `Fatal Python error`
|
||||
aborting the child").
|
||||
- `ai/conc-anal/subint_fork_from_main_thread_smoketest.py` —
|
||||
standalone zero-tractor-import CPython-level smoke test
|
||||
for the user's proposed workaround architecture
|
||||
(forkserver on a main-interp worker thread). Four
|
||||
argparse-driven scenarios: `control_subint_thread_fork`
|
||||
(reproduces the known-broken case as a test-harness
|
||||
sanity), `main_thread_fork` (baseline), `worker_thread_fork`
|
||||
(architectural assertion), `full_architecture`
|
||||
(end-to-end trio-in-subint in forked child). User will
|
||||
run on py3.14 next.
|
||||
|
||||
## Files changed
|
||||
|
||||
See `git log 26fb820..HEAD --stat` for the canonical list.
|
||||
New files this session:
|
||||
- `tractor/spawn/_subint_fork.py`
|
||||
- `ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md`
|
||||
- `ai/conc-anal/subint_fork_from_main_thread_smoketest.py`
|
||||
|
||||
Modified (diff pointers in raw log):
|
||||
- `tractor/spawn/_subint.py` (py3.14 gate)
|
||||
- `tractor/spawn/_spawn.py` (`subint_fork` registration)
|
||||
- `tractor/_testing/pytest.py` (`skipon_spawn_backend` refactor)
|
||||
- `pyproject.toml` (`pytest-timeout` dep)
|
||||
- `tests/discovery/test_registrar.py`,
|
||||
`tests/test_cancellation.py`,
|
||||
`tests/test_subint_cancellation.py` (timeout marks,
|
||||
cross-refs to conc-anal docs)
|
||||
|
||||
## Human edits
|
||||
|
||||
Several back-and-forth iterations with user-driven
|
||||
adjustments during the session:
|
||||
|
||||
- User corrected my initial mis-classification of
|
||||
`test_cancel_while_childs_child_in_sync_sleep[subint-False]`
|
||||
as Ctrl-C-able — second strace showed `EAGAIN`, putting
|
||||
it squarely in class A (GIL-starvation). Re-analysis
|
||||
preserved in the raw log.
|
||||
- User independently fixed the `.get(reason)` → `.get('reason', reason)`
|
||||
bug in the marker plugin before my review; preserved their
|
||||
fix.
|
||||
- User suggested moving the `subint_fork_proc` stub from
|
||||
the bottom of `_subint.py` into its own
|
||||
`_subint_fork.py` submodule — applied.
|
||||
- User asked to keep the forkserver-architecture
|
||||
discussion as background for the smoke-test rather than
|
||||
committing to a tractor-side refactor until the smoke
|
||||
test validates the CPython-level assumptions.
|
||||
|
||||
Commit messages in this range (b025c982 … 797f57c) were
|
||||
drafted via `/commit-msg` + `rewrap.py --width 67`; user
|
||||
landed them with the usual review.
|
||||
|
|
@ -0,0 +1,343 @@
|
|||
---
|
||||
model: claude-opus-4-7[1m]
|
||||
service: claude
|
||||
timestamp: 2026-04-22T20:07:23Z
|
||||
git_ref: 797f57c
|
||||
diff_cmd: git log 26fb820..HEAD # all session commits since the destroy-race fix log
|
||||
---
|
||||
|
||||
Session-spanning conversation covering the Phase B hardening
|
||||
of the `subint` spawn-backend and an investigation into a
|
||||
proposed `subint_fork` follow-up which turned out to be
|
||||
blocked at the CPython level. This log is a narrative capture
|
||||
of the substantive turns (not every message) and references
|
||||
the concrete code + docs the session produced. Per diff-ref
|
||||
mode the actual code diffs are pointed at via `git log` on
|
||||
each ref rather than duplicated inline.
|
||||
|
||||
## Narrative of the substantive turns
|
||||
|
||||
### Py3.13 hang / gate tightening
|
||||
|
||||
Diagnosed a reproducible hang of the `subint` backend under
|
||||
py3.13 (test_spawning tests wedge after root-actor bringup).
|
||||
Root cause: py3.13's vintage of the private `_interpreters` C
|
||||
module has a latent thread/subint-interaction issue that
|
||||
`_interpreters.exec()` silently fails to progress under
|
||||
tractor's multi-trio usage pattern — even though a minimal
|
||||
standalone `threading.Thread` + `_interpreters.exec()`
|
||||
reproducer works fine on the same Python. Empirically
|
||||
py3.14 fixes it.
|
||||
|
||||
Fix (from this session): tighten the `_has_subints` gate in
|
||||
`tractor.spawn._subint` from "private module importable" to
|
||||
"public `concurrent.interpreters` present" — which is 3.14+
|
||||
only. This leaves `subint_proc()` unchanged in behavior (we
|
||||
still call the *private* `_interpreters.create('legacy')`
|
||||
etc. under the hood) but refuses to engage on 3.13.
|
||||
|
||||
Also tightened the matching gate in
|
||||
`tractor.spawn._spawn.try_set_start_method('subint')` and
|
||||
rev'd the corresponding error messages from "3.13+" to
|
||||
"3.14+" with a sentence explaining why. Test-module
|
||||
`pytest.importorskip` switched from `_interpreters` →
|
||||
`concurrent.interpreters` to match.
|
||||
|
||||
### `pytest-timeout` dep + `skipon_spawn_backend` marker plumbing
|
||||
|
||||
Added `pytest-timeout>=2.3` to the `testing` dep group with
|
||||
an inline comment pointing at the `ai/conc-anal/*.md` docs.
|
||||
Applied `@pytest.mark.timeout(30, method='thread')` (the
|
||||
`method='thread'` is load-bearing — `signal`-method
|
||||
`SIGALRM` suffers the same GIL-starvation path that drops
|
||||
`SIGINT` in the class-A hang pattern) to the three known-
|
||||
hanging subint tests cataloged in
|
||||
`subint_sigint_starvation_issue.md`.
|
||||
|
||||
Separately code-reviewed the user's newly-staged
|
||||
`skipon_spawn_backend` pytest marker implementation in
|
||||
`tractor/_testing/pytest.py`. Found four bugs:
|
||||
|
||||
1. `modmark.kwargs.get(reason)` called `.get()` with the
|
||||
*variable* `reason` as the dict key instead of the string
|
||||
`'reason'` — user-supplied `reason=` was never picked up.
|
||||
(User had already fixed this locally via `.get('reason',
|
||||
reason)` by the time my review happened — preserved that
|
||||
fix.)
|
||||
2. The module-level `pytestmark` branch suppressed per-test
|
||||
marker handling (the `else:` was an `else:` rather than
|
||||
independent iteration).
|
||||
3. `mod_pytestmark.mark` assumed a single
|
||||
`MarkDecorator` — broke on the valid-pytest `pytestmark =
|
||||
[mark, mark]` list form.
|
||||
4. Typo: `pytest.Makr` → `pytest.Mark`.
|
||||
|
||||
Refactored the hook to use `item.iter_markers(name=...)`
|
||||
which walks function + class + module scopes uniformly and
|
||||
handles both `pytestmark` forms natively. ~30 LOC replaced
|
||||
the original ~30 LOC of nested conditionals, all four bugs
|
||||
dissolved. Also updated the marker help string to reflect
|
||||
the variadic `*start_methods` + `reason=` surface.
|
||||
|
||||
### `subint_fork_proc` prototype attempt
|
||||
|
||||
User's hypothesis: the known trio+`fork()` issues
|
||||
(python-trio/trio#1614) could be sidestepped by using a
|
||||
sub-interpreter purely as a launchpad — `os.fork()` from a
|
||||
subint that has never imported trio → child is in a
|
||||
trio-free context. In the child `execv()` back into
|
||||
`python -m tractor._child` and the downstream handshake
|
||||
matches `trio_proc()` identically.
|
||||
|
||||
Drafted the prototype at `tractor/spawn/_subint.py`'s bottom
|
||||
(originally — later moved to its own submod, see below):
|
||||
launchpad-subint creation, bootstrap code-string with
|
||||
`os.fork()` + `execv()`, driver-thread orchestration,
|
||||
parent-side `ipc_server.wait_for_peer()` dance. Registered
|
||||
`'subint_fork'` as a new `SpawnMethodKey` literal, added
|
||||
`case 'subint' | 'subint_fork':` feature-gate arm in
|
||||
`try_set_start_method()`, added entry in `_methods` dict.
|
||||
|
||||
### CPython-level block discovered
|
||||
|
||||
User tested on py3.14 and saw:
|
||||
|
||||
```
|
||||
Fatal Python error: _PyInterpreterState_DeleteExceptMain: not main interpreter
|
||||
Python runtime state: initialized
|
||||
|
||||
Current thread 0x00007f6b71a456c0 [subint-fork-lau] (most recent call first):
|
||||
File "<script>", line 2 in <module>
|
||||
<script>:2: DeprecationWarning: This process (pid=802985) is multi-threaded, use of fork() may lead to deadlocks in the child.
|
||||
```
|
||||
|
||||
Walked CPython sources (local clone at `~/repos/cpython/`):
|
||||
|
||||
- **`Modules/posixmodule.c:728` `PyOS_AfterFork_Child()`** —
|
||||
post-fork child-side cleanup. Calls
|
||||
`_PyInterpreterState_DeleteExceptMain(runtime)` with
|
||||
`goto fatal_error` on non-zero status. Has the
|
||||
`// Ideally we could guarantee tstate is running main.`
|
||||
self-acknowledging-fragile comment directly above.
|
||||
|
||||
- **`Python/pystate.c:1040`
|
||||
`_PyInterpreterState_DeleteExceptMain()`** — the
|
||||
refusal. Hard `PyStatus_ERR("not main interpreter")` gate
|
||||
when `tstate->interp != interpreters->main`. Docstring
|
||||
formally declares the precondition ("If there is a
|
||||
current interpreter state, it *must* be the main
|
||||
interpreter"). `XXX` comments acknowledge further latent
|
||||
issues within.
|
||||
|
||||
Definitive answer to "Open Question 1" of the prototype
|
||||
docstring: **no, CPython does not support `os.fork()` from
|
||||
a non-main sub-interpreter**. Not because the fork syscall
|
||||
is blocked (it isn't — the parent returns a valid pid),
|
||||
but because the child cannot survive CPython's post-fork
|
||||
initialization. This is an enforced invariant, not an
|
||||
incidental limitation.
|
||||
|
||||
### Revert: move to stub submod + doc the finding
|
||||
|
||||
Per user request:
|
||||
|
||||
1. Reverted the working `subint_fork_proc` body to a
|
||||
`NotImplementedError` stub, MOVED to its own submod
|
||||
`tractor/spawn/_subint_fork.py` (keeps `_subint.py`
|
||||
focused on the working `subint_proc` backend).
|
||||
2. Updated `_spawn.py` to import the stub from the new
|
||||
submod path; kept `'subint_fork'` in `SpawnMethodKey` +
|
||||
`_methods` so `--spawn-backend=subint_fork` routes to a
|
||||
clean `NotImplementedError` with pointer to the analysis
|
||||
doc rather than an "invalid backend" error.
|
||||
3. Wrote
|
||||
`ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md`
|
||||
with the full annotated CPython walkthrough + an
|
||||
upstream-report draft for the CPython issue tracker.
|
||||
Draft has a two-tier ask: ideally "make it work"
|
||||
(pre-fork tstate-swap hook or `DeleteExceptFor(interp)`
|
||||
variant), minimally "give us a clean `RuntimeError` in
|
||||
the parent instead of a `Fatal Python error` aborting
|
||||
the child silently".
|
||||
|
||||
### Design discussion — main-interp-thread forkserver workaround
|
||||
|
||||
User proposed: set up a "subint forking server" that fork()s
|
||||
on behalf of subint callers. Core insight: the CPython gate
|
||||
is on `tstate->interp`, not thread identity, so **any thread
|
||||
whose tstate is main-interp** can fork cleanly. A worker
|
||||
thread attached to main-interp (never entering a subint)
|
||||
satisfies the precondition.
|
||||
|
||||
Structurally this is `mp.forkserver` (which tractor already
|
||||
has as `mp_forkserver`) but **in-process**: instead of a
|
||||
separate Python subproc as the fork server, we'd put the
|
||||
forkserver on a thread in the tractor parent process. Pros:
|
||||
faster spawn (no IPC marshalling to external server + no
|
||||
separate Python startup), inherits already-imported modules
|
||||
for free. Cons: less crash isolation (forkserver failure
|
||||
takes the whole process).
|
||||
|
||||
Required tractor-side refactor: move the root actor's
|
||||
`trio.run()` off main-interp-main-thread (so main-thread can
|
||||
run the forkserver loop). Nontrivial; approximately the same
|
||||
magnitude as "Phase C".
|
||||
|
||||
The design would also not fully resolve the class-A
|
||||
GIL-starvation issue because child actors' trio still runs
|
||||
inside subints (legacy config, msgspec PEP 684 pending).
|
||||
Would mitigate SIGINT-starvation specifically if signal
|
||||
handling moves to the forkserver thread.
|
||||
|
||||
Recommended pre-commitment: a standalone CPython-only smoke
|
||||
test validating the four assumptions the arch rests on,
|
||||
before any tractor-side work.
|
||||
|
||||
### Smoke-test script drafted
|
||||
|
||||
Wrote `ai/conc-anal/subint_fork_from_main_thread_smoketest.py`:
|
||||
argparse-driven, four scenarios (`control_subint_thread_fork`
|
||||
reproducing the known-broken case, `main_thread_fork`
|
||||
baseline, `worker_thread_fork` the architectural assertion,
|
||||
`full_architecture` end-to-end with trio in a subint in the
|
||||
forked child). No `tractor` imports; pure CPython + `_interpreters`
|
||||
+ `trio`. Bails cleanly on py<3.14. Pass/fail banners per
|
||||
scenario.
|
||||
|
||||
User will validate on their py3.14 env next.
|
||||
|
||||
## Per-code-artifact provenance
|
||||
|
||||
### `tractor/spawn/_subint_fork.py` (new submod)
|
||||
|
||||
> `git show 797f57c -- tractor/spawn/_subint_fork.py`
|
||||
|
||||
NotImplementedError stub for the subint-fork backend. Module
|
||||
docstring + fn docstring explain the attempt, the CPython
|
||||
block, and why the stub is kept in-tree. No runtime behavior
|
||||
beyond raising with a pointer at the conc-anal doc.
|
||||
|
||||
### `tractor/spawn/_spawn.py` (modified)
|
||||
|
||||
> `git log 26fb820..HEAD -- tractor/spawn/_spawn.py`
|
||||
|
||||
- Added `'subint_fork'` to `SpawnMethodKey` literal with a
|
||||
block comment explaining the CPython-level block.
|
||||
- Generalized the `case 'subint':` arm to `case 'subint' |
|
||||
'subint_fork':` since both use the same py3.14+ gate.
|
||||
- Registered `subint_fork_proc` in `_methods` with a
|
||||
pointer-comment at the analysis doc.
|
||||
|
||||
### `tractor/spawn/_subint.py` (modified across session)
|
||||
|
||||
> `git log 26fb820..HEAD -- tractor/spawn/_subint.py`
|
||||
|
||||
- Tightened `_has_subints` gate: dual-requires public
|
||||
`concurrent.interpreters` + private `_interpreters`
|
||||
(tests for py3.14-or-newer on the public-API presence,
|
||||
then uses the private one for legacy-config subints
|
||||
because `msgspec` still blocks the public isolated mode
|
||||
per jcrist/msgspec#563).
|
||||
- Updated module docstring, `subint_proc()` docstring, and
|
||||
gate-error messages to reflect the 3.14+ requirement and
|
||||
the reason (py3.13 wedges under multi-trio usage even
|
||||
though the private module exists there).
|
||||
|
||||
### `tractor/_testing/pytest.py` (modified)
|
||||
|
||||
> `git log 26fb820..HEAD -- tractor/_testing/pytest.py`
|
||||
|
||||
- New `skipon_spawn_backend(*start_methods, reason=...)`
|
||||
pytest marker expanded into `pytest.mark.skip(reason=...)`
|
||||
at collection time via
|
||||
`pytest_collection_modifyitems()`.
|
||||
- Implementation uses `item.iter_markers(name=...)` which
|
||||
walks function + class + module scopes uniformly and
|
||||
handles both `pytestmark = <single Mark>` and
|
||||
`pytestmark = [mark, ...]` forms natively. ~30-LOC
|
||||
single-loop refactor replacing a prior nested
|
||||
conditional that had four bugs (see "Review" narrative
|
||||
above).
|
||||
- Added `pytest.Config` / `pytest.Function` /
|
||||
`pytest.FixtureRequest` type annotations on fixture
|
||||
signatures while touching the file.
|
||||
|
||||
### `pyproject.toml` (modified)
|
||||
|
||||
> `git log 26fb820..HEAD -- pyproject.toml`
|
||||
|
||||
Added `pytest-timeout>=2.3` to `testing` dep group with
|
||||
comment pointing at the `ai/conc-anal/` docs.
|
||||
|
||||
### `tests/discovery/test_registrar.py`,
|
||||
`tests/test_subint_cancellation.py`,
|
||||
`tests/test_cancellation.py` (modified)
|
||||
|
||||
> `git log 26fb820..HEAD -- tests/`
|
||||
|
||||
Applied `@pytest.mark.timeout(30, method='thread')` on
|
||||
known-hanging subint tests. Extended comments to cross-
|
||||
reference the `ai/conc-anal/*.md` docs. `method='thread'`
|
||||
is documented inline as load-bearing (`signal`-method
|
||||
SIGALRM suffers the same GIL-starvation path that drops
|
||||
SIGINT).
|
||||
|
||||
### `ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md` (new)
|
||||
|
||||
> `git show 797f57c -- ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md`
|
||||
|
||||
Third sibling doc under `conc-anal/`. Structure: TL;DR,
|
||||
context ("what we tried"), symptom (the user's exact
|
||||
`Fatal Python error` output), CPython source walkthrough
|
||||
with excerpted snippets from `posixmodule.c` +
|
||||
`pystate.c`, chain summary, definitive answer to Open
|
||||
Question 1, `## Upstream-report draft (for CPython issue
|
||||
tracker)` section with a two-tier ask, references.
|
||||
|
||||
### `ai/conc-anal/subint_fork_from_main_thread_smoketest.py` (new, THIS turn)
|
||||
|
||||
Zero-tractor-import smoke test for the proposed workaround
|
||||
architecture. Four argparse-driven scenarios covering the
|
||||
control case + baseline + arch-critical case + end-to-end.
|
||||
Pass/fail banners per scenario; clean `--help` output;
|
||||
py3.13 early-exit.
|
||||
|
||||
## Non-code output (verbatim)
|
||||
|
||||
### The `strace` signature that kicked off the CPython
|
||||
walkthrough
|
||||
|
||||
```
|
||||
--- SIGINT {si_signo=SIGINT, si_code=SI_KERNEL} ---
|
||||
write(16, "\2", 1) = -1 EAGAIN (Resource temporarily unavailable)
|
||||
rt_sigreturn({mask=[WINCH]}) = 139801964688928
|
||||
```
|
||||
|
||||
### Key user quotes framing the direction
|
||||
|
||||
> ok actually we get this [fatal error] ... see if you can
|
||||
> take a look at what's going on, in particular wrt to
|
||||
> cpython's sources. pretty sure there's a local copy at
|
||||
> ~/repos/cpython/
|
||||
|
||||
(Drove the CPython walkthrough that produced the
|
||||
definitive refusal chain.)
|
||||
|
||||
> is there any reason we can't just sidestep this "must fork
|
||||
> from main thread in main subint" issue by simply ensuring
|
||||
> a "subint forking server" is always setup prior to
|
||||
> invoking trio in a non-main-thread subint ...
|
||||
|
||||
(Drove the main-interp-thread-forkserver architectural
|
||||
discussion + smoke-test script design.)
|
||||
|
||||
### CPython source tags for quick jump-back
|
||||
|
||||
```
|
||||
Modules/posixmodule.c:728 PyOS_AfterFork_Child()
|
||||
Modules/posixmodule.c:753 // Ideally we could guarantee tstate is running main.
|
||||
Modules/posixmodule.c:778 status = _PyInterpreterState_DeleteExceptMain(runtime);
|
||||
|
||||
Python/pystate.c:1040 _PyInterpreterState_DeleteExceptMain()
|
||||
Python/pystate.c:1044-1047 tstate->interp != main → PyStatus_ERR("not main interpreter")
|
||||
```
|
||||
|
|
@ -139,7 +139,9 @@ def pytest_addoption(
|
|||
|
||||
|
||||
@pytest.fixture(scope='session', autouse=True)
|
||||
def loglevel(request) -> str:
|
||||
def loglevel(
|
||||
request: pytest.FixtureRequest,
|
||||
) -> str:
|
||||
import tractor
|
||||
orig = tractor.log._default_loglevel
|
||||
level = tractor.log._default_loglevel = request.config.option.loglevel
|
||||
|
|
@ -156,7 +158,7 @@ def loglevel(request) -> str:
|
|||
|
||||
@pytest.fixture(scope='function')
|
||||
def test_log(
|
||||
request,
|
||||
request: pytest.FixtureRequest,
|
||||
loglevel: str,
|
||||
) -> tractor.log.StackLevelAdapter:
|
||||
'''
|
||||
|
|
|
|||
|
|
@ -146,13 +146,12 @@ def spawn(
|
|||
ids='ctl-c={}'.format,
|
||||
)
|
||||
def ctlc(
|
||||
request,
|
||||
request: pytest.FixtureRequest,
|
||||
ci_env: bool,
|
||||
|
||||
) -> bool:
|
||||
|
||||
use_ctlc = request.param
|
||||
|
||||
use_ctlc: bool = request.param
|
||||
node = request.node
|
||||
markers = node.own_markers
|
||||
for mark in markers:
|
||||
|
|
|
|||
|
|
@ -520,8 +520,6 @@ async def kill_transport(
|
|||
|
||||
|
||||
|
||||
# @pytest.mark.parametrize('use_signal', [False, True])
|
||||
#
|
||||
# Wall-clock bound via `pytest-timeout` (`method='thread'`).
|
||||
# Under `--spawn-backend=subint` this test can wedge in an
|
||||
# un-Ctrl-C-able state (abandoned-subint + shared-GIL
|
||||
|
|
@ -537,6 +535,16 @@ async def kill_transport(
|
|||
3, # NOTE should be a 2.1s happy path.
|
||||
method='thread',
|
||||
)
|
||||
@pytest.mark.skipon_spawn_backend(
|
||||
'subint',
|
||||
reason=(
|
||||
'XXX SUBINT HANGING TEST XXX\n'
|
||||
'See oustanding issue(s)\n'
|
||||
# TODO, put issue link!
|
||||
)
|
||||
)
|
||||
# @pytest.mark.parametrize('use_signal', [False, True])
|
||||
#
|
||||
def test_stale_entry_is_deleted(
|
||||
debug_mode: bool,
|
||||
daemon: subprocess.Popen,
|
||||
|
|
@ -549,12 +557,6 @@ def test_stale_entry_is_deleted(
|
|||
stale entry and not delivering a bad portal.
|
||||
|
||||
'''
|
||||
if start_method == 'subint':
|
||||
pytest.skip(
|
||||
'XXX SUBINT HANGING TEST XXX\n'
|
||||
'See oustanding issue(s)\n'
|
||||
)
|
||||
|
||||
async def main():
|
||||
|
||||
name: str = 'transport_fails_actor'
|
||||
|
|
|
|||
|
|
@ -0,0 +1,329 @@
|
|||
'''
|
||||
Integration exercises for the `tractor.spawn._subint_forkserver`
|
||||
submodule at three tiers:
|
||||
|
||||
1. the low-level primitives
|
||||
(`fork_from_worker_thread()` +
|
||||
`run_subint_in_worker_thread()`) driven from inside a real
|
||||
`trio.run()` in the parent process,
|
||||
|
||||
2. the full `subint_forkserver_proc` spawn backend wired
|
||||
through tractor's normal actor-nursery + portal-RPC
|
||||
machinery — i.e. `open_root_actor` + `open_nursery` +
|
||||
`run_in_actor` against a subactor spawned via fork from a
|
||||
main-interp worker thread.
|
||||
|
||||
Background
|
||||
----------
|
||||
`ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md`
|
||||
establishes that `os.fork()` from a non-main sub-interpreter
|
||||
aborts the child at the CPython level. The sibling
|
||||
`subint_fork_from_main_thread_smoketest.py` proves the escape
|
||||
hatch: fork from a main-interp *worker thread* (one that has
|
||||
never entered a subint) works, and the forked child can then
|
||||
host its own `trio.run()` inside a fresh subint.
|
||||
|
||||
Those smoke-test scenarios are standalone — no trio runtime
|
||||
in the *parent*. Tiers (1)+(2) here cover the primitives
|
||||
driven from inside `trio.run()` in the parent, and tier (3)
|
||||
(the `*_spawn_basic` test) drives the registered
|
||||
`subint_forkserver` spawn backend end-to-end against the
|
||||
tractor runtime.
|
||||
|
||||
Gating
|
||||
------
|
||||
- py3.14+ (via `concurrent.interpreters` presence)
|
||||
- no `--spawn-backend` restriction — the backend-level test
|
||||
flips `tractor.spawn._spawn._spawn_method` programmatically
|
||||
(via `try_set_start_method('subint_forkserver')`) and
|
||||
restores it on teardown, so these tests are independent of
|
||||
the session-level CLI backend choice.
|
||||
|
||||
'''
|
||||
from __future__ import annotations
|
||||
from functools import partial
|
||||
import os
|
||||
|
||||
import pytest
|
||||
import trio
|
||||
|
||||
import tractor
|
||||
from tractor.devx import dump_on_hang
|
||||
|
||||
|
||||
# Gate: subint forkserver primitives require py3.14+. Check
|
||||
# the public stdlib wrapper's presence (added in 3.14) rather
|
||||
# than `_interpreters` directly — see
|
||||
# `tractor.spawn._subint` for why.
|
||||
pytest.importorskip('concurrent.interpreters')
|
||||
|
||||
from tractor.spawn._subint_forkserver import ( # noqa: E402
|
||||
fork_from_worker_thread,
|
||||
run_subint_in_worker_thread,
|
||||
wait_child,
|
||||
)
|
||||
from tractor.spawn import _spawn as _spawn_mod # noqa: E402
|
||||
from tractor.spawn._spawn import try_set_start_method # noqa: E402
|
||||
|
||||
|
||||
# ----------------------------------------------------------------
|
||||
# child-side callables (passed via `child_target=` across fork)
|
||||
# ----------------------------------------------------------------
|
||||
|
||||
|
||||
_CHILD_TRIO_BOOTSTRAP: str = (
|
||||
'import trio\n'
|
||||
'async def _main():\n'
|
||||
' await trio.sleep(0.05)\n'
|
||||
' return 42\n'
|
||||
'result = trio.run(_main)\n'
|
||||
'assert result == 42, f"trio.run returned {result}"\n'
|
||||
)
|
||||
|
||||
|
||||
def _child_trio_in_subint() -> int:
|
||||
'''
|
||||
`child_target` for the trio-in-child scenario: drive a
|
||||
trivial `trio.run()` inside a fresh legacy-config subint
|
||||
on a worker thread.
|
||||
|
||||
Returns an exit code suitable for `os._exit()`:
|
||||
- 0: subint-hosted `trio.run()` succeeded
|
||||
- 3: driver thread hang (timeout inside `run_subint_in_worker_thread`)
|
||||
- 4: subint bootstrap raised some other exception
|
||||
|
||||
'''
|
||||
try:
|
||||
run_subint_in_worker_thread(
|
||||
_CHILD_TRIO_BOOTSTRAP,
|
||||
thread_name='child-subint-trio-thread',
|
||||
)
|
||||
except RuntimeError:
|
||||
# timeout / thread-never-returned
|
||||
return 3
|
||||
except BaseException:
|
||||
return 4
|
||||
return 0
|
||||
|
||||
|
||||
# ----------------------------------------------------------------
|
||||
# parent-side harnesses (run inside `trio.run()`)
|
||||
# ----------------------------------------------------------------
|
||||
|
||||
|
||||
async def run_fork_in_non_trio_thread(
|
||||
deadline: float,
|
||||
*,
|
||||
child_target=None,
|
||||
) -> int:
|
||||
'''
|
||||
From inside a parent `trio.run()`, off-load the
|
||||
forkserver primitive to a main-interp worker thread via
|
||||
`trio.to_thread.run_sync()` and return the forked child's
|
||||
pid.
|
||||
|
||||
Then `wait_child()` on that pid (also off-loaded so we
|
||||
don't block trio's event loop on `waitpid()`) and assert
|
||||
the child exited cleanly.
|
||||
|
||||
'''
|
||||
with trio.fail_after(deadline):
|
||||
# NOTE: `fork_from_worker_thread` internally spawns its
|
||||
# own dedicated `threading.Thread` (not from trio's
|
||||
# cache) and joins it before returning — so we can
|
||||
# safely off-load via `to_thread.run_sync` without
|
||||
# worrying about the trio-thread-cache recycling the
|
||||
# runner. Pass `abandon_on_cancel=False` for the
|
||||
# same "bounded + clean" rationale we use in
|
||||
# `_subint.subint_proc`.
|
||||
pid: int = await trio.to_thread.run_sync(
|
||||
partial(
|
||||
fork_from_worker_thread,
|
||||
child_target,
|
||||
thread_name='test-subint-forkserver',
|
||||
),
|
||||
abandon_on_cancel=False,
|
||||
)
|
||||
assert pid > 0
|
||||
|
||||
ok, status_str = await trio.to_thread.run_sync(
|
||||
partial(
|
||||
wait_child,
|
||||
pid,
|
||||
expect_exit_ok=True,
|
||||
),
|
||||
abandon_on_cancel=False,
|
||||
)
|
||||
assert ok, (
|
||||
f'forked child did not exit cleanly: '
|
||||
f'{status_str}'
|
||||
)
|
||||
return pid
|
||||
|
||||
|
||||
# ----------------------------------------------------------------
|
||||
# tests
|
||||
# ----------------------------------------------------------------
|
||||
|
||||
|
||||
# Bounded wall-clock via `pytest-timeout` (`method='thread'`)
|
||||
# for the usual GIL-hostage safety reason documented in the
|
||||
# sibling `test_subint_cancellation.py` / the class-A
|
||||
# `subint_sigint_starvation_issue.md`. Each test also has an
|
||||
# inner `trio.fail_after()` so assertion failures fire fast
|
||||
# under normal conditions.
|
||||
@pytest.mark.timeout(30, method='thread')
|
||||
def test_fork_from_worker_thread_via_trio() -> None:
|
||||
'''
|
||||
Baseline: inside `trio.run()`, call
|
||||
`fork_from_worker_thread()` via `trio.to_thread.run_sync()`,
|
||||
get a child pid back, reap the child cleanly.
|
||||
|
||||
No trio-in-child. If this regresses we know the parent-
|
||||
side trio↔worker-thread plumbing is broken independent
|
||||
of any child-side subint machinery.
|
||||
|
||||
'''
|
||||
deadline: float = 10.0
|
||||
with dump_on_hang(
|
||||
seconds=deadline,
|
||||
path='/tmp/subint_forkserver_baseline.dump',
|
||||
):
|
||||
pid: int = trio.run(
|
||||
partial(run_fork_in_non_trio_thread, deadline),
|
||||
)
|
||||
# parent-side sanity — we got a real pid back.
|
||||
assert isinstance(pid, int) and pid > 0
|
||||
# by now the child has been waited on; it shouldn't be
|
||||
# reap-able again.
|
||||
with pytest.raises((ChildProcessError, OSError)):
|
||||
os.waitpid(pid, os.WNOHANG)
|
||||
|
||||
|
||||
@pytest.mark.timeout(30, method='thread')
|
||||
def test_fork_and_run_trio_in_child() -> None:
|
||||
'''
|
||||
End-to-end: inside the parent's `trio.run()`, off-load
|
||||
`fork_from_worker_thread()` to a worker thread, have the
|
||||
forked child then create a fresh subint and run
|
||||
`trio.run()` inside it on yet another worker thread.
|
||||
|
||||
This is the full "forkserver + trio-in-subint-in-child"
|
||||
pattern the proposed `subint_forkserver` spawn backend
|
||||
would rest on.
|
||||
|
||||
'''
|
||||
deadline: float = 15.0
|
||||
with dump_on_hang(
|
||||
seconds=deadline,
|
||||
path='/tmp/subint_forkserver_trio_in_child.dump',
|
||||
):
|
||||
pid: int = trio.run(
|
||||
partial(
|
||||
run_fork_in_non_trio_thread,
|
||||
deadline,
|
||||
child_target=_child_trio_in_subint,
|
||||
),
|
||||
)
|
||||
assert isinstance(pid, int) and pid > 0
|
||||
|
||||
|
||||
# ----------------------------------------------------------------
|
||||
# tier-3 backend test: drive the registered `subint_forkserver`
|
||||
# spawn backend end-to-end through tractor's actor-nursery +
|
||||
# portal-RPC machinery.
|
||||
# ----------------------------------------------------------------
|
||||
|
||||
|
||||
async def _trivial_rpc() -> str:
|
||||
'''
|
||||
Minimal subactor-side RPC body: just return a sentinel
|
||||
string the parent can assert on.
|
||||
|
||||
'''
|
||||
return 'hello from subint-forkserver child'
|
||||
|
||||
|
||||
async def _happy_path_forkserver(
|
||||
reg_addr: tuple[str, int | str],
|
||||
deadline: float,
|
||||
) -> None:
|
||||
'''
|
||||
Parent-side harness: stand up a root actor, open an actor
|
||||
nursery, spawn one subactor via the currently-selected
|
||||
spawn backend (which this test will have flipped to
|
||||
`subint_forkserver`), run a trivial RPC through its
|
||||
portal, assert the round-trip result.
|
||||
|
||||
'''
|
||||
with trio.fail_after(deadline):
|
||||
async with (
|
||||
tractor.open_root_actor(
|
||||
registry_addrs=[reg_addr],
|
||||
),
|
||||
tractor.open_nursery() as an,
|
||||
):
|
||||
portal: tractor.Portal = await an.run_in_actor(
|
||||
_trivial_rpc,
|
||||
name='subint-forkserver-child',
|
||||
)
|
||||
result: str = await portal.wait_for_result()
|
||||
assert result == 'hello from subint-forkserver child'
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def forkserver_spawn_method():
|
||||
'''
|
||||
Flip `tractor.spawn._spawn._spawn_method` to
|
||||
`'subint_forkserver'` for the duration of a test, then
|
||||
restore whatever was in place before (usually the
|
||||
session-level CLI choice, typically `'trio'`).
|
||||
|
||||
Without this, other tests in the same session would
|
||||
observe the global flip and start spawning via fork —
|
||||
which is almost certainly NOT what their assertions were
|
||||
written against.
|
||||
|
||||
'''
|
||||
prev_method: str = _spawn_mod._spawn_method
|
||||
prev_ctx = _spawn_mod._ctx
|
||||
try_set_start_method('subint_forkserver')
|
||||
try:
|
||||
yield
|
||||
finally:
|
||||
_spawn_mod._spawn_method = prev_method
|
||||
_spawn_mod._ctx = prev_ctx
|
||||
|
||||
|
||||
@pytest.mark.timeout(60, method='thread')
|
||||
def test_subint_forkserver_spawn_basic(
|
||||
reg_addr: tuple[str, int | str],
|
||||
forkserver_spawn_method,
|
||||
) -> None:
|
||||
'''
|
||||
Happy-path: spawn ONE subactor via the
|
||||
`subint_forkserver` backend (parent-side fork from a
|
||||
main-interp worker thread), do a trivial portal-RPC
|
||||
round-trip, tear the nursery down cleanly.
|
||||
|
||||
If this passes, the "forkserver + tractor runtime" arch
|
||||
is proven end-to-end: the registered
|
||||
`subint_forkserver_proc` spawn target successfully
|
||||
forks a child, the child runs `_actor_child_main()` +
|
||||
completes IPC handshake + serves an RPC, and the parent
|
||||
reaps via `_ForkedProc.wait()` without regressing any of
|
||||
the normal nursery teardown invariants.
|
||||
|
||||
'''
|
||||
deadline: float = 20.0
|
||||
with dump_on_hang(
|
||||
seconds=deadline,
|
||||
path='/tmp/subint_forkserver_spawn_basic.dump',
|
||||
):
|
||||
trio.run(
|
||||
partial(
|
||||
_happy_path_forkserver,
|
||||
reg_addr,
|
||||
deadline,
|
||||
),
|
||||
)
|
||||
|
|
@ -21,6 +21,16 @@ _non_linux: bool = platform.system() != 'Linux'
|
|||
_friggin_windows: bool = platform.system() == 'Windows'
|
||||
|
||||
|
||||
pytestmark = pytest.mark.skipon_spawn_backend(
|
||||
'subint',
|
||||
reason=(
|
||||
'XXX SUBINT HANGING TEST XXX\n'
|
||||
'See oustanding issue(s)\n'
|
||||
# TODO, put issue link!
|
||||
)
|
||||
)
|
||||
|
||||
|
||||
async def assert_err(delay=0):
|
||||
await trio.sleep(delay)
|
||||
assert 0
|
||||
|
|
@ -110,8 +120,17 @@ def test_remote_error(reg_addr, args_err):
|
|||
assert exc.boxed_type == errtype
|
||||
|
||||
|
||||
# @pytest.mark.skipon_spawn_backend(
|
||||
# 'subint',
|
||||
# reason=(
|
||||
# 'XXX SUBINT HANGING TEST XXX\n'
|
||||
# 'See oustanding issue(s)\n'
|
||||
# # TODO, put issue link!
|
||||
# )
|
||||
# )
|
||||
def test_multierror(
|
||||
reg_addr: tuple[str, int],
|
||||
start_method: str,
|
||||
):
|
||||
'''
|
||||
Verify we raise a ``BaseExceptionGroup`` out of a nursery where
|
||||
|
|
@ -141,15 +160,28 @@ def test_multierror(
|
|||
trio.run(main)
|
||||
|
||||
|
||||
@pytest.mark.parametrize('delay', (0, 0.5))
|
||||
@pytest.mark.parametrize(
|
||||
'num_subactors', range(25, 26),
|
||||
'delay',
|
||||
(0, 0.5),
|
||||
ids='delays={}'.format,
|
||||
)
|
||||
def test_multierror_fast_nursery(reg_addr, start_method, num_subactors, delay):
|
||||
"""Verify we raise a ``BaseExceptionGroup`` out of a nursery where
|
||||
@pytest.mark.parametrize(
|
||||
'num_subactors',
|
||||
range(25, 26),
|
||||
ids= 'num_subs={}'.format,
|
||||
)
|
||||
def test_multierror_fast_nursery(
|
||||
reg_addr: tuple,
|
||||
start_method: str,
|
||||
num_subactors: int,
|
||||
delay: float,
|
||||
):
|
||||
'''
|
||||
Verify we raise a ``BaseExceptionGroup`` out of a nursery where
|
||||
more then one actor errors and also with a delay before failure
|
||||
to test failure during an ongoing spawning.
|
||||
"""
|
||||
|
||||
'''
|
||||
async def main():
|
||||
async with tractor.open_nursery(
|
||||
registry_addrs=[reg_addr],
|
||||
|
|
@ -189,8 +221,15 @@ async def do_nothing():
|
|||
pass
|
||||
|
||||
|
||||
@pytest.mark.parametrize('mechanism', ['nursery_cancel', KeyboardInterrupt])
|
||||
def test_cancel_single_subactor(reg_addr, mechanism):
|
||||
@pytest.mark.parametrize(
|
||||
'mechanism', [
|
||||
'nursery_cancel',
|
||||
KeyboardInterrupt,
|
||||
])
|
||||
def test_cancel_single_subactor(
|
||||
reg_addr: tuple,
|
||||
mechanism: str|KeyboardInterrupt,
|
||||
):
|
||||
'''
|
||||
Ensure a ``ActorNursery.start_actor()`` spawned subactor
|
||||
cancels when the nursery is cancelled.
|
||||
|
|
@ -232,9 +271,12 @@ async def stream_forever():
|
|||
await trio.sleep(0.01)
|
||||
|
||||
|
||||
@tractor_test
|
||||
async def test_cancel_infinite_streamer(start_method):
|
||||
|
||||
@tractor_test(
|
||||
timeout=6,
|
||||
)
|
||||
async def test_cancel_infinite_streamer(
|
||||
start_method: str
|
||||
):
|
||||
# stream for at most 1 seconds
|
||||
with (
|
||||
trio.fail_after(4),
|
||||
|
|
@ -257,6 +299,14 @@ async def test_cancel_infinite_streamer(start_method):
|
|||
assert n.cancelled
|
||||
|
||||
|
||||
# @pytest.mark.skipon_spawn_backend(
|
||||
# 'subint',
|
||||
# reason=(
|
||||
# 'XXX SUBINT HANGING TEST XXX\n'
|
||||
# 'See oustanding issue(s)\n'
|
||||
# # TODO, put issue link!
|
||||
# )
|
||||
# )
|
||||
@pytest.mark.parametrize(
|
||||
'num_actors_and_errs',
|
||||
[
|
||||
|
|
@ -286,7 +336,9 @@ async def test_cancel_infinite_streamer(start_method):
|
|||
'no_daemon_actors_fail_all_run_in_actors_sleep_then_fail',
|
||||
],
|
||||
)
|
||||
@tractor_test
|
||||
@tractor_test(
|
||||
timeout=10,
|
||||
)
|
||||
async def test_some_cancels_all(
|
||||
num_actors_and_errs: tuple,
|
||||
start_method: str,
|
||||
|
|
@ -370,7 +422,10 @@ async def test_some_cancels_all(
|
|||
pytest.fail("Should have gotten a remote assertion error?")
|
||||
|
||||
|
||||
async def spawn_and_error(breadth, depth) -> None:
|
||||
async def spawn_and_error(
|
||||
breadth: int,
|
||||
depth: int,
|
||||
) -> None:
|
||||
name = tractor.current_actor().name
|
||||
async with tractor.open_nursery() as nursery:
|
||||
for i in range(breadth):
|
||||
|
|
@ -396,7 +451,10 @@ async def spawn_and_error(breadth, depth) -> None:
|
|||
|
||||
|
||||
@tractor_test
|
||||
async def test_nested_multierrors(loglevel, start_method):
|
||||
async def test_nested_multierrors(
|
||||
loglevel: str,
|
||||
start_method: str,
|
||||
):
|
||||
'''
|
||||
Test that failed actor sets are wrapped in `BaseExceptionGroup`s. This
|
||||
test goes only 2 nurseries deep but we should eventually have tests
|
||||
|
|
@ -483,20 +541,21 @@ async def test_nested_multierrors(loglevel, start_method):
|
|||
|
||||
@no_windows
|
||||
def test_cancel_via_SIGINT(
|
||||
loglevel,
|
||||
start_method,
|
||||
spawn_backend,
|
||||
loglevel: str,
|
||||
start_method: str,
|
||||
):
|
||||
"""Ensure that a control-C (SIGINT) signal cancels both the parent and
|
||||
'''
|
||||
Ensure that a control-C (SIGINT) signal cancels both the parent and
|
||||
child processes in trionic fashion
|
||||
"""
|
||||
|
||||
'''
|
||||
pid: int = os.getpid()
|
||||
|
||||
async def main():
|
||||
with trio.fail_after(2):
|
||||
async with tractor.open_nursery() as tn:
|
||||
await tn.start_actor('sucka')
|
||||
if 'mp' in spawn_backend:
|
||||
if 'mp' in start_method:
|
||||
time.sleep(0.1)
|
||||
os.kill(pid, signal.SIGINT)
|
||||
await trio.sleep_forever()
|
||||
|
|
@ -580,6 +639,14 @@ async def spawn_sub_with_sync_blocking_task():
|
|||
print('exiting first subactor layer..\n')
|
||||
|
||||
|
||||
# @pytest.mark.skipon_spawn_backend(
|
||||
# 'subint',
|
||||
# reason=(
|
||||
# 'XXX SUBINT HANGING TEST XXX\n'
|
||||
# 'See oustanding issue(s)\n'
|
||||
# # TODO, put issue link!
|
||||
# )
|
||||
# )
|
||||
@pytest.mark.parametrize(
|
||||
'man_cancel_outer',
|
||||
[
|
||||
|
|
@ -694,7 +761,7 @@ def test_cancel_while_childs_child_in_sync_sleep(
|
|||
|
||||
|
||||
def test_fast_graceful_cancel_when_spawn_task_in_soft_proc_wait_for_daemon(
|
||||
start_method,
|
||||
start_method: str,
|
||||
):
|
||||
'''
|
||||
This is a very subtle test which demonstrates how cancellation
|
||||
|
|
|
|||
|
|
@ -26,6 +26,15 @@ from tractor._testing import (
|
|||
|
||||
from .conftest import cpu_scaling_factor
|
||||
|
||||
pytestmark = pytest.mark.skipon_spawn_backend(
|
||||
'subint',
|
||||
reason=(
|
||||
'XXX SUBINT GIL-CONTENTION HANGING TEST XXX\n'
|
||||
'See oustanding issue(s)\n'
|
||||
# TODO, put issue link!
|
||||
)
|
||||
)
|
||||
|
||||
# XXX TODO cases:
|
||||
# - [x] WE cancelled the peer and thus should not see any raised
|
||||
# `ContextCancelled` as it should be reaped silently?
|
||||
|
|
|
|||
|
|
@ -7,6 +7,14 @@ import tractor
|
|||
from tractor.experimental import msgpub
|
||||
from tractor._testing import tractor_test
|
||||
|
||||
pytestmark = pytest.mark.skipon_spawn_backend(
|
||||
'subint',
|
||||
reason=(
|
||||
'XXX SUBINT HANGING TEST XXX\n'
|
||||
'See oustanding issue(s)\n'
|
||||
# TODO, put issue link!
|
||||
)
|
||||
)
|
||||
|
||||
def test_type_checks():
|
||||
|
||||
|
|
|
|||
|
|
@ -14,6 +14,14 @@ from tractor.ipc._shm import (
|
|||
attach_shm_list,
|
||||
)
|
||||
|
||||
pytestmark = pytest.mark.skipon_spawn_backend(
|
||||
'subint',
|
||||
reason=(
|
||||
'XXX SUBINT GIL-CONTENTION HANGING TEST XXX\n'
|
||||
'See oustanding issue(s)\n'
|
||||
# TODO, put issue link!
|
||||
)
|
||||
)
|
||||
|
||||
@tractor.context
|
||||
async def child_attach_shml_alot(
|
||||
|
|
|
|||
|
|
@ -161,6 +161,14 @@ def test_subint_happy_teardown(
|
|||
trio.run(partial(_happy_path, reg_addr, deadline))
|
||||
|
||||
|
||||
@pytest.mark.skipon_spawn_backend(
|
||||
'subint',
|
||||
reason=(
|
||||
'XXX SUBINT HANGING TEST XXX\n'
|
||||
'See oustanding issue(s)\n'
|
||||
# TODO, put issue link!
|
||||
)
|
||||
)
|
||||
# Wall-clock bound via `pytest-timeout` (`method='thread'`)
|
||||
# as defense-in-depth over the inner `trio.fail_after(15)`.
|
||||
# Under the orphaned-channel hang class described in
|
||||
|
|
|
|||
|
|
@ -224,8 +224,10 @@ def pytest_addoption(
|
|||
)
|
||||
|
||||
|
||||
def pytest_configure(config):
|
||||
backend = config.option.spawn_backend
|
||||
def pytest_configure(
|
||||
config: pytest.Config,
|
||||
):
|
||||
backend: str = config.option.spawn_backend
|
||||
from tractor.spawn._spawn import try_set_start_method
|
||||
try:
|
||||
try_set_start_method(backend)
|
||||
|
|
@ -241,10 +243,52 @@ def pytest_configure(config):
|
|||
'markers',
|
||||
'no_tpt(proto_key): test will (likely) not behave with tpt backend'
|
||||
)
|
||||
config.addinivalue_line(
|
||||
'markers',
|
||||
'skipon_spawn_backend(*start_methods, reason=None): '
|
||||
'skip this test under any of the given `--spawn-backend` '
|
||||
'values; useful for backend-specific known-hang / -borked '
|
||||
'cases (e.g. the `subint` GIL-starvation class documented '
|
||||
'in `ai/conc-anal/subint_sigint_starvation_issue.md`).'
|
||||
)
|
||||
|
||||
|
||||
def pytest_collection_modifyitems(
|
||||
config: pytest.Config,
|
||||
items: list[pytest.Function],
|
||||
):
|
||||
'''
|
||||
Expand any `@pytest.mark.skipon_spawn_backend('<backend>'[,
|
||||
...], reason='...')` markers into concrete
|
||||
`pytest.mark.skip(reason=...)` calls for tests whose
|
||||
backend-arg set contains the active `--spawn-backend`.
|
||||
|
||||
Uses `item.iter_markers(name=...)` which walks function +
|
||||
class + module-level marks in the correct scope order (and
|
||||
handles both the single-`MarkDecorator` and `list[Mark]`
|
||||
forms of a module-level `pytestmark`) — so the same marker
|
||||
works at any level a user puts it.
|
||||
|
||||
'''
|
||||
backend: str = config.option.spawn_backend
|
||||
default_reason: str = f'Borked on --spawn-backend={backend!r}'
|
||||
for item in items:
|
||||
for mark in item.iter_markers(name='skipon_spawn_backend'):
|
||||
if backend in mark.args:
|
||||
reason: str = mark.kwargs.get(
|
||||
'reason',
|
||||
default_reason,
|
||||
)
|
||||
item.add_marker(pytest.mark.skip(reason=reason))
|
||||
# first matching mark wins; no value in stacking
|
||||
# multiple `skip`s on the same item.
|
||||
break
|
||||
|
||||
|
||||
@pytest.fixture(scope='session')
|
||||
def debug_mode(request) -> bool:
|
||||
def debug_mode(
|
||||
request: pytest.FixtureRequest,
|
||||
) -> bool:
|
||||
'''
|
||||
Flag state for whether `--tpdb` (for `tractor`-py-debugger)
|
||||
was passed to the test run.
|
||||
|
|
@ -258,12 +302,16 @@ def debug_mode(request) -> bool:
|
|||
|
||||
|
||||
@pytest.fixture(scope='session')
|
||||
def spawn_backend(request) -> str:
|
||||
def spawn_backend(
|
||||
request: pytest.FixtureRequest,
|
||||
) -> str:
|
||||
return request.config.option.spawn_backend
|
||||
|
||||
|
||||
@pytest.fixture(scope='session')
|
||||
def tpt_protos(request) -> list[str]:
|
||||
def tpt_protos(
|
||||
request: pytest.FixtureRequest,
|
||||
) -> list[str]:
|
||||
|
||||
# allow quoting on CLI
|
||||
proto_keys: list[str] = [
|
||||
|
|
@ -291,7 +339,7 @@ def tpt_protos(request) -> list[str]:
|
|||
autouse=True,
|
||||
)
|
||||
def tpt_proto(
|
||||
request,
|
||||
request: pytest.FixtureRequest,
|
||||
tpt_protos: list[str],
|
||||
) -> str:
|
||||
proto_key: str = tpt_protos[0]
|
||||
|
|
@ -343,7 +391,6 @@ def pytest_generate_tests(
|
|||
metafunc: pytest.Metafunc,
|
||||
):
|
||||
spawn_backend: str = metafunc.config.option.spawn_backend
|
||||
|
||||
if not spawn_backend:
|
||||
# XXX some weird windows bug with `pytest`?
|
||||
spawn_backend = 'trio'
|
||||
|
|
|
|||
|
|
@ -63,6 +63,22 @@ SpawnMethodKey = Literal[
|
|||
'mp_spawn',
|
||||
'mp_forkserver', # posix only
|
||||
'subint', # py3.14+ via `concurrent.interpreters` (PEP 734)
|
||||
# EXPERIMENTAL — blocked at the CPython level. The
|
||||
# design goal was a `trio+fork`-safe subproc spawn via
|
||||
# `os.fork()` from a trio-free launchpad sub-interpreter,
|
||||
# but CPython's `PyOS_AfterFork_Child` → `_PyInterpreterState_DeleteExceptMain`
|
||||
# requires fork come from the main interp. See
|
||||
# `tractor.spawn._subint_fork` +
|
||||
# `ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md`
|
||||
# + issue #379 for the full analysis.
|
||||
'subint_fork',
|
||||
# EXPERIMENTAL — the `subint_fork` workaround. `os.fork()`
|
||||
# from a non-trio worker thread (never entered a subint)
|
||||
# is CPython-legal and works cleanly; forked child runs
|
||||
# `tractor._child._actor_child_main()` against a trio
|
||||
# runtime, exactly like `trio_proc` but via fork instead
|
||||
# of subproc-exec. See `tractor.spawn._subint_forkserver`.
|
||||
'subint_forkserver',
|
||||
]
|
||||
_spawn_method: SpawnMethodKey = 'trio'
|
||||
|
||||
|
|
@ -115,15 +131,14 @@ def try_set_start_method(
|
|||
case 'trio':
|
||||
_ctx = None
|
||||
|
||||
case 'subint':
|
||||
# subints need no `mp.context`; feature-gate on the
|
||||
# py3.14 public `concurrent.interpreters` wrapper
|
||||
# (PEP 734). We actually drive the private
|
||||
# `_interpreters` C module in legacy mode — see
|
||||
# `tractor.spawn._subint` for why — but py3.13's
|
||||
# vintage of that private module hangs under our
|
||||
# multi-trio usage, so we refuse it via the public-
|
||||
# module presence check.
|
||||
case 'subint' | 'subint_fork' | 'subint_forkserver':
|
||||
# All subint-family backends need no `mp.context`;
|
||||
# all three feature-gate on the py3.14 public
|
||||
# `concurrent.interpreters` wrapper (PEP 734). See
|
||||
# `tractor.spawn._subint` for the detailed
|
||||
# reasoning. `subint_fork` is blocked at the
|
||||
# CPython level (raises `NotImplementedError`);
|
||||
# `subint_forkserver` is the working workaround.
|
||||
from ._subint import _has_subints
|
||||
if not _has_subints:
|
||||
raise RuntimeError(
|
||||
|
|
@ -461,6 +476,8 @@ async def new_proc(
|
|||
from ._trio import trio_proc
|
||||
from ._mp import mp_proc
|
||||
from ._subint import subint_proc
|
||||
from ._subint_fork import subint_fork_proc
|
||||
from ._subint_forkserver import subint_forkserver_proc
|
||||
|
||||
|
||||
# proc spawning backend target map
|
||||
|
|
@ -469,4 +486,14 @@ _methods: dict[SpawnMethodKey, Callable] = {
|
|||
'mp_spawn': mp_proc,
|
||||
'mp_forkserver': mp_proc,
|
||||
'subint': subint_proc,
|
||||
# blocked at CPython level — see `_subint_fork.py` +
|
||||
# `ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md`.
|
||||
# Kept here so `--spawn-backend=subint_fork` routes to a
|
||||
# clean `NotImplementedError` with pointer to the analysis,
|
||||
# rather than an "invalid backend" error.
|
||||
'subint_fork': subint_fork_proc,
|
||||
# WIP — fork-from-non-trio-worker-thread, works on py3.14+
|
||||
# (validated via `ai/conc-anal/subint_fork_from_main_thread_smoketest.py`).
|
||||
# See `tractor.spawn._subint_forkserver`.
|
||||
'subint_forkserver': subint_forkserver_proc,
|
||||
}
|
||||
|
|
|
|||
|
|
@ -431,3 +431,5 @@ async def subint_proc(
|
|||
finally:
|
||||
if not cancelled_during_spawn:
|
||||
actor_nursery._children.pop(uid, None)
|
||||
|
||||
|
||||
|
|
|
|||
|
|
@ -0,0 +1,153 @@
|
|||
# tractor: structured concurrent "actors".
|
||||
# Copyright 2018-eternity Tyler Goodlet.
|
||||
|
||||
# This program is free software: you can redistribute it and/or modify
|
||||
# it under the terms of the GNU Affero General Public License as published by
|
||||
# the Free Software Foundation, either version 3 of the License, or
|
||||
# (at your option) any later version.
|
||||
|
||||
# This program is distributed in the hope that it will be useful,
|
||||
# but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||
# GNU Affero General Public License for more details.
|
||||
|
||||
# You should have received a copy of the GNU Affero General Public License
|
||||
# along with this program. If not, see <https://www.gnu.org/licenses/>.
|
||||
|
||||
'''
|
||||
`subint_fork` spawn backend — BLOCKED at CPython level.
|
||||
|
||||
The idea was to use a sub-interpreter purely as a launchpad
|
||||
from which to call `os.fork()`, sidestepping the well-known
|
||||
trio+fork issues (python-trio/trio#1614 etc.) by guaranteeing
|
||||
the forking interp had never imported `trio`.
|
||||
|
||||
**IT DOES NOT WORK ON CURRENT CPYTHON.** The fork syscall
|
||||
itself succeeds (in the parent), but the forked CHILD
|
||||
process aborts immediately during CPython's post-fork
|
||||
cleanup — `PyOS_AfterFork_Child()` calls
|
||||
`_PyInterpreterState_DeleteExceptMain()` which refuses to
|
||||
operate when the current tstate belongs to a non-main
|
||||
sub-interpreter.
|
||||
|
||||
Full annotated walkthrough from the user-visible error
|
||||
(`Fatal Python error: _PyInterpreterState_DeleteExceptMain:
|
||||
not main interpreter`) down to the specific CPython source
|
||||
lines that enforce this is in
|
||||
`ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md`.
|
||||
|
||||
We keep this submodule as a dedicated documentation of the
|
||||
attempt. If CPython ever lifts the restriction (e.g., via a
|
||||
force-destroy primitive or a hook that swaps tstate to main
|
||||
pre-fork), the structural sketch preserved in this file's
|
||||
git history is a concrete starting point for a working impl.
|
||||
|
||||
See also: issue #379's "Our own thoughts, ideas for
|
||||
`fork()`-workaround/hacks..." section.
|
||||
|
||||
'''
|
||||
from __future__ import annotations
|
||||
import sys
|
||||
from typing import (
|
||||
Any,
|
||||
TYPE_CHECKING,
|
||||
)
|
||||
|
||||
import trio
|
||||
from trio import TaskStatus
|
||||
|
||||
from tractor.runtime._portal import Portal
|
||||
from ._subint import _has_subints
|
||||
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from tractor.discovery._addr import UnwrappedAddress
|
||||
from tractor.runtime._runtime import Actor
|
||||
from tractor.runtime._supervise import ActorNursery
|
||||
|
||||
|
||||
async def subint_fork_proc(
|
||||
name: str,
|
||||
actor_nursery: ActorNursery,
|
||||
subactor: Actor,
|
||||
errors: dict[tuple[str, str], Exception],
|
||||
|
||||
bind_addrs: list[UnwrappedAddress],
|
||||
parent_addr: UnwrappedAddress,
|
||||
_runtime_vars: dict[str, Any],
|
||||
*,
|
||||
infect_asyncio: bool = False,
|
||||
task_status: TaskStatus[Portal] = trio.TASK_STATUS_IGNORED,
|
||||
proc_kwargs: dict[str, any] = {},
|
||||
|
||||
) -> None:
|
||||
'''
|
||||
EXPERIMENTAL — currently blocked by a CPython invariant.
|
||||
|
||||
Attempted design
|
||||
----------------
|
||||
1. Parent creates a fresh legacy-config subint.
|
||||
2. A worker OS-thread drives the subint through a
|
||||
bootstrap that calls `os.fork()`.
|
||||
3. In the forked CHILD, `os.execv()` back into
|
||||
`python -m tractor._child` (fresh process).
|
||||
4. In the fork-PARENT, the launchpad subint is destroyed;
|
||||
parent-side trio task proceeds identically to
|
||||
`trio_proc()` (wait for child connect-back, send
|
||||
`SpawnSpec`, yield `Portal`, etc.).
|
||||
|
||||
Why it doesn't work
|
||||
-------------------
|
||||
CPython's `PyOS_AfterFork_Child()` (in
|
||||
`Modules/posixmodule.c`) calls
|
||||
`_PyInterpreterState_DeleteExceptMain()` (in
|
||||
`Python/pystate.c`) as part of post-fork cleanup. That
|
||||
function requires the current `PyThreadState` belong to
|
||||
the **main** interpreter. When `os.fork()` is called
|
||||
from within a sub-interpreter, the child wakes up with
|
||||
its tstate still pointing at the (now-stale) subint, and
|
||||
this check fails with `PyStatus_ERR("not main
|
||||
interpreter")`, triggering a `fatal_error` goto and
|
||||
aborting the child process.
|
||||
|
||||
CPython devs acknowledge the fragility with a
|
||||
`// Ideally we could guarantee tstate is running main.`
|
||||
comment right above the call site.
|
||||
|
||||
See
|
||||
`ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md`
|
||||
for the full annotated walkthrough + upstream-report
|
||||
draft.
|
||||
|
||||
Why we keep this stub
|
||||
---------------------
|
||||
- Documents the attempt in-tree so the next person who
|
||||
has this idea finds the reason it doesn't work rather
|
||||
than rediscovering the same CPython-level dead end.
|
||||
- If CPython ever lifts the restriction (e.g., via a
|
||||
force-destroy primitive or a hook that swaps tstate
|
||||
to main pre-fork), this submodule's git history holds
|
||||
the structural sketch of what a working impl would
|
||||
look like.
|
||||
|
||||
'''
|
||||
if not _has_subints:
|
||||
raise RuntimeError(
|
||||
f'The {"subint_fork"!r} spawn backend requires '
|
||||
f'Python 3.14+.\n'
|
||||
f'Current runtime: {sys.version}'
|
||||
)
|
||||
|
||||
raise NotImplementedError(
|
||||
'The `subint_fork` spawn backend is blocked at the '
|
||||
'CPython level — `os.fork()` from a non-main '
|
||||
'sub-interpreter is refused by '
|
||||
'`PyOS_AfterFork_Child()` → '
|
||||
'`_PyInterpreterState_DeleteExceptMain()`, which '
|
||||
'aborts the child with '
|
||||
'`Fatal Python error: not main interpreter`.\n'
|
||||
'\n'
|
||||
'See '
|
||||
'`ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md` '
|
||||
'for the full analysis + upstream-report draft.'
|
||||
)
|
||||
|
|
@ -0,0 +1,687 @@
|
|||
# tractor: structured concurrent "actors".
|
||||
# Copyright 2018-eternity Tyler Goodlet.
|
||||
|
||||
# This program is free software: you can redistribute it and/or modify
|
||||
# it under the terms of the GNU Affero General Public License as published by
|
||||
# the Free Software Foundation, either version 3 of the License, or
|
||||
# (at your option) any later version.
|
||||
|
||||
# This program is distributed in the hope that it will be useful,
|
||||
# but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||
# GNU Affero General Public License for more details.
|
||||
|
||||
# You should have received a copy of the GNU Affero General Public License
|
||||
# along with this program. If not, see <https://www.gnu.org/licenses/>.
|
||||
|
||||
'''
|
||||
Forkserver-style `os.fork()` primitives for the `subint`-hosted
|
||||
actor model.
|
||||
|
||||
Background
|
||||
----------
|
||||
CPython refuses `os.fork()` from a non-main sub-interpreter:
|
||||
`PyOS_AfterFork_Child()` →
|
||||
`_PyInterpreterState_DeleteExceptMain()` gates on the calling
|
||||
thread's tstate belonging to the main interpreter and aborts
|
||||
the forked child otherwise. The full walkthrough (with source
|
||||
refs) lives in
|
||||
`ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md`.
|
||||
|
||||
However `os.fork()` from a regular `threading.Thread` attached
|
||||
to the *main* interpreter — i.e. a worker thread that has
|
||||
never entered a subint — works cleanly. Empirically validated
|
||||
across four scenarios by
|
||||
`ai/conc-anal/subint_fork_from_main_thread_smoketest.py` on
|
||||
py3.14.
|
||||
|
||||
This submodule lifts the validated primitives out of the
|
||||
smoke-test and into tractor proper, so they can eventually be
|
||||
wired into a real "subint forkserver" spawn backend — where:
|
||||
|
||||
- A dedicated main-interp worker thread owns all `os.fork()`
|
||||
calls (never enters a subint).
|
||||
- The tractor parent-actor's `trio.run()` lives in a
|
||||
sub-interpreter on a different worker thread.
|
||||
- When a spawn is requested, the trio-task signals the
|
||||
forkserver thread; the forkserver forks; child re-enters
|
||||
the same pattern (trio in a subint + forkserver on main).
|
||||
|
||||
This mirrors the stdlib `multiprocessing.forkserver` design
|
||||
but keeps the forkserver in-process for faster spawn latency
|
||||
and inherited parent state.
|
||||
|
||||
Status
|
||||
------
|
||||
**EXPERIMENTAL** — wired as the `'subint_forkserver'` entry
|
||||
in `tractor.spawn._spawn._methods` and selectable via
|
||||
`try_set_start_method('subint_forkserver')` / `--spawn-backend
|
||||
=subint_forkserver`. Parent-side spawn, child-side runtime
|
||||
bring-up and normal portal-RPC teardown are validated by the
|
||||
backend-tier test in
|
||||
`tests/spawn/test_subint_forkserver.py::test_subint_forkserver_spawn_basic`.
|
||||
|
||||
Still-open work (tracked on tractor #379):
|
||||
|
||||
- no cancellation / hard-kill stress coverage yet (counterpart
|
||||
to `tests/test_subint_cancellation.py` for the plain
|
||||
`subint` backend),
|
||||
- child-side "subint-hosted root runtime" mode (the second
|
||||
half of the envisioned arch — currently the forked child
|
||||
runs plain `_trio_main` via `spawn_method='trio'`; the
|
||||
subint-hosted variant is still the future step gated on
|
||||
msgspec PEP 684 support),
|
||||
- thread-hygiene audit of the two `threading.Thread`
|
||||
primitives below, gated on the same msgspec unblock
|
||||
(see TODO section further down).
|
||||
|
||||
TODO — cleanup gated on msgspec PEP 684 support
|
||||
-----------------------------------------------
|
||||
Both primitives below allocate a dedicated
|
||||
`threading.Thread` rather than using
|
||||
`trio.to_thread.run_sync()`. That's a cautious design
|
||||
rooted in three distinct-but-entangled issues (GIL
|
||||
starvation from legacy-config subints, tstate-recycling
|
||||
destroy race on trio cache threads, fork-from-main-tstate
|
||||
invariant). Some of those dissolve under PEP 684
|
||||
isolated-mode subints; one requires empirical re-testing
|
||||
to know.
|
||||
|
||||
Full analysis + audit plan for when we can revisit is in
|
||||
`ai/conc-anal/subint_forkserver_thread_constraints_on_pep684_issue.md`.
|
||||
Intent: file a follow-up GH issue linked to #379 once
|
||||
[jcrist/msgspec#563](https://github.com/jcrist/msgspec/issues/563)
|
||||
unblocks isolated-mode subints in tractor.
|
||||
|
||||
See also
|
||||
--------
|
||||
- `tractor.spawn._subint_fork` — the stub for the
|
||||
fork-from-subint strategy that DIDN'T work (kept as
|
||||
in-tree documentation of the attempt + CPython-level
|
||||
block).
|
||||
- `ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md`
|
||||
— the CPython source walkthrough.
|
||||
- `ai/conc-anal/subint_fork_from_main_thread_smoketest.py`
|
||||
— the standalone feasibility check (now delegates to
|
||||
this module for the primitives it exercises).
|
||||
|
||||
'''
|
||||
from __future__ import annotations
|
||||
import os
|
||||
import signal
|
||||
import sys
|
||||
import threading
|
||||
from functools import partial
|
||||
from typing import (
|
||||
Any,
|
||||
Callable,
|
||||
TYPE_CHECKING,
|
||||
)
|
||||
|
||||
import trio
|
||||
from trio import TaskStatus
|
||||
|
||||
from tractor.log import get_logger
|
||||
from tractor.msg import (
|
||||
types as msgtypes,
|
||||
pretty_struct,
|
||||
)
|
||||
from tractor.runtime._state import current_actor
|
||||
from tractor.runtime._portal import Portal
|
||||
from ._spawn import (
|
||||
cancel_on_completion,
|
||||
soft_kill,
|
||||
)
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from tractor.discovery._addr import UnwrappedAddress
|
||||
from tractor.ipc import (
|
||||
_server,
|
||||
)
|
||||
from tractor.runtime._runtime import Actor
|
||||
from tractor.runtime._supervise import ActorNursery
|
||||
|
||||
|
||||
log = get_logger('tractor')
|
||||
|
||||
|
||||
# Feature-gate: py3.14+ via the public `concurrent.interpreters`
|
||||
# wrapper. Matches the gate in `tractor.spawn._subint` —
|
||||
# see that module's docstring for why we require the public
|
||||
# API's presence even though we reach into the private
|
||||
# `_interpreters` C module for actual calls.
|
||||
try:
|
||||
from concurrent import interpreters as _public_interpreters # noqa: F401 # type: ignore
|
||||
import _interpreters # type: ignore
|
||||
_has_subints: bool = True
|
||||
except ImportError:
|
||||
_interpreters = None # type: ignore
|
||||
_has_subints: bool = False
|
||||
|
||||
|
||||
def _format_child_exit(
|
||||
status: int,
|
||||
) -> str:
|
||||
'''
|
||||
Render `os.waitpid()`-returned status as a short human
|
||||
string (`'rc=0'` / `'signal=SIGABRT'` / etc.) for log
|
||||
output.
|
||||
|
||||
'''
|
||||
if os.WIFEXITED(status):
|
||||
return f'rc={os.WEXITSTATUS(status)}'
|
||||
elif os.WIFSIGNALED(status):
|
||||
sig: int = os.WTERMSIG(status)
|
||||
return f'signal={signal.Signals(sig).name}'
|
||||
else:
|
||||
return f'raw_status={status}'
|
||||
|
||||
|
||||
def wait_child(
|
||||
pid: int,
|
||||
*,
|
||||
expect_exit_ok: bool = True,
|
||||
) -> tuple[bool, str]:
|
||||
'''
|
||||
`os.waitpid()` + classify the child's exit as
|
||||
expected-or-not.
|
||||
|
||||
`expect_exit_ok=True` → expect clean `rc=0`. `False` →
|
||||
expect abnormal death (any signal or nonzero rc). Used
|
||||
by the control-case smoke-test scenario where CPython
|
||||
is meant to abort the child.
|
||||
|
||||
Returns `(ok, status_str)` — `ok` reflects whether the
|
||||
observed outcome matches `expect_exit_ok`, `status_str`
|
||||
is a short render of the actual status.
|
||||
|
||||
'''
|
||||
_, status = os.waitpid(pid, 0)
|
||||
exited_normally: bool = (
|
||||
os.WIFEXITED(status)
|
||||
and
|
||||
os.WEXITSTATUS(status) == 0
|
||||
)
|
||||
ok: bool = (
|
||||
exited_normally
|
||||
if expect_exit_ok
|
||||
else not exited_normally
|
||||
)
|
||||
return ok, _format_child_exit(status)
|
||||
|
||||
|
||||
def fork_from_worker_thread(
|
||||
child_target: Callable[[], int] | None = None,
|
||||
*,
|
||||
thread_name: str = 'subint-forkserver',
|
||||
join_timeout: float = 10.0,
|
||||
|
||||
) -> int:
|
||||
'''
|
||||
`os.fork()` from a main-interp worker thread; return the
|
||||
forked child's pid.
|
||||
|
||||
The calling context **must** be the main interpreter
|
||||
(not a subinterpreter) — that's the whole point of this
|
||||
primitive. A regular `threading.Thread(target=...)`
|
||||
spawned from main-interp code satisfies this
|
||||
automatically because Python attaches the thread's
|
||||
tstate to the *calling* interpreter, and our main
|
||||
thread's calling interp is always main.
|
||||
|
||||
If `child_target` is provided, it runs IN the forked
|
||||
child process before `os._exit` is called. The callable
|
||||
should return an int used as the child's exit rc. If
|
||||
`child_target` is None, the child `_exit(0)`s immediately
|
||||
(useful for the baseline sanity case).
|
||||
|
||||
On the PARENT side, this function drives the worker
|
||||
thread to completion (`fork()` returns near-instantly;
|
||||
the thread is expected to exit promptly) and then
|
||||
returns the forked child's pid. Raises `RuntimeError`
|
||||
if the worker thread fails to return within
|
||||
`join_timeout` seconds — that'd be an unexpected CPython
|
||||
pathology.
|
||||
|
||||
'''
|
||||
if not _has_subints:
|
||||
raise RuntimeError(
|
||||
'subint-forkserver primitives require Python '
|
||||
'3.14+ (public `concurrent.interpreters` module '
|
||||
'not present on this runtime).'
|
||||
)
|
||||
|
||||
# Use a pipe to shuttle the forked child's pid from the
|
||||
# worker thread back to the caller.
|
||||
rfd, wfd = os.pipe()
|
||||
|
||||
def _worker() -> None:
|
||||
'''
|
||||
Runs on the forkserver worker thread. Forks; child
|
||||
runs `child_target` (if any) and exits; parent side
|
||||
writes the child pid to the pipe so the main-thread
|
||||
caller can retrieve it.
|
||||
|
||||
'''
|
||||
pid: int = os.fork()
|
||||
if pid == 0:
|
||||
# CHILD: close the pid-pipe ends (we don't use
|
||||
# them here), run the user callable if any, exit.
|
||||
os.close(rfd)
|
||||
os.close(wfd)
|
||||
rc: int = 0
|
||||
if child_target is not None:
|
||||
try:
|
||||
rc = child_target() or 0
|
||||
except BaseException as err:
|
||||
log.error(
|
||||
f'subint-forkserver child_target '
|
||||
f'raised:\n'
|
||||
f'|_{type(err).__name__}: {err}'
|
||||
)
|
||||
rc = 2
|
||||
os._exit(rc)
|
||||
else:
|
||||
# PARENT (still inside the worker thread):
|
||||
# hand the child pid back to main via pipe.
|
||||
os.write(wfd, pid.to_bytes(8, 'little'))
|
||||
|
||||
worker: threading.Thread = threading.Thread(
|
||||
target=_worker,
|
||||
name=thread_name,
|
||||
daemon=False,
|
||||
)
|
||||
worker.start()
|
||||
worker.join(timeout=join_timeout)
|
||||
if worker.is_alive():
|
||||
# Pipe cleanup best-effort before bail.
|
||||
try:
|
||||
os.close(rfd)
|
||||
except OSError:
|
||||
pass
|
||||
try:
|
||||
os.close(wfd)
|
||||
except OSError:
|
||||
pass
|
||||
raise RuntimeError(
|
||||
f'subint-forkserver worker thread '
|
||||
f'{thread_name!r} did not return within '
|
||||
f'{join_timeout}s — this is unexpected since '
|
||||
f'`os.fork()` should return near-instantly on '
|
||||
f'the parent side.'
|
||||
)
|
||||
|
||||
pid_bytes: bytes = os.read(rfd, 8)
|
||||
os.close(rfd)
|
||||
os.close(wfd)
|
||||
pid: int = int.from_bytes(pid_bytes, 'little')
|
||||
log.runtime(
|
||||
f'subint-forkserver forked child\n'
|
||||
f'(>\n'
|
||||
f' |_pid={pid}\n'
|
||||
)
|
||||
return pid
|
||||
|
||||
|
||||
def run_subint_in_worker_thread(
|
||||
bootstrap: str,
|
||||
*,
|
||||
thread_name: str = 'subint-trio',
|
||||
join_timeout: float = 10.0,
|
||||
|
||||
) -> None:
|
||||
'''
|
||||
Create a fresh legacy-config sub-interpreter and drive
|
||||
the given `bootstrap` code string through
|
||||
`_interpreters.exec()` on a dedicated worker thread.
|
||||
|
||||
Naming mirrors `fork_from_worker_thread()`:
|
||||
"<action>_in_worker_thread" — the action here is "run a
|
||||
subint", not "run trio" per se. Typical `bootstrap`
|
||||
content does import `trio` + call `trio.run()`, but
|
||||
nothing about this primitive requires trio; it's a
|
||||
generic "host a subint on a worker thread" helper.
|
||||
Intended mainly for use inside a fork-child (see
|
||||
`tractor.spawn._subint_forkserver` module docstring) but
|
||||
works anywhere.
|
||||
|
||||
See `tractor.spawn._subint.subint_proc` for the matching
|
||||
pattern tractor uses at the sub-actor level.
|
||||
|
||||
Destroys the subint after the thread joins.
|
||||
|
||||
'''
|
||||
if not _has_subints:
|
||||
raise RuntimeError(
|
||||
'subint-forkserver primitives require Python '
|
||||
'3.14+.'
|
||||
)
|
||||
|
||||
interp_id: int = _interpreters.create('legacy')
|
||||
log.runtime(
|
||||
f'Created child-side subint for trio.run()\n'
|
||||
f'(>\n'
|
||||
f' |_interp_id={interp_id}\n'
|
||||
)
|
||||
|
||||
err: BaseException | None = None
|
||||
|
||||
def _drive() -> None:
|
||||
nonlocal err
|
||||
try:
|
||||
_interpreters.exec(interp_id, bootstrap)
|
||||
except BaseException as e:
|
||||
err = e
|
||||
|
||||
worker: threading.Thread = threading.Thread(
|
||||
target=_drive,
|
||||
name=thread_name,
|
||||
daemon=False,
|
||||
)
|
||||
worker.start()
|
||||
worker.join(timeout=join_timeout)
|
||||
|
||||
try:
|
||||
_interpreters.destroy(interp_id)
|
||||
except _interpreters.InterpreterError as e:
|
||||
log.warning(
|
||||
f'Could not destroy child-side subint '
|
||||
f'{interp_id}: {e}'
|
||||
)
|
||||
|
||||
if worker.is_alive():
|
||||
raise RuntimeError(
|
||||
f'child-side subint trio-driver thread '
|
||||
f'{thread_name!r} did not return within '
|
||||
f'{join_timeout}s.'
|
||||
)
|
||||
if err is not None:
|
||||
raise err
|
||||
|
||||
|
||||
class _ForkedProc:
|
||||
'''
|
||||
Thin `trio.Process`-compatible shim around a raw OS pid
|
||||
returned by `fork_from_worker_thread()`, exposing just
|
||||
enough surface for the `soft_kill()` / hard-reap pattern
|
||||
borrowed from `trio_proc()`.
|
||||
|
||||
Unlike `trio.Process`, we have no direct handles on the
|
||||
child's std-streams (fork-without-exec inherits the
|
||||
parent's FDs, but we don't marshal them into this
|
||||
wrapper) — `.stdin`/`.stdout`/`.stderr` are all `None`,
|
||||
which matches what `soft_kill()` handles via its
|
||||
`is not None` guards.
|
||||
|
||||
'''
|
||||
def __init__(self, pid: int):
|
||||
self.pid: int = pid
|
||||
self._returncode: int | None = None
|
||||
# `soft_kill`/`hard_kill` check these for pipe
|
||||
# teardown — all None since we didn't wire up pipes
|
||||
# on the fork-without-exec path.
|
||||
self.stdin = None
|
||||
self.stdout = None
|
||||
self.stderr = None
|
||||
|
||||
def poll(self) -> int | None:
|
||||
'''
|
||||
Non-blocking liveness probe. Returns `None` if the
|
||||
child is still running, else its exit code (negative
|
||||
for signal-death, matching `subprocess.Popen`
|
||||
convention).
|
||||
|
||||
'''
|
||||
if self._returncode is not None:
|
||||
return self._returncode
|
||||
try:
|
||||
waited_pid, status = os.waitpid(self.pid, os.WNOHANG)
|
||||
except ChildProcessError:
|
||||
# already reaped (or never existed) — treat as
|
||||
# clean exit for polling purposes.
|
||||
self._returncode = 0
|
||||
return 0
|
||||
if waited_pid == 0:
|
||||
return None
|
||||
self._returncode = self._parse_status(status)
|
||||
return self._returncode
|
||||
|
||||
@property
|
||||
def returncode(self) -> int | None:
|
||||
return self._returncode
|
||||
|
||||
async def wait(self) -> int:
|
||||
'''
|
||||
Async blocking wait for the child's exit, off-loaded
|
||||
to a trio cache thread so we don't block the event
|
||||
loop on `waitpid()`. Safe to call multiple times;
|
||||
subsequent calls return the cached rc without
|
||||
re-issuing the syscall.
|
||||
|
||||
'''
|
||||
if self._returncode is not None:
|
||||
return self._returncode
|
||||
_, status = await trio.to_thread.run_sync(
|
||||
os.waitpid,
|
||||
self.pid,
|
||||
0,
|
||||
abandon_on_cancel=False,
|
||||
)
|
||||
self._returncode = self._parse_status(status)
|
||||
return self._returncode
|
||||
|
||||
def kill(self) -> None:
|
||||
'''
|
||||
OS-level `SIGKILL` to the child. Swallows
|
||||
`ProcessLookupError` (already dead).
|
||||
|
||||
'''
|
||||
try:
|
||||
os.kill(self.pid, signal.SIGKILL)
|
||||
except ProcessLookupError:
|
||||
pass
|
||||
|
||||
def _parse_status(self, status: int) -> int:
|
||||
if os.WIFEXITED(status):
|
||||
return os.WEXITSTATUS(status)
|
||||
elif os.WIFSIGNALED(status):
|
||||
# negative rc by `subprocess.Popen` convention
|
||||
return -os.WTERMSIG(status)
|
||||
return 0
|
||||
|
||||
def __repr__(self) -> str:
|
||||
return (
|
||||
f'<_ForkedProc pid={self.pid} '
|
||||
f'returncode={self._returncode}>'
|
||||
)
|
||||
|
||||
|
||||
async def subint_forkserver_proc(
|
||||
name: str,
|
||||
actor_nursery: ActorNursery,
|
||||
subactor: Actor,
|
||||
errors: dict[tuple[str, str], Exception],
|
||||
|
||||
# passed through to actor main
|
||||
bind_addrs: list[UnwrappedAddress],
|
||||
parent_addr: UnwrappedAddress,
|
||||
_runtime_vars: dict[str, Any],
|
||||
*,
|
||||
infect_asyncio: bool = False,
|
||||
task_status: TaskStatus[Portal] = trio.TASK_STATUS_IGNORED,
|
||||
proc_kwargs: dict[str, any] = {},
|
||||
|
||||
) -> None:
|
||||
'''
|
||||
Spawn a subactor via `os.fork()` from a non-trio worker
|
||||
thread (see `fork_from_worker_thread()`), with the forked
|
||||
child running `tractor._child._actor_child_main()` and
|
||||
connecting back via tractor's normal IPC handshake.
|
||||
|
||||
Supervision model mirrors `trio_proc()` — we manage a
|
||||
real OS subprocess, so `Portal.cancel_actor()` +
|
||||
`soft_kill()` on graceful teardown and `os.kill(SIGKILL)`
|
||||
on hard-reap both apply directly (no
|
||||
`_interpreters.destroy()` voodoo needed since the child
|
||||
is in its own process).
|
||||
|
||||
The only real difference from `trio_proc` is the spawn
|
||||
mechanism: fork from a known-clean main-interp worker
|
||||
thread instead of `trio.lowlevel.open_process()`.
|
||||
|
||||
'''
|
||||
if not _has_subints:
|
||||
raise RuntimeError(
|
||||
f'The {"subint_forkserver"!r} spawn backend '
|
||||
f'requires Python 3.14+.\n'
|
||||
f'Current runtime: {sys.version}'
|
||||
)
|
||||
|
||||
uid: tuple[str, str] = subactor.aid.uid
|
||||
loglevel: str | None = subactor.loglevel
|
||||
|
||||
# Closure captured into the fork-child's memory image.
|
||||
# In the child this is the first post-fork Python code to
|
||||
# run, on what was the fork-worker thread in the parent.
|
||||
def _child_target() -> int:
|
||||
# Lazy import so the parent doesn't pay for it on
|
||||
# every spawn — it's module-level in `_child` but
|
||||
# cheap enough to re-resolve here.
|
||||
from tractor._child import _actor_child_main
|
||||
# XXX, fork inherits the parent's entire memory
|
||||
# image — including `tractor.runtime._state` globals
|
||||
# that encode "this process is the root actor":
|
||||
#
|
||||
# - `_runtime_vars['_is_root']` → True in parent
|
||||
# - pre-populated `_root_mailbox`, `_registry_addrs`
|
||||
# - the parent's `_current_actor` singleton
|
||||
#
|
||||
# A fresh `exec`-based child would start with the
|
||||
# `_state` module's defaults (all falsey / empty).
|
||||
# Replicate that here so the new child-side `Actor`
|
||||
# sees a "cold" runtime — otherwise `Actor.__init__`
|
||||
# takes the `is_root_process() == True` branch and
|
||||
# pre-populates `self.enable_modules`, which then
|
||||
# trips the `assert not self.enable_modules` gate at
|
||||
# the top of `Actor._from_parent()` on the subsequent
|
||||
# parent→child `SpawnSpec` handshake.
|
||||
from tractor.runtime import _state
|
||||
_state._current_actor = None
|
||||
_state._runtime_vars.update({
|
||||
'_is_root': False,
|
||||
'_root_mailbox': (None, None),
|
||||
'_root_addrs': [],
|
||||
'_registry_addrs': [],
|
||||
'_debug_mode': False,
|
||||
})
|
||||
_actor_child_main(
|
||||
uid=uid,
|
||||
loglevel=loglevel,
|
||||
parent_addr=parent_addr,
|
||||
infect_asyncio=infect_asyncio,
|
||||
# NOTE, from the child-side runtime's POV it's
|
||||
# a regular trio actor — it uses `_trio_main`,
|
||||
# receives `SpawnSpec` over IPC, etc. The
|
||||
# `subint_forkserver` name is a property of HOW
|
||||
# the parent spawned, not of what the child is.
|
||||
spawn_method='trio',
|
||||
)
|
||||
return 0
|
||||
|
||||
cancelled_during_spawn: bool = False
|
||||
proc: _ForkedProc | None = None
|
||||
ipc_server: _server.Server = actor_nursery._actor.ipc_server
|
||||
|
||||
try:
|
||||
try:
|
||||
pid: int = await trio.to_thread.run_sync(
|
||||
partial(
|
||||
fork_from_worker_thread,
|
||||
_child_target,
|
||||
thread_name=(
|
||||
f'subint-forkserver[{name}]'
|
||||
),
|
||||
),
|
||||
abandon_on_cancel=False,
|
||||
)
|
||||
proc = _ForkedProc(pid)
|
||||
log.runtime(
|
||||
f'Forked subactor via forkserver\n'
|
||||
f'(>\n'
|
||||
f' |_{proc}\n'
|
||||
)
|
||||
|
||||
event, chan = await ipc_server.wait_for_peer(uid)
|
||||
|
||||
except trio.Cancelled:
|
||||
cancelled_during_spawn = True
|
||||
raise
|
||||
|
||||
assert proc is not None
|
||||
|
||||
portal = Portal(chan)
|
||||
actor_nursery._children[uid] = (
|
||||
subactor,
|
||||
proc,
|
||||
portal,
|
||||
)
|
||||
|
||||
sspec = msgtypes.SpawnSpec(
|
||||
_parent_main_data=subactor._parent_main_data,
|
||||
enable_modules=subactor.enable_modules,
|
||||
reg_addrs=subactor.reg_addrs,
|
||||
bind_addrs=bind_addrs,
|
||||
_runtime_vars=_runtime_vars,
|
||||
)
|
||||
log.runtime(
|
||||
f'Sending spawn spec to forkserver child\n'
|
||||
f'{{}}=> {chan.aid.reprol()!r}\n'
|
||||
f'\n'
|
||||
f'{pretty_struct.pformat(sspec)}\n'
|
||||
)
|
||||
await chan.send(sspec)
|
||||
|
||||
curr_actor: Actor = current_actor()
|
||||
curr_actor._actoruid2nursery[uid] = actor_nursery
|
||||
|
||||
task_status.started(portal)
|
||||
|
||||
with trio.CancelScope(shield=True):
|
||||
await actor_nursery._join_procs.wait()
|
||||
|
||||
async with trio.open_nursery() as nursery:
|
||||
if portal in actor_nursery._cancel_after_result_on_exit:
|
||||
nursery.start_soon(
|
||||
cancel_on_completion,
|
||||
portal,
|
||||
subactor,
|
||||
errors,
|
||||
)
|
||||
|
||||
# reuse `trio_proc`'s soft-kill dance — `proc`
|
||||
# is our `_ForkedProc` shim which implements the
|
||||
# same `.poll()` / `.wait()` / `.kill()` surface
|
||||
# `soft_kill` expects.
|
||||
await soft_kill(
|
||||
proc,
|
||||
_ForkedProc.wait,
|
||||
portal,
|
||||
)
|
||||
nursery.cancel_scope.cancel()
|
||||
|
||||
finally:
|
||||
# Hard reap: SIGKILL + waitpid. Cheap since we have
|
||||
# the real OS pid, unlike `subint_proc` which has to
|
||||
# fuss with `_interpreters.destroy()` races.
|
||||
if proc is not None and proc.poll() is None:
|
||||
log.cancel(
|
||||
f'Hard killing forkserver subactor\n'
|
||||
f'>x)\n'
|
||||
f' |_{proc}\n'
|
||||
)
|
||||
with trio.CancelScope(shield=True):
|
||||
proc.kill()
|
||||
await proc.wait()
|
||||
|
||||
if not cancelled_during_spawn:
|
||||
actor_nursery._children.pop(uid, None)
|
||||
Loading…
Reference in New Issue