Compare commits
10 Commits
e3318a5483
...
66dda9e449
| Author | SHA1 | Date |
|---|---|---|
|
|
66dda9e449 | |
|
|
43bd6a6410 | |
|
|
5bd5f957d3 | |
|
|
1d4867e51c | |
|
|
372a0f3247 | |
|
|
37cfaa202b | |
|
|
797f57ce7b | |
|
|
cf0e3e6f8b | |
|
|
99d70337b7 | |
|
|
a617b52140 |
|
|
@ -0,0 +1,337 @@
|
||||||
|
# `os.fork()` from a non-main sub-interpreter aborts the child (CPython refuses post-fork cleanup)
|
||||||
|
|
||||||
|
Third `subint`-class analysis in this project. Unlike its
|
||||||
|
two siblings (`subint_sigint_starvation_issue.md`,
|
||||||
|
`subint_cancel_delivery_hang_issue.md`), this one is not a
|
||||||
|
hang — it's a **hard CPython-level refusal** of an
|
||||||
|
experimental spawn strategy we wanted to try.
|
||||||
|
|
||||||
|
## TL;DR
|
||||||
|
|
||||||
|
An in-process sub-interpreter cannot be used as a
|
||||||
|
"launchpad" for `os.fork()` on current CPython. The fork
|
||||||
|
syscall succeeds in the parent, but the forked CHILD
|
||||||
|
process is aborted immediately by CPython's post-fork
|
||||||
|
cleanup with:
|
||||||
|
|
||||||
|
```
|
||||||
|
Fatal Python error: _PyInterpreterState_DeleteExceptMain: not main interpreter
|
||||||
|
```
|
||||||
|
|
||||||
|
This is enforced by a hard `PyStatus_ERR` gate in
|
||||||
|
`Python/pystate.c`. The CPython devs acknowledge the
|
||||||
|
fragility with an in-source comment (`// Ideally we could
|
||||||
|
guarantee tstate is running main.`) but provide no
|
||||||
|
mechanism to satisfy the precondition from user code.
|
||||||
|
|
||||||
|
**Implication for tractor**: the `subint_fork` backend
|
||||||
|
sketched in `tractor.spawn._subint_fork` is structurally
|
||||||
|
dead on current CPython. The submodule is kept as
|
||||||
|
documentation of the attempt; `--spawn-backend=subint_fork`
|
||||||
|
raises `NotImplementedError` pointing here.
|
||||||
|
|
||||||
|
## Context — why we tried this
|
||||||
|
|
||||||
|
The motivation is issue #379's "Our own thoughts, ideas
|
||||||
|
for `fork()`-workaround/hacks..." section. The existing
|
||||||
|
trio-backend (`tractor.spawn._trio.trio_proc`) spawns
|
||||||
|
subactors via `trio.lowlevel.open_process()` → ultimately
|
||||||
|
`posix_spawn()` or `fork+exec`, from the parent's main
|
||||||
|
interpreter that is currently running `trio.run()`. This
|
||||||
|
brushes against a known-fragile interaction between
|
||||||
|
`trio` and `fork()` tracked in
|
||||||
|
[python-trio/trio#1614](https://github.com/python-trio/trio/issues/1614)
|
||||||
|
and siblings — mostly mitigated in `tractor`'s case only
|
||||||
|
incidentally (we `exec()` immediately post-fork).
|
||||||
|
|
||||||
|
The idea was:
|
||||||
|
|
||||||
|
1. Create a subint that has *never* imported `trio`.
|
||||||
|
2. From a worker thread in that subint, call `os.fork()`.
|
||||||
|
3. In the child, `execv()` back into
|
||||||
|
`python -m tractor._child` — same as `trio_proc` does.
|
||||||
|
4. The fork is from a trio-free context → trio+fork
|
||||||
|
hazards avoided regardless of downstream behavior.
|
||||||
|
|
||||||
|
The parent-side orchestration (`ipc_server.wait_for_peer`,
|
||||||
|
`SpawnSpec`, `Portal` yield) would reuse
|
||||||
|
`trio_proc`'s flow verbatim, with only the subproc-spawn
|
||||||
|
mechanics swapped.
|
||||||
|
|
||||||
|
## Symptom
|
||||||
|
|
||||||
|
Running the prototype (`tractor.spawn._subint_fork.subint_fork_proc`,
|
||||||
|
see git history prior to the stub revert) on py3.14:
|
||||||
|
|
||||||
|
```
|
||||||
|
Fatal Python error: _PyInterpreterState_DeleteExceptMain: not main interpreter
|
||||||
|
Python runtime state: initialized
|
||||||
|
|
||||||
|
Current thread 0x00007f6b71a456c0 [subint-fork-lau] (most recent call first):
|
||||||
|
File "<script>", line 2 in <module>
|
||||||
|
<script>:2: DeprecationWarning: This process (pid=802985) is multi-threaded, use of fork() may lead to deadlocks in the child.
|
||||||
|
```
|
||||||
|
|
||||||
|
Key clues:
|
||||||
|
|
||||||
|
- The **`DeprecationWarning`** fires in the parent (before
|
||||||
|
fork completes) — fork *is* executing, we get that far.
|
||||||
|
- The **`Fatal Python error`** comes from the child — it
|
||||||
|
aborts during CPython's post-fork C initialization
|
||||||
|
before any user Python runs in the child.
|
||||||
|
- The thread name `subint-fork-lau[nchpad]` is ours —
|
||||||
|
confirms the fork is being called from the launchpad
|
||||||
|
subint's driver thread.
|
||||||
|
|
||||||
|
## CPython source walkthrough
|
||||||
|
|
||||||
|
### Call site — `Modules/posixmodule.c:728-793`
|
||||||
|
|
||||||
|
The post-fork-child hook CPython runs in the child process:
|
||||||
|
|
||||||
|
```c
|
||||||
|
void
|
||||||
|
PyOS_AfterFork_Child(void)
|
||||||
|
{
|
||||||
|
PyStatus status;
|
||||||
|
_PyRuntimeState *runtime = &_PyRuntime;
|
||||||
|
|
||||||
|
// re-creates runtime->interpreters.mutex (HEAD_UNLOCK)
|
||||||
|
status = _PyRuntimeState_ReInitThreads(runtime);
|
||||||
|
...
|
||||||
|
|
||||||
|
PyThreadState *tstate = _PyThreadState_GET();
|
||||||
|
_Py_EnsureTstateNotNULL(tstate);
|
||||||
|
|
||||||
|
...
|
||||||
|
|
||||||
|
// Ideally we could guarantee tstate is running main. ← !!!
|
||||||
|
_PyInterpreterState_ReinitRunningMain(tstate);
|
||||||
|
|
||||||
|
status = _PyEval_ReInitThreads(tstate);
|
||||||
|
...
|
||||||
|
|
||||||
|
status = _PyInterpreterState_DeleteExceptMain(runtime);
|
||||||
|
if (_PyStatus_EXCEPTION(status)) {
|
||||||
|
goto fatal_error;
|
||||||
|
}
|
||||||
|
...
|
||||||
|
|
||||||
|
fatal_error:
|
||||||
|
Py_ExitStatusException(status);
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
The `// Ideally we could guarantee tstate is running
|
||||||
|
main.` comment is a flashing warning sign — the CPython
|
||||||
|
devs *know* this path is fragile when fork is called from
|
||||||
|
a non-main subint, but they've chosen to abort rather than
|
||||||
|
silently corrupt state. Arguably the right call.
|
||||||
|
|
||||||
|
### The refusal — `Python/pystate.c:1035-1075`
|
||||||
|
|
||||||
|
```c
|
||||||
|
/*
|
||||||
|
* Delete all interpreter states except the main interpreter. If there
|
||||||
|
* is a current interpreter state, it *must* be the main interpreter.
|
||||||
|
*/
|
||||||
|
PyStatus
|
||||||
|
_PyInterpreterState_DeleteExceptMain(_PyRuntimeState *runtime)
|
||||||
|
{
|
||||||
|
struct pyinterpreters *interpreters = &runtime->interpreters;
|
||||||
|
|
||||||
|
PyThreadState *tstate = _PyThreadState_Swap(runtime, NULL);
|
||||||
|
if (tstate != NULL && tstate->interp != interpreters->main) {
|
||||||
|
return _PyStatus_ERR("not main interpreter"); ← our error
|
||||||
|
}
|
||||||
|
|
||||||
|
HEAD_LOCK(runtime);
|
||||||
|
PyInterpreterState *interp = interpreters->head;
|
||||||
|
interpreters->head = NULL;
|
||||||
|
while (interp != NULL) {
|
||||||
|
if (interp == interpreters->main) {
|
||||||
|
interpreters->main->next = NULL;
|
||||||
|
interpreters->head = interp;
|
||||||
|
interp = interp->next;
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
|
||||||
|
// XXX Won't this fail since PyInterpreterState_Clear() requires
|
||||||
|
// the "current" tstate to be set?
|
||||||
|
PyInterpreterState_Clear(interp); // XXX must activate?
|
||||||
|
zapthreads(interp);
|
||||||
|
...
|
||||||
|
}
|
||||||
|
...
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
The comment in the docstring (`If there is a current
|
||||||
|
interpreter state, it *must* be the main interpreter.`) is
|
||||||
|
the formal API contract. The `XXX` comments further in
|
||||||
|
suggest the CPython team is already aware this function
|
||||||
|
has latent issues even in the happy path.
|
||||||
|
|
||||||
|
## Chain summary
|
||||||
|
|
||||||
|
1. Our launchpad subint's driver OS-thread calls
|
||||||
|
`os.fork()`.
|
||||||
|
2. `fork()` succeeds. Child wakes up with:
|
||||||
|
- The parent's full memory image (including all
|
||||||
|
subints).
|
||||||
|
- Only the *calling* thread alive (the driver thread).
|
||||||
|
- `_PyThreadState_GET()` on that thread returns the
|
||||||
|
**launchpad subint's tstate**, *not* main's.
|
||||||
|
3. CPython runs `PyOS_AfterFork_Child()`.
|
||||||
|
4. It reaches `_PyInterpreterState_DeleteExceptMain()`.
|
||||||
|
5. Gate check fails: `tstate->interp != interpreters->main`.
|
||||||
|
6. `PyStatus_ERR("not main interpreter")` → `fatal_error`
|
||||||
|
goto → `Py_ExitStatusException()` → child aborts.
|
||||||
|
|
||||||
|
Parent-side consequence: `os.fork()` in the subint
|
||||||
|
bootstrap returned successfully with the child's PID, but
|
||||||
|
the child died before connecting back. Our parent's
|
||||||
|
`ipc_server.wait_for_peer(uid)` would hang forever — the
|
||||||
|
child never gets to `_actor_child_main`.
|
||||||
|
|
||||||
|
## Definitive answer to "Open Question 1"
|
||||||
|
|
||||||
|
From the (now-stub) `subint_fork_proc` docstring:
|
||||||
|
|
||||||
|
> Does CPython allow `os.fork()` from a non-main
|
||||||
|
> sub-interpreter under the legacy config?
|
||||||
|
|
||||||
|
**No.** Not in a usable-by-user-code sense. The fork
|
||||||
|
syscall is not blocked, but the child cannot survive
|
||||||
|
CPython's post-fork initialization. This is enforced, not
|
||||||
|
accidental, and the CPython devs have acknowledged the
|
||||||
|
fragility in-source.
|
||||||
|
|
||||||
|
## What we'd need from CPython to unblock
|
||||||
|
|
||||||
|
Any one of these, from least-to-most invasive:
|
||||||
|
|
||||||
|
1. **A pre-fork hook mechanism** that lets user code (or
|
||||||
|
tractor itself via `os.register_at_fork(before=...)`)
|
||||||
|
swap the current tstate to main before fork runs. The
|
||||||
|
swap would need to work across the subint→main
|
||||||
|
boundary, which is the actual hard part —
|
||||||
|
`_PyThreadState_Swap()` exists but is internal.
|
||||||
|
|
||||||
|
2. **A `_PyInterpreterState_DeleteExceptFor(tstate->interp)`
|
||||||
|
variant** that cleans up all *other* subints while
|
||||||
|
preserving the calling subint's state. Lets the child
|
||||||
|
continue executing in the subint after fork; a
|
||||||
|
subsequent `execv()` clears everything at the OS
|
||||||
|
level anyway.
|
||||||
|
|
||||||
|
3. **A cleaner error** than `Fatal Python error` aborting
|
||||||
|
the child. Even without fixing the underlying
|
||||||
|
capability, a raised Python-level exception in the
|
||||||
|
parent's `fork()` call (rather than a silent child
|
||||||
|
abort) would at least make the failure mode
|
||||||
|
debuggable.
|
||||||
|
|
||||||
|
## Upstream-report draft (for CPython issue tracker)
|
||||||
|
|
||||||
|
### Title
|
||||||
|
|
||||||
|
> `os.fork()` from a non-main sub-interpreter aborts the
|
||||||
|
> child with a fatal error in `PyOS_AfterFork_Child`; can
|
||||||
|
> we at least make it a clean `RuntimeError` in the
|
||||||
|
> parent?
|
||||||
|
|
||||||
|
### Body
|
||||||
|
|
||||||
|
> **Version**: Python 3.14.x
|
||||||
|
>
|
||||||
|
> **Summary**: Calling `os.fork()` from a thread currently
|
||||||
|
> executing inside a sub-interpreter causes the forked
|
||||||
|
> child process to abort during CPython's post-fork
|
||||||
|
> cleanup, with the following output in the child:
|
||||||
|
>
|
||||||
|
> ```
|
||||||
|
> Fatal Python error: _PyInterpreterState_DeleteExceptMain: not main interpreter
|
||||||
|
> ```
|
||||||
|
>
|
||||||
|
> From the **parent's** point of view the fork succeeded
|
||||||
|
> (returned a valid child PID). The failure is completely
|
||||||
|
> opaque to parent-side Python code — unless the parent
|
||||||
|
> does `os.waitpid()` it won't even notice the child
|
||||||
|
> died.
|
||||||
|
>
|
||||||
|
> **Root cause** (as I understand it from reading sources):
|
||||||
|
> `Modules/posixmodule.c::PyOS_AfterFork_Child()` calls
|
||||||
|
> `_PyInterpreterState_DeleteExceptMain()` with a
|
||||||
|
> precondition that `_PyThreadState_GET()->interp` be the
|
||||||
|
> main interpreter. When `fork()` is called from a thread
|
||||||
|
> executing inside a subinterpreter, the child wakes up
|
||||||
|
> with its tstate still pointing at the subint, and the
|
||||||
|
> gate in `Python/pystate.c:1044-1047` fails.
|
||||||
|
>
|
||||||
|
> A comment in the source
|
||||||
|
> (`Modules/posixmodule.c:753` — `// Ideally we could
|
||||||
|
> guarantee tstate is running main.`) suggests this is a
|
||||||
|
> known-fragile path rather than an intentional
|
||||||
|
> invariant.
|
||||||
|
>
|
||||||
|
> **Use case**: I was experimenting with using a
|
||||||
|
> sub-interpreter as a "fork launchpad" — have a subint
|
||||||
|
> that has never imported `trio`, call `os.fork()` from
|
||||||
|
> that subint's thread, and in the child `execv()` back
|
||||||
|
> into a fresh Python interpreter process. The goal was
|
||||||
|
> to sidestep known issues with `trio` + `fork()`
|
||||||
|
> interaction (see
|
||||||
|
> [python-trio/trio#1614](https://github.com/python-trio/trio/issues/1614))
|
||||||
|
> by guaranteeing the forking context had never been
|
||||||
|
> "contaminated" by trio's imports or globals. This
|
||||||
|
> approach would allow `trio`-using applications to
|
||||||
|
> combine `fork`-based subprocess spawning with
|
||||||
|
> per-worker `trio.run()` runtimes — a fairly common
|
||||||
|
> pattern that currently requires workarounds.
|
||||||
|
>
|
||||||
|
> **Request**:
|
||||||
|
>
|
||||||
|
> Ideally: make fork-from-subint work (e.g., by swapping
|
||||||
|
> the caller's tstate to main in the pre-fork hook), or
|
||||||
|
> provide a `_PyInterpreterState_DeleteExceptFor(interp)`
|
||||||
|
> variant that permits the caller's subint to survive
|
||||||
|
> post-fork so user code can subsequently `execv()`.
|
||||||
|
>
|
||||||
|
> Minimally: convert the fatal child-side abort into a
|
||||||
|
> clean `RuntimeError` (or similar) raised in the
|
||||||
|
> parent's `fork()` call. Even if the capability isn't
|
||||||
|
> expanded, the failure mode should be debuggable by
|
||||||
|
> user-code in the parent — right now it's a silent
|
||||||
|
> child death with an error message buried in the
|
||||||
|
> child's stderr that parent code can't programmatically
|
||||||
|
> see.
|
||||||
|
>
|
||||||
|
> **Related**: PEP 684 (per-interpreter GIL), PEP 734
|
||||||
|
> (`concurrent.interpreters` public API). The private
|
||||||
|
> `_interpreters` module is what I used to create the
|
||||||
|
> launchpad — behavior is the same whether using
|
||||||
|
> `_interpreters.create('legacy')` or
|
||||||
|
> `concurrent.interpreters.create()` (the latter was not
|
||||||
|
> tested but the gate is identical).
|
||||||
|
>
|
||||||
|
> Happy to contribute a minimal reproducer + test case if
|
||||||
|
> this is something the team wants to pursue.
|
||||||
|
|
||||||
|
## References
|
||||||
|
|
||||||
|
- `Modules/posixmodule.c:728` —
|
||||||
|
[`PyOS_AfterFork_Child`](https://github.com/python/cpython/blob/main/Modules/posixmodule.c#L728)
|
||||||
|
- `Python/pystate.c:1040` —
|
||||||
|
[`_PyInterpreterState_DeleteExceptMain`](https://github.com/python/cpython/blob/main/Python/pystate.c#L1040)
|
||||||
|
- PEP 684 (per-interpreter GIL):
|
||||||
|
<https://peps.python.org/pep-0684/>
|
||||||
|
- PEP 734 (`concurrent.interpreters` public API):
|
||||||
|
<https://peps.python.org/pep-0734/>
|
||||||
|
- [python-trio/trio#1614](https://github.com/python-trio/trio/issues/1614)
|
||||||
|
— the original motivation for the launchpad idea.
|
||||||
|
- tractor issue #379 — "Our own thoughts, ideas for
|
||||||
|
`fork()`-workaround/hacks..." section where this was
|
||||||
|
first sketched.
|
||||||
|
- `tractor.spawn._subint_fork` — in-tree stub preserving
|
||||||
|
the attempted impl's shape in git history.
|
||||||
|
|
@ -0,0 +1,373 @@
|
||||||
|
#!/usr/bin/env python3
|
||||||
|
'''
|
||||||
|
Standalone CPython-level feasibility check for the "main-interp
|
||||||
|
worker-thread forkserver + subint-hosted trio" architecture
|
||||||
|
proposed as a workaround to the CPython-level refusal
|
||||||
|
documented in
|
||||||
|
`ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md`.
|
||||||
|
|
||||||
|
Purpose
|
||||||
|
-------
|
||||||
|
Deliberately NOT a `tractor` test. Zero `tractor` imports.
|
||||||
|
Uses `_interpreters` (private stdlib) + `os.fork()` directly so
|
||||||
|
the signal is unambiguous — pass/fail here is a property of
|
||||||
|
CPython alone, independent of our runtime.
|
||||||
|
|
||||||
|
Run each scenario in isolation; the child's fate is observable
|
||||||
|
only via `os.waitpid()` of the parent and the scenario's own
|
||||||
|
status prints.
|
||||||
|
|
||||||
|
Scenarios (pick one with `--scenario <name>`)
|
||||||
|
---------------------------------------------
|
||||||
|
|
||||||
|
- `control_subint_thread_fork` — the KNOWN-BROKEN case we
|
||||||
|
documented in `subint_fork_blocked_by_cpython_post_fork_issue.md`:
|
||||||
|
drive a subint from a thread, call `os.fork()` inside its
|
||||||
|
`_interpreters.exec()`, watch the child abort. **Included as
|
||||||
|
a control** — if this scenario DOESN'T abort the child, our
|
||||||
|
analysis is wrong and we should re-check everything.
|
||||||
|
|
||||||
|
- `main_thread_fork` — baseline sanity. Call `os.fork()` from
|
||||||
|
the process's main thread. Must always succeed; if this
|
||||||
|
fails something much bigger is broken.
|
||||||
|
|
||||||
|
- `worker_thread_fork` — the architectural assertion. Spawn a
|
||||||
|
regular `threading.Thread` (attached to main interp, NOT a
|
||||||
|
subint), have IT call `os.fork()`. Child should survive
|
||||||
|
post-fork cleanup.
|
||||||
|
|
||||||
|
- `full_architecture` — end-to-end: main-interp worker thread
|
||||||
|
forks. In the child, fork-thread (still main-interp) creates
|
||||||
|
a subint, drives a second worker thread inside it that runs
|
||||||
|
a trivial `trio.run()`. Validates the "root runtime lives in
|
||||||
|
a subint in the child" piece of the proposed arch.
|
||||||
|
|
||||||
|
All scenarios print a self-contained pass/fail banner. Exit
|
||||||
|
code 0 on expected outcome (which for `control_*` means "child
|
||||||
|
aborted", not "child succeeded"!).
|
||||||
|
|
||||||
|
Requires Python 3.14+.
|
||||||
|
|
||||||
|
Usage
|
||||||
|
-----
|
||||||
|
::
|
||||||
|
|
||||||
|
python subint_fork_from_main_thread_smoketest.py \\
|
||||||
|
--scenario main_thread_fork
|
||||||
|
|
||||||
|
python subint_fork_from_main_thread_smoketest.py \\
|
||||||
|
--scenario full_architecture
|
||||||
|
|
||||||
|
'''
|
||||||
|
from __future__ import annotations
|
||||||
|
import argparse
|
||||||
|
import os
|
||||||
|
import sys
|
||||||
|
import threading
|
||||||
|
import time
|
||||||
|
|
||||||
|
|
||||||
|
# Hard-require py3.14 for the public `concurrent.interpreters`
|
||||||
|
# API (we still drop to `_interpreters` internally, same as
|
||||||
|
# `tractor.spawn._subint`).
|
||||||
|
try:
|
||||||
|
from concurrent import interpreters as _public_interpreters # noqa: F401
|
||||||
|
import _interpreters # type: ignore
|
||||||
|
except ImportError:
|
||||||
|
print(
|
||||||
|
'FAIL (setup): requires Python 3.14+ '
|
||||||
|
'(missing `concurrent.interpreters`)',
|
||||||
|
file=sys.stderr,
|
||||||
|
)
|
||||||
|
sys.exit(2)
|
||||||
|
|
||||||
|
|
||||||
|
# The actual primitives this script exercises live in
|
||||||
|
# `tractor.spawn._subint_forkserver` — we re-import them here
|
||||||
|
# rather than inlining so the module and the validation stay
|
||||||
|
# in sync. (Early versions of this file had them inline for
|
||||||
|
# the "zero tractor imports" isolation guarantee; now that
|
||||||
|
# CPython-level feasibility is confirmed, the validated
|
||||||
|
# primitives have moved into tractor proper.)
|
||||||
|
from tractor.spawn._subint_forkserver import (
|
||||||
|
fork_from_worker_thread,
|
||||||
|
run_subint_in_worker_thread,
|
||||||
|
wait_child,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
# ----------------------------------------------------------------
|
||||||
|
# small observability helpers (test-harness only)
|
||||||
|
# ----------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
|
def _banner(title: str) -> None:
|
||||||
|
line = '=' * 60
|
||||||
|
print(f'\n{line}\n{title}\n{line}', flush=True)
|
||||||
|
|
||||||
|
|
||||||
|
def _report(
|
||||||
|
label: str,
|
||||||
|
*,
|
||||||
|
ok: bool,
|
||||||
|
status_str: str,
|
||||||
|
expect_exit_ok: bool,
|
||||||
|
) -> None:
|
||||||
|
verdict: str = 'PASS' if ok else 'FAIL'
|
||||||
|
expected_str: str = (
|
||||||
|
'normal exit (rc=0)'
|
||||||
|
if expect_exit_ok
|
||||||
|
else 'abnormal death (signal or nonzero exit)'
|
||||||
|
)
|
||||||
|
print(
|
||||||
|
f'[{verdict}] {label}: '
|
||||||
|
f'expected {expected_str}; observed {status_str}',
|
||||||
|
flush=True,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
# ----------------------------------------------------------------
|
||||||
|
# scenario: `control_subint_thread_fork` (known-broken)
|
||||||
|
# ----------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
|
def scenario_control_subint_thread_fork() -> int:
|
||||||
|
_banner(
|
||||||
|
'[control] fork from INSIDE a subint (expected: child aborts)'
|
||||||
|
)
|
||||||
|
interp_id = _interpreters.create('legacy')
|
||||||
|
print(f' created subint {interp_id}', flush=True)
|
||||||
|
|
||||||
|
# Shared flag: child writes a sentinel file we can detect from
|
||||||
|
# the parent. If the child manages to write this, CPython's
|
||||||
|
# post-fork refusal is NOT happening → analysis is wrong.
|
||||||
|
sentinel = '/tmp/subint_fork_smoketest_control_child_ran'
|
||||||
|
try:
|
||||||
|
os.unlink(sentinel)
|
||||||
|
except FileNotFoundError:
|
||||||
|
pass
|
||||||
|
|
||||||
|
bootstrap = (
|
||||||
|
'import os\n'
|
||||||
|
'pid = os.fork()\n'
|
||||||
|
'if pid == 0:\n'
|
||||||
|
# child — if CPython's refusal fires this code never runs
|
||||||
|
f' with open({sentinel!r}, "w") as f:\n'
|
||||||
|
' f.write("ran")\n'
|
||||||
|
' os._exit(0)\n'
|
||||||
|
'else:\n'
|
||||||
|
# parent side (inside the launchpad subint) — stash the
|
||||||
|
# forked PID on a shareable dict so we can waitpid()
|
||||||
|
# from the outer main interp. We can't just return it;
|
||||||
|
# _interpreters.exec() returns nothing useful.
|
||||||
|
' import builtins\n'
|
||||||
|
' builtins._forked_child_pid = pid\n'
|
||||||
|
)
|
||||||
|
|
||||||
|
# NOTE, we can't easily pull state back from the subint.
|
||||||
|
# For the CONTROL scenario we just time-bound the fork +
|
||||||
|
# check the sentinel. If sentinel exists → child ran →
|
||||||
|
# analysis wrong. If not → child aborted → analysis
|
||||||
|
# confirmed.
|
||||||
|
done = threading.Event()
|
||||||
|
|
||||||
|
def _drive() -> None:
|
||||||
|
try:
|
||||||
|
_interpreters.exec(interp_id, bootstrap)
|
||||||
|
except Exception as err:
|
||||||
|
print(
|
||||||
|
f' subint bootstrap raised (expected on some '
|
||||||
|
f'CPython versions): {type(err).__name__}: {err}',
|
||||||
|
flush=True,
|
||||||
|
)
|
||||||
|
finally:
|
||||||
|
done.set()
|
||||||
|
|
||||||
|
t = threading.Thread(
|
||||||
|
target=_drive,
|
||||||
|
name='control-subint-fork-launchpad',
|
||||||
|
daemon=True,
|
||||||
|
)
|
||||||
|
t.start()
|
||||||
|
done.wait(timeout=5.0)
|
||||||
|
t.join(timeout=2.0)
|
||||||
|
|
||||||
|
# Give the (possibly-aborted) child a moment to die.
|
||||||
|
time.sleep(0.5)
|
||||||
|
|
||||||
|
sentinel_present = os.path.exists(sentinel)
|
||||||
|
verdict = (
|
||||||
|
# "PASS" for our analysis means sentinel NOT present.
|
||||||
|
'PASS' if not sentinel_present else 'FAIL (UNEXPECTED)'
|
||||||
|
)
|
||||||
|
print(
|
||||||
|
f'[{verdict}] control: sentinel present={sentinel_present} '
|
||||||
|
f'(analysis predicts False — child should abort before '
|
||||||
|
f'writing)',
|
||||||
|
flush=True,
|
||||||
|
)
|
||||||
|
if sentinel_present:
|
||||||
|
os.unlink(sentinel)
|
||||||
|
|
||||||
|
try:
|
||||||
|
_interpreters.destroy(interp_id)
|
||||||
|
except _interpreters.InterpreterError:
|
||||||
|
pass
|
||||||
|
|
||||||
|
return 0 if not sentinel_present else 1
|
||||||
|
|
||||||
|
|
||||||
|
# ----------------------------------------------------------------
|
||||||
|
# scenario: `main_thread_fork` (baseline sanity)
|
||||||
|
# ----------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
|
def scenario_main_thread_fork() -> int:
|
||||||
|
_banner(
|
||||||
|
'[baseline] fork from MAIN thread (expected: child exits normally)'
|
||||||
|
)
|
||||||
|
|
||||||
|
pid = os.fork()
|
||||||
|
if pid == 0:
|
||||||
|
os._exit(0)
|
||||||
|
|
||||||
|
return 0 if _wait_child(
|
||||||
|
pid,
|
||||||
|
label='main_thread_fork',
|
||||||
|
expect_exit_ok=True,
|
||||||
|
) else 1
|
||||||
|
|
||||||
|
|
||||||
|
# ----------------------------------------------------------------
|
||||||
|
# scenario: `worker_thread_fork` (architectural assertion)
|
||||||
|
# ----------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
|
def _run_worker_thread_fork_scenario(
|
||||||
|
label: str,
|
||||||
|
*,
|
||||||
|
child_target=None,
|
||||||
|
) -> int:
|
||||||
|
'''
|
||||||
|
Thin wrapper: delegate the actual fork to the
|
||||||
|
`tractor.spawn._subint_forkserver` primitive, then wait
|
||||||
|
on the child and render a pass/fail banner.
|
||||||
|
|
||||||
|
'''
|
||||||
|
try:
|
||||||
|
pid: int = fork_from_worker_thread(
|
||||||
|
child_target=child_target,
|
||||||
|
thread_name=f'worker-fork-thread[{label}]',
|
||||||
|
)
|
||||||
|
except RuntimeError as err:
|
||||||
|
print(f'[FAIL] {label}: {err}', flush=True)
|
||||||
|
return 1
|
||||||
|
print(f' forked child pid={pid}', flush=True)
|
||||||
|
ok, status_str = wait_child(pid, expect_exit_ok=True)
|
||||||
|
_report(
|
||||||
|
label,
|
||||||
|
ok=ok,
|
||||||
|
status_str=status_str,
|
||||||
|
expect_exit_ok=True,
|
||||||
|
)
|
||||||
|
return 0 if ok else 1
|
||||||
|
|
||||||
|
|
||||||
|
def scenario_worker_thread_fork() -> int:
|
||||||
|
_banner(
|
||||||
|
'[arch] fork from MAIN-INTERP WORKER thread '
|
||||||
|
'(expected: child exits normally — this is the one '
|
||||||
|
'that matters)'
|
||||||
|
)
|
||||||
|
return _run_worker_thread_fork_scenario(
|
||||||
|
'worker_thread_fork',
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
# ----------------------------------------------------------------
|
||||||
|
# scenario: `full_architecture`
|
||||||
|
# ----------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
|
_CHILD_TRIO_BOOTSTRAP: str = (
|
||||||
|
'import trio\n'
|
||||||
|
'async def _main():\n'
|
||||||
|
' await trio.sleep(0.05)\n'
|
||||||
|
' return 42\n'
|
||||||
|
'result = trio.run(_main)\n'
|
||||||
|
'assert result == 42, f"trio.run returned {result}"\n'
|
||||||
|
'print(" CHILD subint: trio.run OK, result=42", '
|
||||||
|
'flush=True)\n'
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def _child_trio_in_subint() -> int:
|
||||||
|
'''
|
||||||
|
CHILD-side `child_target`: drive a trivial `trio.run()`
|
||||||
|
inside a fresh legacy-config subint on a worker thread,
|
||||||
|
using the `tractor.spawn._subint_forkserver.run_subint_in_worker_thread`
|
||||||
|
primitive. Returns 0 on success.
|
||||||
|
|
||||||
|
'''
|
||||||
|
try:
|
||||||
|
run_subint_in_worker_thread(
|
||||||
|
_CHILD_TRIO_BOOTSTRAP,
|
||||||
|
thread_name='child-subint-trio-thread',
|
||||||
|
)
|
||||||
|
except RuntimeError as err:
|
||||||
|
print(
|
||||||
|
f' CHILD: run_subint_in_worker_thread timed out / thread '
|
||||||
|
f'never returned: {err}',
|
||||||
|
flush=True,
|
||||||
|
)
|
||||||
|
return 3
|
||||||
|
except BaseException as err:
|
||||||
|
print(
|
||||||
|
f' CHILD: subint bootstrap raised: '
|
||||||
|
f'{type(err).__name__}: {err}',
|
||||||
|
flush=True,
|
||||||
|
)
|
||||||
|
return 4
|
||||||
|
return 0
|
||||||
|
|
||||||
|
|
||||||
|
def scenario_full_architecture() -> int:
|
||||||
|
_banner(
|
||||||
|
'[arch-full] worker-thread fork + child runs trio in a '
|
||||||
|
'subint (end-to-end proposed arch)'
|
||||||
|
)
|
||||||
|
return _run_worker_thread_fork_scenario(
|
||||||
|
'full_architecture',
|
||||||
|
child_target=_child_trio_in_subint,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
# ----------------------------------------------------------------
|
||||||
|
# main
|
||||||
|
# ----------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
|
SCENARIOS: dict[str, Callable[[], int]] = {
|
||||||
|
'control_subint_thread_fork': scenario_control_subint_thread_fork,
|
||||||
|
'main_thread_fork': scenario_main_thread_fork,
|
||||||
|
'worker_thread_fork': scenario_worker_thread_fork,
|
||||||
|
'full_architecture': scenario_full_architecture,
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
def main() -> int:
|
||||||
|
ap = argparse.ArgumentParser(
|
||||||
|
description=__doc__,
|
||||||
|
formatter_class=argparse.RawDescriptionHelpFormatter,
|
||||||
|
)
|
||||||
|
ap.add_argument(
|
||||||
|
'--scenario',
|
||||||
|
choices=sorted(SCENARIOS.keys()),
|
||||||
|
required=True,
|
||||||
|
)
|
||||||
|
args = ap.parse_args()
|
||||||
|
return SCENARIOS[args.scenario]()
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == '__main__':
|
||||||
|
sys.exit(main())
|
||||||
|
|
@ -0,0 +1,184 @@
|
||||||
|
# Revisit `subint_forkserver` thread-cache constraints once msgspec PEP 684 support lands
|
||||||
|
|
||||||
|
Follow-up tracker for cleanup work gated on the msgspec
|
||||||
|
PEP 684 adoption upstream ([jcrist/msgspec#563](https://github.com/jcrist/msgspec/issues/563)).
|
||||||
|
|
||||||
|
Context — why this exists
|
||||||
|
-------------------------
|
||||||
|
|
||||||
|
The `tractor.spawn._subint_forkserver` submodule currently
|
||||||
|
carries two "non-trio" thread-hygiene constraints whose
|
||||||
|
necessity is tangled with issues that *should* dissolve
|
||||||
|
under PEP 684 isolated-mode subinterpreters:
|
||||||
|
|
||||||
|
1. `fork_from_worker_thread()` / `run_subint_in_worker_thread()`
|
||||||
|
internally allocate a **dedicated `threading.Thread`**
|
||||||
|
rather than using `trio.to_thread.run_sync()`.
|
||||||
|
2. The test helper is named
|
||||||
|
`run_fork_in_non_trio_thread()` — the
|
||||||
|
`non_trio` qualifier is load-bearing today.
|
||||||
|
|
||||||
|
This doc catalogs *why* those constraints exist, which of
|
||||||
|
them isolated-mode would fix, and what the
|
||||||
|
audit-and-cleanup path looks like once msgspec #563 is
|
||||||
|
resolved.
|
||||||
|
|
||||||
|
The three reasons the constraints exist
|
||||||
|
---------------------------------------
|
||||||
|
|
||||||
|
### 1. GIL-starvation class → fixed by PEP 684 isolated mode
|
||||||
|
|
||||||
|
The class-A hang documented in
|
||||||
|
`subint_sigint_starvation_issue.md` is entirely about
|
||||||
|
legacy-config subints **sharing the main GIL**. Once
|
||||||
|
msgspec #563 lands and tractor flips
|
||||||
|
`tractor.spawn._subint` to
|
||||||
|
`concurrent.interpreters.create()` (isolated config), each
|
||||||
|
subint gets its own GIL. Abandoned subint threads can't
|
||||||
|
contend for main's GIL → can't starve the main trio loop
|
||||||
|
→ signal-wakeup-pipe drains normally → no SIGINT-drop.
|
||||||
|
|
||||||
|
This class of hazard **dissolves entirely**. The
|
||||||
|
non-trio-thread requirement for *this reason* disappears.
|
||||||
|
|
||||||
|
### 2. Destroy race / tstate-recycling → orthogonal; unclear
|
||||||
|
|
||||||
|
The `subint_proc` dedicated-thread fix (commit `26fb8206`)
|
||||||
|
addressed a different issue: `_interpreters.destroy(interp_id)`
|
||||||
|
was blocking on a trio-cache worker that had run an
|
||||||
|
earlier `interp.exec()` for that subint. Working
|
||||||
|
hypothesis at the time was "the cached thread retains the
|
||||||
|
subint's tstate".
|
||||||
|
|
||||||
|
But tstate-handling is **not specific to GIL mode** —
|
||||||
|
`_PyXI_Enter` / `_PyXI_Exit` (the C-level machinery both
|
||||||
|
configs use to enter/leave a subint from a thread) should
|
||||||
|
restore the caller's tstate regardless of GIL config. So
|
||||||
|
isolated mode **doesn't obviously fix this**. It might be:
|
||||||
|
|
||||||
|
- A py3.13 bug fixed in later versions — we saw the race
|
||||||
|
first on 3.13 and never re-tested on 3.14 after moving
|
||||||
|
to dedicated threads.
|
||||||
|
- A genuine CPython quirk around cached threads that
|
||||||
|
exec'd into a subint, persisting across GIL modes.
|
||||||
|
- Something else we misdiagnosed — the empirical fix
|
||||||
|
(dedicated thread) worked but the analysis may have
|
||||||
|
been incomplete.
|
||||||
|
|
||||||
|
Only way to know: once we're on isolated mode, empirically
|
||||||
|
retry `trio.to_thread.run_sync(interp.exec, ...)` and see
|
||||||
|
if `destroy()` still blocks. If it does, keep the
|
||||||
|
dedicated thread; if not, one constraint relaxed.
|
||||||
|
|
||||||
|
### 3. Fork-from-main-interp-tstate (the constraint in this module's helper names)
|
||||||
|
|
||||||
|
The fork-from-main-interp-tstate invariant — CPython's
|
||||||
|
`PyOS_AfterFork_Child` →
|
||||||
|
`_PyInterpreterState_DeleteExceptMain` gate documented in
|
||||||
|
`subint_fork_blocked_by_cpython_post_fork_issue.md` — is
|
||||||
|
about the calling thread's **current** tstate at the
|
||||||
|
moment `os.fork()` runs. If trio's cache threads never
|
||||||
|
enter subints at all, their tstate is plain main-interp,
|
||||||
|
and fork from them would be fine.
|
||||||
|
|
||||||
|
The reason the smoke test +
|
||||||
|
`run_fork_in_non_trio_thread` test helper
|
||||||
|
currently use a dedicated `threading.Thread` is narrow:
|
||||||
|
**we don't want to risk a trio cache thread that has
|
||||||
|
previously been used as a subint driver being the one that
|
||||||
|
picks up the fork job**. If cached tstate doesn't get
|
||||||
|
cleared (back to reason #2), the fork's child-side
|
||||||
|
post-init would see the wrong interp and abort.
|
||||||
|
|
||||||
|
In an isolated-mode world where msgspec works:
|
||||||
|
|
||||||
|
- `subint_proc` would use the public
|
||||||
|
`concurrent.interpreters.create()` + `Interpreter.exec()`
|
||||||
|
/ `Interpreter.close()` — which *should* handle tstate
|
||||||
|
cleanly (they're the "blessed" API).
|
||||||
|
- If so, trio's cache threads are safe to fork from
|
||||||
|
regardless of whether they've previously driven subints.
|
||||||
|
- → the `non_trio` qualifier in
|
||||||
|
`run_fork_in_non_trio_thread` becomes
|
||||||
|
*overcautious* rather than load-bearing, and the
|
||||||
|
dedicated-thread primitives in `_subint_forkserver.py`
|
||||||
|
can likely be replaced with straight
|
||||||
|
`trio.to_thread.run_sync()` wrappers.
|
||||||
|
|
||||||
|
TL;DR
|
||||||
|
-----
|
||||||
|
|
||||||
|
| constraint | fixed by isolated mode? |
|
||||||
|
|---|---|
|
||||||
|
| GIL-starvation (class A) | **yes** |
|
||||||
|
| destroy race on cached worker | unclear — empirical test on py3.14 + isolated API required |
|
||||||
|
| fork-from-main-tstate requirement on worker | **probably yes, conditional on the destroy-race question above** |
|
||||||
|
|
||||||
|
If #2 also resolves on py3.14+ with isolated mode,
|
||||||
|
tractor could drop the `non_trio` qualifier from the fork
|
||||||
|
helper's name and just use `trio.to_thread.run_sync(...)`
|
||||||
|
for everything. But **we shouldn't do that preemptively**
|
||||||
|
— the current cautious design is cheap (one dedicated
|
||||||
|
thread per fork / per subint-exec) and correct.
|
||||||
|
|
||||||
|
Audit plan when msgspec #563 lands
|
||||||
|
----------------------------------
|
||||||
|
|
||||||
|
Assuming msgspec grows `Py_mod_multiple_interpreters`
|
||||||
|
support:
|
||||||
|
|
||||||
|
1. **Flip `tractor.spawn._subint` to isolated mode.** Drop
|
||||||
|
the `_interpreters.create('legacy')` call in favor of
|
||||||
|
the public API (`concurrent.interpreters.create()` +
|
||||||
|
`Interpreter.exec()` / `Interpreter.close()`). Run the
|
||||||
|
three `ai/conc-anal/subint_*_issue.md` reproducers —
|
||||||
|
class-A (`test_stale_entry_is_deleted` etc.) should
|
||||||
|
pass without the `skipon_spawn_backend('subint')` marks
|
||||||
|
(revisit the marker inventory).
|
||||||
|
|
||||||
|
2. **Empirical destroy-race retest.** In `subint_proc`,
|
||||||
|
swap the dedicated `threading.Thread` back to
|
||||||
|
`trio.to_thread.run_sync(Interpreter.exec, ...,
|
||||||
|
abandon_on_cancel=False)` and run the full subint test
|
||||||
|
suite. If `Interpreter.close()` (or the backing
|
||||||
|
destroy) blocks the same way as the legacy version
|
||||||
|
did, revert and keep the dedicated thread.
|
||||||
|
|
||||||
|
3. **If #2 clean**, audit `_subint_forkserver.py`:
|
||||||
|
- Rename `run_fork_in_non_trio_thread` → drop the
|
||||||
|
`_non_trio_` qualifier (e.g. `run_fork_in_thread`) or
|
||||||
|
inline the two-line `trio.to_thread.run_sync` call at
|
||||||
|
the call sites and drop the helper entirely.
|
||||||
|
- Consider whether `fork_from_worker_thread` +
|
||||||
|
`run_subint_in_worker_thread` still warrant being
|
||||||
|
separate module-level primitives or whether they
|
||||||
|
collapse into a compound
|
||||||
|
`trio.to_thread.run_sync`-driven pattern inside the
|
||||||
|
(future) `subint_forkserver_proc` backend.
|
||||||
|
|
||||||
|
4. **Doc fallout.** `subint_sigint_starvation_issue.md`
|
||||||
|
and `subint_cancel_delivery_hang_issue.md` both cite
|
||||||
|
the legacy-GIL-sharing architecture as the root cause.
|
||||||
|
Close them with commit-refs to the isolated-mode
|
||||||
|
migration. This doc itself should get a closing
|
||||||
|
post-mortem section noting which of #1/#2/#3 actually
|
||||||
|
resolved vs persisted.
|
||||||
|
|
||||||
|
References
|
||||||
|
----------
|
||||||
|
|
||||||
|
- `tractor.spawn._subint_forkserver` — the in-tree module
|
||||||
|
whose constraints this doc catalogs.
|
||||||
|
- `ai/conc-anal/subint_sigint_starvation_issue.md` — the
|
||||||
|
GIL-starvation class.
|
||||||
|
- `ai/conc-anal/subint_cancel_delivery_hang_issue.md` —
|
||||||
|
sibling Ctrl-C-able hang class.
|
||||||
|
- `ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md`
|
||||||
|
— why fork-from-subint is blocked (this drives the
|
||||||
|
forkserver-via-non-subint-thread workaround).
|
||||||
|
- `ai/conc-anal/subint_fork_from_main_thread_smoketest.py`
|
||||||
|
— empirical validation for the workaround.
|
||||||
|
- [PEP 684 — per-interpreter GIL](https://peps.python.org/pep-0684/)
|
||||||
|
- [PEP 734 — `concurrent.interpreters` public API](https://peps.python.org/pep-0734/)
|
||||||
|
- [jcrist/msgspec#563 — PEP 684 support tracker](https://github.com/jcrist/msgspec/issues/563)
|
||||||
|
- tractor issue #379 — subint backend tracking.
|
||||||
|
|
@ -0,0 +1,155 @@
|
||||||
|
---
|
||||||
|
model: claude-opus-4-7[1m]
|
||||||
|
service: claude
|
||||||
|
session: subints-phase-b-hardening-and-fork-block
|
||||||
|
timestamp: 2026-04-22T20:07:23Z
|
||||||
|
git_ref: 797f57c
|
||||||
|
scope: code
|
||||||
|
substantive: true
|
||||||
|
raw_file: 20260422T200723Z_797f57c_prompt_io.raw.md
|
||||||
|
---
|
||||||
|
|
||||||
|
## Prompt
|
||||||
|
|
||||||
|
Session-spanning work on the Phase B `subint` spawn-backend.
|
||||||
|
Three distinct sub-phases in one log:
|
||||||
|
|
||||||
|
1. **Py3.13 gate tightening** — diagnose a reproducible hang
|
||||||
|
of subint spawn flow under py3.13 (works on py3.14), trace
|
||||||
|
to a private `_interpreters` module vintage issue, tighten
|
||||||
|
our feature gate from "`_interpreters` present" to "public
|
||||||
|
`concurrent.interpreters` present" (i.e. py3.14+).
|
||||||
|
2. **Test-harness hardening** — add `pytest-timeout` dep, put
|
||||||
|
`@pytest.mark.timeout(30, method='thread')` on the
|
||||||
|
three known-hanging subint tests cataloged in
|
||||||
|
`ai/conc-anal/subint_sigint_starvation_issue.md`. Separately,
|
||||||
|
code-review the user's in-flight `skipon_spawn_backend`
|
||||||
|
marker implementation; find four bugs; refactor to use
|
||||||
|
`item.iter_markers()`.
|
||||||
|
3. **`subint_fork` prototype → CPython-block finding** — draft
|
||||||
|
a WIP `subint_fork_proc` backend using a sub-interpreter as
|
||||||
|
a launchpad for `os.fork()` (to sidestep trio#1614). User
|
||||||
|
tests on py3.14, hits
|
||||||
|
`Fatal Python error: _PyInterpreterState_DeleteExceptMain:
|
||||||
|
not main interpreter`. Walk CPython sources (local clone at
|
||||||
|
`~/repos/cpython/`) to pinpoint the refusal
|
||||||
|
(`Modules/posixmodule.c:728` → `Python/pystate.c:1040`).
|
||||||
|
Revert implementation to a `NotImplementedError` stub in a
|
||||||
|
new `_subint_fork.py` submodule, document the finding in a
|
||||||
|
third `conc-anal/` doc with an upstream-report draft for
|
||||||
|
the CPython issue tracker. Finally, discuss user's proposed
|
||||||
|
workaround architecture (main-interp worker-thread
|
||||||
|
forkserver) and draft a standalone smoke-test script for
|
||||||
|
feasibility validation.
|
||||||
|
|
||||||
|
## Response summary
|
||||||
|
|
||||||
|
All three sub-phases landed concrete artifacts:
|
||||||
|
|
||||||
|
**Sub-phase 1** — `_subint.py` + `_spawn.py` gates + error
|
||||||
|
messages updated to require py3.14+ via the public
|
||||||
|
`concurrent.interpreters` module presence check. Module
|
||||||
|
docstring revised to explain the empirical reason
|
||||||
|
(py3.13's private `_interpreters` vintage wedges under
|
||||||
|
multi-trio-task usage even though minimal standalone
|
||||||
|
reproducers work fine there). Test-module
|
||||||
|
`pytest.importorskip` likewise switched.
|
||||||
|
|
||||||
|
**Sub-phase 2** — `pytest-timeout>=2.3` added to `testing`
|
||||||
|
dep group. `@pytest.mark.timeout(30, method='thread')`
|
||||||
|
applied on:
|
||||||
|
- `tests/discovery/test_registrar.py::test_stale_entry_is_deleted`
|
||||||
|
- `tests/test_cancellation.py::test_cancel_while_childs_child_in_sync_sleep`
|
||||||
|
- `tests/test_cancellation.py::test_multierror_fast_nursery`
|
||||||
|
- `tests/test_subint_cancellation.py::test_subint_non_checkpointing_child`
|
||||||
|
|
||||||
|
`method='thread'` documented inline as load-bearing — the
|
||||||
|
GIL-starvation path that drops `SIGINT` would equally drop
|
||||||
|
`SIGALRM`, so only a watchdog-thread timeout can reliably
|
||||||
|
escape.
|
||||||
|
|
||||||
|
`skipon_spawn_backend` plugin refactored into a single
|
||||||
|
`iter_markers`-driven loop in `pytest_collection_modifyitems`
|
||||||
|
(~30 LOC replacing ~30 LOC of nested conditionals). Four
|
||||||
|
bugs dissolved: wrong `.get()` key, module-level `pytestmark`
|
||||||
|
suppressing per-test marks, unhandled `pytestmark = [list]`
|
||||||
|
form, `pytest.Makr` typo. Marker help text updated to
|
||||||
|
document the variadic backend-list + `reason=` kwarg
|
||||||
|
surface.
|
||||||
|
|
||||||
|
**Sub-phase 3** — Prototype drafted (then reverted):
|
||||||
|
|
||||||
|
- `tractor/spawn/_subint_fork.py` — new dedicated submodule
|
||||||
|
housing the `subint_fork_proc` stub. Module docstring +
|
||||||
|
fn docstring explain the attempt, the CPython-level
|
||||||
|
block, and the reason for keeping the stub in-tree
|
||||||
|
(documentation of the attempt + starting point if CPython
|
||||||
|
ever lifts the restriction).
|
||||||
|
- `tractor/spawn/_spawn.py` — `'subint_fork'` registered as a
|
||||||
|
`SpawnMethodKey` literal + in `_methods`, so
|
||||||
|
`--spawn-backend=subint_fork` routes to a clean
|
||||||
|
`NotImplementedError` pointing at the analysis doc rather
|
||||||
|
than an "invalid backend" error.
|
||||||
|
- `ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md` —
|
||||||
|
third sibling conc-anal doc. Full annotated CPython
|
||||||
|
source walkthrough from user-visible
|
||||||
|
`Fatal Python error` → `Modules/posixmodule.c:728
|
||||||
|
PyOS_AfterFork_Child()` → `Python/pystate.c:1040
|
||||||
|
_PyInterpreterState_DeleteExceptMain()` gate. Includes a
|
||||||
|
copy-paste-ready upstream-report draft for the CPython
|
||||||
|
issue tracker with a two-tier ask (ideally "make it work",
|
||||||
|
minimally "cleaner error than `Fatal Python error`
|
||||||
|
aborting the child").
|
||||||
|
- `ai/conc-anal/subint_fork_from_main_thread_smoketest.py` —
|
||||||
|
standalone zero-tractor-import CPython-level smoke test
|
||||||
|
for the user's proposed workaround architecture
|
||||||
|
(forkserver on a main-interp worker thread). Four
|
||||||
|
argparse-driven scenarios: `control_subint_thread_fork`
|
||||||
|
(reproduces the known-broken case as a test-harness
|
||||||
|
sanity), `main_thread_fork` (baseline), `worker_thread_fork`
|
||||||
|
(architectural assertion), `full_architecture`
|
||||||
|
(end-to-end trio-in-subint in forked child). User will
|
||||||
|
run on py3.14 next.
|
||||||
|
|
||||||
|
## Files changed
|
||||||
|
|
||||||
|
See `git log 26fb820..HEAD --stat` for the canonical list.
|
||||||
|
New files this session:
|
||||||
|
- `tractor/spawn/_subint_fork.py`
|
||||||
|
- `ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md`
|
||||||
|
- `ai/conc-anal/subint_fork_from_main_thread_smoketest.py`
|
||||||
|
|
||||||
|
Modified (diff pointers in raw log):
|
||||||
|
- `tractor/spawn/_subint.py` (py3.14 gate)
|
||||||
|
- `tractor/spawn/_spawn.py` (`subint_fork` registration)
|
||||||
|
- `tractor/_testing/pytest.py` (`skipon_spawn_backend` refactor)
|
||||||
|
- `pyproject.toml` (`pytest-timeout` dep)
|
||||||
|
- `tests/discovery/test_registrar.py`,
|
||||||
|
`tests/test_cancellation.py`,
|
||||||
|
`tests/test_subint_cancellation.py` (timeout marks,
|
||||||
|
cross-refs to conc-anal docs)
|
||||||
|
|
||||||
|
## Human edits
|
||||||
|
|
||||||
|
Several back-and-forth iterations with user-driven
|
||||||
|
adjustments during the session:
|
||||||
|
|
||||||
|
- User corrected my initial mis-classification of
|
||||||
|
`test_cancel_while_childs_child_in_sync_sleep[subint-False]`
|
||||||
|
as Ctrl-C-able — second strace showed `EAGAIN`, putting
|
||||||
|
it squarely in class A (GIL-starvation). Re-analysis
|
||||||
|
preserved in the raw log.
|
||||||
|
- User independently fixed the `.get(reason)` → `.get('reason', reason)`
|
||||||
|
bug in the marker plugin before my review; preserved their
|
||||||
|
fix.
|
||||||
|
- User suggested moving the `subint_fork_proc` stub from
|
||||||
|
the bottom of `_subint.py` into its own
|
||||||
|
`_subint_fork.py` submodule — applied.
|
||||||
|
- User asked to keep the forkserver-architecture
|
||||||
|
discussion as background for the smoke-test rather than
|
||||||
|
committing to a tractor-side refactor until the smoke
|
||||||
|
test validates the CPython-level assumptions.
|
||||||
|
|
||||||
|
Commit messages in this range (b025c982 … 797f57c) were
|
||||||
|
drafted via `/commit-msg` + `rewrap.py --width 67`; user
|
||||||
|
landed them with the usual review.
|
||||||
|
|
@ -0,0 +1,343 @@
|
||||||
|
---
|
||||||
|
model: claude-opus-4-7[1m]
|
||||||
|
service: claude
|
||||||
|
timestamp: 2026-04-22T20:07:23Z
|
||||||
|
git_ref: 797f57c
|
||||||
|
diff_cmd: git log 26fb820..HEAD # all session commits since the destroy-race fix log
|
||||||
|
---
|
||||||
|
|
||||||
|
Session-spanning conversation covering the Phase B hardening
|
||||||
|
of the `subint` spawn-backend and an investigation into a
|
||||||
|
proposed `subint_fork` follow-up which turned out to be
|
||||||
|
blocked at the CPython level. This log is a narrative capture
|
||||||
|
of the substantive turns (not every message) and references
|
||||||
|
the concrete code + docs the session produced. Per diff-ref
|
||||||
|
mode the actual code diffs are pointed at via `git log` on
|
||||||
|
each ref rather than duplicated inline.
|
||||||
|
|
||||||
|
## Narrative of the substantive turns
|
||||||
|
|
||||||
|
### Py3.13 hang / gate tightening
|
||||||
|
|
||||||
|
Diagnosed a reproducible hang of the `subint` backend under
|
||||||
|
py3.13 (test_spawning tests wedge after root-actor bringup).
|
||||||
|
Root cause: py3.13's vintage of the private `_interpreters` C
|
||||||
|
module has a latent thread/subint-interaction issue that
|
||||||
|
`_interpreters.exec()` silently fails to progress under
|
||||||
|
tractor's multi-trio usage pattern — even though a minimal
|
||||||
|
standalone `threading.Thread` + `_interpreters.exec()`
|
||||||
|
reproducer works fine on the same Python. Empirically
|
||||||
|
py3.14 fixes it.
|
||||||
|
|
||||||
|
Fix (from this session): tighten the `_has_subints` gate in
|
||||||
|
`tractor.spawn._subint` from "private module importable" to
|
||||||
|
"public `concurrent.interpreters` present" — which is 3.14+
|
||||||
|
only. This leaves `subint_proc()` unchanged in behavior (we
|
||||||
|
still call the *private* `_interpreters.create('legacy')`
|
||||||
|
etc. under the hood) but refuses to engage on 3.13.
|
||||||
|
|
||||||
|
Also tightened the matching gate in
|
||||||
|
`tractor.spawn._spawn.try_set_start_method('subint')` and
|
||||||
|
rev'd the corresponding error messages from "3.13+" to
|
||||||
|
"3.14+" with a sentence explaining why. Test-module
|
||||||
|
`pytest.importorskip` switched from `_interpreters` →
|
||||||
|
`concurrent.interpreters` to match.
|
||||||
|
|
||||||
|
### `pytest-timeout` dep + `skipon_spawn_backend` marker plumbing
|
||||||
|
|
||||||
|
Added `pytest-timeout>=2.3` to the `testing` dep group with
|
||||||
|
an inline comment pointing at the `ai/conc-anal/*.md` docs.
|
||||||
|
Applied `@pytest.mark.timeout(30, method='thread')` (the
|
||||||
|
`method='thread'` is load-bearing — `signal`-method
|
||||||
|
`SIGALRM` suffers the same GIL-starvation path that drops
|
||||||
|
`SIGINT` in the class-A hang pattern) to the three known-
|
||||||
|
hanging subint tests cataloged in
|
||||||
|
`subint_sigint_starvation_issue.md`.
|
||||||
|
|
||||||
|
Separately code-reviewed the user's newly-staged
|
||||||
|
`skipon_spawn_backend` pytest marker implementation in
|
||||||
|
`tractor/_testing/pytest.py`. Found four bugs:
|
||||||
|
|
||||||
|
1. `modmark.kwargs.get(reason)` called `.get()` with the
|
||||||
|
*variable* `reason` as the dict key instead of the string
|
||||||
|
`'reason'` — user-supplied `reason=` was never picked up.
|
||||||
|
(User had already fixed this locally via `.get('reason',
|
||||||
|
reason)` by the time my review happened — preserved that
|
||||||
|
fix.)
|
||||||
|
2. The module-level `pytestmark` branch suppressed per-test
|
||||||
|
marker handling (the `else:` was an `else:` rather than
|
||||||
|
independent iteration).
|
||||||
|
3. `mod_pytestmark.mark` assumed a single
|
||||||
|
`MarkDecorator` — broke on the valid-pytest `pytestmark =
|
||||||
|
[mark, mark]` list form.
|
||||||
|
4. Typo: `pytest.Makr` → `pytest.Mark`.
|
||||||
|
|
||||||
|
Refactored the hook to use `item.iter_markers(name=...)`
|
||||||
|
which walks function + class + module scopes uniformly and
|
||||||
|
handles both `pytestmark` forms natively. ~30 LOC replaced
|
||||||
|
the original ~30 LOC of nested conditionals, all four bugs
|
||||||
|
dissolved. Also updated the marker help string to reflect
|
||||||
|
the variadic `*start_methods` + `reason=` surface.
|
||||||
|
|
||||||
|
### `subint_fork_proc` prototype attempt
|
||||||
|
|
||||||
|
User's hypothesis: the known trio+`fork()` issues
|
||||||
|
(python-trio/trio#1614) could be sidestepped by using a
|
||||||
|
sub-interpreter purely as a launchpad — `os.fork()` from a
|
||||||
|
subint that has never imported trio → child is in a
|
||||||
|
trio-free context. In the child `execv()` back into
|
||||||
|
`python -m tractor._child` and the downstream handshake
|
||||||
|
matches `trio_proc()` identically.
|
||||||
|
|
||||||
|
Drafted the prototype at `tractor/spawn/_subint.py`'s bottom
|
||||||
|
(originally — later moved to its own submod, see below):
|
||||||
|
launchpad-subint creation, bootstrap code-string with
|
||||||
|
`os.fork()` + `execv()`, driver-thread orchestration,
|
||||||
|
parent-side `ipc_server.wait_for_peer()` dance. Registered
|
||||||
|
`'subint_fork'` as a new `SpawnMethodKey` literal, added
|
||||||
|
`case 'subint' | 'subint_fork':` feature-gate arm in
|
||||||
|
`try_set_start_method()`, added entry in `_methods` dict.
|
||||||
|
|
||||||
|
### CPython-level block discovered
|
||||||
|
|
||||||
|
User tested on py3.14 and saw:
|
||||||
|
|
||||||
|
```
|
||||||
|
Fatal Python error: _PyInterpreterState_DeleteExceptMain: not main interpreter
|
||||||
|
Python runtime state: initialized
|
||||||
|
|
||||||
|
Current thread 0x00007f6b71a456c0 [subint-fork-lau] (most recent call first):
|
||||||
|
File "<script>", line 2 in <module>
|
||||||
|
<script>:2: DeprecationWarning: This process (pid=802985) is multi-threaded, use of fork() may lead to deadlocks in the child.
|
||||||
|
```
|
||||||
|
|
||||||
|
Walked CPython sources (local clone at `~/repos/cpython/`):
|
||||||
|
|
||||||
|
- **`Modules/posixmodule.c:728` `PyOS_AfterFork_Child()`** —
|
||||||
|
post-fork child-side cleanup. Calls
|
||||||
|
`_PyInterpreterState_DeleteExceptMain(runtime)` with
|
||||||
|
`goto fatal_error` on non-zero status. Has the
|
||||||
|
`// Ideally we could guarantee tstate is running main.`
|
||||||
|
self-acknowledging-fragile comment directly above.
|
||||||
|
|
||||||
|
- **`Python/pystate.c:1040`
|
||||||
|
`_PyInterpreterState_DeleteExceptMain()`** — the
|
||||||
|
refusal. Hard `PyStatus_ERR("not main interpreter")` gate
|
||||||
|
when `tstate->interp != interpreters->main`. Docstring
|
||||||
|
formally declares the precondition ("If there is a
|
||||||
|
current interpreter state, it *must* be the main
|
||||||
|
interpreter"). `XXX` comments acknowledge further latent
|
||||||
|
issues within.
|
||||||
|
|
||||||
|
Definitive answer to "Open Question 1" of the prototype
|
||||||
|
docstring: **no, CPython does not support `os.fork()` from
|
||||||
|
a non-main sub-interpreter**. Not because the fork syscall
|
||||||
|
is blocked (it isn't — the parent returns a valid pid),
|
||||||
|
but because the child cannot survive CPython's post-fork
|
||||||
|
initialization. This is an enforced invariant, not an
|
||||||
|
incidental limitation.
|
||||||
|
|
||||||
|
### Revert: move to stub submod + doc the finding
|
||||||
|
|
||||||
|
Per user request:
|
||||||
|
|
||||||
|
1. Reverted the working `subint_fork_proc` body to a
|
||||||
|
`NotImplementedError` stub, MOVED to its own submod
|
||||||
|
`tractor/spawn/_subint_fork.py` (keeps `_subint.py`
|
||||||
|
focused on the working `subint_proc` backend).
|
||||||
|
2. Updated `_spawn.py` to import the stub from the new
|
||||||
|
submod path; kept `'subint_fork'` in `SpawnMethodKey` +
|
||||||
|
`_methods` so `--spawn-backend=subint_fork` routes to a
|
||||||
|
clean `NotImplementedError` with pointer to the analysis
|
||||||
|
doc rather than an "invalid backend" error.
|
||||||
|
3. Wrote
|
||||||
|
`ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md`
|
||||||
|
with the full annotated CPython walkthrough + an
|
||||||
|
upstream-report draft for the CPython issue tracker.
|
||||||
|
Draft has a two-tier ask: ideally "make it work"
|
||||||
|
(pre-fork tstate-swap hook or `DeleteExceptFor(interp)`
|
||||||
|
variant), minimally "give us a clean `RuntimeError` in
|
||||||
|
the parent instead of a `Fatal Python error` aborting
|
||||||
|
the child silently".
|
||||||
|
|
||||||
|
### Design discussion — main-interp-thread forkserver workaround
|
||||||
|
|
||||||
|
User proposed: set up a "subint forking server" that fork()s
|
||||||
|
on behalf of subint callers. Core insight: the CPython gate
|
||||||
|
is on `tstate->interp`, not thread identity, so **any thread
|
||||||
|
whose tstate is main-interp** can fork cleanly. A worker
|
||||||
|
thread attached to main-interp (never entering a subint)
|
||||||
|
satisfies the precondition.
|
||||||
|
|
||||||
|
Structurally this is `mp.forkserver` (which tractor already
|
||||||
|
has as `mp_forkserver`) but **in-process**: instead of a
|
||||||
|
separate Python subproc as the fork server, we'd put the
|
||||||
|
forkserver on a thread in the tractor parent process. Pros:
|
||||||
|
faster spawn (no IPC marshalling to external server + no
|
||||||
|
separate Python startup), inherits already-imported modules
|
||||||
|
for free. Cons: less crash isolation (forkserver failure
|
||||||
|
takes the whole process).
|
||||||
|
|
||||||
|
Required tractor-side refactor: move the root actor's
|
||||||
|
`trio.run()` off main-interp-main-thread (so main-thread can
|
||||||
|
run the forkserver loop). Nontrivial; approximately the same
|
||||||
|
magnitude as "Phase C".
|
||||||
|
|
||||||
|
The design would also not fully resolve the class-A
|
||||||
|
GIL-starvation issue because child actors' trio still runs
|
||||||
|
inside subints (legacy config, msgspec PEP 684 pending).
|
||||||
|
Would mitigate SIGINT-starvation specifically if signal
|
||||||
|
handling moves to the forkserver thread.
|
||||||
|
|
||||||
|
Recommended pre-commitment: a standalone CPython-only smoke
|
||||||
|
test validating the four assumptions the arch rests on,
|
||||||
|
before any tractor-side work.
|
||||||
|
|
||||||
|
### Smoke-test script drafted
|
||||||
|
|
||||||
|
Wrote `ai/conc-anal/subint_fork_from_main_thread_smoketest.py`:
|
||||||
|
argparse-driven, four scenarios (`control_subint_thread_fork`
|
||||||
|
reproducing the known-broken case, `main_thread_fork`
|
||||||
|
baseline, `worker_thread_fork` the architectural assertion,
|
||||||
|
`full_architecture` end-to-end with trio in a subint in the
|
||||||
|
forked child). No `tractor` imports; pure CPython + `_interpreters`
|
||||||
|
+ `trio`. Bails cleanly on py<3.14. Pass/fail banners per
|
||||||
|
scenario.
|
||||||
|
|
||||||
|
User will validate on their py3.14 env next.
|
||||||
|
|
||||||
|
## Per-code-artifact provenance
|
||||||
|
|
||||||
|
### `tractor/spawn/_subint_fork.py` (new submod)
|
||||||
|
|
||||||
|
> `git show 797f57c -- tractor/spawn/_subint_fork.py`
|
||||||
|
|
||||||
|
NotImplementedError stub for the subint-fork backend. Module
|
||||||
|
docstring + fn docstring explain the attempt, the CPython
|
||||||
|
block, and why the stub is kept in-tree. No runtime behavior
|
||||||
|
beyond raising with a pointer at the conc-anal doc.
|
||||||
|
|
||||||
|
### `tractor/spawn/_spawn.py` (modified)
|
||||||
|
|
||||||
|
> `git log 26fb820..HEAD -- tractor/spawn/_spawn.py`
|
||||||
|
|
||||||
|
- Added `'subint_fork'` to `SpawnMethodKey` literal with a
|
||||||
|
block comment explaining the CPython-level block.
|
||||||
|
- Generalized the `case 'subint':` arm to `case 'subint' |
|
||||||
|
'subint_fork':` since both use the same py3.14+ gate.
|
||||||
|
- Registered `subint_fork_proc` in `_methods` with a
|
||||||
|
pointer-comment at the analysis doc.
|
||||||
|
|
||||||
|
### `tractor/spawn/_subint.py` (modified across session)
|
||||||
|
|
||||||
|
> `git log 26fb820..HEAD -- tractor/spawn/_subint.py`
|
||||||
|
|
||||||
|
- Tightened `_has_subints` gate: dual-requires public
|
||||||
|
`concurrent.interpreters` + private `_interpreters`
|
||||||
|
(tests for py3.14-or-newer on the public-API presence,
|
||||||
|
then uses the private one for legacy-config subints
|
||||||
|
because `msgspec` still blocks the public isolated mode
|
||||||
|
per jcrist/msgspec#563).
|
||||||
|
- Updated module docstring, `subint_proc()` docstring, and
|
||||||
|
gate-error messages to reflect the 3.14+ requirement and
|
||||||
|
the reason (py3.13 wedges under multi-trio usage even
|
||||||
|
though the private module exists there).
|
||||||
|
|
||||||
|
### `tractor/_testing/pytest.py` (modified)
|
||||||
|
|
||||||
|
> `git log 26fb820..HEAD -- tractor/_testing/pytest.py`
|
||||||
|
|
||||||
|
- New `skipon_spawn_backend(*start_methods, reason=...)`
|
||||||
|
pytest marker expanded into `pytest.mark.skip(reason=...)`
|
||||||
|
at collection time via
|
||||||
|
`pytest_collection_modifyitems()`.
|
||||||
|
- Implementation uses `item.iter_markers(name=...)` which
|
||||||
|
walks function + class + module scopes uniformly and
|
||||||
|
handles both `pytestmark = <single Mark>` and
|
||||||
|
`pytestmark = [mark, ...]` forms natively. ~30-LOC
|
||||||
|
single-loop refactor replacing a prior nested
|
||||||
|
conditional that had four bugs (see "Review" narrative
|
||||||
|
above).
|
||||||
|
- Added `pytest.Config` / `pytest.Function` /
|
||||||
|
`pytest.FixtureRequest` type annotations on fixture
|
||||||
|
signatures while touching the file.
|
||||||
|
|
||||||
|
### `pyproject.toml` (modified)
|
||||||
|
|
||||||
|
> `git log 26fb820..HEAD -- pyproject.toml`
|
||||||
|
|
||||||
|
Added `pytest-timeout>=2.3` to `testing` dep group with
|
||||||
|
comment pointing at the `ai/conc-anal/` docs.
|
||||||
|
|
||||||
|
### `tests/discovery/test_registrar.py`,
|
||||||
|
`tests/test_subint_cancellation.py`,
|
||||||
|
`tests/test_cancellation.py` (modified)
|
||||||
|
|
||||||
|
> `git log 26fb820..HEAD -- tests/`
|
||||||
|
|
||||||
|
Applied `@pytest.mark.timeout(30, method='thread')` on
|
||||||
|
known-hanging subint tests. Extended comments to cross-
|
||||||
|
reference the `ai/conc-anal/*.md` docs. `method='thread'`
|
||||||
|
is documented inline as load-bearing (`signal`-method
|
||||||
|
SIGALRM suffers the same GIL-starvation path that drops
|
||||||
|
SIGINT).
|
||||||
|
|
||||||
|
### `ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md` (new)
|
||||||
|
|
||||||
|
> `git show 797f57c -- ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md`
|
||||||
|
|
||||||
|
Third sibling doc under `conc-anal/`. Structure: TL;DR,
|
||||||
|
context ("what we tried"), symptom (the user's exact
|
||||||
|
`Fatal Python error` output), CPython source walkthrough
|
||||||
|
with excerpted snippets from `posixmodule.c` +
|
||||||
|
`pystate.c`, chain summary, definitive answer to Open
|
||||||
|
Question 1, `## Upstream-report draft (for CPython issue
|
||||||
|
tracker)` section with a two-tier ask, references.
|
||||||
|
|
||||||
|
### `ai/conc-anal/subint_fork_from_main_thread_smoketest.py` (new, THIS turn)
|
||||||
|
|
||||||
|
Zero-tractor-import smoke test for the proposed workaround
|
||||||
|
architecture. Four argparse-driven scenarios covering the
|
||||||
|
control case + baseline + arch-critical case + end-to-end.
|
||||||
|
Pass/fail banners per scenario; clean `--help` output;
|
||||||
|
py3.13 early-exit.
|
||||||
|
|
||||||
|
## Non-code output (verbatim)
|
||||||
|
|
||||||
|
### The `strace` signature that kicked off the CPython
|
||||||
|
walkthrough
|
||||||
|
|
||||||
|
```
|
||||||
|
--- SIGINT {si_signo=SIGINT, si_code=SI_KERNEL} ---
|
||||||
|
write(16, "\2", 1) = -1 EAGAIN (Resource temporarily unavailable)
|
||||||
|
rt_sigreturn({mask=[WINCH]}) = 139801964688928
|
||||||
|
```
|
||||||
|
|
||||||
|
### Key user quotes framing the direction
|
||||||
|
|
||||||
|
> ok actually we get this [fatal error] ... see if you can
|
||||||
|
> take a look at what's going on, in particular wrt to
|
||||||
|
> cpython's sources. pretty sure there's a local copy at
|
||||||
|
> ~/repos/cpython/
|
||||||
|
|
||||||
|
(Drove the CPython walkthrough that produced the
|
||||||
|
definitive refusal chain.)
|
||||||
|
|
||||||
|
> is there any reason we can't just sidestep this "must fork
|
||||||
|
> from main thread in main subint" issue by simply ensuring
|
||||||
|
> a "subint forking server" is always setup prior to
|
||||||
|
> invoking trio in a non-main-thread subint ...
|
||||||
|
|
||||||
|
(Drove the main-interp-thread-forkserver architectural
|
||||||
|
discussion + smoke-test script design.)
|
||||||
|
|
||||||
|
### CPython source tags for quick jump-back
|
||||||
|
|
||||||
|
```
|
||||||
|
Modules/posixmodule.c:728 PyOS_AfterFork_Child()
|
||||||
|
Modules/posixmodule.c:753 // Ideally we could guarantee tstate is running main.
|
||||||
|
Modules/posixmodule.c:778 status = _PyInterpreterState_DeleteExceptMain(runtime);
|
||||||
|
|
||||||
|
Python/pystate.c:1040 _PyInterpreterState_DeleteExceptMain()
|
||||||
|
Python/pystate.c:1044-1047 tstate->interp != main → PyStatus_ERR("not main interpreter")
|
||||||
|
```
|
||||||
|
|
@ -139,7 +139,9 @@ def pytest_addoption(
|
||||||
|
|
||||||
|
|
||||||
@pytest.fixture(scope='session', autouse=True)
|
@pytest.fixture(scope='session', autouse=True)
|
||||||
def loglevel(request) -> str:
|
def loglevel(
|
||||||
|
request: pytest.FixtureRequest,
|
||||||
|
) -> str:
|
||||||
import tractor
|
import tractor
|
||||||
orig = tractor.log._default_loglevel
|
orig = tractor.log._default_loglevel
|
||||||
level = tractor.log._default_loglevel = request.config.option.loglevel
|
level = tractor.log._default_loglevel = request.config.option.loglevel
|
||||||
|
|
@ -156,7 +158,7 @@ def loglevel(request) -> str:
|
||||||
|
|
||||||
@pytest.fixture(scope='function')
|
@pytest.fixture(scope='function')
|
||||||
def test_log(
|
def test_log(
|
||||||
request,
|
request: pytest.FixtureRequest,
|
||||||
loglevel: str,
|
loglevel: str,
|
||||||
) -> tractor.log.StackLevelAdapter:
|
) -> tractor.log.StackLevelAdapter:
|
||||||
'''
|
'''
|
||||||
|
|
|
||||||
|
|
@ -146,13 +146,12 @@ def spawn(
|
||||||
ids='ctl-c={}'.format,
|
ids='ctl-c={}'.format,
|
||||||
)
|
)
|
||||||
def ctlc(
|
def ctlc(
|
||||||
request,
|
request: pytest.FixtureRequest,
|
||||||
ci_env: bool,
|
ci_env: bool,
|
||||||
|
|
||||||
) -> bool:
|
) -> bool:
|
||||||
|
|
||||||
use_ctlc = request.param
|
use_ctlc: bool = request.param
|
||||||
|
|
||||||
node = request.node
|
node = request.node
|
||||||
markers = node.own_markers
|
markers = node.own_markers
|
||||||
for mark in markers:
|
for mark in markers:
|
||||||
|
|
|
||||||
|
|
@ -520,8 +520,6 @@ async def kill_transport(
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
# @pytest.mark.parametrize('use_signal', [False, True])
|
|
||||||
#
|
|
||||||
# Wall-clock bound via `pytest-timeout` (`method='thread'`).
|
# Wall-clock bound via `pytest-timeout` (`method='thread'`).
|
||||||
# Under `--spawn-backend=subint` this test can wedge in an
|
# Under `--spawn-backend=subint` this test can wedge in an
|
||||||
# un-Ctrl-C-able state (abandoned-subint + shared-GIL
|
# un-Ctrl-C-able state (abandoned-subint + shared-GIL
|
||||||
|
|
@ -537,6 +535,16 @@ async def kill_transport(
|
||||||
3, # NOTE should be a 2.1s happy path.
|
3, # NOTE should be a 2.1s happy path.
|
||||||
method='thread',
|
method='thread',
|
||||||
)
|
)
|
||||||
|
@pytest.mark.skipon_spawn_backend(
|
||||||
|
'subint',
|
||||||
|
reason=(
|
||||||
|
'XXX SUBINT HANGING TEST XXX\n'
|
||||||
|
'See oustanding issue(s)\n'
|
||||||
|
# TODO, put issue link!
|
||||||
|
)
|
||||||
|
)
|
||||||
|
# @pytest.mark.parametrize('use_signal', [False, True])
|
||||||
|
#
|
||||||
def test_stale_entry_is_deleted(
|
def test_stale_entry_is_deleted(
|
||||||
debug_mode: bool,
|
debug_mode: bool,
|
||||||
daemon: subprocess.Popen,
|
daemon: subprocess.Popen,
|
||||||
|
|
@ -549,12 +557,6 @@ def test_stale_entry_is_deleted(
|
||||||
stale entry and not delivering a bad portal.
|
stale entry and not delivering a bad portal.
|
||||||
|
|
||||||
'''
|
'''
|
||||||
if start_method == 'subint':
|
|
||||||
pytest.skip(
|
|
||||||
'XXX SUBINT HANGING TEST XXX\n'
|
|
||||||
'See oustanding issue(s)\n'
|
|
||||||
)
|
|
||||||
|
|
||||||
async def main():
|
async def main():
|
||||||
|
|
||||||
name: str = 'transport_fails_actor'
|
name: str = 'transport_fails_actor'
|
||||||
|
|
|
||||||
|
|
@ -0,0 +1,329 @@
|
||||||
|
'''
|
||||||
|
Integration exercises for the `tractor.spawn._subint_forkserver`
|
||||||
|
submodule at three tiers:
|
||||||
|
|
||||||
|
1. the low-level primitives
|
||||||
|
(`fork_from_worker_thread()` +
|
||||||
|
`run_subint_in_worker_thread()`) driven from inside a real
|
||||||
|
`trio.run()` in the parent process,
|
||||||
|
|
||||||
|
2. the full `subint_forkserver_proc` spawn backend wired
|
||||||
|
through tractor's normal actor-nursery + portal-RPC
|
||||||
|
machinery — i.e. `open_root_actor` + `open_nursery` +
|
||||||
|
`run_in_actor` against a subactor spawned via fork from a
|
||||||
|
main-interp worker thread.
|
||||||
|
|
||||||
|
Background
|
||||||
|
----------
|
||||||
|
`ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md`
|
||||||
|
establishes that `os.fork()` from a non-main sub-interpreter
|
||||||
|
aborts the child at the CPython level. The sibling
|
||||||
|
`subint_fork_from_main_thread_smoketest.py` proves the escape
|
||||||
|
hatch: fork from a main-interp *worker thread* (one that has
|
||||||
|
never entered a subint) works, and the forked child can then
|
||||||
|
host its own `trio.run()` inside a fresh subint.
|
||||||
|
|
||||||
|
Those smoke-test scenarios are standalone — no trio runtime
|
||||||
|
in the *parent*. Tiers (1)+(2) here cover the primitives
|
||||||
|
driven from inside `trio.run()` in the parent, and tier (3)
|
||||||
|
(the `*_spawn_basic` test) drives the registered
|
||||||
|
`subint_forkserver` spawn backend end-to-end against the
|
||||||
|
tractor runtime.
|
||||||
|
|
||||||
|
Gating
|
||||||
|
------
|
||||||
|
- py3.14+ (via `concurrent.interpreters` presence)
|
||||||
|
- no `--spawn-backend` restriction — the backend-level test
|
||||||
|
flips `tractor.spawn._spawn._spawn_method` programmatically
|
||||||
|
(via `try_set_start_method('subint_forkserver')`) and
|
||||||
|
restores it on teardown, so these tests are independent of
|
||||||
|
the session-level CLI backend choice.
|
||||||
|
|
||||||
|
'''
|
||||||
|
from __future__ import annotations
|
||||||
|
from functools import partial
|
||||||
|
import os
|
||||||
|
|
||||||
|
import pytest
|
||||||
|
import trio
|
||||||
|
|
||||||
|
import tractor
|
||||||
|
from tractor.devx import dump_on_hang
|
||||||
|
|
||||||
|
|
||||||
|
# Gate: subint forkserver primitives require py3.14+. Check
|
||||||
|
# the public stdlib wrapper's presence (added in 3.14) rather
|
||||||
|
# than `_interpreters` directly — see
|
||||||
|
# `tractor.spawn._subint` for why.
|
||||||
|
pytest.importorskip('concurrent.interpreters')
|
||||||
|
|
||||||
|
from tractor.spawn._subint_forkserver import ( # noqa: E402
|
||||||
|
fork_from_worker_thread,
|
||||||
|
run_subint_in_worker_thread,
|
||||||
|
wait_child,
|
||||||
|
)
|
||||||
|
from tractor.spawn import _spawn as _spawn_mod # noqa: E402
|
||||||
|
from tractor.spawn._spawn import try_set_start_method # noqa: E402
|
||||||
|
|
||||||
|
|
||||||
|
# ----------------------------------------------------------------
|
||||||
|
# child-side callables (passed via `child_target=` across fork)
|
||||||
|
# ----------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
|
_CHILD_TRIO_BOOTSTRAP: str = (
|
||||||
|
'import trio\n'
|
||||||
|
'async def _main():\n'
|
||||||
|
' await trio.sleep(0.05)\n'
|
||||||
|
' return 42\n'
|
||||||
|
'result = trio.run(_main)\n'
|
||||||
|
'assert result == 42, f"trio.run returned {result}"\n'
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def _child_trio_in_subint() -> int:
|
||||||
|
'''
|
||||||
|
`child_target` for the trio-in-child scenario: drive a
|
||||||
|
trivial `trio.run()` inside a fresh legacy-config subint
|
||||||
|
on a worker thread.
|
||||||
|
|
||||||
|
Returns an exit code suitable for `os._exit()`:
|
||||||
|
- 0: subint-hosted `trio.run()` succeeded
|
||||||
|
- 3: driver thread hang (timeout inside `run_subint_in_worker_thread`)
|
||||||
|
- 4: subint bootstrap raised some other exception
|
||||||
|
|
||||||
|
'''
|
||||||
|
try:
|
||||||
|
run_subint_in_worker_thread(
|
||||||
|
_CHILD_TRIO_BOOTSTRAP,
|
||||||
|
thread_name='child-subint-trio-thread',
|
||||||
|
)
|
||||||
|
except RuntimeError:
|
||||||
|
# timeout / thread-never-returned
|
||||||
|
return 3
|
||||||
|
except BaseException:
|
||||||
|
return 4
|
||||||
|
return 0
|
||||||
|
|
||||||
|
|
||||||
|
# ----------------------------------------------------------------
|
||||||
|
# parent-side harnesses (run inside `trio.run()`)
|
||||||
|
# ----------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
|
async def run_fork_in_non_trio_thread(
|
||||||
|
deadline: float,
|
||||||
|
*,
|
||||||
|
child_target=None,
|
||||||
|
) -> int:
|
||||||
|
'''
|
||||||
|
From inside a parent `trio.run()`, off-load the
|
||||||
|
forkserver primitive to a main-interp worker thread via
|
||||||
|
`trio.to_thread.run_sync()` and return the forked child's
|
||||||
|
pid.
|
||||||
|
|
||||||
|
Then `wait_child()` on that pid (also off-loaded so we
|
||||||
|
don't block trio's event loop on `waitpid()`) and assert
|
||||||
|
the child exited cleanly.
|
||||||
|
|
||||||
|
'''
|
||||||
|
with trio.fail_after(deadline):
|
||||||
|
# NOTE: `fork_from_worker_thread` internally spawns its
|
||||||
|
# own dedicated `threading.Thread` (not from trio's
|
||||||
|
# cache) and joins it before returning — so we can
|
||||||
|
# safely off-load via `to_thread.run_sync` without
|
||||||
|
# worrying about the trio-thread-cache recycling the
|
||||||
|
# runner. Pass `abandon_on_cancel=False` for the
|
||||||
|
# same "bounded + clean" rationale we use in
|
||||||
|
# `_subint.subint_proc`.
|
||||||
|
pid: int = await trio.to_thread.run_sync(
|
||||||
|
partial(
|
||||||
|
fork_from_worker_thread,
|
||||||
|
child_target,
|
||||||
|
thread_name='test-subint-forkserver',
|
||||||
|
),
|
||||||
|
abandon_on_cancel=False,
|
||||||
|
)
|
||||||
|
assert pid > 0
|
||||||
|
|
||||||
|
ok, status_str = await trio.to_thread.run_sync(
|
||||||
|
partial(
|
||||||
|
wait_child,
|
||||||
|
pid,
|
||||||
|
expect_exit_ok=True,
|
||||||
|
),
|
||||||
|
abandon_on_cancel=False,
|
||||||
|
)
|
||||||
|
assert ok, (
|
||||||
|
f'forked child did not exit cleanly: '
|
||||||
|
f'{status_str}'
|
||||||
|
)
|
||||||
|
return pid
|
||||||
|
|
||||||
|
|
||||||
|
# ----------------------------------------------------------------
|
||||||
|
# tests
|
||||||
|
# ----------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
|
# Bounded wall-clock via `pytest-timeout` (`method='thread'`)
|
||||||
|
# for the usual GIL-hostage safety reason documented in the
|
||||||
|
# sibling `test_subint_cancellation.py` / the class-A
|
||||||
|
# `subint_sigint_starvation_issue.md`. Each test also has an
|
||||||
|
# inner `trio.fail_after()` so assertion failures fire fast
|
||||||
|
# under normal conditions.
|
||||||
|
@pytest.mark.timeout(30, method='thread')
|
||||||
|
def test_fork_from_worker_thread_via_trio() -> None:
|
||||||
|
'''
|
||||||
|
Baseline: inside `trio.run()`, call
|
||||||
|
`fork_from_worker_thread()` via `trio.to_thread.run_sync()`,
|
||||||
|
get a child pid back, reap the child cleanly.
|
||||||
|
|
||||||
|
No trio-in-child. If this regresses we know the parent-
|
||||||
|
side trio↔worker-thread plumbing is broken independent
|
||||||
|
of any child-side subint machinery.
|
||||||
|
|
||||||
|
'''
|
||||||
|
deadline: float = 10.0
|
||||||
|
with dump_on_hang(
|
||||||
|
seconds=deadline,
|
||||||
|
path='/tmp/subint_forkserver_baseline.dump',
|
||||||
|
):
|
||||||
|
pid: int = trio.run(
|
||||||
|
partial(run_fork_in_non_trio_thread, deadline),
|
||||||
|
)
|
||||||
|
# parent-side sanity — we got a real pid back.
|
||||||
|
assert isinstance(pid, int) and pid > 0
|
||||||
|
# by now the child has been waited on; it shouldn't be
|
||||||
|
# reap-able again.
|
||||||
|
with pytest.raises((ChildProcessError, OSError)):
|
||||||
|
os.waitpid(pid, os.WNOHANG)
|
||||||
|
|
||||||
|
|
||||||
|
@pytest.mark.timeout(30, method='thread')
|
||||||
|
def test_fork_and_run_trio_in_child() -> None:
|
||||||
|
'''
|
||||||
|
End-to-end: inside the parent's `trio.run()`, off-load
|
||||||
|
`fork_from_worker_thread()` to a worker thread, have the
|
||||||
|
forked child then create a fresh subint and run
|
||||||
|
`trio.run()` inside it on yet another worker thread.
|
||||||
|
|
||||||
|
This is the full "forkserver + trio-in-subint-in-child"
|
||||||
|
pattern the proposed `subint_forkserver` spawn backend
|
||||||
|
would rest on.
|
||||||
|
|
||||||
|
'''
|
||||||
|
deadline: float = 15.0
|
||||||
|
with dump_on_hang(
|
||||||
|
seconds=deadline,
|
||||||
|
path='/tmp/subint_forkserver_trio_in_child.dump',
|
||||||
|
):
|
||||||
|
pid: int = trio.run(
|
||||||
|
partial(
|
||||||
|
run_fork_in_non_trio_thread,
|
||||||
|
deadline,
|
||||||
|
child_target=_child_trio_in_subint,
|
||||||
|
),
|
||||||
|
)
|
||||||
|
assert isinstance(pid, int) and pid > 0
|
||||||
|
|
||||||
|
|
||||||
|
# ----------------------------------------------------------------
|
||||||
|
# tier-3 backend test: drive the registered `subint_forkserver`
|
||||||
|
# spawn backend end-to-end through tractor's actor-nursery +
|
||||||
|
# portal-RPC machinery.
|
||||||
|
# ----------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
|
async def _trivial_rpc() -> str:
|
||||||
|
'''
|
||||||
|
Minimal subactor-side RPC body: just return a sentinel
|
||||||
|
string the parent can assert on.
|
||||||
|
|
||||||
|
'''
|
||||||
|
return 'hello from subint-forkserver child'
|
||||||
|
|
||||||
|
|
||||||
|
async def _happy_path_forkserver(
|
||||||
|
reg_addr: tuple[str, int | str],
|
||||||
|
deadline: float,
|
||||||
|
) -> None:
|
||||||
|
'''
|
||||||
|
Parent-side harness: stand up a root actor, open an actor
|
||||||
|
nursery, spawn one subactor via the currently-selected
|
||||||
|
spawn backend (which this test will have flipped to
|
||||||
|
`subint_forkserver`), run a trivial RPC through its
|
||||||
|
portal, assert the round-trip result.
|
||||||
|
|
||||||
|
'''
|
||||||
|
with trio.fail_after(deadline):
|
||||||
|
async with (
|
||||||
|
tractor.open_root_actor(
|
||||||
|
registry_addrs=[reg_addr],
|
||||||
|
),
|
||||||
|
tractor.open_nursery() as an,
|
||||||
|
):
|
||||||
|
portal: tractor.Portal = await an.run_in_actor(
|
||||||
|
_trivial_rpc,
|
||||||
|
name='subint-forkserver-child',
|
||||||
|
)
|
||||||
|
result: str = await portal.wait_for_result()
|
||||||
|
assert result == 'hello from subint-forkserver child'
|
||||||
|
|
||||||
|
|
||||||
|
@pytest.fixture
|
||||||
|
def forkserver_spawn_method():
|
||||||
|
'''
|
||||||
|
Flip `tractor.spawn._spawn._spawn_method` to
|
||||||
|
`'subint_forkserver'` for the duration of a test, then
|
||||||
|
restore whatever was in place before (usually the
|
||||||
|
session-level CLI choice, typically `'trio'`).
|
||||||
|
|
||||||
|
Without this, other tests in the same session would
|
||||||
|
observe the global flip and start spawning via fork —
|
||||||
|
which is almost certainly NOT what their assertions were
|
||||||
|
written against.
|
||||||
|
|
||||||
|
'''
|
||||||
|
prev_method: str = _spawn_mod._spawn_method
|
||||||
|
prev_ctx = _spawn_mod._ctx
|
||||||
|
try_set_start_method('subint_forkserver')
|
||||||
|
try:
|
||||||
|
yield
|
||||||
|
finally:
|
||||||
|
_spawn_mod._spawn_method = prev_method
|
||||||
|
_spawn_mod._ctx = prev_ctx
|
||||||
|
|
||||||
|
|
||||||
|
@pytest.mark.timeout(60, method='thread')
|
||||||
|
def test_subint_forkserver_spawn_basic(
|
||||||
|
reg_addr: tuple[str, int | str],
|
||||||
|
forkserver_spawn_method,
|
||||||
|
) -> None:
|
||||||
|
'''
|
||||||
|
Happy-path: spawn ONE subactor via the
|
||||||
|
`subint_forkserver` backend (parent-side fork from a
|
||||||
|
main-interp worker thread), do a trivial portal-RPC
|
||||||
|
round-trip, tear the nursery down cleanly.
|
||||||
|
|
||||||
|
If this passes, the "forkserver + tractor runtime" arch
|
||||||
|
is proven end-to-end: the registered
|
||||||
|
`subint_forkserver_proc` spawn target successfully
|
||||||
|
forks a child, the child runs `_actor_child_main()` +
|
||||||
|
completes IPC handshake + serves an RPC, and the parent
|
||||||
|
reaps via `_ForkedProc.wait()` without regressing any of
|
||||||
|
the normal nursery teardown invariants.
|
||||||
|
|
||||||
|
'''
|
||||||
|
deadline: float = 20.0
|
||||||
|
with dump_on_hang(
|
||||||
|
seconds=deadline,
|
||||||
|
path='/tmp/subint_forkserver_spawn_basic.dump',
|
||||||
|
):
|
||||||
|
trio.run(
|
||||||
|
partial(
|
||||||
|
_happy_path_forkserver,
|
||||||
|
reg_addr,
|
||||||
|
deadline,
|
||||||
|
),
|
||||||
|
)
|
||||||
|
|
@ -21,6 +21,16 @@ _non_linux: bool = platform.system() != 'Linux'
|
||||||
_friggin_windows: bool = platform.system() == 'Windows'
|
_friggin_windows: bool = platform.system() == 'Windows'
|
||||||
|
|
||||||
|
|
||||||
|
pytestmark = pytest.mark.skipon_spawn_backend(
|
||||||
|
'subint',
|
||||||
|
reason=(
|
||||||
|
'XXX SUBINT HANGING TEST XXX\n'
|
||||||
|
'See oustanding issue(s)\n'
|
||||||
|
# TODO, put issue link!
|
||||||
|
)
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
async def assert_err(delay=0):
|
async def assert_err(delay=0):
|
||||||
await trio.sleep(delay)
|
await trio.sleep(delay)
|
||||||
assert 0
|
assert 0
|
||||||
|
|
@ -110,8 +120,17 @@ def test_remote_error(reg_addr, args_err):
|
||||||
assert exc.boxed_type == errtype
|
assert exc.boxed_type == errtype
|
||||||
|
|
||||||
|
|
||||||
|
# @pytest.mark.skipon_spawn_backend(
|
||||||
|
# 'subint',
|
||||||
|
# reason=(
|
||||||
|
# 'XXX SUBINT HANGING TEST XXX\n'
|
||||||
|
# 'See oustanding issue(s)\n'
|
||||||
|
# # TODO, put issue link!
|
||||||
|
# )
|
||||||
|
# )
|
||||||
def test_multierror(
|
def test_multierror(
|
||||||
reg_addr: tuple[str, int],
|
reg_addr: tuple[str, int],
|
||||||
|
start_method: str,
|
||||||
):
|
):
|
||||||
'''
|
'''
|
||||||
Verify we raise a ``BaseExceptionGroup`` out of a nursery where
|
Verify we raise a ``BaseExceptionGroup`` out of a nursery where
|
||||||
|
|
@ -141,15 +160,28 @@ def test_multierror(
|
||||||
trio.run(main)
|
trio.run(main)
|
||||||
|
|
||||||
|
|
||||||
@pytest.mark.parametrize('delay', (0, 0.5))
|
|
||||||
@pytest.mark.parametrize(
|
@pytest.mark.parametrize(
|
||||||
'num_subactors', range(25, 26),
|
'delay',
|
||||||
|
(0, 0.5),
|
||||||
|
ids='delays={}'.format,
|
||||||
)
|
)
|
||||||
def test_multierror_fast_nursery(reg_addr, start_method, num_subactors, delay):
|
@pytest.mark.parametrize(
|
||||||
"""Verify we raise a ``BaseExceptionGroup`` out of a nursery where
|
'num_subactors',
|
||||||
|
range(25, 26),
|
||||||
|
ids= 'num_subs={}'.format,
|
||||||
|
)
|
||||||
|
def test_multierror_fast_nursery(
|
||||||
|
reg_addr: tuple,
|
||||||
|
start_method: str,
|
||||||
|
num_subactors: int,
|
||||||
|
delay: float,
|
||||||
|
):
|
||||||
|
'''
|
||||||
|
Verify we raise a ``BaseExceptionGroup`` out of a nursery where
|
||||||
more then one actor errors and also with a delay before failure
|
more then one actor errors and also with a delay before failure
|
||||||
to test failure during an ongoing spawning.
|
to test failure during an ongoing spawning.
|
||||||
"""
|
|
||||||
|
'''
|
||||||
async def main():
|
async def main():
|
||||||
async with tractor.open_nursery(
|
async with tractor.open_nursery(
|
||||||
registry_addrs=[reg_addr],
|
registry_addrs=[reg_addr],
|
||||||
|
|
@ -189,8 +221,15 @@ async def do_nothing():
|
||||||
pass
|
pass
|
||||||
|
|
||||||
|
|
||||||
@pytest.mark.parametrize('mechanism', ['nursery_cancel', KeyboardInterrupt])
|
@pytest.mark.parametrize(
|
||||||
def test_cancel_single_subactor(reg_addr, mechanism):
|
'mechanism', [
|
||||||
|
'nursery_cancel',
|
||||||
|
KeyboardInterrupt,
|
||||||
|
])
|
||||||
|
def test_cancel_single_subactor(
|
||||||
|
reg_addr: tuple,
|
||||||
|
mechanism: str|KeyboardInterrupt,
|
||||||
|
):
|
||||||
'''
|
'''
|
||||||
Ensure a ``ActorNursery.start_actor()`` spawned subactor
|
Ensure a ``ActorNursery.start_actor()`` spawned subactor
|
||||||
cancels when the nursery is cancelled.
|
cancels when the nursery is cancelled.
|
||||||
|
|
@ -232,9 +271,12 @@ async def stream_forever():
|
||||||
await trio.sleep(0.01)
|
await trio.sleep(0.01)
|
||||||
|
|
||||||
|
|
||||||
@tractor_test
|
@tractor_test(
|
||||||
async def test_cancel_infinite_streamer(start_method):
|
timeout=6,
|
||||||
|
)
|
||||||
|
async def test_cancel_infinite_streamer(
|
||||||
|
start_method: str
|
||||||
|
):
|
||||||
# stream for at most 1 seconds
|
# stream for at most 1 seconds
|
||||||
with (
|
with (
|
||||||
trio.fail_after(4),
|
trio.fail_after(4),
|
||||||
|
|
@ -257,6 +299,14 @@ async def test_cancel_infinite_streamer(start_method):
|
||||||
assert n.cancelled
|
assert n.cancelled
|
||||||
|
|
||||||
|
|
||||||
|
# @pytest.mark.skipon_spawn_backend(
|
||||||
|
# 'subint',
|
||||||
|
# reason=(
|
||||||
|
# 'XXX SUBINT HANGING TEST XXX\n'
|
||||||
|
# 'See oustanding issue(s)\n'
|
||||||
|
# # TODO, put issue link!
|
||||||
|
# )
|
||||||
|
# )
|
||||||
@pytest.mark.parametrize(
|
@pytest.mark.parametrize(
|
||||||
'num_actors_and_errs',
|
'num_actors_and_errs',
|
||||||
[
|
[
|
||||||
|
|
@ -286,7 +336,9 @@ async def test_cancel_infinite_streamer(start_method):
|
||||||
'no_daemon_actors_fail_all_run_in_actors_sleep_then_fail',
|
'no_daemon_actors_fail_all_run_in_actors_sleep_then_fail',
|
||||||
],
|
],
|
||||||
)
|
)
|
||||||
@tractor_test
|
@tractor_test(
|
||||||
|
timeout=10,
|
||||||
|
)
|
||||||
async def test_some_cancels_all(
|
async def test_some_cancels_all(
|
||||||
num_actors_and_errs: tuple,
|
num_actors_and_errs: tuple,
|
||||||
start_method: str,
|
start_method: str,
|
||||||
|
|
@ -370,7 +422,10 @@ async def test_some_cancels_all(
|
||||||
pytest.fail("Should have gotten a remote assertion error?")
|
pytest.fail("Should have gotten a remote assertion error?")
|
||||||
|
|
||||||
|
|
||||||
async def spawn_and_error(breadth, depth) -> None:
|
async def spawn_and_error(
|
||||||
|
breadth: int,
|
||||||
|
depth: int,
|
||||||
|
) -> None:
|
||||||
name = tractor.current_actor().name
|
name = tractor.current_actor().name
|
||||||
async with tractor.open_nursery() as nursery:
|
async with tractor.open_nursery() as nursery:
|
||||||
for i in range(breadth):
|
for i in range(breadth):
|
||||||
|
|
@ -396,7 +451,10 @@ async def spawn_and_error(breadth, depth) -> None:
|
||||||
|
|
||||||
|
|
||||||
@tractor_test
|
@tractor_test
|
||||||
async def test_nested_multierrors(loglevel, start_method):
|
async def test_nested_multierrors(
|
||||||
|
loglevel: str,
|
||||||
|
start_method: str,
|
||||||
|
):
|
||||||
'''
|
'''
|
||||||
Test that failed actor sets are wrapped in `BaseExceptionGroup`s. This
|
Test that failed actor sets are wrapped in `BaseExceptionGroup`s. This
|
||||||
test goes only 2 nurseries deep but we should eventually have tests
|
test goes only 2 nurseries deep but we should eventually have tests
|
||||||
|
|
@ -483,20 +541,21 @@ async def test_nested_multierrors(loglevel, start_method):
|
||||||
|
|
||||||
@no_windows
|
@no_windows
|
||||||
def test_cancel_via_SIGINT(
|
def test_cancel_via_SIGINT(
|
||||||
loglevel,
|
loglevel: str,
|
||||||
start_method,
|
start_method: str,
|
||||||
spawn_backend,
|
|
||||||
):
|
):
|
||||||
"""Ensure that a control-C (SIGINT) signal cancels both the parent and
|
'''
|
||||||
|
Ensure that a control-C (SIGINT) signal cancels both the parent and
|
||||||
child processes in trionic fashion
|
child processes in trionic fashion
|
||||||
"""
|
|
||||||
|
'''
|
||||||
pid: int = os.getpid()
|
pid: int = os.getpid()
|
||||||
|
|
||||||
async def main():
|
async def main():
|
||||||
with trio.fail_after(2):
|
with trio.fail_after(2):
|
||||||
async with tractor.open_nursery() as tn:
|
async with tractor.open_nursery() as tn:
|
||||||
await tn.start_actor('sucka')
|
await tn.start_actor('sucka')
|
||||||
if 'mp' in spawn_backend:
|
if 'mp' in start_method:
|
||||||
time.sleep(0.1)
|
time.sleep(0.1)
|
||||||
os.kill(pid, signal.SIGINT)
|
os.kill(pid, signal.SIGINT)
|
||||||
await trio.sleep_forever()
|
await trio.sleep_forever()
|
||||||
|
|
@ -580,6 +639,14 @@ async def spawn_sub_with_sync_blocking_task():
|
||||||
print('exiting first subactor layer..\n')
|
print('exiting first subactor layer..\n')
|
||||||
|
|
||||||
|
|
||||||
|
# @pytest.mark.skipon_spawn_backend(
|
||||||
|
# 'subint',
|
||||||
|
# reason=(
|
||||||
|
# 'XXX SUBINT HANGING TEST XXX\n'
|
||||||
|
# 'See oustanding issue(s)\n'
|
||||||
|
# # TODO, put issue link!
|
||||||
|
# )
|
||||||
|
# )
|
||||||
@pytest.mark.parametrize(
|
@pytest.mark.parametrize(
|
||||||
'man_cancel_outer',
|
'man_cancel_outer',
|
||||||
[
|
[
|
||||||
|
|
@ -694,7 +761,7 @@ def test_cancel_while_childs_child_in_sync_sleep(
|
||||||
|
|
||||||
|
|
||||||
def test_fast_graceful_cancel_when_spawn_task_in_soft_proc_wait_for_daemon(
|
def test_fast_graceful_cancel_when_spawn_task_in_soft_proc_wait_for_daemon(
|
||||||
start_method,
|
start_method: str,
|
||||||
):
|
):
|
||||||
'''
|
'''
|
||||||
This is a very subtle test which demonstrates how cancellation
|
This is a very subtle test which demonstrates how cancellation
|
||||||
|
|
|
||||||
|
|
@ -26,6 +26,15 @@ from tractor._testing import (
|
||||||
|
|
||||||
from .conftest import cpu_scaling_factor
|
from .conftest import cpu_scaling_factor
|
||||||
|
|
||||||
|
pytestmark = pytest.mark.skipon_spawn_backend(
|
||||||
|
'subint',
|
||||||
|
reason=(
|
||||||
|
'XXX SUBINT GIL-CONTENTION HANGING TEST XXX\n'
|
||||||
|
'See oustanding issue(s)\n'
|
||||||
|
# TODO, put issue link!
|
||||||
|
)
|
||||||
|
)
|
||||||
|
|
||||||
# XXX TODO cases:
|
# XXX TODO cases:
|
||||||
# - [x] WE cancelled the peer and thus should not see any raised
|
# - [x] WE cancelled the peer and thus should not see any raised
|
||||||
# `ContextCancelled` as it should be reaped silently?
|
# `ContextCancelled` as it should be reaped silently?
|
||||||
|
|
|
||||||
|
|
@ -7,6 +7,14 @@ import tractor
|
||||||
from tractor.experimental import msgpub
|
from tractor.experimental import msgpub
|
||||||
from tractor._testing import tractor_test
|
from tractor._testing import tractor_test
|
||||||
|
|
||||||
|
pytestmark = pytest.mark.skipon_spawn_backend(
|
||||||
|
'subint',
|
||||||
|
reason=(
|
||||||
|
'XXX SUBINT HANGING TEST XXX\n'
|
||||||
|
'See oustanding issue(s)\n'
|
||||||
|
# TODO, put issue link!
|
||||||
|
)
|
||||||
|
)
|
||||||
|
|
||||||
def test_type_checks():
|
def test_type_checks():
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -14,6 +14,14 @@ from tractor.ipc._shm import (
|
||||||
attach_shm_list,
|
attach_shm_list,
|
||||||
)
|
)
|
||||||
|
|
||||||
|
pytestmark = pytest.mark.skipon_spawn_backend(
|
||||||
|
'subint',
|
||||||
|
reason=(
|
||||||
|
'XXX SUBINT GIL-CONTENTION HANGING TEST XXX\n'
|
||||||
|
'See oustanding issue(s)\n'
|
||||||
|
# TODO, put issue link!
|
||||||
|
)
|
||||||
|
)
|
||||||
|
|
||||||
@tractor.context
|
@tractor.context
|
||||||
async def child_attach_shml_alot(
|
async def child_attach_shml_alot(
|
||||||
|
|
|
||||||
|
|
@ -161,6 +161,14 @@ def test_subint_happy_teardown(
|
||||||
trio.run(partial(_happy_path, reg_addr, deadline))
|
trio.run(partial(_happy_path, reg_addr, deadline))
|
||||||
|
|
||||||
|
|
||||||
|
@pytest.mark.skipon_spawn_backend(
|
||||||
|
'subint',
|
||||||
|
reason=(
|
||||||
|
'XXX SUBINT HANGING TEST XXX\n'
|
||||||
|
'See oustanding issue(s)\n'
|
||||||
|
# TODO, put issue link!
|
||||||
|
)
|
||||||
|
)
|
||||||
# Wall-clock bound via `pytest-timeout` (`method='thread'`)
|
# Wall-clock bound via `pytest-timeout` (`method='thread'`)
|
||||||
# as defense-in-depth over the inner `trio.fail_after(15)`.
|
# as defense-in-depth over the inner `trio.fail_after(15)`.
|
||||||
# Under the orphaned-channel hang class described in
|
# Under the orphaned-channel hang class described in
|
||||||
|
|
|
||||||
|
|
@ -224,8 +224,10 @@ def pytest_addoption(
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|
||||||
def pytest_configure(config):
|
def pytest_configure(
|
||||||
backend = config.option.spawn_backend
|
config: pytest.Config,
|
||||||
|
):
|
||||||
|
backend: str = config.option.spawn_backend
|
||||||
from tractor.spawn._spawn import try_set_start_method
|
from tractor.spawn._spawn import try_set_start_method
|
||||||
try:
|
try:
|
||||||
try_set_start_method(backend)
|
try_set_start_method(backend)
|
||||||
|
|
@ -241,10 +243,52 @@ def pytest_configure(config):
|
||||||
'markers',
|
'markers',
|
||||||
'no_tpt(proto_key): test will (likely) not behave with tpt backend'
|
'no_tpt(proto_key): test will (likely) not behave with tpt backend'
|
||||||
)
|
)
|
||||||
|
config.addinivalue_line(
|
||||||
|
'markers',
|
||||||
|
'skipon_spawn_backend(*start_methods, reason=None): '
|
||||||
|
'skip this test under any of the given `--spawn-backend` '
|
||||||
|
'values; useful for backend-specific known-hang / -borked '
|
||||||
|
'cases (e.g. the `subint` GIL-starvation class documented '
|
||||||
|
'in `ai/conc-anal/subint_sigint_starvation_issue.md`).'
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def pytest_collection_modifyitems(
|
||||||
|
config: pytest.Config,
|
||||||
|
items: list[pytest.Function],
|
||||||
|
):
|
||||||
|
'''
|
||||||
|
Expand any `@pytest.mark.skipon_spawn_backend('<backend>'[,
|
||||||
|
...], reason='...')` markers into concrete
|
||||||
|
`pytest.mark.skip(reason=...)` calls for tests whose
|
||||||
|
backend-arg set contains the active `--spawn-backend`.
|
||||||
|
|
||||||
|
Uses `item.iter_markers(name=...)` which walks function +
|
||||||
|
class + module-level marks in the correct scope order (and
|
||||||
|
handles both the single-`MarkDecorator` and `list[Mark]`
|
||||||
|
forms of a module-level `pytestmark`) — so the same marker
|
||||||
|
works at any level a user puts it.
|
||||||
|
|
||||||
|
'''
|
||||||
|
backend: str = config.option.spawn_backend
|
||||||
|
default_reason: str = f'Borked on --spawn-backend={backend!r}'
|
||||||
|
for item in items:
|
||||||
|
for mark in item.iter_markers(name='skipon_spawn_backend'):
|
||||||
|
if backend in mark.args:
|
||||||
|
reason: str = mark.kwargs.get(
|
||||||
|
'reason',
|
||||||
|
default_reason,
|
||||||
|
)
|
||||||
|
item.add_marker(pytest.mark.skip(reason=reason))
|
||||||
|
# first matching mark wins; no value in stacking
|
||||||
|
# multiple `skip`s on the same item.
|
||||||
|
break
|
||||||
|
|
||||||
|
|
||||||
@pytest.fixture(scope='session')
|
@pytest.fixture(scope='session')
|
||||||
def debug_mode(request) -> bool:
|
def debug_mode(
|
||||||
|
request: pytest.FixtureRequest,
|
||||||
|
) -> bool:
|
||||||
'''
|
'''
|
||||||
Flag state for whether `--tpdb` (for `tractor`-py-debugger)
|
Flag state for whether `--tpdb` (for `tractor`-py-debugger)
|
||||||
was passed to the test run.
|
was passed to the test run.
|
||||||
|
|
@ -258,12 +302,16 @@ def debug_mode(request) -> bool:
|
||||||
|
|
||||||
|
|
||||||
@pytest.fixture(scope='session')
|
@pytest.fixture(scope='session')
|
||||||
def spawn_backend(request) -> str:
|
def spawn_backend(
|
||||||
|
request: pytest.FixtureRequest,
|
||||||
|
) -> str:
|
||||||
return request.config.option.spawn_backend
|
return request.config.option.spawn_backend
|
||||||
|
|
||||||
|
|
||||||
@pytest.fixture(scope='session')
|
@pytest.fixture(scope='session')
|
||||||
def tpt_protos(request) -> list[str]:
|
def tpt_protos(
|
||||||
|
request: pytest.FixtureRequest,
|
||||||
|
) -> list[str]:
|
||||||
|
|
||||||
# allow quoting on CLI
|
# allow quoting on CLI
|
||||||
proto_keys: list[str] = [
|
proto_keys: list[str] = [
|
||||||
|
|
@ -291,7 +339,7 @@ def tpt_protos(request) -> list[str]:
|
||||||
autouse=True,
|
autouse=True,
|
||||||
)
|
)
|
||||||
def tpt_proto(
|
def tpt_proto(
|
||||||
request,
|
request: pytest.FixtureRequest,
|
||||||
tpt_protos: list[str],
|
tpt_protos: list[str],
|
||||||
) -> str:
|
) -> str:
|
||||||
proto_key: str = tpt_protos[0]
|
proto_key: str = tpt_protos[0]
|
||||||
|
|
@ -343,7 +391,6 @@ def pytest_generate_tests(
|
||||||
metafunc: pytest.Metafunc,
|
metafunc: pytest.Metafunc,
|
||||||
):
|
):
|
||||||
spawn_backend: str = metafunc.config.option.spawn_backend
|
spawn_backend: str = metafunc.config.option.spawn_backend
|
||||||
|
|
||||||
if not spawn_backend:
|
if not spawn_backend:
|
||||||
# XXX some weird windows bug with `pytest`?
|
# XXX some weird windows bug with `pytest`?
|
||||||
spawn_backend = 'trio'
|
spawn_backend = 'trio'
|
||||||
|
|
|
||||||
|
|
@ -63,6 +63,22 @@ SpawnMethodKey = Literal[
|
||||||
'mp_spawn',
|
'mp_spawn',
|
||||||
'mp_forkserver', # posix only
|
'mp_forkserver', # posix only
|
||||||
'subint', # py3.14+ via `concurrent.interpreters` (PEP 734)
|
'subint', # py3.14+ via `concurrent.interpreters` (PEP 734)
|
||||||
|
# EXPERIMENTAL — blocked at the CPython level. The
|
||||||
|
# design goal was a `trio+fork`-safe subproc spawn via
|
||||||
|
# `os.fork()` from a trio-free launchpad sub-interpreter,
|
||||||
|
# but CPython's `PyOS_AfterFork_Child` → `_PyInterpreterState_DeleteExceptMain`
|
||||||
|
# requires fork come from the main interp. See
|
||||||
|
# `tractor.spawn._subint_fork` +
|
||||||
|
# `ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md`
|
||||||
|
# + issue #379 for the full analysis.
|
||||||
|
'subint_fork',
|
||||||
|
# EXPERIMENTAL — the `subint_fork` workaround. `os.fork()`
|
||||||
|
# from a non-trio worker thread (never entered a subint)
|
||||||
|
# is CPython-legal and works cleanly; forked child runs
|
||||||
|
# `tractor._child._actor_child_main()` against a trio
|
||||||
|
# runtime, exactly like `trio_proc` but via fork instead
|
||||||
|
# of subproc-exec. See `tractor.spawn._subint_forkserver`.
|
||||||
|
'subint_forkserver',
|
||||||
]
|
]
|
||||||
_spawn_method: SpawnMethodKey = 'trio'
|
_spawn_method: SpawnMethodKey = 'trio'
|
||||||
|
|
||||||
|
|
@ -115,15 +131,14 @@ def try_set_start_method(
|
||||||
case 'trio':
|
case 'trio':
|
||||||
_ctx = None
|
_ctx = None
|
||||||
|
|
||||||
case 'subint':
|
case 'subint' | 'subint_fork' | 'subint_forkserver':
|
||||||
# subints need no `mp.context`; feature-gate on the
|
# All subint-family backends need no `mp.context`;
|
||||||
# py3.14 public `concurrent.interpreters` wrapper
|
# all three feature-gate on the py3.14 public
|
||||||
# (PEP 734). We actually drive the private
|
# `concurrent.interpreters` wrapper (PEP 734). See
|
||||||
# `_interpreters` C module in legacy mode — see
|
# `tractor.spawn._subint` for the detailed
|
||||||
# `tractor.spawn._subint` for why — but py3.13's
|
# reasoning. `subint_fork` is blocked at the
|
||||||
# vintage of that private module hangs under our
|
# CPython level (raises `NotImplementedError`);
|
||||||
# multi-trio usage, so we refuse it via the public-
|
# `subint_forkserver` is the working workaround.
|
||||||
# module presence check.
|
|
||||||
from ._subint import _has_subints
|
from ._subint import _has_subints
|
||||||
if not _has_subints:
|
if not _has_subints:
|
||||||
raise RuntimeError(
|
raise RuntimeError(
|
||||||
|
|
@ -461,6 +476,8 @@ async def new_proc(
|
||||||
from ._trio import trio_proc
|
from ._trio import trio_proc
|
||||||
from ._mp import mp_proc
|
from ._mp import mp_proc
|
||||||
from ._subint import subint_proc
|
from ._subint import subint_proc
|
||||||
|
from ._subint_fork import subint_fork_proc
|
||||||
|
from ._subint_forkserver import subint_forkserver_proc
|
||||||
|
|
||||||
|
|
||||||
# proc spawning backend target map
|
# proc spawning backend target map
|
||||||
|
|
@ -469,4 +486,14 @@ _methods: dict[SpawnMethodKey, Callable] = {
|
||||||
'mp_spawn': mp_proc,
|
'mp_spawn': mp_proc,
|
||||||
'mp_forkserver': mp_proc,
|
'mp_forkserver': mp_proc,
|
||||||
'subint': subint_proc,
|
'subint': subint_proc,
|
||||||
|
# blocked at CPython level — see `_subint_fork.py` +
|
||||||
|
# `ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md`.
|
||||||
|
# Kept here so `--spawn-backend=subint_fork` routes to a
|
||||||
|
# clean `NotImplementedError` with pointer to the analysis,
|
||||||
|
# rather than an "invalid backend" error.
|
||||||
|
'subint_fork': subint_fork_proc,
|
||||||
|
# WIP — fork-from-non-trio-worker-thread, works on py3.14+
|
||||||
|
# (validated via `ai/conc-anal/subint_fork_from_main_thread_smoketest.py`).
|
||||||
|
# See `tractor.spawn._subint_forkserver`.
|
||||||
|
'subint_forkserver': subint_forkserver_proc,
|
||||||
}
|
}
|
||||||
|
|
|
||||||
|
|
@ -431,3 +431,5 @@ async def subint_proc(
|
||||||
finally:
|
finally:
|
||||||
if not cancelled_during_spawn:
|
if not cancelled_during_spawn:
|
||||||
actor_nursery._children.pop(uid, None)
|
actor_nursery._children.pop(uid, None)
|
||||||
|
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -0,0 +1,153 @@
|
||||||
|
# tractor: structured concurrent "actors".
|
||||||
|
# Copyright 2018-eternity Tyler Goodlet.
|
||||||
|
|
||||||
|
# This program is free software: you can redistribute it and/or modify
|
||||||
|
# it under the terms of the GNU Affero General Public License as published by
|
||||||
|
# the Free Software Foundation, either version 3 of the License, or
|
||||||
|
# (at your option) any later version.
|
||||||
|
|
||||||
|
# This program is distributed in the hope that it will be useful,
|
||||||
|
# but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||||
|
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||||
|
# GNU Affero General Public License for more details.
|
||||||
|
|
||||||
|
# You should have received a copy of the GNU Affero General Public License
|
||||||
|
# along with this program. If not, see <https://www.gnu.org/licenses/>.
|
||||||
|
|
||||||
|
'''
|
||||||
|
`subint_fork` spawn backend — BLOCKED at CPython level.
|
||||||
|
|
||||||
|
The idea was to use a sub-interpreter purely as a launchpad
|
||||||
|
from which to call `os.fork()`, sidestepping the well-known
|
||||||
|
trio+fork issues (python-trio/trio#1614 etc.) by guaranteeing
|
||||||
|
the forking interp had never imported `trio`.
|
||||||
|
|
||||||
|
**IT DOES NOT WORK ON CURRENT CPYTHON.** The fork syscall
|
||||||
|
itself succeeds (in the parent), but the forked CHILD
|
||||||
|
process aborts immediately during CPython's post-fork
|
||||||
|
cleanup — `PyOS_AfterFork_Child()` calls
|
||||||
|
`_PyInterpreterState_DeleteExceptMain()` which refuses to
|
||||||
|
operate when the current tstate belongs to a non-main
|
||||||
|
sub-interpreter.
|
||||||
|
|
||||||
|
Full annotated walkthrough from the user-visible error
|
||||||
|
(`Fatal Python error: _PyInterpreterState_DeleteExceptMain:
|
||||||
|
not main interpreter`) down to the specific CPython source
|
||||||
|
lines that enforce this is in
|
||||||
|
`ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md`.
|
||||||
|
|
||||||
|
We keep this submodule as a dedicated documentation of the
|
||||||
|
attempt. If CPython ever lifts the restriction (e.g., via a
|
||||||
|
force-destroy primitive or a hook that swaps tstate to main
|
||||||
|
pre-fork), the structural sketch preserved in this file's
|
||||||
|
git history is a concrete starting point for a working impl.
|
||||||
|
|
||||||
|
See also: issue #379's "Our own thoughts, ideas for
|
||||||
|
`fork()`-workaround/hacks..." section.
|
||||||
|
|
||||||
|
'''
|
||||||
|
from __future__ import annotations
|
||||||
|
import sys
|
||||||
|
from typing import (
|
||||||
|
Any,
|
||||||
|
TYPE_CHECKING,
|
||||||
|
)
|
||||||
|
|
||||||
|
import trio
|
||||||
|
from trio import TaskStatus
|
||||||
|
|
||||||
|
from tractor.runtime._portal import Portal
|
||||||
|
from ._subint import _has_subints
|
||||||
|
|
||||||
|
|
||||||
|
if TYPE_CHECKING:
|
||||||
|
from tractor.discovery._addr import UnwrappedAddress
|
||||||
|
from tractor.runtime._runtime import Actor
|
||||||
|
from tractor.runtime._supervise import ActorNursery
|
||||||
|
|
||||||
|
|
||||||
|
async def subint_fork_proc(
|
||||||
|
name: str,
|
||||||
|
actor_nursery: ActorNursery,
|
||||||
|
subactor: Actor,
|
||||||
|
errors: dict[tuple[str, str], Exception],
|
||||||
|
|
||||||
|
bind_addrs: list[UnwrappedAddress],
|
||||||
|
parent_addr: UnwrappedAddress,
|
||||||
|
_runtime_vars: dict[str, Any],
|
||||||
|
*,
|
||||||
|
infect_asyncio: bool = False,
|
||||||
|
task_status: TaskStatus[Portal] = trio.TASK_STATUS_IGNORED,
|
||||||
|
proc_kwargs: dict[str, any] = {},
|
||||||
|
|
||||||
|
) -> None:
|
||||||
|
'''
|
||||||
|
EXPERIMENTAL — currently blocked by a CPython invariant.
|
||||||
|
|
||||||
|
Attempted design
|
||||||
|
----------------
|
||||||
|
1. Parent creates a fresh legacy-config subint.
|
||||||
|
2. A worker OS-thread drives the subint through a
|
||||||
|
bootstrap that calls `os.fork()`.
|
||||||
|
3. In the forked CHILD, `os.execv()` back into
|
||||||
|
`python -m tractor._child` (fresh process).
|
||||||
|
4. In the fork-PARENT, the launchpad subint is destroyed;
|
||||||
|
parent-side trio task proceeds identically to
|
||||||
|
`trio_proc()` (wait for child connect-back, send
|
||||||
|
`SpawnSpec`, yield `Portal`, etc.).
|
||||||
|
|
||||||
|
Why it doesn't work
|
||||||
|
-------------------
|
||||||
|
CPython's `PyOS_AfterFork_Child()` (in
|
||||||
|
`Modules/posixmodule.c`) calls
|
||||||
|
`_PyInterpreterState_DeleteExceptMain()` (in
|
||||||
|
`Python/pystate.c`) as part of post-fork cleanup. That
|
||||||
|
function requires the current `PyThreadState` belong to
|
||||||
|
the **main** interpreter. When `os.fork()` is called
|
||||||
|
from within a sub-interpreter, the child wakes up with
|
||||||
|
its tstate still pointing at the (now-stale) subint, and
|
||||||
|
this check fails with `PyStatus_ERR("not main
|
||||||
|
interpreter")`, triggering a `fatal_error` goto and
|
||||||
|
aborting the child process.
|
||||||
|
|
||||||
|
CPython devs acknowledge the fragility with a
|
||||||
|
`// Ideally we could guarantee tstate is running main.`
|
||||||
|
comment right above the call site.
|
||||||
|
|
||||||
|
See
|
||||||
|
`ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md`
|
||||||
|
for the full annotated walkthrough + upstream-report
|
||||||
|
draft.
|
||||||
|
|
||||||
|
Why we keep this stub
|
||||||
|
---------------------
|
||||||
|
- Documents the attempt in-tree so the next person who
|
||||||
|
has this idea finds the reason it doesn't work rather
|
||||||
|
than rediscovering the same CPython-level dead end.
|
||||||
|
- If CPython ever lifts the restriction (e.g., via a
|
||||||
|
force-destroy primitive or a hook that swaps tstate
|
||||||
|
to main pre-fork), this submodule's git history holds
|
||||||
|
the structural sketch of what a working impl would
|
||||||
|
look like.
|
||||||
|
|
||||||
|
'''
|
||||||
|
if not _has_subints:
|
||||||
|
raise RuntimeError(
|
||||||
|
f'The {"subint_fork"!r} spawn backend requires '
|
||||||
|
f'Python 3.14+.\n'
|
||||||
|
f'Current runtime: {sys.version}'
|
||||||
|
)
|
||||||
|
|
||||||
|
raise NotImplementedError(
|
||||||
|
'The `subint_fork` spawn backend is blocked at the '
|
||||||
|
'CPython level — `os.fork()` from a non-main '
|
||||||
|
'sub-interpreter is refused by '
|
||||||
|
'`PyOS_AfterFork_Child()` → '
|
||||||
|
'`_PyInterpreterState_DeleteExceptMain()`, which '
|
||||||
|
'aborts the child with '
|
||||||
|
'`Fatal Python error: not main interpreter`.\n'
|
||||||
|
'\n'
|
||||||
|
'See '
|
||||||
|
'`ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md` '
|
||||||
|
'for the full analysis + upstream-report draft.'
|
||||||
|
)
|
||||||
|
|
@ -0,0 +1,687 @@
|
||||||
|
# tractor: structured concurrent "actors".
|
||||||
|
# Copyright 2018-eternity Tyler Goodlet.
|
||||||
|
|
||||||
|
# This program is free software: you can redistribute it and/or modify
|
||||||
|
# it under the terms of the GNU Affero General Public License as published by
|
||||||
|
# the Free Software Foundation, either version 3 of the License, or
|
||||||
|
# (at your option) any later version.
|
||||||
|
|
||||||
|
# This program is distributed in the hope that it will be useful,
|
||||||
|
# but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||||
|
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||||
|
# GNU Affero General Public License for more details.
|
||||||
|
|
||||||
|
# You should have received a copy of the GNU Affero General Public License
|
||||||
|
# along with this program. If not, see <https://www.gnu.org/licenses/>.
|
||||||
|
|
||||||
|
'''
|
||||||
|
Forkserver-style `os.fork()` primitives for the `subint`-hosted
|
||||||
|
actor model.
|
||||||
|
|
||||||
|
Background
|
||||||
|
----------
|
||||||
|
CPython refuses `os.fork()` from a non-main sub-interpreter:
|
||||||
|
`PyOS_AfterFork_Child()` →
|
||||||
|
`_PyInterpreterState_DeleteExceptMain()` gates on the calling
|
||||||
|
thread's tstate belonging to the main interpreter and aborts
|
||||||
|
the forked child otherwise. The full walkthrough (with source
|
||||||
|
refs) lives in
|
||||||
|
`ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md`.
|
||||||
|
|
||||||
|
However `os.fork()` from a regular `threading.Thread` attached
|
||||||
|
to the *main* interpreter — i.e. a worker thread that has
|
||||||
|
never entered a subint — works cleanly. Empirically validated
|
||||||
|
across four scenarios by
|
||||||
|
`ai/conc-anal/subint_fork_from_main_thread_smoketest.py` on
|
||||||
|
py3.14.
|
||||||
|
|
||||||
|
This submodule lifts the validated primitives out of the
|
||||||
|
smoke-test and into tractor proper, so they can eventually be
|
||||||
|
wired into a real "subint forkserver" spawn backend — where:
|
||||||
|
|
||||||
|
- A dedicated main-interp worker thread owns all `os.fork()`
|
||||||
|
calls (never enters a subint).
|
||||||
|
- The tractor parent-actor's `trio.run()` lives in a
|
||||||
|
sub-interpreter on a different worker thread.
|
||||||
|
- When a spawn is requested, the trio-task signals the
|
||||||
|
forkserver thread; the forkserver forks; child re-enters
|
||||||
|
the same pattern (trio in a subint + forkserver on main).
|
||||||
|
|
||||||
|
This mirrors the stdlib `multiprocessing.forkserver` design
|
||||||
|
but keeps the forkserver in-process for faster spawn latency
|
||||||
|
and inherited parent state.
|
||||||
|
|
||||||
|
Status
|
||||||
|
------
|
||||||
|
**EXPERIMENTAL** — wired as the `'subint_forkserver'` entry
|
||||||
|
in `tractor.spawn._spawn._methods` and selectable via
|
||||||
|
`try_set_start_method('subint_forkserver')` / `--spawn-backend
|
||||||
|
=subint_forkserver`. Parent-side spawn, child-side runtime
|
||||||
|
bring-up and normal portal-RPC teardown are validated by the
|
||||||
|
backend-tier test in
|
||||||
|
`tests/spawn/test_subint_forkserver.py::test_subint_forkserver_spawn_basic`.
|
||||||
|
|
||||||
|
Still-open work (tracked on tractor #379):
|
||||||
|
|
||||||
|
- no cancellation / hard-kill stress coverage yet (counterpart
|
||||||
|
to `tests/test_subint_cancellation.py` for the plain
|
||||||
|
`subint` backend),
|
||||||
|
- child-side "subint-hosted root runtime" mode (the second
|
||||||
|
half of the envisioned arch — currently the forked child
|
||||||
|
runs plain `_trio_main` via `spawn_method='trio'`; the
|
||||||
|
subint-hosted variant is still the future step gated on
|
||||||
|
msgspec PEP 684 support),
|
||||||
|
- thread-hygiene audit of the two `threading.Thread`
|
||||||
|
primitives below, gated on the same msgspec unblock
|
||||||
|
(see TODO section further down).
|
||||||
|
|
||||||
|
TODO — cleanup gated on msgspec PEP 684 support
|
||||||
|
-----------------------------------------------
|
||||||
|
Both primitives below allocate a dedicated
|
||||||
|
`threading.Thread` rather than using
|
||||||
|
`trio.to_thread.run_sync()`. That's a cautious design
|
||||||
|
rooted in three distinct-but-entangled issues (GIL
|
||||||
|
starvation from legacy-config subints, tstate-recycling
|
||||||
|
destroy race on trio cache threads, fork-from-main-tstate
|
||||||
|
invariant). Some of those dissolve under PEP 684
|
||||||
|
isolated-mode subints; one requires empirical re-testing
|
||||||
|
to know.
|
||||||
|
|
||||||
|
Full analysis + audit plan for when we can revisit is in
|
||||||
|
`ai/conc-anal/subint_forkserver_thread_constraints_on_pep684_issue.md`.
|
||||||
|
Intent: file a follow-up GH issue linked to #379 once
|
||||||
|
[jcrist/msgspec#563](https://github.com/jcrist/msgspec/issues/563)
|
||||||
|
unblocks isolated-mode subints in tractor.
|
||||||
|
|
||||||
|
See also
|
||||||
|
--------
|
||||||
|
- `tractor.spawn._subint_fork` — the stub for the
|
||||||
|
fork-from-subint strategy that DIDN'T work (kept as
|
||||||
|
in-tree documentation of the attempt + CPython-level
|
||||||
|
block).
|
||||||
|
- `ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md`
|
||||||
|
— the CPython source walkthrough.
|
||||||
|
- `ai/conc-anal/subint_fork_from_main_thread_smoketest.py`
|
||||||
|
— the standalone feasibility check (now delegates to
|
||||||
|
this module for the primitives it exercises).
|
||||||
|
|
||||||
|
'''
|
||||||
|
from __future__ import annotations
|
||||||
|
import os
|
||||||
|
import signal
|
||||||
|
import sys
|
||||||
|
import threading
|
||||||
|
from functools import partial
|
||||||
|
from typing import (
|
||||||
|
Any,
|
||||||
|
Callable,
|
||||||
|
TYPE_CHECKING,
|
||||||
|
)
|
||||||
|
|
||||||
|
import trio
|
||||||
|
from trio import TaskStatus
|
||||||
|
|
||||||
|
from tractor.log import get_logger
|
||||||
|
from tractor.msg import (
|
||||||
|
types as msgtypes,
|
||||||
|
pretty_struct,
|
||||||
|
)
|
||||||
|
from tractor.runtime._state import current_actor
|
||||||
|
from tractor.runtime._portal import Portal
|
||||||
|
from ._spawn import (
|
||||||
|
cancel_on_completion,
|
||||||
|
soft_kill,
|
||||||
|
)
|
||||||
|
|
||||||
|
if TYPE_CHECKING:
|
||||||
|
from tractor.discovery._addr import UnwrappedAddress
|
||||||
|
from tractor.ipc import (
|
||||||
|
_server,
|
||||||
|
)
|
||||||
|
from tractor.runtime._runtime import Actor
|
||||||
|
from tractor.runtime._supervise import ActorNursery
|
||||||
|
|
||||||
|
|
||||||
|
log = get_logger('tractor')
|
||||||
|
|
||||||
|
|
||||||
|
# Feature-gate: py3.14+ via the public `concurrent.interpreters`
|
||||||
|
# wrapper. Matches the gate in `tractor.spawn._subint` —
|
||||||
|
# see that module's docstring for why we require the public
|
||||||
|
# API's presence even though we reach into the private
|
||||||
|
# `_interpreters` C module for actual calls.
|
||||||
|
try:
|
||||||
|
from concurrent import interpreters as _public_interpreters # noqa: F401 # type: ignore
|
||||||
|
import _interpreters # type: ignore
|
||||||
|
_has_subints: bool = True
|
||||||
|
except ImportError:
|
||||||
|
_interpreters = None # type: ignore
|
||||||
|
_has_subints: bool = False
|
||||||
|
|
||||||
|
|
||||||
|
def _format_child_exit(
|
||||||
|
status: int,
|
||||||
|
) -> str:
|
||||||
|
'''
|
||||||
|
Render `os.waitpid()`-returned status as a short human
|
||||||
|
string (`'rc=0'` / `'signal=SIGABRT'` / etc.) for log
|
||||||
|
output.
|
||||||
|
|
||||||
|
'''
|
||||||
|
if os.WIFEXITED(status):
|
||||||
|
return f'rc={os.WEXITSTATUS(status)}'
|
||||||
|
elif os.WIFSIGNALED(status):
|
||||||
|
sig: int = os.WTERMSIG(status)
|
||||||
|
return f'signal={signal.Signals(sig).name}'
|
||||||
|
else:
|
||||||
|
return f'raw_status={status}'
|
||||||
|
|
||||||
|
|
||||||
|
def wait_child(
|
||||||
|
pid: int,
|
||||||
|
*,
|
||||||
|
expect_exit_ok: bool = True,
|
||||||
|
) -> tuple[bool, str]:
|
||||||
|
'''
|
||||||
|
`os.waitpid()` + classify the child's exit as
|
||||||
|
expected-or-not.
|
||||||
|
|
||||||
|
`expect_exit_ok=True` → expect clean `rc=0`. `False` →
|
||||||
|
expect abnormal death (any signal or nonzero rc). Used
|
||||||
|
by the control-case smoke-test scenario where CPython
|
||||||
|
is meant to abort the child.
|
||||||
|
|
||||||
|
Returns `(ok, status_str)` — `ok` reflects whether the
|
||||||
|
observed outcome matches `expect_exit_ok`, `status_str`
|
||||||
|
is a short render of the actual status.
|
||||||
|
|
||||||
|
'''
|
||||||
|
_, status = os.waitpid(pid, 0)
|
||||||
|
exited_normally: bool = (
|
||||||
|
os.WIFEXITED(status)
|
||||||
|
and
|
||||||
|
os.WEXITSTATUS(status) == 0
|
||||||
|
)
|
||||||
|
ok: bool = (
|
||||||
|
exited_normally
|
||||||
|
if expect_exit_ok
|
||||||
|
else not exited_normally
|
||||||
|
)
|
||||||
|
return ok, _format_child_exit(status)
|
||||||
|
|
||||||
|
|
||||||
|
def fork_from_worker_thread(
|
||||||
|
child_target: Callable[[], int] | None = None,
|
||||||
|
*,
|
||||||
|
thread_name: str = 'subint-forkserver',
|
||||||
|
join_timeout: float = 10.0,
|
||||||
|
|
||||||
|
) -> int:
|
||||||
|
'''
|
||||||
|
`os.fork()` from a main-interp worker thread; return the
|
||||||
|
forked child's pid.
|
||||||
|
|
||||||
|
The calling context **must** be the main interpreter
|
||||||
|
(not a subinterpreter) — that's the whole point of this
|
||||||
|
primitive. A regular `threading.Thread(target=...)`
|
||||||
|
spawned from main-interp code satisfies this
|
||||||
|
automatically because Python attaches the thread's
|
||||||
|
tstate to the *calling* interpreter, and our main
|
||||||
|
thread's calling interp is always main.
|
||||||
|
|
||||||
|
If `child_target` is provided, it runs IN the forked
|
||||||
|
child process before `os._exit` is called. The callable
|
||||||
|
should return an int used as the child's exit rc. If
|
||||||
|
`child_target` is None, the child `_exit(0)`s immediately
|
||||||
|
(useful for the baseline sanity case).
|
||||||
|
|
||||||
|
On the PARENT side, this function drives the worker
|
||||||
|
thread to completion (`fork()` returns near-instantly;
|
||||||
|
the thread is expected to exit promptly) and then
|
||||||
|
returns the forked child's pid. Raises `RuntimeError`
|
||||||
|
if the worker thread fails to return within
|
||||||
|
`join_timeout` seconds — that'd be an unexpected CPython
|
||||||
|
pathology.
|
||||||
|
|
||||||
|
'''
|
||||||
|
if not _has_subints:
|
||||||
|
raise RuntimeError(
|
||||||
|
'subint-forkserver primitives require Python '
|
||||||
|
'3.14+ (public `concurrent.interpreters` module '
|
||||||
|
'not present on this runtime).'
|
||||||
|
)
|
||||||
|
|
||||||
|
# Use a pipe to shuttle the forked child's pid from the
|
||||||
|
# worker thread back to the caller.
|
||||||
|
rfd, wfd = os.pipe()
|
||||||
|
|
||||||
|
def _worker() -> None:
|
||||||
|
'''
|
||||||
|
Runs on the forkserver worker thread. Forks; child
|
||||||
|
runs `child_target` (if any) and exits; parent side
|
||||||
|
writes the child pid to the pipe so the main-thread
|
||||||
|
caller can retrieve it.
|
||||||
|
|
||||||
|
'''
|
||||||
|
pid: int = os.fork()
|
||||||
|
if pid == 0:
|
||||||
|
# CHILD: close the pid-pipe ends (we don't use
|
||||||
|
# them here), run the user callable if any, exit.
|
||||||
|
os.close(rfd)
|
||||||
|
os.close(wfd)
|
||||||
|
rc: int = 0
|
||||||
|
if child_target is not None:
|
||||||
|
try:
|
||||||
|
rc = child_target() or 0
|
||||||
|
except BaseException as err:
|
||||||
|
log.error(
|
||||||
|
f'subint-forkserver child_target '
|
||||||
|
f'raised:\n'
|
||||||
|
f'|_{type(err).__name__}: {err}'
|
||||||
|
)
|
||||||
|
rc = 2
|
||||||
|
os._exit(rc)
|
||||||
|
else:
|
||||||
|
# PARENT (still inside the worker thread):
|
||||||
|
# hand the child pid back to main via pipe.
|
||||||
|
os.write(wfd, pid.to_bytes(8, 'little'))
|
||||||
|
|
||||||
|
worker: threading.Thread = threading.Thread(
|
||||||
|
target=_worker,
|
||||||
|
name=thread_name,
|
||||||
|
daemon=False,
|
||||||
|
)
|
||||||
|
worker.start()
|
||||||
|
worker.join(timeout=join_timeout)
|
||||||
|
if worker.is_alive():
|
||||||
|
# Pipe cleanup best-effort before bail.
|
||||||
|
try:
|
||||||
|
os.close(rfd)
|
||||||
|
except OSError:
|
||||||
|
pass
|
||||||
|
try:
|
||||||
|
os.close(wfd)
|
||||||
|
except OSError:
|
||||||
|
pass
|
||||||
|
raise RuntimeError(
|
||||||
|
f'subint-forkserver worker thread '
|
||||||
|
f'{thread_name!r} did not return within '
|
||||||
|
f'{join_timeout}s — this is unexpected since '
|
||||||
|
f'`os.fork()` should return near-instantly on '
|
||||||
|
f'the parent side.'
|
||||||
|
)
|
||||||
|
|
||||||
|
pid_bytes: bytes = os.read(rfd, 8)
|
||||||
|
os.close(rfd)
|
||||||
|
os.close(wfd)
|
||||||
|
pid: int = int.from_bytes(pid_bytes, 'little')
|
||||||
|
log.runtime(
|
||||||
|
f'subint-forkserver forked child\n'
|
||||||
|
f'(>\n'
|
||||||
|
f' |_pid={pid}\n'
|
||||||
|
)
|
||||||
|
return pid
|
||||||
|
|
||||||
|
|
||||||
|
def run_subint_in_worker_thread(
|
||||||
|
bootstrap: str,
|
||||||
|
*,
|
||||||
|
thread_name: str = 'subint-trio',
|
||||||
|
join_timeout: float = 10.0,
|
||||||
|
|
||||||
|
) -> None:
|
||||||
|
'''
|
||||||
|
Create a fresh legacy-config sub-interpreter and drive
|
||||||
|
the given `bootstrap` code string through
|
||||||
|
`_interpreters.exec()` on a dedicated worker thread.
|
||||||
|
|
||||||
|
Naming mirrors `fork_from_worker_thread()`:
|
||||||
|
"<action>_in_worker_thread" — the action here is "run a
|
||||||
|
subint", not "run trio" per se. Typical `bootstrap`
|
||||||
|
content does import `trio` + call `trio.run()`, but
|
||||||
|
nothing about this primitive requires trio; it's a
|
||||||
|
generic "host a subint on a worker thread" helper.
|
||||||
|
Intended mainly for use inside a fork-child (see
|
||||||
|
`tractor.spawn._subint_forkserver` module docstring) but
|
||||||
|
works anywhere.
|
||||||
|
|
||||||
|
See `tractor.spawn._subint.subint_proc` for the matching
|
||||||
|
pattern tractor uses at the sub-actor level.
|
||||||
|
|
||||||
|
Destroys the subint after the thread joins.
|
||||||
|
|
||||||
|
'''
|
||||||
|
if not _has_subints:
|
||||||
|
raise RuntimeError(
|
||||||
|
'subint-forkserver primitives require Python '
|
||||||
|
'3.14+.'
|
||||||
|
)
|
||||||
|
|
||||||
|
interp_id: int = _interpreters.create('legacy')
|
||||||
|
log.runtime(
|
||||||
|
f'Created child-side subint for trio.run()\n'
|
||||||
|
f'(>\n'
|
||||||
|
f' |_interp_id={interp_id}\n'
|
||||||
|
)
|
||||||
|
|
||||||
|
err: BaseException | None = None
|
||||||
|
|
||||||
|
def _drive() -> None:
|
||||||
|
nonlocal err
|
||||||
|
try:
|
||||||
|
_interpreters.exec(interp_id, bootstrap)
|
||||||
|
except BaseException as e:
|
||||||
|
err = e
|
||||||
|
|
||||||
|
worker: threading.Thread = threading.Thread(
|
||||||
|
target=_drive,
|
||||||
|
name=thread_name,
|
||||||
|
daemon=False,
|
||||||
|
)
|
||||||
|
worker.start()
|
||||||
|
worker.join(timeout=join_timeout)
|
||||||
|
|
||||||
|
try:
|
||||||
|
_interpreters.destroy(interp_id)
|
||||||
|
except _interpreters.InterpreterError as e:
|
||||||
|
log.warning(
|
||||||
|
f'Could not destroy child-side subint '
|
||||||
|
f'{interp_id}: {e}'
|
||||||
|
)
|
||||||
|
|
||||||
|
if worker.is_alive():
|
||||||
|
raise RuntimeError(
|
||||||
|
f'child-side subint trio-driver thread '
|
||||||
|
f'{thread_name!r} did not return within '
|
||||||
|
f'{join_timeout}s.'
|
||||||
|
)
|
||||||
|
if err is not None:
|
||||||
|
raise err
|
||||||
|
|
||||||
|
|
||||||
|
class _ForkedProc:
|
||||||
|
'''
|
||||||
|
Thin `trio.Process`-compatible shim around a raw OS pid
|
||||||
|
returned by `fork_from_worker_thread()`, exposing just
|
||||||
|
enough surface for the `soft_kill()` / hard-reap pattern
|
||||||
|
borrowed from `trio_proc()`.
|
||||||
|
|
||||||
|
Unlike `trio.Process`, we have no direct handles on the
|
||||||
|
child's std-streams (fork-without-exec inherits the
|
||||||
|
parent's FDs, but we don't marshal them into this
|
||||||
|
wrapper) — `.stdin`/`.stdout`/`.stderr` are all `None`,
|
||||||
|
which matches what `soft_kill()` handles via its
|
||||||
|
`is not None` guards.
|
||||||
|
|
||||||
|
'''
|
||||||
|
def __init__(self, pid: int):
|
||||||
|
self.pid: int = pid
|
||||||
|
self._returncode: int | None = None
|
||||||
|
# `soft_kill`/`hard_kill` check these for pipe
|
||||||
|
# teardown — all None since we didn't wire up pipes
|
||||||
|
# on the fork-without-exec path.
|
||||||
|
self.stdin = None
|
||||||
|
self.stdout = None
|
||||||
|
self.stderr = None
|
||||||
|
|
||||||
|
def poll(self) -> int | None:
|
||||||
|
'''
|
||||||
|
Non-blocking liveness probe. Returns `None` if the
|
||||||
|
child is still running, else its exit code (negative
|
||||||
|
for signal-death, matching `subprocess.Popen`
|
||||||
|
convention).
|
||||||
|
|
||||||
|
'''
|
||||||
|
if self._returncode is not None:
|
||||||
|
return self._returncode
|
||||||
|
try:
|
||||||
|
waited_pid, status = os.waitpid(self.pid, os.WNOHANG)
|
||||||
|
except ChildProcessError:
|
||||||
|
# already reaped (or never existed) — treat as
|
||||||
|
# clean exit for polling purposes.
|
||||||
|
self._returncode = 0
|
||||||
|
return 0
|
||||||
|
if waited_pid == 0:
|
||||||
|
return None
|
||||||
|
self._returncode = self._parse_status(status)
|
||||||
|
return self._returncode
|
||||||
|
|
||||||
|
@property
|
||||||
|
def returncode(self) -> int | None:
|
||||||
|
return self._returncode
|
||||||
|
|
||||||
|
async def wait(self) -> int:
|
||||||
|
'''
|
||||||
|
Async blocking wait for the child's exit, off-loaded
|
||||||
|
to a trio cache thread so we don't block the event
|
||||||
|
loop on `waitpid()`. Safe to call multiple times;
|
||||||
|
subsequent calls return the cached rc without
|
||||||
|
re-issuing the syscall.
|
||||||
|
|
||||||
|
'''
|
||||||
|
if self._returncode is not None:
|
||||||
|
return self._returncode
|
||||||
|
_, status = await trio.to_thread.run_sync(
|
||||||
|
os.waitpid,
|
||||||
|
self.pid,
|
||||||
|
0,
|
||||||
|
abandon_on_cancel=False,
|
||||||
|
)
|
||||||
|
self._returncode = self._parse_status(status)
|
||||||
|
return self._returncode
|
||||||
|
|
||||||
|
def kill(self) -> None:
|
||||||
|
'''
|
||||||
|
OS-level `SIGKILL` to the child. Swallows
|
||||||
|
`ProcessLookupError` (already dead).
|
||||||
|
|
||||||
|
'''
|
||||||
|
try:
|
||||||
|
os.kill(self.pid, signal.SIGKILL)
|
||||||
|
except ProcessLookupError:
|
||||||
|
pass
|
||||||
|
|
||||||
|
def _parse_status(self, status: int) -> int:
|
||||||
|
if os.WIFEXITED(status):
|
||||||
|
return os.WEXITSTATUS(status)
|
||||||
|
elif os.WIFSIGNALED(status):
|
||||||
|
# negative rc by `subprocess.Popen` convention
|
||||||
|
return -os.WTERMSIG(status)
|
||||||
|
return 0
|
||||||
|
|
||||||
|
def __repr__(self) -> str:
|
||||||
|
return (
|
||||||
|
f'<_ForkedProc pid={self.pid} '
|
||||||
|
f'returncode={self._returncode}>'
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
async def subint_forkserver_proc(
|
||||||
|
name: str,
|
||||||
|
actor_nursery: ActorNursery,
|
||||||
|
subactor: Actor,
|
||||||
|
errors: dict[tuple[str, str], Exception],
|
||||||
|
|
||||||
|
# passed through to actor main
|
||||||
|
bind_addrs: list[UnwrappedAddress],
|
||||||
|
parent_addr: UnwrappedAddress,
|
||||||
|
_runtime_vars: dict[str, Any],
|
||||||
|
*,
|
||||||
|
infect_asyncio: bool = False,
|
||||||
|
task_status: TaskStatus[Portal] = trio.TASK_STATUS_IGNORED,
|
||||||
|
proc_kwargs: dict[str, any] = {},
|
||||||
|
|
||||||
|
) -> None:
|
||||||
|
'''
|
||||||
|
Spawn a subactor via `os.fork()` from a non-trio worker
|
||||||
|
thread (see `fork_from_worker_thread()`), with the forked
|
||||||
|
child running `tractor._child._actor_child_main()` and
|
||||||
|
connecting back via tractor's normal IPC handshake.
|
||||||
|
|
||||||
|
Supervision model mirrors `trio_proc()` — we manage a
|
||||||
|
real OS subprocess, so `Portal.cancel_actor()` +
|
||||||
|
`soft_kill()` on graceful teardown and `os.kill(SIGKILL)`
|
||||||
|
on hard-reap both apply directly (no
|
||||||
|
`_interpreters.destroy()` voodoo needed since the child
|
||||||
|
is in its own process).
|
||||||
|
|
||||||
|
The only real difference from `trio_proc` is the spawn
|
||||||
|
mechanism: fork from a known-clean main-interp worker
|
||||||
|
thread instead of `trio.lowlevel.open_process()`.
|
||||||
|
|
||||||
|
'''
|
||||||
|
if not _has_subints:
|
||||||
|
raise RuntimeError(
|
||||||
|
f'The {"subint_forkserver"!r} spawn backend '
|
||||||
|
f'requires Python 3.14+.\n'
|
||||||
|
f'Current runtime: {sys.version}'
|
||||||
|
)
|
||||||
|
|
||||||
|
uid: tuple[str, str] = subactor.aid.uid
|
||||||
|
loglevel: str | None = subactor.loglevel
|
||||||
|
|
||||||
|
# Closure captured into the fork-child's memory image.
|
||||||
|
# In the child this is the first post-fork Python code to
|
||||||
|
# run, on what was the fork-worker thread in the parent.
|
||||||
|
def _child_target() -> int:
|
||||||
|
# Lazy import so the parent doesn't pay for it on
|
||||||
|
# every spawn — it's module-level in `_child` but
|
||||||
|
# cheap enough to re-resolve here.
|
||||||
|
from tractor._child import _actor_child_main
|
||||||
|
# XXX, fork inherits the parent's entire memory
|
||||||
|
# image — including `tractor.runtime._state` globals
|
||||||
|
# that encode "this process is the root actor":
|
||||||
|
#
|
||||||
|
# - `_runtime_vars['_is_root']` → True in parent
|
||||||
|
# - pre-populated `_root_mailbox`, `_registry_addrs`
|
||||||
|
# - the parent's `_current_actor` singleton
|
||||||
|
#
|
||||||
|
# A fresh `exec`-based child would start with the
|
||||||
|
# `_state` module's defaults (all falsey / empty).
|
||||||
|
# Replicate that here so the new child-side `Actor`
|
||||||
|
# sees a "cold" runtime — otherwise `Actor.__init__`
|
||||||
|
# takes the `is_root_process() == True` branch and
|
||||||
|
# pre-populates `self.enable_modules`, which then
|
||||||
|
# trips the `assert not self.enable_modules` gate at
|
||||||
|
# the top of `Actor._from_parent()` on the subsequent
|
||||||
|
# parent→child `SpawnSpec` handshake.
|
||||||
|
from tractor.runtime import _state
|
||||||
|
_state._current_actor = None
|
||||||
|
_state._runtime_vars.update({
|
||||||
|
'_is_root': False,
|
||||||
|
'_root_mailbox': (None, None),
|
||||||
|
'_root_addrs': [],
|
||||||
|
'_registry_addrs': [],
|
||||||
|
'_debug_mode': False,
|
||||||
|
})
|
||||||
|
_actor_child_main(
|
||||||
|
uid=uid,
|
||||||
|
loglevel=loglevel,
|
||||||
|
parent_addr=parent_addr,
|
||||||
|
infect_asyncio=infect_asyncio,
|
||||||
|
# NOTE, from the child-side runtime's POV it's
|
||||||
|
# a regular trio actor — it uses `_trio_main`,
|
||||||
|
# receives `SpawnSpec` over IPC, etc. The
|
||||||
|
# `subint_forkserver` name is a property of HOW
|
||||||
|
# the parent spawned, not of what the child is.
|
||||||
|
spawn_method='trio',
|
||||||
|
)
|
||||||
|
return 0
|
||||||
|
|
||||||
|
cancelled_during_spawn: bool = False
|
||||||
|
proc: _ForkedProc | None = None
|
||||||
|
ipc_server: _server.Server = actor_nursery._actor.ipc_server
|
||||||
|
|
||||||
|
try:
|
||||||
|
try:
|
||||||
|
pid: int = await trio.to_thread.run_sync(
|
||||||
|
partial(
|
||||||
|
fork_from_worker_thread,
|
||||||
|
_child_target,
|
||||||
|
thread_name=(
|
||||||
|
f'subint-forkserver[{name}]'
|
||||||
|
),
|
||||||
|
),
|
||||||
|
abandon_on_cancel=False,
|
||||||
|
)
|
||||||
|
proc = _ForkedProc(pid)
|
||||||
|
log.runtime(
|
||||||
|
f'Forked subactor via forkserver\n'
|
||||||
|
f'(>\n'
|
||||||
|
f' |_{proc}\n'
|
||||||
|
)
|
||||||
|
|
||||||
|
event, chan = await ipc_server.wait_for_peer(uid)
|
||||||
|
|
||||||
|
except trio.Cancelled:
|
||||||
|
cancelled_during_spawn = True
|
||||||
|
raise
|
||||||
|
|
||||||
|
assert proc is not None
|
||||||
|
|
||||||
|
portal = Portal(chan)
|
||||||
|
actor_nursery._children[uid] = (
|
||||||
|
subactor,
|
||||||
|
proc,
|
||||||
|
portal,
|
||||||
|
)
|
||||||
|
|
||||||
|
sspec = msgtypes.SpawnSpec(
|
||||||
|
_parent_main_data=subactor._parent_main_data,
|
||||||
|
enable_modules=subactor.enable_modules,
|
||||||
|
reg_addrs=subactor.reg_addrs,
|
||||||
|
bind_addrs=bind_addrs,
|
||||||
|
_runtime_vars=_runtime_vars,
|
||||||
|
)
|
||||||
|
log.runtime(
|
||||||
|
f'Sending spawn spec to forkserver child\n'
|
||||||
|
f'{{}}=> {chan.aid.reprol()!r}\n'
|
||||||
|
f'\n'
|
||||||
|
f'{pretty_struct.pformat(sspec)}\n'
|
||||||
|
)
|
||||||
|
await chan.send(sspec)
|
||||||
|
|
||||||
|
curr_actor: Actor = current_actor()
|
||||||
|
curr_actor._actoruid2nursery[uid] = actor_nursery
|
||||||
|
|
||||||
|
task_status.started(portal)
|
||||||
|
|
||||||
|
with trio.CancelScope(shield=True):
|
||||||
|
await actor_nursery._join_procs.wait()
|
||||||
|
|
||||||
|
async with trio.open_nursery() as nursery:
|
||||||
|
if portal in actor_nursery._cancel_after_result_on_exit:
|
||||||
|
nursery.start_soon(
|
||||||
|
cancel_on_completion,
|
||||||
|
portal,
|
||||||
|
subactor,
|
||||||
|
errors,
|
||||||
|
)
|
||||||
|
|
||||||
|
# reuse `trio_proc`'s soft-kill dance — `proc`
|
||||||
|
# is our `_ForkedProc` shim which implements the
|
||||||
|
# same `.poll()` / `.wait()` / `.kill()` surface
|
||||||
|
# `soft_kill` expects.
|
||||||
|
await soft_kill(
|
||||||
|
proc,
|
||||||
|
_ForkedProc.wait,
|
||||||
|
portal,
|
||||||
|
)
|
||||||
|
nursery.cancel_scope.cancel()
|
||||||
|
|
||||||
|
finally:
|
||||||
|
# Hard reap: SIGKILL + waitpid. Cheap since we have
|
||||||
|
# the real OS pid, unlike `subint_proc` which has to
|
||||||
|
# fuss with `_interpreters.destroy()` races.
|
||||||
|
if proc is not None and proc.poll() is None:
|
||||||
|
log.cancel(
|
||||||
|
f'Hard killing forkserver subactor\n'
|
||||||
|
f'>x)\n'
|
||||||
|
f' |_{proc}\n'
|
||||||
|
)
|
||||||
|
with trio.CancelScope(shield=True):
|
||||||
|
proc.kill()
|
||||||
|
await proc.wait()
|
||||||
|
|
||||||
|
if not cancelled_during_spawn:
|
||||||
|
actor_nursery._children.pop(uid, None)
|
||||||
Loading…
Reference in New Issue