Add CPython-level `subint_fork` workaround smoketest

Standalone script to validate the "main-interp worker-thread
forkserver + subint-hosted trio" arch proposed as a workaround
to the CPython-level refusal doc'd in
`ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md`.

Deliberately NOT a `tractor` test — zero `tractor` imports.
Uses `_interpreters` (private stdlib) + `os.fork()` directly so
pass/fail is a property of CPython alone, independent of our
runtime. Requires py3.14+.

Deats,
- four scenarios via `--scenario`:
  - `control_subint_thread_fork` — the KNOWN-BROKEN case as a
    harness sanity check; if the child DOESN'T abort, our analysis
    is wrong
  - `main_thread_fork` — baseline sanity, must always succeed
  - `worker_thread_fork` — architectural assertion: regular
    `threading.Thread` attached to main interp calls
    `os.fork()`; child should survive post-fork cleanup
  - `full_architecture` — end-to-end: fork from a main-interp
    worker thread, then in child create a subint driving a
    worker thread running `trio.run()`
- exit code 0 on EXPECTED outcome (for `control_*` that means
  "child aborted", not "child succeeded")
- each scenario prints a self-contained pass/fail banner; the
  child's fate is observed via the parent's `os.waitpid()` plus
  per-scenario status prints

Also, log NLNet provenance for this session's three-sub-phase
work (py3.13 gate tightening, `pytest-timeout` + marker
refactor, `subint_fork` prototype → CPython-block finding).

Prompt-IO: ai/prompt-io/claude/20260422T200723Z_797f57c_prompt_io.md

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
subint_forkserver_backend
Gud Boi 2026-04-22 16:40:52 -04:00
parent 0f48ed2eb9
commit de4f470b6c
3 changed files with 938 additions and 0 deletions


@ -0,0 +1,440 @@
#!/usr/bin/env python3
'''
Standalone CPython-level feasibility check for the "main-interp
worker-thread forkserver + subint-hosted trio" architecture
proposed as a workaround to the CPython-level refusal
documented in
`ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md`.

Purpose
-------
Deliberately NOT a `tractor` test. Zero `tractor` imports.
Uses `_interpreters` (private stdlib) + `os.fork()` directly so
the signal is unambiguous: pass/fail here is a property of
CPython alone, independent of our runtime.

Run each scenario in isolation; the child's fate is observable
only via `os.waitpid()` of the parent and the scenario's own
status prints.

Scenarios (pick one with `--scenario <name>`)
---------------------------------------------
- `control_subint_thread_fork`: the KNOWN-BROKEN case we
  documented in `subint_fork_blocked_by_cpython_post_fork_issue.md`.
  Drive a subint from a thread, call `os.fork()` inside its
  `_interpreters.exec()`, watch the child abort. **Included as
  a control**: if this scenario DOESN'T abort the child, our
  analysis is wrong and we should re-check everything.

- `main_thread_fork`: baseline sanity. Call `os.fork()` from
  the process's main thread. Must always succeed; if this
  fails something much bigger is broken.

- `worker_thread_fork`: the architectural assertion. Spawn a
  regular `threading.Thread` (attached to main interp, NOT a
  subint), have IT call `os.fork()`. Child should survive
  post-fork cleanup.

- `full_architecture`: end-to-end. A main-interp worker thread
  forks; in the child, the fork-thread (still main-interp)
  creates a subint and drives a second worker thread inside it
  that runs a trivial `trio.run()`. Validates the "root runtime
  lives in a subint in the child" piece of the proposed arch.

All scenarios print a self-contained pass/fail banner. Exit
code 0 on expected outcome (which for `control_*` means "child
aborted", not "child succeeded"!).

Requires Python 3.14+.

Usage
-----
::

    python subint_fork_from_main_thread_smoketest.py \\
        --scenario main_thread_fork

    python subint_fork_from_main_thread_smoketest.py \\
        --scenario full_architecture

'''
from __future__ import annotations

import argparse
import os
import signal
import sys
import threading
import time
from typing import Callable


# Hard-require py3.14 for the public `concurrent.interpreters`
# API (we still drop to `_interpreters` internally, same as
# `tractor.spawn._subint`).
try:
    from concurrent import interpreters as _public_interpreters  # noqa: F401
    import _interpreters  # type: ignore
except ImportError:
    print(
        'FAIL (setup): requires Python 3.14+ '
        '(missing `concurrent.interpreters`)',
        file=sys.stderr,
    )
    sys.exit(2)
# ----------------------------------------------------------------
# small observability helpers
# ----------------------------------------------------------------


def _banner(title: str) -> None:
    line = '=' * 60
    print(f'\n{line}\n{title}\n{line}', flush=True)


def _wait_child(
    pid: int,
    *,
    label: str,
    expect_exit_ok: bool,
) -> bool:
    '''
    Await a forked child's exit status and render pass/fail.

    `expect_exit_ok=True` means we expect a normal exit (code
    0 via WEXITSTATUS). `expect_exit_ok=False` means we expect
    an abnormal death (WIFSIGNALED or nonzero WEXITSTATUS), as
    for the `control_*` scenario where CPython is supposed to
    abort the child.

    '''
    _, status = os.waitpid(pid, 0)
    exited_normally = os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
    signaled = os.WIFSIGNALED(status)
    sig = os.WTERMSIG(status) if signaled else None
    rc = os.WEXITSTATUS(status) if os.WIFEXITED(status) else None

    if expect_exit_ok:
        ok = exited_normally
        expected_str = 'normal exit (rc=0)'
    else:
        ok = not exited_normally
        expected_str = (
            'abnormal death (signal or nonzero exit)'
        )

    verdict = 'PASS' if ok else 'FAIL'
    status_str = (
        f'signal={signal.Signals(sig).name}'
        if signaled
        else f'rc={rc}'
    )
    print(
        f'[{verdict}] {label}: '
        f'expected {expected_str}; observed {status_str}',
        flush=True,
    )
    return ok
# ----------------------------------------------------------------
# scenario: `control_subint_thread_fork` (known-broken)
# ----------------------------------------------------------------


def scenario_control_subint_thread_fork() -> int:
    _banner(
        '[control] fork from INSIDE a subint (expected: child aborts)'
    )
    interp_id = _interpreters.create('legacy')
    print(f' created subint {interp_id}', flush=True)

    # Shared flag: child writes a sentinel file we can detect from
    # the parent. If the child manages to write this, CPython's
    # post-fork refusal is NOT happening → analysis is wrong.
    sentinel = '/tmp/subint_fork_smoketest_control_child_ran'
    try:
        os.unlink(sentinel)
    except FileNotFoundError:
        pass

    # NOTE, this string is exec'd as real Python source inside
    # the subint, so its in-string indentation is significant.
    bootstrap = (
        'import os\n'
        'pid = os.fork()\n'
        'if pid == 0:\n'
        # child — if CPython's refusal fires this code never runs
        f'    with open({sentinel!r}, "w") as f:\n'
        '        f.write("ran")\n'
        '    os._exit(0)\n'
        'else:\n'
        # parent side (inside the launchpad subint) — stash the
        # forked PID on a shareable dict so we can waitpid()
        # from the outer main interp. We can't just return it;
        # _interpreters.exec() returns nothing useful.
        '    import builtins\n'
        '    builtins._forked_child_pid = pid\n'
    )

    # NOTE, we can't easily pull state back from the subint.
    # For the CONTROL scenario we just time-bound the fork +
    # check the sentinel. If sentinel exists → child ran →
    # analysis wrong. If not → child aborted → analysis
    # confirmed.
    done = threading.Event()

    def _drive() -> None:
        try:
            _interpreters.exec(interp_id, bootstrap)
        except Exception as err:
            print(
                f' subint bootstrap raised (expected on some '
                f'CPython versions): {type(err).__name__}: {err}',
                flush=True,
            )
        finally:
            done.set()

    t = threading.Thread(
        target=_drive,
        name='control-subint-fork-launchpad',
        daemon=True,
    )
    t.start()
    done.wait(timeout=5.0)
    t.join(timeout=2.0)

    # Give the (possibly-aborted) child a moment to die.
    time.sleep(0.5)

    sentinel_present = os.path.exists(sentinel)
    verdict = (
        # "PASS" for our analysis means sentinel NOT present.
        'PASS' if not sentinel_present else 'FAIL (UNEXPECTED)'
    )
    print(
        f'[{verdict}] control: sentinel present={sentinel_present} '
        f'(analysis predicts False — child should abort before '
        f'writing)',
        flush=True,
    )
    if sentinel_present:
        os.unlink(sentinel)

    try:
        _interpreters.destroy(interp_id)
    except _interpreters.InterpreterError:
        pass

    return 0 if not sentinel_present else 1
# ----------------------------------------------------------------
# scenario: `main_thread_fork` (baseline sanity)
# ----------------------------------------------------------------


def scenario_main_thread_fork() -> int:
    _banner(
        '[baseline] fork from MAIN thread (expected: child exits normally)'
    )
    pid = os.fork()
    if pid == 0:
        os._exit(0)

    return 0 if _wait_child(
        pid,
        label='main_thread_fork',
        expect_exit_ok=True,
    ) else 1
# ----------------------------------------------------------------
# scenario: `worker_thread_fork` (architectural assertion)
# ----------------------------------------------------------------


def _fork_from_worker_thread(
    child_target: Callable[[], int] | None = None,
    label: str = 'worker_thread_fork',
) -> int:
    '''
    Fork from a main-interp worker thread (not a subint).

    Returns 0 if the parent observes a normal child exit
    (rc=0), else 1.

    `child_target` is called IN THE CHILD before `os._exit`;
    its return value (or 2, if it raises) becomes the child's
    exit code. If omitted, the child just `_exit(0)`s
    immediately.

    `label` is used in the pass/fail banner so reuse of this
    helper across scenarios reports the scenario name, not
    just the underlying fork-mechanism name.

    '''
    # Use a simple pipe to shuttle the child PID back to main.
    rfd, wfd = os.pipe()

    def _worker() -> None:
        pid = os.fork()
        if pid == 0:
            # CHILD: close parent's pipe ends, do work, exit.
            os.close(rfd)
            os.close(wfd)
            rc = 0
            if child_target is not None:
                try:
                    rc = child_target() or 0
                except BaseException as err:
                    print(
                        f' CHILD: child_target raised: '
                        f'{type(err).__name__}: {err}',
                        file=sys.stderr, flush=True,
                    )
                    rc = 2
            os._exit(rc)
        else:
            # PARENT (still in worker thread): send pid to
            # main thread via the pipe.
            os.write(wfd, pid.to_bytes(8, 'little'))

    t = threading.Thread(
        target=_worker,
        name=f'worker-fork-thread[{label}]',
        daemon=False,
    )
    t.start()
    t.join(timeout=10.0)
    if t.is_alive():
        print(
            f'[FAIL] {label}: worker-thread fork driver '
            f'did not return in 10s',
            flush=True,
        )
        return 1

    # Close our write end BEFORE reading so that, if the worker
    # died without reporting a pid (e.g. `os.fork()` raised),
    # the read hits EOF instead of blocking forever.
    os.close(wfd)
    pid_bytes = os.read(rfd, 8)
    os.close(rfd)
    if len(pid_bytes) != 8:
        print(
            f'[FAIL] {label}: worker thread never reported '
            f'a child pid',
            flush=True,
        )
        return 1

    pid = int.from_bytes(pid_bytes, 'little')
    print(f' forked child pid={pid}', flush=True)
    return 0 if _wait_child(
        pid,
        label=label,
        expect_exit_ok=True,
    ) else 1


def scenario_worker_thread_fork() -> int:
    _banner(
        '[arch] fork from MAIN-INTERP WORKER thread '
        '(expected: child exits normally — this is the one '
        'that matters)'
    )
    return _fork_from_worker_thread(
        child_target=None,
        label='worker_thread_fork',
    )
# ----------------------------------------------------------------
# scenario: `full_architecture`
# ----------------------------------------------------------------


def _child_trio_in_subint() -> int:
    '''
    CHILD-side: from fork-thread (main-interp), create a fresh
    subint and run `trio.run()` in it on a dedicated worker
    thread. Returns 0 on success.

    '''
    child_interp = _interpreters.create('legacy')
    # NOTE, exec'd as real Python source inside the subint, so
    # the in-string indentation is significant.
    subint_bootstrap = (
        'import trio\n'
        'async def _main():\n'
        '    await trio.sleep(0.05)\n'
        '    return 42\n'
        'result = trio.run(_main)\n'
        'assert result == 42, f"trio.run returned {result}"\n'
        'print(" CHILD subint: trio.run OK, result=42", '
        'flush=True)\n'
    )
    err = None

    def _drive() -> None:
        nonlocal err
        try:
            # `_interpreters.exec()` reports an exception raised
            # INSIDE the subint by returning an excinfo object
            # (the public `Interpreter.exec()` wraps that into
            # `ExecutionFailed`) rather than raising, so check
            # the return value as well as catching.
            excinfo = _interpreters.exec(child_interp, subint_bootstrap)
            if excinfo is not None:
                err = excinfo
        except BaseException as e:
            err = e

    t = threading.Thread(
        target=_drive,
        name='child-subint-trio-thread',
        daemon=False,
    )
    t.start()
    t.join(timeout=10.0)

    try:
        _interpreters.destroy(child_interp)
    except _interpreters.InterpreterError:
        pass

    if t.is_alive():
        print(
            ' CHILD: subint trio thread did not return in 10s',
            flush=True,
        )
        return 3
    if err is not None:
        print(
            f' CHILD: subint bootstrap raised: '
            f'{type(err).__name__}: {err}',
            flush=True,
        )
        return 4
    return 0


def scenario_full_architecture() -> int:
    _banner(
        '[arch-full] worker-thread fork + child runs trio in a '
        'subint (end-to-end proposed arch)'
    )
    return _fork_from_worker_thread(
        child_target=_child_trio_in_subint,
        label='full_architecture',
    )
# ----------------------------------------------------------------
# main
# ----------------------------------------------------------------


SCENARIOS: dict[str, Callable[[], int]] = {
    'control_subint_thread_fork': scenario_control_subint_thread_fork,
    'main_thread_fork': scenario_main_thread_fork,
    'worker_thread_fork': scenario_worker_thread_fork,
    'full_architecture': scenario_full_architecture,
}


def main() -> int:
    ap = argparse.ArgumentParser(
        description=__doc__,
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )
    ap.add_argument(
        '--scenario',
        choices=sorted(SCENARIOS.keys()),
        required=True,
    )
    args = ap.parse_args()
    return SCENARIOS[args.scenario]()


if __name__ == '__main__':
    sys.exit(main())


@ -0,0 +1,155 @@
---
model: claude-opus-4-7[1m]
service: claude
session: subints-phase-b-hardening-and-fork-block
timestamp: 2026-04-22T20:07:23Z
git_ref: 797f57c
scope: code
substantive: true
raw_file: 20260422T200723Z_797f57c_prompt_io.raw.md
---
## Prompt
Session-spanning work on the Phase B `subint` spawn-backend.
Three distinct sub-phases in one log:
1. **Py3.13 gate tightening** — diagnose a reproducible hang
of subint spawn flow under py3.13 (works on py3.14), trace
to a private `_interpreters` module vintage issue, tighten
our feature gate from "`_interpreters` present" to "public
`concurrent.interpreters` present" (i.e. py3.14+).
2. **Test-harness hardening** — add `pytest-timeout` dep, put
`@pytest.mark.timeout(30, method='thread')` on the
three known-hanging subint tests cataloged in
`ai/conc-anal/subint_sigint_starvation_issue.md`. Separately,
code-review the user's in-flight `skipon_spawn_backend`
marker implementation; find four bugs; refactor to use
`item.iter_markers()`.
3. **`subint_fork` prototype → CPython-block finding** — draft
a WIP `subint_fork_proc` backend using a sub-interpreter as
a launchpad for `os.fork()` (to sidestep trio#1614). User
tests on py3.14, hits
`Fatal Python error: _PyInterpreterState_DeleteExceptMain:
not main interpreter`. Walk CPython sources (local clone at
`~/repos/cpython/`) to pinpoint the refusal
(`Modules/posixmodule.c:728` → `Python/pystate.c:1040`).
Revert implementation to a `NotImplementedError` stub in a
new `_subint_fork.py` submodule, document the finding in a
third `conc-anal/` doc with an upstream-report draft for
the CPython issue tracker. Finally, discuss user's proposed
workaround architecture (main-interp worker-thread
forkserver) and draft a standalone smoke-test script for
feasibility validation.
## Response summary
All three sub-phases landed concrete artifacts:

**Sub-phase 1**: `_subint.py` + `_spawn.py` gates + error
messages updated to require py3.14+ via the public
`concurrent.interpreters` module presence check. Module
docstring revised to explain the empirical reason
(py3.13's private `_interpreters` vintage wedges under
multi-trio-task usage even though minimal standalone
reproducers work fine there). Test-module
`pytest.importorskip` likewise switched.

**Sub-phase 2**: `pytest-timeout>=2.3` added to `testing`
dep group. `@pytest.mark.timeout(30, method='thread')`
applied on:
- `tests/discovery/test_registrar.py::test_stale_entry_is_deleted`
- `tests/test_cancellation.py::test_cancel_while_childs_child_in_sync_sleep`
- `tests/test_cancellation.py::test_multierror_fast_nursery`
- `tests/test_subint_cancellation.py::test_subint_non_checkpointing_child`

`method='thread'` documented inline as load-bearing: the
GIL-starvation path that drops `SIGINT` would equally drop
`SIGALRM`, so only a watchdog-thread timeout can reliably
escape.

`skipon_spawn_backend` plugin refactored into a single
`iter_markers`-driven loop in `pytest_collection_modifyitems`
(~30 LOC replacing ~30 LOC of nested conditionals). Four
bugs dissolved: wrong `.get()` key, module-level `pytestmark`
suppressing per-test marks, unhandled `pytestmark = [list]`
form, `pytest.Makr` typo. Marker help text updated to
document the variadic backend-list + `reason=` kwarg
surface.

**Sub-phase 3**: Prototype drafted (then reverted):
- `tractor/spawn/_subint_fork.py` — new dedicated submodule
housing the `subint_fork_proc` stub. Module docstring +
fn docstring explain the attempt, the CPython-level
block, and the reason for keeping the stub in-tree
(documentation of the attempt + starting point if CPython
ever lifts the restriction).
- `tractor/spawn/_spawn.py`: `'subint_fork'` registered as a
`SpawnMethodKey` literal + in `_methods`, so
`--spawn-backend=subint_fork` routes to a clean
`NotImplementedError` pointing at the analysis doc rather
than an "invalid backend" error.
- `ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md`:
third sibling conc-anal doc. Full annotated CPython
source walkthrough from user-visible
`Fatal Python error` → `Modules/posixmodule.c:728
PyOS_AfterFork_Child()` → `Python/pystate.c:1040
_PyInterpreterState_DeleteExceptMain()` gate. Includes a
copy-paste-ready upstream-report draft for the CPython
issue tracker with a two-tier ask (ideally "make it work",
minimally "cleaner error than `Fatal Python error`
aborting the child").
- `ai/conc-anal/subint_fork_from_main_thread_smoketest.py`:
standalone zero-tractor-import CPython-level smoke test
for the user's proposed workaround architecture
(forkserver on a main-interp worker thread). Four
argparse-driven scenarios: `control_subint_thread_fork`
(reproduces the known-broken case as a test-harness
sanity), `main_thread_fork` (baseline), `worker_thread_fork`
(architectural assertion), `full_architecture`
(end-to-end trio-in-subint in forked child). User will
run on py3.14 next.
## Files changed
See `git log 26fb820..HEAD --stat` for the canonical list.

New files this session:
- `tractor/spawn/_subint_fork.py`
- `ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md`
- `ai/conc-anal/subint_fork_from_main_thread_smoketest.py`

Modified (diff pointers in raw log):
- `tractor/spawn/_subint.py` (py3.14 gate)
- `tractor/spawn/_spawn.py` (`subint_fork` registration)
- `tractor/_testing/pytest.py` (`skipon_spawn_backend` refactor)
- `pyproject.toml` (`pytest-timeout` dep)
- `tests/discovery/test_registrar.py`,
`tests/test_cancellation.py`,
`tests/test_subint_cancellation.py` (timeout marks,
cross-refs to conc-anal docs)
## Human edits
Several back-and-forth iterations with user-driven
adjustments during the session:
- User corrected my initial mis-classification of
`test_cancel_while_childs_child_in_sync_sleep[subint-False]`
as Ctrl-C-able — second strace showed `EAGAIN`, putting
it squarely in class A (GIL-starvation). Re-analysis
preserved in the raw log.
- User independently fixed the `.get(reason)` → `.get('reason', reason)`
bug in the marker plugin before my review; preserved their
fix.
- User suggested moving the `subint_fork_proc` stub from
the bottom of `_subint.py` into its own
`_subint_fork.py` submodule — applied.
- User asked to keep the forkserver-architecture
discussion as background for the smoke-test rather than
committing to a tractor-side refactor until the smoke
test validates the CPython-level assumptions.

Commit messages in this range (b025c982 … 797f57c) were
drafted via `/commit-msg` + `rewrap.py --width 67`; user
landed them with the usual review.


@ -0,0 +1,343 @@
---
model: claude-opus-4-7[1m]
service: claude
timestamp: 2026-04-22T20:07:23Z
git_ref: 797f57c
diff_cmd: git log 26fb820..HEAD # all session commits since the destroy-race fix log
---
Session-spanning conversation covering the Phase B hardening
of the `subint` spawn-backend and an investigation into a
proposed `subint_fork` follow-up which turned out to be
blocked at the CPython level. This log is a narrative capture
of the substantive turns (not every message) and references
the concrete code + docs the session produced. Per diff-ref
mode the actual code diffs are pointed at via `git log` on
each ref rather than duplicated inline.
## Narrative of the substantive turns
### Py3.13 hang / gate tightening
Diagnosed a reproducible hang of the `subint` backend under
py3.13 (test_spawning tests wedge after root-actor bringup).
Root cause: py3.13's vintage of the private `_interpreters` C
module has a latent thread/subint-interaction issue whereby
`_interpreters.exec()` silently fails to progress under
tractor's multi-trio usage pattern, even though a minimal
standalone `threading.Thread` + `_interpreters.exec()`
reproducer works fine on the same Python. Empirically,
py3.14 fixes it.

Fix (from this session): tighten the `_has_subints` gate in
`tractor.spawn._subint` from "private module importable" to
"public `concurrent.interpreters` present" — which is 3.14+
only. This leaves `subint_proc()` unchanged in behavior (we
still call the *private* `_interpreters.create('legacy')`
etc. under the hood) but refuses to engage on 3.13.

Also tightened the matching gate in
`tractor.spawn._spawn.try_set_start_method('subint')` and
rev'd the corresponding error messages from "3.13+" to
"3.14+" with a sentence explaining why. Test-module
`pytest.importorskip` switched from `_interpreters`
`concurrent.interpreters` to match.
### `pytest-timeout` dep + `skipon_spawn_backend` marker plumbing
Added `pytest-timeout>=2.3` to the `testing` dep group with
an inline comment pointing at the `ai/conc-anal/*.md` docs.

Applied `@pytest.mark.timeout(30, method='thread')` (the
`method='thread'` is load-bearing — `signal`-method
`SIGALRM` suffers the same GIL-starvation path that drops
`SIGINT` in the class-A hang pattern) to the three known-
hanging subint tests cataloged in
`subint_sigint_starvation_issue.md`.

Separately code-reviewed the user's newly-staged
`skipon_spawn_backend` pytest marker implementation in
`tractor/_testing/pytest.py`. Found four bugs:
1. `modmark.kwargs.get(reason)` called `.get()` with the
*variable* `reason` as the dict key instead of the string
`'reason'` — user-supplied `reason=` was never picked up.
(User had already fixed this locally via `.get('reason',
reason)` by the time my review happened — preserved that
fix.)
2. The module-level `pytestmark` branch suppressed per-test
marker handling (the `else:` was an `else:` rather than
independent iteration).
3. `mod_pytestmark.mark` assumed a single
`MarkDecorator` — broke on the valid-pytest `pytestmark =
[mark, mark]` list form.
4. Typo: `pytest.Makr` → `pytest.Mark`.

Refactored the hook to use `item.iter_markers(name=...)`
which walks function + class + module scopes uniformly and
handles both `pytestmark` forms natively. ~30 LOC replaced
the original ~30 LOC of nested conditionals, all four bugs
dissolved. Also updated the marker help string to reflect
the variadic `*start_methods` + `reason=` surface.
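The single-loop hook shape described above can be sketched as follows. This is a minimal sketch, not tractor's actual plugin code. It assumes the marker is spelled `@pytest.mark.skipon_spawn_backend('subint', ..., reason='...')` and that the active backend is readable via `config.getoption('--spawn-backend')`. That option lookup, the helper name `skip_reason_for()`, and the default reason text are all hypothetical. The pure helper is split out so it can be exercised without a pytest session.

```python
from __future__ import annotations

from collections.abc import Iterable
from typing import Any


def skip_reason_for(
    marks: Iterable[Any],
    backend: str,
) -> str | None:
    '''
    Hypothetical helper: return a skip reason if any
    `skipon_spawn_backend` mark lists `backend` among its
    positional args, else `None`.

    `marks` is anything with `.args` / `.kwargs`, e.g. the
    `pytest.Mark`s yielded by `item.iter_markers()`.
    '''
    for mark in marks:
        if backend in mark.args:
            # the *string* key 'reason' (bug #1 above), with a
            # hypothetical fallback message
            return mark.kwargs.get(
                'reason',
                f'skipped on {backend!r} spawn backend',
            )
    return None


def pytest_collection_modifyitems(config, items) -> None:
    # imported lazily so `skip_reason_for()` above can be
    # exercised without pytest installed
    import pytest

    # ASSUMPTION: the active backend is exposed as `--spawn-backend`
    backend: str = config.getoption('--spawn-backend')
    for item in items:
        # `iter_markers()` walks function, class and module scopes
        # and already flattens the `pytestmark = [...]` list form
        reason = skip_reason_for(
            item.iter_markers(name='skipon_spawn_backend'),
            backend,
        )
        if reason is not None:
            item.add_marker(pytest.mark.skip(reason=reason))
```

With this shape, per-test, per-class and module-level (`pytestmark`, single or list) uses all flow through the same loop. There is no separate branch to fall out of sync.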
### `subint_fork_proc` prototype attempt
User's hypothesis: the known trio+`fork()` issues
(python-trio/trio#1614) could be sidestepped by using a
sub-interpreter purely as a launchpad — `os.fork()` from a
subint that has never imported trio → child is in a
trio-free context. In the child, `execv()` back into
`python -m tractor._child`; the downstream handshake then
matches `trio_proc()` identically.

Drafted the prototype at `tractor/spawn/_subint.py`'s bottom
(originally — later moved to its own submod, see below):
launchpad-subint creation, bootstrap code-string with
`os.fork()` + `execv()`, driver-thread orchestration,
parent-side `ipc_server.wait_for_peer()` dance. Registered
`'subint_fork'` as a new `SpawnMethodKey` literal, added
`case 'subint' | 'subint_fork':` feature-gate arm in
`try_set_start_method()`, added entry in `_methods` dict.
### CPython-level block discovered
User tested on py3.14 and saw:
```
Fatal Python error: _PyInterpreterState_DeleteExceptMain: not main interpreter
Python runtime state: initialized
Current thread 0x00007f6b71a456c0 [subint-fork-lau] (most recent call first):
File "<script>", line 2 in <module>
<script>:2: DeprecationWarning: This process (pid=802985) is multi-threaded, use of fork() may lead to deadlocks in the child.
```
Walked CPython sources (local clone at `~/repos/cpython/`):
- **`Modules/posixmodule.c:728` `PyOS_AfterFork_Child()`** —
post-fork child-side cleanup. Calls
`_PyInterpreterState_DeleteExceptMain(runtime)` with
`goto fatal_error` on non-zero status. Has the
`// Ideally we could guarantee tstate is running main.`
self-acknowledging-fragile comment directly above.
- **`Python/pystate.c:1040`
`_PyInterpreterState_DeleteExceptMain()`** — the
refusal. Hard `PyStatus_ERR("not main interpreter")` gate
when `tstate->interp != interpreters->main`. Docstring
formally declares the precondition ("If there is a
current interpreter state, it *must* be the main
interpreter"). `XXX` comments acknowledge further latent
issues within.

Definitive answer to "Open Question 1" of the prototype
docstring: **no, CPython does not support `os.fork()` from
a non-main sub-interpreter**. Not because the fork syscall
is blocked (it isn't — the parent returns a valid pid),
but because the child cannot survive CPython's post-fork
initialization. This is an enforced invariant, not an
incidental limitation.
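As a mental model only, the refusal reduces to a check on which interpreter the forking thread's tstate is bound to. It does not depend on which OS thread is running. The toy below is plain Python, not CPython's C code. `TState`, `after_fork_child()`, `FatalPythonError` and the `'main'` interpreter label are hypothetical names chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


class FatalPythonError(Exception):
    '''Stands in for the child-side `Fatal Python error` abort.'''


@dataclass(frozen=True)
class TState:
    # which OS thread is running, and which interpreter its
    # thread-state is bound to
    thread: str
    interp: str


MAIN_INTERP = 'main'  # hypothetical label for the main interpreter


def after_fork_child(tstate: TState) -> None:
    '''
    Toy model of child-side post-fork cleanup, mirroring the
    `tstate->interp != interpreters->main` check that makes
    `_PyInterpreterState_DeleteExceptMain()` return
    `PyStatus_ERR("not main interpreter")`.
    '''
    if tstate.interp != MAIN_INTERP:
        raise FatalPythonError(
            '_PyInterpreterState_DeleteExceptMain: '
            'not main interpreter'
        )
    # otherwise CPython tears down every non-main interpreter
    # and the child carries on in the main interpreter
```

Note that in this model a non-main OS thread bound to the main interpreter passes the check. That is the property the forkserver discussion below relies on.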
### Revert: move to stub submod + doc the finding
Per user request:
1. Reverted the working `subint_fork_proc` body to a
`NotImplementedError` stub, MOVED to its own submod
`tractor/spawn/_subint_fork.py` (keeps `_subint.py`
focused on the working `subint_proc` backend).
2. Updated `_spawn.py` to import the stub from the new
submod path; kept `'subint_fork'` in `SpawnMethodKey` +
`_methods` so `--spawn-backend=subint_fork` routes to a
clean `NotImplementedError` with pointer to the analysis
doc rather than an "invalid backend" error.
3. Wrote
`ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md`
with the full annotated CPython walkthrough + an
upstream-report draft for the CPython issue tracker.
Draft has a two-tier ask: ideally "make it work"
(pre-fork tstate-swap hook or `DeleteExceptFor(interp)`
variant), minimally "give us a clean `RuntimeError` in
the parent instead of a `Fatal Python error` aborting
the child silently".
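Step 2 keeps the key registered so that lookup succeeds and the failure surfaces as a descriptive `NotImplementedError`. An unknown key, by contrast, fails as an invalid backend. The sketch below shows that routing. `SpawnMethodKey`, `_methods` and `subint_fork_proc` are named in the text. Their signatures, the placeholder `trio_proc()` body, and the `get_spawn_method()` helper are hypothetical and far simpler than tractor's real (async) spawn functions.

```python
from __future__ import annotations

from typing import Any, Callable, Literal

# hypothetical subset of the real literal
SpawnMethodKey = Literal['trio', 'subint_fork']

_ANALYSIS_DOC = (
    'ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md'
)


def subint_fork_proc(*args: Any, **kwargs: Any) -> None:
    # stub kept in-tree: always refuses, pointing at the analysis doc
    raise NotImplementedError(
        'the `subint_fork` backend is blocked at the CPython level '
        f'(fork from a non-main sub-interpreter); see {_ANALYSIS_DOC}'
    )


def trio_proc(*args: Any, **kwargs: Any) -> str:
    return 'spawned via trio'  # placeholder for a working backend


_methods: dict[SpawnMethodKey, Callable[..., Any]] = {
    'trio': trio_proc,
    'subint_fork': subint_fork_proc,
}


def get_spawn_method(key: str) -> Callable[..., Any]:
    # hypothetical lookup: only unknown keys are "invalid backend"
    try:
        return _methods[key]  # type: ignore[index]
    except KeyError:
        raise ValueError(f'invalid spawn backend: {key!r}') from None
```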
### Design discussion — main-interp-thread forkserver workaround
User proposed: set up a "subint forking server" that fork()s
on behalf of subint callers. Core insight: the CPython gate
is on `tstate->interp`, not thread identity, so **any thread
whose tstate is main-interp** can fork cleanly. A worker
thread attached to main-interp (never entering a subint)
satisfies the precondition.

Structurally this is `mp.forkserver` (which tractor already
has as `mp_forkserver`) but **in-process**: instead of a
separate Python subproc as the fork server, we'd put the
forkserver on a thread in the tractor parent process. Pros:
faster spawn (no IPC marshalling to external server + no
separate Python startup), inherits already-imported modules
for free. Cons: less crash isolation (forkserver failure
takes the whole process).

Required tractor-side refactor: move the root actor's
`trio.run()` off main-interp-main-thread (so main-thread can
run the forkserver loop). Nontrivial; approximately the same
magnitude as "Phase C".

The design would also not fully resolve the class-A
GIL-starvation issue because child actors' trio still runs
inside subints (legacy config, msgspec PEP 684 pending).
Would mitigate SIGINT-starvation specifically if signal
handling moves to the forkserver thread.

Recommended pre-commitment: a standalone CPython-only smoke
test validating the four assumptions the arch rests on,
before any tractor-side work.
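The in-process forkserver idea above can be sketched minimally with the stdlib. A dedicated main-interp worker thread owns every `os.fork()`, and other threads submit requests to it over a queue. `ThreadForkServer`, its `fork()` method and the "child callable returns an exit code" contract are hypothetical. A real design would `execv()` or run actor bootstrap in the child. This sketch involves no subints and is POSIX-only. CPython 3.12+ emits a `DeprecationWarning` when forking a multi-threaded process, as seen in the user's log above.

```python
from __future__ import annotations

import os
import queue
import threading
from typing import Callable


class ThreadForkServer:
    '''
    Hypothetical in-process forkserver: one worker thread,
    attached to the main interpreter, performs every fork.
    '''

    def __init__(self) -> None:
        self._requests: queue.Queue[
            tuple[Callable[[], int], queue.Queue[int | OSError]] | None
        ] = queue.Queue()
        self._thread = threading.Thread(
            target=self._serve,
            name='forkserver',
            daemon=True,
        )
        self._thread.start()

    def _serve(self) -> None:
        while (req := self._requests.get()) is not None:
            child_fn, reply = req
            try:
                pid = os.fork()
            except OSError as err:
                reply.put(err)
                continue
            if pid == 0:
                # CHILD: only this (forkserver) thread survives the
                # fork; run the payload and never return into the
                # parent's code paths.
                rc = 1
                try:
                    rc = int(child_fn() or 0)
                finally:
                    os._exit(rc)
            reply.put(pid)  # PARENT: hand the pid back to the caller

    def fork(self, child_fn: Callable[[], int]) -> int:
        '''Fork via the server thread; return the child's pid.'''
        reply: queue.Queue[int | OSError] = queue.Queue(maxsize=1)
        self._requests.put((child_fn, reply))
        result = reply.get(timeout=10)
        if isinstance(result, OSError):
            raise result
        return result

    def close(self) -> None:
        self._requests.put(None)
        self._thread.join(timeout=5)
```

Callers reap children themselves, e.g. `os.waitpid(server.fork(payload), 0)`. The server thread only forks, which keeps it from ever blocking on a child.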
### Smoke-test script drafted
Wrote `ai/conc-anal/subint_fork_from_main_thread_smoketest.py`:
argparse-driven, four scenarios (`control_subint_thread_fork`
reproducing the known-broken case, `main_thread_fork`
baseline, `worker_thread_fork` the architectural assertion,
`full_architecture` end-to-end with trio in a subint in the
forked child). No `tractor` imports; pure CPython +
`_interpreters` + `trio`. Bails cleanly on py<3.14. Pass/fail
banners per scenario.

User will validate on their py3.14 env next.
## Per-code-artifact provenance
### `tractor/spawn/_subint_fork.py` (new submod)
> `git show 797f57c -- tractor/spawn/_subint_fork.py`

`NotImplementedError` stub for the subint-fork backend. Module
docstring + fn docstring explain the attempt, the CPython
block, and why the stub is kept in-tree. No runtime behavior
beyond raising with a pointer at the conc-anal doc.
### `tractor/spawn/_spawn.py` (modified)
> `git log 26fb820..HEAD -- tractor/spawn/_spawn.py`
- Added `'subint_fork'` to `SpawnMethodKey` literal with a
block comment explaining the CPython-level block.
- Generalized the `case 'subint':` arm to `case 'subint' |
'subint_fork':` since both use the same py3.14+ gate.
- Registered `subint_fork_proc` in `_methods` with a
pointer-comment at the analysis doc.
### `tractor/spawn/_subint.py` (modified across session)
> `git log 26fb820..HEAD -- tractor/spawn/_subint.py`
- Tightened `_has_subints` gate: dual-requires public
`concurrent.interpreters` + private `_interpreters`
(tests for py3.14-or-newer on the public-API presence,
then uses the private one for legacy-config subints
because `msgspec` still blocks the public isolated mode
per jcrist/msgspec#563).
- Updated module docstring, `subint_proc()` docstring, and
gate-error messages to reflect the 3.14+ requirement and
the reason (py3.13 wedges under multi-trio usage even
though the private module exists there).
### `tractor/_testing/pytest.py` (modified)
> `git log 26fb820..HEAD -- tractor/_testing/pytest.py`
- New `skipon_spawn_backend(*start_methods, reason=...)`
pytest marker expanded into `pytest.mark.skip(reason=...)`
at collection time via
`pytest_collection_modifyitems()`.
- Implementation uses `item.iter_markers(name=...)` which
walks function + class + module scopes uniformly and
handles both `pytestmark = <single Mark>` and
`pytestmark = [mark, ...]` forms natively. ~30-LOC
single-loop refactor replacing a prior nested
conditional that had four bugs (see "Review" narrative
above).
- Added `pytest.Config` / `pytest.Function` /
`pytest.FixtureRequest` type annotations on fixture
signatures while touching the file.
### `pyproject.toml` (modified)
> `git log 26fb820..HEAD -- pyproject.toml`

Added `pytest-timeout>=2.3` to `testing` dep group with
comment pointing at the `ai/conc-anal/` docs.
### `tests/discovery/test_registrar.py`, `tests/test_subint_cancellation.py`, `tests/test_cancellation.py` (modified)
> `git log 26fb820..HEAD -- tests/`

Applied `@pytest.mark.timeout(30, method='thread')` on
known-hanging subint tests. Extended comments to cross-
reference the `ai/conc-anal/*.md` docs. `method='thread'`
is documented inline as load-bearing (`signal`-method
SIGALRM suffers the same GIL-starvation path that drops
SIGINT).
### `ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md` (new)
> `git show 797f57c -- ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md`

Third sibling doc under `conc-anal/`. Structure: TL;DR,
context ("what we tried"), symptom (the user's exact
`Fatal Python error` output), CPython source walkthrough
with excerpted snippets from `posixmodule.c` +
`pystate.c`, chain summary, definitive answer to Open
Question 1, `## Upstream-report draft (for CPython issue
tracker)` section with a two-tier ask, references.
### `ai/conc-anal/subint_fork_from_main_thread_smoketest.py` (new, THIS turn)
Zero-tractor-import smoke test for the proposed workaround
architecture. Four argparse-driven scenarios covering the
control case + baseline + arch-critical case + end-to-end.
Pass/fail banners per scenario; clean `--help` output;
py3.13 early-exit.
## Non-code output (verbatim)
### The `strace` signature that kicked off the CPython walkthrough
```
--- SIGINT {si_signo=SIGINT, si_code=SI_KERNEL} ---
write(16, "\2", 1) = -1 EAGAIN (Resource temporarily unavailable)
rt_sigreturn({mask=[WINCH]}) = 139801964688928
```
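Some context for reading the `strace` above. `write(16, "\2", 1)` is consistent with CPython's C-level signal handler writing the signal number (`SIGINT` == 2) as a single byte to the fd registered via `signal.set_wakeup_fd()`, which trio uses for its signal wakeups. `EAGAIN` then means that non-blocking fd's buffer was already full, i.e. earlier wakeup bytes were never drained. Reading that as GIL starvation is this log's interpretation; the snippet does not demonstrate it. The stdlib-only sketch below shows just the wakeup-fd mechanism. It uses `SIGUSR1` so it doesn't trip `KeyboardInterrupt`, is POSIX-only, and must run on the main thread. `wakeup_byte_for()` is a hypothetical helper name.

```python
from __future__ import annotations

import signal
import socket


def wakeup_byte_for(sig: signal.Signals = signal.SIGUSR1) -> bytes:
    '''
    Register a non-blocking socket as the signal wakeup fd,
    raise `sig` in-process and return the byte CPython's C
    handler wrote to it. Must be called from the main thread.
    '''
    rsock, wsock = socket.socketpair()
    # `set_wakeup_fd()` requires a non-blocking fd
    wsock.setblocking(False)
    rsock.setblocking(False)
    old_handler = signal.signal(sig, lambda signum, frame: None)
    old_fd = signal.set_wakeup_fd(wsock.fileno())
    try:
        # the C-level handler runs before `raise_signal()` returns,
        # writing one byte (the signal number) to `wsock`
        signal.raise_signal(sig)
        return rsock.recv(1)
    finally:
        signal.set_wakeup_fd(old_fd)
        signal.signal(sig, old_handler)
        rsock.close()
        wsock.close()
```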
### Key user quotes framing the direction
> ok actually we get this [fatal error] ... see if you can
> take a look at what's going on, in particular wrt to
> cpython's sources. pretty sure there's a local copy at
> ~/repos/cpython/

(Drove the CPython walkthrough that produced the
definitive refusal chain.)
> is there any reason we can't just sidestep this "must fork
> from main thread in main subint" issue by simply ensuring
> a "subint forking server" is always setup prior to
> invoking trio in a non-main-thread subint ...

(Drove the main-interp-thread-forkserver architectural
discussion + smoke-test script design.)
### CPython source tags for quick jump-back
```
Modules/posixmodule.c:728 PyOS_AfterFork_Child()
Modules/posixmodule.c:753 // Ideally we could guarantee tstate is running main.
Modules/posixmodule.c:778 status = _PyInterpreterState_DeleteExceptMain(runtime);
Python/pystate.c:1040 _PyInterpreterState_DeleteExceptMain()
Python/pystate.c:1044-1047 tstate->interp != main → PyStatus_ERR("not main interpreter")
```