Add WIP `subint_fork_proc` backend scaffold

Experimental third spawn backend: use a fresh
sub-interpreter purely as a trio-free launchpad from
which to `os.fork()` + exec back into
`python -m tractor._child`. Per issue #379's
"fork()-workaround/hacks" thread.

Intent is to sidestep both,
- the trio+fork hazards hitting `trio_proc` (python- trio/trio#1614 et
  al.), since the forking interp is guaranteed trio-free.

- the shared-GIL abandoned-thread hazards hitting `subint_proc`
  (`ai/conc-anal/subint_sigint_starvation_issue.md`), since we don't
  *stay* in the subint — it only lives long enough to call `os.fork()`

Downstream of the fork+exec, all the existing `trio_proc` plumbing is
reused verbatim: `ipc_server.wait_for_peer()`, `SpawnSpec`, `Portal`
yield, soft-kill.

Status: NOT wired up beyond scaffolding. The fn raises
`NotImplementedError` immediately; the `bootstrap` fork/exec string
builder and the `# TODO: orchestrate driver thread` block are kept
in-tree as deliberate dead code so the next iteration starts from
a concrete shape rather than a blank page.

Docstring calls out three open questions that need
empirical validation before wiring this up:
1. Does CPython permit `os.fork()` from a non-main
   legacy subint?
2. Can the child stay fork-without-exec and
   `trio.run()` directly from within the launchpad
   subint?
3. How do `signal.set_wakeup_fd()` handlers and other
   process-global state interact when the forking
   thread is inside a subint?

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
subint_forkserver_backend
Gud Boi 2026-04-22 13:32:39 -04:00
parent 4b2a0886c3
commit eee79a0357
1 changed files with 202 additions and 0 deletions

View File

@ -431,3 +431,205 @@ async def subint_proc(
finally: finally:
if not cancelled_during_spawn: if not cancelled_during_spawn:
actor_nursery._children.pop(uid, None) actor_nursery._children.pop(uid, None)
# ============================================================
# WIP PROTOTYPE — `subint_fork_proc`
# ============================================================
# Experimental: use a sub-interpreter purely as a launchpad
# from which to `os.fork()`, sidestepping the well-known
# trio+fork issues (python-trio/trio#1614 etc.) by guaranteeing
# the forking interp hasn't ever imported / run `trio`.
#
# The current `tractor.spawn._trio` backend already spawns a
# subprocess and has the child connect back to the parent
# over IPC. THIS prototype only changes *how* the subproc
# comes into existence — everything downstream (parent-side
# `ipc_server.wait_for_peer()`, `SpawnSpec`, `Portal` yield,
# soft-kill) is reused verbatim.
#
# Reference: issue #379's "Our own thoughts, ideas for
# fork()-workaround/hacks..." section.
# ============================================================
async def subint_fork_proc(
name: str,
actor_nursery: ActorNursery,
subactor: Actor,
errors: dict[tuple[str, str], Exception],
# passed through to actor main
bind_addrs: list[UnwrappedAddress],
parent_addr: UnwrappedAddress,
_runtime_vars: dict[str, Any],
*,
infect_asyncio: bool = False,
task_status: TaskStatus[Portal] = trio.TASK_STATUS_IGNORED,
proc_kwargs: dict[str, any] = {},
) -> None:
'''
EXPERIMENTAL / WIP: `trio`-safe `fork()` via a pristine
sub-interpreter launchpad.
Core trick
----------
Create a fresh subint that has *never* imported `trio`.
From a worker thread, drive that subint to call
`os.fork()`. In the forked CHILD process, `exec()` back
into `python -m tractor._child` (a fresh process). In the
fork PARENT (still inside the launchpad subint), do
nothing just let the subint's `exec` call return and
the worker thread exit. The parent-side trio task then
waits for the child process to connect back using the
same `ipc_server.wait_for_peer()` flow as `trio_proc`.
Why this matters
----------------
The existing `trio_proc` backend spawns a subprocess via
`trio.lowlevel.open_process()` which ultimately uses
`posix_spawn()` (or `fork+exec`) from the parent's main
interpreter the one running `trio.run()`. That path is
affected by the trio+fork issues tracked in
python-trio/trio#1614 and related, some of which are
side-stepped only incidentally because we always `exec()`
immediately after fork.
By forking from a pristine subint instead, we have a
known-clean-of-trio fork parent. If we later want to try
**fork-without-exec** for faster startup and automatic
parent-`__main__` inheritance (the property `mp.fork`
gives for free), this approach could unlock that cleanly.
Relationship to the other backends
----------------------------------
- `trio_proc`: fork/exec from main interp affected by
trio+fork issues, solved via immediate exec.
- `subint_proc`: in-process subint, no fork at all
affected by shared-GIL abandoned-thread hazards (see
`ai/conc-anal/subint_sigint_starvation_issue.md`).
- `subint_fork_proc` (THIS): OS-level subproc (like
`trio_proc`) BUT forked from a trio-free subint
avoids both issue-classes above, at the cost of an
extra subint create/destroy per spawn.
Status
------
**NOT IMPLEMENTED** beyond the bootstrap scaffolding
below. Open questions needing empirical validation:
1. Does CPython allow `os.fork()` from a non-main
sub-interpreter under the legacy config? The public
API is silent; there may be PEP 684 safety guards.
2. Does the forked child need to fully `exec()` or can
we stay fork-without-exec and `trio.run()` directly
from within the launchpad subint in the child? The
latter is the "interesting" mode faster startup,
`__main__` inheritance but opens the question of
what residual state from the parent's main interp
leaks into the child's subint.
3. How do `signal.set_wakeup_fd()`, installed signal
handlers, and other process-global state interact
when the forking thread is inside a subint? The
child presumably inherits them but a fresh
`trio.run()` resets what it cares about.
'''
if not _has_subints:
raise RuntimeError(
f'The {"subint_fork"!r} spawn backend requires '
f'Python 3.14+ (private stdlib `_interpreters` C '
f'module + tractor-usage stability).\n'
f'Current runtime: {sys.version}'
)
raise NotImplementedError(
'`subint_fork_proc` is a WIP prototype scaffold — '
'the driver thread + fork-bootstrap + connect-back '
'orchestration below is not yet wired up. See '
'issue #379 for context.\n'
'(Structure kept in-tree so the next iteration has '
'a concrete starting point rather than a blank page.)'
)
# ------------------------------------------------------------
# SKETCH (below is intentionally dead code; kept so reviewers
# can see the shape we'd plausibly build up to). Roughly
# mirrors `subint_proc` structure but WITHOUT the in-process
# subint lifetime management — the subint only lives long
# enough to call `os.fork()`.
# ------------------------------------------------------------
# Create the launchpad subint. Legacy config matches
# `subint_proc`'s reasoning (msgspec / PEP 684). For
# fork-via-subint, isolation is moot since we don't
# *stay* in the subint — we just need it trio-free.
interp_id: int = _interpreters.create('legacy')
log.runtime(
f'Created launchpad subint for fork-spawn\n'
f'(>\n'
f' |_interp_id={interp_id}\n'
)
uid: tuple[str, str] = subactor.aid.uid
loglevel: str | None = subactor.loglevel
# Bootstrap fires inside the launchpad subint on a
# worker OS-thread. Calls `os.fork()`. In the child,
# `execv` back into the existing `python -m tractor._child`
# CLI entry — which is what `trio_proc` already uses — so
# the connect-back dance is identical. In the fork-parent
# (still in the launchpad subint), return so the thread
# can exit and we can `_interpreters.destroy()` the
# launchpad.
#
# NOTE, `os.execv()` replaces the entire process image
# (all interps, all threads — CPython handles this at the
# OS level), so subint cleanup in the child is a no-op.
import shlex
uid_repr: str = repr(str(uid))
parent_addr_repr: str = repr(str(parent_addr))
bootstrap: str = (
'import os, sys\n'
'pid = os.fork()\n'
'if pid == 0:\n'
' # CHILD: full `exec` into fresh Python for\n'
' # maximum isolation. (A `fork`-without-exec\n'
' # variant would skip this and call\n'
' # `_actor_child_main` directly — see class\n'
' # docstring "Open question 2".)\n'
' os.execv(\n'
' sys.executable,\n'
' [\n'
' sys.executable,\n'
" '-m',\n"
" 'tractor._child',\n"
f' {shlex.quote("--uid")!r},\n'
f' {uid_repr},\n'
f' {shlex.quote("--parent_addr")!r},\n'
f' {parent_addr_repr},\n'
+ (
f' {shlex.quote("--loglevel")!r},\n'
f' {loglevel!r},\n'
if loglevel else ''
)
+ (
f' {shlex.quote("--asyncio")!r},\n'
if infect_asyncio else ''
)
+ ' ],\n'
' )\n'
'# FORK-PARENT branch falls through — we just want\n'
'# the launchpad subint to finish so the driver\n'
'# thread exits.\n'
)
# TODO: orchestrate driver thread (mirror `subint_proc`'s
# `_subint_target` pattern), then await
# `ipc_server.wait_for_peer(uid)` on the parent side —
# same as `trio_proc`. Soft-kill path is simpler here
# than in `subint_proc`: we're managing an OS subproc,
# not a legacy subint, so `Portal.cancel_actor()` + wait
# + OS-level `SIGKILL` fallback (like `trio_proc`'s
# `hard_kill()`) applies directly.