From eee79a0357f81ff20790468cc766442ff1e04f32 Mon Sep 17 00:00:00 2001 From: goodboy Date: Wed, 22 Apr 2026 13:32:39 -0400 Subject: [PATCH] Add WIP `subint_fork_proc` backend scaffold MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Experimental third spawn backend: use a fresh sub-interpreter purely as a trio-free launchpad from which to `os.fork()` + exec back into `python -m tractor._child`. Per issue #379's "fork()-workaround/hacks" thread. Intent is to sidestep both, - the trio+fork hazards hitting `trio_proc` (python- trio/trio#1614 et al.), since the forking interp is guaranteed trio-free. - the shared-GIL abandoned-thread hazards hitting `subint_proc` (`ai/conc-anal/subint_sigint_starvation_issue.md`), since we don't *stay* in the subint — it only lives long enough to call `os.fork()` Downstream of the fork+exec, all the existing `trio_proc` plumbing is reused verbatim: `ipc_server.wait_for_peer()`, `SpawnSpec`, `Portal` yield, soft-kill. Status: NOT wired up beyond scaffolding. The fn raises `NotImplementedError` immediately; the `bootstrap` fork/exec string builder and the `# TODO: orchestrate driver thread` block are kept in-tree as deliberate dead code so the next iteration starts from a concrete shape rather than a blank page. Docstring calls out three open questions that need empirical validation before wiring this up: 1. Does CPython permit `os.fork()` from a non-main legacy subint? 2. Can the child stay fork-without-exec and `trio.run()` directly from within the launchpad subint? 3. How do `signal.set_wakeup_fd()` handlers and other process-global state interact when the forking thread is inside a subint? (this patch was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code --- tractor/spawn/_subint.py | 202 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 202 insertions(+) diff --git a/tractor/spawn/_subint.py b/tractor/spawn/_subint.py index a521ad21..c9e5e686 100644 --- a/tractor/spawn/_subint.py +++ b/tractor/spawn/_subint.py @@ -431,3 +431,205 @@ async def subint_proc( finally: if not cancelled_during_spawn: actor_nursery._children.pop(uid, None) + + +# ============================================================ +# WIP PROTOTYPE — `subint_fork_proc` +# ============================================================ +# Experimental: use a sub-interpreter purely as a launchpad +# from which to `os.fork()`, sidestepping the well-known +# trio+fork issues (python-trio/trio#1614 etc.) by guaranteeing +# the forking interp hasn't ever imported / run `trio`. +# +# The current `tractor.spawn._trio` backend already spawns a +# subprocess and has the child connect back to the parent +# over IPC. THIS prototype only changes *how* the subproc +# comes into existence — everything downstream (parent-side +# `ipc_server.wait_for_peer()`, `SpawnSpec`, `Portal` yield, +# soft-kill) is reused verbatim. +# +# Reference: issue #379's "Our own thoughts, ideas for +# fork()-workaround/hacks..." section. +# ============================================================ + + +async def subint_fork_proc( + name: str, + actor_nursery: ActorNursery, + subactor: Actor, + errors: dict[tuple[str, str], Exception], + + # passed through to actor main + bind_addrs: list[UnwrappedAddress], + parent_addr: UnwrappedAddress, + _runtime_vars: dict[str, Any], + *, + infect_asyncio: bool = False, + task_status: TaskStatus[Portal] = trio.TASK_STATUS_IGNORED, + proc_kwargs: dict[str, any] = {}, + +) -> None: + ''' + EXPERIMENTAL / WIP: `trio`-safe `fork()` via a pristine + sub-interpreter launchpad. + + Core trick + ---------- + Create a fresh subint that has *never* imported `trio`. + From a worker thread, drive that subint to call + `os.fork()`. In the forked CHILD process, `exec()` back + into `python -m tractor._child` (a fresh process). In the + fork PARENT (still inside the launchpad subint), do + nothing — just let the subint's `exec` call return and + the worker thread exit. The parent-side trio task then + waits for the child process to connect back using the + same `ipc_server.wait_for_peer()` flow as `trio_proc`. + + Why this matters + ---------------- + The existing `trio_proc` backend spawns a subprocess via + `trio.lowlevel.open_process()` which ultimately uses + `posix_spawn()` (or `fork+exec`) from the parent's main + interpreter — the one running `trio.run()`. That path is + affected by the trio+fork issues tracked in + python-trio/trio#1614 and related, some of which are + side-stepped only incidentally because we always `exec()` + immediately after fork. + + By forking from a pristine subint instead, we have a + known-clean-of-trio fork parent. If we later want to try + **fork-without-exec** for faster startup and automatic + parent-`__main__` inheritance (the property `mp.fork` + gives for free), this approach could unlock that cleanly. + + Relationship to the other backends + ---------------------------------- + - `trio_proc`: fork/exec from main interp → affected by + trio+fork issues, solved via immediate exec. + - `subint_proc`: in-process subint, no fork at all → + affected by shared-GIL abandoned-thread hazards (see + `ai/conc-anal/subint_sigint_starvation_issue.md`). + - `subint_fork_proc` (THIS): OS-level subproc (like + `trio_proc`) BUT forked from a trio-free subint → + avoids both issue-classes above, at the cost of an + extra subint create/destroy per spawn. + + Status + ------ + **NOT IMPLEMENTED** beyond the bootstrap scaffolding + below. Open questions needing empirical validation: + + 1. Does CPython allow `os.fork()` from a non-main + sub-interpreter under the legacy config? The public + API is silent; there may be PEP 684 safety guards. + 2. Does the forked child need to fully `exec()` or can + we stay fork-without-exec and `trio.run()` directly + from within the launchpad subint in the child? The + latter is the "interesting" mode — faster startup, + `__main__` inheritance — but opens the question of + what residual state from the parent's main interp + leaks into the child's subint. + 3. How do `signal.set_wakeup_fd()`, installed signal + handlers, and other process-global state interact + when the forking thread is inside a subint? The + child presumably inherits them but a fresh + `trio.run()` resets what it cares about. + + ''' + if not _has_subints: + raise RuntimeError( + f'The {"subint_fork"!r} spawn backend requires ' + f'Python 3.14+ (private stdlib `_interpreters` C ' + f'module + tractor-usage stability).\n' + f'Current runtime: {sys.version}' + ) + + raise NotImplementedError( + '`subint_fork_proc` is a WIP prototype scaffold — ' + 'the driver thread + fork-bootstrap + connect-back ' + 'orchestration below is not yet wired up. See ' + 'issue #379 for context.\n' + '(Structure kept in-tree so the next iteration has ' + 'a concrete starting point rather than a blank page.)' + ) + + # ------------------------------------------------------------ + # SKETCH (below is intentionally dead code; kept so reviewers + # can see the shape we'd plausibly build up to). Roughly + # mirrors `subint_proc` structure but WITHOUT the in-process + # subint lifetime management — the subint only lives long + # enough to call `os.fork()`. + # ------------------------------------------------------------ + + # Create the launchpad subint. Legacy config matches + # `subint_proc`'s reasoning (msgspec / PEP 684). For + # fork-via-subint, isolation is moot since we don't + # *stay* in the subint — we just need it trio-free. + interp_id: int = _interpreters.create('legacy') + log.runtime( + f'Created launchpad subint for fork-spawn\n' + f'(>\n' + f' |_interp_id={interp_id}\n' + ) + + uid: tuple[str, str] = subactor.aid.uid + loglevel: str | None = subactor.loglevel + + # Bootstrap fires inside the launchpad subint on a + # worker OS-thread. Calls `os.fork()`. In the child, + # `execv` back into the existing `python -m tractor._child` + # CLI entry — which is what `trio_proc` already uses — so + # the connect-back dance is identical. In the fork-parent + # (still in the launchpad subint), return so the thread + # can exit and we can `_interpreters.destroy()` the + # launchpad. + # + # NOTE, `os.execv()` replaces the entire process image + # (all interps, all threads — CPython handles this at the + # OS level), so subint cleanup in the child is a no-op. + import shlex + uid_repr: str = repr(str(uid)) + parent_addr_repr: str = repr(str(parent_addr)) + bootstrap: str = ( + 'import os, sys\n' + 'pid = os.fork()\n' + 'if pid == 0:\n' + ' # CHILD: full `exec` into fresh Python for\n' + ' # maximum isolation. (A `fork`-without-exec\n' + ' # variant would skip this and call\n' + ' # `_actor_child_main` directly — see class\n' + ' # docstring "Open question 2".)\n' + ' os.execv(\n' + ' sys.executable,\n' + ' [\n' + ' sys.executable,\n' + " '-m',\n" + " 'tractor._child',\n" + f' {shlex.quote("--uid")!r},\n' + f' {uid_repr},\n' + f' {shlex.quote("--parent_addr")!r},\n' + f' {parent_addr_repr},\n' + + ( + f' {shlex.quote("--loglevel")!r},\n' + f' {loglevel!r},\n' + if loglevel else '' + ) + + ( + f' {shlex.quote("--asyncio")!r},\n' + if infect_asyncio else '' + ) + + ' ],\n' + ' )\n' + '# FORK-PARENT branch falls through — we just want\n' + '# the launchpad subint to finish so the driver\n' + '# thread exits.\n' + ) + + # TODO: orchestrate driver thread (mirror `subint_proc`'s + # `_subint_target` pattern), then await + # `ipc_server.wait_for_peer(uid)` on the parent side — + # same as `trio_proc`. Soft-kill path is simpler here + # than in `subint_proc`: we're managing an OS subproc, + # not a legacy subint, so `Portal.cancel_actor()` + wait + # + OS-level `SIGKILL` fallback (like `trio_proc`'s + # `hard_kill()`) applies directly.