Compare commits

...

5 Commits

Author SHA1 Message Date
Gud Boi 4425023500 Label forkserver child as `subint_forkserver`
Follow-up to 72d1b901 (was prev commit adding `debug_mode` for
`subint_forkserver`): that commit wired the runtime-side
`subint_forkserver` SpawnSpec-recv gate in `Actor._from_parent`, but the
`subint_forkserver_proc` child-target was still passing
`spawn_method='trio'` to `_trio_main` — so `Actor.pformat()` / log lines
would report the subactor as plain `'trio'` instead of the actual
parent-side spawn mechanism. Flip the label to `'subint_forkserver'`.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-23 11:43:06 -04:00
Gud Boi 72d1b9016a Enable `debug_mode` for `subint_forkserver`
The `subint_forkserver` backend's child runtime is trio-native (uses
`_trio_main` + receives `SpawnSpec` over IPC just like `trio`/`subint`),
so `tractor.devx.debug._tty_lock` works in those subactors. Wire the
runtime gates that historically hard-coded `_spawn_method == 'trio'` to
recognize this third backend.

Deats,
- new `_DEBUG_COMPATIBLE_BACKENDS` module-const in `tractor._root`
  listing the spawn backends whose subactor runtime is trio-native
  (`'trio'`, `'subint_forkserver'`). Both the enable-site
  (`_runtime_vars['_debug_mode'] = True`) and the cleanup-site
  reset key off the same tuple — keep them in lockstep when
  adding backends.
- `open_root_actor`'s `RuntimeError` for unsupported backends now
  reports the full compatible-set + the rejected method instead of the
  stale "only `trio`" msg.
- `runtime._runtime.Actor._from_parent`'s SpawnSpec-recv gate adds
  `'subint_forkserver'` to the existing `('trio', 'subint')` tuple
  — fork child-side runtime receives the same SpawnSpec IPC handshake as
  the others.
- `subint_forkserver_proc` child-target now passes
  `spawn_method='subint_forkserver'` (was hard-coded `'trio'`) so
  `Actor.pformat()` / log lines reflect the actual parent-side spawn
  mechanism rather than masquerading as plain `trio`.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-23 11:39:42 -04:00
Gud Boi 4227eabcba Drop unneeded f-str prefixes 2026-04-23 11:04:10 -04:00
Gud Boi af94080b21 Shorten some timeouts in `subint_forkserver` suites 2026-04-23 11:01:56 -04:00
Gud Boi 36cca96385 Refine `subint_forkserver` orphan-SIGINT diagnosis
Empirical follow-up to the xfail'd orphan-SIGINT test:
the hang is **not** "trio can't install a handler on a
non-main thread" (the original hypothesis from the
`child_sigint` scaffold commit). On py3.14:

- `threading.current_thread() is threading.main_thread()`
  IS True post-fork — CPython re-designates the
  fork-inheriting thread as "main" correctly
- trio's `KIManager` SIGINT handler IS installed in the
  subactor (`signal.getsignal(SIGINT)` confirms)
- the kernel DOES deliver SIGINT to the thread

But `faulthandler` dumps show the subactor wedged in
`trio/_core/_io_epoll.py::get_events` — trio's
wakeup-fd mechanism (which turns SIGINT into an epoll-wake)
isn't firing. So the `except KeyboardInterrupt` at
`tractor/spawn/_entry.py::_trio_main:164` — the runtime's
intentional "KBI-as-OS-cancel" path — never fires.

Deats,
- new `ai/conc-anal/subint_forkserver_orphan_sigint_hang_issue.md`
  (+385 LOC): full writeup — TL;DR, symptom reproducer,
  the "intentional cancel path" the bug defeats,
  diagnostic evidence (`faulthandler` output +
  `getsignal` probe), ruled-out hypotheses
  (non-main-thread issue, wakeup-fd inheritance,
  KBI-as-trio-check-exception), and fix directions
- `test_orphaned_subactor_sigint_cleanup_DRAFT` xfail
  `reason` + test docstring rewritten to match the
  refined understanding — old wording blamed the
  non-main-thread path, new wording points at the
  `epoll_wait` wedge + cross-refs the new conc-anal doc
- `_subint_forkserver` module docstring's
  `child_sigint='trio'` bullet updated: now notes trio's
  handler is already correctly installed, so the flag may
  end up a no-op / doc-only mode once the real root cause
  is fixed

Closing the gap aligns with existing design intent (making
the already-designed "KBI-as-OS-cancel" behavior actually
fire); it is not a new feature.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-23 09:31:32 -04:00
5 changed files with 478 additions and 41 deletions

View File

@ -0,0 +1,385 @@
# `subint_forkserver` backend: orphaned-subactor SIGINT wedged in `epoll_wait`
Follow-up to the Phase C `subint_forkserver` spawn-backend
PR (see `tractor.spawn._subint_forkserver`, issue #379).
Surfaced by the xfail'd
`tests/spawn/test_subint_forkserver.py::test_orphaned_subactor_sigint_cleanup_DRAFT`.
Related-but-distinct from
`subint_cancel_delivery_hang_issue.md` (orphaned-channel
park AFTER subint teardown) and
`subint_sigint_starvation_issue.md` (GIL-starvation,
SIGINT never delivered): here the SIGINT IS delivered,
trio's handler IS installed, but trio's event loop never
wakes — so the KBI-at-checkpoint → `_trio_main` catch path
(which is the runtime's *intentional* OS-cancel design)
never fires.
## TL;DR
When a `subint_forkserver`-spawned subactor is orphaned
(parent `SIGKILL`'d, no IPC cancel path available) and then
externally `SIGINT`'d, the subactor hangs in
`trio/_core/_io_epoll.py::get_events` (epoll_wait)
indefinitely — even though:
1. `threading.current_thread() is threading.main_thread()`
post-fork (CPython 3.14 re-designates correctly).
2. Trio's SIGINT handler IS installed in the subactor
(`signal.getsignal(SIGINT)` returns
`<function KIManager.install.<locals>.handler at 0x...>`).
3. The kernel does deliver SIGINT — the signal arrives at
the only thread in the process (the fork-inherited
worker which IS now "main" per Python).
Yet `epoll_wait` does not return. Trio's wakeup-fd mechanism
— the machinery that turns SIGINT into an epoll-wake — is
somehow not firing the wakeup. Until that's fixed, the
intentional "KBI-as-OS-cancel" path in
`tractor/spawn/_entry.py::_trio_main:164` is unreachable
for forkserver-spawned subactors whose parent dies.
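As background, the wakeup-fd pipeline this bug defeats can be exercised in isolation with just the stdlib. A minimal sketch, assuming Linux and a main-thread caller (trio's `KIManager` wires up the equivalent internally):

```python
import os, select, signal

# CPython's C-level signal trampoline writes the signal number
# (as one byte) into the wakeup fd; an epoll instance watching
# that fd turns the write into an event-loop wake.
r, w = os.pipe()
os.set_blocking(w, False)  # `set_wakeup_fd` requires a non-blocking fd
signal.set_wakeup_fd(w)
signal.signal(signal.SIGINT, lambda *a: None)  # keep Python-level KBI out of the way

ep = select.epoll()
ep.register(r, select.EPOLLIN)

os.kill(os.getpid(), signal.SIGINT)
events = ep.poll(timeout=2.0)  # returns promptly: the fd woke epoll
signum = os.read(r, 1)[0]
print('woken:', bool(events), 'signum:', signum)  # woken: True signum: 2
```

This is the kernel → handler → wakeup-fd → epoll-wake chain that the stuck subactor never completes.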
## Symptom
Test: `tests/spawn/test_subint_forkserver.py::test_orphaned_subactor_sigint_cleanup_DRAFT`
(currently marked `@pytest.mark.xfail(strict=True)`).
1. Harness subprocess brings up a tractor root actor +
one `run_in_actor(_sleep_forever)` subactor via
`try_set_start_method('subint_forkserver')`.
2. Harness prints `CHILD_PID` (subactor) and
`PARENT_READY` (root actor) markers to stdout.
3. Test `os.kill(parent_pid, SIGKILL)` + `proc.wait()`
to fully reap the root-actor harness.
4. Child (now reparented to pid 1) is still alive.
5. Test `os.kill(child_pid, SIGINT)` and polls
`os.kill(child_pid, 0)` for up to 10s.
6. **Observed**: the child is still alive at deadline —
SIGINT did not unwedge the trio loop.
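Step 5's existence-poll pattern can be sketched as a standalone helper (hypothetical `wait_for_exit`, not the test's actual code):

```python
import os, time

def wait_for_exit(pid: int, timeout: float = 10.0) -> bool:
    # `os.kill(pid, 0)` delivers no signal; it only probes
    # whether `pid` still names a live (unreaped) process.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            os.kill(pid, 0)
        except ProcessLookupError:
            return True   # process is gone
        time.sleep(0.1)
    return False          # still alive at the deadline
```

The observed failure is this helper returning `False` for the SIGINT'd orphan.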
## What the "intentional" cancel path IS
`tractor/spawn/_entry.py::_trio_main:157-186`
```python
try:
    if infect_asyncio:
        actor._infected_aio = True
        run_as_asyncio_guest(trio_main)
    else:
        trio.run(trio_main)

except KeyboardInterrupt:
    logmeth = log.cancel
    exit_status: str = (
        'Actor received KBI (aka an OS-cancel)\n'
        ...
    )
```
The "KBI == OS-cancel" mapping IS the runtime's
deliberate, documented design. An OS-level SIGINT should
flow as: kernel → trio handler → KBI at trio checkpoint
→ unwinds `async_main` → surfaces at `_trio_main`'s
`except KeyboardInterrupt:` → `log.cancel` + clean `rc=0`.
**So fixing this hang is not "add a new SIGINT behavior" —
it's "make the existing designed behavior actually fire in
this backend config".** That's why option (B) ("fix root
cause") is aligned with existing design intent, not a
scope expansion.
## Evidence
### Positive control: standalone fork-from-worker + `trio.run(sleep_forever)` + SIGINT WORKS
```python
import os, signal, time, trio
from tractor.spawn._subint_forkserver import (
    fork_from_worker_thread, wait_child,
)

def child_target() -> int:
    async def _main():
        try:
            await trio.sleep_forever()
        except KeyboardInterrupt:
            print('CHILD: caught KBI — trio SIGINT works!')
            return

    trio.run(_main)
    return 0

pid = fork_from_worker_thread(child_target, thread_name='trio-sigint-test')
time.sleep(1.0)
os.kill(pid, signal.SIGINT)
wait_child(pid)
```
Result: `CHILD: caught KBI — trio SIGINT works!` + clean
exit. So the fork-child + trio signal plumbing IS healthy
in isolation. The hang appears only with the full tractor
subactor runtime on top.
### Negative test: full tractor subactor + orphan-SIGINT
Equivalent to the xfail test. Traceback dump via
`faulthandler.register(SIGUSR1, all_threads=True)` at the
stuck moment:
```
Current thread 0x00007... [subint-forkserv] (most recent call first):
File ".../trio/_core/_io_epoll.py", line 245 in get_events
File ".../trio/_core/_run.py", line 2415 in run
File "tractor/spawn/_entry.py", line 162 in _trio_main
File "tractor/_child.py", line 72 in _actor_child_main
File "tractor/spawn/_subint_forkserver.py", line 650 in _child_target
File "tractor/spawn/_subint_forkserver.py", line 308 in _worker
File ".../threading.py", line 1024 in run
```
### Thread + signal-mask inventory of the stuck subactor
Single thread (`tid == pid`, comm `'subint-forkserv'`,
which IS `threading.main_thread()` post-fork):
```
SigBlk: 0000000000000000 # nothing blocked
SigIgn: 0000000001001000 # SIGPIPE etc (Python defaults)
SigCgt: 0000000108000202 # bit 1 = SIGINT caught
```
Bit 1 set in `SigCgt` → SIGINT handler IS installed. So
trio's handler IS in place at the kernel level — not a
"handler missing" situation.
### Handler identity
Inside the subactor's RPC body, `signal.getsignal(SIGINT)`
returns `<function KIManager.install.<locals>.handler at
0x...>` — trio's own `KIManager` handler. tractor's only
SIGINT touches are `signal.getsignal()` *reads* (to stash
into `debug.DebugStatus._trio_handler`); nothing writes
over trio's handler outside the debug-REPL shielding path
(`devx/debug/_tty_lock.py::shield_sigint`) which isn't
engaged here (no debug_mode).
## Ruled out
- **GIL starvation / signal-pipe-full** (class A,
`subint_sigint_starvation_issue.md`): subactor runs on
its own GIL (separate OS process), not sharing with the
parent → no cross-process GIL contention. And `strace`-
equivalent in the signal mask shows SIGINT IS caught,
not queued.
- **Orphaned channel park** (`subint_cancel_delivery_hang_issue.md`):
different failure mode — that one has trio iterating
normally and getting wedged on an orphaned
`chan.recv()` AFTER teardown. Here trio's event loop
itself never wakes.
- **Tractor explicitly catching + swallowing KBI**:
greppable — the one `except KeyboardInterrupt:` in the
runtime is the INTENTIONAL cancel-path catch at
`_trio_main:164`. `async_main` uses `except Exception`
(not BaseException), so KBI should propagate through
cleanly if it ever fires.
- **Missing `signal.set_wakeup_fd` (main-thread
restriction)**: post-fork, the fork-worker thread IS
`threading.main_thread()`, so trio's main-thread check
passes and its wakeup-fd install should succeed.
## Root cause hypothesis (unverified)
The SIGINT handler fires but trio's wakeup-fd write does
not wake `epoll_wait`. Candidate causes, ranked by
plausibility:
1. **Wakeup-fd lifecycle race around tractor IPC setup.**
`async_main` spins up an IPC server + `process_messages`
loops early. Somewhere in that path the wakeup-fd that
trio registered with its epoll instance may be
closed/replaced/clobbered, so subsequent SIGINT writes
land on an fd that's no longer in the epoll set.
Evidence needed: compare
`signal.set_wakeup_fd(-1)` return value inside a
post-tractor-bringup RPC body vs. a pre-bringup
equivalent. If they differ, that's it.
2. **Shielded cancel scope around `process_messages`.**
The RPC message loop is likely wrapped in a trio cancel
scope; if that scope is `shield=True` at any outer
layer, KBI scheduled at a checkpoint could be absorbed
by the shield and never bubble out to `_trio_main`.
3. **Pre-fork wakeup-fd inheritance.** trio in the PARENT
process registered a wakeup-fd with its own epoll. The
child inherits the fd number but not the parent's
epoll instance — if tractor/trio re-uses the parent's
stale fd number anywhere, writes would go to a no-op
fd. (This is the least likely — `trio.run()` on the
child calls `KIManager.install` which should install a
fresh wakeup-fd from scratch.)
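Hypotheses (1) and (3) share a mechanism the stdlib can demonstrate: an epoll registration follows the open file description, not the fd number, so a wakeup fd that gets closed and its number reused silently stops waking the loop. A sketch, assuming Linux:

```python
import os, select

r1, w1 = os.pipe()
ep = select.epoll()
ep.register(r1, select.EPOLLIN)

os.close(r1)        # the registration dies with the file description
os.close(w1)
r2, w2 = os.pipe()  # the kernel hands back the lowest free fd numbers
print('fd numbers reused:', (r2, w2) == (r1, w1))

os.write(w2, b'x')
events = ep.poll(timeout=0.2)
print('events:', events)  # [] since the write never reaches the epoll set
```

If something in tractor's IPC bringup clobbers the fd trio registered, SIGINT writes would land exactly like this: on a number epoll no longer watches.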
## Cross-backend scope question
**Untested**: does the same orphan-SIGINT hang reproduce
against the `trio_proc` backend (stock subprocess + exec)?
If yes → pre-existing tractor bug, independent of
`subint_forkserver`. If no → something specific to the
fork-from-worker path (e.g. inherited fds, mid-epoll-setup
interference).
**Quick repro for trio_proc**:
```python
# save as /tmp/trio_proc_orphan_sigint_repro.py
import os, sys, signal, time, glob
import subprocess as sp

SCRIPT = '''
import os, sys, trio, tractor

async def _sleep_forever():
    print(f"CHILD_PID={os.getpid()}", flush=True)
    await trio.sleep_forever()

async def _main():
    async with (
        tractor.open_root_actor(registry_addrs=[("127.0.0.1", 12350)]),
        tractor.open_nursery() as an,
    ):
        await an.run_in_actor(_sleep_forever, name="sf-child")
        print(f"PARENT_READY={os.getpid()}", flush=True)
        await trio.sleep_forever()

trio.run(_main)
'''

proc = sp.Popen(
    [sys.executable, '-c', SCRIPT],
    stdout=sp.PIPE, stderr=sp.STDOUT,
)
# parse CHILD_PID + PARENT_READY off proc.stdout ...
# SIGKILL parent, SIGINT child, poll.
```
If that hangs too, open a broader issue; if not, this is
`subint_forkserver`-specific (likely fd-inheritance-related).
## Why this is ours to fix (not CPython's)
- Signal IS delivered (`SigCgt` bitmask confirms).
- Handler IS installed (trio's `KIManager`).
- Thread identity is correct post-fork.
- `_trio_main` already has the intentional KBI→clean-exit
path waiting to fire.
Every CPython-level precondition is met. Something in
tractor's runtime or trio's integration with it is
breaking the SIGINT→wakeup→event-loop-wake pipeline.
## Possible fix directions
1. **Audit the wakeup-fd across tractor's IPC bringup.**
Add a trio startup hook that captures
`signal.set_wakeup_fd(-1)` at `_trio_main` entry,
after `async_main` enters, and periodically — assert
it's unchanged. If it moves, track down the writer.
2. **Explicit `signal.set_wakeup_fd` reset after IPC
setup.** Brute force: re-install a fresh wakeup-fd
mid-bringup. Band-aid, but fast to try.
3. **Ensure no `shield=True` cancel scope envelopes the
RPC-message-loop / IPC-server task.** If one does,
KBI-at-checkpoint never escapes.
4. **Once fixed, the `child_sigint='trio'` mode on
`subint_forkserver_proc`** becomes effectively a no-op
or a doc-only mode — trio's natural handler already
does the right thing. Might end up removing the flag
entirely if there's no behavioral difference between
modes.
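Direction (1)'s snapshot could use a set-then-restore round trip, since `signal.set_wakeup_fd()` returns the previously-installed fd. A sketch (hypothetical `probe_wakeup_fd` helper; must run on the main thread):

```python
import signal

def probe_wakeup_fd() -> int:
    # `set_wakeup_fd` returns the previously-set fd, so a
    # set-then-restore round trip reads the current value
    # without disturbing it.
    prev = signal.set_wakeup_fd(-1)
    if prev != -1:
        signal.set_wakeup_fd(prev)
    return prev

print('wakeup fd:', probe_wakeup_fd())  # -1 when none is installed
```

Capturing this at `_trio_main` entry, after `async_main` starts, and inside an RPC body would confirm or rule out the clobbered-fd hypothesis.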
## Current workaround
None; `child_sigint` defaults to `'ipc'` (IPC cancel is
the only reliable cancel path today), and the xfail test
documents the gap. Operators hitting orphan-SIGINT get a
hung process that needs `SIGKILL`.
## Reproducer
Inline, standalone (no pytest):
```python
# save as /tmp/orphan_sigint_repro.py (py3.14+)
import os, sys, signal, time, glob, trio
import tractor
from tractor.spawn._subint_forkserver import (
    fork_from_worker_thread,
)

async def _sleep_forever():
    print(f'SUBACTOR[{os.getpid()}]', flush=True)
    await trio.sleep_forever()

async def _main():
    async with (
        tractor.open_root_actor(
            registry_addrs=[('127.0.0.1', 12349)],
        ),
        tractor.open_nursery() as an,
    ):
        await an.run_in_actor(_sleep_forever, name='sf-child')
        await trio.sleep_forever()

def child_target() -> int:
    from tractor.spawn._spawn import try_set_start_method
    try_set_start_method('subint_forkserver')
    trio.run(_main)
    return 0

pid = fork_from_worker_thread(child_target, thread_name='repro')
time.sleep(3.0)

# find the subactor pid via /proc
children = []
for path in glob.glob(f'/proc/{pid}/task/*/children'):
    with open(path) as f:
        children.extend(int(x) for x in f.read().split() if x)
subactor_pid = children[0]

# SIGKILL root → orphan the subactor
os.kill(pid, signal.SIGKILL)
os.waitpid(pid, 0)
time.sleep(0.3)

# SIGINT the orphan — should cause clean trio exit
os.kill(subactor_pid, signal.SIGINT)

# poll for exit
for _ in range(100):
    try:
        os.kill(subactor_pid, 0)
        time.sleep(0.1)
    except ProcessLookupError:
        print('HARNESS: subactor exited cleanly ✔')
        sys.exit(0)

os.kill(subactor_pid, signal.SIGKILL)
print('HARNESS: subactor hung — reproduced')
sys.exit(1)
```
Expected (current): `HARNESS: subactor hung — reproduced`.
After fix: `HARNESS: subactor exited cleanly ✔`.
## References
- `tractor/spawn/_entry.py::_trio_main:157-186` — the
intentional KBI→clean-exit path this bug makes
unreachable.
- `tractor/spawn/_subint_forkserver` — the backend whose
orphan cancel-robustness this blocks.
- `tests/spawn/test_subint_forkserver.py::test_orphaned_subactor_sigint_cleanup_DRAFT`
— the xfail'd reproducer in the test suite.
- `ai/conc-anal/subint_cancel_delivery_hang_issue.md` —
  sibling "orphaned channel park" hang (different class).
- `ai/conc-anal/subint_sigint_starvation_issue.md` —
  sibling "GIL starvation SIGINT drop" hang (different
  class).
- tractor issue #379 — subint backend tracking.

View File

@ -43,6 +43,7 @@ Gating
from __future__ import annotations
from functools import partial
import os
from pathlib import Path
import platform
import select
import signal
@ -179,7 +180,8 @@ async def run_fork_in_non_trio_thread(
# inner `trio.fail_after()` so assertion failures fire fast
# under normal conditions.
@pytest.mark.timeout(30, method='thread')
def test_fork_from_worker_thread_via_trio() -> None:
def test_fork_from_worker_thread_via_trio(
) -> None:
'''
Baseline: inside `trio.run()`, call
`fork_from_worker_thread()` via `trio.to_thread.run_sync()`,
@ -447,19 +449,26 @@ def _process_alive(pid: int) -> bool:
@pytest.mark.xfail(
strict=True,
reason=(
'subint_forkserver orphan-child SIGINT gap: trio on the '
'fork-inherited non-main thread has no SIGINT wakeup-fd '
'handler installed, and CPython\'s default '
'KeyboardInterrupt delivery does NOT reach the trio '
'loop on this thread post-fork. Fix path TBD — see '
'TODO in `tractor.spawn._subint_forkserver`. Flip this '
'mark (or drop it) once the gap is closed.'
'subint_forkserver orphan-child SIGINT hang: trio\'s '
'event loop stays wedged in `epoll_wait` despite the '
'SIGINT handler being correctly installed and the '
'signal being delivered at the kernel level. NOT a '
'"handler missing on non-main thread" issue — post-'
'fork the worker IS `threading.main_thread()` and '
'trio\'s `KIManager` handler is confirmed installed. '
'Full analysis + ruled-out hypotheses + fix directions '
'in `ai/conc-anal/'
'subint_forkserver_orphan_sigint_hang_issue.md`. '
'Flip this mark (or drop it) once the gap is closed.'
),
)
@pytest.mark.timeout(60, method='thread')
@pytest.mark.timeout(
30,
method='thread',
)
def test_orphaned_subactor_sigint_cleanup_DRAFT(
reg_addr: tuple[str, int | str],
tmp_path,
tmp_path: Path,
) -> None:
'''
DRAFT orphaned-subactor SIGINT survivability under the
@ -478,15 +487,23 @@ def test_orphaned_subactor_sigint_cleanup_DRAFT(
5. Poll `os.kill(child_pid, 0)` for up to 10s; assert
the child exits.
Empirical result (2026-04): currently **FAILS** — the
"post-fork single-thread → default KeyboardInterrupt lands
on trio's thread" hypothesis from the class-A/B
conc-anal discussion turned out to be wrong. SIGINT on the
orphan child doesn't unwind the trio loop. Most likely
CPython delivers `KeyboardInterrupt` specifically to
`threading.main_thread()`, whose tstate is dead in the
post-fork child (fork inherited the worker thread, not the
original main thread). Marked `xfail(strict=True)` so the
Empirical result (2026-04, py3.14): currently **FAILS** —
SIGINT on the orphan child doesn't unwind the trio loop,
despite trio's `KIManager` handler being correctly
installed in the subactor (the post-fork thread IS
`threading.main_thread()` on py3.14). `faulthandler` dump
shows the subactor wedged in `trio/_core/_io_epoll.py::
get_events` — the signal's supposed wakeup of the event
loop isn't firing. Full analysis + diagnostic evidence
in `ai/conc-anal/
subint_forkserver_orphan_sigint_hang_issue.md`.
The runtime's *intentional* "KBI-as-OS-cancel" path at
`tractor/spawn/_entry.py::_trio_main:164` is therefore
unreachable under this backend+config. Closing the gap is
aligned with existing design intent (make the already-
designed behavior actually fire), not a new feature.
Marked `xfail(strict=True)` so the
mark flips to an XPASS failure once the gap is closed and
we'll know to drop the mark.
@ -552,7 +569,8 @@ def test_orphaned_subactor_sigint_cleanup_DRAFT(
# step 4+5: SIGINT the orphan, poll for exit.
os.kill(child_pid, signal.SIGINT)
cleanup_deadline: float = time.monotonic() + 10.0
timeout: float = 6.0
cleanup_deadline: float = time.monotonic() + timeout
while time.monotonic() < cleanup_deadline:
if not _process_alive(child_pid):
return # <- success path

View File

@ -69,6 +69,20 @@ from ._exceptions import (
logger = log.get_logger('tractor')
# Spawn backends under which `debug_mode=True` is supported.
# Requirement: the spawned subactor's root runtime must be
# trio-native so `tractor.devx.debug._tty_lock` works. Matches
# both the enable-site in `open_root_actor` and the cleanup-
# site reset of `_runtime_vars['_debug_mode']` — keep them in
# lockstep when adding backends.
_DEBUG_COMPATIBLE_BACKENDS: tuple[str, ...] = (
'trio',
# forkserver children run `_trio_main` in their own OS
# process — same child-side runtime shape as `trio_proc`.
'subint_forkserver',
)
# TODO: stick this in a `@acm` defined in `devx.debug`?
# -[ ] also maybe consider making this a `wrapt`-deco to
# save an indent level?
@ -293,10 +307,14 @@ async def open_root_actor(
)
loglevel: str = loglevel.upper()
# Debug-mode is currently only supported for backends whose
# subactor root runtime is trio-native (so `tractor.devx.
# debug._tty_lock` works). See `_DEBUG_COMPATIBLE_BACKENDS`
# module-const for the list.
if (
debug_mode
and
_spawn._spawn_method == 'trio'
_spawn._spawn_method in _DEBUG_COMPATIBLE_BACKENDS
):
_state._runtime_vars['_debug_mode'] = True
@ -318,7 +336,9 @@ async def open_root_actor(
elif debug_mode:
raise RuntimeError(
"Debug mode is only supported for the `trio` backend!"
f'Debug mode currently supported only for '
f'{_DEBUG_COMPATIBLE_BACKENDS!r} spawn backends, not '
f'{_spawn._spawn_method!r}.'
)
assert loglevel
@ -619,7 +639,7 @@ async def open_root_actor(
if (
debug_mode
and
_spawn._spawn_method == 'trio'
_spawn._spawn_method in _DEBUG_COMPATIBLE_BACKENDS
):
_state._runtime_vars['_debug_mode'] = False

View File

@ -870,7 +870,14 @@ class Actor:
accept_addrs: list[UnwrappedAddress]|None = None
if self._spawn_method in ("trio", "subint"):
if self._spawn_method in (
'trio',
'subint',
# `subint_forkserver` parent-side sends a
# `SpawnSpec` over IPC just like the other two
# — fork child-side runtime is trio-native.
'subint_forkserver',
):
# Receive post-spawn runtime state from our parent.
spawnspec: msgtypes.SpawnSpec = await chan.recv()

View File

@ -67,12 +67,15 @@ Still-open work (tracked on tractor #379):
to `tests/test_subint_cancellation.py` for the plain
`subint` backend),
- `child_sigint='trio'` mode (flag scaffolded below; default
is `'ipc'`): install a manual SIGINT trio-cancel bridge
in the fork-child prelude so externally-delivered SIGINT
reaches the child's trio loop even when the parent is
dead (no IPC cancel path). See the xfail'd
`test_orphaned_subactor_sigint_cleanup_DRAFT` for the
target behavior.
is `'ipc'`). Originally intended as a manual SIGINT
trio-cancel bridge, but investigation showed trio's handler
IS already correctly installed in the fork-child subactor —
the orphan-SIGINT hang is actually a separate bug where
trio's event loop stays wedged in `epoll_wait` despite
delivery. See
`ai/conc-anal/subint_forkserver_orphan_sigint_hang_issue.md`
for the full trace + fix directions. Once that root cause
is fixed, this flag may end up a no-op / doc-only mode.
- child-side "subint-hosted root runtime" mode (the second
half of the envisioned arch; currently the forked child
runs plain `_trio_main` via `spawn_method='trio'`; the
@ -588,11 +591,11 @@ async def subint_forkserver_proc(
)
if child_sigint == 'trio':
raise NotImplementedError(
f"`child_sigint='trio'` mode — trio-native SIGINT "
f"plumbing in the fork-child — is scaffolded but "
f"not yet implemented. See the xfail'd "
f"`test_orphaned_subactor_sigint_cleanup_DRAFT` "
f"and the TODO in this module's docstring."
"`child_sigint='trio'` mode — trio-native SIGINT "
"plumbing in the fork-child — is scaffolded but "
"not yet implemented. See the xfail'd "
"`test_orphaned_subactor_sigint_cleanup_DRAFT` "
"and the TODO in this module's docstring."
)
uid: tuple[str, str] = subactor.aid.uid
@ -652,12 +655,16 @@ async def subint_forkserver_proc(
loglevel=loglevel,
parent_addr=parent_addr,
infect_asyncio=infect_asyncio,
# NOTE, from the child-side runtime's POV it's
# a regular trio actor — it uses `_trio_main`,
# receives `SpawnSpec` over IPC, etc. The
# `subint_forkserver` name is a property of HOW
# the parent spawned, not of what the child is.
spawn_method='trio',
# The child's runtime is trio-native (uses
# `_trio_main` + receives `SpawnSpec` over IPC),
# but label it with the actual parent-side spawn
# mechanism so `Actor.pformat()` / log lines
# reflect reality. Downstream runtime gates that
# key on `_spawn_method` group `subint_forkserver`
# alongside `trio`/`subint` where the SpawnSpec
# IPC handshake is concerned — see
# `runtime._runtime.Actor._from_parent()`.
spawn_method='subint_forkserver',
)
return 0