2026-04-22 22:00:06 +00:00
|
|
|
'''
|
|
|
|
|
Integration exercises for the `tractor.spawn._subint_forkserver`
|
Wire `subint_forkserver` as first-class backend
Promote `_subint_forkserver` from primitives-only into a
registered spawn backend: `'subint_forkserver'` is now a
`SpawnMethodKey` literal, dispatched via `_methods` to
the new `subint_forkserver_proc()` target, feature-gated
under the existing `subint`-family py3.14+ case, and
selectable via `--spawn-backend=subint_forkserver`.
Deats,
- new `subint_forkserver_proc()` spawn target in
`_subint_forkserver`:
- mirrors `trio_proc()`'s supervision model — real OS
subprocess so `Portal.cancel_actor()` + `soft_kill()`
on graceful teardown, `os.kill(SIGKILL)` on hard-reap
(no `_interpreters.destroy()` race to fuss over bc the
child lives in its own process)
- only real diff from `trio_proc` is the spawn mechanism:
fork from a main-interp worker thread via
`fork_from_worker_thread()` (off-loaded to trio's
thread pool) instead of `trio.lowlevel.open_process()`
- child-side `_child_target` closure runs
`tractor._child._actor_child_main()` with
`spawn_method='trio'` — the child is a regular trio
actor, "subint_forkserver" names how the parent
spawned, not what the child runs
- new `_ForkedProc` class — thin `trio.Process`-compatible
shim around a raw OS pid: `.poll()` via
`waitpid(WNOHANG)`, async `.wait()` off-loaded to a trio
cache thread, `.kill()` via `SIGKILL`, `.returncode`
cached for repeat calls. `.stdin`/`.stdout`/`.stderr`
are `None` (fork-w/o-exec inherits parent FDs; we don't
marshal them) which matches `soft_kill()`'s `is not None`
guards
Also, new backend-tier test
`test_subint_forkserver_spawn_basic` drives the registered
backend end-to-end via `open_root_actor` + `open_nursery` +
`run_in_actor` w/ a trivial portal-RPC round-trip. Uses a
`forkserver_spawn_method` fixture to flip
`_spawn_method`/`_ctx` for the test's duration + restore on
teardown (so other session-level tests don't observe the
global flip). Test module docstring reworked to describe
the three tiers now covered: (1) primitive-level, (2)
parent-trio-driven primitives, (3) full registered backend.
Status: still-open work (tracked on `tractor#379`) doc'd
inline in the module docstring — no cancel/hard-kill stress
coverage yet, child-side subint-hosted root runtime still
future (gated on `msgspec#563`), thread-hygiene audit
pending the same unblock.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-22 22:49:23 +00:00
|
|
|
submodule at three tiers:
|
|
|
|
|
|
|
|
|
|
1. the low-level primitives
|
|
|
|
|
(`fork_from_worker_thread()` +
|
|
|
|
|
`run_subint_in_worker_thread()`) driven from inside a real
|
|
|
|
|
`trio.run()` in the parent process,
|
|
|
|
|
|
|
|
|
|
2. the full `subint_forkserver_proc` spawn backend wired
|
|
|
|
|
through tractor's normal actor-nursery + portal-RPC
|
|
|
|
|
machinery — i.e. `open_root_actor` + `open_nursery` +
|
|
|
|
|
`run_in_actor` against a subactor spawned via fork from a
|
|
|
|
|
main-interp worker thread.
|
2026-04-22 22:00:06 +00:00
|
|
|
|
|
|
|
|
Background
|
|
|
|
|
----------
|
|
|
|
|
`ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md`
|
|
|
|
|
establishes that `os.fork()` from a non-main sub-interpreter
|
|
|
|
|
aborts the child at the CPython level. The sibling
|
|
|
|
|
`subint_fork_from_main_thread_smoketest.py` proves the escape
|
|
|
|
|
hatch: fork from a main-interp *worker thread* (one that has
|
|
|
|
|
never entered a subint) works, and the forked child can then
|
|
|
|
|
host its own `trio.run()` inside a fresh subint.
|
|
|
|
|
|
|
|
|
|
Those smoke-test scenarios are standalone — no trio runtime
|
Wire `subint_forkserver` as first-class backend
Promote `_subint_forkserver` from primitives-only into a
registered spawn backend: `'subint_forkserver'` is now a
`SpawnMethodKey` literal, dispatched via `_methods` to
the new `subint_forkserver_proc()` target, feature-gated
under the existing `subint`-family py3.14+ case, and
selectable via `--spawn-backend=subint_forkserver`.
Deats,
- new `subint_forkserver_proc()` spawn target in
`_subint_forkserver`:
- mirrors `trio_proc()`'s supervision model — real OS
subprocess so `Portal.cancel_actor()` + `soft_kill()`
on graceful teardown, `os.kill(SIGKILL)` on hard-reap
(no `_interpreters.destroy()` race to fuss over bc the
child lives in its own process)
- only real diff from `trio_proc` is the spawn mechanism:
fork from a main-interp worker thread via
`fork_from_worker_thread()` (off-loaded to trio's
thread pool) instead of `trio.lowlevel.open_process()`
- child-side `_child_target` closure runs
`tractor._child._actor_child_main()` with
`spawn_method='trio'` — the child is a regular trio
actor, "subint_forkserver" names how the parent
spawned, not what the child runs
- new `_ForkedProc` class — thin `trio.Process`-compatible
shim around a raw OS pid: `.poll()` via
`waitpid(WNOHANG)`, async `.wait()` off-loaded to a trio
cache thread, `.kill()` via `SIGKILL`, `.returncode`
cached for repeat calls. `.stdin`/`.stdout`/`.stderr`
are `None` (fork-w/o-exec inherits parent FDs; we don't
marshal them) which matches `soft_kill()`'s `is not None`
guards
Also, new backend-tier test
`test_subint_forkserver_spawn_basic` drives the registered
backend end-to-end via `open_root_actor` + `open_nursery` +
`run_in_actor` w/ a trivial portal-RPC round-trip. Uses a
`forkserver_spawn_method` fixture to flip
`_spawn_method`/`_ctx` for the test's duration + restore on
teardown (so other session-level tests don't observe the
global flip). Test module docstring reworked to describe
the three tiers now covered: (1) primitive-level, (2)
parent-trio-driven primitives, (3) full registered backend.
Status: still-open work (tracked on `tractor#379`) doc'd
inline in the module docstring — no cancel/hard-kill stress
coverage yet, child-side subint-hosted root runtime still
future (gated on `msgspec#563`), thread-hygiene audit
pending the same unblock.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-22 22:49:23 +00:00
|
|
|
in the *parent*. Tiers (1)+(2) here cover the primitives
|
|
|
|
|
driven from inside `trio.run()` in the parent, and tier (3)
|
|
|
|
|
(the `*_spawn_basic` test) drives the registered
|
|
|
|
|
`subint_forkserver` spawn backend end-to-end against the
|
|
|
|
|
tractor runtime.
|
2026-04-22 22:00:06 +00:00
|
|
|
|
|
|
|
|
Gating
|
|
|
|
|
------
|
|
|
|
|
- py3.14+ (via `concurrent.interpreters` presence)
|
Wire `subint_forkserver` as first-class backend
Promote `_subint_forkserver` from primitives-only into a
registered spawn backend: `'subint_forkserver'` is now a
`SpawnMethodKey` literal, dispatched via `_methods` to
the new `subint_forkserver_proc()` target, feature-gated
under the existing `subint`-family py3.14+ case, and
selectable via `--spawn-backend=subint_forkserver`.
Deats,
- new `subint_forkserver_proc()` spawn target in
`_subint_forkserver`:
- mirrors `trio_proc()`'s supervision model — real OS
subprocess so `Portal.cancel_actor()` + `soft_kill()`
on graceful teardown, `os.kill(SIGKILL)` on hard-reap
(no `_interpreters.destroy()` race to fuss over bc the
child lives in its own process)
- only real diff from `trio_proc` is the spawn mechanism:
fork from a main-interp worker thread via
`fork_from_worker_thread()` (off-loaded to trio's
thread pool) instead of `trio.lowlevel.open_process()`
- child-side `_child_target` closure runs
`tractor._child._actor_child_main()` with
`spawn_method='trio'` — the child is a regular trio
actor, "subint_forkserver" names how the parent
spawned, not what the child runs
- new `_ForkedProc` class — thin `trio.Process`-compatible
shim around a raw OS pid: `.poll()` via
`waitpid(WNOHANG)`, async `.wait()` off-loaded to a trio
cache thread, `.kill()` via `SIGKILL`, `.returncode`
cached for repeat calls. `.stdin`/`.stdout`/`.stderr`
are `None` (fork-w/o-exec inherits parent FDs; we don't
marshal them) which matches `soft_kill()`'s `is not None`
guards
Also, new backend-tier test
`test_subint_forkserver_spawn_basic` drives the registered
backend end-to-end via `open_root_actor` + `open_nursery` +
`run_in_actor` w/ a trivial portal-RPC round-trip. Uses a
`forkserver_spawn_method` fixture to flip
`_spawn_method`/`_ctx` for the test's duration + restore on
teardown (so other session-level tests don't observe the
global flip). Test module docstring reworked to describe
the three tiers now covered: (1) primitive-level, (2)
parent-trio-driven primitives, (3) full registered backend.
Status: still-open work (tracked on `tractor#379`) doc'd
inline in the module docstring — no cancel/hard-kill stress
coverage yet, child-side subint-hosted root runtime still
future (gated on `msgspec#563`), thread-hygiene audit
pending the same unblock.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-22 22:49:23 +00:00
|
|
|
- no `--spawn-backend` restriction — the backend-level test
|
|
|
|
|
flips `tractor.spawn._spawn._spawn_method` programmatically
|
|
|
|
|
(via `try_set_start_method('subint_forkserver')`) and
|
|
|
|
|
restores it on teardown, so these tests are independent of
|
|
|
|
|
the session-level CLI backend choice.
|
2026-04-22 22:00:06 +00:00
|
|
|
|
|
|
|
|
'''
|
|
|
|
|
from __future__ import annotations
|
|
|
|
|
from functools import partial
|
|
|
|
|
import os
|
2026-04-23 15:01:56 +00:00
|
|
|
from pathlib import Path
|
Add DRAFT `subint_forkserver` orphan-SIGINT test
Tier-4 test `test_orphaned_subactor_sigint_cleanup_DRAFT`
documents an empirical SIGINT-delivery gap in the
`subint_forkserver` backend: when the parent dies via
`SIGKILL` (no IPC `Portal.cancel_actor()` possible) and
`SIGINT` is sent to the orphan child, the child DOES NOT
unwind — CPython's default `KeyboardInterrupt` is delivered
to `threading.main_thread()`, whose tstate is dead in the
post-fork child bc fork inherited the worker thread, not
main. Trio running on the fork-inherited worker thread
therefore never observes the signal. Marked
`xfail(strict=True)` so the mark flips to XPASS→fail once
the backend grows explicit SIGINT plumbing.
Deats,
- harness runs the failure-mode sequence out-of-process:
1. harness subprocess runs a fresh Python script
that calls `try_set_start_method('subint_forkserver')`
then opens a root actor + one `sleep_forever` subactor
2. parse `PARENT_READY=<pid>` + `CHILD_PID=<pid>` markers
off harness `stdout` to confirm IPC handshake
completed
3. `SIGKILL` the parent, `proc.wait()` to reap the
zombie (otherwise `os.kill(pid, 0)` keeps reporting
it alive)
4. assert the child survived the parent-reap (i.e. was
actually orphaned, not reaped too) before moving on
5. `SIGINT` the orphan child, poll `os.kill(child_pid, 0)`
every 100ms for up to 10s
- supporting helpers: `_read_marker()` with per-proc
bytes-buffer to carry partial lines across calls,
`_process_alive()` liveness probe via `kill(pid, 0)`
- Linux-only via `platform.system() != 'Linux'` skip —
orphan-reparenting semantics don't generalize to
other platforms
- port offset (`reg_addr[1] + 17`) so the harness listener
doesn't race concurrently-running backend tests
- best-effort `finally:` cleanup: `SIGKILL` any still-alive
pids + `proc.kill()` + bounded `proc.wait()` to avoid
leaking orphans across the session
Also, tier-4 header comment documents the cross-backend
generalization path: applicable to any multi-process
backend (`trio`, `mp_spawn`, `mp_forkserver`,
`subint_forkserver`), NOT to plain `subint` (in-process
subints have no orphan OS-child). Move path: lift
harness into `tests/_orphan_harness.py`, parametrize on
session `_spawn_method`, add
`skipif _spawn_method == 'subint'`.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-22 23:39:41 +00:00
|
|
|
import platform
|
|
|
|
|
import select
|
|
|
|
|
import signal
|
|
|
|
|
import subprocess
|
|
|
|
|
import sys
|
|
|
|
|
import time
|
2026-04-22 22:00:06 +00:00
|
|
|
|
|
|
|
|
import pytest
|
|
|
|
|
import trio
|
|
|
|
|
|
Wire `subint_forkserver` as first-class backend
Promote `_subint_forkserver` from primitives-only into a
registered spawn backend: `'subint_forkserver'` is now a
`SpawnMethodKey` literal, dispatched via `_methods` to
the new `subint_forkserver_proc()` target, feature-gated
under the existing `subint`-family py3.14+ case, and
selectable via `--spawn-backend=subint_forkserver`.
Deats,
- new `subint_forkserver_proc()` spawn target in
`_subint_forkserver`:
- mirrors `trio_proc()`'s supervision model — real OS
subprocess so `Portal.cancel_actor()` + `soft_kill()`
on graceful teardown, `os.kill(SIGKILL)` on hard-reap
(no `_interpreters.destroy()` race to fuss over bc the
child lives in its own process)
- only real diff from `trio_proc` is the spawn mechanism:
fork from a main-interp worker thread via
`fork_from_worker_thread()` (off-loaded to trio's
thread pool) instead of `trio.lowlevel.open_process()`
- child-side `_child_target` closure runs
`tractor._child._actor_child_main()` with
`spawn_method='trio'` — the child is a regular trio
actor, "subint_forkserver" names how the parent
spawned, not what the child runs
- new `_ForkedProc` class — thin `trio.Process`-compatible
shim around a raw OS pid: `.poll()` via
`waitpid(WNOHANG)`, async `.wait()` off-loaded to a trio
cache thread, `.kill()` via `SIGKILL`, `.returncode`
cached for repeat calls. `.stdin`/`.stdout`/`.stderr`
are `None` (fork-w/o-exec inherits parent FDs; we don't
marshal them) which matches `soft_kill()`'s `is not None`
guards
Also, new backend-tier test
`test_subint_forkserver_spawn_basic` drives the registered
backend end-to-end via `open_root_actor` + `open_nursery` +
`run_in_actor` w/ a trivial portal-RPC round-trip. Uses a
`forkserver_spawn_method` fixture to flip
`_spawn_method`/`_ctx` for the test's duration + restore on
teardown (so other session-level tests don't observe the
global flip). Test module docstring reworked to describe
the three tiers now covered: (1) primitive-level, (2)
parent-trio-driven primitives, (3) full registered backend.
Status: still-open work (tracked on `tractor#379`) doc'd
inline in the module docstring — no cancel/hard-kill stress
coverage yet, child-side subint-hosted root runtime still
future (gated on `msgspec#563`), thread-hygiene audit
pending the same unblock.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-22 22:49:23 +00:00
|
|
|
import tractor
|
2026-04-22 22:00:06 +00:00
|
|
|
from tractor.devx import dump_on_hang
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
# Gate: subint forkserver primitives require py3.14+. Check
|
|
|
|
|
# the public stdlib wrapper's presence (added in 3.14) rather
|
|
|
|
|
# than `_interpreters` directly — see
|
|
|
|
|
# `tractor.spawn._subint` for why.
|
|
|
|
|
pytest.importorskip('concurrent.interpreters')
|
|
|
|
|
|
|
|
|
|
from tractor.spawn._subint_forkserver import ( # noqa: E402
|
|
|
|
|
fork_from_worker_thread,
|
|
|
|
|
run_subint_in_worker_thread,
|
|
|
|
|
wait_child,
|
|
|
|
|
)
|
Wire `subint_forkserver` as first-class backend
Promote `_subint_forkserver` from primitives-only into a
registered spawn backend: `'subint_forkserver'` is now a
`SpawnMethodKey` literal, dispatched via `_methods` to
the new `subint_forkserver_proc()` target, feature-gated
under the existing `subint`-family py3.14+ case, and
selectable via `--spawn-backend=subint_forkserver`.
Deats,
- new `subint_forkserver_proc()` spawn target in
`_subint_forkserver`:
- mirrors `trio_proc()`'s supervision model — real OS
subprocess so `Portal.cancel_actor()` + `soft_kill()`
on graceful teardown, `os.kill(SIGKILL)` on hard-reap
(no `_interpreters.destroy()` race to fuss over bc the
child lives in its own process)
- only real diff from `trio_proc` is the spawn mechanism:
fork from a main-interp worker thread via
`fork_from_worker_thread()` (off-loaded to trio's
thread pool) instead of `trio.lowlevel.open_process()`
- child-side `_child_target` closure runs
`tractor._child._actor_child_main()` with
`spawn_method='trio'` — the child is a regular trio
actor, "subint_forkserver" names how the parent
spawned, not what the child runs
- new `_ForkedProc` class — thin `trio.Process`-compatible
shim around a raw OS pid: `.poll()` via
`waitpid(WNOHANG)`, async `.wait()` off-loaded to a trio
cache thread, `.kill()` via `SIGKILL`, `.returncode`
cached for repeat calls. `.stdin`/`.stdout`/`.stderr`
are `None` (fork-w/o-exec inherits parent FDs; we don't
marshal them) which matches `soft_kill()`'s `is not None`
guards
Also, new backend-tier test
`test_subint_forkserver_spawn_basic` drives the registered
backend end-to-end via `open_root_actor` + `open_nursery` +
`run_in_actor` w/ a trivial portal-RPC round-trip. Uses a
`forkserver_spawn_method` fixture to flip
`_spawn_method`/`_ctx` for the test's duration + restore on
teardown (so other session-level tests don't observe the
global flip). Test module docstring reworked to describe
the three tiers now covered: (1) primitive-level, (2)
parent-trio-driven primitives, (3) full registered backend.
Status: still-open work (tracked on `tractor#379`) doc'd
inline in the module docstring — no cancel/hard-kill stress
coverage yet, child-side subint-hosted root runtime still
future (gated on `msgspec#563`), thread-hygiene audit
pending the same unblock.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-22 22:49:23 +00:00
|
|
|
from tractor.spawn import _spawn as _spawn_mod # noqa: E402
|
|
|
|
|
from tractor.spawn._spawn import try_set_start_method # noqa: E402
|
2026-04-22 22:00:06 +00:00
|
|
|
|
|
|
|
|
|
|
|
|
|
# ----------------------------------------------------------------
|
|
|
|
|
# child-side callables (passed via `child_target=` across fork)
|
|
|
|
|
# ----------------------------------------------------------------
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
_CHILD_TRIO_BOOTSTRAP: str = (
|
|
|
|
|
'import trio\n'
|
|
|
|
|
'async def _main():\n'
|
|
|
|
|
' await trio.sleep(0.05)\n'
|
|
|
|
|
' return 42\n'
|
|
|
|
|
'result = trio.run(_main)\n'
|
|
|
|
|
'assert result == 42, f"trio.run returned {result}"\n'
|
|
|
|
|
)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
def _child_trio_in_subint() -> int:
|
|
|
|
|
'''
|
|
|
|
|
`child_target` for the trio-in-child scenario: drive a
|
|
|
|
|
trivial `trio.run()` inside a fresh legacy-config subint
|
|
|
|
|
on a worker thread.
|
|
|
|
|
|
|
|
|
|
Returns an exit code suitable for `os._exit()`:
|
|
|
|
|
- 0: subint-hosted `trio.run()` succeeded
|
|
|
|
|
- 3: driver thread hang (timeout inside `run_subint_in_worker_thread`)
|
|
|
|
|
- 4: subint bootstrap raised some other exception
|
|
|
|
|
|
|
|
|
|
'''
|
|
|
|
|
try:
|
|
|
|
|
run_subint_in_worker_thread(
|
|
|
|
|
_CHILD_TRIO_BOOTSTRAP,
|
|
|
|
|
thread_name='child-subint-trio-thread',
|
|
|
|
|
)
|
|
|
|
|
except RuntimeError:
|
|
|
|
|
# timeout / thread-never-returned
|
|
|
|
|
return 3
|
|
|
|
|
except BaseException:
|
|
|
|
|
return 4
|
|
|
|
|
return 0
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
# ----------------------------------------------------------------
|
|
|
|
|
# parent-side harnesses (run inside `trio.run()`)
|
|
|
|
|
# ----------------------------------------------------------------
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
async def run_fork_in_non_trio_thread(
|
|
|
|
|
deadline: float,
|
|
|
|
|
*,
|
|
|
|
|
child_target=None,
|
|
|
|
|
) -> int:
|
|
|
|
|
'''
|
|
|
|
|
From inside a parent `trio.run()`, off-load the
|
|
|
|
|
forkserver primitive to a main-interp worker thread via
|
|
|
|
|
`trio.to_thread.run_sync()` and return the forked child's
|
|
|
|
|
pid.
|
|
|
|
|
|
|
|
|
|
Then `wait_child()` on that pid (also off-loaded so we
|
|
|
|
|
don't block trio's event loop on `waitpid()`) and assert
|
|
|
|
|
the child exited cleanly.
|
|
|
|
|
|
|
|
|
|
'''
|
|
|
|
|
with trio.fail_after(deadline):
|
|
|
|
|
# NOTE: `fork_from_worker_thread` internally spawns its
|
|
|
|
|
# own dedicated `threading.Thread` (not from trio's
|
|
|
|
|
# cache) and joins it before returning — so we can
|
|
|
|
|
# safely off-load via `to_thread.run_sync` without
|
|
|
|
|
# worrying about the trio-thread-cache recycling the
|
|
|
|
|
# runner. Pass `abandon_on_cancel=False` for the
|
|
|
|
|
# same "bounded + clean" rationale we use in
|
|
|
|
|
# `_subint.subint_proc`.
|
|
|
|
|
pid: int = await trio.to_thread.run_sync(
|
|
|
|
|
partial(
|
|
|
|
|
fork_from_worker_thread,
|
|
|
|
|
child_target,
|
|
|
|
|
thread_name='test-subint-forkserver',
|
|
|
|
|
),
|
|
|
|
|
abandon_on_cancel=False,
|
|
|
|
|
)
|
|
|
|
|
assert pid > 0
|
|
|
|
|
|
|
|
|
|
ok, status_str = await trio.to_thread.run_sync(
|
|
|
|
|
partial(
|
|
|
|
|
wait_child,
|
|
|
|
|
pid,
|
|
|
|
|
expect_exit_ok=True,
|
|
|
|
|
),
|
|
|
|
|
abandon_on_cancel=False,
|
|
|
|
|
)
|
|
|
|
|
assert ok, (
|
|
|
|
|
f'forked child did not exit cleanly: '
|
|
|
|
|
f'{status_str}'
|
|
|
|
|
)
|
|
|
|
|
return pid
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
# ----------------------------------------------------------------
|
|
|
|
|
# tests
|
|
|
|
|
# ----------------------------------------------------------------
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
# Bounded wall-clock via `pytest-timeout` (`method='thread'`)
|
|
|
|
|
# for the usual GIL-hostage safety reason documented in the
|
|
|
|
|
# sibling `test_subint_cancellation.py` / the class-A
|
|
|
|
|
# `subint_sigint_starvation_issue.md`. Each test also has an
|
|
|
|
|
# inner `trio.fail_after()` so assertion failures fire fast
|
|
|
|
|
# under normal conditions.
|
|
|
|
|
@pytest.mark.timeout(30, method='thread')
|
2026-04-23 15:01:56 +00:00
|
|
|
def test_fork_from_worker_thread_via_trio(
|
|
|
|
|
) -> None:
|
2026-04-22 22:00:06 +00:00
|
|
|
'''
|
|
|
|
|
Baseline: inside `trio.run()`, call
|
|
|
|
|
`fork_from_worker_thread()` via `trio.to_thread.run_sync()`,
|
|
|
|
|
get a child pid back, reap the child cleanly.
|
|
|
|
|
|
|
|
|
|
No trio-in-child. If this regresses we know the parent-
|
|
|
|
|
side trio↔worker-thread plumbing is broken independent
|
|
|
|
|
of any child-side subint machinery.
|
|
|
|
|
|
|
|
|
|
'''
|
|
|
|
|
deadline: float = 10.0
|
|
|
|
|
with dump_on_hang(
|
|
|
|
|
seconds=deadline,
|
|
|
|
|
path='/tmp/subint_forkserver_baseline.dump',
|
|
|
|
|
):
|
|
|
|
|
pid: int = trio.run(
|
|
|
|
|
partial(run_fork_in_non_trio_thread, deadline),
|
|
|
|
|
)
|
|
|
|
|
# parent-side sanity — we got a real pid back.
|
|
|
|
|
assert isinstance(pid, int) and pid > 0
|
|
|
|
|
# by now the child has been waited on; it shouldn't be
|
|
|
|
|
# reap-able again.
|
|
|
|
|
with pytest.raises((ChildProcessError, OSError)):
|
|
|
|
|
os.waitpid(pid, os.WNOHANG)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
@pytest.mark.timeout(30, method='thread')
|
|
|
|
|
def test_fork_and_run_trio_in_child() -> None:
|
|
|
|
|
'''
|
|
|
|
|
End-to-end: inside the parent's `trio.run()`, off-load
|
|
|
|
|
`fork_from_worker_thread()` to a worker thread, have the
|
|
|
|
|
forked child then create a fresh subint and run
|
|
|
|
|
`trio.run()` inside it on yet another worker thread.
|
|
|
|
|
|
|
|
|
|
This is the full "forkserver + trio-in-subint-in-child"
|
|
|
|
|
pattern the proposed `subint_forkserver` spawn backend
|
|
|
|
|
would rest on.
|
|
|
|
|
|
|
|
|
|
'''
|
|
|
|
|
deadline: float = 15.0
|
|
|
|
|
with dump_on_hang(
|
|
|
|
|
seconds=deadline,
|
|
|
|
|
path='/tmp/subint_forkserver_trio_in_child.dump',
|
|
|
|
|
):
|
|
|
|
|
pid: int = trio.run(
|
|
|
|
|
partial(
|
|
|
|
|
run_fork_in_non_trio_thread,
|
|
|
|
|
deadline,
|
|
|
|
|
child_target=_child_trio_in_subint,
|
|
|
|
|
),
|
|
|
|
|
)
|
|
|
|
|
assert isinstance(pid, int) and pid > 0
|
Wire `subint_forkserver` as first-class backend
Promote `_subint_forkserver` from primitives-only into a
registered spawn backend: `'subint_forkserver'` is now a
`SpawnMethodKey` literal, dispatched via `_methods` to
the new `subint_forkserver_proc()` target, feature-gated
under the existing `subint`-family py3.14+ case, and
selectable via `--spawn-backend=subint_forkserver`.
Deats,
- new `subint_forkserver_proc()` spawn target in
`_subint_forkserver`:
- mirrors `trio_proc()`'s supervision model — real OS
subprocess so `Portal.cancel_actor()` + `soft_kill()`
on graceful teardown, `os.kill(SIGKILL)` on hard-reap
(no `_interpreters.destroy()` race to fuss over bc the
child lives in its own process)
- only real diff from `trio_proc` is the spawn mechanism:
fork from a main-interp worker thread via
`fork_from_worker_thread()` (off-loaded to trio's
thread pool) instead of `trio.lowlevel.open_process()`
- child-side `_child_target` closure runs
`tractor._child._actor_child_main()` with
`spawn_method='trio'` — the child is a regular trio
actor, "subint_forkserver" names how the parent
spawned, not what the child runs
- new `_ForkedProc` class — thin `trio.Process`-compatible
shim around a raw OS pid: `.poll()` via
`waitpid(WNOHANG)`, async `.wait()` off-loaded to a trio
cache thread, `.kill()` via `SIGKILL`, `.returncode`
cached for repeat calls. `.stdin`/`.stdout`/`.stderr`
are `None` (fork-w/o-exec inherits parent FDs; we don't
marshal them) which matches `soft_kill()`'s `is not None`
guards
Also, new backend-tier test
`test_subint_forkserver_spawn_basic` drives the registered
backend end-to-end via `open_root_actor` + `open_nursery` +
`run_in_actor` w/ a trivial portal-RPC round-trip. Uses a
`forkserver_spawn_method` fixture to flip
`_spawn_method`/`_ctx` for the test's duration + restore on
teardown (so other session-level tests don't observe the
global flip). Test module docstring reworked to describe
the three tiers now covered: (1) primitive-level, (2)
parent-trio-driven primitives, (3) full registered backend.
Status: still-open work (tracked on `tractor#379`) doc'd
inline in the module docstring — no cancel/hard-kill stress
coverage yet, child-side subint-hosted root runtime still
future (gated on `msgspec#563`), thread-hygiene audit
pending the same unblock.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-22 22:49:23 +00:00
|
|
|
|
|
|
|
|
|
|
|
|
|
# ----------------------------------------------------------------
|
|
|
|
|
# tier-3 backend test: drive the registered `subint_forkserver`
|
|
|
|
|
# spawn backend end-to-end through tractor's actor-nursery +
|
|
|
|
|
# portal-RPC machinery.
|
|
|
|
|
# ----------------------------------------------------------------
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
async def _trivial_rpc() -> str:
|
|
|
|
|
'''
|
|
|
|
|
Minimal subactor-side RPC body: just return a sentinel
|
|
|
|
|
string the parent can assert on.
|
|
|
|
|
|
|
|
|
|
'''
|
|
|
|
|
return 'hello from subint-forkserver child'
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
async def _happy_path_forkserver(
|
|
|
|
|
reg_addr: tuple[str, int | str],
|
|
|
|
|
deadline: float,
|
|
|
|
|
) -> None:
|
|
|
|
|
'''
|
|
|
|
|
Parent-side harness: stand up a root actor, open an actor
|
|
|
|
|
nursery, spawn one subactor via the currently-selected
|
|
|
|
|
spawn backend (which this test will have flipped to
|
|
|
|
|
`subint_forkserver`), run a trivial RPC through its
|
|
|
|
|
portal, assert the round-trip result.
|
|
|
|
|
|
|
|
|
|
'''
|
|
|
|
|
with trio.fail_after(deadline):
|
|
|
|
|
async with (
|
|
|
|
|
tractor.open_root_actor(
|
|
|
|
|
registry_addrs=[reg_addr],
|
|
|
|
|
),
|
|
|
|
|
tractor.open_nursery() as an,
|
|
|
|
|
):
|
|
|
|
|
portal: tractor.Portal = await an.run_in_actor(
|
|
|
|
|
_trivial_rpc,
|
|
|
|
|
name='subint-forkserver-child',
|
|
|
|
|
)
|
|
|
|
|
result: str = await portal.wait_for_result()
|
|
|
|
|
assert result == 'hello from subint-forkserver child'
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
@pytest.fixture
|
|
|
|
|
def forkserver_spawn_method():
|
|
|
|
|
'''
|
|
|
|
|
Flip `tractor.spawn._spawn._spawn_method` to
|
|
|
|
|
`'subint_forkserver'` for the duration of a test, then
|
|
|
|
|
restore whatever was in place before (usually the
|
|
|
|
|
session-level CLI choice, typically `'trio'`).
|
|
|
|
|
|
|
|
|
|
Without this, other tests in the same session would
|
|
|
|
|
observe the global flip and start spawning via fork —
|
|
|
|
|
which is almost certainly NOT what their assertions were
|
|
|
|
|
written against.
|
|
|
|
|
|
|
|
|
|
'''
|
|
|
|
|
prev_method: str = _spawn_mod._spawn_method
|
|
|
|
|
prev_ctx = _spawn_mod._ctx
|
|
|
|
|
try_set_start_method('subint_forkserver')
|
|
|
|
|
try:
|
|
|
|
|
yield
|
|
|
|
|
finally:
|
|
|
|
|
_spawn_mod._spawn_method = prev_method
|
|
|
|
|
_spawn_mod._ctx = prev_ctx
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
@pytest.mark.timeout(60, method='thread')
|
|
|
|
|
def test_subint_forkserver_spawn_basic(
|
|
|
|
|
reg_addr: tuple[str, int | str],
|
|
|
|
|
forkserver_spawn_method,
|
|
|
|
|
) -> None:
|
|
|
|
|
'''
|
|
|
|
|
Happy-path: spawn ONE subactor via the
|
|
|
|
|
`subint_forkserver` backend (parent-side fork from a
|
|
|
|
|
main-interp worker thread), do a trivial portal-RPC
|
|
|
|
|
round-trip, tear the nursery down cleanly.
|
|
|
|
|
|
|
|
|
|
If this passes, the "forkserver + tractor runtime" arch
|
|
|
|
|
is proven end-to-end: the registered
|
|
|
|
|
`subint_forkserver_proc` spawn target successfully
|
|
|
|
|
forks a child, the child runs `_actor_child_main()` +
|
|
|
|
|
completes IPC handshake + serves an RPC, and the parent
|
|
|
|
|
reaps via `_ForkedProc.wait()` without regressing any of
|
|
|
|
|
the normal nursery teardown invariants.
|
|
|
|
|
|
|
|
|
|
'''
|
|
|
|
|
deadline: float = 20.0
|
|
|
|
|
with dump_on_hang(
|
|
|
|
|
seconds=deadline,
|
|
|
|
|
path='/tmp/subint_forkserver_spawn_basic.dump',
|
|
|
|
|
):
|
|
|
|
|
trio.run(
|
|
|
|
|
partial(
|
|
|
|
|
_happy_path_forkserver,
|
|
|
|
|
reg_addr,
|
|
|
|
|
deadline,
|
|
|
|
|
),
|
|
|
|
|
)
|
Add DRAFT `subint_forkserver` orphan-SIGINT test
Tier-4 test `test_orphaned_subactor_sigint_cleanup_DRAFT`
documents an empirical SIGINT-delivery gap in the
`subint_forkserver` backend: when the parent dies via
`SIGKILL` (no IPC `Portal.cancel_actor()` possible) and
`SIGINT` is sent to the orphan child, the child DOES NOT
unwind — CPython's default `KeyboardInterrupt` is delivered
to `threading.main_thread()`, whose tstate is dead in the
post-fork child bc fork inherited the worker thread, not
main. Trio running on the fork-inherited worker thread
therefore never observes the signal. Marked
`xfail(strict=True)` so the mark flips to XPASS→fail once
the backend grows explicit SIGINT plumbing.
Deats,
- harness runs the failure-mode sequence out-of-process:
1. harness subprocess runs a fresh Python script
that calls `try_set_start_method('subint_forkserver')`
then opens a root actor + one `sleep_forever` subactor
2. parse `PARENT_READY=<pid>` + `CHILD_PID=<pid>` markers
off harness `stdout` to confirm IPC handshake
completed
3. `SIGKILL` the parent, `proc.wait()` to reap the
zombie (otherwise `os.kill(pid, 0)` keeps reporting
it alive)
4. assert the child survived the parent-reap (i.e. was
actually orphaned, not reaped too) before moving on
5. `SIGINT` the orphan child, poll `os.kill(child_pid, 0)`
every 100ms for up to 10s
- supporting helpers: `_read_marker()` with per-proc
bytes-buffer to carry partial lines across calls,
`_process_alive()` liveness probe via `kill(pid, 0)`
- Linux-only via `platform.system() != 'Linux'` skip —
orphan-reparenting semantics don't generalize to
other platforms
- port offset (`reg_addr[1] + 17`) so the harness listener
doesn't race concurrently-running backend tests
- best-effort `finally:` cleanup: `SIGKILL` any still-alive
pids + `proc.kill()` + bounded `proc.wait()` to avoid
leaking orphans across the session
Also, tier-4 header comment documents the cross-backend
generalization path: applicable to any multi-process
backend (`trio`, `mp_spawn`, `mp_forkserver`,
`subint_forkserver`), NOT to plain `subint` (in-process
subints have no orphan OS-child). Move path: lift
harness into `tests/_orphan_harness.py`, parametrize on
session `_spawn_method`, add
`skipif _spawn_method == 'subint'`.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-22 23:39:41 +00:00
|
|
|
|
|
|
|
|
|
|
|
|
|
# ----------------------------------------------------------------
|
|
|
|
|
# tier-4 DRAFT: orphaned-subactor SIGINT survivability
|
|
|
|
|
#
|
|
|
|
|
# Motivating question: with `subint_forkserver`, the child's
|
|
|
|
|
# `trio.run()` lives on the fork-inherited worker thread which
|
|
|
|
|
# is NOT `threading.main_thread()` — so trio cannot install its
|
|
|
|
|
# `signal.set_wakeup_fd`-based SIGINT handler. If the parent
|
|
|
|
|
# goes away via `SIGKILL` (no IPC `Portal.cancel_actor()`
|
|
|
|
|
# possible), does SIGINT on the orphan child cleanly tear it
|
|
|
|
|
# down via CPython's default `KeyboardInterrupt` delivery, or
|
|
|
|
|
# does it hang?
|
|
|
|
|
#
|
|
|
|
|
# Working hypothesis (unverified pre-this-test): post-fork the
|
|
|
|
|
# child is effectively single-threaded (only the fork-worker
|
|
|
|
|
# tstate survived), so SIGINT → default handler → raises
|
|
|
|
|
# `KeyboardInterrupt` on the only thread — which happens to be
|
|
|
|
|
# the one driving trio's event loop — so trio observes it at
|
|
|
|
|
# the next checkpoint. If so, we're "fine" on this backend
|
|
|
|
|
# despite the missing trio SIGINT handler.
|
|
|
|
|
#
|
|
|
|
|
# Cross-backend generalization (decide after this passes):
|
|
|
|
|
# - applicable to any backend whose subactors are separate OS
|
|
|
|
|
# processes: `trio`, `mp_spawn`, `mp_forkserver`,
|
|
|
|
|
# `subint_forkserver`.
|
|
|
|
|
# - NOT applicable to plain `subint` (subactors are in-process
|
|
|
|
|
# subinterpreters, no orphan child process to SIGINT).
|
|
|
|
|
# - move path: lift the harness script into
|
|
|
|
|
# `tests/_orphan_harness.py`, parametrize on the session's
|
|
|
|
|
# `_spawn_method`, add `skipif _spawn_method == 'subint'`.
|
|
|
|
|
# ----------------------------------------------------------------
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
_ORPHAN_HARNESS_SCRIPT: str = '''
|
|
|
|
|
import os
|
|
|
|
|
import sys
|
|
|
|
|
import trio
|
|
|
|
|
import tractor
|
|
|
|
|
from tractor.spawn._spawn import try_set_start_method
|
|
|
|
|
|
|
|
|
|
async def _sleep_forever() -> None:
|
|
|
|
|
print(f"CHILD_PID={os.getpid()}", flush=True)
|
|
|
|
|
await trio.sleep_forever()
|
|
|
|
|
|
|
|
|
|
async def _main(reg_addr):
|
|
|
|
|
async with (
|
|
|
|
|
tractor.open_root_actor(registry_addrs=[reg_addr]),
|
|
|
|
|
tractor.open_nursery() as an,
|
|
|
|
|
):
|
|
|
|
|
portal = await an.run_in_actor(
|
|
|
|
|
_sleep_forever,
|
|
|
|
|
name="orphan-test-child",
|
|
|
|
|
)
|
|
|
|
|
print(f"PARENT_READY={os.getpid()}", flush=True)
|
|
|
|
|
await trio.sleep_forever()
|
|
|
|
|
|
|
|
|
|
if __name__ == "__main__":
|
|
|
|
|
backend = sys.argv[1]
|
|
|
|
|
host = sys.argv[2]
|
|
|
|
|
port = int(sys.argv[3])
|
|
|
|
|
try_set_start_method(backend)
|
|
|
|
|
trio.run(_main, (host, port))
|
|
|
|
|
'''
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
def _read_marker(
|
|
|
|
|
proc: subprocess.Popen,
|
|
|
|
|
marker: str,
|
|
|
|
|
timeout: float,
|
|
|
|
|
_buf: dict,
|
|
|
|
|
) -> str:
|
|
|
|
|
'''
|
|
|
|
|
Block until `<marker>=<value>\\n` appears on `proc.stdout`
|
|
|
|
|
and return `<value>`. Uses a per-proc byte buffer (`_buf`)
|
|
|
|
|
to carry partial lines across calls.
|
|
|
|
|
|
|
|
|
|
'''
|
|
|
|
|
deadline: float = time.monotonic() + timeout
|
|
|
|
|
remainder: bytes = _buf.get('remainder', b'')
|
|
|
|
|
prefix: bytes = f'{marker}='.encode()
|
|
|
|
|
while time.monotonic() < deadline:
|
|
|
|
|
# drain any complete lines already buffered
|
|
|
|
|
while b'\n' in remainder:
|
|
|
|
|
line, remainder = remainder.split(b'\n', 1)
|
|
|
|
|
if line.startswith(prefix):
|
|
|
|
|
_buf['remainder'] = remainder
|
|
|
|
|
return line[len(prefix):].decode().strip()
|
|
|
|
|
ready, _, _ = select.select([proc.stdout], [], [], 0.2)
|
|
|
|
|
if not ready:
|
|
|
|
|
continue
|
|
|
|
|
chunk: bytes = os.read(proc.stdout.fileno(), 4096)
|
|
|
|
|
if not chunk:
|
|
|
|
|
break
|
|
|
|
|
remainder += chunk
|
|
|
|
|
_buf['remainder'] = remainder
|
|
|
|
|
raise TimeoutError(
|
|
|
|
|
f'Never observed marker {marker!r} on harness stdout '
|
|
|
|
|
f'within {timeout}s'
|
|
|
|
|
)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
def _process_alive(pid: int) -> bool:
|
|
|
|
|
'''Liveness probe for a pid we do NOT parent (post-orphan).'''
|
|
|
|
|
try:
|
|
|
|
|
os.kill(pid, 0)
|
|
|
|
|
return True
|
|
|
|
|
except ProcessLookupError:
|
|
|
|
|
return False
|
|
|
|
|
|
|
|
|
|
|
Pin forkserver hang to pytest `--capture=fd`
Sixth and final diagnostic pass — after all 4
cascade fixes landed (FD hygiene, pidfd wait,
`_parent_chan_cs` wiring, bounded peer-clear), the
actual last gate on
`test_nested_multierrors[subint_forkserver]`
turned out to be **pytest's default
`--capture=fd` stdout/stderr capture**, not
anything in the runtime cascade.
Empirical result: `pytest -s` → test PASSES in
6.20s. Default `--capture=fd` → hangs forever.
Mechanism: pytest replaces the parent's fds 1,2
with pipe write-ends it reads from. Fork children
inherit those pipes (since `_close_inherited_fds`
correctly preserves stdio). The error-propagation
cascade in a multi-level cancel test generates
7+ actors each logging multiple `RemoteActorError`
/ `ExceptionGroup` tracebacks — enough output to
fill Linux's 64KB pipe buffer. Writes block,
subactors can't progress, processes don't exit,
`_ForkedProc.wait` hangs.
Self-critical aside: I earlier tested w/ and w/o
`-s` and both hung, concluding "capture-pipe
ruled out". That was wrong — at that time fixes
1-4 weren't all in place, so the test was
failing at deeper levels long before reaching
the "produce lots of output" phase. Once the
cascade could actually tear down cleanly, enough
output flowed to hit the pipe limit. Order-of-
operations mistake: ruling something out based
on a test that was failing for a different
reason.
Deats,
- `subint_forkserver_test_cancellation_leak_issue
.md`: new section "Update — VERY late: pytest
capture pipe IS the final gate" w/ DIAG timeline
showing `trio.run` fully returns, diagnosis of
pipe-fill mechanism, retrospective on the
earlier wrong ruling-out, and fix direction
(redirect subactor stdout/stderr to `/dev/null`
in fork-child prelude, conditional on
pytest-detection or opt-in flag)
- `tests/test_cancellation.py`: skip-mark reason
rewritten to describe the capture-pipe gate
specifically; cross-refs the new doc section
- `tests/spawn/test_subint_forkserver.py`: the
orphan-SIGINT test regresses back to xfail.
Previously passed after the FD-hygiene fix,
but the new `wait_for_no_more_peers(
move_on_after=3.0)` bound in `async_main`'s
teardown added up to 3s latency, pushing
orphan-subactor exit past the test's 10s poll
window. Real fix: faster orphan-side teardown
OR extend poll window to 15s
No runtime code changes in this commit — just
test-mark adjustments + doc wrap-up.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-24 03:18:14 +00:00
|
|
|
# Regressed back to xfail: previously passed after the
|
|
|
|
|
# fork-child FD-hygiene fix in `_close_inherited_fds()`,
|
|
|
|
|
# but the recent `wait_for_no_more_peers(move_on_after=3.0)`
|
|
|
|
|
# bound in `async_main`'s teardown added up to 3s to the
|
|
|
|
|
# orphan subactor's exit timeline, pushing it past the
|
|
|
|
|
# test's 10s poll window. Real fix requires making the
|
|
|
|
|
# bounded wait faster when the actor is orphaned, or
|
|
|
|
|
# increasing the test's poll window. See tracker doc
|
|
|
|
|
# `ai/conc-anal/subint_forkserver_orphan_sigint_hang_issue.md`.
|
|
|
|
|
@pytest.mark.xfail(
|
|
|
|
|
strict=True,
|
|
|
|
|
reason=(
|
|
|
|
|
'Regressed to xfail after `wait_for_no_more_peers` '
|
|
|
|
|
'bound added ~3s teardown latency. Needs either '
|
|
|
|
|
'faster orphan-side teardown or 15s test poll window.'
|
|
|
|
|
),
|
|
|
|
|
)
|
2026-04-23 15:01:56 +00:00
|
|
|
@pytest.mark.timeout(
|
|
|
|
|
30,
|
|
|
|
|
method='thread',
|
|
|
|
|
)
|
Add DRAFT `subint_forkserver` orphan-SIGINT test
Tier-4 test `test_orphaned_subactor_sigint_cleanup_DRAFT`
documents an empirical SIGINT-delivery gap in the
`subint_forkserver` backend: when the parent dies via
`SIGKILL` (no IPC `Portal.cancel_actor()` possible) and
`SIGINT` is sent to the orphan child, the child DOES NOT
unwind — CPython's default `KeyboardInterrupt` is delivered
to `threading.main_thread()`, whose tstate is dead in the
post-fork child bc fork inherited the worker thread, not
main. Trio running on the fork-inherited worker thread
therefore never observes the signal. Marked
`xfail(strict=True)` so the mark flips to XPASS→fail once
the backend grows explicit SIGINT plumbing.
Deats,
- harness runs the failure-mode sequence out-of-process:
1. harness subprocess runs a fresh Python script
that calls `try_set_start_method('subint_forkserver')`
then opens a root actor + one `sleep_forever` subactor
2. parse `PARENT_READY=<pid>` + `CHILD_PID=<pid>` markers
off harness `stdout` to confirm IPC handshake
completed
3. `SIGKILL` the parent, `proc.wait()` to reap the
zombie (otherwise `os.kill(pid, 0)` keeps reporting
it alive)
4. assert the child survived the parent-reap (i.e. was
actually orphaned, not reaped too) before moving on
5. `SIGINT` the orphan child, poll `os.kill(child_pid, 0)`
every 100ms for up to 10s
- supporting helpers: `_read_marker()` with per-proc
bytes-buffer to carry partial lines across calls,
`_process_alive()` liveness probe via `kill(pid, 0)`
- Linux-only via `platform.system() != 'Linux'` skip —
orphan-reparenting semantics don't generalize to
other platforms
- port offset (`reg_addr[1] + 17`) so the harness listener
doesn't race concurrently-running backend tests
- best-effort `finally:` cleanup: `SIGKILL` any still-alive
pids + `proc.kill()` + bounded `proc.wait()` to avoid
leaking orphans across the session
Also, tier-4 header comment documents the cross-backend
generalization path: applicable to any multi-process
backend (`trio`, `mp_spawn`, `mp_forkserver`,
`subint_forkserver`), NOT to plain `subint` (in-process
subints have no orphan OS-child). Move path: lift
harness into `tests/_orphan_harness.py`, parametrize on
session `_spawn_method`, add
`skipif _spawn_method == 'subint'`.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-22 23:39:41 +00:00
|
|
|
def test_orphaned_subactor_sigint_cleanup_DRAFT(
|
|
|
|
|
reg_addr: tuple[str, int | str],
|
2026-04-23 15:01:56 +00:00
|
|
|
tmp_path: Path,
|
Add DRAFT `subint_forkserver` orphan-SIGINT test
Tier-4 test `test_orphaned_subactor_sigint_cleanup_DRAFT`
documents an empirical SIGINT-delivery gap in the
`subint_forkserver` backend: when the parent dies via
`SIGKILL` (no IPC `Portal.cancel_actor()` possible) and
`SIGINT` is sent to the orphan child, the child DOES NOT
unwind — CPython's default `KeyboardInterrupt` is delivered
to `threading.main_thread()`, whose tstate is dead in the
post-fork child bc fork inherited the worker thread, not
main. Trio running on the fork-inherited worker thread
therefore never observes the signal. Marked
`xfail(strict=True)` so the mark flips to XPASS→fail once
the backend grows explicit SIGINT plumbing.
Deats,
- harness runs the failure-mode sequence out-of-process:
1. harness subprocess runs a fresh Python script
that calls `try_set_start_method('subint_forkserver')`
then opens a root actor + one `sleep_forever` subactor
2. parse `PARENT_READY=<pid>` + `CHILD_PID=<pid>` markers
off harness `stdout` to confirm IPC handshake
completed
3. `SIGKILL` the parent, `proc.wait()` to reap the
zombie (otherwise `os.kill(pid, 0)` keeps reporting
it alive)
4. assert the child survived the parent-reap (i.e. was
actually orphaned, not reaped too) before moving on
5. `SIGINT` the orphan child, poll `os.kill(child_pid, 0)`
every 100ms for up to 10s
- supporting helpers: `_read_marker()` with per-proc
bytes-buffer to carry partial lines across calls,
`_process_alive()` liveness probe via `kill(pid, 0)`
- Linux-only via `platform.system() != 'Linux'` skip —
orphan-reparenting semantics don't generalize to
other platforms
- port offset (`reg_addr[1] + 17`) so the harness listener
doesn't race concurrently-running backend tests
- best-effort `finally:` cleanup: `SIGKILL` any still-alive
pids + `proc.kill()` + bounded `proc.wait()` to avoid
leaking orphans across the session
Also, tier-4 header comment documents the cross-backend
generalization path: applicable to any multi-process
backend (`trio`, `mp_spawn`, `mp_forkserver`,
`subint_forkserver`), NOT to plain `subint` (in-process
subints have no orphan OS-child). Move path: lift
harness into `tests/_orphan_harness.py`, parametrize on
session `_spawn_method`, add
`skipif _spawn_method == 'subint'`.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-22 23:39:41 +00:00
|
|
|
) -> None:
|
|
|
|
|
'''
|
|
|
|
|
DRAFT — orphaned-subactor SIGINT survivability under the
|
|
|
|
|
`subint_forkserver` backend.
|
|
|
|
|
|
|
|
|
|
Sequence:
|
|
|
|
|
1. Spawn a harness subprocess that brings up a root
|
|
|
|
|
actor + one `sleep_forever` subactor via
|
|
|
|
|
`subint_forkserver`.
|
|
|
|
|
2. Read the harness's stdout for `PARENT_READY=<pid>`
|
|
|
|
|
and `CHILD_PID=<pid>` markers (confirms the
|
|
|
|
|
parent→child IPC handshake completed).
|
|
|
|
|
3. `SIGKILL` the parent (no IPC cancel possible — the
|
|
|
|
|
whole point of this test).
|
|
|
|
|
4. `SIGINT` the orphan child.
|
|
|
|
|
5. Poll `os.kill(child_pid, 0)` for up to 10s — assert
|
|
|
|
|
the child exits.
|
|
|
|
|
|
Refine `subint_forkserver` orphan-SIGINT diagnosis
Empirical follow-up to the xfail'd orphan-SIGINT test:
the hang is **not** "trio can't install a handler on a
non-main thread" (the original hypothesis from the
`child_sigint` scaffold commit). On py3.14:
- `threading.current_thread() is threading.main_thread()`
IS True post-fork — CPython re-designates the
fork-inheriting thread as "main" correctly
- trio's `KIManager` SIGINT handler IS installed in the
subactor (`signal.getsignal(SIGINT)` confirms)
- the kernel DOES deliver SIGINT to the thread
But `faulthandler` dumps show the subactor wedged in
`trio/_core/_io_epoll.py::get_events` — trio's
wakeup-fd mechanism (which turns SIGINT into an epoll-wake)
isn't firing. So the `except KeyboardInterrupt` at
`tractor/spawn/_entry.py::_trio_main:164` — the runtime's
intentional "KBI-as-OS-cancel" path — never fires.
Deats,
- new `ai/conc-anal/subint_forkserver_orphan_sigint_hang_issue.md`
(+385 LOC): full writeup — TL;DR, symptom reproducer,
the "intentional cancel path" the bug defeats,
diagnostic evidence (`faulthandler` output +
`getsignal` probe), ruled-out hypotheses
(non-main-thread issue, wakeup-fd inheritance,
KBI-as-trio-check-exception), and fix directions
- `test_orphaned_subactor_sigint_cleanup_DRAFT` xfail
`reason` + test docstring rewritten to match the
refined understanding — old wording blamed the
non-main-thread path, new wording points at the
`epoll_wait` wedge + cross-refs the new conc-anal doc
- `_subint_forkserver` module docstring's
`child_sigint='trio'` bullet updated: now notes trio's
handler is already correctly installed, so the flag may
end up a no-op / doc-only mode once the real root cause
is fixed
Closing the gap aligns with existing design intent (make
the already-designed "KBI-as-OS-cancel" behavior actually
fire), not a new feature.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-23 13:31:32 +00:00
|
|
|
Empirical result (2026-04, py3.14): currently **FAILS** —
|
|
|
|
|
SIGINT on the orphan child doesn't unwind the trio loop,
|
|
|
|
|
despite trio's `KIManager` handler being correctly
|
|
|
|
|
installed in the subactor (the post-fork thread IS
|
|
|
|
|
`threading.main_thread()` on py3.14). `faulthandler` dump
|
|
|
|
|
shows the subactor wedged in `trio/_core/_io_epoll.py::
|
|
|
|
|
get_events` — the signal's supposed wakeup of the event
|
|
|
|
|
loop isn't firing. Full analysis + diagnostic evidence
|
|
|
|
|
in `ai/conc-anal/
|
|
|
|
|
subint_forkserver_orphan_sigint_hang_issue.md`.
|
|
|
|
|
|
|
|
|
|
The runtime's *intentional* "KBI-as-OS-cancel" path at
|
|
|
|
|
`tractor/spawn/_entry.py::_trio_main:164` is therefore
|
|
|
|
|
unreachable under this backend+config. Closing the gap is
|
|
|
|
|
aligned with existing design intent (make the already-
|
|
|
|
|
designed behavior actually fire), not a new feature.
|
|
|
|
|
Marked `xfail(strict=True)` so the
|
Add DRAFT `subint_forkserver` orphan-SIGINT test
Tier-4 test `test_orphaned_subactor_sigint_cleanup_DRAFT`
documents an empirical SIGINT-delivery gap in the
`subint_forkserver` backend: when the parent dies via
`SIGKILL` (no IPC `Portal.cancel_actor()` possible) and
`SIGINT` is sent to the orphan child, the child DOES NOT
unwind — CPython's default `KeyboardInterrupt` is delivered
to `threading.main_thread()`, whose tstate is dead in the
post-fork child bc fork inherited the worker thread, not
main. Trio running on the fork-inherited worker thread
therefore never observes the signal. Marked
`xfail(strict=True)` so the mark flips to XPASS→fail once
the backend grows explicit SIGINT plumbing.
Deats,
- harness runs the failure-mode sequence out-of-process:
1. harness subprocess runs a fresh Python script
that calls `try_set_start_method('subint_forkserver')`
then opens a root actor + one `sleep_forever` subactor
2. parse `PARENT_READY=<pid>` + `CHILD_PID=<pid>` markers
off harness `stdout` to confirm IPC handshake
completed
3. `SIGKILL` the parent, `proc.wait()` to reap the
zombie (otherwise `os.kill(pid, 0)` keeps reporting
it alive)
4. assert the child survived the parent-reap (i.e. was
actually orphaned, not reaped too) before moving on
5. `SIGINT` the orphan child, poll `os.kill(child_pid, 0)`
every 100ms for up to 10s
- supporting helpers: `_read_marker()` with per-proc
bytes-buffer to carry partial lines across calls,
`_process_alive()` liveness probe via `kill(pid, 0)`
- Linux-only via `platform.system() != 'Linux'` skip —
orphan-reparenting semantics don't generalize to
other platforms
- port offset (`reg_addr[1] + 17`) so the harness listener
doesn't race concurrently-running backend tests
- best-effort `finally:` cleanup: `SIGKILL` any still-alive
pids + `proc.kill()` + bounded `proc.wait()` to avoid
leaking orphans across the session
Also, tier-4 header comment documents the cross-backend
generalization path: applicable to any multi-process
backend (`trio`, `mp_spawn`, `mp_forkserver`,
`subint_forkserver`), NOT to plain `subint` (in-process
subints have no orphan OS-child). Move path: lift
harness into `tests/_orphan_harness.py`, parametrize on
session `_spawn_method`, add
`skipif _spawn_method == 'subint'`.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-22 23:39:41 +00:00
|
|
|
mark flips to XPASS→fail once the gap is closed and we'll
|
|
|
|
|
know to drop the mark.
|
|
|
|
|
|
|
|
|
|
'''
|
|
|
|
|
if platform.system() != 'Linux':
|
|
|
|
|
pytest.skip(
|
|
|
|
|
'orphan-reparenting semantics only exercised on Linux'
|
|
|
|
|
)
|
|
|
|
|
|
|
|
|
|
script_path = tmp_path / '_orphan_harness.py'
|
|
|
|
|
script_path.write_text(_ORPHAN_HARNESS_SCRIPT)
|
|
|
|
|
|
|
|
|
|
# Offset the port so we don't race the session reg_addr with
|
|
|
|
|
# any concurrently-running backend test's listener.
|
|
|
|
|
host: str = reg_addr[0]
|
|
|
|
|
port: int = int(reg_addr[1]) + 17
|
|
|
|
|
|
|
|
|
|
proc: subprocess.Popen = subprocess.Popen(
|
|
|
|
|
[
|
|
|
|
|
sys.executable,
|
|
|
|
|
str(script_path),
|
|
|
|
|
'subint_forkserver',
|
|
|
|
|
host,
|
|
|
|
|
str(port),
|
|
|
|
|
],
|
|
|
|
|
stdout=subprocess.PIPE,
|
|
|
|
|
stderr=subprocess.STDOUT,
|
|
|
|
|
)
|
|
|
|
|
parent_pid: int | None = None
|
|
|
|
|
child_pid: int | None = None
|
|
|
|
|
buf: dict = {}
|
|
|
|
|
try:
|
|
|
|
|
child_pid = int(_read_marker(proc, 'CHILD_PID', 15.0, buf))
|
|
|
|
|
parent_pid = int(_read_marker(proc, 'PARENT_READY', 15.0, buf))
|
|
|
|
|
|
|
|
|
|
# sanity: both alive before we start killing stuff
|
|
|
|
|
assert _process_alive(parent_pid), (
|
|
|
|
|
f'harness parent pid={parent_pid} gone before '
|
|
|
|
|
f'SIGKILL — test premise broken'
|
|
|
|
|
)
|
|
|
|
|
assert _process_alive(child_pid), (
|
|
|
|
|
f'orphan-candidate child pid={child_pid} gone '
|
|
|
|
|
f'before test started'
|
|
|
|
|
)
|
|
|
|
|
|
|
|
|
|
# step 3: kill parent — no IPC cancel arrives at child.
|
|
|
|
|
# `proc.wait()` reaps the zombie so it truly disappears
|
|
|
|
|
# from the process table (otherwise `os.kill(pid, 0)`
|
|
|
|
|
# keeps reporting it as alive).
|
|
|
|
|
os.kill(parent_pid, signal.SIGKILL)
|
|
|
|
|
try:
|
|
|
|
|
proc.wait(timeout=3.0)
|
|
|
|
|
except subprocess.TimeoutExpired:
|
|
|
|
|
pytest.fail(
|
|
|
|
|
f'harness parent pid={parent_pid} did not die '
|
|
|
|
|
f'after SIGKILL — test premise broken'
|
|
|
|
|
)
|
|
|
|
|
assert _process_alive(child_pid), (
|
|
|
|
|
f'child pid={child_pid} died along with parent — '
|
|
|
|
|
f'did the parent reap it before SIGKILL took? '
|
|
|
|
|
f'test premise requires an orphan.'
|
|
|
|
|
)
|
|
|
|
|
|
|
|
|
|
# step 4+5: SIGINT the orphan, poll for exit.
|
|
|
|
|
os.kill(child_pid, signal.SIGINT)
|
2026-04-23 15:01:56 +00:00
|
|
|
timeout: float = 6.0
|
|
|
|
|
cleanup_deadline: float = time.monotonic() + timeout
|
Add DRAFT `subint_forkserver` orphan-SIGINT test
Tier-4 test `test_orphaned_subactor_sigint_cleanup_DRAFT`
documents an empirical SIGINT-delivery gap in the
`subint_forkserver` backend: when the parent dies via
`SIGKILL` (no IPC `Portal.cancel_actor()` possible) and
`SIGINT` is sent to the orphan child, the child DOES NOT
unwind — CPython's default `KeyboardInterrupt` is delivered
to `threading.main_thread()`, whose tstate is dead in the
post-fork child bc fork inherited the worker thread, not
main. Trio running on the fork-inherited worker thread
therefore never observes the signal. Marked
`xfail(strict=True)` so the mark flips to XPASS→fail once
the backend grows explicit SIGINT plumbing.
Deats,
- harness runs the failure-mode sequence out-of-process:
1. harness subprocess runs a fresh Python script
that calls `try_set_start_method('subint_forkserver')`
then opens a root actor + one `sleep_forever` subactor
2. parse `PARENT_READY=<pid>` + `CHILD_PID=<pid>` markers
off harness `stdout` to confirm IPC handshake
completed
3. `SIGKILL` the parent, `proc.wait()` to reap the
zombie (otherwise `os.kill(pid, 0)` keeps reporting
it alive)
4. assert the child survived the parent-reap (i.e. was
actually orphaned, not reaped too) before moving on
5. `SIGINT` the orphan child, poll `os.kill(child_pid, 0)`
every 100ms for up to 10s
- supporting helpers: `_read_marker()` with per-proc
bytes-buffer to carry partial lines across calls,
`_process_alive()` liveness probe via `kill(pid, 0)`
- Linux-only via `platform.system() != 'Linux'` skip —
orphan-reparenting semantics don't generalize to
other platforms
- port offset (`reg_addr[1] + 17`) so the harness listener
doesn't race concurrently-running backend tests
- best-effort `finally:` cleanup: `SIGKILL` any still-alive
pids + `proc.kill()` + bounded `proc.wait()` to avoid
leaking orphans across the session
Also, tier-4 header comment documents the cross-backend
generalization path: applicable to any multi-process
backend (`trio`, `mp_spawn`, `mp_forkserver`,
`subint_forkserver`), NOT to plain `subint` (in-process
subints have no orphan OS-child). Move path: lift
harness into `tests/_orphan_harness.py`, parametrize on
session `_spawn_method`, add
`skipif _spawn_method == 'subint'`.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-22 23:39:41 +00:00
|
|
|
while time.monotonic() < cleanup_deadline:
|
|
|
|
|
if not _process_alive(child_pid):
|
|
|
|
|
return # <- success path
|
|
|
|
|
time.sleep(0.1)
|
|
|
|
|
|
|
|
|
|
pytest.fail(
|
|
|
|
|
f'Orphan subactor (pid={child_pid}) did NOT exit '
|
|
|
|
|
f'within 10s of SIGINT under `subint_forkserver` '
|
|
|
|
|
f'→ trio on non-main thread did not observe the '
|
|
|
|
|
f'default CPython KeyboardInterrupt; backend needs '
|
|
|
|
|
f'explicit SIGINT plumbing.'
|
|
|
|
|
)
|
|
|
|
|
finally:
|
|
|
|
|
# best-effort cleanup to avoid leaking orphans across
|
|
|
|
|
# the test session regardless of outcome.
|
|
|
|
|
for pid in (parent_pid, child_pid):
|
|
|
|
|
if pid is None:
|
|
|
|
|
continue
|
|
|
|
|
try:
|
|
|
|
|
os.kill(pid, signal.SIGKILL)
|
|
|
|
|
except ProcessLookupError:
|
|
|
|
|
pass
|
|
|
|
|
try:
|
|
|
|
|
proc.kill()
|
|
|
|
|
except OSError:
|
|
|
|
|
pass
|
|
|
|
|
try:
|
|
|
|
|
proc.wait(timeout=2.0)
|
|
|
|
|
except subprocess.TimeoutExpired:
|
|
|
|
|
pass
|