Commit Graph

3 Commits (76605d5609580a7d033093d70818f5358d8b5635)

Author SHA1 Message Date
Gud Boi 76605d5609 Add DRAFT `subint_forkserver` orphan-SIGINT test
Tier-4 test `test_orphaned_subactor_sigint_cleanup_DRAFT`
documents an empirical SIGINT-delivery gap in the
`subint_forkserver` backend: when the parent dies via
`SIGKILL` (no IPC `Portal.cancel_actor()` possible) and
`SIGINT` is sent to the orphan child, the child DOES NOT
unwind — CPython's default `KeyboardInterrupt` is delivered
to `threading.main_thread()`, whose tstate is dead in the
post-fork child bc fork inherited the worker thread, not
main. Trio running on the fork-inherited worker thread
therefore never observes the signal. Marked
`xfail(strict=True)` so the mark flips to XPASS→fail once
the backend grows explicit SIGINT plumbing.

Deats,
- harness runs the failure-mode sequence out-of-process:
  1. harness subprocess runs a fresh Python script
     that calls `try_set_start_method('subint_forkserver')`
     then opens a root actor + one `sleep_forever` subactor
  2. parse `PARENT_READY=<pid>` + `CHILD_PID=<pid>` markers
     off harness `stdout` to confirm IPC handshake
     completed
  3. `SIGKILL` the parent, `proc.wait()` to reap the
     zombie (otherwise `os.kill(pid, 0)` keeps reporting
     it alive)
  4. assert the child survived the parent-reap (i.e. was
     actually orphaned, not reaped too) before moving on
  5. `SIGINT` the orphan child, poll `os.kill(child_pid, 0)`
     every 100ms for up to 10s
- supporting helpers: `_read_marker()` with per-proc
  bytes-buffer to carry partial lines across calls,
  `_process_alive()` liveness probe via `kill(pid, 0)`
- Linux-only via `platform.system() != 'Linux'` skip —
  orphan-reparenting semantics don't generalize to
  other platforms
- port offset (`reg_addr[1] + 17`) so the harness listener
  doesn't race concurrently-running backend tests
- best-effort `finally:` cleanup: `SIGKILL` any still-alive
  pids + `proc.kill()` + bounded `proc.wait()` to avoid
  leaking orphans across the session

Also, tier-4 header comment documents the cross-backend
generalization path: applicable to any multi-process
backend (`trio`, `mp_spawn`, `mp_forkserver`,
`subint_forkserver`), NOT to plain `subint` (in-process
subints have no orphan OS-child). Move path: lift
harness into `tests/_orphan_harness.py`, parametrize on
session `_spawn_method`, add
`skipif _spawn_method == 'subint'`.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-23 18:48:34 -04:00
Gud Boi 26914fde75 Wire `subint_forkserver` as first-class backend
Promote `_subint_forkserver` from primitives-only into a
registered spawn backend: `'subint_forkserver'` is now a
`SpawnMethodKey` literal, dispatched via `_methods` to
the new `subint_forkserver_proc()` target, feature-gated
under the existing `subint`-family py3.14+ case, and
selectable via `--spawn-backend=subint_forkserver`.

Deats,
- new `subint_forkserver_proc()` spawn target in
  `_subint_forkserver`:
  - mirrors `trio_proc()`'s supervision model — real OS
    subprocess so `Portal.cancel_actor()` + `soft_kill()`
    on graceful teardown, `os.kill(SIGKILL)` on hard-reap
    (no `_interpreters.destroy()` race to fuss over bc the
    child lives in its own process)
  - only real diff from `trio_proc` is the spawn mechanism:
    fork from a main-interp worker thread via
    `fork_from_worker_thread()` (off-loaded to trio's
    thread pool) instead of `trio.lowlevel.open_process()`
  - child-side `_child_target` closure runs
    `tractor._child._actor_child_main()` with
    `spawn_method='trio'` — the child is a regular trio
    actor, "subint_forkserver" names how the parent
    spawned, not what the child runs
- new `_ForkedProc` class — thin `trio.Process`-compatible
  shim around a raw OS pid: `.poll()` via
  `waitpid(WNOHANG)`, async `.wait()` off-loaded to a trio
  cache thread, `.kill()` via `SIGKILL`, `.returncode`
  cached for repeat calls. `.stdin`/`.stdout`/`.stderr`
  are `None` (fork-w/o-exec inherits parent FDs; we don't
  marshal them) which matches `soft_kill()`'s `is not None`
  guards

Also, new backend-tier test
`test_subint_forkserver_spawn_basic` drives the registered
backend end-to-end via `open_root_actor` + `open_nursery` +
`run_in_actor` w/ a trivial portal-RPC round-trip. Uses a
`forkserver_spawn_method` fixture to flip
`_spawn_method`/`_ctx` for the test's duration + restore on
teardown (so other session-level tests don't observe the
global flip). Test module docstring reworked to describe
the three tiers now covered: (1) primitive-level, (2)
parent-trio-driven primitives, (3) full registered backend.

Status: still-open work (tracked on `tractor#379`) doc'd
inline in the module docstring — no cancel/hard-kill stress
coverage yet, child-side subint-hosted root runtime still
future (gated on `msgspec#563`), thread-hygiene audit
pending the same unblock.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-23 18:48:34 -04:00
Gud Boi 25e400d526 Add trio-parent tests for `_subint_forkserver`
New pytest module `tests/spawn/test_subint_forkserver.py`
drives the forkserver primitives from inside a real
`trio.run()` in the parent — the runtime shape tractor will
actually use when we wire up a `subint_forkserver` spawn
backend proper. Complements the standalone no-trio-in-parent
`ai/conc-anal/subint_fork_from_main_thread_smoketest.py`.

Deats,
- new test pkg `tests/spawn/` (+ empty `__init__.py`)
- two tests, both `@pytest.mark.timeout(30, method='thread')`
  for the GIL-hostage safety reason doc'd in
  `ai/conc-anal/subint_sigint_starvation_issue.md`:
  - `test_fork_from_worker_thread_via_trio` — parent-side
    plumbing baseline. `trio.run()` off-loads forkserver
    prims via `trio.to_thread.run_sync()` + asserts the
    child reaps cleanly
  - `test_fork_and_run_trio_in_child` — end-to-end: forked
    child calls `run_subint_in_worker_thread()` with a
    bootstrap str that does `trio.run()` in a fresh subint
- both tests wrap the inner `trio.run()` in a
  `dump_on_hang()` for post-mortem if the outer
  `pytest-timeout` fires
- intentionally NOT using `--spawn-backend` — the tests
  drive the primitives directly rather than going through
  tractor's spawn-method registry (which the forkserver
  isn't plugged into yet)

Also, rename `run_trio_in_subint()` →
`run_subint_in_worker_thread()` for naming consistency with
the sibling `fork_from_worker_thread()`. The action is really
"host a subint on a worker thread", not specifically "run
trio" — trio just happens to be the typical payload.
Propagate the rename to the smoketest.

Further, add a "TODO — cleanup gated on msgspec PEP 684
support" section to the `_subint_forkserver` module
docstring: flags the dedicated-`threading.Thread` design as
potentially-revisable once isolated-mode subints are viable
in tractor. Cross-refs `msgspec#563` + `tractor#379` and
points at an audit-plan conc-anal doc we'll add next.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-23 18:48:34 -04:00