tractor/ai/prompt-io/claude/20260422T200723Z_797f57c_pr...

156 lines
6.5 KiB
Markdown
Raw Normal View History

Add CPython-level `subint_fork` workaround smoketest Standalone script to validate the "main-interp worker-thread forkserver + subint-hosted trio" arch proposed as a workaround to the CPython-level refusal doc'd in `ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md`. Deliberately NOT a `tractor` test — zero `tractor` imports. Uses `_interpreters` (private stdlib) + `os.fork()` directly so pass/fail is a property of CPython alone, independent of our runtime. Requires py3.14+. Deats, - four scenarios via `--scenario`: - `control_subint_thread_fork` — the KNOWN-BROKEN case as a harness sanity; if the child DOESN'T abort, our analysis is wrong - `main_thread_fork` — baseline sanity, must always succeed - `worker_thread_fork` — architectural assertion: regular `threading.Thread` attached to main interp calls `os.fork()`; child should survive post-fork cleanup - `full_architecture` — end-to-end: fork from a main-interp worker thread, then in child create a subint driving a worker thread running `trio.run()` - exit code 0 on EXPECTED outcome (for `control_*` that means "child aborted", not "child succeeded") - each scenario prints a self-contained pass/fail banner; use `os.waitpid()` of the parent + per-scenario status prints to observe the child's fate Also, log NLNet provenance for this session's three-sub-phase work (py3.13 gate tightening, `pytest-timeout` + marker refactor, `subint_fork` prototype → CPython-block finding). Prompt-IO: ai/prompt-io/claude/20260422T200723Z_797f57c_prompt_io.md (this patch was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-22 20:40:52 +00:00
---
model: claude-opus-4-7[1m]
service: claude
session: subints-phase-b-hardening-and-fork-block
timestamp: 2026-04-22T20:07:23Z
git_ref: 797f57c
scope: code
substantive: true
raw_file: 20260422T200723Z_797f57c_prompt_io.raw.md
---
## Prompt
Session-spanning work on the Phase B `subint` spawn-backend.
Three distinct sub-phases in one log:
1. **Py3.13 gate tightening** — diagnose a reproducible hang
of subint spawn flow under py3.13 (works on py3.14), trace
to a private `_interpreters` module vintage issue, tighten
our feature gate from "`_interpreters` present" to "public
`concurrent.interpreters` present" (i.e. py3.14+).
2. **Test-harness hardening** — add `pytest-timeout` dep, put
`@pytest.mark.timeout(30, method='thread')` on the
three known-hanging subint tests cataloged in
`ai/conc-anal/subint_sigint_starvation_issue.md`. Separately,
code-review the user's in-flight `skipon_spawn_backend`
marker implementation; find four bugs; refactor to use
`item.iter_markers()`.
3. **`subint_fork` prototype → CPython-block finding** — draft
a WIP `subint_fork_proc` backend using a sub-interpreter as
a launchpad for `os.fork()` (to sidestep trio#1614). User
tests on py3.14, hits
`Fatal Python error: _PyInterpreterState_DeleteExceptMain:
not main interpreter`. Walk CPython sources (local clone at
`~/repos/cpython/`) to pinpoint the refusal
(`Modules/posixmodule.c:728` → `Python/pystate.c:1040`).
Revert implementation to a `NotImplementedError` stub in a
new `_subint_fork.py` submodule, document the finding in a
third `conc-anal/` doc with an upstream-report draft for
the CPython issue tracker. Finally, discuss user's proposed
workaround architecture (main-interp worker-thread
forkserver) and draft a standalone smoke-test script for
feasibility validation.
## Response summary
All three sub-phases landed concrete artifacts:
**Sub-phase 1** — `_subint.py` + `_spawn.py` gates + error
messages updated to require py3.14+ via the public
`concurrent.interpreters` module presence check. Module
docstring revised to explain the empirical reason
(py3.13's private `_interpreters` vintage wedges under
multi-trio-task usage even though minimal standalone
reproducers work fine there). Test-module
`pytest.importorskip` likewise switched.
**Sub-phase 2** — `pytest-timeout>=2.3` added to `testing`
dep group. `@pytest.mark.timeout(30, method='thread')`
applied on:
- `tests/discovery/test_registrar.py::test_stale_entry_is_deleted`
- `tests/test_cancellation.py::test_cancel_while_childs_child_in_sync_sleep`
- `tests/test_cancellation.py::test_multierror_fast_nursery`
- `tests/test_subint_cancellation.py::test_subint_non_checkpointing_child`
`method='thread'` documented inline as load-bearing — the
GIL-starvation path that drops `SIGINT` would equally drop
`SIGALRM`, so only a watchdog-thread timeout can reliably
escape.
`skipon_spawn_backend` plugin refactored into a single
`iter_markers`-driven loop in `pytest_collection_modifyitems`
(~30 LOC replacing ~30 LOC of nested conditionals). Four
bugs dissolved: wrong `.get()` key, module-level `pytestmark`
suppressing per-test marks, unhandled `pytestmark = [list]`
form, `pytest.Makr` typo. Marker help text updated to
document the variadic backend-list + `reason=` kwarg
surface.
**Sub-phase 3** — Prototype drafted (then reverted):
- `tractor/spawn/_subint_fork.py` — new dedicated submodule
housing the `subint_fork_proc` stub. Module docstring +
fn docstring explain the attempt, the CPython-level
block, and the reason for keeping the stub in-tree
(documentation of the attempt + starting point if CPython
ever lifts the restriction).
- `tractor/spawn/_spawn.py``'subint_fork'` registered as a
`SpawnMethodKey` literal + in `_methods`, so
`--spawn-backend=subint_fork` routes to a clean
`NotImplementedError` pointing at the analysis doc rather
than an "invalid backend" error.
- `ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md`
third sibling conc-anal doc. Full annotated CPython
source walkthrough from user-visible
`Fatal Python error` → `Modules/posixmodule.c:728
PyOS_AfterFork_Child()` → `Python/pystate.c:1040
_PyInterpreterState_DeleteExceptMain()` gate. Includes a
copy-paste-ready upstream-report draft for the CPython
issue tracker with a two-tier ask (ideally "make it work",
minimally "cleaner error than `Fatal Python error`
aborting the child").
- `ai/conc-anal/subint_fork_from_main_thread_smoketest.py`
standalone zero-tractor-import CPython-level smoke test
for the user's proposed workaround architecture
(forkserver on a main-interp worker thread). Four
argparse-driven scenarios: `control_subint_thread_fork`
(reproduces the known-broken case as a test-harness
sanity), `main_thread_fork` (baseline), `worker_thread_fork`
(architectural assertion), `full_architecture`
(end-to-end trio-in-subint in forked child). User will
run on py3.14 next.
## Files changed
See `git log 26fb820..HEAD --stat` for the canonical list.
New files this session:
- `tractor/spawn/_subint_fork.py`
- `ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md`
- `ai/conc-anal/subint_fork_from_main_thread_smoketest.py`
Modified (diff pointers in raw log):
- `tractor/spawn/_subint.py` (py3.14 gate)
- `tractor/spawn/_spawn.py` (`subint_fork` registration)
- `tractor/_testing/pytest.py` (`skipon_spawn_backend` refactor)
- `pyproject.toml` (`pytest-timeout` dep)
- `tests/discovery/test_registrar.py`,
`tests/test_cancellation.py`,
`tests/test_subint_cancellation.py` (timeout marks,
cross-refs to conc-anal docs)
## Human edits
Several back-and-forth iterations with user-driven
adjustments during the session:
- User corrected my initial mis-classification of
`test_cancel_while_childs_child_in_sync_sleep[subint-False]`
as Ctrl-C-able — second strace showed `EAGAIN`, putting
it squarely in class A (GIL-starvation). Re-analysis
preserved in the raw log.
- User independently fixed the `.get(reason)``.get('reason', reason)`
bug in the marker plugin before my review; preserved their
fix.
- User suggested moving the `subint_fork_proc` stub from
the bottom of `_subint.py` into its own
`_subint_fork.py` submodule — applied.
- User asked to keep the forkserver-architecture
discussion as background for the smoke-test rather than
committing to a tractor-side refactor until the smoke
test validates the CPython-level assumptions.
Commit messages in this range (b025c982 … 797f57c) were
drafted via `/commit-msg` + `rewrap.py --width 67`; user
landed them with the usual review.