---
model: claude-opus-4-7[1m]
service: claude
session: subint-phase-b-hang-classification
timestamp: 2026-04-20T19:27:39Z
git_ref: HEAD (pre-commit; on branch `subint_spawner_backend`)
scope: docs
substantive: true
raw_file: 20260420T192739Z_5e8cd8b2_prompt_io.raw.md
---

## Prompt

Collab between user (`goodboy`) and `claude` to
disambiguate two distinct hang modes hit during
Phase B subint-spawn-backend bringup (issue #379).
The user ran the failing suites, captured `strace`
evidence on hung pytest pids, and set the framing:

> "These cannot be the same bug. Different fix
> paths. Write them up separately or we'll keep
> conflating them."

Follow-on asks:
- Cross-link each doc from its reproducer test so
  a future triager lands on the analysis without
  needing to grep `ai/conc-anal/`.
- On `test_stale_entry_is_deleted`: arm
  `dump_on_hang(seconds=20, path=...)` so a
  regression captures a stack dump (pytest stderr
  capture otherwise eats `faulthandler` output).
- Keep `test_stale_entry_is_deleted` un-`skip`ped
  so the dump file remains inspectable.

Defer the actual fix for the class-2 hang
(Ctrl-C-able, tractor-side bug) to a later commit
per the incremental-commits preference.

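For readers unfamiliar with the `dump_on_hang(seconds=20, path=...)`
ask above, here is a minimal sketch of what such a helper could look
like. It is a hypothetical re-implementation built on the stdlib
`faulthandler` module; the real helper's signature and behavior may
differ, and only the `seconds` and `path` parameter names come from
this log.

```python
import faulthandler
from contextlib import contextmanager


@contextmanager
def dump_on_hang(seconds: float = 20.0, *, path: str):
    """Hypothetical sketch: dump all thread stacks to `path` if the
    wrapped block is still running after `seconds`."""
    with open(path, "w") as dump_file:
        # Arm a watchdog: on timeout, every thread's traceback is
        # written straight to the file descriptor behind `dump_file`,
        # so pytest's stderr capture never sees (or swallows) it.
        faulthandler.dump_traceback_later(
            seconds, repeat=False, file=dump_file, exit=False
        )
        try:
            yield path
        finally:
            # Disarm on exit so a healthy run leaves an empty file.
            faulthandler.cancel_dump_traceback_later()
```

Under these assumptions a test would wrap its entry point, e.g.
`with dump_on_hang(seconds=20, path=some_tmp_file): trio.run(main)`,
where `some_tmp_file` is a placeholder for a per-test path.
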
## Response summary

Wrote two new `ai/conc-anal/` classification docs
and cross-linked them from the reproducer tests:

1. **`subint_sigint_starvation_issue.md`**
   (class 1): abandoned-legacy-subint thread +
   shared GIL starves main trio loop →
   signal-wakeup-fd pipe fills → SIGINT silently
   dropped (`write() = EAGAIN`). Pytest process
   un-Ctrl-C-able. Structurally a CPython limit;
   blocked on `msgspec` PEP 684 support
   (jcrist/msgspec#563). Reproducer:
   `test_stale_entry_is_deleted[subint]`.

2. **`subint_cancel_delivery_hang_issue.md`**
   (class 2): parent-side trio task parks on an
   orphaned IPC channel after subint teardown;
   no clean EOF delivered to waiting receiver.
   Ctrl-C-able (main trio loop iterating fine).
   OUR bug to fix. Candidate fix: explicit
   parent-side channel abort in `subint_proc`'s
   hard-kill teardown. Reproducer:
   `test_subint_non_checkpointing_child`.

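The `write() = EAGAIN` step in class 1 can be illustrated in
isolation. The sketch below is an assumption-laden stand-in: a
non-blocking `socket.socketpair()` plays the role of the signal
wakeup fd, and nothing ever drains the read side, mimicking a starved
event loop. It involves no tractor, trio, or subinterpreter code, and
`fill_wakeup_channel` is a name invented for this example.

```python
import signal
import socket


def fill_wakeup_channel() -> int:
    """Write one byte per simulated signal into a non-blocking
    socketpair that nobody reads; return how many writes succeeded
    before the kernel buffer filled up."""
    reader, writer = socket.socketpair()
    writer.setblocking(False)
    wakeup_byte = bytes([signal.SIGINT])  # the signal number as one byte
    delivered = 0
    try:
        while True:
            try:
                delivered += writer.send(wakeup_byte)
            except BlockingIOError:
                # errno EAGAIN / EWOULDBLOCK: the buffer is full, so
                # this notification (and every later one) is lost
                # until something drains `reader`.
                return delivered
    finally:
        reader.close()
        writer.close()


if __name__ == "__main__":
    print("writes accepted before EAGAIN:", fill_wakeup_channel())
```

The count is platform-dependent; the point is only that once the
reader stops draining, further notifications fail without blocking
and without any visible error in the writer.
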
Test-side cross-links:
- `tests/discovery/test_registrar.py`:
  `test_stale_entry_is_deleted` → `trio.run(main)`
  wrapped in `dump_on_hang(seconds=20,
  path=<per-method-tmp>)`; long inline comment
  summarizes `strace` evidence + root-cause chain
  and points at both docs.
- `tests/test_subint_cancellation.py`:
  `test_subint_non_checkpointing_child` docstring
  extended with "KNOWN ISSUE (Ctrl-C-able hang)"
  section pointing at the class-2 doc + noting
  the class-1 doc is NOT what this test hits.

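For contrast with class 1, the class-2 shape (a receiver that never
sees EOF because the peer end is orphaned rather than closed) can be
sketched with plain stdlib sockets. This is only an analogue of the
"explicit parent-side channel abort" idea, not tractor's IPC layer or
the actual `subint_proc` teardown; `recv_state` and `demo` are names
invented here, and the timeout stands in for a task that would
otherwise park indefinitely.

```python
import socket


def recv_state(sock: socket.socket, timeout: float = 0.2) -> str:
    """Classify one receive attempt as 'parked', 'eof', or 'data'."""
    sock.settimeout(timeout)
    try:
        data = sock.recv(1)
    except socket.timeout:
        return "parked"
    return "eof" if data == b"" else "data"


def demo() -> tuple[str, str]:
    parent_end, child_end = socket.socketpair()
    try:
        # The "child" is gone but its end was never closed, so the
        # parent's receive neither gets data nor EOF.
        before = recv_state(parent_end)
        # Explicit parent-side abort: shut down the parent's own read
        # half so the pending receive resolves instead of waiting on
        # a peer that will never speak again.
        parent_end.shutdown(socket.SHUT_RD)
        after = recv_state(parent_end)
        return before, after
    finally:
        parent_end.close()
        child_end.close()


if __name__ == "__main__":
    print(demo())
```

The design point is that the parent does not depend on the torn-down
peer to signal end-of-stream; it ends the wait from its own side.
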
## Files changed

- `ai/conc-anal/subint_sigint_starvation_issue.md`:
  new, 205 LOC
- `ai/conc-anal/subint_cancel_delivery_hang_issue.md`:
  new, 161 LOC
- `tests/discovery/test_registrar.py`: +52/-1
  (arm `dump_on_hang`, inline-comment cross-link)
- `tests/test_subint_cancellation.py`: +26
  (docstring "KNOWN ISSUE" block)

## Human edits

Substantive collab; prose was jointly iterated:

- User framed the two-doc split, set the
  classification criteria (Ctrl-C-able vs not),
  and provided the `strace` evidence.
- User decided to keep `test_stale_entry_is_deleted`
  un-`skip`ped (my initial suggestion was
  `pytestmark.skipif(spawn_backend=='subint')`).
- User chose the candidate fix ordering for
  class 2 and marked "explicit parent-side channel
  abort" as the surgical preferred fix.
- User picked the file naming convention
  (`subint_<hang-shape>_issue.md`) over my initial
  `hang_class_{1,2}.md`.
- Assistant drafted the prose, aggregated
  prior-session root-cause findings from Phase
  B.2/B.3 bringup, and wrote the test-side
  cross-linking comments.

No further mechanical edits expected before
commit; user may still rewrap via
`scripts/rewrap.py` if preferred.