Compare commits
No commits in common. "f42dfe50a9bb679b56f4f11fc5fe43cf182c4801" and "3a45dbd5039aa4559ea38293952415332b795ac9" have entirely different histories.
f42dfe50a9
...
3a45dbd503
|
|
@ -1,102 +0,0 @@
|
||||||
# `trio` 0.29 -> 0.33 slows the depth=3 cancel-cascade
|
|
||||||
|
|
||||||
## Symptom
|
|
||||||
|
|
||||||
After locking to `trio==0.33.0` (commit `c7741bba`, was
|
|
||||||
`0.29.0`), this test reliably trips its `fail_after`
|
|
||||||
deadline on the **`trio`** backend:
|
|
||||||
|
|
||||||
```
|
|
||||||
FAILED tests/test_cancellation.py::test_nested_multierrors[start_method=trio-depth=3]
|
|
||||||
- AssertionError: assert False
|
|
||||||
where False = isinstance(
|
|
||||||
Cancelled(source='deadline', source_task=None, reason=None),
|
|
||||||
tractor.RemoteActorError,
|
|
||||||
)
|
|
||||||
```
|
|
||||||
|
|
||||||
A `fail_after_w_trace` hang-snapshot is captured for the
|
|
||||||
test each run (deadline-injected `Cancelled` wrapped into
|
|
||||||
the actor-nursery `BaseExceptionGroup`).
|
|
||||||
|
|
||||||
## Root cause (immediate)
|
|
||||||
|
|
||||||
The test budgets `fail_after(6)` for the `trio` backend.
|
|
||||||
That 6s was chosen (commit `32955db0`, while `trio==0.29`)
|
|
||||||
with the assertion that trio finishes "well under" 6s.
|
|
||||||
The `trio` 0.29 -> 0.33 bump slowed the depth=3 cascade
|
|
||||||
past that budget, so the 6s deadline now fires mid-cascade.
|
|
||||||
|
|
||||||
trio 0.33 added **cancel-reason tracking** — every
|
|
||||||
`Cancelled` now carries `(source=, reason=, source_task=)`.
|
|
||||||
The injected exc is `Cancelled(source='deadline')`, i.e.
|
|
||||||
trio itself naming our `fail_after(6)` scope as the cancel
|
|
||||||
origin. When that `Cancelled` collapses one branch of the
|
|
||||||
nursery BEG, the test's `isinstance(subexc,
|
|
||||||
RemoteActorError)` assertion fails. The healthy outcome is
|
|
||||||
`BEG = [RemoteActorError, RemoteActorError]`; the
|
|
||||||
`Cancelled` is purely an artifact of the deadline cutting
|
|
||||||
the cascade short.
|
|
||||||
|
|
||||||
## Measurements (standalone, this machine)
|
|
||||||
|
|
||||||
```
|
|
||||||
depth=1 trio ~3.15s PASS (keeps 6s budget)
|
|
||||||
depth=3 trio ~6.8-8.2s FAIL @ 6s (now bumped to 12s)
|
|
||||||
```
|
|
||||||
|
|
||||||
depth=1 still fits comfortably; only depth=3 (deeper
|
|
||||||
recursive spawn-and-error tree => more actors to reap)
|
|
||||||
exceeds the old budget. The ~2s/depth-level cost looks
|
|
||||||
like serialized per-actor reap / `terminate_after` waits.
|
|
||||||
|
|
||||||
## Mitigation applied
|
|
||||||
|
|
||||||
`test_nested_multierrors` now splits the `trio` budget:
|
|
||||||
|
|
||||||
```python
|
|
||||||
case ('trio', 1):
|
|
||||||
timeout = 6
|
|
||||||
case ('trio', 3):
|
|
||||||
timeout = 12 # was 6; see this doc
|
|
||||||
```
|
|
||||||
|
|
||||||
This stops the deadline from firing so the cascade
|
|
||||||
completes naturally to `[RAE, RAE]`.
|
|
||||||
|
|
||||||
## Also affected — same root cause, different test
|
|
||||||
|
|
||||||
`test_echoserver_detailed_mechanics[trio-raise_error=KeyboardInterrupt]`
|
|
||||||
(`tests/test_infected_asyncio.py`) tripped the *same*
|
|
||||||
slowdown via its much tighter `trio` budget of `1s`. The
|
|
||||||
single-aio-subactor teardown now takes ~1s, so the `1s`
|
|
||||||
`fail_after` raced the deadline (PASS at 0.99s / FAIL at
|
|
||||||
1.03s across back-to-back standalone runs). On a deadline-
|
|
||||||
fire the injected `Cancelled(source='deadline')` wraps the
|
|
||||||
mid-stream `KeyboardInterrupt` into a `BaseExceptionGroup`,
|
|
||||||
which is NOT a `KeyboardInterrupt` so the bare
|
|
||||||
`pytest.raises(KeyboardInterrupt)` fails. (The sibling
|
|
||||||
`raise_error=Exception` variant only "passes" by accident:
|
|
||||||
an `ExceptionGroup` *is-a* `Exception`, so its
|
|
||||||
`pytest.raises(Exception)` still matches even when wrapped.)
|
|
||||||
|
|
||||||
Mitigation: bump that `trio` budget `1 -> 4s` (matching the
|
|
||||||
forking-spawner case). Without a deadline-fire the KBI
|
|
||||||
propagates bare and the assertion passes.
|
|
||||||
|
|
||||||
## Open follow-up (the actual regression)
|
|
||||||
|
|
||||||
The budget bump is a band-aid — the underlying question is
|
|
||||||
**why** the depth=3 `trio` cancel-cascade went from <6s to
|
|
||||||
~7-8s across `trio` 0.29 -> 0.33. Candidate avenues:
|
|
||||||
|
|
||||||
- which scope owns the per-actor `terminate_after` wait,
|
|
||||||
and are the tree's reaps concurrent or serialized?
|
|
||||||
- did trio 0.33's abort/reschedule or cancel-reason
|
|
||||||
bookkeeping change checkpoint timing on the cancel path?
|
|
||||||
|
|
||||||
If/when the cascade speeds back up under-budget, depth=3
|
|
||||||
will start completing well under 12s — at which point the
|
|
||||||
budget can be tightened back toward 6s as a regression
|
|
||||||
tripwire. Related (different backend, same cascade class):
|
|
||||||
`cancel_cascade_too_slow_under_main_thread_forkserver_issue.md`.
|
|
||||||
|
|
@ -5,7 +5,6 @@
|
||||||
from __future__ import annotations
|
from __future__ import annotations
|
||||||
import platform
|
import platform
|
||||||
import os
|
import os
|
||||||
import re
|
|
||||||
import signal
|
import signal
|
||||||
import time
|
import time
|
||||||
from typing import (
|
from typing import (
|
||||||
|
|
@ -295,26 +294,6 @@ def expect(
|
||||||
PROMPT = r"\(Pdb\+\)"
|
PROMPT = r"\(Pdb\+\)"
|
||||||
|
|
||||||
|
|
||||||
# Strip terminal color / ANSI-VT100 escape sequences so
|
|
||||||
# substring matching against REPL + traceback output stays
|
|
||||||
# robust to color leakage — Python 3.13's colored tracebacks,
|
|
||||||
# `pdbp`'s pygments highlighting, etc. — even when
|
|
||||||
# `PYTHON_COLORS=0` (set in the `spawn` fixture) isn't honored
|
|
||||||
# by every renderer in the spawned subproc.
|
|
||||||
# Regex per https://stackoverflow.com/a/14693789
|
|
||||||
_ansi_re: re.Pattern = re.compile(
|
|
||||||
r'\x1B(?:[@-Z\\-_]|\[[0-?]*[ -/]*[@-~])'
|
|
||||||
)
|
|
||||||
|
|
||||||
|
|
||||||
def ansi_strip(text: str) -> str:
|
|
||||||
'''
|
|
||||||
Remove ANSI/VT100 escape sequences from `text`.
|
|
||||||
|
|
||||||
'''
|
|
||||||
return _ansi_re.sub('', text)
|
|
||||||
|
|
||||||
|
|
||||||
def in_prompt_msg(
|
def in_prompt_msg(
|
||||||
child: SpawnBase,
|
child: SpawnBase,
|
||||||
parts: list[str],
|
parts: list[str],
|
||||||
|
|
@ -334,7 +313,7 @@ def in_prompt_msg(
|
||||||
'''
|
'''
|
||||||
__tracebackhide__: bool = False
|
__tracebackhide__: bool = False
|
||||||
|
|
||||||
before: str = ansi_strip(str(child.before.decode()))
|
before: str = str(child.before.decode())
|
||||||
for part in parts:
|
for part in parts:
|
||||||
if part not in before:
|
if part not in before:
|
||||||
if pause_on_false:
|
if pause_on_false:
|
||||||
|
|
@ -354,9 +333,9 @@ def in_prompt_msg(
|
||||||
return True
|
return True
|
||||||
|
|
||||||
|
|
||||||
# NB: color-char stripping (so we can match against call-stack
|
# TODO: todo support terminal color-chars stripping so we can match
|
||||||
# frame output from the `ll` command and the like) is handled by
|
# against call stack frame output from the the 'll' command the like!
|
||||||
# `ansi_strip()` applied inside `in_prompt_msg()` + below.
|
# -[ ] SO answer for stipping ANSI codes: https://stackoverflow.com/a/14693789
|
||||||
def assert_before(
|
def assert_before(
|
||||||
child: SpawnBase,
|
child: SpawnBase,
|
||||||
patts: list[str],
|
patts: list[str],
|
||||||
|
|
@ -377,7 +356,7 @@ def assert_before(
|
||||||
err_on_false=True,
|
err_on_false=True,
|
||||||
**kwargs
|
**kwargs
|
||||||
)
|
)
|
||||||
before: str = ansi_strip(str(child.before.decode()))
|
before: str = str(child.before.decode())
|
||||||
return before
|
return before
|
||||||
|
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -1186,12 +1186,7 @@ def test_shield_pause(
|
||||||
"('cancelled_before_pause'", # actor name
|
"('cancelled_before_pause'", # actor name
|
||||||
_repl_fail_msg,
|
_repl_fail_msg,
|
||||||
"trio.Cancelled",
|
"trio.Cancelled",
|
||||||
# trio >=0.30 raises via a multi-line
|
"raise Cancelled._create()",
|
||||||
# `raise Cancelled._create(source=.., reason=..,
|
|
||||||
# source_task=..)` (cancel-reason metadata), so
|
|
||||||
# match the open-paren form only, NOT the legacy
|
|
||||||
# bare `()`.
|
|
||||||
"raise Cancelled._create(",
|
|
||||||
|
|
||||||
# we should be handling a taskc inside
|
# we should be handling a taskc inside
|
||||||
# the first `.port_mortem()` sin-shield!
|
# the first `.port_mortem()` sin-shield!
|
||||||
|
|
@ -1209,12 +1204,7 @@ def test_shield_pause(
|
||||||
"('root'", # actor name
|
"('root'", # actor name
|
||||||
_repl_fail_msg,
|
_repl_fail_msg,
|
||||||
"trio.Cancelled",
|
"trio.Cancelled",
|
||||||
# trio >=0.30 raises via a multi-line
|
"raise Cancelled._create()",
|
||||||
# `raise Cancelled._create(source=.., reason=..,
|
|
||||||
# source_task=..)` (cancel-reason metadata), so
|
|
||||||
# match the open-paren form only, NOT the legacy
|
|
||||||
# bare `()`.
|
|
||||||
"raise Cancelled._create(",
|
|
||||||
|
|
||||||
# handling a taskc inside the first unshielded
|
# handling a taskc inside the first unshielded
|
||||||
# `.port_mortem()`.
|
# `.port_mortem()`.
|
||||||
|
|
|
||||||
|
|
@ -8,9 +8,9 @@ after `Actor` construction, so any spawned sub-actor process
|
||||||
should:
|
should:
|
||||||
|
|
||||||
- have `argv[0]` (== `/proc/<pid>/cmdline`) start with
|
- have `argv[0]` (== `/proc/<pid>/cmdline`) start with
|
||||||
`<_def_prefix>[<aid.reprol()>]` (currently `_subactor[…]`)
|
`tractor[<aid.reprol()>]`
|
||||||
- have `/proc/<pid>/comm` start with `<_def_prefix>[`
|
- have `/proc/<pid>/comm` start with `tractor[` (kernel
|
||||||
(kernel truncates to ~15 bytes)
|
truncates to ~15 bytes)
|
||||||
- be detected as a tractor sub-actor by
|
- be detected as a tractor sub-actor by
|
||||||
`_is_tractor_subactor(pid)` via the cmdline marker.
|
`_is_tractor_subactor(pid)` via the cmdline marker.
|
||||||
|
|
||||||
|
|
@ -27,10 +27,7 @@ import trio
|
||||||
import tractor
|
import tractor
|
||||||
|
|
||||||
from tractor.runtime._runtime import Actor
|
from tractor.runtime._runtime import Actor
|
||||||
from tractor.devx._proctitle import (
|
from tractor.devx._proctitle import set_actor_proctitle
|
||||||
set_actor_proctitle,
|
|
||||||
_def_prefix,
|
|
||||||
)
|
|
||||||
from tractor._testing._reap import (
|
from tractor._testing._reap import (
|
||||||
_is_tractor_subactor,
|
_is_tractor_subactor,
|
||||||
_read_cmdline,
|
_read_cmdline,
|
||||||
|
|
@ -44,9 +41,8 @@ _non_linux: bool = platform.system() != 'Linux'
|
||||||
def test_set_actor_proctitle_format():
|
def test_set_actor_proctitle_format():
|
||||||
'''
|
'''
|
||||||
`set_actor_proctitle()` returns the canonical
|
`set_actor_proctitle()` returns the canonical
|
||||||
`<_def_prefix>[<aid.reprol()>]` form (currently
|
`tractor[<aid.reprol()>]` form and actually mutates
|
||||||
`_subactor[…]`) and actually mutates the running
|
the running proc's title.
|
||||||
proc's title.
|
|
||||||
|
|
||||||
'''
|
'''
|
||||||
pytest.importorskip(
|
pytest.importorskip(
|
||||||
|
|
@ -64,14 +60,12 @@ def test_set_actor_proctitle_format():
|
||||||
)
|
)
|
||||||
title: str = set_actor_proctitle(actor)
|
title: str = set_actor_proctitle(actor)
|
||||||
|
|
||||||
# canonical wrapping: `<_def_prefix>[<aid.reprol()>]`.
|
# canonical wrapping: `tractor[<aid.reprol()>]`. We
|
||||||
# We source BOTH the prefix (`_def_prefix`) and the
|
# compare against the runtime-computed `reprol()`
|
||||||
# runtime-computed `reprol()` rather than hard-coding,
|
# rather than a hard-coded value so the test stays
|
||||||
# so the test stays decoupled from the prefix shape
|
# decoupled from `Aid.reprol()`'s internal format
|
||||||
# (flipped to `_subactor` in `3a45dbd5`) AND from
|
# (currently `<name>@<pid>`, but could evolve).
|
||||||
# `Aid.reprol()`'s internal format (currently
|
expected: str = f'tractor[{actor.aid.reprol()}]'
|
||||||
# `<name>@<pid>`, but could evolve).
|
|
||||||
expected: str = f'{_def_prefix}[{actor.aid.reprol()}]'
|
|
||||||
assert title == expected
|
assert title == expected
|
||||||
# sanity: the actor's name must be in the title
|
# sanity: the actor's name must be in the title
|
||||||
# somewhere (so a future `reprol()` change that
|
# somewhere (so a future `reprol()` change that
|
||||||
|
|
@ -146,17 +140,15 @@ def test_subactor_proctitle_visible_via_proc():
|
||||||
)
|
)
|
||||||
|
|
||||||
pid, info = matched[0]
|
pid, info = matched[0]
|
||||||
# canonical proctitle prefix in cmdline (full form);
|
# canonical proctitle prefix in cmdline (full form)
|
||||||
# prefix sourced from `_def_prefix` so it tracks the
|
assert info['cmdline'].startswith('tractor[proctitle_boi@'), (
|
||||||
# `3a45dbd5` flip (`tractor[` -> `_subactor[`).
|
f'cmdline missing `tractor[proctitle_boi@…]` prefix: '
|
||||||
assert info['cmdline'].startswith(f'{_def_prefix}[proctitle_boi@'), (
|
|
||||||
f'cmdline missing `{_def_prefix}[proctitle_boi@…]` prefix: '
|
|
||||||
f'{info["cmdline"]!r}'
|
f'{info["cmdline"]!r}'
|
||||||
)
|
)
|
||||||
# comm is kernel-truncated to ~15 bytes — just check the
|
# comm is kernel-truncated to ~15 bytes — just check the
|
||||||
# `<_def_prefix>[` prefix made it.
|
# `tractor[` prefix made it.
|
||||||
assert info['comm'].startswith(f'{_def_prefix}['), (
|
assert info['comm'].startswith('tractor['), (
|
||||||
f'comm missing `{_def_prefix}[` prefix: {info["comm"]!r}'
|
f'comm missing `tractor[` prefix: {info["comm"]!r}'
|
||||||
)
|
)
|
||||||
# intrinsic-signal detector should match.
|
# intrinsic-signal detector should match.
|
||||||
assert info['is_tractor'] is True
|
assert info['is_tractor'] is True
|
||||||
|
|
|
||||||
|
|
@ -605,28 +605,17 @@ async def test_nested_multierrors(
|
||||||
)
|
)
|
||||||
)
|
)
|
||||||
|
|
||||||
# Per-backend/-depth budgets: in the non-hang case the
|
# 6s budget: in the non-hang case (and on the trio
|
||||||
# whole spawn + cancel-cascade should complete in well
|
# backend) the whole spawn + cancel-cascade should
|
||||||
# under these. On the borderline hang case the
|
# complete in well under that. On the borderline hang
|
||||||
# `fail_after_w_trace` fires `TooSlowError` AND captures a
|
# case the `fail_after_w_trace` fires `TooSlowError`
|
||||||
# ptree/wchan/py-spy snapshot to
|
# AND captures a ptree/wchan/py-spy snapshot to
|
||||||
# `$XDG_CACHE_HOME/tractor/hung-dumps/` for offline
|
# `$XDG_CACHE_HOME/tractor/hung-dumps/` for offline
|
||||||
# inspection. See
|
# inspection. See
|
||||||
# `ai/conc-anal/cancel_cascade_too_slow_under_main_thread_forkserver_issue.md`.
|
# `ai/conc-anal/cancel_cascade_too_slow_under_main_thread_forkserver_issue.md`.
|
||||||
#
|
|
||||||
# NOTE: the `trio` depth=3 budget was bumped 6 -> 12s after
|
|
||||||
# the `trio` 0.29 -> 0.33 lock bump (commit c7741bba) slowed
|
|
||||||
# the depth-3 cancel-cascade from <6s to ~7-8s; the 6s
|
|
||||||
# deadline was firing and its `Cancelled(source='deadline')`
|
|
||||||
# (trio 0.33 cancel-reason metadata) collapsed a BEG branch,
|
|
||||||
# breaking the `RemoteActorError` assertion below. depth=1
|
|
||||||
# still finishes in ~3s so keeps the 6s budget. See
|
|
||||||
# `ai/conc-anal/trio_033_cancel_cascade_slowdown_depth3_issue.md`.
|
|
||||||
match (start_method, depth):
|
match (start_method, depth):
|
||||||
case ('trio', 1):
|
case ('trio', _):
|
||||||
timeout = 6
|
timeout = 6
|
||||||
case ('trio', 3):
|
|
||||||
timeout = 12
|
|
||||||
case ('main_thread_forkserver', 1):
|
case ('main_thread_forkserver', 1):
|
||||||
timeout = 16
|
timeout = 16
|
||||||
case ('main_thread_forkserver', 3):
|
case ('main_thread_forkserver', 3):
|
||||||
|
|
|
||||||
|
|
@ -860,14 +860,7 @@ def test_echoserver_detailed_mechanics(
|
||||||
timeout: float = (
|
timeout: float = (
|
||||||
999 if tractor.debug_mode()
|
999 if tractor.debug_mode()
|
||||||
else 4 if is_forking_spawner
|
else 4 if is_forking_spawner
|
||||||
# was 1; the `trio` 0.29 -> 0.33 bump slowed the
|
else 1
|
||||||
# cancel-cascade so a 1s budget raced the ~1s teardown
|
|
||||||
# deadline. On a deadline-fire the injected
|
|
||||||
# `Cancelled(source='deadline')` wraps the mid-stream
|
|
||||||
# KBI in a `BaseExceptionGroup`, breaking the bare
|
|
||||||
# `pytest.raises(KeyboardInterrupt)` below. See
|
|
||||||
# `ai/conc-anal/trio_033_cancel_cascade_slowdown_depth3_issue.md`.
|
|
||||||
else 4
|
|
||||||
)
|
)
|
||||||
|
|
||||||
# body factored out so the `fail_after_w_trace`-wrapping
|
# body factored out so the `fail_after_w_trace`-wrapping
|
||||||
|
|
|
||||||
|
|
@ -24,10 +24,7 @@ which" at a glance without needing to read full
|
||||||
`/proc/<pid>/cmdline`.
|
`/proc/<pid>/cmdline`.
|
||||||
|
|
||||||
Format:
|
Format:
|
||||||
``<_def_prefix>[<aid.reprol()>]`` e.g. ``_subactor[doggy@1027301b]``
|
``tractor[<aid.reprol()>]`` e.g. ``tractor[doggy@1027301b]``
|
||||||
(prefix from the `_def_prefix` const, flipped `tractor` ->
|
|
||||||
`_subactor` so sub-actor procs are visually distinct from the
|
|
||||||
root in `ps`/`htop` and the reap-recognition markers.)
|
|
||||||
|
|
||||||
Uses the canonical `Aid.reprol()` form
|
Uses the canonical `Aid.reprol()` form
|
||||||
(``<name>@<uuid_short>``) so the proc-title matches the
|
(``<name>@<uuid_short>``) so the proc-title matches the
|
||||||
|
|
|
||||||
Loading…
Reference in New Issue