Compare commits


109 Commits

Author SHA1 Message Date
Tyler Goodlet b8019f90ec Add timeout around inf-streamer suite
Since with the new actorc injection it seems to be hanging?
Not sure what exactly the issue is but likely races again
during teardown between the `.run_in_actor()` remote-exc capture
and any actorc after the `portal.cancel()`..

Also tossed in a bp to figure out why actorcs aren't actually showing
outside the `trio.run()`..?
2025-08-20 13:00:46 -04:00
Tyler Goodlet f6ba50979b Adjust nested-subs debug test for tbs output
Such that we don't require every single src/relay_uid in the final
output but instead at some point in the pre-output of some prompt.
Added some comments to match each actor sub-layer.
2025-08-20 13:00:46 -04:00
Tyler Goodlet b3d348ee6a WIP, actor-nursery non-graceful-cancel raises EG
Attempting a rework of the post-cancellation "raising semantics" such
that subactors which are `ActorCancelled` as a result of a non-graceful
in-scope error, are acked via a re-raised
`ExceptionGroup[ActorCancelled*N, Exception]`
*outside the an-block*. Eventually, the idea is to have `ActorCancelled`
be relayed from each subactor in response to any
`Actor.cancel()/Portal.cancel_actor()` request much like
`Context.cancel()/ContextCancelled`.

This is a WIP bc it does break a few tests and requires related
`_spawn`-mod-machinery changes to match some of which I'm not yet sure
are required; need to dig into the details of the currently failing
suites first.

`._supervise` patch deats,
- add `ActorNursery.maybe_error` which delivers the maybe-EG or
  `._scope_error` depending on whether `.errors` (now `._errors`, a mapping
  from `Aid`-keys) has entries set for subs.
- raise ^ if non-null in a new outer-`finally` in
  `_open_and_supervise_one_cancels_all_nursery()`; an "outer" block is
  added to ensure all sub-actor-excs are emitted/captured as part of
  `ActorNursery.cancel()` being called (as prior) as well as the
  `da_nursery` being explicitly cancelled alongside it (to unblock the
  tn-block, but still not sure why this is necessary yet?..).
- (now masked) tried injecting actorcs from `.cancel()` loop, but (again
  per more explanation in section below) seems to be suffering a race
  issue with RAE relay?
- left in buncha notes obvi for all this..

`._spawn` patch deats,
- as above, expect `errors: dict` to map from `Aid`-keys.
- pass `errors: dict` into `soft_kill()` since it seemed like we'd want
  to (for now) inject `ActorCancelled` in some cases (but now i'm not
  sure XD).
- tried out a couple spots (which are now masked) to inject
  `ActorCancelled` after calling `Portal.cancel()` in various
  subactor-supervision routines whenever an RAE is not set..
  - oddly seems to be overwriting actual errors (likely due to racing
    with RAE receive and/or actorc-request timeout?) despite the guard
    logic..which clearly doesn't resolve the issue..
- buncha `tn`-style renaming.
2025-08-20 13:00:46 -04:00
Tyler Goodlet d432e2e245 Add todo for `tn` to `gather_contexts()` from `find_actor()`? 2025-08-20 13:00:46 -04:00
Tyler Goodlet ab013e3069 Use `an` var name in nested subactor debugging ex. 2025-08-20 13:00:46 -04:00
Tyler Goodlet 6344f9cdb7 TOSQUASH 313ad93: yeah dun use `._message` as tb-str.. 2025-08-20 13:00:46 -04:00
Tyler Goodlet 82b5bd52c8 Add an `actorc` test-driven-dev suite
Defining how an actor-nursery should emit an eg based on non-graceful
cancellation in a new `test_actor_nursery` module. Obviously fails atm
until the implementation is completed.
2025-08-20 13:00:46 -04:00
Tyler Goodlet dc806b8aba Add `ActorCancelled` as a runtime-wide-signal
As in a layer "above" a KBI/SIGINT but "below" a `ContextCancelled` and
generally signalling an interrupt which requests cancellation of the
actor's `trio.run()`.

Impl deats,
- mk the new exc type inherit from our ctxc (for now) but overriding the
  `.canceller` impl to,
  * pull from the `RemoteActorError._extra_msgdata: dict` when no
    `._ipc_msg` is set (which is always to start, until we incorporate
    a new `CancelActor` msg type).
  * not allow a `None` value since we should key-error if not set per
    prev bullet.
- Mk adjustments (related) to parent `RemoteActorError.pformat()` to
  accommodate showing the `.canceller` field in repr output,
  * change `.relay_uid` to not crash when `._ipc_msg` is unset.
  * support `.msg.types.Aid` and use its `.reprol()` from `._mk_fields_str()`.
  * always call `._mk_fields_str()`, not just when `tb_str` is provided,
    and for now use any `._message` in-place of a `tb_str` when
    undefined.
2025-08-20 13:00:46 -04:00
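To make the `.canceller` override above concrete, here is a hypothetical, trimmed-down hierarchy. The constructor signatures and the `('root', 'abc123')` canceller tuple are illustrative assumptions, not tractor's real `RemoteActorError` API:

```python
from __future__ import annotations


class RemoteActorError(Exception):
    '''Hypothetical, simplified remote-error "box".'''

    def __init__(
        self,
        message: str,
        ipc_msg: dict | None = None,
        **extra_msgdata,
    ) -> None:
        super().__init__(message)
        self._ipc_msg = ipc_msg
        # extra fields relayed alongside the error (no IPC msg needed)
        self._extra_msgdata: dict = extra_msgdata


class ContextCancelled(RemoteActorError):
    @property
    def canceller(self) -> tuple[str, str] | None:
        # ctxc: the canceller may legitimately be unknown (`None`)
        return (self._ipc_msg or {}).get('canceller')


class ActorCancelled(ContextCancelled):
    @property
    def canceller(self) -> tuple[str, str]:
        # no `._ipc_msg` exists yet (until a dedicated cancel msg type
        # lands), so read from the extra msgdata; a missing entry
        # raises `KeyError` instead of silently returning `None`.
        if self._ipc_msg is None:
            return self._extra_msgdata['canceller']
        return self._ipc_msg['canceller']


actorc = ActorCancelled('cancelled by parent', canceller=('root', 'abc123'))
```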
Tyler Goodlet 5ab642bdf0 Drop more `typing.Optional` usage 2025-08-20 12:45:49 -04:00
Tyler Goodlet ed18ecd064 Drop `tn` arg to `maybe_raise_from_masking_exc()` in `._rpc` 2025-08-20 12:45:49 -04:00
Tyler Goodlet cec0282953 Add `never_warn_on: dict` support to unmasker
Such that key->value pairs can be defined which should *never be*
unmasked, where,
- the keys are exc-types which might be masked, and
- the values are exc-types which masked the equivalent key.

For example, the default includes:
- KBI->taskc: a kbi should never be unmasked from its masking
  `trio.Cancelled`.

For the impl, a new `do_warn: bool` in the fn-body determines the
primary guard for whether a warning or re-raising is necessary.
2025-08-20 12:45:49 -04:00
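A minimal sketch of the `never_warn_on` lookup and the `do_warn` guard from the commit above. `should_unmask()` is a hypothetical helper and `Cancelled` is a local stand-in for `trio.Cancelled` so the snippet runs without `trio`:

```python
from __future__ import annotations


class Cancelled(BaseException):
    '''Stand-in for `trio.Cancelled` so this runs without `trio`.'''


ExcType = type[BaseException]

# masked-exc-type -> masking-exc-type pairs which should *never* be
# unmasked; mirrors the KBI->taskc default described above.
DEFAULT_NEVER_WARN_ON: dict[ExcType, ExcType] = {
    KeyboardInterrupt: Cancelled,
}


def should_unmask(
    masked: BaseException,
    masker: BaseException,
    never_warn_on: dict[ExcType, ExcType] = DEFAULT_NEVER_WARN_ON,
) -> bool:
    '''Return whether `masked` should be warned about/re-raised.'''
    do_warn: bool = True
    for masked_type, masker_type in never_warn_on.items():
        if isinstance(masked, masked_type) and isinstance(masker, masker_type):
            do_warn = False
            break
    return do_warn
```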
Tyler Goodlet 25c5847f2e Drop `tn` input from `maybe_raise_from_masking_exc()`
Including all caller usage throughout. Moving to a non-`except*` impl
means it's never needed as a signal from the caller - we can just catch
the beg outright (like we should have always been doing)..
2025-08-20 12:45:49 -04:00
Tyler Goodlet ba793fadd9 Pass `tuple` from `._invoke()` unmasker usage
To match the `maybe_raise_from_masking_exc()` sig change.
2025-08-20 12:45:49 -04:00
Tyler Goodlet d17864a432 Adjust test suites to new `maybe_raise_from_masking_exc()` changes 2025-08-20 12:45:49 -04:00
Tyler Goodlet 6c361a9564 Drop `except*` usage from `._taskc` unmasker
That is from `maybe_raise_from_masking_exc()` thus minimizing us to
a single `except BaseException` block with logic branching for the beg
vs. `unmask_from` exc cases.

Also,
- raise val-err when `unmask_from` is not a `tuple`.
- tweak the exc-note warning format.
- drop all pausing from dev work.
2025-08-20 12:45:49 -04:00
Tyler Goodlet 34ca7429c7 Add a "real-world" example of cancelled-masking with `.aclose()` 2025-08-20 12:45:49 -04:00
Bd c9a55c2d46
Merge pull request #397 from goodboy/post_mortems
Fix root-actor crash handling despite runtime cancellation
2025-08-20 12:45:06 -04:00
Tyler Goodlet 548855b4f5 Comment/docs tweaks per copilot review
Add a micro glossary to clarify questioned terms and refine out some
patch specific comment regions.
2025-08-20 12:36:08 -04:00
Tyler Goodlet 5322861d6d Clean out old-commented tn-opens and ipc-server settings checks 2025-08-20 11:35:31 -04:00
Tyler Goodlet 46a2fa7074 Always pass a `tn` to `._server._serve_ipc_eps()`
Turns out we weren't despite the optional `stream_handler_nursery` input
to `Server.listen_on()`; fail over to the `Server._stream_handler_tn`
allocated during server setup in those cases.
2025-08-20 11:30:58 -04:00
Tyler Goodlet bfe5b2dde6 Hide `collapse_eg()` frame as used from `open_root_actor()` 2025-08-20 10:44:42 -04:00
Tyler Goodlet a9f06df3fb Heh, add back `Actor._root_tn`, it has purpose..
Turns out I didn't read my own internals docs/comments and despite it
not being used previously, this adds the real use case: a root,
per-actor, scope which ensures parent comms are the last conc-thing to
be cancelled.

Also, the impl changes here make the test from 6410e45 (or wtv
it's rebased to) pass, i.e. we can support crash handling in the root
actor despite the root-tn having been (self) cancelled.

Superficial adjustments,
- rename `Actor._service_n` -> `._service_tn` everywhere.
- add asserts to `._runtime.async_main()` which ensure that any
  `.trionics.maybe_open_nursery()` calls against optionally passed
  `._[root/service]_tn` are allocated-if-not-provided (the
  `._service_tn`-case being an i-guess-prep-for-the-future-anti-pattern
  Bp).
- obvi adjust all internal usage to match new naming.

Serious/real-use-case changes,
- add (back) an `Actor._root_tn` which sits a scope "above" the
  service-tn and is either,
  + assigned in `._runtime.async_main()` for sub-actors OR,
  + assigned in `._root.open_root_actor()` for the root actor.
  **THE primary reason** to keep this "upper" tn is that during
  a full-`Actor`-cancellation condition (more details below) we want to
  ensure that the IPC connection with a sub-actor's parent is **the last
  thing to be cancelled**; this is most simply implemented by ensuring
  that the `Actor._parent_chan: .ipc.Channel` is handled in an upper
  scope in `_rpc.process_messages()`-subtask-terms.
- for the root actor this `root_tn` is allocated in `.open_root_actor()`
  body and assigned as such.
- extend `Actor.cancel_soon()` to be cohesive with this entire teardown
  "policy" by scheduling a task in the `._root_tn` which,
  * waits for the `._service_tn` to complete and then,
  * cancels the `._root_tn.cancel_scope`,
  * includes "sclangy" console logging throughout.
2025-08-20 10:18:52 -04:00
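The teardown ordering described above (service layer first, parent-channel layer last) is sketched below as an `asyncio` analogue. tractor itself uses `trio` nurseries and cancel scopes; the task names here are hypothetical and only the ordering is the point:

```python
import asyncio


async def teardown_order_sketch() -> list[str]:
    events: list[str] = []

    async def parent_chan_loop() -> None:
        # stand-in for handling `Actor._parent_chan` in the root layer
        try:
            await asyncio.Event().wait()  # block "forever"
        except asyncio.CancelledError:
            events.append('parent-chan cancelled')
            raise

    async def service_task(n: int) -> None:
        try:
            await asyncio.Event().wait()
        except asyncio.CancelledError:
            events.append(f'service-{n} cancelled')
            raise

    root_task = asyncio.create_task(parent_chan_loop())
    service = [asyncio.create_task(service_task(n)) for n in range(2)]
    await asyncio.sleep(0)  # let every task start waiting

    # 1. cancel the service layer and wait for it to fully complete,
    for t in service:
        t.cancel()
    await asyncio.gather(*service, return_exceptions=True)

    # 2. only then cancel the root layer holding the parent channel.
    root_task.cancel()
    await asyncio.gather(root_task, return_exceptions=True)
    return events


events = asyncio.run(teardown_order_sketch())
```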
Tyler Goodlet ee32bc433c Add a root-already-cancelled crash handling test
Such that we audit the `shield=root_tn.cancel_scope.cancel_called,`
passed to `await debug._maybe_enter_pm()` in the `open_root_actor()`
exit handler block.
2025-08-20 10:18:52 -04:00
Tyler Goodlet 561954594e Add attempt at non-root-parent REPL guarding
I masked it bc it doesn't seem to actually work for the case I was
testing (`emsd` clobbering a `paperboi` in `piker`..) but figured I'd
leave it as a reminder for solving this problem more generally (#320)
since this is likely the place in the code for a soln.

When i tested it in my case it just resulted in a hang around the `with
debug.acquire_debug_lock()` for some reason? Can't remember if the child
ended up being able to REPL without issue though..
2025-08-19 14:15:14 -04:00
Tyler Goodlet 28a6354e81 Set `shield` when `.cancel_called` for root crashes
Such that we handle them despite a cancellation condition. This is
almost always the case, that `root_tn.cancel_scope.cancel_called` is
set, by the time the `debug._maybe_enter_pm()` hits. Previously I guess we
just weren't actually ever REPL-debugging such cases?

TODO, still needs a test obvi!
2025-08-19 14:14:38 -04:00
Tyler Goodlet d1599449e7 Mk `pause_from_sync()` raise `InternalError` on no `greenback` init 2025-08-19 14:14:27 -04:00
Tyler Goodlet 2d27c94dec Hide `_maybe_enter_pm()` frame (again?) 2025-08-19 14:14:27 -04:00
Tyler Goodlet 6e4c76245b Add LoC pattern matches for `test_post_mortem_api` 2025-08-19 14:14:27 -04:00
Bd a6f599901c
Merge pull request #395 from goodboy/to_asyncio_eoc_signal
`to_asyncio` eoc signal: use `trio.EndOfChannel` to indicate (maybe non-graceful) `asyncio.Task` termination
2025-08-19 12:45:23 -04:00
Tyler Goodlet 0fafd25f0d Comment tweaks per copilot review 2025-08-19 12:33:47 -04:00
Tyler Goodlet b74e93ee55 Change one infected-aio test to use `chan` in fn sig 2025-08-18 22:32:51 -04:00
Tyler Goodlet 961504b657 Support `chan.started_nowait()` in `.open_channel_from()` target
That is the `target` can declare a `chan: LinkedTaskChannel` instead of
`to_trio`/`from_aio`.

To support it,
- change `.started()` -> the more appropriate `.started_nowait()` which
  can be called sync from the aio child task.
- adjust the `provide_channels` assert to accept either fn sig
  declaration (for now).

Still needs test(s) obvi..
2025-08-18 22:32:51 -04:00
Tyler Goodlet bd148300c5 Relay `asyncio` errors via EoC and raise from rent
Makes the newly added `test_aio_side_raises_before_started` test pass by
ensuring errors raised by any `.to_asyncio.open_channel_from()` spawned
child-`asyncio.Task` are relayed by any caught `trio.EndOfChannel` by
checking for a new `LinkedTaskChannel._closed_by_aio_task: bool`.

Impl deats,
- obvi add `LinkedTaskChannel._closed_by_aio_task: bool = False`
- in `translate_aio_errors()` always check for the new flag on EOC
  conditions and in such cases set `chan._trio_to_raise = aio_err` such
  that the `trio`-parent-task always raises the child's exception
  directly, OW keep original EoC passthrough in place.
- include *very* detailed per-case comments around the extended handler.
- adjust re-raising logic with a new `raise_from` where we only give the
  `aio_err` priority if it's not already set as `trio_to_raise`.

Also,
- hide the `_run_asyncio_task()` frame by def.
2025-08-18 22:32:51 -04:00
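A minimal sketch of the "relay via EoC" check from the commit above. `LinkedChannelSketch`, `aio_task_exited()` and `translate_eoc()` are hypothetical, and `EndOfChannel` is a local stand-in for `trio.EndOfChannel`; only the `_closed_by_aio_task`/`_trio_to_raise` flag logic mirrors the description:

```python
from __future__ import annotations

from dataclasses import dataclass


class EndOfChannel(Exception):
    '''Stand-in for `trio.EndOfChannel`.'''


@dataclass
class LinkedChannelSketch:
    _closed_by_aio_task: bool = False
    _trio_to_raise: BaseException | None = None
    _aio_err: BaseException | None = None

    def aio_task_exited(self, err: BaseException | None) -> None:
        '''Record that the `asyncio`-side task terminated (maybe with `err`).'''
        self._closed_by_aio_task = True
        self._aio_err = err

    def receive(self) -> object:
        # channel is already closed so the `trio` side sees EoC
        raise EndOfChannel()


def translate_eoc(chan: LinkedChannelSketch) -> object:
    try:
        return chan.receive()
    except EndOfChannel:
        if chan._closed_by_aio_task and chan._aio_err is not None:
            # the parent raises the aio child's error directly
            chan._trio_to_raise = chan._aio_err
            raise chan._trio_to_raise
        # OW graceful termination: keep the original EoC passthrough
        raise
```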
Tyler Goodlet 4a7491bda4 Add "raises-pre-started" `open_channel_from()` test
Verifying that if any exc is raised pre `chan.send_nowait()` (our
currently shite version of a `chan.started()`) then that exc is indeed
raised through on the `trio`-parent task side. This case was reproduced
from a `piker.brokers.ib` issue with a similar embedded
`.trionics.maybe_open_context()` call.

Deats,
- call the suite `test_aio_side_raises_before_started`.
- mk the `@context` simply `maybe_open_context(acm_func=open_channel_from)`
  with a `target=raise_before_started` which,
  * simply sleeps then immediately raises an RTE.
- expect the RTE from the aio-child-side to propagate all the way up to
  the root-actor's task right up through the `trio.run()`.
2025-08-18 22:32:51 -04:00
Bd 62415518fc
Merge pull request #394 from goodboy/nursery_cleaning
A bit of (actor) nursery cleaning
2025-08-18 22:32:19 -04:00
Tyler Goodlet 5c7d930a9a Drop unused `Actor._root_n`.. 2025-08-18 22:16:03 -04:00
Tyler Goodlet c46986504d Switch nursery to `CancelScope`-status properties
Been meaning to do this forever and a recent test hang finally drove me
to it Bp

Like it sounds, adopt the "cancel-status" properties on `ActorNursery`
already used on our `Context` and derived from `trio.CancelScope`:

- add new private `._cancel_called` (set in the head of `.cancel()`)
  & `._cancelled_caught` (set in the tail) instance vars with matching
  read-only `@properties`.

- drop the instance-var and instead delegate a `.cancelled: bool`
  property to `._cancel_called` and add a usage deprecation warning
  (since removing it breaks a buncha tests).
2025-08-18 22:16:03 -04:00
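The head/tail flag placement and the deprecated `.cancelled` alias described above might look like this; `NurseryStatusSketch` is hypothetical and elides all real subactor teardown:

```python
import warnings


class NurseryStatusSketch:
    '''Hypothetical nursery exposing `trio.CancelScope`-like statuses.'''

    def __init__(self) -> None:
        self._cancel_called: bool = False
        self._cancelled_caught: bool = False

    @property
    def cancel_called(self) -> bool:
        return self._cancel_called

    @property
    def cancelled_caught(self) -> bool:
        return self._cancelled_caught

    @property
    def cancelled(self) -> bool:
        # legacy name, kept only as a deprecated alias
        warnings.warn(
            '`.cancelled` is deprecated, use `.cancel_called`',
            DeprecationWarning,
            stacklevel=2,
        )
        return self._cancel_called

    def cancel(self) -> None:
        self._cancel_called = True  # set at the head of `.cancel()`
        # (real subactor teardown would happen here)
        self._cancelled_caught = True  # set at the tail


an = NurseryStatusSketch()
an.cancel()
```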
Tyler Goodlet e05a4d3cac Enforce named-args only to `.open_nursery()` 2025-08-18 22:16:03 -04:00
Bd a9aa5ec04e
Merge pull request #392 from goodboy/introspect_ipc
Introspect-ipc: some `.ipc` subpkg iface refinements for reading cancel statuses and `Address.__repr__()`
2025-08-18 22:15:40 -04:00
Tyler Goodlet 5021514a6a Disable shm resource tracker via flag on 3.13+
As per the newly added support,
https://docs.python.org/3/library/multiprocessing.shared_memory.html
2025-08-18 22:04:40 -04:00
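A minimal sketch of the version gate, using the `track` keyword that `multiprocessing.shared_memory.SharedMemory` gained in Python 3.13. The `open_shm()` helper name is hypothetical; note that with `track=False` the caller is responsible for calling `.unlink()`:

```python
import sys
from multiprocessing import shared_memory


def open_shm(size: int) -> shared_memory.SharedMemory:
    kwargs = {}
    if sys.version_info >= (3, 13):
        # opt out of the stdlib resource tracker (3.13+ only)
        kwargs['track'] = False
    return shared_memory.SharedMemory(create=True, size=size, **kwargs)


shm = open_shm(64)
shm.buf[:5] = b'hello'
data = bytes(shm.buf[:5])
shm.close()
shm.unlink()  # required when untracked; harmless otherwise
```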
Tyler Goodlet 79f502034f Don't hard code runtime-dir, read it with `._state.get_rt_dir()` 2025-08-18 21:30:48 -04:00
Tyler Goodlet 331921f612 Hmm disable CRE case for now, causes test fails
So i need to either adjust the tests or figure out if/why this is needed
to avoid the crashing in `pikerd` i found when killin the chart during
a long backfill with `binance` backend..
2025-08-18 21:30:48 -04:00
Tyler Goodlet df0d00abf4 Translate CRE's due to socket-close to tpt-closed
Just like in the BRE case (for UDS) it seems when a peer closes the
(UDS?) socket `trio` instead raises a `ClosedResourceError` which we now
catch and re-raise as a `TransportClosed`. This again results in
`tpt.send()` calls from the rpc-runtime **not** raising when it's known
that the IPC channel is disconnected.
2025-08-18 21:30:48 -04:00
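The error translation above boils down to a small re-raising context manager. In this sketch all three exception classes are local stand-ins (for `trio.ClosedResourceError`, `trio.BrokenResourceError` and tractor's `TransportClosed`) and `translate_tpt_errors()` is a hypothetical name:

```python
from contextlib import contextmanager
from typing import Iterator


class ClosedResourceError(Exception):
    '''Stand-in for `trio.ClosedResourceError`.'''


class BrokenResourceError(Exception):
    '''Stand-in for `trio.BrokenResourceError`.'''


class TransportClosed(Exception):
    '''Stand-in for tractor's `TransportClosed`.'''


@contextmanager
def translate_tpt_errors(peer: str) -> Iterator[None]:
    try:
        yield
    except (ClosedResourceError, BrokenResourceError) as err:
        # the peer already closed the socket: report it as a closed
        # transport so runtime callers can handle it uniformly.
        raise TransportClosed(f'IPC transport to {peer!r} is closed') from err
```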
Tyler Goodlet a72d1e6c48 Multi-line-style up the UDS fast-connect handler
Shift around comments and expressions for better reading, assign
`tpt_closed` for easier introspection from REPL during debug oh and fix
the `MsgpackTransport.pformat()` to render '|_peers: 1' .. XD
2025-08-18 21:30:48 -04:00
Tyler Goodlet 5931c59aef Log "out-of-layer" cancellation in `._rpc._invoke()`
Similar to what was just changed for `Context.repr_state`, when the
child task is cancelled but by a different "layer" of the runtime (i.e.
a `Portal.cancel_actor()` / `SIGINT`-to-process canceller) we don't
dump a traceback but instead just emit via `log.cancel()`.
2025-08-18 21:30:48 -04:00
Tyler Goodlet ba08052ddf Handle "out-of-layer" remote `Context` cancellation
Such that if the local task hasn't resolved but is `trio.Cancelled` and
a `.canceller` was set, we report an `'actor-cancelled'` from
`.repr_state: str`. Bit of formatting to avoid needless newlines too!
2025-08-18 21:30:48 -04:00
Tyler Goodlet 00112edd58 UDS: implicitly create `Address.bindspace: Path`
Since it's merely a local-file-sys subdirectory and there should be no
reason file creation conflicts with other bind spaces.

Also add 2 test suites to match,
- `tests/ipc/test_each_tpt::test_uds_bindspace_created_implicitly` to
  verify the dir creation when DNE.
- `..test_uds_double_listen_raises_connerr` to ensure a double bind
  raises a `ConnectionError` from the src `OSError`.
2025-08-18 21:30:48 -04:00
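Both behaviours tested above (implicit bindspace creation and a double bind surfacing as `ConnectionError` from the source `OSError`) can be reproduced with plain stdlib UDS sockets on POSIX. `bind_uds()`, `reraise_as_connerr()` and the `'tractor-sketch'` dir name are hypothetical, not tractor's `ipc._uds` code:

```python
from __future__ import annotations

import os
import shutil
import socket
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def reraise_as_connerr(addr: Path) -> Iterator[None]:
    try:
        yield
    except OSError as src:
        raise ConnectionError(f'Failed to bind {addr}') from src


def bind_uds(bindspace: Path, filename: str) -> socket.socket:
    # implicitly create the bindspace dir when it DNE
    bindspace.mkdir(parents=True, exist_ok=True)
    path = bindspace / filename
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        with reraise_as_connerr(path):
            sock.bind(os.fspath(path))
    except BaseException:
        sock.close()
        raise
    sock.listen()
    return sock


root = Path(tempfile.mkdtemp())
bindspace = root / 'tractor-sketch'  # does not exist yet
double_bind_err: ConnectionError | None = None
first = bind_uds(bindspace, 'sock')
try:
    bind_uds(bindspace, 'sock')  # same path: EADDRINUSE
except ConnectionError as err:
    double_bind_err = err
finally:
    first.close()
created = bindspace.is_dir()
shutil.rmtree(root)
```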
Tyler Goodlet 1d706bddda Rm `assert` from `Channel.from_addr()`, since for UDS we re-create it to extract the peer PID 2025-08-18 21:30:48 -04:00
Tyler Goodlet 3c30c559d5 `ipc._uds`: assign `.l/raddr` in `.connect_to()`
Using `.get_stream_addrs()` such that we always (*can*) assign the peer
end's PID in the `._raddr`.

Also factor common `ConnectionError` re-raising into
a `_reraise_as_connerr()`-@cm.
2025-08-18 21:30:48 -04:00
Tyler Goodlet 599020c2c5 Rename all lingering ctx-side bits
As before but more thoroughly in comments and var names finally changing
all,
- caller -> parent
- callee -> child
2025-08-18 21:30:48 -04:00
Tyler Goodlet 50f6543ee7 Add `Channel.closed/.cancel_called`
I.e. the public properties for the private instance var equivs; improves
expected introspection usage.
2025-08-18 21:30:48 -04:00
Tyler Goodlet c0854fd221 Set `Channel._cancel_called` via `chan` var
In `Portal.cancel_actor()` that is, at the least to make it easier to
ref search from an editor Bp
2025-08-18 21:30:48 -04:00
Tyler Goodlet e875b62869 Add `.ipc._shm` todo-idea for `@actor_fixture` API 2025-08-18 21:30:48 -04:00
Tyler Goodlet 3ab7498893 Add todo for py3.13+ `.shared_memory`'s new `track=False` support.. finally they added it XD 2025-08-18 21:30:48 -04:00
Bd dd041b0a01
Merge pull request #393 from goodboy/trionics_tweaks
Trionics tweaks: some `._mngrs` refinements and fix a `test_resource_cache` hang
2025-08-18 21:20:33 -04:00
Tyler Goodlet 4e252526b5 Accept `tn` to `gather_contexts()/maybe_open_context()`
Such that the caller can be responsible for their own (nursery) scoping
as needed and, for the latter fn's case with
a `trio.Nursery.CancelStatus.encloses()` check to ensure the `tn` is
a valid parent-ish.

Some deats,
- in `gather_contexts()`, mv the `try/finally` outside the nursery block
  to ensure we always do the `parent_exit`.
- for `maybe_open_context()` we do a naive task-tree hierarchy audit to
  ensure the provided scope is not *too* child-ish (with what APIs `trio`
  gives us, see above), OW go with the old approach of using the actor's
  private service nursery.
  Also,
  * better report `trio.Cancelled` around the cache-miss `yield`
    cases and ensure we **never** unmask triggering key-errors.
  * report on any stale-state with the mutex in the `finally` block.
2025-08-18 21:07:12 -04:00
Tyler Goodlet 4ba3590450 Add `.trionics.maybe_open_context()` locking test
Call it `test_lock_not_corrupted_on_fast_cancel()` and include
a detailed docstring to explain. Implemented it "cleverly" by having
the target `@acm` cancel its parent nursery after a peer, cache-hitting
task, is already waiting on the task mutex release.
2025-08-18 21:07:12 -04:00
Tyler Goodlet f1ff79a4e6 Always `finally` invoke cache-miss `lock.release()`s
Since the `await service_n.start()` on key-err can be cancel-masked
(checkpoint interrupted before `_Cache.run_ctx` completes), we need to
always `lock.release()` to avoid lock-owner-state corruption and/or
inf-hangs in peer cache-hitting tasks.

Deats,
- add a `try/except/finally` around the key-err triggered cache-miss
  `service_n.start(_Cache.run_ctx, ..)` call, reporting on any taskc
  and always `finally` unlocking.
- fill out some log msg content and use `.debug()` level.
2025-08-18 21:07:12 -04:00
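The always-release pattern above is sketched here with `asyncio.Lock` as an analogue of the `trio`-based cache mutex; `run_ctx()` is a hypothetical stand-in for a slow `_Cache.run_ctx` startup that gets cancelled mid-way:

```python
import asyncio


async def cache_miss_sketch() -> tuple[bool, list[str]]:
    lock = asyncio.Lock()
    started = asyncio.Event()
    reported: list[str] = []

    async def run_ctx() -> None:
        # stand-in for a slow `_Cache.run_ctx` startup
        started.set()
        await asyncio.sleep(10)

    async def cache_missing_task() -> None:
        await lock.acquire()
        try:
            await run_ctx()
        except asyncio.CancelledError:
            # report the taskc, then let it propagate
            reported.append('cancelled during cache-miss startup')
            raise
        finally:
            # ALWAYS release, even when cancelled mid-startup,
            # to avoid corrupting the lock-owner state.
            lock.release()

    task = asyncio.create_task(cache_missing_task())
    await started.wait()
    task.cancel()
    await asyncio.gather(task, return_exceptions=True)

    # a peer, cache-hitting task can now acquire without hanging.
    await asyncio.wait_for(lock.acquire(), timeout=1)
    lock.release()
    return True, reported


peer_acquired, reported = asyncio.run(cache_miss_sketch())
```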
Tyler Goodlet 70664b98de Well then, I guess it just needed, a checkpoint XD
Here I was thinking the bcaster (usage) maybe required a rework but,
NOPE it's just bc a checkpoint was needed in the parent task owning the
`tn` which spawns `get_sub_and_pull()` tasks to ensure the bg allocated
`an`/portal is eventually cancel-called..

Ah well, at least i started a patch for `MsgStream.subscribe()` to make
it multicast revertible.. XD

Anyway, I tossed in some checks & notes related to all that unnecessary
effort since I do think i'll move forward implementing it:
- for the `cache_hit` case always verify that the `bcast` clone is
  unregistered from the common state subs after
  `.subscribe().__aexit__()`.
- do a light check that the implicit `MsgStream._broadcaster` is always
  the only bcrx instance left-leaked into that state.. that is until
  i get the proper de-allocation/reversion from multicast -> unicast
  working.
- put in mega detailed note about the required parent-task checkpoint.
2025-08-18 21:07:12 -04:00
Tyler Goodlet 1c425cbd22 Tool-up `test_resource_cache.test_open_local_sub_to_stream`
Since I recently discovered a very subtle race-case that can sometimes
cause the suite to hang, seemingly due to the `an: ActorNursery`
allocated *behind* the `.trionics.maybe_open_context()` usage; this can
result in never cancelling the 'streamer' subactor despite the `main()`
timeout-guard?

This led me to dig in and find that the underlying issue was 2-fold,

- our `BroadcastReceiver` termination-mgmt semantics in
  `MsgStream.subscribe()` can result in the first subscribing task to
  always keep the `MsgStream._broadcaster` instance allocated; it's
  never `.aclose()`ed, which makes it tough to determine (and thus
  trace) when all subscriber-tasks are actually complete and
  exited-from-`.subscribe()`..

- i was shield waiting `.ipc._server.Server.wait_for_no_more_peers()` in
  `._runtime.async_main()`'s shutdown sequence which would then compound
  the issue resulting in a SIGINT-shielded hang.. the worst kind XD

Actual changes here are just styling, printing, and some mucking with
passing the `an`-ref up to the parent task in the root-actor where i was
doing a conditional `ActorNursery.cancel()` to mk sure that was actually
the problem. Presuming this is fixed the `.pause()` i left unmasked
should never hit.
2025-08-18 21:07:06 -04:00
Tyler Goodlet edc2211444 Go multi-line-style tuples in `maybe_enter_context()`
Allows for an inline comment of the first "cache hit" bool element.
2025-08-18 20:55:18 -04:00
Bd b05abea51e
Merge pull request #390 from goodboy/strict_egs_everywhere
Strict egs everywhere: drop use of `strict_exception_groups=False` throughout!
2025-08-18 14:15:49 -04:00
Tyler Goodlet 88c1c083bd Add timeout to inf-streamer test 2025-08-18 13:31:15 -04:00
Tyler Goodlet b096867d40 Remove lingering seg=False-flags from tests 2025-08-18 12:03:32 -04:00
Tyler Goodlet a3c9822602 Remove lingering seg=False-flags from examples 2025-08-18 12:03:10 -04:00
Tyler Goodlet e3a542f2b5 Never shield-wait `ipc_server.wait_for_no_more_peers()`
As mentioned in prior testing commit, it can cause the worst kind of
hangs, the SIGINT ignoring kind.. Pretty sure there was never any reason
outside some esoteric multi-actor debugging case, and pretty sure that
already was solved?
2025-08-18 10:46:37 -04:00
Tyler Goodlet 0ffcea1033 Adjust `test_trio_prestarted_task_bubbles()` suite to expect non-eg raises 2025-08-18 10:46:37 -04:00
Tyler Goodlet a7bdf0486c Styling tweaks to quadruple streaming test fn 2025-08-18 10:46:37 -04:00
Tyler Goodlet d2ac9ecf95 Resolve `test_cancel_while_childs_child_in_sync_sleep`
Was failing due to the `.fail_after()` timeout being *too short* and
somehow the new interplay of that with strict-exception groups resulting
in the `TooSlowError` never raising but instead an eg with the embedded
`AssertionError`?? I still don't really get it honestly..

I've written up lengthy notes around the different `delay` settings that
can be used to see the diff outcomes, the failing case being the one
i still don't really grok and think is justification for `trio` to
bubble inner `Cancelled`s differently possibly?

For now i've included the original failing case as an `xfail`
parametrization which will hopefully drive a follow-up low-level
`trio` test in `test_trioisms`!
2025-08-18 10:46:37 -04:00
Tyler Goodlet dcb1062bb8 Fix cluster suite, chng to new `gather_contexts()`
Namely `test_empty_mngrs_input_raises()` was failing due to
lazy-iterator use as input to `mngrs` which i guess i added support for
a while back (by it doing a `list(mngrs)` internally)? So just change it
to `gather_contexts(mngrs=())` and also tweak the `trio.fail_after(3)`
since it appears that the prior 1sec was causing
too-fast-of-a-cancellation (before the cluster fully spawned) and thus
the expected `ValueError` never to show..

Also, mask the `tractor.trionics.collapse_eg()` usage (again?) in
`open_actor_cluster()` since it seems unnecessary.
2025-08-18 10:46:37 -04:00
Tyler Goodlet 05d865c0f1 WIP tinkering with strict-eg-tns and cluster API
Seems that the way the actor-nursery interacts with the
`.trionics.gather_contexts()` API on cancellation makes our
`.trionics.collapse_eg()` not work as intended?

I need to dig into how `ActorNursery.cancel()` and `.__aexit__()` might
be causing this discrepancy..

Consider this a commit-of-my-index type save for rn.
2025-08-18 10:46:37 -04:00
Tyler Goodlet 8218f0f51f Bit of multi-line styling / name tweaks in cancellation suites 2025-08-18 10:46:37 -04:00
Tyler Goodlet 8f19f5d3a8 Mk temp collapser bp work outside runtime as well.. 2025-08-18 10:46:37 -04:00
Tyler Goodlet 64c27a914b Add temp breakpoint support to `collapse_eg()` 2025-08-18 10:46:37 -04:00
Tyler Goodlet d9c8d543b3 Suppress beg tbs from `collapse_eg()`
It was originally this way; I forgot to flip it back when discarding the
`except*` handler impl..

Specially handle the `exc.__cause__` case where we raise from any
detected underlying cause and OW `from None` to suppress the eg's tb.
2025-08-18 10:46:37 -04:00
Tyler Goodlet 048b154f00 Rework `collapse_eg()` to NOT use `except*`..
Since it turns out the semantics are basically inverse of normal
`except` (particularly for re-raising) which is hard to get right, and
bc it's a lot easier to just delegate to what `trio` already has behind
the `strict_exception_groups=False` setting, Bp

I added a rant here which will get removed shortly likely, but i think
going forward recommending against use of `except*` is prudent for
anything low level enough in the runtime (like trying to filter begs).

Dirty deats,
- copy `trio._core._run.collapse_exception_group()` to here with only
  a slight mod to remove the notes check and tb concatting for the
  collapse case.
- rename `maybe_collapse_eg()` -> `get_collapsed_eg()` and delegate it
  directly to the former `trio` fn; return `None` when it returns the
  same beg without collapse.
- simplify our own `collapse_eg()` to either raise the collapsed `exc`
  or original `beg`.
2025-08-18 10:46:37 -04:00
Tyler Goodlet 88828e9f99 Couple more `._root` logging tweaks.. 2025-08-18 10:46:37 -04:00
Tyler Goodlet 25ff195c17 Use collapser around `root_tn` in `async_main()`
Replacing yet another loose-eg-flag. Also toss in a todo to maybe use
the unmasker around the `open_root_actor()` body.
2025-08-18 10:46:37 -04:00
Tyler Goodlet f60cc646ff Facepalm, fix `raise from` in `collapse_eg()`
I dunno what exactly I was thinking but we definitely don't want to
**ever** raise from the original exc-group, instead always raise from
any original `.__cause__` to be consistent with the embedded src-error's
context.

Also, adjust `maybe_collapse_eg()` to return `False` in the non-single
`.exceptions` case, again don't know what I was trying to do but this
simplifies caller logic and the prior return-semantic had no real
value..

This fixes some final usage in the runtime (namely top level nursery
usage in `._root`/`._runtime`) which was previously causing test suite
failures prior to this fix.
2025-08-18 10:46:37 -04:00
Tyler Goodlet a2b754b5f5 Just import `._runtime` ns in `._root`; be a bit more explicit 2025-08-18 10:46:37 -04:00
Tyler Goodlet 5e13588aed Use collapse in `._root.open_root_actor()` too
Seems to add one more cancellation suite failure as well as now cause
the discovery test to error instead of fail?
2025-08-18 10:46:37 -04:00
Tyler Goodlet 0a56f40bab Use collapser around root tn in `.async_main()`
Seems to cause the following test suites to fail however..

- 'test_advanced_faults.py::test_ipc_channel_break_during_stream'
- 'test_clustering.py::test_empty_mngrs_input_raises'

Also tweak some ctxc request logging content.
2025-08-18 10:46:37 -04:00
Tyler Goodlet f776c47cb4 Drop msging-err patt from `subactor_breakpoint` ex
Since the `bdb` module was added to the namespace lookup set in
`._exceptions.get_err_type()` we can now relay a RAE-boxed
`bdb.BdbQuit`.
2025-08-18 10:46:37 -04:00
Tyler Goodlet 7f584d4f54 Switch to strict-eg nurseries almost everywhere
That is just throughout the core library, not the tests yet. Again, we
simply change over to using our (nearly equivalent?)
`.trionics.collapse_eg()` in place of the already deprecated
`strict_exception_groups=False` flag in the following internals,
- the conc-fan-out tn use in `._discovery.find_actor()`.
- `._portal.open_portal()`'s internal tn used to spawn a bg rpc-msg-loop
  task.
- the daemon and "run-in-actor" layered tn pair allocated in
  `._supervise._open_and_supervise_one_cancels_all_nursery()`.

The remaining loose-eg usage in `._root` and `._runtime` seem to be
necessary to keep the test suite green?? For the moment these are left
out.
2025-08-18 10:46:37 -04:00
Tyler Goodlet d650dda0fa Use collapser in rent side of `Context` 2025-08-18 10:46:37 -04:00
Tyler Goodlet f6598e8400 Add some tooling params to `collapse_eg()` 2025-08-18 10:46:37 -04:00
Bd 59822ff093
Merge pull request #389 from goodboy/better_reprs
Better `repr()`s: more console friendly representations of internal primitives
2025-08-16 17:20:02 -04:00
Tyler Goodlet ca427aec7e More prep-to-reduce the `Actor` method-iface
- drop the (never/un)used `.get_chans()`.
- add #TODO for factoring many methods into a new `.rpc`-subsys/pkg
  primitive, like an `RPCMngr/Server` type eventually.
- add todo to maybe mv `.get_parent()` elsewhere?
- move masked `._hard_mofo_kill()` to bottom.
2025-08-16 17:06:23 -04:00
Tyler Goodlet f53aa992af .log: expose `at_least_level()` as `StackLevelAdapter` meth 2025-08-15 17:29:22 -04:00
Tyler Goodlet 69e0afccf0 Use `Address` where possible in (root) actor boot
Namely inside various bootup-sequences in `._root` and `._runtime`
particularly in the root actor to support both better tpt-address
denoting in our logging and as part of clarifying logic around setting
the root's registry addresses which is soon to be much better factored
out of the core and into an explicit subsystem + API.

Some `_root.open_root_actor()` deats,
- set `registry_addrs` to a new `uw_reg_addrs` (uw: unwrapped) to be
  more explicit about wrapped addr types throughout.
- instead ensure `registry_addrs` are the wrapped types and pass down
  into the root `Actor` singleton-instance.
- factor the root-actor check + rt-vars update (updating the `'_root_addrs'`)
  out of `._runtime.async_main()` into this fn.
- as previous, set `trans_bind_addrs = uw_reg_addrs` in unwrapped form since it will
  be passed down both through rt-vars as `'_root_addrs'` and to
  `._runtime.async_main()` as `accept_addrs` (which is then passed to the
  IPC server).
- adjust/simplify much logging.
- shield the `await actor.cancel(None)  # self cancel` to avoid any
  finally-footguns.
- as mentioned convert the

For `_runtime.async_main()` tweaks,
- expect `registry_addrs: list[Address]|None = None` with appropriate
  unwrapping prior to setting both `.reg_addrs` and the equiv rt-var.
- add a new `.registry_addrs` prop for the wrapped form.
- convert a final loose-eg for the `service_nursery` to use
  `collapse_eg()`.
- simplify teardown report logging.
2025-08-15 17:29:10 -04:00
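The boot changes keep registry addresses in two forms: wrapped `Address` objects (`registry_addrs` and the new `.registry_addrs` property) and unwrapped tuples (`uw_reg_addrs`) for rt-vars and the IPC server. The sketch below shows one way to normalize mixed input into both views. It assumes a TCP-only address type, and `TCPAddr`, `wrap_addr()` and `normalize_reg_addrs()` are all hypothetical names. tractor's real address types and its `_addr.wrap_address()` are not reproduced here.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class TCPAddr:
    '''Hypothetical wrapped address type (TCP only).'''
    host: str
    port: int

    def unwrap(self) -> tuple[str, int]:
        # plain tuple form, as would be stored in rt-vars
        return (self.host, self.port)


def wrap_addr(addr: tuple[str, int] | TCPAddr) -> TCPAddr:
    # idempotent: already-wrapped inputs pass straight through
    if isinstance(addr, TCPAddr):
        return addr
    host, port = addr
    return TCPAddr(host, int(port))


def normalize_reg_addrs(
    registry_addrs: list[tuple[str, int] | TCPAddr],
) -> tuple[list[TCPAddr], list[tuple[str, int]]]:
    '''Return (wrapped, unwrapped) views of the same registry addrs.'''
    wrapped = [wrap_addr(a) for a in registry_addrs]
    uw_reg_addrs = [a.unwrap() for a in wrapped]
    return wrapped, uw_reg_addrs


if __name__ == '__main__':
    wrapped, uw = normalize_reg_addrs([('127.0.0.1', 1616)])
    print(wrapped, uw)
```

Doing the wrap/unwrap once at the boundary means downstream code never has to guess which form it was handed.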
Tyler Goodlet e275c49b23 Stackscope import fail msg dun need braces.. 2025-08-15 16:34:03 -04:00
Tyler Goodlet 48fbf38c1d Drop duplicated (masked) debugging-`terminate_after`, prolly a rebase slip.. 2025-08-15 16:33:31 -04:00
Tyler Goodlet defd6e28d2 Facepalm, actually use `.log.cancel()`-level to report parent-side taskc.. 2025-08-15 16:31:52 -04:00
Tyler Goodlet 414b0e2bae Update buncha log msg fmting in `.msg._ops`
Mostly just multi-line code styling again: always putting standalone
`f'\n'` on separate LOC so it reads like it renders to console. Oh, and
a level drop to `.runtime()` for rx-msg reports.
2025-08-15 16:30:10 -04:00
Tyler Goodlet d34fb54f7c Update buncha log msg fmting in `._spawn`
Again using `Channel.aid.reprol()` and `.devx.pformat.nest_from_op()`,
and converting to multi-line code style with `'` quoting for
str-report contents. Tweak some imports to sub-mod level as well.
2025-08-15 16:29:17 -04:00
Tyler Goodlet 5d87f63377 Update buncha log msg fmting in `._portal`
Namely using `Channel.aid.reprol()` and converting str-reports to our
newer multi-line code style.
2025-08-15 16:29:11 -04:00
Tyler Goodlet 0ca3d50602 Use `._supervise._shutdown_msg` in tooling test 2025-08-15 16:29:05 -04:00
Tyler Goodlet 8880a80e3e Use `nest_from_op()`/`pretty_struct` in `._rpc`
Again for nicer console logging. Also fix a double `req_chan` arg bug
when passed to `_invoke` in the `self.cancel()` rt-ep: don't update the
`kwargs: dict`, just merge in the `req_chan` input at call time.
2025-08-15 16:28:46 -04:00
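The sketch below shows one way a "double `req_chan` arg" bug can arise, and the fix the commit describes: leave the shared `kwargs` dict alone and merge `req_chan` in only at call time. It is a hypothetical reconstruction. `cancel_ep()`, `call_buggy()` and `call_fixed()` are invented stand-ins, not tractor's `_invoke()` or its cancel endpoint.

```python
def cancel_ep(req_chan=None, **kwargs):
    '''Stand-in for an RPC endpoint that wants the requesting channel.'''
    return req_chan


def call_buggy(func, req_chan, kwargs: dict):
    kwargs.update(req_chan=req_chan)  # mutates the caller's dict
    # `req_chan` is now passed both explicitly and via `**kwargs`
    return func(req_chan=req_chan, **kwargs)  # -> TypeError


def call_fixed(func, req_chan, kwargs: dict):
    # leave `kwargs` untouched; merge `req_chan` in at call time
    # (the explicit value overrides any stale entry)
    return func(**(kwargs | {'req_chan': req_chan}))


if __name__ == '__main__':
    try:
        call_buggy(cancel_ep, 'chan-a', {})
    except TypeError as err:
        print(f'buggy: {err}')
    print('fixed:', call_fixed(cancel_ep, 'chan-a', {}))
```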
Tyler Goodlet 7be713ee1e Use `nest_from_op()` in actor-nursery shutdown
Including a new one-line `_shutdown_msg: str` which we mod-var-set for
testing usage and some denoising at `.info()` level. Adjust `Actor()`
instantiating input to the new `.registry_addrs` wrapped addrs property.
2025-08-15 16:28:30 -04:00
Tyler Goodlet 4bd8211abb Add #TODO for `._context` to use `.msg.Aid` 2025-08-15 16:24:35 -04:00
Tyler Goodlet a23a98886c Even more `.ipc.*` repr refinements
Mostly adjusting indentation, noise level, and clarity via `.pformat()`
tweaks and more general use of `.devx.pformat.nest_from_op()`.

Specific impl deats,
- use `pformat.ppfmt()`/`nest_from_op()` more seriously throughout
  `._server`.
- add a `._server.Endpoint.pformat()`.
- add `._server.Server.len_peers()` and `.repr_state()`.
- polish `Server.pformat()`.
- drop some redundant `log.runtime()`s from `._serve_ipc_eps()`, instead
  leaving them only in (or moving them to) the public caller meth.
- `._tcp.start_listener()` log the bound addr, not the input (which may
  be port 0).
2025-08-15 16:24:27 -04:00
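Why log the bound address rather than the input: binding to port 0 asks the OS for any free port, so only the socket's post-bind name says where the listener actually lives. The stdlib sketch below uses a hypothetical `open_listener()` helper. It is not `._tcp.start_listener()`.

```python
import socket


def open_listener(host: str = '127.0.0.1', port: int = 0):
    '''Bind + listen; return the socket and the *actual* bound addr.'''
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, port))
    sock.listen()
    # with port=0 the OS picks a free ephemeral port; getsockname()
    # reports it, whereas the input tuple still says 0
    bound = sock.getsockname()
    return sock, bound


if __name__ == '__main__':
    sock, bound = open_listener()
    print(f'Listening on {bound!r} (requested port 0)')
    sock.close()
```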
Tyler Goodlet 31544c862c More `.ipc.Channel`-repr related tweaks
- only generate a repr in `.from_addr()` when log level is >= 'runtime'.
 |_ add a todo about supporting this optimization more generally on our
   adapter.
- fix `Channel.pformat()` to show unknown peer field line fmt correctly.
- add a `Channel.maddr: str` which just delegates directly to the
  `._transport` like other pass-thru property fields.
2025-08-15 16:24:22 -04:00
Tyler Goodlet 7d320c4e1e Mk `Aid` hashable, use pretty-`.__repr__()`
Hash on the `.uuid: str` and delegate verbatim to
`msg.pretty_struct.Struct`'s equiv method.
2025-08-15 16:24:15 -04:00
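Hashing on `.uuid` makes an actor ID usable as a dict key or set member, for instance in a mapping keyed by actor IDs. The dataclass sketch below assumes a hypothetical `Aid` with `name`, `uuid` and `pid` fields. The real type's base class and fields may differ.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class Aid:
    '''Hypothetical actor-ID record (fields are assumptions).'''
    name: str
    uuid: str
    pid: int | None = None

    # @dataclass keeps an explicitly defined __hash__ even with eq=True.
    # Equal instances (all fields equal) share a uuid, so hashing on
    # uuid alone respects the hash/eq contract.
    def __hash__(self) -> int:
        return hash(self.uuid)


if __name__ == '__main__':
    a = Aid('root', 'a1b2c3', 1234)
    errors = {a: RuntimeError('boom')}
    print(errors[Aid('root', 'a1b2c3', 1234)])
```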
Tyler Goodlet 38944ad1d2 Drop `actor_info: str` from `._entry` logs 2025-08-15 16:24:06 -04:00
Tyler Goodlet 9260909fe1 Try `nest_from_op()` in some `._rpc` spots
To start trying out,
- using it in the `Start`-msg handler-block to repr the msg coming
  *from* a `repr(Channel)` using the `'<=)'` sclang op.
- for a completed RPC task in `_invoke_non_context()`.
- for the msg loop task's termination report.
2025-08-15 16:23:59 -04:00
Tyler Goodlet c00b3c86ea Hide more `Channel._transport` privates for repr
Such as the `MsgTransport.stream` and `.drain` attrs since they're
rarely that important at the chan level. Also start adopting
a `.<attr>=` style for actual attrs of the type versus a `<name>:`
style for meta-field info lines.
2025-08-15 16:23:54 -04:00
Tyler Goodlet 808a336508 Refine `Actor` status iface, use `Aid` throughout
To simplify `.pformat()` output when the new `privates: bool` is unset
(the default) this adds new public attrs to wrap an actor's
cancellation status as well as provide a `.repr_state: str` (similar to
our equiv on `Context`). Rework `.pformat()` to render a much simplified
repr using all these new refinements.

Further, port the `.cancel()` method to use `.msg.types.Aid` for all
internal `requesting_uid` refs (now renamed with `_aid`) and in all
called downstream methods.

New cancel-state iface deats,
- rename `._cancel_called_by_remote` -> `._cancel_called_by` and expect
  it to be set as an `Aid`.
- add `.cancel_complete: bool` which flags whether `.cancel()` ran to
  completion.
- add `.cancel_called: bool` which just wraps `._cancel_called` (and
  which likely will just be dropped since we already have
  `._cancel_called_by`).
- add `.cancel_caller: Aid|None` which wraps `._cancel_called_by`.

In terms of using `Aid` in cancel methods,
- rename vars with `_aid` suffix in `.cancel()` (and wherever else).
- change `.cancel_rpc_tasks()` input param to `req_aid: msgtypes.Aid`.
- do the same for `._cancel_task()` and (for now until we adjust its
  internals as well) use the `Aid.uid` remap property when assigning
  `Context._canceller`.
- adjust all log msg refs to match obvi.
2025-08-15 16:08:53 -04:00
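The new cancel-state attrs are thin read-only wrappers over private fields, which lets a simplified `.pformat()` report status without exposing privates. The synchronous sketch below uses hypothetical names: `CancelState`, the `repr_state` strings, and a plain `(name, uuid)` tuple standing in for `Aid`. The real `Actor.cancel()` is async and performs actual runtime teardown.

```python
from __future__ import annotations


class CancelState:
    '''Hypothetical stand-in for the `Actor` cancel-status iface.'''

    def __init__(self) -> None:
        self._cancel_called: bool = False
        self._cancel_called_by: tuple[str, str] | None = None
        self._cancel_done: bool = False

    @property
    def cancel_called(self) -> bool:
        return self._cancel_called

    @property
    def cancel_caller(self) -> tuple[str, str] | None:
        return self._cancel_called_by

    @property
    def cancel_complete(self) -> bool:
        return self._cancel_done

    @property
    def repr_state(self) -> str:
        # one-word summary suitable for a compact repr
        if self.cancel_complete:
            return 'cancelled'
        if self.cancel_called:
            return 'cancelling'
        return 'running'

    def cancel(self, req_aid: tuple[str, str] | None) -> bool:
        self._cancel_called = True
        self._cancel_called_by = req_aid
        # (real runtime teardown would happen here)
        self._cancel_done = True
        return True
```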
Tyler Goodlet 679d999185 Add flag to toggle private vars in `Channel.pformat()`
Call it `privates: bool` and only show certain internal instance vars
when set in the `repr()` output.
2025-08-15 16:07:39 -04:00
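The sketch below shows a `privates: bool` toggle on a `pformat()`-style method, rendering each attr in the `.<attr>=` line style used for `Channel` reprs. The `Chan` class and its fields are hypothetical. Treating `_`-prefixed instance attrs as "private" is an assumption about how the real method decides what to hide.

```python
class Chan:
    '''Hypothetical stand-in for `Channel`.'''

    def __init__(self, laddr: tuple, raddr: tuple) -> None:
        self.laddr = laddr
        self.raddr = raddr
        self._closed = False
        self._transport = 'tcp'

    def pformat(self, privates: bool = False) -> str:
        lines = [f'<{type(self).__name__}(']
        for name, val in vars(self).items():
            # hide `_`-prefixed internals unless explicitly requested
            if name.startswith('_') and not privates:
                continue
            lines.append(f' |_.{name}={val!r}')
        lines.append(')>')
        return '\n'.join(lines)


if __name__ == '__main__':
    chan = Chan(('127.0.0.1', 1616), ('127.0.0.1', 53211))
    print(chan.pformat())
    print(chan.pformat(privates=True))
```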
Tyler Goodlet a8428d7de3 Extend `.msg.types.Aid` method interface
Providing the legacy `.uid -> tuple` style id (since still used for the
`Actor._contexts` table) and a `repr-one-line` method `.reprol() -> str`
for rendering a compact unique actor ID summary (useful in
logging/.pformat()s at the least).
2025-08-15 16:07:39 -04:00
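The sketch below covers the two additions: a legacy `.uid` tuple view, still needed as a key for tables like `Actor._contexts`, and a compact one-line `.reprol()`. The field names and the exact `reprol()` output format are assumptions, not tractor's actual rendering.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class Aid:
    '''Hypothetical actor-ID record.'''
    name: str
    uuid: str
    pid: int

    @property
    def uid(self) -> tuple[str, str]:
        # legacy `(name, uuid)` tuple form for older lookup tables
        return (self.name, self.uuid)

    def reprol(self) -> str:
        # "repr one-line": compact summary for log lines
        # (format is a hypothetical choice)
        return f'{self.name}@{self.pid}[{self.uuid[:6]}]'
```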
48 changed files with 3405 additions and 1269 deletions

@@ -16,6 +16,7 @@ from tractor import (
     ContextCancelled,
     MsgStream,
     _testing,
+    trionics,
 )
 import trio
 import pytest
@@ -62,9 +63,8 @@ async def recv_and_spawn_net_killers(
     await ctx.started()
     async with (
         ctx.open_stream() as stream,
-        trio.open_nursery(
-            strict_exception_groups=False,
-        ) as tn,
+        trionics.collapse_eg(),
+        trio.open_nursery() as tn,
     ):
         async for i in stream:
             print(f'child echoing {i}')

@@ -21,12 +21,12 @@ async def breakpoint_forever():
 async def spawn_until(depth=0):
     """"A nested nursery that triggers another ``NameError``.
     """
-    async with tractor.open_nursery() as n:
+    async with tractor.open_nursery() as an:
         if depth < 1:
-            await n.run_in_actor(breakpoint_forever)
+            await an.run_in_actor(breakpoint_forever)
-            p = await n.run_in_actor(
+            p = await an.run_in_actor(
                 name_error,
                 name='name_error'
             )
@@ -38,7 +38,7 @@ async def spawn_until(depth=0):
             # recusrive call to spawn another process branching layer of
             # the tree
             depth -= 1
-            await n.run_in_actor(
+            await an.run_in_actor(
                 spawn_until,
                 depth=depth,
                 name=f'spawn_until_{depth}',

@@ -0,0 +1,35 @@
+import trio
+import tractor
+
+
+async def main():
+    async with tractor.open_root_actor(
+        debug_mode=True,
+        loglevel='cancel',
+    ) as _root:
+
+        # manually trigger self-cancellation and wait
+        # for it to fully trigger.
+        _root.cancel_soon()
+        await _root._cancel_complete.wait()
+        print('root cancelled')
+
+        # now ensure we can still use the REPL
+        try:
+            await tractor.pause()
+        except trio.Cancelled as _taskc:
+            assert (root_cs := _root._root_tn.cancel_scope).cancel_called
+            # NOTE^^ above logic but inside `open_root_actor()` and
+            # passed to the `shield=` expression is effectively what
+            # we're testing here!
+            await tractor.pause(shield=root_cs.cancel_called)
+
+        # XXX, if shield logic *is wrong* inside `open_root_actor()`'s
+        # crash-handler block this should never be interacted,
+        # instead `trio.Cancelled` would be bubbled up: the original
+        # BUG.
+        assert 0
+
+
+if __name__ == '__main__':
+    trio.run(main)

@@ -23,9 +23,8 @@ async def main():
             modules=[__name__]
         ) as portal_map,
-        trio.open_nursery(
-            strict_exception_groups=False,
-        ) as tn,
+        tractor.trionics.collapse_eg(),
+        trio.open_nursery() as tn,
     ):
         for (name, portal) in portal_map.items():

@@ -0,0 +1,145 @@
+from contextlib import (
+    contextmanager as cm,
+    # TODO, any diff in async case(s)??
+    # asynccontextmanager as acm,
+)
+from functools import partial
+
+import tractor
+import trio
+
+
+log = tractor.log.get_logger(__name__)
+tractor.log.get_console_log('info')
+
+
+@cm
+def teardown_on_exc(
+    raise_from_handler: bool = False,
+):
+    '''
+    You could also have a teardown handler which catches any exc and
+    does some required teardown. In this case the problem is
+    compounded UNLESS you ensure the handler's scope is OUTSIDE the
+    `ux.aclose()`.. that is in the caller's enclosing scope.
+
+    '''
+    try:
+        yield
+    except BaseException as _berr:
+        berr = _berr
+        log.exception(
+            f'Handling termination teardown in child due to,\n'
+            f'{berr!r}\n'
+        )
+        if raise_from_handler:
+            # XXX teardown ops XXX
+            # on termination these steps say need to be run to
+            # ensure wider system consistency (like the state of
+            # remote connections/services).
+            #
+            # HOWEVER, any bug in this teardown code is also
+            # masked by the `tx.aclose()`!
+            # this is also true if `_tn.cancel_scope` is
+            # `.cancel_called` by the parent in a graceful
+            # request case..
+
+            # simulate a bug in teardown handler.
+            raise RuntimeError(
+                'woopsie teardown bug!'
+            )
+
+        raise  # no teardown bug.
+
+
+async def finite_stream_to_rent(
+    tx: trio.abc.SendChannel,
+    child_errors_mid_stream: bool,
+
+    task_status: trio.TaskStatus[
+        trio.CancelScope,
+    ] = trio.TASK_STATUS_IGNORED,
+):
+    async with (
+        # XXX without this unmasker the mid-streaming RTE is never
+        # reported since it is masked by the `tx.aclose()`
+        # call which in turn raises `Cancelled`!
+        #
+        # NOTE, this is WITHOUT doing any exception handling
+        # inside the child task!
+        #
+        # TODO, uncomment next LoC to see the supprsessed beg[RTE]!
+        # tractor.trionics.maybe_raise_from_masking_exc(),
+
+        tx as tx,  # .aclose() is the guilty masker chkpt!
+        trio.open_nursery() as _tn,
+    ):
+        # pass our scope back to parent for supervision\
+        # control.
+        task_status.started(_tn.cancel_scope)
+
+        with teardown_on_exc(
+            raise_from_handler=not child_errors_mid_stream,
+        ):
+            for i in range(100):
+                log.info(
+                    f'Child tx {i!r}\n'
+                )
+                if (
+                    child_errors_mid_stream
+                    and
+                    i == 66
+                ):
+                    # oh wait but WOOPS there's a bug
+                    # in that teardown code!?
+                    raise RuntimeError(
+                        'woopsie, a mid-streaming bug!?'
+                    )
+
+                await tx.send(i)
+
+
+async def main(
+    # TODO! toggle this for the 2 cases!
+    # 1. child errors mid-stream while parent is also requesting
+    #   (graceful) cancel of that child streamer.
+    #
+    # 2. child contains a teardown handler which contains a
+    #   bug and raises.
+    #
+    child_errors_mid_stream: bool,
+):
+    tx, rx = trio.open_memory_channel(1)
+
+    async with (
+        trio.open_nursery() as tn,
+        rx as rx,
+    ):
+        _child_cs = await tn.start(
+            partial(
+                finite_stream_to_rent,
+                child_errors_mid_stream=child_errors_mid_stream,
+                tx=tx,
+            )
+        )
+        async for msg in rx:
+            log.info(
+                f'Rent rx {msg!r}\n'
+            )
+
+            # simulate some external cancellation
+            # request **JUST BEFORE** the child errors.
+            if msg == 65:
+                log.cancel(
+                    f'Cancelling parent on,\n'
+                    f'msg={msg}\n'
+                    f'\n'
+                    f'Simulates OOB cancel request!\n'
+                )
+                tn.cancel_scope.cancel()
+
+
+if __name__ == '__main__':
+
+    for case in [True, False]:
+        trio.run(main, case)

@@ -1,8 +1,8 @@
 """
 That "native" debug mode better work!
-All these tests can be understood (somewhat) by running the equivalent
-`examples/debugging/` scripts manually.
+All these tests can be understood (somewhat) by running the
+equivalent `examples/debugging/` scripts manually.
 TODO:
     - none of these tests have been run successfully on windows yet but
@@ -317,7 +317,6 @@ def test_subactor_breakpoint(
     assert in_prompt_msg(
         child, [
-        'MessagingError:',
         'RemoteActorError:',
          "('breakpoint_forever'",
          'bdb.BdbQuit',
@@ -710,10 +709,41 @@ def test_multi_nested_subactors_error_through_nurseries(
     child = spawn('multi_nested_subactors_error_up_through_nurseries')
     # timed_out_early: bool = False
+    at_least_one: list[str] = [
+        "bdb.BdbQuit",
-    for send_char in itertools.cycle(['c', 'q']):
+        # leaf subs, which actually raise in "user code"
+        "src_uid=('breakpoint_forever'",
+        "src_uid=('name_error'",
+        # 2nd layer subs
+        "src_uid=('spawn_until_1'",
+        "src_uid=('spawn_until_2'",
+        "src_uid=('spawn_until_3'",
+        "relay_uid=('spawn_until_0'",
+        # 1st layer subs
+        "src_uid=('spawner0'",
+        "src_uid=('spawner1'",
+    ]
+    for i, send_char in enumerate(
+        itertools.cycle(['c', 'q'])
+    ):
         try:
             child.expect(PROMPT)
+            for patt in at_least_one.copy():
+                if in_prompt_msg(
+                    child,
+                    [patt],
+                ):
+                    print(
+                        f'Found patt in prompt {i}\n'
+                        f'patt: {patt!r}\n'
+                    )
+                    at_least_one.remove(patt)
             child.sendline(send_char)
             time.sleep(0.01)
@@ -722,27 +752,15 @@ def test_multi_nested_subactors_error_through_nurseries(
     assert_before(
         child,
-        [  # boxed source errors
-            "NameError: name 'doggypants' is not defined",
+        [
+            # boxed source errors should show in final
+            # post-prompt tb to console.
             "tractor._exceptions.RemoteActorError:",
-            "('name_error'",
-            "bdb.BdbQuit",
-            # first level subtrees
-            # "tractor._exceptions.RemoteActorError: ('spawner0'",
-            "src_uid=('spawner0'",
-            # "tractor._exceptions.RemoteActorError: ('spawner1'",
-            # propagation of errors up through nested subtrees
-            # "tractor._exceptions.RemoteActorError: ('spawn_until_0'",
-            # "tractor._exceptions.RemoteActorError: ('spawn_until_1'",
-            # "tractor._exceptions.RemoteActorError: ('spawn_until_2'",
-            # ^-NOTE-^ old RAE repr, new one is below with a field
-            # showing the src actor's uid.
-            "src_uid=('spawn_until_0'",
-            "relay_uid=('spawn_until_1'",
-            "src_uid=('spawn_until_2'",
+            "NameError: name 'doggypants' is not defined",
+            # TODO? once we get more pedantic with `relay_uid` should
+            # prolly include all actor-IDs we expect to see in final
+            # tb?
         ]
     )
@@ -926,6 +944,7 @@ def test_post_mortem_api(
             "<Task 'name_error'",
             "NameError",
             "('child'",
+            'getattr(doggypants)',  # exc-LoC
         ]
     )
     if ctlc:
@@ -942,8 +961,8 @@ def test_post_mortem_api(
             "<Task '__main__.main'",
             "('root'",
             "NameError",
-            "tractor.post_mortem()",
             "src_uid=('child'",
+            "tractor.post_mortem()",  # in `main()`-LoC
         ]
     )
     if ctlc:
@@ -961,6 +980,10 @@ def test_post_mortem_api(
             "('root'",
             "NameError",
             "src_uid=('child'",
+            # raising line in `main()` but from crash-handling
+            # in `tractor.open_nursery()`.
+            'async with p.open_context(name_error) as (ctx, first):',
         ]
     )
     if ctlc:
@@ -1152,6 +1175,54 @@ def test_ctxep_pauses_n_maybe_ipc_breaks(
     )
+def test_crash_handling_within_cancelled_root_actor(
+    spawn: PexpectSpawner,
+):
+    '''
+    Ensure that when only a root-actor is started via `open_root_actor()`
+    we can crash-handle in debug-mode despite self-cancellation.
+
+    More-or-less ensures we conditionally shield the pause in
+    `._root.open_root_actor()`'s `await debug._maybe_enter_pm()`
+    call.
+
+    '''
+    child = spawn('root_self_cancelled_w_error')
+    child.expect(PROMPT)
+    assert_before(
+        child,
+        [
+            "Actor.cancel_soon()` was called!",
+            "root cancelled",
+            _pause_msg,
+            "('root'",  # actor name
+        ]
+    )
+    child.sendline('c')
+    child.expect(PROMPT)
+    assert_before(
+        child,
+        [
+            _crash_msg,
+            "('root'",  # actor name
+            "AssertionError",
+            "assert 0",
+        ]
+    )
+    child.sendline('c')
+    child.expect(EOF)
+    assert_before(
+        child,
+        [
+            "AssertionError",
+            "assert 0",
+        ]
+    )
 # TODO: better error for "non-ideal" usage from the root actor.
 # -[ ] if called from an async scope emit a message that suggests
 # using `await tractor.pause()` instead since it's less overhead

@@ -121,9 +121,11 @@ def test_shield_pause(
             child.pid,
             signal.SIGINT,
         )
+        from tractor._supervise import _shutdown_msg
         expect(
             child,
-            'Shutting down actor runtime',
+            # 'Shutting down actor runtime',
+            _shutdown_msg,
             timeout=6,
         )
         assert_before(

@@ -0,0 +1,114 @@
+'''
+Unit-ish tests for specific IPC transport protocol backends.
+
+'''
+from __future__ import annotations
+from pathlib import Path
+
+import pytest
+import trio
+import tractor
+from tractor import (
+    Actor,
+    _state,
+    _addr,
+)
+
+
+@pytest.fixture
+def bindspace_dir_str() -> str:
+
+    rt_dir: Path = tractor._state.get_rt_dir()
+    bs_dir: Path = rt_dir / 'doggy'
+    bs_dir_str: str = str(bs_dir)
+    assert not bs_dir.is_dir()
+
+    yield bs_dir_str
+
+    # delete it on suite teardown.
+    # ?TODO? should we support this internally
+    # or is leaking it ok?
+    if bs_dir.is_dir():
+        bs_dir.rmdir()
+
+
+def test_uds_bindspace_created_implicitly(
+    debug_mode: bool,
+    bindspace_dir_str: str,
+):
+    registry_addr: tuple = (
+        f'{bindspace_dir_str}',
+        'registry@doggy.sock',
+    )
+    bs_dir_str: str = registry_addr[0]
+
+    # XXX, ensure bindspace-dir DNE beforehand!
+    assert not Path(bs_dir_str).is_dir()
+
+    async def main():
+        async with tractor.open_nursery(
+            enable_transports=['uds'],
+            registry_addrs=[registry_addr],
+            debug_mode=debug_mode,
+        ) as _an:
+
+            # XXX MUST be created implicitly by
+            # `.ipc._uds.start_listener()`!
+            assert Path(bs_dir_str).is_dir()
+
+            root: Actor = tractor.current_actor()
+            assert root.is_registrar
+
+            assert registry_addr in root.reg_addrs
+            assert (
+                registry_addr
+                in
+                _state._runtime_vars['_registry_addrs']
+            )
+            assert (
+                _addr.wrap_address(registry_addr)
+                in
+                root.registry_addrs
+            )
+
+    trio.run(main)
+
+
+def test_uds_double_listen_raises_connerr(
+    debug_mode: bool,
+    bindspace_dir_str: str,
+):
+    registry_addr: tuple = (
+        f'{bindspace_dir_str}',
+        'registry@doggy.sock',
+    )
+
+    async def main():
+        async with tractor.open_nursery(
+            enable_transports=['uds'],
+            registry_addrs=[registry_addr],
+            debug_mode=debug_mode,
+        ) as _an:
+
+            # runtime up
+            root: Actor = tractor.current_actor()
+
+            from tractor.ipc._uds import (
+                start_listener,
+                UDSAddress,
+            )
+            ya_bound_addr: UDSAddress = root.registry_addrs[0]
+            try:
+                await start_listener(
+                    addr=ya_bound_addr,
+                )
+            except ConnectionError as connerr:
+                assert type(src_exc := connerr.__context__) is OSError
+                assert 'Address already in use' in src_exc.args
+                # complete, exit test.
+
+            else:
+                pytest.fail('It dint raise a connerr !?')
+
+    trio.run(main)

@@ -0,0 +1,98 @@
+'''
+Basic `ActorNursery` operations and closure semantics,
+- basic remote error collection,
+- basic multi-subactor cancellation.
+
+'''
+# import os
+# import signal
+# import platform
+# import time
+# from itertools import repeat
+
+import pytest
+import trio
+import tractor
+from tractor._exceptions import ActorCancelled
+# from tractor._testing import (
+#     tractor_test,
+# )
+# from .conftest import no_windows
+
+
+@pytest.mark.parametrize(
+    'num_subs',
+    [
+        1,
+        3,
+    ]
+)
+def test_one_cancels_all(
+    start_method: str,
+    loglevel: str,
+    debug_mode: bool,
+    num_subs: int,
+):
+    '''
+    Verify that ifa a single error bubbles to the an-scope the
+    nursery will be cancelled (just like in `trio`); this is a
+    one-cancels-all style strategy and are only supervision policy
+    at the moment.
+
+    '''
+    async def main():
+        try:
+            rte = RuntimeError('Uh oh something bad in parent')
+            async with tractor.open_nursery(
+                start_method=start_method,
+                loglevel=loglevel,
+                debug_mode=debug_mode,
+            ) as an:
+
+                # spawn the same number of deamon actors which should be cancelled
+                dactor_portals = []
+                for i in range(num_subs):
+                    name: str= f'sub_{i}'
+                    ptl: tractor.Portal = await an.start_actor(
+                        name=name,
+                        enable_modules=[__name__],
+                    )
+                    dactor_portals.append(ptl)
+
+                    # wait for booted
+                    async with tractor.wait_for_actor(name):
+                        print(f'{name!r} is up.')
+
+                # simulate uncaught exc
+                raise rte
+
+        # should error here with a ``RemoteActorError`` or ``MultiError``
+        except BaseExceptionGroup as _beg:
+            beg = _beg
+
+            # ?TODO? why can't we do `is` on beg?
+            assert (
+                beg.exceptions
+                ==
+                an.maybe_error.exceptions
+            )
+
+            assert len(beg.exceptions) == (
+                num_subs
+                +
+                1  # rte from root
+            )
+
+            # all subactors should have been implicitly
+            # `Portal.cancel_actor()`ed.
+            excs = list(beg.exceptions)
+            excs.remove(rte)
+            for exc in excs:
+                assert isinstance(exc, ActorCancelled)
+
+            assert an._scope_error is rte
+            assert not an._children
+            assert an.cancelled is True
+
+    trio.run(main)

@@ -313,9 +313,8 @@ async def inf_streamer(
         # `trio.EndOfChannel` doesn't propagate directly to the above
         # .open_stream() parent, resulting in it also raising instead
         # of gracefully absorbing as normal.. so how to handle?
-        trio.open_nursery(
-            strict_exception_groups=False,
-        ) as tn,
+        tractor.trionics.collapse_eg(),
+        trio.open_nursery() as tn,
     ):
         async def close_stream_on_sentinel():
             async for msg in stream:

@@ -11,6 +11,9 @@ from itertools import repeat
 import pytest
 import trio
 import tractor
+from tractor._exceptions import (
+    ActorCancelled,
+)
 from tractor._testing import (
     tractor_test,
 )
@@ -124,7 +127,10 @@ def test_multierror(
         ) as nursery:
             await nursery.run_in_actor(assert_err, name='errorer1')
-            portal2 = await nursery.run_in_actor(assert_err, name='errorer2')
+            portal2 = await nursery.run_in_actor(
+                assert_err,
+                name='errorer2',
+            )
             # get result(s) from main task
             try:
@@ -137,7 +143,15 @@ def test_multierror(
     # here we should get a ``BaseExceptionGroup`` containing exceptions
     # from both subactors
-    with pytest.raises(BaseExceptionGroup):
+    with pytest.raises(
+        expected_exception=(
+            tractor.RemoteActorError,
+            # ?TODO, should it be this??
+            # like `trio`'s strict egs?
+            BaseExceptionGroup,
+        ),
+    ):
         trio.run(main)
@@ -233,10 +247,14 @@ async def stream_forever():
 @tractor_test
-async def test_cancel_infinite_streamer(start_method):
+async def test_cancel_infinite_streamer(
+    start_method: str,
+):
     # stream for at most 1 seconds
-    with trio.move_on_after(1) as cancel_scope:
+    with (
+        trio.fail_after(4),
+        trio.move_on_after(1) as cancel_scope
+    ):
         async with tractor.open_nursery() as n:
             portal = await n.start_actor(
                 'donny',
@@ -284,20 +302,38 @@ async def test_cancel_infinite_streamer(start_method):
     ],
 )
 @tractor_test
-async def test_some_cancels_all(num_actors_and_errs, start_method, loglevel):
-    """Verify a subset of failed subactors causes all others in
+async def test_some_cancels_all(
+    num_actors_and_errs: tuple,
+    start_method: str,
+    loglevel: str,
+    debug_mode: bool,
+):
+    '''
+    Verify a subset of failed subactors causes all others in
     the nursery to be cancelled just like the strategy in trio.
     This is the first and only supervisory strategy at the moment.
-    """
-    num_actors, first_err, err_type, ria_func, da_func = num_actors_and_errs
+    '''
+    (
+        num_actors,
+        first_err,
+        err_type,
+        ria_func,
+        da_func,
+    ) = num_actors_and_errs
+    with trio.fail_after(
+        3
+        if not debug_mode
+        else 999
+    ):
         try:
-            async with tractor.open_nursery() as n:
+            async with tractor.open_nursery() as an:
                 # spawn the same number of deamon actors which should be cancelled
                 dactor_portals = []
                 for i in range(num_actors):
-                    dactor_portals.append(await n.start_actor(
+                    dactor_portals.append(await an.start_actor(
                         f'deamon_{i}',
                         enable_modules=[__name__],
                     ))
@@ -307,7 +343,7 @@ async def test_some_cancels_all(num_actors_and_errs, start_method, loglevel):
                 for i in range(num_actors):
                     # start actor(s) that will fail immediately
                     riactor_portals.append(
-                        await n.run_in_actor(
+                        await an.run_in_actor(
                             func,
                             name=f'actor_{i}',
                             **kwargs
@@ -337,19 +373,28 @@ async def test_some_cancels_all(num_actors_and_errs, start_method, loglevel):
         # should error here with a ``RemoteActorError`` or ``MultiError``
-        except first_err as err:
+        except first_err as _err:
+            err = _err
             if isinstance(err, BaseExceptionGroup):
                 assert len(err.exceptions) == num_actors
                 for exc in err.exceptions:
+                    # TODO, figure out why these aren't being set?
+                    if isinstance(exc, ActorCancelled):
+                        breakpoint()
                     if isinstance(exc, tractor.RemoteActorError):
                         assert exc.boxed_type == err_type
                     else:
                         assert isinstance(exc, trio.Cancelled)
             elif isinstance(err, tractor.RemoteActorError):
                 assert err.boxed_type == err_type
-            assert n.cancelled is True
-            assert not n._children
+            assert an.cancelled is True
+            assert not an._children
         else:
             pytest.fail("Should have gotten a remote assertion error?")
@@ -519,10 +564,15 @@ def test_cancel_via_SIGINT_other_task(
     async def main():
         # should never timeout since SIGINT should cancel the current program
         with trio.fail_after(timeout):
-            async with trio.open_nursery(
+            async with (
+                # XXX ?TODO? why no work!?
+                # tractor.trionics.collapse_eg(),
+                trio.open_nursery(
                     strict_exception_groups=False,
-            ) as n:
-                await n.start(spawn_and_sleep_forever)
+                ) as tn,
+            ):
+                await tn.start(spawn_and_sleep_forever)
                 if 'mp' in spawn_backend:
                     time.sleep(0.1)
                 os.kill(pid, signal.SIGINT)
@@ -533,38 +583,123 @@ def test_cancel_via_SIGINT_other_task(
 async def spin_for(period=3):
     "Sync sleep."
+    print(f'sync sleeping in sub-sub for {period}\n')
     time.sleep(period)
-async def spawn():
-    async with tractor.open_nursery() as tn:
-        await tn.run_in_actor(
+async def spawn_sub_with_sync_blocking_task():
+    async with tractor.open_nursery() as an:
+        print('starting sync blocking subactor..\n')
+        await an.run_in_actor(
             spin_for,
             name='sleeper',
         )
+        print('exiting first subactor layer..\n')
+@pytest.mark.parametrize(
+    'man_cancel_outer',
+    [
+        False,  # passes if delay != 2
+        # always causes an unexpected eg-w-embedded-assert-err?
+        pytest.param(True,
+            marks=pytest.mark.xfail(
+                reason=(
+                    'always causes an unexpected eg-w-embedded-assert-err?'
+                )
+            ),
+        ),
+    ],
+)
 @no_windows
 def test_cancel_while_childs_child_in_sync_sleep(
-    loglevel,
-    start_method,
-    spawn_backend,
+    loglevel: str,
+    start_method: str,
+    spawn_backend: str,
+    debug_mode: bool,
+    reg_addr: tuple,
+    man_cancel_outer: bool,
 ):
-    """Verify that a child cancelled while executing sync code is torn
+    '''
+    Verify that a child cancelled while executing sync code is torn
     down even when that cancellation is triggered by the parent
     2 nurseries "up".
-    """
+    Though the grandchild should stay blocking its actor runtime, its
+    parent should issue a "zombie reaper" to hard kill it after
+    sufficient timeout.
+    '''
     if start_method == 'forkserver':
         pytest.skip("Forksever sux hard at resuming from sync sleep...")
     async def main():
-        with trio.fail_after(2):
-            async with tractor.open_nursery() as tn:
-                await tn.run_in_actor(
-                    spawn,
-                    name='spawn',
+        #
+        # XXX BIG TODO NOTE XXX
+        #
+        # it seems there's a strange race that can happen
+        # where where the fail-after will trigger outer scope
+        # .cancel() which then causes the inner scope to raise,
+        #
+        # BaseExceptionGroup('Exceptions from Trio nursery', [
+        #   BaseExceptionGroup('Exceptions from Trio nursery',
+        #   [
+        #     Cancelled(),
+        #     Cancelled(),
+        #   ]
+        #   ),
+        #   AssertionError('assert 0')
+        # ])
+        #
+        # WHY THIS DOESN'T MAKE SENSE:
+        # ---------------------------
+        # - it should raise too-slow-error when too slow..
+        #  * verified that using simple-cs and manually cancelling
+        #    you get same outcome -> indicates that the fail-after
+        #    can have its TooSlowError overriden!
+        #  |_ to check this it's easy, simplly decrease the timeout
+        #     as per the var below.
+        #
+        # - when using the manual simple-cs the outcome is different
+        #   DESPITE the `assert 0` which means regardless of the
+        #   inner scope effectively failing in the same way, the
+        #   bubbling up **is NOT the same**.
+        #
+        # delays trigger diff outcomes..
+        # ---------------------------
+        # as seen by uncommenting various lines below there is from
+        # my POV an unexpected outcome due to the delay=2 case.
+        #
+        # delay = 1  # no AssertionError in eg, TooSlowError raised.
+        # delay = 2  # is AssertionError in eg AND no TooSlowError !?
+        delay = 4  # is AssertionError in eg AND no _cs cancellation.
+        with trio.fail_after(delay) as _cs:
+        # with trio.CancelScope() as cs:
+        # ^XXX^ can be used instead to see same outcome.
+            async with (
+                # tractor.trionics.collapse_eg(),  # doesn't help
+                tractor.open_nursery(
+                    hide_tb=False,
+                    debug_mode=debug_mode,
+                    registry_addrs=[reg_addr],
+                ) as an,
+            ):
+                await an.run_in_actor(
+                    spawn_sub_with_sync_blocking_task,
+                    name='sync_blocking_sub',
                 )
                 await trio.sleep(1)
+                if man_cancel_outer:
+                    print('Cancelling manually in root')
+                    _cs.cancel()
+                # trigger exc-srced taskc down
+                # the actor tree.
+                print('RAISING IN ROOT')
                 assert 0
     with pytest.raises(AssertionError):

@@ -117,9 +117,10 @@ async def open_actor_local_nursery(
     ctx: tractor.Context,
 ):
     global _nursery
-    async with trio.open_nursery(
-        strict_exception_groups=False,
-    ) as tn:
+    async with (
+        tractor.trionics.collapse_eg(),
+        trio.open_nursery() as tn
+    ):
         _nursery = tn
         await ctx.started()
         await trio.sleep(10)

@@ -13,26 +13,24 @@ MESSAGE = 'tractoring at full speed'
 def test_empty_mngrs_input_raises() -> None:
     async def main():
-        with trio.fail_after(1):
+        with trio.fail_after(3):
             async with (
                 open_actor_cluster(
                     modules=[__name__],
                     # NOTE: ensure we can passthrough runtime opts
-                    loglevel='info',
-                    # debug_mode=True,
+                    loglevel='cancel',
+                    debug_mode=False,
                 ) as portals,
-                gather_contexts(
-                    # NOTE: it's the use of inline-generator syntax
-                    # here that causes the empty input.
-                    mngrs=(
-                        p.open_context(worker) for p in portals.values()
-                    ),
-                ),
+                gather_contexts(mngrs=()),
             ):
-                assert 0  # should fail before this?
+                assert portals
+                # test should fail if we mk it here!
+                assert 0, 'Should have raised val-err !?'
     with pytest.raises(ValueError):
         trio.run(main)

@ -11,6 +11,7 @@ import psutil
import pytest import pytest
import subprocess import subprocess
import tractor import tractor
from tractor.trionics import collapse_eg
from tractor._testing import tractor_test from tractor._testing import tractor_test
import trio import trio
@ -193,10 +194,10 @@ async def spawn_and_check_registry(
try: try:
async with tractor.open_nursery() as an: async with tractor.open_nursery() as an:
async with trio.open_nursery( async with (
strict_exception_groups=False, collapse_eg(),
) as trion: trio.open_nursery() as trion,
):
portals = {} portals = {}
for i in range(3): for i in range(3):
name = f'a{i}' name = f'a{i}'
@ -338,11 +339,12 @@ async def close_chans_before_nursery(
async with portal2.open_stream_from( async with portal2.open_stream_from(
stream_forever stream_forever
) as agen2: ) as agen2:
async with trio.open_nursery( async with (
strict_exception_groups=False, collapse_eg(),
) as n: trio.open_nursery() as tn,
n.start_soon(streamer, agen1) ):
n.start_soon(cancel, use_signal, .5) tn.start_soon(streamer, agen1)
tn.start_soon(cancel, use_signal, .5)
try: try:
await streamer(agen2) await streamer(agen2)
finally: finally:

@ -234,10 +234,8 @@ async def trio_ctx(
with trio.fail_after(1 + delay): with trio.fail_after(1 + delay):
try: try:
async with ( async with (
trio.open_nursery( tractor.trionics.collapse_eg(),
# TODO, for new `trio` / py3.13 trio.open_nursery() as tn,
# strict_exception_groups=False,
) as tn,
tractor.to_asyncio.open_channel_from( tractor.to_asyncio.open_channel_from(
sleep_and_err, sleep_and_err,
) as (first, chan), ) as (first, chan),
@ -573,6 +571,8 @@ def test_basic_interloop_channel_stream(
fan_out: bool, fan_out: bool,
): ):
async def main(): async def main():
# TODO, figure out min timeout here!
with trio.fail_after(6):
async with tractor.open_nursery() as an: async with tractor.open_nursery() as an:
portal = await an.run_in_actor( portal = await an.run_in_actor(
stream_from_aio, stream_from_aio,
@ -1088,6 +1088,108 @@ def test_sigint_closes_lifetime_stack(
trio.run(main) trio.run(main)
# ?TODO asyncio.Task fn-deco?
# -[ ] do sig checking at import time like @context?
# -[ ] maybe name it @aio_task ??
# -[ ] chan: to_asyncio.InterloopChannel ??
async def raise_before_started(
# from_trio: asyncio.Queue,
# to_trio: trio.abc.SendChannel,
chan: to_asyncio.LinkedTaskChannel,
) -> None:
'''
`asyncio.Task` entry point which RTEs before calling
`to_trio.send_nowait()`.
'''
await asyncio.sleep(0.2)
raise RuntimeError('Some shite went wrong before `.send_nowait()`!!')
# to_trio.send_nowait('Uhh we shouldve RTE-d ^^ ??')
chan.started_nowait('Uhh we shouldve RTE-d ^^ ??')
await asyncio.sleep(float('inf'))
@tractor.context
async def caching_ep(
ctx: tractor.Context,
):
log = tractor.log.get_logger('caching_ep')
log.info('syncing via `ctx.started()`')
await ctx.started()
# XXX, allocate the `open_channel_from()` inside
# a `.trionics.maybe_open_context()`.
chan: to_asyncio.LinkedTaskChannel
async with (
tractor.trionics.maybe_open_context(
acm_func=tractor.to_asyncio.open_channel_from,
kwargs={
'target': raise_before_started,
# ^XXX, kwarg to `open_channel_from()`
},
# lock around current actor task access
key=tractor.current_actor().uid,
) as (cache_hit, (clients, chan)),
):
if cache_hit:
log.error(
'Re-using cached `.open_channel_from()` call!\n'
)
else:
log.info(
'Allocating SHOULD-FAIL `.open_channel_from()`\n'
)
await trio.sleep_forever()
def test_aio_side_raises_before_started(
reg_addr: tuple[str, int],
debug_mode: bool,
loglevel: str,
):
'''
Simulates connection-err from `piker.brokers.ib.api`..
Ensure any error raised by the child-`asyncio.Task` BEFORE
`chan.started()` is relayed back to the root actor as a
`RemoteActorError` boxing the original `RuntimeError`.
'''
# delay = 999 if debug_mode else 1
async def main():
with trio.fail_after(3):
an: tractor.ActorNursery
async with tractor.open_nursery(
debug_mode=debug_mode,
loglevel=loglevel,
) as an:
p: tractor.Portal = await an.start_actor(
'lchan_cacher_that_raises_fast',
enable_modules=[__name__],
infect_asyncio=True,
)
async with p.open_context(
caching_ep,
) as (ctx, first):
assert not first
with pytest.raises(
expected_exception=(RemoteActorError),
) as excinfo:
trio.run(main)
# ensure the `asyncio.Task` exception is bubbled
# all the way up!!
rae = excinfo.value
assert rae.boxed_type is RuntimeError
# TODO: debug_mode tests once we get support for `asyncio`! # TODO: debug_mode tests once we get support for `asyncio`!
# #
# -[ ] need tests to wrap both scripts: # -[ ] need tests to wrap both scripts:

@ -235,10 +235,16 @@ async def cancel_after(wait, reg_addr):
@pytest.fixture(scope='module') @pytest.fixture(scope='module')
def time_quad_ex(reg_addr, ci_env, spawn_backend): def time_quad_ex(
reg_addr: tuple,
ci_env: bool,
spawn_backend: str,
):
if spawn_backend == 'mp': if spawn_backend == 'mp':
"""no idea but the mp *nix runs are flaking out here often... '''
""" no idea but the mp *nix runs are flaking out here often...
'''
pytest.skip("Test is too flaky on mp in CI") pytest.skip("Test is too flaky on mp in CI")
timeout = 7 if platform.system() in ('Windows', 'Darwin') else 4 timeout = 7 if platform.system() in ('Windows', 'Darwin') else 4
@ -249,12 +255,24 @@ def time_quad_ex(reg_addr, ci_env, spawn_backend):
return results, diff return results, diff
def test_a_quadruple_example(time_quad_ex, ci_env, spawn_backend): def test_a_quadruple_example(
"""This also serves as a kind of "we'd like to be this fast test".""" time_quad_ex: tuple,
ci_env: bool,
spawn_backend: str,
):
'''
This also serves as a kind of "we'd like to be this fast test".
'''
results, diff = time_quad_ex results, diff = time_quad_ex
assert results assert results
this_fast = 6 if platform.system() in ('Windows', 'Darwin') else 3 this_fast = (
6 if platform.system() in (
'Windows',
'Darwin',
)
else 3
)
assert diff < this_fast assert diff < this_fast

@ -1,5 +1,6 @@
''' '''
Async context manager cache api testing: ``trionics.maybe_open_context():`` Suites for our `.trionics.maybe_open_context()` multi-task
shared-cached `@acm` API.
''' '''
from contextlib import asynccontextmanager as acm from contextlib import asynccontextmanager as acm
@ -9,6 +10,15 @@ from typing import Awaitable
import pytest import pytest
import trio import trio
import tractor import tractor
from tractor.trionics import (
maybe_open_context,
)
from tractor.log import (
get_console_log,
get_logger,
)
log = get_logger(__name__)
_resource: int = 0 _resource: int = 0
@ -52,7 +62,7 @@ def test_resource_only_entered_once(key_on):
# different task names per task will be used # different task names per task will be used
kwargs = {'task_name': name} kwargs = {'task_name': name}
async with tractor.trionics.maybe_open_context( async with maybe_open_context(
maybe_increment_counter, maybe_increment_counter,
kwargs=kwargs, kwargs=kwargs,
key=key, key=key,
@ -72,11 +82,13 @@ def test_resource_only_entered_once(key_on):
with trio.move_on_after(0.5): with trio.move_on_after(0.5):
async with ( async with (
tractor.open_root_actor(), tractor.open_root_actor(),
trio.open_nursery() as n, trio.open_nursery() as tn,
): ):
for i in range(10): for i in range(10):
n.start_soon(enter_cached_mngr, f'task_{i}') tn.start_soon(
enter_cached_mngr,
f'task_{i}',
)
await trio.sleep(0.001) await trio.sleep(0.001)
trio.run(main) trio.run(main)
@ -98,21 +110,32 @@ async def streamer(
@acm @acm
async def open_stream() -> Awaitable[tractor.MsgStream]: async def open_stream() -> Awaitable[
tuple[
tractor.ActorNursery,
tractor.MsgStream,
]
]:
try: try:
async with tractor.open_nursery() as an: async with tractor.open_nursery() as an:
portal = await an.start_actor( portal = await an.start_actor(
'streamer', 'streamer',
enable_modules=[__name__], enable_modules=[__name__],
) )
try:
async with ( async with (
portal.open_context(streamer) as (ctx, first), portal.open_context(streamer) as (ctx, first),
ctx.open_stream() as stream, ctx.open_stream() as stream,
): ):
yield stream print('Entered open_stream() caller')
yield an, stream
print('Exited open_stream() caller')
print('Cancelling streamer') finally:
print(
'Cancelling streamer with,\n'
'=> `Portal.cancel_actor()`'
)
await portal.cancel_actor() await portal.cancel_actor()
print('Cancelled streamer') print('Cancelled streamer')
@ -127,11 +150,15 @@ async def open_stream() -> Awaitable[tractor.MsgStream]:
@acm @acm
async def maybe_open_stream(taskname: str): async def maybe_open_stream(taskname: str):
async with tractor.trionics.maybe_open_context( async with maybe_open_context(
# NOTE: all secondary tasks should cache hit on the same key # NOTE: all secondary tasks should cache hit on the same key
acm_func=open_stream, acm_func=open_stream,
) as (cache_hit, stream): ) as (
cache_hit,
(an, stream)
):
# when the actor + portal + ctx + stream has already been
# allocated we want to just bcast to this task.
if cache_hit: if cache_hit:
print(f'{taskname} loaded from cache') print(f'{taskname} loaded from cache')
@ -139,10 +166,43 @@ async def maybe_open_stream(taskname: str):
# if this feed is already allocated by the first # if this feed is already allocated by the first
# task that entered # task that entered
async with stream.subscribe() as bstream: async with stream.subscribe() as bstream:
yield bstream yield an, bstream
print(
f'cached task exited\n'
f')>\n'
f' |_{taskname}\n'
)
# we should always unreg the "cloned" bcrc for this
# consumer-task
assert id(bstream) not in bstream._state.subs
else: else:
# yield the actual stream # yield the actual stream
yield stream try:
yield an, stream
finally:
print(
f'NON-cached task exited\n'
f')>\n'
f' |_{taskname}\n'
)
first_bstream = stream._broadcaster
bcrx_state = first_bstream._state
subs: dict[int, int] = bcrx_state.subs
if len(subs) == 1:
assert id(first_bstream) in subs
# ^^TODO! the bcrx should always de-allocate all subs,
# including the implicit first one allocated on entry
# by the first subscribing peer task, no?
#
# -[ ] adjust `MsgStream.subscribe()` to do this mgmt!
# |_ allows reverting `MsgStream.receive()` to the
# non-bcaster method.
# |_ we can decide whether to reset `._broadcaster`?
#
# await tractor.pause(shield=True)
def test_open_local_sub_to_stream( def test_open_local_sub_to_stream(
@ -159,16 +219,24 @@ def test_open_local_sub_to_stream(
if debug_mode: if debug_mode:
timeout = 999 timeout = 999
print(f'IN debug_mode, setting large timeout={timeout!r}..')
async def main(): async def main():
full = list(range(1000)) full = list(range(1000))
an: tractor.ActorNursery|None = None
num_tasks: int = 10
async def get_sub_and_pull(taskname: str): async def get_sub_and_pull(taskname: str):
nonlocal an
stream: tractor.MsgStream stream: tractor.MsgStream
async with ( async with (
maybe_open_stream(taskname) as stream, maybe_open_stream(taskname) as (
an,
stream,
),
): ):
if '0' in taskname: if '0' in taskname:
assert isinstance(stream, tractor.MsgStream) assert isinstance(stream, tractor.MsgStream)
@ -180,34 +248,159 @@ def test_open_local_sub_to_stream(
first = await stream.receive() first = await stream.receive()
print(f'{taskname} started with value {first}') print(f'{taskname} started with value {first}')
seq = [] seq: list[int] = []
async for msg in stream: async for msg in stream:
seq.append(msg) seq.append(msg)
assert set(seq).issubset(set(full)) assert set(seq).issubset(set(full))
# end of @acm block
print(f'{taskname} finished') print(f'{taskname} finished')
root: tractor.Actor
with trio.fail_after(timeout) as cs: with trio.fail_after(timeout) as cs:
# TODO: turns out this isn't multi-task entrant XD # TODO: turns out this isn't multi-task entrant XD
# We probably need an idempotent entry semantic? # We probably need an idempotent entry semantic?
async with tractor.open_root_actor( async with tractor.open_root_actor(
debug_mode=debug_mode, debug_mode=debug_mode,
): # maybe_enable_greenback=True,
#
# ^TODO? doesn't seem to mk breakpoint() usage work
# bc each bg task needs to open a portal??
# - [ ] we should consider making this part of
# our taskman defaults?
# |_see https://github.com/goodboy/tractor/pull/363
#
) as root:
assert root.is_registrar
async with ( async with (
trio.open_nursery() as tn, trio.open_nursery() as tn,
): ):
for i in range(10): for i in range(num_tasks):
tn.start_soon( tn.start_soon(
get_sub_and_pull, get_sub_and_pull,
f'task_{i}', f'task_{i}',
) )
await trio.sleep(0.001) await trio.sleep(0.001)
print('all consumer tasks finished') print('all consumer tasks finished!')
# ?XXX, ensure actor-nursery is shutdown or we might
# hang here due to a minor task deadlock/race-condition?
#
# - seems that all we need is a checkpoint to ensure
# the last suspended task, which is inside
# `.maybe_open_context()`, can do the
# `Portal.cancel_actor()` call?
#
# - if that bg task isn't resumed, this block may hang
# and the timeout might hit before that?
#
if root.ipc_server.has_peers():
await trio.lowlevel.checkpoint()
# alt approach, cancel the entire `an`
# await tractor.pause()
# await an.cancel()
# end of runtime scope
print('root actor terminated.')
if cs.cancelled_caught: if cs.cancelled_caught:
pytest.fail( pytest.fail(
'Should NOT time out in `open_root_actor()` ?' 'Should NOT time out in `open_root_actor()` ?'
) )
print('exiting main.')
trio.run(main)
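The comment above about needing only a checkpoint relies on a scheduler rule: a task that is ready to run gets no chance to do so until the current task yields. Below is a minimal asyncio analog in which `await asyncio.sleep(0)` stands in for `trio.lowlevel.checkpoint()`. It illustrates the scheduling rule only, not tractor's teardown path.

```python
import asyncio


async def main() -> tuple[list[str], list[str]]:
    ran: list[str] = []

    async def bg_teardown() -> None:
        # stands in for a suspended task's pending cleanup step
        ran.append('teardown')

    task = asyncio.create_task(bg_teardown())
    before = list(ran)       # created, but has not run yet
    await asyncio.sleep(0)   # checkpoint: yield once to the event loop
    after = list(ran)        # the ready task ran during that yield
    await task
    return before, after


before, after = asyncio.run(main())
```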
@acm
async def cancel_outer_cs(
cs: trio.CancelScope|None = None,
delay: float = 0,
):
# on first task delay this enough to block
# the 2nd task but then cancel it mid sleep
# so that the tn.start() inside the key-err handler block
# is cancelled and would previously corrupt the
# mutex state.
log.info(f'task entering sleep({delay})')
await trio.sleep(delay)
if cs:
log.info('task calling cs.cancel()')
cs.cancel()
await trio.lowlevel.checkpoint()
yield
await trio.sleep_forever()
def test_lock_not_corrupted_on_fast_cancel(
debug_mode: bool,
loglevel: str,
):
'''
Verify that if the caching-task (the first to enter
`maybe_open_context()`) is cancelled mid-cache-miss, the embedded
mutex can never be left in a corrupted state.
That is, the lock is always eventually released ensuring a peer
(cache-hitting) task will never,
- be left to inf-block/hang on the `lock.acquire()`.
- try to release the lock when still owned by the caching-task
due to it having erroneously exited without calling
`lock.release()`.
'''
delay: float = 1.
async def use_moc(
cs: trio.CancelScope|None,
delay: float,
):
log.info('task entering moc')
async with maybe_open_context(
cancel_outer_cs,
kwargs={
'cs': cs,
'delay': delay,
},
) as (cache_hit, _null):
if cache_hit:
log.info('2nd task entered')
else:
log.info('1st task entered')
await trio.sleep_forever()
async def main():
with trio.fail_after(delay + 2):
async with (
tractor.open_root_actor(
debug_mode=debug_mode,
loglevel=loglevel,
),
trio.open_nursery() as tn,
):
get_console_log('info')
log.info('yo starting')
cs = tn.cancel_scope
tn.start_soon(
use_moc,
cs,
delay,
name='child',
)
with trio.CancelScope() as rent_cs:
await use_moc(
cs=rent_cs,
delay=delay,
)
trio.run(main) trio.run(main)
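`test_lock_not_corrupted_on_fast_cancel()` asserts that the mutex inside `maybe_open_context()` is always released, even when the first (caching) task is cancelled mid cache-miss. Below is a hypothetical asyncio sketch of a per-key shared `@acm` cache with that property. `SharedAcmCache` is illustrative only and is not `tractor.trionics.maybe_open_context()`. Holding the lock via `async with` around the allocation is what guarantees its release on cancellation.

```python
import asyncio
from contextlib import AsyncExitStack, asynccontextmanager


class SharedAcmCache:
    '''
    Hypothetical per-key shared async-context-manager cache: the
    first task to enter a key allocates the resource, later tasks
    get a cache hit, and the last task out tears it down.
    '''

    def __init__(self) -> None:
        self._lock = asyncio.Lock()
        # key -> [value, user_count, exit_stack]
        self._entries: dict[object, list] = {}

    @asynccontextmanager
    async def open(self, key, acm_func):
        # `async with` releases the lock on ANY exit, including a
        # cancellation while the first task is still allocating.
        async with self._lock:
            entry = self._entries.get(key)
            cache_hit = entry is not None
            if not cache_hit:
                stack = AsyncExitStack()
                value = await stack.enter_async_context(acm_func())
                # only published once allocation fully succeeded
                entry = self._entries[key] = [value, 0, stack]
            entry[1] += 1
        try:
            yield cache_hit, entry[0]
        finally:
            entry[1] -= 1
            if entry[1] == 0:
                # unpublish synchronously, then tear down
                del self._entries[key]
                await entry[2].aclose()


async def demo_sharing() -> tuple[int, list[bool]]:
    cache = SharedAcmCache()
    allocations = 0
    hits: list[bool] = []

    @asynccontextmanager
    async def resource():
        nonlocal allocations
        allocations += 1
        yield 'res'

    async def user() -> None:
        async with cache.open('k', resource) as (hit, _value):
            hits.append(hit)
            await asyncio.sleep(0.01)  # overlap with the other users

    await asyncio.gather(*(user() for _ in range(5)))
    return allocations, hits


async def demo_cancel_mid_alloc() -> tuple[bool, int]:
    cache = SharedAcmCache()

    @asynccontextmanager
    async def slow_resource():
        await asyncio.sleep(10)  # cancelled while "allocating"
        yield 'never'

    async def user() -> None:
        async with cache.open('k', slow_resource):
            pass

    task = asyncio.create_task(user())
    await asyncio.sleep(0.01)
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    # lock must be free and nothing half-allocated left behind
    return cache._lock.locked(), len(cache._entries)


allocations, hits = asyncio.run(demo_sharing())
lock_held, n_entries = asyncio.run(demo_cancel_mid_alloc())
```

The `finally` block removes the entry before its `await`, so no peer can pick up an entry that is being torn down.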

@ -147,8 +147,7 @@ def test_trio_prestarted_task_bubbles(
await trio.sleep_forever() await trio.sleep_forever()
async def _trio_main(): async def _trio_main():
# with trio.fail_after(2): with trio.fail_after(2 if not debug_mode else 999):
with trio.fail_after(999):
first: str first: str
chan: to_asyncio.LinkedTaskChannel chan: to_asyncio.LinkedTaskChannel
aio_ev = asyncio.Event() aio_ev = asyncio.Event()
@ -217,32 +216,25 @@ def test_trio_prestarted_task_bubbles(
): ):
aio_ev.set() aio_ev.set()
with pytest.raises(
expected_exception=ExceptionGroup,
) as excinfo:
tractor.to_asyncio.run_as_asyncio_guest(
trio_main=_trio_main,
)
eg = excinfo.value
rte_eg, rest_eg = eg.split(RuntimeError)
# ensure the trio-task's error bubbled despite the aio-side # ensure the trio-task's error bubbled despite the aio-side
# having (maybe) errored first. # having (maybe) errored first.
if aio_err_trigger in ( if aio_err_trigger in (
'after_trio_task_starts', 'after_trio_task_starts',
'after_start_point', 'after_start_point',
): ):
assert len(errs := rest_eg.exceptions) == 1 patt: str = 'trio-side'
typerr = errs[0] expect_exc = TypeError
assert (
type(typerr) is TypeError
and
'trio-side' in typerr.args
)
# when aio errors BEFORE (last) trio task is scheduled, we should # when aio errors BEFORE (last) trio task is scheduled, we should
# never see anything but the aio-side. # never see anything but the aio-side.
else: else:
assert len(rtes := rte_eg.exceptions) == 1 patt: str = 'asyncio-side'
assert 'asyncio-side' in rtes[0].args[0] expect_exc = RuntimeError
with pytest.raises(expect_exc) as excinfo:
tractor.to_asyncio.run_as_asyncio_guest(
trio_main=_trio_main,
)
caught_exc = excinfo.value
assert patt in caught_exc.args

@ -8,6 +8,7 @@ from contextlib import (
) )
import pytest import pytest
from tractor.trionics import collapse_eg
import trio import trio
from trio import TaskStatus from trio import TaskStatus
@ -64,9 +65,8 @@ def test_stashed_child_nursery(use_start_soon):
async def main(): async def main():
async with ( async with (
trio.open_nursery( collapse_eg(),
strict_exception_groups=False, trio.open_nursery() as pn,
) as pn,
): ):
cn = await pn.start(mk_child_nursery) cn = await pn.start(mk_child_nursery)
assert cn assert cn
@ -117,11 +117,9 @@ def test_acm_embedded_nursery_propagates_enter_err(
async with ( async with (
trio.open_nursery() as tn, trio.open_nursery() as tn,
tractor.trionics.maybe_raise_from_masking_exc( tractor.trionics.maybe_raise_from_masking_exc(
tn=tn,
unmask_from=( unmask_from=(
trio.Cancelled (trio.Cancelled,) if unmask_from_canc
if unmask_from_canc else ()
else None
), ),
) )
): ):
@ -136,7 +134,6 @@ def test_acm_embedded_nursery_propagates_enter_err(
with tractor.devx.maybe_open_crash_handler( with tractor.devx.maybe_open_crash_handler(
pdb=debug_mode, pdb=debug_mode,
) as bxerr: ) as bxerr:
if bxerr:
assert not bxerr.value assert not bxerr.value
async with ( async with (
@ -145,6 +142,7 @@ def test_acm_embedded_nursery_propagates_enter_err(
assert not tn.cancel_scope.cancel_called assert not tn.cancel_scope.cancel_called
assert 0 assert 0
if debug_mode:
assert ( assert (
(err := bxerr.value) (err := bxerr.value)
and and
@ -197,10 +195,8 @@ def test_gatherctxs_with_memchan_breaks_multicancelled(
async with ( async with (
# XXX should ensure ONLY the KBI # XXX should ensure ONLY the KBI
# is relayed upward # is relayed upward
trionics.collapse_eg(), collapse_eg(),
trio.open_nursery( trio.open_nursery(), # as tn,
# strict_exception_groups=False,
), # as tn,
trionics.gather_contexts([ trionics.gather_contexts([
open_memchan(), open_memchan(),

@ -55,10 +55,17 @@ async def open_actor_cluster(
raise ValueError( raise ValueError(
f'Number of names is {len(names)} but count is {count}') f'Number of names is {len(names)} but count is {count}')
async with tractor.open_nursery( async with (
# tractor.trionics.collapse_eg(),
tractor.open_nursery(
**runtime_kwargs, **runtime_kwargs,
) as an: ) as an
async with trio.open_nursery() as n: ):
async with (
# tractor.trionics.collapse_eg(),
trio.open_nursery() as tn,
tractor.trionics.maybe_raise_from_masking_exc()
):
uid = tractor.current_actor().uid uid = tractor.current_actor().uid
async def _start(name: str) -> None: async def _start(name: str) -> None:
@ -69,9 +76,8 @@ async def open_actor_cluster(
) )
for name in names: for name in names:
n.start_soon(_start, name) tn.start_soon(_start, name)
assert len(portals) == count assert len(portals) == count
yield portals yield portals
await an.cancel(hard_kill=hard_kill) await an.cancel(hard_kill=hard_kill)

@ -101,6 +101,9 @@ from ._state import (
debug_mode, debug_mode,
_ctxvar_Context, _ctxvar_Context,
) )
from .trionics import (
collapse_eg,
)
# ------ - ------ # ------ - ------
if TYPE_CHECKING: if TYPE_CHECKING:
from ._portal import Portal from ._portal import Portal
@ -151,7 +154,7 @@ class Context:
2 cancel-scope-linked, communicating and parallel executing 2 cancel-scope-linked, communicating and parallel executing
`Task`s. Contexts are allocated on each side of any task `Task`s. Contexts are allocated on each side of any task
RPC-linked msg dialog, i.e. for every request to a remote RPC-linked msg dialog, i.e. for every request to a remote
actor from a `Portal`. On the "callee" side a context is actor from a `Portal`. On the "child" side a context is
always allocated inside `._rpc._invoke()`. always allocated inside `._rpc._invoke()`.
TODO: more detailed writeup on cancellation, error and TODO: more detailed writeup on cancellation, error and
@ -219,8 +222,8 @@ class Context:
# `._runtime.invoke()`. # `._runtime.invoke()`.
_remote_func_type: str | None = None _remote_func_type: str | None = None
# NOTE: (for now) only set (a portal) on the caller side since # NOTE: (for now) only set (a portal) on the parent side since
# the callee doesn't generally need a ref to one and should # the child doesn't generally need a ref to one and should
# normally need to explicitly ask for handle to its peer if # normally need to explicitly ask for handle to its peer if
# more than the `Context` is needed? # more than the `Context` is needed?
_portal: Portal | None = None _portal: Portal | None = None
@ -249,12 +252,12 @@ class Context:
_outcome_msg: Return|Error|ContextCancelled = Unresolved _outcome_msg: Return|Error|ContextCancelled = Unresolved
# on a clean exit there should be a final value # on a clean exit there should be a final value
# delivered from the far end "callee" task, so # delivered from the far end "child" task, so
# this value is only set on one side. # this value is only set on one side.
# _result: Any | int = None # _result: Any | int = None
_result: PayloadT|Unresolved = Unresolved _result: PayloadT|Unresolved = Unresolved
# if the local "caller" task errors this value is always set # if the local "parent" task errors this value is always set
# to the error that was captured in the # to the error that was captured in the
# `Portal.open_context().__aexit__()` teardown block OR, in # `Portal.open_context().__aexit__()` teardown block OR, in
# 2 special cases when an (maybe) expected remote error # 2 special cases when an (maybe) expected remote error
@ -290,7 +293,7 @@ class Context:
# a `ContextCancelled` due to a call to `.cancel()` triggering # a `ContextCancelled` due to a call to `.cancel()` triggering
# "graceful closure" on either side: # "graceful closure" on either side:
# - `._runtime._invoke()` will check this flag before engaging # - `._runtime._invoke()` will check this flag before engaging
# the crash handler REPL in such cases where the "callee" # the crash handler REPL in such cases where the "child"
# raises the cancellation, # raises the cancellation,
# - `.devx.debug.lock_stdio_for_peer()` will set it to `False` if # - `.devx.debug.lock_stdio_for_peer()` will set it to `False` if
# the global tty-lock has been configured to filter out some # the global tty-lock has been configured to filter out some
@ -304,8 +307,8 @@ class Context:
_stream_opened: bool = False _stream_opened: bool = False
_stream: MsgStream|None = None _stream: MsgStream|None = None
# caller of `Portal.open_context()` for # the parent-task's calling-fn's frame-info, the frame above
# logging purposes mostly # `Portal.open_context()`, for introspection/logging.
_caller_info: CallerInfo|None = None _caller_info: CallerInfo|None = None
# overrun handling machinery # overrun handling machinery
@ -526,11 +529,11 @@ class Context:
''' '''
Exactly the value of `self._scope.cancelled_caught` Exactly the value of `self._scope.cancelled_caught`
(delegation) and should only be (able to be read as) (delegation) and should only be (able to be read as)
`True` for a `.side == "caller"` ctx wherein the `True` for a `.side == "parent"` ctx wherein the
`Portal.open_context()` block was exited due to a call to `Portal.open_context()` block was exited due to a call to
`._scope.cancel()` - which should only occur in 2 cases: `._scope.cancel()` - which should only occur in 2 cases:
- a caller side calls `.cancel()`, the far side cancels - a parent side calls `.cancel()`, the far side cancels
and delivers back a `ContextCancelled` (making and delivers back a `ContextCancelled` (making
`.cancel_acked == True`) and `._scope.cancel()` is `.cancel_acked == True`) and `._scope.cancel()` is
called by `._maybe_cancel_and_set_remote_error()` which called by `._maybe_cancel_and_set_remote_error()` which
@ -539,20 +542,20 @@ class Context:
=> `._scope.cancelled_caught == True` by normal `trio` => `._scope.cancelled_caught == True` by normal `trio`
cs semantics. cs semantics.
- a caller side is delivered a `._remote_error: - a parent side is delivered a `._remote_error:
RemoteActorError` via `._deliver_msg()` and a transitive RemoteActorError` via `._deliver_msg()` and a transitive
call to `_maybe_cancel_and_set_remote_error()` calls call to `_maybe_cancel_and_set_remote_error()` calls
`._scope.cancel()` and that cancellation eventually `._scope.cancel()` and that cancellation eventually
results in `trio.Cancelled`(s) caught in the results in `trio.Cancelled`(s) caught in the
`.open_context()` handling around the @acm's `yield`. `.open_context()` handling around the @acm's `yield`.
Only as an FYI, in the "callee" side case it can also be Only as an FYI, in the "child" side case it can also be
set but never is readable by any task outside the RPC set but never is readable by any task outside the RPC
machinery in `._invoke()` since: machinery in `._invoke()` since:
- when a callee side calls `.cancel()`, `._scope.cancel()` - when a child side calls `.cancel()`, `._scope.cancel()`
is called immediately and handled specially inside is called immediately and handled specially inside
`._invoke()` to raise a `ContextCancelled` which is then `._invoke()` to raise a `ContextCancelled` which is then
sent to the caller side. sent to the parent side.
However, `._scope.cancelled_caught` can NEVER be However, `._scope.cancelled_caught` can NEVER be
accessed/read as `True` by any RPC invoked task since it accessed/read as `True` by any RPC invoked task since it
@ -663,7 +666,7 @@ class Context:
when called/closed by actor local task(s). when called/closed by actor local task(s).
NOTEs: NOTEs:
- It is expected that the caller has previously unwrapped - It is expected that the parent has previously unwrapped
the remote error using a call to `unpack_error()` and the remote error using a call to `unpack_error()` and
provides that output exception value as the input provides that output exception value as the input
`error` argument *here*. `error` argument *here*.
@ -673,7 +676,7 @@ class Context:
`Portal.open_context()` (ideally) we want to interrupt `Portal.open_context()` (ideally) we want to interrupt
any ongoing local tasks operating within that any ongoing local tasks operating within that
`Context`'s cancel-scope so as to be notified ASAP of `Context`'s cancel-scope so as to be notified ASAP of
the remote error and engage any caller handling (eg. the remote error and engage any parent handling (eg.
for cross-process task supervision). for cross-process task supervision).
- In some cases we may want to raise the remote error - In some cases we may want to raise the remote error
@ -740,6 +743,8 @@ class Context:
# cancelled, NOT their reported canceller. IOW in the # cancelled, NOT their reported canceller. IOW in the
# latter case we're cancelled by someone else getting # latter case we're cancelled by someone else getting
# cancelled. # cancelled.
#
# !TODO, switching to `Actor.aid` here!
if (canc := error.canceller) == self._actor.uid: if (canc := error.canceller) == self._actor.uid:
whom: str = 'us' whom: str = 'us'
self._canceller = canc self._canceller = canc
@ -881,6 +886,11 @@ class Context:
@property @property
def repr_caller(self) -> str: def repr_caller(self) -> str:
'''
Render a "namespace-path" style representation of the calling
task-fn.
'''
ci: CallerInfo|None = self._caller_info ci: CallerInfo|None = self._caller_info
if ci: if ci:
return ( return (
@ -894,7 +904,7 @@ class Context:
def repr_api(self) -> str: def repr_api(self) -> str:
return 'Portal.open_context()' return 'Portal.open_context()'
# TODO: use `.dev._frame_stack` scanning to find caller! # TODO: use `.dev._frame_stack` scanning to find caller fn!
# ci: CallerInfo|None = self._caller_info # ci: CallerInfo|None = self._caller_info
# if ci: # if ci:
# return ( # return (
@ -929,7 +939,7 @@ class Context:
=> That is, an IPC `Context` (this) **does not** => That is, an IPC `Context` (this) **does not**
have the same semantics as a `trio.CancelScope`. have the same semantics as a `trio.CancelScope`.
If the caller (who entered the `Portal.open_context()`) If the parent (who entered the `Portal.open_context()`)
desires that the internal block's cancel-scope be desires that the internal block's cancel-scope be
cancelled it should open its own `trio.CancelScope` and cancelled it should open its own `trio.CancelScope` and
manage it as needed. manage it as needed.
@ -940,7 +950,7 @@ class Context:
self.cancel_called = True self.cancel_called = True
header: str = ( header: str = (
f'Cancelling ctx from {side.upper()}-side\n' f'Cancelling ctx from {side!r}-side\n'
) )
reminfo: str = ( reminfo: str = (
# ' =>\n' # ' =>\n'
@ -948,7 +958,7 @@ class Context:
f'\n' f'\n'
f'c)=> {self.chan.uid}\n' f'c)=> {self.chan.uid}\n'
f' |_[{self.dst_maddr}\n' f' |_[{self.dst_maddr}\n'
f' >>{self.repr_rpc}\n' f' >> {self.repr_rpc}\n'
# f' >> {self._nsf}() -> {codec}[dict]:\n\n' # f' >> {self._nsf}() -> {codec}[dict]:\n\n'
# TODO: pull msg-type from spec re #320 # TODO: pull msg-type from spec re #320
) )
@ -1001,7 +1011,6 @@ class Context:
else: else:
log.cancel( log.cancel(
f'Timed out on cancel request of remote task?\n' f'Timed out on cancel request of remote task?\n'
f'\n'
f'{reminfo}' f'{reminfo}'
) )
@ -1012,7 +1021,7 @@ class Context:
# `_invoke()` RPC task. # `_invoke()` RPC task.
# #
# NOTE: on this side we ALWAYS cancel the local scope # NOTE: on this side we ALWAYS cancel the local scope
# since the caller expects a `ContextCancelled` to be sent # since the parent expects a `ContextCancelled` to be sent
# from `._runtime._invoke()` back to the other side. The # from `._runtime._invoke()` back to the other side. The
# logic for catching the result of the below # logic for catching the result of the below
# `._scope.cancel()` is inside the `._runtime._invoke()` # `._scope.cancel()` is inside the `._runtime._invoke()`
@ -1185,8 +1194,8 @@ class Context:
) -> Any|Exception: ) -> Any|Exception:
''' '''
From some (caller) side task, wait for and return the final From some (parent) side task, wait for and return the final
result from the remote (callee) side's task. result from the remote (child) side's task.
This provides a mechanism for one task running in some actor to wait This provides a mechanism for one task running in some actor to wait
on another task at the other side, in some other actor, to terminate. on another task at the other side, in some other actor, to terminate.
@ -1482,6 +1491,12 @@ class Context:
): ):
status = 'peer-cancelled' status = 'peer-cancelled'
case (
Unresolved,
trio.Cancelled(), # any error-type
) if self.canceller:
status = 'actor-cancelled'
# (remote) error condition # (remote) error condition
case ( case (
Unresolved, Unresolved,
@ -1595,7 +1610,7 @@ class Context:
raise err raise err
# TODO: maybe a flag to by-pass encode op if already done # TODO: maybe a flag to by-pass encode op if already done
# here in caller? # here in parent?
await self.chan.send(started_msg) await self.chan.send(started_msg)
# set msg-related internal runtime-state # set msg-related internal runtime-state
@ -1671,7 +1686,7 @@ class Context:
XXX RULES XXX XXX RULES XXX
------ - ------ ------ - ------
- NEVER raise remote errors from this method; a runtime task caller. - NEVER raise remote errors from this method, which is called by a runtime task.
An error "delivered" to a ctx should always be raised by An error "delivered" to a ctx should always be raised by
the corresponding local task operating on the the corresponding local task operating on the
`Portal`/`Context` APIs. `Portal`/`Context` APIs.
@ -1747,7 +1762,7 @@ class Context:
else: else:
report = ( report = (
'Queueing OVERRUN msg on caller task:\n\n' 'Queueing OVERRUN msg on parent task:\n\n'
+ report + report
) )
log.debug(report) log.debug(report)
@ -1943,12 +1958,12 @@ async def open_context_from_portal(
IPC protocol. IPC protocol.
The yielded `tuple` is a pair delivering a `tractor.Context` The yielded `tuple` is a pair delivering a `tractor.Context`
and any first value "sent" by the "callee" task via a call and any first value "sent" by the "child" task via a call
to `Context.started(<value: Any>)`; this side of the to `Context.started(<value: Any>)`; this side of the
context does not unblock until the "callee" task calls context does not unblock until the "child" task calls
`.started()` in similar style to `trio.Nursery.start()`. `.started()` in similar style to `trio.Nursery.start()`.
When the "callee" (side that is "called"/started by a call When the "child" (side that is "called"/started by a call
to *this* method) returns, the caller side (this) unblocks to *this* method) returns, the parent side (this) unblocks
and any final value delivered from the other end can be and any final value delivered from the other end can be
retrieved using the `Context.wait_for_result()` api. retrieved using the `Context.wait_for_result()` api.
@ -1961,7 +1976,7 @@ async def open_context_from_portal(
__tracebackhide__: bool = hide_tb __tracebackhide__: bool = hide_tb
# denote this frame as a "runtime frame" for stack # denote this frame as a "runtime frame" for stack
# introspection where we report the caller code in logging # introspection where we report the parent code in logging
# and error message content. # and error message content.
# NOTE: 2 bc of the wrapping `@acm` # NOTE: 2 bc of the wrapping `@acm`
__runtimeframe__: int = 2 # noqa __runtimeframe__: int = 2 # noqa
@ -2020,13 +2035,11 @@ async def open_context_from_portal(
# placeholder for any exception raised in the runtime # placeholder for any exception raised in the runtime
# or by user tasks which cause this context's closure. # or by user tasks which cause this context's closure.
scope_err: BaseException|None = None scope_err: BaseException|None = None
ctxc_from_callee: ContextCancelled|None = None ctxc_from_child: ContextCancelled|None = None
try: try:
async with ( async with (
trio.open_nursery( collapse_eg(),
strict_exception_groups=False, trio.open_nursery() as tn,
) as tn,
msgops.maybe_limit_plds( msgops.maybe_limit_plds(
ctx=ctx, ctx=ctx,
spec=ctx_meta.get('pld_spec'), spec=ctx_meta.get('pld_spec'),
@ -2101,7 +2114,7 @@ async def open_context_from_portal(
# that we can re-use it around the `yield` ^ here # that we can re-use it around the `yield` ^ here
# or vice versa? # or vice versa?
# #
# maybe TODO NOTE: between the caller exiting and # maybe TODO NOTE: between the parent exiting and
# arriving here the far end may have sent a ctxc-msg or # arriving here the far end may have sent a ctxc-msg or
# other error, so the question is whether we should check # other error, so the question is whether we should check
# for it here immediately and maybe raise so as to engage # for it here immediately and maybe raise so as to engage
@ -2167,16 +2180,16 @@ async def open_context_from_portal(
# request in which case we DO let the error bubble to the # request in which case we DO let the error bubble to the
# opener. # opener.
# #
# 2-THIS "caller" task somewhere invoked `Context.cancel()` # 2-THIS "parent" task somewhere invoked `Context.cancel()`
# and received a `ContextCancelled` from the "callee" # and received a `ContextCancelled` from the "child"
# task, in which case we mask the `ContextCancelled` from # task, in which case we mask the `ContextCancelled` from
# bubbling to this "caller" (much like how `trio.Nursery` # bubbling to this "parent" (much like how `trio.Nursery`
# swallows any `trio.Cancelled` bubbled by a call to # swallows any `trio.Cancelled` bubbled by a call to
# `Nursery.cancel_scope.cancel()`) # `Nursery.cancel_scope.cancel()`)
except ContextCancelled as ctxc: except ContextCancelled as ctxc:
scope_err = ctxc scope_err = ctxc
ctx._local_error: BaseException = scope_err ctx._local_error: BaseException = scope_err
ctxc_from_callee = ctxc ctxc_from_child = ctxc
# XXX TODO XXX: FIX THIS debug_mode BUGGGG!!! # XXX TODO XXX: FIX THIS debug_mode BUGGGG!!!
# using this code and then resuming the REPL will # using this code and then resuming the REPL will
@ -2213,11 +2226,11 @@ async def open_context_from_portal(
# the above `._scope` can be cancelled due to: # the above `._scope` can be cancelled due to:
# 1. an explicit self cancel via `Context.cancel()` or # 1. an explicit self cancel via `Context.cancel()` or
# `Actor.cancel()`, # `Actor.cancel()`,
# 2. any "callee"-side remote error, possibly also a cancellation # 2. any "child"-side remote error, possibly also a cancellation
# request by some peer, # request by some peer,
# 3. any "caller" (aka THIS scope's) local error raised in the above `yield` # 3. any "parent" (aka THIS scope's) local error raised in the above `yield`
except ( except (
# CASE 3: standard local error in this caller/yieldee # CASE 3: standard local error in this parent/yieldee
Exception, Exception,
# CASES 1 & 2: can manifest as a `ctx._scope_nursery` # CASES 1 & 2: can manifest as a `ctx._scope_nursery`
@ -2231,9 +2244,9 @@ async def open_context_from_portal(
# any `Context._maybe_raise_remote_err()` call. # any `Context._maybe_raise_remote_err()` call.
# #
# 2.-`BaseExceptionGroup[ContextCancelled | RemoteActorError]` # 2.-`BaseExceptionGroup[ContextCancelled | RemoteActorError]`
# from any error delivered from the "callee" side # from any error delivered from the "child" side
# AND a group-exc is only raised if there was > 1 # AND a group-exc is only raised if there was > 1
# tasks started *here* in the "caller" / opener # tasks started *here* in the "parent" / opener
# block. If any one of those tasks calls # block. If any one of those tasks calls
# `.wait_for_result()` or `MsgStream.receive()` # `.wait_for_result()` or `MsgStream.receive()`
# `._maybe_raise_remote_err()` will be transitively # `._maybe_raise_remote_err()` will be transitively
@ -2246,8 +2259,8 @@ async def open_context_from_portal(
trio.Cancelled, # NOTE: NOT from inside the ctx._scope trio.Cancelled, # NOTE: NOT from inside the ctx._scope
KeyboardInterrupt, KeyboardInterrupt,
) as caller_err: ) as rent_err:
scope_err = caller_err scope_err = rent_err
ctx._local_error: BaseException = scope_err ctx._local_error: BaseException = scope_err
# XXX: ALWAYS request the context to CANCEL ON any ERROR. # XXX: ALWAYS request the context to CANCEL ON any ERROR.
@ -2257,7 +2270,7 @@ async def open_context_from_portal(
# await debug.pause() # await debug.pause()
# log.cancel( # log.cancel(
match scope_err: match scope_err:
case trio.Cancelled: case trio.Cancelled():
logmeth = log.cancel logmeth = log.cancel
# XXX explicitly report on any non-graceful-taskc cases # XXX explicitly report on any non-graceful-taskc cases
@ -2265,7 +2278,7 @@ async def open_context_from_portal(
logmeth = log.exception logmeth = log.exception
logmeth( logmeth(
f'ctx {ctx.side!r}-side exited with {ctx.repr_outcome()}\n' f'ctx {ctx.side!r}-side exited with {ctx.repr_outcome()!r}\n'
) )
if debug_mode(): if debug_mode():
@ -2286,9 +2299,9 @@ async def open_context_from_portal(
'Calling `ctx.cancel()`!\n' 'Calling `ctx.cancel()`!\n'
) )
# we don't need to cancel the callee if it already # we don't need to cancel the child if it already
# told us it's cancelled ;p # told us it's cancelled ;p
if ctxc_from_callee is None: if ctxc_from_child is None:
try: try:
await ctx.cancel() await ctx.cancel()
except ( except (
@ -2319,8 +2332,8 @@ async def open_context_from_portal(
# via a call to # via a call to
# `Context._maybe_cancel_and_set_remote_error()`. # `Context._maybe_cancel_and_set_remote_error()`.
# As per `Context._deliver_msg()`, that error IS # As per `Context._deliver_msg()`, that error IS
# ALWAYS SET any time "callee" side fails and causes "caller # ALWAYS SET any time "child" side fails and causes
# side" cancellation via a `ContextCancelled` here. # "parent side" cancellation via a `ContextCancelled` here.
try: try:
result_or_err: Exception|Any = await ctx.wait_for_result() result_or_err: Exception|Any = await ctx.wait_for_result()
except BaseException as berr: except BaseException as berr:
@ -2356,7 +2369,7 @@ async def open_context_from_portal(
) )
case (None, _): case (None, _):
log.runtime( log.runtime(
'Context returned final result from callee task:\n' 'Context returned final result from child task:\n'
f'<= peer: {uid}\n' f'<= peer: {uid}\n'
f' |_ {nsf}()\n\n' f' |_ {nsf}()\n\n'
@ -2451,7 +2464,7 @@ async def open_context_from_portal(
) )
# TODO: should we add a `._cancel_req_received` # TODO: should we add a `._cancel_req_received`
# flag to determine if the callee manually called # flag to determine if the child manually called
# `ctx.cancel()`? # `ctx.cancel()`?
# -[ ] going to need a cid check no? # -[ ] going to need a cid check no?
@ -2507,7 +2520,7 @@ def mk_context(
recv_chan: trio.MemoryReceiveChannel recv_chan: trio.MemoryReceiveChannel
send_chan, recv_chan = trio.open_memory_channel(msg_buffer_size) send_chan, recv_chan = trio.open_memory_channel(msg_buffer_size)
# TODO: only scan caller-info if log level so high! # TODO: only scan parent-info if log level so high!
from .devx._frame_stack import find_caller_info from .devx._frame_stack import find_caller_info
caller_info: CallerInfo|None = find_caller_info() caller_info: CallerInfo|None = find_caller_info()

@ -27,8 +27,11 @@ from typing import (
) )
from contextlib import asynccontextmanager as acm from contextlib import asynccontextmanager as acm
from tractor.log import get_logger from .log import get_logger
from .trionics import gather_contexts from .trionics import (
gather_contexts,
collapse_eg,
)
from .ipc import _connect_chan, Channel from .ipc import _connect_chan, Channel
from ._addr import ( from ._addr import (
UnwrappedAddress, UnwrappedAddress,
@ -87,7 +90,6 @@ async def get_registry(
yield regstr_ptl yield regstr_ptl
@acm @acm
async def get_root( async def get_root(
**kwargs, **kwargs,
@ -215,7 +217,7 @@ async def find_actor(
raise_on_none: bool = False, raise_on_none: bool = False,
) -> AsyncGenerator[ ) -> AsyncGenerator[
Portal | list[Portal] | None, Portal|list[Portal]|None,
None, None,
]: ]:
''' '''
@ -253,9 +255,13 @@ async def find_actor(
for addr in registry_addrs for addr in registry_addrs
) )
portals: list[Portal] portals: list[Portal]
async with gather_contexts( async with (
collapse_eg(),
gather_contexts(
mngrs=maybe_portals, mngrs=maybe_portals,
) as portals: # tn=tn, # ?TODO, helps to pass rent tn here?
) as portals,
):
# log.runtime( # log.runtime(
# 'Gathered portals:\n' # 'Gathered portals:\n'
# f'{portals}' # f'{portals}'

@ -21,7 +21,7 @@ Sub-process entry points.
from __future__ import annotations from __future__ import annotations
from functools import partial from functools import partial
import multiprocessing as mp import multiprocessing as mp
import os # import os
from typing import ( from typing import (
Any, Any,
TYPE_CHECKING, TYPE_CHECKING,
@ -38,6 +38,7 @@ from .devx import (
_frame_stack, _frame_stack,
pformat, pformat,
) )
# from .msg import pretty_struct
from .to_asyncio import run_as_asyncio_guest from .to_asyncio import run_as_asyncio_guest
from ._addr import UnwrappedAddress from ._addr import UnwrappedAddress
from ._runtime import ( from ._runtime import (
@ -127,20 +128,13 @@ def _trio_main(
if actor.loglevel is not None: if actor.loglevel is not None:
get_console_log(actor.loglevel) get_console_log(actor.loglevel)
actor_info: str = (
f'|_{actor}\n'
f' uid: {actor.uid}\n'
f' pid: {os.getpid()}\n'
f' parent_addr: {parent_addr}\n'
f' loglevel: {actor.loglevel}\n'
)
log.info( log.info(
'Starting new `trio` subactor\n' f'Starting `trio` subactor from parent @ '
f'{parent_addr}\n'
+ +
pformat.nest_from_op( pformat.nest_from_op(
input_op='>(', # see syntax ideas above input_op='>(', # see syntax ideas above
text=actor_info, text=f'{actor}',
nest_indent=2, # since "complete"
) )
) )
logmeth = log.info logmeth = log.info
@ -149,7 +143,7 @@ def _trio_main(
+ +
pformat.nest_from_op( pformat.nest_from_op(
input_op=')>', # like a "closed-to-play"-icon from super perspective input_op=')>', # like a "closed-to-play"-icon from super perspective
text=actor_info, text=f'{actor}',
nest_indent=1, nest_indent=1,
) )
) )
@ -167,7 +161,7 @@ def _trio_main(
+ +
pformat.nest_from_op( pformat.nest_from_op(
input_op='c)>', # closed due to cancel (see above) input_op='c)>', # closed due to cancel (see above)
text=actor_info, text=f'{actor}',
) )
) )
except BaseException as err: except BaseException as err:
@ -177,7 +171,7 @@ def _trio_main(
+ +
pformat.nest_from_op( pformat.nest_from_op(
input_op='x)>', # closed by error input_op='x)>', # closed by error
text=actor_info, text=f'{actor}',
) )
) )
# NOTE since we raise a tb will already be shown on the # NOTE since we raise a tb will already be shown on the

@ -46,6 +46,7 @@ from msgspec import (
from tractor._state import current_actor from tractor._state import current_actor
from tractor.log import get_logger from tractor.log import get_logger
from tractor.msg import ( from tractor.msg import (
Aid,
Error, Error,
PayloadMsg, PayloadMsg,
MsgType, MsgType,
@ -479,8 +480,9 @@ class RemoteActorError(Exception):
@property @property
def relay_uid(self) -> tuple[str, str]|None: def relay_uid(self) -> tuple[str, str]|None:
if msg := self._ipc_msg:
return tuple( return tuple(
self._ipc_msg.relay_path[-1] msg.relay_path[-1]
) )
@property @property
@ -521,7 +523,8 @@ class RemoteActorError(Exception):
for key in fields: for key in fields:
if ( if (
key == 'relay_uid' key == 'relay_uid'
and not self.is_inception() and
not self.is_inception()
): ):
continue continue
@ -534,6 +537,13 @@ class RemoteActorError(Exception):
None, None,
) )
) )
if (
key == 'canceller'
and
isinstance(val, Aid)
):
val: str = val.reprol(sin_uuid=False)
# TODO: for `.relay_path` on multiline? # TODO: for `.relay_path` on multiline?
# if not isinstance(val, str): # if not isinstance(val, str):
# val_str = pformat(val) # val_str = pformat(val)
@ -623,12 +633,22 @@ class RemoteActorError(Exception):
# IFF there is an embedded traceback-str we always # IFF there is an embedded traceback-str we always
# draw the ascii-box around it. # draw the ascii-box around it.
body: str = '' body: str = ''
if tb_str := self.tb_str:
fields: str = self._mk_fields_str( fields: str = self._mk_fields_str(
_body_fields _body_fields
+ +
self.extra_body_fields, self.extra_body_fields,
) )
tb_str: str = (
self.tb_str
#
# ^TODO? what to use instead? if anything?
# -[ ] ensure the `.message` doesn't show up 2x in output ya?
# -[ ] ._message isn't really right?
# or
# self._message
)
if tb_str:
from tractor.devx import ( from tractor.devx import (
pformat_boxed_tb, pformat_boxed_tb,
) )
@ -640,7 +660,7 @@ class RemoteActorError(Exception):
# just after <Type( # just after <Type(
# |___ .. # |___ ..
tb_body_indent=1, tb_body_indent=1,
boxer_header=self.relay_uid, boxer_header=self.relay_uid or '-',
) )
# !TODO, it'd be nice to import these top level without # !TODO, it'd be nice to import these top level without
@ -713,6 +733,10 @@ class RemoteActorError(Exception):
class ContextCancelled(RemoteActorError): class ContextCancelled(RemoteActorError):
''' '''
IPC context cancellation signal/msg.
Often reffed with the short-hand: "ctxc".
Inter-actor task context was cancelled by either a call to Inter-actor task context was cancelled by either a call to
``Portal.cancel_actor()`` or ``Context.cancel()``. ``Portal.cancel_actor()`` or ``Context.cancel()``.
@ -737,8 +761,8 @@ class ContextCancelled(RemoteActorError):
- (simulating) an IPC transport network outage - (simulating) an IPC transport network outage
- a (malicious) pkt sent specifically to cancel an actor's - a (malicious) pkt sent specifically to cancel an actor's
runtime non-gracefully without ensuring ongoing RPC tasks are runtime non-gracefully without ensuring ongoing RPC tasks
incrementally cancelled as is done with: are incrementally cancelled as is done with:
`Actor` `Actor`
|_`.cancel()` |_`.cancel()`
|_`.cancel_soon()` |_`.cancel_soon()`
@ -759,6 +783,59 @@ class ContextCancelled(RemoteActorError):
# src_actor_uid = canceller # src_actor_uid = canceller
class ActorCancelled(ContextCancelled):
'''
Runtime-layer cancellation signal/msg.
Indicates a "graceful interrupt" of the machinery scheduled by
the py-proc's `trio.run()`.
Often reffed with the short-hand: "actorc".
Raised from within `an: ActorNursery` (via an `ExceptionGroup`)
when an actor has been "process wide" cancel-called using any of,
- `ActorNursery.cancel()`
- `Portal.cancel_actor()`
**and** that cancel request was part of a "non graceful" cancel
condition.
That is, whenever an exception is to be raised outside an `an`
scope-block due to some error raised-in/relayed-to that scope. In
such cases, this is normally raised once per subactor portal for
every subactor which was cancelled (according to the `an`'s
supervision strategy).
'''
@property
def canceller(self) -> Aid:
'''
Return the (maybe) `Actor.aid: Aid` for the requesting-author
of this actorc.
Emit a warning msg when `.canceller` has not been set.
See additional relevant notes in
`ContextCancelled.canceller`.
'''
value: tuple[str, str]|None
if msg := self._ipc_msg:
value = msg.canceller
else:
value = self._extra_msgdata['canceller']
if value:
return value
log.warning(
'IPC Context cancelled without a requesting actor?\n'
'Maybe the IPC transport ended abruptly?\n\n'
f'{self}'
)
class MsgTypeError( class MsgTypeError(
RemoteActorError, RemoteActorError,
): ):

@ -39,7 +39,10 @@ import warnings
import trio import trio
from .trionics import maybe_open_nursery from .trionics import (
maybe_open_nursery,
collapse_eg,
)
from ._state import ( from ._state import (
current_actor, current_actor,
) )
@ -115,6 +118,10 @@ class Portal:
@property @property
def chan(self) -> Channel: def chan(self) -> Channel:
'''
Ref to this ctx's underlying `tractor.ipc.Channel`.
'''
return self._chan return self._chan
@property @property
@ -174,10 +181,17 @@ class Portal:
# not expecting a "main" result # not expecting a "main" result
if self._expect_result_ctx is None: if self._expect_result_ctx is None:
peer_id: str = f'{self.channel.aid.reprol()!r}'
log.warning( log.warning(
f"Portal for {self.channel.aid} not expecting a final" f'Portal to peer {peer_id} will not deliver a final result?\n'
" result?\nresult() should only be called if subactor" f'\n'
" was spawned with `ActorNursery.run_in_actor()`") f'Context.result() can only be called by the parent of '
f'a sub-actor when it was spawned with '
f'`ActorNursery.run_in_actor()`'
f'\n'
f'Further, this `ActorNursery`-method-API will be deprecated in the '
f'near future!\n'
)
return NoResult return NoResult
# expecting a "main" result # expecting a "main" result
@ -210,6 +224,7 @@ class Portal:
typname: str = type(self).__name__ typname: str = type(self).__name__
log.warning( log.warning(
f'`{typname}.result()` is DEPRECATED!\n' f'`{typname}.result()` is DEPRECATED!\n'
f'\n'
f'Use `{typname}.wait_for_result()` instead!\n' f'Use `{typname}.wait_for_result()` instead!\n'
) )
return await self.wait_for_result( return await self.wait_for_result(
@ -221,8 +236,10 @@ class Portal:
# terminate all locally running async generator # terminate all locally running async generator
# IPC calls # IPC calls
if self._streams: if self._streams:
log.cancel( peer_id: str = f'{self.channel.aid.reprol()!r}'
f"Cancelling all streams with {self.channel.aid}") report: str = (
f'Cancelling all msg-streams with {peer_id}\n'
)
for stream in self._streams.copy(): for stream in self._streams.copy():
try: try:
await stream.aclose() await stream.aclose()
@ -231,10 +248,18 @@ class Portal:
# (unless of course at some point down the road we # (unless of course at some point down the road we
# won't expect this to always be the case or need to # won't expect this to always be the case or need to
# detect it for respawning purposes?) # detect it for respawning purposes?)
log.debug(f"{stream} was already closed.") report += (
f'->) {stream!r} already closed\n'
)
log.cancel(report)
async def aclose(self): async def aclose(self):
log.debug(f"Closing {self}") log.debug(
f'Closing portal\n'
f'>}}\n'
f'|_{self}\n'
)
# TODO: once we move to implementing our own `ReceiveChannel` # TODO: once we move to implementing our own `ReceiveChannel`
# (including remote task cancellation inside its `.aclose()`) # (including remote task cancellation inside its `.aclose()`)
# we'll need to .aclose all those channels here # we'll need to .aclose all those channels here
@ -260,23 +285,22 @@ class Portal:
__runtimeframe__: int = 1 # noqa __runtimeframe__: int = 1 # noqa
chan: Channel = self.channel chan: Channel = self.channel
peer_id: str = f'{self.channel.aid.reprol()!r}'
if not chan.connected(): if not chan.connected():
log.runtime( log.runtime(
'This channel is already closed, skipping cancel request..' f'Peer {peer_id} is already disconnected\n'
'-> skipping cancel request..\n'
) )
return False return False
reminfo: str = (
f'c)=> {self.channel.aid}\n'
f' |_{chan}\n'
)
log.cancel( log.cancel(
f'Requesting actor-runtime cancel for peer\n\n' f'Sending actor-runtime-cancel-req to peer\n'
f'{reminfo}' f'\n'
f'c)=> {peer_id}\n'
) )
# XXX the one spot we set it? # XXX the one spot we set it?
self.channel._cancel_called: bool = True chan._cancel_called: bool = True
try: try:
# send cancel cmd - might not get response # send cancel cmd - might not get response
# XXX: sure would be nice to make this work with # XXX: sure would be nice to make this work with
@ -297,8 +321,9 @@ class Portal:
# may timeout and we never get an ack (obvi racy) # may timeout and we never get an ack (obvi racy)
# but that doesn't mean it wasn't cancelled. # but that doesn't mean it wasn't cancelled.
log.debug( log.debug(
'May have failed to cancel peer?\n' f'May have failed to cancel peer?\n'
f'{reminfo}' f'\n'
f'c)=?> {peer_id}\n'
) )
# if we get here some weird cancellation case happened # if we get here some weird cancellation case happened
@ -316,22 +341,22 @@ class Portal:
TransportClosed, TransportClosed,
) as tpt_err: ) as tpt_err:
report: str = ( ipc_borked_report: str = (
f'IPC chan for actor already closed or broken?\n\n' f'IPC for actor already closed/broken?\n'
f'{self.channel.aid}\n' f'\n'
f' |_{self.channel}\n' f'c)=x> {peer_id}\n'
) )
match tpt_err: match tpt_err:
case TransportClosed(): case TransportClosed():
log.debug(report) log.debug(ipc_borked_report)
case _: case _:
report += ( ipc_borked_report += (
f'\n' f'\n'
f'Unhandled low-level transport-closed/error during\n' f'Unhandled low-level transport-closed/error during\n'
f'`Portal.cancel_actor()` request?\n' f'`Portal.cancel_actor()` request?\n'
f'<{type(tpt_err).__name__}( {tpt_err} )>\n' f'<{type(tpt_err).__name__}( {tpt_err} )>\n'
) )
log.warning(report) log.warning(ipc_borked_report)
return False return False
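
The `Portal.cancel_actor()` hunks above distinguish three outcomes of a cancel request: an ack arrives, no ack arrives before the timeout (the comment notes this "doesn't mean it wasn't cancelled"), or the IPC transport is already closed/broken. A hypothetical asyncio-only sketch of that bounded-wait pattern; `cancel_peer()` and its `send_and_wait_ack` callable are illustrative names, not tractor API, and the string results are chosen only to make the three cases explicit:

```python
import asyncio
from collections.abc import Awaitable, Callable


async def cancel_peer(
    send_and_wait_ack: Callable[[], Awaitable[None]],
    timeout: float = 0.5,
) -> str:
    '''
    Hypothetical sketch: request a peer's cancellation and wait a
    bounded time for its ack.
    '''
    try:
        await asyncio.wait_for(send_and_wait_ack(), timeout)
    except asyncio.TimeoutError:
        # no ack in time: the peer MAY still have been cancelled
        return 'maybe-cancelled'
    except (ConnectionError, EOFError):
        # IPC transport already closed/broken
        return 'disconnected'
    return 'acked'


async def fast_ack() -> None:
    await asyncio.sleep(0)


async def never_acks() -> None:
    await asyncio.sleep(10)


async def broken_tpt() -> None:
    raise ConnectionResetError('peer went away')


async def main() -> None:
    for fn in (fast_ack, never_acks, broken_tpt):
        print(fn.__name__, await cancel_peer(fn, timeout=0.05))


asyncio.run(main())
```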
@ -488,10 +513,13 @@ class Portal:
with trio.CancelScope(shield=True): with trio.CancelScope(shield=True):
await ctx.cancel() await ctx.cancel()
except trio.ClosedResourceError: except trio.ClosedResourceError as cre:
# if the far end terminates before we send a cancel the # if the far end terminates before we send a cancel the
# underlying transport-channel may already be closed. # underlying transport-channel may already be closed.
log.cancel(f'Context {ctx} was already closed?') log.cancel(
f'Context.cancel() -> {cre!r}\n'
f'cid: {ctx.cid!r} already closed?\n'
)
# XXX: should this always be done? # XXX: should this always be done?
# await recv_chan.aclose() # await recv_chan.aclose()
@ -558,14 +586,13 @@ async def open_portal(
assert actor assert actor
was_connected: bool = False was_connected: bool = False
async with maybe_open_nursery( async with (
collapse_eg(),
maybe_open_nursery(
tn, tn,
shield=shield, shield=shield,
strict_exception_groups=False, ) as tn,
# ^XXX^ TODO? soo roll our own then ?? ):
# -> since we kinda want the "if only one `.exception` then
# just raise that" interface?
) as tn:
if not channel.connected(): if not channel.connected():
await channel.connect() await channel.connect()

@ -37,13 +37,7 @@ import warnings
import trio import trio
from ._runtime import ( from . import _runtime
Actor,
Arbiter,
# TODO: rename and make a non-actor subtype?
# Arbiter as Registry,
async_main,
)
from .devx import ( from .devx import (
debug, debug,
_frame_stack, _frame_stack,
@ -64,6 +58,7 @@ from ._addr import (
) )
from .trionics import ( from .trionics import (
is_multi_cancelled, is_multi_cancelled,
collapse_eg,
) )
from ._exceptions import ( from ._exceptions import (
RuntimeFailure, RuntimeFailure,
@ -102,7 +97,7 @@ async def maybe_block_bp(
): ):
logger.info( logger.info(
f'Found `greenback` installed @ {maybe_mod}\n' f'Found `greenback` installed @ {maybe_mod}\n'
'Enabling `tractor.pause_from_sync()` support!\n' f'Enabling `tractor.pause_from_sync()` support!\n'
) )
os.environ['PYTHONBREAKPOINT'] = ( os.environ['PYTHONBREAKPOINT'] = (
'tractor.devx.debug._sync_pause_from_builtin' 'tractor.devx.debug._sync_pause_from_builtin'
@ -197,13 +192,19 @@ async def open_root_actor(
# read-only state to sublayers? # read-only state to sublayers?
# extra_rt_vars: dict|None = None, # extra_rt_vars: dict|None = None,
) -> Actor: ) -> _runtime.Actor:
''' '''
Runtime init entry point for ``tractor``. Initialize the `tractor` runtime by starting a "root actor" in
a parent-most Python process.
All (disjoint) actor-process-trees-as-programs are created via
this entrypoint.
''' '''
# XXX NEVER allow nested actor-trees! # XXX NEVER allow nested actor-trees!
if already_actor := _state.current_actor(err_on_no_runtime=False): if already_actor := _state.current_actor(
err_on_no_runtime=False,
):
rtvs: dict[str, Any] = _state._runtime_vars rtvs: dict[str, Any] = _state._runtime_vars
root_mailbox: list[str, int] = rtvs['_root_mailbox'] root_mailbox: list[str, int] = rtvs['_root_mailbox']
registry_addrs: list[list[str, int]] = rtvs['_registry_addrs'] registry_addrs: list[list[str, int]] = rtvs['_registry_addrs']
@ -273,14 +274,20 @@ async def open_root_actor(
DeprecationWarning, DeprecationWarning,
stacklevel=2, stacklevel=2,
) )
registry_addrs = [arbiter_addr] uw_reg_addrs = [arbiter_addr]
if not registry_addrs: uw_reg_addrs = registry_addrs
registry_addrs: list[UnwrappedAddress] = default_lo_addrs( if not uw_reg_addrs:
uw_reg_addrs: list[UnwrappedAddress] = default_lo_addrs(
enable_transports enable_transports
) )
assert registry_addrs # must exist by now since all below code is dependent
assert uw_reg_addrs
registry_addrs: list[Address] = [
wrap_address(uw_addr)
for uw_addr in uw_reg_addrs
]
loglevel = ( loglevel = (
loglevel loglevel
@ -329,10 +336,10 @@ async def open_root_actor(
enable_stack_on_sig() enable_stack_on_sig()
# closed into below ping task-func # closed into below ping task-func
ponged_addrs: list[UnwrappedAddress] = [] ponged_addrs: list[Address] = []
async def ping_tpt_socket( async def ping_tpt_socket(
addr: UnwrappedAddress, addr: Address,
timeout: float = 1, timeout: float = 1,
) -> None: ) -> None:
''' '''
@ -352,17 +359,22 @@ async def open_root_actor(
# be better to eventually have a "discovery" protocol # be better to eventually have a "discovery" protocol
# with basic handshake instead? # with basic handshake instead?
with trio.move_on_after(timeout): with trio.move_on_after(timeout):
async with _connect_chan(addr): async with _connect_chan(addr.unwrap()):
ponged_addrs.append(addr) ponged_addrs.append(addr)
except OSError: except OSError:
# TODO: make this a "discovery" log level? # ?TODO, make this a "discovery" log level?
logger.info( logger.info(
f'No actor registry found @ {addr}\n' f'No root-actor registry found @ {addr!r}\n'
) )
# !TODO, this is basically just another (abstract)
# happy-eyeballs, so we should try to formalize it somewhere
# in a `.[_]discovery` ya?
#
async with trio.open_nursery() as tn: async with trio.open_nursery() as tn:
for addr in registry_addrs: for uw_addr in uw_reg_addrs:
addr: Address = wrap_address(uw_addr)
tn.start_soon( tn.start_soon(
ping_tpt_socket, ping_tpt_socket,
addr, addr,
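
The registry-discovery hunk pings every candidate registry addr concurrently and keeps only those that answer within a timeout (the new TODO calls this "basically just another (abstract) happy-eyeballs"). A hypothetical asyncio-only sketch of the same probe-and-collect idea over plain TCP `(host, port)` pairs; `probe_registries()` is an illustrative name and, unlike tractor's `_connect_chan()`, it performs no IPC handshake at all:

```python
import asyncio
import contextlib


async def probe_registries(
    addrs: list[tuple[str, int]],
    timeout: float = 1.0,
) -> list[tuple[str, int]]:
    '''
    Concurrently try a TCP connect to each candidate registry addr
    and return those that accepted within `timeout`
    (in completion order).
    '''
    ponged: list[tuple[str, int]] = []

    async def ping(addr: tuple[str, int]) -> None:
        host, port = addr
        try:
            _, writer = await asyncio.wait_for(
                asyncio.open_connection(host, port),
                timeout,
            )
        except (OSError, asyncio.TimeoutError):
            return  # no registry listening there (or too slow)
        ponged.append(addr)
        writer.close()
        with contextlib.suppress(OSError):
            await writer.wait_closed()

    await asyncio.gather(*(ping(addr) for addr in addrs))
    return ponged


async def main() -> None:
    # a throwaway local "registry" listening on an OS-assigned port
    server = await asyncio.start_server(
        lambda reader, writer: writer.close(), '127.0.0.1', 0,
    )
    live = server.sockets[0].getsockname()[:2]
    async with server:
        found = await probe_registries([live, ('127.0.0.1', 1)], timeout=0.5)
    print(found)


asyncio.run(main())
```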
@ -384,31 +396,35 @@ async def open_root_actor(
f'Registry(s) seem(s) to exist @ {ponged_addrs}' f'Registry(s) seem(s) to exist @ {ponged_addrs}'
) )
actor = Actor( actor = _runtime.Actor(
name=name or 'anonymous', name=name or 'anonymous',
uuid=mk_uuid(), uuid=mk_uuid(),
registry_addrs=ponged_addrs, registry_addrs=ponged_addrs,
loglevel=loglevel, loglevel=loglevel,
enable_modules=enable_modules, enable_modules=enable_modules,
) )
# DO NOT use the registry_addrs as the transport server # **DO NOT** use the registry_addrs as the
# addrs for this new non-registar, root-actor. # ipc-transport-server's bind-addrs as this is
# a new NON-registrar, ROOT-actor.
#
# XXX INSTEAD, bind random addrs using the same tpt
# proto.
for addr in ponged_addrs: for addr in ponged_addrs:
waddr: Address = wrap_address(addr)
trans_bind_addrs.append( trans_bind_addrs.append(
waddr.get_random(bindspace=waddr.bindspace) addr.get_random(
bindspace=addr.bindspace,
)
) )
# Start this local actor as the "registrar", aka a regular # Start this local actor as the "registrar", aka a regular
# actor who manages the local registry of "mailboxes" of # actor who manages the local registry of "mailboxes" of
# other process-tree-local sub-actors. # other process-tree-local sub-actors.
else: else:
# NOTE that if the current actor IS THE REGISTRAR, the # NOTE that if the current actor IS THE REGISTRAR, the
# following init steps are taken: # following init steps are taken:
# - the transport layer server is bound to each addr # - the transport layer server is bound to each addr
# pair defined in provided registry_addrs, or the default. # pair defined in provided registry_addrs, or the default.
trans_bind_addrs = registry_addrs trans_bind_addrs = uw_reg_addrs
# - it is normally desirable for any registrar to stay up # - it is normally desirable for any registrar to stay up
# indefinitely until either all registered (child/sub) # indefinitely until either all registered (child/sub)
@ -419,7 +435,8 @@ async def open_root_actor(
# https://github.com/goodboy/tractor/pull/348 # https://github.com/goodboy/tractor/pull/348
# https://github.com/goodboy/tractor/issues/296 # https://github.com/goodboy/tractor/issues/296
actor = Arbiter( # TODO: rename as `RootActor` or is that even necessary?
actor = _runtime.Arbiter(
name=name or 'registrar', name=name or 'registrar',
uuid=mk_uuid(), uuid=mk_uuid(),
registry_addrs=registry_addrs, registry_addrs=registry_addrs,
@ -431,6 +448,16 @@ async def open_root_actor(
# `.trio.run()`. # `.trio.run()`.
actor._infected_aio = _state._runtime_vars['_is_infected_aio'] actor._infected_aio = _state._runtime_vars['_is_infected_aio']
# NOTE, only set the loopback addr for the
# process-tree-global "root" mailbox since all sub-actors
# should be able to speak to their root actor over that
# channel.
raddrs: list[Address] = _state._runtime_vars['_root_addrs']
raddrs.extend(trans_bind_addrs)
# TODO, remove once we have also removed all usage;
# eventually all (root-)registry apis should expect > 1 addr.
_state._runtime_vars['_root_mailbox'] = raddrs[0]
# Start up main task set via core actor-runtime nurseries. # Start up main task set via core actor-runtime nurseries.
try: try:
# assign process-local actor # assign process-local actor
@ -438,21 +465,28 @@ async def open_root_actor(
# start local channel-server and fake the portal API # start local channel-server and fake the portal API
# NOTE: this won't block since we provide the nursery # NOTE: this won't block since we provide the nursery
ml_addrs_str: str = '\n'.join( report: str = f'Starting actor-runtime for {actor.aid.reprol()!r}\n'
f'@{addr}' for addr in trans_bind_addrs if reg_addrs := actor.registry_addrs:
report += (
'-> Opening new registry @ '
+
'\n'.join(
f'{addr}' for addr in reg_addrs
) )
logger.info(
f'Starting local {actor.uid} on the following transport addrs:\n'
f'{ml_addrs_str}'
) )
logger.info(f'{report}\n')
# start the actor runtime in a new task # start runtime in a bg sub-task, yield to caller.
async with trio.open_nursery( async with (
strict_exception_groups=False, collapse_eg(),
# ^XXX^ TODO? instead unpack any RAE as per "loose" style? trio.open_nursery() as root_tn,
) as nursery:
# ``_runtime.async_main()`` creates an internal nursery # ?TODO? finally-footgun below?
# -> see note on why shielding.
# maybe_raise_from_masking_exc(),
):
actor._root_tn = root_tn
# `_runtime.async_main()` creates an internal nursery
# and blocks here until any underlying actor(-process) # and blocks here until any underlying actor(-process)
# tree has terminated thereby conducting so called # tree has terminated thereby conducting so called
# "end-to-end" structured concurrency throughout an # "end-to-end" structured concurrency throughout an
@ -460,9 +494,9 @@ async def open_root_actor(
# "actor runtime" primitives are SC-compat and thus all # "actor runtime" primitives are SC-compat and thus all
# transitively spawned actors/processes must be as # transitively spawned actors/processes must be as
# well. # well.
await nursery.start( await root_tn.start(
partial( partial(
async_main, _runtime.async_main,
actor, actor,
accept_addrs=trans_bind_addrs, accept_addrs=trans_bind_addrs,
parent_addr=None parent_addr=None
@ -490,6 +524,11 @@ async def open_root_actor(
err, err,
api_frame=inspect.currentframe(), api_frame=inspect.currentframe(),
debug_filter=debug_filter, debug_filter=debug_filter,
# XXX NOTE, required to debug root-actor
# crashes under cancellation conditions; so
# most of them!
shield=root_tn.cancel_scope.cancel_called,
) )
if ( if (
@ -510,7 +549,7 @@ async def open_root_actor(
raise raise
finally: finally:
# NOTE: not sure if we'll ever need this but it's # NOTE/TODO?, not sure if we'll ever need this but it's
# possibly better for even more determinism? # possibly better for even more determinism?
# logger.cancel( # logger.cancel(
# f'Waiting on {len(nurseries)} nurseries in root..') # f'Waiting on {len(nurseries)} nurseries in root..')
@ -526,8 +565,14 @@ async def open_root_actor(
) )
logger.info( logger.info(
f'Closing down root actor\n' f'Closing down root actor\n'
f'{op_nested_actor_repr}\n' f'{op_nested_actor_repr}'
) )
# XXX, THIS IS A *finally-footgun*!
# (also mentioned in with-block above)
# -> though already shields internally it can
# taskc here and mask underlying errors raised in
# the try-block above?
with trio.CancelScope(shield=True):
await actor.cancel(None) # self cancel await actor.cancel(None) # self cancel
finally: finally:
# revert all process-global runtime state # revert all process-global runtime state
@ -541,10 +586,16 @@ async def open_root_actor(
_state._current_actor = None _state._current_actor = None
_state._last_actor_terminated = actor _state._last_actor_terminated = actor
logger.runtime( sclang_repr: str = _pformat.nest_from_op(
input_op=')>',
text=actor.pformat(),
nest_prefix='|_',
nest_indent=1,
)
logger.info(
f'Root actor terminated\n' f'Root actor terminated\n'
f')>\n' f'{sclang_repr}'
f' |_{actor}\n'
) )

@ -64,6 +64,7 @@ from .trionics import (
from .devx import ( from .devx import (
debug, debug,
add_div, add_div,
pformat as _pformat,
) )
from . import _state from . import _state
from .log import get_logger from .log import get_logger
@ -72,7 +73,7 @@ from .msg import (
MsgCodec, MsgCodec,
PayloadT, PayloadT,
NamespacePath, NamespacePath,
# pretty_struct, pretty_struct,
_ops as msgops, _ops as msgops,
) )
from tractor.msg.types import ( from tractor.msg.types import (
@ -220,11 +221,18 @@ async def _invoke_non_context(
task_status.started(ctx) task_status.started(ctx)
result = await coro result = await coro
fname: str = func.__name__ fname: str = func.__name__
op_nested_task: str = _pformat.nest_from_op(
input_op=f')> cid: {ctx.cid!r}',
text=f'{ctx._task}',
nest_indent=1, # under >
)
log.runtime( log.runtime(
'RPC complete:\n' f'RPC task complete\n'
f'task: {ctx._task}\n' f'\n'
f'|_cid={ctx.cid}\n' f'{op_nested_task}\n'
f'|_{fname}() -> {pformat(result)}\n' f'\n'
f')> {fname}() -> {pformat(result)}\n'
) )
# NOTE: only send result if we know IPC isn't down # NOTE: only send result if we know IPC isn't down
@ -376,7 +384,7 @@ async def _errors_relayed_via_ipc(
# RPC task bookkeeping. # RPC task bookkeeping.
# since RPC tasks are scheduled inside a flat # since RPC tasks are scheduled inside a flat
# `Actor._service_n`, we add "handles" to each such that # `Actor._service_tn`, we add "handles" to each such that
# they can be individually cancelled. # they can be individually cancelled.
finally: finally:
@ -454,7 +462,7 @@ async def _invoke(
connected IPC channel. connected IPC channel.
This is the core "RPC" `trio.Task` scheduling machinery used to start every This is the core "RPC" `trio.Task` scheduling machinery used to start every
remotely invoked function, normally in `Actor._service_n: Nursery`. remotely invoked function, normally in `Actor._service_tn: Nursery`.
''' '''
__tracebackhide__: bool = hide_tb __tracebackhide__: bool = hide_tb
@ -634,7 +642,7 @@ async def _invoke(
tn: Nursery tn: Nursery
rpc_ctx_cs: CancelScope rpc_ctx_cs: CancelScope
async with ( async with (
collapse_eg(), collapse_eg(hide_tb=False),
trio.open_nursery() as tn, trio.open_nursery() as tn,
msgops.maybe_limit_plds( msgops.maybe_limit_plds(
ctx=ctx, ctx=ctx,
@ -646,8 +654,7 @@ async def _invoke(
# scope ensures unasking of the `await coro` below # scope ensures unasking of the `await coro` below
# *should* never be interfered with!! # *should* never be interfered with!!
maybe_raise_from_masking_exc( maybe_raise_from_masking_exc(
tn=tn, unmask_from=(Cancelled,),
unmask_from=Cancelled,
) as _mbme, # maybe boxed masked exc ) as _mbme, # maybe boxed masked exc
): ):
ctx._scope_nursery = tn ctx._scope_nursery = tn
@ -664,7 +671,8 @@ async def _invoke(
ctx._result = res ctx._result = res
log.runtime( log.runtime(
f'Sending result msg and exiting {ctx.side!r}\n' f'Sending result msg and exiting {ctx.side!r}\n'
f'{return_msg}\n' f'\n'
f'{pretty_struct.pformat(return_msg)}\n'
) )
await chan.send(return_msg) await chan.send(return_msg)
@ -756,7 +764,6 @@ async def _invoke(
BaseExceptionGroup, BaseExceptionGroup,
BaseException, BaseException,
trio.Cancelled, trio.Cancelled,
) as _scope_err: ) as _scope_err:
scope_err = _scope_err scope_err = _scope_err
if ( if (
@ -815,29 +822,49 @@ async def _invoke(
f'after having {ctx.repr_state!r}\n' f'after having {ctx.repr_state!r}\n'
) )
if merr: if merr:
logmeth: Callable = log.error logmeth: Callable = log.error
if isinstance(merr, ContextCancelled): if (
logmeth: Callable = log.runtime # ctxc: by `Context.cancel()`
isinstance(merr, ContextCancelled)
if not isinstance(merr, RemoteActorError): # out-of-layer cancellation, one of:
tb_str: str = ''.join(traceback.format_exception(merr)) # - actorc: by `Portal.cancel_actor()`
# - OSc: by SIGINT or `Process.signal()`
or (
isinstance(merr, trio.Cancelled)
and
ctx.canceller
)
):
logmeth: Callable = log.cancel
descr_str += (
f' with {merr!r}\n'
)
elif (
not isinstance(merr, RemoteActorError)
):
tb_str: str = ''.join(
traceback.format_exception(merr)
)
descr_str += ( descr_str += (
f'\n{merr!r}\n' # needed? f'\n{merr!r}\n' # needed?
f'{tb_str}\n' f'{tb_str}\n'
f'\n'
f'scope_error:\n'
f'{scope_err!r}\n'
) )
else: else:
descr_str += f'\n{merr!r}\n' descr_str += (
f'{merr!r}\n'
)
else: else:
descr_str += f'\nand final result {ctx.outcome!r}\n' descr_str += (
f'\n'
f'with final result {ctx.outcome!r}\n'
)
logmeth( logmeth(
message f'{message}\n'
+ f'\n'
descr_str f'{descr_str}\n'
) )
@ -908,7 +935,7 @@ async def process_messages(
Receive (multiplexed) per-`Channel` RPC requests as msgs from Receive (multiplexed) per-`Channel` RPC requests as msgs from
remote processes; schedule target async funcs as local remote processes; schedule target async funcs as local
`trio.Task`s inside the `Actor._service_n: Nursery`. `trio.Task`s inside the `Actor._service_tn: Nursery`.
Depending on msg type, non-`cmd` (task spawning/starting) Depending on msg type, non-`cmd` (task spawning/starting)
request payloads (eg. `started`, `yield`, `return`, `error`) request payloads (eg. `started`, `yield`, `return`, `error`)
@ -933,7 +960,7 @@ async def process_messages(
''' '''
actor: Actor = _state.current_actor() actor: Actor = _state.current_actor()
assert actor._service_n # runtime state sanity assert actor._service_tn # runtime state sanity
# TODO: once `trio` gets an "obvious way" for req/resp we # TODO: once `trio` gets an "obvious way" for req/resp we
# should use it? # should use it?
@ -1004,8 +1031,6 @@ async def process_messages(
cid=cid, cid=cid,
kwargs=kwargs, kwargs=kwargs,
): ):
kwargs |= {'req_chan': chan}
# XXX NOTE XXX don't start entire actor # XXX NOTE XXX don't start entire actor
# runtime cancellation if this actor is # runtime cancellation if this actor is
# currently in debug mode! # currently in debug mode!
@ -1024,14 +1049,14 @@ async def process_messages(
cid, cid,
chan, chan,
actor.cancel, actor.cancel,
kwargs, kwargs | {'req_chan': chan},
is_rpc=False, is_rpc=False,
return_msg_type=CancelAck, return_msg_type=CancelAck,
) )
log.runtime( log.runtime(
'Cancelling IPC transport msg-loop with peer:\n' 'Cancelling RPC-msg-loop with peer\n'
f'|_{chan}\n' f'->c}} {chan.aid.reprol()}@[{chan.maddr}]\n'
) )
loop_cs.cancel() loop_cs.cancel()
break break
@ -1044,7 +1069,7 @@ async def process_messages(
): ):
target_cid: str = kwargs['cid'] target_cid: str = kwargs['cid']
kwargs |= { kwargs |= {
'requesting_uid': chan.uid, 'requesting_aid': chan.aid,
'ipc_msg': msg, 'ipc_msg': msg,
# XXX NOTE! ONLY the rpc-task-owning # XXX NOTE! ONLY the rpc-task-owning
@ -1080,21 +1105,34 @@ async def process_messages(
ns=ns, ns=ns,
func=funcname, func=funcname,
kwargs=kwargs, # type-spec this? see `msg.types` kwargs=kwargs, # type-spec this? see `msg.types`
uid=actorid, uid=actor_uuid,
): ):
if actor_uuid != chan.aid.uid:
raise RuntimeError(
f'IPC <Start> msg <-> chan.aid mismatch!?\n'
f'Channel.aid = {chan.aid!r}\n'
f'Start.uid = {actor_uuid!r}\n'
)
# await debug.pause()
op_repr: str = 'Start <=) '
req_repr: str = _pformat.nest_from_op(
input_op=op_repr,
op_suffix='',
nest_prefix='',
text=f'{chan}',
nest_indent=len(op_repr)-1,
rm_from_first_ln='<',
# ^XXX, subtract -1 to account for
# <Channel
# ^_chevron to be stripped
)
start_status: str = ( start_status: str = (
'Handling RPC `Start` request\n' 'Handling RPC request\n'
f'<= peer: {actorid}\n\n' f'{req_repr}\n'
f' |_{chan}\n' f'\n'
f' |_cid: {cid}\n\n' f'->{{ ipc-context-id: {cid!r}\n'
# f' |_{ns}.{funcname}({kwargs})\n' f'->{{ nsp for fn: `{ns}.{funcname}({kwargs})`\n'
f'>> {actor.uid}\n'
f' |_{actor}\n'
f' -> nsp: `{ns}.{funcname}({kwargs})`\n'
# f' |_{ns}.{funcname}({kwargs})\n\n'
# f'{pretty_struct.pformat(msg)}\n'
) )
# runtime-internal endpoint: `Actor.<funcname>` # runtime-internal endpoint: `Actor.<funcname>`
@ -1123,10 +1161,6 @@ async def process_messages(
await chan.send(err_msg) await chan.send(err_msg)
continue continue
start_status += (
f' -> func: {func}\n'
)
# schedule a task for the requested RPC function # schedule a task for the requested RPC function
# in the actor's main "service nursery". # in the actor's main "service nursery".
# #
@ -1134,10 +1168,10 @@ async def process_messages(
# supervision isolation? would avoid having to # supervision isolation? would avoid having to
# manage RPC tasks individually in `._rpc_tasks` # manage RPC tasks individually in `._rpc_tasks`
# table? # table?
start_status += ' -> scheduling new task..\n' start_status += '->( scheduling new task..\n'
log.runtime(start_status) log.runtime(start_status)
try: try:
ctx: Context = await actor._service_n.start( ctx: Context = await actor._service_tn.start(
partial( partial(
_invoke, _invoke,
actor, actor,
@ -1218,12 +1252,24 @@ async def process_messages(
# END-OF `async for`: # END-OF `async for`:
# IPC disconnected via `trio.EndOfChannel`, likely # IPC disconnected via `trio.EndOfChannel`, likely
# due to a (graceful) `Channel.aclose()`. # due to a (graceful) `Channel.aclose()`.
chan_op_repr: str = '<=x] '
chan_repr: str = _pformat.nest_from_op(
input_op=chan_op_repr,
op_suffix='',
nest_prefix='',
text=chan.pformat(),
nest_indent=len(chan_op_repr)-1,
rm_from_first_ln='<',
)
log.runtime( log.runtime(
f'channel for {chan.uid} disconnected, cancelling RPC tasks\n' f'IPC channel disconnected\n'
f'|_{chan}\n' f'{chan_repr}\n'
f'\n'
f'->c) cancelling RPC tasks.\n'
) )
await actor.cancel_rpc_tasks( await actor.cancel_rpc_tasks(
req_uid=actor.uid, req_aid=actor.aid,
# a "self cancel" in terms of the lifetime of the # a "self cancel" in terms of the lifetime of the
# IPC connection which is presumed to be the # IPC connection which is presumed to be the
# source of any requests for spawned tasks. # source of any requests for spawned tasks.
@ -1265,7 +1311,7 @@ async def process_messages(
) as err: ) as err:
if nursery_cancelled_before_task: if nursery_cancelled_before_task:
sn: Nursery = actor._service_n sn: Nursery = actor._service_tn
assert sn and sn.cancel_scope.cancel_called # sanity assert sn and sn.cancel_scope.cancel_called # sanity
log.cancel( log.cancel(
f'Service nursery cancelled before it handled {funcname}' f'Service nursery cancelled before it handled {funcname}'
@ -1295,13 +1341,37 @@ async def process_messages(
finally: finally:
# msg debugging for when the machinery is brokey # msg debugging for when the machinery is brokey
if msg is None: if msg is None:
message: str = 'Exiting IPC msg loop without receiving a msg?' message: str = 'Exiting RPC-loop without receiving a msg?'
else: else:
task_op_repr: str = ')>'
task: trio.Task = trio.lowlevel.current_task()
# maybe add cancelled opt prefix
if task._cancel_status.effectively_cancelled:
task_op_repr = 'c' + task_op_repr
task_repr: str = _pformat.nest_from_op(
input_op=task_op_repr,
text=f'{task!r}',
nest_indent=1,
)
# chan_op_repr: str = '<=} '
# chan_repr: str = _pformat.nest_from_op(
# input_op=chan_op_repr,
# op_suffix='',
# nest_prefix='',
# text=chan.pformat(),
# nest_indent=len(chan_op_repr)-1,
# rm_from_first_ln='<',
# )
message: str = ( message: str = (
'Exiting IPC msg loop with final msg\n\n' f'Exiting RPC-loop with final msg\n'
f'<= peer: {chan.uid}\n' f'\n'
f' |_{chan}\n\n' # f'{chan_repr}\n'
# f'{pretty_struct.pformat(msg)}' f'{task_repr}\n'
f'\n'
f'{pretty_struct.pformat(msg)}'
f'\n'
) )
log.runtime(message) log.runtime(message)

@ -35,6 +35,15 @@ for running all lower level spawning, supervision and msging layers:
SC-transitive RPC via scheduling of `trio` tasks. SC-transitive RPC via scheduling of `trio` tasks.
- registration of newly spawned actors with the discovery sys. - registration of newly spawned actors with the discovery sys.
Glossary:
--------
- tn: a `trio.Nursery` or "task nursery".
- an: an `ActorNursery` or "actor nursery".
- root: top/parent-most scope/task/process/actor (or other runtime
primitive) in a hierarchical tree.
- parent-ish: "higher-up" in the runtime-primitive hierarchy.
- child-ish: "lower-down" in the runtime-primitive hierarchy.
''' '''
from __future__ import annotations from __future__ import annotations
from contextlib import ( from contextlib import (
@ -74,6 +83,10 @@ from tractor.msg import (
pretty_struct, pretty_struct,
types as msgtypes, types as msgtypes,
) )
from .trionics import (
collapse_eg,
maybe_open_nursery,
)
from .ipc import ( from .ipc import (
Channel, Channel,
# IPCServer, # causes cycles atm.. # IPCServer, # causes cycles atm..
@ -170,10 +183,11 @@ class Actor:
msg_buffer_size: int = 2**6 msg_buffer_size: int = 2**6
# nursery placeholders filled in by `async_main()` after fork # nursery placeholders filled in by `async_main()`,
_root_n: Nursery|None = None # - after fork for subactors.
_service_n: Nursery|None = None # - during boot for the root actor.
_root_tn: Nursery|None = None
_service_tn: Nursery|None = None
_ipc_server: _server.IPCServer|None = None _ipc_server: _server.IPCServer|None = None
@property @property
@ -210,7 +224,7 @@ class Actor:
*, *,
enable_modules: list[str] = [], enable_modules: list[str] = [],
loglevel: str|None = None, loglevel: str|None = None,
registry_addrs: list[UnwrappedAddress]|None = None, registry_addrs: list[Address]|None = None,
spawn_method: str|None = None, spawn_method: str|None = None,
# TODO: remove! # TODO: remove!
@ -231,7 +245,7 @@ class Actor:
# state # state
self._cancel_complete = trio.Event() self._cancel_complete = trio.Event()
self._cancel_called_by_remote: tuple[str, tuple]|None = None self._cancel_called_by: tuple[str, tuple]|None = None
self._cancel_called: bool = False self._cancel_called: bool = False
# retrieve and store parent `__main__` data which # retrieve and store parent `__main__` data which
@ -253,11 +267,12 @@ class Actor:
if arbiter_addr is not None: if arbiter_addr is not None:
warnings.warn( warnings.warn(
'`Actor(arbiter_addr=<blah>)` is now deprecated.\n' '`Actor(arbiter_addr=<blah>)` is now deprecated.\n'
'Use `registry_addrs: list[tuple]` instead.', 'Use `registry_addrs: list[Address]` instead.',
DeprecationWarning, DeprecationWarning,
stacklevel=2, stacklevel=2,
) )
registry_addrs: list[UnwrappedAddress] = [arbiter_addr]
registry_addrs: list[Address] = [wrap_address(arbiter_addr)]
# marked by the process spawning backend at startup # marked by the process spawning backend at startup
# will be None for the parent most process started manually # will be None for the parent most process started manually
@ -296,8 +311,10 @@ class Actor:
# input via the validator. # input via the validator.
self._reg_addrs: list[UnwrappedAddress] = [] self._reg_addrs: list[UnwrappedAddress] = []
if registry_addrs: if registry_addrs:
self.reg_addrs: list[UnwrappedAddress] = registry_addrs _state._runtime_vars['_registry_addrs'] = self.reg_addrs = [
_state._runtime_vars['_registry_addrs'] = registry_addrs addr.unwrap()
for addr in registry_addrs
]
@property @property
def aid(self) -> msgtypes.Aid: def aid(self) -> msgtypes.Aid:
@ -343,69 +360,118 @@ class Actor:
def pid(self) -> int: def pid(self) -> int:
return self._aid.pid return self._aid.pid
@property
def repr_state(self) -> str:
if self.cancel_complete:
return 'cancelled'
elif canceller := self.cancel_caller:
return f' and cancel-called by {canceller}'
else:
return 'running'
def pformat( def pformat(
self, self,
ds: str = ':', ds: str = ': ',
indent: int = 0, indent: int = 0,
privates: bool = False,
) -> str: ) -> str:
fields_sect_prefix: str = ' |_'
parent_uid: tuple|None = None fmtstr: str = f'|_id: {self.aid.reprol()!r}\n'
if privates:
aid_nest_prefix: str = '|_aid='
aid_field_repr: str = _pformat.nest_from_op(
input_op='',
text=pretty_struct.pformat(
struct=self.aid,
field_indent=2,
),
op_suffix='',
nest_prefix=aid_nest_prefix,
nest_indent=0,
)
fmtstr: str = f'{aid_field_repr}'
if rent_chan := self._parent_chan: if rent_chan := self._parent_chan:
parent_uid = rent_chan.uid fmtstr += (
f"|_parent{ds}{rent_chan.aid.reprol()}\n"
)
peers: list = []
server: _server.IPCServer = self.ipc_server server: _server.IPCServer = self.ipc_server
ipc_server_sect: str = ''
if server: if server:
peers: list[tuple] = list(server._peer_connected) if privates:
server_repr: str = self._ipc_server.pformat(
privates=privates,
)
# create field ln as a key-header indented under # create field ln as a key-header indented under
# and up to the section's key prefix. # and up to the section's key prefix.
# field_ln_header: str = textwrap.indent(
# text=f"ipc_server{ds}",
# prefix=' '*len(fields_sect_prefix),
# )
# ^XXX if we were to indent `repr(Server)` to # ^XXX if we were to indent `repr(Server)` to
# '<key>: ' # '<key>: '
# _here_^ # _here_^
server_repr: str = textwrap.indent( server_repr: str = _pformat.nest_from_op(
text=self._ipc_server.pformat(), input_op='', # nest as sub-obj
# prefix=' '*len(field_ln_header), op_suffix='',
prefix=' '*len(fields_sect_prefix), text=server_repr,
) )
ipc_server_sect: str = ( fmtstr += (
# f'{field_ln_header}\n' f"{server_repr}"
f'{server_repr}' )
else:
fmtstr += (
f'|_ipc: {server.repr_state!r}\n'
) )
fmtstr: str = ( fmtstr += (
f' |_id: {self.aid!r}\n' f'|_rpc: {len(self._rpc_tasks)} active tasks\n'
# f" aid{ds}{self.aid!r}\n"
f" parent{ds}{parent_uid}\n"
# f'\n'
f' |_ipc: {len(peers)!r} connected peers\n'
f" peers{ds}{peers!r}\n"
f"{ipc_server_sect}"
# f'\n'
f' |_rpc: {len(self._rpc_tasks)} tasks\n'
f" ctxs{ds}{len(self._contexts)}\n"
# f'\n'
f' |_runtime: ._task{ds}{self._task!r}\n'
f' _spawn_method{ds}{self._spawn_method}\n'
f' _actoruid2nursery{ds}{self._actoruid2nursery}\n'
f' _forkserver_info{ds}{self._forkserver_info}\n'
# f'\n'
f' |_state: "TODO: .repr_state()"\n'
f' _cancel_complete{ds}{self._cancel_complete}\n'
f' _cancel_called_by_remote{ds}{self._cancel_called_by_remote}\n'
f' _cancel_called{ds}{self._cancel_called}\n'
) )
# TODO, actually fix the .repr_state impl/output?
# append ipc-ctx state summary
# ctxs: dict = self._contexts
# if ctxs:
# ctx_states: dict[str, int] = {}
# for ctx in self._contexts.values():
# ctx_state: str = ctx.repr_state
# cnt = ctx_states.setdefault(ctx_state, 0)
# ctx_states[ctx_state] = cnt + 1
# fmtstr += (
# f" ctxs{ds}{ctx_states}\n"
# )
# runtime-state
task_name: str = '<dne>'
if task := self._task:
task_name: str = task.name
fmtstr += (
# TODO, this just like ctx?
f'|_state: {self.repr_state!r}\n'
f' task: {task_name}\n'
f' loglevel: {self.loglevel!r}\n'
f' subactors_spawned: {len(self._actoruid2nursery)}\n'
)
if not _state.is_root_process():
fmtstr += f' spawn_method: {self._spawn_method!r}\n'
if privates:
fmtstr += (
# f' actoruid2nursery{ds}{self._actoruid2nursery}\n'
f' cancel_complete{ds}{self._cancel_complete}\n'
f' cancel_called_by_remote{ds}{self._cancel_called_by}\n'
f' cancel_called{ds}{self._cancel_called}\n'
)
if fmtstr:
fmtstr: str = textwrap.indent(
text=fmtstr,
prefix=' '*(1 + indent),
)
_repr: str = ( _repr: str = (
'<Actor(\n' f'<{type(self).__name__}(\n'
+ f'{fmtstr}'
fmtstr f')>\n'
+
')>\n'
) )
if indent: if indent:
_repr: str = textwrap.indent( _repr: str = textwrap.indent(
@ -420,7 +486,11 @@ class Actor:
def reg_addrs(self) -> list[UnwrappedAddress]: def reg_addrs(self) -> list[UnwrappedAddress]:
''' '''
List of (socket) addresses for all known (and contactable) List of (socket) addresses for all known (and contactable)
registry actors. registry-service actors in "unwrapped" (i.e. IPC interchange
wire-compat) form.
If you are looking for the "wrapped" address form, use
`.registry_addrs` instead.
''' '''
return self._reg_addrs return self._reg_addrs
@ -439,8 +509,14 @@ class Actor:
self._reg_addrs = addrs self._reg_addrs = addrs
@property
def registry_addrs(self) -> list[Address]:
return [wrap_address(uw_addr)
for uw_addr in self.reg_addrs]
def load_modules( def load_modules(
self, self,
) -> None: ) -> None:
''' '''
Load explicitly enabled python modules from local fs after Load explicitly enabled python modules from local fs after
@ -487,6 +563,14 @@ class Actor:
) )
raise raise
# ?TODO, factor this meth-iface into a new `.rpc` subsys primitive?
# - _get_rpc_func(),
# - _deliver_ctx_payload(),
# - get_context(),
# - start_remote_task(),
# - cancel_rpc_tasks(),
# - _cancel_task(),
#
def _get_rpc_func(self, ns, funcname): def _get_rpc_func(self, ns, funcname):
''' '''
Try to lookup and return a target RPC func from the Try to lookup and return a target RPC func from the
@ -530,11 +614,11 @@ class Actor:
queue. queue.
''' '''
uid: tuple[str, str] = chan.uid aid: msgtypes.Aid = chan.aid
assert uid, f"`chan.uid` can't be {uid}" assert aid, f"`chan.aid` can't be {aid}"
try: try:
ctx: Context = self._contexts[( ctx: Context = self._contexts[(
uid, aid.uid,
cid, cid,
# TODO: how to determine this tho? # TODO: how to determine this tho?
@ -545,7 +629,7 @@ class Actor:
'Ignoring invalid IPC msg!?\n' 'Ignoring invalid IPC msg!?\n'
f'Ctx seems to not/no-longer exist??\n' f'Ctx seems to not/no-longer exist??\n'
f'\n' f'\n'
f'<=? {uid}\n' f'<=? {aid.reprol()!r}\n'
f' |_{pretty_struct.pformat(msg)}\n' f' |_{pretty_struct.pformat(msg)}\n'
) )
match msg: match msg:
@ -594,6 +678,7 @@ class Actor:
msging session's lifetime. msging session's lifetime.
''' '''
# ?TODO, use Aid here as well?
actor_uid = chan.uid actor_uid = chan.uid
assert actor_uid assert actor_uid
try: try:
@ -936,12 +1021,64 @@ class Actor:
the RPC service nursery. the RPC service nursery.
''' '''
assert self._service_n actor_repr: str = _pformat.nest_from_op(
self._service_n.start_soon( input_op='>c(',
text=self.pformat(),
nest_indent=1,
)
log.cancel(
'`Actor.cancel_soon()` was called!\n'
f'>> scheduling `Actor.cancel()`\n'
f'{actor_repr}'
)
assert self._service_tn
self._service_tn.start_soon(
self.cancel, self.cancel,
None, # self cancel all rpc tasks None, # self cancel all rpc tasks
) )
# schedule a "canceller task" in the `._root_tn` once the
# `._service_tn` is fully shutdown; task waits for child-ish
# scopes to fully exit then finally cancels its parent,
# root-most, scope.
async def cancel_root_tn_after_services():
log.runtime(
'Waiting on service-tn to cancel..\n'
f'c>)\n'
f'|_{self._service_tn.cancel_scope!r}\n'
)
await self._cancel_complete.wait()
log.cancel(
f'`._service_tn` cancelled\n'
f'>c)\n'
f'|_{self._service_tn.cancel_scope!r}\n'
f'\n'
f'>> cancelling `._root_tn`\n'
f'c>(\n'
f' |_{self._root_tn.cancel_scope!r}\n'
)
self._root_tn.cancel_scope.cancel()
self._root_tn.start_soon(
cancel_root_tn_after_services
)
@property
def cancel_complete(self) -> bool:
return self._cancel_complete.is_set()
@property
def cancel_called(self) -> bool:
'''
Was this actor requested to cancel by a remote peer actor.
'''
return self._cancel_called_by is not None
@property
def cancel_caller(self) -> msgtypes.Aid|None:
return self._cancel_called_by
async def cancel( async def cancel(
self, self,
@ -966,20 +1103,18 @@ class Actor:
''' '''
( (
requesting_uid, requesting_aid, # Aid
requester_type, requester_type, # str
req_chan, req_chan,
log_meth, log_meth,
) = ( ) = (
req_chan.uid, req_chan.aid,
'peer', 'peer',
req_chan, req_chan,
log.cancel, log.cancel,
) if req_chan else ( ) if req_chan else (
# a self cancel of ALL rpc tasks # a self cancel of ALL rpc tasks
self.uid, self.aid,
'self', 'self',
self, self,
log.runtime, log.runtime,
@ -987,14 +1122,14 @@ class Actor:
# TODO: just use the new `Context.repr_rpc: str` (and # TODO: just use the new `Context.repr_rpc: str` (and
# other) repr fields instead of doing this all manual.. # other) repr fields instead of doing this all manual..
msg: str = ( msg: str = (
f'Actor-runtime cancel request from {requester_type}\n\n' f'Actor-runtime cancel request from {requester_type!r}\n'
f'<=c) {requesting_uid}\n'
f' |_{self}\n'
f'\n' f'\n'
f'<=c)\n'
f'{self}'
) )
# TODO: what happens here when we self-cancel tho? # TODO: what happens here when we self-cancel tho?
self._cancel_called_by_remote: tuple = requesting_uid self._cancel_called_by: tuple = requesting_aid
self._cancel_called = True self._cancel_called = True
# cancel all ongoing rpc tasks # cancel all ongoing rpc tasks
@ -1022,7 +1157,7 @@ class Actor:
# self-cancel **all** ongoing RPC tasks # self-cancel **all** ongoing RPC tasks
await self.cancel_rpc_tasks( await self.cancel_rpc_tasks(
req_uid=requesting_uid, req_aid=requesting_aid,
parent_chan=None, parent_chan=None,
) )
@ -1032,26 +1167,18 @@ class Actor:
await ipc_server.wait_for_shutdown() await ipc_server.wait_for_shutdown()
# cancel all rpc tasks permanently # cancel all rpc tasks permanently
if self._service_n: if self._service_tn:
self._service_n.cancel_scope.cancel() self._service_tn.cancel_scope.cancel()
log_meth(msg) log_meth(msg)
self._cancel_complete.set() self._cancel_complete.set()
return True return True
# XXX: hard kill logic if needed?
# def _hard_mofo_kill(self):
# # If we're the root actor or zombied kill everything
# if self._parent_chan is None: # TODO: more robust check
# root = trio.lowlevel.current_root_task()
# for n in root.child_nurseries:
# n.cancel_scope.cancel()
async def _cancel_task( async def _cancel_task(
self, self,
cid: str, cid: str,
parent_chan: Channel, parent_chan: Channel,
requesting_uid: tuple[str, str]|None, requesting_aid: msgtypes.Aid|None,
ipc_msg: dict|None|bool = False, ipc_msg: dict|None|bool = False,
@ -1089,7 +1216,7 @@ class Actor:
log.runtime( log.runtime(
'Cancel request for invalid RPC task.\n' 'Cancel request for invalid RPC task.\n'
'The task likely already completed or was never started!\n\n' 'The task likely already completed or was never started!\n\n'
f'<= canceller: {requesting_uid}\n' f'<= canceller: {requesting_aid}\n'
f'=> {cid}@{parent_chan.uid}\n' f'=> {cid}@{parent_chan.uid}\n'
f' |_{parent_chan}\n' f' |_{parent_chan}\n'
) )
@ -1097,9 +1224,12 @@ class Actor:
log.cancel( log.cancel(
'Rxed cancel request for RPC task\n' 'Rxed cancel request for RPC task\n'
f'<=c) {requesting_uid}\n' f'{ctx._task!r} <=c) {requesting_aid}\n'
f' |_{ctx._task}\n' f'|_>> {ctx.repr_rpc}\n'
f' >> {ctx.repr_rpc}\n'
# f'|_{ctx._task}\n'
# f' >> {ctx.repr_rpc}\n'
# f'=> {ctx._task}\n' # f'=> {ctx._task}\n'
# f' >> Actor._cancel_task() => {ctx._task}\n' # f' >> Actor._cancel_task() => {ctx._task}\n'
# f' |_ {ctx._task}\n\n' # f' |_ {ctx._task}\n\n'
@ -1120,9 +1250,9 @@ class Actor:
) )
if ( if (
ctx._canceller is None ctx._canceller is None
and requesting_uid and requesting_aid
): ):
ctx._canceller: tuple = requesting_uid ctx._canceller: tuple = requesting_aid.uid
# TODO: pack the RPC `{'cmd': <blah>}` msg into a ctxc and # TODO: pack the RPC `{'cmd': <blah>}` msg into a ctxc and
# then raise and pack it here? # then raise and pack it here?
@ -1148,7 +1278,7 @@ class Actor:
# wait for _invoke to mark the task complete # wait for _invoke to mark the task complete
flow_info: str = ( flow_info: str = (
f'<= canceller: {requesting_uid}\n' f'<= canceller: {requesting_aid}\n'
f'=> ipc-parent: {parent_chan}\n' f'=> ipc-parent: {parent_chan}\n'
f'|_{ctx}\n' f'|_{ctx}\n'
) )
@ -1165,7 +1295,7 @@ class Actor:
async def cancel_rpc_tasks( async def cancel_rpc_tasks(
self, self,
req_uid: tuple[str, str], req_aid: msgtypes.Aid,
# NOTE: when None is passed we cancel **all** rpc # NOTE: when None is passed we cancel **all** rpc
# tasks running in this actor! # tasks running in this actor!
@ -1175,14 +1305,14 @@ class Actor:
''' '''
Cancel all ongoing RPC tasks owned/spawned for a given Cancel all ongoing RPC tasks owned/spawned for a given
`parent_chan: Channel` or simply all tasks (inside `parent_chan: Channel` or simply all tasks (inside
`._service_n`) when `parent_chan=None`. `._service_tn`) when `parent_chan=None`.
''' '''
tasks: dict = self._rpc_tasks tasks: dict = self._rpc_tasks
if not tasks: if not tasks:
log.runtime( log.runtime(
'Actor has no cancellable RPC tasks?\n' 'Actor has no cancellable RPC tasks?\n'
f'<= canceller: {req_uid}\n' f'<= canceller: {req_aid.reprol()}\n'
) )
return return
@ -1222,7 +1352,7 @@ class Actor:
) )
log.cancel( log.cancel(
f'Cancelling {descr} RPC tasks\n\n' f'Cancelling {descr} RPC tasks\n\n'
f'<=c) {req_uid} [canceller]\n' f'<=c) {req_aid} [canceller]\n'
f'{rent_chan_repr}' f'{rent_chan_repr}'
f'c)=> {self.uid} [cancellee]\n' f'c)=> {self.uid} [cancellee]\n'
f' |_{self} [with {len(tasks)} tasks]\n' f' |_{self} [with {len(tasks)} tasks]\n'
@ -1250,7 +1380,7 @@ class Actor:
await self._cancel_task( await self._cancel_task(
cid, cid,
task_caller_chan, task_caller_chan,
requesting_uid=req_uid, requesting_aid=req_aid,
) )
if tasks: if tasks:
@ -1278,25 +1408,13 @@ class Actor:
''' '''
return self.accept_addrs[0] return self.accept_addrs[0]
def get_parent(self) -> Portal: # TODO, this should delegate ONLY to the
''' # `._spawn_spec._runtime_vars: dict` / `._state` APIs?
Return a `Portal` to our parent. #
# XXX, AH RIGHT that's why..
''' # it's bc we pass this as a CLI flag to the child.py precisely
assert self._parent_chan, "No parent channel for this actor?" # bc we need the bootstrapping pre `async_main()`.. but maybe
return Portal(self._parent_chan) # keep this as an impl deat and not part of the pub iface impl?
def get_chans(
self,
uid: tuple[str, str],
) -> list[Channel]:
'''
Return all IPC channels to the actor with provided `uid`.
'''
return self._peers[uid]
def is_infected_aio(self) -> bool: def is_infected_aio(self) -> bool:
''' '''
If `True`, this actor is running `trio` in guest mode on If `True`, this actor is running `trio` in guest mode on
@ -1307,6 +1425,23 @@ class Actor:
''' '''
return self._infected_aio return self._infected_aio
# ?TODO, is this the right type for this method?
def get_parent(self) -> Portal:
'''
Return a `Portal` to our parent.
'''
assert self._parent_chan, "No parent channel for this actor?"
return Portal(self._parent_chan)
# XXX: hard kill logic if needed?
# def _hard_mofo_kill(self):
# # If we're the root actor or zombied kill everything
# if self._parent_chan is None: # TODO: more robust check
# root = trio.lowlevel.current_root_task()
# for n in root.child_nurseries:
# n.cancel_scope.cancel()
async def async_main( async def async_main(
actor: Actor, actor: Actor,
@ -1350,6 +1485,8 @@ async def async_main(
# establish primary connection with immediate parent # establish primary connection with immediate parent
actor._parent_chan: Channel|None = None actor._parent_chan: Channel|None = None
# is this a sub-actor?
# get runtime info from parent.
if parent_addr is not None: if parent_addr is not None:
( (
actor._parent_chan, actor._parent_chan,
@ -1380,46 +1517,55 @@ async def async_main(
accept_addrs.append(addr.unwrap()) accept_addrs.append(addr.unwrap())
assert accept_addrs assert accept_addrs
# The "root" nursery ensures the channel with the immediate
# parent is kept alive as a resilient service until ya_root_tn: bool = bool(actor._root_tn)
# cancellation steps have (mostly) occurred in ya_service_tn: bool = bool(actor._service_tn)
# a deterministic way.
async with trio.open_nursery( # NOTE, a top-most "root" nursery in each actor-process
strict_exception_groups=False, # enables a lifetime priority for the IPC-channel connection
) as root_nursery: # with a sub-actor's immediate parent. I.e. this connection
actor._root_n = root_nursery # is kept alive as a resilient service connection until all
assert actor._root_n # other machinery has exited and cancellation of all
# embedded/child scopes has completed. This helps ensure
# a deterministic (and thus "graceful")
# first-class-supervision style teardown where a parent actor
# (vs. say peers) is always the last to be contacted before
# disconnect.
root_tn: trio.Nursery
async with (
collapse_eg(),
maybe_open_nursery(
nursery=actor._root_tn,
) as root_tn,
):
if ya_root_tn:
assert root_tn is actor._root_tn
else:
actor._root_tn = root_tn
ipc_server: _server.IPCServer ipc_server: _server.IPCServer
async with ( async with (
trio.open_nursery( collapse_eg(),
strict_exception_groups=False, maybe_open_nursery(
) as service_nursery, nursery=actor._service_tn,
) as service_tn,
_server.open_ipc_server( _server.open_ipc_server(
parent_tn=service_nursery, parent_tn=service_tn, # ?TODO, why can't this be the root-tn
stream_handler_tn=service_nursery, stream_handler_tn=service_tn,
) as ipc_server, ) as ipc_server,
# ) as actor._ipc_server,
# ^TODO? prettier?
): ):
if ya_service_tn:
assert service_tn is actor._service_tn
else:
# This nursery is used to handle all inbound # This nursery is used to handle all inbound
# connections to us such that if the TCP server # connections to us such that if the TCP server
# is killed, connections can continue to process # is killed, connections can continue to process
# in the background until this nursery is cancelled. # in the background until this nursery is cancelled.
actor._service_n = service_nursery actor._service_tn = service_tn
# set after allocate
actor._ipc_server = ipc_server actor._ipc_server = ipc_server
assert (
actor._service_n
and (
actor._service_n
is
actor._ipc_server._parent_tn
is
ipc_server._stream_handler_tn
)
)
# load exposed/allowed RPC modules # load exposed/allowed RPC modules
# XXX: do this **after** establishing a channel to the parent # XXX: do this **after** establishing a channel to the parent
@ -1445,13 +1591,11 @@ async def async_main(
# - root actor: the ``accept_addr`` passed to this method # - root actor: the ``accept_addr`` passed to this method
# TODO: why is this not with the root nursery? # TODO: why is this not with the root nursery?
# - see above that the `._service_tn` is what's used?
try: try:
log.runtime(
'Booting IPC server'
)
eps: list = await ipc_server.listen_on( eps: list = await ipc_server.listen_on(
accept_addrs=accept_addrs, accept_addrs=accept_addrs,
stream_handler_nursery=service_nursery, stream_handler_nursery=service_tn,
) )
log.runtime( log.runtime(
f'Booted IPC server\n' f'Booted IPC server\n'
@ -1459,7 +1603,7 @@ async def async_main(
) )
assert ( assert (
(eps[0].listen_tn) (eps[0].listen_tn)
is not service_nursery is not service_tn
) )
except OSError as oserr: except OSError as oserr:
@ -1480,18 +1624,6 @@ async def async_main(
# TODO, just read direct from ipc_server? # TODO, just read direct from ipc_server?
accept_addrs: list[UnwrappedAddress] = actor.accept_addrs accept_addrs: list[UnwrappedAddress] = actor.accept_addrs
# NOTE: only set the loopback addr for the
# process-tree-global "root" mailbox since
# all sub-actors should be able to speak to
# their root actor over that channel.
if _state._runtime_vars['_is_root']:
raddrs: list[Address] = _state._runtime_vars['_root_addrs']
for addr in accept_addrs:
waddr: Address = wrap_address(addr)
raddrs.append(addr)
else:
_state._runtime_vars['_root_mailbox'] = raddrs[0]
# Register with the arbiter if we're told its addr # Register with the arbiter if we're told its addr
log.runtime( log.runtime(
f'Registering `{actor.name}` => {pformat(accept_addrs)}\n' f'Registering `{actor.name}` => {pformat(accept_addrs)}\n'
@ -1509,6 +1641,7 @@ async def async_main(
except AssertionError: except AssertionError:
await debug.pause() await debug.pause()
# !TODO, get rid of the local-portal crap XD
async with get_registry(addr) as reg_portal: async with get_registry(addr) as reg_portal:
for accept_addr in accept_addrs: for accept_addr in accept_addrs:
accept_addr = wrap_address(accept_addr) accept_addr = wrap_address(accept_addr)
@ -1533,7 +1666,7 @@ async def async_main(
# start processing parent requests until our channel # start processing parent requests until our channel
# server is 100% up and running. # server is 100% up and running.
if actor._parent_chan: if actor._parent_chan:
await root_nursery.start( await root_tn.start(
partial( partial(
_rpc.process_messages, _rpc.process_messages,
chan=actor._parent_chan, chan=actor._parent_chan,
@ -1545,8 +1678,9 @@ async def async_main(
# 'Blocking on service nursery to exit..\n' # 'Blocking on service nursery to exit..\n'
) )
log.runtime( log.runtime(
"Service nursery complete\n" 'Service nursery complete\n'
"Waiting on root nursery to complete" '\n'
'->} waiting on root nursery to complete..\n'
) )
# Blocks here as expected until the root nursery is # Blocks here as expected until the root nursery is
@ -1601,6 +1735,7 @@ async def async_main(
finally: finally:
teardown_report: str = ( teardown_report: str = (
'Main actor-runtime task completed\n' 'Main actor-runtime task completed\n'
'\n'
) )
# ?TODO? should this be in `._entry`/`._root` mods instead? # ?TODO? should this be in `._entry`/`._root` mods instead?
@ -1630,7 +1765,7 @@ async def async_main(
# XXX TODO but hard XXX # XXX TODO but hard XXX
# we can't actually do this bc the debugger uses the # we can't actually do this bc the debugger uses the
# _service_n to spawn the lock task, BUT, in theory if we had # _service_tn to spawn the lock task, BUT, in theory if we had
# the root nursery surround this finally block it might be # the root nursery surround this finally block it might be
# actually possible to debug THIS machinery in the same way # actually possible to debug THIS machinery in the same way
# as user task code? # as user task code?
@ -1642,7 +1777,8 @@ async def async_main(
# Unregister actor from the registry-sys / registrar. # Unregister actor from the registry-sys / registrar.
if ( if (
is_registered is_registered
and not actor.is_registrar and
not actor.is_registrar
): ):
failed: bool = False failed: bool = False
for addr in actor.reg_addrs: for addr in actor.reg_addrs:
@ -1677,28 +1813,30 @@ async def async_main(
ipc_server.has_peers(check_chans=True) ipc_server.has_peers(check_chans=True)
): ):
teardown_report += ( teardown_report += (
f'-> Waiting for remaining peers {ipc_server._peers} to clear..\n' f'-> Waiting for remaining peers to clear..\n'
f' {pformat(ipc_server._peers)}'
) )
log.runtime(teardown_report) log.runtime(teardown_report)
await ipc_server.wait_for_no_more_peers( await ipc_server.wait_for_no_more_peers()
shield=True,
)
teardown_report += ( teardown_report += (
'-> All peer channels are complete\n' '-]> all peer channels are complete.\n'
) )
op_nested_actor_repr: str = _pformat.nest_from_op( # op_nested_actor_repr: str = _pformat.nest_from_op(
input_op=')> ', # input_op=')>',
text=actor.pformat(), # text=actor.pformat(),
nest_prefix='|_', # nest_prefix='|_',
nest_indent=2, # nest_indent=1, # under >
) # )
teardown_report += ( teardown_report += (
'Actor runtime exited\n' '-)> actor runtime main task exit.\n'
f'{op_nested_actor_repr}\n' # f'{op_nested_actor_repr}'
) )
log.info(teardown_report) # if _state._runtime_vars['_is_root']:
# log.info(teardown_report)
# else:
log.runtime(teardown_report)
# TODO: rename to `Registry` and move to `.discovery._registry`! # TODO: rename to `Registry` and move to `.discovery._registry`!


@ -34,9 +34,9 @@ from typing import (
import trio import trio
from trio import TaskStatus from trio import TaskStatus
from .devx.debug import ( from .devx import (
maybe_wait_for_debugger, debug,
acquire_debug_lock, pformat as _pformat
) )
from tractor._state import ( from tractor._state import (
current_actor, current_actor,
@ -50,15 +50,22 @@ from tractor._addr import UnwrappedAddress
from tractor._portal import Portal from tractor._portal import Portal
from tractor._runtime import Actor from tractor._runtime import Actor
from tractor._entry import _mp_main from tractor._entry import _mp_main
from tractor._exceptions import ActorFailure from tractor._exceptions import (
from tractor.msg.types import ( ActorCancelled,
Aid, ActorFailure,
SpawnSpec, # NoResult,
)
from tractor.msg import (
types as msgtypes,
pretty_struct,
) )
if TYPE_CHECKING: if TYPE_CHECKING:
from ipc import IPCServer from .ipc import (
_server,
Channel,
)
from ._supervise import ActorNursery from ._supervise import ActorNursery
ProcessType = TypeVar('ProcessType', mp.Process, trio.Process) ProcessType = TypeVar('ProcessType', mp.Process, trio.Process)
@ -134,7 +141,6 @@ def try_set_start_method(
async def exhaust_portal( async def exhaust_portal(
portal: Portal, portal: Portal,
actor: Actor actor: Actor
@ -182,10 +188,12 @@ async def exhaust_portal(
async def cancel_on_completion( async def cancel_on_completion(
portal: Portal, portal: Portal,
actor: Actor, actor: Actor,
errors: dict[tuple[str, str], Exception], errors: dict[
msgtypes.Aid,
Exception,
],
) -> None: ) -> None:
''' '''
@ -206,24 +214,57 @@ async def cancel_on_completion(
portal, portal,
actor, actor,
) )
aid: msgtypes.Aid = actor.aid
repr_aid: str = aid.reprol(sin_uuid=False)
if isinstance(result, Exception): if isinstance(result, Exception):
errors[actor.uid]: Exception = result errors[aid]: Exception = result
log.cancel( log.cancel(
'Cancelling subactor runtime due to error:\n\n' f'Cancelling subactor {repr_aid!r} runtime due to error\n'
f'Portal.cancel_actor() => {portal.channel.uid}\n\n' f'\n'
f'error: {result}\n' f'Portal.cancel_actor() => {portal.channel.uid}\n'
f'\n'
f'{result!r}\n'
) )
else: else:
log.runtime( report: str = (
'Cancelling subactor gracefully:\n\n' f'Cancelling subactor {repr_aid!r} gracefully..\n'
f'Portal.cancel_actor() => {portal.channel.uid}\n\n' f'\n'
f'result: {result}\n' )
canc_info: str = (
f'Portal.cancel_actor() => {portal.chan.uid}\n'
f'\n'
f'final-result => {result!r}\n'
)
log.cancel(
report
+
canc_info
) )
# cancel the process now that we have a final result # cancel the process now that we have a final result
await portal.cancel_actor() await portal.cancel_actor()
if (
not errors.get(aid)
# and
# result is NoResult
):
pass
# await debug.pause(shield=True)
# errors[aid] = ActorCancelled(
# message=(
# f'Cancelled subactor {repr_aid!r}\n'
# f'{canc_info}\n'
# ),
# canceller=current_actor().aid,
# # TODO? should we have a ack-msg?
# # ipc_msg=??
# # boxed_type=trio.Cancelled,
# )
async def hard_kill( async def hard_kill(
proc: trio.Process, proc: trio.Process,
@ -233,10 +274,6 @@ async def hard_kill(
# whilst also hacking on it XD # whilst also hacking on it XD
# terminate_after: int = 99999, # terminate_after: int = 99999,
# NOTE: for mucking with `.pause()`-ing inside the runtime
# whilst also hacking on it XD
# terminate_after: int = 99999,
) -> None: ) -> None:
''' '''
Un-gracefully terminate an OS level `trio.Process` after timeout. Un-gracefully terminate an OS level `trio.Process` after timeout.
@ -298,6 +335,23 @@ async def hard_kill(
# zombies (as a feature) we ask the OS to send in the # zombies (as a feature) we ask the OS to send in the
# removal swad as the last resort. # removal swad as the last resort.
if cs.cancelled_caught: if cs.cancelled_caught:
# TODO? attempt at intermediary-rent-sub
# with child in debug lock?
# |_https://github.com/goodboy/tractor/issues/320
#
# if not is_root_process():
# log.warning(
# 'Attempting to acquire debug-REPL-lock before zombie reap!'
# )
# with trio.CancelScope(shield=True):
# async with debug.acquire_debug_lock(
# subactor_uid=current_actor().uid,
# ) as _ctx:
# log.warning(
# 'Acquired debug lock, child ready to be killed ??\n'
# )
# TODO: toss in the skynet-logo face as ascii art? # TODO: toss in the skynet-logo face as ascii art?
log.critical( log.critical(
# 'Well, the #ZOMBIE_LORD_IS_HERE# to collect\n' # 'Well, the #ZOMBIE_LORD_IS_HERE# to collect\n'
@ -315,6 +369,10 @@ async def soft_kill(
Awaitable, Awaitable,
], ],
portal: Portal, portal: Portal,
errors: dict[
msgtypes.Aid,
Exception,
],
) -> None: ) -> None:
''' '''
@ -328,12 +386,13 @@ async def soft_kill(
see `.hard_kill()`). see `.hard_kill()`).
''' '''
peer_aid: Aid = portal.channel.aid chan: Channel = portal.channel
peer_aid: msgtypes.Aid = chan.aid
try: try:
log.cancel( log.cancel(
f'Soft killing sub-actor via portal request\n' f'Soft killing sub-actor via portal request\n'
f'\n' f'\n'
f'(c=> {peer_aid}\n' f'c)=> {peer_aid.reprol()}@[{chan.maddr}]\n'
f' |_{proc}\n' f' |_{proc}\n'
) )
# wait on sub-proc to signal termination # wait on sub-proc to signal termination
@ -341,7 +400,7 @@ async def soft_kill(
except trio.Cancelled: except trio.Cancelled:
with trio.CancelScope(shield=True): with trio.CancelScope(shield=True):
await maybe_wait_for_debugger( await debug.maybe_wait_for_debugger(
child_in_debug=_runtime_vars.get( child_in_debug=_runtime_vars.get(
'_debug_mode', False '_debug_mode', False
), ),
@ -357,8 +416,8 @@ async def soft_kill(
# below. This means we try to do a graceful teardown # below. This means we try to do a graceful teardown
# via sending a cancel message before getting out # via sending a cancel message before getting out
# zombie killing tools. # zombie killing tools.
async with trio.open_nursery() as n: async with trio.open_nursery() as tn:
n.cancel_scope.shield = True tn.cancel_scope.shield = True
async def cancel_on_proc_deth(): async def cancel_on_proc_deth():
''' '''
@ -368,24 +427,35 @@ async def soft_kill(
''' '''
await wait_func(proc) await wait_func(proc)
n.cancel_scope.cancel() tn.cancel_scope.cancel()
# start a task to wait on the termination of the # start a task to wait on the termination of the
# process by itself waiting on a (caller provided) wait # process by itself waiting on a (caller provided) wait
# function which should unblock when the target process # function which should unblock when the target process
# has terminated. # has terminated.
n.start_soon(cancel_on_proc_deth) tn.start_soon(cancel_on_proc_deth)
# send the actor-runtime a cancel request. # send the actor-runtime a cancel request.
await portal.cancel_actor() await portal.cancel_actor()
# if not errors.get(peer_aid):
# errors[peer_aid] = ActorCancelled(
# message=(
# 'Sub-actor cancelled gracefully by parent\n'
# ),
# canceller=current_actor().aid,
# # TODO? should we have a ack-msg?
# # ipc_msg=??
# # boxed_type=trio.Cancelled,
# )
if proc.poll() is None: # type: ignore if proc.poll() is None: # type: ignore
log.warning( log.warning(
'Subactor still alive after cancel request?\n\n' 'Subactor still alive after cancel request?\n\n'
f'uid: {peer_aid}\n' f'uid: {peer_aid}\n'
f'|_{proc}\n' f'|_{proc}\n'
) )
n.cancel_scope.cancel() tn.cancel_scope.cancel()
raise raise
@ -393,7 +463,10 @@ async def new_proc(
name: str, name: str,
actor_nursery: ActorNursery, actor_nursery: ActorNursery,
subactor: Actor, subactor: Actor,
errors: dict[tuple[str, str], Exception], errors: dict[
msgtypes.Aid,
Exception,
],
# passed through to actor main # passed through to actor main
bind_addrs: list[UnwrappedAddress], bind_addrs: list[UnwrappedAddress],
@ -432,7 +505,10 @@ async def trio_proc(
name: str, name: str,
actor_nursery: ActorNursery, actor_nursery: ActorNursery,
subactor: Actor, subactor: Actor,
errors: dict[tuple[str, str], Exception], errors: dict[
msgtypes.Aid,
Exception,
],
# passed through to actor main # passed through to actor main
bind_addrs: list[UnwrappedAddress], bind_addrs: list[UnwrappedAddress],
@ -465,7 +541,7 @@ async def trio_proc(
"--uid", "--uid",
# TODO, how to pass this over "wire" encodings like # TODO, how to pass this over "wire" encodings like
# cmdline args? # cmdline args?
# -[ ] maybe we can add an `Aid.min_tuple()` ? # -[ ] maybe we can add an `msgtypes.Aid.min_tuple()` ?
str(subactor.uid), str(subactor.uid),
# Address the child must connect to on startup # Address the child must connect to on startup
"--parent_addr", "--parent_addr",
@ -483,13 +559,14 @@ async def trio_proc(
cancelled_during_spawn: bool = False cancelled_during_spawn: bool = False
proc: trio.Process|None = None proc: trio.Process|None = None
ipc_server: IPCServer = actor_nursery._actor.ipc_server ipc_server: _server.Server = actor_nursery._actor.ipc_server
try: try:
try: try:
proc: trio.Process = await trio.lowlevel.open_process(spawn_cmd, **proc_kwargs) proc: trio.Process = await trio.lowlevel.open_process(spawn_cmd, **proc_kwargs)
log.runtime( log.runtime(
'Started new child\n' f'Started new child subproc\n'
f'|_{proc}\n' f'(>\n'
f' |_{proc}\n'
) )
# wait for actor to spawn and connect back to us # wait for actor to spawn and connect back to us
@ -507,10 +584,10 @@ async def trio_proc(
with trio.CancelScope(shield=True): with trio.CancelScope(shield=True):
# don't clobber an ongoing pdb # don't clobber an ongoing pdb
if is_root_process(): if is_root_process():
await maybe_wait_for_debugger() await debug.maybe_wait_for_debugger()
elif proc is not None: elif proc is not None:
async with acquire_debug_lock(subactor.uid): async with debug.acquire_debug_lock(subactor.uid):
# soft wait on the proc to terminate # soft wait on the proc to terminate
with trio.move_on_after(0.5): with trio.move_on_after(0.5):
await proc.wait() await proc.wait()
@ -528,14 +605,19 @@ async def trio_proc(
# send a "spawning specification" which configures the # send a "spawning specification" which configures the
# initial runtime state of the child. # initial runtime state of the child.
sspec = SpawnSpec( sspec = msgtypes.SpawnSpec(
_parent_main_data=subactor._parent_main_data, _parent_main_data=subactor._parent_main_data,
enable_modules=subactor.enable_modules, enable_modules=subactor.enable_modules,
reg_addrs=subactor.reg_addrs, reg_addrs=subactor.reg_addrs,
bind_addrs=bind_addrs, bind_addrs=bind_addrs,
_runtime_vars=_runtime_vars, _runtime_vars=_runtime_vars,
) )
log.runtime(f'Sending spawn spec: {str(sspec)}') log.runtime(
f'Sending spawn spec to child\n'
f'{{}}=> {chan.aid.reprol()!r}\n'
f'\n'
f'{pretty_struct.pformat(sspec)}\n'
)
await chan.send(sspec) await chan.send(sspec)
# track subactor in current nursery # track subactor in current nursery
@ -549,9 +631,9 @@ async def trio_proc(
with trio.CancelScope(shield=True): with trio.CancelScope(shield=True):
await actor_nursery._join_procs.wait() await actor_nursery._join_procs.wait()
async with trio.open_nursery() as nursery: async with trio.open_nursery() as ptl_reaper_tn:
if portal in actor_nursery._cancel_after_result_on_exit: if portal in actor_nursery._cancel_after_result_on_exit:
nursery.start_soon( ptl_reaper_tn.start_soon(
cancel_on_completion, cancel_on_completion,
portal, portal,
subactor, subactor,
@ -563,39 +645,42 @@ async def trio_proc(
# condition. # condition.
await soft_kill( await soft_kill(
proc, proc,
trio.Process.wait, trio.Process.wait, # XXX, uses `pidfd_open()` below.
portal portal,
errors,
) )
# cancel result waiter that may have been spawned in # cancel result waiter that may have been spawned in
# tandem if not done already # tandem if not done already
log.cancel( log.cancel(
'Cancelling portal result reaper task\n' 'Cancelling portal result reaper task\n'
f'>c)\n' f'c)> {subactor.aid.reprol()!r}\n'
f' |_{subactor.uid}\n'
) )
nursery.cancel_scope.cancel() ptl_reaper_tn.cancel_scope.cancel()
finally: finally:
# XXX NOTE XXX: The "hard" reap since no actor zombies are # XXX NOTE XXX: The "hard" reap since no actor zombies are
# allowed! Do this **after** cancellation/teardown to avoid # allowed! Do this **after** cancellation/teardown to avoid
# killing the process too early. # killing the process too early.
if proc: if proc:
reap_repr: str = _pformat.nest_from_op(
input_op='>x)',
text=subactor.pformat(),
)
log.cancel( log.cancel(
f'Hard reap sequence starting for subactor\n' f'Hard reap sequence starting for subactor\n'
f'>x)\n' f'{reap_repr}'
f' |_{subactor}@{subactor.uid}\n'
) )
with trio.CancelScope(shield=True): with trio.CancelScope(shield=True):
# don't clobber an ongoing pdb # don't clobber an ongoing pdb
if cancelled_during_spawn: if cancelled_during_spawn:
# Try again to avoid TTY clobbering. # Try again to avoid TTY clobbering.
async with acquire_debug_lock(subactor.uid): async with debug.acquire_debug_lock(subactor.uid):
with trio.move_on_after(0.5): with trio.move_on_after(0.5):
await proc.wait() await proc.wait()
await maybe_wait_for_debugger( await debug.maybe_wait_for_debugger(
child_in_debug=_runtime_vars.get( child_in_debug=_runtime_vars.get(
'_debug_mode', False '_debug_mode', False
), ),
@ -624,7 +709,7 @@ async def trio_proc(
# acquire the lock and get notified of who has it, # acquire the lock and get notified of who has it,
# check that uid against our known children? # check that uid against our known children?
# this_uid: tuple[str, str] = current_actor().uid # this_uid: tuple[str, str] = current_actor().uid
# await acquire_debug_lock(this_uid) # await debug.acquire_debug_lock(this_uid)
if proc.poll() is None: if proc.poll() is None:
log.cancel(f"Attempting to hard kill {proc}") log.cancel(f"Attempting to hard kill {proc}")
@ -644,7 +729,10 @@ async def mp_proc(
name: str, name: str,
actor_nursery: ActorNursery, # type: ignore # noqa actor_nursery: ActorNursery, # type: ignore # noqa
subactor: Actor, subactor: Actor,
errors: dict[tuple[str, str], Exception], errors: dict[
msgtypes.Aid,
Exception,
],
# passed through to actor main # passed through to actor main
bind_addrs: list[UnwrappedAddress], bind_addrs: list[UnwrappedAddress],
parent_addr: UnwrappedAddress, parent_addr: UnwrappedAddress,
@ -727,7 +815,7 @@ async def mp_proc(
log.runtime(f"Started {proc}") log.runtime(f"Started {proc}")
ipc_server: IPCServer = actor_nursery._actor.ipc_server ipc_server: _server.Server = actor_nursery._actor.ipc_server
try: try:
# wait for actor to spawn and connect back to us # wait for actor to spawn and connect back to us
# channel should have handshake completed by the # channel should have handshake completed by the
@ -769,7 +857,7 @@ async def mp_proc(
cancel_on_completion, cancel_on_completion,
portal, portal,
subactor, subactor,
errors errors,
) )
# This is a "soft" (cancellable) join/reap which # This is a "soft" (cancellable) join/reap which
@ -778,7 +866,8 @@ async def mp_proc(
await soft_kill( await soft_kill(
proc, proc,
proc_waiter, proc_waiter,
portal portal,
errors,
) )
# cancel result waiter that may have been spawned in # cancel result waiter that may have been spawned in


@ -21,7 +21,6 @@
from contextlib import asynccontextmanager as acm from contextlib import asynccontextmanager as acm
from functools import partial from functools import partial
import inspect import inspect
from pprint import pformat
from typing import ( from typing import (
TYPE_CHECKING, TYPE_CHECKING,
) )
@ -31,7 +30,13 @@ import warnings
import trio import trio
from .devx.debug import maybe_wait_for_debugger from .msg import (
types as msgtypes,
)
from .devx import (
debug,
pformat as _pformat,
)
from ._addr import ( from ._addr import (
UnwrappedAddress, UnwrappedAddress,
mk_uuid, mk_uuid,
@ -42,9 +47,11 @@ from ._runtime import Actor
from ._portal import Portal from ._portal import Portal
from .trionics import ( from .trionics import (
is_multi_cancelled, is_multi_cancelled,
collapse_eg,
) )
from ._exceptions import ( from ._exceptions import (
ContextCancelled, ContextCancelled,
ActorCancelled,
) )
from ._root import ( from ._root import (
open_root_actor, open_root_actor,
@ -96,7 +103,10 @@ class ActorNursery:
actor: Actor, actor: Actor,
ria_nursery: trio.Nursery, ria_nursery: trio.Nursery,
da_nursery: trio.Nursery, da_nursery: trio.Nursery,
errors: dict[tuple[str, str], BaseException], errors: dict[
msgtypes.Aid,
BaseException,
],
) -> None: ) -> None:
# self.supervisor = supervisor # TODO # self.supervisor = supervisor # TODO
@ -114,10 +124,11 @@ class ActorNursery:
] ]
] = {} ] = {}
self.cancelled: bool = False # signals when it is ok to start waiting on subactor procs
# for termination.
self._join_procs = trio.Event() self._join_procs = trio.Event()
self._at_least_one_child_in_debug: bool = False self._at_least_one_child_in_debug: bool = False
self.errors = errors self._errors = errors
self._scope_error: BaseException|None = None self._scope_error: BaseException|None = None
self.exited = trio.Event() self.exited = trio.Event()
@ -132,10 +143,53 @@ class ActorNursery:
# TODO: remove the `.run_in_actor()` API and thus this 2ndary # TODO: remove the `.run_in_actor()` API and thus this 2ndary
# nursery when that API gets moved outside this primitive! # nursery when that API gets moved outside this primitive!
self._ria_nursery = ria_nursery self._ria_nursery = ria_nursery
# TODO, factor this into a .hilevel api!
#
# portals spawned with ``run_in_actor()`` are # portals spawned with ``run_in_actor()`` are
# cancelled when their "main" result arrives # cancelled when their "main" result arrives
self._cancel_after_result_on_exit: set = set() self._cancel_after_result_on_exit: set = set()
# trio.Nursery-like cancel (request) statuses
self._cancelled_caught: bool = False
self._cancel_called: bool = False
@property
def cancel_called(self) -> bool:
'''
Records whether cancellation has been requested for this
actor-nursery by a call to `.cancel()` either due to,
- an explicit call by some actor-local-task,
- an implicit call due to an error/cancel emitted inside
the `tractor.open_nursery()` block.
'''
return self._cancel_called
@property
def cancelled_caught(self) -> bool:
'''
Set when this nursery was able to cancel all spawned subactors
gracefully via an (implicit) call to `.cancel()`.
'''
return self._cancelled_caught
# TODO! remove internal/test-suite usage!
@property
def cancelled(self) -> bool:
warnings.warn(
"`ActorNursery.cancelled` is now deprecated, use "
"`.cancel_called` instead.",
DeprecationWarning,
stacklevel=2,
)
return (
self._cancel_called
# and
# self._cancelled_caught
)
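The deprecated `cancelled` property above forwards to `._cancel_called` while emitting a `DeprecationWarning`; `stacklevel=2` attributes the warning to the caller's line rather than to the property body. Below is a reduced sketch of the same alias pattern using only the stdlib `warnings` module. `NurserySketch` is a hypothetical class, not tractor's `ActorNursery`.

```python
import warnings


class NurserySketch:
    '''
    Hypothetical, reduced stand-in for `ActorNursery` keeping only
    its cancel-request status and a deprecated alias for it.
    '''
    def __init__(self) -> None:
        self._cancel_called: bool = False

    def cancel(self) -> None:
        # record that cancellation was requested
        self._cancel_called = True

    @property
    def cancel_called(self) -> bool:
        return self._cancel_called

    @property
    def cancelled(self) -> bool:
        # deprecated alias; `stacklevel=2` points the warning at the
        # line which accessed `.cancelled`, not at this property.
        warnings.warn(
            '`.cancelled` is deprecated, use `.cancel_called` instead.',
            DeprecationWarning,
            stacklevel=2,
        )
        return self._cancel_called
```

Since `DeprecationWarning` is hidden by default outside `__main__` and test runners, internal and test-suite callers may need `-W error::DeprecationWarning` or similar to surface remaining uses.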
async def start_actor( async def start_actor(
self, self,
name: str, name: str,
@ -199,7 +253,7 @@ class ActorNursery:
loglevel=loglevel, loglevel=loglevel,
# verbatim relay this actor's registrar addresses # verbatim relay this actor's registrar addresses
registry_addrs=current_actor().reg_addrs, registry_addrs=current_actor().registry_addrs,
) )
parent_addr: UnwrappedAddress = self._actor.accept_addr parent_addr: UnwrappedAddress = self._actor.accept_addr
assert parent_addr assert parent_addr
@ -215,7 +269,7 @@ class ActorNursery:
name, name,
self, self,
subactor, subactor,
self.errors, self._errors,
bind_addrs, bind_addrs,
parent_addr, parent_addr,
_rtv, # run time vars _rtv, # run time vars
@ -313,20 +367,23 @@ class ActorNursery:
''' '''
__runtimeframe__: int = 1 # noqa __runtimeframe__: int = 1 # noqa
self.cancelled = True self._cancel_called = True
# TODO: impl a repr for spawn more compact # TODO: impl a repr for spawn more compact
# then `._children`.. # then `._children`..
children: dict = self._children children: dict = self._children
child_count: int = len(children) child_count: int = len(children)
msg: str = f'Cancelling actor nursery with {child_count} children\n' msg: str = (
f'Cancelling actor-nursery with {child_count} children\n'
)
server: IPCServer = self._actor.ipc_server server: IPCServer = self._actor.ipc_server
with trio.move_on_after(3) as cs: with trio.move_on_after(3) as cs:
async with trio.open_nursery( async with (
strict_exception_groups=False, collapse_eg(),
) as tn: trio.open_nursery() as tn,
):
subactor: Actor subactor: Actor
proc: trio.Process proc: trio.Process
@ -345,7 +402,9 @@ class ActorNursery:
else: else:
if portal is None: # actor hasn't fully spawned yet if portal is None: # actor hasn't fully spawned yet
event: trio.Event = server._peer_connected[subactor.uid] event: trio.Event = server._peer_connected[
subactor.uid
]
log.warning( log.warning(
f"{subactor.uid} never finished spawning?" f"{subactor.uid} never finished spawning?"
) )
@ -370,7 +429,20 @@ class ActorNursery:
# spawn cancel tasks for each sub-actor # spawn cancel tasks for each sub-actor
assert portal assert portal
if portal.channel.connected(): if portal.channel.connected():
tn.start_soon(portal.cancel_actor)
async def canc_subactor():
await portal.cancel_actor()
# aid: msgtypes.Aid = subactor.aid
# reprol: str = aid.reprol(sin_uuid=False)
# if not self._errors.get(aid):
# self._errors[aid] = ActorCancelled(
# message=(
# f'Sub-actor {reprol!r} cancelled gracefully by parent nursery\n'
# ),
# canceller=self._actor.aid,
# )
tn.start_soon(canc_subactor)
log.cancel(msg) log.cancel(msg)
# if we cancelled the cancel (we hung cancelling remote actors) # if we cancelled the cancel (we hung cancelling remote actors)
@ -390,10 +462,53 @@ class ActorNursery:
) in children.values(): ) in children.values():
log.warning(f"Hard killing process {proc}") log.warning(f"Hard killing process {proc}")
proc.terminate() proc.terminate()
else:
self._cancelled_caught
# mark ourselves as having (tried to have) cancelled all subactors # mark ourselves as having (tried to have) cancelled all subactors
self._join_procs.set() self._join_procs.set()
@property
def maybe_error(self) -> (
BaseException|
BaseExceptionGroup|
None
):
'''
Deliver any captured scope errors including those relayed
from subactors such as `ActorCancelled` during a non-graceful
cancellation scenario.
When more than a "graceful cancel" occurs, wrap all collected
sub-exceptions in an `ExceptionGroup` for the caller to raise.
'''
scope_exc: BaseException|None = self._scope_error
# XXX NOTE, only pack an eg if there is at least one
# non-actorc exception received from a subactor, OR
# return `._scope_error` verbatim.
if (errors := self._errors):
# use `BaseExceptionGroup` as needed
excs: list[BaseException] = list(errors.values())
if (
len(excs) > 1
and
any(
type(exc) not in {ActorCancelled,}
for exc in excs
)
):
return BaseExceptionGroup(
'ActorNursery multi-errored with',
tuple(excs),
)
# raise the lone subactor exc
return list(excs)[0]
return scope_exc
@acm @acm
async def _open_and_supervise_one_cancels_all_nursery( async def _open_and_supervise_one_cancels_all_nursery(
@ -409,7 +524,10 @@ async def _open_and_supervise_one_cancels_all_nursery(
inner_err: BaseException|None = None inner_err: BaseException|None = None
# the collection of errors retrieved from spawned sub-actors # the collection of errors retrieved from spawned sub-actors
errors: dict[tuple[str, str], BaseException] = {} errors: dict[
msgtypes.Aid,
BaseException,
] = {}
# This is the outermost level "daemon actor" nursery. It is awaited # This is the outermost level "daemon actor" nursery. It is awaited
# **after** the below inner "run in actor nursery". This allows for # **after** the below inner "run in actor nursery". This allows for
@ -419,10 +537,11 @@ async def _open_and_supervise_one_cancels_all_nursery(
# `ActorNursery.start_actor()`). # `ActorNursery.start_actor()`).
# errors from this daemon actor nursery bubble up to caller # errors from this daemon actor nursery bubble up to caller
async with trio.open_nursery( try:
strict_exception_groups=False, async with (
# ^XXX^ TODO? instead unpack any RAE as per "loose" style? collapse_eg(),
) as da_nursery: trio.open_nursery() as da_nursery,
):
try: try:
# This is the inner level "run in actor" nursery. It is # This is the inner level "run in actor" nursery. It is
# awaited first since actors spawned in this way (using # awaited first since actors spawned in this way (using
@ -432,11 +551,10 @@ async def _open_and_supervise_one_cancels_all_nursery(
# immediately raised for handling by a supervisor strategy. # immediately raised for handling by a supervisor strategy.
# As such if the strategy propagates any error(s) upwards # As such if the strategy propagates any error(s) upwards
# the above "daemon actor" nursery will be notified. # the above "daemon actor" nursery will be notified.
async with trio.open_nursery( async with (
strict_exception_groups=False, collapse_eg(),
# ^XXX^ TODO? instead unpack any RAE as per "loose" style? trio.open_nursery() as ria_nursery,
) as ria_nursery: ):
an = ActorNursery( an = ActorNursery(
actor, actor,
ria_nursery, ria_nursery,
@ -453,13 +571,13 @@ async def _open_and_supervise_one_cancels_all_nursery(
# the "hard join phase". # the "hard join phase".
log.runtime( log.runtime(
'Waiting on subactors to complete:\n' 'Waiting on subactors to complete:\n'
f'{pformat(an._children)}\n' f'>}} {len(an._children)}\n'
) )
an._join_procs.set() an._join_procs.set()
except BaseException as _inner_err: except BaseException as _inner_err:
inner_err = _inner_err inner_err = _inner_err
errors[actor.uid] = inner_err # errors[actor.aid] = inner_err
# If we error in the root but the debugger is # If we error in the root but the debugger is
# engaged we don't want to prematurely kill (and # engaged we don't want to prematurely kill (and
@ -467,7 +585,7 @@ async def _open_and_supervise_one_cancels_all_nursery(
# will make the pdb repl unusable. # will make the pdb repl unusable.
# Instead try to wait for pdb to be released before # Instead try to wait for pdb to be released before
# tearing down. # tearing down.
await maybe_wait_for_debugger( await debug.maybe_wait_for_debugger(
child_in_debug=an._at_least_one_child_in_debug child_in_debug=an._at_least_one_child_in_debug
) )
@ -539,11 +657,9 @@ async def _open_and_supervise_one_cancels_all_nursery(
) as _outer_err: ) as _outer_err:
outer_err = _outer_err outer_err = _outer_err
an._scope_error = outer_err or inner_err
# XXX: yet another guard before allowing the cancel # XXX: yet another guard before allowing the cancel
# sequence in case a (single) child is in debug. # sequence in case a (single) child is in debug.
await maybe_wait_for_debugger( await debug.maybe_wait_for_debugger(
child_in_debug=an._at_least_one_child_in_debug child_in_debug=an._at_least_one_child_in_debug
) )
@ -558,44 +674,87 @@ async def _open_and_supervise_one_cancels_all_nursery(
) )
with trio.CancelScope(shield=True): with trio.CancelScope(shield=True):
await an.cancel() await an.cancel()
raise raise
finally: finally:
# No errors were raised while awaiting ".run_in_actor()" scope_exc = an._scope_error = outer_err or inner_err
# actors but those actors may have returned remote errors as # await debug.pause(shield=True)
# results (meaning they errored remotely and have relayed # if scope_exc:
# those errors back to this parent actor). The errors are # errors[actor.aid] = scope_exc
# collected in ``errors`` so cancel all actors, summarize
# all errors and re-raise. # show this frame on any internal error
if errors: if (
not an.cancelled
and
scope_exc
):
__tracebackhide__: bool = False
# NOTE, it's possible no errors were raised while
# awaiting ".run_in_actor()" actors but those
# sub-actors may have delivered remote errors as
# results, normally captured via machinery in
# `._spawn.cancel_on_completion()`.
#
# Any such remote errors are collected in `an._errors`
# which is summarized via `ActorNursery.maybe_error`
# which is maybe re-raised in an outer block (below).
#
# So here we first cancel all subactors then summarize
# all errors and then later (in that outer block)
# maybe-raise on a "non-graceful" cancellation
# outcome, normally as a summary EG.
if (
scope_exc
or
errors
):
if an._children: if an._children:
with trio.CancelScope(shield=True): with trio.CancelScope(shield=True):
await an.cancel() await an.cancel()
# use `BaseExceptionGroup` as needed # cancel outer tn so we unblock outside this
if len(errors) > 1: # finally!
raise BaseExceptionGroup( da_nursery.cancel_scope.cancel()
'tractor.ActorNursery errored with', #
tuple(errors.values()), # ^TODO? still don't get why needed?
) # - an.cancel() should cause all spawn-subtasks
else: # to eventually exit?
raise list(errors.values())[0] # - also, could we (instead) sync to an event here before
# (ever) calling `an.cancel()`??
# show frame on any (likely) internal error # `da_nursery` scope end, thus a checkpoint.
if ( finally:
not an.cancelled
and an._scope_error
):
__tracebackhide__: bool = False
# da_nursery scope end - nursery checkpoint # raise any eg compiled from all subs
# final exit # ??TODO should we also adopt strict-egs here like
# `trio.Nursery`??
#
# XXX justification notes,
# docs: https://trio.readthedocs.io/en/stable/reference-core.html#historical-note-non-strict-exceptiongroups
# anthropic: https://discuss.python.org/t/using-exceptiongroup-at-anthropic-experience-report/20888
# gh: https://github.com/python-trio/trio/issues/611
if an_exc := an.maybe_error:
raise an_exc
if scope_exc := an._scope_error:
raise scope_exc
# @acm-fn scope exit
_shutdown_msg: str = (
'Actor-runtime-shutdown'
)
@acm @acm
# @api_frame # @api_frame
async def open_nursery( async def open_nursery(
hide_tb: bool = True, *, # named params only!
hide_tb: bool = False,
**kwargs, **kwargs,
# ^TODO, paramspec for `open_root_actor()` # ^TODO, paramspec for `open_root_actor()`
@ -631,16 +790,21 @@ async def open_nursery(
# mark us for teardown on exit # mark us for teardown on exit
implicit_runtime: bool = True implicit_runtime: bool = True
async with open_root_actor( async with (
# collapse_eg(hide_tb=hide_tb),
open_root_actor(
hide_tb=hide_tb, hide_tb=hide_tb,
**kwargs, **kwargs,
) as actor: ) as actor,
):
assert actor is current_actor() assert actor is current_actor()
try: try:
async with _open_and_supervise_one_cancels_all_nursery( async with (
_open_and_supervise_one_cancels_all_nursery(
actor actor
) as an: ) as an
):
# NOTE: mark this nursery as having # NOTE: mark this nursery as having
# implicitly started the root actor so # implicitly started the root actor so
@ -679,17 +843,26 @@ async def open_nursery(
): ):
__tracebackhide__: bool = False __tracebackhide__: bool = False
msg: str = (
'Actor-nursery exited\n' op_nested_an_repr: str = _pformat.nest_from_op(
f'|_{an}\n' input_op=')>',
text=f'{an}',
# nest_prefix='|_',
nest_indent=1, # under >
) )
an_msg: str = (
f'Actor-nursery exited\n'
f'{op_nested_an_repr}\n'
)
# keep noise low during std operation.
log.runtime(an_msg)
if implicit_runtime: if implicit_runtime:
# shutdown runtime if it was started and report noisily # shutdown runtime if it was started and report noisily
# that we did so. # that we did so.
msg += '=> Shutting down actor runtime <=\n' msg: str = (
'\n'
'\n'
f'{_shutdown_msg} )>\n'
)
log.info(msg) log.info(msg)
else:
# keep noise low during std operation.
log.runtime(msg)


@ -237,9 +237,9 @@ def enable_stack_on_sig(
try: try:
import stackscope import stackscope
except ImportError: except ImportError:
log.error( log.warning(
'`stackscope` not installed for use in debug mode!\n' 'The `stackscope` lib is not installed!\n'
'`Ignoring {enable_stack_on_sig!r} call!\n' 'Ignoring `enable_stack_on_sig()` call!\n'
) )
return None return None


@ -250,7 +250,7 @@ async def _maybe_enter_pm(
*, *,
tb: TracebackType|None = None, tb: TracebackType|None = None,
api_frame: FrameType|None = None, api_frame: FrameType|None = None,
hide_tb: bool = False, hide_tb: bool = True,
# only enter debugger REPL when returns `True` # only enter debugger REPL when returns `True`
debug_filter: Callable[ debug_filter: Callable[


@ -58,6 +58,7 @@ from tractor._context import Context
from tractor import _state from tractor import _state
from tractor._exceptions import ( from tractor._exceptions import (
NoRuntime, NoRuntime,
InternalError,
) )
from tractor._state import ( from tractor._state import (
current_actor, current_actor,
@ -79,6 +80,9 @@ from ._sigint import (
sigint_shield as sigint_shield, sigint_shield as sigint_shield,
_ctlc_ignore_header as _ctlc_ignore_header _ctlc_ignore_header as _ctlc_ignore_header
) )
from ..pformat import (
ppfmt,
)
if TYPE_CHECKING: if TYPE_CHECKING:
from trio.lowlevel import Task from trio.lowlevel import Task
@ -477,12 +481,12 @@ async def _pause(
# we have to figure out how to avoid having the service nursery # we have to figure out how to avoid having the service nursery
# cancel on this task start? I *think* this works below: # cancel on this task start? I *think* this works below:
# ```python # ```python
# actor._service_n.cancel_scope.shield = shield # actor._service_tn.cancel_scope.shield = shield
# ``` # ```
# but not entirely sure if that's a sane way to implement it? # but not entirely sure if that's a sane way to implement it?
# NOTE currently we spawn the lock request task inside this # NOTE currently we spawn the lock request task inside this
# subactor's global `Actor._service_n` so that the # subactor's global `Actor._service_tn` so that the
# lifetime of the lock-request can outlive the current # lifetime of the lock-request can outlive the current
# `._pause()` scope while the user steps through their # `._pause()` scope while the user steps through their
# application code and when they finally exit the # application code and when they finally exit the
@ -506,7 +510,7 @@ async def _pause(
f'|_{task}\n' f'|_{task}\n'
) )
with trio.CancelScope(shield=shield): with trio.CancelScope(shield=shield):
req_ctx: Context = await actor._service_n.start( req_ctx: Context = await actor._service_tn.start(
partial( partial(
request_root_stdio_lock, request_root_stdio_lock,
actor_uid=actor.uid, actor_uid=actor.uid,
@ -540,7 +544,7 @@ async def _pause(
_repl_fail_report = None _repl_fail_report = None
# when the actor is mid-runtime cancellation the # when the actor is mid-runtime cancellation the
# `Actor._service_n` might get closed before we can spawn # `Actor._service_tn` might get closed before we can spawn
# the request task, so just ignore expected RTE. # the request task, so just ignore expected RTE.
elif ( elif (
isinstance(pause_err, RuntimeError) isinstance(pause_err, RuntimeError)
@ -985,7 +989,7 @@ def pause_from_sync(
# that output and assign the `repl` created above! # that output and assign the `repl` created above!
bg_task, _ = trio.from_thread.run( bg_task, _ = trio.from_thread.run(
afn=partial( afn=partial(
actor._service_n.start, actor._service_tn.start,
partial( partial(
_pause_from_bg_root_thread, _pause_from_bg_root_thread,
behalf_of_thread=thread, behalf_of_thread=thread,
@ -1153,9 +1157,10 @@ def pause_from_sync(
'use_greenback', 'use_greenback',
False, False,
): ):
raise RuntimeError( raise InternalError(
'`greenback` was never initialized in this actor!?\n\n' f'`greenback` was never initialized in this actor?\n'
f'{_state._runtime_vars}\n' f'\n'
f'{ppfmt(_state._runtime_vars)}\n'
) from rte ) from rte
raise raise


@ -101,11 +101,27 @@ class Channel:
# ^XXX! ONLY set if a remote actor sends an `Error`-msg # ^XXX! ONLY set if a remote actor sends an `Error`-msg
self._closed: bool = False self._closed: bool = False
# flag set by ``Portal.cancel_actor()`` indicating remote # flag set by `Portal.cancel_actor()` indicating remote
# (possibly peer) cancellation of the far end actor # (possibly peer) cancellation of the far end actor runtime.
# runtime.
self._cancel_called: bool = False self._cancel_called: bool = False
@property
def closed(self) -> bool:
'''
Was `.aclose()` successfully called?
'''
return self._closed
@property
def cancel_called(self) -> bool:
'''
Set when `Portal.cancel_actor()` is called on a portal which
wraps this IPC channel.
'''
return self._cancel_called
@property @property
def uid(self) -> tuple[str, str]: def uid(self) -> tuple[str, str]:
''' '''
@ -169,12 +185,26 @@ class Channel:
addr, addr,
**kwargs, **kwargs,
) )
assert transport.raddr == addr # XXX, for UDS *no!* since we recv the peer-pid and build out
# a new addr..
# assert transport.raddr == addr
chan = Channel(transport=transport) chan = Channel(transport=transport)
# ?TODO, compact this into adapter level-methods?
# -[ ] would avoid extra repr-calcs if level not active?
# |_ how would the `calc_if_level` look though? func?
if log.at_least_level('runtime'):
from tractor.devx import (
pformat as _pformat,
)
chan_repr: str = _pformat.nest_from_op(
input_op='[>',
text=chan.pformat(),
nest_indent=1,
)
log.runtime( log.runtime(
f'Connected channel IPC transport\n' f'Connected channel IPC transport\n'
f'[>\n' f'{chan_repr}'
f' |_{chan}\n'
) )
return chan return chan
@ -196,9 +226,12 @@ class Channel:
self._transport.codec = orig self._transport.codec = orig
# TODO: do a .src/.dst: str for maddrs? # TODO: do a .src/.dst: str for maddrs?
def pformat(self) -> str: def pformat(
self,
privates: bool = False,
) -> str:
if not self._transport: if not self._transport:
return '<Channel with inactive transport?>' return '<Channel( with inactive transport? )>'
tpt: MsgTransport = self._transport tpt: MsgTransport = self._transport
tpt_name: str = type(tpt).__name__ tpt_name: str = type(tpt).__name__
@ -206,26 +239,35 @@ class Channel:
'connected' if self.connected() 'connected' if self.connected()
else 'closed' else 'closed'
) )
return ( repr_str: str = (
f'<Channel(\n' f'<Channel(\n'
f' |_status: {tpt_status!r}\n' f' |_status: {tpt_status!r}\n'
) + (
f' _closed={self._closed}\n' f' _closed={self._closed}\n'
f' _cancel_called={self._cancel_called}\n' f' _cancel_called={self._cancel_called}\n'
f'\n' if privates else ''
f' |_peer: {self.aid}\n' ) + ( # peer-actor (processs) section
f'\n' f' |_peer: {self.aid.reprol()!r}\n'
if self.aid else ' |_peer: <unknown>\n'
) + (
f' |_msgstream: {tpt_name}\n' f' |_msgstream: {tpt_name}\n'
f' proto={tpt.laddr.proto_key!r}\n' f' maddr: {tpt.maddr!r}\n'
f' layer={tpt.layer_key!r}\n' f' proto: {tpt.laddr.proto_key!r}\n'
f' laddr={tpt.laddr}\n' f' layer: {tpt.layer_key!r}\n'
f' raddr={tpt.raddr}\n' f' codec: {tpt.codec_key!r}\n'
f' codec={tpt.codec_key!r}\n' f' .laddr={tpt.laddr}\n'
f' stream={tpt.stream}\n' f' .raddr={tpt.raddr}\n'
f' maddr={tpt.maddr!r}\n' ) + (
f' drained={tpt.drained}\n' f' ._transport.stream={tpt.stream}\n'
f' ._transport.drained={tpt.drained}\n'
if privates else ''
) + (
f' _send_lock={tpt._send_lock.statistics()}\n' f' _send_lock={tpt._send_lock.statistics()}\n'
f')>\n' if privates else ''
) + (
')>\n'
) )
return repr_str
# NOTE: making this return a value that can be passed to # NOTE: making this return a value that can be passed to
# `eval()` is entirely **optional** FYI! # `eval()` is entirely **optional** FYI!
@ -247,6 +289,10 @@ class Channel:
def raddr(self) -> Address|None: def raddr(self) -> Address|None:
return self._transport.raddr if self._transport else None return self._transport.raddr if self._transport else None
@property
def maddr(self) -> str:
return self._transport.maddr if self._transport else '<no-tpt>'
# TODO: something like, # TODO: something like,
# `pdbp.hideframe_on(errors=[MsgTypeError])` # `pdbp.hideframe_on(errors=[MsgTypeError])`
# instead of the `try/except` hack we have rn.. # instead of the `try/except` hack we have rn..
@ -257,7 +303,7 @@ class Channel:
self, self,
payload: Any, payload: Any,
hide_tb: bool = True, hide_tb: bool = False,
) -> None: ) -> None:
''' '''
@ -434,8 +480,8 @@ class Channel:
await self.send(aid) await self.send(aid)
peer_aid: Aid = await self.recv() peer_aid: Aid = await self.recv()
log.runtime( log.runtime(
f'Received handshake with peer actor,\n' f'Received handshake with peer\n'
f'{peer_aid}\n' f'<= {peer_aid.reprol(sin_uuid=False)}\n'
) )
# NOTE, we always are referencing the remote peer! # NOTE, we always are referencing the remote peer!
self.aid = peer_aid self.aid = peer_aid


@ -17,13 +17,38 @@
Utils to tame mp non-SC madness Utils to tame mp non-SC madness
''' '''
import platform
def disable_mantracker(): def disable_mantracker():
''' '''
Disable all ``multiprocessing``` "resource tracking" machinery since Disable all `multiprocessing` "resource tracking" machinery since
it's an absolute multi-threaded mess of non-SC madness. it's an absolute multi-threaded mess of non-SC madness.
''' '''
from multiprocessing import resource_tracker as mantracker from multiprocessing.shared_memory import SharedMemory
# 3.13+ only.. can pass `track=False` to disable
# all the resource tracker bs.
# https://docs.python.org/3/library/multiprocessing.shared_memory.html
if (_py_313 := (
tuple(map(int, platform.python_version_tuple()[:2]))
>=
(3, 13)
)
):
from functools import partial
return partial(
SharedMemory,
track=False,
)
# !TODO, once we drop 3.12- we can obvi remove all this!
else:
from multiprocessing import (
resource_tracker as mantracker,
)
# Tell the "resource tracker" thing to fuck off. # Tell the "resource tracker" thing to fuck off.
class ManTracker(mantracker.ResourceTracker): class ManTracker(mantracker.ResourceTracker):
@ -43,3 +68,8 @@ def disable_mantracker():
mantracker.ensure_running = mantracker._resource_tracker.ensure_running mantracker.ensure_running = mantracker._resource_tracker.ensure_running
mantracker.unregister = mantracker._resource_tracker.unregister mantracker.unregister = mantracker._resource_tracker.unregister
mantracker.getfd = mantracker._resource_tracker.getfd mantracker.getfd = mantracker._resource_tracker.getfd
# use std type verbatim
shmT = SharedMemory
return shmT


@ -26,7 +26,7 @@ from contextlib import (
from functools import partial from functools import partial
from itertools import chain from itertools import chain
import inspect import inspect
from pprint import pformat import textwrap
from types import ( from types import (
ModuleType, ModuleType,
) )
@ -43,7 +43,10 @@ from trio import (
SocketListener, SocketListener,
) )
# from ..devx import debug from ..devx.pformat import (
ppfmt,
nest_from_op,
)
from .._exceptions import ( from .._exceptions import (
TransportClosed, TransportClosed,
) )
@ -141,9 +144,8 @@ async def maybe_wait_on_canced_subs(
): ):
log.cancel( log.cancel(
'Waiting on cancel request to peer..\n' 'Waiting on cancel request to peer\n'
f'c)=>\n' f'c)=> {chan.aid.reprol()}@[{chan.maddr}]\n'
f' |_{chan.aid}\n'
) )
# XXX: this is a soft wait on the channel (and its # XXX: this is a soft wait on the channel (and its
@ -179,7 +181,7 @@ async def maybe_wait_on_canced_subs(
log.warning( log.warning(
'Draining msg from disconnected peer\n' 'Draining msg from disconnected peer\n'
f'{chan_info}' f'{chan_info}'
f'{pformat(msg)}\n' f'{ppfmt(msg)}\n'
) )
# cid: str|None = msg.get('cid') # cid: str|None = msg.get('cid')
cid: str|None = msg.cid cid: str|None = msg.cid
@ -248,7 +250,7 @@ async def maybe_wait_on_canced_subs(
if children := local_nursery._children: if children := local_nursery._children:
# indent from above local-nurse repr # indent from above local-nurse repr
report += ( report += (
f' |_{pformat(children)}\n' f' |_{ppfmt(children)}\n'
) )
log.warning(report) log.warning(report)
@ -279,8 +281,9 @@ async def maybe_wait_on_canced_subs(
log.runtime( log.runtime(
f'Peer IPC broke but subproc is alive?\n\n' f'Peer IPC broke but subproc is alive?\n\n'
f'<=x {chan.aid}@{chan.raddr}\n' f'<=x {chan.aid.reprol()}@[{chan.maddr}]\n'
f' |_{proc}\n' f'\n'
f'{proc}\n'
) )
return local_nursery return local_nursery
@ -324,9 +327,10 @@ async def handle_stream_from_peer(
chan = Channel.from_stream(stream) chan = Channel.from_stream(stream)
con_status: str = ( con_status: str = (
'New inbound IPC connection <=\n' f'New inbound IPC transport connection\n'
f'|_{chan}\n' f'<=( {stream!r}\n'
) )
con_status_steps: str = ''
# initial handshake with peer phase # initial handshake with peer phase
try: try:
@ -372,7 +376,7 @@ async def handle_stream_from_peer(
if _pre_chan := server._peers.get(uid): if _pre_chan := server._peers.get(uid):
familiar: str = 'pre-existing-peer' familiar: str = 'pre-existing-peer'
uid_short: str = f'{uid[0]}[{uid[1][-6:]}]' uid_short: str = f'{uid[0]}[{uid[1][-6:]}]'
con_status += ( con_status_steps += (
f' -> Handshake with {familiar} `{uid_short}` complete\n' f' -> Handshake with {familiar} `{uid_short}` complete\n'
) )
@ -397,7 +401,7 @@ async def handle_stream_from_peer(
None, None,
) )
if event: if event:
con_status += ( con_status_steps += (
' -> Waking subactor spawn waiters: ' ' -> Waking subactor spawn waiters: '
f'{event.statistics().tasks_waiting}\n' f'{event.statistics().tasks_waiting}\n'
f' -> Registered IPC chan for child actor {uid}@{chan.raddr}\n' f' -> Registered IPC chan for child actor {uid}@{chan.raddr}\n'
@ -408,7 +412,7 @@ async def handle_stream_from_peer(
event.set() event.set()
else: else:
con_status += ( con_status_steps += (
f' -> Registered IPC chan for peer actor {uid}@{chan.raddr}\n' f' -> Registered IPC chan for peer actor {uid}@{chan.raddr}\n'
) # type: ignore ) # type: ignore
@ -422,8 +426,15 @@ async def handle_stream_from_peer(
# TODO: can we just use list-ref directly? # TODO: can we just use list-ref directly?
chans.append(chan) chans.append(chan)
con_status += ' -> Entering RPC msg loop..\n' con_status_steps += ' -> Entering RPC msg loop..\n'
log.runtime(con_status) log.runtime(
con_status
+
textwrap.indent(
con_status_steps,
prefix=' '*3, # align to first-ln
)
)
# Begin channel management - respond to remote requests and # Begin channel management - respond to remote requests and
# process received responses. # process received responses.
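`handle_stream_from_peer()` now collects handshake progress in a separate `con_status_steps` string and indents it once with `textwrap.indent()` when the final log line is emitted. The following sketches that formatting step. The header and step text are placeholders, and `format_con_status()` is a hypothetical helper name.

```python
import textwrap


def format_con_status(header: str, steps: list[str]) -> str:
    '''
    Join a multi-line header with a block of progress steps,
    prefixing every non-blank step line with 3 spaces.
    '''
    con_status_steps: str = ''.join(steps)
    return header + textwrap.indent(
        con_status_steps,
        prefix=' ' * 3,  # same 3-column prefix as the diff's call
    )


report: str = format_con_status(
    'New inbound IPC transport connection\n'
    '<=( <stream>\n',
    [
        ' -> Handshake with peer complete\n',
        ' -> Entering RPC msg loop..\n',
    ],
)
```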
@ -456,41 +467,67 @@ async def handle_stream_from_peer(
disconnected=disconnected, disconnected=disconnected,
) )
# ``Channel`` teardown and closure sequence # `Channel` teardown and closure sequence
# drop ref to channel so it can be gc-ed and disconnected # drop ref to channel so it can be gc-ed and disconnected
con_teardown_status: str = ( #
f'IPC channel disconnected:\n' # -[x]TODO mk this be like
f'<=x uid: {chan.aid}\n' # <=x Channel(
f' |_{pformat(chan)}\n\n' # |_field: blah
# )>
op_repr: str = '<=x '
chan_repr: str = nest_from_op(
input_op=op_repr,
op_suffix='',
nest_prefix='',
text=chan.pformat(),
nest_indent=len(op_repr)-1,
rm_from_first_ln='<',
) )
con_teardown_status: str = (
f'IPC channel disconnect\n'
f'\n'
f'{chan_repr}\n'
f'\n'
)
chans.remove(chan) chans.remove(chan)
# TODO: do we need to be this pedantic? # TODO: do we need to be this pedantic?
if not chans: if not chans:
con_teardown_status += ( con_teardown_status += (
f'-> No more channels with {chan.aid}' f'-> No more channels with {chan.aid.reprol()!r}\n'
) )
server._peers.pop(uid, None) server._peers.pop(uid, None)
peers_str: str = '' if peers := list(server._peers.values()):
for uid, chans in server._peers.items(): peer_cnt: int = len(peers)
peers_str += ( if (
f'uid: {uid}\n' (first := peers[0][0]) is not chan
) and
for i, chan in enumerate(chans): not disconnected
peers_str += ( and
f' |_[{i}] {pformat(chan)}\n' peer_cnt > 1
) ):
con_teardown_status += ( con_teardown_status += (
f'-> Remaining IPC {len(server._peers)} peers: {peers_str}\n' f'-> Remaining IPC {peer_cnt-1!r} peers:\n'
)
for chans in server._peers.values():
first: Channel = chans[0]
if not (
first is chan
and
disconnected
):
con_teardown_status += (
f' |_{first.aid.reprol()!r} -> {len(chans)!r} chans\n'
) )
# No more channels to other actors (at all) registered # No more channels to other actors (at all) registered
# as connected. # as connected.
if not server._peers: if not server._peers:
con_teardown_status += ( con_teardown_status += (
'Signalling no more peer channel connections' '-> Signalling no more peer connections!\n'
) )
server._no_more_peers.set() server._no_more_peers.set()
@ -579,10 +616,10 @@ async def handle_stream_from_peer(
class Endpoint(Struct): class Endpoint(Struct):
''' '''
An instance of an IPC "bound" address where the lifetime of the An instance of an IPC "bound" address where the lifetime of an
"ability to accept connections" (from clients) and then handle "ability to accept connections" and handle the subsequent
those inbound sessions or sequences-of-packets is determined by sequence-of-packets (maybe oriented as sessions) is determined by
a (maybe pair of) nurser(y/ies). the underlying nursery scope(s).
''' '''
addr: Address addr: Address
@ -600,6 +637,24 @@ class Endpoint(Struct):
MsgTransport, # handle to encoded-msg transport stream MsgTransport, # handle to encoded-msg transport stream
] = {} ] = {}
def pformat(
self,
indent: int = 0,
privates: bool = False,
) -> str:
type_repr: str = type(self).__name__
fmtstr: str = (
# !TODO, always be ns aware!
# f'|_netns: {netns}\n'
f' |.addr: {self.addr!r}\n'
f' |_peers: {len(self.peer_tpts)}\n'
)
return (
f'<{type_repr}(\n'
f'{fmtstr}'
f')>'
)
async def start_listener(self) -> SocketListener: async def start_listener(self) -> SocketListener:
tpt_mod: ModuleType = inspect.getmodule(self.addr) tpt_mod: ModuleType = inspect.getmodule(self.addr)
lstnr: SocketListener = await tpt_mod.start_listener( lstnr: SocketListener = await tpt_mod.start_listener(
@ -639,11 +694,13 @@ class Endpoint(Struct):
class Server(Struct): class Server(Struct):
_parent_tn: Nursery _parent_tn: Nursery
_stream_handler_tn: Nursery _stream_handler_tn: Nursery
# level-triggered sig for whether "no peers are currently # level-triggered sig for whether "no peers are currently
# connected"; field is **always** set to an instance but # connected"; field is **always** set to an instance but
# initialized with `.is_set() == True`. # initialized with `.is_set() == True`.
_no_more_peers: trio.Event _no_more_peers: trio.Event
# active eps as allocated by `.listen_on()`
_endpoints: list[Endpoint] = [] _endpoints: list[Endpoint] = []
# connection tracking & mgmt # connection tracking & mgmt
@ -651,12 +708,19 @@ class Server(Struct):
str, # uaid str, # uaid
list[Channel], # IPC conns from peer list[Channel], # IPC conns from peer
] = defaultdict(list) ] = defaultdict(list)
# events-table with entries registered unset while the local
# actor is waiting on a new actor to inbound connect, often
# a parent waiting on its child just after spawn.
_peer_connected: dict[ _peer_connected: dict[
tuple[str, str], tuple[str, str],
trio.Event, trio.Event,
] = {} ] = {}
# syncs for setup/teardown sequences # syncs for setup/teardown sequences
# - null when not yet booted,
# - unset when active,
# - set when fully shutdown with 0 eps active.
_shutdown: trio.Event|None = None _shutdown: trio.Event|None = None
# TODO, maybe just make `._endpoints: list[Endpoint]` and # TODO, maybe just make `._endpoints: list[Endpoint]` and
@ -664,7 +728,6 @@ class Server(Struct):
# @property # @property
# def addrs2eps(self) -> dict[Address, Endpoint]: # def addrs2eps(self) -> dict[Address, Endpoint]:
# ... # ...
@property @property
def proto_keys(self) -> list[str]: def proto_keys(self) -> list[str]:
return [ return [
@ -690,7 +753,7 @@ class Server(Struct):
# TODO: obvi a different server type when we eventually # TODO: obvi a different server type when we eventually
# support some others XD # support some others XD
log.runtime( log.runtime(
f'Cancelling server(s) for\n' f'Cancelling server(s) for tpt-protos\n'
f'{self.proto_keys!r}\n' f'{self.proto_keys!r}\n'
) )
self._parent_tn.cancel_scope.cancel() self._parent_tn.cancel_scope.cancel()
@ -717,6 +780,14 @@ class Server(Struct):
f'protos: {tpt_protos!r}\n' f'protos: {tpt_protos!r}\n'
) )
def len_peers(
self,
) -> int:
return len([
chan
for chan in chain(*self._peers.values())
if chan.connected()
])
def has_peers( def has_peers(
self, self,
check_chans: bool = False, check_chans: bool = False,
@ -730,13 +801,11 @@ class Server(Struct):
has_peers has_peers
and and
check_chans check_chans
): ):
has_peers: bool = ( has_peers: bool = (
any(chan.connected() self.len_peers() > 0
for chan in chain(
*self._peers.values()
)
)
and and
has_peers has_peers
) )
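`Server.len_peers()` and `has_peers(check_chans=True)` appear intended to report only channels that are still connected, as the replaced `any(chan.connected() ...)` check did. The sketch below shows that counting with a hypothetical `FakeChan` that exposes only `.connected()`. Summing booleans counts the `True` entries. The count is also computed unconditionally inside the `check_chans` branch, so a zero result actually flips `has_peers` to `False`.

```python
from dataclasses import dataclass
from itertools import chain


@dataclass
class FakeChan:
    '''Hypothetical stand-in exposing only `.connected()`.'''
    up: bool

    def connected(self) -> bool:
        return self.up


def len_connected(peers: dict[str, list[FakeChan]]) -> int:
    # `True == 1` and `False == 0`, so this counts live channels only
    return sum(
        chan.connected()
        for chan in chain(*peers.values())
    )


def has_live_peers(
    peers: dict[str, list[FakeChan]],
    check_chans: bool = False,
) -> bool:
    has_peers: bool = bool(peers)
    if has_peers and check_chans:
        # always evaluate the count so 0 live chans -> False
        has_peers = len_connected(peers) > 0
    return has_peers
```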
@ -745,10 +814,14 @@ class Server(Struct):
async def wait_for_no_more_peers( async def wait_for_no_more_peers(
self, self,
shield: bool = False, # XXX, should this even be allowed?
# -> i've seen it cause hangs on teardown
# in `test_resource_cache.py`
# _shield: bool = False,
) -> None: ) -> None:
with trio.CancelScope(shield=shield):
await self._no_more_peers.wait() await self._no_more_peers.wait()
# with trio.CancelScope(shield=_shield):
# await self._no_more_peers.wait()
async def wait_for_peer( async def wait_for_peer(
self, self,
@ -803,30 +876,66 @@ class Server(Struct):
return ev.is_set() return ev.is_set()
def pformat(self) -> str: @property
def repr_state(self) -> str:
'''
A `str`-status describing the current state of this
IPC server in terms of the current operating "phase".
'''
status = 'server is active'
if self.has_peers():
peer_cnt: int = self.len_peers()
status: str = (
f'{peer_cnt!r} peer chans'
)
else:
status: str = 'No peer chans'
if self.is_shutdown():
status: str = 'server-shutdown'
return status
def pformat(
self,
privates: bool = False,
) -> str:
eps: list[Endpoint] = self._endpoints eps: list[Endpoint] = self._endpoints
state_repr: str = ( # state_repr: str = (
f'{len(eps)!r} IPC-endpoints active' # f'{len(eps)!r} endpoints active'
) # )
fmtstr = ( fmtstr = (
f' |_state: {state_repr}\n' f' |_state: {self.repr_state!r}\n'
f' no_more_peers: {self.has_peers()}\n'
) )
if privates:
fmtstr += f' no_more_peers: {self.has_peers()}\n'
if self._shutdown is not None: if self._shutdown is not None:
shutdown_stats: EventStatistics = self._shutdown.statistics() shutdown_stats: EventStatistics = self._shutdown.statistics()
fmtstr += ( fmtstr += (
f' task_waiting_on_shutdown: {shutdown_stats}\n' f' task_waiting_on_shutdown: {shutdown_stats}\n'
) )
if eps := self._endpoints:
addrs: list[tuple] = [
ep.addr for ep in eps
]
repr_eps: str = ppfmt(addrs)
fmtstr += ( fmtstr += (
# TODO, use the `ppfmt()` helper from `modden`! f' |_endpoints: {repr_eps}\n'
f' |_endpoints: {pformat(self._endpoints)}\n' # ^TODO? how to indent closing ']'..
f' |_peers: {len(self._peers)} connected\n' )
if peers := self._peers:
fmtstr += (
f' |_peers: {len(peers)} connected\n'
) )
return ( return (
f'<IPCServer(\n' f'<Server(\n'
f'{fmtstr}' f'{fmtstr}'
f')>\n' f')>\n'
) )
@ -885,24 +994,34 @@ class Server(Struct):
) )
log.runtime( log.runtime(
f'Binding to endpoints for,\n' f'Binding endpoints\n'
f'{accept_addrs}\n' f'{ppfmt(accept_addrs)}\n'
) )
eps: list[Endpoint] = await self._parent_tn.start( eps: list[Endpoint] = await self._parent_tn.start(
partial( partial(
_serve_ipc_eps, _serve_ipc_eps,
server=self, server=self,
stream_handler_tn=stream_handler_nursery, stream_handler_tn=(
stream_handler_nursery
or
self._stream_handler_tn
),
listen_addrs=accept_addrs, listen_addrs=accept_addrs,
) )
) )
self._endpoints.extend(eps)
serv_repr: str = nest_from_op(
input_op='(>',
text=self.pformat(),
nest_indent=1,
)
log.runtime( log.runtime(
f'Started IPC endpoints\n' f'Started IPC server\n'
f'{eps}\n' f'{serv_repr}'
) )
self._endpoints.extend(eps) # XXX, a little sanity on new ep allocations
# XXX, just a little bit of sanity
group_tn: Nursery|None = None group_tn: Nursery|None = None
ep: Endpoint ep: Endpoint
for ep in eps: for ep in eps:
@ -956,9 +1075,13 @@ async def _serve_ipc_eps(
stream_handler_tn=stream_handler_tn, stream_handler_tn=stream_handler_tn,
) )
try: try:
ep_sclang: str = nest_from_op(
input_op='>[',
text=f'{ep.pformat()}',
)
log.runtime( log.runtime(
f'Starting new endpoint listener\n' f'Starting new endpoint listener\n'
f'{ep}\n' f'{ep_sclang}\n'
) )
listener: trio.abc.Listener = await ep.start_listener() listener: trio.abc.Listener = await ep.start_listener()
assert listener is ep._listener assert listener is ep._listener
@ -996,17 +1119,6 @@ async def _serve_ipc_eps(
handler_nursery=stream_handler_tn handler_nursery=stream_handler_tn
) )
) )
# TODO, wow make this message better! XD
log.runtime(
'Started server(s)\n'
+
'\n'.join([f'|_{addr}' for addr in listen_addrs])
)
log.runtime(
f'Started IPC endpoints\n'
f'{eps}\n'
)
task_status.started( task_status.started(
eps, eps,
) )
@ -1037,20 +1149,23 @@ async def open_ipc_server(
async with maybe_open_nursery( async with maybe_open_nursery(
nursery=parent_tn, nursery=parent_tn,
) as rent_tn: ) as parent_tn:
no_more_peers = trio.Event() no_more_peers = trio.Event()
no_more_peers.set() no_more_peers.set()
ipc_server = IPCServer( ipc_server = IPCServer(
_parent_tn=rent_tn, _parent_tn=parent_tn,
_stream_handler_tn=stream_handler_tn or rent_tn, _stream_handler_tn=(
stream_handler_tn
or
parent_tn
),
_no_more_peers=no_more_peers, _no_more_peers=no_more_peers,
) )
try: try:
yield ipc_server yield ipc_server
log.runtime( log.runtime(
f'Waiting on server to shutdown or be cancelled..\n' 'Server-tn running until terminated\n'
f'{ipc_server}'
) )
# TODO? when if ever would we want/need this? # TODO? when if ever would we want/need this?
# with trio.CancelScope(shield=True): # with trio.CancelScope(shield=True):
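`open_ipc_server()` and the `stream_handler_nursery or self._stream_handler_tn` fallback in the endpoint-binding path follow the same rule: use a caller-provided nursery when one is given, and otherwise open and own a fresh one. That "use it if provided, else open one" shape can be sketched with the stdlib alone. The names `maybe_open_resource()`, `open_resource()` and `Resource` are hypothetical, and `asyncio` stands in for `trio` only so the sketch runs without extra dependencies.

```python
from __future__ import annotations

import asyncio
from contextlib import asynccontextmanager


class Resource:
    '''Hypothetical stand-in for a nursery-like scope.'''

    def __init__(self) -> None:
        self.closed = False


@asynccontextmanager
async def open_resource():
    res = Resource()
    try:
        yield res
    finally:
        res.closed = True


@asynccontextmanager
async def maybe_open_resource(resource: Resource | None = None):
    # caller-provided scope: yield it unchanged and leave its
    # lifetime to the caller.
    if resource is not None:
        yield resource
        return
    # otherwise open (and own) a fresh one for this block only.
    async with open_resource() as res:
        yield res
```

With this design, only the code that opened the scope closes it. A shared, caller-owned scope survives the inner block.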


@ -23,14 +23,15 @@ considered optional within the context of this runtime-library.
""" """
from __future__ import annotations from __future__ import annotations
from multiprocessing import shared_memory as shm
from multiprocessing.shared_memory import (
# SharedMemory,
ShareableList,
)
import platform
from sys import byteorder from sys import byteorder
import time import time
from typing import Optional from typing import Optional
from multiprocessing import shared_memory as shm
from multiprocessing.shared_memory import (
SharedMemory,
ShareableList,
)
from msgspec import ( from msgspec import (
Struct, Struct,
@ -61,7 +62,7 @@ except ImportError:
log = get_logger(__name__) log = get_logger(__name__)
disable_mantracker() SharedMemory = disable_mantracker()
class SharedInt: class SharedInt:
@ -789,10 +790,22 @@ def open_shm_list(
readonly=readonly, readonly=readonly,
) )
# TODO, factor into a @actor_fixture acm-API?
# -[ ] also `@maybe_actor_fixture()` which inludes
# the .current_actor() convenience check?
# |_ orr can that just be in the sin-maybe-version?
#
# "close" attached shm on actor teardown # "close" attached shm on actor teardown
try: try:
actor = tractor.current_actor() actor = tractor.current_actor()
actor.lifetime_stack.callback(shml.shm.close) actor.lifetime_stack.callback(shml.shm.close)
# XXX on 3.13+ we don't need to call this?
# -> bc we pass `track=False` for `SharedMemeory` orr?
if (
platform.python_version_tuple()[:-1] < ('3', '13')
):
actor.lifetime_stack.callback(shml.shm.unlink) actor.lifetime_stack.callback(shml.shm.unlink)
except RuntimeError: except RuntimeError:
log.warning('tractor runtime not active, skipping teardown steps') log.warning('tractor runtime not active, skipping teardown steps')
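The new `unlink` guard compares `platform.python_version_tuple()[:-1]` against `('3', '13')`. Both are tuples of strings, so they compare lexicographically. For example, `('3', '9') < ('3', '13')` is `False` because `'9' > '1'`. Below is a minimal sketch of the same pre-3.13 gate using integer comparison via `sys.version_info`. The helper name `needs_manual_unlink()` is hypothetical. The cutoff follows the diff's comment about passing `track=False` to `SharedMemory`, a parameter added in Python 3.13.

```python
import sys


def needs_manual_unlink(version=None) -> bool:
    '''
    Hypothetical helper: `True` on interpreters older than 3.13,
    mirroring the `< ('3', '13')` cutoff used in the diff.
    '''
    if version is None:
        version = sys.version_info[:2]
    # integer tuples compare numerically, e.g. (3, 9) < (3, 13)
    return tuple(version) < (3, 13)
```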


@ -160,10 +160,9 @@ async def start_listener(
Start a TCP socket listener on the given `TCPAddress`. Start a TCP socket listener on the given `TCPAddress`.
''' '''
log.info( log.runtime(
f'Attempting to bind TCP socket\n' f'Trying socket bind\n'
f'>[\n' f'>[ {addr}\n'
f'|_{addr}\n'
) )
# ?TODO, maybe we should just change the lower-level call this is # ?TODO, maybe we should just change the lower-level call this is
# using internall per-listener? # using internall per-listener?
@ -178,11 +177,10 @@ async def start_listener(
assert len(listeners) == 1 assert len(listeners) == 1
listener = listeners[0] listener = listeners[0]
host, port = listener.socket.getsockname()[:2] host, port = listener.socket.getsockname()[:2]
bound_addr: TCPAddress = type(addr).from_addr((host, port))
log.info( log.info(
f'Listening on TCP socket\n' f'Listening on TCP socket\n'
f'[>\n' f'[> {bound_addr}\n'
f' |_{addr}\n'
) )
return listener return listener
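The TCP listener now logs `bound_addr`, built from `listener.socket.getsockname()[:2]`, instead of the requested `addr`. The two differ when the request uses port `0` and the OS picks a free port. A stdlib-only sketch of the same idea follows. It uses a plain `socket` rather than `trio.open_tcp_listeners()`, and `bind_and_report()` is a hypothetical name.

```python
import socket


def bind_and_report(host: str = '127.0.0.1', port: int = 0) -> tuple[str, int]:
    '''
    Bind a listening TCP socket and return the address the OS actually
    assigned, which differs from the request when `port == 0`.
    '''
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen(1)
        # AF_INET `getsockname()` gives `(host, port)`; slicing to 2
        # items also drops the extra AF_INET6 flowinfo/scope_id fields.
        bound_host, bound_port = sock.getsockname()[:2]
        return bound_host, bound_port
    finally:
        sock.close()
```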


@ -430,20 +430,25 @@ class MsgpackTransport(MsgTransport):
return await self.stream.send_all(size + bytes_data) return await self.stream.send_all(size + bytes_data)
except ( except (
trio.BrokenResourceError, trio.BrokenResourceError,
) as bre: trio.ClosedResourceError,
trans_err = bre ) as _re:
trans_err = _re
tpt_name: str = f'{type(self).__name__!r}' tpt_name: str = f'{type(self).__name__!r}'
match trans_err: match trans_err:
case trio.BrokenResourceError() if (
'[Errno 32] Broken pipe' in trans_err.args[0] # XXX, specifc to UDS transport and its,
# ^XXX, specifc to UDS transport and its,
# well, "speediness".. XD # well, "speediness".. XD
# |_ likely todo with races related to how fast # |_ likely todo with races related to how fast
# the socket is setup/torn-down on linux # the socket is setup/torn-down on linux
# as it pertains to rando pings from the # as it pertains to rando pings from the
# `.discovery` subsys and protos. # `.discovery` subsys and protos.
case trio.BrokenResourceError() if (
'[Errno 32] Broken pipe'
in
trans_err.args[0]
): ):
raise TransportClosed.from_src_exc( tpt_closed = TransportClosed.from_src_exc(
message=( message=(
f'{tpt_name} already closed by peer\n' f'{tpt_name} already closed by peer\n'
), ),
@ -451,14 +456,31 @@ class MsgpackTransport(MsgTransport):
src_exc=trans_err, src_exc=trans_err,
raise_on_report=True, raise_on_report=True,
loglevel='transport', loglevel='transport',
) from bre )
raise tpt_closed from trans_err
# case trio.ClosedResourceError() if (
# 'this socket was already closed'
# in
# trans_err.args[0]
# ):
# tpt_closed = TransportClosed.from_src_exc(
# message=(
# f'{tpt_name} already closed by peer\n'
# ),
# body=f'{self}\n',
# src_exc=trans_err,
# raise_on_report=True,
# loglevel='transport',
# )
# raise tpt_closed from trans_err
# unless the disconnect condition falls under "a # unless the disconnect condition falls under "a
# normal operation breakage" we usualy console warn # normal operation breakage" we usualy console warn
# about it. # about it.
case _: case _:
log.exception( log.exception(
'{tpt_name} layer failed pre-send ??\n' f'{tpt_name} layer failed pre-send ??\n'
) )
raise trans_err raise trans_err
@ -503,7 +525,7 @@ class MsgpackTransport(MsgTransport):
def pformat(self) -> str: def pformat(self) -> str:
return ( return (
f'<{type(self).__name__}(\n' f'<{type(self).__name__}(\n'
f' |_peers: 2\n' f' |_peers: 1\n'
f' laddr: {self._laddr}\n' f' laddr: {self._laddr}\n'
f' raddr: {self._raddr}\n' f' raddr: {self._raddr}\n'
# f'\n' # f'\n'


@ -18,6 +18,9 @@ Unix Domain Socket implementation of tractor.ipc._transport.MsgTransport protoco
''' '''
from __future__ import annotations from __future__ import annotations
from contextlib import (
contextmanager as cm,
)
from pathlib import Path from pathlib import Path
import os import os
from socket import ( from socket import (
@ -29,6 +32,7 @@ from socket import (
) )
import struct import struct
from typing import ( from typing import (
Type,
TYPE_CHECKING, TYPE_CHECKING,
ClassVar, ClassVar,
) )
@ -99,8 +103,6 @@ class UDSAddress(
self.filedir self.filedir
or or
self.def_bindspace self.def_bindspace
# or
# get_rt_dir()
) )
@property @property
@ -205,12 +207,35 @@ class UDSAddress(
f']' f']'
) )
@cm
def _reraise_as_connerr(
src_excs: tuple[Type[Exception]],
addr: UDSAddress,
):
try:
yield
except src_excs as src_exc:
raise ConnectionError(
f'Bad UDS socket-filepath-as-address ??\n'
f'{addr}\n'
f' |_sockpath: {addr.sockpath}\n'
f'\n'
f'from src: {src_exc!r}\n'
) from src_exc
async def start_listener( async def start_listener(
addr: UDSAddress, addr: UDSAddress,
**kwargs, **kwargs,
) -> SocketListener: ) -> SocketListener:
# sock = addr._sock = socket.socket( '''
Start listening for inbound connections via
a `trio.SocketListener` (task) which `socket.bind()`s on `addr`.
Note, if the `UDSAddress.bindspace: Path` directory dne it is
implicitly created.
'''
sock = socket.socket( sock = socket.socket(
socket.AF_UNIX, socket.AF_UNIX,
socket.SOCK_STREAM socket.SOCK_STREAM
@ -221,17 +246,25 @@ async def start_listener(
f'|_{addr}\n' f'|_{addr}\n'
) )
# ?TODO? should we use the `actor.lifetime_stack`
# to rm on shutdown?
bindpath: Path = addr.sockpath bindpath: Path = addr.sockpath
try: if not (bs := addr.bindspace).is_dir():
await sock.bind(str(bindpath)) log.info(
except ( 'Creating bindspace dir in file-sys\n'
f'>{{\n'
f'|_{bs!r}\n'
)
bs.mkdir()
with _reraise_as_connerr(
src_excs=(
FileNotFoundError, FileNotFoundError,
) as fdne: OSError,
raise ConnectionError( ),
f'Bad UDS socket-filepath-as-address ??\n' addr=addr
f'{addr}\n' ):
f' |_sockpath: {addr.sockpath}\n' await sock.bind(str(bindpath))
) from fdne
sock.listen(1) sock.listen(1)
log.info( log.info(
@ -356,27 +389,30 @@ class MsgpackUDSStream(MsgpackTransport):
# `.setsockopt()` call tells the OS provide it; the client # `.setsockopt()` call tells the OS provide it; the client
# pid can then be read on server/listen() side via # pid can then be read on server/listen() side via
# `get_peer_info()` above. # `get_peer_info()` above.
try:
with _reraise_as_connerr(
src_excs=(
FileNotFoundError,
),
addr=addr
):
stream = await open_unix_socket_w_passcred( stream = await open_unix_socket_w_passcred(
str(sockpath), str(sockpath),
**kwargs **kwargs
) )
except (
FileNotFoundError,
) as fdne:
raise ConnectionError(
f'Bad UDS socket-filepath-as-address ??\n'
f'{addr}\n'
f' |_sockpath: {sockpath}\n'
) from fdne
stream = MsgpackUDSStream( tpt_stream = MsgpackUDSStream(
stream, stream,
prefix_size=prefix_size, prefix_size=prefix_size,
codec=codec codec=codec
) )
stream._raddr = addr # XXX assign from new addrs after peer-PID extract!
return stream (
tpt_stream._laddr,
tpt_stream._raddr,
) = cls.get_stream_addrs(stream)
return tpt_stream
@classmethod @classmethod
def get_stream_addrs( def get_stream_addrs(
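The UDS changes above do two things. They create the bindspace directory if it is missing, and they route `bind()`/connect failures through the `_reraise_as_connerr()` context manager so callers see a `ConnectionError` chained to the original error. Below is a blocking, stdlib-only sketch of both steps, for POSIX only. `reraise_as_connerr()` and `bind_uds()` are hypothetical names, not the library's API.

```python
import socket
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def reraise_as_connerr(
    src_excs: tuple[type[BaseException], ...],
    sockpath: Path,
):
    # re-raise selected low-level errors as `ConnectionError`,
    # chaining the original via `from` so its traceback is kept.
    try:
        yield
    except src_excs as src_exc:
        raise ConnectionError(
            f'Bad UDS socket-filepath-as-address ??\n'
            f' |_sockpath: {sockpath}\n'
        ) from src_exc


def bind_uds(bindspace: Path, name: str) -> socket.socket:
    '''
    Create `bindspace` if missing, then bind and listen on a
    UNIX stream socket at `bindspace / name`.
    '''
    if not bindspace.is_dir():
        # `parents=True` also creates missing intermediate dirs,
        # which a bare `.mkdir()` would not.
        bindspace.mkdir(parents=True, exist_ok=True)
    sockpath = bindspace / name
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    with reraise_as_connerr((FileNotFoundError, OSError), sockpath):
        try:
            sock.bind(str(sockpath))
        except BaseException:
            sock.close()
            raise
    sock.listen(1)
    return sock
```

The `(FileNotFoundError, OSError)` tuple mirrors the diff. `OSError` alone already covers `FileNotFoundError`, including the `EADDRINUSE` raised when a stale socket file is still present.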


@ -81,10 +81,35 @@ BOLD_PALETTE = {
} }
def at_least_level(
log: Logger|LoggerAdapter,
level: int|str,
) -> bool:
'''
Predicate to test if a given level is active.
'''
if isinstance(level, str):
level: int = CUSTOM_LEVELS[level.upper()]
if log.getEffectiveLevel() <= level:
return True
return False
# TODO: this isn't showing the correct '{filename}' # TODO: this isn't showing the correct '{filename}'
# as it did before.. # as it did before..
class StackLevelAdapter(LoggerAdapter): class StackLevelAdapter(LoggerAdapter):
def at_least_level(
self,
level: str,
) -> bool:
return at_least_level(
log=self,
level=level,
)
def transport( def transport(
self, self,
msg: str, msg: str,
@ -401,19 +426,3 @@ def get_loglevel() -> str:
# global module logger for tractor itself # global module logger for tractor itself
log: StackLevelAdapter = get_logger('tractor') log: StackLevelAdapter = get_logger('tractor')
def at_least_level(
log: Logger|LoggerAdapter,
level: int|str,
) -> bool:
'''
Predicate to test if a given level is active.
'''
if isinstance(level, str):
level: int = CUSTOM_LEVELS[level.upper()]
if log.getEffectiveLevel() <= level:
return True
return False
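`at_least_level()` resolves a level name through `CUSTOM_LEVELS`. It then reports the level as active when the logger's effective threshold is at or below it. `StackLevelAdapter.at_least_level()` simply delegates to it. Below is a stdlib-only sketch. The `CUSTOM_LEVELS` table here is a hypothetical subset whose numeric values may not match the real mapping.

```python
from __future__ import annotations

import logging

# hypothetical subset of a custom-level table; the real
# `CUSTOM_LEVELS` mapping and its values may differ.
CUSTOM_LEVELS: dict[str, int] = {
    'TRANSPORT': 5,
    'RUNTIME': 15,
    'INFO': logging.INFO,
}


def at_least_level(
    log: logging.Logger | logging.LoggerAdapter,
    level: int | str,
) -> bool:
    if isinstance(level, str):
        level = CUSTOM_LEVELS[level.upper()]
    # active when the effective threshold is at or below `level`;
    # `LoggerAdapter.getEffectiveLevel()` delegates to its logger.
    return log.getEffectiveLevel() <= level
```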


@ -210,12 +210,14 @@ class PldRx(Struct):
match msg: match msg:
case Return()|Error(): case Return()|Error():
log.runtime( log.runtime(
f'Rxed final outcome msg\n' f'Rxed final-outcome msg\n'
f'\n'
f'{msg}\n' f'{msg}\n'
) )
case Stop(): case Stop():
log.runtime( log.runtime(
f'Rxed stream stopped msg\n' f'Rxed stream stopped msg\n'
f'\n'
f'{msg}\n' f'{msg}\n'
) )
if passthrough_non_pld_msgs: if passthrough_non_pld_msgs:
@ -261,8 +263,9 @@ class PldRx(Struct):
if ( if (
type(msg) is Return type(msg) is Return
): ):
log.info( log.runtime(
f'Rxed final result msg\n' f'Rxed final result msg\n'
f'\n'
f'{msg}\n' f'{msg}\n'
) )
return self.decode_pld( return self.decode_pld(
@ -304,10 +307,13 @@ class PldRx(Struct):
try: try:
pld: PayloadT = self._pld_dec.decode(pld) pld: PayloadT = self._pld_dec.decode(pld)
log.runtime( log.runtime(
'Decoded msg payload\n\n' f'Decoded payload for\n'
# f'\n'
f'{msg}\n' f'{msg}\n'
f'where payload decoded as\n' # ^TODO?, ideally just render with `,
f'|_pld={pld!r}\n' # pld={decode}` in the `msg.pformat()`??
f'where, '
f'{type(msg).__name__}.pld={pld!r}\n'
) )
return pld return pld
except TypeError as typerr: except TypeError as typerr:
@ -494,7 +500,8 @@ def limit_plds(
finally: finally:
log.runtime( log.runtime(
'Reverted to previous payload-decoder\n\n' f'Reverted to previous payload-decoder\n'
f'\n'
f'{orig_pldec}\n' f'{orig_pldec}\n'
) )
# sanity on orig settings # sanity on orig settings
@ -629,7 +636,8 @@ async def drain_to_final_msg(
(local_cs := rent_n.cancel_scope).cancel_called (local_cs := rent_n.cancel_scope).cancel_called
): ):
log.cancel( log.cancel(
'RPC-ctx cancelled by local-parent scope during drain!\n\n' f'RPC-ctx cancelled by local-parent scope during drain!\n'
f'\n'
f'c}}>\n' f'c}}>\n'
f' |_{rent_n}\n' f' |_{rent_n}\n'
f' |_.cancel_scope = {local_cs}\n' f' |_.cancel_scope = {local_cs}\n'
@ -663,7 +671,8 @@ async def drain_to_final_msg(
# final result arrived! # final result arrived!
case Return(): case Return():
log.runtime( log.runtime(
'Context delivered final draining msg:\n' f'Context delivered final draining msg\n'
f'\n'
f'{pretty_struct.pformat(msg)}' f'{pretty_struct.pformat(msg)}'
) )
ctx._result: Any = pld ctx._result: Any = pld
@ -697,12 +706,14 @@ async def drain_to_final_msg(
): ):
log.cancel( log.cancel(
'Cancelling `MsgStream` drain since ' 'Cancelling `MsgStream` drain since '
f'{reason}\n\n' f'{reason}\n'
f'\n'
f'<= {ctx.chan.uid}\n' f'<= {ctx.chan.uid}\n'
f' |_{ctx._nsf}()\n\n' f' |_{ctx._nsf}()\n'
f'\n'
f'=> {ctx._task}\n' f'=> {ctx._task}\n'
f' |_{ctx._stream}\n\n' f' |_{ctx._stream}\n'
f'\n'
f'{pretty_struct.pformat(msg)}\n' f'{pretty_struct.pformat(msg)}\n'
) )
break break
@ -739,7 +750,8 @@ async def drain_to_final_msg(
case Stop(): case Stop():
pre_result_drained.append(msg) pre_result_drained.append(msg)
log.runtime( # normal/expected shutdown transaction log.runtime( # normal/expected shutdown transaction
'Remote stream terminated due to "stop" msg:\n\n' f'Remote stream terminated due to "stop" msg\n'
f'\n'
f'{pretty_struct.pformat(msg)}\n' f'{pretty_struct.pformat(msg)}\n'
) )
continue continue
@ -814,7 +826,8 @@ async def drain_to_final_msg(
else: else:
log.cancel( log.cancel(
'Skipping `MsgStream` drain since final outcome is set\n\n' f'Skipping `MsgStream` drain since final outcome is set\n'
f'\n'
f'{ctx.outcome}\n' f'{ctx.outcome}\n'
) )


@ -154,6 +154,39 @@ class Aid(
# should also include at least `.pid` (equiv to port for tcp) # should also include at least `.pid` (equiv to port for tcp)
# and/or host-part always? # and/or host-part always?
@property
def uid(self) -> tuple[str, str]:
'''
Legacy actor "unique-id" pair format.
'''
return (
self.name,
self.uuid,
)
def reprol(
self,
sin_uuid: bool = True,
) -> str:
if not sin_uuid:
return (
f'{self.name}[{self.uuid[:6]}]@{self.pid!r}'
)
return (
f'{self.name}@{self.pid!r}'
)
# mk hashable via `.uuid`
def __hash__(self) -> int:
return hash(self.uuid)
def __eq__(self, other: Aid) -> bool:
return self.uuid == other.uuid
# use pretty fmt since often repr-ed for console/log
__repr__ = pretty_struct.Struct.__repr__
class SpawnSpec( class SpawnSpec(
pretty_struct.Struct, pretty_struct.Struct,
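The new `Aid` methods base identity on `.uuid` alone, so two `Aid`s with the same uuid hash and compare equal and can serve as interchangeable dict or set keys. `.uid` keeps the legacy `(name, uuid)` pair. Below is a minimal plain-Python sketch of that behaviour. This `Aid` is a hypothetical stand-in for the msgspec-based struct.

```python
class Aid:
    '''Hypothetical minimal stand-in for the msgspec-based `Aid`.'''

    def __init__(self, name: str, uuid: str, pid: int) -> None:
        self.name = name
        self.uuid = uuid
        self.pid = pid

    @property
    def uid(self) -> tuple[str, str]:
        # legacy `(name, uuid)` pair
        return (self.name, self.uuid)

    def reprol(self, sin_uuid: bool = True) -> str:
        if not sin_uuid:
            return f'{self.name}[{self.uuid[:6]}]@{self.pid!r}'
        return f'{self.name}@{self.pid!r}'

    # identity is the uuid alone, so a copy carrying the same
    # uuid lands in the same dict/set slot.
    def __hash__(self) -> int:
        return hash(self.uuid)

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Aid):
            return NotImplemented
        return self.uuid == other.uuid
```

Unlike the diff's `__eq__`, this sketch returns `NotImplemented` for non-`Aid` operands. As a result, `aid == 'x'` evaluates to `False` instead of raising `AttributeError` on `other.uuid`.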


@ -130,6 +130,7 @@ class LinkedTaskChannel(
_trio_task: trio.Task _trio_task: trio.Task
_aio_task_complete: trio.Event _aio_task_complete: trio.Event
_closed_by_aio_task: bool = False
_suppress_graceful_exits: bool = True _suppress_graceful_exits: bool = True
_trio_err: BaseException|None = None _trio_err: BaseException|None = None
@ -208,10 +209,15 @@ class LinkedTaskChannel(
async def aclose(self) -> None: async def aclose(self) -> None:
await self._from_aio.aclose() await self._from_aio.aclose()
def started( # ?TODO? async version of this?
def started_nowait(
self, self,
val: Any = None, val: Any = None,
) -> None: ) -> None:
'''
Synchronize aio-side with its trio-parent.
'''
self._aio_started_val = val self._aio_started_val = val
return self._to_trio.send_nowait(val) return self._to_trio.send_nowait(val)
@ -242,6 +248,7 @@ class LinkedTaskChannel(
# cycle on the trio side? # cycle on the trio side?
# await trio.lowlevel.checkpoint() # await trio.lowlevel.checkpoint()
return await self._from_aio.receive() return await self._from_aio.receive()
except BaseException as err: except BaseException as err:
async with translate_aio_errors( async with translate_aio_errors(
chan=self, chan=self,
@ -319,7 +326,7 @@ def _run_asyncio_task(
qsize: int = 1, qsize: int = 1,
provide_channels: bool = False, provide_channels: bool = False,
suppress_graceful_exits: bool = True, suppress_graceful_exits: bool = True,
hide_tb: bool = False, hide_tb: bool = True,
**kwargs, **kwargs,
) -> LinkedTaskChannel: ) -> LinkedTaskChannel:
@ -347,18 +354,6 @@ def _run_asyncio_task(
# value otherwise it would just return ;P # value otherwise it would just return ;P
assert qsize > 1 assert qsize > 1
if provide_channels:
assert 'to_trio' in args
# allow target func to accept/stream results manually by name
if 'to_trio' in args:
kwargs['to_trio'] = to_trio
if 'from_trio' in args:
kwargs['from_trio'] = from_trio
coro = func(**kwargs)
trio_task: trio.Task = trio.lowlevel.current_task() trio_task: trio.Task = trio.lowlevel.current_task()
trio_cs = trio.CancelScope() trio_cs = trio.CancelScope()
aio_task_complete = trio.Event() aio_task_complete = trio.Event()
@ -373,6 +368,25 @@ def _run_asyncio_task(
_suppress_graceful_exits=suppress_graceful_exits, _suppress_graceful_exits=suppress_graceful_exits,
) )
# allow target func to accept/stream results manually by name
if 'to_trio' in args:
kwargs['to_trio'] = to_trio
if 'from_trio' in args:
kwargs['from_trio'] = from_trio
if 'chan' in args:
kwargs['chan'] = chan
if provide_channels:
assert (
'to_trio' in args
or
'chan' in args
)
coro = func(**kwargs)
async def wait_on_coro_final_result( async def wait_on_coro_final_result(
to_trio: trio.MemorySendChannel, to_trio: trio.MemorySendChannel,
coro: Awaitable, coro: Awaitable,
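`_run_asyncio_task()` now injects `to_trio`, `from_trio` and the new `chan` handle only when the target declares a parameter with that exact name. With `provide_channels=True`, it asserts that at least `to_trio` or `chan` is accepted. The diff does not show where `args` comes from. The sketch below assumes it holds the target's parameter names, obtained here from `inspect.signature()`. `inject_by_name()` is a hypothetical helper.

```python
import inspect
from typing import Any, Callable


def inject_by_name(
    func: Callable[..., Any],
    kwargs: dict[str, Any],
    **provided: Any,
) -> dict[str, Any]:
    '''
    Copy `kwargs` and add each `provided` value only when `func`
    declares a parameter with that exact name.
    '''
    params = inspect.signature(func).parameters
    out = dict(kwargs)
    for name, value in provided.items():
        if name in params:
            out[name] = value
    return out
```

This lets old-style targets taking `to_trio`/`from_trio` and new-style targets taking a single `chan` share one call path.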
@ -445,9 +459,23 @@ def _run_asyncio_task(
f'Task exited with final result: {result!r}\n' f'Task exited with final result: {result!r}\n'
) )
# only close the sender side which will relay # XXX ALWAYS close the child-`asyncio`-task-side's
# a `trio.EndOfChannel` to the trio (consumer) side. # `to_trio` handle which will in turn relay
# a `trio.EndOfChannel` to the `trio`-parent.
# Consequently the parent `trio` task MUST ALWAYS
# check for any `chan._aio_err` to be raised when it
# receives an EoC.
#
# NOTE, there are 2 EoC cases,
# - normal/graceful EoC due to the aio-side actually
# terminating its "streaming", but the task did not
# error and is not yet complete.
#
# - the aio-task terminated and we specially mark the
# closure as due to the `asyncio.Task`'s exit.
#
to_trio.close() to_trio.close()
chan._closed_by_aio_task = True
aio_task_complete.set() aio_task_complete.set()
log.runtime( log.runtime(
@ -645,8 +673,9 @@ def _run_asyncio_task(
not trio_cs.cancel_called not trio_cs.cancel_called
): ):
log.cancel( log.cancel(
f'Cancelling `trio` side due to aio-side src exc\n' f'Cancelling trio-side due to aio-side src exc\n'
f'{curr_aio_err}\n' f'\n'
f'{curr_aio_err!r}\n'
f'\n' f'\n'
f'(c>\n' f'(c>\n'
f' |_{trio_task}\n' f' |_{trio_task}\n'
@ -758,6 +787,7 @@ async def translate_aio_errors(
aio_done_before_trio: bool = aio_task.done() aio_done_before_trio: bool = aio_task.done()
assert aio_task assert aio_task
trio_err: BaseException|None = None trio_err: BaseException|None = None
eoc: trio.EndOfChannel|None = None
try: try:
yield # back to one of the cross-loop apis yield # back to one of the cross-loop apis
except trio.Cancelled as taskc: except trio.Cancelled as taskc:
@ -789,12 +819,48 @@ async def translate_aio_errors(
# ) # )
# raise # raise
# XXX always passthrough EoC since this translator is often # XXX EoC is a special SIGNAL from the aio-side here!
# called from `LinkedTaskChannel.receive()` which we want # There are 2 cases to handle:
# passthrough and further we have no special meaning for it in # 1. the "EoC passthrough" case.
# terms of relaying errors or signals from the aio side! # - the aio-task actually closed the channel "gracefully" and
except trio.EndOfChannel as eoc: # the trio-task should unwind any ongoing channel
# iteration/receiving,
# |_this exc-translator wraps calls to `LinkedTaskChannel.receive()`
# in which case we want to relay the actual "end-of-chan" for
# iteration purposes.
#
# 2. relaying the "asyncio.Task termination" case.
# - if the aio-task terminates, maybe with an error, AND the
# `open_channel_from()` API was used, it will always signal
# that termination.
# |_`wait_on_coro_final_result()` always calls
# `to_trio.close()` when `provide_channels=True` so we need to
# always check if there is an aio-side exc which needs to be
# relayed to the parent trio side!
# |_in this case the special `chan._closed_by_aio_task` is
# ALWAYS set.
#
except trio.EndOfChannel as _eoc:
eoc = _eoc
if (
chan._closed_by_aio_task
and
aio_err
):
log.cancel(
f'The asyncio-child task terminated due to error\n'
f'{aio_err!r}\n'
)
chan._trio_to_raise = aio_err
trio_err = chan._trio_err = eoc trio_err = chan._trio_err = eoc
#
# ?TODO?, raise something like a,
# chan._trio_to_raise = AsyncioErrored()
# BUT, with the tb rewritten to reflect the underlying
# call stack?
else:
trio_err = chan._trio_err = eoc
raise eoc raise eoc
# NOTE ALSO SEE the matching note in the `cancel_trio()` asyncio # NOTE ALSO SEE the matching note in the `cancel_trio()` asyncio
@ -1047,7 +1113,7 @@ async def translate_aio_errors(
# #
if wait_on_aio_task: if wait_on_aio_task:
await chan._aio_task_complete.wait() await chan._aio_task_complete.wait()
log.info( log.debug(
'asyncio-task is done and unblocked trio-side!\n' 'asyncio-task is done and unblocked trio-side!\n'
) )
@ -1064,11 +1130,17 @@ async def translate_aio_errors(
trio_to_raise: ( trio_to_raise: (
AsyncioCancelled| AsyncioCancelled|
AsyncioTaskExited| AsyncioTaskExited|
Exception| # relayed from aio-task
None None
) = chan._trio_to_raise ) = chan._trio_to_raise
raise_from: Exception = (
trio_err if (aio_err is trio_to_raise)
else aio_err
)
if not suppress_graceful_exits: if not suppress_graceful_exits:
raise trio_to_raise from (aio_err or trio_err) raise trio_to_raise from raise_from
if trio_to_raise: if trio_to_raise:
match ( match (
@ -1101,7 +1173,7 @@ async def translate_aio_errors(
) )
return return
case _: case _:
raise trio_to_raise from (aio_err or trio_err) raise trio_to_raise from raise_from
# Check if the asyncio-side is the cause of the trio-side # Check if the asyncio-side is the cause of the trio-side
# error. # error.
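The new `raise_from` selection handles the case where the relayed `trio_to_raise` is the aio-side error itself. There, `raise trio_to_raise from aio_err` would make the exception its own `__cause__`, so the trio-side error is used instead. Below is a stdlib-only sketch of that selection. `pick_cause()` and `raise_with_cause()` are hypothetical names.

```python
from __future__ import annotations


def pick_cause(
    to_raise: BaseException,
    aio_err: BaseException | None,
    trio_err: BaseException | None,
) -> BaseException | None:
    # never chain an exception onto itself
    if aio_err is to_raise:
        return trio_err
    return aio_err


def raise_with_cause(
    to_raise: BaseException,
    aio_err: BaseException | None,
    trio_err: BaseException | None,
) -> None:
    raise to_raise from pick_cause(to_raise, aio_err, trio_err)
```

As in the diff, a missing `aio_err` yields `None` here. `raise x from None` then sets `__suppress_context__` and hides any implicit context in the printed traceback.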
@ -1167,7 +1239,6 @@ async def run_task(
@acm @acm
async def open_channel_from( async def open_channel_from(
target: Callable[..., Any], target: Callable[..., Any],
suppress_graceful_exits: bool = True, suppress_graceful_exits: bool = True,
**target_kwargs, **target_kwargs,
@ -1201,7 +1272,6 @@ async def open_channel_from(
# deliver stream handle upward # deliver stream handle upward
yield first, chan yield first, chan
except trio.Cancelled as taskc: except trio.Cancelled as taskc:
# await tractor.pause(shield=True) # ya it worx ;)
if cs.cancel_called: if cs.cancel_called:
if isinstance(chan._trio_to_raise, AsyncioCancelled): if isinstance(chan._trio_to_raise, AsyncioCancelled):
log.cancel( log.cancel(


@ -31,7 +31,7 @@ from ._broadcast import (
) )
from ._beg import ( from ._beg import (
collapse_eg as collapse_eg, collapse_eg as collapse_eg,
maybe_collapse_eg as maybe_collapse_eg, get_collapsed_eg as get_collapsed_eg,
is_multi_cancelled as is_multi_cancelled, is_multi_cancelled as is_multi_cancelled,
) )
from ._taskc import ( from ._taskc import (


@ -15,8 +15,9 @@
# along with this program. If not, see <https://www.gnu.org/licenses/>. # along with this program. If not, see <https://www.gnu.org/licenses/>.
''' '''
`BaseExceptionGroup` related utils and helpers pertaining to `BaseExceptionGroup` utils and helpers pertaining to
first-class-`trio` from a historical perspective B) first-class-`trio` from a "historical" perspective, like "loose
exception group" task-nurseries.
''' '''
from contextlib import ( from contextlib import (
@ -24,27 +25,84 @@ from contextlib import (
) )
from typing import ( from typing import (
Literal, Literal,
Type,
) )
import trio import trio
# from trio._core._concat_tb import (
# concat_tb,
# )
def maybe_collapse_eg( # XXX NOTE
beg: BaseExceptionGroup, # taken verbatim from `trio._core._run` except,
# - remove the NONSTRICT_EXCEPTIONGROUP_NOTE deprecation-note
# guard-check; we know we want an explicit collapse.
# - mask out tb rewriting in collapse case, i don't think it really
# matters?
#
def collapse_exception_group(
excgroup: BaseExceptionGroup[BaseException],
) -> BaseException: ) -> BaseException:
"""Recursively collapse any single-exception groups into that single contained
exception.
"""
exceptions = list(excgroup.exceptions)
modified = False
for i, exc in enumerate(exceptions):
if isinstance(exc, BaseExceptionGroup):
new_exc = collapse_exception_group(exc)
if new_exc is not exc:
modified = True
exceptions[i] = new_exc
if (
len(exceptions) == 1
and isinstance(excgroup, BaseExceptionGroup)
# XXX trio's loose-setting condition..
# and NONSTRICT_EXCEPTIONGROUP_NOTE in getattr(excgroup, "__notes__", ())
):
# exceptions[0].__traceback__ = concat_tb(
# excgroup.__traceback__,
# exceptions[0].__traceback__,
# )
return exceptions[0]
elif modified:
return excgroup.derive(exceptions)
else:
return excgroup
def get_collapsed_eg(
beg: BaseExceptionGroup,
) -> BaseException|None:
''' '''
If the input beg can collapse to a single non-eg sub-exception, If the input beg can collapse to a single sub-exception which is
return it instead. itself **not** an eg, return it.
''' '''
if len(excs := beg.exceptions) == 1: maybe_exc = collapse_exception_group(beg)
return excs[0] if maybe_exc is beg:
return None
return beg return maybe_exc
@acm @acm
async def collapse_eg(): async def collapse_eg(
hide_tb: bool = True,
# XXX, for ex. will always show begs containing single taskc
ignore: set[Type[BaseException]] = {
# trio.Cancelled,
},
add_notes: bool = True,
bp: bool = False,
):
''' '''
If `BaseExceptionGroup` raised in the body scope is If `BaseExceptionGroup` raised in the body scope is
"collapse-able" (in the same way that "collapse-able" (in the same way that
@ -52,15 +110,58 @@ async def collapse_eg():
only raise the lone emedded non-eg in in place. only raise the lone emedded non-eg in in place.
''' '''
__tracebackhide__: bool = hide_tb
try: try:
yield yield
except* BaseException as beg: except BaseExceptionGroup as _beg:
if ( beg = _beg
exc := maybe_collapse_eg(beg)
) is not beg:
raise exc
raise beg if (
bp
and
len(beg.exceptions) > 1
):
import tractor
if tractor.current_actor(
err_on_no_runtime=False,
):
await tractor.pause(shield=True)
else:
breakpoint()
if (
(exc := get_collapsed_eg(beg))
and
type(exc) not in ignore
):
# TODO? report number of nested groups it was collapsed
# *from*?
if add_notes:
from_group_note: str = (
'( ^^^ this exc was collapsed from a group ^^^ )\n'
)
if (
from_group_note
not in
getattr(exc, "__notes__", ())
):
exc.add_note(from_group_note)
# raise exc
# ^^ this will leave the orig beg tb above with the
# "during the handling of <beg> the following.."
# So, instead do..
#
if cause := exc.__cause__:
raise exc from cause
else:
# suppress "during handling of <the beg>"
# output in tb/console.
raise exc from None
# keep original
raise # beg
def is_multi_cancelled( def is_multi_cancelled(


@ -31,7 +31,6 @@ from typing import (
AsyncIterator, AsyncIterator,
Callable, Callable,
Hashable, Hashable,
Optional,
Sequence, Sequence,
TypeVar, TypeVar,
TYPE_CHECKING, TYPE_CHECKING,
@ -41,6 +40,9 @@ import trio
from tractor._state import current_actor from tractor._state import current_actor
from tractor.log import get_logger from tractor.log import get_logger
# from ._beg import collapse_eg # from ._beg import collapse_eg
# from ._taskc import (
# maybe_raise_from_masking_exc,
# )
if TYPE_CHECKING: if TYPE_CHECKING:
@ -106,6 +108,9 @@ async def _enter_and_wait(
async def gather_contexts( async def gather_contexts(
mngrs: Sequence[AsyncContextManager[T]], mngrs: Sequence[AsyncContextManager[T]],
# caller can provide their own scope
tn: trio.Nursery|None = None,
) -> AsyncGenerator[ ) -> AsyncGenerator[
tuple[ tuple[
T | None, T | None,
@ -148,17 +153,22 @@ async def gather_contexts(
'`.trionics.gather_contexts()` input mngrs is empty?\n' '`.trionics.gather_contexts()` input mngrs is empty?\n'
'\n' '\n'
'Did try to use inline generator syntax?\n' 'Did try to use inline generator syntax?\n'
'Use a non-lazy iterator or sequence-type intead!\n' 'Check that list({mngrs}) works!\n'
# 'or sequence-type intead!\n'
# 'Use a non-lazy iterator or sequence-type intead!\n'
) )
try:
async with ( async with (
#
# ?TODO, does including these (eg-collapsing,
# taskc-unmasking) improve tb noise-reduction/legibility?
#
# collapse_eg(), # collapse_eg(),
trio.open_nursery( maybe_open_nursery(
strict_exception_groups=False, nursery=tn,
# ^XXX^ TODO? soo roll our own then ??
# -> since we kinda want the "if only one `.exception` then
# just raise that" interface?
) as tn, ) as tn,
# maybe_raise_from_masking_exc(),
): ):
for mngr in mngrs: for mngr in mngrs:
tn.start_soon( tn.start_soon(
@ -170,11 +180,12 @@ async def gather_contexts(
seed, seed,
) )
# deliver control once all managers have started up # deliver control to caller once all ctx-managers have
# started (yielded back to us).
await all_entered.wait() await all_entered.wait()
try:
yield tuple(unwrapped.values()) yield tuple(unwrapped.values())
parent_exit.set()
finally: finally:
# XXX NOTE: this is ABSOLUTELY REQUIRED to avoid # XXX NOTE: this is ABSOLUTELY REQUIRED to avoid
# the following wacky bug: # the following wacky bug:
@ -192,7 +203,7 @@ class _Cache:
a kept-alive-while-in-use async resource. a kept-alive-while-in-use async resource.
''' '''
service_n: Optional[trio.Nursery] = None service_tn: trio.Nursery|None = None
locks: dict[Hashable, trio.Lock] = {} locks: dict[Hashable, trio.Lock] = {}
users: int = 0 users: int = 0
values: dict[Any, Any] = {} values: dict[Any, Any] = {}
@ -201,7 +212,7 @@ class _Cache:
tuple[trio.Nursery, trio.Event] tuple[trio.Nursery, trio.Event]
] = {} ] = {}
# nurseries: dict[int, trio.Nursery] = {} # nurseries: dict[int, trio.Nursery] = {}
no_more_users: Optional[trio.Event] = None no_more_users: trio.Event|None = None
@classmethod @classmethod
async def run_ctx( async def run_ctx(
@ -233,6 +244,9 @@ async def maybe_open_context(
kwargs: dict = {}, kwargs: dict = {},
key: Hashable | Callable[..., Hashable] = None, key: Hashable | Callable[..., Hashable] = None,
# caller can provide their own scope
tn: trio.Nursery|None = None,
) -> AsyncIterator[tuple[bool, T]]: ) -> AsyncIterator[tuple[bool, T]]:
''' '''
Maybe open an async-context-manager (acm) if there is not already Maybe open an async-context-manager (acm) if there is not already
@ -265,40 +279,94 @@ async def maybe_open_context(
# have it not be closed until all consumers have exited (which is # have it not be closed until all consumers have exited (which is
# currently difficult to implement any other way besides using our # currently difficult to implement any other way besides using our
# pre-allocated runtime instance..) # pre-allocated runtime instance..)
service_n: trio.Nursery = current_actor()._service_n if tn:
# TODO, assert tn is eventual parent of this task!
task: trio.Task = trio.lowlevel.current_task()
task_tn: trio.Nursery = task.parent_nursery
if not tn._cancel_status.encloses(
task_tn._cancel_status
):
raise RuntimeError(
f'Mis-nesting of task under provided {tn} !?\n'
f'Current task is NOT a child(-ish)!!\n'
f'\n'
f'task: {task}\n'
f'task_tn: {task_tn}\n'
)
service_tn = tn
else:
service_tn: trio.Nursery = current_actor()._service_tn
# TODO: is there any way to allocate # TODO: is there any way to allocate
# a 'stays-open-till-last-task-finished nursery? # a 'stays-open-till-last-task-finished nursery?
# service_n: trio.Nursery # service_tn: trio.Nursery
# async with maybe_open_nursery(_Cache.service_n) as service_n: # async with maybe_open_nursery(_Cache.service_tn) as service_tn:
# _Cache.service_n = service_n # _Cache.service_tn = service_tn
cache_miss_ke: KeyError|None = None
maybe_taskc: trio.Cancelled|None = None
try: try:
# **critical section** that should prevent other tasks from # **critical section** that should prevent other tasks from
# checking the _Cache until complete otherwise the scheduler # checking the _Cache until complete otherwise the scheduler
# may switch and by accident we create more than one resource. # may switch and by accident we create more than one resource.
yielded = _Cache.values[ctx_key] yielded = _Cache.values[ctx_key]
except KeyError: except KeyError as _ke:
log.debug(f'Allocating new {acm_func} for {ctx_key}') # XXX, stay mutexed up to cache-miss yield
try:
cache_miss_ke = _ke
log.debug(
f'Allocating new @acm-func entry\n'
f'ctx_key={ctx_key}\n'
f'acm_func={acm_func}\n'
)
mngr = acm_func(**kwargs) mngr = acm_func(**kwargs)
resources = _Cache.resources resources = _Cache.resources
assert not resources.get(ctx_key), f'Resource exists? {ctx_key}' assert not resources.get(ctx_key), f'Resource exists? {ctx_key}'
resources[ctx_key] = (service_n, trio.Event()) resources[ctx_key] = (service_tn, trio.Event())
yielded: Any = await service_tn.start(
# sync up to the mngr's yielded value
yielded = await service_n.start(
_Cache.run_ctx, _Cache.run_ctx,
mngr, mngr,
ctx_key, ctx_key,
) )
_Cache.users += 1 _Cache.users += 1
finally:
# XXX, since this runs from an `except` it's a checkpoint
# which can be `trio.Cancelled`-masked.
#
# NOTE, in that case the mutex is never released by the
# (first and) caching task and **we can't** simply shield
# bc that will inf-block on the `await
# no_more_users.wait()`.
#
# SO just always unlock!
lock.release() lock.release()
yield False, yielded
try:
yield (
False, # cache_hit = "no"
yielded,
)
except trio.Cancelled as taskc:
maybe_taskc = taskc
log.cancel(
f'Cancelled from cache-miss entry\n'
f'\n'
f'ctx_key: {ctx_key!r}\n'
f'mngr: {mngr!r}\n'
)
# XXX, always unset ke from cancelled context
# since we never consider it a masked exc case!
# - bc this can be called directly by `._rpc._invoke()`?
#
if maybe_taskc.__context__ is cache_miss_ke:
maybe_taskc.__context__ = None
raise taskc
else: else:
_Cache.users += 1 _Cache.users += 1
log.runtime( log.debug(
f'Re-using cached resource for user {_Cache.users}\n\n' f'Re-using cached resource for user {_Cache.users}\n\n'
f'{ctx_key!r} -> {type(yielded)}\n' f'{ctx_key!r} -> {type(yielded)}\n'
@ -308,9 +376,19 @@ async def maybe_open_context(
# f'{ctx_key!r} -> {yielded!r}\n' # f'{ctx_key!r} -> {yielded!r}\n'
) )
lock.release() lock.release()
yield True, yielded yield (
True, # cache_hit = "yes"
yielded,
)
finally: finally:
if lock.locked():
stats: trio.LockStatistics = lock.statistics()
log.error(
f'Lock left locked by last owner !?\n'
f'{stats}\n'
)
_Cache.users -= 1 _Cache.users -= 1
if yielded is not None: if yielded is not None:

@ -22,7 +22,10 @@ from __future__ import annotations
from contextlib import ( from contextlib import (
asynccontextmanager as acm, asynccontextmanager as acm,
) )
from typing import TYPE_CHECKING from typing import (
Type,
TYPE_CHECKING,
)
import trio import trio
from tractor.log import get_logger from tractor.log import get_logger
@ -65,7 +68,6 @@ def find_masked_excs(
# #
@acm @acm
async def maybe_raise_from_masking_exc( async def maybe_raise_from_masking_exc(
tn: trio.Nursery|None = None,
unmask_from: ( unmask_from: (
BaseException| BaseException|
tuple[BaseException] tuple[BaseException]
@ -74,15 +76,26 @@ async def maybe_raise_from_masking_exc(
raise_unmasked: bool = True, raise_unmasked: bool = True,
extra_note: str = ( extra_note: str = (
'This can occur when,\n' 'This can occur when,\n'
' - a `trio.Nursery` scope embeds a `finally:`-block ' '\n'
'which executes a checkpoint!' ' - a `trio.Nursery/CancelScope` embeds a `finally/except:`-block '
'which execs an un-shielded checkpoint!'
# #
# ^TODO? other cases? # ^TODO? other cases?
), ),
always_warn_on: tuple[BaseException] = ( always_warn_on: tuple[Type[BaseException]] = (
trio.Cancelled, trio.Cancelled,
), ),
# don't ever unmask or warn on any masking pair,
# {<masked-excT-key> -> <masking-excT-value>}
never_warn_on: dict[
Type[BaseException],
Type[BaseException],
] = {
KeyboardInterrupt: trio.Cancelled,
trio.Cancelled: trio.Cancelled,
},
# ^XXX, special case(s) where we warn-log bc likely # ^XXX, special case(s) where we warn-log bc likely
# there will be no operational diff since the exc # there will be no operational diff since the exc
# is always expected to be consumed. # is always expected to be consumed.
@ -104,81 +117,91 @@ async def maybe_raise_from_masking_exc(
individual sub-excs but maintain the eg-parent's form right? individual sub-excs but maintain the eg-parent's form right?
''' '''
if not isinstance(unmask_from, tuple):
raise ValueError(
f'Invalid unmask_from = {unmask_from!r}\n'
f'Must be a `tuple[Type[BaseException]]`.\n'
)
from tractor.devx.debug import ( from tractor.devx.debug import (
BoxedMaybeException, BoxedMaybeException,
pause,
) )
boxed_maybe_exc = BoxedMaybeException( boxed_maybe_exc = BoxedMaybeException(
raise_on_exit=raise_unmasked, raise_on_exit=raise_unmasked,
) )
matching: list[BaseException]|None = None matching: list[BaseException]|None = None
maybe_eg: ExceptionGroup|None try:
if tn:
try: # handle egs
yield boxed_maybe_exc yield boxed_maybe_exc
return return
except* unmask_from as _maybe_eg: except BaseException as _bexc:
maybe_eg = _maybe_eg bexc = _bexc
if isinstance(bexc, BaseExceptionGroup):
matches: ExceptionGroup matches: ExceptionGroup
matches, _ = maybe_eg.split( matches, _ = bexc.split(unmask_from)
if matches:
matching = matches.exceptions
elif (
unmask_from unmask_from
) and
if not matches: type(bexc) in unmask_from
raise ):
matching = [bexc]
matching: list[BaseException] = matches.exceptions
else:
try: # handle non-egs
yield boxed_maybe_exc
return
except unmask_from as _maybe_exc:
maybe_exc = _maybe_exc
matching: list[BaseException] = [
maybe_exc
]
# XXX, only unmask-ed for debugging!
# TODO, remove eventually..
except BaseException as _berr:
berr = _berr
await pause(shield=True)
raise berr
if matching is None: if matching is None:
raise raise
masked: list[tuple[BaseException, BaseException]] = [] masked: list[tuple[BaseException, BaseException]] = []
for exc_match in matching: for exc_match in matching:
if exc_ctx := find_masked_excs( if exc_ctx := find_masked_excs(
maybe_masker=exc_match, maybe_masker=exc_match,
unmask_from={unmask_from}, unmask_from=set(unmask_from),
): ):
masked.append((exc_ctx, exc_match)) masked.append((
exc_ctx,
exc_match,
))
boxed_maybe_exc.value = exc_match boxed_maybe_exc.value = exc_match
note: str = ( note: str = (
f'\n' f'\n'
f'^^WARNING^^ the above {exc_ctx!r} was masked by a {unmask_from!r}\n' f'^^WARNING^^\n'
f'the above {type(exc_ctx)!r} was masked by a {type(exc_match)!r}\n'
) )
if extra_note: if extra_note:
note += ( note += (
f'\n' f'\n'
f'{extra_note}\n' f'{extra_note}\n'
) )
do_warn: bool = (
never_warn_on.get(
type(exc_ctx) # masked type
)
is not
type(exc_match) # masking type
)
if do_warn:
exc_ctx.add_note(note) exc_ctx.add_note(note)
if type(exc_match) in always_warn_on: if (
do_warn
and
type(exc_match) in always_warn_on
):
log.warning(note) log.warning(note)
# await tractor.pause(shield=True) if (
if raise_unmasked: do_warn
and
raise_unmasked
):
if len(masked) < 2: if len(masked) < 2:
raise exc_ctx from exc_match raise exc_ctx from exc_match
else:
# ?TODO, see above but, possibly unmasking sub-exc # ??TODO, see above but, possibly unmasking sub-exc
# entries if there are > 1 # entries if there are > 1
await pause(shield=True) # else:
# await pause(shield=True)
else: else:
raise raise