---
name: piker-conc-expert
description: >
  Distributed-runtime and structured-concurrency
  expertise for piker's `tractor` actor-tree. Apply
  when working on daemon/service architecture, actor
  spawning/discovery, cross-actor RPC (ctx/stream
  eps), `to_asyncio` integration, cancellation
  semantics, or debugging hangs/wedges/skews in the
  actor system.
user-invocable: false
---

# Piker Concurrency & Runtime Expertise

The distilled mental model for piker's distributed
runtime: a `trio`-structured actor tree supervised by
`tractor` (pinned to git main) where every long-lived
subsystem is a named daemon-actor talking over
ctx/stream IPC.

## Actor tree & daemon taxonomy

```
pikerd                 root supervisor + registry
├── datad.             feed bus, shm writers, tsp
│                      history, symbol search
├── brokerd.           live order-ctl ONLY; lazily
│                      spawned by emsd, credentialed
├── emsd               dark-clearing + order routing
│   └── paperboi.      sim-clearing (paper mode)
└── samplerd           singleton OHLC clock/increment
```

Key invariants:

- `datad` hosts all `piker.data.validate._eps['datad']`
  eps; `brokerd` only the `['brokerd']` (order-ctl)
  ones. The `_eps` table in `piker/data/validate.py` is
  the authoritative contract; `get_eps(mod, kind)`
  introspects a backend's support.
- `brokerd.` is booted in EXACTLY one place:
  `open_brokerd_dialog()` in `piker/clearing/_ems.py`
  (with a `portal:` override for the `piker ledger`
  ad-hoc actor). Chart-only + paper sessions run with
  ZERO brokerd procs. Never add a data-path spawn!
- backends declare per-daemon-kind submods via
  `_datad_mods`/`_brokerd_mods` in their `__init__.py`
  (fallback: `__enable_modules__`).

## Daemon lifecycle conventions

Every daemon-kind follows the same trio of fns (see
`piker/brokers/_daemon.py` + `piker/data/_daemon.py` as
the canonical pair):

- `_setup_persistent_()`: a `@tractor.context`
  "lifetime fixture" run via
  `Services.start_service_task()`; does console-log
  setup ONCE for the actor, allocs any actor-global
  state (eg.
  datad's `_FeedsBus`), then `await ctx.started()` +
  `trio.sleep_forever()`.
- `_init()`: builds `enable_modules` + actor name
  `f'.{brokername}'` and copies backend `_spawn_kwargs`
  (CRITICAL: `ib` needs `infect_asyncio=True` in EVERY
  daemon-kind).
- `spawn_()` + `maybe_spawn_()`: thin wrappers over
  `Services.actor_n.start_actor()` and
  `piker.service.maybe_spawn_daemon()` (registry
  find-or-spawn w/ per-service-name locking).

Caps-sec model: `enable_modules` gates RPC entry ONLY;
python imports are unrestricted in-proc. Keep each
daemon's enable set minimal; the (credentialed)
`brokerd` must never RPC-enable `piker.data.*` feed
mods.

## Actor-local state: the #1 split hazard

Module-globals and instance caches are PER-ACTOR.
Anything that "just worked" because two subsystems
shared a process will break when they're split into
sibling actors. Canonical example: `ib`'s
`Client._contracts` was warmed by feed-side
`get_mkt_info()` in-proc; post datad/brokerd-split the
trading actor must warm it itself (eagerly at
`open_trade_dialog()` startup for open pps/orders +
lazily per order request via
`symbols.cache_contract()`).

When moving code across actor boundaries ALWAYS audit:

- module-global registries (`feed._bus`,
  `_accounts2clients`, `_client_cache`, ..)
- `@async_lifo_cache`/`maybe_open_context` caches
  (NOTE: `async_lifo_cache` keys on POSITIONAL args
  only; a cache-hit SKIPS the fn body and thus any
  side-effect writes!)
- logging handler placement (see gotchas.md)

## tractor primitives as used here

- `@tractor.context` eps: `await ctx.started(val)`
  unblocks the caller w/ `val`; long-lived eps then
  `ctx.open_stream()` or `sleep_forever()`.
- discovery: `tractor.find_actor()` via
  `piker.service.find_service()`;
  `wait_for_actor(name, registry_addr=...)`;
  `query_actor(name, regaddr=...)` yields
  `(sockaddr, portal)`. Addrs are wrapped
  `tractor.discovery._addr.Address` types; use
  `wrap_address()` to normalize raw tuples and
  `.unwrap()` for comparisons.
- runtime-vars: `_runtime_vars['piker_vars']` is
  inherited down the spawn tree; used eg. for
  `piker_test_dir` config isolation. Read it LAZILY at
  use-time, never at import time (subactors only get
  vars post runtime-boot).
- cancellation semantics (modern tractor): a
  `ContextCancelled` whose `.canceller` is your own
  actor is ABSORBED (clean exit, nothing raised);
  single-exc groups collapse (`collapse_eg`) so eg. a
  KBI propagates bare. Exc attrs:
  `RemoteActorError.boxed_type` (not `.type`).

## `to_asyncio` (infect-asyncio) integration

For `ib` (and `deribit`) the backend client runs on an
embedded `asyncio` loop via
`tractor.to_asyncio.open_channel_from()` +
`LinkedTaskChannel`. Rules learned the hard way:

- a shared req/resp channel MUST correlate responses to
  requests (see `MethodProxy._run_method()`'s `mid`
  protocol in `piker/brokers/ib/api.py`): caller
  cancellation (eg. `move_on_after` timeouts) otherwise
  orphans a response and silently skews every later
  result off-by-one.
- the aio-side relay must catch + ship back ALL
  (non-cancel) exceptions as `{'exception': err}`
  resps; an escaping error kills the relay task ->
  channel -> proxy nursery -> the whole dialog,
  bypassing every caller-side guard.
- `TrioTaskExited` ("child asyncio task is still
  running?") on teardown is a known wart family; prefer
  upstream `tractor` fixes over piker-side bandaids.

See [gotchas.md](gotchas.md) for the symptom->cause
registry and [debug-recipes.md](debug-recipes.md) for
forensics techniques.
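
A minimal sketch of the first two `to_asyncio` rules above (response correlation + shipping errors back as `{'exception': err}`). This is NOT piker's `MethodProxy._run_method()`: plain `asyncio.Queue`s stand in for the `LinkedTaskChannel`, `asyncio.wait_for()` stands in for a `move_on_after` timeout, and `relay()`, `Proxy` and `demo()` are hypothetical names invented for illustration. It also assumes only one call is in flight at a time; concurrent callers would need per-`mid` dispatch instead of dropping mismatches.

```python
import asyncio
import itertools


async def relay(requests: asyncio.Queue, responses: asyncio.Queue) -> None:
    # Hypothetical aio-side relay: echo each request's `mid` in its
    # response and ship (non-cancel) errors back instead of letting
    # them escape and kill this task.
    while True:
        mid, delay, value = await requests.get()
        try:
            await asyncio.sleep(delay)  # stand-in for the backend call
            if value is None:
                raise ValueError('bad request')
            resp = {'mid': mid, 'result': value * 2}
        except Exception as err:  # CancelledError is NOT caught here
            resp = {'mid': mid, 'exception': err}
        await responses.put(resp)


class Proxy:
    # Hypothetical caller-side proxy over a shared req/resp channel.
    def __init__(self, requests: asyncio.Queue, responses: asyncio.Queue):
        self._requests = requests
        self._responses = responses
        self._mids = itertools.count()

    async def call(self, value, delay: float = 0.0):
        mid = next(self._mids)
        await self._requests.put((mid, delay, value))
        while True:
            resp = await self._responses.get()
            if resp['mid'] != mid:
                # orphaned response from an earlier, cancelled caller:
                # drop it rather than return it as our own result.
                continue
            if 'exception' in resp:
                raise resp['exception']
            return resp['result']


async def demo() -> dict:
    requests, responses = asyncio.Queue(), asyncio.Queue()
    relay_task = asyncio.create_task(relay(requests, responses))
    proxy = Proxy(requests, responses)
    out = {}
    try:
        # 1. the caller times out; the relay still produces a response.
        try:
            await asyncio.wait_for(proxy.call(1, delay=0.1), timeout=0.01)
        except asyncio.TimeoutError:
            out['first'] = 'timeout'
        # 2. the next call must skip that orphaned response.
        out['second'] = await proxy.call(10)
        # 3. a relay-side error is re-raised in the caller ...
        try:
            await proxy.call(None)
        except ValueError as err:
            out['third'] = str(err)
        # 4. ... and the relay task is still alive afterwards.
        out['fourth'] = await proxy.call(3)
    finally:
        relay_task.cancel()
    return out


if __name__ == '__main__':
    print(asyncio.run(demo()))
```

Without the `mid` check in `Proxy.call()`, step 2 would return the orphaned result of step 1 (`2`) instead of `20`, and every later call would stay shifted by one response.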