---
name: piker-conc-expert
description: >
  Distributed-runtime and structured-concurrency
  expertise for piker's `tractor` actor-tree. Apply
  when working on daemon/service architecture, actor
  spawning/discovery, cross-actor RPC (ctx/stream
  eps), `to_asyncio` integration, cancellation
  semantics, or debugging hangs/wedges/skews in the
  actor system.
user-invocable: false
---

# Piker Concurrency & Runtime Expertise

The distilled mental model for piker's distributed
runtime: a `trio`-structured actor tree supervised by
`tractor` (pinned to git main) where every long-lived
subsystem is a named daemon-actor talking over
ctx/stream IPC.

## Actor tree & daemon taxonomy

```
pikerd                  root supervisor + registry
├── datad.<broker>      feed bus, shm writers, tsp
│                       history, symbol search
├── brokerd.<broker>    live order-ctl ONLY; lazily
│                       spawned by emsd, credentialed
├── emsd                dark-clearing + order routing
│   └── paperboi.<broker>  sim-clearing (paper mode)
└── samplerd            singleton OHLC clock/increment
```

Key invariants:
- `datad` hosts all `piker.data.validate._eps['datad']`
  eps; `brokerd` only the `['brokerd']` (order-ctl)
  ones. The `_eps` table in `piker/data/validate.py`
  is the authoritative contract; `get_eps(mod, kind)`
  introspects a backend's support.
- `brokerd.<broker>` is booted in EXACTLY one place:
  `open_brokerd_dialog()` in `piker/clearing/_ems.py`
  (with a `portal:` override for the `piker ledger`
  ad-hoc actor). Chart-only + paper sessions run with
  ZERO brokerd procs. Never add a data-path spawn!
- backends declare per-daemon-kind submods via
  `_datad_mods`/`_brokerd_mods` in their
  `__init__.py` (fallback: `__enable_modules__`).
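
The `get_eps()` introspection and the `_<kind>_mods` fallback can be pictured w/ a small sketch. Everything below is hypothetical. `_EPS` is an illustrative table (NOT a copy of `piker.data.validate._eps`), and `get_eps_sketch()`/`enable_mods_for()` only approximate the described behavior. Check `piker/data/validate.py` for the real signatures.

```python
from types import ModuleType

# ILLUSTRATIVE ep table: daemon-kind -> ep names a backend may define.
# The authoritative contract is `piker.data.validate._eps`; these names
# are examples, not a copy of it.
_EPS: dict[str, tuple[str, ...]] = {
    'datad': ('stream_quotes', 'open_history_client', 'get_mkt_info'),
    'brokerd': ('open_trade_dialog',),
}


def get_eps_sketch(mod: ModuleType, kind: str) -> list[str]:
    '''
    Return the subset of `kind`'s contract eps that `mod` defines.

    '''
    return [name for name in _EPS[kind] if hasattr(mod, name)]


def enable_mods_for(mod: ModuleType, kind: str) -> list[str]:
    '''
    Resolve which submods to RPC-enable for a daemon-kind: prefer the
    per-kind `_<kind>_mods` decl, else fall back to `__enable_modules__`.

    '''
    per_kind = getattr(mod, f'_{kind}_mods', None)
    if per_kind is not None:
        return list(per_kind)
    return list(getattr(mod, '__enable_modules__', []))


# a fake backend pkg declaring only a datad-specific submod list
fake = ModuleType('fakebroker')
fake.stream_quotes = lambda: None
fake.get_mkt_info = lambda: None
fake._datad_mods = ['fakebroker.feed']
fake.__enable_modules__ = ['fakebroker.api', 'fakebroker.broker']

print(get_eps_sketch(fake, 'datad'))     # ['stream_quotes', 'get_mkt_info']
print(get_eps_sketch(fake, 'brokerd'))   # []
print(enable_mods_for(fake, 'brokerd'))  # ['fakebroker.api', 'fakebroker.broker']
```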

## Daemon lifecycle conventions

Every daemon-kind follows the same trio of fns (see
`piker/brokers/_daemon.py` + `piker/data/_daemon.py`
as the canonical pair):

- `_setup_persistent_<kind>()`: a `@tractor.context`
  "lifetime fixture" run via
  `Services.start_service_task()`; does console-log
  setup ONCE for the actor, allocs any actor-global
  state (eg. datad's `_FeedsBus`), then
  `await ctx.started()` + `trio.sleep_forever()`.
- `<kind>_init()`: builds `enable_modules` + actor
  name `f'<kind>.{brokername}'` and copies backend
  `_spawn_kwargs` (CRITICAL: `ib` needs
  `infect_asyncio=True` in EVERY daemon-kind).
- `spawn_<kind>()` + `maybe_spawn_<kind>()`: thin
  wrappers over `Services.actor_n.start_actor()` and
  `piker.service.maybe_spawn_daemon()` (registry
  find-or-spawn w/ per-service-name locking).
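
A rough sketch of what `<kind>_init()` computes, assuming the backend exposes `_spawn_kwargs` as a plain dict. `kind_init_sketch()`, its params and the submod names are hypothetical. The real fns live in `piker/brokers/_daemon.py` and `piker/data/_daemon.py`.

```python
from types import ModuleType
from typing import Any


def kind_init_sketch(
    kind: str,  # eg. 'datad' | 'brokerd'
    brokername: str,
    backend: ModuleType,
    daemon_mod: str,  # mod defining `_setup_persistent_<kind>()`
    submods: list[str],  # per-kind backend submods to RPC-enable
) -> tuple[str, list[str], dict[str, Any]]:
    '''
    Compute `(actor_name, enable_modules, spawn_kwargs)` for one
    daemon-kind + backend pair.

    '''
    actor_name = f'{kind}.{brokername}'
    enable_modules = [daemon_mod, *submods]

    # COPY the backend's spawn kwargs so per-daemon tweaks never leak
    # back into the backend module; for `ib` this is where
    # `infect_asyncio=True` must come through for EVERY daemon-kind.
    spawn_kwargs = dict(getattr(backend, '_spawn_kwargs', {}))
    return actor_name, enable_modules, spawn_kwargs


ib = ModuleType('ib')  # fake stand-in for the `ib` backend pkg
ib._spawn_kwargs = {'infect_asyncio': True}

datad = kind_init_sketch(
    'datad', 'ib', ib,
    daemon_mod='piker.data._daemon',
    submods=['piker.brokers.ib.feed'],  # illustrative
)
brokerd = kind_init_sketch(
    'brokerd', 'ib', ib,
    daemon_mod='piker.brokers._daemon',
    submods=['piker.brokers.ib.broker'],  # illustrative
)
print(datad[0], brokerd[0])  # datad.ib brokerd.ib
```

Copying (rather than aliasing) `_spawn_kwargs` keeps one daemon's spawn tweaks from leaking into the next daemon-kind spawned for the same backend.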

Caps-sec model: `enable_modules` gates RPC entry ONLY
— python imports are unrestricted in-proc. Keep each
daemon's enable set minimal; the (credentialed)
`brokerd` must never RPC-enable `piker.data.*` feed
mods.
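
A toy model of that split, using stdlib mods as stand-ins for piker ones. `ToyActor` and `NotEnabledError` are hypothetical and model ONLY the RPC gate (`tractor` has its own error type and dispatch path).

```python
import importlib
import math  # imported freely in-proc, yet NOT RPC-reachable below
from typing import Any


class NotEnabledError(Exception):
    '''Hypothetical: an RPC targeted a module outside `enable_modules`.'''


class ToyActor:
    '''
    Models ONLY the RPC gate: remote callers can reach fns in
    allow-listed modules, nothing else. In-proc imports are unaffected.

    '''
    def __init__(self, enable_modules: list[str]) -> None:
        self.enable_modules = frozenset(enable_modules)

    def rpc(self, ns: str, funcname: str, *args: Any) -> Any:
        if ns not in self.enable_modules:
            raise NotEnabledError(f'{ns!r} is not RPC-enabled')
        mod = importlib.import_module(ns)
        return getattr(mod, funcname)(*args)


# `json` stands in for an order-ctl mod (enabled), `math` for a feed mod
# that a credentialed `brokerd` must NOT expose
brokerd = ToyActor(enable_modules=['json'])

print(brokerd.rpc('json', 'dumps', [1, 2]))  # [1, 2]

try:
    brokerd.rpc('math', 'sqrt', 4.0)
except NotEnabledError as err:
    print(err)  # 'math' is not RPC-enabled

# ..while ordinary in-proc code still calls it freely
print(math.sqrt(4.0))  # 2.0
```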

## Actor-local state: the #1 split hazard

Module-globals and instance caches are PER-ACTOR.
Anything that "just worked" because two subsystems
shared a process will break when they're split into
sibling actors. Canonical example: `ib`'s
`Client._contracts` was warmed by feed-side
`get_mkt_info()` in-proc; post datad/brokerd-split
the trading actor must warm it itself (eagerly at
`open_trade_dialog()` startup for open pps/orders +
lazily per order request via
`symbols.cache_contract()`).
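
A minimal sketch of the eager + lazy warming pattern, w/ two instances of one cache class standing in for the same module-global in two sibling actors. `ContractCacheSketch`, `fake_lookup()` and the fqme strings are hypothetical; this mirrors the shape of `symbols.cache_contract()`, not its real signature or the `Client._contracts` internals.

```python
import asyncio
from typing import Awaitable, Callable


class ContractCacheSketch:
    '''
    Actor-local `fqme -> contract` cache: each actor holds its OWN
    instance, so a sibling's warm-up never populates this one.

    '''
    def __init__(self, lookup: Callable[[str], Awaitable[dict]]) -> None:
        self._lookup = lookup
        self._contracts: dict[str, dict] = {}
        self.lookups = 0  # backend round-trips actually made

    async def get(self, fqme: str) -> dict:
        # lazy path: fetch on first use, eg. per order request
        if fqme not in self._contracts:
            self.lookups += 1
            self._contracts[fqme] = await self._lookup(fqme)
        return self._contracts[fqme]

    async def warm(self, fqmes: list[str]) -> None:
        # eager path: pre-fetch everything known at dialog startup
        await asyncio.gather(*(self.get(fqme) for fqme in fqmes))


async def fake_lookup(fqme: str) -> dict:
    await asyncio.sleep(0)  # pretend to hit the broker API
    return {'fqme': fqme}


async def main() -> tuple[ContractCacheSketch, ContractCacheSketch]:
    # two instances model the SAME cache living in two sibling actors
    feed_side = ContractCacheSketch(fake_lookup)
    trade_side = ContractCacheSketch(fake_lookup)

    await feed_side.get('aapl.nasdaq.ib')  # warms the feed actor's copy only
    assert not trade_side._contracts  # trading actor still sees nothing

    # eager: contracts for open pps/orders at `open_trade_dialog()` startup
    await trade_side.warm(['aapl.nasdaq.ib', 'mnq.cme.ib'])
    # lazy: an order for an unseen symbol warms on demand, once
    await trade_side.get('tsla.nasdaq.ib')
    await trade_side.get('tsla.nasdaq.ib')  # hit, no extra lookup
    return feed_side, trade_side


feed_side, trade_side = asyncio.run(main())
print(feed_side.lookups, trade_side.lookups)  # 1 3
```

`warm()` reuses the lazy `get()` path so eager and lazy warming can't diverge. Concurrent `get()`s for the SAME fqme may both miss and double-fetch, which is harmless here but worth a lock if lookups are costly.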

When moving code across actor boundaries ALWAYS audit:
- module-global registries (`feed._bus`,
  `_accounts2clients`, `_client_cache`, ..)
- `@async_lifo_cache`/`maybe_open_context` caches
  (NOTE: `async_lifo_cache` keys on POSITIONAL args
  only; a cache-hit SKIPS the fn body and thus any
  side-effect writes!)
- logging handler placement (see [gotchas.md](gotchas.md))

## `tractor` primitives as used here

- `@tractor.context` eps: `await ctx.started(val)`
  unblocks the caller w/ `val`; long-lived eps then
  `ctx.open_stream()` or `sleep_forever()`.
- discovery: `tractor.find_actor()` via
  `piker.service.find_service()`;
  `wait_for_actor(name, registry_addr=...)`;
  `query_actor(name, regaddr=...)` yields
  `(sockaddr, portal)`. Addrs are wrapped
  `tractor.discovery._addr.Address` types — use
  `wrap_address()` to normalize raw tuples and
  `.unwrap()` for comparisons.
- runtime-vars: `_runtime_vars['piker_vars']` is
  inherited down the spawn tree; used eg. for
  `piker_test_dir` config isolation — read LAZILY at
  use-time, never at import time (subactors only get
  vars post runtime-boot).
- cancellation semantics (modern tractor): a
  `ContextCancelled` whose `.canceller` is your own
  actor is ABSORBED (clean exit, nothing raised);
  single-exc groups collapse (`collapse_eg`) so eg.
  a KBI propagates bare. Exc attrs:
  `RemoteActorError.boxed_type` (not `.type`).
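
Why the runtime-vars read must be lazy, sketched w/ a plain dict standing in for `tractor`'s `_runtime_vars`: a module-level read captures whatever the dict held at import time, which in a subactor is before the parent's vars arrive. `boot_runtime()` and the `*_test_dir()` helpers are hypothetical.

```python
from typing import Any, Optional

# stand-in for the runtime-vars dict: empty in a fresh subactor until
# the runtime boots and delivers the parent's vars
_runtime_vars: dict[str, Any] = {}

# BAD: evaluated once at import time, ie. before runtime-boot
_IMPORT_TIME_DIR: Optional[str] = (
    _runtime_vars.get('piker_vars', {}).get('piker_test_dir')
)


def eager_test_dir() -> Optional[str]:
    return _IMPORT_TIME_DIR


def lazy_test_dir() -> Optional[str]:
    # GOOD: looked up at use-time, after the vars have arrived
    return _runtime_vars.get('piker_vars', {}).get('piker_test_dir')


def boot_runtime(inherited: dict[str, Any]) -> None:
    # hypothetical hook: spawn machinery fills in the inherited vars
    _runtime_vars.update(inherited)


boot_runtime({'piker_vars': {'piker_test_dir': '/tmp/piker_test'}})
print(eager_test_dir())  # None
print(lazy_test_dir())   # /tmp/piker_test
```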

## `to_asyncio` (infect-asyncio) integration

For `ib` (and `deribit`) the backend client runs on
an embedded `asyncio` loop via
`tractor.to_asyncio.open_channel_from()` +
`LinkedTaskChannel`.

Rules learned the hard way:
- a shared req/resp channel MUST correlate responses
  to requests (see `MethodProxy._run_method()`'s
  `mid` protocol in `piker/brokers/ib/api.py`):
  caller cancellation (eg. `move_on_after` timeouts)
  otherwise orphans a response and silently skews
  every later result off-by-one.
- the aio-side relay must catch + ship back ALL
  (non-cancel) exceptions as `{'exception': err}`
  resps; an escaping error kills the relay task ->
  channel -> proxy nursery -> the whole dialog,
  bypassing every caller-side guard.
- `TrioTaskExited` ("child asyncio task is still
  running?") on teardown is a known wart family;
  prefer upstream `tractor` fixes over piker-side
  bandaids.
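
A stdlib-only sketch of both rules, w/ two `asyncio.Queue`s in place of a `LinkedTaskChannel` and both halves running on one asyncio loop (in piker the caller sits on the `trio` side). `ToyProxy`, `relay()`, the `slow`/`echo`/`boom` demo methods and the `mid`/`result` msg keys are hypothetical; only the `mid`-correlation idea and the `{'exception': err}` resp shape come from the rules above. This is not the real `MethodProxy` code.

```python
import asyncio
import itertools
from typing import Any


async def relay(reqs: asyncio.Queue, resps: asyncio.Queue, methods: dict) -> None:
    # aio-side relay: run each request, ALWAYS answer with its `mid`
    while True:
        mid, meth, kwargs = await reqs.get()
        try:
            result = await methods[meth](**kwargs)
        except Exception as err:
            # `CancelledError` is a `BaseException` (py3.8+) so it is NOT
            # caught; any other error is shipped back rather than
            # escaping and killing the relay task.
            await resps.put({'mid': mid, 'exception': err})
        else:
            await resps.put({'mid': mid, 'result': result})


class ToyProxy:
    def __init__(self, reqs: asyncio.Queue, resps: asyncio.Queue) -> None:
        self._reqs, self._resps = reqs, resps
        self._mids = itertools.count()
        self._waiting: dict[int, asyncio.Future] = {}
        self.orphans = 0  # late resps dropped after their caller gave up

    async def dispatch(self) -> None:
        # route every resp to the waiter holding the SAME `mid`
        while True:
            msg = await self._resps.get()
            fut = self._waiting.pop(msg['mid'], None)
            if fut is None or fut.done():
                self.orphans += 1  # drop it instead of skewing later calls
            elif 'exception' in msg:
                fut.set_exception(msg['exception'])
            else:
                fut.set_result(msg['result'])

    async def run_method(self, meth: str, **kwargs: Any) -> Any:
        mid = next(self._mids)
        fut = asyncio.get_running_loop().create_future()
        self._waiting[mid] = fut
        await self._reqs.put((mid, meth, kwargs))
        try:
            return await fut
        finally:
            # on timeout/cancel forget the `mid` so a late resp is dropped
            self._waiting.pop(mid, None)


async def slow(x: int) -> int:
    await asyncio.sleep(0.05)
    return x


async def echo(x: int) -> int:
    return x


async def boom() -> None:
    raise ValueError('bad contract')


async def main() -> tuple[int, str, int, int]:
    reqs: asyncio.Queue = asyncio.Queue()
    resps: asyncio.Queue = asyncio.Queue()
    proxy = ToyProxy(reqs, resps)
    methods = {'slow': slow, 'echo': echo, 'boom': boom}
    bg = [
        asyncio.create_task(relay(reqs, resps, methods)),
        asyncio.create_task(proxy.dispatch()),
    ]
    try:
        try:  # caller times out; its resp arrives late
            await asyncio.wait_for(proxy.run_method('slow', x=1), 0.01)
        except asyncio.TimeoutError:
            pass
        got = await proxy.run_method('echo', x=2)  # 2, NOT the stale 1
        try:
            await proxy.run_method('boom')
            err = ''
        except ValueError as exc:
            err = str(exc)
        after = await proxy.run_method('echo', x=3)  # relay survived
        return got, err, after, proxy.orphans
    finally:
        for task in bg:
            task.cancel()
        await asyncio.gather(*bg, return_exceptions=True)


result = asyncio.run(main())
print(result)  # (2, 'bad contract', 3, 1)
```

Without the `mid` lookup in `dispatch()`, the late `slow` resp (`1`) would be handed to the `echo` caller and every later call would read its predecessor's result.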

See [gotchas.md](gotchas.md) for the symptom->cause
registry and [debug-recipes.md](debug-recipes.md) for
forensics techniques.

Add `piker-conc-expert` claude-code skill

Distilled distributed-runtime + structured-concurrency expertise for
the `tractor` actor-tree, auto-applied (not user-invocable) when working
on daemon/service arch, RPC eps, `to_asyncio` integration, cancellation
semantics or hang/wedge/skew debugging.

Deats,
- `SKILL.md`: the core mental model incl. the post-split actor-tree
  taxonomy (`datad`/`brokerd`/`emsd`/etc.), daemon lifecycle
  conventions, actor-local-state hazards and `tractor` primitive usage
  as deployed here.
- `gotchas.md`: symptom -> cause -> fix entries distilled from this
  branch's (datad|brokerd)-split debugging (eg. un-warmed contract
  caches, stale IPC resps, double/bare log records, ib client-id
  collisions).
- `debug-recipes.md`: actor-system forensics incl. wedged actor triage,
  hang-proof test gating and regression vs pre-existing attribution.

(this patch was generated in some part by [`claude-code`][claude-code-gh])

[claude-code-gh]: https://github.com/anthropics/claude-code

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 00:21:48 +00:00