piker/.claude/skills/piker-conc-expert/SKILL.md

Piker Concurrency & Runtime Expertise

The distilled mental model for piker's distributed runtime: a trio-structured actor tree supervised by tractor (pinned to git main) where every long-lived subsystem is a named daemon-actor talking over ctx/stream IPC.

Actor tree & daemon taxonomy

pikerd                  root supervisor + registry
├── datad.<broker>      feed bus, shm writers, tsp
│                       history, symbol search
├── brokerd.<broker>    live order-ctl ONLY; lazily
│                       spawned by emsd, credentialed
├── emsd                dark-clearing + order routing
│   └── paperboi.<broker>  sim-clearing (paper mode)
└── samplerd            singleton OHLC clock/increment

Key invariants:

  • datad hosts all piker.data.validate._eps['datad'] eps; brokerd hosts only the ['brokerd'] (order-ctl) ones. The _eps table in piker/data/validate.py is the authoritative contract; get_eps(mod, kind) introspects a backend's support.
  • brokerd.<broker> is booted in EXACTLY one place: open_brokerd_dialog() in piker/clearing/_ems.py (with a portal: override for the piker ledger ad-hoc actor). Chart-only + paper sessions run with ZERO brokerd procs. Never add a data-path spawn!
  • backends declare per-daemon-kind submods via _datad_mods/_brokerd_mods in their __init__.py (fallback: __enable_modules__).
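
The contract-table idea from the first invariant can be sketched roughly as below. This is a minimal illustration and not piker's actual code: the endpoint names listed in `_eps` and the `fake_backend` object are hypothetical stand-ins, and the real `get_eps()` return type may well differ.

```python
from types import SimpleNamespace
from typing import Callable

# HYPOTHETICAL contract table: daemon-kind -> endpoint names a backend
# may provide for that kind. The real table lives in
# piker/data/validate.py; these entries are illustrative only.
_eps: dict[str, list[str]] = {
    'datad': ['stream_quotes', 'open_history_client', 'open_symbol_search'],
    'brokerd': ['open_trade_dialog'],
}


def get_eps(mod, kind: str) -> dict[str, Callable]:
    '''
    Return the subset of `kind` endpoints which `mod` actually defines.

    '''
    return {
        name: ep
        for name in _eps[kind]
        if (ep := getattr(mod, name, None)) is not None
    }


# a fake backend supporting quotes + order-ctl but no history/search.
async def stream_quotes(): ...
async def open_trade_dialog(): ...


fake_backend = SimpleNamespace(
    stream_quotes=stream_quotes,
    open_trade_dialog=open_trade_dialog,
)

datad_eps = get_eps(fake_backend, 'datad')
brokerd_eps = get_eps(fake_backend, 'brokerd')
```

Keeping one table keyed by daemon-kind means "which actor hosts which endpoint" is answered in a single place rather than by per-backend convention.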

Daemon lifecycle conventions

Every daemon-kind follows the same trio of fns (see piker/brokers/_daemon.py + piker/data/_daemon.py as the canonical pair):

  • _setup_persistent_<kind>(): a @tractor.context “lifetime fixture” run via Services.start_service_task(); does console-log setup ONCE for the actor, allocs any actor-global state (eg. datad's _FeedsBus), then await ctx.started() + trio.sleep_forever().
  • <kind>_init(): builds enable_modules + actor name f'<kind>.{brokername}' and copies backend _spawn_kwargs (CRITICAL: ib needs infect_asyncio=True in EVERY daemon-kind).
  • spawn_<kind>() + maybe_spawn_<kind>(): thin wrappers over Services.actor_n.start_actor() and piker.service.maybe_spawn_daemon() (registry find-or-spawn w/ per-service-name locking).
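
What a `<kind>_init()` step computes can be sketched as below. This is a hedged illustration only: `daemon_init()` is a hypothetical name, the submodule names on the fake backend are made up, and whether the real code prefixes submods with the package path or also enables the package itself is an assumption.

```python
from types import SimpleNamespace


def daemon_init(
    kind: str,  # eg. 'datad' or 'brokerd'
    brokername: str,
    brokermod,  # the imported backend package (shape assumed here)
) -> dict:
    '''
    Build the spawn kwargs for one `<kind>.<broker>` daemon-actor.

    '''
    # per-kind submod list, falling back to the catch-all attr.
    submods: list[str] = getattr(
        brokermod,
        f'_{kind}_mods',
        getattr(brokermod, '__enable_modules__', []),
    )
    modpath: str = brokermod.__name__
    kwargs: dict = {
        'name': f'{kind}.{brokername}',
        # ASSUMPTION: submods are relative to the backend package.
        'enable_modules': [f'{modpath}.{sub}' for sub in submods],
    }
    # copy backend spawn kwargs for EVERY daemon-kind,
    # eg. ib's `infect_asyncio=True`.
    kwargs.update(getattr(brokermod, '_spawn_kwargs', {}))
    return kwargs


# fake backend pkg; the submod names are illustrative only.
fake_ib = SimpleNamespace(
    __name__='piker.brokers.ib',
    _datad_mods=['feed', 'symbols'],
    _brokerd_mods=['broker'],
    _spawn_kwargs={'infect_asyncio': True},
)

datad_kwargs = daemon_init('datad', 'ib', fake_ib)
brokerd_kwargs = daemon_init('brokerd', 'ib', fake_ib)
```

Copying `_spawn_kwargs` inside the shared init helper, rather than in each spawner, is what keeps a new daemon-kind from silently dropping `infect_asyncio=True`.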

Caps-sec model: enable_modules gates RPC entry ONLY; python imports are unrestricted in-proc. Keep each daemon's enable set minimal; the (credentialed) brokerd must never RPC-enable piker.data.* feed mods.
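
One way to keep that rule checkable is a small audit over the enable set, for instance in a test. The helper below is hypothetical (nothing by this name is claimed to exist in piker) and only inspects module-path strings:

```python
def audit_brokerd_enable_set(enable_modules: list[str]) -> list[str]:
    '''
    Return any feed-side (`piker.data`) module paths which leaked
    into a brokerd enable set; an empty list means the set is clean.

    '''
    return [
        mod
        for mod in enable_modules
        if mod == 'piker.data' or mod.startswith('piker.data.')
    ]


# illustrative module paths only
clean = audit_brokerd_enable_set(['piker.brokers.ib.broker'])
leaky = audit_brokerd_enable_set([
    'piker.brokers.ib.broker',
    'piker.data.feed',
])
```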

Actor-local state: the #1 split hazard

Module-globals and instance caches are PER-ACTOR. Anything that “just worked” because two subsystems shared a process will break when they're split into sibling actors. Canonical example: ib's Client._contracts was warmed by feed-side get_mkt_info() in-proc; post datad/brokerd-split the trading actor must warm it itself (eagerly at open_trade_dialog() startup for open pps/orders + lazily per order request via symbols.cache_contract()).

When moving code across actor boundaries ALWAYS audit:

  • module-global registries (feed._bus, _accounts2clients, _client_cache, ..)
  • @async_lifo_cache/maybe_open_context caches (NOTE: async_lifo_cache keys on POSITIONAL args only; a cache-hit SKIPS the fn body and thus any side-effect writes!)
  • logging handler placement (see gotchas.md)

tractor primitives as used here

  • @tractor.context eps: await ctx.started(val) unblocks the caller w/ val; long-lived eps then ctx.open_stream() or sleep_forever().
  • discovery: tractor.find_actor() via piker.service.find_service(); wait_for_actor(name, registry_addr=...); query_actor(name, regaddr=...) yields (sockaddr, portal). Addrs are wrapped tractor.discovery._addr.Address types — use wrap_address() to normalize raw tuples and .unwrap() for comparisons.
  • runtime-vars: _runtime_vars['piker_vars'] is inherited down the spawn tree; used eg. for piker_test_dir config isolation — read LAZILY at use-time, never at import time (subactors only get vars post runtime-boot).
  • cancellation semantics (modern tractor): a ContextCancelled whose .canceller is your own actor is ABSORBED (clean exit, nothing raised); single-exc groups collapse (collapse_eg) so eg. a KBI propagates bare. Exc attrs: RemoteActorError.boxed_type (not .type).

to_asyncio (infect-asyncio) integration

For ib (and deribit) the backend client runs on an embedded asyncio loop via tractor.to_asyncio.open_channel_from() + LinkedTaskChannel.

Rules learned the hard way:

  • a shared req/resp channel MUST correlate responses to requests (see MethodProxy._run_method()'s mid protocol in piker/brokers/ib/api.py): caller cancellation (eg. move_on_after timeouts) otherwise orphans a response and silently skews every later result off-by-one.
  • the aio-side relay must catch + ship back ALL (non-cancel) exceptions as {'exception': err} resps; an escaping error kills the relay task -> channel -> proxy nursery -> the whole dialog, bypassing every caller-side guard.
  • TrioTaskExited (“child asyncio task is still running?”) on teardown is a known wart family; prefer upstream tractor fixes over piker-side bandaids.
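
The first two rules can be demonstrated with a pure-asyncio sketch. It is not piker's MethodProxy nor tractor's LinkedTaskChannel: two `asyncio.Queue`s stand in for the shared channel, and `relay()`, `run_method()` and the `(mid, delay, value)` request shape are all hypothetical.

```python
import asyncio
import itertools

_mids = itertools.count()  # monotonic message ids


async def relay(reqs: asyncio.Queue, resps: asyncio.Queue) -> None:
    '''
    "aio-side" relay: serve each request and ALWAYS send a response.

    '''
    while True:
        mid, delay, value = await reqs.get()
        try:
            await asyncio.sleep(delay)
            if value is None:
                raise ValueError('bad request')
            resp = {'mid': mid, 'result': value}
        except Exception as err:  # cancellation is NOT caught here
            resp = {'mid': mid, 'exception': err}
        await resps.put(resp)


async def run_method(reqs, resps, delay, value, timeout=1.0):
    mid = next(_mids)
    await reqs.put((mid, delay, value))

    async def wait_for_mine():
        while True:
            resp = await resps.get()
            if resp['mid'] != mid:
                continue  # orphan from a cancelled caller: drop it
            if 'exception' in resp:
                raise resp['exception']
            return resp['result']

    return await asyncio.wait_for(wait_for_mine(), timeout)


async def main() -> list[str]:
    reqs, resps = asyncio.Queue(), asyncio.Queue()
    relay_task = asyncio.create_task(relay(reqs, resps))
    results: list[str] = []
    try:
        # 1st caller times out before its (slow) response arrives.
        try:
            await run_method(reqs, resps, delay=0.2, value='slow', timeout=0.05)
        except asyncio.TimeoutError:
            results.append('timeout')
        # 2nd caller must NOT receive the orphaned 'slow' response.
        results.append(await run_method(reqs, resps, delay=0.0, value='fast'))
        # a relayed error raises in the caller; the relay survives.
        try:
            await run_method(reqs, resps, delay=0.0, value=None)
        except ValueError:
            results.append('error')
        results.append(await run_method(reqs, resps, delay=0.0, value='again'))
    finally:
        relay_task.cancel()
    return results


results = asyncio.run(main())
```

Without the `mid` check the second caller would have received `'slow'`, and every later caller would be shifted by one response in the same way.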

See gotchas.md for the symptom->cause registry and debug-recipes.md for forensics techniques.