macOS Compatibility Fixes for Piker/Tractor
This guide documents macOS-specific issues encountered when running piker, along with their solutions. These fixes address platform differences between Linux and macOS in areas like socket credentials, shared memory naming, and async runtime coordination.
Table of Contents
- Socket Credential Passing
- Shared Memory Name Length Limits
- Shared Memory Cleanup Race Conditions
- Async Runtime (Trio/AsyncIO) Coordination
1. Socket Credential Passing
Problem
On Linux, tractor uses SO_PASSCRED and SO_PEERCRED socket options for Unix domain socket credential passing. macOS doesn’t support these constants, causing AttributeError when importing.
# Linux code that fails on macOS
from socket import SO_PASSCRED, SO_PEERCRED  # AttributeError on macOS

Error Message
AttributeError: module 'socket' has no attribute 'SO_PASSCRED'
Root Cause
- Linux: uses SO_PASSCRED (to enable credential passing) and SO_PEERCRED (to retrieve peer credentials)
- macOS: uses LOCAL_PEERCRED (value 0x0001) instead, and doesn't require enabling credential passing
Solution
Make the socket credential imports platform-conditional:
File: tractor/ipc/_uds.py (or equivalent in piker if duplicated)
import struct
import sys
from socket import (
    socket,
    AF_UNIX,
    SOCK_STREAM,
    SOL_SOCKET,
)

# Platform-specific credential passing constants
if sys.platform == 'linux':
    from socket import SO_PASSCRED, SO_PEERCRED

elif sys.platform == 'darwin':  # macOS
    # macOS uses LOCAL_PEERCRED instead of SO_PEERCRED
    # and doesn't need SO_PASSCRED
    LOCAL_PEERCRED = 0x0001
    SO_PEERCRED = LOCAL_PEERCRED  # Alias for compatibility
    SO_PASSCRED = None  # Not needed on macOS

else:
    # Other platforms - may need additional handling
    SO_PASSCRED = None
    SO_PEERCRED = None

# When creating a (Linux) socket, enable credential passing
if SO_PASSCRED is not None:
    sock.setsockopt(SOL_SOCKET, SO_PASSCRED, 1)

# When getting peer credentials
if SO_PEERCRED is not None:
    if sys.platform == 'darwin':
        # NOTE: on macOS the option lives at level SOL_LOCAL (0), not
        # SOL_SOCKET, and returns a `struct xucred` rather than the
        # Linux 3-int `struct ucred` - see the sketch further below.
        SOL_LOCAL = 0  # from <sys/un.h>
        creds = sock.getsockopt(SOL_LOCAL, LOCAL_PEERCRED, 76)  # sizeof(struct xucred)
    else:
        # Linux: (pid, uid, gid) as three native ints
        creds = sock.getsockopt(SOL_SOCKET, SO_PEERCRED, struct.calcsize('3i'))

Implementation Notes
- The LOCAL_PEERCRED value 0x0001 is specific to macOS (from <sys/un.h>)
- macOS doesn't require explicitly enabling credential passing like Linux does
- Consider using ctypes or cffi for a more robust solution if available (a plain stdlib sketch follows below)
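For reference, reading the peer's credentials on macOS can be done with the stdlib alone. The following is a minimal sketch, assuming the struct xucred layout from <sys/ucred.h> (version, effective uid, group count, 16 groups); the helper name get_peer_creds_macos is illustrative and not part of tractor:

import struct
from socket import socket

SOL_LOCAL = 0            # level for LOCAL_* options, from <sys/un.h>
LOCAL_PEERCRED = 0x0001  # returns a `struct xucred` describing the peer

# struct xucred: u_int cr_version; uid_t cr_uid; short cr_ngroups; gid_t cr_groups[16]
_XUCRED_FMT = 'IIh16I'   # native alignment pads the short up to the gid array (76 bytes)

def get_peer_creds_macos(sock: socket) -> tuple[int, list[int]]:
    '''
    Return the peer's (effective uid, groups) on a connected AF_UNIX socket.
    Note: unlike Linux's SO_PEERCRED, no pid is included.
    '''
    raw = sock.getsockopt(SOL_LOCAL, LOCAL_PEERCRED, struct.calcsize(_XUCRED_FMT))
    _version, uid, ngroups, *groups = struct.unpack(_XUCRED_FMT, raw)
    return uid, groups[:ngroups]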
2. Shared Memory Name Length Limits
Problem
macOS limits POSIX shared memory names to 31 characters (defined as PSHMNAMLEN in <sys/posix_shm_internal.h>). Piker generates long descriptive names that exceed this limit, causing OSError.
# Long name that works on Linux but fails on macOS
shm_name = "piker_quoter_tsla.nasdaq.ib_hist_1m" # 39 chars - too long!Error Message
OSError: [Errno 63] File name too long: '/piker_quoter_tsla.nasdaq.ib_hist_1m'
Root Cause
- Linux: supports shared memory names up to 255 characters
- macOS: limits names to 31 characters (including the leading /)
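The limit is easy to reproduce with nothing but the stdlib; a quick sketch (only macOS raises here, Linux accepts the long name):

from multiprocessing import shared_memory

long_name = 'piker_quoter_tsla.nasdaq.ib_hist_1m'  # 35 chars + the leading '/'
try:
    shm = shared_memory.SharedMemory(name=long_name, create=True, size=4096)
except OSError as exc:
    print(exc)  # macOS: [Errno 63] File name too long
else:
    shm.close()
    shm.unlink()  # Linux: creation succeeds, so clean up

# A hashed name stays comfortably under the limit on both platforms:
shm = shared_memory.SharedMemory(name='p_74c86c7228dd773b', create=True, size=4096)
shm.close()
shm.unlink()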
Solution
Implement automatic name shortening for macOS while preserving the original key for lookups:
File: piker/data/_sharedmem.py
import hashlib
import sys

import numpy as np

# `Struct` and `def_iohlcv_fields` come from the surrounding piker
# module and are shown here only for context.


def _shorten_key_for_macos(key: str) -> str:
    '''
    macOS has a 31 character limit for POSIX shared memory names.
    Hash long keys to fit within this limit while maintaining uniqueness.
    '''
    # macOS shm_open() has a 31 char limit (PSHMNAMLEN), counting the
    # leading '/' carried by the OS-level name.
    # Use format: p_<hash16> where hash is the first 16 hex chars of sha256.
    # This gives us: / + p_ + 16 hex chars = 19 chars, well under the limit.
    # We keep the 'p' prefix to indicate it's from piker.
    if len(key) <= 30:  # 30 chars + the leading '/' == 31
        return key

    # Create a hash of the full key
    key_hash = hashlib.sha256(key.encode()).hexdigest()[:16]
    short_key = f'p_{key_hash}'
    return short_key


class _Token(Struct, frozen=True):
    '''
    Internal representation of a shared memory "token"
    which can be used to key a system wide post shm entry.
    '''
    shm_name: str  # actual OS-level name (may be shortened on macOS)
    shm_first_index_name: str
    shm_last_index_name: str
    dtype_descr: tuple
    size: int  # in struct-array index / row terms

    key: str | None = None  # original descriptive key (for lookup)

    def __eq__(self, other) -> bool:
        '''
        Compare tokens based on shm names and dtype, ignoring the key field.
        The key field is only used for lookups, not for token identity.
        '''
        if not isinstance(other, _Token):
            return False

        return (
            self.shm_name == other.shm_name
            and self.shm_first_index_name == other.shm_first_index_name
            and self.shm_last_index_name == other.shm_last_index_name
            and self.dtype_descr == other.dtype_descr
            and self.size == other.size
        )

    def __hash__(self) -> int:
        '''Hash based on the same fields used in __eq__'''
        return hash((
            self.shm_name,
            self.shm_first_index_name,
            self.shm_last_index_name,
            self.dtype_descr,
            self.size,
        ))


def _make_token(
    key: str,
    size: int,
    dtype: np.dtype | None = None,
) -> _Token:
    '''
    Create a serializable token that uniquely identifies a shared memory segment.
    '''
    if dtype is None:
        dtype = def_iohlcv_fields

    # On macOS, shorten long keys to fit the 31-char limit
    if sys.platform == 'darwin':
        shm_name = _shorten_key_for_macos(key)
        shm_first = _shorten_key_for_macos(key + "_first")
        shm_last = _shorten_key_for_macos(key + "_last")
    else:
        shm_name = key
        shm_first = key + "_first"
        shm_last = key + "_last"

    return _Token(
        shm_name=shm_name,
        shm_first_index_name=shm_first,
        shm_last_index_name=shm_last,
        dtype_descr=tuple(np.dtype(dtype).descr),
        size=size,
        key=key,  # Store original key for lookup
    )

Key Design Decisions
- Hash-based shortening: uses SHA256 to ensure uniqueness and avoid collisions
- Preserve original key: store the original descriptive key in the _Token for debugging and lookups
- Custom equality: the __eq__ and __hash__ methods ignore the key field so tokens are compared by their actual shm properties
- Platform detection: only applies shortening on macOS (sys.platform == 'darwin')
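To make the behaviour concrete, a quick usage sketch of the helpers above (the hashed digest shown is illustrative):

token = _make_token(key='piker_quoter_tsla.nasdaq.ib_hist_1m', size=1000)

# On macOS the OS-level name is hashed (e.g. 'p_74c86c7228dd773b')
# while the human-readable key is preserved for lookups/debugging:
print(token.shm_name)  # shortened name on macOS, the full key on Linux
print(token.key)       # 'piker_quoter_tsla.nasdaq.ib_hist_1m'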
Edge Cases to Consider
- Token serialization across processes (the key field must survive IPC)
- Token lookup in dictionaries and caches
- Debugging output (use the key field for human-readable names)
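The first two edge cases (surviving IPC and dict/cache lookups) rest on the custom equality; a small illustrative check against the snippet above, assuming the descriptive key may be dropped in transit:

a = _make_token(key='piker_quoter_tsla.nasdaq.ib_hist_1m', size=1000)
b = _Token(
    shm_name=a.shm_name,
    shm_first_index_name=a.shm_first_index_name,
    shm_last_index_name=a.shm_last_index_name,
    dtype_descr=a.dtype_descr,
    size=a.size,
    key=None,  # descriptive key lost on the wire
)
assert a == b              # token identity ignores `key`
assert hash(a) == hash(b)  # safe to use as a dict/cache key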
3. Shared Memory Cleanup Race Conditions
Problem
During teardown, shared memory segments may be unlinked by one process while another is still trying to clean them up, causing FileNotFoundError to crash the application.
Error Message
FileNotFoundError: [Errno 2] No such file or directory: '/p_74c86c7228dd773b'
Root Cause
In multi-process architectures like tractor, multiple processes may attempt to clean up shared resources simultaneously. Race conditions during shutdown can cause:
- Process A unlinks the shared memory
- Process B tries to unlink the same memory → FileNotFoundError
- The uncaught exception crashes Process B
Solution
Add defensive error handling to catch and log cleanup races:
File: piker/data/_sharedmem.py
class ShmArray:
    # ... existing code ...

    def destroy(self) -> None:
        '''
        Destroy the shared memory segment and cleanup OS resources.
        '''
        if _USE_POSIX:
            # We manually unlink to bypass all the "resource tracker"
            # nonsense meant for non-SC systems.
            shm = self._shm
            name = shm.name
            try:
                shm_unlink(name)
            except FileNotFoundError:
                # Might be a teardown race where another process
                # already unlinked it - this is fine, just log it
                log.warning(f'Shm for {name} already unlinked?')

        # Also cleanup the index counters
        if hasattr(self, '_first'):
            try:
                self._first.destroy()
            except FileNotFoundError:
                log.warning('First index shm already unlinked?')

        if hasattr(self, '_last'):
            try:
                self._last.destroy()
            except FileNotFoundError:
                log.warning('Last index shm already unlinked?')


class SharedInt:
    # ... existing code ...

    def destroy(self) -> None:
        if _USE_POSIX:
            # We manually unlink to bypass all the "resource tracker"
            # nonsense meant for non-SC systems.
            name = self._shm.name
            try:
                shm_unlink(name)
            except FileNotFoundError:
                # might be a teardown race here?
                log.warning(f'Shm for {name} already unlinked?')

Implementation Notes
- This fix is platform-agnostic but particularly important on macOS where the shortened names make debugging harder
- The warnings help identify cleanup races during development
- Consider adding metrics/counters if cleanup races become frequent
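The pattern is easy to exercise outside piker; a minimal stdlib sketch (not piker's actual classes) that simulates the double-unlink race the except clause guards against:

from multiprocessing import shared_memory

shm = shared_memory.SharedMemory(create=True, size=1024)
shm.close()
shm.unlink()        # the first teardown wins

try:
    shm.unlink()    # a racing second teardown finds the name already gone
except FileNotFoundError:
    print('shm already unlinked by another process - safe to ignore')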
4. Async Runtime (Trio/AsyncIO) Coordination
Problem
The TrioTaskExited error occurs when trio tasks are cancelled while asyncio tasks are still running, indicating improper coordination between the two async runtimes.
Error Message
tractor._exceptions.TrioTaskExited: but the child `asyncio` task is still running?
>>
|_<Task pending name='Task-2' coro=<wait_on_coro_final_result()> ...>
Root Cause
tractor uses “guest mode” to run trio as a guest in asyncio’s event loop (or vice versa). The error occurs when:
- A trio task is cancelled (e.g., user closes the UI)
- The cancellation propagates to cleanup handlers
- Cleanup tries to exit while asyncio tasks are still running
- The translate_aio_errors context manager detects this inconsistent state
Current State
This issue is partially resolved by the other fixes (socket credentials and shared memory), which eliminate the underlying errors that trigger premature cancellation. However, it may still occur in edge cases.
Potential Solutions
Option 1: Improve Cancellation Propagation (Tractor-level)
File: tractor/to_asyncio.py
from contextlib import asynccontextmanager as acm

import trio


@acm
async def translate_aio_errors(
    chan,
    wait_on_aio_task: bool = False,
    suppress_graceful_exits: bool = False,
):
    '''
    Context manager to translate asyncio errors to trio equivalents.
    '''
    try:
        yield

    except trio.Cancelled:
        # When trio is cancelled, ensure the paired asyncio task is
        # also cancelled instead of being abandoned mid-flight.
        # NOTE: how the channel exposes its asyncio task is an
        # implementation detail; `aio_task` below stands in for it.
        aio_task = getattr(chan, '_aio_task', None)

        if (
            wait_on_aio_task
            and aio_task
            and not aio_task.done()
        ):
            # Cancel it gracefully
            aio_task.cancel()

            # Wait briefly for the cancellation to complete;
            # `wait_for_aio_task_completion()` is a placeholder for
            # whatever mechanism bridges back to the asyncio loop.
            with trio.move_on_after(0.5):  # 500ms timeout
                await wait_for_aio_task_completion(aio_task)

        raise  # Re-raise the cancellation

Option 2: Proper Shutdown Sequence (Application-level)
File: piker/brokers/ib/api.py (or similar broker modules)
async def load_clients_for_trio(
    client: Client,
    ...
) -> None:
    '''
    Load asyncio client and keep it running for trio.
    '''
    try:
        # Setup client
        await client.connect()

        # Keep alive - but make it cancellable
        await trio.sleep_forever()

    except trio.Cancelled:
        # Explicit cleanup before propagating cancellation
        log.info("Shutting down asyncio client gracefully")

        # Disconnect client
        if client.isConnected():
            await client.disconnect()

        # Small delay to let asyncio cleanup
        await trio.sleep(0.1)

        raise  # Now safe to propagate

Option 3: Detection and Warning (Current Approach)
The current code detects the issue and raises a clear error. This is acceptable if:
1. The error is rare (only during abnormal shutdown)
2. It doesn't cause data loss
3. Logs provide enough info for debugging
Recommended Approach
For piker: Implement Option 2 (proper shutdown sequence) in broker modules where asyncio is used.
For tractor: Consider Option 1 (improved cancellation propagation) as a library-level enhancement.
Testing
Test the fix by:
# Test graceful shutdown
async def test_asyncio_trio_shutdown():
async with open_channel_from(...) as (first, chan):
# Do some work
await chan.send(msg)
# Trigger cancellation
raise KeyboardInterrupt
# Should cleanup without TrioTaskExited errorSummary of Changes
Files Modified in Piker
- piker/data/_sharedmem.py
  - Added _shorten_key_for_macos() function
  - Modified _Token class to store the original key
  - Modified _make_token() to use shortened names on macOS
  - Added FileNotFoundError handling in destroy() methods
- piker/ui/_display.py
  - Removed assertion that checked for 'hist' in shm name (incompatible with shortened names)
Files to Modify in Tractor (Recommended)
- tractor/ipc/_uds.py
  - Make socket credential imports platform-conditional
  - Handle macOS-specific LOCAL_PEERCRED
- tractor/to_asyncio.py (optional)
  - Improve cancellation propagation between trio and asyncio
  - Add graceful shutdown timeout for asyncio tasks
Platform Detection Pattern
Use this pattern consistently:
import sys

if sys.platform == 'darwin':  # macOS
    # macOS-specific code
    pass

elif sys.platform == 'linux':  # Linux
    # Linux-specific code
    pass

else:
    # Other platforms / fallback
    pass

Testing Checklist
- Test on macOS (Darwin)
- Test on Linux
- Test shared memory with names > 31 chars
- Test multi-process cleanup race conditions
- Test graceful shutdown (Ctrl+C)
- Test abnormal shutdown (kill signal)
- Verify no memory leaks (check /dev/shm on Linux, ipcs -m on macOS)
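The shared memory items on this checklist can be partly automated; a small pytest-style sketch, assuming _shorten_key_for_macos() from above is importable:

def test_shortened_names_fit_macos_limit():
    long_key = 'piker_quoter_tsla.nasdaq.ib_hist_1m'
    short = _shorten_key_for_macos(long_key)

    # with the leading '/' added at the OS level, the name must fit PSHMNAMLEN (31)
    assert len(short) + 1 <= 31
    # deterministic: the same key always maps to the same shortened name
    assert short == _shorten_key_for_macos(long_key)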
Additional Resources
- macOS System Headers:
  - /usr/include/sys/un.h - Unix domain socket constants
  - /usr/include/sys/posix_shm_internal.h - Shared memory limits
Contributing
When implementing these fixes in your own project:
- Test thoroughly on both macOS and Linux
- Add platform guards to prevent cross-platform breakage
- Document platform-specific behavior in code comments
- Consider CI/CD testing on multiple platforms
- Handle edge cases gracefully with proper logging
If you find additional macOS-specific issues, please contribute to this guide!