# Timeseries Optimization: NumPy & Polars
Skill for high-performance timeseries processing using NumPy
and Polars, with a focus on patterns common in financial/trading
applications.
## Core Principle: Vectorization Over Iteration
**Never write Python loops over large arrays.**
Always look for vectorized alternatives.
```python
# BAD: Python loop (slow!)
results = []
for i in range(len(array)):
    if array['time'][i] == target_time:
        results.append(array[i])
# GOOD: vectorized boolean indexing (fast!)
results = array[array['time'] == target_time]
```
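To verify the gap on your own data, a quick `timeit` comparison on synthetic data (an illustrative sketch; the array size, dtype, and names here are made up):
```python
import timeit
import numpy as np

# synthetic 100k-row structured array with a sorted 'time' field
dtype = [('time', 'f8'), ('close', 'f8')]
array = np.zeros(100_000, dtype=dtype)
array['time'] = np.arange(100_000, dtype='f8')
target_time = 50_000.0

loop = lambda: [array[i] for i in range(len(array))
                if array['time'][i] == target_time]
vectorized = lambda: array[array['time'] == target_time]

print('loop:      ', timeit.timeit(loop, number=10))
print('vectorized:', timeit.timeit(vectorized, number=10))
```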
## NumPy Structured Arrays
Piker uses structured arrays for OHLCV data:
```python
import numpy as np

# typical piker array dtype
dtype = [
    ('index', 'i8'),   # absolute sequence index
    ('time', 'f8'),    # unix epoch timestamp
    ('open', 'f8'),
    ('high', 'f8'),
    ('low', 'f8'),
    ('close', 'f8'),
    ('volume', 'f8'),
]
arr = np.array(
    [(0, 1234.0, 100, 101, 99, 100.5, 1000)],
    dtype=dtype,
)
# field access
times = arr['time'] # returns view, not copy
closes = arr['close']
```
### Structured Array Performance Gotchas
**1. Field access in loops is slow**
```python
# BAD: repeated struct field access per iteration
for i, row in enumerate(arr):
    x = row['index']  # struct access per iteration!
    y = row['close']
    process(x, y)

# GOOD: extract fields once, iterate plain arrays
indices = arr['index']  # extract once
closes = arr['close']
for i in range(len(arr)):
    x = indices[i]  # plain array indexing
    y = closes[i]
    process(x, y)
```
**2. Dict comprehensions with struct arrays**
```python
# SLOW: field access per row in Python loop
time_to_row = {
    float(row['time']): {
        'index': float(row['index']),
        'close': float(row['close']),
    }
    for row in matched_rows  # struct field access!
}

# FAST: extract to plain arrays first
times = matched_rows['time'].astype(float)
indices = matched_rows['index'].astype(float)
closes = matched_rows['close'].astype(float)
time_to_row = {
    t: {'index': idx, 'close': cls}
    for t, idx, cls in zip(times, indices, closes)
}
```
## Timestamp Lookup Patterns
### Linear Scan (O(n)) - Avoid!
```python
# BAD: O(n) scan through entire array
for target_ts in timestamps:  # m iterations
    matches = array[array['time'] == target_ts]  # O(n) scan
# Total: O(m * n) - catastrophic for large datasets!
```
**Performance:**
- 1000 lookups × 10k array = 10M comparisons
- Timing: ~50-100ms for 1k lookups
### Binary Search (O(log n)) - Good!
```python
# GOOD: O(m log n) using searchsorted
import numpy as np
time_arr = array['time'] # extract once
ts_array = np.array(timestamps)
# binary search for all timestamps at once
indices = np.searchsorted(time_arr, ts_array)
# bounds check and exact match verification
# (clip first so out-of-range insertion points can't index past the end)
safe = np.clip(indices, 0, len(time_arr) - 1)
valid_mask = (
    (indices < len(time_arr))
    & (time_arr[safe] == ts_array)
)
valid_indices = indices[valid_mask]
matched_rows = array[valid_indices]
```
**Requirements for `searchsorted()`:**
- Input array MUST be sorted (ascending by default)
- Works on any sortable dtype (floats, ints, etc)
- Returns insertion indices (not found = len(array); see the sketch below)
**Performance:**
- 1000 lookups × 10k array = ~10k comparisons
- Timing: <1ms for 1k lookups
- **~100-1000x faster than linear scan**
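A quick illustration of the not-found behavior noted above (a minimal sketch with made-up values):
```python
import numpy as np

time_arr = np.array([10.0, 20.0, 30.0])  # must already be sorted ascending
idx = np.searchsorted(time_arr, np.array([20.0, 25.0, 99.0]))
# idx == [1, 2, 3]: exact match at 1, insertion point at 2,
# and 3 == len(time_arr) means "past the end" (no match possible)
```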
### Hash Table (O(1)) - Best for Multiple Lookups!
If you'll do many lookups on same array, build dict once:
```python
# build lookup once (extract the time field first, per the gotcha above)
times = array['time']
time_to_idx = {
    float(t): i
    for i, t in enumerate(times)
}

# O(1) lookups
for target_ts in timestamps:
    idx = time_to_idx.get(target_ts)
    if idx is not None:
        row = array[idx]
```
**When to use:**
- Many repeated lookups on same array
- Array doesn't change between lookups
- Can afford upfront dict building cost
## Vectorized Boolean Operations
### Basic Filtering
```python
# single condition
recent = array[array['time'] > cutoff_time]
# multiple conditions with &, |
filtered = array[
    (array['time'] > start_time)
    & (array['time'] < end_time)
    & (array['volume'] > min_volume)
]
# IMPORTANT: parentheses required around each condition!
# (operator precedence: & binds tighter than >)
```
### Fancy Indexing
```python
# boolean mask
mask = array['close'] > array['open'] # up bars
up_bars = array[mask]
# integer indices
indices = np.array([0, 5, 10, 15])
selected = array[indices]
# combine boolean + fancy indexing
mask = array['volume'] > threshold
high_vol_indices = np.where(mask)[0]
subset = array[high_vol_indices[::2]] # every other
```
## Common Financial Patterns
### Gap Detection
```python
# assume sorted by time
time_diffs = np.diff(array['time'])
expected_step = 60.0 # 1-minute bars
# find gaps larger than expected
gap_mask = time_diffs > (expected_step * 1.5)
gap_indices = np.where(gap_mask)[0]
# get gap start/end times
gap_starts = array['time'][gap_indices]
gap_ends = array['time'][gap_indices + 1]
```
### Rolling Window Operations
```python
# simple moving average (close)
window = 20
sma = np.convolve(
    array['close'],
    np.ones(window) / window,
    mode='valid',
)
# alternatively, use stride tricks for efficiency
from numpy.lib.stride_tricks import sliding_window_view
windows = sliding_window_view(array['close'], window)
sma = windows.mean(axis=1)
```
### OHLC Resampling (NumPy)
```python
# resample 1m bars to 5m bars
def resample_ohlc(arr, old_step, new_step):
    n_bars = len(arr)
    factor = int(new_step / old_step)

    # truncate to multiple of factor
    n_complete = (n_bars // factor) * factor
    arr = arr[:n_complete]

    # reshape into chunks
    reshaped = arr.reshape(-1, factor)

    # aggregate OHLC
    opens = reshaped[:, 0]['open']
    highs = reshaped['high'].max(axis=1)
    lows = reshaped['low'].min(axis=1)
    closes = reshaped[:, -1]['close']
    volumes = reshaped['volume'].sum(axis=1)

    return np.rec.fromarrays(
        [opens, highs, lows, closes, volumes],
        names=['open', 'high', 'low', 'close', 'volume'],
    )
```
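Usage, assuming 1-minute input bars with step sizes given in seconds:
```python
# 60s bars -> 300s (5-minute) bars
bars_5m = resample_ohlc(arr, old_step=60.0, new_step=300.0)
```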
## Polars Integration
Piker is transitioning to Polars for some operations.
### NumPy ↔ Polars Conversion
```python
import polars as pl
# numpy to polars
df = pl.from_numpy(
    arr,
    schema=['index', 'time', 'open', 'high', 'low', 'close', 'volume'],
)
# polars to numpy (via arrow)
arr = df.to_numpy()
# piker convenience
from piker.tsp import np2pl, pl2np
df = np2pl(arr)
arr = pl2np(df)
```
### Polars Performance Patterns
**Lazy evaluation:**
```python
# build query lazily
lazy_df = (
    df.lazy()
    .filter(pl.col('volume') > 1000)
    .with_columns([
        (pl.col('close') - pl.col('open')).alias('change')
    ])
    .sort('time')
)
# execute once
result = lazy_df.collect()
```
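To inspect what the optimizer will actually run before collecting, recent Polars versions expose a plan printer (note: the method name has shifted across versions):
```python
# show the optimized query plan (recent polars; older versions used
# `describe_optimized_plan()`)
print(lazy_df.explain())
```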
**Groupby aggregations:**
```python
# resample to 5-minute bars
# NOTE: 'time' must be a Datetime (or integer) column sorted ascending;
# newer polars versions rename this method `group_by_dynamic()`
resampled = df.groupby_dynamic(
    index_column='time',
    every='5m',
).agg([
    pl.col('open').first(),
    pl.col('high').max(),
    pl.col('low').min(),
    pl.col('close').last(),
    pl.col('volume').sum(),
])
```
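If `time` is stored as epoch-second floats (as in the NumPy dtype above), it needs casting to a Datetime column and sorting before dynamic grouping; a minimal sketch using `pl.from_epoch` (assumes a recent Polars version):
```python
import polars as pl

# cast epoch-seconds to Datetime and sort so 'time' can serve as the
# dynamic-grouping index column
df = (
    df.with_columns(
        pl.from_epoch(pl.col('time').cast(pl.Int64), time_unit='s')
    )
    .sort('time')
)
```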
### When to Use Polars vs NumPy
**Use Polars when:**
- Complex queries with multiple filters/joins
- Need SQL-like operations (groupby, window functions)
- Working with heterogeneous column types
- Want lazy evaluation optimization
**Use NumPy when:**
- Simple array operations (indexing, slicing)
- Direct memory access needed (e.g., SHM arrays)
- Compatibility with Qt/pyqtgraph (expects NumPy)
- Maximum performance for numerical computation
## Memory Considerations
### Views vs Copies
```python
# VIEW: shares memory (fast, no copy)
times = array['time'] # field access
subset = array[10:20] # slicing
reshaped = array.reshape(-1, 2)
# COPY: new memory allocation
filtered = array[array['time'] > cutoff] # boolean indexing
sorted_arr = np.sort(array) # sorting
casted = array.astype(np.float32) # type conversion
# force copy when needed
explicit_copy = array.copy()
```
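When in doubt, `np.shares_memory` confirms whether an operation returned a view or a copy (a sketch reusing `array` and `cutoff` from the example above):
```python
import numpy as np

times = array['time']                      # field access: view
print(np.shares_memory(times, array))      # True

filtered = array[array['time'] > cutoff]   # boolean indexing: copy
print(np.shares_memory(filtered, array))   # False
```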
### In-Place Operations
```python
# modify in-place (no new allocation)
array['close'] *= 1.01 # scale prices
array['volume'][mask] = 0 # zero out specific rows
# careful: compound operations may create temporaries
array['close'] = array['close'] * 1.01 # creates temp!
array['close'] *= 1.01 # true in-place
```
## Performance Checklist
When optimizing timeseries operations:
- [ ] Is the array sorted? (enables binary search; see the check below)
- [ ] Are you doing repeated lookups? (build hash table)
- [ ] Are struct fields accessed in loops? (extract to plain arrays)
- [ ] Are you using boolean indexing? (vectorized vs loop)
- [ ] Can operations be batched? (minimize round-trips)
- [ ] Is memory being copied unnecessarily? (use views)
- [ ] Are you using the right tool? (NumPy vs Polars)
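The first checklist item can be verified cheaply before relying on `searchsorted` (a minimal sketch assuming the OHLCV array layout from above):
```python
import numpy as np

times = array['time']
is_sorted = bool(np.all(np.diff(times) >= 0))  # monotonically non-decreasing
```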
## Common Bottlenecks and Fixes
### Bottleneck: Timestamp Lookups
```python
# BEFORE: O(n*m) - 100ms for 1k lookups
for ts in timestamps:
    matches = array[array['time'] == ts]
# AFTER: O(m log n) - <1ms for 1k lookups
indices = np.searchsorted(array['time'], timestamps)
```
### Bottleneck: Dict Building from Struct Array
```python
# BEFORE: 100ms for 3k rows
result = {
    float(row['time']): {
        'index': float(row['index']),
        'close': float(row['close']),
    }
    for row in matched_rows
}

# AFTER: <5ms for 3k rows
times = matched_rows['time'].astype(float)
indices = matched_rows['index'].astype(float)
closes = matched_rows['close'].astype(float)
result = {
    t: {'index': idx, 'close': cls}
    for t, idx, cls in zip(times, indices, closes)
}
```
### Bottleneck: Repeated Field Access
```python
# BEFORE: 50ms for 1k iterations
for i, spec in enumerate(specs):
    start_row = array[array['time'] == spec['start_time']][0]
    end_row = array[array['time'] == spec['end_time']][0]
    process(start_row['index'], end_row['close'])

# AFTER: <5ms for 1k iterations
# 1. Build lookup once
time_to_row = {...}  # via searchsorted

# 2. Extract fields to plain arrays beforehand
indices_arr = array['index']
closes_arr = array['close']

# 3. Use lookup + plain array indexing
for spec in specs:
    start_idx = time_to_row[spec['start_time']]['array_idx']
    end_idx = time_to_row[spec['end_time']]['array_idx']
    process(indices_arr[start_idx], closes_arr[end_idx])
```
## References
- NumPy structured arrays: https://numpy.org/doc/stable/user/basics.rec.html
- `np.searchsorted`: https://numpy.org/doc/stable/reference/generated/numpy.searchsorted.html
- Polars: https://pola-rs.github.io/polars/
- `piker.tsp` - timeseries processing utilities
- `piker.data._formatters` - OHLC array handling
## Skill Maintenance
Update when:
- New vectorization patterns discovered
- Performance bottlenecks identified
- Polars migration patterns emerge
- NumPy best practices evolve
---
*Last updated: 2026-01-31*
*Session: Batch gap annotation optimization*
*Key win: 100ms → 5ms dict building via field extraction*