Nautilus Data Model

Nautilus’s data subsystem is a single, normalized, deterministically-ordered stream of timestamped objects that flow through the DataEngine → Cache → MessageBus pipeline regardless of whether the runtime is a backtest, sandbox, or live TradingNode. Built-in types cover ticks, bars, books, mark/index/funding prices, instrument lifecycle, and option Greeks. Custom data - anything you subclass off Data (or decorate with @customdataclass / @customdataclass_pyo3) - rides the same engine, the same cache, the same bus, and can be persisted in the same ParquetDataCatalog. Two timestamps (ts_event, ts_init) on every object give you deterministic replay ordering and free latency telemetry. The catalog is the canonical historical store; the cache is the runtime store; the bus is the in-process router. Cortana MK3’s UW WebSocket flow alerts, scoring events, and meta-prob outputs all become custom Data subclasses published on the bus. This page is the canonical Nautilus-data reference for the Saturday 2026-05-09 spike.

This page supplements nautilus-concepts.md (which covers DataEngine + Cache + MessageBus + ParquetDataCatalog at the architectural level). For a Cortana → Nautilus translation table see nautilus-developer-guide.md. For IBKR adapter specifics see nautilus-integrations.md.

Built-in data types

Nautilus ships a finite set of normalized market-data classes. Every venue adapter (DataClient) is responsible for parsing wire bytes into one of these types - strategies never see venue-specific shapes.

TypeWhat it represents
OrderBookDelta (L1/L2/L3)Single most-granular book update. Carries flags for event-boundary semantics (see below).
OrderBookDeltasBatched delta sequence for efficient processing.
OrderBookDepth10Pre-aggregated up to 10-deep snapshot per side.
QuoteTickBest bid/ask price + size at top-of-book.
TradeTickA single counterparty match (price, qty, aggressor side, trade ID).
BarOHLCV candle aggregated by some method (see Bar aggregation below).
MarkPriceUpdateMark price (typical for derivatives).
IndexPriceUpdateUnderlying index reference price.
FundingRateUpdatePeriodic funding rate (perpetuals).
InstrumentStatusInstrument-lifecycle event (halt/resume/close).
InstrumentCloseClosing print for an instrument.

Plus option-specific types (OptionGreeks, option-chain snapshots - see the Options/Greeks concept pages). The framework operates “primarily on granular order book data for the highest realism in execution simulations” - but backtests can run against any of the above depending on fidelity needs.

Order books

OrderBook instances are maintained per instrument in both backtest and live. Three book types:

  • L3_MBO - market by order, every event keyed by order ID.
  • L2_MBP - market by price, events aggregated by price level.
  • L1_MBP - best bid/offer (top of book only). QuoteTick / TradeTick / Bar data drives an L1_MBP book in backtest.

Delta flags and event boundaries

Each OrderBookDelta carries a flags bitmask using RecordFlag values:

  • F_LAST - final delta in a logical event group. With buffer_deltas enabled, the DataEngine accumulates deltas and only publishes when it encounters F_LAST. Missing F_LAST on the final delta = buffered consumers accumulate forever and never publish. This is a hard contract.
  • F_SNAPSHOT - delta belongs to a snapshot rather than an incremental update. Snapshots begin with a Clear action followed by Add deltas; the last delta has F_SNAPSHOT | F_LAST set.

Empty book snapshots (only a Clear delta) still must set F_LAST.

Instruments

Instruments are referenced by data but are themselves a separate domain concept. Coverage:

InstrumentNotes
EquityGeneric equity.
CurrencyPairSpot FX.
CommoditySpot commodity.
IndexInstrumentSpot index (reference price; not directly tradable).
FuturesContract, FuturesSpreadDeliverable futures + spreads.
CryptoFuture, CryptoPerpetual, CryptoOptionCrypto derivatives.
PerpetualContractAsset-class-agnostic perpetual swap.
OptionContract, OptionSpreadGeneric options + spreads.
BinaryOptionBinary options.
CfdContract for Difference.
BettingInstrumentBetting markets.
SyntheticInstrumentPrice derived via formula from component instruments.

Instrument identification

Format: {symbol}.{venue} - e.g., SPY.ARCA, ETHUSDT-PERP.BINANCE, AAPL.XNAS. Rules from nautilus-concepts.md:

  • “All native symbols should be unique for a venue.”
  • “The {symbol.venue} combination must be unique for a Nautilus system.”

This is the load-bearing identity for every data subscription, every cache lookup, every order. Cortana’s IBKR symbology choice (IB_SIMPLIFIED vs IB_RAW) lives here.

Bars and aggregation

A Bar carries OHLCV plus a BarType that uniquely identifies what kind of aggregation produced it.

BarType string syntax

Standard: {instrument_id}-{step}-{aggregation}-{price_type}-{INTERNAL|EXTERNAL}

bar_type = BarType.from_str("AAPL.XNAS-5-MINUTE-LAST-INTERNAL")

Composite (bar-to-bar): {instrument_id}-{step}-{aggregation}-{price_type}-INTERNAL@{step}-{aggregation}-{INTERNAL|EXTERNAL}

# 5-min bars derived from externally-provided 1-min bars
bar_type = BarType.from_str("AAPL.XNAS-5-MINUTE-LAST-INTERNAL@1-MINUTE-EXTERNAL")

The target (left of @) is always INTERNAL. The source (right of @) can be either. When @ is absent, the bar aggregates from TradeTick (when price_type=LAST) or QuoteTick (when BID/ASK/MID).

Aggregation methods

CategoryMethods
TimeMILLISECOND, SECOND, MINUTE, HOUR, DAY, WEEK, MONTH, YEAR
ThresholdTICK, VOLUME, VALUE, RENKO, plus imbalance variants
Information-driven (imbalance)TICK_IMBALANCE, VOLUME_IMBALANCE, VALUE_IMBALANCE
Information-driven (runs)TICK_RUNS, VOLUME_RUNS, VALUE_RUNS

Imbalance bars close when net buy/sell activity reaches threshold. Runs bars close on sustained one-sided pressure (counter resets on aggressor flip). Information-driven bars require TradeTick - they need aggressor_side.

Time bar configuration

Driven by DataEngineConfig:

OptionDefaultEffect
time_bars_interval_type"left-open"Start excluded, end included (or vice versa).
time_bars_timestamp_on_closeTruets_event = bar close (or open if False).
time_bars_skip_first_non_full_barFalseSkip mid-interval first bar.
time_bars_build_with_no_updatesTrueEmit empty bars during quiet periods.
time_bars_origin_offsetNoneMap BarAggregationTimedelta (e.g., align to 09:30 open).
time_bars_build_delay0Microsecond delay before building (backtest determinism).

request_bars() vs subscribe_bars()

This is the dual-surface pattern - the same idea applies to every data type:

def on_start(self) -> None:
    bar_type = BarType.from_str("SPY.ARCA-1-MINUTE-LAST-INTERNAL")
    start = self.clock.utc_now() - timedelta(days=30)
 
    # Register indicators FIRST so they receive historical updates too
    self.register_indicator_for_bars(bar_type, self.ema)
 
    # Historical fetch → on_historical_data(...) handler
    self.request_bars(bar_type, start=start)
 
    # Live feed → on_bar(...) handler
    self.subscribe_bars(bar_type)

Common pitfall: register indicators before requesting historical data. Reversed order means indicators never see history.

For composite bars use request_aggregated_bars([bar_type], start=start) - this builds the dependency-ordered aggregation chain on the fly.

Timestamps - ts_event and ts_init

Every Data subclass - built-in or custom - carries two UNIX-nanosecond timestamps:

  • ts_event - when the event actually occurred (at the exchange / source).
  • ts_init - when Nautilus initialized the internal object representing that event.
Eventts_eventts_init
TradeTickTrade time at exchangeReceipt time
QuoteTickQuote time at exchangeReceipt time
OrderBookDeltaBook update at exchangeReceipt time
BarBar close timeGeneration time (internal) or receipt (external)
OrderFilledFill time at exchangeLocal processing time
Custom eventWhen conditions occurredObject creation time

Latency analysis

latency = ts_init - ts_event gives total system latency (network + processing

  • queueing). Clocks aren’t synchronized so the difference is not strictly positive - it’s an estimate.

Backtest vs live ordering

  • Backtest: data is ordered by ts_init using a stable sort. Deterministic replay. This is the property MK3 needs for decisions.db replay.
  • Live: data is processed as it arrives. ts_event reflects external time; ts_init reflects local creation. Difference detects network/processing delays.

For internally-created data (e.g., a custom ScoringEvent produced inside the runtime), ts_event and ts_init may legitimately be equal.

Data flow - DataEngine → Cache → MessageBus

From the DataEngine onward, the path is identical across backtest, sandbox, and live:

  1. Source - adapter creates a normalized Data object (live), or the BacktestEngine feeds it directly (backtest).
  2. DataEngine - receives the object, validates type, applies any buffering (e.g., book deltas accumulating until F_LAST).
  3. Cache - engine writes to cache first. This is the cache-then-publish invariant: the latest value is in cache before any subscriber handler runs.
  4. MessageBus - engine then publishes on the topic derived from the data’s identifier. Subscribers receive it via on_quote_tick, on_bar, on_data (custom), etc.

This guarantee is what makes handlers safe - self.cache.quote_tick(id) inside on_quote_tick(tick) returns the very tick that triggered the handler. No race window.

Loading data - DataLoader + DataWrangler

Nautilus splits external-data ingest into two pipeline stages:

  1. DataLoader - venue-specific. Reads raw CSV/JSON/binary into a pd.DataFrame with the right schema for a Nautilus type. One per raw-source × format combo (e.g., BinanceOrderBookDeltaDataLoader).
  2. DataWrangler - type-specific. Takes the pd.DataFrame and emits a list[Data] of Nautilus objects. Lives under nautilus_trader.persistence.wranglers.

Common v1 wranglers:

  • OrderBookDeltaDataWrangler
  • QuoteTickDataWrangler
  • TradeTickDataWrangler
  • BarDataWrangler

Arrow v2 / PyO3: OrderBookDepth10DataWranglerV2 and friends emit PyO3 objects (not compatible with v1 Cython BacktestEngine consumers).

Example pipeline:

from nautilus_trader.adapters.binance.loaders import BinanceOrderBookDeltaDataLoader
from nautilus_trader.persistence.wranglers import OrderBookDeltaDataWrangler
from nautilus_trader.test_kit.providers import TestInstrumentProvider
 
df = BinanceOrderBookDeltaDataLoader.load(data_path)
instrument = TestInstrumentProvider.btcusdt_binance()
wrangler = OrderBookDeltaDataWrangler(instrument)
deltas = wrangler.process(df)
# deltas is now list[OrderBookDelta] - feed to BacktestEngine or write to catalog

Fixed-point precision and raw values

Price and Quantity are fixed-point internally. from_raw() must receive a valid multiple of the scale factor (10^(FIXED_PRECISION - precision) where FIXED_PRECISION is 9 standard / 16 high-precision). Invalid raw values panic the runtime.

The Arrow decode path auto-corrects floating-point drift like int(0.67068 * 1e9) == 670680000000001 by rounding to the nearest valid multiple - so old catalogs work without migration. Small per-decode overhead.

ParquetDataCatalog - the historical store

The catalog is Nautilus’s canonical persistent data store. This is where Cortana MK3’s backtest replay data lives.

Architecture

  • Rust backend - high-perf for OrderBookDelta, OrderBookDeltas, OrderBookDepth10, QuoteTick, TradeTick, Bar, MarkPriceUpdate. Plus registered same-binary Rust custom types.
  • PyArrow backend - flexible fallback for custom Python data types and advanced filtering.
  • fsspec integration - local FS, S3, GCS, Azure Blob (abfs/az).

Initialization

from pathlib import Path
from nautilus_trader.persistence.catalog import ParquetDataCatalog
 
catalog = ParquetDataCatalog(Path.cwd() / "catalog")
# Or: ParquetDataCatalog.from_env()  → uses NAUTILUS_PATH env var
# Or: ParquetDataCatalog.from_uri("s3://bucket/nautilus-data/")

NAUTILUS_PATH points to the parent directory; the catalog appends /catalog automatically.

File organization

catalog/
├── data/
│   ├── quote_ticks/
│   │   └── EURUSD.SIM/
│   │       └── 2024-01-01T00-00-00-000000000Z_2024-01-01T23-59-59-999999999Z.parquet
│   └── trade_ticks/
│       └── BTCUSD.BINANCE/
│           └── ...

Files named {start_timestamp}_{end_timestamp}.parquet (ISO 8601 with filename-safe : and . replaced by -).

Writing

catalog.write_data(quote_ticks)
catalog.write_data(trade_ticks, start=ns_start, end=ns_end)
catalog.write_data(bars, skip_disjoint_check=True)  # Allow overlapping

By default overlapping writes raise ValueError. Use skip_disjoint_check=True when intentional.

Reading

quotes = catalog.query(
    data_cls=QuoteTick,
    identifiers=["EUR/USD.SIM"],
    start="2024-01-01T00:00:00Z",
    end="2024-01-02T00:00:00Z",
)

Time formats: ISO 8601 string, UNIX nanoseconds, pd.Timestamp, datetime.

Filtering:

  • where= - DataFusion SQL predicate (Rust backend only).
  • filter_expr= - PyArrow dataset expression (PyArrow backend or files= forced-path).

Operations

  • reset_all_file_names() / reset_data_file_names(QuoteTick, "BTC.BINANCE") - realign filenames with content timestamps.
  • consolidate_catalog(start=, end=, ensure_contiguous_files=True) - merge small files for query perf.
  • consolidate_catalog_by_period(period=pd.Timedelta(days=1)) - standardize file size by time period.
  • delete_catalog_range(start=, end=) / delete_data_range(...) - permanent delete; partial-overlap files are split to preserve out-of-range data.

BacktestDataConfig - pre-specifying data for a backtest

from nautilus_trader.config import BacktestDataConfig
from nautilus_trader.model import QuoteTick, InstrumentId
 
data_config = BacktestDataConfig(
    catalog_path="/path/to/catalog",
    data_cls=QuoteTick,
    instrument_id=InstrumentId.from_str("EUR/USD.SIM"),
    start_time="2024-01-01T00:00:00Z",
    end_time="2024-01-02T00:00:00Z",
)

For bar data: bar_spec="5-MINUTE-LAST" builds the EXTERNAL bar identifier with the instrument_id. Use bar_types=[...] for explicit INTERNAL or composite types.

For custom data: pass the class via data_cls (or string), client_id="MyAdapter", and metadata={...} to filter.

DataCatalogConfig - on-the-fly access during live trading

from nautilus_trader.persistence.config import DataCatalogConfig
 
catalog_config = DataCatalogConfig(
    path="/data/nautilus/catalog",
    fs_protocol="file",
    name="historical_data",
)
 
node_config = TradingNodeConfig(
    catalogs=[catalog_config],  # Enables historical access from live strategies
    ...
)

StreamingConfig - write live data to catalog as it arrives

from nautilus_trader.persistence.config import StreamingConfig, RotationMode
import pandas as pd
 
streaming_config = StreamingConfig(
    catalog_path="/path/to/streaming/catalog",
    flush_interval_ms=1000,
    rotation_mode=RotationMode.INTERVAL,
    rotation_interval=pd.Timedelta(hours=1),
    max_file_size=1024 * 1024 * 100,  # 100 MB
)

Streams via Feather files first, then catalog.convert_stream_to_data(...) promotes to permanent Parquet.

Custom data types - the load-bearing extensibility point

Anything that is not in the built-in list must be a Data subclass. This is how Cortana MK3 expresses UW flow alerts, scoring events, meta-prob outputs, and EMA-decay flow values - they all become custom data types riding the same engine/cache/bus as native types.

The Data contract

Two required properties: ts_event (int) and ts_init (int), both UNIX nanoseconds. Backing fields + @property is the recommended pattern. Data holds no state - super().__init__() is not strictly required.

Manual definition (full control)

from nautilus_trader.core import Data
 
class MyDataPoint(Data):
    def __init__(
        self,
        label: str,
        x: int, y: int, z: int,
        ts_event: int,
        ts_init: int,
    ) -> None:
        self.label = label
        self.x = x
        self.y = y
        self.z = z
        self._ts_event = ts_event
        self._ts_init = ts_init
 
    @property
    def ts_event(self) -> int:
        return self._ts_event
 
    @property
    def ts_init(self) -> int:
        return self._ts_init

Publishing

from nautilus_trader.model import DataType
 
self.publish_data(
    DataType(MyDataPoint, metadata={"category": "scoring"}),
    MyDataPoint(...),
)

The metadata dict is part of the bus topic - different metadata = different topic. Subscribers must pass matching metadata.

Subscribing

from nautilus_trader.model.identifiers import ClientId
 
self.subscribe_data(
    data_type=DataType(MyDataPoint, metadata={"category": "scoring"}),
    client_id=ClientId("MY_ADAPTER"),
)
 
def on_data(self, data: Data) -> None:
    if isinstance(data, MyDataPoint):
        # Handle it
        ...

on_data is the catch-all handler for custom data - type-check inside.

Signals - lightweight primitives

For one-value notifications (str/float/int/bool/bytes):

self.publish_signal("score_alert", 67.5, ts_event)
self.subscribe_signal("score_alert")
 
def on_signal(self, signal):
    print("Signal", signal)

@customdataclass decorator - automatic boilerplate

For full-featured custom data with serialization/cache/catalog support:

from nautilus_trader.model.custom import customdataclass
from nautilus_trader.core import Data
from nautilus_trader.model import InstrumentId
 
@customdataclass
class GreeksTestData(Data):
    instrument_id: InstrumentId = InstrumentId.from_str("ES.GLBX")
    delta: float = 0.0
 
GreeksTestData(
    instrument_id=InstrumentId.from_str("CL.GLBX"),
    delta=1000.0,
    ts_event=1,
    ts_init=2,
)

This auto-generates to_dict/from_dict/to_bytes/from_bytes/schema/ to_catalog/from_catalog - everything needed for bus serialization, cache storage, and Parquet persistence.

For the Rust-backed PyO3 catalog, use @customdataclass_pyo3() and call register_custom_data_class(MyType) once at startup.

When you need control over schema/serialization (the Greeks pattern is the template):

import msgspec
import pyarrow as pa
from nautilus_trader.core import Data
from nautilus_trader.model import DataType, InstrumentId
from nautilus_trader.serialization.base import register_serializable_type
from nautilus_trader.serialization.arrow.serializer import register_arrow
 
class GreeksData(Data):
    def __init__(self, instrument_id, ts_event, ts_init, delta):
        self.instrument_id = instrument_id
        self._ts_event = ts_event
        self._ts_init = ts_init
        self.delta = delta
 
    @property
    def ts_event(self): return self._ts_event
    @property
    def ts_init(self): return self._ts_init
 
    def to_dict(self):
        return {
            "instrument_id": self.instrument_id.value,
            "ts_event": self._ts_event,
            "ts_init": self._ts_init,
            "delta": self.delta,
        }
 
    @classmethod
    def from_dict(cls, data):
        return GreeksData(InstrumentId.from_str(data["instrument_id"]),
                          data["ts_event"], data["ts_init"], data["delta"])
 
    def to_bytes(self): return msgspec.msgpack.encode(self.to_dict())
 
    @classmethod
    def from_bytes(cls, data): return cls.from_dict(msgspec.msgpack.decode(data))
 
    def to_catalog(self):
        return pa.RecordBatch.from_pylist([self.to_dict()], schema=cls.schema())
 
    @classmethod
    def from_catalog(cls, table):
        return [cls.from_dict(d) for d in table.to_pylist()]
 
    @classmethod
    def schema(cls):
        return pa.schema({
            "instrument_id": pa.string(),
            "ts_event": pa.int64(),
            "ts_init": pa.int64(),
            "delta": pa.float64(),
        })
 
# Register for bus/cache serialization
register_serializable_type(GreeksData, GreeksData.to_dict, GreeksData.from_dict)
# Register for catalog Parquet read/write
register_arrow(GreeksData, GreeksData.schema(), GreeksData.to_catalog, GreeksData.from_catalog)

Cache writes (raw bytes by key)

def greeks_key(instrument_id):
    return f"{instrument_id}_GREEKS"
 
def cache_greeks(self, greeks_data):
    self.cache.add(greeks_key(greeks_data.instrument_id), greeks_data.to_bytes())
 
def greeks_from_cache(self, instrument_id):
    return GreeksData.from_bytes(self.cache.get(greeks_key(instrument_id)))

Catalog writes

catalog = ParquetDataCatalog('.')
catalog.write_data([GreeksData(...)])  # works once register_arrow has run

.pyi stubs for IDE support

When using @customdataclass, the constructor is generated at runtime - IDEs can’t see it. Hand-write a .pyi stub for autocomplete.

Custom data - how to define UWFlowAlert and ScoringEvent

These are the two highest-priority custom types for Cortana MK3. Both should use @customdataclass (Python-side) since the Rust path isn’t yet justified for spike-level work - the per-event cost is dominated by ML inference, not serialization.

UWFlowAlert - UW WebSocket flow alert

Sub-second alerts from UW WebSocket. Primary signal source for the engine. Cortana cares about strike/side/size/premium/aggressor/sweep-or-block.

from nautilus_trader.model.custom import customdataclass
from nautilus_trader.core import Data
from nautilus_trader.model import InstrumentId
 
@customdataclass
class UWFlowAlert(Data):
    instrument_id: InstrumentId = InstrumentId.from_str("SPY.ARCA")
    strike: float = 0.0
    expiry: str = ""              # ISO date
    option_side: str = ""          # "CALL" | "PUT"
    aggressor_side: str = ""       # "BUY" | "SELL" | "MIXED"
    premium_usd: float = 0.0
    size_contracts: int = 0
    is_sweep: bool = False
    is_block: bool = False
    flow_score: float = 0.0        # UW's own score, 0-100
    underlying_price: float = 0.0
    raw_id: str = ""               # UW alert ID for dedup
 
# Publish from UW DataClient
self.publish_data(
    DataType(UWFlowAlert, metadata={"underlying": "SPY"}),
    UWFlowAlert(ts_event=ws_event_ns, ts_init=now_ns, ...),
)
 
# Subscribe in scoring actor
self.subscribe_data(
    data_type=DataType(UWFlowAlert, metadata={"underlying": "SPY"}),
    client_id=ClientId("UW"),
)
 
def on_data(self, data: Data) -> None:
    if isinstance(data, UWFlowAlert):
        self._update_flow_state(data)

The UW DataClient (custom adapter - see nautilus-developer-guide.md) opens the WebSocket in connect(), decodes each frame to a UWFlowAlert, and calls self._handle_data(alert) to push it through DataEngineCacheMessageBus.

ScoringEvent - composite-score snapshot

Output of the scoring engine. 78 features + composite + bias + conviction. Published every time the engine re-scores. This is the event the strategy gates on.

@customdataclass
class ScoringEvent(Data):
    instrument_id: InstrumentId = InstrumentId.from_str("SPY.ARCA")
    composite_score: float = 0.0           # 0-100
    bias: str = ""                         # "BULL" | "BEAR" | "NEUTRAL"
    conviction: str = ""                   # "LOW" | "MEDIUM" | "HIGH"
    # 78 features as a Dict[str, float] would be cleaner but @customdataclass
    # prefers flat scalars for Arrow schema simplicity. Pick the top-N or
    # serialize the full vector as a JSON string field for the spike.
    features_json: str = "{}"
    meta_prob: float = 0.0                 # secondary classifier output, 0-1
    meta_prob_finite: bool = True          # gate flag
    impulse_engine_score: float = 0.0
    contradiction_flag: bool = False
    ema_decay_value: float = 0.0
    spy_price_at_score: float = 0.0        # snapshot for replay

Published by the ScoringActor, subscribed by CortanaStrategy. The strategy’s on_data checks composite_score >= 65 and bias == "BULL" etc.

Other custom types Cortana needs

Cortana conceptCustom type sketch
Meta-model probability outputEither an attribute on ScoringEvent (current sketch) or a separate MetaProbEvent if it updates independently
EMA-decay flow valueEither an attribute on ScoringEvent or EmaDecayUpdate if published at a different cadence
UW REST option-chain snapshotOptionChainSnapshot(Data) - large; consider whether it goes through the bus or just into cache
Brain-page write triggerTradeOutcomeEvent published on_position_closed - out-of-band actor logs to ~/brain

Cortana MK3 implications

This is where the abstract maps to today’s stack.

Input → Nautilus type map

Cortana inputNautilus expression
UW WebSocket flow alerts (sub-second, primary signal)Custom UWFlowAlert(Data) published by a custom UW DataClient. Crate skeleton in nautilus-developer-guide.md Phase 1-7.
UW REST option-chain snapshots (on-demand)Custom OptionChainSnapshot(Data) returned from UWDataClient._on_request(); cached by key.
IBKR top-of-book quotes (price of record)Built-in QuoteTick from the shipped IBKR adapter. No translation work.
IBKR 5s and 1m barsBuilt-in Bar with BarType SPY.ARCA-5-SECOND-LAST-EXTERNAL / SPY.ARCA-1-MINUTE-LAST-EXTERNAL from IBKR adapter.
IBKR option-chain greeksBuilt-in OptionGreeks event from IBKR adapter (or compute locally if not surfaced).

Output → Nautilus type map

Cortana outputNautilus expression
Scoring events (78 features + composite)Custom ScoringEvent(Data) published by a ScoringActor. Subscribers: strategy, dashboard, brain-logger.
Meta-model probabilitiesField on ScoringEvent (or separate MetaProbEvent if cadence differs).
EMA-decay flow valuesField on ScoringEvent (or EmaDecayUpdate(Data) if it updates independently).
Trade outcomes (for brain pages)Strategy on_position_closed triggers TradeOutcomeEvent consumed by an out-of-band brain-logger actor.

Where the catalog lives for backtest replay

Single canonical location. Suggested layout:

~/cortana-data/
  catalog/                           # The ParquetDataCatalog root
    data/
      quote_ticks/
        SPY.ARCA/
          {ts_start}_{ts_end}.parquet   # IBKR quote ticks
      bars/
        SPY.ARCA-1-MINUTE-LAST-EXTERNAL/
          ...
      uw_flow_alert/                  # custom type (PyArrow backend)
        SPY/
          ...
      scoring_event/                  # custom type (PyArrow backend)
        SPY/
          ...

Set NAUTILUS_PATH=~/cortana-data and use ParquetDataCatalog.from_env() everywhere. Multi-tenant note (per spike Step 7.5): each tenant gets its own NAUTILUS_PATH (e.g., ~/cortana-data/tenants/{tenant_id}/), or use S3 with per-tenant key prefix.

Replaying today’s decisions.db rows as Nautilus events

This is one of the spike’s hard questions (Step 6). Two paths:

Path A - manual migration via DataLoader/DataWrangler. Write a one-off CortanaDecisionsDbLoader that reads decisions.db.scoring_events and emits pd.DataFrame with ScoringEvent schema. Wrangle to list[ScoringEvent]. Write to catalog via catalog.write_data(events). Replay from catalog through BacktestDataConfig(data_cls=ScoringEvent, client_id="CORTANA").

# Sketch
import sqlite3, pandas as pd
from nautilus_trader.persistence.catalog import ParquetDataCatalog
 
def replay_decisions_db(db_path: str, catalog: ParquetDataCatalog):
    conn = sqlite3.connect(db_path)
    df = pd.read_sql("SELECT * FROM scoring_events ORDER BY ts_event", conn)
    events = [
        ScoringEvent(
            instrument_id=InstrumentId.from_str("SPY.ARCA"),
            ts_event=row.ts_event_ns,
            ts_init=row.ts_init_ns,
            composite_score=row.composite_score,
            bias=row.bias,
            ...
        )
        for row in df.itertuples()
    ]
    catalog.write_data(events)

Path B - synthetic stream replay. Keep decisions.db authoritative; build a DecisionsDbDataClient(LiveDataClient) that reads rows from the SQLite file and emits them as _handle_data(event) calls in time order. This avoids re-encoding history in Parquet but is brittle and only works if backtest mode can route through a “fake live” data client.

Recommended for spike: Path A. It’s the pattern Nautilus is designed for (catalog-as-source-of-truth) and produces a reusable artifact.

Open spike questions (data-specific)

  1. Sub-second ts_event precision. UW gives millisecond resolution. Nautilus expects nanoseconds. Multiply by 1e6. Check that the DataEngine doesn’t reorder events that share a ts_init after stable-sort - most likely fine, but spike-time confirm.
  2. ScoringEvent schema vs Arrow. Flat scalars are easiest for Arrow schemas. The 78-feature vector is the awkward part. Three options: (a) serialize as JSON string in one column (loses query-ability but easy); (b) split into 78 Arrow columns (clean but rigid); (c) use Arrow Map type (flexible but PyArrow-backend only). Decide during the spike.
  3. OptionChainSnapshot size. A full SPY 0DTE chain is large per snapshot (~50KB pickled). Probably don’t publish on the bus every refresh - write to cache by key (f"{instrument_id}_CHAIN_{ts}") and have actors pull on demand.

Quick reference - import checklist

# Built-in types
from nautilus_trader.model import (
    QuoteTick, TradeTick, Bar, BarType, BarSpecification,
    OrderBookDelta, OrderBookDeltas, OrderBookDepth10,
    InstrumentId, DataType,
)
 
# Custom data
from nautilus_trader.core import Data
from nautilus_trader.model.custom import customdataclass
from nautilus_trader.serialization.base import register_serializable_type
from nautilus_trader.serialization.arrow.serializer import register_arrow
 
# Catalog
from nautilus_trader.persistence.catalog import ParquetDataCatalog
from nautilus_trader.persistence.config import (
    DataCatalogConfig, StreamingConfig, RotationMode,
)
 
# Wranglers
from nautilus_trader.persistence.wranglers import (
    OrderBookDeltaDataWrangler, QuoteTickDataWrangler,
    TradeTickDataWrangler, BarDataWrangler,
)
 
# Backtest config
from nautilus_trader.config import BacktestDataConfig, BacktestRunConfig

Anti-patterns to avoid

  • Manual ts_event / ts_init plumbing in strategy code. All ordering flows from these two fields. Don’t second-guess the DataEngine’s stable sort.
  • Mutating a published Data object. nautilus-concepts.md is explicit: messages are immutable post-publish. Derive new local state if you need a different shape.
  • Writing to cache yourself before letting the engine see the data. Use _handle_data(event) from inside a DataClient so the engine’s cache-then-publish invariant holds. Don’t cache.add() then publish_data() separately from a strategy - that breaks the ordering guarantee.
  • Publishing custom data without registering serializable/Arrow handlers. If you skip register_serializable_type you can’t survive a Redis bus hop or restart-replay. If you skip register_arrow you can’t write to the catalog.
  • Putting the 78-feature vector in 78 separate cache writes. One ScoringEvent per score, not 78 partial updates.
  • Using float time.time() anywhere. self.clock.utc_now() or self.clock.timestamp_ns() always - the same code must run in backtest (simulated clock) and live (wall clock).

See Also


Timeline

  • 2026-05-07 | Cody - Filed during pre-spike concept mastery sweep.