Nautilus Data Model

Nautilus’s data subsystem is a single, normalized, deterministically-ordered stream of timestamped objects that flow through the DataEngine → Cache → MessageBus pipeline regardless of whether the runtime is a backtest, sandbox, or live TradingNode. Built-in types cover ticks, bars, books, mark/index/funding prices, instrument lifecycle, and option Greeks. Custom data - anything you subclass off Data (or decorate with @customdataclass / @customdataclass_pyo3) - rides the same engine, the same cache, the same bus, and can be persisted in the same ParquetDataCatalog. Two timestamps (ts_event, ts_init) on every object give you deterministic replay ordering and free latency telemetry. The catalog is the canonical historical store; the cache is the runtime store; the bus is the in-process router. Cortana MK3’s UW WebSocket flow alerts, scoring events, and meta-prob outputs all become custom Data subclasses published on the bus. This page is the canonical Nautilus-data reference for the Saturday 2026-05-09 spike.

This page supplements nautilus-concepts.md (which covers DataEngine + Cache + MessageBus + ParquetDataCatalog at the architectural level). For a Cortana → Nautilus translation table see nautilus-developer-guide.md. For IBKR adapter specifics see nautilus-integrations.md.

Built-in data types

Nautilus ships a finite set of normalized market-data classes. Every venue adapter (DataClient) is responsible for parsing wire bytes into one of these types - strategies never see venue-specific shapes.

Type	What it represents
`OrderBookDelta` (L1/L2/L3)	Single most-granular book update. Carries `flags` for event-boundary semantics (see below).
`OrderBookDeltas`	Batched delta sequence for efficient processing.
`OrderBookDepth10`	Pre-aggregated up to 10-deep snapshot per side.
`QuoteTick`	Best bid/ask price + size at top-of-book.
`TradeTick`	A single counterparty match (price, qty, aggressor side, trade ID).
`Bar`	OHLCV candle aggregated by some method (see Bar aggregation below).
`MarkPriceUpdate`	Mark price (typical for derivatives).
`IndexPriceUpdate`	Underlying index reference price.
`FundingRateUpdate`	Periodic funding rate (perpetuals).
`InstrumentStatus`	Instrument-lifecycle event (halt/resume/close).
`InstrumentClose`	Closing print for an instrument.

Plus option-specific types (OptionGreeks, option-chain snapshots - see the Options/Greeks concept pages). The framework operates “primarily on granular order book data for the highest realism in execution simulations” - but backtests can run against any of the above depending on fidelity needs.

Order books

OrderBook instances are maintained per instrument in both backtest and live. Three book types:

L3_MBO - market by order, every event keyed by order ID.
L2_MBP - market by price, events aggregated by price level.
L1_MBP - best bid/offer (top of book only). QuoteTick / TradeTick / Bar data drives an L1_MBP book in backtest.

Delta flags and event boundaries

Each OrderBookDelta carries a flags bitmask using RecordFlag values:

F_LAST - final delta in a logical event group. With buffer_deltas enabled, the DataEngine accumulates deltas and only publishes when it encounters F_LAST. Missing F_LAST on the final delta = buffered consumers accumulate forever and never publish. This is a hard contract.
F_SNAPSHOT - delta belongs to a snapshot rather than an incremental update. Snapshots begin with a Clear action followed by Add deltas; the last delta has F_SNAPSHOT | F_LAST set.

Empty book snapshots (only a Clear delta) still must set F_LAST.

Instruments

Instruments are referenced by data but are themselves a separate domain concept. Coverage:

Instrument	Notes
`Equity`	Generic equity.
`CurrencyPair`	Spot FX.
`Commodity`	Spot commodity.
`IndexInstrument`	Spot index (reference price; not directly tradable).
`FuturesContract`, `FuturesSpread`	Deliverable futures + spreads.
`CryptoFuture`, `CryptoPerpetual`, `CryptoOption`	Crypto derivatives.
`PerpetualContract`	Asset-class-agnostic perpetual swap.
`OptionContract`, `OptionSpread`	Generic options + spreads.
`BinaryOption`	Binary options.
`Cfd`	Contract for Difference.
`BettingInstrument`	Betting markets.
`SyntheticInstrument`	Price derived via formula from component instruments.

Instrument identification

Format: {symbol}.{venue} - e.g., SPY.ARCA, ETHUSDT-PERP.BINANCE, AAPL.XNAS. Rules from nautilus-concepts.md:

“All native symbols should be unique for a venue.”
“The {symbol.venue} combination must be unique for a Nautilus system.”

This is the load-bearing identity for every data subscription, every cache lookup, every order. Cortana’s IBKR symbology choice (IB_SIMPLIFIED vs IB_RAW) lives here.

Bars and aggregation

A Bar carries OHLCV plus a BarType that uniquely identifies what kind of aggregation produced it.

`BarType` string syntax

Standard: {instrument_id}-{step}-{aggregation}-{price_type}-{INTERNAL|EXTERNAL}

bar_type = BarType.from_str("AAPL.XNAS-5-MINUTE-LAST-INTERNAL")

Composite (bar-to-bar): {instrument_id}-{step}-{aggregation}-{price_type}-INTERNAL@{step}-{aggregation}-{INTERNAL|EXTERNAL}

# 5-min bars derived from externally-provided 1-min bars
bar_type = BarType.from_str("AAPL.XNAS-5-MINUTE-LAST-INTERNAL@1-MINUTE-EXTERNAL")

The target (left of @) is always INTERNAL. The source (right of @) can be either. When @ is absent, the bar aggregates from TradeTick (when price_type=LAST) or QuoteTick (when BID/ASK/MID).

Aggregation methods

Category	Methods
Time	`MILLISECOND`, `SECOND`, `MINUTE`, `HOUR`, `DAY`, `WEEK`, `MONTH`, `YEAR`
Threshold	`TICK`, `VOLUME`, `VALUE`, `RENKO`, plus imbalance variants
Information-driven (imbalance)	`TICK_IMBALANCE`, `VOLUME_IMBALANCE`, `VALUE_IMBALANCE`
Information-driven (runs)	`TICK_RUNS`, `VOLUME_RUNS`, `VALUE_RUNS`

Imbalance bars close when net buy/sell activity reaches threshold. Runs bars close on sustained one-sided pressure (counter resets on aggressor flip). Information-driven bars require TradeTick - they need aggressor_side.

Time bar configuration

Driven by DataEngineConfig:

Option	Default	Effect
`time_bars_interval_type`	`"left-open"`	Start excluded, end included (or vice versa).
`time_bars_timestamp_on_close`	`True`	`ts_event` = bar close (or open if `False`).
`time_bars_skip_first_non_full_bar`	`False`	Skip mid-interval first bar.
`time_bars_build_with_no_updates`	`True`	Emit empty bars during quiet periods.
`time_bars_origin_offset`	`None`	Map `BarAggregation` → `Timedelta` (e.g., align to 09:30 open).
`time_bars_build_delay`	`0`	Microsecond delay before building (backtest determinism).

`request_bars()` vs `subscribe_bars()`

This is the dual-surface pattern - the same idea applies to every data type:

def on_start(self) -> None:
    bar_type = BarType.from_str("SPY.ARCA-1-MINUTE-LAST-INTERNAL")
    start = self.clock.utc_now() - timedelta(days=30)
 
    # Register indicators FIRST so they receive historical updates too
    self.register_indicator_for_bars(bar_type, self.ema)
 
    # Historical fetch → on_historical_data(...) handler
    self.request_bars(bar_type, start=start)
 
    # Live feed → on_bar(...) handler
    self.subscribe_bars(bar_type)

Common pitfall: register indicators before requesting historical data. Reversed order means indicators never see history.

For composite bars use request_aggregated_bars([bar_type], start=start) - this builds the dependency-ordered aggregation chain on the fly.

Timestamps - `ts_event` and `ts_init`

Every Data subclass - built-in or custom - carries two UNIX-nanosecond timestamps:

ts_event - when the event actually occurred (at the exchange / source).
ts_init - when Nautilus initialized the internal object representing that event.

Event	`ts_event`	`ts_init`
`TradeTick`	Trade time at exchange	Receipt time
`QuoteTick`	Quote time at exchange	Receipt time
`OrderBookDelta`	Book update at exchange	Receipt time
`Bar`	Bar close time	Generation time (internal) or receipt (external)
`OrderFilled`	Fill time at exchange	Local processing time
Custom event	When conditions occurred	Object creation time

Latency analysis

latency = ts_init - ts_event gives total system latency (network + processing

queueing). Clocks aren’t synchronized so the difference is not strictly positive - it’s an estimate.

Backtest vs live ordering

Backtest: data is ordered by ts_init using a stable sort. Deterministic replay. This is the property MK3 needs for decisions.db replay.
Live: data is processed as it arrives. ts_event reflects external time; ts_init reflects local creation. Difference detects network/processing delays.

For internally-created data (e.g., a custom ScoringEvent produced inside the runtime), ts_event and ts_init may legitimately be equal.

Data flow - DataEngine → Cache → MessageBus

From the DataEngine onward, the path is identical across backtest, sandbox, and live:

Source - adapter creates a normalized Data object (live), or the BacktestEngine feeds it directly (backtest).
DataEngine - receives the object, validates type, applies any buffering (e.g., book deltas accumulating until F_LAST).
Cache - engine writes to cache first. This is the cache-then-publish invariant: the latest value is in cache before any subscriber handler runs.
MessageBus - engine then publishes on the topic derived from the data’s identifier. Subscribers receive it via on_quote_tick, on_bar, on_data (custom), etc.

This guarantee is what makes handlers safe - self.cache.quote_tick(id) inside on_quote_tick(tick) returns the very tick that triggered the handler. No race window.

Loading data - DataLoader + DataWrangler

Nautilus splits external-data ingest into two pipeline stages:

DataLoader - venue-specific. Reads raw CSV/JSON/binary into a pd.DataFrame with the right schema for a Nautilus type. One per raw-source × format combo (e.g., BinanceOrderBookDeltaDataLoader).
DataWrangler - type-specific. Takes the pd.DataFrame and emits a list[Data] of Nautilus objects. Lives under nautilus_trader.persistence.wranglers.

Common v1 wranglers:

OrderBookDeltaDataWrangler
QuoteTickDataWrangler
TradeTickDataWrangler
BarDataWrangler

Arrow v2 / PyO3: OrderBookDepth10DataWranglerV2 and friends emit PyO3 objects (not compatible with v1 Cython BacktestEngine consumers).

Example pipeline:

from nautilus_trader.adapters.binance.loaders import BinanceOrderBookDeltaDataLoader
from nautilus_trader.persistence.wranglers import OrderBookDeltaDataWrangler
from nautilus_trader.test_kit.providers import TestInstrumentProvider
 
df = BinanceOrderBookDeltaDataLoader.load(data_path)
instrument = TestInstrumentProvider.btcusdt_binance()
wrangler = OrderBookDeltaDataWrangler(instrument)
deltas = wrangler.process(df)
# deltas is now list[OrderBookDelta] - feed to BacktestEngine or write to catalog

Fixed-point precision and raw values

Price and Quantity are fixed-point internally. from_raw() must receive a valid multiple of the scale factor (10^(FIXED_PRECISION - precision) where FIXED_PRECISION is 9 standard / 16 high-precision). Invalid raw values panic the runtime.

The Arrow decode path auto-corrects floating-point drift like int(0.67068 * 1e9) == 670680000000001 by rounding to the nearest valid multiple - so old catalogs work without migration. Small per-decode overhead.

ParquetDataCatalog - the historical store

The catalog is Nautilus’s canonical persistent data store. This is where Cortana MK3’s backtest replay data lives.

Architecture

Rust backend - high-perf for OrderBookDelta, OrderBookDeltas, OrderBookDepth10, QuoteTick, TradeTick, Bar, MarkPriceUpdate. Plus registered same-binary Rust custom types.
PyArrow backend - flexible fallback for custom Python data types and advanced filtering.
fsspec integration - local FS, S3, GCS, Azure Blob (abfs/az).

Initialization

from pathlib import Path
from nautilus_trader.persistence.catalog import ParquetDataCatalog
 
catalog = ParquetDataCatalog(Path.cwd() / "catalog")
# Or: ParquetDataCatalog.from_env()  → uses NAUTILUS_PATH env var
# Or: ParquetDataCatalog.from_uri("s3://bucket/nautilus-data/")

NAUTILUS_PATH points to the parent directory; the catalog appends /catalog automatically.

File organization

catalog/
├── data/
│   ├── quote_ticks/
│   │   └── EURUSD.SIM/
│   │       └── 2024-01-01T00-00-00-000000000Z_2024-01-01T23-59-59-999999999Z.parquet
│   └── trade_ticks/
│       └── BTCUSD.BINANCE/
│           └── ...

Files named {start_timestamp}_{end_timestamp}.parquet (ISO 8601 with filename-safe : and . replaced by -).

Writing

catalog.write_data(quote_ticks)
catalog.write_data(trade_ticks, start=ns_start, end=ns_end)
catalog.write_data(bars, skip_disjoint_check=True)  # Allow overlapping

By default overlapping writes raise ValueError. Use skip_disjoint_check=True when intentional.

Reading

quotes = catalog.query(
    data_cls=QuoteTick,
    identifiers=["EUR/USD.SIM"],
    start="2024-01-01T00:00:00Z",
    end="2024-01-02T00:00:00Z",
)

Time formats: ISO 8601 string, UNIX nanoseconds, pd.Timestamp, datetime.

Filtering:

where= - DataFusion SQL predicate (Rust backend only).
filter_expr= - PyArrow dataset expression (PyArrow backend or files= forced-path).

Operations

reset_all_file_names() / reset_data_file_names(QuoteTick, "BTC.BINANCE") - realign filenames with content timestamps.
consolidate_catalog(start=, end=, ensure_contiguous_files=True) - merge small files for query perf.
consolidate_catalog_by_period(period=pd.Timedelta(days=1)) - standardize file size by time period.
delete_catalog_range(start=, end=) / delete_data_range(...) - permanent delete; partial-overlap files are split to preserve out-of-range data.

`BacktestDataConfig` - pre-specifying data for a backtest

from nautilus_trader.config import BacktestDataConfig
from nautilus_trader.model import QuoteTick, InstrumentId
 
data_config = BacktestDataConfig(
    catalog_path="/path/to/catalog",
    data_cls=QuoteTick,
    instrument_id=InstrumentId.from_str("EUR/USD.SIM"),
    start_time="2024-01-01T00:00:00Z",
    end_time="2024-01-02T00:00:00Z",
)

For bar data: bar_spec="5-MINUTE-LAST" builds the EXTERNAL bar identifier with the instrument_id. Use bar_types=[...] for explicit INTERNAL or composite types.

For custom data: pass the class via data_cls (or string), client_id="MyAdapter", and metadata={...} to filter.

`DataCatalogConfig` - on-the-fly access during live trading

from nautilus_trader.persistence.config import DataCatalogConfig
 
catalog_config = DataCatalogConfig(
    path="/data/nautilus/catalog",
    fs_protocol="file",
    name="historical_data",
)
 
node_config = TradingNodeConfig(
    catalogs=[catalog_config],  # Enables historical access from live strategies
    ...
)

`StreamingConfig` - write live data to catalog as it arrives

from nautilus_trader.persistence.config import StreamingConfig, RotationMode
import pandas as pd
 
streaming_config = StreamingConfig(
    catalog_path="/path/to/streaming/catalog",
    flush_interval_ms=1000,
    rotation_mode=RotationMode.INTERVAL,
    rotation_interval=pd.Timedelta(hours=1),
    max_file_size=1024 * 1024 * 100,  # 100 MB
)

Streams via Feather files first, then catalog.convert_stream_to_data(...) promotes to permanent Parquet.

Custom data types - the load-bearing extensibility point

Anything that is not in the built-in list must be a Data subclass. This is how Cortana MK3 expresses UW flow alerts, scoring events, meta-prob outputs, and EMA-decay flow values - they all become custom data types riding the same engine/cache/bus as native types.

The `Data` contract

Two required properties: ts_event (int) and ts_init (int), both UNIX nanoseconds. Backing fields + @property is the recommended pattern. Data holds no state - super().__init__() is not strictly required.

Manual definition (full control)

from nautilus_trader.core import Data
 
class MyDataPoint(Data):
    def __init__(
        self,
        label: str,
        x: int, y: int, z: int,
        ts_event: int,
        ts_init: int,
    ) -> None:
        self.label = label
        self.x = x
        self.y = y
        self.z = z
        self._ts_event = ts_event
        self._ts_init = ts_init
 
    @property
    def ts_event(self) -> int:
        return self._ts_event
 
    @property
    def ts_init(self) -> int:
        return self._ts_init

Publishing

from nautilus_trader.model import DataType
 
self.publish_data(
    DataType(MyDataPoint, metadata={"category": "scoring"}),
    MyDataPoint(...),
)

The metadata dict is part of the bus topic - different metadata = different topic. Subscribers must pass matching metadata.

Subscribing

from nautilus_trader.model.identifiers import ClientId
 
self.subscribe_data(
    data_type=DataType(MyDataPoint, metadata={"category": "scoring"}),
    client_id=ClientId("MY_ADAPTER"),
)
 
def on_data(self, data: Data) -> None:
    if isinstance(data, MyDataPoint):
        # Handle it
        ...

on_data is the catch-all handler for custom data - type-check inside.

Signals - lightweight primitives

For one-value notifications (str/float/int/bool/bytes):

self.publish_signal("score_alert", 67.5, ts_event)
self.subscribe_signal("score_alert")
 
def on_signal(self, signal):
    print("Signal", signal)

`@customdataclass` decorator - automatic boilerplate

For full-featured custom data with serialization/cache/catalog support:

from nautilus_trader.model.custom import customdataclass
from nautilus_trader.core import Data
from nautilus_trader.model import InstrumentId
 
@customdataclass
class GreeksTestData(Data):
    instrument_id: InstrumentId = InstrumentId.from_str("ES.GLBX")
    delta: float = 0.0
 
GreeksTestData(
    instrument_id=InstrumentId.from_str("CL.GLBX"),
    delta=1000.0,
    ts_event=1,
    ts_init=2,
)

This auto-generates to_dict/from_dict/to_bytes/from_bytes/schema/ to_catalog/from_catalog - everything needed for bus serialization, cache storage, and Parquet persistence.

For the Rust-backed PyO3 catalog, use @customdataclass_pyo3() and call register_custom_data_class(MyType) once at startup.

Manual full-featured implementation (option-greeks pattern)

When you need control over schema/serialization (the Greeks pattern is the template):

import msgspec
import pyarrow as pa
from nautilus_trader.core import Data
from nautilus_trader.model import DataType, InstrumentId
from nautilus_trader.serialization.base import register_serializable_type
from nautilus_trader.serialization.arrow.serializer import register_arrow
 
class GreeksData(Data):
    def __init__(self, instrument_id, ts_event, ts_init, delta):
        self.instrument_id = instrument_id
        self._ts_event = ts_event
        self._ts_init = ts_init
        self.delta = delta
 
    @property
    def ts_event(self): return self._ts_event
    @property
    def ts_init(self): return self._ts_init
 
    def to_dict(self):
        return {
            "instrument_id": self.instrument_id.value,
            "ts_event": self._ts_event,
            "ts_init": self._ts_init,
            "delta": self.delta,
        }
 
    @classmethod
    def from_dict(cls, data):
        return GreeksData(InstrumentId.from_str(data["instrument_id"]),
                          data["ts_event"], data["ts_init"], data["delta"])
 
    def to_bytes(self): return msgspec.msgpack.encode(self.to_dict())
 
    @classmethod
    def from_bytes(cls, data): return cls.from_dict(msgspec.msgpack.decode(data))
 
    def to_catalog(self):
        return pa.RecordBatch.from_pylist([self.to_dict()], schema=cls.schema())
 
    @classmethod
    def from_catalog(cls, table):
        return [cls.from_dict(d) for d in table.to_pylist()]
 
    @classmethod
    def schema(cls):
        return pa.schema({
            "instrument_id": pa.string(),
            "ts_event": pa.int64(),
            "ts_init": pa.int64(),
            "delta": pa.float64(),
        })
 
# Register for bus/cache serialization
register_serializable_type(GreeksData, GreeksData.to_dict, GreeksData.from_dict)
# Register for catalog Parquet read/write
register_arrow(GreeksData, GreeksData.schema(), GreeksData.to_catalog, GreeksData.from_catalog)

Cache writes (raw bytes by key)

def greeks_key(instrument_id):
    return f"{instrument_id}_GREEKS"
 
def cache_greeks(self, greeks_data):
    self.cache.add(greeks_key(greeks_data.instrument_id), greeks_data.to_bytes())
 
def greeks_from_cache(self, instrument_id):
    return GreeksData.from_bytes(self.cache.get(greeks_key(instrument_id)))

Catalog writes

catalog = ParquetDataCatalog('.')
catalog.write_data([GreeksData(...)])  # works once register_arrow has run

`.pyi` stubs for IDE support

When using @customdataclass, the constructor is generated at runtime - IDEs can’t see it. Hand-write a .pyi stub for autocomplete.

Custom data - how to define `UWFlowAlert` and `ScoringEvent`

These are the two highest-priority custom types for Cortana MK3. Both should use @customdataclass (Python-side) since the Rust path isn’t yet justified for spike-level work - the per-event cost is dominated by ML inference, not serialization.

`UWFlowAlert` - UW WebSocket flow alert

Sub-second alerts from UW WebSocket. Primary signal source for the engine. Cortana cares about strike/side/size/premium/aggressor/sweep-or-block.

from nautilus_trader.model.custom import customdataclass
from nautilus_trader.core import Data
from nautilus_trader.model import InstrumentId
 
@customdataclass
class UWFlowAlert(Data):
    instrument_id: InstrumentId = InstrumentId.from_str("SPY.ARCA")
    strike: float = 0.0
    expiry: str = ""              # ISO date
    option_side: str = ""          # "CALL" | "PUT"
    aggressor_side: str = ""       # "BUY" | "SELL" | "MIXED"
    premium_usd: float = 0.0
    size_contracts: int = 0
    is_sweep: bool = False
    is_block: bool = False
    flow_score: float = 0.0        # UW's own score, 0-100
    underlying_price: float = 0.0
    raw_id: str = ""               # UW alert ID for dedup
 
# Publish from UW DataClient
self.publish_data(
    DataType(UWFlowAlert, metadata={"underlying": "SPY"}),
    UWFlowAlert(ts_event=ws_event_ns, ts_init=now_ns, ...),
)
 
# Subscribe in scoring actor
self.subscribe_data(
    data_type=DataType(UWFlowAlert, metadata={"underlying": "SPY"}),
    client_id=ClientId("UW"),
)
 
def on_data(self, data: Data) -> None:
    if isinstance(data, UWFlowAlert):
        self._update_flow_state(data)

The UW DataClient (custom adapter - see nautilus-developer-guide.md) opens the WebSocket in connect(), decodes each frame to a UWFlowAlert, and calls self._handle_data(alert) to push it through DataEngine → Cache → MessageBus.

`ScoringEvent` - composite-score snapshot

Output of the scoring engine. 78 features + composite + bias + conviction. Published every time the engine re-scores. This is the event the strategy gates on.

@customdataclass
class ScoringEvent(Data):
    instrument_id: InstrumentId = InstrumentId.from_str("SPY.ARCA")
    composite_score: float = 0.0           # 0-100
    bias: str = ""                         # "BULL" | "BEAR" | "NEUTRAL"
    conviction: str = ""                   # "LOW" | "MEDIUM" | "HIGH"
    # 78 features as a Dict[str, float] would be cleaner but @customdataclass
    # prefers flat scalars for Arrow schema simplicity. Pick the top-N or
    # serialize the full vector as a JSON string field for the spike.
    features_json: str = "{}"
    meta_prob: float = 0.0                 # secondary classifier output, 0-1
    meta_prob_finite: bool = True          # gate flag
    impulse_engine_score: float = 0.0
    contradiction_flag: bool = False
    ema_decay_value: float = 0.0
    spy_price_at_score: float = 0.0        # snapshot for replay

Published by the ScoringActor, subscribed by CortanaStrategy. The strategy’s on_data checks composite_score >= 65 and bias == "BULL" etc.

Other custom types Cortana needs

Cortana concept	Custom type sketch
Meta-model probability output	Either an attribute on `ScoringEvent` (current sketch) or a separate `MetaProbEvent` if it updates independently
EMA-decay flow value	Either an attribute on `ScoringEvent` or `EmaDecayUpdate` if published at a different cadence
UW REST option-chain snapshot	`OptionChainSnapshot(Data)` - large; consider whether it goes through the bus or just into cache
Brain-page write trigger	`TradeOutcomeEvent` published `on_position_closed` - out-of-band actor logs to `~/brain`

Cortana MK3 implications

This is where the abstract maps to today’s stack.

Input → Nautilus type map

Cortana input	Nautilus expression
UW WebSocket flow alerts (sub-second, primary signal)	Custom `UWFlowAlert(Data)` published by a custom UW `DataClient`. Crate skeleton in nautilus-developer-guide.md Phase 1-7.
UW REST option-chain snapshots (on-demand)	Custom `OptionChainSnapshot(Data)` returned from `UWDataClient._on_request()`; cached by key.
IBKR top-of-book quotes (price of record)	Built-in `QuoteTick` from the shipped IBKR adapter. No translation work.
IBKR 5s and 1m bars	Built-in `Bar` with `BarType` `SPY.ARCA-5-SECOND-LAST-EXTERNAL` / `SPY.ARCA-1-MINUTE-LAST-EXTERNAL` from IBKR adapter.
IBKR option-chain greeks	Built-in `OptionGreeks` event from IBKR adapter (or compute locally if not surfaced).

Output → Nautilus type map

Cortana output	Nautilus expression
Scoring events (78 features + composite)	Custom `ScoringEvent(Data)` published by a `ScoringActor`. Subscribers: strategy, dashboard, brain-logger.
Meta-model probabilities	Field on `ScoringEvent` (or separate `MetaProbEvent` if cadence differs).
EMA-decay flow values	Field on `ScoringEvent` (or `EmaDecayUpdate(Data)` if it updates independently).
Trade outcomes (for brain pages)	Strategy `on_position_closed` triggers `TradeOutcomeEvent` consumed by an out-of-band brain-logger actor.

Where the catalog lives for backtest replay

Single canonical location. Suggested layout:

~/cortana-data/
  catalog/                           # The ParquetDataCatalog root
    data/
      quote_ticks/
        SPY.ARCA/
          {ts_start}_{ts_end}.parquet   # IBKR quote ticks
      bars/
        SPY.ARCA-1-MINUTE-LAST-EXTERNAL/
          ...
      uw_flow_alert/                  # custom type (PyArrow backend)
        SPY/
          ...
      scoring_event/                  # custom type (PyArrow backend)
        SPY/
          ...

Set NAUTILUS_PATH=~/cortana-data and use ParquetDataCatalog.from_env() everywhere. Multi-tenant note (per spike Step 7.5): each tenant gets its own NAUTILUS_PATH (e.g., ~/cortana-data/tenants/{tenant_id}/), or use S3 with per-tenant key prefix.

Replaying today’s `decisions.db` rows as Nautilus events

This is one of the spike’s hard questions (Step 6). Two paths:

Path A - manual migration via DataLoader/DataWrangler. Write a one-off CortanaDecisionsDbLoader that reads decisions.db.scoring_events and emits pd.DataFrame with ScoringEvent schema. Wrangle to list[ScoringEvent]. Write to catalog via catalog.write_data(events). Replay from catalog through BacktestDataConfig(data_cls=ScoringEvent, client_id="CORTANA").

# Sketch
import sqlite3, pandas as pd
from nautilus_trader.persistence.catalog import ParquetDataCatalog
 
def replay_decisions_db(db_path: str, catalog: ParquetDataCatalog):
    conn = sqlite3.connect(db_path)
    df = pd.read_sql("SELECT * FROM scoring_events ORDER BY ts_event", conn)
    events = [
        ScoringEvent(
            instrument_id=InstrumentId.from_str("SPY.ARCA"),
            ts_event=row.ts_event_ns,
            ts_init=row.ts_init_ns,
            composite_score=row.composite_score,
            bias=row.bias,
            ...
        )
        for row in df.itertuples()
    ]
    catalog.write_data(events)

Path B - synthetic stream replay. Keep decisions.db authoritative; build a DecisionsDbDataClient(LiveDataClient) that reads rows from the SQLite file and emits them as _handle_data(event) calls in time order. This avoids re-encoding history in Parquet but is brittle and only works if backtest mode can route through a “fake live” data client.

Recommended for spike: Path A. It’s the pattern Nautilus is designed for (catalog-as-source-of-truth) and produces a reusable artifact.

Open spike questions (data-specific)

Sub-second ts_event precision. UW gives millisecond resolution. Nautilus expects nanoseconds. Multiply by 1e6. Check that the DataEngine doesn’t reorder events that share a ts_init after stable-sort - most likely fine, but spike-time confirm.
ScoringEvent schema vs Arrow. Flat scalars are easiest for Arrow schemas. The 78-feature vector is the awkward part. Three options: (a) serialize as JSON string in one column (loses query-ability but easy); (b) split into 78 Arrow columns (clean but rigid); (c) use Arrow Map type (flexible but PyArrow-backend only). Decide during the spike.
OptionChainSnapshot size. A full SPY 0DTE chain is large per snapshot (~50KB pickled). Probably don’t publish on the bus every refresh - write to cache by key (f"{instrument_id}_CHAIN_{ts}") and have actors pull on demand.

Quick reference - import checklist

# Built-in types
from nautilus_trader.model import (
    QuoteTick, TradeTick, Bar, BarType, BarSpecification,
    OrderBookDelta, OrderBookDeltas, OrderBookDepth10,
    InstrumentId, DataType,
)
 
# Custom data
from nautilus_trader.core import Data
from nautilus_trader.model.custom import customdataclass
from nautilus_trader.serialization.base import register_serializable_type
from nautilus_trader.serialization.arrow.serializer import register_arrow
 
# Catalog
from nautilus_trader.persistence.catalog import ParquetDataCatalog
from nautilus_trader.persistence.config import (
    DataCatalogConfig, StreamingConfig, RotationMode,
)
 
# Wranglers
from nautilus_trader.persistence.wranglers import (
    OrderBookDeltaDataWrangler, QuoteTickDataWrangler,
    TradeTickDataWrangler, BarDataWrangler,
)
 
# Backtest config
from nautilus_trader.config import BacktestDataConfig, BacktestRunConfig

Anti-patterns to avoid

Manual ts_event / ts_init plumbing in strategy code. All ordering flows from these two fields. Don’t second-guess the DataEngine’s stable sort.
Mutating a published Data object. nautilus-concepts.md is explicit: messages are immutable post-publish. Derive new local state if you need a different shape.
Writing to cache yourself before letting the engine see the data. Use _handle_data(event) from inside a DataClient so the engine’s cache-then-publish invariant holds. Don’t cache.add() then publish_data() separately from a strategy - that breaks the ordering guarantee.
Publishing custom data without registering serializable/Arrow handlers. If you skip register_serializable_type you can’t survive a Redis bus hop or restart-replay. If you skip register_arrow you can’t write to the catalog.
Putting the 78-feature vector in 78 separate cache writes. One ScoringEvent per score, not 78 partial updates.
Using float time.time() anywhere. self.clock.utc_now() or self.clock.timestamp_ns() always - the same code must run in backtest (simulated clock) and live (wall clock).

Timeline

2026-05-07 | Cody - Filed during pre-spike concept mastery sweep.

CortanaROI Brain

Explorer

nautilus-data