Nautilus Data Model
Nautilus’s data subsystem is a single, normalized, deterministically ordered stream of timestamped objects that flows through the DataEngine → Cache → MessageBus pipeline regardless of whether the runtime is a backtest, sandbox, or live TradingNode. Built-in types cover ticks, bars, books, mark/index/funding prices, instrument lifecycle, and option Greeks. Custom data - anything you subclass off Data (or decorate with @customdataclass / @customdataclass_pyo3) - rides the same engine, the same cache, the same bus, and can be persisted in the same ParquetDataCatalog. Two timestamps (ts_event, ts_init) on every object give you deterministic replay ordering and free latency telemetry. The catalog is the canonical historical store; the cache is the runtime store; the bus is the in-process router. Cortana MK3’s UW WebSocket flow alerts, scoring events, and meta-prob outputs all become custom Data subclasses published on the bus. This page is the canonical Nautilus-data reference for the Saturday 2026-05-09 spike.
This page supplements nautilus-concepts.md (which covers DataEngine + Cache + MessageBus + ParquetDataCatalog at the architectural level). For a Cortana → Nautilus translation table see nautilus-developer-guide.md. For IBKR adapter specifics see nautilus-integrations.md.
Built-in data types
Nautilus ships a finite set of normalized market-data classes. Every venue
adapter (DataClient) is responsible for parsing wire bytes into one of these
types - strategies never see venue-specific shapes.
| Type | What it represents |
|---|---|
| OrderBookDelta (L1/L2/L3) | Single most-granular book update. Carries flags for event-boundary semantics (see below). |
| OrderBookDeltas | Batched delta sequence for efficient processing. |
| OrderBookDepth10 | Pre-aggregated snapshot, up to 10 levels deep per side. |
| QuoteTick | Best bid/ask price + size at top-of-book. |
| TradeTick | A single counterparty match (price, qty, aggressor side, trade ID). |
| Bar | OHLCV candle aggregated by some method (see Bar aggregation below). |
| MarkPriceUpdate | Mark price (typical for derivatives). |
| IndexPriceUpdate | Underlying index reference price. |
| FundingRateUpdate | Periodic funding rate (perpetuals). |
| InstrumentStatus | Instrument-lifecycle event (halt/resume/close). |
| InstrumentClose | Closing print for an instrument. |
Plus option-specific types (OptionGreeks, option-chain snapshots - see the
Options/Greeks concept pages). The framework operates “primarily on granular
order book data for the highest realism in execution simulations” - but
backtests can run against any of the above depending on fidelity needs.
Order books
OrderBook instances are maintained per instrument in both backtest and live.
Three book types:
- L3_MBO - market by order, every event keyed by order ID.
- L2_MBP - market by price, events aggregated by price level.
- L1_MBP - best bid/offer (top of book only). QuoteTick/TradeTick/Bar data drives an L1_MBP book in backtest.
Delta flags and event boundaries
Each OrderBookDelta carries a flags bitmask using RecordFlag values:
- F_LAST - final delta in a logical event group. With buffer_deltas enabled, the DataEngine accumulates deltas and only publishes when it encounters F_LAST. Missing F_LAST on the final delta = buffered consumers accumulate forever and never publish. This is a hard contract.
- F_SNAPSHOT - delta belongs to a snapshot rather than an incremental update. Snapshots begin with a Clear action followed by Add deltas; the last delta has F_SNAPSHOT | F_LAST set.
Empty book snapshots (only a Clear delta) still must set F_LAST.
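The F_LAST contract can be illustrated with a minimal, pure-Python sketch. It does not use the Nautilus classes: Delta and DeltaBuffer are hypothetical stand-ins, and the bit values below are illustrative assumptions (the real constants come from RecordFlag).

```python
from dataclasses import dataclass
from typing import List, Optional

# Illustrative bit values (assumption); use RecordFlag in real code.
F_LAST = 1 << 7
F_SNAPSHOT = 1 << 5


@dataclass(frozen=True)
class Delta:
    """Hypothetical stand-in for OrderBookDelta (only the fields used here)."""
    action: str
    flags: int


class DeltaBuffer:
    """Accumulates deltas and releases a batch only when F_LAST is seen."""

    def __init__(self) -> None:
        self._pending: List[Delta] = []

    def push(self, delta: Delta) -> Optional[List[Delta]]:
        self._pending.append(delta)
        if delta.flags & F_LAST:
            batch, self._pending = self._pending, []
            return batch  # complete event group, safe to publish
        return None  # still buffering


buffer = DeltaBuffer()
snapshot = [
    Delta("CLEAR", F_SNAPSHOT),
    Delta("ADD", F_SNAPSHOT),
    Delta("ADD", F_SNAPSHOT | F_LAST),
]
published = [buffer.push(d) for d in snapshot]

# An empty-book snapshot is a single Clear that still carries F_LAST.
empty_book = buffer.push(Delta("CLEAR", F_SNAPSHOT | F_LAST))
```

Only the third push returns a batch; a stream whose final delta lacks F_LAST would keep returning None, which is the "accumulate forever" failure described above.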
Instruments
Instruments are referenced by data but are themselves a separate domain concept. Coverage:
| Instrument | Notes |
|---|---|
| Equity | Generic equity. |
| CurrencyPair | Spot FX. |
| Commodity | Spot commodity. |
| IndexInstrument | Spot index (reference price; not directly tradable). |
| FuturesContract, FuturesSpread | Deliverable futures + spreads. |
| CryptoFuture, CryptoPerpetual, CryptoOption | Crypto derivatives. |
| PerpetualContract | Asset-class-agnostic perpetual swap. |
| OptionContract, OptionSpread | Generic options + spreads. |
| BinaryOption | Binary options. |
| Cfd | Contract for Difference. |
| BettingInstrument | Betting markets. |
| SyntheticInstrument | Price derived via formula from component instruments. |
Instrument identification
Format: {symbol}.{venue} - e.g., SPY.ARCA, ETHUSDT-PERP.BINANCE,
AAPL.XNAS. Rules from nautilus-concepts.md:
- “All native symbols should be unique for a venue.”
- “The {symbol.venue} combination must be unique for a Nautilus system.”
This is the load-bearing identity for every data subscription, every cache
lookup, every order. Cortana’s IBKR symbology choice (IB_SIMPLIFIED vs
IB_RAW) lives here.
Bars and aggregation
A Bar carries OHLCV plus a BarType that uniquely identifies what kind of
aggregation produced it.
BarType string syntax
Standard: {instrument_id}-{step}-{aggregation}-{price_type}-{INTERNAL|EXTERNAL}
bar_type = BarType.from_str("AAPL.XNAS-5-MINUTE-LAST-INTERNAL")
Composite (bar-to-bar):
{instrument_id}-{step}-{aggregation}-{price_type}-INTERNAL@{step}-{aggregation}-{INTERNAL|EXTERNAL}
# 5-min bars derived from externally-provided 1-min bars
bar_type = BarType.from_str("AAPL.XNAS-5-MINUTE-LAST-INTERNAL@1-MINUTE-EXTERNAL")
The target (left of @) is always INTERNAL. The source (right of @) can be
either. When @ is absent, the bar aggregates from TradeTick (when
price_type=LAST) or QuoteTick (when BID/ASK/MID).
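To make the string layout concrete, here is a small pure-Python sketch that splits a BarType string into its parts. It is not the Nautilus parser (BarType.from_str does the real work); parse_bar_type and ParsedBarType are hypothetical names used only for illustration.

```python
from typing import NamedTuple, Optional


class ParsedBarType(NamedTuple):
    instrument_id: str
    step: int
    aggregation: str
    price_type: str
    source: str                      # INTERNAL or EXTERNAL
    composite_source: Optional[str]  # text right of "@", if present


def parse_bar_type(text: str) -> ParsedBarType:
    target, _, composite = text.partition("@")
    # Split from the right: the instrument ID may itself contain hyphens
    # (e.g. ETHUSDT-PERP.BINANCE).
    instrument_id, step, aggregation, price_type, source = target.rsplit("-", 4)
    if composite and source != "INTERNAL":
        raise ValueError("the target of a composite bar type must be INTERNAL")
    return ParsedBarType(
        instrument_id, int(step), aggregation, price_type, source, composite or None
    )


standard = parse_bar_type("ETHUSDT-PERP.BINANCE-1-MINUTE-LAST-EXTERNAL")
composite = parse_bar_type("AAPL.XNAS-5-MINUTE-LAST-INTERNAL@1-MINUTE-EXTERNAL")
```

Splitting from the right is the design choice that matters here: the four trailing fields have a fixed shape, while the symbol part does not.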
Aggregation methods
| Category | Methods |
|---|---|
| Time | MILLISECOND, SECOND, MINUTE, HOUR, DAY, WEEK, MONTH, YEAR |
| Threshold | TICK, VOLUME, VALUE, RENKO, plus imbalance variants |
| Information-driven (imbalance) | TICK_IMBALANCE, VOLUME_IMBALANCE, VALUE_IMBALANCE |
| Information-driven (runs) | TICK_RUNS, VOLUME_RUNS, VALUE_RUNS |
Imbalance bars close when net buy/sell activity reaches threshold. Runs bars
close on sustained one-sided pressure (counter resets on aggressor flip).
Information-driven bars require TradeTick - they need aggressor_side.
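The two closing rules can be sketched on a plain list of aggressor sides. This is a simplified illustration of the tick-count variants only, not Nautilus's aggregator code; the function names, and the choice to reset the counter after each close, are assumptions.

```python
from typing import List


def tick_imbalance_closes(aggressors: List[str], threshold: int) -> List[int]:
    """Indices where |buys - sells| since the last close reaches threshold."""
    closes, net = [], 0
    for i, side in enumerate(aggressors):
        net += 1 if side == "BUY" else -1
        if abs(net) >= threshold:
            closes.append(i)
            net = 0  # assumption: imbalance resets when a bar closes
    return closes


def tick_runs_closes(aggressors: List[str], threshold: int) -> List[int]:
    """Indices where a run of same-side trades reaches threshold."""
    closes, run_side, run_len = [], None, 0
    for i, side in enumerate(aggressors):
        if side != run_side:
            run_side, run_len = side, 0  # counter resets on aggressor flip
        run_len += 1
        if run_len >= threshold:
            closes.append(i)
            run_side, run_len = None, 0
    return closes


trades = ["BUY", "SELL", "BUY", "BUY", "BUY", "SELL", "SELL", "SELL"]
imbalance_bars = tick_imbalance_closes(trades, threshold=3)
runs_bars = tick_runs_closes(trades, threshold=3)
```

Both functions need a side for every trade, which is why these bar types cannot be built from quotes or from bars that have already discarded aggressor_side.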
Time bar configuration
Driven by DataEngineConfig:
| Option | Default | Effect |
|---|---|---|
| time_bars_interval_type | "left-open" | Start excluded, end included (or vice versa). |
| time_bars_timestamp_on_close | True | ts_event = bar close (or open if False). |
| time_bars_skip_first_non_full_bar | False | Skip mid-interval first bar. |
| time_bars_build_with_no_updates | True | Emit empty bars during quiet periods. |
| time_bars_origin_offset | None | Map BarAggregation → Timedelta (e.g., align to 09:30 open). |
| time_bars_build_delay | 0 | Microsecond delay before building (backtest determinism). |
request_bars() vs subscribe_bars()
This is the dual-surface pattern - the same idea applies to every data type:
from datetime import timedelta
from nautilus_trader.model import BarType

def on_start(self) -> None:
    bar_type = BarType.from_str("SPY.ARCA-1-MINUTE-LAST-INTERNAL")
    start = self.clock.utc_now() - timedelta(days=30)
    # Register indicators FIRST so they receive historical updates too
    self.register_indicator_for_bars(bar_type, self.ema)
    # Historical fetch → on_historical_data(...) handler
    self.request_bars(bar_type, start=start)
    # Live feed → on_bar(...) handler
    self.subscribe_bars(bar_type)

Common pitfall: register indicators before requesting historical data. Reversed order means indicators never see history.
For composite bars use request_aggregated_bars([bar_type], start=start) -
this builds the dependency-ordered aggregation chain on the fly.
Timestamps - ts_event and ts_init
Every Data subclass - built-in or custom - carries two UNIX-nanosecond
timestamps:
- ts_event - when the event actually occurred (at the exchange / source).
- ts_init - when Nautilus initialized the internal object representing that event.
| Event | ts_event | ts_init |
|---|---|---|
| TradeTick | Trade time at exchange | Receipt time |
| QuoteTick | Quote time at exchange | Receipt time |
| OrderBookDelta | Book update at exchange | Receipt time |
| Bar | Bar close time | Generation time (internal) or receipt (external) |
| OrderFilled | Fill time at exchange | Local processing time |
| Custom event | When conditions occurred | Object creation time |
Latency analysis
latency = ts_init - ts_event gives total system latency (network + processing + queueing). The source and local clocks aren’t synchronized, so the difference is not guaranteed to be positive - treat it as an estimate.
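As a minimal sketch of the arithmetic, both timestamps are integer nanoseconds, so the subtraction is exact and only the final unit conversion produces a float. The helper name latency_ms is hypothetical.

```python
NANOS_PER_MILLI = 1_000_000


def latency_ms(ts_event: int, ts_init: int) -> float:
    """Estimated latency in milliseconds; may be negative under clock skew."""
    return (ts_init - ts_event) / NANOS_PER_MILLI


# Event stamped at the source, object created locally 12.5 ms later.
normal = latency_ms(1_700_000_000_000_000_000, 1_700_000_000_012_500_000)

# Source clock running ahead of the local clock yields a negative estimate.
skewed = latency_ms(1_700_000_000_001_000_000, 1_700_000_000_000_000_000)
```

A negative value is a signal about clock skew rather than a bug in the pipeline, so downstream telemetry should tolerate it.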
Backtest vs live ordering
- Backtest: data is ordered by ts_init using a stable sort. Deterministic replay. This is the property MK3 needs for decisions.db replay.
- Live: data is processed as it arrives. ts_event reflects external time; ts_init reflects local creation. The difference detects network/processing delays.
For internally-created data (e.g., a custom ScoringEvent produced inside the
runtime), ts_event and ts_init may legitimately be equal.
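The effect of a stable sort on ts_init can be shown with plain Python, whose sorted() is guaranteed stable. The Event tuple is a hypothetical stand-in for any Data object; this illustrates the ordering property only, not the engine's internals.

```python
from operator import attrgetter
from typing import NamedTuple


class Event(NamedTuple):
    name: str
    ts_event: int
    ts_init: int


stream = [
    Event("quote", ts_event=100, ts_init=105),
    Event("alert", ts_event=101, ts_init=103),
    Event("score", ts_event=103, ts_init=103),  # internally created: equal stamps
    Event("trade", ts_event=99, ts_init=105),
]

# Stable sort: events sharing a ts_init keep their original relative order.
replay = sorted(stream, key=attrgetter("ts_init"))
names = [event.name for event in replay]
```

Here "alert" stays ahead of "score" and "quote" stays ahead of "trade" on every run, which is what makes the replay deterministic.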
Data flow - DataEngine → Cache → MessageBus
From the DataEngine onward, the path is identical across backtest, sandbox, and live:
- Source - adapter creates a normalized Data object (live), or the BacktestEngine feeds it directly (backtest).
- DataEngine - receives the object, validates type, applies any buffering (e.g., book deltas accumulating until F_LAST).
- Cache - engine writes to cache first. This is the cache-then-publish invariant: the latest value is in cache before any subscriber handler runs.
- MessageBus - engine then publishes on the topic derived from the data’s identifier. Subscribers receive it via on_quote_tick, on_bar, on_data (custom), etc.
This guarantee is what makes handlers safe - self.cache.quote_tick(id)
inside on_quote_tick(tick) returns the very tick that triggered the handler.
No race window.
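The cache-then-publish invariant can be demonstrated with a toy engine. ToyEngine and the topic string are hypothetical and far simpler than the real DataEngine; the sketch only shows why a handler that reads the cache sees the object that triggered it.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List


class ToyEngine:
    """Toy stand-in: write to cache first, then notify subscribers."""

    def __init__(self) -> None:
        self.cache: Dict[str, Any] = {}
        self._subscribers: DefaultDict[str, List[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[Any], None]) -> None:
        self._subscribers[topic].append(handler)

    def process(self, topic: str, data: Any) -> None:
        self.cache[topic] = data                  # 1. cache write
        for handler in self._subscribers[topic]:  # 2. publish
            handler(data)


engine = ToyEngine()
seen: List[bool] = []


def on_quote_tick(tick: Any) -> None:
    # The cached value is the very object that triggered this handler.
    seen.append(engine.cache["quotes.SPY.ARCA"] is tick)


engine.subscribe("quotes.SPY.ARCA", on_quote_tick)
engine.process("quotes.SPY.ARCA", {"bid": 500.10, "ask": 500.12})
```

Swapping steps 1 and 2 in process() would make the handler read a stale or missing cache entry, which is the race the real ordering rules out.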
Loading data - DataLoader + DataWrangler
Nautilus splits external-data ingest into two pipeline stages:
- DataLoader - venue-specific. Reads raw CSV/JSON/binary into a pd.DataFrame with the right schema for a Nautilus type. One per raw-source × format combo (e.g., BinanceOrderBookDeltaDataLoader).
- DataWrangler - type-specific. Takes the pd.DataFrame and emits a list[Data] of Nautilus objects. Lives under nautilus_trader.persistence.wranglers.
Common v1 wranglers:
- OrderBookDeltaDataWrangler
- QuoteTickDataWrangler
- TradeTickDataWrangler
- BarDataWrangler
Arrow v2 / PyO3: OrderBookDepth10DataWranglerV2 and friends emit PyO3
objects (not compatible with v1 Cython BacktestEngine consumers).
Example pipeline:
from nautilus_trader.adapters.binance.loaders import BinanceOrderBookDeltaDataLoader
from nautilus_trader.persistence.wranglers import OrderBookDeltaDataWrangler
from nautilus_trader.test_kit.providers import TestInstrumentProvider
df = BinanceOrderBookDeltaDataLoader.load(data_path)
instrument = TestInstrumentProvider.btcusdt_binance()
wrangler = OrderBookDeltaDataWrangler(instrument)
deltas = wrangler.process(df)
# deltas is now list[OrderBookDelta] - feed to BacktestEngine or write to catalog

Fixed-point precision and raw values
Price and Quantity are fixed-point internally. from_raw() must receive a
valid multiple of the scale factor (10^(FIXED_PRECISION - precision) where
FIXED_PRECISION is 9 standard / 16 high-precision). Invalid raw values
panic the runtime.
The Arrow decode path auto-corrects floating-point drift like
int(0.67068 * 1e9) == 670680000000001 by rounding to the nearest valid
multiple - so old catalogs work without migration. Small per-decode overhead.
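The "valid multiple" rule is plain integer arithmetic, sketched below. The helper names are hypothetical, and round-half-up is an assumption made for illustration; the exact rounding rule used by the Arrow decode path is not stated here.

```python
FIXED_PRECISION = 9  # standard mode; 16 in high-precision mode


def scale_factor(precision: int, fixed_precision: int = FIXED_PRECISION) -> int:
    """Raw values must be multiples of 10^(FIXED_PRECISION - precision)."""
    return 10 ** (fixed_precision - precision)


def is_valid_raw(raw: int, precision: int) -> bool:
    return raw % scale_factor(precision) == 0


def round_to_valid_raw(raw: int, precision: int) -> int:
    """Snap a drifted raw value to the nearest valid multiple (half rounds up)."""
    scale = scale_factor(precision)
    return ((raw + scale // 2) // scale) * scale


# A 5-decimal price of 0.67068 is 670_680_000 in raw form at FIXED_PRECISION = 9.
drifted = 670_680_001  # one unit of floating-point drift
repaired = round_to_valid_raw(drifted, precision=5)
```

Validating with is_valid_raw before calling from_raw() is a cheap guard in loader code, given that an invalid raw value panics the runtime.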
ParquetDataCatalog - the historical store
The catalog is Nautilus’s canonical persistent data store. This is where Cortana MK3’s backtest replay data lives.
Architecture
- Rust backend - high-perf for OrderBookDelta, OrderBookDeltas, OrderBookDepth10, QuoteTick, TradeTick, Bar, MarkPriceUpdate. Plus registered same-binary Rust custom types.
- PyArrow backend - flexible fallback for custom Python data types and advanced filtering.
- fsspec integration - local FS, S3, GCS, Azure Blob (abfs/az).
Initialization
from pathlib import Path
from nautilus_trader.persistence.catalog import ParquetDataCatalog
catalog = ParquetDataCatalog(Path.cwd() / "catalog")
# Or: ParquetDataCatalog.from_env() → uses NAUTILUS_PATH env var
# Or: ParquetDataCatalog.from_uri("s3://bucket/nautilus-data/")

NAUTILUS_PATH points to the parent directory; the catalog appends /catalog automatically.
File organization
catalog/
├── data/
│   ├── quote_ticks/
│   │   └── EURUSD.SIM/
│   │       └── 2024-01-01T00-00-00-000000000Z_2024-01-01T23-59-59-999999999Z.parquet
│   └── trade_ticks/
│       └── BTCUSD.BINANCE/
│           └── ...
Files named {start_timestamp}_{end_timestamp}.parquet (ISO 8601 with
filename-safe : and . replaced by -).
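A small sketch of how a pair of UNIX-nanosecond bounds maps onto this filename pattern. The helper names are hypothetical and the catalog builds these names itself; this only makes the format concrete for tooling that needs to read or predict them.

```python
from datetime import datetime, timezone

NANOS_PER_SECOND = 1_000_000_000


def ns_to_filename_part(ts_ns: int) -> str:
    """ISO 8601 UTC with ':' and '.' replaced by '-', nanosecond precision."""
    seconds, nanos = divmod(ts_ns, NANOS_PER_SECOND)
    stamp = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return f"{stamp.strftime('%Y-%m-%dT%H-%M-%S')}-{nanos:09d}Z"


def catalog_filename(start_ns: int, end_ns: int) -> str:
    return f"{ns_to_filename_part(start_ns)}_{ns_to_filename_part(end_ns)}.parquet"


start_ns = 1_704_067_200 * NANOS_PER_SECOND  # 2024-01-01T00:00:00Z
end_ns = start_ns + 86_400 * NANOS_PER_SECOND - 1  # last nanosecond of that day
name = catalog_filename(start_ns, end_ns)
```

Integer divmod is used instead of float seconds so the nanosecond digits survive the conversion intact.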
Writing
catalog.write_data(quote_ticks)
catalog.write_data(trade_ticks, start=ns_start, end=ns_end)
catalog.write_data(bars, skip_disjoint_check=True)  # Allow overlapping

By default, overlapping writes raise ValueError. Use skip_disjoint_check=True when the overlap is intentional.
Reading
quotes = catalog.query(
    data_cls=QuoteTick,
    identifiers=["EUR/USD.SIM"],
    start="2024-01-01T00:00:00Z",
    end="2024-01-02T00:00:00Z",
)

Time formats: ISO 8601 string, UNIX nanoseconds, pd.Timestamp, datetime.
Filtering:
- where= - DataFusion SQL predicate (Rust backend only).
- filter_expr= - PyArrow dataset expression (PyArrow backend or files= forced-path).
Operations
- reset_all_file_names() / reset_data_file_names(QuoteTick, "BTC.BINANCE") - realign filenames with content timestamps.
- consolidate_catalog(start=, end=, ensure_contiguous_files=True) - merge small files for query perf.
- consolidate_catalog_by_period(period=pd.Timedelta(days=1)) - standardize file size by time period.
- delete_catalog_range(start=, end=) / delete_data_range(...) - permanent delete; partial-overlap files are split to preserve out-of-range data.
BacktestDataConfig - pre-specifying data for a backtest
from nautilus_trader.config import BacktestDataConfig
from nautilus_trader.model import QuoteTick, InstrumentId
data_config = BacktestDataConfig(
    catalog_path="/path/to/catalog",
    data_cls=QuoteTick,
    instrument_id=InstrumentId.from_str("EUR/USD.SIM"),
    start_time="2024-01-01T00:00:00Z",
    end_time="2024-01-02T00:00:00Z",
)

For bar data: bar_spec="5-MINUTE-LAST" builds the EXTERNAL bar identifier with the instrument_id. Use bar_types=[...] for explicit INTERNAL or composite types.
For custom data: pass the class via data_cls (or string),
client_id="MyAdapter", and metadata={...} to filter.
DataCatalogConfig - on-the-fly access during live trading
from nautilus_trader.config import TradingNodeConfig
from nautilus_trader.persistence.config import DataCatalogConfig

catalog_config = DataCatalogConfig(
    path="/data/nautilus/catalog",
    fs_protocol="file",
    name="historical_data",
)
node_config = TradingNodeConfig(
    catalogs=[catalog_config],  # Enables historical access from live strategies
    # ... other node settings
)

StreamingConfig - write live data to catalog as it arrives
from nautilus_trader.persistence.config import StreamingConfig, RotationMode
import pandas as pd
streaming_config = StreamingConfig(
    catalog_path="/path/to/streaming/catalog",
    flush_interval_ms=1000,
    rotation_mode=RotationMode.INTERVAL,
    rotation_interval=pd.Timedelta(hours=1),
    max_file_size=1024 * 1024 * 100,  # 100 MB
)

Live data streams to Feather files first; catalog.convert_stream_to_data(...) then promotes it to permanent Parquet.
Custom data types - the load-bearing extensibility point
Anything that is not in the built-in list must be a Data subclass. This
is how Cortana MK3 expresses UW flow alerts, scoring events, meta-prob
outputs, and EMA-decay flow values - they all become custom data types riding
the same engine/cache/bus as native types.
The Data contract
Two required properties: ts_event (int) and ts_init (int), both UNIX
nanoseconds. Backing fields + @property is the recommended pattern. Data
holds no state - super().__init__() is not strictly required.
Manual definition (full control)
from nautilus_trader.core import Data
class MyDataPoint(Data):
    def __init__(
        self,
        label: str,
        x: int, y: int, z: int,
        ts_event: int,
        ts_init: int,
    ) -> None:
        self.label = label
        self.x = x
        self.y = y
        self.z = z
        self._ts_event = ts_event
        self._ts_init = ts_init

    @property
    def ts_event(self) -> int:
        return self._ts_event

    @property
    def ts_init(self) -> int:
        return self._ts_init

Publishing
from nautilus_trader.model import DataType
self.publish_data(
    DataType(MyDataPoint, metadata={"category": "scoring"}),
    MyDataPoint(...),
)

The metadata dict is part of the bus topic - different metadata = different topic. Subscribers must pass matching metadata.
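A toy router makes the metadata rule concrete. It is only an analogy: the key layout below is an assumption chosen for illustration, not the MessageBus's real topic string, and subscribe/publish here are hypothetical module-level helpers rather than the actor methods.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List, Optional, Tuple

TopicKey = Tuple[str, Tuple[Tuple[str, str], ...]]


def topic_key(type_name: str, metadata: Optional[Dict[str, str]] = None) -> TopicKey:
    """Type name plus sorted metadata items: different metadata, different key."""
    return (type_name, tuple(sorted((metadata or {}).items())))


subscriptions: DefaultDict[TopicKey, List[Callable[[Any], None]]] = defaultdict(list)


def subscribe(type_name: str, metadata: Optional[Dict[str, str]], handler: Callable[[Any], None]) -> None:
    subscriptions[topic_key(type_name, metadata)].append(handler)


def publish(type_name: str, metadata: Optional[Dict[str, str]], payload: Any) -> int:
    """Deliver to exact-match subscribers; return how many handlers ran."""
    handlers = subscriptions.get(topic_key(type_name, metadata), [])
    for handler in handlers:
        handler(payload)
    return len(handlers)


received: List[Any] = []
subscribe("MyDataPoint", {"category": "scoring"}, received.append)

matched = publish("MyDataPoint", {"category": "scoring"}, "score-1")
mismatched = publish("MyDataPoint", {"category": "flow"}, "flow-1")
no_metadata = publish("MyDataPoint", None, "bare-1")
```

The practical consequence is the same as on the real bus: a subscriber that omits or misspells a metadata key silently receives nothing, so it helps to build the DataType once and share it between publisher and subscriber.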
Subscribing
from nautilus_trader.model.identifiers import ClientId
self.subscribe_data(
    data_type=DataType(MyDataPoint, metadata={"category": "scoring"}),
    client_id=ClientId("MY_ADAPTER"),
)

def on_data(self, data: Data) -> None:
    if isinstance(data, MyDataPoint):
        # Handle it
        ...

on_data is the catch-all handler for custom data - type-check inside.
Signals - lightweight primitives
For one-value notifications (str/float/int/bool/bytes):
self.publish_signal("score_alert", 67.5, ts_event)
self.subscribe_signal("score_alert")
def on_signal(self, signal):
    print("Signal", signal)

@customdataclass decorator - automatic boilerplate
For full-featured custom data with serialization/cache/catalog support:
from nautilus_trader.model.custom import customdataclass
from nautilus_trader.core import Data
from nautilus_trader.model import InstrumentId
@customdataclass
class GreeksTestData(Data):
    instrument_id: InstrumentId = InstrumentId.from_str("ES.GLBX")
    delta: float = 0.0

GreeksTestData(
    instrument_id=InstrumentId.from_str("CL.GLBX"),
    delta=1000.0,
    ts_event=1,
    ts_init=2,
)

This auto-generates to_dict/from_dict/to_bytes/from_bytes/schema/to_catalog/from_catalog - everything needed for bus serialization, cache storage, and Parquet persistence.
For the Rust-backed PyO3 catalog, use @customdataclass_pyo3() and call
register_custom_data_class(MyType) once at startup.
Manual full-featured implementation (option-greeks pattern)
When you need control over schema/serialization (the Greeks pattern is the template):
import msgspec
import pyarrow as pa
from nautilus_trader.core import Data
from nautilus_trader.model import DataType, InstrumentId
from nautilus_trader.serialization.base import register_serializable_type
from nautilus_trader.serialization.arrow.serializer import register_arrow
class GreeksData(Data):
    def __init__(self, instrument_id, ts_event, ts_init, delta):
        self.instrument_id = instrument_id
        self._ts_event = ts_event
        self._ts_init = ts_init
        self.delta = delta

    @property
    def ts_event(self):
        return self._ts_event

    @property
    def ts_init(self):
        return self._ts_init

    def to_dict(self):
        return {
            "instrument_id": self.instrument_id.value,
            "ts_event": self._ts_event,
            "ts_init": self._ts_init,
            "delta": self.delta,
        }

    @classmethod
    def from_dict(cls, data):
        return GreeksData(InstrumentId.from_str(data["instrument_id"]),
                          data["ts_event"], data["ts_init"], data["delta"])

    def to_bytes(self):
        return msgspec.msgpack.encode(self.to_dict())

    @classmethod
    def from_bytes(cls, data):
        return cls.from_dict(msgspec.msgpack.decode(data))

    def to_catalog(self):
        # Instance method: there is no `cls` here, so reference the class by name
        return pa.RecordBatch.from_pylist([self.to_dict()], schema=GreeksData.schema())

    @classmethod
    def from_catalog(cls, table):
        return [cls.from_dict(d) for d in table.to_pylist()]

    @classmethod
    def schema(cls):
        return pa.schema({
            "instrument_id": pa.string(),
            "ts_event": pa.int64(),
            "ts_init": pa.int64(),
            "delta": pa.float64(),
        })

# Register for bus/cache serialization
register_serializable_type(GreeksData, GreeksData.to_dict, GreeksData.from_dict)
# Register for catalog Parquet read/write
register_arrow(GreeksData, GreeksData.schema(), GreeksData.to_catalog, GreeksData.from_catalog)

Cache writes (raw bytes by key)
def greeks_key(instrument_id):
    return f"{instrument_id}_GREEKS"

def cache_greeks(self, greeks_data):
    self.cache.add(greeks_key(greeks_data.instrument_id), greeks_data.to_bytes())

def greeks_from_cache(self, instrument_id):
    return GreeksData.from_bytes(self.cache.get(greeks_key(instrument_id)))

Catalog writes
catalog = ParquetDataCatalog('.')
catalog.write_data([GreeksData(...)])  # works once register_arrow has run

.pyi stubs for IDE support
When using @customdataclass, the constructor is generated at runtime - IDEs
can’t see it. Hand-write a .pyi stub for autocomplete.
Custom data - how to define UWFlowAlert and ScoringEvent
These are the two highest-priority custom types for Cortana MK3. Both should
use @customdataclass (Python-side) since the Rust path isn’t yet justified
for spike-level work - the per-event cost is dominated by ML inference, not
serialization.
UWFlowAlert - UW WebSocket flow alert
Sub-second alerts from UW WebSocket. Primary signal source for the engine. Cortana cares about strike/side/size/premium/aggressor/sweep-or-block.
from nautilus_trader.core import Data
from nautilus_trader.model import DataType, InstrumentId
from nautilus_trader.model.custom import customdataclass
from nautilus_trader.model.identifiers import ClientId

@customdataclass
class UWFlowAlert(Data):
    instrument_id: InstrumentId = InstrumentId.from_str("SPY.ARCA")
    strike: float = 0.0
    expiry: str = ""  # ISO date
    option_side: str = ""  # "CALL" | "PUT"
    aggressor_side: str = ""  # "BUY" | "SELL" | "MIXED"
    premium_usd: float = 0.0
    size_contracts: int = 0
    is_sweep: bool = False
    is_block: bool = False
    flow_score: float = 0.0  # UW's own score, 0-100
    underlying_price: float = 0.0
    raw_id: str = ""  # UW alert ID for dedup

# Publish from UW DataClient
self.publish_data(
    DataType(UWFlowAlert, metadata={"underlying": "SPY"}),
    UWFlowAlert(ts_event=ws_event_ns, ts_init=now_ns, ...),
)

# Subscribe in scoring actor
self.subscribe_data(
    data_type=DataType(UWFlowAlert, metadata={"underlying": "SPY"}),
    client_id=ClientId("UW"),
)

def on_data(self, data: Data) -> None:
    if isinstance(data, UWFlowAlert):
        self._update_flow_state(data)

The UW DataClient (custom adapter - see nautilus-developer-guide.md) opens the WebSocket in connect(), decodes each frame to a UWFlowAlert, and calls self._handle_data(alert) to push it through DataEngine → Cache → MessageBus.
ScoringEvent - composite-score snapshot
Output of the scoring engine. 78 features + composite + bias + conviction. Published every time the engine re-scores. This is the event the strategy gates on.
@customdataclass
class ScoringEvent(Data):
    instrument_id: InstrumentId = InstrumentId.from_str("SPY.ARCA")
    composite_score: float = 0.0  # 0-100
    bias: str = ""  # "BULL" | "BEAR" | "NEUTRAL"
    conviction: str = ""  # "LOW" | "MEDIUM" | "HIGH"
    # 78 features as a Dict[str, float] would be cleaner but @customdataclass
    # prefers flat scalars for Arrow schema simplicity. Pick the top-N or
    # serialize the full vector as a JSON string field for the spike.
    features_json: str = "{}"
    meta_prob: float = 0.0  # secondary classifier output, 0-1
    meta_prob_finite: bool = True  # gate flag
    impulse_engine_score: float = 0.0
    contradiction_flag: bool = False
    ema_decay_value: float = 0.0
    spy_price_at_score: float = 0.0  # snapshot for replay

Published by the ScoringActor, subscribed by CortanaStrategy. The strategy’s on_data checks composite_score >= 65 and bias == "BULL" etc.
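For the features_json field, a minimal stdlib sketch of the round trip is shown below. The helper names and the feature keys are hypothetical; rejecting non-finite values up front is a design assumption, made because NaN and Infinity are not valid JSON and would otherwise leak into the stored string.

```python
import json
import math
from typing import Dict


def encode_features(features: Dict[str, float]) -> str:
    """Serialize a feature vector to a compact, key-sorted JSON string."""
    bad = [name for name, value in features.items() if not math.isfinite(value)]
    if bad:
        raise ValueError(f"non-finite feature values: {bad}")
    return json.dumps(features, sort_keys=True, separators=(",", ":"))


def decode_features(features_json: str) -> Dict[str, float]:
    return {name: float(value) for name, value in json.loads(features_json).items()}


# Hypothetical feature names, two of the 78.
features = {"flow_premium_z": 1.5, "call_put_ratio": 0.25}
encoded = encode_features(features)
decoded = decode_features(encoded)
```

Sorting keys keeps the string byte-identical for identical vectors, which helps when diffing replayed events against the originals.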
Other custom types Cortana needs
| Cortana concept | Custom type sketch |
|---|---|
| Meta-model probability output | Either an attribute on ScoringEvent (current sketch) or a separate MetaProbEvent if it updates independently |
| EMA-decay flow value | Either an attribute on ScoringEvent or EmaDecayUpdate if published at a different cadence |
| UW REST option-chain snapshot | OptionChainSnapshot(Data) - large; consider whether it goes through the bus or just into cache |
| Brain-page write trigger | TradeOutcomeEvent published on_position_closed - out-of-band actor logs to ~/brain |
Cortana MK3 implications
This is where the abstract maps to today’s stack.
Input → Nautilus type map
| Cortana input | Nautilus expression |
|---|---|
| UW WebSocket flow alerts (sub-second, primary signal) | Custom UWFlowAlert(Data) published by a custom UW DataClient. Crate skeleton in nautilus-developer-guide.md Phase 1-7. |
| UW REST option-chain snapshots (on-demand) | Custom OptionChainSnapshot(Data) returned from UWDataClient._on_request(); cached by key. |
| IBKR top-of-book quotes (price of record) | Built-in QuoteTick from the shipped IBKR adapter. No translation work. |
| IBKR 5s and 1m bars | Built-in Bar with BarType SPY.ARCA-5-SECOND-LAST-EXTERNAL / SPY.ARCA-1-MINUTE-LAST-EXTERNAL from IBKR adapter. |
| IBKR option-chain greeks | Built-in OptionGreeks event from IBKR adapter (or compute locally if not surfaced). |
Output → Nautilus type map
| Cortana output | Nautilus expression |
|---|---|
| Scoring events (78 features + composite) | Custom ScoringEvent(Data) published by a ScoringActor. Subscribers: strategy, dashboard, brain-logger. |
| Meta-model probabilities | Field on ScoringEvent (or separate MetaProbEvent if cadence differs). |
| EMA-decay flow values | Field on ScoringEvent (or EmaDecayUpdate(Data) if it updates independently). |
| Trade outcomes (for brain pages) | Strategy on_position_closed triggers TradeOutcomeEvent consumed by an out-of-band brain-logger actor. |
Where the catalog lives for backtest replay
Single canonical location. Suggested layout:
~/cortana-data/
  catalog/                                  # The ParquetDataCatalog root
    data/
      quote_ticks/
        SPY.ARCA/
          {ts_start}_{ts_end}.parquet       # IBKR quote ticks
      bars/
        SPY.ARCA-1-MINUTE-LAST-EXTERNAL/
          ...
      uw_flow_alert/                        # custom type (PyArrow backend)
        SPY/
          ...
      scoring_event/                        # custom type (PyArrow backend)
        SPY/
          ...
Set NAUTILUS_PATH=~/cortana-data and use
ParquetDataCatalog.from_env() everywhere. Multi-tenant note (per spike Step
7.5): each tenant gets its own NAUTILUS_PATH (e.g.,
~/cortana-data/tenants/{tenant_id}/), or use S3 with per-tenant key prefix.
Replaying today’s decisions.db rows as Nautilus events
This is one of the spike’s hard questions (Step 6). Two paths:
Path A - manual migration via DataLoader/DataWrangler. Write a one-off
CortanaDecisionsDbLoader that reads decisions.db.scoring_events and emits
pd.DataFrame with ScoringEvent schema. Wrangle to list[ScoringEvent].
Write to catalog via catalog.write_data(events). Replay from catalog through
BacktestDataConfig(data_cls=ScoringEvent, client_id="CORTANA").
# Sketch
import sqlite3
import pandas as pd
from nautilus_trader.model import InstrumentId
from nautilus_trader.persistence.catalog import ParquetDataCatalog

def replay_decisions_db(db_path: str, catalog: ParquetDataCatalog):
    # ScoringEvent is the custom type defined above
    conn = sqlite3.connect(db_path)
    df = pd.read_sql("SELECT * FROM scoring_events ORDER BY ts_event", conn)
    conn.close()
    events = [
        ScoringEvent(
            instrument_id=InstrumentId.from_str("SPY.ARCA"),
            ts_event=row.ts_event_ns,
            ts_init=row.ts_init_ns,
            composite_score=row.composite_score,
            bias=row.bias,
            # ... remaining ScoringEvent fields
        )
        for row in df.itertuples()
    ]
    catalog.write_data(events)

Path B - synthetic stream replay. Keep decisions.db authoritative; build a
DecisionsDbDataClient(LiveDataClient) that reads rows from the SQLite file
and emits them as _handle_data(event) calls in time order. This avoids
re-encoding history in Parquet but is brittle and only works if backtest mode
can route through a “fake live” data client.
Recommended for spike: Path A. It’s the pattern Nautilus is designed for (catalog-as-source-of-truth) and produces a reusable artifact.
Open spike questions (data-specific)
- Sub-second ts_event precision. UW gives millisecond resolution. Nautilus expects nanoseconds. Multiply by 1e6. Check that the DataEngine doesn’t reorder events that share a ts_init after stable-sort - most likely fine, but confirm during the spike.
- ScoringEvent schema vs Arrow. Flat scalars are easiest for Arrow schemas. The 78-feature vector is the awkward part. Three options: (a) serialize as JSON string in one column (loses query-ability but easy); (b) split into 78 Arrow columns (clean but rigid); (c) use the Arrow Map type (flexible but PyArrow-backend only). Decide during the spike.
- OptionChainSnapshot size. A full SPY 0DTE chain is large per snapshot (~50KB pickled). Probably don’t publish on the bus every refresh - write to cache by key (f"{instrument_id}_CHAIN_{ts}") and have actors pull on demand.
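For the millisecond-to-nanosecond conversion in the first question, integer arithmetic is the safer form. At current epoch magnitudes a float multiply by 1e6 lands above 2^53 and can lose the low digits. A minimal sketch, with ms_to_ns as a hypothetical helper name and a made-up sample timestamp:

```python
NANOS_PER_MILLI = 1_000_000


def ms_to_ns(ts_ms: int) -> int:
    """Convert UNIX milliseconds to UNIX nanoseconds without going through float."""
    return int(ts_ms) * NANOS_PER_MILLI


uw_ts_ms = 1_715_263_800_123  # sample millisecond timestamp (illustrative value)
ts_event = ms_to_ns(uw_ts_ms)

# The float route is not exact at this magnitude.
float_route = int(uw_ts_ms * 1e6)
drift_ns = float_route - ts_event
```

The drift is tiny in wall-clock terms, but any inexactness breaks byte-for-byte replay comparisons, so the integer form is the one to standardize on.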
Quick reference - import checklist
# Built-in types
from nautilus_trader.model import (
QuoteTick, TradeTick, Bar, BarType, BarSpecification,
OrderBookDelta, OrderBookDeltas, OrderBookDepth10,
InstrumentId, DataType,
)
# Custom data
from nautilus_trader.core import Data
from nautilus_trader.model.custom import customdataclass
from nautilus_trader.serialization.base import register_serializable_type
from nautilus_trader.serialization.arrow.serializer import register_arrow
# Catalog
from nautilus_trader.persistence.catalog import ParquetDataCatalog
from nautilus_trader.persistence.config import (
DataCatalogConfig, StreamingConfig, RotationMode,
)
# Wranglers
from nautilus_trader.persistence.wranglers import (
OrderBookDeltaDataWrangler, QuoteTickDataWrangler,
TradeTickDataWrangler, BarDataWrangler,
)
# Backtest config
from nautilus_trader.config import BacktestDataConfig, BacktestRunConfig

Anti-patterns to avoid
- Manual ts_event/ts_init plumbing in strategy code. All ordering flows from these two fields. Don’t second-guess the DataEngine’s stable sort.
- Mutating a published Data object. nautilus-concepts.md is explicit: messages are immutable post-publish. Derive new local state if you need a different shape.
- Writing to cache yourself before letting the engine see the data. Use _handle_data(event) from inside a DataClient so the engine’s cache-then-publish invariant holds. Don’t cache.add() then publish_data() separately from a strategy - that breaks the ordering guarantee.
- Publishing custom data without registering serializable/Arrow handlers. If you skip register_serializable_type you can’t survive a Redis bus hop or restart-replay. If you skip register_arrow you can’t write to the catalog.
- Putting the 78-feature vector in 78 separate cache writes. One ScoringEvent per score, not 78 partial updates.
- Using float time.time() anywhere. Always use self.clock.utc_now() or self.clock.timestamp_ns() - the same code must run in backtest (simulated clock) and live (wall clock).
See Also
- Nautilus Concepts (architecture reference) - DataEngine, Cache, MessageBus, ParquetDataCatalog
- Nautilus Developer Guide (extension + contribution) - Custom adapter Phase 1-7, Cortana → Nautilus translation table
- Nautilus Integrations (IBKR-focused) - IBKR adapter (built-in QuoteTick/Bar/OptionGreeks for Cortana), UW custom adapter sketch
- Spike plan: ~/conductor/workspaces/cortanaroi-mk2/belo-horizonte/plans/2026-05-09-nautilus-spike.md
- Cortana MK2 data sources today: cortanaroi/data/uw_*.py, cortanaroi/data/ibkr_*.py, cortanaroi/db/decisions.py
Timeline
- 2026-05-07 | Cody - Filed during pre-spike concept mastery sweep.