Nautilus Cache
The Nautilus Cache is a single, central, in-memory database that holds every piece of trading-related state a node needs in flight: built-in market data (quotes, trades, bars, books), the full order/position/account/instrument graph, and arbitrary user-defined objects shared across strategies. The DataEngine and ExecutionEngine are the only writers - the framework guarantees a cache-then-publish ordering for quotes/trades/bars so any subscriber handler can read the very value that triggered it. Strategies and Actors read via self.cache.* and never write trading state directly. Persistence is configurable per node: in-memory only (default, with capacity eviction), or backed by Redis via DatabaseConfig for crash-only restart, multi-process state sharing, and out-of-band consumers (dashboard, brain logger). For Cortana MK3 this collapses every hand-rolled cache (in-memory dicts, SQLite tables, Pickle blobs, lru_cache) into one runtime object with one ordering invariant - and directly addresses the failure modes from project_data_loss_april22 and the 2026-05-06 power-outage state divergence.
This page specializes Nautilus Concepts (which
covers the Cache at the architectural level, alongside MessageBus,
DataEngine, ExecutionEngine, RiskEngine) and is the parallel of
Nautilus Architecture (which covers
cache-then-publish ordering at high level). For data-specific patterns
(custom Data subclasses, ParquetDataCatalog, ts_event/ts_init)
see Nautilus Data. For who reads the cache (Strategy
queries vs Actor queries, identical surface) see
Nautilus Strategies and
Nautilus Actors.
Why this page exists
nautilus-architecture.md describes cache-then-publish at the
topology level. nautilus-data.md covers what lands in cache. This
page is the API surface and operational reference: every read method,
every write trigger, every persistence option, every ordering guarantee,
and the multi-tenant question for the SaaS roadmap. It’s the
saturation-level brain page Cody will keep open during Step 4-7 of the
2026-05-09 spike.
Core claim
“The Cache is a central in-memory database that stores and manages all trading-related data, from market data to order history to custom calculations.”
One object. One read API. One write discipline. Three categories of content (market data, trading records, custom objects). Optional Redis externalization. Single-threaded write ordering with cache-then-publish for the data-stream path.
This is the single biggest architectural delta from MK2. Cortana today
has at least seven state stores (in-memory dicts in app.py, SQLite
in cortanaroi/db/decisions.py, Pickle blobs for cooldown, a separate
Pickle for impulse history, an lru_cache on UW REST, IBKR’s
internal cache, and the dashboard’s redundant SQL view). MK3 collapses
all of them into one Cache instance with one consistent contract.
Scope - what the Cache stores
Three top-level categories:
1. Market data
“Stores recent market history (e.g., order books, quotes, trades, bars). Gives you access to both current and historical market data for your strategy.”
Specifically:
| Data type | Cache method (read) | Capacity-bounded? |
|---|---|---|
| OrderBook (live state) | order_book(instrument_id) | per-instrument |
| QuoteTick history | quote_ticks(instrument_id), quote_tick(id, index=0) | yes (tick_capacity, default 10,000) |
| TradeTick history | trade_ticks(instrument_id), trade_tick(id, index=0) | yes (tick_capacity, default 10,000) |
| Bar history | bars(bar_type), bar(bar_type, index=0) | yes (bar_capacity, default 10,000 per bar type) |
| Latest price | `price(instrument_id, price_type=BID \| ASK \| …)` | |
Quote: “Each bar type maintains its own separate capacity. For example, if you’re using both 1-minute and 5-minute bars, each stores up to bar_capacity bars.”
Eviction: “When bar_capacity is reached, the Cache automatically
removes the oldest data.” No warning logged. If you need full history,
write to ParquetDataCatalog (see nautilus-data.md)
or raise the cap explicitly via CacheConfig.
Reverse indexing convention (load-bearing): “All market data in the
cache uses reverse indexing, so the most recent entry sits at index 0.”
self.cache.bar(bar_type) (no index) is equivalent to bar(..., index=0): both
return the most recent bar.
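The two conventions above (silent eviction at capacity, newest entry at index 0) can be modeled with a bounded deque. This is a minimal sketch, not the Nautilus implementation: the class name ReverseIndexedHistory and its methods are hypothetical and exist only to illustrate the behavior.

```python
from collections import deque


class ReverseIndexedHistory:
    """Toy stand-in for one bar type's history (hypothetical, not Nautilus code)."""

    def __init__(self, capacity: int) -> None:
        # A deque with maxlen silently discards from the opposite end when full,
        # which mirrors "oldest data removed, no warning logged".
        self._items: deque = deque(maxlen=capacity)

    def add(self, item) -> None:
        # Newest entry goes on the left, so it is always at index 0.
        self._items.appendleft(item)

    def get(self, index: int = 0):
        # Point query: None on a miss, never a stale default.
        if 0 <= index < len(self._items):
            return self._items[index]
        return None

    def all(self) -> list:
        # Collection query: newest first.
        return list(self._items)


history = ReverseIndexedHistory(capacity=3)
for close in [100.0, 101.0, 102.0, 103.0]:
    history.add(close)

latest = history.get()              # most recent entry
previous = history.get(index=1)     # second most recent
oldest_kept = history.get(index=2)  # last entry still inside capacity
evicted = history.get(index=3)      # the first close was dropped silently
```

With a capacity of 3 and four inserts, the first value is gone and nothing reports it, which is why full history belongs in the catalog rather than the cache.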
2. Trading records
“Maintains complete Order history and current execution state. Tracks all Positions and Account information. Stores Instrument definitions and Currency information.”
These are unbounded by default - no LRU eviction on orders/positions because the lifecycle is finite per-instance.
| Record | Cache method |
|---|---|
| Order (any state) | order(client_order_id) |
| Order collections | orders(), orders(venue=...), orders(strategy_id=...), orders(instrument_id=...) |
| Order state filters | orders_open(), orders_closed(), orders_emulated(), orders_inflight(), orders_active_local() |
| Position | position(position_id) |
| Position collections | positions(), positions(venue=...), positions(side=PositionSide.LONG), etc. |
| Account | account(account_id), account_for_venue(venue), account_id(venue) |
| Instrument | instrument(instrument_id), instruments(venue=...), instruments(underlying="ES"), instrument_ids(...) |
3. Custom user data
“You can store any user-defined objects or data in the Cache for later use. Enables data sharing between different strategies.”
Two surfaces:
Raw bytes-by-key (low-level, fast):
self.cache.add(key="my_key", value=b"some binary data")
stored = self.cache.get("my_key")  # bytes | None

Custom Data subclasses (preferred for any structured object):
flow through DataEngine if published as Data, get cached automatically
by type+identifier. See nautilus-data.md for the
@customdataclass pattern. The Greeks-style example uses
add(...) / get(...) with a key-naming convention:
def greeks_key(instrument_id):
    return f"{instrument_id}_GREEKS"

def cache_greeks(self, greeks_data):
    self.cache.add(greeks_key(greeks_data.instrument_id),
                   greeks_data.to_bytes())

def greeks_from_cache(self, instrument_id):
    raw = self.cache.get(greeks_key(instrument_id))
    return GreeksData.from_bytes(raw) if raw else None

Hard caveat from the doc: “The Cache is not designed to be a full database replacement. For large datasets or complex querying needs, consider using a dedicated database system.”
For MK3: cache is the runtime hot store. Historical bulk lives in
ParquetDataCatalog. The brain (~/brain) and any SQL audit table are
out-of-band sinks - not Cache substitutes.
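To make the bytes-by-key convention concrete, here is a self-contained round trip. It is a sketch under stated assumptions: DictCache is a dict-backed stand-in that exposes only add(key, value) and get(key), the GreeksData fields and JSON encoding are illustrative choices, and the instrument IDs are made up. None of this is the real GreeksData type.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class GreeksData:
    """Hypothetical payload; field names are illustrative only."""

    instrument_id: str
    delta: float
    gamma: float

    def to_bytes(self) -> bytes:
        # JSON is an assumption here; any stable byte encoding would do.
        return json.dumps(asdict(self)).encode("utf-8")

    @classmethod
    def from_bytes(cls, raw: bytes) -> "GreeksData":
        return cls(**json.loads(raw.decode("utf-8")))


class DictCache:
    """Stand-in for the add/get surface, backed by a plain dict."""

    def __init__(self) -> None:
        self._store: dict[str, bytes] = {}

    def add(self, key: str, value: bytes) -> None:
        self._store[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._store.get(key)


def greeks_key(instrument_id: str) -> str:
    # Same key-naming convention as the example above.
    return f"{instrument_id}_GREEKS"


cache = DictCache()
original = GreeksData("SPY.XNAS", delta=0.52, gamma=0.03)  # made-up values
cache.add(greeks_key(original.instrument_id), original.to_bytes())

raw = cache.get(greeks_key("SPY.XNAS"))
restored = GreeksData.from_bytes(raw) if raw is not None else None
missing = cache.get(greeks_key("QQQ.XNAS"))  # never written, so None
```

Checking `raw is not None` rather than truthiness keeps an empty payload distinguishable from a missing key.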
Read API - full surface exposed to Strategy / Actor
self.cache is the same object on Actors and Strategies. The full read
surface (collected from the doc’s API listing):
Market data reads
# Bars
self.cache.bars(bar_type) # list[Bar], reverse-indexed
self.cache.bar(bar_type) # Bar | None (latest)
self.cache.bar(bar_type, index=1) # Bar | None (second-latest)
self.cache.bar_count(bar_type) # int
self.cache.has_bars(bar_type) # bool
# Quotes
self.cache.quote_ticks(instrument_id)
self.cache.quote_tick(instrument_id)
self.cache.quote_tick(instrument_id, index=1)
self.cache.quote_tick_count(instrument_id)
self.cache.has_quote_ticks(instrument_id)
# Trades
self.cache.trade_ticks(instrument_id)
self.cache.trade_tick(instrument_id)
self.cache.trade_tick(instrument_id, index=1)
self.cache.trade_tick_count(instrument_id)
self.cache.has_trade_ticks(instrument_id)
# Books
self.cache.order_book(instrument_id) # OrderBook | None
self.cache.has_order_book(instrument_id)
self.cache.book_update_count(instrument_id)
# Prices (derived; price-type aware)
self.cache.price(instrument_id, price_type=PriceType.MID)
# Bar types catalog
self.cache.bar_types(
    instrument_id=instrument_id,
    price_type=PriceType.LAST,
    aggregation_source=AggregationSource.EXTERNAL,
)

Order reads
# Identity
self.cache.order(client_order_id) # Order | None
self.cache.order_exists(client_order_id) # bool
# Collections
self.cache.orders()
self.cache.orders(venue=venue)
self.cache.orders(strategy_id=strategy_id)
self.cache.orders(instrument_id=instrument_id)
# State filters
self.cache.orders_open()
self.cache.orders_closed()
self.cache.orders_emulated()
self.cache.orders_inflight()
self.cache.orders_active_local()
# State predicates
self.cache.is_order_open(client_order_id)
self.cache.is_order_closed(client_order_id)
self.cache.is_order_emulated(client_order_id)
self.cache.is_order_inflight(client_order_id)
self.cache.is_order_active_local(client_order_id)
# Counts
self.cache.orders_open_count()
self.cache.orders_closed_count()
self.cache.orders_emulated_count()
self.cache.orders_inflight_count()
self.cache.orders_active_local_count()
self.cache.orders_total_count()
self.cache.orders_open_count(side=OrderSide.BUY)
self.cache.orders_total_count(venue=venue)

Position reads
# Identity
self.cache.position(position_id) # Position | None
self.cache.position_exists(position_id)
# Collections
self.cache.positions()
self.cache.positions_open()
self.cache.positions_closed()
self.cache.positions(venue=venue)
self.cache.positions(instrument_id=instrument_id)
self.cache.positions(strategy_id=strategy_id)
self.cache.positions(side=PositionSide.LONG)
# State predicates
self.cache.is_position_open(position_id)
self.cache.is_position_closed(position_id)
# Cross-references
self.cache.orders_for_position(position_id) # list[Order]
self.cache.position_for_order(client_order_id) # Position | None
# Counts
self.cache.positions_open_count()
self.cache.positions_closed_count()
self.cache.positions_total_count()
self.cache.positions_open_count(side=PositionSide.LONG)
self.cache.positions_total_count(instrument_id=instrument_id)

Account & instrument reads
self.cache.account(account_id) # Account | None
self.cache.account_for_venue(venue)
self.cache.account_id(venue)
self.cache.instrument(instrument_id)
self.cache.instruments()
self.cache.instruments(venue=venue)
self.cache.instruments(underlying="ES")
self.cache.instrument_ids()
self.cache.instrument_ids(venue=venue)

Custom data reads
self.cache.get(key) # bytes | None

(For typed custom data published as Data subclasses, the platform
handles cache by type+id automatically - see nautilus-data.md.)
Universal return shape
All point queries return None when missing. All collection queries
return empty lists when nothing matches. There are no “default values” -
the absence of data is always observable. This matters because the
2026-05-06 incident was partly rooted in stale fallbacks: callers got
old values when fresh ones should have been None.
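A caller-side pattern that respects this contract might look like the following. It is a sketch only: the module-level _quotes dict stands in for the cache, and latest_quote, mid_price, and decide are hypothetical helper names, not Nautilus methods.

```python
from typing import Optional

# Toy quote store; a real component would call self.cache.quote_tick(instrument_id).
_quotes: dict[str, tuple[float, float]] = {"SPY.XNAS": (500.10, 500.12)}


def latest_quote(instrument_id: str) -> Optional[tuple[float, float]]:
    # Point query: (bid, ask) or None, with no default substituted.
    return _quotes.get(instrument_id)


def mid_price(instrument_id: str) -> Optional[float]:
    quote = latest_quote(instrument_id)
    if quote is None:
        return None  # propagate absence instead of reusing an older value
    bid, ask = quote
    return (bid + ask) / 2.0


def decide(instrument_id: str) -> str:
    # Skip the scoring step when the instrument cannot be priced.
    return "SKIP" if mid_price(instrument_id) is None else "SCORE"
```

The point of the shape is that None travels all the way to the decision, so a missing quote produces a visible skip rather than a trade on an old price.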
Write semantics - who writes, when, in what order
The cache has two writers and one write discipline.
Writers
- DataEngine - writes market data (QuoteTick, TradeTick, Bar, OrderBook state via BookUpdater, custom Data).
- ExecutionEngine - writes orders, positions, accounts, fills.
That’s it. Strategies do not write trading state to the cache.
Strategies submit orders via self.submit_order(...); the
ExecutionEngine writes the resulting Order and Position records.
Actors do not write at all on the trading-state side. For custom
user data, self.cache.add(key, bytes) is permitted but should be used
sparingly - typed custom Data subclasses are preferred because they
ride the DataEngine’s ordering guarantees.
Cache-then-publish invariant (the load-bearing rule)
The Nautilus contract, verbatim:
“The DataEngine writes to the Cache before publishing to subscribers, so the latest value is available in the cache by the time your handler runs.”
And from the architecture page’s Life of a Quote Tick:
“Step 4: Cache stores the quote. handle_quote writes the quote into the Cache via cache.add_quote(quote), making it available to any component through self.cache.quote_tick(instrument_id).”
“Step 5: MessageBus publishes.”
The mechanical guarantee: inside on_quote_tick(tick),
self.cache.quote_tick(instrument_id) returns the same tick that
triggered the handler. There is no race window. The single-threaded
kernel + cache-then-publish is what makes this true by construction -
not by careful coding.
This applies to QuoteTick, TradeTick, Bar (and custom Data
that flows through the DataEngine).
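The ordering can be illustrated with a toy engine that writes before it dispatches. This is a simplified model for intuition, not Nautilus code: ToyCache and ToyDataEngine are hypothetical classes, quotes are plain dicts, and the instrument ID is made up.

```python
from typing import Callable, Optional


class ToyCache:
    """Newest-first quote history per instrument (hypothetical stand-in)."""

    def __init__(self) -> None:
        self._quotes: dict[str, list[dict]] = {}

    def add_quote(self, quote: dict) -> None:
        self._quotes.setdefault(quote["instrument_id"], []).insert(0, quote)

    def quote_tick(self, instrument_id: str, index: int = 0) -> Optional[dict]:
        history = self._quotes.get(instrument_id, [])
        return history[index] if 0 <= index < len(history) else None


class ToyDataEngine:
    """Single-threaded engine: cache write first, then publish."""

    def __init__(self, cache: ToyCache) -> None:
        self._cache = cache
        self._subscribers: list[Callable[[dict], None]] = []

    def subscribe(self, handler: Callable[[dict], None]) -> None:
        self._subscribers.append(handler)

    def handle_quote(self, quote: dict) -> None:
        self._cache.add_quote(quote)       # step 1: cache stores the quote
        for handler in self._subscribers:  # step 2: subscribers are notified
            handler(quote)


cache = ToyCache()
engine = ToyDataEngine(cache)
observations: list[bool] = []


def on_quote_tick(tick: dict) -> None:
    # Inside the handler, the cache already holds the triggering tick.
    observations.append(cache.quote_tick(tick["instrument_id"]) is tick)


engine.subscribe(on_quote_tick)
engine.handle_quote({"instrument_id": "SPY.XNAS", "bid": 500.10, "ask": 500.12})
engine.handle_quote({"instrument_id": "SPY.XNAS", "bid": 500.11, "ask": 500.13})
```

Swapping the two steps in handle_quote would make the handler observe the previous quote, which is exactly the stale-read failure the invariant rules out.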
Order book exception
The doc is explicit:
“Order book deltas and depth snapshots are published directly without a cache write; book state is maintained separately through BookUpdater subscriptions.”
Reason: book state is incremental and high-frequency. The BookUpdater
maintains the book object itself (subscribers query it directly via
self.cache.order_book(instrument_id)); the deltas don’t get written
to the cache as historical records. Implication: if you need a
historical book delta stream, persist via ParquetDataCatalog, not
the Cache.
Live caveat - async write delay
“In live contexts, the engine applies updates asynchronously, so you might see a brief delay between an event and its appearance in the Cache.”
This is the qualifier on cache-then-publish: for events that flow
through the DataEngine’s main path (quotes/trades/bars), ordering holds.
For events that arrive through other paths - e.g., an account snapshot
update arriving while you’re handling a quote - the cache may show a
brief lag. Strategy code should never assume cross-path synchrony:
if you’re in on_quote_tick and you need the most recent
OrderFilled, query the cache; do not assume in-process ordering
between distinct event streams.
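When events from distinct paths must be compared, ordering them by their own timestamps avoids relying on arrival order. A minimal sketch, assuming each event carries ts_event and ts_init as nanosecond integers; the Event dataclass and the timestamp values are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Illustrative event record, not a Nautilus type."""

    kind: str
    ts_event: int  # when the event occurred at its source (ns)
    ts_init: int   # when the local object was created (ns)


def order_by_event_time(events: list[Event]) -> list[Event]:
    # Primary key: source time. Tiebreak: local initialization time.
    return sorted(events, key=lambda e: (e.ts_event, e.ts_init))


# Arrival order differs from occurrence order: the fill happened first
# but was observed locally after the quote.
arrived = [
    Event("QuoteTick", ts_event=1_000_000_200, ts_init=1_000_000_250),
    Event("OrderFilled", ts_event=1_000_000_100, ts_init=1_000_000_400),
]
ordered = order_by_event_time(arrived)
kinds = [e.kind for e in ordered]
```

Here the fill sorts ahead of the quote even though it showed up later, which is the cross-path case the caveat warns about.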
Reads inside handlers - the safe pattern
Because of cache-then-publish, this is always correct:
def on_quote_tick(self, tick: QuoteTick) -> None:
    # The very tick that triggered this handler is already in cache.
    latest = self.cache.quote_tick(tick.instrument_id)
    assert latest is tick or latest == tick
    # Plus, prior history is already there:
    prev = self.cache.quote_tick(tick.instrument_id, index=1)

What you must NOT do:
# WRONG: writing to cache yourself before letting the engine see the
# data. Breaks ordering, breaks reconciliation, breaks backtest replay.
self.cache.add("my_quote", quote.to_bytes())
self.publish_data(...)

The right path for inserting custom data into the cache depends on where
the code runs. Inside a DataClient (an adapter), call
self._handle_data(event), which routes through DataEngine and preserves
cache-then-publish. Outside an adapter (an Actor or Strategy), use
self.publish_data(DataType(MyType, ...), event) and let the DataEngine +
bus do the cache write for you.
Persistence - in-memory, file-backed, Redis-backed
Default - in-memory only
No database configured = pure in-memory. Cache state vanishes on
process exit. For backtest this is correct (each run starts clean).
For live this means: flush_on_start=False is meaningless without a
database, and a crash means losing in-flight emulator state, custom
cache entries, and any account/order knowledge that wasn’t already
on the venue.
File-backed
Not a first-class option in the cache concept doc. The catalog
(ParquetDataCatalog, see nautilus-data.md) is
the file-backed historical store; the cache itself is either
in-memory or DB-backed. Don’t conflate these - file-backed cache is
not how Nautilus thinks.
Redis-backed (the recommended live story)
Configuration:
from nautilus_trader.cache.config import CacheConfig
from nautilus_trader.common.config import DatabaseConfig

cache_config = CacheConfig(
    database=DatabaseConfig(
        type="redis",
        host="localhost",
        port=6379,
        connection_timeout=2,
        response_timeout=2,
    ),
    encoding="msgpack",  # or "json"
    timestamps_as_iso8601=False,
    buffer_interval_ms=None,
    bulk_read_batch_size=None,
    use_trader_prefix=True,
    use_instance_id=False,
    flush_on_start=False,
    drop_instruments_on_reset=True,
    tick_capacity=10_000,
    bar_capacity=10_000,
)

Stated use cases for the database:
“Long-running systems: If you want your data to survive system restarts, upgrading, or unexpected failures.”
“Historical insights: When you need to preserve past trading data for detailed post-analysis or audits.”
“Multi-node or distributed setups: If multiple services or nodes need to access the same state.”
The third bullet is the dashboard / brain-logger story: an out-of-band Python (or Node, or Rust) process subscribes to the same Redis backing and reads cache state without being part of the kernel. Same pattern Nautilus uses for the optional MessageBus Redis externalization (see nautilus-concepts.md).
Persistence taxonomy - when to use what
| Use case | Storage |
|---|---|
| Backtest | In-memory (default) |
| Sandbox / paper / staging | In-memory or Redis (test the persistence path) |
| Live trading, single node, restartable | Redis recommended |
| Live trading, multi-node, dashboard-attached | Redis required |
| Bulk historical replay data | ParquetDataCatalog, NOT cache |
| Append-only audit log | Out-of-band sink (SQLite, brain, S3) |
For Cortana MK3: Redis is recommended for live, mandatory if the dashboard is co-deployed. The 2026-04-22 data loss class (externalized state) is solved IFF live runs against Redis.
Reconciliation - how cache rebuilds from venue truth on startup
The cache concept doc itself is sparse on reconciliation; the work
happens in the LiveExecutionEngine (see
nautilus-concepts.md and
nautilus-architecture.md). The flow:
- Startup with Redis backing. The cache loads its prior state from Redis. Orders, positions, accounts, instruments, custom user data are all rehydrated.
- flush_on_start=True (config option) wipes Redis on startup - useful for clean-slate testing, dangerous in live (you’ve thrown away your knowledge of what’s at the venue).
- drop_instruments_on_reset=True (default) clears instruments when reset() is called - instruments are typically refetched from the venue at start.
- LiveExecutionEngine reconciliation fires next: pulls order status reports, fill reports, and position reports from the venue, then aligns cached state. Per the architecture doc: “With cached state, report data generates missing events to align the state. Without cached state, all orders and positions at the venue are generated from scratch.”
- Four reconciliation invariants (from nautilus-concepts.md): position quantity matches within instrument precision; average entry price aligns within tolerance; PnL integrity preserved through calculated pricing; synthetic identifiers deterministic across restarts.
- Continuous monitoring loop (live only): detects stale or lost in-flight messages and re-reconciles.
For Cortana, this is the answer to both the 2026-04-22 data loss
incident and the 2026-05-06 power-outage state divergence: the cache
is rebuilt from Redis (if persisted) AND reconciled against IBKR’s
actual order/position state on startup. The “alert without action”
P0 (project_pm_ibkr_exit_invariant) becomes structural - there is
no scenario where the cache disagrees with the venue and nobody
notices.
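The "align cached state with venue reports" idea can be reduced to a quantity diff for intuition. This is a deliberately simplified sketch, not the LiveExecutionEngine algorithm: reconcile_positions is a hypothetical function, positions are net quantities keyed by made-up instrument IDs, and real reconciliation also covers orders, fills, and average prices.

```python
from decimal import Decimal


def reconcile_positions(
    cached: dict[str, Decimal],
    venue: dict[str, Decimal],
) -> dict[str, Decimal]:
    """Return per-instrument quantity adjustments that align cache with venue."""
    adjustments: dict[str, Decimal] = {}
    for instrument_id in sorted(set(cached) | set(venue)):
        cached_qty = cached.get(instrument_id, Decimal("0"))
        venue_qty = venue.get(instrument_id, Decimal("0"))
        diff = venue_qty - cached_qty
        if diff != 0:
            # The venue is treated as truth; the cache gets the missing delta.
            adjustments[instrument_id] = diff
    return adjustments


venue_report = {"SPY.XNAS": Decimal("10"), "QQQ.XNAS": Decimal("5")}

# With cached state: only the gap is generated.
with_cache = reconcile_positions(
    {"SPY.XNAS": Decimal("10"), "QQQ.XNAS": Decimal("3")},
    venue_report,
)

# Without cached state: everything at the venue is generated from scratch.
without_cache = reconcile_positions({}, venue_report)
```

The two calls correspond to the two branches in the architecture-doc quote: a warm cache needs only the missing delta, a cold cache is rebuilt entirely from venue reports.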
Backtest vs live behavior
The cache surface is identical in both modes - same read API, same write disciplines. Differences:
| Aspect | Backtest | Live |
|---|---|---|
| Default persistence | In-memory only (correct - runs are independent) | In-memory by default; Redis recommended |
| Write timing | Synchronous within the deterministic kernel cycle | Synchronous within the kernel cycle, with brief async lag for non-DataEngine paths |
| Reconciliation | None - the simulator owns truth on both sides | Yes - LiveExecutionEngine reconciles against venue on startup + continuously |
| Capacity eviction | Same (10k bars / 10k ticks defaults) | Same |
| on_reset() | Called between backtest runs; clears mutable state, preserves data + instruments unless drop_instruments_on_reset=True | Not called - live engines do not reset |
| flush_on_start | Irrelevant without DB | Wipes Redis if True |

“Same Cache” is one of the seven structural points of the backtest-live parity claim (see nautilus-architecture.md). Cortana code reading self.cache.quote_tick(instrument_id) works identically in both contexts.
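The live-only hazards in the table (no database, or flush_on_start=True against a live backing store) are easy to check before a node boots. The following is a hypothetical pre-flight guard, not part of Nautilus: CacheSettings and check_settings are illustrative names, and the policy simply restates the table above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CacheSettings:
    """Illustrative subset of cache settings (hypothetical type)."""

    database_type: Optional[str]  # None means in-memory only
    flush_on_start: bool = False


def check_settings(environment: str, settings: CacheSettings) -> list[str]:
    """Return human-readable problems; an empty list means nothing flagged."""
    problems: list[str] = []
    if environment == "live":
        if settings.database_type is None:
            problems.append("live node has no database: cache state is lost on exit")
        elif settings.flush_on_start:
            problems.append("flush_on_start=True wipes the backing store on boot")
    # Backtests are independent runs, so in-memory settings are not flagged.
    return problems
```

A deployment script could refuse to start a live node when the returned list is non-empty; whether to treat these as errors or warnings is a policy choice this sketch leaves open.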
Multi-tenant scoping - can one process host multiple tenant caches?
The doc does not explicitly address multi-tenant cache scoping.
Inferred answer: partial - one cache per TradingNode, one node
per process, so isolation is process-level not cache-level.
Walking the evidence:
- The architecture is single-node-per-process. Per nautilus-architecture.md: “Running multiple TradingNode or BacktestNode instances concurrently in the same process is not supported due to global singleton state.” Singletons listed: _FORCE_STOP, logger mode/timestamps, global Tokio runtime, callback registries, other OnceLock instances.
- The cache is a per-Kernel object. One NautilusKernel per TradingNode, one cache per kernel. So a single process has one cache.
- Therefore, multi-tenant requires multi-process. Each tenant runs in its own process, with its own kernel, its own cache, its own Redis namespace (or its own Redis instance).
- What multi-tenancy is supported within a single cache: queries already filter by strategy_id - orders(strategy_id=...), positions(strategy_id=...). So “many strategies in one node” is first-class. But the strategies share the same cache object, so they have read access to each other’s orders/positions. That is not tenant isolation.
- Redis namespacing knobs that help: use_trader_prefix=True (default) prefixes keys with the trader ID. use_instance_id=False (default) - when set to True, also prefixes with the instance ID. For multi-tenant deployment, set use_instance_id=True so each process gets a distinct prefix even if trader IDs collide.
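The effect of the two namespacing flags can be shown with a toy prefix builder. The key layout here ("trader-<id>:<instance>") is an assumption made purely for illustration; the real Redis key format should be verified against a running node, and key_prefix is a hypothetical helper.

```python
from typing import Optional


def key_prefix(
    trader_id: str,
    instance_id: Optional[str],
    use_trader_prefix: bool = True,
    use_instance_id: bool = False,
) -> str:
    """Build an illustrative key prefix; the layout is assumed, not Nautilus's."""
    parts: list[str] = []
    if use_trader_prefix:
        parts.append(f"trader-{trader_id}")
    if use_instance_id and instance_id is not None:
        parts.append(instance_id)
    return ":".join(parts)


# Two tenant processes that happen to share a trader ID (made-up values).
tenant_a = key_prefix("TRADER-001", "instance-a")
tenant_b = key_prefix("TRADER-001", "instance-b")
collide = tenant_a == tenant_b  # same prefix: keys would overlap

tenant_a_scoped = key_prefix("TRADER-001", "instance-a", use_instance_id=True)
tenant_b_scoped = key_prefix("TRADER-001", "instance-b", use_instance_id=True)
isolated = tenant_a_scoped != tenant_b_scoped  # distinct prefixes per process
```

With only the trader prefix, colliding trader IDs share a namespace; adding the instance ID separates them, which is why that flag matters for shared-Redis deployments.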
Verdict for the spike’s Step 7.5 question
Per-tenant cache scoping: PARTIAL.
- Within one process / one tenant: clean. Multi-strategy is first-class via strategy_id filtering. One cache, multiple strategies, no cross-strategy interference if disciplined.
- Across tenants: requires process-per-tenant deployment. Each tenant gets its own TradingNode, its own cache, its own Redis namespace (use_instance_id=True is the load-bearing config flag). Shared market data subscriptions (one UW WebSocket serving N tenants) still need a separate fan-out architecture - likely a shared “market data hub” process that publishes to N tenant nodes via Redis MessageBus. The Cache itself does NOT solve fan-out.
So for the SaaS roadmap: the cache is correctly scoped per tenant because it lives inside the per-tenant process. What it does not do is multiplex multiple tenants inside one process. That’s ultimately the right architecture (compliance, isolation, blast radius), but it changes the unit-economics conversation - Step 7.5’s “process-per-tenant likely” guess is now confirmed.
Cortana MK3 implications
Concrete mapping from MK2’s seven (or more) cache surfaces to Nautilus’s one.
MK2 → Nautilus cache mapping
| MK2 component | Where it lives today | Nautilus replacement |
|---|---|---|
| In-memory spy_price, last_score, etc. dicts in app.py | Process-local Python dicts | self.cache.quote_tick(instrument_id) + custom ScoringEvent cache |
| cortanaroi/db/decisions.py (SQLite) | Disk SQLite | Out-of-band sink (Cache for runtime; SQLite/audit table for forensics) |
| Cooldown state Pickle | Disk Pickle | Strategy.on_save() / on_load() (per nautilus-strategies.md) - backed by Cache database |
| Impulse history Pickle | Disk Pickle | self.cache.add(key, bytes) or custom Data |
| UW REST lru_cache | Function-local | UW custom DataClient writes via _handle_data → cache by type+id |
| IBKR adapter cache | IBKR adapter internal | Built-in - IBKR adapter feeds Nautilus cache directly |
| Dashboard’s redundant SQL view | Separate read replica | Out-of-band Redis subscriber to the same cache backing |
Result: one runtime cache, one ordering invariant, one place to look when state seems wrong.
How this addresses the 2026-04-22 data loss class
project_data_loss_april22 root cause: workspace archive lost
disk-resident state (Pickle blobs, cooldown state, decisions.db
contents). Nautilus’s answer:
- Externalize critical state to Redis via DatabaseConfig. The cache becomes the runtime store, Redis becomes the durable backing. A workspace archive that loses the local FS no longer destroys trading state - Redis (running on a separate host, or at least a separate volume) survives.
- Crash-only design (per nautilus-architecture.md) means restart is the only recovery path, AND it’s the same code path the system runs every day. There is no “graceful shutdown” off-ramp that exists only on the happy day; the cache rebuild from Redis + venue reconciliation is always what happens at boot.
- flush_on_start=False (the default) means accidental restarts don’t wipe the durable backing. Combined with externalized state, this is the structural fix to GH #26.
The remaining attack surface is “what if Redis itself lives on the single volume that the workspace archive lost?” Mitigation: run Redis on a separate host/volume (or use a managed Redis with cross-zone replication). This is the operational discipline; the architecture is sound.
How this addresses the 2026-05-06 cache rewrite race
The 2026-05-06 power-outage post-mortem identified a cache rewrite
race: components were updating their local view of spy_price at
different times, and a strategy fired on a stale value. Nautilus’s
answer is structural, not procedural:
- One cache. No “local view” exists. Every component reads self.cache.quote_tick(instrument_id) and gets the same answer.
- Single-threaded ordering. Per nautilus-architecture.md: “The kernel consumes and dispatches messages on a single thread… cache reads and writes.” No concurrent writers means no rewrite race - every write is serialized through the kernel.
- Cache-then-publish. The DataEngine writes the new quote before notifying any subscriber. By the time on_quote_tick runs in the strategy, the cache already reflects the new value. The “handler reads stale cache” failure mode is impossible by construction.
This is one of the strongest single-component arguments for MK3. The cache rewrite race is a class of bug, not an instance, and Nautilus eliminates the class.
Multi-tenant question (spike Step 7.5)
Per the analysis above:
- Cache scopes cleanly to one tenant per process. Use use_instance_id=True to namespace Redis keys per tenant.
- Process-per-tenant is the right deployment shape. This was the spike’s guess and it’s confirmed by the singleton constraint.
- Shared market data fan-out requires a separate “hub” process that subscribes to UW once and republishes per-tenant via Redis MessageBus. The cache itself doesn’t multiplex this - but the MessageBus’s Redis externalization is designed for exactly this kind of out-of-band consumer. (See nautilus-message-bus.md when filed.)
- Per-tenant venue credentials (each customer’s IBKR account) flow through the per-process ExecutionClient config. Each tenant’s process holds its own creds; no shared brokerage pool.
Bottom line for SaaS unit economics: one process per tenant, shared data hub, Redis-per-tenant (or shared Redis with namespace prefixes) is the correct architecture. The spike plan’s “process-per-tenant likely the right starting point” framing is right.
Anti-patterns to avoid
- Strategies / Actors writing trading state to the cache directly. self.cache.add(...) is permitted for custom user data. It is NOT for orders, positions, accounts. The ExecutionEngine owns those.
- Reading stale fallbacks instead of None. Every cache point query returns None on miss. Don’t substitute “old value” for missing - the doc enforces “explicit unpriceability” elsewhere (see Portfolio in nautilus-concepts.md) and the cache follows the same posture.
- Treating the cache as a database. “The Cache is not designed to be a full database replacement.” If you need queries by custom predicates, time ranges, or joins, write to ParquetDataCatalog (historical) or an out-of-band SQL/audit sink.
- Writing to cache yourself before letting the engine see the data. Use _handle_data from inside a DataClient, or publish_data from an Actor/Strategy, so cache-then-publish is preserved. Hand-rolled cache.add() followed by publish_data() is wrong.
- Assuming cross-path synchrony. Cache-then-publish holds within the DataEngine’s path (quotes/trades/bars). It does NOT guarantee that an OrderFilled and a QuoteTick arriving at the same microsecond land in cache in arrival order. Use ts_event / ts_init for cross-path ordering.
- flush_on_start=True in live. Wipes the Redis backing. Almost always wrong in live.
- Ignoring tick_capacity / bar_capacity. Default 10,000 is fine for normal use, brutally tight for some research workloads. If you’re querying bars(bar_type) and getting fewer than expected, check if you’ve blown past capacity. No warning is logged on eviction.
- Storing the 78-feature vector as 78 separate cache.add() calls per scoring tick. One ScoringEvent per score, see nautilus-data.md.
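For the last anti-pattern, the alternative is one payload per score. The sketch below packs a timestamp and all 78 features into a single bytes value with the standard struct module. It is an illustration of the "one write per score" idea only: the binary layout, pack_score, and unpack_score are assumptions, and the actual ScoringEvent type is defined in nautilus-data.md, not here.

```python
import struct

FEATURE_COUNT = 78

# Little-endian: one int64 timestamp followed by 78 float64 features.
_LAYOUT = struct.Struct(f"<q{FEATURE_COUNT}d")


def pack_score(ts_event: int, features: list[float]) -> bytes:
    """Serialize one scoring tick into a single payload."""
    if len(features) != FEATURE_COUNT:
        raise ValueError(f"expected {FEATURE_COUNT} features, got {len(features)}")
    return _LAYOUT.pack(ts_event, *features)


def unpack_score(raw: bytes) -> tuple[int, list[float]]:
    """Inverse of pack_score."""
    ts_event, *features = _LAYOUT.unpack(raw)
    return ts_event, features


features_in = [i * 0.5 for i in range(FEATURE_COUNT)]  # made-up feature values
payload = pack_score(1_700_000_000_000_000_000, features_in)
ts_out, features_out = unpack_score(payload)
```

One payload means one cache write and one consistent snapshot per score, instead of 78 keys that could be read half-updated.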
Cache vs Portfolio vs strategy variables - which to use
The doc’s own decision rule:
Cache - for:
- Trading-related data (orders, positions, instruments, market data).
- Data shared across strategies.
- Data that must persist across restarts.
- State that must survive on_reset() (with the right config).
Portfolio - for:
- Aggregated position, exposure, account information.
- Current state (no history).
- Pull-style queries against valuation.
Strategy/Actor instance variables - for:
- Strategy-specific calculations.
- Temporary values, intermediate results.
- Data that only this one component needs.
For Cortana, the migration mostly moves what’s currently in instance variables and Pickle files into the Cache (with Redis backing). What stays in instance variables: short-lived per-tick scratch state, in-progress feature engineering buffers that are fully recoverable on restart, indicator state managed by registered indicators (lives on the Strategy/Actor, fed by the DataEngine).
Quick reference - config defaults
CacheConfig(
    database=None,                  # In-memory only
    encoding="msgpack",             # or "json"
    timestamps_as_iso8601=False,
    buffer_interval_ms=None,
    bulk_read_batch_size=None,
    use_trader_prefix=True,         # Prefix Redis keys with trader ID
    use_instance_id=False,          # SET TRUE FOR MULTI-TENANT
    flush_on_start=False,           # Don't wipe Redis on boot
    drop_instruments_on_reset=True,
    tick_capacity=10_000,           # Per instrument
    bar_capacity=10_000,            # Per bar type
)
DatabaseConfig(
    type="redis",
    host="localhost",
    port=6379,
    connection_timeout=2,
    response_timeout=2,
)

Open questions for the 2026-05-09 spike
- Redis ops shape. What’s the recommended deployment topology for Redis in a single-tenant Cortana live deployment? Local Redis on the same host? Separate VM? Managed (ElastiCache, MemoryDB)? The cache concept doc doesn’t recommend; needs spike-time investigation.
- Cache key inspection in live. Is there a CLI or admin surface for inspecting cache contents at runtime, or do we always go through redis-cli against the backing store? Affects ops ergonomics for the dashboard.
- flush_on_start interaction with reconciliation. If we flush Redis but the venue still has orders/positions, does LiveExecutionEngine rebuild from venue truth alone? Per the architecture doc “all orders and positions at the venue are generated from scratch” - so yes - but verify the synthetic ID determinism claim holds.
- Custom data eviction policy. cache.add(key, bytes) - does the cache evict custom keys under memory pressure, or are they pinned? Affects whether we can use it for unbounded user data.
- Multi-tenant Redis sizing. If each tenant runs use_instance_id=True and we have 1000 tenants, what’s the memory footprint per tenant of a representative trading day? Affects SaaS unit economics.
See Also
- Nautilus Architecture - runtime topology; cache-then-publish at the dispatch level
- Nautilus Data - what flows into cache; custom Data subclasses; ParquetDataCatalog (the non-cache historical store); ts_event / ts_init semantics
- Nautilus Strategies - self.cache queries from inside a Strategy; emulated-order gotcha; on_save / on_load cache-database persistence
- Nautilus Actors - self.cache queries from inside an Actor; same surface, no order writes
- Nautilus Concepts - full architecture canon including ExecutionEngine reconciliation invariants
- nautilus-message-bus.md (parallel - pending) - Redis-backed bus for out-of-band consumers; multi-tenant fan-out pattern
- Spike plan: ~/conductor/workspaces/cortanaroi-mk2/belo-horizonte/plans/2026-05-09-nautilus-spike.md
- project_data_loss_april22 - the externalized-state failure class
- project_pm_ibkr_exit_invariant (#46) - broker-truth alignment that reconciliation directly addresses
- feedback_no_hwm_trailing_language - reduce-only / single-shot TP semantics live on Strategy, but their state lives in cache
- 2026-05-06 power-outage state divergence postmortem (~/brain/writing/2026-05-06-power-outage-state-divergence.md)
Timeline
- 2026-05-07 | Cody - Filed during pre-spike concept mastery sweep batch 2.