Nautilus Trader Developer-Guide Design Principles
The /developer_guide/design_principles/ page enumerates exactly one normative principle: message immutability. “Once a message (request, response, event, or command) is created, its fields must not be mutated. This includes container fields such as params maps.” The page lists eight system properties this protects (determinism, temporal integrity, safer concurrency, easier debugging, reliable replay/simulation, clear ownership boundaries, better auditability, more robust distribution) and points to three ownership rules in the message-bus integrity section that follow from it. The thinness is the point: every other “principle” the framework lives by - single-threaded core, cache-then-publish, crash-only design, fail-fast on numeric anomalies, ports-and-adapters extensibility, type-safe domain model, externalized state, idempotent operations - is documented elsewhere (architecture, concepts) and the developer guide leaves them implicit. For Cortana MK3 design, this page is the contract our custom messages (ScoreUpdate, MetaProb, Regime) must satisfy. Violating it would destroy the determinism that makes backtest = live possible.
Why this page exists
nautilus-developer-guide.md covers the developer guide as a whole -
adapter layout, testing matrix, FFI contract, contributing back. This
page narrows on the one section labelled “Design Principles,” which is
load-bearing for any code Cortana writes that touches the message bus.
The design-principles page is short by design: it states the invariant the framework wants you to never break, and trusts the rest of the docs to develop everything else. For an extension author the page is short but absolute - break it and the framework’s determinism, replay, backtest-live parity, and audit guarantees collapse.
This page therefore captures (a) the canonical principle as written, (b) the implicit architectural principles the developer guide assumes you’ve already absorbed from concepts/architecture, and (c) the Cortana-specific mapping from each principle to a concrete extension point we will be building during the 2026-05-09 spike and after.
The canonical principle: message immutability
Quoted verbatim from /developer_guide/design_principles/:
“Once a message (request, response, event, or command) is created, its fields must not be mutated.”
The page expands this to explicitly include container fields:
“This includes container fields such as params maps. Components can read a message and derive local state from it, but they must not rewrite the original.”
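A minimal sketch of what the rule asks for in plain Python, assuming a hypothetical BarsRequest message (not a Nautilus class) and made-up identifiers. It freezes the scalar fields with a frozen dataclass and wraps params in a read-only view so the container is covered as well:

```python
from dataclasses import FrozenInstanceError, dataclass, field
from types import MappingProxyType
from typing import Any, Mapping


@dataclass(frozen=True)
class BarsRequest:
    """Hypothetical request message; not a Nautilus class."""

    request_id: str
    instrument_id: str
    params: Mapping[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Copy the caller's dict, then expose it through a read-only view.
        # frozen=True blocks normal assignment, so object.__setattr__ is
        # used once, during construction only.
        object.__setattr__(self, "params", MappingProxyType(dict(self.params)))


caller_params = {"limit": 100}
req = BarsRequest("req-1", "SPY.XNAS", caller_params)

caller_params["limit"] = 5           # the caller edits its own dict later
assert req.params["limit"] == 100    # the message is unaffected

field_mutation_blocked = False
try:
    req.instrument_id = "QQQ.XNAS"   # attempt to rewrite a field
except FrozenInstanceError:
    field_mutation_blocked = True

container_mutation_blocked = False
try:
    req.params["limit"] = 5          # attempt to rewrite the container
except TypeError:
    container_mutation_blocked = True
```

frozen=True on its own would not have protected the params dict; the read-only wrapper is what extends the rule to container fields.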
What “message” covers
The category is broad. The four message kinds named on the page - request, response, event, command - span every object that crosses the bus:
- Data messages - QuoteTick, TradeTick, Bar, OrderBookDelta, OrderBookDepth10, MarkPriceUpdate, FundingRateUpdate, custom Data subclasses, @customdataclass types.
- Event messages - OrderAccepted, OrderFilled, OrderRejected, OrderCanceled, PositionOpened, PositionChanged, PositionClosed, AccountState, TimeEvent.
- Command messages - SubmitOrder, ModifyOrder, CancelOrder, BatchCancelOrders, QueryOrder, SubscribeData, RequestData.
- Request/Response pairs - DataRequest/DataResponse, plus the request-side options the caller supplies in params.
If it leaves the producer and lands in a consumer, it’s a message and the rule applies.
The eight protected properties
Each bullet from the source page, expanded with the practical implication:
- Determinism. “Every consumer sees the same input. Behavior is easier to reason about, replay, and test.” Why it matters: without immutability, two consumers handling the same dispatch may see different values depending on the order they ran. That kills the deterministic dispatch contract Nautilus’s single-threaded kernel depends on.
- Temporal integrity. “A message preserves what was true when the system emitted it. Events and commands remain factual records instead of containers of drifting state.” Why it matters: an OrderFilled event is a historical fact. If a downstream consumer mutates last_qty to a partial-fill view, the event no longer documents what happened - it documents what someone later thought.
- Safer concurrency. “Readers do not need coordination to protect message payloads from later rewrites. This removes a common source of races around shared state.” Why it matters: Nautilus’s threading model has the kernel on a single thread and background services on other threads (network I/O, persistence, adapter Tokio runtime). Cross-thread payloads are safe only because they’re immutable. Add mutation and you’ve reintroduced the lock-ordering problem the architecture eliminated.
- Easier debugging. “Logs, traces, replay tools, and dead-letter inspection remain useful because the message still reflects the original payload.” Why it matters: LogGuard snapshots, dead-letter queue inspection, and the Redis stream for cross-process MessageBus subscribers all assume what gets logged is what got dispatched.
- Reliable replay and simulation. “Replaying a sequence yields the same logical inputs as the original run. This supports backtesting, incident reconstruction, and regression testing.” Why it matters: the entire backtest-live parity claim - the single most important architectural property of Nautilus - depends on this. Replay = determinism = same kernel processes same sequence the same way.
- Clear ownership boundaries. “Components treat incoming messages as input. If a component needs a different representation, it derives new local state or a new message explicitly.” Why it matters: this is the rule that prevents “mystery side-effects” - every transformation is visible because it produces a new object on the bus or in component-owned context.
- Better auditability. “The system can answer what it knew, when it knew it, and what it did from that information.” Why it matters: for a real-money trading system, the audit trail is non-negotiable. A regulator (or a human reviewing a postmortem) needs to reconstruct exactly what state the system was in when it submitted an order.
- More robust distribution. “Serialized messages already cross process and service boundaries as copies. The same ownership rule keeps the in-memory model aligned with that reality.” Why it matters: the optional Redis backing for the MessageBus, and the multi-tenant SaaS shape MK3 is targeting, both serialize messages across process boundaries. If your in-memory consumers can mutate but serialized consumers can’t, you have two semantics for one bus - a class of bug Nautilus refuses to introduce.
The three ownership rules (from message_bus.md “Message integrity”)
The design-principles page links to
/concepts/message_bus.md#message-integrity for the rules that follow
from the immutability invariant. Quoted from that section:
“Three ownership rules follow from this:
- Caller-supplied request options stay on the message.
- Response metadata returned to the caller stays on the response.
- Component workflow state (bounded date ranges, grouping state, replay cursors, counters, processing flags) stays in component-owned context keyed by message or request ID.”
And:
“When a component needs a derived message, it creates a new one with the required values instead of rewriting the original.”
These rules answer the natural follow-up question: “where does the state go if the message can’t carry it?” Answer: caller request options on the request message; response metadata on the response message; processing/workflow state in a component-owned dictionary keyed by message ID. The bus is for facts; the components keep their bookkeeping locally.
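The split between message fields and component bookkeeping can be sketched as follows. HistoryRequest, HistoryLoader, and the request IDs are hypothetical names for illustration, not Nautilus classes; the point is only where each kind of state lives:

```python
from dataclasses import dataclass, replace
from datetime import date


@dataclass(frozen=True)
class HistoryRequest:
    """Hypothetical request; caller-supplied options stay on it."""

    request_id: str
    start: date
    end: date


@dataclass
class Workflow:
    """Component-owned bookkeeping for one in-flight request."""

    cursor: date
    chunks_done: int = 0


class HistoryLoader:
    """Hypothetical component that pages through a date range."""

    def __init__(self) -> None:
        # Workflow state lives here, keyed by request ID, not on the message.
        self._context: dict[str, Workflow] = {}

    def handle(self, request: HistoryRequest) -> None:
        self._context[request.request_id] = Workflow(cursor=request.start)

    def advance(self, request: HistoryRequest, to: date) -> None:
        workflow = self._context[request.request_id]
        workflow.cursor = min(to, request.end)
        workflow.chunks_done += 1

    def finish(self, request: HistoryRequest) -> int:
        # Drop the bookkeeping once the request completes.
        return self._context.pop(request.request_id).chunks_done


request = HistoryRequest("req-7", date(2026, 4, 1), date(2026, 4, 30))
loader = HistoryLoader()
loader.handle(request)
loader.advance(request, date(2026, 4, 15))

# A narrower request is a new message; the original stays as emitted.
bounded = replace(request, request_id="req-7a", end=date(2026, 4, 15))
chunks = loader.finish(request)
```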
Implicit principles the developer guide assumes
The design-principles page is one paragraph plus eight bullets. The
framework lives by many more design principles, but they’re documented
elsewhere - primarily in ~/brain/concepts/nautilus-architecture.md and
nautilus-concepts.md. They are collected here so that a Cortana implementer
has the full set in one place when planning a custom data type, custom
actor, or custom risk rule.
Single-threaded deterministic core
“Within a node, the kernel consumes and dispatches messages on a single thread. The kernel encompasses: the MessageBus and actor callback dispatch, strategy logic and order management, risk engine checks and execution coordination, cache reads and writes.”
nautilus-architecture.md
The single-thread guarantee is the mechanical reason backtest and live behave the same. Combined with message immutability, it gives bitwise deterministic event ordering. Background I/O runs on its own threads but every result lands back on the single-threaded core through MPSC channels.
The corollary: anything that wants to “do work during a callback” has to either finish quickly or kick to a background thread and re-enter via the bus. There is no third option.
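A stdlib-only sketch of the "kick to background, re-enter via the bus" option. The queue stands in for the message bus and the thread named kernel stands in for the Nautilus kernel; ScoreRequested, ScoreReady, and slow_model are hypothetical names, and none of this is the framework's actual dispatch code:

```python
import queue
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoreRequested:
    instrument_id: str


@dataclass(frozen=True)
class ScoreReady:
    instrument_id: str
    score: float


bus: "queue.Queue[object]" = queue.Queue()   # stand-in for the message bus
handled_on: list[tuple[str, str]] = []        # (message type, thread name)
workers: list[threading.Thread] = []


def slow_model(instrument_id: str) -> None:
    # Runs off the kernel thread; its only output is a new message.
    score = float(len(instrument_id))         # placeholder for heavy work
    bus.put(ScoreReady(instrument_id, score))


def dispatch(message: object) -> None:
    # Every handler runs on the thread that drains the bus.
    handled_on.append((type(message).__name__, threading.current_thread().name))
    if isinstance(message, ScoreRequested):
        worker = threading.Thread(target=slow_model, args=(message.instrument_id,))
        workers.append(worker)
        worker.start()                        # return immediately, never wait


def run_kernel(expected_messages: int) -> None:
    for _ in range(expected_messages):
        dispatch(bus.get(timeout=5))


bus.put(ScoreRequested("SPY"))
kernel = threading.Thread(target=run_kernel, args=(2,), name="kernel")
kernel.start()
kernel.join()
for worker in workers:
    worker.join()
```

The worker never touches handler state directly. Its result arrives as a second message, so both callbacks run on the kernel thread in bus order.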
Cache-then-publish
“The DataEngine writes to the Cache before publishing to subscribers, so the latest value is available in the cache by the time your handler runs.”
nautilus-concepts.md § Cache
This is the invariant that lets a strategy’s on_quote_tick handler
rely on self.cache.quote_tick(...) returning the very tick that
triggered it. There is no race between handler entry and cache
visibility.
For order book deltas and depth snapshots the rule is different -
those are “published directly; book state is maintained separately
through BookUpdater subscriptions” (per architecture). For quotes,
trades, and bars the cache-then-publish ordering is an absolute
guarantee.
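The ordering for quotes, trades, and bars can be illustrated with a toy engine. MiniCache and MiniDataEngine are hypothetical stand-ins written for this note, not the Nautilus Cache or DataEngine:

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Quote:
    instrument_id: str
    bid: str
    ask: str
    ts_event: int


class MiniCache:
    def __init__(self) -> None:
        self._quotes: dict[str, Quote] = {}

    def add_quote(self, quote: Quote) -> None:
        self._quotes[quote.instrument_id] = quote

    def quote(self, instrument_id: str) -> Optional[Quote]:
        return self._quotes.get(instrument_id)


class MiniDataEngine:
    def __init__(self, cache: MiniCache) -> None:
        self._cache = cache
        self._subscribers: list[Callable[[Quote], None]] = []

    def subscribe(self, handler: Callable[[Quote], None]) -> None:
        self._subscribers.append(handler)

    def process(self, quote: Quote) -> None:
        self._cache.add_quote(quote)        # 1. write to the cache
        for handler in self._subscribers:   # 2. only then publish
            handler(quote)


cache = MiniCache()
engine = MiniDataEngine(cache)
cache_matched_trigger: list[bool] = []


def on_quote(quote: Quote) -> None:
    # By the time the handler runs, the cache already holds this quote.
    cache_matched_trigger.append(cache.quote(quote.instrument_id) is quote)


engine.subscribe(on_quote)
engine.process(Quote("SPY", "500.10", "500.12", ts_event=1))
engine.process(Quote("SPY", "500.11", "500.13", ts_event=2))
```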
Crash-only design
Five sub-principles from the architecture page:
- Unified recovery path. “Startup and crash recovery share the same code path, ensuring it is well-tested.”
- Externalized state. “Critical state is meant to be persisted externally when configured, reducing data-loss risk.” Redis is the canonical backing store.
- Fast restart. “Designed to come back up quickly, minimizing downtime.”
- Idempotent operations. “Operations are safe to retry after restart.”
- Fail-fast for unrecoverable errors. “Data corruption or invariant violations terminate immediately rather than continue in a compromised state.”
Production runs with panic = abort in release builds so process
supervisors can restart cleanly.
Fail-fast on data integrity
“NautilusTrader prioritizes data integrity over availability for trading operations. The system employs a strict fail-fast policy for arithmetic operations and data handling to prevent silent data corruption.”
nautilus-architecture.md § Data integrity
When fail-fast (panic) applies:
- Programmer errors (logic bugs, incorrect API usage).
- Data that violates fundamental invariants (negative timestamps, NaN prices).
- Arithmetic that would silently produce incorrect results.
When Result/Option applies instead:
- Network errors, file I/O.
- Order constraints, risk limits.
- User input validation.
The rationale verbatim: “In trading systems, corrupt data is worse than no data.”
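Python has no panic or Result type, so the sketch below approximates the split with an exception nobody is expected to catch versus an ordinary return value. InvariantViolation and the three helper functions are hypothetical names for illustration:

```python
import math
from decimal import Decimal


class InvariantViolation(Exception):
    """Stand-in for a panic: callers are not expected to catch this."""


def checked_price(raw: float) -> Decimal:
    # A price must be a finite number; NaN or infinity is corrupt data.
    if not math.isfinite(raw):
        raise InvariantViolation(f"non-finite price: {raw!r}")
    return Decimal(str(raw))


def checked_timestamp(ts_ns: int) -> int:
    # A negative timestamp violates a fundamental invariant.
    if ts_ns < 0:
        raise InvariantViolation(f"negative timestamp: {ts_ns}")
    return ts_ns


def check_order_size(qty: int, max_qty: int) -> tuple[bool, str]:
    # An oversized order is an expected business outcome, so it comes back
    # as a value the caller handles rather than as a fault.
    if qty > max_qty:
        return False, f"quantity {qty} exceeds limit {max_qty}"
    return True, ""


nan_rejected = False
try:
    checked_price(float("nan"))
except InvariantViolation:
    nan_rejected = True
```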
Ports and adapters (hexagonal architecture)
The kernel defines abstract ports - DataClient, ExecutionClient,
RiskRule, Actor, Component traits - and implementations adapt
them to backtest fixtures, sandbox simulators, or live venues. The
seam is clean enough that a venue plug-in cannot accidentally reach
into kernel internals.
For Cortana, this is the principle that lets us write a UW
DataClient, a custom risk rule, and Cortana’s scoring Actor /
strategy Strategy without touching framework code.
Type-safe domain model
Price, Quantity, Money are immutable, precision-aware,
fixed-point internally. Same-type arithmetic preserves type;
mixed-dimensional arithmetic returns Decimal. Quantity cannot be
negative; Money requires matching currencies.
This is the principle that prevents “I added two prices and got a quantity” or “I mixed USD and EUR silently.” For Cortana’s option math, this means every score component that has units (a notional, a price, a position size) must use the right type, not a Python float.
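To make the rules concrete without depending on the real value types, here is a rough sketch with hypothetical stand-in classes built on Decimal. They mimic only the behaviours listed above (same-type arithmetic, Decimal for mixed dimensions, non-negative quantity, matching currencies) and say nothing about how Nautilus implements them internally:

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Quantity:
    value: Decimal

    def __post_init__(self) -> None:
        if self.value < 0:
            raise ValueError("quantity cannot be negative")

    def __add__(self, other: "Quantity") -> "Quantity":
        if not isinstance(other, Quantity):
            return NotImplemented
        return Quantity(self.value + other.value)   # same type in, same type out


@dataclass(frozen=True)
class Price:
    value: Decimal

    def __mul__(self, other: Quantity) -> Decimal:
        if not isinstance(other, Quantity):
            return NotImplemented                   # e.g. Price * float fails
        return self.value * other.value             # mixed dimensions -> Decimal


@dataclass(frozen=True)
class Money:
    amount: Decimal
    currency: str

    def __add__(self, other: "Money") -> "Money":
        if not isinstance(other, Money):
            return NotImplemented
        if other.currency != self.currency:
            raise ValueError(f"currency mismatch: {self.currency} vs {other.currency}")
        return Money(self.amount + other.amount, self.currency)


notional = Price(Decimal("2.50")) * Quantity(Decimal("4"))
total = Money(Decimal("1.50"), "USD") + Money(Decimal("2.25"), "USD")
```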
Externalized state
The cache database (Redis) is for live execution state - orders, positions, accounts, emulated-order working state, MessageBus stream. The ParquetDataCatalog is for bulk historical data.
“These are different problems and Nautilus addresses them separately rather than forcing one tool to do both.”
nautilus-concepts.md § Persistence
For Cortana, the implication: any state that must survive a restart
goes through one of these two stores. Strategy-local state goes
through Strategy.on_save() → on_load(), which round-trips through
the cache database.
Idempotent operations and reconciliation
The LiveExecutionEngine reconciles cached state against the venue on
startup and via a continuous monitoring loop. Duplicate fills are
caught by composite key (trade_id + order_side + last_px +
last_qty). Overfills are gated by allow_overfills. All execution
events are persisted.
For Cortana, this is the principle that makes broker-truth-first auditable. Any reconciler we write must be idempotent: running it twice with the same input produces the same state.
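An idempotent reconciler in miniature, assuming a hypothetical PositionLedger and Fill type. It reuses the composite key named above (trade_id, order_side, last_px, last_qty) so that replaying the same fills leaves the state unchanged; the real engine's bookkeeping is more involved:

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Iterable


@dataclass(frozen=True)
class Fill:
    trade_id: str
    order_side: str      # "BUY" or "SELL"
    last_px: Decimal
    last_qty: Decimal


class PositionLedger:
    """Hypothetical ledger; applying the same fill twice has no effect."""

    def __init__(self) -> None:
        self._seen: set[tuple] = set()
        self.net_qty = Decimal("0")

    def apply(self, fill: Fill) -> bool:
        key = (fill.trade_id, fill.order_side, fill.last_px, fill.last_qty)
        if key in self._seen:
            return False                 # duplicate: state is left alone
        self._seen.add(key)
        signed = fill.last_qty if fill.order_side == "BUY" else -fill.last_qty
        self.net_qty += signed
        return True

    def reconcile(self, fills: Iterable[Fill]) -> None:
        for fill in fills:
            self.apply(fill)


fills = [
    Fill("T-1", "BUY", Decimal("1.25"), Decimal("2")),
    Fill("T-2", "SELL", Decimal("1.40"), Decimal("1")),
]
ledger = PositionLedger()
ledger.reconcile(fills)
first_pass = ledger.net_qty
ledger.reconcile(fills)                  # same input again
second_pass = ledger.net_qty
```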
Component finite state machine
Stable states: PRE_INITIALIZED, READY, RUNNING, STOPPED,
DEGRADED, FAULTED, DISPOSED. Transitional: STARTING,
STOPPING, RESUMING, RESETTING, DISPOSING, DEGRADING,
FAULTING.
For Cortana, every actor and strategy we write inherits this FSM. The
hooks (on_start, on_stop, on_resume, on_reset, on_dispose,
on_fault, on_degrade) are the only places lifecycle work belongs.
Layered adapter shape
Per the developer guide: every first-class adapter is a Rust crate
under crates/adapters/<adapter>/ plus a Python wiring tree under
nautilus_trader/adapters/<adapter>/. The Rust layer owns
HTTP/WS networking, request signing, rate limiting, and parsing. The
Python layer owns the engine-facing interface and configuration.
“For new work, the Rust + PyO3 stack is the supported path.”
nautilus-developer-guide.md
Pure-Python adapters are explicitly second-class. For Cortana’s UW adapter, the canonical path is Rust+PyO3; a Python-only ingestor is acceptable as a stop-gap but not the long-term home.
Cortana MK3 implications
Each principle, mapped to a concrete Cortana extension point. Where the principle would forbid a pattern we’re tempted to port from MK2, that’s called out explicitly.
Custom data types (ScoreUpdate, MetaProb, Regime, UWFlowEvent)
The principle that bites: message immutability.
The Cortana scoring actor publishes ScoreUpdate messages on the bus.
The temptation from MK2 is to publish a mutable score record and
update fields in-place as new component scores arrive. This is
forbidden by the design principle. Concretely:
- Define ScoreUpdate as a @customdataclass (Python frozen dataclass per nautilus-concepts.md) or a Rust Data subclass via PyO3.
- Every field is set at construction. No in-place edits.
- ts_event and ts_init are stamped at publish time and never rewritten.
- If a downstream consumer (the meta-prob actor, the risk rule) needs a derived representation, it creates a new message and publishes that - it does not edit ScoreUpdate.params.
- Component-owned state (e.g., the meta-prob actor’s hyperparameter cache, the strategy’s cooldown counter) lives in the actor’s local attributes, keyed by instrument_id or request_id. It does not ride on the message.
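The shape this implies for the scoring actor, sketched with a plain frozen dataclass rather than the real @customdataclass decorator (whose exact behaviour is still an open question for the spike). The field names, the component names, and the publish callback are assumptions for illustration:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ScoreUpdate:
    """Plain-Python stand-in for Cortana's custom data type."""

    instrument_id: str
    components: tuple[tuple[str, float], ...]   # immutable container
    total: float
    ts_event: int
    ts_init: int


class ScoringActor:
    """Hypothetical actor: mutable state stays here, snapshots go out."""

    def __init__(self, publish: Callable[[ScoreUpdate], None]) -> None:
        self._publish = publish
        # Running component scores are actor-owned, keyed by instrument_id.
        self._components: dict[str, dict[str, float]] = {}

    def on_component(
        self, instrument_id: str, name: str, value: float, ts_event: int, ts_init: int
    ) -> None:
        scores = self._components.setdefault(instrument_id, {})
        scores[name] = value
        # Each change produces a new snapshot; earlier ones are never edited.
        self._publish(
            ScoreUpdate(
                instrument_id=instrument_id,
                components=tuple(sorted(scores.items())),
                total=sum(scores.values()),
                ts_event=ts_event,
                ts_init=ts_init,
            )
        )


published: list[ScoreUpdate] = []
actor = ScoringActor(published.append)
actor.on_component("SPY", "flow", 0.5, ts_event=1, ts_init=1)
actor.on_component("SPY", "momentum", 0.25, ts_event=2, ts_init=2)
```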
MK2 anti-pattern this rules out: the
cortanaroi/engine/scoring_engine.py pattern where a single
ScoreState dict is mutated by multiple subsystems. In Nautilus, that
becomes N immutable ScoreUpdate messages, each one a snapshot.
Custom DataClient (UW)
The principle that bites: ports and adapters + layered adapter shape + fail-fast on data integrity.
The UW data adapter is the most likely Cortana extension to require
Rust (per nautilus-rust.md § Path 3). Implications:
- The HTTP/WS clients, request signing, rate limiting, and JSON parsing belong in crates/adapters/unusual_whales/src/ - Rust.
- The Python wiring (LiveDataClient subclass, InstrumentProvider, LiveDataClientConfig, factory function) lives in nautilus_trader/adapters/unusual_whales/.
- All numeric fields parse to Price/Quantity/Money, never Python floats. NaN values must be rejected at parse time, not smoothed downstream.
- The adapter publishes via the DataEngine, which writes Cache then publishes - so a consumer’s on_uw_event handler can read the latest UW state from the cache by the time it runs.
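For the Python stop-gap ingestor, the "no floats, no NaN" rule can be enforced at the JSON boundary with the standard library alone. The payload field names (premium, size) are invented for the example and are not the UW schema; conversion from Decimal to Price/Quantity/Money would follow as a separate step:

```python
import json
from decimal import Decimal


def _reject_constant(name: str):
    # json.loads calls this hook for the literals NaN, Infinity, -Infinity.
    raise ValueError(f"non-finite number in payload: {name}")


def parse_uw_payload(raw: str) -> dict:
    # parse_float=Decimal keeps every fractional number exact instead of
    # letting it pass through a binary float first.
    return json.loads(raw, parse_float=Decimal, parse_constant=_reject_constant)


event = parse_uw_payload('{"premium": 1250.10, "size": 3}')

nan_rejected = False
try:
    parse_uw_payload('{"premium": NaN, "size": 3}')
except ValueError:
    nan_rejected = True
```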
MK2 anti-pattern this rules out: the parser-in-strategy pattern where
cortanaroi/data/uw_*.py parses inside the same module that decides
on signals. In Nautilus, parsing lives in the adapter and decision
logic lives in the actor - and they communicate only through immutable
messages on the bus.
Custom RiskEngine rule
The principle that bites: single-threaded core + fail-fast + no defensive checks in strategies.
The Cortana meta-prob gate (win-prob threshold) belongs in a custom risk rule, not inline in the strategy. Implications:
- The rule runs on the single-threaded kernel inside RiskEngine. It must be fast (microseconds, not milliseconds).
- It receives an immutable SubmitOrder command, reads the latest MetaProb from the cache, and either passes or returns OrderDenied. It never mutates the order.
- If meta-prob is missing or NaN, fail-fast: deny the order and log. Don’t substitute a default and submit anyway.
- Any meta-prob caching the rule needs (last update time, stale threshold) lives in the rule’s own context, not on the order.
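The decision logic of such a gate can be sketched as a pure function. How it would be registered with the RiskEngine is an open question for the spike, so this shows only the checks; MetaProb, Decision, and the threshold values are hypothetical:

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MetaProb:
    instrument_id: str
    win_prob: float
    ts_event: int          # nanoseconds


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str


def meta_prob_gate(
    latest: Optional[MetaProb], now_ns: int, threshold: float, max_age_ns: int
) -> Decision:
    # Missing, NaN, or stale input denies the order; no default is substituted.
    if latest is None:
        return Decision(False, "no meta-prob available")
    if math.isnan(latest.win_prob):
        return Decision(False, "meta-prob is NaN")
    if now_ns - latest.ts_event > max_age_ns:
        return Decision(False, "meta-prob is stale")
    if latest.win_prob < threshold:
        return Decision(False, "win probability below threshold")
    return Decision(True, "ok")


NOW = 100_000_000_000
FIVE_SECONDS = 5_000_000_000
fresh = MetaProb("SPY", 0.7, ts_event=NOW - 1_000_000_000)
decision = meta_prob_gate(fresh, NOW, threshold=0.55, max_age_ns=FIVE_SECONDS)
```

The function reads its inputs and returns a new Decision; it takes no order object at all, so there is nothing for it to mutate.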
MK2 anti-pattern this rules out: the inline if score < threshold: return pattern scattered across position_manager.py and
scoring_engine.py. In Nautilus, every order passes through the risk
rule by construction. There is no path that silently bypasses it.
Multiple Actors (scoring, meta-prob, regime, brain logger)
The principle that bites: message immutability + clear ownership boundaries + component FSM.
Multiple actors form a chain: bars/quotes/UW events come in →
ScoringActor publishes ScoreUpdate → MetaProbActor subscribes,
publishes MetaProb → RegimeActor publishes Regime →
CortanaStrategy consumes all three and decides. Implications:
- Each actor is its own Component with on_start, on_stop, on_save, on_load. They don’t share state directly; they communicate only through immutable messages.
- Actor-local hyperparameters (model weights, decay constants, stale thresholds) live in the actor’s own attributes, restored from on_load(state) after restart.
- The brain-logger actor subscribes to PositionClosed and writes to ~/brain markdown via a queue (off-thread). It does not block the kernel. This is the “kick to background, re-enter via bus” pattern.
MK2 anti-pattern this rules out: the global singleton state pattern
where multiple subsystems read/write a shared cooldown_state dict.
In Nautilus, each subsystem is an actor with its own state, and the
dispatch order is determined by the bus topic subscriptions, not by
import order or global mutation.
One Strategy (CortanaStrategy)
The principle that bites: single-threaded core + OrderFactory discipline + on_save/on_load round-trip.
Implications:
- on_quote_tick, on_bar, on_score_update, on_meta_prob, on_position_opened, on_position_closed all run on the kernel thread. Anything heavy (e.g., re-running an ML model) goes to a background thread or to a separate actor.
- Order construction goes through OrderFactory and submit_order, not through hand-rolled order objects. The factory fills in trader ID, strategy ID, and timestamp consistently.
- Strategy-local state (last entry time, current intended position, hedge state) is stored in instance attributes and serialized via on_save() -> dict[str, bytes] so it survives restart.
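A sketch of the round-trip using the dict[str, bytes] shape noted above. StrategyState is a hypothetical plain class standing in for the strategy (it does not inherit from the Nautilus Strategy base), and the choice of ISO timestamps and JSON for encoding is ours:

```python
import json
from datetime import datetime, timezone
from typing import Optional


class StrategyState:
    """Hypothetical holder mirroring the on_save / on_load shape."""

    def __init__(self) -> None:
        self.last_entry_time: Optional[datetime] = None
        self.intended_position: int = 0

    def on_save(self) -> dict[str, bytes]:
        # Every value is encoded to bytes; an empty string marks "not set".
        last = self.last_entry_time.isoformat() if self.last_entry_time else ""
        return {
            "last_entry_time": last.encode("utf-8"),
            "intended_position": json.dumps(self.intended_position).encode("utf-8"),
        }

    def on_load(self, state: dict[str, bytes]) -> None:
        # Missing keys fall back to the same defaults as a fresh start.
        raw = state.get("last_entry_time", b"").decode("utf-8")
        self.last_entry_time = datetime.fromisoformat(raw) if raw else None
        self.intended_position = json.loads(state.get("intended_position", b"0"))


before = StrategyState()
before.last_entry_time = datetime(2026, 5, 9, 14, 30, tzinfo=timezone.utc)
before.intended_position = 2
saved = before.on_save()

after = StrategyState()              # simulates the process after a restart
after.on_load(saved)
```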
MK2 anti-pattern this rules out: the hand-rolled trade journal pattern
where each strategy writes its own pickle file outside the framework.
In Nautilus, the cache database is the persistence layer; on_save
is the seam.
Backtest harness
The principle that bites: same kernel in backtest and live + replay determinism.
Implications:
- We do not write a separate backtest harness. We use BacktestNode (or BacktestEngine for low-level control) and feed the same data shapes our live DataClient would emit.
- Catalog data (Parquet) flows through the same DataEngine the live adapter would feed. Strategy code is unchanged.
- Replay is bitwise deterministic given the same seed, data, and config. Cortana’s “replay 2026-04-16 chop day” tests become framework-level guarantees, not custom infrastructure.
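One way a replay regression test could assert this property is to fingerprint the decisions from two runs over the same input. The sketch below uses a toy decision function with made-up thresholds in place of a real backtest run; in practice the fingerprint would be taken over the events a BacktestNode run emits:

```python
import hashlib
import json


def run(events: list[dict]) -> tuple[list, str]:
    """Toy deterministic handler: same events in, same decisions out."""
    position = 0
    decisions = []
    for event in events:
        if event["score"] >= 0.6 and position == 0:
            position = 1
            decisions.append([event["ts"], "ENTER"])
        elif event["score"] < 0.4 and position == 1:
            position = 0
            decisions.append([event["ts"], "EXIT"])
    # Hash a canonical serialization so two runs compare with one string.
    digest = hashlib.sha256(json.dumps(decisions).encode("utf-8")).hexdigest()
    return decisions, digest


recorded = [
    {"ts": 1, "score": 0.5},
    {"ts": 2, "score": 0.7},
    {"ts": 3, "score": 0.8},
    {"ts": 4, "score": 0.3},
]
decisions_a, digest_a = run(recorded)
decisions_b, digest_b = run(recorded)
```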
MK2 anti-pattern this rules out: the parallel paper-vs-live pathway that has drifted multiple times in MK2. In Nautilus, there is no separate path. Drift is impossible by construction.
Reconciliation and broker-truth-first
The principle that bites: idempotent operations + externalized state + fail-fast + continuous reconciliation.
Implications:
- Cortana does not write its own broker reconciler. The LiveExecutionEngine reconciles on startup and continuously.
- Any Cortana-specific reconciliation (e.g., comparing UW account-flow with our position record) is also idempotent: rerunning it produces the same state.
- All execution events persist to the cache database. After a restart the engine rebuilds state from broker reports + cache; Cortana inherits this for free.
MK2 anti-pattern this rules out: the Flex-query reconciler that runs on-demand and may or may not match what the engine thinks. In Nautilus, the reconciler is the engine - there is no parallel “truth” path.
Multi-tenant SaaS (Cody as customer #1)
The principle that bites: one node per process + message immutability across process boundaries.
Per nautilus-architecture.md: “Running multiple TradingNode or
BacktestNode instances concurrently in the same process is not
supported due to global singleton state.” Implications:
- Each tenant gets its own process (a TradingNode instance).
- Cross-tenant communication goes through Redis (the persistent MessageBus stream), and because messages are immutable, the in-process and cross-process semantics are identical.
- Per-tenant cache databases isolate state.
- The orchestrator restarts a tenant’s process on panic = abort without affecting other tenants.
MK2 anti-pattern this rules out: the “one big Python process with many strategies tenanted by tag” pattern. In Nautilus, that’s a process boundary.
Anti-patterns to never port from MK2
A one-table summary of the MK2 patterns the design principles rule out:
| MK2 pattern | Principle it violates | Nautilus replacement |
|---|---|---|
| Mutating ScoreState dict in-place across subsystems | Message immutability | Publish a new ScoreUpdate per change |
| Parser-inside-strategy (UW JSON parsed in signal_*.py) | Ports and adapters | Custom DataClient adapter publishes typed Data |
| Inline if score < threshold in strategies | Single risk gate, fail-fast | Custom RiskRule in RiskEngine |
| Global cooldown_state dict shared between subsystems | Clear ownership boundaries | Actor-local state, persisted via on_save |
| Parallel paper/live pathways | Same kernel everywhere | One BacktestNode / TradingNode shape |
| Hand-rolled Flex reconciler running on demand | Idempotent reconciliation, externalized state | LiveExecutionEngine startup + continuous reconciliation |
| Pickle-file trade journal per strategy | Externalized state via cache database | on_save / on_load round-trip |
| Float-based price arithmetic | Type-safe domain model | Price, Quantity, Money |
| Unbounded time.sleep inside callbacks | Single-threaded core (don’t block the kernel) | Background actor + bus re-entry |
| Mutating an OrderFilled event to “fix” a partial fill | Temporal integrity | The event is a fact; emit a new event for the new fact |
| params dict on a message used as workflow scratchpad | Three ownership rules | Component-owned context keyed by request ID |
| Capturing log output to assert behavior in tests | Fragile coupling to logger globals (developer guide anti-pattern) | Verify observable behavior on the bus |
Open questions for the 2026-05-09 spike
- @customdataclass immutability enforcement. Does the Python @customdataclass decorator generate a frozen class (raises FrozenInstanceError on assignment), or does it rely on convention? If convention only, Cortana will need a lint or test to enforce.
- Custom Data types and serialization. When Cortana’s custom ScoreUpdate rides the Redis stream for the dashboard, what’s the serialization codec (JSON or MessagePack), and does it preserve Decimal/Price precision?
- Risk rule extension API. Is there a public hook to register a custom RiskRule from Python, or do all risk rules live inside the shipped RiskEngine config? (Per concepts: trading-state can be paused at the risk layer; per developer guide: pre-trade checks are central. Confirm extension surface.)
- on_save payload size. Cortana’s strategy state could include a 78-column score history buffer. What’s the practical size limit for the dict[str, bytes] returned from on_save?
- Component-owned context for high-frequency messages. The third ownership rule says workflow state lives in component-owned context keyed by message ID. For Cortana’s cache of last-N ScoreUpdate per instrument, what’s the canonical pattern - a dict on the actor, or a separate cache subscription?
- Background-thread re-entry. When the brain-logger actor writes to ~/brain markdown, it must run off-thread. What’s the recommended pattern for “spawn a worker, write back to bus when done”? Does Nautilus expose a worker pool, or do we use Tokio (Rust) / asyncio.create_task (Python)?
See Also
- Nautilus Architecture - the canonical page for crash-only, fail-fast, single-threaded core, ports and adapters, type safety.
- Nautilus Concepts - the canonical page for cache-then-publish, externalized state, message integrity, FSM, positions, accounting.
- Nautilus Rust - when does Cortana drop to Rust, PyO3 boundary, FFI memory contract.
- Nautilus Developer Guide - adapter layout, testing matrix, contributing back.
- Nautilus Message Bus - pub/sub, three messaging patterns, message-integrity ownership rules.
- Nautilus Custom Data - @customdataclass, publishing custom Data subclasses.
- Nautilus Actors - actor lifecycle and callback discipline.
- Nautilus Strategies - strategy lifecycle, on_save/on_load.
- 2026-05-09 Nautilus Spike Plan: ~/conductor/workspaces/cortanaroi-mk2/belo-horizonte/plans/2026-05-09-nautilus-spike.md
Timeline
- 2026-05-07 | Cody - Filed during pre-spike concept mastery sweep batch 7 (developer guide).