Nautilus Trader Developer-Guide Design Principles

The /developer_guide/design_principles/ page enumerates exactly one normative principle: message immutability. “Once a message (request, response, event, or command) is created, its fields must not be mutated. This includes container fields such as params maps.” The page lists eight system properties this protects (determinism, temporal integrity, safer concurrency, easier debugging, reliable replay/simulation, clear ownership boundaries, better auditability, more robust distribution) and points to three ownership rules in the message-bus integrity section that follow from it. The thinness is the point: every other “principle” the framework lives by - single-threaded core, cache-then-publish, crash-only design, fail-fast on numeric anomalies, ports-and-adapters extensibility, type-safe domain model, externalized state, idempotent operations - is documented elsewhere (architecture, concepts) and the developer guide leaves them implicit. For Cortana MK3 design, this page is the contract our custom messages (ScoreUpdate, MetaProb, Regime) must satisfy. Violating it would destroy the determinism that makes backtest = live possible.

Why this page exists

nautilus-developer-guide.md covers the developer guide as a whole - adapter layout, testing matrix, FFI contract, contributing back. This page narrows in on the one section labelled “Design Principles,” which is load-bearing for any code Cortana writes that touches the message bus.

The design-principles page is short by design: it states the one invariant the framework wants you never to break, and trusts the rest of the docs to develop everything else. For an extension author the rule is brief but absolute - break it and the framework’s determinism, replay, backtest-live parity, and audit guarantees collapse.

This page therefore captures (a) the canonical principle as written, (b) the implicit architectural principles the developer guide assumes you’ve already absorbed from concepts/architecture, and (c) the Cortana-specific mapping from each principle to a concrete extension point we will be building during the 2026-05-09 spike and after.

The canonical principle: message immutability

Quoted verbatim from /developer_guide/design_principles/:

“Once a message (request, response, event, or command) is created, its fields must not be mutated.”

The page expands this to explicitly include container fields:

“This includes container fields such as params maps. Components can read a message and derive local state from it, but they must not rewrite the original.”
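A minimal sketch of what the rule means for a Python message type, assuming a plain frozen dataclass rather than any Nautilus base class. `ExampleRequest` and its fields are hypothetical names used only for illustration. The read-only wrapper covers the "container fields" half of the rule, which `frozen=True` alone does not.

```python
from dataclasses import FrozenInstanceError, dataclass, field, replace
from types import MappingProxyType
from typing import Any, Mapping


@dataclass(frozen=True)
class ExampleRequest:
    """Hypothetical message type; not part of the Nautilus API."""

    request_id: str
    ts_init: int
    params: Mapping[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Copy the caller's dict and wrap it read-only, so the container
        # field cannot be mutated through the message (or through the
        # caller's original dict) after construction.
        object.__setattr__(self, "params", MappingProxyType(dict(self.params)))


original = ExampleRequest(request_id="req-1", ts_init=1, params={"limit": 100})

# A component that needs different values derives a NEW message.
derived = replace(original, params={**original.params, "limit": 50})

try:
    original.params["limit"] = 1  # rewriting a container field
except TypeError:
    pass  # mappingproxy rejects item assignment
```

Reading `original.params["limit"]` still returns the value the producer set, while `derived` carries the changed value as a separate object.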

What “message” covers

The category is broad. The four message kinds named on the page - request, response, event, command - span every object that crosses the bus:

Data messages - QuoteTick, TradeTick, Bar, OrderBookDelta, OrderBookDepth10, MarkPriceUpdate, FundingRateUpdate, custom Data subclasses, @customdataclass types.
Event messages - OrderAccepted, OrderFilled, OrderRejected, OrderCanceled, PositionOpened, PositionChanged, PositionClosed, AccountState, TimeEvent.
Command messages - SubmitOrder, ModifyOrder, CancelOrder, BatchCancelOrders, QueryOrder, SubscribeData, RequestData.
Request/Response pairs - DataRequest / DataResponse, plus the request-side options the caller supplies in params.

If it leaves the producer and lands in a consumer, it’s a message and the rule applies.

The eight protected properties

Each bullet from the source page, expanded with the practical implication:

Determinism. “Every consumer sees the same input. Behavior is easier to reason about, replay, and test.” Why it matters: without immutability, two consumers handling the same dispatch may see different values depending on the order they ran. That kills the deterministic dispatch contract Nautilus’s single-threaded kernel depends on.
Temporal integrity. “A message preserves what was true when the system emitted it. Events and commands remain factual records instead of containers of drifting state.” Why it matters: an OrderFilled event is a historical fact. If a downstream consumer mutates last_qty to a partial-fill view, the event no longer documents what happened - it documents what someone later thought.
Safer concurrency. “Readers do not need coordination to protect message payloads from later rewrites. This removes a common source of races around shared state.” Why it matters: Nautilus’s threading model has the kernel on a single thread and background services on other threads (network I/O, persistence, adapter Tokio runtime). Cross-thread payloads are safe only because they’re immutable. Add mutation and you’ve reintroduced the lock-ordering problem the architecture eliminated.
Easier debugging. “Logs, traces, replay tools, and dead-letter inspection remain useful because the message still reflects the original payload.” Why it matters: LogGuard snapshots, dead-letter queue inspection, and the Redis stream for cross-process MessageBus subscribers all assume what gets logged is what got dispatched.
Reliable replay and simulation. “Replaying a sequence yields the same logical inputs as the original run. This supports backtesting, incident reconstruction, and regression testing.” Why it matters: the entire backtest-live parity claim - the single most important architectural property of Nautilus - depends on this. Replay = determinism = the same kernel processing the same sequence the same way.
Clear ownership boundaries. “Components treat incoming messages as input. If a component needs a different representation, it derives new local state or a new message explicitly.” Why it matters: this is the rule that prevents “mystery side-effects” - every transformation is visible because it produces a new object on the bus or in component-owned context.
Better auditability. “The system can answer what it knew, when it knew it, and what it did from that information.” Why it matters: for a real-money trading system, the audit trail is non-negotiable. A regulator (or a human reviewing a postmortem) needs to reconstruct exactly what state the system was in when it submitted an order.
More robust distribution. “Serialized messages already cross process and service boundaries as copies. The same ownership rule keeps the in-memory model aligned with that reality.” Why it matters: the optional Redis backing for the MessageBus, and the multi-tenant SaaS shape MK3 is targeting, both serialize messages across process boundaries. If your in-memory consumers can mutate but serialized consumers can’t, you have two semantics for one bus - a class of bug Nautilus refuses to introduce.

The three ownership rules (from message_bus.md “Message integrity”)

The design-principles page links to /concepts/message_bus.md#message-integrity for the rules that follow from the immutability invariant. Quoted from that section:

“Three ownership rules follow from this:

Caller-supplied request options stay on the message.

Response metadata returned to the caller stays on the response.

Component workflow state (bounded date ranges, grouping state, replay cursors, counters, processing flags) stays in component-owned context keyed by message or request ID.”

And:

“When a component needs a derived message, it creates a new one with the required values instead of rewriting the original.”

These rules answer the natural follow-up question: “where does the state go if the message can’t carry it?” Answer: caller request options on the request message; response metadata on the response message; processing/workflow state in a component-owned dictionary keyed by message ID. The bus is for facts; the components keep their bookkeeping locally.
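The third rule can be sketched as follows, assuming a component that serves a date-range request in chunks. `BarsRequest`, `RequestContext`, and `ChunkingComponent` are hypothetical names, not Nautilus classes. The request stays frozen and the replay cursor lives in a dictionary keyed by request ID.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class BarsRequest:
    """Hypothetical request; caller-supplied options stay on the message."""

    request_id: str
    start_ns: int
    end_ns: int


@dataclass
class RequestContext:
    """Mutable workflow state owned by the component, never by the message."""

    cursor_ns: int
    chunks_sent: int = 0


class ChunkingComponent:
    def __init__(self, chunk_ns: int) -> None:
        self._chunk_ns = chunk_ns
        self._contexts: dict[str, RequestContext] = {}

    def on_request(self, request: BarsRequest) -> None:
        # Bookkeeping starts in component-owned context, keyed by request ID.
        self._contexts[request.request_id] = RequestContext(cursor_ns=request.start_ns)

    def next_chunk(self, request: BarsRequest) -> tuple[int, int] | None:
        ctx = self._contexts[request.request_id]
        if ctx.cursor_ns >= request.end_ns:
            del self._contexts[request.request_id]  # workflow finished
            return None
        start = ctx.cursor_ns
        end = min(start + self._chunk_ns, request.end_ns)
        ctx.cursor_ns = end  # the cursor advances; the request does not change
        ctx.chunks_sent += 1
        return (start, end)
```

The design choice is that discarding the context is the component's own cleanup step, so nothing about an in-flight workflow is ever visible to other consumers of the same request.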

Implicit principles the developer guide assumes

The design-principles page is one paragraph plus eight bullets. The framework lives by many more design principles, but they’re documented elsewhere - primarily in ~/brain/concepts/nautilus-architecture.md and nautilus-concepts.md. They are captured here so that a Cortana implementer has the full set in one place when planning a custom data type, custom actor, or custom risk rule.

Single-threaded deterministic core

“Within a node, the kernel consumes and dispatches messages on a single thread. The kernel encompasses: the MessageBus and actor callback dispatch, strategy logic and order management, risk engine checks and execution coordination, cache reads and writes.”

nautilus-architecture.md

The single-thread guarantee is the mechanical reason backtest and live behave the same. Combined with message immutability, it gives bitwise deterministic event ordering. Background I/O runs on its own threads but every result lands back on the single-threaded core through MPSC channels.

The corollary: anything that wants to “do work during a callback” has to either finish quickly or kick to a background thread and re-enter via the bus. There is no third option.
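A toy illustration of "kick to a background thread and re-enter", using only the standard library. `KernelLoop` and `slow_job` are hypothetical stand-ins; the real kernel and bus are not modelled here, and whether Nautilus offers its own worker facility is still an open question for the spike.

```python
import queue
import threading


def slow_job(x: int) -> int:
    """Stand-in for heavy work that must not run inside a callback."""
    return x * x


class KernelLoop:
    """Toy single-threaded dispatcher; not the Nautilus kernel."""

    def __init__(self) -> None:
        self.inbox: queue.Queue = queue.Queue()
        self.results: list = []

    def on_trigger(self, x: int) -> threading.Thread:
        # The callback returns immediately. The work runs off-thread and
        # only its result is enqueued for the kernel thread.
        worker = threading.Thread(target=lambda: self.inbox.put(slow_job(x)))
        worker.start()
        return worker

    def drain(self) -> None:
        # Called on the kernel thread: the only place results are applied.
        while True:
            try:
                self.results.append(self.inbox.get_nowait())
            except queue.Empty:
                return


loop = KernelLoop()
worker = loop.on_trigger(7)
worker.join()  # joined here only to make the example deterministic
loop.drain()
```

The worker never touches `results` directly; the queue is the single hand-off point back to the dispatching thread.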

Cache-then-publish

“The DataEngine writes to the Cache before publishing to subscribers, so the latest value is available in the cache by the time your handler runs.”

nautilus-concepts.md § Cache

This is the invariant that lets a strategy’s on_quote_tick handler rely on self.cache.quote_tick(...) returning the very tick that triggered it. There is no race between handler entry and cache visibility.

For order book deltas and depth snapshots the rule is different - those are “published directly; book state is maintained separately through BookUpdater subscriptions” (per architecture). For quotes, trades, and bars the cache-then-publish ordering is an absolute guarantee.
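The ordering for quotes, trades, and bars can be pictured with a toy engine. This is a sketch under the assumption that "cache-then-publish" means a cache write strictly before handler dispatch; `ToyQuote` and `ToyDataEngine` are hypothetical and do not mirror the real DataEngine API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class ToyQuote:
    instrument_id: str
    bid_ticks: int  # integer ticks, to keep floats out of the example


class ToyDataEngine:
    """Toy engine illustrating write-cache-then-publish ordering."""

    def __init__(self) -> None:
        self.cache: Dict[str, ToyQuote] = {}
        self._handlers: List[Callable[[ToyQuote], None]] = []

    def subscribe(self, handler: Callable[[ToyQuote], None]) -> None:
        self._handlers.append(handler)

    def process(self, quote: ToyQuote) -> None:
        self.cache[quote.instrument_id] = quote  # 1. write to the cache
        for handler in self._handlers:           # 2. then publish
            handler(quote)


engine = ToyDataEngine()
observed = []
# The handler checks that the cache already holds the triggering quote.
engine.subscribe(lambda q: observed.append(engine.cache[q.instrument_id] is q))
engine.process(ToyQuote("TEST", 50000))
```

If the two steps in `process` were swapped, the handler would raise `KeyError` on the first quote, which is the race the real ordering rules out.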

Crash-only design

Five sub-principles from the architecture page:

Unified recovery path. “Startup and crash recovery share the same code path, ensuring it is well-tested.”
Externalized state. “Critical state is meant to be persisted externally when configured, reducing data-loss risk.” Redis is the canonical backing store.
Fast restart. “Designed to come back up quickly, minimizing downtime.”
Idempotent operations. “Operations are safe to retry after restart.”
Fail-fast for unrecoverable errors. “Data corruption or invariant violations terminate immediately rather than continue in a compromised state.”

Production release builds are configured with panic = abort so that process supervisors can restart cleanly.

Fail-fast on data integrity

“NautilusTrader prioritizes data integrity over availability for trading operations. The system employs a strict fail-fast policy for arithmetic operations and data handling to prevent silent data corruption.”

nautilus-architecture.md § Data integrity

When fail-fast (panic) applies:

Programmer errors (logic bugs, incorrect API usage).
Data that violates fundamental invariants (negative timestamps, NaN prices).
Arithmetic that would silently produce incorrect results.

When Result/Option applies instead:

Network errors, file I/O.
Order constraints, risk limits.
User input validation.

The rationale verbatim: “In trading systems, corrupt data is worse than no data.”
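The split above is described in Rust terms (panic versus Result/Option). A rough Python analogue, offered only as a sketch: raise for invariant violations, return a value the caller can handle for expected conditions. `InvariantViolation`, `checked_price`, and `check_order_size` are hypothetical names.

```python
from __future__ import annotations

import math


class InvariantViolation(Exception):
    """Hypothetical error type for data that must never be processed."""


def checked_price(raw: float) -> float:
    # NaN, infinity, or a negative value violates a fundamental invariant:
    # raise rather than substitute a default.
    if math.isnan(raw) or math.isinf(raw) or raw < 0:
        raise InvariantViolation(f"invalid price: {raw!r}")
    return raw


def check_order_size(qty: int, max_qty: int) -> str | None:
    # An oversized order is an expected, recoverable condition:
    # return a reason the caller can act on instead of raising.
    if qty > max_qty:
        return f"qty {qty} exceeds limit {max_qty}"
    return None
```

The distinction is about who can sensibly react: the caller can handle a rejected order size, but nothing downstream can repair a NaN price.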

Ports and adapters (hexagonal architecture)

The kernel defines abstract ports - DataClient, ExecutionClient, RiskRule, Actor, Component traits - and implementations adapt them to backtest fixtures, sandbox simulators, or live venues. The seam is clean enough that a venue plug-in cannot accidentally reach into kernel internals.

For Cortana, this is the principle that lets us write a UW DataClient, a custom risk rule, and Cortana’s scoring Actor / strategy Strategy without touching framework code.

Type-safe domain model

Price, Quantity, Money are immutable, precision-aware, fixed-point internally. Same-type arithmetic preserves type; mixed-dimensional arithmetic returns Decimal. Quantity cannot be negative; Money requires matching currencies.

This is the principle that prevents “I added two prices and got a quantity” or “I mixed USD and EUR silently.” For Cortana’s option math, this means every score component that has units (a notional, a price, a position size) must use the right type, not a Python float.
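To make the failure modes concrete, here is a toy value type built on `decimal.Decimal`. It is a sketch only: `ToyMoney` is hypothetical, and the real `Money` class has its own constructor and API that should be checked against the Nautilus reference.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class ToyMoney:
    """Toy value type; not the Nautilus Money class."""

    amount: Decimal
    currency: str

    def __add__(self, other: object) -> "ToyMoney":
        if not isinstance(other, ToyMoney):
            return NotImplemented  # adding a bare float becomes a TypeError
        if other.currency != self.currency:
            raise ValueError(
                f"currency mismatch: {self.currency} vs {other.currency}"
            )
        return ToyMoney(self.amount + other.amount, self.currency)


a = ToyMoney(Decimal("0.10"), "USD")
b = ToyMoney(Decimal("0.20"), "USD")
total = a + b  # exact decimal arithmetic, same currency preserved
```

With plain floats, `0.1 + 0.2 == 0.3` is false, and nothing would stop a USD amount from being added to a EUR amount; the typed version fails loudly in both cases.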

Externalized state

The cache database (Redis) is for live execution state - orders, positions, accounts, emulated-order working state, MessageBus stream. The ParquetDataCatalog is for bulk historical data.

“These are different problems and Nautilus addresses them separately rather than forcing one tool to do both.”

nautilus-concepts.md § Persistence

For Cortana, the implication: any state that must survive a restart goes through one of these two stores. Strategy-local state goes through Strategy.on_save() → on_load(), which round-trips through the cache database.

Idempotent operations and reconciliation

The LiveExecutionEngine reconciles cached state against the venue on startup and via a continuous monitoring loop. Duplicate fills are caught by composite key (trade_id + order_side + last_px + last_qty). Overfills are gated by allow_overfills. All execution events are persisted.

For Cortana, this is the principle that makes broker-truth-first auditable. Any reconciler we write must be idempotent: running it twice with the same input produces the same state.
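A small sketch of the idempotence requirement, assuming duplicate detection by the composite key named above. `ToyFill` and `ToyFillLedger` are hypothetical; the engine's actual data structures are not shown in the source.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class ToyFill:
    trade_id: str
    order_side: str
    last_px: Decimal
    last_qty: Decimal


class ToyFillLedger:
    """Applies each distinct fill once; re-applying is a no-op."""

    def __init__(self) -> None:
        self._seen: set = set()
        self.filled_qty = Decimal("0")

    def apply(self, fill: ToyFill) -> bool:
        # Composite key: trade_id + order_side + last_px + last_qty.
        key = (fill.trade_id, fill.order_side, fill.last_px, fill.last_qty)
        if key in self._seen:
            return False  # duplicate report: state is left unchanged
        self._seen.add(key)
        self.filled_qty += fill.last_qty
        return True


ledger = ToyFillLedger()
fill = ToyFill("T-1", "BUY", Decimal("101.25"), Decimal("3"))
first = ledger.apply(fill)
second = ledger.apply(fill)  # same input again, e.g. after a restart
```

Running the reconciler twice over the same broker report leaves `filled_qty` where the first run put it, which is the property any Cortana-specific reconciler should be tested for.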

Component finite state machine

Stable states: PRE_INITIALIZED, READY, RUNNING, STOPPED, DEGRADED, FAULTED, DISPOSED. Transitional: STARTING, STOPPING, RESUMING, RESETTING, DISPOSING, DEGRADING, FAULTING.

For Cortana, every actor and strategy we write inherits this FSM. The hooks (on_start, on_stop, on_resume, on_reset, on_dispose, on_fault, on_degrade) are the only places lifecycle work belongs.

Layered adapter shape

Per the developer guide: every first-class adapter is a Rust crate under crates/adapters/<adapter>/ plus a Python wiring tree under nautilus_trader/adapters/<adapter>/. The Rust layer owns HTTP/WS networking, request signing, rate limiting, and parsing. The Python layer owns the engine-facing interface and configuration.

“For new work, the Rust + PyO3 stack is the supported path.”

nautilus-developer-guide.md

Pure-Python adapters are explicitly second-class. For Cortana’s UW adapter, the canonical path is Rust+PyO3; a Python-only ingestor is acceptable as a stop-gap but not the long-term home.

Cortana MK3 implications

Each principle, mapped to a concrete Cortana extension point. Where the principle would forbid a pattern we’re tempted to port from MK2, that’s called out explicitly.

Custom data types (`ScoreUpdate`, `MetaProb`, `Regime`, `UWFlowEvent`)

The principle that bites: message immutability.

The Cortana scoring actor publishes ScoreUpdate messages on the bus. The temptation from MK2 is to publish a mutable score record and update fields in-place as new component scores arrive. This is forbidden by the design principle. Concretely:

Define ScoreUpdate as a @customdataclass (described as a frozen Python dataclass in nautilus-concepts.md; whether the decorator actually enforces this is an open question listed below) or a Rust Data subclass via PyO3.
Every field is set at construction. No in-place edits.
ts_event and ts_init are set at construction, like every other field, and never rewritten.
If a downstream consumer (the meta-prob actor, the risk rule) needs a derived representation, it creates a new message and publishes that - it does not edit ScoreUpdate.params.
Component-owned state (e.g., the meta-prob actor’s hyperparameter cache, the strategy’s cooldown counter) lives in the actor’s local attributes, keyed by instrument_id or request_id. It does not ride on the message.

MK2 anti-pattern this rules out: the cortanaroi/engine/scoring_engine.py pattern where a single ScoreState dict is mutated by multiple subsystems. In Nautilus, that becomes N immutable ScoreUpdate messages, each one a snapshot.

Custom DataClient (UW)

The principle that bites: ports and adapters + layered adapter shape + fail-fast on data integrity.

The UW data adapter is the most likely Cortana extension to require Rust (per nautilus-rust.md § Path 3). Implications:

The HTTP/WS clients, request signing, rate limiting, and JSON parsing belong in crates/adapters/unusual_whales/src/ - Rust.
The Python wiring (LiveDataClient subclass, InstrumentProvider, LiveDataClientConfig, factory function) lives in nautilus_trader/adapters/unusual_whales/.
All numeric fields parse to Price / Quantity / Money, never Python floats. NaN values must be rejected at parse time, not smoothed downstream.
The adapter publishes via the DataEngine, which writes Cache then publishes - so a consumer’s on_uw_event handler can read the latest UW state from the cache by the time it runs.

MK2 anti-pattern this rules out: the parser-in-strategy pattern where cortanaroi/data/uw_*.py parses inside the same module that decides on signals. In Nautilus, parsing lives in the adapter and decision logic lives in the actor - and they communicate only through immutable messages on the bus.

Custom RiskEngine rule

The principle that bites: single-threaded core + fail-fast + no defensive checks in strategies.

The Cortana meta-prob gate (win-prob threshold) belongs in a custom risk rule, not inline in the strategy. Implications:

The rule runs on the single-threaded kernel inside RiskEngine. It must be fast (microseconds, not milliseconds).
It receives an immutable SubmitOrder command, reads the latest MetaProb from the cache, and either passes or returns OrderDenied. It never mutates the order.
If meta-prob is missing or NaN, fail-fast: deny the order and log. Don’t substitute a default and submit anyway.
Any meta-prob caching the rule needs (last update time, stale threshold) lives in the rule’s own context, not on the order.
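The gate logic in the list above can be sketched as a pure function. This is an assumption-heavy illustration: `ToyMetaProb` and `gate` are hypothetical, and how such a check is registered with the RiskEngine (if at all from Python) is one of the open questions for the spike.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyMetaProb:
    value: float   # win probability in [0, 1]
    ts_event: int  # nanoseconds


def gate(
    meta_prob: ToyMetaProb | None,
    now_ns: int,
    threshold: float,
    max_age_ns: int,
) -> tuple[bool, str]:
    """Return (allowed, reason). Never substitutes a default value."""
    if meta_prob is None:
        return (False, "meta-prob missing")
    if math.isnan(meta_prob.value):
        return (False, "meta-prob is NaN")
    if now_ns - meta_prob.ts_event > max_age_ns:
        return (False, "meta-prob stale")
    if meta_prob.value < threshold:
        return (False, "meta-prob below threshold")
    return (True, "ok")
```

The function takes the order out of the picture entirely: it reads inputs and returns a verdict, so there is nothing on the command for it to mutate. The threshold and staleness limit are arguments here, standing in for state the rule would keep in its own context.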

MK2 anti-pattern this rules out: the inline if score < threshold: return pattern scattered across position_manager.py and scoring_engine.py. In Nautilus, every order passes through the risk rule by construction. There is no path that silently bypasses it.

Multiple Actors (scoring, meta-prob, regime, brain logger)

The principle that bites: message immutability + clear ownership boundaries + component FSM.

Multiple actors form a chain: bars/quotes/UW events come in → ScoringActor publishes ScoreUpdate → MetaProbActor subscribes, publishes MetaProb → RegimeActor publishes Regime → CortanaStrategy consumes all three and decides. Implications:

Each actor is its own Component with on_start, on_stop, on_save, on_load. They don’t share state directly; they communicate only through immutable messages.
Actor-local hyperparameters (model weights, decay constants, stale thresholds) live in the actor’s own attributes, restored from on_load(state) after restart.
The brain-logger actor subscribes to PositionClosed and writes to ~/brain markdown via a queue (off-thread). It does not block the kernel. This is the “kick to background, re-enter via bus” pattern.

MK2 anti-pattern this rules out: the global singleton state pattern where multiple subsystems read/write a shared cooldown_state dict. In Nautilus, each subsystem is an actor with its own state, and the dispatch order is determined by the bus topic subscriptions, not by import order or global mutation.

One Strategy (`CortanaStrategy`)

The principle that bites: single-threaded core + OrderFactory discipline + on_save/on_load round-trip.

Implications:

on_quote_tick, on_bar, on_score_update, on_meta_prob, on_position_opened, on_position_closed all run on the kernel thread. Anything heavy (e.g., re-running an ML model) goes to a background thread or to a separate actor.
Order construction goes through OrderFactory and submit_order, not through hand-rolled order objects. The factory fills in trader ID, strategy ID, and timestamp consistently.
Strategy-local state (last entry time, current intended position, hedge state) is stored in instance attributes and serialized via on_save() -> dict[str, bytes] so it survives restart.
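A minimal sketch of the save/load round-trip, assuming only the `dict[str, bytes]` shape named above. `ToyStrategyState` is a hypothetical stand-in that does not inherit from the real Strategy class, and the JSON encoding is one possible choice, not a framework requirement.

```python
from __future__ import annotations

import json


class ToyStrategyState:
    """Stand-in for strategy-local state; not a Nautilus Strategy."""

    def __init__(self) -> None:
        self.last_entry_ns = 0
        self.intended_position = 0
        self.hedged = False

    def on_save(self) -> dict[str, bytes]:
        # Each value is encoded to bytes; keys are stable strings.
        return {
            "last_entry_ns": json.dumps(self.last_entry_ns).encode(),
            "intended_position": json.dumps(self.intended_position).encode(),
            "hedged": json.dumps(self.hedged).encode(),
        }

    def on_load(self, state: dict[str, bytes]) -> None:
        self.last_entry_ns = json.loads(state["last_entry_ns"].decode())
        self.intended_position = json.loads(state["intended_position"].decode())
        self.hedged = json.loads(state["hedged"].decode())


before = ToyStrategyState()
before.last_entry_ns = 1_700_000_000_000_000_000
before.intended_position = -2
before.hedged = True

after = ToyStrategyState()
after.on_load(before.on_save())  # simulate a restart
```

A round-trip test of this shape is cheap to keep in the suite and catches fields that were added to the strategy but forgotten in `on_save`.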

MK2 anti-pattern this rules out: the hand-rolled trade journal pattern where each strategy writes its own pickle file outside the framework. In Nautilus, the cache database is the persistence layer; on_save is the seam.

Backtest harness

The principle that bites: same kernel in backtest and live + replay determinism.

Implications:

We do not write a separate backtest harness. We use BacktestNode (or BacktestEngine for low-level control) and feed the same data shapes our live DataClient would emit.
Catalog data (Parquet) flows through the same DataEngine the live adapter would feed. Strategy code is unchanged.
Replay is bitwise deterministic given the same seed, data, and config. Cortana’s “replay 2026-04-16 chop day” tests become framework-level guarantees, not custom infrastructure.

MK2 anti-pattern this rules out: the parallel paper-vs-live pathway that has drifted multiple times in MK2. In Nautilus, there is no separate path, so this class of drift is ruled out by construction.

Reconciliation and broker-truth-first

The principle that bites: idempotent operations + externalized state + fail-fast + continuous reconciliation.

Implications:

Cortana does not write its own broker reconciler. The LiveExecutionEngine reconciles on startup and continuously.
Any Cortana-specific reconciliation (e.g., comparing UW account-flow with our position record) is also idempotent: rerunning it produces the same state.
All execution events persist to the cache database. After a restart the engine rebuilds state from broker reports + cache; Cortana inherits this for free.

MK2 anti-pattern this rules out: the Flex-query reconciler that runs on-demand and may or may not match what the engine thinks. In Nautilus, the reconciler is the engine - there is no parallel “truth” path.

Multi-tenant SaaS (Cody as customer #1)

The principle that bites: one node per process + message immutability across process boundaries.

Per nautilus-architecture.md: “Running multiple TradingNode or BacktestNode instances concurrently in the same process is not supported due to global singleton state.” Implications:

Each tenant gets its own process (a TradingNode instance).
Cross-tenant communication goes through Redis (the persistent MessageBus stream), and because messages are immutable, the in-process and cross-process semantics are identical.
Per-tenant cache databases isolate state.
The orchestrator restarts a tenant’s process on panic = abort without affecting other tenants.

MK2 anti-pattern this rules out: the “one big Python process with many strategies tenanted by tag” pattern. In Nautilus, that’s a process boundary.

Anti-patterns to never port from MK2

A one-table summary of the MK2 patterns the design principles rule out:

MK2 pattern	Principle it violates	Nautilus replacement
Mutating `ScoreState` dict in-place across subsystems	Message immutability	Publish a new `ScoreUpdate` per change
Parser-inside-strategy (UW JSON parsed in `signal_*.py`)	Ports and adapters	Custom `DataClient` adapter publishes typed `Data`
Inline `if score < threshold` in strategies	Single risk gate, fail-fast	Custom `RiskRule` in `RiskEngine`
Global `cooldown_state` dict shared between subsystems	Clear ownership boundaries	Actor-local state, persisted via `on_save`
Parallel paper/live pathways	Same kernel everywhere	One `BacktestNode` / `TradingNode` shape
Hand-rolled Flex reconciler running on demand	Idempotent reconciliation, externalized state	`LiveExecutionEngine` startup + continuous reconciliation
Pickle-file trade journal per strategy	Externalized state via cache database	`on_save` / `on_load` round-trip
Float-based price arithmetic	Type-safe domain model	`Price`, `Quantity`, `Money`
Unbounded `time.sleep` inside callbacks	Single-threaded core (don’t block the kernel)	Background actor + bus re-entry
Mutating an `OrderFilled` event to “fix” a partial fill	Temporal integrity	The event is a fact; emit a new event for the new fact
`params` dict on a message used as workflow scratchpad	Three ownership rules	Component-owned context keyed by request ID
Capturing log output to assert behavior in tests	Fragile coupling to logger globals (developer guide anti-pattern)	Verify observable behavior on the bus

Open questions for the 2026-05-09 spike

@customdataclass immutability enforcement. Does the Python @customdataclass decorator generate a frozen class (raises FrozenInstanceError on assignment), or does it rely on convention? If convention only, Cortana will need a lint or test to enforce.
Custom Data types and serialization. When Cortana’s custom ScoreUpdate rides the Redis stream for the dashboard, what’s the serialization codec (JSON or MessagePack), and does it preserve Decimal / Price precision?
Risk rule extension API. Is there a public hook to register a custom RiskRule from Python, or do all risk rules live inside the shipped RiskEngine config? (Per concepts: trading-state can be paused at the risk layer; per developer guide: pre-trade checks are central. Confirm extension surface.)
on_save payload size. Cortana’s strategy state could include a 78-column score history buffer. What’s the practical size limit for the dict[str, bytes] returned from on_save?
Component-owned context for high-frequency messages. The third ownership rule says workflow state lives in component-owned context keyed by message ID. For Cortana’s cache of last-N ScoreUpdate per instrument, what’s the canonical pattern - a dict on the actor, or a separate cache subscription?
Background-thread re-entry. When the brain-logger actor writes to ~/brain markdown, it must run off-thread. What’s the recommended pattern for “spawn a worker, write back to bus when done”? Does Nautilus expose a worker pool, or do we use Tokio (Rust) / asyncio.create_task (Python)?

Timeline

2026-05-07 | Cody - Filed during pre-spike concept mastery sweep batch 7 (developer guide).
