Nautilus Trader Architecture

The Nautilus /concepts/architecture/ page distills the platform into a single-threaded, event-driven kernel (NautilusKernel) that wires five sibling components - MessageBus, Cache, DataEngine, ExecutionEngine, RiskEngine - together with the Actor/Component traits that strategies extend into a uniform dispatch graph. The same kernel runs in backtest, sandbox, and live environment contexts; a ports-and-adapters seam lets venues, data sources, risk rules, and strategies plug in without touching the spine. Quality attributes are listed in priority order: reliability, performance, modularity, testability, maintainability, deployability. Crash-only design and a strict fail-fast policy on data integrity are framework-level commitments, not strategy concerns. Background I/O runs on its own threads (Tokio, REST, WebSocket, persistence) but every result lands back on the single-threaded kernel through MPSC channels - so the trading core never has to reason about lock ordering. This page focuses on the runtime topology view; the deeper component-by-component reference lives in nautilus-concepts.md.

Why this page exists

nautilus-concepts.md is the canonical, exhaustive walkthrough of every component (Kernel, MessageBus, Cache, DataEngine, ExecEngine, RiskEngine, Strategy, Orders, Positions, Accounting, Portfolio, Time, Backtest, Live, Adapters, Logging, Persistence, DST). This page is narrower: it covers what the standalone /concepts/architecture/ doc covers - the topology diagram, the runtime dispatch model, the six-step data flow, the five-step execution flow, the live-vs-backtest equivalence claim, and the framework’s layering - without re-deriving the per-component contracts. Read this for the map; read nautilus-concepts.md for the terrain.

Design philosophy

The architecture page opens with a deliberate ordering of quality attributes - “roughly in order of weighting”:

  1. Reliability
  2. Performance
  3. Modularity
  4. Testability
  5. Maintainability
  6. Deployability

This ordering is load-bearing. Reliability before performance means Nautilus will accept synchronous validation in a hot path rather than skip a check. Modularity before testability means components are decoupled enough that test doubles substitute cleanly. Deployability last means ops ergonomics are the last thing to override the others - never the first.

Architectural techniques the framework explicitly leans on:

  • Domain-driven design (DDD). Trading concepts (Order, Position, Account, Instrument) are first-class types, not dictionaries.
  • Event-driven architecture. Components communicate via events on a bus, not through direct method calls.
  • Messaging patterns - Pub/Sub, Req/Rep, point-to-point - built into the bus rather than reinvented per component.
  • Ports and adapters (hexagonal). The kernel defines interfaces; venues and data providers implement them.
  • Crash-only design. Startup and crash recovery share a code path, so the recovery path is exercised on every start rather than only after a failure. Graceful stop/dispose flows still exist for normal operation.

Assurance-driven engineering

“NautilusTrader is incrementally adopting a high-assurance mindset: critical code paths should carry executable invariants that verify behaviour matches the business requirements.”

The practical recipe the doc spells out:

  • Identify components whose failure has the highest blast radius (core domain types, risk and execution flows).
  • Write down their invariants in plain language.
  • Codify them as executable checks (unit, property, fuzz, static asserts) that run in CI.
  • Lean on Rust’s zero-cost safety (ownership, Result, panic = abort); add formal tools only where they pay for themselves.
  • Track “assurance debt” alongside feature work so new integrations extend the safety net rather than bypass it.

This is the architectural posture that makes “crash-only” actually safe - if the invariants are real, panicking on violation is preferable to corrupting state.
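As an illustration of the "plain language, then executable check" step, here is a minimal sketch. The `OrderSnapshot` type and its field names are hypothetical and are not Nautilus types; the point is only that an invariant written as a sentence becomes a function that unit, property, or fuzz tests can call.

```rust
/// Hypothetical order snapshot, for illustration only (not a Nautilus type).
#[derive(Debug, Clone, Copy)]
pub struct OrderSnapshot {
    pub quantity: u64,
    pub filled_qty: u64,
    pub leaves_qty: u64,
}

/// Plain-language invariant: "filled plus leaves always equals the order quantity".
pub fn check_fill_invariant(o: &OrderSnapshot) -> Result<(), String> {
    match o.filled_qty.checked_add(o.leaves_qty) {
        Some(total) if total == o.quantity => Ok(()),
        Some(total) => Err(format!("filled + leaves = {}, expected {}", total, o.quantity)),
        None => Err("filled + leaves overflowed u64".to_string()),
    }
}

/// Applies a fill and re-checks the invariant, panicking on violation.
pub fn apply_fill(mut o: OrderSnapshot, fill_qty: u64) -> OrderSnapshot {
    assert!(
        fill_qty <= o.leaves_qty,
        "fill {} exceeds leaves {}",
        fill_qty,
        o.leaves_qty
    );
    o.filled_qty += fill_qty;
    o.leaves_qty -= fill_qty;
    check_fill_invariant(&o).expect("fill invariant violated");
    o
}
```

The checker returns a Result so tests can probe bad states without crashing, while the mutation path panics, which matches the fail-fast posture for invariant violations.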

Crash-only design

Five principles from the doc:

  1. Unified recovery path - “Startup and crash recovery share the same code path, ensuring it is well-tested.” There is no untested off-ramp that only runs when something goes wrong.
  2. Externalized state - “Critical state is meant to be persisted externally when configured, reducing data-loss risk; durability depends on the backing store.” Redis is the canonical backing store.
  3. Fast restart - designed to come back up quickly, minimizing downtime.
  4. Idempotent operations - operations are safe to retry after restart.
  5. Fail-fast for unrecoverable errors - data corruption or invariant violations terminate immediately rather than continue in a compromised state.

“The system does provide graceful shutdown flows (stop, dispose) for normal operation. These tear down clients, persist state, and flush writers. The crash-only philosophy applies specifically to unrecoverable faults where attempting graceful cleanup could cause further damage.”

In production, release builds are typically configured with panic = abort so that a process supervisor can restart the process cleanly.
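Principles 1 and 4 can be sketched together. The following is a simplified model, not Nautilus code: `FillEvent`, `PositionState`, and the idea of de-duplicating by event id are assumptions chosen to show one recovery function serving both cold start and post-crash restart, with replay that is safe to repeat.

```rust
use std::collections::HashSet;

/// Hypothetical persisted event: a unique id plus a signed position change.
#[derive(Debug, Clone, Copy)]
pub struct FillEvent {
    pub event_id: u64,
    pub qty_delta: i64,
}

/// In-memory state rebuilt from the external store on every start.
#[derive(Debug, Default)]
pub struct PositionState {
    pub net_qty: i64,
    applied: HashSet<u64>,
}

impl PositionState {
    /// Idempotent apply: an event id seen before is ignored.
    /// Returns true only if the state changed.
    pub fn apply(&mut self, ev: FillEvent) -> bool {
        if !self.applied.insert(ev.event_id) {
            return false;
        }
        self.net_qty += ev.qty_delta;
        true
    }

    /// The single startup path, used for a cold start and after a crash.
    pub fn recover(persisted: &[FillEvent]) -> Self {
        let mut state = PositionState::default();
        for ev in persisted {
            state.apply(*ev);
        }
        state
    }
}
```

Because `recover` is the only way state is built, a restart after a crash exercises exactly the code that ran at the last normal start, and a duplicated event in the store does not double-count.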

Data integrity & fail-fast policy

“NautilusTrader prioritizes data integrity over availability for trading operations. The system employs a strict fail-fast policy for arithmetic operations and data handling to prevent silent data corruption that could lead to incorrect trading decisions.”

Rationale, verbatim: “In trading systems, corrupt data is worse than no data. A single incorrect price, timestamp, or quantity can cascade through the system, resulting in: incorrect position sizing or risk calculations, orders placed at wrong prices, backtests producing misleading results, silent financial losses.”

When fail-fast (panic) applies:

  • Programmer errors (logic bugs, incorrect API usage).
  • Data that violates fundamental invariants (negative timestamps, NaN prices).
  • Arithmetic that would silently produce incorrect results.

When Result/Option applies instead:

  • Expected runtime failures (network errors, file I/O).
  • Business logic validation (order constraints, risk limits).
  • User input validation.
  • Library APIs exposed to downstream crates where callers need explicit error handling without relying on panics for control flow.

Code shape examples from the doc:

// CORRECT: Panics on overflow - prevents data corruption
let total_ns = timestamp1 + timestamp2; // Panics if result > u64::MAX
 
// CORRECT: Rejects NaN during deserialization
let price = serde_json::from_str("NaN"); // Error: "must be finite"
 
// CORRECT: Explicit overflow handling when needed
let total_ns = timestamp1.checked_add(timestamp2)?; // Returns Option<UnixNanos>

Stated benefits: no silent corruption, immediate feedback during dev/test, clean audit trail in crash logs, deterministic behavior given deterministic inputs.
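The split between the two lists can be made concrete with a std-only sketch. `Nanos` and `parse_price` are hypothetical stand-ins (not the Nautilus `UnixNanos` or `Price` types): overflow in timestamp arithmetic is treated as a bug and panics, while bad user input is an expected failure and comes back as a Result.

```rust
/// Hypothetical stand-in for a nanosecond timestamp (not the Nautilus UnixNanos type).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Nanos(pub u64);

impl Nanos {
    /// Fail-fast path: overflow is treated as a bug and panics in every build profile.
    pub fn add_or_panic(self, rhs: Nanos) -> Nanos {
        Nanos(self.0.checked_add(rhs.0).expect("timestamp overflow"))
    }

    /// Explicit path: the caller decides what overflow means.
    pub fn checked_add(self, rhs: Nanos) -> Option<Nanos> {
        self.0.checked_add(rhs.0).map(Nanos)
    }
}

/// Expected runtime failure (bad input): return an error instead of panicking.
pub fn parse_price(text: &str) -> Result<f64, String> {
    let value: f64 = text
        .trim()
        .parse()
        .map_err(|e| format!("invalid price {:?}: {}", text, e))?;
    // Rust's f64 parser accepts "NaN" and "inf", so finiteness is checked explicitly.
    if !value.is_finite() {
        return Err(format!("price must be finite, got {}", value));
    }
    Ok(value)
}
```

Note that a bare `+` on `u64` only panics on overflow when overflow checks are enabled, which is why this sketch routes the fail-fast path through `checked_add` plus `expect`.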

System architecture - the six core components

The architecture diagram places NautilusKernel at the center with five sibling components arranged around it. Each is summarized here at the level the architecture doc describes them; the exhaustive contracts are in nautilus-concepts.md.

NautilusKernel

“The central orchestration component responsible for: initializing and managing all system components, configuring the messaging infrastructure, maintaining environment-specific behaviors, coordinating shared resources and lifecycle management, providing a unified entry point for system operations.”

The kernel is the single object that exists in all three environment contexts - backtest, sandbox, live. That singularity is the mechanical reason backtest and live behave the same: same wiring, same dispatch, same lifecycle.

MessageBus

“The backbone of inter-component communication, implementing: Publish/Subscribe patterns (for broadcasting events and data to multiple consumers), Request/Response communication (for operations requiring acknowledgment), Command/Event messaging (for triggering actions and notifying state changes), Optional state persistence (using Redis for durability and restart capabilities).”

Three messaging patterns × three message categories (Data, Events, Commands). Bus instances are thread-local; cross-thread delivery uses MPSC channels that funnel to the single-threaded core.

Cache

“High-performance in-memory storage system that: stores instruments, accounts, orders, positions, and more, provides performant fetching capabilities for trading components, maintains consistent state across the system, supports both read and write operations with optimized access patterns.”

The cache-then-publish ordering (Cache write before MessageBus publish for quotes/trades/bars) is what lets handlers safely read “the latest value” inside their callback.

DataEngine

“Processes and routes market data throughout the system: handles multiple data types (quotes, trades, bars, order books, custom data, and more), routes data to appropriate consumers based on subscriptions, manages data flow from external sources to internal components.”

The DataEngine is venue-agnostic; venue-specific work happens upstream in the DataClient adapter.

ExecutionEngine

“Manages order lifecycle and execution: routes trading commands to the appropriate adapter clients, tracks order and position states, coordinates with risk management systems, handles execution reports and fills from venues, handles reconciliation of external execution state.”

Reconciliation is only on the live side - the simulator owns its own truth in backtest.

RiskEngine

“Provides risk management: pre-trade risk checks and validation, position and exposure monitoring, real-time risk calculations, configurable risk rules and limits.”

The RiskEngine sits in front of the ExecutionEngine: a command must pass pre-trade checks before it is routed to an ExecutionClient. Strategies don’t validate themselves; the RiskEngine does, centrally.

Environment contexts

Three contexts, same kernel:

  • Backtest - historical data with simulated venues. Deterministic.
  • Sandbox - real-time data with simulated venues (paper-trading staging).
  • Live - real-time data with live venues (paper or real account).

“The platform has been designed to share as much common code between backtest, sandbox and live trading systems as possible. This is formalized in the system subpackage, where you will find the NautilusKernel class, providing a common core system ‘kernel’.”

The doc names the technique that makes this work: ports and adapters. The core defines abstract ports (DataClient, ExecutionClient, RiskRule, …); implementations adapt them to backtest fixtures, sandbox simulators, or live venues.
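A minimal sketch of the ports-and-adapters idea, under stated assumptions: `QuoteSource` is a hypothetical port (much narrower than a real DataClient), and the two adapters are toy stand-ins for a backtest fixture and a live feed. The core function is written once against the trait and never learns which adapter it was given.

```rust
/// Hypothetical port: the only thing the core knows about a data source.
pub trait QuoteSource {
    /// Returns the next (bid, ask) pair, or None when the source is exhausted.
    fn next_quote(&mut self) -> Option<(f64, f64)>;
}

/// Backtest-style adapter: replays recorded quotes.
pub struct ReplaySource {
    quotes: std::vec::IntoIter<(f64, f64)>,
}

impl ReplaySource {
    pub fn new(quotes: Vec<(f64, f64)>) -> Self {
        ReplaySource { quotes: quotes.into_iter() }
    }
}

impl QuoteSource for ReplaySource {
    fn next_quote(&mut self) -> Option<(f64, f64)> {
        self.quotes.next()
    }
}

/// Stand-in for a live adapter: yields one fixed quote, then ends.
pub struct OneShotSource {
    pending: Option<(f64, f64)>,
}

impl OneShotSource {
    pub fn new(bid: f64, ask: f64) -> Self {
        OneShotSource { pending: Some((bid, ask)) }
    }
}

impl QuoteSource for OneShotSource {
    fn next_quote(&mut self) -> Option<(f64, f64)> {
        self.pending.take()
    }
}

/// Core logic written once against the port, unaware of the adapter behind it.
pub fn mid_prices(source: &mut dyn QuoteSource) -> Vec<f64> {
    let mut mids = Vec::new();
    while let Some((bid, ask)) = source.next_quote() {
        mids.push((bid + ask) / 2.0);
    }
    mids
}
```

Swapping `ReplaySource` for `OneShotSource` changes the environment without touching `mid_prices`, which is the same substitution the kernel relies on across backtest, sandbox, and live.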

Data flow - Life of a Quote Tick

Six steps, exactly as the architecture page lays them out:

  1. Adapter receives raw data. “A venue-specific DataClient (e.g. Binance, Bybit) receives a WebSocket message, parses it, and constructs a QuoteTick.”
  2. Adapter sends a data event. “The adapter sends DataEvent::Data(Data::Quote(quote)) through an MPSC channel. In live mode this is an async unbounded channel; in backtests the engine feeds data directly.”
  3. DataEngine processes the event. “The channel receiver routes the event to DataEngine::process_data, which dispatches to handle_quote.”
  4. Cache stores the quote. “handle_quote writes the quote into the Cache via cache.add_quote(quote), making it available to any component through self.cache.quote_tick(instrument_id).”
  5. MessageBus publishes. “The engine publishes the quote on a topic derived from the instrument ID (e.g. data.quotes.BINANCE.BTCUSDT-PERP). The MessageBus finds all handlers subscribed to that topic.”
  6. Strategy handler fires. “Each subscribed strategy’s on_quote_tick(quote) runs on the single-threaded kernel. The quote is already in the cache before the handler executes, so self.cache.quote_tick(instrument_id) returns the same quote.”

The cache-then-publish invariant is what makes step 6 safe to reason about.

“For quotes, trades, and bars the cache-then-publish order means your strategy handler can always read the latest value from the cache. Order book deltas and depth snapshots are published directly; book state is maintained separately through BookUpdater subscriptions.”
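Steps 4 to 6 can be modelled in a few lines. This is a toy sketch, not the Nautilus implementation: `Cache`, `Bus`, `Handler`, and the topic format are simplified assumptions. What it demonstrates is the ordering itself: the cache write happens before the publish, so a handler that reads the cache during its callback sees the quote it was just handed.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Quote {
    pub bid: f64,
    pub ask: f64,
}

/// Toy cache keyed by instrument id.
#[derive(Default)]
pub struct Cache {
    quotes: HashMap<String, Quote>,
}

impl Cache {
    pub fn add_quote(&mut self, instrument: &str, quote: Quote) {
        self.quotes.insert(instrument.to_string(), quote);
    }

    pub fn quote(&self, instrument: &str) -> Option<Quote> {
        self.quotes.get(instrument).copied()
    }
}

/// Handlers receive the published quote plus read access to the cache.
pub type Handler = Box<dyn FnMut(&Quote, &Cache)>;

/// Toy topic-based bus; everything runs on the caller's thread.
#[derive(Default)]
pub struct Bus {
    subs: HashMap<String, Vec<Handler>>,
}

impl Bus {
    pub fn subscribe(&mut self, topic: &str, handler: Handler) {
        self.subs.entry(topic.to_string()).or_default().push(handler);
    }

    pub fn publish(&mut self, topic: &str, quote: &Quote, cache: &Cache) {
        if let Some(handlers) = self.subs.get_mut(topic) {
            for handler in handlers.iter_mut() {
                handler(quote, cache);
            }
        }
    }
}

/// Cache-then-publish: the write completes before any handler runs.
pub fn handle_quote(cache: &mut Cache, bus: &mut Bus, instrument: &str, quote: Quote) {
    cache.add_quote(instrument, quote);
    let topic = format!("data.quotes.{}", instrument);
    bus.publish(&topic, &quote, cache);
}
```

If the two lines in `handle_quote` were swapped, a handler reading the cache would see the previous quote (or none), which is the stale-read class of bug the ordering rules out for quotes, trades, and bars.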

Execution flow - Life of an Order

Five steps:

  1. Strategy creates a command. self.submit_order(order).
  2. RiskEngine validates. “Pre-trade checks run (position limits, notional limits, order rate). If a check fails the strategy receives OrderDenied and the order never reaches the venue.”
  3. ExecutionEngine routes. “The command is routed to the ExecutionClient for the target venue.”
  4. ExecutionClient submits. “The adapter sends the order to the venue over REST or WebSocket.”
  5. Events flow back. “The venue responds with acknowledgments and fills. Each event (Accepted, Filled, Canceled, Rejected, Expired) flows back through the ExecutionEngine, which updates order state in the Cache and delivers the event to the strategy’s handler. Fill events also trigger position and portfolio updates.”

The asymmetry to internalize: data flows DataClient → DataEngine → Cache → MessageBus → Strategy; orders flow Strategy → RiskEngine → ExecutionEngine → ExecutionClient → Venue, with events flowing back Venue → ExecutionClient → ExecutionEngine → Strategy, updating Cache and Portfolio on the way.
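The gate in step 2 can be sketched as follows. All names here (`SubmitOrder`, `RiskLimits`, `OrderEvent`, the `outbox` vector standing in for the venue connection) are illustrative assumptions rather than the framework's types; the sketch only shows that a denied command produces an event for the strategy and never reaches the venue side.

```rust
/// Hypothetical command shape, for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub struct SubmitOrder {
    pub venue: &'static str,
    pub qty: u64,
    pub price: f64,
}

/// Hypothetical event shape returned to the strategy.
#[derive(Debug, Clone, PartialEq)]
pub enum OrderEvent {
    Denied(String),
    Submitted,
}

/// Illustrative limits; a real rule set is configurable and richer.
pub struct RiskLimits {
    pub max_qty: u64,
    pub max_notional: f64,
}

/// Step 2: pre-trade checks. Business validation returns Err rather than panicking.
pub fn risk_check(cmd: &SubmitOrder, limits: &RiskLimits) -> Result<(), String> {
    if cmd.qty > limits.max_qty {
        return Err(format!("qty {} exceeds max {}", cmd.qty, limits.max_qty));
    }
    let notional = cmd.qty as f64 * cmd.price;
    if notional > limits.max_notional {
        return Err(format!("notional {} exceeds max {}", notional, limits.max_notional));
    }
    Ok(())
}

/// Steps 2 to 4: a denied command never reaches the venue outbox.
pub fn submit_order(
    cmd: SubmitOrder,
    limits: &RiskLimits,
    outbox: &mut Vec<SubmitOrder>,
) -> OrderEvent {
    match risk_check(&cmd, limits) {
        Err(reason) => OrderEvent::Denied(reason),
        Ok(()) => {
            // Stands in for routing to the venue's ExecutionClient.
            outbox.push(cmd);
            OrderEvent::Submitted
        }
    }
}
```

Risk limits are business validation, so this path uses Result rather than a panic, consistent with the fail-fast policy described earlier.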

Component state management

Components live on a finite state machine.

Stable states:

  • PRE_INITIALIZED - instantiated but not ready.
  • READY - configured, can be started.
  • RUNNING - operating normally.
  • STOPPED - successfully stopped.
  • DEGRADED - degraded; may not meet full spec.
  • FAULTED - shut down due to detected fault.
  • DISPOSED - shut down and released resources.

Transitional states (brief intermediate steps): STARTING, STOPPING, RESUMING, RESETTING, DISPOSING, DEGRADING, FAULTING.

“Transitional states are brief intermediate states that occur during state transitions. Components should not remain in transitional states for extended periods.”

This explicit FSM is what makes restart sequences testable; you can assert “after start(), the component must be RUNNING within N ms.”
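A sketch of what such a state machine could look like, covering only a subset of the states listed above. The trigger names and the exact transition table are assumptions made for illustration; the architecture page names the states but this note does not reproduce its transition rules.

```rust
/// Subset of the component states, for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    PreInitialized,
    Ready,
    Starting,
    Running,
    Stopping,
    Stopped,
    Faulting,
    Faulted,
}

/// Hypothetical triggers; the names are assumptions, not the framework's.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Trigger {
    Initialize,
    Start,
    StartCompleted,
    Stop,
    StopCompleted,
    Fault,
    FaultCompleted,
}

/// Returns the next state, or an error for any transition not in the table.
pub fn transition(state: State, trigger: Trigger) -> Result<State, String> {
    match (state, trigger) {
        (State::PreInitialized, Trigger::Initialize) => Ok(State::Ready),
        (State::Ready, Trigger::Start) => Ok(State::Starting),
        (State::Starting, Trigger::StartCompleted) => Ok(State::Running),
        (State::Running, Trigger::Stop) => Ok(State::Stopping),
        (State::Stopping, Trigger::StopCompleted) => Ok(State::Stopped),
        (State::Running, Trigger::Fault) => Ok(State::Faulting),
        (State::Faulting, Trigger::FaultCompleted) => Ok(State::Faulted),
        (s, t) => Err(format!("invalid transition: {:?} on {:?}", s, t)),
    }
}

/// Transitional states are the brief intermediate ones.
pub fn is_transitional(state: State) -> bool {
    matches!(state, State::Starting | State::Stopping | State::Faulting)
}
```

With the table expressed as a pure function, a restart sequence becomes a list of triggers whose end state a test can assert directly.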

Actor vs Component traits

Two distinct traits, each with one job:

Actor trait - Message dispatch:

  • “Provides the handle method for receiving messages dispatched through the actor registry.”
  • “Enables type-safe lookup and message dispatch by actor ID.”
  • “Used by components that need to receive targeted messages (strategies, throttlers).”

Component trait - Lifecycle management:

  • “Manages state transitions (start, stop, reset, dispose).”
  • “Provides registration with the system kernel (register).”
  • “Tracks component state via the finite state machine described above.”
  • “Used by all system components that need lifecycle management.”

“All components can publish and subscribe to messages via the MessageBus directly - this is independent of the Actor trait. The Actor trait specifically enables the registry-based message dispatch pattern where messages are routed to a specific actor by ID.”

Three combinations exist in practice:

  • Actor-only - lightweight message handlers without lifecycle (e.g., Throttler).
  • Component-only - system infrastructure with lifecycle but using direct bus pub/sub (e.g., DataEngine, ExecutionEngine).
  • Both - trading strategies that need lifecycle and targeted dispatch.

A Strategy is the canonical “both” - it is registered, has lifecycle hooks, and receives targeted callbacks like on_quote_tick.
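The separation of the two jobs can be sketched with two small traits. Everything below is hypothetical: the trait method sets are far smaller than the real ones, and `ActorRegistry`, `DemoStrategy`, and the id strings are invented for the example. It shows an actor-only type, a type implementing both traits, and dispatch to a specific actor by id.

```rust
use std::collections::HashMap;

/// Hypothetical trait: targeted message dispatch by actor id.
pub trait Actor {
    fn id(&self) -> String;
    fn handle(&mut self, msg: &str) -> String;
}

/// Hypothetical trait: lifecycle management only.
pub trait Component {
    fn start(&mut self);
    fn stop(&mut self);
    fn is_running(&self) -> bool;
}

/// Actor-only: handles messages, has no lifecycle.
pub struct Throttler {
    pub sent: u32,
    pub limit: u32,
}

impl Actor for Throttler {
    fn id(&self) -> String {
        "Throttler-001".to_string()
    }
    fn handle(&mut self, msg: &str) -> String {
        if self.sent < self.limit {
            self.sent += 1;
            format!("pass:{}", msg)
        } else {
            format!("drop:{}", msg)
        }
    }
}

/// Both traits: lifecycle plus targeted dispatch.
pub struct DemoStrategy {
    running: bool,
}

impl DemoStrategy {
    pub fn new() -> Self {
        DemoStrategy { running: false }
    }
}

impl Component for DemoStrategy {
    fn start(&mut self) {
        self.running = true;
    }
    fn stop(&mut self) {
        self.running = false;
    }
    fn is_running(&self) -> bool {
        self.running
    }
}

impl Actor for DemoStrategy {
    fn id(&self) -> String {
        "Strategy-001".to_string()
    }
    fn handle(&mut self, msg: &str) -> String {
        if self.running {
            format!("handled:{}", msg)
        } else {
            format!("ignored:{}", msg)
        }
    }
}

/// Registry that routes a message to one actor by id.
#[derive(Default)]
pub struct ActorRegistry {
    actors: HashMap<String, Box<dyn Actor>>,
}

impl ActorRegistry {
    pub fn register(&mut self, actor: Box<dyn Actor>) {
        self.actors.insert(actor.id(), actor);
    }

    pub fn send(&mut self, id: &str, msg: &str) -> Option<String> {
        self.actors.get_mut(id).map(|actor| actor.handle(msg))
    }
}
```

The registry only requires `Actor`, so lifecycle stays an independent concern, mirroring the three combinations listed above.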

Threading model

The headline:

“Within a node, the kernel consumes and dispatches messages on a single thread. The kernel encompasses: the MessageBus and actor callback dispatch, strategy logic and order management, risk engine checks and execution coordination, cache reads and writes.”

That single-thread guarantee is what gives Nautilus deterministic event ordering and backtest-live parity (modulo live latency).

“This single-threaded core provides deterministic event ordering and helps maintain backtest-live parity, though live inputs and latency can still cause behavioral differences.”

The doc explicitly cites the LMAX disruptor lineage: “Of interest is the LMAX exchange architecture, which achieves award winning performance running on a single thread.” Nautilus’s single-thread bus is that pattern, applied to trading-engine plumbing.

Background services run on separate threads:

  • Network I/O - WebSocket connections, REST clients, async data feeds.
  • Persistence - DataFusion queries and database operations on a multi-threaded Tokio runtime.
  • Adapters - async adapter operations via thread pool executors.

Cross-thread results land back on the kernel via channels:

“These services communicate results back to the kernel via the MessageBus. The bus itself is thread-local, so each thread has its own instance, with cross-thread communication occurring through channels that ultimately deliver events to the single-threaded core.”
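The funnel can be illustrated with the standard library alone. This sketch uses `std::sync::mpsc` and OS threads as a stand-in for the async channels and Tokio runtime the doc describes, and `run_kernel` is a hypothetical name. Worker threads produce events concurrently; a single consumer owns all state and applies events one at a time, so no locks are needed on the consuming side.

```rust
use std::sync::mpsc;
use std::thread;

/// Spawns `workers` background producers, each sending `per_worker` events,
/// and consumes every event on the calling thread (the "kernel").
/// Returns the events in the order the kernel observed them.
pub fn run_kernel(workers: u32, per_worker: u32) -> Vec<(u32, u32)> {
    let (tx, rx) = mpsc::channel();
    let mut handles = Vec::new();

    for id in 0..workers {
        let tx = tx.clone();
        handles.push(thread::spawn(move || {
            for seq in 0..per_worker {
                // Stands in for a parsed WebSocket message or REST response.
                tx.send((id, seq)).expect("kernel receiver dropped");
            }
        }));
    }
    // Drop the original sender so the receive loop ends when all workers finish.
    drop(tx);

    // State owned by the kernel thread only: plain Vec, no Mutex.
    let mut observed = Vec::new();
    for event in rx {
        observed.push(event);
    }

    for handle in handles {
        handle.join().expect("worker panicked");
    }
    observed
}
```

Interleaving between workers varies from run to run, but each worker's own events arrive in the order it sent them, and the kernel processes them strictly one at a time.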

Framework organization

Three layers:

Core / Low-Level

  • core - constants, low-level functions, framework-wide primitives.
  • common - common parts for assembling components.
  • network - base networking client components.
  • serialization - base serializers.
  • model - the rich trading domain model.

Components

  • accounting - account types and management.
  • adapters - venue/broker integrations.
  • analysis - trading performance statistics.
  • cache - caching infrastructure.
  • data - data stack and tooling.
  • execution - execution stack.
  • indicators - efficient indicators and analyzers.
  • persistence - storage and cataloging (mainly for backtesting).
  • portfolio - portfolio management.
  • risk - risk components and tooling.
  • trading - trading domain components.

System Implementations

  • backtest - backtest engine and node.
  • live - live engine, clients, and node.
  • system - the core kernel common across environment contexts.

Code structure - Rust crates and Python bindings

“The foundation of the codebase is the crates directory, containing a collection of Rust crates including a C foreign function interface (FFI) generated by cbindgen.”

“The bulk of the production code resides in the nautilus_trader directory, which contains a collection of Python/Cython subpackages and modules.”

How they connect:

“Python bindings for the Rust core are provided by statically linking the Rust libraries to the C extension modules generated by Cython at compile time (effectively extending the CPython API).”

“Both Rust and Cython are build dependencies. The binary wheels produced from a build do not require Rust or Cython to be installed at runtime.”

End users pip install nautilus_trader - no Rust toolchain required. Rust is for contributors and source builds.

Rust crate categories

Category       | Crates                                            | Purpose
Foundation     | core, model, common, system, trading              | Primitives, domain model, kernel, actor & strategy base
Engines        | data, execution, portfolio, risk                  | Core trading engine components
Infrastructure | serialization, network, cryptography, persistence | Encoding, networking, signing, storage
Runtime        | live, backtest                                    | Environment-specific node implementations
External       | adapters/*                                        | Venue and data integrations
Bindings       | pyo3                                              | Python bindings

Feature flags

Feature   | Crates              | Effect
streaming | data, system, live  | Enables persistence dependency for catalog streaming
cloud     | persistence         | Enables S3, Azure, GCP, HTTP storage backends
python    | most crates         | Enables PyO3 bindings (auto-enables streaming, cloud)
defi      | common, model, data | Enables DeFi/blockchain data types

Type safety

Rust side: “The Rust codebase under crates/ relies on the rustc compiler’s guarantees for safe code. Any unsafe blocks are explicit opt-outs where we must uphold the required invariants ourselves.”

Cython side: “Cython provides type safety at the C level at both compile time, and runtime: If you pass an argument with an invalid type to a Cython implemented module with typed parameters, then you will receive a TypeError at runtime. If a function or method’s parameter is not explicitly typed to accept None, passing None as an argument will result in a ValueError at runtime.”

The doc notes this is not documented in every docstring to avoid bloat - the type-error behavior is a framework-level convention.

What’s pluggable vs. fixed

Fixed (the kernel spine):

  • Single-threaded message dispatch and event ordering.
  • Cache read/write operations.
  • Risk validation pipeline.
  • Component state-machine transitions.

Pluggable via ports and adapters:

  • DataClient implementations (venue-specific data adapters).
  • ExecutionClient implementations (venue-specific order routers).
  • Custom strategies (as Actor / Component implementations).
  • Storage backends (Redis optional persistence).
  • Risk-rule implementations.
  • Custom data types.

Extensible mechanisms:

  • MessageBus pub/sub topics for arbitrary custom events.
  • Component registration with lifecycle hooks.
  • Actor trait for targeted message dispatch.
  • Feature flags (streaming, cloud, defi, python).

The seam is clean: if you can write a DataClient/ExecutionClient (Rust core + PyO3 binding + Python factory - see nautilus-developer-guide.md), you have wired in a venue. If you can write an Actor or Strategy subclass, you have wired in a behavior. The kernel never has to know.

Processes and threads - operational constraints

The hard rule:

“Running multiple TradingNode or BacktestNode instances concurrently in the same process is not supported due to global singleton state.”

Specific singletons:

  • Backtest force-stop flag - _FORCE_STOP is process-global.
  • Logger mode and timestamps - global state; backtests flip between static and real-time modes.
  • Runtime singletons - global Tokio runtime, callback registries, and other OnceLock instances are process-wide.

What is supported:

  • Sequential execution of multiple nodes (one after another with proper disposal between runs) - fully supported and used in the test suite.
  • For production: “add multiple strategies to a single TradingNode within a process. For parallel execution or workload isolation, run each node in its own separate process.”

Production process termination:

“In production deployments, the system is typically configured with panic = abort in release builds, ensuring that any panic results in a clean process termination that can be handled by process supervisors or orchestration systems. This aligns with the crash-only design principle, where unrecoverable errors lead to immediate restart rather than attempting to continue in a potentially corrupted state.”

So: one node per process; multi-strategy per node; orchestrator restarts on panic. This is the deployable shape.
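Why process-wide singletons force the one-node-per-process rule can be seen in a small sketch. `LOGGER_MODE` and the function names are hypothetical; the only real API used is `std::sync::OnceLock`. The first initializer wins for the life of the process, so a second node in the same process cannot install a different setting.

```rust
use std::sync::OnceLock;

/// Process-wide slot, analogous in spirit to the singletons listed above.
static LOGGER_MODE: OnceLock<&'static str> = OnceLock::new();

/// The first caller wins. Later callers get Err carrying the mode already installed.
pub fn init_logger_mode(mode: &'static str) -> Result<(), &'static str> {
    LOGGER_MODE.set(mode).map_err(|_rejected| {
        *LOGGER_MODE
            .get()
            .expect("set failed, so a value is present")
    })
}

/// Reads the installed mode, if any.
pub fn current_mode() -> Option<&'static str> {
    LOGGER_MODE.get().copied()
}
```

A backtest node and a live node started in one process would both call the initializer, and whichever ran second would silently inherit the other's mode unless it checked the result. Separate processes each get their own slot.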

Live-vs-backtest equivalence - the structural argument

The architecture page makes the equivalence claim explicit and explains the mechanism:

“The platform has been designed to share as much common code between backtest, sandbox and live trading systems as possible. This is formalized in the system subpackage, where you will find the NautilusKernel class, providing a common core system ‘kernel’.”

“The ports and adapters architectural style enables modular components to be integrated into the core system, providing various hooks for user-defined or custom component implementations.”

The claim’s components, made concrete:

  1. Same Kernel object in all three environment contexts.
  2. Same MessageBus - same topics, same pub/sub semantics, same dispatch pattern.
  3. Same Cache - same read/write API, same cache-then-publish ordering.
  4. Same Strategy/Actor lifecycle - on_start, on_quote_tick, on_order_filled, etc., trigger the same way.
  5. Same Clock interface - strategies cannot tell whether they are in simulated or wall-clock time (this is documented further in nautilus-concepts.md).
  6. Same OMS/account/position/order semantics - just with a simulated venue swapped for a real one.
  7. Reconciliation only on the live side - because the simulator owns its own truth in backtest.

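The architectural punchline: structural backtest-vs-live divergence is prevented by construction, not by discipline. There is no “research path” and “production path” to drift apart, because both run the same kernel. What remains are the behavioral differences that live inputs and latency can still cause, as the threading-model quote above concedes.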

Cortana MK3 implications

Concrete mapping from MK2 components to Nautilus boxes, plus what’s gained and given up.

MK2 → Nautilus translation

MK2 component                                            | Nautilus equivalent
Hand-rolled Python event loop                            | NautilusKernel (single-threaded core)
Custom pub/sub between modules                           | MessageBus (typed pub/sub on topics)
cortanaroi/db/* + ad-hoc dicts                           | Cache (in-memory + optional Redis)
cortanaroi/data/uw_*.py, ibkr_*.py parsers               | DataClient adapter (Rust core + PyO3 + Python factory)
cortanaroi/engine/scoring_engine.py                      | Actor subclass publishing custom ScoreUpdate
cortanaroi/engine/position_manager.py (entry/exit logic) | Strategy subclass with on_quote_tick + bracket orders
Risk/size guards inline in scoring/PM                    | Centralized RiskEngine rules
Broker reconciler / Flex queries                         | LiveExecutionEngine reconciliation loop
Backtest harness duplicated from live                    | BacktestNode running same kernel as live
Telegram alerter, dashboard                              | Out-of-band MessageBus subscribers (Redis)

What we gain

  • Backtest-live parity is structural, not aspirational. MK2’s parallel paper/live pathways have drifted; Nautilus’s same-kernel guarantee removes the class of bug entirely.
  • Cache-then-publish ordering is built-in. The 2026-05-06 stale spy_price cluster is impossible by construction inside Nautilus.
  • Reconciliation is a framework feature. The LiveExecutionEngine does startup and continuous reconciliation against venue truth - exactly what GH #46 / feedback_dual_tp_defense_in_depth are circling toward.
  • Determinism at the kernel level. Single-threaded dispatch + cache-then-publish + Clock abstraction = backtest replays bitwise (within DST scope - see nautilus-concepts.md).
  • Crash-only design. Restart is the only recovery path, and it’s the same code path the system runs every day at startup. MK2’s launchd brittleness goes away.
  • Pluggable venues. IBKR adapter ships; UW becomes a custom adapter; we delete a lot of bespoke glue.
  • Free spec acceptance tests. DataTester / ExecTester matrices validate adapter behavior - we get integration tests we’d never have written ourselves.

What we give up (or have to negotiate)

  • One node per process. MK2’s “many small Python processes talking via files” pattern doesn’t carry over. We’d run a single TradingNode with multiple strategies, plus separate processes for any hard-isolated workloads (training, dashboard).
  • Rust learning curve for adapters. The UW adapter has to be a Rust crate + PyO3 binding to be first-class. Pure-Python adapters are explicitly second-class. This is real work - but the alternative is carrying parsing/normalization in our strategy code forever.
  • Type-strict domain model. Price, Quantity, Money are precision-aware fixed-point; we lose the ability to “just use floats.” This is a feature, not a regression - but it changes how we write arithmetic.
  • Cython on the build path. Even though end-users pip install, the framework’s own internals route Python ↔ Rust through Cython. Pure-Python monkeypatching of internals isn’t a thing.
  • Single-thread discipline. Anything that wants to do work during a callback has to either finish quickly or kick to a background thread and re-enter via the bus. This is a healthier discipline than MK2’s, but it is a discipline.
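To make the "type-strict domain model" point concrete, here is a hedged sketch of what fixed-point arithmetic looks like compared with floats. `FixedPrice` is a hypothetical type invented for this note, not the Nautilus `Price` type, and its raw-integer-plus-precision layout is an assumption about the general technique only.

```rust
/// Hypothetical fixed-point price: an integer count of 10^-precision units.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct FixedPrice {
    raw: i64,
    precision: u8,
}

impl FixedPrice {
    /// `raw` is the value scaled by 10^precision, e.g. raw = 30, precision = 2 is 0.30.
    pub fn new(raw: i64, precision: u8) -> Self {
        FixedPrice { raw, precision }
    }

    /// Adds two prices; None on precision mismatch or integer overflow.
    pub fn checked_add(self, other: FixedPrice) -> Option<FixedPrice> {
        if self.precision != other.precision {
            return None;
        }
        self.raw
            .checked_add(other.raw)
            .map(|raw| FixedPrice { raw, precision: self.precision })
    }

    /// Lossy view, suitable for display or plotting only.
    pub fn as_f64(self) -> f64 {
        self.raw as f64 / 10f64.powi(self.precision as i32)
    }
}
```

With floats, `0.1 + 0.2` does not compare equal to `0.3`; with the fixed-point form, 0.10 + 0.20 is exactly 0.30, and mixing precisions or overflowing is surfaced to the caller instead of being absorbed silently.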

Open questions for the 2026-05-09 spike

  1. Custom data types - can we publish ScoreUpdate, MetaProb, Regime as first-class Data subclasses on the bus, with ts_event/ts_init semantics, without writing a Rust crate? (@customdataclass exists in Python - see nautilus-concepts.md. Need to confirm scope.)
  2. IBKR adapter completeness - does the shipped interactive_brokers adapter cover 0DTE option chains, combo orders, and the order types we actually use? Or do we end up forking it?
  3. UW adapter shape - UW is REST + WebSocket; the adapter shape is well-defined. Question is engineering cost: how long does a Rust+PyO3 adapter take a non-Rust dev (with Codex pairing) to land?
  4. Backtest determinism for option Greeks - backtests are deterministic over the engine, but option pricing depends on IV surfaces. Where does that come from in backtest, and is it deterministic?
  5. Strategy persistence (on_save / on_load) - does Nautilus’s strategy persistence cover the things our cooldown_state and trade journal would need to survive across restarts?
  6. Multi-tenant isolation - for the SaaS roadmap, “one node per process” means each tenant is a separate process. What’s the ops shape?
  7. Dashboard wire format - Nautilus supports remote MessageBus subscribers over Redis; what’s the actual schema/codec, and can our existing dashboard subscribe without rewriting?
  8. Live reconciliation cadence vs. our “broker truth” mandate - the LiveExecutionEngine reconciles on startup and via a continuous monitoring loop. What’s the loop period? Tunable? Sufficient for our “alert-without-action is P0” invariant (project_pm_ibkr_exit_invariant)?

Timeline

  • 2026-05-07 | Cody - Filed during pre-spike concept mastery sweep.

Design rationale (per Nautilus author blog 2026-05-21)

Source: https://nautilustrader.io/blog/why-nautilustrader-exists/. Filed because the load-bearing claims here are quotable framework-author language for the spine §6 / Architecture Map §3 design claims MK3 rests on.

The founding conviction: “research and live trading should inhabit one system.” The project exists to eliminate the gap between backtested strategies and production execution by ensuring they run on identical machinery. The NautilusKernel underpins both BacktestEngine and LiveNode - shared message bus, data routing, portfolio, risk, order state machines. Substitution occurs at exactly one boundary: data and execution clients. Below that boundary, identical code.

Single-threaded by design, not by limitation: “the MessageBus explicitly requiring calls from the same thread as the event loop” eliminates race conditions and non-deterministic scheduling. Events are sorted by ts_event before delivery, and the clock enforces non-decreasing progression. The tradeoff is named explicitly: “no parallel processing within a single node’s kernel. Scaling requires horizontally distributed nodes, not vertical concurrency.” This is directly relevant to MK3 M4 multi-tenant: process-per-tenant is the architecturally-blessed pattern.
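The two ordering rules in that paragraph can be sketched in a few lines. `Event`, `MonotonicClock`, and `replay` are hypothetical names for this note; the sketch assumes a stable sort so that events with equal ts_event keep their arrival order, which is an assumption about tie-breaking and not something the blog post states.

```rust
/// Hypothetical event carrying only what the ordering rules need.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Event {
    pub ts_event: u64,
    pub name: &'static str,
}

/// Clock that refuses to move backwards.
pub struct MonotonicClock {
    now_ns: u64,
}

impl MonotonicClock {
    pub fn new() -> Self {
        MonotonicClock { now_ns: 0 }
    }

    /// Panics if asked to go back in time: that would be an ordering bug.
    pub fn advance_to(&mut self, ts_ns: u64) {
        assert!(
            ts_ns >= self.now_ns,
            "clock moved backwards: {} -> {}",
            self.now_ns,
            ts_ns
        );
        self.now_ns = ts_ns;
    }

    pub fn now_ns(&self) -> u64 {
        self.now_ns
    }
}

/// Sorts by ts_event (stable, so ties keep arrival order), then delivers
/// each event with the clock advanced to its timestamp.
pub fn replay(mut events: Vec<Event>) -> Vec<(u64, &'static str)> {
    events.sort_by_key(|e| e.ts_event);
    let mut clock = MonotonicClock::new();
    let mut delivered = Vec::new();
    for event in events {
        clock.advance_to(event.ts_event);
        delivered.push((clock.now_ns(), event.name));
    }
    delivered
}
```

Given the same input set, `replay` always produces the same delivery sequence, which is the property the single-threaded design is protecting.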

Rust core + Python surface: the hot path (matching, parsing, protocol) is implemented in Rust. Strategy logic runs in Python or Rust. “PyO3 overhead is negligible because hot paths never leave compiled code.” This justifies why MK3’s Python Strategy carries no meaningful perf penalty.

Operational realism: fill modeling accounts for queue position, latency (ns resolution), maker/taker fees, L1/L2/L3 book - not naive last-price fills. Material for SPY 0DTE where bid/ask spread is most of the trade cost.

Load-bearing quote (the one to cite when defending the spine §6 backtest-equals-live claim):

“Research-to-live parity means the engine kernel, event ordering, time handling, and execution flow are all shared across both environments. Parity is built into the architecture rather than relying on convention.”

And:

“The strategy that passed backtest validation is the code that goes live.”

These give MK3’s T1 parity gate framework-level structural backing, not just operational discipline.

What the framework implicitly rejects (we do too):

  • Dual-implementation (separate research and production codebases).
  • Non-deterministic event scheduling.
  • Parallel kernels that trade reproducibility for throughput.
  • Naive fill models that ignore liquidity and latency.