Nautilus Reports

Nautilus’s ReportProvider produces structured pandas DataFrames from the orders, fills, positions, and account-state graph held in the Cache. Five first-class report types (orders, order_fills, fills, positions, account) share a uniform “static method on ReportProvider + Trader helper wrapper” shape; the same code runs in backtest, sandbox, and live. Performance analytics live next door on Portfolio.analyzer - three stat buckets (pnls, returns, general) plus pluggable PortfolioStatistic classes. Visualization plugs in via nautilus_trader.analysis.create_tearsheet (Plotly HTML - equity curve, drawdown, monthly heatmap, returns distribution, stats table) and lower-level create_equity_curve. Reports are runtime-DataFrame-only - there is no native Brier score, AUC, or classification metric, so Cortana’s meta-model evaluation must be a custom PortfolioStatistic (or post-hoc Parquet pipeline) reading the same DataFrames. The cleanest persistence path for downstream ML training is report_df.to_parquet(...) after each backtest run, not the live ParquetDataCatalog (which is for market data, not analytics rows).

Why this page exists

Cortana MK3 milestone M2 requires “Nautilus tearsheet shows backtest of last 4 weeks with comparable Brier score, AUC, win rate to MK2” and “Daily MK2/MK3 decision diff < 5%.” Both deliverables sit on top of the report layer. This page is the API+config saturation reference for the report surface so M2 deliverables are unblocked the moment M1 finishes.

Core claim

“The ReportProvider class in NautilusTrader generates structured analytical reports from trading data, transforming raw orders, fills, positions, and account states into pandas DataFrames for analysis and visualization.”

One provider class, five static methods, two invocation paths (Trader helper or direct), pandas DataFrames as the universal output shape. Backtest and live use the same provider - same call from the same data source (the Cache). This is one of the seven backtest-live parity points (cf. nautilus-architecture.md).

Report taxonomy - the five built-ins

1. Orders report

Full view of every order regardless of status.

# Trader helper (recommended)
orders_report = trader.generate_orders_report()
 
# ReportProvider directly
from nautilus_trader.analysis import ReportProvider
orders = cache.orders()
orders_report = ReportProvider.generate_orders_report(orders)

Indexed by client_order_id. Key columns: instrument_id, strategy_id, trader_id, account_id, venue_order_id, side, type, status, quantity (str), filled_qty (str), price, avg_px, time_in_force, ts_init (Unix nanos), ts_last (Unix nanos). Type-conditional columns: trigger_price for stops, expire_time for GTD. Source of truth for column list: Order.to_dict().

Use case: “Did we attempt to enter? With what params? When?” - the audit row for every entry intent that became an order.

2. Order fills report

One row per order with at least one fill (filtered subset of the orders report).

order_fills_report = trader.generate_order_fills_report()

Same column shape as the orders report, but ts_init and ts_last are converted to datetime objects for easier analysis. Filter applied: filled_qty > 0.

Use case: “Of the orders we sent, which actually executed?” - first filter for win-rate calculations.
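The filled_qty > 0 filter can be reproduced by hand on an orders-report-shaped frame, which is useful when only the orders report was persisted. A minimal sketch, assuming a hand-built DataFrame as a stand-in for the real report (column names are from this page; the rows are invented). Quantities are strings, so they are cast through Decimal before the comparison:

```python
from decimal import Decimal

import pandas as pd

# Hypothetical stand-in for trader.generate_orders_report(); rows are invented.
orders_report = pd.DataFrame(
    {
        "status": ["FILLED", "CANCELED", "PARTIALLY_FILLED"],
        "quantity": ["10", "5", "8"],
        "filled_qty": ["10", "0", "3"],
    },
    index=pd.Index(["O-001", "O-002", "O-003"], name="client_order_id"),
)

# Quantities arrive as strings; Decimal keeps the precision the strings carry.
filled_qty = orders_report["filled_qty"].map(Decimal)

# Keep only orders with at least one fill (full or partial).
order_fills = orders_report[filled_qty > 0]
```

Partially filled orders survive the filter, so any win-rate denominator built on this frame counts them as executed.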

3. Fills report

One row per fill event (an order can produce multiple rows if partially filled across executions).

fills_report = trader.generate_fills_report()

Indexed by client_order_id. Key columns: trade_id, venue_order_id, instrument_id, strategy_id, account_id, position_id, order_side, order_type, last_px (str), last_qty (str), currency, liquidity_side (MAKER / TAKER), commission, ts_event (datetime), ts_init (datetime). Source: OrderFilled.to_dict().

Use case: precise commission and slippage attribution - fills are the atomic unit, not orders.
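Because fills are the atomic unit, per-order figures have to be rolled up from them. A minimal sketch of that roll-up, assuming a hand-built frame in place of the real fills report and assuming commission renders as "<amount> <currency>" (that string format is an assumption, not confirmed against the source). Floats are used here for analysis only:

```python
import pandas as pd

# Hypothetical stand-in for trader.generate_fills_report(); rows are invented.
fills_report = pd.DataFrame(
    {
        "trade_id": ["T-1", "T-2", "T-3"],
        "last_px": ["2.50", "2.52", "3.10"],
        "last_qty": ["4", "6", "5"],
        # Assumption: commission is rendered as "<amount> <currency>".
        "commission": ["0.65 USD", "0.98 USD", "0.81 USD"],
    },
    index=pd.Index(["O-001", "O-001", "O-002"], name="client_order_id"),
)

fills = fills_report.copy()
fills["px"] = fills["last_px"].astype(float)
fills["qty"] = fills["last_qty"].astype(float)
fills["notional"] = fills["px"] * fills["qty"]
fills["commission_amt"] = fills["commission"].str.split().str[0].astype(float)

# One row per order: total filled quantity, commission, and fill count.
per_order = fills.groupby(level="client_order_id").agg(
    filled_qty=("qty", "sum"),
    notional=("notional", "sum"),
    commission=("commission_amt", "sum"),
    n_fills=("trade_id", "count"),
)
# Volume-weighted average fill price per order.
per_order["vwap"] = per_order["notional"] / per_order["filled_qty"]
```

Comparing vwap against the order's intended price (from the orders report) gives a per-order slippage figure.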

4. Positions report

Position analysis including historical snapshots (NETTING OMS only; HEDGING uses unique IDs and never reopens).

# Trader helper auto-includes snapshots for NETTING OMS
positions_report = trader.generate_positions_report()
 
# Direct path requires explicit snapshot pass-in
positions = cache.positions()
snapshots = cache.position_snapshots()
positions_report = ReportProvider.generate_positions_report(
    positions=positions,
    snapshots=snapshots,
)

Indexed by position_id. Key columns: instrument_id, strategy_id, trader_id, account_id, opening_order_id, closing_order_id, entry (BUY/SELL), side (LONG/SHORT/FLAT), quantity, peak_qty, avg_px_open, avg_px_close, commissions (list), realized_pnl, realized_return, ts_init, ts_opened, ts_last, ts_closed, duration_ns, is_snapshot (bool).

Snapshot caveat (load-bearing for NETTING OMS): “Always include snapshots in reports for accurate total PnL calculation. In HEDGING OMS, snapshots are not used since each position has a unique ID and is never reopened.” For Cortana (NETTING on IBKR), the Trader helper path is the correct default - direct ReportProvider calls require explicit snapshot pass-in or PnL is wrong.

Use case: win-rate, hold-time distribution, realized-PnL per trade - the canonical row class for “how did each trade perform.”
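Those per-trade metrics can be derived directly from the report columns. A minimal sketch, assuming a hand-built frame in place of the real positions report and assuming realized_pnl renders as "<amount> <currency>" (an assumption; check the actual dtype first). The win-rate definition here (share of rows with positive PnL) is illustrative and may not match the analyzer's own Win Rate:

```python
import pandas as pd

# Hypothetical stand-in for trader.generate_positions_report(); rows are invented.
positions_report = pd.DataFrame(
    {
        # Assumption: realized_pnl is rendered as "<amount> <currency>".
        "realized_pnl": ["120.00 USD", "-45.00 USD", "30.00 USD", "-10.00 USD"],
        "duration_ns": [
            600_000_000_000,    # 10 minutes
            1_800_000_000_000,  # 30 minutes
            300_000_000_000,    # 5 minutes
            900_000_000_000,    # 15 minutes
        ],
        "is_snapshot": [False, True, False, False],
    },
    index=pd.Index(["P-1", "P-2", "P-3", "P-4"], name="position_id"),
)

# Snapshot rows stay in: dropping them would undercount realized PnL.
pnl = positions_report["realized_pnl"].str.split().str[0].astype(float)
win_rate = (pnl > 0).mean()
total_pnl = pnl.sum()

hold_minutes = (
    pd.to_timedelta(positions_report["duration_ns"], unit="ns").dt.total_seconds() / 60
)
median_hold_minutes = hold_minutes.median()
```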

5. Account report

Balance and margin time-series per venue.

from nautilus_trader.model.identifiers import Venue
venue = Venue("INTERACTIVE_BROKERS")
account_report = trader.generate_account_report(venue)

Indexed by ts_event (timestamp of state change). Key columns: account_id, account_type (SPOT, MARGIN, …), base_currency, total (str), free (str), locked (str), currency, reported (bool - venue-reported vs computed), margins (list), info.

Multi-currency accounts produce multiple rows per state event (one per currency).

Use case: equity curve, margin utilization, currency exposure.
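Since multi-currency accounts emit one row per currency per state event, an equity curve needs a currency filter before any math. A minimal sketch, assuming a hand-built frame in place of the real account report (rows are invented) and using the string-typed total column:

```python
import pandas as pd

# Hypothetical stand-in for trader.generate_account_report(venue); rows are invented.
account_report = pd.DataFrame(
    {
        "currency": ["USD", "EUR", "USD", "EUR", "USD"],
        "total": ["100000.00", "500.00", "101500.00", "500.00", "100485.00"],
    },
    index=pd.DatetimeIndex(
        [
            "2026-05-01 14:30",
            "2026-05-01 14:30",
            "2026-05-01 15:00",
            "2026-05-01 15:00",
            "2026-05-01 15:30",
        ],
        tz="UTC",
        name="ts_event",
    ),
)

# One currency at a time: mixing currencies in one curve is meaningless.
usd = account_report[account_report["currency"] == "USD"]
equity = usd["total"].astype(float)

# Drawdown relative to the running peak; 0.0 at new highs, negative below them.
drawdown = equity / equity.cummax() - 1.0
max_drawdown = drawdown.min()
```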

Invocation API - three contexts

Post-backtest

engine.run(start=start_time, end=end_time)
orders_report = engine.trader.generate_orders_report()
positions_report = engine.trader.generate_positions_report()
fills_report = engine.trader.generate_fills_report()

The BacktestEngine exposes its trader after run(). Reports are read from the engine’s Cache, which holds the entire run’s state in memory.

Post-live snapshot

Identical API on the live TradingNode’s trader:

node.trader.generate_orders_report()
node.trader.generate_positions_report()
node.trader.generate_account_report(venue)

Cache is the source. With Redis-backed cache, this works across restarts (per nautilus-cache.md).

In-flight queries (live)

Reports can be generated periodically by an Actor:

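import pandas as pd
from nautilus_trader.common.actor import Actor


class ReportingActor(Actor):
    def on_start(self):
        # Fire generate_reports every 30 minutes on the actor's clock
        self.clock.set_timer(
            name="generate_reports",
            interval=pd.Timedelta(minutes=30),
            callback=self.generate_reports,
        )

    def generate_reports(self, event):
        # event is the timer's TimeEvent; its ts_event stamps the file name
        positions_report = self.trader.generate_positions_report()
        positions_report.to_csv(f"positions_{event.ts_event}.csv")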

Use cases: end-of-day rollup, periodic CSV/Parquet export, live dashboard refresh.

Output formats - what comes out

Format | How | Notes
--- | --- | ---
pd.DataFrame | Native return type of every report | The universal interchange shape
CSV | df.to_csv(path) | Lossy on Decimal types (becomes string) - fine if you keep quantity / price columns string-typed
Parquet | df.to_parquet(path) | Recommended for downstream ML. Preserves dtype, columnar, fast
JSON | df.to_json(path) | Verbose. orient='records' drops the index, so call reset_index() first if the index (e.g. client_order_id) must survive the round trip
HTML tearsheet | create_tearsheet(engine, output_path=...) | Plotly interactive HTML, equity curve + drawdown + monthly heatmap + stats table + returns distribution
Individual Plotly figures | create_equity_curve(returns) | .show() (browser) or .write_image("equity.png") (requires kaleido)

The DataFrame is the seam. Every export format is downstream of df.

Visualization - tearsheets

from nautilus_trader.analysis import create_tearsheet
 
engine.run()
create_tearsheet(engine, output_path="tearsheet.html")

Contents (per docs verbatim):

  • Equity curve
  • Drawdown analysis
  • Monthly returns heatmap
  • Performance statistics table
  • Returns distribution

Lower-level Plotly figures available individually:

from nautilus_trader.analysis import create_equity_curve
 
returns = engine.portfolio.analyzer.returns()
fig = create_equity_curve(returns, title="Cortana MK3 - May 2026")
fig.show()
fig.write_image("equity.png")

Install: uv pip install "nautilus_trader[visualization]".

For Cortana M2: create_tearsheet is the literal “tearsheet” referenced in the M2 deliverable. Win-rate and PnL stats appear on the stats table; Brier and AUC do NOT (see Limitations).

Portfolio statistics - the analytics layer next door

portfolio = engine.portfolio
stats_pnls = portfolio.analyzer.get_performance_stats_pnls()
stats_returns = portfolio.analyzer.get_performance_stats_returns()
stats_general = portfolio.analyzer.get_performance_stats_general()

Three buckets, each a dict-like:

  • PnLs - PnL (total), per-currency PnL breakdown
  • Returns - Sharpe Ratio (252 days), Sortino Ratio, Max Drawdown, return distribution moments
  • General - Win Rate, Profit Factor, Avg Winner, Avg Loser, position counts

The doc’s example aggregation:

results = {
    "total_positions": len(positions_closed),
    "pnl_total": stats_pnls.get("PnL (total)"),
    "sharpe_ratio": stats_returns.get("Sharpe Ratio (252 days)"),
    "profit_factor": stats_general.get("Profit Factor"),
    "win_rate": stats_general.get("Win Rate"),
}

Win Rate is the literal field that satisfies the M2 “comparable win rate” deliverable. Brier and AUC are not in any of the three buckets; the parallel nautilus-portfolio.md page (when filed) covers PortfolioStatistic registration as the customization seam.

Customization - PortfolioStatistic and beyond

Per the doc: “For detailed information about available statistics and creating custom metrics, see the Portfolio guide.” Registration pattern (forward-referenced, source canonical):

from nautilus_trader.analysis.statistic import PortfolioStatistic
 
class BrierScore(PortfolioStatistic):
    def name(self): return "Brier Score"
    def calculate_from_returns(self, returns): ...
    # Or calculate_from_orders / from_positions for non-return inputs
 
portfolio.analyzer.register_statistic(BrierScore())
stats = portfolio.analyzer.get_performance_stats_general()
# Now includes "Brier Score" if calculate_from_X matched the bucket

Three bucket-specific calculation methods (from the docs canon): calculate_from_returns(...), calculate_from_orders(...), calculate_from_positions(...). The bucket determines which input the statistic receives. Brier and AUC need predicted probabilities paired with realized outcomes - neither is exposed by the built-in inputs. This is the open question for Cortana customization (see Limitations).

PnL accounting - the load-bearing details

Three considerations the doc calls out:

Position-based PnL

  • Realized PnL computed when positions are partially or fully closed.
  • Unrealized PnL marked-to-market against current price.
  • Commission impact included only when in settlement currency.

Multi-currency accounting

  • Each position tracks PnL in its settlement currency.
  • Portfolio aggregation requires user-provided exchange rates.
  • Commission currencies may differ from settlement.

For Cortana (USD throughout - SPY options, IBKR USD account), this collapses to the single-currency case. No conversion math.
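For completeness, the multi-currency case reduces to a rate lookup per currency. A minimal sketch, assuming the caller supplies the rates; aggregate_pnl, the rate values, and the PnL figures are all hypothetical and not part of Nautilus:

```python
from decimal import Decimal


def aggregate_pnl(pnl_by_currency, rates_to_base, base="USD"):
    """Sum per-currency PnL into the base currency using caller-supplied rates.

    rates_to_base maps currency code -> units of base per one unit of that
    currency. A missing rate raises KeyError rather than being guessed.
    """
    total = Decimal("0")
    for currency, amount in pnl_by_currency.items():
        rate = Decimal("1") if currency == base else rates_to_base[currency]
        total += amount * rate
    return total


# Invented figures for illustration only.
pnl_by_currency = {"USD": Decimal("250.00"), "EUR": Decimal("-40.00")}
rates_to_base = {"EUR": Decimal("1.10")}

total_usd = aggregate_pnl(pnl_by_currency, rates_to_base)
```

Failing loudly on a missing rate is deliberate: a silent default of 1.0 would produce a plausible-looking but wrong total.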

Snapshot considerations (NETTING OMS - Cortana’s case)

from nautilus_trader.model.objects import Money
 
pnl_by_currency = {}
for position in cache.positions(instrument_id=instrument_id):
    if position.realized_pnl:
        currency = position.realized_pnl.currency
        pnl_by_currency.setdefault(currency, 0.0)
        pnl_by_currency[currency] += position.realized_pnl.as_double()
 
for snapshot in cache.position_snapshots(instrument_id=instrument_id):
    if snapshot.realized_pnl:
        currency = snapshot.realized_pnl.currency
        pnl_by_currency.setdefault(currency, 0.0)
        pnl_by_currency[currency] += snapshot.realized_pnl.as_double()
 
total_pnls = [Money(amount, currency)
              for currency, amount in pnl_by_currency.items()]

If you query positions and forget snapshots in NETTING OMS, you undercount realized PnL for any instrument that has reopened. The Trader helper does this for you; the ReportProvider direct path does not.

Backtest vs live - uniformity

The docs claim uniformity verbatim: “Reports provide consistent analytics across both backtesting and live trading environments, enabling reliable performance evaluation and strategy comparison.”

Aspect | Backtest | Live
--- | --- | ---
Data source | engine.cache | node.cache (in-memory, optionally Redis-backed)
Snapshot inclusion | Trader helper auto-includes | Same - Trader helper auto-includes
Reconciliation impact | None | Reconciliation events flow through Cache; reports reflect them
Performance | Sub-second on backtest cache | O(orders + positions) on live cache; Redis-backed cache may add network latency

For Cortana M2: the same code that produces the backtest tearsheet runs against the live MK3 paper session. Decision-diff harness (next section) exploits this.

Cortana MK3 implications

M2 deliverable: “Nautilus tearsheet shows backtest of last 4 weeks with comparable Brier score, AUC, win rate to MK2”

Three sub-deliverables, three answers:

  1. Win rate - first-class. stats_general["Win Rate"]. Done by running create_tearsheet(engine) and reading the stats table.
  2. Brier score - not built in. Custom PortfolioStatistic subclass implementing calculate_from_X where X is the bucket carrying the meta-model’s predicted probabilities. Predicted probabilities are NOT in the built-in inputs, so this requires either (a) emitting a custom WinProbEstimate event from the Strategy at decision time and persisting predicted-prob alongside the realized outcome, then computing Brier in a post-hoc pandas pipeline reading the Parquet’d events; or (b) attaching predicted prob to the order’s tags field and reading it in a custom calculate_from_orders statistic. (a) is cleaner - the event stream is already the audit trail per nautilus-events.md.
  3. AUC - same shape as Brier. Custom statistic OR post-hoc pipeline joining WinProbEstimate events to position-close outcomes.

Recommendation: post-hoc pandas pipeline. Brier and AUC are diagnostic, not real-time, so adding a PortfolioStatistic adds runtime cost without runtime benefit. Build them as a separate mk3_eval.py module that reads the Parquet event stream + positions report.
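Both metrics are small enough to write without extra dependencies. A minimal sketch of what the core of such a module might contain, using the standard definitions of Brier score and ROC AUC; the function names and sample values are illustrative, and the inputs are assumed to be already-joined (predicted probability, realized 0/1 outcome) pairs:

```python
def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and equal length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def roc_auc(probs, outcomes):
    """Probability that a random winner is scored above a random loser.

    Ties count as half. O(n_pos * n_neg), which is fine for a few weeks of trades.
    """
    pos = [p for p, y in zip(probs, outcomes) if y == 1]
    neg = [p for p, y in zip(probs, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one winner and one loser")
    score = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                score += 1.0
            elif p == n:
                score += 0.5
    return score / (len(pos) * len(neg))


# Invented sample: four trades with predicted win probability and realized outcome.
predicted = [0.9, 0.7, 0.4, 0.2]
realized = [1, 0, 1, 0]

brier = brier_score(predicted, realized)
auc = roc_auc(predicted, realized)
```

If scikit-learn is already a project dependency, its brier_score_loss and roc_auc_score are the obvious replacements; the hand-rolled versions just keep the evaluation module import-light.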

M2 deliverable: “Daily MK2/MK3 decision diff < 5%”

The decision-diff harness pattern:

# Run identical inputs through both engines
mk2_decisions = mk2.run(date)        # → list of (ts, action, contract)
mk3_decisions = mk3.run(date)        # → list of (ts, action, contract)
 
# Generate MK3 reports
mk3_orders = mk3.trader.generate_orders_report()
mk3_positions = mk3.trader.generate_positions_report()
 
# Diff
diff_df = compare_decisions(mk2_decisions, mk3_orders)
diff_pct = diff_df["disagreement"].mean()
assert diff_pct < 0.05

The MK3 side comes free from generate_orders_report(). The MK2 side needs a parallel adapter (the existing decisions.db queries already produce something close - port the SELECT into a pandas DataFrame with the same column shape as the Nautilus orders report). The harness becomes a simple pandas outer join on (ts_init, instrument_id) with a side/status equality check; an inner join would silently drop decisions that only one engine made, and those are disagreements too.
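One way compare_decisions could look is sketched below. It is an assumption-laden illustration, not the harness itself: both inputs are assumed to be plain DataFrames with ts_init, instrument_id, and side as columns (so the orders report would need reset_index() first), timestamps are assumed to match exactly, and an outer merge is used so that a decision made by only one engine counts as a disagreement rather than being dropped:

```python
import pandas as pd


def compare_decisions(mk2_df, mk3_df, keys=("ts_init", "instrument_id")):
    """Outer-join two decision frames and flag every row where they disagree."""
    merged = mk2_df.merge(
        mk3_df,
        on=list(keys),
        how="outer",
        suffixes=("_mk2", "_mk3"),
        indicator=True,  # adds _merge: left_only / right_only / both
    )
    in_both = merged["_merge"] == "both"
    same_side = merged["side_mk2"] == merged["side_mk3"]
    # Agreement requires the decision to exist on both sides with the same side.
    merged["disagreement"] = ~(in_both & same_side)
    return merged


# Invented decisions: one side mismatch (ts=3) and one unmatched row per engine.
mk2 = pd.DataFrame(
    {"ts_init": [1, 2, 3, 4], "instrument_id": ["SPY"] * 4,
     "side": ["BUY", "SELL", "BUY", "BUY"]}
)
mk3 = pd.DataFrame(
    {"ts_init": [1, 2, 3, 5], "instrument_id": ["SPY"] * 4,
     "side": ["BUY", "SELL", "SELL", "BUY"]}
)

diff_df = compare_decisions(mk2, mk3)
diff_pct = diff_df["disagreement"].mean()
```

In practice exact ts_init equality between two engines is unlikely to hold; bucketing timestamps or using pandas.merge_asof with a tolerance is the likely next step.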

Parquet export for downstream ML training

The cleanest path:

# Post-backtest
orders_report = engine.trader.generate_orders_report()
positions_report = engine.trader.generate_positions_report()
fills_report = engine.trader.generate_fills_report()
 
orders_report.to_parquet(f"runs/{run_id}/orders.parquet")
positions_report.to_parquet(f"runs/{run_id}/positions.parquet")
fills_report.to_parquet(f"runs/{run_id}/fills.parquet")

Plus the custom event stream (per nautilus-events.md):

# A logging Actor subscribes to all custom events and writes Parquet
class AuditLogger(Actor):
    def on_event(self, event):
        if isinstance(event, (ScoreUpdate, GateDecision, WinProbEstimate)):
            self._sink.append_parquet(event)

ML training joins these by (ts_event, position_id) to assemble a row class of (features, predicted_prob, realized_outcome, realized_pnl). That row class is the meta-model training set.
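The join itself is a single merge once both sides are in DataFrames. A minimal sketch, simplified to join on position_id alone and using invented rows; the events frame stands in for the persisted WinProbEstimate stream, whose real schema is not defined on this page:

```python
import pandas as pd

# Hypothetical persisted WinProbEstimate events; schema and rows are invented.
events = pd.DataFrame(
    {
        "position_id": ["P-1", "P-2", "P-3"],
        "predicted_prob": [0.80, 0.35, 0.60],
    }
)

# Hypothetical closed-position outcomes, already cast to float.
positions = pd.DataFrame(
    {
        "position_id": ["P-1", "P-2"],
        "realized_pnl": [120.0, -45.0],
    }
)

# Inner join: an estimate with no closed position (P-3) has no label yet.
# validate guards against duplicated position_ids inflating the training set.
training = events.merge(
    positions, on="position_id", how="inner", validate="one_to_one"
)
training["realized_outcome"] = (training["realized_pnl"] > 0).astype(int)
```

With snapshots in play a position_id may legitimately appear more than once, in which case the one_to_one check will raise and the positions side needs aggregating first.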

Do NOT use ParquetDataCatalog for this. That catalog is for market data (quotes, trades, bars) - it has its own schema conventions ({start_timestamp}_{end_timestamp}.parquet, type-and-instrument partitioning). Analytics rows go to a separate filesystem path (runs/{run_id}/...) so they don’t pollute the catalog and so they can be regenerated without touching market data.

What’s NOT in MK3 reports (deferred)

  • Customer-facing tearsheet UI - M5 deliverable; reuses the Plotly HTML output in an iframe.
  • Per-tenant report partitioning - M4 deliverable; each TradingNode produces its own reports per the per-tenant cache scoping in nautilus-cache.md.
  • Streaming reports to a database - out of scope; the doc explicitly says “Reports are generated from in-memory data structures. For large-scale analysis or long-running systems, consider persisting reports to a database for efficient querying.” Cortana persists to Parquet, not Postgres.

Limitations and gotchas

  • No native Brier / AUC / log-loss / classification metrics. The built-in stats are PnL- and return-centric. Probability-vs-outcome metrics require custom statistics or a post-hoc pipeline. This is the single most important MK3 finding from this page.
  • Snapshot omission silently undercounts realized PnL in NETTING OMS. Use the Trader helper unless you have a specific reason to call ReportProvider directly.
  • quantity, filled_qty, price, last_px, last_qty, total, free, locked are strings, not floats. Preserves Decimal precision; requires explicit cast for math.
  • ts_init and ts_last types differ across reports. Orders report keeps Unix nanoseconds. Order-fills report and fills report convert to datetime. Positions report mixes (ts_init is nanos, ts_opened / ts_closed are datetimes). Reconcile types before cross-report joins.
  • create_tearsheet requires the [visualization] extras (Plotly + scipy). PNG export additionally requires kaleido.
  • Reports read from in-memory Cache. With Redis-backed live cache, reports reflect Redis state on demand - but if Redis is down or partially populated post-restart, reports are partial. Reconcile before reporting.
  • HEDGING OMS does not use snapshots. Cortana on IBKR is NETTING so this doesn’t apply, but note for any future market that uses HEDGING (some FX prime brokers, some crypto venues).
  • PositionStatusReport from adapters is NOT a report in this sense - that’s a reconciliation report variant for venue → engine state alignment (per nautilus-execution.md). Different concept, same word.
  • Multi-currency aggregation requires user-provided exchange rates - Nautilus does not bundle a forex rate source. Cortana’s USD-throughout architecture sidesteps this.
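The timestamp mismatch noted in the list above is the gotcha most likely to bite a cross-report join. A minimal sketch of normalizing before joining, assuming hand-built frames in place of the real reports; to_utc_datetime is a hypothetical helper, not a Nautilus function:

```python
import pandas as pd

NANOS = [1_700_000_000_000_000_000, 1_700_000_060_000_000_000]
ORDER_IDS = pd.Index(["O-001", "O-002"], name="client_order_id")

# Orders-report shape: ts_init as Unix nanoseconds.
orders = pd.DataFrame({"ts_init": NANOS}, index=ORDER_IDS)

# Fills-report shape: ts_init already converted to datetime.
fills = pd.DataFrame(
    {"ts_init": pd.to_datetime(NANOS, unit="ns", utc=True)},
    index=ORDER_IDS,
)


def to_utc_datetime(series):
    """Return tz-aware UTC datetimes whether the input is nanos or datetimes."""
    if pd.api.types.is_integer_dtype(series):
        return pd.to_datetime(series, unit="ns", utc=True)
    return pd.to_datetime(series, utc=True)


orders["ts_init"] = to_utc_datetime(orders["ts_init"])
fills["ts_init"] = to_utc_datetime(fills["ts_init"])

# Both sides now share a dtype, so the join and the subtraction are safe.
joined = orders.join(fills, lsuffix="_order", rsuffix="_fill")
lag = joined["ts_init_fill"] - joined["ts_init_order"]
```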

Open questions for the 2026-05-09 spike

  1. PortfolioStatistic calculation method signature. Source-read nautilus_trader/analysis/statistic.py to confirm whether calculate_from_X methods receive the raw input or a pre-aggregated shape, and whether they can opt into multiple buckets at once.
  2. Custom event → DataFrame conversion. Confirm whether engine.cache exposes custom events for report-style retrieval, or whether the audit-logger Actor + Parquet sink is the only path.
  3. In-flight tearsheet generation for live. Can create_tearsheet accept a live TradingNode’s trader, or is it backtest-engine-only? Affects whether the M3 dashboard can render a live tearsheet.
  4. Redis-backed cache report latency. With 1k-5k orders in cache and Redis on a separate host, what’s the round-trip time for generate_orders_report()? Affects whether to persist reports periodically vs generate-on-demand.
  5. Per-tenant report partitioning. Confirm reports are scoped to the local TradingNode’s cache automatically (they should be - one cache per node - but verify before M4).

See Also

  • Nautilus Portfolio (parallel - analyzer, performance stat buckets, PortfolioStatistic registration, the customization seam for Brier/AUC)
  • Nautilus Events - custom-event taxonomy that carries WinProbEstimate / ScoreUpdate / GateDecision for downstream-ML training rows
  • Nautilus Execution - order/fill/position events that the reports flatten into DataFrames; reconciliation report variants (different “report” - same word)
  • Nautilus Cache - the data source for every report; in-memory vs Redis-backed; per-tenant scoping
  • Nautilus Tutorials - Quickstart and Backtest (High-Level) tutorials that exercise the report API
  • 2026-05-09 Nautilus Spike Plan: ~/conductor/workspaces/cortanaroi-mk2/belo-horizonte/plans/2026-05-09-nautilus-spike.md
  • MK3 Roadmap: ~/conductor/workspaces/cortanaroi-mk2/belo-horizonte/plans/2026-05-09-mk3-roadmap.md
  • project_codex_review_p2s.md - meta-prob evaluation context (Brier/AUC is how we’ll compare MK3 meta-gate quality vs MK2)

Timeline

2026-05-07 | Cody - Filed during pre-spike concept mastery sweep batch 3.