Databento vs UW vs IBKR - Cortana data-feed layering

Decision: for Cortana MK3, augment with Databento, do not replace UW or IBKR. Each vendor occupies a distinct role: IBKR for execution + pricing-of-record, UW for derived options-flow signals, Databento for historical-OPRA replay (and optionally raw live OPRA later). The Nautilus stack ties all three through one DataEngine → Cache → MessageBus pipeline. Databento has an OOB Nautilus adapter and IBKR is built-in; only UW needs a custom DataClient.

Tier recommendation: start on the $125 free credits, then usage-based; Standard ($199/mo) only after deciding you want a continuous historical dataset; Plus ($1,399/mo) only if/when going live with Databento for paper-or-real flow. Cortana’s near-term need is one focused historical pull for spike validation - credits cover it.

Role layering (the load-bearing decision)

| Concern | Vendor | Nautilus expression |
| --- | --- | --- |
| Execution + pricing-of-record | IBKR | Built-in InteractiveBrokersExecClient + InteractiveBrokersDataClient |
| Live underlying ticks (SPY) | IBKR | Built-in QuoteTick / Bar from IBKR DataClient |
| Live options flow signals (sweeps/blocks/GEX/charm/vanna/dealer pos) | UW | Custom UWFlowAlert(Data) via custom UWDataClient (sketched in nautilus-integrations.md) |
| Historical OPRA tape for backtest replay | Databento | OOB DatabentoDataLoader → ParquetDataCatalog (per nautilus-options.md § “docs-blessed vendor for replay”) |
| Full L2/L3 options depth for adversarial backtest | Databento (MBP-10 / MBO schemas) | Native Nautilus types via Databento adapter |
| UW alert cross-validation (catch missed/dropped flow) | Databento | Reconstruct raw tape, diff against UW alerts |

The pricing-source-of-truth invariant (feedback_ibkr_pricing_source.md) is preserved: IBKR remains the only vendor whose price string lands on an order. Databento is replay + audit; UW is signals only.
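The invariant above could be enforced at the point where a price is attached to an order. The following is a minimal sketch, assuming prices are carried as strings tagged with their vendor; `Vendor`, `SourcedPrice`, and `order_limit_price` are hypothetical names for illustration, not part of Nautilus or of Cortana's current code.

```python
from dataclasses import dataclass
from enum import Enum


class Vendor(Enum):
    IBKR = "ibkr"
    UW = "uw"
    DATABENTO = "databento"


@dataclass(frozen=True)
class SourcedPrice:
    """A price string tagged with the vendor it came from (hypothetical type)."""
    value: str
    vendor: Vendor


def order_limit_price(price: SourcedPrice) -> str:
    """Return the price string for an order, refusing any non-IBKR source."""
    if price.vendor is not Vendor.IBKR:
        raise ValueError(f"{price.vendor.value} price may not land on an order")
    return price.value
```

Routing every order price through one guard like this keeps the invariant checkable in a single place, whichever vendors are added later.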

Why “augment, not replace”

  • Replacing UW with raw OPRA = 3-6 months of derivatives-quant work. UW pre-derives flow_score, sweep/block classification, GEX/charm/vanna aggregates, dark pool prints, dealer positioning. Databento gives raw prints. Rebuilding UW’s signal layer is the project, not a side quest.
  • Replacing IBKR is impossible. Databento is data-only; orders go nowhere through it. (And IBKR is the OOB Nautilus execution adapter.)
  • Adding Databento has a clear marginal win. Backtest replay fidelity is the gating need for the 80% win-rate mandate - decisions.db rows can be replayed through ParquetDataCatalog (per nautilus-data.md § “Replaying today’s decisions.db rows as Nautilus events”) and adversarial scenarios (what UW missed, what dealer flow looked like at the climax) require the raw OPRA tape Databento ships.

Pricing tiers (verified 2026-05-07)

| Tier | Price | What’s included | Cortana fit |
| --- | --- | --- | --- |
| Free trial | $125 credits, 6 mo | Pay-as-you-go burndown | Use this first - covers spike-step historical pull |
| Usage-based | $/GB (no monthly) | Pay only what you use; OPRA + CME + NYSE + NASDAQ same $/GB rate | Default ongoing tier post-credits - until consumption is known |
| Standard | $199/mo | 12 mo L1 history, 1 mo L2/L3 then PAYG | Worth it once you commit to continuous historical backfills |
| Plus | $1,399/mo (annual) | Standard + entire L1 history + live data | Only if going live with Databento as live tick source |
| Unlimited | $3,500/mo (annual) | Entire history, all schemas | Overkill for solo MK3 |

Notes:

  • All exchanges (CME, OPRA, NYSE, NASDAQ) are priced uniformly per GB - no exchange-tier surcharge on usage-based.
  • Live data carries a separate license-fee pass-through from publishers and per-message billing on some datasets. Usage-based live pricing was slated for deprecation on 2025-03-31, so live data effectively requires Plus or higher.
  • One free-credit set per team.
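The tier choice comes down to simple arithmetic once the per-GB rate is known. The sketch below uses the $125 credit and $199/mo figures from the table; the per-GB rate is deliberately left as a parameter because this note does not record it. It also assumes, as a simplification, that everything pulled would fall inside Standard's included history, which the table suggests is not true for L2/L3 beyond one month.

```python
STANDARD_MONTHLY_USD = 199.0  # Standard tier price from the table above
FREE_CREDITS_USD = 125.0      # free-trial credits from the table above


def usage_cost(gb: float, usd_per_gb: float) -> float:
    """Cost of pulling `gb` gigabytes at a flat usage-based rate."""
    return gb * usd_per_gb


def credits_remaining(gb: float, usd_per_gb: float) -> float:
    """Free credits left after a pull; negative means the pull overruns them."""
    return FREE_CREDITS_USD - usage_cost(gb, usd_per_gb)


def breakeven_gb(usd_per_gb: float) -> float:
    """Monthly volume above which the Standard flat fee beats usage billing."""
    if usd_per_gb <= 0:
        raise ValueError("usd_per_gb must be positive")
    return STANDARD_MONTHLY_USD / usd_per_gb
```

With a made-up rate of 10 USD/GB, `breakeven_gb(10.0)` is about 19.9 GB per month; the real rate should come from Databento's cost estimate before any pull.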

Spike validation (one-day pull)

Per the nautilus-spike plan, the validation pattern is:

  1. Sign up, claim $125 credits.
  2. Pull one day of SPY OPRA in MBP-1 + trades schemas via Databento HTTP API or CLI (target: < 1 GB, well within credit budget).
  3. Use Nautilus’s DatabentoDataLoader (per nautilus-tutorials.md § “Data Catalog with Databento”) to ingest into a local ParquetDataCatalog.
  4. Run a BacktestRunConfig over the catalog with the Cortana stub strategy. Confirm QuoteTick/TradeTick materialize correctly and timestamps align with decisions.db rows for the same day.

If the materialization + replay works in <90 min, the Databento path is proven. Defer live subscription decision until backtest value is concrete.
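The timestamp-alignment check in step 4 can be prototyped without Nautilus. This is a minimal sketch, assuming both the replayed tick timestamps and the decisions.db timestamps have already been converted to integer nanoseconds since the Unix epoch; the function names and the tolerance value are hypothetical.

```python
from bisect import bisect_left


def nearest_gap_ns(sorted_ticks: list[int], decision_ts: int) -> int:
    """Smallest absolute gap between decision_ts and any tick (ticks sorted, non-empty)."""
    i = bisect_left(sorted_ticks, decision_ts)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = sorted_ticks[max(i - 1, 0): i + 1]
    return min(abs(t - decision_ts) for t in candidates)


def unaligned_decisions(
    tick_ts: list[int], decision_ts: list[int], tolerance_ns: int
) -> list[int]:
    """Return decision timestamps with no tick within tolerance_ns."""
    ticks = sorted(tick_ts)
    if not ticks:
        return list(decision_ts)
    return [d for d in decision_ts if nearest_gap_ns(ticks, d) > tolerance_ns]
```

An empty result for the spike day would suggest the catalog and decisions.db agree on the clock; a systematic offset in the returned rows would point at a timezone or unit mismatch rather than missing data.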

Caveats

  • Vendor count creep. Three feeds = three auth surfaces, three rate limits, three failure modes. Mitigate via Nautilus’s adapter pattern (one DataClient per vendor; MessageBus topics keep callsites clean).
  • OPRA license semantics. OPRA historical data is generally fine for research; live OPRA has redistribution rules. Solo paper-trading use is in-bounds; anything customer-facing later requires reading Databento’s terms.
  • Latency reality check. Databento live is ~5–20ms region-dependent vs IBKR ~50–200ms. Only matters if Cortana’s reaction-time budget is feed-bound, not signal-compute-bound. Today: signal compute dominates.
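The vendor-count mitigation in the first caveat relies on strategies subscribing to topics rather than to vendors. The toy below illustrates that shape only; `ToyBus` and `VendorClient` are hypothetical stand-ins and do not reproduce Nautilus's MessageBus or DataClient APIs.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class ToyBus:
    """Minimal topic-based pub/sub; a stand-in, not Nautilus's MessageBus."""

    def __init__(self) -> None:
        self._subs: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subs[topic].append(handler)

    def publish(self, topic: str, msg: dict) -> None:
        for handler in self._subs[topic]:
            handler(msg)


class VendorClient:
    """One client per vendor; each publishes onto shared, vendor-neutral topics."""

    def __init__(self, vendor: str, bus: ToyBus) -> None:
        self.vendor = vendor
        self.bus = bus

    def emit(self, kind: str, payload: dict) -> None:
        self.bus.publish(f"data.{kind}", {"vendor": self.vendor, **payload})
```

A strategy that subscribes to `data.flow` never imports vendor code, so auth, rate limits, and failure handling stay inside each client.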

Open threads

  • Which Databento schema(s) does Cortana actually need for replay? Likely OPRA Trades (every option print) + MBP-1 (top-of-book on the chain) for starters. MBO (full L3) is overkill until adversarial backtest framework is built. Decide during spike Step 0.5.
  • How does UW’s flow_score correlate with raw-OPRA-derived sweep/block detection? Unknown until cross-validation runs.
  • Multi-tenant cost model: at 1000 customers, do we share one Databento historical dataset (single team account, internal redistribution) or require each customer to BYO Databento credentials? Affects MK3 SaaS unit economics.
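The cross-validation thread (raw tape diffed against UW alerts) could start from a matcher like the one below. It is a sketch under stated assumptions: events are reduced to `(contract, timestamp_ns)` pairs, a match means the same contract within a fixed time window, and sweep/block detection on the raw tape already exists upstream. The function name and window are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict

Event = tuple[str, int]  # (contract identifier, timestamp in ns)


def missed_by_uw(
    tape_events: list[Event], uw_alerts: list[Event], window_ns: int
) -> list[Event]:
    """Return tape events with no UW alert on the same contract within window_ns."""
    alerts_by_contract: dict[str, list[int]] = defaultdict(list)
    for contract, ts in uw_alerts:
        alerts_by_contract[contract].append(ts)
    for stamps in alerts_by_contract.values():
        stamps.sort()

    missed = []
    for contract, ts in tape_events:
        stamps = alerts_by_contract.get(contract, [])
        # First alert at or after the start of the window, if any.
        i = bisect_left(stamps, ts - window_ns)
        if i == len(stamps) or stamps[i] > ts + window_ns:
            missed.append((contract, ts))
    return missed
```

Swapping the two arguments gives the reverse diff: UW alerts that the tape-derived detector did not reproduce.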

See also

  • concepts/databento-account-setup.md - operator reference: API key path, schema names, calc-before-pull discipline
  • concepts/nautilus-integrations.md - Databento as data-only provider; UW custom-adapter sketch; IBKR OOB adapter
  • concepts/nautilus-data.md - ParquetDataCatalog, Databento → catalog pipeline, custom-data types for UW/scoring
  • concepts/nautilus-options.md - Databento as docs-blessed OPRA replay vendor
  • concepts/nautilus-tutorials.md - “Data Catalog with Databento” tutorial entry
  • concepts/nautilus-how-to.md - “Loading External Data” + “Data Catalog with Databento” recipes (both 404 as of 2026-05-06; verify on Saturday)
  • ~/.claude/projects/.../memory/feedback_ibkr_pricing_source.md - IBKR-pricing-of-record invariant (preserved by this layering)
  • plans/2026-05-09-nautilus-spike.md - Step 0.5 Databento validation pull

Timeline

  • 2026-05-07 | Cody - Filed during pre-spike vendor-layering decision; Cody asked “should I sign up for Databento?” Decision: augment, not replace; start on $125 credits + usage-based.

  • 2026-05-07 | Cody - Committed: signing up for Databento. Will use $125 free credits to fund Step 0.5 Databento validation Saturday 2026-05-09. Spike playbook at ~/brain/concepts/nautilus-databento.md. Two blocking unknowns to resolve first thing Saturday: (1) confirm OPRA dataset code (OPRA.PILLAR is the assumption - run databento datasets to verify), (2) check OPRA venue MIC tagging vs IBKR .ARCA listing - re-stamp during catalog ingest if mismatched.

  • 2026-05-07 | Cody - Asked whether Databento’s options + order-book capabilities (0DTE examples, IV estimation, MBP-10/MBO reconstruction) could displace UW. Resolution: yes long-term, no for V1. Decision: Databento stays “backtest only” through M2; paid tiers (Standard at $199/mo, Plus at $1,399/mo for live tape) deferred to the M3-M4 transition, when Cortana has authored its own signal classification on raw OPRA. The dealer-flow/order-book angle is the load-bearing reason to ever subscribe to live Databento - replacing UW’s interpretation (signal classification, dealer aggregates, GEX/charm/vanna) with own-built classifiers is months of derivatives-quant work, not a side quest. Filed as future research direction; do not let it leak into V1 scope.

  • 2026-05-07 (late evening) | Cody - Purchased UW data shop bundle for SPY: 10 datasets, $1,531 total, all parquet, all single-ticker SPY (Market Tide is market-wide). Corrects my earlier wrong claim that “UW historical = REST-only and shallow.” Actual UW historical product shape is one-time-priced parquet bundles in the data shop, mostly cheap, with depth varying by dataset.

    Inventory (Order ID 8158d271-4582-4&a8-ae50-9bf6033316a):

    | Dataset | URL slug | Granularity | Window | Price | Schema highlights |
    | --- | --- | --- | --- | --- | --- |
    | Market Tide | /data_shop/market_tide | 1-minute, market-wide | 1y | $500 | date / net_call_premium / net_put_premium / net_volume / timestamp |
    | Big Option Trades | /data_shop/big_option_trades | per-trade (>$25k or >150 contracts) | 1y | $180 | executed_at / nbbo_bid / nbbo_ask / size / price / option_chain_id / expiry / option_type / open_interest / strike / premium / underlying_price / ewma_nbbo_bid / ewma_nbbo_ask / volume / iv / delta / theta / gamma / vega / rho / theo / market_center_locate / canceled / trade_id / exchange |
    | Net Flow | (TBD) | TBD | 1y | $500 | schema unverified at purchase; inspect parquet |
    | Option Chains | /data_shop/option_chains | per-strike chain snapshots | 1y | $180 | (assumed: strike, side, expiry, OI, volume, Greeks) |
    | Net Flow Holdings | /data_shop/net_flow_holdings | 1-minute SPY-specific | 90d only | $135 | date / net_call_premium / net_put_premium / net_volume / timestamp / underlying_price |
    | Gamma Exposure | /data_shop/greek_exposure (named gamma_exposure) | daily SPY | 1y | $10 | date / put_gex / call_gex / net_gex / put_call_gex_ratio |
    | Delta Exposure | (parallel delta_exposure SKU) | daily SPY | 1y | $10 | (assumed: date / put_dex / call_dex / net_dex) |
    | IV Rank | (parallel iv_rank SKU) | daily SPY | 1y | $10 | (assumed: date / iv_rank / iv_percentile) |
    | OHLC Daily | (parallel ohlc_daily SKU) | daily SPY | 5y | $3 | OHLCV |
    | OHLC Intraday | (parallel ohlc SKU) | intraday SPY | 1y | $3 | OHLCV |

    Key findings:

    1. Big Option Trades is the load-bearing dataset for backtest. Per-trade with NBBO + full Greeks + OI but NOT pre-classified - sweep / block / OTM / opening / aggressor_side / flow_score are derived at backtest time from the raw schema (~150 LOC of derivation logic, validated against MK2’s live flow-alerts labels at target ≥85% agreement). This is structurally the MK3→MK4 “displace UW interpretation” thread executed early at $180 instead of months of derivatives-quant work to rebuild from raw OPRA. Strategic upside: we own the heuristic and can tune it.

    2. Pricing pattern: UW prices by (granularity × scope × window). Per-trade > per-minute > daily. Full-market > single-ticker. 1y > 90d > 30d. Daily-aggregated single-ticker products are essentially free ($3-$10); minute-granularity products run $135-$500. Per-trade single-ticker is the most expensive but still bounded ($180).

    3. Net Flow Holdings covers only 90d of minute-granularity SPY flow with underlying_price. This limits the backtest window for features that depend on it. Acceptable; the load-bearing flow signal for 1Y is Big Option Trades.

    4. Daily GEX/DEX provide regime classification, NOT intra-day regime gating. MK2’s live spot-exposures endpoint is intra-day 1-min; data shop daily product gives only end-of-day snapshots. For V1 backtest, daily GEX is good enough for chop-vs-trend regime classification per day, which IS what the post-stabilization scoring engine review needs. Intra-day GEX SKU may exist; if found, future purchase.

    5. The MK3→MK4 trajectory shifts. Earlier I framed Databento Plus tier ($1,399/mo) [...] $180 one-time, the displacement happens in the historical bundle: we develop the classification heuristics on this 1Y of UW-pre-filtered raw trades, then either (a) keep buying yearly UW updates [...] $1,399/mo cost has a clearer path-to-payback now.
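Finding 1 describes deriving labels such as aggressor_side from the raw Big Option Trades schema and validating them against MK2's live labels at ≥85% agreement. The sketch below shows one small piece of that, using the `price`, `nbbo_bid`, and `nbbo_ask` fields from the inventory. The at-or-through-the-quote rule is a common heuristic chosen here for illustration; it is an assumption, not UW's method or the planned ~150 LOC derivation.

```python
def aggressor_side(price: float, nbbo_bid: float, nbbo_ask: float) -> str:
    """Label a print by where it traded relative to the NBBO (simple heuristic)."""
    if price >= nbbo_ask:
        return "buy"   # traded at or above the ask: buyer likely initiated
    if price <= nbbo_bid:
        return "sell"  # traded at or below the bid: seller likely initiated
    return "mid"       # inside the spread: ambiguous under this rule


def agreement_rate(derived: list[str], reference: list[str]) -> float:
    """Fraction of positions where derived labels equal reference labels."""
    if not derived or len(derived) != len(reference):
        raise ValueError("label lists must be non-empty and the same length")
    matches = sum(d == r for d, r in zip(derived, reference))
    return matches / len(derived)


AGREEMENT_TARGET = 0.85  # validation threshold stated in finding 1
```

Comparing `agreement_rate(...)` against `AGREEMENT_TARGET` per label type (sweep, block, aggressor side) would show which heuristics need tuning before the backtest relies on them.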