Databento vs UW vs IBKR - Cortana data-feed layering
Decision: for Cortana MK3, augment with Databento; do not replace UW or IBKR. Each vendor occupies a distinct role: IBKR for execution + pricing-of-record, UW for derived options-flow signals, Databento for historical-OPRA replay (and optionally raw live OPRA later). The Nautilus stack ties all three through one `DataEngine→Cache→MessageBus` pipeline; Databento has an OOB Nautilus adapter, UW remains a custom `DataClient`, and IBKR is built-in.

Tier recommendation: start on the $125 free credits + usage-based; Standard ($199/mo) only after deciding you want a continuous historical dataset; Plus ($1,399/mo) only if/when going live with Databento for paper-or-real flow. Cortana’s near-term need is one focused historical pull for spike validation - credits cover it.
Role layering (the load-bearing decision)
| Concern | Vendor | Nautilus expression |
|---|---|---|
| Execution + pricing-of-record | IBKR | Built-in InteractiveBrokersExecClient + InteractiveBrokersDataClient |
| Live underlying ticks (SPY) | IBKR | Built-in QuoteTick / Bar from IBKR DataClient |
| Live options flow signals (sweeps/blocks/GEX/charm/vanna/dealer pos) | UW | Custom UWFlowAlert(Data) via custom UWDataClient (sketched in nautilus-integrations.md) |
| Historical OPRA tape for backtest replay | Databento | OOB DatabentoDataLoader → ParquetDataCatalog (per nautilus-options.md § “docs-blessed vendor for replay”) |
| Full L2/L3 options depth for adversarial backtest | Databento (MBP-10 / MBO schemas) | Native Nautilus types via Databento adapter |
| UW alert cross-validation (catch missed/dropped flow) | Databento | Reconstruct raw tape, diff against UW alerts |
The pricing-source-of-truth invariant (feedback_ibkr_pricing_source.md)
is preserved: IBKR remains the only vendor whose price string lands on an
order. Databento is replay + audit; UW is signals only.
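The cross-validation row in the table (reconstruct raw tape, diff against UW alerts) could be approached roughly as follows. This is a minimal sketch, assuming both sides have already been reduced to (contract symbol, UNIX-nanosecond timestamp) pairs. The record shape, the `find_unmatched` name, and the 1-second tolerance are illustrative assumptions, not part of the UW or Databento schemas.

```python
from bisect import bisect_left
from collections import defaultdict

NS_PER_SECOND = 1_000_000_000


def find_unmatched(reference, candidates, tolerance_ns=NS_PER_SECOND):
    """Return reference events with no candidate event on the same contract
    within +/- tolerance_ns.

    Both inputs are iterables of (contract_symbol, ts_ns) tuples.
    """
    # Index candidate timestamps per contract, sorted for binary search.
    by_contract = defaultdict(list)
    for symbol, ts_ns in candidates:
        by_contract[symbol].append(ts_ns)
    for stamps in by_contract.values():
        stamps.sort()

    unmatched = []
    for symbol, ts_ns in reference:
        stamps = by_contract.get(symbol, [])
        # First candidate timestamp at or after the start of the window.
        i = bisect_left(stamps, ts_ns - tolerance_ns)
        if i == len(stamps) or stamps[i] > ts_ns + tolerance_ns:
            unmatched.append((symbol, ts_ns))
    return unmatched
```

Calling `find_unmatched(tape_prints, uw_alerts)` lists prints that UW never alerted on; swapping the arguments lists alerts with no supporting print on the tape.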
Why “augment, not replace”
- Replacing UW with raw OPRA = 3-6 months of derivatives-quant work. UW pre-derives `flow_score`, sweep/block classification, GEX/charm/vanna aggregates, dark pool prints, dealer positioning. Databento gives raw prints. Rebuilding UW’s signal layer is the project, not a side quest.
- Replacing IBKR is impossible. Databento is data-only; orders go nowhere through it. (And IBKR is the OOB Nautilus execution adapter.)
- Adding Databento has a clear marginal win. Backtest replay fidelity is the gating need for the 80% win-rate mandate - `decisions.db` rows can be replayed through `ParquetDataCatalog` (per `nautilus-data.md` § “Replaying today’s `decisions.db` rows as Nautilus events”) and adversarial scenarios (what UW missed, what dealer flow looked like at the climax) require the raw OPRA tape Databento ships.
Pricing tiers (verified 2026-05-07)
| Tier | Price | What’s included | Cortana fit |
|---|---|---|---|
| Free trial | $125 credits, 6 mo | Pay-as-you-go burndown | Use this first - covers spike-step historical pull |
| Usage-based | $/GB (no monthly) | Pay only what you use; OPRA + CME + NYSE + NASDAQ same $/GB rate | Default ongoing tier post-credits - until consumption is known |
| Standard | $199/mo | 12 mo L1 history, 1 mo L2/L3 then PAYG | Worth it once you commit to continuous historical backfills |
| Plus | $1,399/mo (annual) | Standard + entire L1 history + live data | Only if going live with Databento as live tick source |
| Unlimited | $3,500/mo (annual) | Entire history, all schemas | Overkill for solo MK3 |
Notes:
- All exchanges (CME, OPRA, NYSE, NASDAQ) are priced at the same $/GB rate - no exchange-tier surcharge on usage-based.
- Live data carries a separate license-fee pass-through from publishers AND per-message billing on some datasets. Usage-based live was scheduled for deprecation on 2025-03-31, so live data effectively requires Plus or higher.
- One free-credit set per team.
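The tier choice comes down to simple arithmetic once a $/GB rate is known. The sketch below shows the calculation only: the $125 and $199 figures come from the table, while the per-GB rate is deliberately left as a required argument because it is not stated here. The function names are hypothetical, and the break-even ignores the Standard tier’s history-window limits.

```python
FREE_CREDITS_USD = 125.0   # free-trial credits from the tier table
STANDARD_FEE_USD = 199.0   # Standard tier monthly fee from the tier table


def pull_cost_usd(size_gb, usd_per_gb):
    """Usage-based cost of one historical pull."""
    if size_gb < 0 or usd_per_gb < 0:
        raise ValueError("size_gb and usd_per_gb must be non-negative")
    return size_gb * usd_per_gb


def credits_left_usd(size_gb, usd_per_gb, credits_usd=FREE_CREDITS_USD):
    """Credits remaining after a pull; negative means the pull overruns them."""
    return credits_usd - pull_cost_usd(size_gb, usd_per_gb)


def breakeven_gb_per_month(usd_per_gb, flat_fee_usd=STANDARD_FEE_USD):
    """Monthly volume at which usage-based spend equals the flat fee."""
    if usd_per_gb <= 0:
        raise ValueError("usd_per_gb must be positive")
    return flat_fee_usd / usd_per_gb
```

Plugging in the real rate from the Databento cost estimate before each pull gives both the credit burn for the spike and the monthly volume above which Standard becomes the cheaper option.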
Spike validation (one-day pull)
Per the nautilus-spike plan, the validation pattern is:
- Sign up, claim $125 credits.
- Pull one day of SPY OPRA in MBP-1 + trades schemas via Databento HTTP API or CLI (target: < 1 GB, well within credit budget).
- Use Nautilus’s `DatabentoDataLoader` (per `nautilus-tutorials.md` § “Data Catalog with Databento”) to ingest into a local `ParquetDataCatalog`.
- Run a `BacktestRunConfig` over the catalog with the Cortana stub strategy. Confirm `QuoteTick` / `TradeTick` materialize correctly and timestamps align with `decisions.db` rows for the same day.
If the materialization + replay works in <90 min, the Databento path is proven. Defer live subscription decision until backtest value is concrete.
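The final check in the list (timestamps aligning with `decisions.db` rows) can be done mechanically. The sketch below assumes tick timestamps are UNIX nanoseconds, as Nautilus `ts_event` values are. It also assumes the decision timestamps have already been read out of `decisions.db` as ISO-8601 strings carrying an explicit UTC offset. That column format, the helper names, and the 1-second tolerance are all assumptions.

```python
from bisect import bisect_left
from datetime import datetime

NS_PER_SECOND = 1_000_000_000
EPOCH = datetime.fromisoformat("1970-01-01T00:00:00+00:00")


def iso_to_ns(iso_ts):
    """Convert an ISO-8601 timestamp with a UTC offset to UNIX nanoseconds."""
    dt = datetime.fromisoformat(iso_ts)
    if dt.tzinfo is None:
        raise ValueError(f"timestamp has no UTC offset: {iso_ts!r}")
    # Integer arithmetic avoids float rounding at nanosecond scale.
    delta = dt - EPOCH
    seconds = delta.days * 86_400 + delta.seconds
    return seconds * NS_PER_SECOND + delta.microseconds * 1_000


def nearest_gap_ns(sorted_tick_ns, target_ns):
    """Absolute distance from target_ns to the closest tick timestamp."""
    if not sorted_tick_ns:
        raise ValueError("no ticks to compare against")
    i = bisect_left(sorted_tick_ns, target_ns)
    # Only the ticks on either side of the insertion point can be closest.
    neighbours = sorted_tick_ns[max(i - 1, 0): i + 1]
    return min(abs(t - target_ns) for t in neighbours)


def misaligned_decisions(tick_ns, decision_iso, tolerance_ns=NS_PER_SECOND):
    """Return decision timestamps with no tick within tolerance_ns."""
    ticks = sorted(tick_ns)
    return [ts for ts in decision_iso
            if nearest_gap_ns(ticks, iso_to_ns(ts)) > tolerance_ns]
```

An empty result from `misaligned_decisions` on the replayed day is the pass condition; a non-empty one points at a timezone or clock-source mismatch to chase before trusting the replay.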
Caveats
- Vendor count creep. Three feeds = three auth surfaces, three rate limits, three failure modes. Mitigate via Nautilus’s adapter pattern (one `DataClient` per vendor; `MessageBus` topics keep callsites clean).
- OPRA license semantics. OPRA historical data is generally fine for research; live OPRA has redistribution rules. Solo paper-trading use is in-bounds; anything customer-facing later requires reading Databento’s terms.
- Latency reality check. Databento live is ~5–20ms region-dependent vs IBKR ~50–200ms. Only matters if Cortana’s reaction-time budget is feed-bound, not signal-compute-bound. Today: signal compute dominates.
Open threads
- Which Databento schema(s) does Cortana actually need for replay? Likely OPRA Trades (every option print) + MBP-1 (top-of-book on the chain) for starters. MBO (full L3) is overkill until adversarial backtest framework is built. Decide during spike Step 0.5.
- How does UW’s `flow_score` correlate with raw-OPRA-derived sweep/block detection? Unknown until cross-validation runs.
- Multi-tenant cost model: at 1000 customers, do we share one Databento historical dataset (single team account, internal redistribution) or require each customer to BYO Databento credentials? Affects MK3 SaaS unit economics.
See also
- `concepts/databento-account-setup.md` - operator reference: API key path, schema names, calc-before-pull discipline
- `concepts/nautilus-integrations.md` - Databento as data-only provider; UW custom-adapter sketch; IBKR OOB adapter
- `concepts/nautilus-data.md` - `ParquetDataCatalog`, Databento → catalog pipeline, custom-data types for UW/scoring
- `concepts/nautilus-options.md` - Databento as docs-blessed OPRA replay vendor
- `concepts/nautilus-tutorials.md` - “Data Catalog with Databento” tutorial entry
- `concepts/nautilus-how-to.md` - “Loading External Data” + “Data Catalog with Databento” recipes (both 404 as of 2026-05-06; verify on Saturday)
- `~/.claude/projects/.../memory/feedback_ibkr_pricing_source.md` - IBKR-pricing-of-record invariant (preserved by this layering)
- `plans/2026-05-09-nautilus-spike.md` - Step 0.5 Databento validation pull
Timeline
- 2026-05-07 | Cody - Filed during pre-spike vendor-layering decision; Cody asked “should I sign up for Databento?” Decision: augment, not replace; start on $125 credits + usage-based.
- 2026-05-07 | Cody - Committed: signing up for Databento. Will use $125 free credits to fund Step 0.5 Databento validation Saturday 2026-05-09. Spike playbook at `~/brain/concepts/nautilus-databento.md`. Two blocking unknowns to resolve first thing Saturday: (1) confirm OPRA dataset code (`OPRA.PILLAR` is the assumption - run `databento datasets` to verify), (2) check OPRA venue MIC tagging vs IBKR `.ARCA` listing - re-stamp during catalog ingest if mismatched.
- 2026-05-07 | Cody - Asked whether Databento’s options + order-book capabilities (0DTE examples, IV estimation, MBP-10/MBO reconstruction) could displace UW. Resolution: yes long-term, no for V1. Decision: Databento stays “backtest only” through M2; Standard tier ($199/mo) or Plus ($1,399/mo for live tape) deferred to the M3-M4 transition, when Cortana has authored its own signal classification on raw OPRA. The dealer-flow/order-book angle is the load-bearing reason to ever subscribe to live Databento - replacing UW’s interpretation (signal classification, dealer aggregates, GEX/charm/vanna) with own-built classifiers is months of derivatives-quant work, not a side quest. Filed as future research direction; do not let it leak into V1 scope.
- 2026-05-07 (late evening) | Cody - Purchased UW data shop bundle for SPY: 10 datasets, $1,531 total, all parquet, all single-ticker SPY (Market Tide is market-wide). Corrects my earlier wrong claim that “UW historical = REST-only and shallow.” Actual UW historical product shape is one-time-priced parquet bundles in the data shop, mostly cheap, with depth varying by dataset.

Inventory (Order ID 8158d271-4582-4&a8-ae50-9bf6033316a):

| Dataset | URL slug | Granularity | Window | Price | Schema highlights |
|---|---|---|---|---|---|
| Market Tide | /data_shop/market_tide | 1-minute, market-wide | 1y | $500 | date / net_call_premium / net_put_premium / net_volume / timestamp |
| Big Option Trades | /data_shop/big_option_trades | per-trade (>$25k or >150 contracts) | 1y | $180 | executed_at / nbbo_bid / nbbo_ask / size / price / option_chain_id / expiry / option_type / open_interest / strike / premium / underlying_price / ewma_nbbo_bid / ewma_nbbo_ask / volume / iv / delta / theta / gamma / vega / rho / theo / market_center_locate / canceled / trade_id / exchange |
| Net Flow | (TBD) | TBD | 1y | $500 | schema unverified at purchase; inspect parquet |
| Option Chains | /data_shop/option_chains | per-strike chain snapshots | 1y | $180 | (assumed: strike, side, expiry, OI, volume, Greeks) |
| Net Flow Holdings | /data_shop/net_flow_holdings | 1-minute SPY-specific | 90d only | $135 | date / net_call_premium / net_put_premium / net_volume / timestamp / underlying_price |
| Gamma Exposure | /data_shop/greek_exposure (named gamma_exposure) | daily SPY | 1y | $10 | date / put_gex / call_gex / net_gex / put_call_gex_ratio |
| Delta Exposure | (parallel delta_exposure SKU) | daily SPY | 1y | $10 | (assumed: date / put_dex / call_dex / net_dex) |
| IV Rank | (parallel iv_rank SKU) | daily SPY | 1y | $10 | (assumed: date / iv_rank / iv_percentile) |
| OHLC Daily | (parallel ohlc_daily SKU) | daily SPY | 5y | $3 | OHLCV |
| OHLC Intraday | (parallel ohlc SKU) | intraday SPY | 1y | $3 | OHLCV |

Key findings:
- Big Option Trades is the load-bearing dataset for backtest. Per-trade with NBBO + full Greeks + OI but NOT pre-classified - sweep / block / OTM / opening / aggressor_side / flow_score are derived at backtest time from the raw schema (~150 LOC of derivation logic, validated against MK2’s live `flow-alerts` labels at target ≥85% agreement). This is structurally the MK3→MK4 “displace UW interpretation” thread executed early at $180 instead of months of derivatives-quant work to rebuild from raw OPRA. Strategic upside: we own the heuristic and can tune it.
- Pricing pattern: UW prices by (granularity × scope × window). Per-trade > per-minute > daily. Full-market > single-ticker. 1y > 90d > 30d. Daily-aggregated single-ticker products are essentially free ($3-$10); per-minute products run $135-$500. Per-trade single-ticker is the most expensive but still bounded ($180).
- Net Flow Holdings only 90d - minute-granularity SPY flow with `underlying_price`. Limits backtest window for features that depend on it. Acceptable; load-bearing flow signal for 1Y is Big Option Trades.
- Daily GEX/DEX provide regime classification, NOT intra-day regime gating. MK2’s live `spot-exposures` endpoint is intra-day 1-min; the data shop daily product gives only end-of-day snapshots. For V1 backtest, daily GEX is good enough for chop-vs-trend regime classification per day, which IS what the post-stabilization scoring engine review needs. An intra-day GEX SKU may exist; if found, future purchase.
- The MK3→MK4 trajectory shifts. Earlier I framed Databento Plus tier ([...] $180 one-time, the displacement happens in the historical bundle: we develop the classification heuristics on this 1Y of UW-pre-filtered raw trades, then either (a) keep buying yearly UW updates ([...] $1,399/mo cost has a clearer path-to-payback now.
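
The derivation described in the Big Option Trades finding could start from something like the following. This is a minimal sketch, not the ~150 LOC heuristic itself. It applies the standard quote rule (a trade at or above the ask is buyer-initiated, at or below the bid is seller-initiated) to the `price` / `nbbo_bid` / `nbbo_ask` fields from the inventory schema, and adds an agreement-rate helper for the ≥85% target. The `Trade` class, the `"mid"` label, and the function names are assumptions, as is the shape of the MK2 reference labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trade:
    # Field names follow the Big Option Trades schema listed in the inventory.
    price: float
    nbbo_bid: float
    nbbo_ask: float


def aggressor_side(trade):
    """Quote-rule classification: 'buy', 'sell', or 'mid' inside the spread."""
    if trade.price >= trade.nbbo_ask:
        return "buy"
    if trade.price <= trade.nbbo_bid:
        return "sell"
    return "mid"


def agreement_rate(derived, reference):
    """Fraction of positions where derived labels equal reference labels."""
    if len(derived) != len(reference):
        raise ValueError("label sequences must be the same length")
    if not derived:
        raise ValueError("no labels to compare")
    matches = sum(d == r for d, r in zip(derived, reference))
    return matches / len(derived)
```

Comparing `agreement_rate(derived_labels, mk2_labels)` against 0.85 per label type (aggressor side, sweep, block) keeps a weak heuristic on one dimension from hiding behind a strong one on another.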
-