Databento account setup - Cortana operator reference
One-page operator reference for the Databento account that funds Cortana MK3’s historical-OPRA replay (per
databento-vs-uw-vs-ibkr-data-feeds.md). Captures sign-up shape, credential storage, schema names you’ll actually request, and the calc-before-pull discipline that keeps the $125 trial credits from disappearing on a fat-finger.
Account shape
- Plan: free tier - $125 credits, 6-month expiration, one set per team.
- Tier ladder (don’t auto-upgrade): Usage-based → Standard ($1,399/mo annual) → Unlimited ($3,500/mo annual). See `databento-vs-uw-vs-ibkr-data-feeds.md` § “Pricing tiers” for the upgrade trigger map.
- Exchange agreements: OPRA requires a click-through agreement on sign-up. Personal/research use is in-bounds; redistribution (anything customer-facing under an MK3 SaaS) needs a separate license.
Credential storage
- API key lives in the account dashboard. Generate one named `cortana-mk3` so the audit trail is obvious.
- Storage path: `~/.config/cortana/databento.env` - single line `DATABENTO_API_KEY=...`. Never commit, never put in repo `.env`.
- Loader pattern: `source ~/.config/cortana/databento.env` from launchd preflight or shell rc, mirroring the IBKR / UW key pattern.
- Rotation: treat like any vendor key - rotate if a workspace is archived to iCloud (per `project_data_loss_april22.md`) or shared with another machine.
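For a Python process that starts outside a shell that has sourced the file, the same one-line env file can be read directly. This is a minimal sketch, assuming the single-line `DATABENTO_API_KEY=...` layout described above; `read_databento_key` is a hypothetical helper, not part of the Databento SDK.

```python
from pathlib import Path

# Storage path from the list above.
ENV_PATH = Path.home() / ".config" / "cortana" / "databento.env"


def read_databento_key(path: Path = ENV_PATH) -> str:
    """Return the API key from a KEY=value env file (hypothetical helper)."""
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        if line.startswith("export "):
            line = line[len("export "):]  # tolerate shell-style export lines
        name, sep, value = line.partition("=")
        if sep and name.strip() == "DATABENTO_API_KEY":
            value = value.strip().strip("'\"")  # drop optional quotes
            if value:
                return value
    raise RuntimeError(f"DATABENTO_API_KEY not found in {path}")
```

Failing loudly when the key is missing keeps a misconfigured launchd job from reaching the vendor with an empty credential.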
Python SDK
# Direct SDK (for ad-hoc pulls / spike validation)
pip install databento
# Nautilus adapter (brings the SDK in transitively as a dep)
uv pip install "nautilus_trader[databento]"
The Nautilus adapter wraps the SDK; once `nautilus_trader` is installed with the `databento` extra, ad-hoc SDK calls and Nautilus `DatabentoDataLoader` ingest both work from the same env.
Schemas Cortana will actually use
Don’t request anything else during the spike - every additional schema burns credits.
| Schema | What it is | Cortana use |
|---|---|---|
| `trades` | Every option print (executions) | UW alert cross-validation; raw flow reconstruction |
| `mbp-1` | Top-of-book quote on the chain (best bid/ask + size) | Adverse-selection check for replay; quote at decision time |
| `mbp-10` | 10-deep book on each side | Adversarial backtest only - defer until v2 |
| `mbo` | Full L3 market-by-order | Overkill - defer indefinitely unless we want HFT-grade replay |
| `ohlcv-1s` / `ohlcv-1m` | Aggregated bars | Skip - Nautilus aggregates from trades if needed |
Default spike pull: SPY (or SPY chain) - trades + mbp-1, single
trading day, OPRA dataset.
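The table can be turned into an allowlist so a spike script refuses anything beyond `trades` and `mbp-1`. This is a sketch only; `SPIKE_SCHEMAS`, `DEFERRED_SCHEMAS`, and `check_spike_schemas` are hypothetical names, not Databento SDK objects.

```python
# Schemas approved for the spike, per the table above.
SPIKE_SCHEMAS = frozenset({"trades", "mbp-1"})
# Schemas the table defers or skips; listed only to give a clearer error.
DEFERRED_SCHEMAS = frozenset({"mbp-10", "mbo", "ohlcv-1s", "ohlcv-1m"})


def check_spike_schemas(requested):
    """Return the requested schemas as a list, or raise if any is off the allowlist."""
    requested = list(requested)
    rejected = [s for s in requested if s not in SPIKE_SCHEMAS]
    if rejected:
        deferred = [s for s in rejected if s in DEFERRED_SCHEMAS]
        unknown = [s for s in rejected if s not in DEFERRED_SCHEMAS]
        raise ValueError(
            f"not allowed during the spike: deferred={deferred}, unknown={unknown}"
        )
    return requested
```

Calling `check_spike_schemas(["trades", "mbp-1"])` returns the list unchanged, while including `"mbo"` raises before any request is built.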
Calc-before-pull discipline (load-bearing)
Always estimate cost before kicking off a pull. The $125 trial budget disappears fast if you accidentally request a month of MBO across the whole chain.
# Cost preview pattern (Databento HTTP client)
import os

from databento import Historical

client = Historical(key=os.environ["DATABENTO_API_KEY"])
cost = client.metadata.get_cost(
    dataset="OPRA.PILLAR",
    schema="trades",
    symbols=["SPY.OPT"],  # parent symbology expects ROOT.OPT, not the bare ticker
    stype_in="parent",  # match parent symbol → all option contracts
    start="2026-05-06T13:30:00Z",
    end="2026-05-06T20:00:00Z",
)
print(f"Estimated cost: ${cost:.2f}")
# ABORT if > $30 for the spike validation
Hard rules:
- Always call `get_cost()` (or the equivalent CLI dry-run) first.
- Never request more than 1 trading day for the spike.
- Hard cap: abort any single pull >$30. Re-scope and try again.
- Use `stype_in="parent"` to symbol-resolve the whole option chain at once instead of enumerating every strike.
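The hard rules can be enforced in code rather than by memory. The sketch below wraps a pull behind a cost and window check; `guarded_pull`, `MAX_COST_USD`, and the injected `estimate_cost` / `pull` callables are hypothetical names, and the "one trading day" rule is approximated by requiring start and end to fall on the same UTC date (an assumption that holds for the US session, which does not cross midnight UTC).

```python
from datetime import datetime, timezone

MAX_COST_USD = 30.0  # hard cap per pull, from the rules above


def _parse_utc(ts: str) -> datetime:
    """Parse an ISO-8601 timestamp with an offset or trailing Z into UTC."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00")).astimezone(timezone.utc)


def guarded_pull(estimate_cost, pull, *, start: str, end: str, **request):
    """Call `pull` only if the window and the estimated cost pass the spike rules."""
    t0, t1 = _parse_utc(start), _parse_utc(end)
    if not t0 < t1:
        raise ValueError("end must be after start")
    if t0.date() != t1.date():
        raise ValueError("window crosses a UTC date; spike pulls cover one trading day")
    # Always estimate first; the pull is never reached if this exceeds the cap.
    cost = float(estimate_cost(start=start, end=end, **request))
    if cost > MAX_COST_USD:
        raise RuntimeError(f"estimated ${cost:.2f} exceeds ${MAX_COST_USD:.2f}; re-scope")
    return pull(start=start, end=end, **request)
```

With the real client, `estimate_cost` would presumably be `client.metadata.get_cost` and `pull` would be `client.timeseries.get_range`, passed the same `dataset`, `schema`, `symbols`, and `stype_in` keyword arguments. Injecting both callables keeps the guard testable without spending credits.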
Identifiers / publisher conventions
- Dataset code: `OPRA.PILLAR` for US equity options (the consolidated options tape).
- Symbology: Databento ships several `stype_in` modes - `raw_symbol` (OPRA underlying option symbol), `instrument_id` (Databento numeric ID), `parent` (symbol-resolves to whole chain). The spike uses `parent` for SPY, requested as `SPY.OPT` (parent symbols take the ROOT.OPT form).
- Nautilus instrument-id mapping: `SPY.ARCA` (per `nautilus-data.md`) on the Nautilus side. Databento’s adapter handles the translation; if things look misaligned in the catalog, this is where to look.
- Timestamps: Databento ships nanosecond UNIX epoch - matches Nautilus `ts_event` natively. No `* 1e6` adjustment needed (unlike UW’s millisecond timestamps).
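The timestamp note comes down to one multiplication. The sketch below shows the millisecond-to-nanosecond scaling that UW-style timestamps need and a nanosecond-to-datetime conversion for eyeballing records; `ms_to_ns` and `ns_to_utc` are hypothetical helper names, and Python's `datetime` only keeps microsecond precision, so the conversion truncates the last three digits.

```python
from datetime import datetime, timezone

NS_PER_MS = 1_000_000
NS_PER_S = 1_000_000_000


def ms_to_ns(ts_ms: int) -> int:
    """Scale a millisecond epoch timestamp to nanoseconds (the * 1e6 adjustment)."""
    return ts_ms * NS_PER_MS


def ns_to_utc(ts_ns: int) -> datetime:
    """Convert a nanosecond epoch timestamp to a UTC datetime (microsecond precision)."""
    seconds, nanos = divmod(ts_ns, NS_PER_S)
    base = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return base.replace(microsecond=nanos // 1000)
```

Integer arithmetic is used throughout because a float cannot represent a current nanosecond epoch value exactly.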
Live data caveat
- Usage-based live data is being deprecated 2025-03-31 for most datasets.
- Live OPRA effectively requires Plus ($1,399/mo annual).
- Do not turn on live during the spike. Historical-only is the validated path.
Quick smoke test (before any real pull)
source ~/.config/cortana/databento.env
python -c "
import os
from databento import Historical
c = Historical(key=os.environ['DATABENTO_API_KEY'])
print(c.metadata.list_datasets()[:5])
"
# If this prints a list of datasets, the key works.
See also
- `concepts/databento-vs-uw-vs-ibkr-data-feeds.md` - the layering decision and tier upgrade triggers
- `concepts/nautilus-data.md` - `ParquetDataCatalog`, ingest pipeline
- `concepts/nautilus-tutorials.md` § “Data Catalog with Databento” - Nautilus-side ingest pattern
- `concepts/nautilus-options.md` § “docs-blessed vendor for replay” - why OPRA via Databento for options replay
- `plans/2026-05-09-nautilus-spike.md` § Step 0.5 - the validation pull this account funds
- `~/.claude/projects/.../memory/feedback_ibkr_pricing_source.md` - pricing-of-record invariant; Databento is replay/audit only
Timeline
- 2026-05-07 | Cody - Filed alongside the layering decision; account sign-up imminent. Captures pre-spike operator setup so Saturday’s Step 0.5 can run without re-deriving these conventions.