Leading-signal method families for 0DTE SPY (the timing-method gap)
The core reframe (this is the whole point)
The MK2 failure (“late and wrong, crushed on puts”) is not primarily a missing-feed problem. Codebase inventory 2026-05-18 shows MK2 already ingests ~19 feeds and computes ~24 “leading” methods. The failures are:
- Open interest is a stock; aggressive flow is a flow. The engine collapses the two - it reads a static 1–11k OI level as if it were flow. Volume-vs-OI (volume > OI = new positioning; volume ≪ OI = closing churn on the old stack) is the direct antidote and is under-used.
- GEX is computed but used as score points, not as a regime GATE. GEX is never directional - it is a mean-reversion-vs-momentum regime classifier. “Crushed on puts” is the textbook result of buying puts in a positive-gamma (dealer dip-buying) regime, where dealer hedging mean-reverts the move. Using GEX for directional score points is the category error.
- Entries fire at the climax because the exhaustion gate is only partial and there is no start-of-move (CUSUM) trigger - so the engine confirms late instead of detecting the start of the leg.
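The volume-vs-OI split from the first failure above can be sketched as a per-contract classifier. This is a minimal sketch that assumes today's volume and prior-close open interest are available per contract (e.g. from the UW feed). The `ContractActivity` and `classify_positioning` names and both ratio thresholds are hypothetical placeholders, not MK2 values.

```python
from dataclasses import dataclass
from typing import Literal

Positioning = Literal["new", "churn", "ambiguous"]


@dataclass(frozen=True)
class ContractActivity:
    """Hypothetical per-contract snapshot (names are illustrative, not MK2's)."""
    symbol: str
    volume: int         # contracts traded today: a flow
    open_interest: int  # contracts outstanding at the prior close: a stock


def classify_positioning(
    c: ContractActivity,
    new_ratio: float = 1.0,     # assumed: volume > OI  -> new positioning
    churn_ratio: float = 0.25,  # assumed: volume << OI -> churn on the old stack
) -> Positioning:
    """Label a contract by its volume/OI ratio instead of reading static OI as flow."""
    if c.open_interest <= 0:
        # No prior stock: any traded volume must be opening new positions.
        return "new" if c.volume > 0 else "ambiguous"
    ratio = c.volume / c.open_interest
    if ratio > new_ratio:
        return "new"
    if ratio < churn_ratio:
        return "churn"
    return "ambiguous"


# Illustrative contracts, not real data.
for contract in (
    ContractActivity("SPY-0DTE-P1", volume=12_000, open_interest=3_000),
    ContractActivity("SPY-0DTE-P2", volume=900, open_interest=11_000),
):
    print(contract.symbol, classify_positioning(contract))
# SPY-0DTE-P1 new
# SPY-0DTE-P2 churn
```

In this sketch, only `new` contracts would count as fresh positioning. A large static OI stack trading at a small fraction of its size lands in `churn` instead of being read as flow.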
So: fix the use of existing data before buying new data. See 2026-05-18-put-side-structural-loss-and-bypass-impulse, 2026-05-15-mk3-setup-hunter-architecture, 2026-05-14-five-corpus-rules-for-mk3.
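The GEX failure above (score points instead of a gate) can be sketched as a hard gate that never votes on direction. This is a minimal sketch that assumes MK2's existing GEX computation already yields a zero-gamma flip level. `Regime`, `EntryKind`, `Candidate`, and `gate_entry` are hypothetical names, and the spot/flip numbers are illustrative only.

```python
from dataclasses import dataclass
from enum import Enum


class Regime(Enum):
    MEAN_REVERT = "mean_revert"  # spot above the zero-gamma flip: positive dealer gamma
    MOMENTUM = "momentum"        # spot at/below the flip: negative dealer gamma


class EntryKind(Enum):
    CONTINUATION = "continuation"  # momentum/trend entry, e.g. a put into a selloff
    FADE = "fade"                  # mean-reversion entry against the move


@dataclass(frozen=True)
class Candidate:
    """Hypothetical entry proposed by the directional signal stack."""
    side: str  # "call" or "put": chosen upstream, never by GEX
    kind: EntryKind


def classify_regime(spot: float, zero_gamma_flip: float) -> Regime:
    """GEX classifies regime only; it carries no directional vote."""
    return Regime.MEAN_REVERT if spot > zero_gamma_flip else Regime.MOMENTUM


def gate_entry(c: Candidate, spot: float, zero_gamma_flip: float) -> bool:
    """Hard gate (not score points): block continuation entries in the mean-revert regime."""
    regime = classify_regime(spot, zero_gamma_flip)
    return not (regime is Regime.MEAN_REVERT and c.kind is EntryKind.CONTINUATION)


put_continuation = Candidate(side="put", kind=EntryKind.CONTINUATION)
print(gate_entry(put_continuation, spot=545.2, zero_gamma_flip=541.0))  # False
print(gate_entry(put_continuation, spot=538.7, zero_gamma_flip=541.0))  # True
```

The gate never reads `side`. The same rule would also block a call chasing a rip above the flip, which keeps GEX in its regime-only role.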
Method families: leads/lags, data cost, bars-vs-tick
| Family | Leads? | Data cost | Bars OK? | MK2 status |
|---|---|---|---|---|
| Intraday GEX / zero-gamma-flip regime gate | Regime (gates direction) | Low (have it) | Yes (min) | Computed, misused as score pts not a gate |
| Volume-climax + range-decel exhaustion VETO | Negative filter | Trivial (1-min) | Yes | Partial/weak (GH #84 pending) |
| CUSUM start-of-move trigger | Yes (leg start) | Trivial (1-min) | Yes | Absent (regime-change = discrete GEX flip only) |
| ES-leads-SPY (post-midday) / VIX1D / term | Yes (regime/size) | Low (have ES+VIX) | Yes | ES basis + VIX term present; VIX1D not isolated |
| NYSE TICK / $ADD breadth | Yes (breadth aggressor) | Low (standard retail) | 5–15s best | ABSENT - genuine cheap feed gap |
| Volume-vs-OI new-positioning filter | Yes (flow≠stock) | Low (have UW) | Yes | Under-used; the static-OI bug |
| Trade aggressor (Lee-Ready/tick rule) | Yes (~20-tick lead) | High (tick+quote) | No | UW options-side only; no equity-tape classify |
| BVC signed flow | Yes (probabilistic) | Low (1-min) | Yes (bar-native) | Absent (the no-tick-feed fallback) |
| OFI / Kyle’s λ | Yes (strong) | High (L2/MBO) | No | Absent → MK3/Databento |
| BOCPD on signed flow | Yes (regime transition) | High | No | Absent → MK3 |
| VPIN, naked Kyle’s λ | Hype-flag | High | - | Skip (VPIN ≈ volatility proxy, Andersen-Bondarenko 2014) |
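The NYSE TICK row marks the one genuinely missing cheap feed. A TICK extreme reads differently by day type: it signals reversal on a normal day, but continuation if it persists >10 min on a trend day. The read therefore needs regime disambiguation. Below is a minimal sketch of that rule on 5–15 s samples. The ±1000 extreme level, the `trend_day` flag (assumed to come from some upstream classifier), and the `read_tick_extreme` name are all assumptions.

```python
from typing import Literal, Sequence, Tuple

TickRead = Literal["none", "reversal", "continuation", "wait"]


def read_tick_extreme(
    samples: Sequence[Tuple[float, float]],  # (epoch seconds, NYSE TICK), oldest first
    trend_day: bool,           # hypothetical upstream trend-day classifier
    extreme: float = 1000.0,   # assumed |TICK| level that counts as an extreme
    persist_s: float = 600.0,  # the ">10 min" persistence rule
) -> TickRead:
    """Disambiguate a TICK extreme by day type and by how long it has persisted."""
    if not samples or abs(samples[-1][1]) < extreme:
        return "none"
    positive = samples[-1][1] > 0
    run_start = samples[-1][0]
    # Walk back while TICK stays extreme on the same side as the latest print.
    for ts, value in reversed(samples):
        if abs(value) >= extreme and (value > 0) == positive:
            run_start = ts
        else:
            break
    if not trend_day:
        return "reversal"  # normal day: an extreme reads as exhaustion
    # Trend day: only a persistent extreme reads as continuation.
    return "continuation" if samples[-1][0] - run_start > persist_s else "wait"
```

The caller already knows the sign from the latest TICK print. The function only answers how to read that extreme.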
Ranked fixes by edge-per-cost (retail-accessible)
- GEX/zero-gamma-flip as a hard regime gate - no new data. Above flip = mean-revert regime → suppress momentum/put continuation; below = momentum regime → trend entries (incl. puts) allowed. Directly explains and fixes “crushed on puts.” Highest leverage, zero feed cost.
- Exhaustion VETO + CUSUM start-of-move - 1-min bars only. Veto = volume-climax + range-expansion-then-deceleration (kills the climax entry). CUSUM (vol-scaled threshold) = fire at drift start. Must be trend-day-gated (don’t veto one-way-day continuations).
- Add NYSE TICK / $ADD breadth feed - the one genuinely missing cheap feed. It measures breadth-wide aggressor pressure and leads, but needs regime disambiguation: a TICK extreme signals reversal on a normal day, but continuation if it persists >10 min on a trend day.
- Volume-vs-OI new-positioning filter + sweep tags as weak prior - corrects the static-OI-as-flow category error directly.
- BVC signed flow - the only bar-native aggressor proxy; use it if there is no tick budget.
- 6+ (deferred): true OFI / Lee-Ready / BOCPD-on-flow → MK3 via Databento (paid microstructure, and it degrades in fast bursts anyway - not a quick win).
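The second ranked fix needs only 1-min bars. Below is a minimal sketch of both pieces, not MK2's implementation:

- `exhaustion_veto` encodes "volume climax + range expansion, then deceleration" and is disabled on trend days.
- `cusum_alarms` is a standard two-sided (Page) CUSUM on log returns. Its drift allowance and alarm threshold both scale with trailing volatility.

Every window and multiple, and the `trend_day` flag, are hypothetical and would need calibrating.

```python
import math
from dataclasses import dataclass
from statistics import fmean, pstdev
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Bar:
    """1-minute OHLCV bar (only the fields this sketch uses)."""
    high: float
    low: float
    close: float
    volume: float


def exhaustion_veto(
    bars: Sequence[Bar],
    trend_day: bool,          # hypothetical upstream trend-day classifier
    window: int = 20,         # assumed baseline length, in bars
    vol_mult: float = 3.0,    # assumed volume-climax multiple of baseline volume
    range_mult: float = 2.0,  # assumed range-expansion multiple of baseline range
    decel: float = 0.6,       # assumed: current range < 60% of the expansion bar's range
) -> bool:
    """True means veto the entry: volume climax + range expansion, then deceleration."""
    if trend_day or len(bars) < window + 2:
        return False  # never veto one-way-day continuations; too little history to judge
    base = bars[-(window + 2):-2]
    base_vol = fmean(b.volume for b in base)
    base_rng = fmean(b.high - b.low for b in base)
    prev, cur = bars[-2], bars[-1]
    climax = max(prev.volume, cur.volume) > vol_mult * base_vol
    expanded = (prev.high - prev.low) > range_mult * base_rng
    decelerating = (cur.high - cur.low) < decel * (prev.high - prev.low)
    return climax and expanded and decelerating


def cusum_alarms(
    closes: Sequence[float],
    window: int = 30,     # assumed trailing window for the vol estimate
    k_mult: float = 0.5,  # drift allowance, in trailing sigmas
    h_mult: float = 4.0,  # alarm threshold, in trailing sigmas (the vol scaling)
) -> List[Tuple[int, int]]:
    """Two-sided CUSUM on 1-min log returns; returns (close_index, +1 up / -1 down) alarms."""
    rets = [math.log(b / a) for a, b in zip(closes, closes[1:])]
    s_pos = s_neg = 0.0
    alarms: List[Tuple[int, int]] = []
    for i in range(window, len(rets)):
        sigma = max(pstdev(rets[i - window:i]), 1e-9)  # trailing only: no look-ahead
        s_pos = max(0.0, s_pos + rets[i] - k_mult * sigma)
        s_neg = max(0.0, s_neg - rets[i] - k_mult * sigma)
        if s_pos > h_mult * sigma or s_neg > h_mult * sigma:
            alarms.append((i + 1, 1 if s_pos > s_neg else -1))  # rets[i] ends at closes[i+1]
            s_pos = s_neg = 0.0  # restart so the next leg is detected afresh
    return alarms
```

A CUSUM alarm fires once cumulative drift clears the vol-scaled threshold, which is typically a few bars into a leg rather than at its climax. In this sketch an entry would require a CUSUM alarm with no active veto.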
Hype / category-error flags (do NOT chase)
- VPIN is largely a volatility proxy (Andersen & Bondarenko 2014) - a risk/sizing conditioner at best, never an informed-flow trigger.
- Options “sweep” tags = weak prior; many are routing artifacts, not conviction. MK2 may be over-trusting them.
- GEX as a directional signal = the exact category error driving the put losses. GEX classifies regime, never direction.
- “ES always leads SPY” = false; cash/ETF leads in the morning, ES takes over post-midday. Lead direction is time-of-day dependent.
- Lee-Ready and the tick rule have degraded to ~77–79% classification accuracy in the HFT era and fail worst in fast bursts - i.e., exactly the climax moments. BVC (~90% at coarse buckets) is the bar-native alternative.
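BVC (Easley/López de Prado/O'Hara) replaces per-trade signing with a probabilistic split of each bar's volume. The standardized price change is passed through a CDF, and that fraction of the bar's volume is treated as buyer-initiated. Below is a minimal sketch on 1-min bars, with the tick rule alongside for contrast. The normal CDF, the 30-bar trailing sigma window, and the function names are assumptions; a Student-t CDF is a common variant.

```python
import math
from statistics import pstdev
from typing import List, Sequence


def norm_cdf(x: float) -> float:
    """Standard normal CDF via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bvc_signed_flow(
    closes: Sequence[float],
    volumes: Sequence[float],
    window: int = 30,  # assumed trailing window for sigma of price changes
) -> List[float]:
    """Bulk volume classification on bars.

    buy_t = V_t * Phi(dP_t / sigma), sell_t = V_t - buy_t, so
    signed_t = buy_t - sell_t = V_t * (2 * Phi(dP_t / sigma) - 1).
    Output element j is the signed flow of bar index window + j + 1.
    """
    dp = [b - a for a, b in zip(closes, closes[1:])]
    out: List[float] = []
    for i in range(window, len(dp)):
        sigma = max(pstdev(dp[i - window:i]), 1e-12)  # trailing only: no look-ahead
        out.append(volumes[i + 1] * (2.0 * norm_cdf(dp[i] / sigma) - 1.0))
    return out


def tick_rule_signs(prices: Sequence[float]) -> List[int]:
    """Tick rule for contrast: +1 uptick, -1 downtick; a zero tick inherits the last sign."""
    signs: List[int] = []
    last = 0  # 0 until the first nonzero price change
    for a, b in zip(prices, prices[1:]):
        if b > a:
            last = 1
        elif b < a:
            last = -1
        signs.append(last)
    return signs
```

The tick rule needs every trade print to be meaningful, which is why it suffers in fast bursts. BVC only needs each bar's close-to-close change and volume, so it runs on the 1-min bars MK2 already has.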
The microstructure feed path = MK3 / Databento (Nautilus-native)
Databento OPRA (options) + equity tape decodes natively to Nautilus
TradeTick with aggressor_side embedded; MBP-1/MBP-10/MBO give
L1/L2/L3 for OFI/λ. Path: DBN → ParquetDataCatalog (convert once) →
BacktestEngine streams it. This is also the answer to “do the 1-yr
backtest the Nautilus way.” A 1-day spike costs < ~$30, covered by free
credits; the full 1-year pull needs its cost confirmed via metadata.get_cost
before committing budget. See nautilus-databento, nautilus-data,
nautilus-backtesting.
Per 0dte-ml-best-in-class-comparison, the path to an 80% win rate (WR) is
clean labels + meta-veto + microstructure (not a better model). It needs
n≥500 + 6–12 mo, and the Databento microstructure stack is the enabling input.
Prerequisite flag (not a method)
The 1-year UW data-shop bundle purchased 2026-05-07 ($1,531, 10
datasets) appears not to have been downloaded on this machine (RED in
2026-05-15-mk3-data-foundation-constraint). It gates the lead-lag /
regime-conditioned backtests. Re-acquire from the UW account before those
analyses can run; this is a download prerequisite, not a build task.
Sources
Lee-Ready/BVC era-degradation: Chakrabarty/Pascual/Shkilko (JFM 2015); Panayides/Shohfi/Smith. OFI/VPIN/λ: Easley/López de Prado/O’Hara; Andersen & Bondarenko 2014 (VPIN≈vol proxy). GEX/charm/vanna: SpotGamma, MenthorQ, FlashAlpha. Cross-asset: Hasbrouck 2003 (ES price discovery); Cboe VIX1D; NYSE TICK trend-day (tosindicators). Change-point: BOCPD order-flow (arXiv 2307.02375), BOCPD financial TS (ACM 2025). Feed vendors: Databento OPRA.PILLAR, Polygon, CBOE LiveVol. (Full URLs in the 2026-05-18 research agent transcript.)