IMPULSE vs Composite gate - empirical tier analysis

Context. 2026-05-14 ~12:15 CT: the IMPULSE engine flagged BEARISH, conv=0.55→0.60 (strike_stack trigger), while the composite scoring engine returned BEARISH at score=26-33 the whole time. The composite Telegram alert threshold is 58, so no alert fired. Over the next ~15 min SPY dropped through the lower Bollinger Band. The user flagged the miss as a violation of the “early and right” mandate.

The proposed fix: tier the score threshold by IMPULSE conviction (e.g. IMPULSE 0.60+ drops threshold from 58 to ~42). Before staging a Codex handoff, I ran the empirical TP-hit rate analysis against the full corpus (2026-04-07 → 2026-05-14, ~5 weeks, 293 paper trades).
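For reference, here is a minimal Python sketch of the proposed tiering. The names (alert_threshold, would_alert) and the single 0.60 → 42 tier are illustrative assumptions taken from the example above, not the engine's actual config.

```python
from typing import Optional

# Hypothetical values: 58 is the current Telegram threshold from the context
# above; the single (0.60 -> 42) tier mirrors the example fix, not real config.
BASE_THRESHOLD = 58
TIERS = [(0.60, 42)]  # (minimum IMPULSE conviction, lowered threshold), highest first


def alert_threshold(impulse_conv: Optional[float]) -> int:
    """Composite score needed to alert, given the current IMPULSE conviction."""
    if impulse_conv is None:
        return BASE_THRESHOLD
    for min_conv, threshold in TIERS:
        if impulse_conv >= min_conv:
            return threshold
    return BASE_THRESHOLD


def would_alert(score: float, impulse_conv: Optional[float]) -> bool:
    """True if the composite score clears the (possibly tiered) threshold."""
    return score >= alert_threshold(impulse_conv)


if __name__ == "__main__":
    print(would_alert(33, 0.60))  # today's peak BEAR score at conv=0.60 -> False
```

With these example values, a BEAR score of 33 at conv=0.60 still falls short of 42. To have fired at 12:15, the tiered threshold would have to drop to 33 or lower.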

The data inverted the recommendation.

Method

  • Source: ~/cortanaroi-data/decisions.db.scoring_events JOIN ~/cortanaroi-data/paper_trades.db.outcomes via paper_trade_id → trades.id → outcomes.trade_id.
  • Universe: scoring_events with trigger LIKE 'impulse%', decision = 'ENTERED', bias != 'NEUTRAL'. n=138.
  • Outcome metric: TP-hit = outcomes.pnl_pct >= 0.10 (the +10% mandate target within hold).
  • Baseline cohort: all entered trades regardless of trigger, same bucketing, n=293.
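The join above can be sketched with Python's sqlite3 module by attaching the second database under an alias. This is a self-contained sketch. Two in-memory databases stand in for the real files. The "impulse:<name>" trigger format, the score column, and the toy rows are hypothetical. Only the table and column names listed in the Method section come from the source.

```python
import sqlite3

# Universe filter and TP-hit definition from the Method section; "trigger" is
# quoted because TRIGGER is an SQL keyword.
IMPULSE_TP_QUERY = """
SELECT e.bias,
       COUNT(*) AS n,
       SUM(CASE WHEN o.pnl_pct >= 0.10 THEN 1 ELSE 0 END) AS tp_hits
FROM scoring_events AS e
JOIN pt.trades   AS t ON t.id = e.paper_trade_id
JOIN pt.outcomes AS o ON o.trade_id = t.id
WHERE e."trigger" LIKE 'impulse%'
  AND e.decision = 'ENTERED'
  AND e.bias != 'NEUTRAL'
GROUP BY e.bias
ORDER BY e.bias
"""


def build_demo() -> sqlite3.Connection:
    """In-memory stand-ins for decisions.db (main) and paper_trades.db (pt).

    For the real files, connect to decisions.db and run
    con.execute("ATTACH DATABASE ? AS pt", (paper_trades_path,)).
    """
    con = sqlite3.connect(":memory:")
    con.execute("ATTACH DATABASE ':memory:' AS pt")
    con.executescript("""
        CREATE TABLE scoring_events (id INTEGER PRIMARY KEY, "trigger" TEXT,
            decision TEXT, bias TEXT, score REAL, paper_trade_id INTEGER);
        CREATE TABLE pt.trades (id INTEGER PRIMARY KEY);
        CREATE TABLE pt.outcomes (trade_id INTEGER, pnl_pct REAL);
    """)
    # Hypothetical rows: one of each case the WHERE clause should keep or drop.
    con.executemany(
        "INSERT INTO scoring_events VALUES (?, ?, ?, ?, ?, ?)",
        [
            (1, "impulse:strike_stack", "ENTERED", "BEAR", 34.0, 1),  # kept, TP hit
            (2, "impulse:hiro", "ENTERED", "BULL", 70.0, 2),          # kept, no TP
            (3, "impulse:hiro", "SKIPPED", "BEAR", 30.0, None),       # not entered
            (4, "composite", "ENTERED", "BULL", 60.0, 3),             # not IMPULSE
            (5, "impulse:hiro", "ENTERED", "NEUTRAL", 50.0, 4),       # neutral bias
        ],
    )
    con.executemany("INSERT INTO pt.trades VALUES (?)", [(1,), (2,), (3,), (4,)])
    con.executemany(
        "INSERT INTO pt.outcomes VALUES (?, ?)",
        [(1, 0.12), (2, 0.04), (3, 0.15), (4, 0.20)],
    )
    return con


def impulse_tp_counts(con: sqlite3.Connection) -> list:
    """(bias, n, tp_hits) per bias for entered, non-neutral IMPULSE events."""
    return con.execute(IMPULSE_TP_QUERY).fetchall()
```

Running impulse_tp_counts(build_demo()) keeps only events 1 and 2. Dropping the decision and trigger filters from the WHERE clause gives the baseline cohort.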

IMPULSE-entered TP-hit rate by trigger × bias

trigger            bias  n   avg score  TP hits  TP-hit %
hiro+strike_stack  BULL  10  72.5       4        44.4%
strike_stack       BULL  38  69.6       10       27.8%
hiro               BEAR  33  37.1       7        21.9%
hiro+strike_stack  BEAR  6   34.5       1        16.7%
strike_stack       BEAR  33  34.4       4        12.1%
hiro               BULL  18  70.4       2        11.1%
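The hiro+strike_stack BULL row stands out, but at n=10 its rate is noisy. A binomial confidence interval is one way to quantify that. The sketch below uses the Wilson score interval, a method chosen here for illustration rather than one the analysis used, with the reported 44.4% and n=10.

```python
import math


def wilson_interval(p_hat: float, n: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a binomial proportion (z=1.96 is ~95%)."""
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - margin, center + margin


if __name__ == "__main__":
    lo, hi = wilson_interval(0.444, 10)  # hiro+strike_stack BULL row
    print(f"~95% CI: {lo:.1%} to {hi:.1%} (BULL 58+ baseline: 22.9%)")
```

The ~95% interval comes out at roughly 20% to 72%. Its lower end sits below the 22.9% BULL 58+ baseline, so at this sample size the lift cannot yet be told apart from baseline.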

IMPULSE vs baseline by composite score bucket (TP-hit %)

bias  bucket  IMPULSE only   Baseline (all)
BEAR  42-49   28.6% (n=7)    26.3% (n=19)
BEAR  35-41   21.1% (n=38)   21.4% (n=70)
BEAR  <35     7.7% (n=27)    15.5% (n=62)
BULL  58+     25.4% (n=66)   22.9% (n=191)
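The per-bucket gaps can be recomputed directly from this table. This is a small sketch with the rates copied from above. The baseline column covers all entered trades, IMPULSE included, so the gaps compare IMPULSE against all trades rather than against non-IMPULSE trades.

```python
# TP-hit % copied from the bucket table: (IMPULSE only, baseline all trades).
BUCKETS = {
    ("BEAR", "42-49"): (28.6, 26.3),
    ("BEAR", "35-41"): (21.1, 21.4),
    ("BEAR", "<35"): (7.7, 15.5),
    ("BULL", "58+"): (25.4, 22.9),
}


def pp_gaps(buckets: dict) -> dict:
    """IMPULSE minus baseline TP-hit rate per bucket, in percentage points."""
    return {key: round(imp - base, 1) for key, (imp, base) in buckets.items()}


if __name__ == "__main__":
    for (bias, bucket), gap in pp_gaps(BUCKETS).items():
        print(f"{bias} {bucket}: {gap:+.1f} pp")
```

This yields +2.3, -0.3, -7.8 and +2.5 pp for BEAR 42-49, BEAR 35-41, BEAR <35 and BULL 58+ respectively.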

Findings

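1. IMPULSE provides little edge over baseline. In three of the four score buckets, IMPULSE-tagged trades hit TP within 2.5 percentage points of the all-trades baseline for the same bucket (a baseline that itself includes the IMPULSE trades). The one exception, BEAR <35, goes the wrong way: 7.7% vs 15.5%. strike_stack alone is barely a signal (27.8% BULL, 12.1% BEAR), and hiro alone has the same problem (21.9% BEAR, 11.1% BULL).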

2. Today’s BEAR setup was statistically the correct skip. Composite=33 with an IMPULSE strike_stack trigger falls in the worst bucket (BEAR <35), where IMPULSE trades hit TP 7.7% of the time. Lowering the threshold far enough to fire that bucket would mean roughly 1 in 13 trades hitting TP, about half the 15.5% baseline for the same bucket. The composite gate is not “missing winners” - it’s filtering out signals where the option mid reaches +10% only about once in thirteen tries.

3. The hiro+strike_stack compound is the only IMPULSE variant with measurable edge. BULL combo: 44.4% TP-hit vs 22.9% for the BULL 58+ baseline, nearly a 2x lift. n=10 is too thin for a hot patch, but it is the strongest single finding in the corpus. Hold for MK3 lock-in.

4. The 80% TP-hit mandate is structurally unreachable with the current engine + 10% TP target on 0DTE. Across 293 trades: 64 WIN_TP / 109 WIN_partial / 120 LOSS = 22% TP-hit. No bucket - IMPULSE, baseline, BULL, BEAR - exceeds 45%. The mandate as written needs reframing. Two paths:

  • Different KPI: target “% of closes with positive P&L” (~55% achievable, but the 80% mandate language has to change too)
  • Smaller TP target: test a 5% TP, which likely doubles the hit rate but requires a different exit ladder.
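The arithmetic behind finding 4 and the first KPI option can be checked in a few lines. This sketch assumes every WIN_partial close has positive P&L. If some partial wins net negative, the positive-close rate is lower than computed here.

```python
import math

# Outcome counts from finding 4 (293 paper trades, 2026-04-07 to 2026-05-14).
WIN_TP, WIN_PARTIAL, LOSS = 64, 109, 120


def kpis(win_tp: int, win_partial: int, loss: int, mandate: float = 0.80) -> dict:
    total = win_tp + win_partial + loss
    return {
        "total": total,
        "tp_hit_rate": win_tp / total,
        # Assumption: every WIN_partial close has positive P&L.
        "positive_close_rate": (win_tp + win_partial) / total,
        # TP hits the same sample would need to meet the mandate.
        "tp_hits_needed": math.ceil(mandate * total),
    }


if __name__ == "__main__":
    for name, value in kpis(WIN_TP, WIN_PARTIAL, LOSS).items():
        print(name, value)
```

On these counts the TP-hit rate is 64/293 ≈ 21.8%. Under the stated assumption, the positive-close rate is 173/293 ≈ 59%. An 80% TP-hit rate would have required 235 TP hits instead of 64.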

Decision

  • Do NOT lower the composite score threshold based on IMPULSE conviction. Data contradicts the original hypothesis.
  • Promote hiro+strike_stack as a separate alert sleeve in MK3 once n hits 30+ entered. Currently n=16 combined BULL+BEAR. Watch over next 2-3 weeks of paper.
  • Fix the broken counterfactual capture as P0. scoring_events_hypothetical_outcomes has 6,048 recommendation rows since corpus start, but 100% of the option_mid_5m/15m/30m values are NULL. Root cause: backfill_skip_hypothetical_outcomes.py relies on UWClient.get_option_contract_historical_mid, which raises NotImplementedError. Without this fix, we can only evaluate the 138 trades we took, not the 5,910 IMPULSE events we skipped. This is the biggest data gap blocking the MK3 strategy revision.
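Here is a sketch of two guards that could accompany the P0 fix. The first is a NULL-coverage check on the option-mid columns. The second is a wrapper that fails loudly when the historical-mid fetch raises NotImplementedError, rather than letting the backfill write NULLs. The function names, the 50% tolerance, and passing the fetch in as a callable are all hypothetical. The real UWClient call and backfill_skip_hypothetical_outcomes.py may be structured differently.

```python
import sqlite3
from typing import Callable, TypeVar

T = TypeVar("T")

# Table and column names come from the P0 note; the 0.5 tolerance below is a
# hypothetical choice. Names are constants, so the f-string SQL is not user input.
MID_COLUMNS = ("option_mid_5m", "option_mid_15m", "option_mid_30m")
TABLE = "scoring_events_hypothetical_outcomes"


def null_coverage(con: sqlite3.Connection) -> dict:
    """Fraction of rows where each option-mid column is NULL."""
    total = con.execute(f"SELECT COUNT(*) FROM {TABLE}").fetchone()[0]
    if total == 0:
        return {col: 0.0 for col in MID_COLUMNS}
    return {
        col: con.execute(
            f"SELECT COUNT(*) FROM {TABLE} WHERE {col} IS NULL"
        ).fetchone()[0] / total
        for col in MID_COLUMNS
    }


def assert_backfill_healthy(con: sqlite3.Connection, max_null_frac: float = 0.5) -> None:
    """Raise if any option-mid column is mostly NULL (e.g. the 100% NULL case)."""
    bad = {c: f for c, f in null_coverage(con).items() if f > max_null_frac}
    if bad:
        raise RuntimeError(f"option-mid backfill looks broken, NULL fractions: {bad}")


def fetch_mid_or_fail(fetch: Callable[..., T], *args, **kwargs) -> T:
    """Call a historical-mid fetcher; turn NotImplementedError into a hard failure."""
    try:
        return fetch(*args, **kwargs)
    except NotImplementedError as exc:
        raise RuntimeError(
            "historical option-mid source not implemented; refusing to backfill NULLs"
        ) from exc
```

Running assert_backfill_healthy after each backfill pass would surface the 100%-NULL state at once, instead of letting it accumulate across thousands of rows.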