Scoring Engine Review - post-MK3 stabilization

Once MK3 paper trading stabilizes (target: 1-2 weeks of clean Monday-Friday runtime after Cody calls Phase 3 stable; originally after 2026-05-12, now possibly 2026-05-19 onward, see Timeline), revisit MK3’s scoring weights - particularly market-tide’s 35-point weight, which Cody flagged on 2026-05-07 as possibly hurting performance on choppy days. Re-tune the weights against the M2 backtest harness, adding regime-conditioned variants if the data supports them.

State

  • Owner: Cody
  • Repo path: cortana-mk3 (per plans/2026-05-09-mk3-build-weekend.md, since superseded by plans/2026-05-09-mk3-build-week.md - repo bootstraps Saturday 2026-05-09)
  • Depends on: MK3 paper trading running cleanly (M1 weekend pass), M2 backtest harness landed (M1 week 2-3)
  • Triggers earlier review IF: week-1 MK3 paper performance shows obvious signal-quality issues traceable to specific features
  • Key metrics: Brier score, AUC, win rate per regime (chop / trend / power-hour); MK3-vs-MK2 decision diff broken down per feature contribution
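
A minimal sketch of how the per-regime metrics above could be tabulated. It assumes each closed trade can be reduced to a (regime, predicted win probability, 0/1 outcome) row; that row shape, the function names, and the sample values are all hypothetical, not the MK3 harness API or real results.

```python
from collections import defaultdict

def brier_score(probs, outcomes):
    """Mean squared error between predicted win probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

def auc(probs, outcomes):
    """Chance that a random winner was scored above a random loser (ties count half)."""
    wins = [p for p, y in zip(probs, outcomes) if y == 1]
    losses = [p for p, y in zip(probs, outcomes) if y == 0]
    if not wins or not losses:
        return None  # undefined when a regime contains only one outcome class
    total = sum(1.0 if w > l else 0.5 if w == l else 0.0 for w in wins for l in losses)
    return total / (len(wins) * len(losses))

def metrics_by_regime(trades):
    """trades: iterable of (regime, predicted_win_prob, outcome), outcome in {0, 1}."""
    grouped = defaultdict(list)
    for regime, prob, outcome in trades:
        grouped[regime].append((prob, outcome))
    report = {}
    for regime, rows in grouped.items():
        probs = [p for p, _ in rows]
        outcomes = [y for _, y in rows]
        report[regime] = {
            "n": len(rows),
            "brier": brier_score(probs, outcomes),
            "auc": auc(probs, outcomes),
            "win_rate": sum(outcomes) / len(outcomes),
        }
    return report

# Made-up rows for illustration only; not real MK2/MK3 trades.
sample = [
    ("chop", 0.70, 0), ("chop", 0.60, 1), ("chop", 0.80, 0), ("chop", 0.55, 1),
    ("trend", 0.75, 1), ("trend", 0.40, 0), ("trend", 0.65, 1),
]
report = metrics_by_regime(sample)
```

Splitting by regime before scoring is the point: a pooled Brier or AUC can look acceptable while the chop bucket alone is badly miscalibrated.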

Hypothesis

/api/market/market-tide carries 35 points of weight in MK2’s composite score, the largest single feature. Empirically, in MK2 chop-day clusters (notably 2026-04-16, see ~/.claude/.../memory/project_losses_april16_chop.md: -$9,742 across 3 flips), market-tide flips bias rapidly, pushing the score past the entry threshold in opposite directions within minutes. Each crossing triggers an entry on whichever side fires first, and the position then stops out on the reversal.
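
One way the flip pattern could be quantified during replay, as a sketch only. It assumes the composite score is signed (positive = bullish, negative = bearish) and sampled with a minute-of-session timestamp; the threshold of 70, the 10-minute window, and the sample series are hypothetical placeholders, not MK2 settings or recorded data.

```python
def count_rapid_flips(samples, threshold, window_minutes):
    """Count opposite-side threshold breaches that occur within window_minutes.

    samples: list of (minute, signed_score), sorted by minute.
    A flip is a breach on one side followed, no more than window_minutes
    after the most recent breach on the other side, by an opposite breach.
    """
    flips = 0
    last_side = 0        # +1 / -1 for the most recent breach, 0 if none yet
    last_minute = None   # minute of the most recent breach
    for minute, score in samples:
        if score >= threshold:
            side = 1
        elif score <= -threshold:
            side = -1
        else:
            continue  # inside the dead zone: no entry signal on either side
        if last_side != 0 and side != last_side and minute - last_minute <= window_minutes:
            flips += 1
        last_side, last_minute = side, minute
    return flips

# Made-up series for illustration only.
series = [(0, 40), (3, 72), (7, -75), (9, -20), (12, 71), (60, -80)]
flips = count_rapid_flips(series, threshold=70, window_minutes=10)
```

In this series the breaches at minutes 3, 7, and 12 alternate sides inside the window, while the breach at minute 60 is too late to count. A per-day flip count like this could serve as a rough label for which replay days exhibit the pattern.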

If that pattern holds: market-tide should be regime-conditioned, not flat-weighted. Specifically:

  • On trend days: full weight (or higher) - tide is the cleanest bias signal
  • On chop days: weight reduced or replaced by a “tide stability” derivative (e.g., require tide to hold direction for N minutes before contributing)
  • Power Hour (last 30min): potentially separate tide treatment (theta acceleration changes the tide signature)
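
A minimal sketch of the regime-conditioned rule, assuming the regime label arrives from the not-yet-shipped classifier and tide is reduced to a direction per reading. The names `TideReading`, `minutes_held`, and `tide_points`, the 15-minute hold, and the 0.5 chop scale are hypothetical; only the 35-point weight comes from this note. Power Hour is deliberately left unhandled because its treatment is still undecided.

```python
from dataclasses import dataclass

FULL_TIDE_WEIGHT = 35.0  # flat weight carried over from MK2

@dataclass
class TideReading:
    minute: int     # minutes since the open
    direction: int  # +1 bullish, -1 bearish, 0 neutral

def minutes_held(readings):
    """Minutes the latest non-neutral direction has persisted without interruption."""
    if not readings or readings[-1].direction == 0:
        return 0
    current = readings[-1].direction
    start = readings[-1].minute
    for reading in reversed(readings):
        if reading.direction != current:
            break
        start = reading.minute
    return readings[-1].minute - start

def tide_points(readings, regime, hold_minutes=15, chop_scale=0.5):
    """Signed tide contribution to the composite score for the given regime."""
    if not readings:
        return 0.0
    direction = readings[-1].direction
    if regime == "trend":
        return FULL_TIDE_WEIGHT * direction
    if regime == "chop":
        # "Tide stability" gate: contribute nothing until direction has held.
        if minutes_held(readings) < hold_minutes:
            return 0.0
        return FULL_TIDE_WEIGHT * chop_scale * direction
    raise ValueError(f"no tide rule defined for regime {regime!r}")

# Made-up readings: tide turns bearish at minute 5 and stays bearish.
readings = [TideReading(0, 1), TideReading(5, -1), TideReading(10, -1), TideReading(25, -1)]
chop_late = tide_points(readings, "chop")        # held 20 min, so the gate passes
chop_early = tide_points(readings[:3], "chop")   # held 5 min, so the gate blocks
trend_late = tide_points(readings, "trend")
```

The gate and the scale are separate knobs on purpose, so the sweep can test "reduced weight" and "stability requirement" independently.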

Open Threads

  • Regime-detection prerequisite: chop vs trend classifier needs to ship first (per ~/.claude/.../memory/project_trend_day_detection.md). Without it, regime-conditioned weights cannot be implemented or evaluated.
  • Replay set: 2026-04-16 chop cluster + 2026-04-22 chop + recent trend days (TBD which) - MK3 backtest harness must replay all of them with weight-variant scoring to find the optimal split.
  • Confounders: MK2’s 35pt tide ran alongside a dropped meta-gate (GH #88) and dead-code sizing. Separating whether the 35pt weight itself was bad from whether the gate failure compounded it requires the M2 harness with the meta-model wired through RiskEngine first.
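
The weight-variant replay could look roughly like the sketch below. It assumes each replayed decision point reduces to (regime, points from all other features, tide signal in {-1, 0, +1}, realized direction), and it scores variants by directional hit rate as a simple stand-in for the Brier/AUC metrics listed under State. The row shape, function names, candidate weights, and sample rows are hypothetical and do not reflect the harness or any real replay result.

```python
from collections import defaultdict

def hit_rate(rows, tide_weight):
    """Share of rows where the sign of the re-weighted score matches the realized move."""
    hits = 0
    for other_points, tide_signal, realized in rows:
        score = other_points + tide_weight * tide_signal
        predicted = 1 if score > 0 else -1 if score < 0 else 0
        hits += predicted == realized
    return hits / len(rows)

def sweep_tide_weight(replay_rows, candidate_weights):
    """Evaluate every candidate tide weight separately within each regime."""
    by_regime = defaultdict(list)
    for regime, other_points, tide_signal, realized in replay_rows:
        by_regime[regime].append((other_points, tide_signal, realized))
    results = {}
    for regime, rows in by_regime.items():
        scored = {w: hit_rate(rows, w) for w in candidate_weights}
        best = max(scored, key=scored.get)  # ties go to the earliest candidate listed
        results[regime] = {"best_weight": best, "hit_rate_by_weight": scored}
    return results

# Made-up rows for illustration only.
rows = [
    ("chop", 10, 1, -1), ("chop", -12, 1, -1), ("chop", 8, -1, 1), ("chop", -5, -1, -1),
    ("trend", 5, 1, 1), ("trend", -20, 1, 1), ("trend", -20, -1, -1), ("trend", 12, -1, -1),
]
results = sweep_tide_weight(rows, candidate_weights=[0, 15, 35])
```

Because of the confounders above, a sweep like this only isolates the tide weight if the meta-gate and sizing behave identically across variants; otherwise the per-weight differences mix several effects.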

See Also

  • Build weekend plan - initial MK3 scoring carries MK2-equivalent 35/15/15… weights as starting point pending this review
  • MK3 roadmap - M2 milestone backtest harness is the prerequisite tooling
  • ~/.claude/.../memory/project_losses_april16_chop.md - empirical chop-day case
  • ~/.claude/.../memory/project_trend_day_detection.md - regime-detector blocker
  • ~/.claude/.../memory/project_scoring_research_april13.md - prior scoring research thread

Timeline

  • 2026-05-07 | Cody - Flagged during MK3 build-weekend planning. Suspicion: market-tide’s 35pts may be net-negative on chop days. Decision: defer review until MK3 stabilizes (1-2 weeks paper post-2026-05-12); meanwhile, ship MK3 V1 with MK2-equivalent weights as starting point, instrument feature contributions in dashboard for empirical visibility, then re-tune.
  • 2026-05-07 (evening) | Cody - Pushed MK3 paper-open deadline. New trigger window for the scoring review: 1-2 weeks of clean MK3 paper after Cody calls Phase 3 stable (could be 2026-05-19 onward, not the original 2026-05-12). Plan reframed at plans/2026-05-09-mk3-build-week.md (supersedes build-weekend). Substance unchanged - market-tide 35pt-on-chop hypothesis still queued; same prerequisites (regime detector, M2 backtest harness with meta-model wired through RiskEngine).
  • 2026-05-07 (late evening) | Cody - Purchased UW data shop bundle ($1,531, 10 datasets, 1Y SPY). Scoring engine review preflight now possible IN Phase 2b of build week, not deferred. With 1Y of Market Tide + 1Y of GEX (daily) + 1Y of Big Option Trades (per-trade) + 1Y of Delta Exposure + 1Y of IV Rank, the chop-vs-trend market-tide weight sweep can run during build week instead of waiting until M2. First-pass tearsheets ship Phase 2b (Thu 5/14 → Fri 5/15); full review still happens post-stabilization with the more rigorous 252-trading-day regime-conditioned analysis. See ~/brain/concepts/databento-vs-uw-vs-ibkr-data-feeds.md for the full purchased inventory.