2026-05-05 - Week of Data-Layer Rebuild + Adversarial Review

Five-day sprint from May 1 to May 5 that rebuilt the ML data layer, shipped 64+ new feature columns across 8 literature-backed categories, fixed 8 P0/P1 bugs found by an adversarial code review, and established the discipline of routing all code through Codex with verification. 680 tests passing (was 564 at the start of the week). Schema v19. Going into next week with a clean foundation for the prune → commercialization arc.

Sprint outcomes by category

ML data integrity (the foundation)

  • #49 outcome_resolved silent rollback FIXED - the load-bearing bug. 36 → 130 trainable scoring_events overnight.
  • #43 forward_pnl UPDATE silent failure fixed
  • #39 trades.contracts inflation race fixed
  • #54 14 historical CLOSED trades with no outcome row healed
  • #45 outcome_hit_tp substring check (TAKE_PROFIT match)
  • #47 broker reconciler scheduled daily 15:30 CT

Cortana Probability Score (CPS) - feature taxonomy

8 literature-backed categories. 7 of 8 complete. Microstructure (LOB) is the lone gap (Task #51, multi-week scope).

Category | Status
Dealer Greeks (charm/vanna/GEX/spot/delta magnitudes) | ✓ #58, #65, #67
Technical indicators (RSI/ATR/MACD/ADX/realized vol/pivot) | ✓ #68
Microstructure / LOB | ✗ #51 pending
Session context (VWAP, range, exhaustion, edges) | ✓ #41, #66, #72
Cross-asset (divergences + VIX term) | ✓ #70
Options chain (per-contract Greeks + chain microstructure) | ✓ #69
Trader state (trade_n_today, loss_streak, day_pnl) | ✓ #51 MVP
Macro proximity (minutes_to_FOMC/CPI/OpEx) | ✓ #71
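
To make the trader-state row concrete, here is a minimal sketch of how trade_n_today, loss_streak and day_pnl could be derived from the day's closed trades. The function name, the 1-based trade count and the rule that a zero-P&L trade ends a streak are assumptions for illustration, not the definitions used by the real featurizer.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class TraderState:
    trade_n_today: int  # ordinal of the trade about to be taken (assumed 1-based)
    loss_streak: int    # consecutive losing trades immediately before it
    day_pnl: float      # realized P&L so far today


def trader_state(closed_pnls_today: Sequence[float]) -> TraderState:
    """Derive trader-state features from today's closed-trade P&Ls, oldest first."""
    streak = 0
    # Walk backwards from the most recent trade until the first non-loss.
    for pnl in reversed(closed_pnls_today):
        if pnl < 0:
            streak += 1
        else:
            break
    return TraderState(
        trade_n_today=len(closed_pnls_today) + 1,
        loss_streak=streak,
        day_pnl=sum(closed_pnls_today),
    )
```

The features are computed only from trades that closed before the decision, which avoids leaking the current trade's outcome into its own feature row.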

Modeling pipeline

  • #56 meta-labeling secondary classifier (L1 logistic) live in shadow mode
  • #57 triple-barrier vol-scaled label (López de Prado canon)
  • #52 training projection trimmed from ~80 cols to canonical featurizer set
  • #33, #34, #63, #64 post-exit + pre-entry path snapshots, forward-return labels, counterfactual capture for skips
  • #60 TabPFN spike: XGBoost AUC 0.430 (anti-predictive at n=100) empirically established the literature’s small-n verdict
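
For reference, the triple-barrier idea behind #57 can be sketched as follows. This is a generic long-side illustration of the López de Prado labeling scheme, not the code shipped in the task: the parameter names, the default multipliers and the bar-count timeout are all assumptions.

```python
from typing import Sequence


def triple_barrier_label(
    entry_price: float,
    path: Sequence[float],
    vol: float,
    tp_mult: float = 1.0,
    sl_mult: float = 1.0,
    max_bars: int = 30,
) -> int:
    """Label a long trade by whichever barrier the price path touches first.

    vol is a volatility estimate expressed as a fractional return (0.01 = 1%),
    so both horizontal barriers widen and narrow with current volatility.
    Returns +1 (upper barrier first), -1 (lower barrier first), or 0 when the
    vertical barrier (max_bars) is reached with neither touched.
    """
    upper = entry_price * (1.0 + tp_mult * vol)
    lower = entry_price * (1.0 - sl_mult * vol)
    for price in path[:max_bars]:
        if price >= upper:
            return 1
        if price <= lower:
            return -1
    return 0
```

Scaling the barriers by volatility keeps the label comparable across quiet and fast sessions; a fixed-point target would be hit almost always on a volatile day and almost never on a calm one.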

Operational hardening

  • #42 broker-direct reconciler (live socket + Flex Query)
  • #46 broker-truth-first design doc filed
  • #48 dashboard ML probabilities + commercialization-grade config reframe
  • #36 dashboard self-heal from broker_executions
  • engine off-hours guard (post-15:10 CT exit)
  • nightly_ml_pipeline orchestrator (5-step chain)

Adversarial review (the gate before live capital)

Codex high-reasoning review of 30 commits / 8.5k LOC found:

  • 3 P0s: dashboard double-count + stale ML labels + live model feature drift - all FIXED tonight
  • 5 P1s: WAL flag + UW timeout + plist paths + step status + watchdog TZ - all FIXED tonight
  • 2 P2s: macro_calendar 2026-only + market-holidays not in guard

Operational lessons learned (added to brain as separate concepts)

concepts/codex-sandbox-silent-drop.md

2 of 5 codex tabs tonight silently dropped patches despite reporting success. Workflow change: always run git status --short after codex exec and confirm the expected files actually changed before committing. Filed as Task #80 to escalate to Conductor maintainers.
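
The check can be automated. The sketch below parses the text printed by git status --short and reports which expected files show no change; the helper names and the file paths in the usage note are hypothetical. In practice the input string could come from subprocess.run(["git", "status", "--short"], capture_output=True, text=True).stdout.

```python
from typing import Iterable, List, Set


def changed_paths(short_status: str) -> Set[str]:
    """Parse `git status --short` output into the set of touched paths."""
    paths: Set[str] = set()
    for line in short_status.splitlines():
        # Each entry is two status characters, one space, then the path.
        if len(line) < 4:
            continue
        path = line[3:]
        # Renames are printed as "old -> new"; keep the new path.
        if " -> " in path:
            path = path.split(" -> ", 1)[1]
        paths.add(path)
    return paths


def missing_patches(short_status: str, expected: Iterable[str]) -> List[str]:
    """Return the expected paths that git does not report as changed."""
    return sorted(set(expected) - changed_paths(short_status))
```

For example, missing_patches(" M engine/guard.py\n", ["engine/guard.py", "dashboard/app.py"]) returns ["dashboard/app.py"], which flags a dropped patch. Note that git lists a wholly untracked directory as a single entry, so new files inside it would need a per-file check.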

concepts/launchd-calendar-catchup.md

launchd fires a StartCalendarInterval job immediately on load if a scheduled slot was “missed” since the last run. Setting RunAtLoad=false doesn’t suppress this. Fixed by having start_mk2.sh self-quiesce when launched off-hours.
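
The same self-quiesce idea, expressed as a Python guard rather than the shell logic in start_mk2.sh (which is not shown here). The 15:10 CT cutoff comes from the engine off-hours guard noted above; the 08:30 CT open and the function name are assumptions.

```python
from datetime import datetime, time

SESSION_OPEN = time(8, 30)    # assumption: regular US equity open, Central time
SESSION_CLOSE = time(15, 10)  # engine exits after 15:10 CT per the guard above


def should_run(now_ct: datetime) -> bool:
    """Return True only on weekdays inside the session window.

    now_ct must already be expressed in US Central time, for example
    datetime.now(ZoneInfo("America/Chicago")). Market holidays are not
    handled, matching the open P2 from the review.
    """
    if now_ct.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return False
    return SESSION_OPEN <= now_ct.time() < SESSION_CLOSE
```

A launcher would call should_run first and exit quietly when it returns False, so a catch-up fire from launchd in the evening becomes a no-op.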

concepts/ml-training-label-grounding.md

The model trains on outcome_pnl >= 0 (P&L-grounded), not on realized_label (directional). The orphaned realized_label column is noise as far as training is concerned.
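
A small sketch of why the distinction matters: a trade can be right on direction and still lose money, so the two labels can disagree. Only outcome_pnl >= 0 is taken from the note above; the underlying_move field and the directional rule are hypothetical stand-ins, since the real definition of realized_label is not given here.

```python
from typing import List, NamedTuple


class Outcome(NamedTuple):
    outcome_pnl: float      # realized P&L of the trade
    underlying_move: float  # hypothetical: signed move in the trade's favor


def pnl_label(o: Outcome) -> int:
    """P&L-grounded training label: 1 when the trade did not lose money."""
    return int(o.outcome_pnl >= 0)


def directional_label(o: Outcome) -> int:
    """Assumed directional label: 1 when the underlying moved the right way."""
    return int(o.underlying_move > 0)


def label_disagreements(outcomes: List[Outcome]) -> List[Outcome]:
    """Rows where the two labels differ, i.e. where mixing them adds noise."""
    return [o for o in outcomes if pnl_label(o) != directional_label(o)]
```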

concepts/0dte-ml-best-in-class-comparison.md

Literature verdict: meta-labeling is the highest-EV move at n=100. 80% win rate via ML alone is a 6-12 month horizon.

concepts/cortana-probability-score-feature-taxonomy.md

The 8-category inventory of CPS inputs.

concepts/meta-labeling-implementation-patterns.md

L1 logistic + mutual-info-top-8 + triple-barrier label as canonical target. AFML Ch.3.6 + H&T published research.
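
The mutual-info-top-8 step can be illustrated with a dependency-free sketch. It assumes features have already been binned into integers, which is a simplification: scikit-learn's mutual_info_classif estimates mutual information for continuous features directly. The function names and the dict-of-columns layout are assumptions.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def mutual_information(xs: Sequence[int], ys: Sequence[int]) -> float:
    """Mutual information in nats between two discrete sequences of equal length."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x,y) * log(p(x,y) / (p(x) * p(y))), written in terms of counts.
        mi += (count / n) * math.log(count * n / (px[x] * py[y]))
    return mi


def top_k_features(
    columns: Dict[str, Sequence[int]], labels: Sequence[int], k: int = 8
) -> List[str]:
    """Rank binned feature columns by mutual information with the label."""
    scores = {name: mutual_information(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Capping the feature count before fitting the L1 logistic keeps the number of parameters small relative to a training set of roughly a hundred rows.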

Going into next week - the prune arc

User asked Friday: “How do I shed old/unwanted code as we move toward real money?” Strategy filed as Tasks 74-78:

  1. #77 Quick wins (30 min): delete force_close_*.py, .bak plists, obvious dead code
  2. #74 Phase 1 (15 min): coverage.py + vulture against a paper trading day. Quantitative dead-code list.
  3. #76 Lifecycle tagging (3 hr): every module gets # LIFECYCLE: paper-only|both|live-only|deprecated
  4. #75 Phase 2 (3 hr): folder-by-folder Codex prune. ~5-10k LOC removed.
  5. #78 Commercialization-grade config: per-user SQLite, schema validation, audit log, encrypted secrets. ~1 week scope.
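
For step 3, a tag checker could look roughly like this. The tag syntax and the four allowed values are taken from the list above; the function name and the choice to treat an unrecognized value as missing are assumptions.

```python
import re
from typing import Optional

LIFECYCLE_VALUES = {"paper-only", "both", "live-only", "deprecated"}

# Matches a whole comment line such as "# LIFECYCLE: paper-only".
_TAG = re.compile(r"^#\s*LIFECYCLE:\s*(\S+)\s*$", re.MULTILINE)


def lifecycle_tag(source: str) -> Optional[str]:
    """Return the module's lifecycle tag, or None if it is missing or invalid."""
    match = _TAG.search(source)
    if match is None:
        return None
    value = match.group(1)
    return value if value in LIFECYCLE_VALUES else None
```

Running it over every file returned by pathlib.Path(".").rglob("*.py") and listing the modules that return None gives a checklist for the tagging pass, and the deprecated set becomes the input to the Phase 2 prune.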

The sequence reduces blast radius at every step: measure, isolate, prune, then commercialize.

Scoreboard

Metric | Week start | Week end
Tests passing | 564 | 680
FEATURIZE_EVENT_COLUMNS | 38 | ~95
Schema version (decisions.db) | 6 | 19
ML trainable rows | 36 | 130+
GEX magnitude coverage | 0 | 8,277
Charm/vanna magnitude coverage | 0 | 8,277
Open P0s (per #73 review) | unknown | 0
Brain pages | 5 | 11
Codex tabs fired | a few | ~25
Lines committed | - | ~12k+


Timeline

2026-05-05 22:30 CDT | wrap - 680 tests green, all #73 review findings fixed and verified on disk, plists installed, safe pip upgrades done, brain pages filed. Going to bed at hour ~17 of focused work.