2026-05-05 - Week of Data-Layer Rebuild + Adversarial Review

Five-day sprint (May 1 to May 5) that rebuilt the ML data layer, shipped 64+ new feature columns across 8 literature-backed categories, fixed 8 P0/P1 bugs from a hard adversarial code review, and established the discipline of routing all code through Codex with verification. 680 tests passing (up from 564 a week ago). Schema v19. Going into next week with a clean foundation for the prune → commercialization arc.

Sprint outcomes by category

ML data integrity (the foundation)

#49 outcome_resolved silent rollback FIXED - the load-bearing bug. 36 → 130 trainable scoring_events overnight.
#43 forward_pnl UPDATE silent failure fixed
#39 trades.contracts inflation race fixed
#54 14 historical CLOSED trades with no outcome row healed
#45 outcome_hit_tp substring check (TAKE_PROFIT match)
#47 broker reconciler scheduled daily 15:30 CT

Cortana Probability Score (CPS) - feature taxonomy

8 literature-backed categories. 7 of 8 complete. Microstructure (LOB) is the lone gap (Task #51, multi-week scope).

Category	Status
Dealer Greeks (charm/vanna/GEX/spot/delta magnitudes)	✓ #58, #65, #67
Technical indicators (RSI/ATR/MACD/ADX/realized vol/pivot)	✓ #68
Microstructure / LOB	✗ #51 pending
Session context (VWAP, range, exhaustion, edges)	✓ #41, #66, #72
Cross-asset (divergences + VIX term)	✓ #70
Options chain (per-contract Greeks + chain microstructure)	✓ #69
Trader state (trade_n_today, loss_streak, day_pnl)	✓ #51 MVP
Macro proximity (minutes_to_FOMC/CPI/OpEx)	✓ #71

Modeling pipeline

#56 meta-labeling secondary classifier (L1 logistic) live in shadow mode
#57 triple-barrier vol-scaled label (López de Prado canon)
#52 training projection trimmed from ~80 cols to canonical featurizer set
#33, #34, #63, #64 post-exit + pre-entry path snapshots, forward-return labels, counterfactual capture for skips
#60 TabPFN spike: XGBoost AUC 0.430 at n=100 (anti-predictive, below the 0.5 chance level), empirically confirming the literature’s small-n verdict
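
For reference, the triple-barrier label from #57 can be sketched in a few lines. This is a minimal illustration, not the project's code: the function name, the default barrier multiples, `max_bars`, and the use of a single fractional volatility estimate per trade are all assumptions.

```python
from typing import Sequence


def triple_barrier_label(
    prices: Sequence[float],
    vol: float,
    up_mult: float = 1.0,    # assumption: symmetric barriers by default
    down_mult: float = 1.0,
    max_bars: int = 30,      # assumption: vertical (time) barrier in bars
) -> int:
    """Return +1 if the upper barrier is touched first, -1 if the lower
    barrier is touched first, and 0 if neither is hit within max_bars.

    prices[0] is the entry price; vol is a volatility estimate expressed
    as a fractional return (0.002 means 0.2%).
    """
    if not prices:
        raise ValueError("prices must contain at least the entry price")
    entry = prices[0]
    upper = entry * (1.0 + up_mult * vol)
    lower = entry * (1.0 - down_mult * vol)
    # Walk forward from the bar after entry, stopping at the time barrier.
    for price in prices[1 : max_bars + 1]:
        if price >= upper:
            return 1
        if price <= lower:
            return -1
    return 0
```

Scaling both horizontal barriers by `vol` is what makes the label vol-scaled: the same price move can be a hit on a quiet day and a timeout on a volatile one.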

Operational hardening

#42 broker-direct reconciler (live socket + Flex Query)
#46 broker-truth-first design doc filed
#48 dashboard ML probabilities + commercialization-grade config reframe
#36 dashboard self-heal from broker_executions
engine off-hours guard (post-15:10 CT exit)
nightly_ml_pipeline orchestrator (5-step chain)

Adversarial review (the gate before live capital)

Codex high-reasoning review of 30 commits / 8.5k LOC found:

3 P0s: dashboard double-count + stale ML labels + live model feature drift - all FIXED tonight
5 P1s: WAL flag + UW timeout + plist paths + step status + watchdog TZ - all FIXED tonight
2 P2s: macro_calendar 2026-only + market-holidays not in guard

Operational lessons learned (added to brain as separate concepts)

`concepts/codex-sandbox-silent-drop.md`

2 of 5 Codex tabs tonight silently dropped patches despite reporting success. Workflow change: ALWAYS verify `git status --short` after `codex exec` before committing. Filed as Task #80 to escalate to Conductor maintainers.
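
One way to make that verification mechanical is a small pre-commit check. The sketch below is hypothetical (the helper names `changed_paths` and `assert_patch_landed` are not part of any existing tooling) and assumes the caller knows which files the patch was supposed to touch.

```python
import subprocess
from typing import List


def changed_paths(porcelain: str) -> List[str]:
    """Parse `git status --short` output into a list of paths.

    Each line is two status characters, a space, then the path.
    Renames appear as "old -> new" and are returned verbatim.
    """
    paths = []
    for line in porcelain.splitlines():
        if len(line) > 3:
            paths.append(line[3:])
    return paths


def assert_patch_landed(repo_dir: str, expected: List[str]) -> None:
    """Raise if any expected file shows no change in the working tree."""
    out = subprocess.run(
        ["git", "status", "--short"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    touched = set(changed_paths(out))
    missing = [p for p in expected if p not in touched]
    if missing:
        raise RuntimeError(f"patch reported success but left no diff for: {missing}")
```

A caveat: git reports a wholly untracked directory as a single `dir/` entry, so new files inside it would need `git status --short --untracked-files=all` to show up individually.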

`concepts/launchd-calendar-catchup.md`

launchd fires a StartCalendarInterval job immediately on load if a scheduled slot was “missed” since the last run. Setting RunAtLoad=false doesn’t suppress this. Fixed via an off-hours self-quiesce in start_mk2.sh.
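
The actual fix lives in start_mk2.sh; the same idea expressed in Python, for illustration only, might look like the sketch below. The 15:10 CT cutoff comes from the off-hours guard noted earlier; the 08:30 CT open and the function name are assumptions, and market holidays are not handled.

```python
from datetime import datetime, time

SESSION_OPEN = time(8, 30)     # assumption: regular session open, Central time
SESSION_CUTOFF = time(15, 10)  # post-15:10 CT exit, per the engine guard


def is_off_hours(now_ct: datetime) -> bool:
    """True when a launchd-triggered start should exit without doing work.

    now_ct must already be in America/Chicago wall-clock time, for example
    datetime.now(ZoneInfo("America/Chicago")) from the zoneinfo module.
    """
    if now_ct.weekday() >= 5:  # Saturday is 5, Sunday is 6
        return True
    t = now_ct.time()
    return t < SESSION_OPEN or t >= SESSION_CUTOFF
```

Because the check runs inside the started process, it does not matter why launchd fired the job: a catch-up run outside the session window simply exits.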

`concepts/ml-training-label-grounding.md`

The model trains on outcome_pnl >= 0 (P&L-grounded), not realized_label (directional). The orphaned realized_label column is noise.
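
As a concrete reading of that rule, a minimal sketch of the label derivation is shown below. The function name is hypothetical, and treating a missing outcome_pnl as "not trainable" is an assumption rather than documented behavior.

```python
from typing import Optional


def pnl_label(outcome_pnl: Optional[float]) -> Optional[int]:
    """Binary training target grounded in realized P&L.

    Returns 1 when outcome_pnl >= 0, 0 when it is negative, and None when
    the outcome is unresolved (assumption: such rows are excluded).
    realized_label is deliberately not consulted.
    """
    if outcome_pnl is None:
        return None
    return 1 if outcome_pnl >= 0 else 0
```

Note that a breakeven trade (outcome_pnl of exactly 0) lands in the positive class under this rule.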

`concepts/0dte-ml-best-in-class-comparison.md`

Literature verdict: meta-labeling is the highest-EV move at n=100. 80% win rate via ML alone is a 6-12 month horizon.

`concepts/cortana-probability-score-feature-taxonomy.md`

The 8-category inventory of CPS inputs.

`concepts/meta-labeling-implementation-patterns.md`

L1 logistic + mutual-info-top-8 + triple-barrier label as canonical target. AFML Ch.3.6 + H&T published research.

Going into next week - the prune arc

User asked Friday: “How do I shed old/unwanted code as we move toward real money?” Strategy filed as Tasks 74-78:

#77 Quick wins (30 min): delete force_close_*.py, .bak plists, obvious dead code
#74 Phase 1 (15 min): coverage.py + vulture against a paper trading day. Quantitative dead-code list.
#76 Lifecycle tagging (3 hr): every module gets # LIFECYCLE: paper-only|both|live-only|deprecated
#75 Phase 2 (3 hr): folder-by-folder Codex prune. ~5-10k LOC removed.
#78 Commercialization-grade config: per-user SQLite, schema validation, audit log, encrypted secrets. ~1 week scope.
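
For the lifecycle tagging in #76, a tag reader could be as small as the sketch below. It assumes the tag sits on its own comment line in exactly the form shown above; the regex and the function name are illustrative, not existing code.

```python
import re
from typing import Optional

# Matches a full comment line such as "# LIFECYCLE: paper-only".
LIFECYCLE_RE = re.compile(
    r"^#[ \t]*LIFECYCLE:[ \t]*(paper-only|both|live-only|deprecated)[ \t]*$",
    re.MULTILINE,
)


def lifecycle_tag(source: str) -> Optional[str]:
    """Return the module's lifecycle tag, or None if absent or invalid."""
    match = LIFECYCLE_RE.search(source)
    return match.group(1) if match else None
```

Returning None for both missing and unrecognized tags lets a CI check fail any module that has not been classified yet.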

The sequence reduces blast radius at every step: measure, isolate, prune, then commercialize.

Scoreboard

Metric	Week start	Week end
Tests passing	564	680
FEATURIZE_EVENT_COLUMNS	38	~95
Schema version (decisions.db)	6	19
ML trainable rows	36	130+
GEX magnitude coverage	0	8,277
Charm/vanna magnitude coverage	0	8,277
Open P0s (per #73 review)	unknown	0
Brain pages	5	11
Codex tabs fired	a few	~25
Lines committed	-	~12k+

Timeline

2026-05-05 22:30 CDT | wrap - 680 tests green, all #73 review findings fixed and verified on disk, plists installed, safe pip upgrades done, brain pages filed. Going to bed at hour ~17 of focused work.
