2026-05-05 - Week of Data-Layer Rebuild + Adversarial Review
Five-day sprint from May 1 to May 5 that rebuilt the ML data layer, shipped 64+ new feature columns across 8 literature-backed categories, fixed all 8 P0/P1 bugs from a hard adversarial code review, and established the discipline of routing all code through Codex with verification. 680 tests passing (was 564 a week ago). Schema v19. Going into next week with a clean foundation for the prune → commercialization arc.
Sprint outcomes by category
ML data integrity (the foundation)
- #49 outcome_resolved silent rollback FIXED - the load-bearing bug. 36 → 130 trainable scoring_events overnight.
- #43 forward_pnl UPDATE silent failure fixed
- #39 trades.contracts inflation race fixed
- #54 healed 14 historical CLOSED trades that had no outcome row
- #45 outcome_hit_tp fixed to use a substring check (TAKE_PROFIT match)
- #47 broker reconciler scheduled daily 15:30 CT
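Root causes for #49 and #43 aren't recorded here. With Python's sqlite3, an UPDATE that matches no rows and an uncommitted transaction discarded at close both fail without raising, which fits the "silent" symptoms. Below is a minimal sketch of a write path that fails loudly instead. The id column and resolve_outcome are hypothetical; the TAKE_PROFIT substring check mirrors #45.

```python
import sqlite3


def resolve_outcome(conn: sqlite3.Connection, event_id: int,
                    outcome_pnl: float, exit_reason: str) -> None:
    """Mark one scoring event resolved, or raise instead of failing silently.

    Hypothetical schema: scoring_events(id, outcome_resolved, outcome_pnl, outcome_hit_tp).
    """
    # Substring match, so variants such as "TAKE_PROFIT_2" still count as a TP hit.
    hit_tp = "TAKE_PROFIT" in exit_reason.upper()
    # `with conn` commits on normal exit and rolls back if the block raises.
    with conn:
        cur = conn.execute(
            "UPDATE scoring_events "
            "SET outcome_resolved = 1, outcome_pnl = ?, outcome_hit_tp = ? "
            "WHERE id = ?",
            (outcome_pnl, int(hit_tp), event_id),
        )
        # An UPDATE that matches no row is not an error in SQLite; treat it as one.
        if cur.rowcount != 1:
            raise RuntimeError(
                f"expected to update 1 scoring_event, updated {cur.rowcount} (id={event_id})"
            )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE scoring_events (id INTEGER PRIMARY KEY, "
        "outcome_resolved INTEGER DEFAULT 0, outcome_pnl REAL, outcome_hit_tp INTEGER)"
    )
    conn.execute("INSERT INTO scoring_events (id) VALUES (1)")
    conn.commit()
    resolve_outcome(conn, 1, 42.5, "TAKE_PROFIT_2")
    print(conn.execute("SELECT * FROM scoring_events").fetchone())  # (1, 1, 42.5, 1)
```

Raising inside the `with` block means a zero-row UPDATE rolls back and surfaces immediately, rather than showing up days later as missing training rows.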
Cortana Probability Score (CPS) - feature taxonomy
8 literature-backed categories. 7 of 8 complete. Microstructure (LOB) is the lone gap (Task #51, multi-week scope).
| Category | Status |
|---|---|
| Dealer Greeks (charm/vanna/GEX/spot/delta magnitudes) | ✓ #58, #65, #67 |
| Technical indicators (RSI/ATR/MACD/ADX/realized vol/pivot) | ✓ #68 |
| Microstructure / LOB | ✗ #51 pending |
| Session context (VWAP, range, exhaustion, edges) | ✓ #41, #66, #72 |
| Cross-asset (divergences + VIX term) | ✓ #70 |
| Options chain (per-contract Greeks + chain microstructure) | ✓ #69 |
| Trader state (trade_n_today, loss_streak, day_pnl) | ✓ #51 MVP |
| Macro proximity (minutes_to_FOMC/CPI/OpEx) | ✓ #71 |
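The table names the trader-state features but not their exact definitions. Here is a minimal sketch, assuming loss_streak resets each calendar day, timestamps are naive CT, and features may use only trades closed strictly before the entry timestamp. ClosedTrade and trader_state are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ClosedTrade:
    closed_at: datetime  # exit timestamp (assumed naive CT)
    pnl: float           # realized P&L in dollars


def trader_state(trades: list[ClosedTrade], as_of: datetime) -> dict:
    """Trader-state features for an entry decision at `as_of`.

    Only trades closed on the same calendar day and strictly before `as_of`
    are used, so no future information leaks into the feature row.
    """
    today = sorted(
        (t for t in trades if t.closed_at.date() == as_of.date() and t.closed_at < as_of),
        key=lambda t: t.closed_at,
    )
    # Consecutive losing trades, counted back from the most recent close.
    loss_streak = 0
    for t in reversed(today):
        if t.pnl >= 0:
            break
        loss_streak += 1
    return {
        "trade_n_today": len(today),
        "loss_streak": loss_streak,
        "day_pnl": sum(t.pnl for t in today),
    }
```

The strict `closed_at < as_of` filter is the important design choice: a trade that closes after the entry being scored must not feed that entry's features.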
Modeling pipeline
- #56 meta-labeling secondary classifier (L1 logistic) live in shadow mode
- #57 triple-barrier vol-scaled label (López de Prado canon)
- #52 training projection trimmed from ~80 cols to canonical featurizer set
- #33, #34, #63, #64 post-exit + pre-entry path snapshots, forward-return labels, counterfactual capture for skips
- #60 TabPFN spike: the XGBoost baseline scored AUC 0.430 (below the 0.5 chance level, i.e. anti-predictive at n=100), empirically confirming the literature’s small-n verdict
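The #57 label follows the triple-barrier idea from AFML: an upper and lower barrier sized by recent volatility, plus a vertical (time) barrier. Below is a minimal sketch, not the project's implementation. AFML estimates volatility with an exponentially weighted std of returns; a plain rolling stdev stands in here. A time-out is labeled 0, which is one of the two options AFML describes (the other uses the sign of the return at the vertical barrier). Function names and defaults are hypothetical.

```python
import statistics


def realized_vol(prices: list[float], entry_idx: int, lookback: int = 20) -> float:
    """Std dev of simple returns over the `lookback` bars ending at entry (no look-ahead)."""
    start = max(1, entry_idx - lookback + 1)
    rets = [prices[i] / prices[i - 1] - 1.0 for i in range(start, entry_idx + 1)]
    if len(rets) < 2:
        raise ValueError("need at least 2 returns before entry to estimate vol")
    return statistics.stdev(rets)


def triple_barrier_label(prices: list[float], entry_idx: int, sigma: float,
                         k_up: float = 1.0, k_dn: float = 1.0, max_hold: int = 30) -> int:
    """+1 if the upper barrier is touched first, -1 if the lower one is,
    0 if the vertical barrier (max_hold bars) expires first."""
    entry = prices[entry_idx]
    upper = entry * (1.0 + k_up * sigma)  # barrier widths scale with volatility
    lower = entry * (1.0 - k_dn * sigma)
    last = min(entry_idx + max_hold, len(prices) - 1)
    for i in range(entry_idx + 1, last + 1):
        if prices[i] >= upper:
            return 1
        if prices[i] <= lower:
            return -1
    return 0
```

Typical use: `sigma = realized_vol(prices, i)` then `triple_barrier_label(prices, i, sigma)`, so a 0DTE move on a quiet day and on a volatile day are judged against proportionate barriers.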
Operational hardening
- #42 broker-direct reconciler (live socket + Flex Query)
- #46 broker-truth-first design doc filed
- #48 dashboard ML probabilities + commercialization-grade config reframe
- #36 dashboard self-heal from broker_executions
- engine off-hours guard (post-15:10 CT exit)
- nightly_ml_pipeline orchestrator (5-step chain)
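The off-hours guard reduces to a timezone-correct clock check. Below is a minimal sketch, assuming the guard only needs weekends and the 15:10 CT cutoff. Like the open P2, it ignores market holidays. is_off_hours is a hypothetical name. Rejecting naive datetimes is a deliberate choice, since they are the usual source of timezone bugs like the watchdog TZ finding.

```python
from datetime import datetime, time
from typing import Optional
from zoneinfo import ZoneInfo

CT = ZoneInfo("America/Chicago")
EXIT_CUTOFF_CT = time(15, 10)  # post-15:10 CT exit, per the note


def is_off_hours(now: Optional[datetime] = None) -> bool:
    """True on weekends or at/after the 15:10 CT cutoff. Market holidays are NOT handled."""
    if now is None:
        now = datetime.now(CT)
    elif now.tzinfo is None:
        # A naive timestamp would be silently read as host-local time; refuse it instead.
        raise ValueError("is_off_hours needs a timezone-aware datetime")
    local = now.astimezone(CT)  # CST/CDT switches come from the tz database
    if local.weekday() >= 5:  # Saturday=5, Sunday=6
        return True
    return local.time() >= EXIT_CUTOFF_CT
```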
Adversarial review (the gate before live capital)
Codex high-reasoning review of 30 commits / 8.5k LOC found:
- 3 P0s: dashboard double-count + stale ML labels + live model feature drift - all FIXED tonight
- 5 P1s: WAL flag + UW timeout + plist paths + step status + watchdog TZ - all FIXED tonight
- 2 P2s: macro_calendar 2026-only + market-holidays not in guard
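The note doesn't spell out the WAL fix. Assuming "WAL flag" refers to SQLite's write-ahead-log journal mode on decisions.db, here is a minimal sketch of opening the DB so the mode is verified rather than assumed. open_db is a hypothetical helper. PRAGMA journal_mode returns the mode actually in effect, which is why its result is checked.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_db(path: str, busy_timeout_s: float = 5.0) -> sqlite3.Connection:
    """Open a SQLite DB in WAL mode and fail loudly if WAL could not be enabled."""
    # `timeout` makes a connection wait on a locked DB instead of failing immediately.
    conn = sqlite3.connect(path, timeout=busy_timeout_s)
    # The pragma returns the journal mode now in effect; it can differ from the
    # request (for example ":memory:" databases report "memory").
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    if mode.lower() != "wal":
        conn.close()
        raise RuntimeError(f"expected WAL journal mode, got {mode!r} for {path}")
    return conn


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        conn = open_db(str(Path(d) / "decisions.db"))
        print(conn.execute("PRAGMA journal_mode").fetchone()[0])  # wal
        conn.close()
```

WAL mode is persistent in the database file, but checking on every open catches a DB that was recreated or copied without it.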
Operational lessons learned (added to brain as separate concepts)
concepts/codex-sandbox-silent-drop.md
2 of 5 codex tabs tonight silently dropped patches despite reporting success. Workflow change: ALWAYS verify git status --short after codex exec before committing. Filed as Task #80 to escalate to Conductor maintainers.
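The verification step can be scripted so a dropped patch blocks the commit. Below is a minimal sketch; changed_paths and verify_patch_landed are hypothetical names. It parses the "XY path" lines of git status --short and adds --untracked-files=all so new files are listed individually rather than as their parent directory. It does not handle the quoted form git uses for unusual path names.

```python
import subprocess


def changed_paths(porcelain: str) -> list[str]:
    """Parse `git status --short` output ("XY path" lines) into paths."""
    paths = []
    for line in porcelain.splitlines():
        if len(line) < 4:
            continue
        path = line[3:]  # skip the two status columns and the separating space
        if " -> " in path:  # renames/copies are shown as "old -> new"
            path = path.split(" -> ", 1)[1]
        paths.append(path)
    return paths


def verify_patch_landed(expected: list[str], repo: str = ".") -> list[str]:
    """Raise unless every expected file shows up as changed in the working tree."""
    out = subprocess.run(
        ["git", "status", "--short", "--untracked-files=all"],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout
    changed = changed_paths(out)
    missing = [p for p in expected if p not in changed]
    if missing:
        raise RuntimeError(f"patch reported success but these files are unchanged: {missing}")
    return changed
```

Run from the repo root, paths come back relative to it, so `expected` should use repo-relative paths.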
concepts/launchd-calendar-catchup.md
launchd fires a StartCalendarInterval job immediately on load if a scheduled slot was “missed” since the last run, and RunAtLoad=false doesn’t suppress this. Fixed via an off-hours self-quiesce in start_mk2.sh.
concepts/ml-training-label-grounding.md
Model trains on outcome_pnl >= 0 (P&L-grounded), not realized_label (directional). Orphaned realized_label column = noise.
concepts/0dte-ml-best-in-class-comparison.md
Literature verdict: meta-labeling is the highest-EV move at n=100. Reaching an 80% win rate via ML alone is a 6-12 month horizon.
concepts/cortana-probability-score-feature-taxonomy.md
The 8-category inventory of CPS inputs.
concepts/meta-labeling-implementation-patterns.md
L1 logistic + mutual-info-top-8 + triple-barrier label as canonical target. AFML Ch.3.6 + H&T published research.
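The canonical pattern can be sketched with scikit-learn. The meta-label is 1 when the primary signal's side agrees with the triple-barrier outcome. The secondary model keeps the top 8 features by mutual information and fits an L1 logistic regression. This is a sketch under assumptions: the helper names, the scaling step, and C=0.5 are hypothetical choices, not the live configuration.

```python
from functools import partial

import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def meta_labels(sides: np.ndarray, tb_labels: np.ndarray) -> np.ndarray:
    """1 where the primary side (+1/-1) agrees with the triple-barrier label (+1/-1),
    else 0. A time-out label of 0 counts as 'should not have bet'."""
    return (np.asarray(sides) * np.asarray(tb_labels) > 0).astype(int)


def make_meta_model(k: int = 8, C: float = 0.5, seed: int = 0) -> Pipeline:
    """Secondary classifier: top-k features by mutual information, then L1 logistic."""
    return Pipeline([
        ("scale", StandardScaler()),
        # Fixed random_state keeps the k-NN based MI estimate reproducible.
        ("mi_topk", SelectKBest(partial(mutual_info_classif, random_state=seed), k=k)),
        ("l1_logit", LogisticRegression(penalty="l1", solver="liblinear", C=C)),
    ])
```

Keeping selection inside the Pipeline means mutual information is recomputed on each training fold during cross-validation, which avoids leaking the test fold into feature choice at n=100. The output of `predict_proba(X)[:, 1]` is the secondary model's probability that the primary bet pays off.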
Going into next week - the prune arc
User asked Friday: “How do I shed old/unwanted code as we move toward real money?” Strategy filed as Tasks 74-78:
- #77 Quick wins (30 min): delete force_close_*.py, .bak plists, obvious dead code
- #74 Phase 1 (15 min): run coverage.py + vulture against a paper trading day to produce a quantitative dead-code list.
- #76 Lifecycle tagging (3 hr): every module gets a # LIFECYCLE: paper-only|both|live-only|deprecated tag.
- #75 Phase 2 (3 hr): folder-by-folder Codex prune. Target: ~5-10k LOC removed.
- #78 Commercialization-grade config: per-user SQLite, schema validation, audit log, encrypted secrets. ~1 week scope.
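Once #76 lands, the tags can drive #75 mechanically. Below is a minimal sketch of a scanner that buckets modules by tag, assuming the tag is a whole-line comment with one of the four values from #76. LIFECYCLE_RE, lifecycle_of and lifecycle_report are hypothetical names.

```python
import re
from pathlib import Path
from typing import Optional

# Matches a whole-line tag such as "# LIFECYCLE: live-only".
LIFECYCLE_RE = re.compile(
    r"^#\s*LIFECYCLE:\s*(paper-only|both|live-only|deprecated)\s*$", re.MULTILINE
)


def lifecycle_of(source: str) -> Optional[str]:
    """Return the module's lifecycle tag, or None if it has no valid tag."""
    m = LIFECYCLE_RE.search(source)
    return m.group(1) if m else None


def lifecycle_report(root: str) -> dict[str, list[str]]:
    """Bucket every .py file under `root` by its tag; untagged files land in "untagged"."""
    buckets: dict[str, list[str]] = {}
    for path in sorted(Path(root).rglob("*.py")):
        tag = lifecycle_of(path.read_text(encoding="utf-8")) or "untagged"
        buckets.setdefault(tag, []).append(str(path))
    return buckets
```

The "deprecated" bucket is the prune list for #75. The "untagged" bucket shows what #76 still has to cover. Misspelled values fall into "untagged" instead of being silently accepted.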
The sequence reduces blast radius at every step: measure, isolate, prune, then commercialize.
Scoreboard
| Metric | Week start | Week end |
|---|---|---|
| Tests passing | 564 | 680 |
| FEATURIZE_EVENT_COLUMNS | 38 | ~95 |
| Schema version (decisions.db) | 6 | 19 |
| ML trainable rows | 36 | 130+ |
| GEX magnitude coverage | 0 | 8,277 |
| Charm/vanna magnitude coverage | 0 | 8,277 |
| Open P0s (per #73 review) | unknown | 0 |
| Brain pages | 5 | 11 |
| Codex tabs fired | a few | ~25 |
| Lines committed | - | ~12k+ |
See Also
- Codex Sandbox Silent Drop
- Cortana Probability Score Feature Taxonomy
- ML Training Label Grounding
- 0DTE ML Best-in-Class Comparison
- Meta-Labeling Implementation Patterns
- launchd Calendar Catch-up
- 2026-05-04 Adversarial ML Data Review
- 2026-05-05 TabPFN Spike + XGBoost Baseline
- 2026-05-05 Overnight ML Pipeline Rebuild
Timeline
2026-05-05 22:30 CDT | wrap - 680 tests green, all #73 review findings fixed and verified on disk, plists installed, safe pip upgrades done, brain pages filed. Going to bed at hour ~17 of focused work.