2025-2026 NCAAM Model Performance Analysis
This view includes only games where at least one team held an AP Poll rank on game day. All models are re-evaluated on that same subset, and the AP Poll joins the comparison set.
Comparing prediction accuracy across 831 games using multiple rating models.
7-day holdout coverage: 17/18 models.
Rolling Holdout Curves
Each point is a strict weekly holdout: train on all games before that week, test on that week. This first version uses a 21-day warmup, then 7-day holdouts stepped forward weekly.
Figure: weekly strict holdout log loss (lower is better), showing 16 models across 22 windows.
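The windowing described above (21-day warmup, then 7-day holdouts stepped forward weekly) can be sketched as follows. This is a minimal illustration, not the site's code: the function names, the `date` field on each game, and the season dates in the example are all assumptions. The final window is truncated at the season end.

```python
from datetime import date, timedelta


def rolling_holdout_windows(season_start, season_end, warmup_days=21, holdout_days=7):
    """Yield (test_start, test_end) pairs for strict weekly holdouts.

    The first window opens after the warmup period; each later window
    starts holdout_days after the previous one.
    """
    test_start = season_start + timedelta(days=warmup_days)
    while test_start <= season_end:
        # Truncate the last window so it never runs past the season end.
        test_end = min(test_start + timedelta(days=holdout_days - 1), season_end)
        yield test_start, test_end
        test_start += timedelta(days=holdout_days)


def split_games(games, test_start, test_end):
    """Strict split: train on games before the window, test on games inside it."""
    train = [g for g in games if g["date"] < test_start]
    test = [g for g in games if test_start <= g["date"] <= test_end]
    return train, test


# Hypothetical season boundaries, chosen only for illustration.
windows = list(rolling_holdout_windows(date(2025, 11, 5), date(2026, 4, 6)))
```

Because the training filter uses a strict `<` on the window start, no game from the test week can leak into the fit.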
Recent Window Winners
| Holdout | Best | Log Loss | Runner-up | Models |
|---|---|---|---|---|
| Apr 1 - Apr 6 | Adjusted Context Blend | 0.463 | Log Adjusted (0.490) | 16 |
| Mar 25 - Mar 31 | Log Adjusted | 0.591 | Adjusted Efficiency (0.591) | 16 |
| Mar 18 - Mar 24 | Adjusted Efficiency | 0.427 | Log Adjusted (0.428) | 16 |
| Mar 11 - Mar 17 | Margin | 0.550 | Points Off/Def (0.552) | 16 |
| Mar 4 - Mar 10 | Adjusted Context Blend | 0.453 | Points Off/Def (0.466) | 16 |
| Feb 25 - Mar 3 | Points Off/Def Recency | 0.592 | Points Off/Def (0.601) | 16 |
| Feb 18 - Feb 24 | Points Off/Def Recency | 0.609 | Points Off/Def (0.609) | 16 |
| Feb 11 - Feb 17 | Points Off/Def | 0.517 | Points Off/Def Recency (0.525) | 16 |
Model Performance Leaderboard
Models ranked by strict holdout AUC when available (fallback: full-season AUC). Columns suffixed "7d" are computed on the strict 7-day holdout; the others are full-season figures.
| # | Model | 7d Split | AUC | Acc | Brier | LogLoss | n | AUC 7d | Acc 7d | Brier 7d | n 7d |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | AP Poll: Human ranking baseline for games involving a ranked team. | STRICT (5g) | 0.804 | 76.1% | - | - | 831 | 1.000 | 80.0% | - | 5 |
| 2 | Adjusted Context Blend: Experimental context-heavy win model blending strong team components with rest and venue context. | STRICT (5g) | - | - | - | - | 0 | 1.000 | 80.0% | 0.143 | 5 |
| 3 | Efficiency: Tempo-adjusted efficiency version of Pythagorean ratings. | FULL (no 7d) | 0.859 | 77.6% | 0.152 | 0.464 | 831 | - | - | - | 0 |
| 4 | Margin: Linear team-strength model fit on point differential instead of binary wins. | STRICT (5g) | 0.878 | 80.4% | 0.144 | 0.447 | 831 | 0.833 | 80.0% | 0.157 | 5 |
| 5 | Margin Recency: Margin regression with exponential recency weights on newer games. | STRICT (5g) | - | - | - | - | 0 | 0.833 | 60.0% | 0.191 | 5 |
| 6 | Adjusted Efficiency: Opponent-adjusted efficiency model with separate offensive and defensive components. | STRICT (5g) | 0.880 | 80.1% | 0.140 | 0.427 | 831 | 0.833 | 80.0% | 0.161 | 5 |
| 7 | Log Adjusted: Log-scale adjusted efficiency model that downweights blowout leverage. | STRICT (5g) | 0.880 | 80.3% | 0.140 | 0.427 | 831 | 0.833 | 80.0% | 0.160 | 5 |
| 8 | Points Off/Def: Raw points regression with separate offensive and defensive team parameters. | STRICT (5g) | 0.877 | 80.6% | 0.147 | 0.456 | 831 | 0.833 | 80.0% | 0.163 | 5 |
| 9 | Home Team Baseline: Always favor the home team with a fixed prior. | STRICT (5g) | 0.674 | 67.7% | 0.225 | 0.642 | 831 | 0.750 | 80.0% | 0.200 | 5 |
| 10 | Points Off/Def Recency: Off/def points regression with exponential recency weights. | STRICT (5g) | - | - | - | - | 0 | 0.667 | 60.0% | 0.209 | 5 |
| 11 | Core Ensemble: Equal-logit blend of Elo, recency BT, recency margin, log-adjusted pyth, and points off/def. | STRICT (5g) | - | - | - | - | 0 | 0.667 | 40.0% | 0.208 | 5 |
| 12 | Recency Ensemble: Equal-logit blend of Elo, recency BT, recency margin, log-adjusted pyth, and recency points off/def. | STRICT (5g) | - | - | - | - | 0 | 0.667 | 40.0% | 0.220 | 5 |
| 13 | Dynamic Bradley-Terry: Time-evolving paired-comparison model with latent team strength drift. | STRICT (5g) | - | - | - | - | 0 | 0.500 | 60.0% | 0.260 | 5 |
| 14 | Elo: Streaming paired-comparison rating with recency baked into sequential updates. | STRICT (5g) | 0.854 | 78.3% | 0.155 | 0.480 | 831 | 0.333 | 40.0% | 0.300 | 5 |
| 15 | Bradley-Terry: Static logistic paired-comparison model with one team strength parameter. | STRICT (5g) | 0.874 | 79.3% | 0.143 | 0.445 | 831 | 0.333 | 20.0% | 0.288 | 5 |
| 16 | Avg Margin Baseline: Predict from simple average scoring margin in the training window. | STRICT (5g) | 0.857 | 78.2% | 0.164 | 0.504 | 831 | 0.333 | 40.0% | 0.285 | 5 |
| 17 | Bradley-Terry Recency: Static Bradley-Terry with exponential recency weights on newer games. | STRICT (5g) | - | - | - | - | 0 | 0.167 | 20.0% | 0.318 | 5 |
| 18 | Pythagorean: Pythagorean win expectation from raw points scored and allowed. | STRICT (5g) | 0.842 | 76.7% | 0.198 | 0.581 | 831 | 0.167 | 40.0% | 0.271 | 5 |
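The four leaderboard metrics can be computed from a list of outcomes (1 if the predicted-for team won, else 0) and predicted win probabilities. The sketch below is illustrative only and is not the site's implementation: the function names, the 0.5 accuracy threshold, the half-credit for ties in AUC, and the probability clipping in log loss are all assumptions.

```python
import math


def accuracy(y_true, p_pred, threshold=0.5):
    """Share of games where the thresholded probability matches the outcome."""
    hits = sum((p >= threshold) == bool(y) for y, p in zip(y_true, p_pred))
    return hits / len(y_true)


def brier_score(y_true, p_pred):
    """Mean squared error between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def log_loss(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood; probabilities are clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def auc(y_true, p_pred):
    """Pairwise AUC: chance that a win is scored above a loss (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, p_pred) if y == 1]
    neg = [p for y, p in zip(y_true, p_pred) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one win and one loss")
    score = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return score / (len(pos) * len(neg))
```

With only five holdout games, a pairwise AUC can take only a few distinct values, which is consistent with the repeated 0.833 and 0.667 entries in the AUC 7d column.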
Methodology
Elo / Bradley-Terry
- Elo: Iterative updates, K=64, HCA=100 (home-court advantage)
- BT: Static logistic regression on all games
- Both model win probability, not margin
- Elo updates after each game; BT fits once
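A per-game Elo update with the listed K=64 and HCA=100 might look like the sketch below. Only K and HCA come from the list above; the base-10 logistic curve with a 400-point scale, the 1500 starting rating, and the choice to add HCA to the home team's rating are conventional Elo defaults assumed here, and the function names are hypothetical.

```python
def elo_expected_home(r_home, r_away, hca=100.0, scale=400.0):
    """Home win probability under the standard logistic Elo curve (assumed form)."""
    diff = (r_home + hca) - r_away
    return 1.0 / (1.0 + 10.0 ** (-diff / scale))


def elo_update(ratings, home, away, home_won, k=64.0, hca=100.0, initial=1500.0):
    """Update both teams in place after one game; return the pre-game home win probability."""
    r_home = ratings.get(home, initial)  # unseen teams start at the assumed initial rating
    r_away = ratings.get(away, initial)
    p_home = elo_expected_home(r_home, r_away, hca)
    delta = k * ((1.0 if home_won else 0.0) - p_home)
    ratings[home] = r_home + delta
    ratings[away] = r_away - delta  # zero-sum: the away team moves by the opposite amount
    return p_home


ratings = {}
p = elo_update(ratings, "Team A", "Team B", home_won=True)  # hypothetical teams
```

Because ratings change after every game, this is the streaming counterpart to Bradley-Terry, which fits all games once.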
Pythagorean Models
- Raw: Classic points scored/allowed formula
- Efficiency: Pace-adjusted (pts per possession)
- Adjusted: Opponent-adjusted efficiency
- Log: Log-linear multiplicative scale
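For reference, the classic Pythagorean expectation and its per-possession variant can be sketched as below. The exponent is not stated in this report, so it is left as a required parameter and the value used in the example is purely illustrative. Possession counts are assumed to be supplied from elsewhere, and all names and sample numbers are hypothetical.

```python
def pythagorean_expectation(scored, allowed, exponent):
    """Classic formula: scored^k / (scored^k + allowed^k)."""
    if scored <= 0 or allowed <= 0:
        raise ValueError("scored and allowed must be positive")
    s = scored ** exponent
    a = allowed ** exponent
    return s / (s + a)


def per_100_possessions(points, possessions):
    """Pace adjustment: convert raw points into points per 100 possessions."""
    return 100.0 * points / possessions


EXPONENT = 10.0  # illustrative only; the report does not state the exponent

# Raw variant: average points scored and allowed per game (made-up numbers).
raw = pythagorean_expectation(80.0, 70.0, EXPONENT)

# Efficiency variant: the same formula applied to per-possession rates.
off_eff = per_100_possessions(2400, 2100)  # made-up season totals
def_eff = per_100_possessions(2100, 2050)
eff = pythagorean_expectation(off_eff, def_eff, EXPONENT)
```

The opponent-adjusted and log variants would need schedule information that this sketch does not attempt to model.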
Margin Regression
- Team-level ridge regression on point margin
- Linear Bradley-Terry (margin target)
- Alpha=0.05 (CV-tuned)
- Learns home advantage from data (~6 pts)
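The margin regression above can be illustrated with a small dependency-free ridge fit: each game contributes +1 for the home team, -1 for the away team, and a shared home-advantage column, with the point margin as the target. This is a sketch under assumptions, not the site's code: leaving the home-advantage term unpenalized, the function names, and the toy data are all choices made here.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_margin_ridge(games, alpha=0.05):
    """Fit margin = hca + rating[home] - rating[away] with an L2 penalty on ratings.

    games: iterable of (home, away, home_margin). Returns (ratings, hca).
    """
    teams = sorted({t for home, away, _ in games for t in (home, away)})
    idx = {t: i for i, t in enumerate(teams)}
    n = len(teams) + 1  # last column is the home-advantage term
    A = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for home, away, margin in games:
        row = {idx[home]: 1.0, idx[away]: -1.0, n - 1: 1.0}
        for i, xi in row.items():  # accumulate X^T X and X^T y
            b[i] += xi * margin
            for j, xj in row.items():
                A[i][j] += xi * xj
    for i in range(n - 1):  # penalize team ratings only (assumption)
        A[i][i] += alpha
    w = solve_linear(A, b)
    return {t: w[idx[t]] for t in teams}, w[-1]


# Toy round robin generated with a 4-point home edge (made-up data).
toy = [("A", "B", 9), ("B", "A", -1), ("A", "C", 14),
       ("C", "A", -6), ("B", "C", 9), ("C", "B", -1)]
ratings, hca = fit_margin_ridge(toy)
```

The ridge penalty also removes the usual ambiguity in team ratings (adding a constant to every team leaves all margins unchanged), so no explicit sum-to-zero constraint is needed.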
Baselines
- Home Team: Always predict a home win, at a fixed 60% probability
- Avg Margin: Higher average margin wins
- Models should beat these to add value
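As a quick sanity check on the Home Team baseline, a constant 60% home-win prediction has Brier score and log loss that depend only on how often the home team actually wins. The sketch below assumes the baseline's 67.7% leaderboard accuracy equals the home win rate on the 831-game subset; the variable names are illustrative.

```python
import math

p_home = 0.60          # fixed prior from the Baselines list
home_win_rate = 0.677  # assumed equal to the baseline's leaderboard accuracy

# Home wins are scored against p_home, home losses against 1 - p_home.
brier = home_win_rate * (1 - p_home) ** 2 + (1 - home_win_rate) * p_home ** 2
logloss = -(home_win_rate * math.log(p_home)
            + (1 - home_win_rate) * math.log(1 - p_home))

print(round(brier, 3), round(logloss, 3))  # prints 0.225 0.642
```

Both values agree with the Home Team Baseline row of the leaderboard (Brier 0.225, LogLoss 0.642), which gives a concrete floor for the other models to beat.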