🐻⬇️🏀

2025-2026 NCAAM Model Performance Analysis

Scope

This view covers only games in which at least one team held an AP Poll rank on game day. All models are re-evaluated on that same subset, and the AP Poll joins the comparison set here.

The comparison spans 831 games and multiple rating models.

Model Catalog

7-day holdout coverage: 17 of 18 models.

Rolling Holdout Curves

Each point is a strict weekly holdout: train on all games before that week, test on that week. This first version uses a 21-day warmup, then 7-day holdouts stepped forward weekly.
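The windowing described above can be sketched as follows. This is a minimal illustration, not the site's actual code: the function names, the game record layout, and the season dates in the usage lines are hypothetical. It assumes the final window is simply truncated at the season end, which would match the shorter "Apr 1 - Apr 6" window in the results.

```python
from datetime import date, timedelta

def rolling_holdout_windows(season_start, season_end, warmup_days=21, holdout_days=7):
    """Yield (test_start, test_end) pairs; test_end is inclusive."""
    test_start = season_start + timedelta(days=warmup_days)
    while test_start <= season_end:
        # Truncate the last window at the season end (assumption).
        test_end = min(test_start + timedelta(days=holdout_days - 1), season_end)
        yield test_start, test_end
        test_start += timedelta(days=holdout_days)

def split_games(games, test_start, test_end):
    """Strict holdout: train only on games played before the test window opens."""
    train = [g for g in games if g["date"] < test_start]
    test = [g for g in games if test_start <= g["date"] <= test_end]
    return train, test

# Hypothetical season dates, for illustration only.
windows = list(rolling_holdout_windows(date(2025, 11, 3), date(2025, 12, 7)))
games = [{"date": date(2025, 11, 10)}, {"date": date(2025, 11, 25)}]
train, test = split_games(games, *windows[0])
```

Because the training filter uses a strict `<` on the window start, no game from the test week can leak into the fit.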

Metrics charted: Log Loss, Brier, AUC, Accuracy

[Chart: weekly strict holdout log loss; lower is better. 16 models across 22 windows.]
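For reference, the four metrics named here can be computed from binary outcomes and predicted win probabilities as in the sketch below. These are the standard textbook definitions written in plain Python; the function names and the 0.5 accuracy threshold are assumptions, and the site may compute them differently (for example, with a library).

```python
import math

def log_loss(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood; probabilities are clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)

def brier(y_true, p_pred):
    """Mean squared error between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)

def accuracy(y_true, p_pred, threshold=0.5):
    """Share of games where the thresholded pick matches the outcome."""
    return sum((p >= threshold) == bool(y) for y, p in zip(y_true, p_pred)) / len(y_true)

def auc(y_true, p_pred):
    """Probability that a random win is scored above a random loss (ties count half)."""
    pos = [p for y, p in zip(y_true, p_pred) if y == 1]
    neg = [p for y, p in zip(y_true, p_pred) if y == 0]
    if not pos or not neg:
        return float("nan")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

y = [1, 0, 1, 0]
p = [0.9, 0.2, 0.6, 0.7]
print(auc(y, p), accuracy(y, p))  # 0.75 0.75
```

Lower is better for log loss and Brier; higher is better for AUC and accuracy. On a very small window, AUC and accuracy can take only a few distinct values, so small-sample differences should be read cautiously.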

Recent Window Winners

Holdout          Best                    Log Loss  Runner-up                       Models
Apr 1 - Apr 6    Adjusted Context Blend  0.463     Log Adjusted (0.490)            16
Mar 25 - Mar 31  Log Adjusted            0.591     Adjusted Efficiency (0.591)     16
Mar 18 - Mar 24  Adjusted Efficiency     0.427     Log Adjusted (0.428)            16
Mar 11 - Mar 17  Margin                  0.550     Points Off/Def (0.552)          16
Mar 4 - Mar 10   Adjusted Context Blend  0.453     Points Off/Def (0.466)          16
Feb 25 - Mar 3   Points Off/Def Recency  0.592     Points Off/Def (0.601)          16
Feb 18 - Feb 24  Points Off/Def Recency  0.609     Points Off/Def (0.609)          16
Feb 11 - Feb 17  Points Off/Def          0.517     Points Off/Def Recency (0.525)  16

Model Performance Leaderboard

Models are ranked by strict holdout AUC when available, falling back to full-season AUC otherwise.

#   Model                   7d Split      AUC    Acc    Brier  LogLoss  n    AUC 7d  Acc 7d  Brier 7d  n 7d
1   AP Poll                 STRICT 5g     0.804  76.1%  -      -        831  1.000   80.0%   -         5
2   Adjusted Context Blend  STRICT 5g     -      -      -      -        0    1.000   80.0%   0.143     5
3   Efficiency              FULL no 7d    0.859  77.6%  0.152  0.464    831  -       -       -         0
4   Margin                  STRICT 5g     0.878  80.4%  0.144  0.447    831  0.833   80.0%   0.157     5
5   Margin Recency          STRICT 5g     -      -      -      -        0    0.833   60.0%   0.191     5
6   Adjusted Efficiency     STRICT 5g     0.880  80.1%  0.140  0.427    831  0.833   80.0%   0.161     5
7   Log Adjusted            STRICT 5g     0.880  80.3%  0.140  0.427    831  0.833   80.0%   0.160     5
8   Points Off/Def          STRICT 5g     0.877  80.6%  0.147  0.456    831  0.833   80.0%   0.163     5
9   Home Team Baseline      STRICT 5g     0.674  67.7%  0.225  0.642    831  0.750   80.0%   0.200     5
10  Points Off/Def Recency  STRICT 5g     -      -      -      -        0    0.667   60.0%   0.209     5
11  Core Ensemble           STRICT 5g     -      -      -      -        0    0.667   40.0%   0.208     5
12  Recency Ensemble        STRICT 5g     -      -      -      -        0    0.667   40.0%   0.220     5
13  Dynamic Bradley-Terry   STRICT 5g     -      -      -      -        0    0.500   60.0%   0.260     5
14  Elo                     STRICT 5g     0.854  78.3%  0.155  0.480    831  0.333   40.0%   0.300     5
15  Bradley-Terry           STRICT 5g     0.874  79.3%  0.143  0.445    831  0.333   20.0%   0.288     5
16  Avg Margin Baseline     STRICT 5g     0.857  78.2%  0.164  0.504    831  0.333   40.0%   0.285     5
17  Bradley-Terry Recency   STRICT 5g     -      -      -      -        0    0.167   20.0%   0.318     5
18  Pythagorean             STRICT 5g     0.842  76.7%  0.198  0.581    831  0.167   40.0%   0.271     5

Model descriptions:

  • AP Poll: Human ranking baseline for games involving a ranked team.
  • Adjusted Context Blend: Experimental context-heavy win model blending strong team components with rest and venue context.
  • Efficiency: Tempo-adjusted efficiency version of Pythagorean ratings.
  • Margin: Linear team-strength model fit on point differential instead of binary wins.
  • Margin Recency: Margin regression with exponential recency weights on newer games.
  • Adjusted Efficiency: Opponent-adjusted efficiency model with separate offensive and defensive components.
  • Log Adjusted: Log-scale adjusted efficiency model that downweights blowout leverage.
  • Points Off/Def: Raw points regression with separate offensive and defensive team parameters.
  • Home Team Baseline: Always favor the home team with a fixed prior.
  • Points Off/Def Recency: Off/def points regression with exponential recency weights.
  • Core Ensemble: Equal-logit blend of Elo, recency BT, recency margin, log-adjusted pyth, and points off/def.
  • Recency Ensemble: Equal-logit blend of Elo, recency BT, recency margin, log-adjusted pyth, and recency points off/def.
  • Dynamic Bradley-Terry: Time-evolving paired-comparison model with latent team strength drift.
  • Elo: Streaming paired-comparison rating with recency baked into sequential updates.
  • Bradley-Terry: Static logistic paired-comparison model with one team strength parameter.
  • Avg Margin Baseline: Predict from simple average scoring margin in the training window.
  • Bradley-Terry Recency: Static Bradley-Terry with exponential recency weights on newer games.
  • Pythagorean: Pythagorean win expectation from raw points scored and allowed.
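The leaderboard's stated ordering rule (strict holdout AUC when available, full-season AUC as the fallback) can be sketched as a sort key. The row layout and field names below are hypothetical, and the tie-breaking order among models with equal AUC is not stated in the source; this sketch simply keeps input order for ties.

```python
def ranking_key(row):
    """Sort by strict 7-day AUC when present, else by full-season AUC (descending)."""
    strict = row.get("auc_7d")
    full = row.get("auc_full")
    auc = strict if strict is not None else full
    # Rows with no AUC at all sort last.
    return -auc if auc is not None else float("inf")

# A few rows copied from the leaderboard; None marks a "-" cell.
rows = [
    {"model": "Efficiency", "auc_full": 0.859, "auc_7d": None},
    {"model": "Margin", "auc_full": 0.878, "auc_7d": 0.833},
    {"model": "AP Poll", "auc_full": 0.804, "auc_7d": 1.000},
    {"model": "Elo", "auc_full": 0.854, "auc_7d": 0.333},
]
ranked = [r["model"] for r in sorted(rows, key=ranking_key)]
print(ranked)  # ['AP Poll', 'Efficiency', 'Margin', 'Elo']
```

This reproduces why Efficiency, which has no 7-day result, sits between models with 7-day AUC of 1.000 and 0.833: its full-season 0.859 is used in place of the missing strict value.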

Methodology

Elo / Bradley-Terry

  • Elo: Iterative updates, K=64, home-court advantage (HCA)=100 rating points
  • BT: Static logistic regression on all games
  • Both model win probability, not margin
  • Elo updates after each game; BT fits once
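As an illustration of the Elo settings listed above, here is a minimal sketch of one update step with K=64 and HCA=100. The logistic form with base 10 and a 400-point scale is the conventional Elo formula and is an assumption here; the source does not state the scale, and the function names are hypothetical.

```python
def elo_expected(r_home, r_away, hca=100.0, scale=400.0):
    """Expected home win probability; hca is added to the home rating."""
    return 1.0 / (1.0 + 10 ** (-(r_home + hca - r_away) / scale))

def elo_update(r_home, r_away, home_won, k=64.0, hca=100.0):
    """Return updated (home, away) ratings after one game."""
    expected = elo_expected(r_home, r_away, hca)
    delta = k * ((1.0 if home_won else 0.0) - expected)
    # Zero-sum update: what the home team gains, the away team loses.
    return r_home + delta, r_away - delta

p_home = elo_expected(1500.0, 1500.0)          # two equally rated teams
new_home, new_away = elo_update(1500.0, 1500.0, home_won=True)
```

With equal ratings, the 100-point home edge alone gives the home team roughly a 64% win expectation under this scale. For a neutral-site game, one would presumably pass `hca=0`.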

Pythagorean Models

  • Raw: Classic points scored/allowed formula
  • Efficiency: Pace-adjusted (pts per possession)
  • Adjusted: Opponent-adjusted efficiency
  • Log: Log-linear multiplicative scale
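The raw and efficiency variants above share the same Pythagorean form and differ only in their inputs. A minimal sketch follows; the exponent is not given in the source, so the default of 10.0 is a placeholder assumption, and the helper names are hypothetical. The opponent-adjusted and log variants are not shown because the source does not describe them in enough detail.

```python
def pythagorean_win_pct(scored, allowed, exponent=10.0):
    """Pythagorean expectation: scored^k / (scored^k + allowed^k)."""
    s = scored ** exponent
    a = allowed ** exponent
    return s / (s + a)

def per_100_possessions(points, possessions):
    """Pace adjustment: convert raw points to points per 100 possessions."""
    return 100.0 * points / possessions

# Raw variant: season points scored and allowed.
raw = pythagorean_win_pct(2400.0, 2200.0)

# Efficiency variant: the same formula on pace-adjusted inputs.
off_eff = per_100_possessions(2400.0, 2100.0)
def_eff = per_100_possessions(2200.0, 2100.0)
eff = pythagorean_win_pct(off_eff, def_eff)
```

When offense and defense are measured over the same number of possessions, the two variants agree, because the pace factor cancels in the ratio. They diverge only when the possession counts differ.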

Margin Regression

  • Team-level ridge regression on point margin
  • Linear Bradley-Terry (margin target)
  • Ridge penalty alpha=0.05 (tuned by cross-validation)
  • Learns home advantage from data (~6 pts)
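The margin regression above can be sketched as a small ridge fit in plain Python. Each game contributes a row with +1 for the home team, -1 for the away team, and a constant column whose coefficient is the learned home advantage. This is an illustrative sketch: the function names, the toy games, the choice to penalize the home-advantage term, and the exact scaling of alpha are all assumptions rather than the author's implementation.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_margin_ridge(games, alpha=0.05):
    """games: (home, away, home_margin). Returns (team ratings, home advantage)."""
    teams = sorted({t for g in games for t in g[:2]})
    idx = {t: i for i, t in enumerate(teams)}
    p = len(teams) + 1  # last column is the home-advantage term
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for home, away, margin in games:
        row = [0.0] * p
        row[idx[home]], row[idx[away]], row[-1] = 1.0, -1.0, 1.0
        for i in range(p):
            xty[i] += row[i] * margin
            for j in range(p):
                xtx[i][j] += row[i] * row[j]
    for i in range(p):
        xtx[i][i] += alpha  # ridge penalty on every coefficient (assumption)
    w = solve(xtx, xty)
    return dict(zip(teams, w[:-1])), w[-1]

# Toy data: A beats B by 10 at home, then wins by 2 on the road.
games = [("A", "B", 10.0), ("B", "A", -2.0)]
ratings, home_adv = fit_margin_ridge(games)
```

The ridge penalty also does structural work: team ratings are only identified up to an additive constant, so the unpenalized normal equations are singular, and the penalty selects the solution centered near zero.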

Baselines

  • Home Team: Always favor the home team at a fixed 60% win probability
  • Avg Margin: Higher average margin wins
  • Models should beat these to add value
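As a sanity check on the Home Team baseline, the expected Brier score and log loss of a constant prediction can be worked out in closed form. The sketch below assumes every game is predicted at 0.60 for the home side and that the home team wins 67.7% of the time (the baseline's reported accuracy); both are interpretations of the reported figures, not confirmed details.

```python
import math

def constant_predictor_scores(p_home, home_win_rate):
    """Expected Brier score and log loss when every game gets probability p_home."""
    brier = home_win_rate * (1 - p_home) ** 2 + (1 - home_win_rate) * p_home ** 2
    log_loss = -(home_win_rate * math.log(p_home)
                 + (1 - home_win_rate) * math.log(1 - p_home))
    return brier, log_loss

brier, ll = constant_predictor_scores(0.60, 0.677)
print(round(brier, 3), round(ll, 3))  # 0.225 0.642
```

These values match the baseline's full-season Brier (0.225) and log loss (0.642) in the leaderboard, which is consistent with the 60% figure being a fixed predicted probability. Any model whose Brier score or log loss is not clearly below these numbers is adding little beyond knowing who is at home.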