
2025-2026 NCAAWD2 Model Performance Analysis

Scope

All scored games in the selected league and season (2025-2026 NCAAWD2). The AP Poll is excluded.

This analysis compares the prediction accuracy of multiple rating models across 1722 games.

Model Catalog

7-day holdout coverage: 16 of 17 models (Efficiency has no 7-day holdout).

Rolling Holdout Curves

Each point is a strict weekly holdout: train on all games before that week, test on that week. This first version uses a 21-day warmup, then 7-day holdouts stepped forward weekly.
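The windowing rule above can be sketched as follows. This is a minimal sketch: the `holdout_windows` and `strict_split` helpers are hypothetical, and so is the `"date"` key on each game record. The season dates in the example are illustrative only, because the page does not give the first game date.

```python
from datetime import date, timedelta

def holdout_windows(season_start, season_end, warmup_days=21, window_days=7):
    """Return (test_start, test_end) pairs for strict weekly holdouts.

    The first test window opens after the warmup. Each later window starts
    window_days after the previous one. The last window is cut off at season_end.
    """
    windows = []
    test_start = season_start + timedelta(days=warmup_days)
    while test_start <= season_end:
        test_end = min(test_start + timedelta(days=window_days - 1), season_end)
        windows.append((test_start, test_end))
        test_start += timedelta(days=window_days)
    return windows

def strict_split(games, test_start, test_end):
    """Train on every game dated before the window; test on games inside it (inclusive)."""
    train = [g for g in games if g["date"] < test_start]
    test = [g for g in games if test_start <= g["date"] <= test_end]
    return train, test

# Illustrative dates only (hypothetical season start and data cutoff).
windows = holdout_windows(date(2025, 10, 15), date(2026, 2, 8))
```

Because the training set is rebuilt from scratch for each window, no game inside a test week can leak into the model that predicts it.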

Metrics: Log Loss, Brier, AUC, Accuracy

Weekly strict holdout log loss (lower is better) for 16 models across 14 windows.
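The four metrics named above can be computed from home-win outcomes `y` (1 or 0) and predicted home-win probabilities `p`. The sketch below shows one way to do it. Two choices in it are assumptions: AUC is computed by pairwise comparison with ties counted as half, and a probability of exactly 0.5 counts as picking the home team for accuracy. The page does not specify either.

```python
import math

def log_loss(y, p, eps=1e-15):
    """Mean negative log-likelihood of binary outcomes y under probabilities p."""
    total = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1 - eps)  # clip to avoid log(0)
        total -= yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
    return total / len(y)

def brier(y, p):
    """Mean squared error between the predicted probability and the 0/1 outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)

def accuracy(y, p):
    """Share of games where the favored side won (p >= 0.5 counts as picking y = 1)."""
    return sum((pi >= 0.5) == (yi == 1) for yi, pi in zip(y, p)) / len(y)

def auc(y, p):
    """Probability that a random win is scored above a random loss (ties count half)."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    pairs = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return pairs / (len(pos) * len(neg))

y = [1, 0, 1, 0]
p = [0.9, 0.1, 0.6, 0.4]
scores = (log_loss(y, p), brier(y, p), accuracy(y, p), auc(y, p))
```

The pairwise AUC is quadratic in the number of games. That is fine for a 290-game holdout, but a rank-based formula scales better.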

Recent Window Winners

Holdout          Best model             Log loss  Runner-up (log loss)         Models
Feb 4 - Feb 8    Recency Ensemble       0.495     Margin Recency (0.496)       16
Jan 28 - Feb 3   Margin Recency         0.477     Margin (0.478)               16
Jan 21 - Jan 27  Margin Recency         0.476     Recency Ensemble (0.477)     16
Jan 14 - Jan 20  Recency Ensemble       0.487     Core Ensemble (0.487)        16
Jan 7 - Jan 13   Core Ensemble          0.462     Recency Ensemble (0.463)     16
Dec 31 - Jan 6   Dynamic Bradley-Terry  0.577     Margin Recency (0.581)       16
Dec 24 - Dec 30  Log Adjusted           0.055     Adjusted Efficiency (0.055)  16
Dec 17 - Dec 23  Recency Ensemble       0.514     Core Ensemble (0.515)        16

Model Performance Leaderboard

Models are ranked by strict 7-day holdout AUC when it is available. Otherwise they fall back to full-season AUC.

AUC, Acc, Brier, LogLoss and n are full-season figures. The "7d" columns are strict 7-day holdout figures. "STRICT 290g" means the model was scored on 290 holdout games. "FULL, no 7d" means no 7-day holdout is available. A dash (with n = 0) marks a metric that was not computed.

#   Model                   7d Split     AUC    Acc    Brier  LogLoss  n     AUC 7d  Acc 7d  Brier 7d  n 7d
1   Margin Recency          STRICT 290g  -      -      -      -        0     0.856   76.6%   0.159     290
2   Adjusted Context Blend  STRICT 290g  -      -      -      -        0     0.853   75.5%   0.159     290
3   Margin                  STRICT 290g  0.847  76.6%  0.162  0.492    2989  0.851   76.9%   0.161     290
4   Core Ensemble           STRICT 290g  -      -      -      -        0     0.851   76.9%   0.158     290
5   Recency Ensemble        STRICT 290g  -      -      -      -        0     0.851   76.9%   0.158     290
6   Points Off/Def Recency  STRICT 290g  -      -      -      -        0     0.848   76.2%   0.163     290
7   Points Off/Def          STRICT 290g  0.797  72.3%  0.185  0.550    2989  0.840   76.9%   0.165     290
8   Adjusted Efficiency     STRICT 290g  0.797  72.5%  0.196  0.626    2989  0.836   75.5%   0.174     290
9   Dynamic Bradley-Terry   STRICT 290g  -      -      -      -        0     0.834   75.9%   0.167     290
10  Log Adjusted            STRICT 290g  0.797  72.5%  0.196  0.622    2989  0.834   75.5%   0.175     290
11  Elo                     STRICT 290g  0.847  76.0%  0.163  0.495    2989  0.832   75.5%   0.169     290
12  Bradley-Terry           STRICT 290g  0.859  76.6%  0.157  0.481    2989  0.828   77.6%   0.171     290
13  Bradley-Terry Recency   STRICT 290g  -      -      -      -        0     0.828   74.5%   0.174     290
14  Avg Margin Baseline     STRICT 290g  0.851  76.2%  0.161  0.488    2989  0.806   74.1%   0.180     290
15  Pythagorean             STRICT 290g  0.788  71.8%  0.190  0.561    2989  0.730   70.7%   0.223     290
16  Home Team Baseline      STRICT 290g  0.571  57.1%  0.246  0.685    2989  0.561   56.2%   0.248     290
-   Efficiency              FULL, no 7d  -      -      -      -        0     -       -       -         0

Model descriptions

  • Margin Recency: Margin regression with exponential recency weights on newer games
  • Adjusted Context Blend: Experimental context-heavy win model blending strong team components with rest and venue context
  • Margin: Linear team-strength model fit on point differential instead of binary wins
  • Core Ensemble: Equal-logit blend of Elo, recency BT, recency margin, log-adjusted Pythagorean, and points off/def
  • Recency Ensemble: Equal-logit blend of Elo, recency BT, recency margin, log-adjusted Pythagorean, and recency points off/def
  • Points Off/Def Recency: Off/def points regression with exponential recency weights
  • Points Off/Def: Raw points regression with separate offensive and defensive team parameters
  • Adjusted Efficiency: Opponent-adjusted efficiency model with separate offensive and defensive components
  • Dynamic Bradley-Terry: Time-evolving paired-comparison model with latent team-strength drift
  • Log Adjusted: Log-scale adjusted efficiency model that downweights blowout leverage
  • Elo: Streaming paired-comparison rating with recency built into sequential updates
  • Bradley-Terry: Static logistic paired-comparison model with one strength parameter per team
  • Bradley-Terry Recency: Static Bradley-Terry with exponential recency weights on newer games
  • Avg Margin Baseline: Predicts from the simple average scoring margin in the training window
  • Pythagorean: Pythagorean win expectation from raw points scored and allowed
  • Home Team Baseline: Always favors the home team with a fixed prior
  • Efficiency: Tempo-adjusted efficiency version of Pythagorean ratings
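Several models in the leaderboard use exponential recency weights, and both ensembles are described as an equal-logit blend. The sketch below shows both ideas. The half-life parameter and the component probabilities are hypothetical, because the page gives neither. Averaging in logit space rather than in probability space is what "equal-logit" suggests, but the exact implementation is an assumption.

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def equal_logit_blend(probs):
    """Average the component win probabilities on the logit scale, then map back."""
    mean_logit = sum(logit(p) for p in probs) / len(probs)
    return 1 / (1 + math.exp(-mean_logit))

def recency_weights(days_ago, half_life_days=30.0):
    """Exponential recency weights: a game half_life_days old counts half as much.

    half_life_days is a hypothetical setting, not a value stated on this page.
    """
    return [0.5 ** (d / half_life_days) for d in days_ago]

# Five component probabilities for one game (hypothetical values).
blend = equal_logit_blend([0.72, 0.68, 0.75, 0.70, 0.66])
```

In a recency-weighted regression, each weight would scale that game's contribution to the loss, for example as a per-row sample weight.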

Methodology

Elo / Bradley-Terry

  • Elo: Iterative updates, K=64, HCA=100 (home-court advantage, in rating points)
  • BT: Static logistic regression on all games
  • Both model win probability, not margin
  • Elo updates after each game; BT is fit once
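A minimal sketch of both models follows. Only K=64 and HCA=100 come from this page. Several pieces are assumptions: the conventional 400-point Elo logistic scale, a 1500 starting rating, and a Bradley-Terry fit by gradient ascent with a shared home-advantage term and a small L2 penalty. The function names are hypothetical.

```python
import math

K = 64          # from this page
HCA = 100       # from this page (rating points)
SCALE = 400     # assumed conventional Elo scale
START = 1500.0  # assumed starting rating

def elo_expected(r_home, r_away, neutral=False):
    """Home team's win probability under the Elo logistic curve."""
    diff = r_home - r_away + (0 if neutral else HCA)
    return 1 / (1 + 10 ** (-diff / SCALE))

def elo_update(ratings, home, away, home_won, neutral=False):
    """Apply one sequential Elo update in place and return the pre-game probability."""
    r_h = ratings.get(home, START)
    r_a = ratings.get(away, START)
    p = elo_expected(r_h, r_a, neutral)
    delta = K * ((1.0 if home_won else 0.0) - p)
    ratings[home] = r_h + delta
    ratings[away] = r_a - delta
    return p

def fit_bradley_terry(games, lr=0.5, epochs=2000, l2=1e-3):
    """Static Bradley-Terry fit by gradient ascent on the mean log-likelihood.

    games: list of (home, away, home_won). The shared home term h and the L2
    penalty (which keeps ratings finite when results are perfectly separable)
    are assumptions.
    """
    teams = sorted({t for g in games for t in g[:2]})
    s = {t: 0.0 for t in teams}
    h = 0.0
    n = len(games)
    for _ in range(epochs):
        grad = {t: -l2 * s[t] for t in teams}
        grad_h = 0.0
        for home, away, home_won in games:
            p = 1 / (1 + math.exp(-(s[home] - s[away] + h)))
            resid = (1.0 if home_won else 0.0) - p
            grad[home] += resid / n
            grad[away] -= resid / n
            grad_h += resid / n
        for t in teams:
            s[t] += lr * grad[t]
        h += lr * grad_h
    return s, h
```

The sketch reflects the contrast in the list above. Elo changes ratings one game at a time, so later games implicitly matter more. Bradley-Terry refits every game jointly with equal weight.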

Pythagorean Models

  • Raw: Classic points scored/allowed formula
  • Efficiency: Pace-adjusted (pts per possession)
  • Adjusted: Opponent-adjusted efficiency
  • Log: Log-linear multiplicative scale
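The raw Pythagorean formula, and one way to see why a log scale is natural for it, can be sketched as follows. The exponent is an assumed value, since the page does not state one. The page also does not define the Log model precisely: the log-form function below only demonstrates the algebraic identity PF^x / (PF^x + PA^x) = 1 / (1 + exp(-x · ln(PF / PA))). The `log5` matchup formula is a common way to turn two win expectations into a head-to-head probability. Whether these models use it is not stated.

```python
import math

EXPONENT = 11.5  # assumed value; not stated on this page

def pythag(points_for, points_against, exponent=EXPONENT):
    """Classic Pythagorean win expectation: PF^x / (PF^x + PA^x)."""
    pf, pa = points_for ** exponent, points_against ** exponent
    return pf / (pf + pa)

def pythag_log_form(points_for, points_against, exponent=EXPONENT):
    """The same quantity written as a logistic in log(PF / PA)."""
    return 1 / (1 + math.exp(-exponent * math.log(points_for / points_against)))

def log5(p_a, p_b):
    """Probability that A beats B given each side's win expectation."""
    return (p_a - p_a * p_b) / (p_a + p_b - 2 * p_a * p_b)
```

The log form shows that only the ratio of points scored to points allowed matters. That is the multiplicative scale referred to in the list above.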

Margin Regression

  • Team-level ridge regression on point margin
  • Linear Bradley-Terry (margin target)
  • Alpha=0.05 (CV-tuned)
  • Learns home advantage from data (~6 pts)
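A team-level ridge regression on margin can be sketched with NumPy as follows. Each game row puts +1 on the home team, -1 on the away team, and 1 in a home-advantage column unless the site is neutral. The alpha of 0.05 comes from this page. The helper name, the game tuple layout, and the choice to penalize the home-advantage coefficient too are assumptions.

```python
import numpy as np

def fit_margin_ridge(games, alpha=0.05):
    """Ridge regression of home-minus-away point margin on team indicators.

    games: list of (home, away, margin, neutral).
    Solves min ||Xw - y||^2 + alpha * ||w||^2 in closed form.
    """
    teams = sorted({t for g in games for t in g[:2]})
    idx = {t: i for i, t in enumerate(teams)}
    X = np.zeros((len(games), len(teams) + 1))
    y = np.zeros(len(games))
    for row, (home, away, margin, neutral) in enumerate(games):
        X[row, idx[home]] = 1.0
        X[row, idx[away]] = -1.0
        X[row, -1] = 0.0 if neutral else 1.0  # home-advantage column
        y[row] = margin
    # Penalizing every coefficient, including home advantage, is a simplification.
    w = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)
    ratings = {t: float(w[idx[t]]) for t in teams}
    return ratings, float(w[-1])

# Synthetic check: true strengths 5, 0, -5 and a 6-point home edge (hypothetical data).
strengths = {"A": 5.0, "B": 0.0, "C": -5.0}
games = [(h, a, strengths[h] - strengths[a] + 6.0, False)
         for h in strengths for a in strengths if h != a]
ratings, home_adv = fit_margin_ridge(games)
```

This objective matches scikit-learn's `Ridge` with `fit_intercept=False`. A predicted margin is `ratings[home] - ratings[away] + home_adv`.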

Baselines

  • Home Team: Always pick the home team, with a fixed 60% home-win probability
  • Avg Margin: Favor the team with the higher average scoring margin
  • Models should beat these baselines to add value
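The Home Team baseline always picks the home team, so its 57.1% full-season accuracy equals the home win rate, assuming neutral-site games are negligible. A constant 60% forecast then fixes its Brier score and log loss. The sketch below checks that arithmetic and also shows the Avg Margin pick rule. The helper names are hypothetical, and favoring the home team on a tie is an assumption.

```python
import math

def constant_prob_scores(home_win_rate, p_home=0.6):
    """Expected Brier score and log loss when every game gets the same home-win probability."""
    brier = home_win_rate * (1 - p_home) ** 2 + (1 - home_win_rate) * p_home ** 2
    log_loss = -(home_win_rate * math.log(p_home)
                 + (1 - home_win_rate) * math.log(1 - p_home))
    return brier, log_loss

def avg_margin_pick(avg_margin, home, away):
    """Avg Margin baseline: favor the higher average margin (home wins ties, an assumption)."""
    return home if avg_margin[home] >= avg_margin[away] else away

brier, log_loss = constant_prob_scores(0.571)
print(round(brier, 3), round(log_loss, 3))  # 0.246 0.685
```

These rounded values match the Home Team Baseline row in the leaderboard (Brier 0.246, LogLoss 0.685).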