
2025-2026 NCAAW Model Performance Analysis

Scope

This analysis covers all scored games in the selected league and season. The AP Poll is excluded here.

It compares the prediction accuracy of multiple rating models across 2,698 games.

Model Catalog

7-day holdout coverage: 16 of 17 models (Efficiency has no 7-day holdout results).

Rolling Holdout Curves

Each point is a strict weekly holdout: train on all games before that week, test on that week. This first version uses a 21-day warmup, then 7-day holdouts stepped forward weekly.
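The windowing described above can be sketched as follows. This is a minimal illustration, not the page's actual implementation: the function names, the dict-based game records, and the choice to clip the final window at the season end are all assumptions.

```python
from datetime import date, timedelta


def weekly_holdouts(season_start, season_end, warmup_days=21, window_days=7):
    """Yield (test_start, test_end) pairs; test_end is exclusive.

    The first window opens after the warmup period, and each later window
    starts window_days after the previous one. The last window is clipped
    at the season end (an assumption).
    """
    last_day_exclusive = season_end + timedelta(days=1)
    test_start = season_start + timedelta(days=warmup_days)
    while test_start <= season_end:
        test_end = min(test_start + timedelta(days=window_days), last_day_exclusive)
        yield test_start, test_end
        test_start += timedelta(days=window_days)


def split_games(games, test_start, test_end):
    """Strict holdout: train on games before the window, test on games inside it."""
    train = [g for g in games if g["date"] < test_start]
    test = [g for g in games if test_start <= g["date"] < test_end]
    return train, test


# Hypothetical season dates, for illustration only.
for start, end in weekly_holdouts(date(2025, 11, 3), date(2025, 12, 7)):
    print(start, "to", end - timedelta(days=1))
```

Because every training set stops before the test window opens, no game is ever scored by a model that has already seen its result.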

Chart metrics: Log Loss, Brier, AUC, Accuracy.

Weekly strict holdout log loss (lower is better), showing 16 models across 22 windows.
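For reference, the four metrics named here can be computed from predicted home-win probabilities and 0/1 outcomes roughly as below. This is a sketch using the standard definitions; the function names and the 0.5 accuracy threshold are assumptions, and the page's own scoring code may differ in details such as probability clipping.

```python
import math


def log_loss(probs, outcomes, eps=1e-15):
    """Mean negative log-likelihood of the observed outcomes (lower is better)."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(probs)


def brier(probs, outcomes):
    """Mean squared error between probability and outcome (lower is better)."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def accuracy(probs, outcomes):
    """Share of games where the side given at least 50% actually won."""
    hits = sum((p >= 0.5) == (y == 1) for p, y in zip(probs, outcomes))
    return hits / len(probs)


def auc(probs, outcomes):
    """Probability that a random win is ranked above a random loss (ties count half)."""
    pos = [p for p, y in zip(probs, outcomes) if y == 1]
    neg = [p for p, y in zip(probs, outcomes) if y == 0]
    if not pos or not neg:
        return float("nan")
    score = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return score / (len(pos) * len(neg))
```

A constant 50% forecast scores a log loss of ln 2, about 0.693, which is a useful reference point when reading the holdout log-loss values on this page.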

Recent Window Winners

Holdout          Best                    Log Loss  Runner-up                 Models
Apr 1 - Apr 5    Elo                     0.673     Pythagorean (0.687)       16
Mar 25 - Mar 31  Adjusted Context Blend  0.474     Points Off/Def (0.500)    16
Mar 18 - Mar 24  Adjusted Efficiency     0.431     Log Adjusted (0.431)      16
Mar 11 - Mar 17  Margin Recency          0.603     Margin (0.611)            16
Mar 4 - Mar 10   Recency Ensemble        0.500     Core Ensemble (0.501)     16
Feb 25 - Mar 3   Recency Ensemble        0.484     Core Ensemble (0.484)     16
Feb 18 - Feb 24  Recency Ensemble        0.515     Core Ensemble (0.516)     16
Feb 11 - Feb 17  Core Ensemble           0.485     Recency Ensemble (0.485)  16

Model Performance Leaderboard

Models are ranked by strict holdout AUC when it is available, and by full-season AUC otherwise.

#   Model                   7d Split    AUC    Acc    Brier  LogLoss  n     AUC 7d  Acc 7d  Brier 7d  n 7d
1   Efficiency              FULL no 7d  0.840  75.4%  0.170  0.528    5048  -       -       -         0
2   Bradley-Terry           STRICT 14g  0.876  78.8%  0.149  0.462    5048  0.800   57.1%   0.212     14
3   Margin Recency          STRICT 14g  -      -      -      -        0     0.800   71.4%   0.180     14
4   Points Off/Def Recency  STRICT 14g  -      -      -      -        0     0.756   78.6%   0.180     14
5   Core Ensemble           STRICT 14g  -      -      -      -        0     0.756   71.4%   0.194     14
6   Recency Ensemble        STRICT 14g  -      -      -      -        0     0.756   71.4%   0.191     14
7   Dynamic Bradley-Terry   STRICT 14g  -      -      -      -        0     0.733   78.6%   0.204     14
8   Margin                  STRICT 14g  0.878  79.4%  0.148  0.455    5048  0.711   71.4%   0.202     14
9   Adjusted Efficiency     STRICT 14g  0.871  78.0%  0.151  0.466    5048  0.711   71.4%   0.208     14
10  Log Adjusted            STRICT 14g  0.870  78.0%  0.151  0.468    5048  0.711   71.4%   0.204     14
11  Points Off/Def          STRICT 14g  0.870  78.1%  0.153  0.469    5048  0.711   78.6%   0.197     14
12  Elo                     STRICT 14g  0.841  75.4%  0.168  0.511    5048  0.689   64.3%   0.215     14
13  Bradley-Terry Recency   STRICT 14g  -      -      -      -        0     0.689   57.1%   0.224     14
14  Adjusted Context Blend  STRICT 14g  -      -      -      -        0     0.667   64.3%   0.216     14
15  Avg Margin Baseline     STRICT 14g  0.843  75.9%  0.166  0.502    5048  0.644   50.0%   0.226     14
16  Home Team Baseline      STRICT 14g  0.595  59.6%  0.241  0.675    5048  0.633   64.3%   0.231     14
17  Pythagorean             STRICT 14g  0.794  73.6%  0.189  0.563    5048  0.600   50.0%   0.226     14

Model Descriptions

  • Efficiency: Tempo-adjusted efficiency version of Pythagorean ratings.
  • Bradley-Terry: Static logistic paired-comparison model with one team strength parameter.
  • Margin Recency: Margin regression with exponential recency weights on newer games.
  • Points Off/Def Recency: Off/def points regression with exponential recency weights.
  • Core Ensemble: Equal-logit blend of Elo, recency BT, recency margin, log-adjusted pyth, and points off/def.
  • Recency Ensemble: Equal-logit blend of Elo, recency BT, recency margin, log-adjusted pyth, and recency points off/def.
  • Dynamic Bradley-Terry: Time-evolving paired-comparison model with latent team strength drift.
  • Margin: Linear team-strength model fit on point differential instead of binary wins.
  • Adjusted Efficiency: Opponent-adjusted efficiency model with separate offensive and defensive components.
  • Log Adjusted: Log-scale adjusted efficiency model that downweights blowout leverage.
  • Points Off/Def: Raw points regression with separate offensive and defensive team parameters.
  • Elo: Streaming paired-comparison rating with recency baked into sequential updates.
  • Bradley-Terry Recency: Static Bradley-Terry with exponential recency weights on newer games.
  • Adjusted Context Blend: Experimental context-heavy win model blending strong team components with rest and venue context.
  • Avg Margin Baseline: Predict from simple average scoring margin in the training window.
  • Home Team Baseline: Always favor the home team with a fixed prior.
  • Pythagorean: Pythagorean win expectation from raw points scored and allowed.
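The Core Ensemble and Recency Ensemble entries are described as equal-logit blends. One plausible reading, sketched below, is to convert each component's win probability to log-odds, average the log-odds with equal weights, and map the result back to a probability. The function names and the clipping constant are assumptions; the page does not show its blending code.

```python
import math


def logit(p, eps=1e-9):
    """Log-odds of a probability, clipped away from 0 and 1."""
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))


def sigmoid(z):
    """Inverse of logit."""
    return 1 / (1 + math.exp(-z))


def equal_logit_blend(component_probs):
    """Average the component forecasts in log-odds space with equal weights."""
    mean_logit = sum(logit(p) for p in component_probs) / len(component_probs)
    return sigmoid(mean_logit)


# Hypothetical component forecasts for one game.
print(equal_logit_blend([0.70, 0.65, 0.80, 0.75, 0.72]))
```

Averaging in log-odds space rather than probability space lets a confident component pull the blend further than a plain mean of probabilities would.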

Methodology

Elo / Bradley-Terry

  • Elo: Iterative updates with K=64 and a home-court advantage (HCA) of 100 rating points
  • BT: Static logistic regression on all games
  • Both model win probability, not margin
  • Elo updates after each game; BT fits once
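A single Elo step with the stated K=64 and HCA=100 might look like the sketch below. The 400-point logistic scale and the 1500 starting rating are conventional Elo defaults assumed here, not values given on this page, and the function names are illustrative.

```python
from collections import defaultdict

K = 64           # update size, from the methodology notes
HCA = 100        # home-court advantage in rating points, from the methodology notes
SCALE = 400      # assumption: conventional Elo logistic scale
INITIAL = 1500   # assumption: conventional starting rating


def expected_home_win(r_home, r_away):
    """Home win probability implied by the rating gap plus home-court advantage."""
    return 1 / (1 + 10 ** (-(r_home + HCA - r_away) / SCALE))


def update(ratings, home, away, home_won):
    """Score the game with pre-game ratings, then move both ratings by the same amount."""
    p_home = expected_home_win(ratings[home], ratings[away])
    delta = K * ((1.0 if home_won else 0.0) - p_home)
    ratings[home] += delta
    ratings[away] -= delta
    return p_home  # the pre-game forecast, which is what gets evaluated


ratings = defaultdict(lambda: float(INITIAL))
forecast = update(ratings, "Team A", "Team B", home_won=True)
print(round(forecast, 3), round(ratings["Team A"], 1), round(ratings["Team B"], 1))
```

Because each forecast is made before the rating update, a streaming model like this is evaluated out of sample by construction, whereas a static Bradley-Terry fit needs an explicit holdout split.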

Pythagorean Models

  • Raw: Classic points scored/allowed formula
  • Efficiency: Pace-adjusted (pts per possession)
  • Adjusted: Opponent-adjusted efficiency
  • Log: Log-linear multiplicative scale
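As a rough illustration of the raw variant, the classic Pythagorean expectation and one common way to turn two teams' expectations into a matchup probability are sketched below. The exponent value is a placeholder (the page does not state one), and the log5 matchup rule is an assumption about how team-level expectations could be combined, not a description of the page's method.

```python
def pythagorean(points_for, points_against, exponent=10.0):
    """Expected win share from points scored and allowed.

    exponent=10.0 is a placeholder assumption; the source gives no value.
    """
    pf = points_for ** exponent
    pa = points_against ** exponent
    return pf / (pf + pa)


def log5(p_a, p_b):
    """Probability that team A beats team B given each team's win expectation.

    Assumption: log5 is used here only as a familiar way to combine two
    expectations; it ignores home court and schedule strength.
    """
    return p_a * (1 - p_b) / (p_a * (1 - p_b) + p_b * (1 - p_a))


# Hypothetical season totals, for illustration only.
team_a = pythagorean(2300, 2000)
team_b = pythagorean(2100, 2150)
print(round(log5(team_a, team_b), 3))
```

The efficiency variants listed above would feed per-possession or opponent-adjusted figures into the same kind of formula in place of raw point totals.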

Margin Regression

  • Team-level ridge regression on point margin
  • Linear Bradley-Terry (margin target)
  • Ridge penalty alpha=0.05 (tuned by cross-validation)
  • Learns home advantage from data (~6 pts)
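The margin regression can be sketched as a small ridge fit with one strength per team and a shared home-advantage term. The design below (home team +1, away team -1, plus a constant column) and the choice to leave the home-advantage term unpenalized are assumptions, as is treating alpha as the L2 penalty weight. The solver is plain Gaussian elimination to keep the example dependency-free.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_margin_ridge(games, alpha=0.05):
    """Fit team strengths and home advantage from (home, away, home_margin) tuples."""
    teams = sorted({t for home, away, _ in games for t in (home, away)})
    idx = {t: i for i, t in enumerate(teams)}
    k = len(teams) + 1  # last column is the shared home-advantage term
    xtx = [[0.0] * k for _ in range(k)]
    xty = [0.0] * k
    for home, away, margin in games:
        row = {idx[home]: 1.0, idx[away]: -1.0, k - 1: 1.0}
        for i, vi in row.items():
            xty[i] += vi * margin
            for j, vj in row.items():
                xtx[i][j] += vi * vj
    for i in range(k - 1):  # penalize team strengths only (assumption)
        xtx[i][i] += alpha
    w = solve(xtx, xty)
    return dict(zip(teams, w[:-1])), w[-1]


# Hypothetical results: (home team, away team, home margin).
games = [("A", "B", 7), ("B", "A", -1), ("A", "C", 11),
         ("C", "A", -5), ("B", "C", 7), ("C", "B", -1)]
strengths, home_adv = fit_margin_ridge(games)
print(strengths, home_adv)
```

The predicted margin for a game is then the home strength minus the away strength plus the home-advantage term; the ridge penalty also removes the ambiguity of adding a constant to every team's strength.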

Baselines

  • Home Team: Always predict home wins (60%)
  • Avg Margin: Higher average margin wins
  • Models should beat these to add value
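Both baselines are simple enough to sketch in a few lines. The version below assumes the 60% figure is used as a fixed home-win probability and that ties in average margin go to the home team; neither detail is confirmed by the page, and the function names are illustrative.

```python
from collections import defaultdict

HOME_PRIOR = 0.60  # assumption: the stated 60% used as a constant home-win probability


def home_team_baseline(home, away):
    """Ignore the teams entirely and return the fixed home-win probability."""
    return HOME_PRIOR


def average_margins(train_games):
    """Average scoring margin per team from (home, away, home_margin) tuples."""
    total = defaultdict(float)
    count = defaultdict(int)
    for home, away, margin in train_games:
        total[home] += margin
        count[home] += 1
        total[away] -= margin
        count[away] += 1
    return {team: total[team] / count[team] for team in total}


def avg_margin_pick(avg, home, away):
    """Pick the team with the higher average margin; ties go to home (assumption)."""
    return home if avg.get(home, 0.0) >= avg.get(away, 0.0) else away


# Hypothetical training games, for illustration only.
avg = average_margins([("A", "B", 10), ("B", "C", 4), ("C", "A", -6)])
print(avg_margin_pick(avg, "B", "A"))
```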