
2025-2026 NCAAW Model Performance Analysis

Scope

This analysis includes only games in which at least one team held an AP Poll ranking on game day. All models are re-evaluated on that same subset, and the AP Poll itself is added to the comparison set.
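A minimal sketch of this scope filter follows. It assumes each game record carries both teams' AP ranks on game day; the `Game` fields and the `UNRANKED` placeholder are hypothetical, not the page's actual schema. The `ap_home_score` helper shows one plausible way to turn ranks into a comparable score for the AP Poll baseline (treat an unranked team as rank 26 and score the rank gap). The page does not say how its AP Poll baseline is scored.

```python
from dataclasses import dataclass
from typing import List, Optional

UNRANKED = 26  # hypothetical stand-in rank for teams outside the AP Top 25


@dataclass
class Game:
    home: str
    away: str
    home_ap_rank: Optional[int]  # AP rank on game day, None if unranked
    away_ap_rank: Optional[int]


def ranked_subset(games: List[Game]) -> List[Game]:
    """Keep games where at least one team held an AP rank on game day."""
    return [g for g in games
            if g.home_ap_rank is not None or g.away_ap_rank is not None]


def ap_home_score(g: Game) -> int:
    """Positive when the home team is ranked better (a lower rank number)."""
    home = g.home_ap_rank if g.home_ap_rank is not None else UNRANKED
    away = g.away_ap_rank if g.away_ap_rank is not None else UNRANKED
    return away - home
```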

Season

2025-2026

This page compares the prediction accuracy of 18 models (including the AP Poll and two simple baselines) across 808 games.

Model Catalog

7-day holdout coverage: 17 of 18 models (Efficiency has full-season results only).

Rolling Holdout Curves

Each point is a strict weekly holdout: train on all games before that week, then test on that week's games. This first version uses a 21-day warmup before the first holdout, then steps 7-day holdout windows forward one week at a time.
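The window schedule can be sketched as below. This is an illustration under assumptions: the season start and end dates are inputs you supply, a final partial window is allowed (which would explain a short window such as Apr 1 - Apr 5), and `split_by_date` is a hypothetical helper that enforces the strict rule of training only on games dated before the test window.

```python
from datetime import date, timedelta
from typing import Iterator, List, Tuple, TypeVar

T = TypeVar("T")


def rolling_holdouts(season_start: date, season_end: date,
                     warmup_days: int = 21,
                     step_days: int = 7) -> Iterator[Tuple[date, date]]:
    """Yield inclusive (test_start, test_end) windows after the warmup."""
    test_start = season_start + timedelta(days=warmup_days)
    while test_start <= season_end:
        # The last window is truncated at the season end.
        test_end = min(test_start + timedelta(days=step_days - 1), season_end)
        yield test_start, test_end
        test_start += timedelta(days=step_days)


def split_by_date(dated: List[Tuple[date, T]], test_start: date,
                  test_end: date) -> Tuple[List[T], List[T]]:
    """Strict split: train on games before the window, test on games inside it."""
    train = [g for d, g in dated if d < test_start]
    test = [g for d, g in dated if test_start <= d <= test_end]
    return train, test
```

With this split, every game in a test window is predicted by a model that has seen only earlier games, which is what makes the holdout strict.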

Chart metrics: Log Loss, Brier, AUC, Accuracy.

Weekly strict-holdout log loss (lower is better) for 16 models across 22 windows.
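For reference, the four chart metrics can be computed from binary outcomes `y` (1 if the reference team won) and predicted win probabilities `p`. This is a plain-Python sketch; the clipping constant `eps` and the rule that a probability of exactly 0.5 counts as picking the other side are assumptions, not documented behavior of the page. AUC uses its pairwise definition, which is quadratic but fine for a few hundred games.

```python
import math
from typing import Sequence


def log_loss(y: Sequence[int], p: Sequence[float], eps: float = 1e-15) -> float:
    """Mean negative log-likelihood of binary outcomes (lower is better)."""
    total = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1.0 - eps)  # clip so log() never sees 0
        total -= yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
    return total / len(y)


def brier(y: Sequence[int], p: Sequence[float]) -> float:
    """Mean squared error between probability and outcome (lower is better)."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def accuracy(y: Sequence[int], p: Sequence[float]) -> float:
    """Share of games where the favored side won; p == 0.5 counts as picking 0."""
    return sum((pi > 0.5) == (yi == 1) for yi, pi in zip(y, p)) / len(y)


def auc(y: Sequence[int], s: Sequence[float]) -> float:
    """Chance a random win outscores a random loss, ties counted as half."""
    pos = [si for yi, si in zip(y, s) if yi == 1]
    neg = [si for yi, si in zip(y, s) if yi == 0]
    hits = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return hits / (len(pos) * len(neg))
```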

Recent Window Winners

Holdout window	Best model	Log loss	Runner-up (log loss)	Models compared
Apr 1 - Apr 5	Adjusted Context Blend	0.613	Elo (0.615)	16
Mar 25 - Mar 31	Adjusted Context Blend	0.345	Recency Ensemble (0.384)	16
Mar 18 - Mar 24	Adjusted Efficiency	0.277	Log Adjusted (0.278)	16
Mar 11 - Mar 17	Adjusted Context Blend	0.175	Log Adjusted (0.179)	16
Mar 4 - Mar 10	Margin Recency	0.456	Recency Ensemble (0.457)	16
Feb 25 - Mar 3	Recency Ensemble	0.390	Core Ensemble (0.391)	16
Feb 18 - Feb 24	Points Off/Def	0.410	Margin (0.411)	16
Feb 11 - Feb 17	Points Off/Def	0.346	Points Off/Def Recency (0.349)	16

Model Performance Leaderboard

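Models are ranked by strict 7-day holdout AUC when available, falling back to full-season AUC otherwise. In the 7d Split column, STRICT 5g marks a strict 7-day holdout containing 5 games, and FULL no 7d marks a model with full-season results only. With only 5 games per holdout, one game shifts 7-day accuracy by 20 percentage points, so the 7-day columns are coarse.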

#	Model	Description	7d Split	AUC	Acc	Brier	LogLoss	n	AUC 7d	Acc 7d	Brier 7d	n 7d
1	Elo	Streaming paired-comparison rating with recency baked into sequential updates.	STRICT 5g	0.920	84.8%	0.128	0.413	808	1.000	80.0%	0.177	5
2	Bradley-Terry	Static logistic paired-comparison model with one team strength parameter.	STRICT 5g	0.935	85.0%	0.112	0.368	808	1.000	60.0%	0.186	5
3	Bradley-Terry Recency	Static Bradley-Terry with exponential recency weights on newer games.	STRICT 5g	-	-	-	-	0	1.000	100.0%	0.208	5
4	Dynamic Bradley-Terry	Time-evolving paired-comparison model with latent team strength drift.	STRICT 5g	-	-	-	-	0	1.000	80.0%	0.172	5
5	Efficiency	Tempo-adjusted efficiency version of Pythagorean ratings.	FULL no 7d	0.926	84.3%	0.109	0.354	808	-	-	-	0
6	Adjusted Context Blend	Experimental context-heavy win model blending strong team components with rest and venue context.	STRICT 5g	-	-	-	-	0	0.833	80.0%	0.146	5
7	AP Poll	Human ranking baseline for games involving a ranked team.	STRICT 5g	0.880	84.8%	-	-	808	0.750	80.0%	-	5
8	Home Team Baseline	Always favor the home team with a fixed prior.	STRICT 5g	0.661	66.3%	0.227	0.647	808	0.750	80.0%	0.200	5
9	Margin	Linear team-strength model fit on point differential instead of binary wins.	STRICT 5g	0.944	85.9%	0.098	0.317	808	0.667	80.0%	0.174	5
10	Margin Recency	Margin regression with exponential recency weights on newer games.	STRICT 5g	-	-	-	-	0	0.667	60.0%	0.165	5
11	Pythagorean	Pythagorean win expectation from raw points scored and allowed.	STRICT 5g	0.910	83.5%	0.171	0.514	808	0.667	40.0%	0.249	5
12	Adjusted Efficiency	Opponent-adjusted efficiency model with separate offensive and defensive components.	STRICT 5g	0.943	86.1%	0.100	0.314	808	0.667	80.0%	0.188	5
13	Log Adjusted	Log-scale adjusted efficiency model that downweights blowout leverage.	STRICT 5g	0.943	85.8%	0.100	0.316	808	0.667	80.0%	0.176	5
14	Points Off/Def	Raw points regression with separate offensive and defensive team parameters.	STRICT 5g	0.943	85.9%	0.100	0.324	808	0.667	80.0%	0.170	5
15	Points Off/Def Recency	Off/def points regression with exponential recency weights.	STRICT 5g	-	-	-	-	0	0.667	80.0%	0.170	5
16	Core Ensemble	Equal-logit blend of Elo, recency BT, recency margin, log-adjusted pyth, and points off/def.	STRICT 5g	-	-	-	-	0	0.667	80.0%	0.166	5
17	Recency Ensemble	Equal-logit blend of Elo, recency BT, recency margin, log-adjusted pyth, and recency points off/def.	STRICT 5g	-	-	-	-	0	0.667	80.0%	0.166	5
18	Avg Margin Baseline	Predict from simple average scoring margin in the training window.	STRICT 5g	0.918	84.3%	0.121	0.387	808	0.667	40.0%	0.245	5
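The ranking rule stated for the leaderboard can be sketched as follows, using a hypothetical `(name, auc_7d, auc_full)` row layout with `None` for missing values. Python's `sorted` is stable, so tied models keep their input order; the page's actual tie-break among equal AUCs is not documented.

```python
from typing import List, Optional, Tuple

# Hypothetical row layout: (model, AUC 7d, full-season AUC), None when missing.
Row = Tuple[str, Optional[float], Optional[float]]


def leaderboard_order(rows: List[Row]) -> List[str]:
    """Best first: 7-day holdout AUC if present, else full-season AUC."""
    def key(row: Row) -> float:
        _, auc_7d, auc_full = row
        score = auc_7d if auc_7d is not None else auc_full
        return -score if score is not None else float("inf")  # no AUC sinks last
    return [name for name, _, _ in sorted(rows, key=key)]


# Four rows copied from the leaderboard table, deliberately shuffled.
rows: List[Row] = [
    ("AP Poll", 0.750, 0.880),
    ("Efficiency", None, 0.926),
    ("Elo", 1.000, 0.920),
    ("Adjusted Context Blend", 0.833, None),
]
print(leaderboard_order(rows))
# ['Elo', 'Efficiency', 'Adjusted Context Blend', 'AP Poll']
```

This reproduces the relative order of ranks 1, 5, 6 and 7 in the table, including Efficiency slotting in on its full-season AUC.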

Methodology

Elo / Bradley-Terry

Elo: Iterative updates, K=64, home-court advantage (HCA)=100 rating points
BT: Static logistic regression on all games in the training window
Both model win probability, not margin
Elo updates after each game; BT fits once
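A compact sketch of both rating styles. The Elo part uses the listed K=64 and HCA=100, but the 400-point base-10 logistic scale, the 1500 starting rating, and adding HCA to the home rating before computing the expected score are conventional assumptions the page does not confirm. The Bradley-Terry part fits the logistic likelihood by plain gradient ascent with a small L2 penalty and no home term; the page does not state which solver, regularization, or home adjustment its model uses.

```python
import math
from typing import Dict, List, Tuple

Game = Tuple[str, str, bool]  # (home, away, home_won)


def elo_expected(r_home: float, r_away: float, hca: float = 100.0,
                 scale: float = 400.0) -> float:
    """Home win probability on the conventional base-10 logistic Elo curve."""
    return 1.0 / (1.0 + 10 ** (-(r_home + hca - r_away) / scale))


def elo_update(ratings: Dict[str, float], home: str, away: str,
               home_won: bool, k: float = 64.0, hca: float = 100.0,
               init: float = 1500.0) -> float:
    """Apply one sequential update; returns the pre-game home win probability."""
    r_h = ratings.get(home, init)
    r_a = ratings.get(away, init)
    p = elo_expected(r_h, r_a, hca)
    delta = k * ((1.0 if home_won else 0.0) - p)
    ratings[home] = r_h + delta  # zero-sum: the away team loses what home gains
    ratings[away] = r_a - delta
    return p


def fit_bradley_terry(games: List[Game], iters: int = 500, lr: float = 0.5,
                      l2: float = 0.01) -> Dict[str, float]:
    """Static fit of P(home wins) = sigmoid(theta_home - theta_away)."""
    teams = sorted({t for h, a, _ in games for t in (h, a)})
    theta = {t: 0.0 for t in teams}
    for _ in range(iters):
        # The L2 term keeps unbeaten or winless teams' strengths finite.
        grad = {t: -l2 * theta[t] for t in teams}
        for h, a, home_won in games:
            p = 1.0 / (1.0 + math.exp(-(theta[h] - theta[a])))
            resid = (1.0 if home_won else 0.0) - p
            grad[h] += resid
            grad[a] -= resid
        for t in teams:
            theta[t] += lr * grad[t] / len(games)
    return theta
```

The contrast the list draws shows up directly: `elo_update` is called once per game in date order, while `fit_bradley_terry` revisits every game on each pass over the training data.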

Pythagorean Models

Raw: Classic points scored/allowed formula
Efficiency: Pace-adjusted (pts per possession)
Adjusted: Opponent-adjusted efficiency
Log: Log-linear multiplicative scale
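The raw and pace-adjusted variants share one formula, applied either to points or to points per 100 possessions. In this sketch the exponent of 10 is an assumed basketball-style value, and combining two teams' expectations with Bill James's log5 formula is one common choice; the page documents neither. The opponent-adjusted and log-scale variants are not shown.

```python
def pythag(points_for: float, points_against: float,
           exponent: float = 10.0) -> float:
    """Expected win fraction from scoring; the exponent of 10 is an assumption."""
    pf = points_for ** exponent
    pa = points_against ** exponent
    return pf / (pf + pa)


def efficiency(points: float, possessions: float) -> float:
    """Pace adjustment: points per 100 possessions."""
    return 100.0 * points / possessions


def log5(p_a: float, p_b: float) -> float:
    """Bill James's log5: chance team A beats team B from their win expectations."""
    return (p_a - p_a * p_b) / (p_a + p_b - 2.0 * p_a * p_b)
```

For the Efficiency variant, `pythag` would be fed a team's offensive and defensive `efficiency` values instead of raw point totals, which removes the advantage fast-paced teams get from simply playing more possessions.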

Margin Regression

Team-level ridge regression on point margin
Linear Bradley-Terry (margin target)
Ridge penalty alpha=0.05 (tuned by cross-validation)
Learns home advantage from data (~6 pts)
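A self-contained sketch of the margin model: solve the ridge normal equations for one rating per team plus a shared home-advantage term. The alpha default mirrors the listed value, but leaving the home term unpenalized, the `(home, away, home_margin)` tuple layout, and the small pivoting solver are assumptions made to keep the example dependency-free. How the page converts a predicted margin into a win probability is not documented, so that step is omitted.

```python
from typing import Dict, List, Tuple


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting on a copy of [A | b]."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_margin_ridge(games: List[Tuple[str, str, float]],
                     alpha: float = 0.05) -> Tuple[Dict[str, float], float]:
    """Model home_margin = hca + rating[home] - rating[away] with a ridge penalty."""
    teams = sorted({t for h, a, _ in games for t in (h, a)})
    idx = {t: i for i, t in enumerate(teams)}
    k = len(teams) + 1  # last coefficient is the home-advantage term
    XtX = [[0.0] * k for _ in range(k)]
    Xty = [0.0] * k
    for h, a, margin in games:
        x = {idx[h]: 1.0, idx[a]: -1.0, k - 1: 1.0}  # one sparse design row
        for i, xi in x.items():
            Xty[i] += xi * margin
            for j, xj in x.items():
                XtX[i][j] += xi * xj
    for i in range(k - 1):  # shrink team ratings only, not the home term
        XtX[i][i] += alpha
    beta = solve(XtX, Xty)
    return {t: beta[idx[t]] for t in teams}, beta[-1]


# A wins at home by 10; B wins at home by 2.
ratings, hca = fit_margin_ridge([("A", "B", 10.0), ("B", "A", 2.0)])
# hca is 6.0 and ratings["A"] - ratings["B"] is 16 / 4.05 (about 3.95)
```

In the toy example the home term absorbs the shared 6-point edge, and the ridge penalty shrinks the 4-point rating gap the data alone would imply to 16 / (4 + alpha).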

Baselines

Home Team: Always predict a home win with a fixed 60% probability
Avg Margin: Higher average margin wins
Models should beat these to add value
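Both baselines fit in a few lines. The 60% home prior comes from the list above; mapping the average-margin gap to a probability with a logistic curve, and its 10-point `scale`, are assumptions, since the page does not say how the Avg Margin baseline produces probabilities.

```python
import math
from typing import Callable, Dict, List, Tuple

HOME_PRIOR = 0.6  # fixed home win probability from the Baselines list


def home_baseline(home: str, away: str) -> float:
    """Always favor the home team with the same probability."""
    return HOME_PRIOR


def avg_margin_baseline(train: List[Tuple[str, str, float]],
                        scale: float = 10.0) -> Callable[[str, str], float]:
    """Build a predictor from each team's average margin in the training window.

    `train` rows are (home, away, home_margin); `scale` is a hypothetical
    logistic spread in points.
    """
    totals: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for home, away, margin in train:
        for team, m in ((home, margin), (away, -margin)):
            totals[team] = totals.get(team, 0.0) + m
            counts[team] = counts.get(team, 0) + 1
    avg = {t: totals[t] / counts[t] for t in totals}

    def predict(home: str, away: str) -> float:
        gap = avg.get(home, 0.0) - avg.get(away, 0.0)  # unseen teams default to 0
        return 1.0 / (1.0 + math.exp(-gap / scale))

    return predict
```

A quick consistency check on the fixed prior: a constant 0.6 gives a Brier score of 0.16h + 0.36(1 - h) = 0.36 - 0.2h, where h is the share of games the home team won, which is exactly this baseline's accuracy. With h = 0.663 from the leaderboard, that is about 0.227, and the log loss -(h ln 0.6 + (1 - h) ln 0.4) is about 0.647, both matching the reported full-season values.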