Model Performance Analysis | NCAAWD2

2025-2026 NCAAWD2 Model Performance Analysis

Scope

All Games: All scored games in the selected league and season. AP Poll is excluded here.

Season

2025-2026

This analysis compares prediction accuracy across 1,722 games using multiple rating models.

Model Catalog

7-day holdout coverage: 16/17 models.

Rolling Holdout Curves

Each point is a strict weekly holdout: train on all games before that week, test on that week. This first version uses a 21-day warmup, then 7-day holdouts stepped forward weekly.
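The windowing described above can be sketched as follows. This is a minimal illustration, not the site's implementation: the function names, the game record layout, and the season start date are hypothetical, and only the 21-day warmup and 7-day step come from the text.

```python
from datetime import date, timedelta

def rolling_holdout_windows(season_start, last_game_day,
                            warmup_days=21, holdout_days=7):
    """Return (test_start, test_end) pairs, stepped forward weekly.

    The final window is truncated at last_game_day, so it may be
    shorter than holdout_days.
    """
    windows = []
    test_start = season_start + timedelta(days=warmup_days)
    while test_start <= last_game_day:
        test_end = min(test_start + timedelta(days=holdout_days - 1),
                       last_game_day)
        windows.append((test_start, test_end))
        test_start += timedelta(days=holdout_days)
    return windows

def strict_split(games, test_start, test_end):
    """Train on every game strictly before the window, test on the window."""
    train = [g for g in games if g["date"] < test_start]
    test = [g for g in games if test_start <= g["date"] <= test_end]
    return train, test

# Hypothetical season bounds, used only to exercise the function.
windows = rolling_holdout_windows(date(2025, 10, 15), date(2026, 2, 8))
print(len(windows), windows[0], windows[-1])
```

Because the training set for each window contains only games dated before that window, no test game can leak into the fit.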

Weekly strict holdout log loss. Lower is better. Showing 16 models across 14 windows.

Recent Window Winners

Holdout	Best	Log Loss	Runner-up	Models
Feb 4 - Feb 8	Recency Ensemble	0.495	Margin Recency (0.496)	16
Jan 28 - Feb 3	Margin Recency	0.477	Margin (0.478)	16
Jan 21 - Jan 27	Margin Recency	0.476	Recency Ensemble (0.477)	16
Jan 14 - Jan 20	Recency Ensemble	0.487	Core Ensemble (0.487)	16
Jan 7 - Jan 13	Core Ensemble	0.462	Recency Ensemble (0.463)	16
Dec 31 - Jan 6	Dynamic Bradley-Terry	0.574	Margin Recency (0.581)	16
Dec 24 - Dec 30	Log Adjusted	0.055	Adjusted Efficiency (0.055)	16
Dec 17 - Dec 23	Recency Ensemble	0.514	Core Ensemble (0.515)	16

Model Performance Leaderboard

Models ranked by strict holdout AUC when available (fallback: full-season AUC).
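For readers unfamiliar with the ranking metric, AUC can be computed directly from its definition: the chance that a randomly chosen home win received a higher predicted probability than a randomly chosen home loss. The sketch below is a simple quadratic-time version with a hypothetical function name and made-up inputs; it is not taken from the site's code.

```python
def auc(labels, scores):
    """Pairwise AUC: fraction of (win, loss) pairs ranked correctly.

    labels: 1 if the home team won, else 0.
    scores: predicted home win probabilities.
    Ties between a win and a loss count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one game of each outcome")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))

# Made-up example: 5 of the 6 (win, loss) pairs are ordered correctly.
example_auc = auc([1, 1, 0, 1, 0], [0.9, 0.6, 0.7, 0.8, 0.2])
print(round(example_auc, 3))  # 0.833
```

AUC measures ordering only, so a model can rank well here while still being poorly calibrated, which is why Brier and log loss are reported alongside it.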

#	Model	Description	7d Split	AUC	Acc	Brier	LogLoss	n	AUC 7d	Acc 7d	Brier 7d	n 7d
1	Margin Recency	Margin regression with exponential recency weights on newer games.	STRICT 290g	-	-	-	-	0	0.856	76.6%	0.159	290
2	Adjusted Context Blend	Experimental context-heavy win model blending strong team components with rest and venue context.	STRICT 290g	-	-	-	-	0	0.853	75.5%	0.159	290
3	Margin	Linear team-strength model fit on point differential instead of binary wins.	STRICT 290g	0.848	76.7%	0.161	0.492	2989	0.851	76.9%	0.161	290
4	Core Ensemble	Equal-logit blend of Elo, recency BT, recency margin, log-adjusted pyth, and points off/def.	STRICT 290g	-	-	-	-	0	0.851	76.9%	0.158	290
5	Recency Ensemble	Equal-logit blend of Elo, recency BT, recency margin, log-adjusted pyth, and recency points off/def.	STRICT 290g	-	-	-	-	0	0.851	76.9%	0.158	290
6	Points Off/Def Recency	Off/def points regression with exponential recency weights.	STRICT 290g	-	-	-	-	0	0.848	76.2%	0.163	290
7	Points Off/Def	Raw points regression with separate offensive and defensive team parameters.	STRICT 290g	0.797	72.4%	0.185	0.550	2989	0.840	76.9%	0.165	290
8	Dynamic Bradley-Terry	Time-evolving paired-comparison model with latent team strength drift.	STRICT 290g	-	-	-	-	0	0.836	76.2%	0.166	290
9	Adjusted Efficiency	Opponent-adjusted efficiency model with separate offensive and defensive components.	STRICT 290g	0.797	72.5%	0.196	0.626	2989	0.836	75.5%	0.174	290
10	Log Adjusted	Log-scale adjusted efficiency model that downweights blowout leverage.	STRICT 290g	0.797	72.5%	0.196	0.622	2989	0.834	75.5%	0.175	290
11	Elo	Streaming paired-comparison rating with recency baked into sequential updates.	STRICT 290g	0.847	75.9%	0.163	0.496	2989	0.831	75.5%	0.169	290
12	Bradley-Terry	Static logistic paired-comparison model with one team strength parameter.	STRICT 290g	0.858	76.5%	0.157	0.482	2989	0.828	77.6%	0.171	290
13	Bradley-Terry Recency	Static Bradley-Terry with exponential recency weights on newer games.	STRICT 290g	-	-	-	-	0	0.828	74.5%	0.174	290
14	Avg Margin Baseline	Predict from simple average scoring margin in the training window.	STRICT 290g	0.849	76.5%	0.161	0.489	2989	0.806	74.1%	0.180	290
15	Pythagorean	Pythagorean win expectation from raw points scored and allowed.	STRICT 290g	0.787	71.7%	0.190	0.561	2989	0.730	70.7%	0.223	290
16	Home Team Baseline	Always favor the home team with a fixed prior.	STRICT 290g	0.570	57.0%	0.246	0.685	2989	0.561	56.2%	0.248	290
-	Efficiency	Tempo-adjusted efficiency version of Pythagorean ratings.	FULL no 7d	-	-	-	-	0	-	-	-	0
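The two ensemble rows describe an "equal-logit blend" of component models. One plausible reading, sketched below, is to convert each component's win probability to log-odds, average them with equal weights, and map the result back to a probability. The function names and the clipping constant are assumptions for illustration; the page does not show how the site actually computes the blend.

```python
import math

def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def sigmoid(z):
    """Inverse of logit."""
    return 1.0 / (1.0 + math.exp(-z))

def equal_logit_blend(probs, eps=1e-6):
    """Average component probabilities in log-odds space, equal weights.

    eps clips inputs away from 0 and 1 so the logit stays finite
    (the clipping value is an assumption, not from the source).
    """
    clipped = [min(max(p, eps), 1.0 - eps) for p in probs]
    mean_logit = sum(logit(p) for p in clipped) / len(clipped)
    return sigmoid(mean_logit)

# Two hypothetical component predictions for the same game.
blended = equal_logit_blend([0.5, 0.9])
print(round(blended, 2))  # 0.75
```

Averaging in log-odds space gives a confident component more pull than a plain average of probabilities would: here the blend is 0.75 rather than 0.70.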

Methodology

Elo / Bradley-Terry

Elo: Iterative updates, K=64, home advantage (HCA)=100 rating points
BT: Static logistic regression on all games
Both model win probability, not margin
Elo updates after each game; BT fits once
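A single Elo update with the stated K=64 and HCA=100 might look like the sketch below. The 400-point logistic scale, the 1500 starting rating, and the choice to add HCA to the home team's rating are the conventional Elo setup and are assumptions here, since the page does not state them.

```python
def elo_home_win_prob(r_home, r_away, hca=100.0, scale=400.0):
    """Expected home win probability; HCA is added to the home rating."""
    diff = (r_home + hca) - r_away
    return 1.0 / (1.0 + 10.0 ** (-diff / scale))

def elo_update(r_home, r_away, home_won, k=64.0, hca=100.0):
    """Return the new (home, away) ratings after one game.

    The winner gains exactly what the loser gives up, so the total
    rating in the league stays constant.
    """
    expected = elo_home_win_prob(r_home, r_away, hca)
    delta = k * ((1.0 if home_won else 0.0) - expected)
    return r_home + delta, r_away - delta

# Two evenly rated teams: home advantage alone implies about a 64% favorite.
p = elo_home_win_prob(1500.0, 1500.0)
new_home, new_away = elo_update(1500.0, 1500.0, home_won=True)
print(round(p, 3), round(new_home, 1), round(new_away, 1))
```

Bradley-Terry uses the same logistic form for the win probability but estimates all team strengths in one fit over every game, which is the "fits once" contrast drawn above.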

Pythagorean Models

Raw: Classic points scored/allowed formula
Efficiency: Pace-adjusted (pts per possession)
Adjusted: Opponent-adjusted efficiency
Log: Adjusted efficiency on a log-linear (multiplicative) scale, which downweights blowouts
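The raw variant is the classic Pythagorean expectation, which converts points scored and allowed into an expected win rate. The sketch below shows the general form only; the exponent is a placeholder argument because the page does not state the value the site uses, and the function name is hypothetical.

```python
def pythagorean_win_pct(points_for, points_against, exponent):
    """Classic Pythagorean expectation: PF^x / (PF^x + PA^x).

    exponent is sport-specific and must be supplied by the caller;
    no particular value is implied by the source.
    """
    if points_for <= 0 or points_against <= 0:
        raise ValueError("point totals must be positive")
    pf = points_for ** exponent
    pa = points_against ** exponent
    return pf / (pf + pa)

# Hypothetical season totals and exponent, for illustration only.
strong = pythagorean_win_pct(2100, 1900, exponent=10.0)
weak = pythagorean_win_pct(1900, 2100, exponent=10.0)
print(round(strong + weak, 6))  # 1.0
```

Swapping the two totals gives the complementary win rate, so a team that scores exactly as much as it allows always comes out at 0.5 regardless of the exponent.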

Margin Regression

Team-level ridge regression on point margin
Linear Bradley-Terry (margin target)
Ridge penalty alpha=0.05 (tuned by cross-validation)
Learns home advantage from data (~6 pts)
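A minimal sketch of this kind of model, assuming NumPy: each game is a row with +1 for the home team and -1 for the away team, the intercept column plays the role of the learned home advantage, and the ridge penalty shrinks team ratings toward zero. Leaving the intercept unpenalized, the function name, and the toy data are all assumptions, not details taken from the site.

```python
import numpy as np

def fit_margin_ridge(games, teams, alpha=0.05):
    """Fit team ratings and home advantage to home-team point margins.

    games: list of (home_team, away_team, home_margin).
    Returns (ratings dict, home advantage in points).
    """
    idx = {team: i for i, team in enumerate(teams)}
    X = np.zeros((len(games), len(teams) + 1))
    y = np.zeros(len(games))
    for row, (home, away, margin) in enumerate(games):
        X[row, idx[home]] = 1.0
        X[row, idx[away]] = -1.0
        X[row, -1] = 1.0  # intercept column: home advantage
        y[row] = margin
    penalty = alpha * np.eye(len(teams) + 1)
    penalty[-1, -1] = 0.0  # assumption: home advantage is not shrunk
    beta = np.linalg.solve(X.T @ X + penalty, X.T @ y)
    ratings = {team: float(beta[i]) for team, i in idx.items()}
    return ratings, float(beta[-1])

# Toy noise-free league: true strengths 5, 0, -5 and a 6-point home edge.
true_strength = {"A": 5.0, "B": 0.0, "C": -5.0}
games = [(h, a, true_strength[h] - true_strength[a] + 6.0)
         for h in true_strength for a in true_strength if h != a]
ratings, home_adv = fit_margin_ridge(games, list(true_strength))
print(round(home_adv, 3), {t: round(r, 2) for t, r in ratings.items()})
```

In this balanced toy schedule the fit recovers the 6-point home edge exactly and shrinks the team ratings only slightly, which is what a small alpha is meant to do.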

Baselines

Home Team: Always predict a home win at a fixed 60% probability
Avg Margin: Higher average margin wins
Models should beat these to add value
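To see what the home-team baseline scores, the sketch below computes accuracy, Brier score, and log loss for a constant 60% home-win forecast. The function name is hypothetical, and the 570-of-1000 game counts are illustrative numbers chosen to match the 57.0% home win rate in the leaderboard, not actual season counts.

```python
import math

def fixed_prior_metrics(home_wins, games, p_home=0.60):
    """Score a forecast that gives every home team the same probability."""
    rate = home_wins / games
    # The pick is always the home team whenever p_home is above 0.5.
    accuracy = rate if p_home > 0.5 else 1.0 - rate
    brier = rate * (1.0 - p_home) ** 2 + (1.0 - rate) * p_home ** 2
    log_loss = -(rate * math.log(p_home)
                 + (1.0 - rate) * math.log(1.0 - p_home))
    return accuracy, brier, log_loss

# Illustrative counts matching a 57.0% home win rate.
acc, brier, ll = fixed_prior_metrics(home_wins=570, games=1000)
print(f"{acc:.3f} {brier:.3f} {ll:.3f}")  # 0.570 0.246 0.685
```

These values line up with the full-season Home Team Baseline row (57.0%, 0.246, 0.685), which is consistent with the 60% figure being the fixed forecast probability and not the observed home win rate.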