
2025-2026 NCAAM Model Performance Analysis

Scope

This view covers only games in which at least one team held an AP Poll rank on game day. All models are re-evaluated on that same subset, and the AP Poll joins the comparison set as a baseline.


Comparing prediction accuracy across 831 games using multiple rating models.

Model Catalog

7-day holdout coverage: 17/18 models.

Rolling Holdout Curves

Each point is a strict weekly holdout: train on all games before that week, test on that week. This first version uses a 21-day warmup, then 7-day holdouts stepped forward weekly.
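As a rough illustration of the schedule described above, the sketch below generates the holdout windows and the matching train/test split. The function names, the game-record layout (a dict with a `date` key), and the example season dates are assumptions for illustration, not taken from the actual pipeline.

```python
from datetime import date, timedelta

def weekly_holdout_windows(season_start, season_end, warmup_days=21, holdout_days=7):
    """Yield (test_start, test_end) pairs, with test_end inclusive.

    The first window opens after the warmup period; the last window is
    truncated at season_end if fewer than holdout_days remain.
    """
    test_start = season_start + timedelta(days=warmup_days)
    while test_start <= season_end:
        test_end = min(test_start + timedelta(days=holdout_days - 1), season_end)
        yield test_start, test_end
        test_start += timedelta(days=holdout_days)

def split_games(games, test_start, test_end):
    """Strict holdout: train on games dated before the window, test inside it."""
    train = [g for g in games if g["date"] < test_start]
    test = [g for g in games if test_start <= g["date"] <= test_end]
    return train, test

# Hypothetical season dates, chosen only to show the stepping behaviour.
for start, end in weekly_holdout_windows(date(2025, 11, 3), date(2025, 12, 7)):
    print(start, end)
```

Because training data is filtered with a strict `<` on the window start, no game from the test week can leak into the fit.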

Metrics: Log Loss, Brier, AUC, Accuracy

Weekly strict holdout log loss (lower is better), showing 16 models across 22 windows.
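For reference, the four metrics tracked here (log loss, Brier, AUC, accuracy) have standard definitions; a minimal pure-Python sketch follows. The function names are illustrative, the 0.5 accuracy threshold and the clipping constant are assumptions, and the site's own implementation may differ in such details.

```python
import math

def log_loss(y, p, eps=1e-15):
    """Mean negative log-likelihood of binary outcomes y under probabilities p."""
    total = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1 - eps)  # clip to avoid log(0)
        total -= yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
    return total / len(y)

def brier(y, p):
    """Mean squared error between predicted probability and outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)

def accuracy(y, p, threshold=0.5):
    """Share of games where the thresholded pick matches the outcome."""
    return sum((pi >= threshold) == bool(yi) for yi, pi in zip(y, p)) / len(y)

def auc(y, p):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    if not pos or not neg:
        return float("nan")  # undefined when only one class is present
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))
```

With only a handful of games in a window, AUC can take only a few distinct values, so a single-window AUC is a coarse signal compared with log loss or Brier.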

Recent Window Winners

Holdout window	Best model	Best log loss	Runner-up (log loss)	Models compared
Apr 1 - Apr 6	Adjusted Context Blend	0.463	Log Adjusted (0.490)	16
Mar 25 - Mar 31	Log Adjusted	0.591	Adjusted Efficiency (0.591)	16
Mar 18 - Mar 24	Adjusted Efficiency	0.427	Log Adjusted (0.428)	16
Mar 11 - Mar 17	Margin	0.550	Points Off/Def (0.552)	16
Mar 4 - Mar 10	Adjusted Context Blend	0.453	Points Off/Def (0.466)	16
Feb 25 - Mar 3	Points Off/Def Recency	0.592	Points Off/Def (0.601)	16
Feb 18 - Feb 24	Points Off/Def Recency	0.609	Points Off/Def (0.609)	16
Feb 11 - Feb 17	Points Off/Def	0.517	Points Off/Def Recency (0.525)	16

Model Performance Leaderboard

Models ranked by strict holdout AUC when available (fallback: full-season AUC).

#	Model	Description	7d Split	AUC	Acc	Brier	LogLoss	n	AUC 7d	Acc 7d	Brier 7d	n 7d
1	AP Poll	Human ranking baseline for games involving a ranked team.	STRICT 5g	0.804	76.1%	-	-	831	1.000	80.0%	-	5
2	Adjusted Context Blend	Experimental context-heavy win model blending strong team components with rest and venue context.	STRICT 5g	-	-	-	-	0	1.000	80.0%	0.143	5
3	Efficiency	Tempo-adjusted efficiency version of Pythagorean ratings.	FULL no 7d	0.859	77.6%	0.152	0.464	831	-	-	-	0
4	Margin	Linear team-strength model fit on point differential instead of binary wins.	STRICT 5g	0.878	80.4%	0.144	0.447	831	0.833	80.0%	0.157	5
5	Margin Recency	Margin regression with exponential recency weights on newer games.	STRICT 5g	-	-	-	-	0	0.833	60.0%	0.191	5
6	Adjusted Efficiency	Opponent-adjusted efficiency model with separate offensive and defensive components.	STRICT 5g	0.880	80.1%	0.140	0.427	831	0.833	80.0%	0.161	5
7	Log Adjusted	Log-scale adjusted efficiency model that downweights blowout leverage.	STRICT 5g	0.880	80.3%	0.140	0.427	831	0.833	80.0%	0.160	5
8	Points Off/Def	Raw points regression with separate offensive and defensive team parameters.	STRICT 5g	0.877	80.6%	0.147	0.456	831	0.833	80.0%	0.163	5
9	Home Team Baseline	Always favor the home team with a fixed prior.	STRICT 5g	0.674	67.7%	0.225	0.642	831	0.750	80.0%	0.200	5
10	Points Off/Def Recency	Off/def points regression with exponential recency weights.	STRICT 5g	-	-	-	-	0	0.667	60.0%	0.209	5
11	Core Ensemble	Equal-logit blend of Elo, recency BT, recency margin, log-adjusted pyth, and points off/def.	STRICT 5g	-	-	-	-	0	0.667	40.0%	0.208	5
12	Recency Ensemble	Equal-logit blend of Elo, recency BT, recency margin, log-adjusted pyth, and recency points off/def.	STRICT 5g	-	-	-	-	0	0.667	40.0%	0.220	5
13	Dynamic Bradley-Terry	Time-evolving paired-comparison model with latent team strength drift.	STRICT 5g	-	-	-	-	0	0.500	60.0%	0.260	5
14	Elo	Streaming paired-comparison rating with recency baked into sequential updates.	STRICT 5g	0.854	78.3%	0.155	0.480	831	0.333	40.0%	0.300	5
15	Bradley-Terry	Static logistic paired-comparison model with one team strength parameter.	STRICT 5g	0.874	79.3%	0.143	0.445	831	0.333	20.0%	0.288	5
16	Avg Margin Baseline	Predict from simple average scoring margin in the training window.	STRICT 5g	0.857	78.2%	0.164	0.504	831	0.333	40.0%	0.285	5
17	Bradley-Terry Recency	Static Bradley-Terry with exponential recency weights on newer games.	STRICT 5g	-	-	-	-	0	0.167	20.0%	0.318	5
18	Pythagorean	Pythagorean win expectation from raw points scored and allowed.	STRICT 5g	0.842	76.7%	0.198	0.581	831	0.167	40.0%	0.271	5
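The two ensemble rows are described as an "equal-logit blend". One plausible reading, sketched here as an assumption rather than the confirmed method, is to average the member models' probabilities in log-odds space with equal weights and map the result back to a probability. The function names and the clipping constant are illustrative.

```python
import math

def logit(p, eps=1e-9):
    """Log-odds of p, clipped away from 0 and 1 to stay finite."""
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))

def equal_logit_blend(probs):
    """Equal-weight average of member probabilities in logit space."""
    mean_logit = sum(logit(p) for p in probs) / len(probs)
    return 1.0 / (1.0 + math.exp(-mean_logit))

# Hypothetical member predictions for one game (home win probability).
print(equal_logit_blend([0.70, 0.62, 0.81, 0.66, 0.74]))
```

Averaging logits rather than raw probabilities lets one very confident member pull the blend further than a plain mean would, which is the usual reason for blending on that scale.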

Methodology

Elo / Bradley-Terry

Elo: Iterative updates, K=64, home-court advantage (HCA)=100 rating points
BT: Static logistic regression on all games
Both model win probability, not margin
Elo updates after each game; BT fits once
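A minimal sketch of a streaming Elo update using the K=64 and HCA=100 settings listed above. The 400-point logistic scale and the 1500 starting rating are conventional Elo defaults assumed here, not values stated on this page, and the function names are illustrative.

```python
def elo_expected(r_home, r_away, hca=100.0, scale=400.0):
    """Home win probability; hca is added to the home rating (use 0 for neutral sites)."""
    return 1.0 / (1.0 + 10 ** (-(r_home + hca - r_away) / scale))

def elo_update(ratings, home, away, home_won, k=64.0, hca=100.0, initial=1500.0):
    """Update both teams in place after one game; returns the pre-game home win probability."""
    r_home = ratings.get(home, initial)
    r_away = ratings.get(away, initial)
    p_home = elo_expected(r_home, r_away, hca)
    delta = k * ((1.0 if home_won else 0.0) - p_home)
    ratings[home] = r_home + delta  # winner gains what the loser gives up
    ratings[away] = r_away - delta
    return p_home

ratings = {}
p = elo_update(ratings, "Team A", "Team B", home_won=True)  # hypothetical teams
print(round(p, 3), ratings)
```

Because the prediction is read off before the update, running games in date order gives an out-of-sample probability for every game, which is what separates Elo from the fit-once Bradley-Terry model.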

Pythagorean Models

Raw: Classic points scored/allowed formula
Efficiency: Pace-adjusted (pts per possession)
Adjusted: Opponent-adjusted efficiency
Log: Adjusted efficiency fit on a log (multiplicative) scale, which downweights blowouts
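The raw Pythagorean variant can be sketched as below. The exponent is a placeholder, since this page does not state the value used, and combining two teams' win expectations with the log5 formula is an assumption about how a head-to-head probability might be produced. For the efficiency variant, points per possession would be passed in place of raw points.

```python
def pythagorean_win_pct(points_for, points_against, exponent=10.0):
    """Classic Pythagorean expectation: PF^x / (PF^x + PA^x). Exponent is a placeholder."""
    pf = points_for ** exponent
    pa = points_against ** exponent
    return pf / (pf + pa)

def matchup_probability(wp_a, wp_b):
    """log5: chance that team A beats team B given each team's win expectation."""
    num = wp_a * (1.0 - wp_b)
    return num / (num + wp_b * (1.0 - wp_a))

# Hypothetical per-game scoring averages for two teams.
wp_a = pythagorean_win_pct(78.0, 70.0)
wp_b = pythagorean_win_pct(72.0, 71.0)
print(matchup_probability(wp_a, wp_b))
```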

Margin Regression

Team-level ridge regression on point margin
Linear Bradley-Terry (margin target)
Ridge penalty alpha=0.05 (tuned by cross-validation)
Learns home advantage from data (~6 pts)
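A small pure-Python sketch of the margin regression idea: each game contributes +1 for the home team, -1 for the away team, and a shared home-advantage column, with an L2 penalty on the team strengths. Leaving the home-advantage term unpenalized, the exact scaling of alpha, and all names here are assumptions; the production model may be set up differently.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit_margin_ridge(games, alpha=0.05):
    """games: list of (home, away, home_margin). Returns (team_strengths, home_advantage)."""
    teams = sorted({t for h, a, _ in games for t in (h, a)})
    idx = {t: i for i, t in enumerate(teams)}
    n = len(teams) + 1  # last column is the home-advantage term
    A = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for home, away, margin in games:
        row = {idx[home]: 1.0, idx[away]: -1.0, n - 1: 1.0}
        for i, xi in row.items():  # accumulate the normal equations X'X and X'y
            b[i] += xi * margin
            for j, xj in row.items():
                A[i][j] += xi * xj
    for i in range(n - 1):
        A[i][i] += alpha  # ridge penalty on team strengths only
    w = solve(A, b)
    return dict(zip(teams, w[:-1])), w[-1]

# Hypothetical results: A beats B by 10 at home, then by 4 on the road.
strengths, home_adv = fit_margin_ridge([("A", "B", 10), ("B", "A", -4)])
print(strengths, home_adv)
```

The predicted home margin for a new game is `strengths[home] - strengths[away] + home_adv`; turning that margin into a win probability needs an extra step (for example a fitted margin spread) that this sketch leaves out.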

Baselines

Home Team: Always favor the home team at a fixed 60% win probability
Avg Margin: Higher average margin wins
Models should beat these to add value
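As a sanity check on the Home Team baseline, the sketch below scores a constant predictor. It assumes every home team is assigned a 0.60 win probability and that the baseline's 67.7% accuracy on this subset equals the home win rate; both are inferences from the leaderboard, not documented behavior. The function name is illustrative.

```python
import math

def constant_predictor_scores(p_home, home_win_rate):
    """Brier score and log loss when every home team gets the same probability."""
    brier = home_win_rate * (1 - p_home) ** 2 + (1 - home_win_rate) * p_home ** 2
    log_loss = -(home_win_rate * math.log(p_home)
                 + (1 - home_win_rate) * math.log(1 - p_home))
    return brier, log_loss

brier, ll = constant_predictor_scores(p_home=0.60, home_win_rate=0.677)
print(round(brier, 3), round(ll, 3))  # 0.225 0.642
```

These values line up with the Brier (0.225) and log loss (0.642) reported for the Home Team Baseline. The reported AUC of 0.674 is above the 0.5 a purely constant predictor would give, so the actual baseline's predictions are evidently not all identical and this check is only approximate.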