Elo vs Bradley-Terry: Dynamic vs Static Ratings

2026-01-03 • By Sports Data

This report compares two fundamental approaches to rating sports teams:

  1. Elo: iterative updates after each game (dynamic)
  2. Bradley-Terry: global logistic regression over all games (static)

The Models

Elo (Dynamic)

Updates ratings after each game using the classic formula, written here from the home team's perspective (for the away team, the sign of HCA flips):

R_new = R_old + K × (Actual - Expected)
Expected = 1 / (1 + 10^((R_opponent - R_team - HCA) / 400))

Tuned parameters:

  - K-factor: 64 (how fast ratings change)
  - HCA: 150 rating points (home court advantage)
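The update rule can be sketched in a few lines of Python. This is a minimal illustration rather than the code behind the report: the function names and the 1500 starting rating are assumptions, and only K = 64 and HCA = 150 come from the parameters above.

```python
def expected_score(r_team, r_opponent, hca=0.0):
    """Win probability for r_team; pass hca > 0 when that team is at home."""
    return 1.0 / (1.0 + 10 ** ((r_opponent - r_team - hca) / 400.0))


def elo_update(r_home, r_away, home_won, k=64.0, hca=150.0):
    """Return updated (home, away) ratings after one game."""
    p_home = expected_score(r_home, r_away, hca)
    actual = 1.0 if home_won else 0.0
    delta = k * (actual - p_home)
    # Applying the formula to both sides makes the update zero-sum:
    # whatever the home team gains, the away team loses.
    return r_home + delta, r_away - delta


# Two equally rated teams (1500 is an assumed starting value).
p = expected_score(1500.0, 1500.0, hca=150.0)
new_home, new_away = elo_update(1500.0, 1500.0, home_won=False)
print(round(p, 3), round(new_home, 1), round(new_away, 1))
```

With these settings the home side of an even matchup is expected to win about 70% of the time, so a home loss costs roughly 45 rating points.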

Bradley-Terry (Static)

Solves for the single rating per team that maximizes the L2-regularized likelihood of all observed game outcomes, where A is the home team:

P(A beats B) = 1 / (1 + 10^((R_B - R_A - β_home) / 400))

Tuned parameters:

  - Regularization C: 1.0 (inverse of L2 penalty)
  - β_home: 201 rating points (learned, not fixed)
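As a rough sketch of what the global fit involves, the snippet below maximizes the penalized likelihood with plain gradient ascent on a made-up three-team schedule. The report does not say which solver was used, so gradient ascent is a stand-in; `fit_bradley_terry`, the learning rate, the iteration count, and the choice to leave the home term unpenalized are all assumptions. Coefficients are converted from natural-log odds to the 400-point base-10 scale by multiplying by 400 / ln(10).

```python
import math

SCALE = 400.0 / math.log(10.0)  # natural-log-odds units -> Elo-style points


def fit_bradley_terry(games, c=1.0, lr=1.0, n_iter=5000):
    """Fit team strengths and a home term by full-batch gradient ascent.

    games: list of (home_team, away_team, home_won) tuples.
    Objective: log-likelihood minus (1 / (2 * c)) * sum(theta ** 2).
    The home term is left unpenalized (an assumption of this sketch).
    """
    teams = sorted({t for h, a, _ in games for t in (h, a)})
    theta = {t: 0.0 for t in teams}
    beta = 0.0
    step = lr / len(games)  # scale the step by dataset size for stability
    for _ in range(n_iter):
        grad = {t: -theta[t] / c for t in teams}  # gradient of the L2 penalty
        grad_beta = 0.0
        for home, away, home_won in games:
            logit = theta[home] - theta[away] + beta
            p_home = 1.0 / (1.0 + math.exp(-logit))
            err = (1.0 if home_won else 0.0) - p_home
            grad[home] += err
            grad[away] -= err
            grad_beta += err
        for t in teams:
            theta[t] += step * grad[t]
        beta += step * grad_beta
    ratings = {t: theta[t] * SCALE for t in teams}
    return ratings, beta * SCALE


# Made-up schedule for illustration: (home, away, home_won).
games = [
    ("A", "B", True), ("B", "A", False), ("A", "C", True), ("C", "A", False),
    ("B", "C", True), ("C", "B", True), ("B", "A", True), ("C", "B", False),
]
ratings, home_adv = fit_bradley_terry(games)
print({t: round(r) for t, r in ratings.items()}, round(home_adv))
```

Because every game enters the objective as one term in a sum, shuffling `games` leaves the fitted ratings unchanged up to floating-point rounding, which is what makes the model static.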

Methodology

Time-series cross-validation:

  - Train: 2024-25 season (1,706 games)
  - Test: 2025-26 season (2,527 games)
  - Metric: Log Loss (lower = better calibrated probabilities)
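For reference, both reported metrics can be computed from a list of outcomes and predicted home-win probabilities. The sketch below uses made-up numbers, and the helper names are hypothetical. As a point of comparison, a model that always predicts 0.5 scores ln 2 ≈ 0.693 on log loss.

```python
import math


def log_loss(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood of binary outcomes (1 = home win)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def accuracy(y_true, p_pred):
    """Share of games where the side given probability >= 0.5 actually won."""
    hits = sum((p >= 0.5) == (y == 1) for y, p in zip(y_true, p_pred))
    return hits / len(y_true)


# Made-up outcomes and predicted home-win probabilities, for illustration only.
y = [1, 0, 1, 1]
p = [0.8, 0.3, 0.6, 0.4]
print(round(log_loss(y, p), 3), accuracy(y, p))
```

Log loss penalizes confident misses much more than hesitant ones, so a model can lead on accuracy and still trail on log loss, or the reverse.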

Results

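| Model         | Log Loss | Accuracy | HCA (rating points) |
|---------------|----------|----------|---------------------|
| Elo           | 0.576    | 69.6%    | 150                 |
| Bradley-Terry | 0.611    | 62.5%    | 201                 |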

Elo outperforms Bradley-Terry out of sample on both metrics: lower log loss (0.576 vs 0.611) and higher accuracy (69.6% vs 62.5%).

Why Does Elo Win?

  1. Teams change during the season. Injuries, player development, and coaching adjustments make early-season performance less predictive of late-season outcomes.

  2. Elo adapts. Its iterative updates naturally weight recent games more heavily, tracking a team's current form.

  3. Bradley-Terry overfits to early noise. It treats November games as equally informative as January games, even though early-season data is noisier.

Trade-offs

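| Aspect                       | Elo                       | Bradley-Terry        |
|------------------------------|---------------------------|----------------------|
| Path dependent?              | Yes (game order matters)  | No                   |
| Converges to MLE?            | Only approximately        | Yes (global optimum) |
| Captures team evolution?     | Yes                       | No                   |
| Better for prediction?       | ✅ Yes                    | No                   |
| Better for "season average"? | No                        | ✅ Yes               |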
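Path dependence is easy to see on a toy example: replaying the same three results in reverse order leaves Elo with different final ratings. The schedule, the `run_elo` helper, and the 1500 starting rating are illustrative assumptions; K and HCA are the values quoted earlier.

```python
def run_elo(games, k=64.0, hca=150.0, start=1500.0):
    """Replay (home, away, home_won) games in order; return final ratings."""
    ratings = {}
    for home, away, home_won in games:
        r_home = ratings.setdefault(home, start)
        r_away = ratings.setdefault(away, start)
        p_home = 1.0 / (1.0 + 10 ** ((r_away - r_home - hca) / 400.0))
        delta = k * ((1.0 if home_won else 0.0) - p_home)
        ratings[home] = r_home + delta
        ratings[away] = r_away - delta
    return ratings


# Same three results, replayed forward and in reverse.
games = [("A", "B", True), ("A", "B", False), ("B", "A", True)]
forward = run_elo(games)
backward = run_elo(list(reversed(games)))
print(round(forward["A"], 1), round(backward["A"], 1))
```

Team A ends about 12 points apart between the two orderings. A Bradley-Terry fit sums over games, so it would return the same ratings either way.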

Conclusion

For prediction, use Elo. It adapts to current form.

For historical analysis ("Who was best on average?"), Bradley-Terry provides a cleaner, path-independent answer.

We now run both models daily for comparison.