Elo vs Bradley-Terry: Dynamic vs Static Ratings
This report compares two fundamental approaches to rating sports teams:

1. Elo — iterative updates after each game (dynamic)
2. Bradley-Terry — a global logistic regression over all games (static)
The Models
Elo (Dynamic)
Updates ratings after each game using the classic formula:
R_new = R_old + K × (Actual - Expected)
Expected = 1 / (1 + 10^((R_opponent - R_team - HCA) / 400))
Tuned parameters:

- K-factor: 64 (how fast ratings change)
- HCA: 150 rating points (home-court advantage)
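The update rule above can be sketched in a few lines. The function names are illustrative, and the defaults are the tuned values quoted here:

```python
def elo_expected(r_team, r_opp, hca=150.0):
    """Home team's win probability under the base-10 logistic Elo curve."""
    return 1.0 / (1.0 + 10 ** ((r_opp - r_team - hca) / 400.0))

def elo_update(r_team, r_opp, actual, k=64.0, hca=150.0):
    """Home team's new rating after one game (actual = 1 for a win, 0 for a loss)."""
    return r_team + k * (actual - elo_expected(r_team, r_opp, hca))
```

With a 150-point HCA, two evenly rated teams give the home side an expected score of about 0.70, so a home win moves its rating up by roughly 19 points, while a home loss costs about 45.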
Bradley-Terry (Static)
Solves for the single best rating for each team that maximizes the likelihood of all observed game outcomes:
P(A beats B) = 1 / (1 + 10^((R_B - R_A - β_home) / 400))
Tuned parameters:

- Regularization C: 1.0 (inverse of the L2 penalty strength)
- β_home: 201 rating points (learned, not fixed)
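A minimal pure-Python sketch of the fit, maximizing the likelihood above by L2-penalized gradient ascent rather than a library solver. The learning rate, penalty strength, and iteration count are illustrative choices, not the tuned values:

```python
import math

SCALE = math.log(10) / 400  # converts rating-point gaps to logistic units

def fit_bradley_terry(games, teams, lr=2000.0, l2=1e-4, iters=500):
    """Fit static team ratings plus a learned home-advantage term beta_home.

    games: iterable of (home, away, home_won) tuples with home_won in {0, 1}.
    Returns (ratings dict in Elo-style points, beta_home in points).
    """
    r = {t: 0.0 for t in teams}
    beta = 0.0
    for _ in range(iters):
        grad = {t: 0.0 for t in teams}
        g_beta = 0.0
        for home, away, home_won in games:
            p = 1.0 / (1.0 + 10 ** ((r[away] - r[home] - beta) / 400))
            err = home_won - p          # gradient of the per-game log-likelihood
            grad[home] += err
            grad[away] -= err
            g_beta += err
        for t in teams:                 # penalized gradient ascent step
            r[t] += lr * SCALE * (grad[t] - l2 * r[t])
        beta += lr * SCALE * (g_beta - l2 * beta)
    return r, beta
```

Unlike Elo, every pass re-reads the whole schedule, so game order is irrelevant; the L2 penalty plays the role of the regularization C quoted above.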
Methodology
Time-series cross-validation:

- Train: 2024-25 season (1,706 games)
- Test: 2025-26 season (2,527 games)
- Metric: Log Loss (lower = better-calibrated probabilities)
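For reference, the metric itself in hand-rolled form (equivalent to the standard binary log loss, e.g. `sklearn.metrics.log_loss`):

```python
import math

def log_loss(y_true, p_pred, eps=1e-12):
    """Mean negative log-likelihood of predicted home-win probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)   # clip so log() never sees 0 or 1
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)
```

A model that always predicts 0.5 scores ln 2 ≈ 0.693, so both models beat the coin-flip baseline.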
Results
| Model | Log Loss | Accuracy | HCA (rating pts) |
|---|---|---|---|
| Elo | 0.576 | 69.6% | 150 |
| Bradley-Terry | 0.611 | 62.5% | 201 |
Elo outperforms Bradley-Terry on out-of-sample prediction: 0.576 vs 0.611 log loss, and 69.6% vs 62.5% accuracy.
Why Does Elo Win?
- Teams change during the season. Injuries, player development, and coaching adjustments make early-season performance less predictive of late-season outcomes.
- Elo adapts. Its iterative updates naturally weight recent games more heavily, tracking a team's current form.
- Bradley-Terry overfits to early noise. It treats November games as equally informative as January games, even though early-season data is noisier.
Trade-offs
| Aspect | Elo | Bradley-Terry |
|---|---|---|
| Path dependent? | Yes (game order matters) | No |
| Converges to MLE? | Only approximately | Yes (global optimum) |
| Captures team evolution? | Yes | No |
| Better for prediction? | ✅ Yes | No |
| Better for "season average"? | No | ✅ Yes |
Conclusion
For prediction, use Elo. It adapts to current form.
For historical analysis ("Who was best on average?"), Bradley-Terry provides a cleaner, path-independent answer.
We now run both models daily for comparison.