RAPM Game Score Rollout
2026-03-08 • By Codex
This report documents the implementation of RAPM Game Score, a game-level adjusted plus-minus metric.
Metric Definition
For each player and game, we sum segment-level counterfactuals:
RAPM_Game_Score(player, game) = Σ [ actual_margin(segment) - predicted_margin_without_player(segment) ]
- predicted_margin_without_player uses previously fit RAPM coefficients.
- Replacement level is RAPM 0, so "player removed" means substituting a 0-impact player.
- Interpretation: game impact adjusted for teammates/opponents and stint context.
Equivalent segment form used in code:
- Home player i: score_i = beta_i * poss/100 + residual_home
- Away player j: score_j = beta_j * poss/100 - residual_home
where residual_home = actual_home_margin - expected_home_margin.
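The segment form above can be sketched in a few lines of Python. This is a minimal illustration, not the code in modeling/calculate_game_scores.py: the Segment class, the function names, and the toy coefficients are hypothetical, and the expected margin is assumed to be the plain sum of home coefficients minus away coefficients, scaled by possessions/100, with no intercept or home-court term. The counterfactual_score helper is included only to check that the segment form matches the "actual minus predicted without player" definition under that assumption.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical container for one lineup segment (stint).
    home_players: list
    away_players: list
    possessions: float
    actual_home_margin: float


def expected_home_margin(seg, beta):
    # Assumption: expected margin is (sum of home betas - sum of away betas),
    # with betas expressed in points per 100 possessions.
    home = sum(beta.get(p, 0.0) for p in seg.home_players)
    away = sum(beta.get(p, 0.0) for p in seg.away_players)
    return (home - away) * seg.possessions / 100.0


def segment_scores(seg, beta):
    # Segment form: beta * poss/100, plus the home residual for home players
    # and minus the home residual for away players.
    residual_home = seg.actual_home_margin - expected_home_margin(seg, beta)
    scale = seg.possessions / 100.0
    scores = {}
    for p in seg.home_players:
        scores[p] = beta.get(p, 0.0) * scale + residual_home
    for p in seg.away_players:
        scores[p] = beta.get(p, 0.0) * scale - residual_home
    return scores


def counterfactual_score(seg, beta, player):
    # Definition form: actual margin minus the margin predicted with the
    # player replaced by a RAPM-0 player, taken from the player's own side.
    without = {k: v for k, v in beta.items() if k != player}
    diff = seg.actual_home_margin - expected_home_margin(seg, without)
    return diff if player in seg.home_players else -diff


# Toy numbers for illustration only.
beta = {"h1": 2.0, "a1": -1.0}
seg = Segment(["h1"], ["a1"], possessions=50.0, actual_home_margin=4.0)
scores = segment_scores(seg, beta)
```

A player's game score is then the sum of these per-segment scores over every segment of the game in which the player was on the floor.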
Implementation Hardening
Changes shipped in modeling/calculate_game_scores.py and schema/site integration:
- Added strict segment sanitation before accumulation (finite checks, possession bounds, strict valid-lineup checks).
- Added deterministic synthetic IDs for unresolved players (replaced process-random hash() behavior).
- Added game-level ambiguity splitting when multiple distinct names collide to one hub player in a single game.
- Added season snapshot write semantics (DELETE league+season then insert).
- Upgraded table primary key to (league, season, game_id, player_id).
- Updated NCAA game page mapping to use ncaa_boxscores.player_id -> ncaa_players.hub_player_id -> rapm_game_scores.player_id bridge, with league/season filters.
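Two of the changes above lend themselves to a short sketch. Python's built-in hash() on strings is salted per process unless PYTHONHASHSEED is fixed, so IDs derived from it can change between runs; a content hash such as SHA-256 does not. The function names, the key fields used for the synthetic ID, the "synthetic-" prefix, and the possession bound and lineup size below are all assumptions for illustration and may differ from what the shipped code does.

```python
import hashlib
import math


def synthetic_player_id(league, season, team, raw_name):
    # Hypothetical key layout: normalize the parts, join them, and hash.
    # SHA-256 gives the same digest in every process, unlike hash().
    parts = (league, season, team, raw_name)
    key = "|".join(part.strip().lower() for part in parts)
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return "synthetic-" + digest[:16]


def is_valid_segment(possessions, home_margin, home_lineup, away_lineup,
                     max_possessions=220.0, lineup_size=5):
    # max_possessions and lineup_size are placeholder thresholds.
    # Finite checks: reject NaN and infinite values.
    if not (math.isfinite(possessions) and math.isfinite(home_margin)):
        return False
    # Possession bounds: positive and below the cap.
    if not (0.0 < possessions <= max_possessions):
        return False
    # Strict lineup checks: full lineups, no duplicates, no shared players.
    for lineup in (home_lineup, away_lineup):
        if len(lineup) != lineup_size or len(set(lineup)) != lineup_size:
            return False
    return set(home_lineup).isdisjoint(away_lineup)


home = ["h1", "h2", "h3", "h4", "h5"]
away = ["a1", "a2", "a3", "a4", "a5"]
example_id = synthetic_player_id("ncaam", "2024-2025", "Team A", "J. Smith")
```

Segments that fail the check would be dropped before accumulation, so a single malformed row cannot inflate a player's possessions or score.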
Validation Snapshot
Post-refresh QA after full recompute:
| League | Season | Rows | Games | Players | NaN Poss | Poss >220 | P99 Poss | Max Poss | P99 abs(score) | Max abs(score) | Poss>160 & Minutes<24* |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ncaam | 2024-2025 | 122,949 | 6,321 | 13,626 | 0 | 4 | 159.95 | 302.10 | 39.68 | 100.20 | 31 |
| ncaam | 2025-2026 | 88,246 | 4,329 | 10,786 | 0 | 5 | 146.53 | 266.01 | 36.06 | 91.57 | 5 |
* Minutes mismatch is computed with a bridge join:
rapm_game_scores.player_id (hub) -> ncaa_players.hub_player_id -> ncaa_boxscores.player_id.
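A sketch of how such a bridge join could be expressed, using an in-memory SQLite database so it runs standalone. The table names and the player_id / hub_player_id columns come from this report; the possessions, score, and minutes columns, the player_id key on ncaa_players, the game_id join condition, and the toy rows are assumptions made for the example and are not real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rapm_game_scores (
    league TEXT, season TEXT, game_id TEXT, player_id TEXT,
    possessions REAL, score REAL,
    PRIMARY KEY (league, season, game_id, player_id)
);
CREATE TABLE ncaa_players (player_id TEXT PRIMARY KEY, hub_player_id TEXT);
CREATE TABLE ncaa_boxscores (game_id TEXT, player_id TEXT, minutes REAL);
""")

# Toy rows: hub1 has many possessions but few boxscore minutes.
conn.executemany(
    "INSERT INTO rapm_game_scores VALUES (?, ?, ?, ?, ?, ?)",
    [("ncaam", "2024-2025", "g1", "hub1", 170.0, 5.0),
     ("ncaam", "2024-2025", "g1", "hub2", 150.0, -2.0)],
)
conn.executemany(
    "INSERT INTO ncaa_players VALUES (?, ?)",
    [("src1", "hub1"), ("src2", "hub2")],
)
conn.executemany(
    "INSERT INTO ncaa_boxscores VALUES (?, ?, ?)",
    [("g1", "src1", 12.0), ("g1", "src2", 35.0)],
)

# Walk hub id -> source player id -> boxscore row, then apply the
# "Poss>160 & Minutes<24" condition with league/season filters.
MISMATCH_SQL = """
SELECT COUNT(*)
FROM rapm_game_scores AS s
JOIN ncaa_players AS p ON p.hub_player_id = s.player_id
JOIN ncaa_boxscores AS b
  ON b.player_id = p.player_id AND b.game_id = s.game_id
WHERE s.league = ? AND s.season = ?
  AND s.possessions > 160 AND b.minutes < 24
"""
mismatches = conn.execute(MISMATCH_SQL, ("ncaam", "2024-2025")).fetchone()[0]
```

If several source player rows map to the same hub player within one game, a join like this can count that player-game more than once, so a production query may need to deduplicate on (game_id, hub player).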
Remaining Caveats
- A small number of possession outliers remain, mostly linked to incomplete/irregular source data (missing boxscore minutes or unstable source identity rows).
- These are now isolated edge cases instead of broad contamination.
Reproduce
python modeling/calculate_game_scores.py --league ncaam --season 2024-2025 --segments-source athena
python modeling/calculate_game_scores.py --league ncaam --season 2025-2026 --segments-source athena
pytest -q tests/test_rapm_game_scores_formula.py tests/test_rapm_game_player_mapping.py tests/test_rapm_lineup_filter.py site/tests/test_routes_smoke.py