🐻⬇️🏀

RAPM Game Score Rollout

2026-03-08 • By Codex

This report documents the implementation of RAPM Game Score, a single-game plus-minus adjusted for on-court context.

Metric Definition

For each player and game, we sum segment-level counterfactuals:

RAPM_Game_Score(player, game) = Σ over segments [ actual_margin(segment) - predicted_margin_without_player(segment) ]

  • predicted_margin_without_player uses previously fit RAPM coefficients.
  • Replacement level is RAPM 0, so "player removed" means substituting a 0-impact player.
  • Interpretation: game impact adjusted for teammates/opponents and stint context.

Equivalent segment form used in code:

  • Home player i: score_i = beta_i * poss/100 + residual_home
  • Away player j: score_j = beta_j * poss/100 - residual_home

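where residual_home = actual_home_margin - expected_home_margin, beta is the player's RAPM coefficient (per 100 possessions), and poss is the segment's possession count.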
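
The segment form above can be sketched in a few lines of Python. This is a minimal illustration, not the code in modeling/calculate_game_scores.py: the Segment class, the function names, and the player keys are hypothetical, and expected_home_margin is assumed to be the difference of summed lineup coefficients scaled by possessions, with no home-court or intercept term.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One stint with fixed lineups (hypothetical structure)."""
    home_players: tuple[str, ...]
    away_players: tuple[str, ...]
    possessions: float
    actual_home_margin: float  # home points minus away points in the segment


def expected_home_margin(seg: Segment, beta: dict[str, float]) -> float:
    # Assumption: coefficients are per 100 possessions and there is no
    # home-court term. Players without a coefficient count as RAPM 0.
    home = sum(beta.get(p, 0.0) for p in seg.home_players)
    away = sum(beta.get(p, 0.0) for p in seg.away_players)
    return (home - away) * seg.possessions / 100.0


def game_scores(segments: list[Segment], beta: dict[str, float]) -> dict[str, float]:
    """Accumulate per-player scores over all segments of one game."""
    scores: dict[str, float] = defaultdict(float)
    for seg in segments:
        residual_home = seg.actual_home_margin - expected_home_margin(seg, beta)
        scale = seg.possessions / 100.0
        for p in seg.home_players:
            scores[p] += beta.get(p, 0.0) * scale + residual_home
        for p in seg.away_players:
            scores[p] += beta.get(p, 0.0) * scale - residual_home
    return dict(scores)


# Toy example with one player per side to keep the arithmetic visible.
beta = {"h1": 2.0, "a1": -1.0}
seg = Segment(("h1",), ("a1",), possessions=50.0, actual_home_margin=4.0)
scores = game_scores([seg], beta)
# expected_home_margin = (2.0 - (-1.0)) * 0.5 = 1.5, residual_home = 2.5
# h1: 2.0 * 0.5 + 2.5 = 3.5    a1: -1.0 * 0.5 - 2.5 = -3.0
```

Under these assumptions the toy values agree with the counterfactual definition: removing h1 leaves a predicted home margin of 0.5, and 4.0 - 0.5 = 3.5.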

Implementation Hardening

Changes shipped in modeling/calculate_game_scores.py and schema/site integration:

  • Added strict segment sanitation before accumulation (finite checks, possession bounds, strict valid-lineup checks).
  • Added deterministic synthetic IDs for unresolved players (replaced process-random hash() behavior).
  • Added game-level ambiguity splitting when multiple distinct names collide to one hub player in a single game.
  • Added season snapshot write semantics (delete existing rows for the league and season, then insert the recomputed rows).
  • Upgraded table primary key to (league, season, game_id, player_id).
  • Updated NCAA game page mapping to use ncaa_boxscores.player_id -> ncaa_players.hub_player_id -> rapm_game_scores.player_id bridge, with league/season filters.
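
Two of the changes above, segment sanitation and deterministic synthetic IDs, can be illustrated with a short sketch. The function names, the possession bound, the five-player lineup rule, and the ID format are assumptions made for illustration; the report does not state the actual thresholds or key layout. What is certain is that Python's built-in hash() on strings is salted per process by default, so a content hash such as SHA-256 is one way to get IDs that are stable across runs.

```python
import hashlib
import math

LINEUP_SIZE = 5  # assumed: a valid basketball lineup has five players per side


def is_valid_segment(home, away, possessions, home_margin, max_poss=60.0):
    """Return True if a segment is safe to accumulate.

    max_poss is a placeholder bound, not the value used in the real script.
    """
    # Finite checks: reject NaN and +/-inf before any arithmetic.
    if not (math.isfinite(possessions) and math.isfinite(home_margin)):
        return False
    # Possession bounds: strictly positive and below the assumed ceiling.
    if not (0.0 < possessions <= max_poss):
        return False
    # Strict lineup check: five players per side, ten distinct players.
    if len(home) != LINEUP_SIZE or len(away) != LINEUP_SIZE:
        return False
    return len(set(home) | set(away)) == 2 * LINEUP_SIZE


def synthetic_player_id(league, season, team_id, name):
    """Build a stable ID for an unresolved player (hypothetical key layout)."""
    key = "|".join([league, season, team_id, name.strip().lower()])
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return "synthetic_" + digest[:16]


home = ["h1", "h2", "h3", "h4", "h5"]
away = ["a1", "a2", "a3", "a4", "a5"]
ok = is_valid_segment(home, away, possessions=12.0, home_margin=3.0)
bad = is_valid_segment(home, away, possessions=float("nan"), home_margin=3.0)
pid = synthetic_player_id("ncaam", "2024-2025", "team_42", "  J. Doe ")
```

Because the ID depends only on its inputs, re-running the script in a new process yields the same value, which hash() does not guarantee for strings.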

Validation Snapshot

Post-refresh QA after full recompute:

League  Season     Rows     Games  Players  NaN Poss  Poss >220  P99 Poss  Max Poss  P99 abs(score)  Max abs(score)  Poss>160 & Minutes<24*
ncaam   2024-2025  122,949  6,321  13,626   0         4          159.95    302.10    39.68           100.20          31
ncaam   2025-2026  88,246   4,329  10,786   0         5          146.53    266.01    36.06           91.57           5

* Minutes mismatch is computed with a bridge join: rapm_game_scores.player_id (hub) -> ncaa_players.hub_player_id -> ncaa_boxscores.player_id.
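
The bridge join behind the minutes-mismatch count can be sketched as follows, using an in-memory SQLite database as a stand-in for the real store. Only the table names and the player_id / hub_player_id columns come from this report; the possessions, minutes, and game_id columns, the presence of ncaa_players.player_id, and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE rapm_game_scores (
        league TEXT, season TEXT, game_id TEXT, player_id TEXT,
        possessions REAL, score REAL,
        PRIMARY KEY (league, season, game_id, player_id)
    );
    -- Assumed shape: one source player_id maps to one hub_player_id.
    CREATE TABLE ncaa_players (player_id TEXT PRIMARY KEY, hub_player_id TEXT);
    CREATE TABLE ncaa_boxscores (game_id TEXT, player_id TEXT, minutes REAL);
    """
)

# Made-up sample rows: hubA looks suspicious (many possessions, few minutes).
conn.executemany(
    "INSERT INTO rapm_game_scores VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("ncaam", "2024-2025", "g1", "hubA", 170.0, 5.0),
        ("ncaam", "2024-2025", "g1", "hubB", 150.0, -2.0),
    ],
)
conn.executemany(
    "INSERT INTO ncaa_players VALUES (?, ?)",
    [("src1", "hubA"), ("src2", "hubB")],
)
conn.executemany(
    "INSERT INTO ncaa_boxscores VALUES (?, ?, ?)",
    [("g1", "src1", 12.0), ("g1", "src2", 35.0)],
)

# Hub ID -> source player ID -> boxscore minutes, filtered by league/season.
query = """
    SELECT COUNT(*)
    FROM rapm_game_scores AS s
    JOIN ncaa_players AS p ON p.hub_player_id = s.player_id
    JOIN ncaa_boxscores AS b
      ON b.player_id = p.player_id AND b.game_id = s.game_id
    WHERE s.league = ? AND s.season = ?
      AND s.possessions > 160 AND b.minutes < 24
"""
mismatches = conn.execute(query, ("ncaam", "2024-2025")).fetchone()[0]
```

With the sample rows, only hubA in game g1 meets both conditions, so mismatches is 1.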

Remaining Caveats

  • A small number of possession outliers remain (4 rows above 220 possessions in 2024-2025 and 5 in 2025-2026), mostly linked to incomplete or irregular source data such as missing boxscore minutes or unstable source identity rows.
  • These are now isolated edge cases rather than broad contamination.

Reproduce

python modeling/calculate_game_scores.py --league ncaam --season 2024-2025 --segments-source athena
python modeling/calculate_game_scores.py --league ncaam --season 2025-2026 --segments-source athena
pytest -q tests/test_rapm_game_scores_formula.py tests/test_rapm_game_player_mapping.py tests/test_rapm_lineup_filter.py site/tests/test_routes_smoke.py