
2025 / UC Berkeley (IEOR 242A)

NBA Career Trajectory Prediction

Predict career length, survival, and awards from early-career data


Overview

Built a reproducible pipeline that forecasts NBA career outcomes from rookie and sophomore season data, draft context, and advanced stats. Deliverables: a report, a codebase, and a Streamlit app.

Problem

Front offices must evaluate long-term player trajectories using limited early-career signals. The goal was to predict career length, survival probability, and award likelihood with transparent, reproducible models.

Role

Team project (IEOR 242A final project)

Timeline

Fall 2025

Tools

Python / pandas / scikit-learn / Streamlit / joblib / LaTeX

Tags

ML / Classification / Regression / Forecasting / App

Data

  • Historical player tables (draft, combine, per-game, advanced stats)
  • Label coverage through 2025/26, per the report; all sources processed into a single modeling table
  • Targets: career length, survival threshold, All-Star, MVP

Approach

  • Built a unified modeling table from raw CSVs and standardized the features
  • Trained an L1-regularized regression for career length and L1-regularized logistic classifiers for the binary targets
  • Calibrated the sparse award probabilities and blended predictions with nearest analogs
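The modeling steps above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the project's actual code: the feature matrix, the target definitions, the regularization strengths (alpha, C), and the choice of sigmoid calibration are all assumptions made for the example.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.linear_model import Lasso, LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the modeling table (assumption): 400 players, 6 features.
n_players, n_features = 400, 6
X = rng.normal(size=(n_players, n_features))

# Hypothetical targets: career length driven by two features, a rare binary award label.
career_length = np.clip(
    5 + 2.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(size=n_players), 1, None
)
all_star = (
    X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=n_players) > 1.5
).astype(int)

# L1-regularized (Lasso) regression for career length on standardized features.
length_model = make_pipeline(StandardScaler(), Lasso(alpha=0.1))
length_model.fit(X, career_length)

# L1-regularized logistic classifier, wrapped in cross-validated calibration
# so that predicted award probabilities are better behaved for a rare label.
base_classifier = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="l1", solver="liblinear", C=0.5),
)
award_model = CalibratedClassifierCV(base_classifier, method="sigmoid", cv=5)
award_model.fit(X, all_star)

# The L1 penalty tends to shrink coefficients of uninformative features toward zero.
coefs = length_model.named_steps["lasso"].coef_
award_prob = award_model.predict_proba(X)[:, 1]
print("career-length coefficients:", np.round(coefs, 2))
print("mean predicted award probability:", round(float(award_prob.mean()), 3))
```

Standardizing inside the pipeline keeps the L1 penalty comparable across features measured on different scales, which is one reason to fit the scaler and the model together.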

Evaluation

  • Representative validation: career length MAE ~1.8 seasons, RMSE ~3.4 seasons
  • Survival ROC-AUC ~0.82; All-Star ROC-AUC ~0.71; MVP ROC-AUC ~0.62
  • Compared against baseline heuristics and inspected calibration curves
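For readers less familiar with these metrics, the sketch below shows how MAE, RMSE, ROC-AUC, a simple baseline, and a calibration curve can be computed with scikit-learn. The arrays are tiny made-up values chosen for easy arithmetic, not project results, and the "predict the mean" baseline is an assumed stand-in for the project's baseline heuristics.

```python
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.metrics import mean_absolute_error, mean_squared_error, roc_auc_score

# Toy career lengths in seasons (illustrative values only).
y_true = np.array([3.0, 10.0, 1.0, 7.0])
y_pred = np.array([4.0, 8.0, 1.0, 10.0])

mae = mean_absolute_error(y_true, y_pred)           # mean of |error|
rmse = np.sqrt(mean_squared_error(y_true, y_pred))  # penalizes large misses more

# Assumed baseline: always predict the mean career length.
baseline_pred = np.full_like(y_true, y_true.mean())
baseline_mae = mean_absolute_error(y_true, baseline_pred)

# Toy survival labels and predicted probabilities.
survived = np.array([0, 0, 1, 1, 0, 1])
prob = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9])
auc = roc_auc_score(survived, prob)

# Calibration curve: observed positive rate vs. mean predicted probability per bin.
frac_positive, mean_predicted = calibration_curve(survived, prob, n_bins=2)

print(f"MAE={mae:.2f}  RMSE={rmse:.2f}  baseline MAE={baseline_mae:.2f}")
print(f"ROC-AUC={auc:.3f}")
print("calibration:", frac_positive, mean_predicted)
```

A gap between MAE and RMSE, like the one reported above, usually means a minority of players are missed by a wide margin, since RMSE weights large errors more heavily.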

Results

  • Delivered a Streamlit app with fuzzy player search, analogs, and projections
  • Packaged reproducible reports/slides and joblib model artifacts
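As an illustration of how fuzzy search and analog-based projections might work, here is a small sketch. It is not the app's implementation: it assumes difflib for name matching, scikit-learn's NearestNeighbors for analogs, and a fixed 50/50 blend weight. The player names, feature values, and career lengths are fictional.

```python
import difflib

import numpy as np
from sklearn.neighbors import NearestNeighbors

# Fictional players with made-up standardized features and career lengths.
names = ["Marcus Hale", "Darius Cole", "Tobias Reed", "Andre Vance"]
features = np.array([[1.0, 1.0], [1.1, 0.9], [5.0, 5.0], [5.2, 4.8]])
career_lengths = np.array([4.0, 6.0, 14.0, 16.0])


def fuzzy_find(query, candidates, n=3, cutoff=0.6):
    """Return up to n candidate names that approximately match the query."""
    by_lower = {name.lower(): name for name in candidates}
    hits = difflib.get_close_matches(query.lower(), list(by_lower), n=n, cutoff=cutoff)
    return [by_lower[hit] for hit in hits]


def blend_with_analogs(model_pred, player_features, k=2, weight=0.5):
    """Blend a model prediction with the mean outcome of the k nearest analogs."""
    nn = NearestNeighbors(n_neighbors=k).fit(features)
    _, idx = nn.kneighbors(np.asarray(player_features).reshape(1, -1))
    analog_mean = career_lengths[idx[0]].mean()
    blended = weight * model_pred + (1 - weight) * analog_mean
    return blended, [names[i] for i in idx[0]]


matches = fuzzy_find("marcus hal", names)
blended, analogs = blend_with_analogs(model_pred=7.0, player_features=[1.0, 0.95])
print(matches[0], analogs, blended)
```

The blend weight here is arbitrary. A fixed weight is a heuristic, and a learned alternative would need its own validation.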

Deployment & Monitoring

  • The Streamlit app supports model-backed inference, with placeholder fallbacks when models are unavailable
  • A data refresh plan is documented; pipelines are scriptable for new seasons
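One way to structure such a fallback is sketched below. The function names, the artifact path, and the placeholder value are hypothetical and only illustrate the pattern of loading a joblib artifact when it exists and returning a clearly labeled placeholder otherwise.

```python
from pathlib import Path

import joblib


def load_model_or_none(path):
    """Load a joblib artifact, returning None if it is missing or unreadable."""
    path = Path(path)
    if not path.exists():
        return None
    try:
        return joblib.load(path)
    except Exception:
        # A corrupt or incompatible artifact should not crash the app.
        return None


def predict_career_length(model, features, placeholder=5.0):
    """Return a prediction plus its source so the UI can flag placeholder output."""
    if model is None:
        return {"value": placeholder, "source": "placeholder"}
    return {"value": float(model.predict([features])[0]), "source": "model"}


# Hypothetical path: with no artifact on disk, the placeholder branch is used.
model = load_model_or_none("models/does_not_exist.joblib")
result = predict_career_length(model, [0.1, 0.2])
print(result)
```

Returning the source alongside the value lets the interface label placeholder numbers explicitly instead of presenting them as model output.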

Limitations

  • Award labels are sparse and era-dependent; outputs treated as exploratory
  • Data coverage ends at season_id 42022 without newer raw tables

Gallery

ROC curves across targets
Predicted award probabilities
Model summary table
Validation metrics summary

Video

Final project video

Repro Steps

  • Place the raw CSVs in data/raw, following the README instructions
  • Run the notebooks or scripts to rebuild modeling_table.parquet
  • Train the models and launch the Streamlit app
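The rebuild step could look something like the sketch below. The file names (draft.csv, per_game.csv), the column names, and the "first two seasons" filter are assumptions for illustration, and the repository's actual scripts may differ. The demo writes two tiny made-up CSVs to a temporary directory so the example is self-contained.

```python
import tempfile
from pathlib import Path

import pandas as pd


def build_modeling_table(raw_dir):
    """Merge draft context with averaged early-career stats (hypothetical schema)."""
    raw_dir = Path(raw_dir)
    draft = pd.read_csv(raw_dir / "draft.csv")        # assumed: player_id, draft_pick
    per_game = pd.read_csv(raw_dir / "per_game.csv")  # assumed: player_id, season_num, pts, min

    # Keep only rookie and sophomore seasons, then average per player.
    early = per_game[per_game["season_num"] <= 2]
    early_avg = early.groupby("player_id", as_index=False)[["pts", "min"]].mean()

    # validate= raises if either side unexpectedly has duplicate player rows.
    return draft.merge(early_avg, on="player_id", how="inner", validate="one_to_one")


with tempfile.TemporaryDirectory() as tmp:
    pd.DataFrame({"player_id": [1, 2], "draft_pick": [1, 30]}).to_csv(
        Path(tmp) / "draft.csv", index=False
    )
    pd.DataFrame(
        {
            "player_id": [1, 1, 1, 2, 2],
            "season_num": [1, 2, 3, 1, 2],
            "pts": [10.0, 20.0, 30.0, 4.0, 6.0],
            "min": [30.0, 34.0, 36.0, 12.0, 16.0],
        }
    ).to_csv(Path(tmp) / "per_game.csv", index=False)
    demo = build_modeling_table(tmp)

print(demo)
# To persist the result: demo.to_parquet("modeling_table.parquet")
# (requires a parquet engine such as pyarrow or fastparquet).
```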

Next Steps

  • Ingest updated box-score tables for recent seasons
  • Replace heuristic projection blending with model-based forecasting
  • Expand calibration for rare-event targets
View repository