2025 / UC Berkeley (IEOR 242A)
NBA Career Trajectory Prediction
Predict career length, survival, and awards from early-career data

Overview
Built a reproducible pipeline to forecast NBA career outcomes from rookie and sophomore season data, draft context, and advanced stats, delivered as a report, a codebase, and a Streamlit app.
Problem
Front offices must evaluate long-term player trajectories using limited early-career signals. The goal was to predict career length, survival probability, and award likelihood with transparent, reproducible models.
Role
Team project (IEOR 242A final project)
Timeline
Fall 2025
Tools
Python / pandas / scikit-learn / Streamlit / joblib / LaTeX
Tags
ML / Classification / Regression / Forecasting / App
Data
- Historical player tables (draft, combine, per-game, and advanced stats)
- Label coverage runs through 2025/26 (per the report); raw tables are processed into a single modeling table
- Targets: career length, survival past a threshold, All-Star selection, and MVP
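A minimal pandas sketch of how raw per-game and draft tables might be collapsed into one row per player with early-career features and targets. The column names (`player_id`, `season`, `ppg`, `mpg`, `draft_pick`), the two-season feature window, and the default five-season survival threshold are hypothetical assumptions, not the project's actual schema. It also assumes one row per player-season (no split rows for traded players).

```python
import pandas as pd


def build_modeling_table(per_game: pd.DataFrame, draft: pd.DataFrame,
                         survival_threshold: int = 5) -> pd.DataFrame:
    """Hypothetical sketch: one row per player with early-career features and targets."""
    per_game = per_game.sort_values(["player_id", "season"])
    # Career length label: number of distinct seasons the player appears in.
    career_length = per_game.groupby("player_id")["season"].nunique().rename("career_length")
    # Early-career window: the player's first two seasons (rookie + sophomore).
    per_game = per_game.assign(season_idx=per_game.groupby("player_id").cumcount())
    early = per_game[per_game["season_idx"] < 2]
    features = early.groupby("player_id")[["ppg", "mpg"]].mean().add_prefix("early_mean_")
    # Attach the career-length label and draft context (left join keeps undrafted players).
    table = features.join(career_length).join(
        draft.set_index("player_id")[["draft_pick"]], how="left"
    )
    # Binary survival target: still active for at least `survival_threshold` seasons.
    table["survived"] = (table["career_length"] >= survival_threshold).astype(int)
    return table.reset_index()
```

The resulting frame could then be saved with `DataFrame.to_parquet`, which requires `pyarrow` or `fastparquet` to be installed.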
Approach
- Built a unified modeling table from the raw CSVs and standardized the features
- Trained L1-regularized (Lasso) regression for career length and L1-regularized logistic classifiers for the binary targets
- Calibrated the sparse award probabilities and blended model predictions with nearest-neighbor player analogs
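The modeling steps listed can be sketched with scikit-learn roughly as follows. This is an illustrative sketch, not the project's code: the regularization strengths (`alpha`, `C`), the choice of sigmoid calibration, `k=5` analogs, and the 0.7 blend weight are all hypothetical, and the helper names are invented for illustration.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.linear_model import Lasso, LogisticRegression
from sklearn.neighbors import NearestNeighbors
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def fit_career_length(X, y, alpha=0.05):
    # Standardize features, then fit an L1-regularized (Lasso) linear model.
    return make_pipeline(StandardScaler(), Lasso(alpha=alpha)).fit(X, y)


def fit_calibrated_classifier(X, y, C=0.5, cv=3):
    # L1-penalized logistic regression; the liblinear solver supports penalty="l1".
    base = make_pipeline(
        StandardScaler(),
        LogisticRegression(penalty="l1", solver="liblinear", C=C),
    )
    # Sigmoid (Platt) calibration fitted on internal CV folds; isotonic
    # calibration tends to overfit when the calibration set is small.
    return CalibratedClassifierCV(base, method="sigmoid", cv=cv).fit(X, y)


def blend_with_analogs(model, X_train, y_train, X_new, k=5, weight=0.7):
    # Find the k most similar historical players in standardized feature space.
    scaler = StandardScaler().fit(X_train)
    nn = NearestNeighbors(n_neighbors=k).fit(scaler.transform(X_train))
    _, idx = nn.kneighbors(scaler.transform(X_new))
    # Average the analogs' observed outcomes, then mix with the model prediction.
    analog_mean = np.asarray(y_train)[idx].mean(axis=1)
    return weight * model.predict(X_new) + (1 - weight) * analog_mean
```

If `X_new` rows are also in `X_train`, each player's nearest analog is itself; in practice that self-match would be excluded (for example by querying `k + 1` neighbors and dropping the first).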
Evaluation
- Representative validation results: career length MAE ~1.8 seasons, RMSE ~3.4 seasons
- ROC-AUC: survival ~0.82, All-Star ~0.71, MVP ~0.62
- Compared against baseline heuristics and inspected calibration curves
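The reported metrics can be computed with scikit-learn. This sketch pairs them with a hypothetical baseline (predicting the training-set mean career length), since the project's actual baseline heuristics are not specified here, and returns binned calibration points for plotting. RMSE is taken as the square root of MSE so the code does not depend on version-specific RMSE helpers.

```python
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.metrics import mean_absolute_error, mean_squared_error, roc_auc_score


def regression_report(y_true, y_pred, y_train):
    """MAE/RMSE for the model and for a hypothetical mean-of-training baseline."""
    baseline = np.full(len(y_true), np.mean(y_train))
    report = {}
    for name, pred in [("model", y_pred), ("baseline", baseline)]:
        report[name] = {
            "mae": mean_absolute_error(y_true, pred),
            "rmse": float(np.sqrt(mean_squared_error(y_true, pred))),
        }
    return report


def classifier_report(y_true, p_pred, n_bins=5):
    """ROC-AUC plus (mean predicted, observed frequency) pairs per probability bin."""
    auc = roc_auc_score(y_true, p_pred)
    prob_true, prob_pred = calibration_curve(y_true, p_pred, n_bins=n_bins)
    return {"roc_auc": auc, "calibration": list(zip(prob_pred, prob_true))}
```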
Results
- Delivered a Streamlit app with fuzzy player search, analogs, and projections
- Packaged reproducible reports, slides, and joblib model artifacts
Deployment & Monitoring
- The Streamlit app runs model-backed inference and falls back to placeholder outputs when model-backed inference is unavailable
- A data refresh plan is documented; the pipelines are scriptable so new seasons can be added
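One way the model-or-placeholder behavior could be structured is sketched below, assuming model artifacts are stored as joblib files. The function names, the returned dictionary layout, and the `"placeholder"` marker are hypothetical; the project's app may handle this differently.

```python
from pathlib import Path
from typing import Any, Optional

import joblib


def load_artifact(path: str) -> Optional[Any]:
    """Return the joblib-serialized model at `path`, or None if the file is missing."""
    p = Path(path)
    return joblib.load(p) if p.is_file() else None


def project_career_length(model: Optional[Any], features) -> dict:
    """Model-backed inference when an artifact is loaded, else a labeled placeholder."""
    if model is None:
        # Hypothetical placeholder: the UI can show a "no model loaded" note instead.
        return {"career_length": None, "source": "placeholder"}
    pred = model.predict(features)
    return {"career_length": float(pred[0]), "source": "model"}
```

In a Streamlit app, the loader could be wrapped with `st.cache_resource` so the artifact is read once and reused across reruns rather than reloaded on every interaction.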
Limitations
- Award labels are sparse and era-dependent, so award outputs are treated as exploratory
- Raw data coverage ends at season_id 42022; newer seasons require newer raw tables
Gallery

ROC curves across targets

Predicted award probabilities

Model summary table

Validation metrics summary
Video
Final project video
Repro Steps
- Place the raw CSVs in data/raw per the README instructions
- Run the notebooks or scripts to rebuild modeling_table.parquet
- Train the models and launch the Streamlit app
Next Steps
- Ingest updated box-score tables for recent seasons
- Replace heuristic projection blending with model-based forecasting
- Expand calibration for rare-event targets