How to grade an NBA rookie

Trying to put data to gut feelings

Each year, NBA franchises place the hopes and ambitions of their fanbases on the shoulders of their prized rookies. Fans of cellar-dwelling, tanking oddities like my darling Knicks explain away current losing by pointing to the promise of player development and future winning. A dunk here, a three there, and if you look with the right set of eyes, any rookie can be imagined into a star. Yet the history of the NBA tells us that rookie performance, good or bad, does not necessarily project into a successful career: for every LeBron, who arrived with overwhelming hype and exceeded all of it, you get Michael Carter-Williams, a Rookie of the Year winner who fell out of the league five years and five teams later. Expected returns on draft investments keep front offices employed, but the science is never exact. Despite that, we’ll do our best to try.

Thanks to the invaluable Basketball Reference, we have extensive data on each rookie season across the history of the NBA, including detailed advanced stats around usage rate, adjusted box score plus/minus, and countless measures of efficiency, scoring, and utility on the court. These stats help visualize the successes, and areas for growth, for each of these players, and by combining them, we can try to understand a rookie’s performance in a more nuanced context.

To do that, I’m building a dataset that compiles all rookie seasons since 2000, through this current season (up to the All-Star break). To qualify for our dataset, a rookie must have made at least 20 appearances and averaged at least 15 minutes per game. We can combine their per-game stats with the advanced metrics outlined above to get one holistic view of their seasons.
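As a rough sketch of that filtering step, the snippet below merges a per-game table with an advanced-stats table and keeps only qualifying rookies. The column names ("Player", "Season", "G" for games, "MP" for minutes per game), the `build_rookie_table` helper, and the sample rows are all assumptions made for illustration, not the actual code or data behind this post.

```python
import pandas as pd

# Qualification thresholds described in the post.
MIN_GAMES = 20
MIN_MINUTES_PER_GAME = 15.0


def build_rookie_table(per_game: pd.DataFrame, advanced: pd.DataFrame) -> pd.DataFrame:
    """Join per-game and advanced stats, then keep qualifying rookie seasons.

    Column names ("Player", "Season", "G", "MP") are assumed for this sketch.
    """
    merged = per_game.merge(advanced, on=["Player", "Season"], how="inner")
    qualifies = (merged["G"] >= MIN_GAMES) & (merged["MP"] >= MIN_MINUTES_PER_GAME)
    return merged.loc[qualifies].reset_index(drop=True)


# Made-up rows, purely to exercise the function.
per_game = pd.DataFrame({
    "Player": ["Rookie A", "Rookie B", "Rookie C"],
    "Season": [2019, 2019, 2019],
    "G": [60, 12, 45],          # Rookie B has too few appearances
    "MP": [28.4, 30.1, 9.7],    # Rookie C plays too few minutes
    "PTS": [15.2, 11.0, 3.1],
})
advanced = pd.DataFrame({
    "Player": ["Rookie A", "Rookie B", "Rookie C"],
    "Season": [2019, 2019, 2019],
    "PER": [16.0, 14.2, 8.5],
})

rookies = build_rookie_table(per_game, advanced)
print(rookies)
```

With these sample rows, only the rookie who clears both thresholds survives the filter, and the resulting table carries both the per-game and the advanced columns.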

Programming in Python allows us to easily transform this dataset through some statistical packages like scipy.stats, and for our purposes we’ll focus on the z-score. This extensive post on Towards Data Science has a great overview, but in short, a z-score measures the number of standard deviations a value is from the mean. In NBA terms, Chris Paul had 7.8 assists a game as a rookie, which generates a z-score of roughly 4: he’s four full standard deviations above the mean per game assists for rookies, who only average 1.86 a game. By calculating the z-score for each of our target metrics, we can frame the performance of each rookie against their peers.
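To make the z-score concrete, here is a minimal sketch using `scipy.stats.zscore` on a handful of made-up assist averages (the numbers are not from the real dataset). It also checks the result against the textbook formula, value minus mean divided by standard deviation, and backs out the standard deviation implied by the Chris Paul figures quoted above.

```python
import numpy as np
from scipy import stats

# Made-up per-game assist averages for five hypothetical rookies.
assists = np.array([1.0, 2.0, 3.0, 4.0, 10.0])

# scipy's zscore uses the population standard deviation (ddof=0) by default.
z = stats.zscore(assists)

# The same calculation written out by hand.
manual = (assists - assists.mean()) / assists.std()

# Using the post's own numbers: 7.8 assists, a rookie mean of 1.86, and a
# z-score of roughly 4 imply a standard deviation of roughly 1.5 assists.
implied_sd = (7.8 - 1.86) / 4

print(z)
print(implied_sd)
```

Because the z-score is only "roughly 4", the implied standard deviation is an approximation too, but it gives a feel for how tightly most rookies cluster around the mean.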

Those scores can then be combined to generate one composite score, which is useful for producing one unified number to rank each of the rookies. I chose a variety of metrics to build into my score, aiming to reward high performance in key areas while also including stats that apply across all positions.

Per-game counting stats like points, rebounds, and assists were included to reflect basic requirements for stardom. We want our rookies to, at minimum, produce measurable impacts on the game, and these stats often form the first part of a conversation for Rookie of the Year.

For advanced metrics, I wanted to focus on two fronts. We need to capture their impact on the team’s success by focusing on WS/48 and BPM (definitions are linked in Basketball Reference’s glossary), while also including individual stats like VORP, PER, true shooting, usage, and free throw attempt rate. Win shares per 48 minutes measures how much a rookie contributes to a victory while adjusting for time played, and box score plus/minus estimates what they contribute to a team’s final score. To avoid penalizing the players who walked into a terrible situation, I also included four of my favorite individual stats. Value over replacement player frames their performance relative to a replacement-level player, while metrics like true shooting and free throw attempt rate measure efficiency in an increasingly efficient NBA. PER tries to combine all of that into one unified metric of value, while usage measures the relative load placed upon each player. Initial tests didn’t include usage, but without an acknowledgment of the offensive duties each player had, we ended up slightly overvaluing efficient big men while failing to credit the workload of playmaking guards.
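One way to picture the composite is as an average of per-metric z-scores. The sketch below assumes an unweighted mean, which is only one possible way to combine the scores and may differ from the exact combination behind the rankings here. The `composite_scores` helper, the column names, and the sample numbers are illustrative assumptions; the real metric list would also include WS/48, BPM, VORP, PER, true shooting, usage, and free throw attempt rate.

```python
import pandas as pd


def composite_scores(rookies: pd.DataFrame, metrics: list[str]) -> pd.Series:
    """Average each rookie's z-scores across the chosen metrics.

    Assumption for this sketch: every metric gets equal weight.
    """
    values = rookies[metrics]
    # Column-wise z-scores; ddof=0 matches scipy.stats.zscore's default.
    z = (values - values.mean()) / values.std(ddof=0)
    return z.mean(axis=1)


# Made-up stat lines for three hypothetical rookies.
sample = pd.DataFrame({
    "Player": ["Rookie A", "Rookie B", "Rookie C"],
    "PTS": [20.0, 10.0, 5.0],
    "TRB": [8.0, 4.0, 3.0],
    "AST": [6.0, 2.0, 1.0],
})

scores = composite_scores(sample, ["PTS", "TRB", "AST"])
ranked = sample.assign(score=scores).sort_values("score", ascending=False)
print(ranked)
```

Because every metric is standardized before averaging, a stat measured in points and a stat measured in percentages contribute on the same scale. A weighted average would be a natural variation if some metrics should count for more.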

My methodology incorporates all of these factors into our composite score. How does this look? Here are the top and bottom ten scores since the 2000-2001 season:

At first glance, this list is incredibly promising. Five of our top six scorers won Rookie of the Year, while the sixth, Joel Embiid, had a transcendent but injury-shortened year that would have won in a landslide had he stayed healthy. I’m also really excited to see Big Honey, Mr. Nikola Jokic, popping up so high on this list: he came in third in voting that season and has only gotten better each subsequent season, making me wonder if I’ve stumbled into something predictive and fun.

Check out the top 50 rookie seasons here, and the code I used to build out my model. In future posts, we’ll look at how my rankings align with Rookie of the Year winners, call out special seasons, and explore a bit around the 2018-2019 rookie class.