Breaking Down the 2015-2016 NBA Season

In this article, I use data science and machine learning methods to break down the real factors separating playoff teams from non-playoff teams. In particular, I associated 44 predictor variables with each team: “FG”, “FGA”, “FG.”, “X3P”, “X3PA”, “X3P.”, “X2P”, “X2PA”, “X2P.”, “FT”, “FTA”, “FT.”, “ORB”, “DRB”, “TRB”, “AST”, “STL”, “BLK”, “TOV”, “PF”, “PTS”, “PS.G”, “oFG”, “oFGA”, “oFG.”, “o3P”, “o3PA”, “o3P.”, “o2P”, “o2PA”, “o2P.”, “oFT”, “oFTA”, “oFT.”, “oORB”, “oDRB”, “oTRB”, “oAST”, “oSTL”, “oBLK”, “oTOV”, “oPF”, “oPTS”, “oPS.G”, where the letter ‘o’ before each of the last 22 predictor variables indicates a defensive variable (‘o’ stands for opponent).

Using principal components analysis (PCA), I was able to project this 44-dimensional data set onto a 5-dimensional one. That is, the first 5 principal components were found to explain 85% of the variance.
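To make the "first 5 components explain 85%" statement concrete, here is a minimal Python sketch of how one might pick the number of components from the component variances (the eigenvalues of the covariance matrix). The function name `components_needed` and the variances in `example_variances` are my own illustrative assumptions; they are made-up numbers chosen to show the mechanics, not the values from this analysis, and the original work was not done in Python.

```python
def components_needed(variances, threshold=0.85):
    """Return the smallest k such that the k largest component variances
    account for at least `threshold` of the total variance."""
    total = sum(variances)
    ordered = sorted(variances, reverse=True)  # largest variance first
    cumulative = 0.0
    for k, var in enumerate(ordered, start=1):
        cumulative += var
        if cumulative / total >= threshold:
            return k
    return len(ordered)


# Hypothetical component variances, for illustration only.
example_variances = [18.0, 9.0, 5.0, 3.5, 2.5, 2.0, 1.5, 1.0, 0.8, 0.7]

k = components_needed(example_variances, threshold=0.85)
print(k)
```

With these made-up variances, the cumulative share crosses 85% at the fifth component, which mirrors the kind of cut-off described above.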

Here are the various biplots: 

In these plots, the teams are grouped according to whether they made the playoffs or not. 

One sees from this biplot of the first two principal components that the dominant component along the first PC is 3-point attempts, while the dominant component along the second PC is opponent points. CLE and TOR have a high negative score along the second PC, indicating a strong defensive performance. Indeed, one suspects that the final separating factor that led CLE to the championship was their defensive play, as opposed to 3-point shooting, which, all in all, didn’t do GSW any favours. This is in line with some of my previous analyses.

Live Metrics for NBA Games

Yesterday for the first time, I took the playoff game between Cleveland and Toronto as an opportunity to test out a script I wrote in R that keeps track of key statistics during a game in real time (well, every 30 seconds). Based on previous work, it is evident that championship-calibre teams are the ones that have excellent 2PT-FG% and the ability to draw fouls, so I tracked these during the game, and I came up with the following plot of several time series:

One sees for example that while Toronto started off the game with a much higher 2PT FG%, towards the end Cleveland ended up winning that battle.
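For readers who want to reproduce the tracked quantity, here is a small Python sketch of how 2PT FG% can be derived from cumulative box-score counts at each polling interval. It is only a sketch: my actual script was written in R, the data-fetching step is omitted, and the function name `two_point_pct` and the numbers in `snapshots` are hypothetical.

```python
def two_point_pct(fg, fga, three_p, three_pa):
    """2-point FG% from cumulative totals: subtract 3-point makes and
    attempts from all field goals. Returns None before any 2-point attempt."""
    two_made = fg - three_p
    two_attempted = fga - three_pa
    if two_attempted <= 0:
        return None
    return 100.0 * two_made / two_attempted


# Hypothetical cumulative snapshots for one team, taken every 30 seconds.
snapshots = [
    {"t": 30, "FG": 2, "FGA": 5, "3P": 1, "3PA": 2, "PF": 1},
    {"t": 60, "FG": 4, "FGA": 8, "3P": 1, "3PA": 3, "PF": 2},
]

# One (time, 2PT FG%, personal fouls) point per snapshot, ready for plotting.
series = [
    (s["t"], two_point_pct(s["FG"], s["FGA"], s["3P"], s["3PA"]), s["PF"])
    for s in snapshots
]
print(series)
```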

A video of this animation is as follows (set the YouTube player to 1080p + FullScreen for Max Quality!)

An interesting question to ask is how are these series correlated? Well, let’s see:

In this correlation plot, “pd” indicates point difference, “PF” indicates personal fouls, “2PFG.” indicates 2-Point field goal percentage.

One sees immediately from the correlation plot above that there is a very strong correlation between Cleveland’s point difference and Toronto’s personal fouls, with some strong correlations attributed to Cleveland’s 2-Point FG% as well. The same holds, with the opposite sign, for Toronto’s point difference. It seems that during a playoff game of this intensity, drawing fouls, combined with 2-Point field goal percentage, is a very important factor in determining which team leads and eventually wins the game.
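As a rough illustration of what each cell in such a correlation plot measures, here is a Python sketch that computes a correlation coefficient between two of the tracked series. I am assuming the Pearson coefficient here; the helper `pearson` and the two short series are hypothetical stand-ins, not the game data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical samples: one team's point difference and the other team's fouls.
point_diff = [0, 2, 3, 5, 8]
opponent_pf = [1, 2, 3, 4, 6]

print(round(pearson(point_diff, opponent_pf), 3))
```

A value close to +1 means the two series rise together, and a value close to -1 means one rises as the other falls.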




The Three-Point Shot Delusion

The vast majority of NBA analysts claim today that the NBA has changed. It has become more fast-paced, and there is a significantly greater emphasis on teams attempting more three point shots. The evidence for this is the repeated recital of the fact that over the last number of years, the average three-point attempt rate has increased. An example of such an article can be found here. 

It is my hypothesis that this is all based on a very shallow analysis of what is actually going on. In particular, there are more than 60 variables that classify each team’s play. It seems strange that analysts have picked out one statistic, noticed a trend, and drawn conclusions ushering in the “modern-day” NBA. As I will demonstrate below, using concepts from statistical learning and machine learning, many things have been missed in their analyses. What is even more strange is that there have been an increasing number of articles claiming that, for example, if teams do not shoot more three-point shots, they will probably not make the playoffs or win a championship. Examples of such articles can be found here, here, and here.

I will now demonstrate why all of these analyses are incomplete, and why their conclusions are wholly incorrect.

Using the great service provided by, I looked at the last 15 seasons of  every NBA team, looking at more than 60 predictor variables that classified each team’s performance in the season. Some of these included: MP FG FGA FG% 3P 3PA 3P% 2P 2PA 2P% FT FTA FT% ORB DRB TRB AST STL BLK TOV PF PTS PTS/G oG oMP oFG oFGA oFG% o3P o3PA o3P% o2P o2PA o2P% oFT oFTA oFT% oORB oDRB oTRB oAST oSTL oBLK oTOV oPF oPTS oPTS/G MOV SOS SRS ORtg DRtg Pace FTr 3PAr TOV% ORB% FT/FGA  TOV% DRB% FT/FGA, where a small “o” indicates a team’s opponent’s statistics.

What classifies a playoff team?

Building a classification tree, I wanted to analyze what factors specifically lead to a team making the playoffs in a given season. I found the following:


(For this classification tree, the misclassification error rate was 2.73% indicating a good fit to the data.)


At the top of the tree, we see that the distinguishing factor is the average margin of victory (MOV) per game. Teams that on average beat their opponents by more than 2.695 points are predicted to make the playoffs, while teams that on average lose by more than 1.825 points are predicted not to make the playoffs. Further, the only factor relating to three-point shooting in this entire classification tree is o3PA, the number of opponent 3-point attempts per game. For example, suppose a team has an average MOV of less than -0.54 but greater than -1.825. If that team’s opponents attempt more than 16.0732 3-point shots per game, the team is expected to make the playoffs. In this particular case, getting your opponent to take a lot of three-point shots is indeed desirable, and leads to the expectation of a team making the playoffs.
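The branches described in this paragraph can be written down as plain rules. The Python sketch below encodes only the splits mentioned in the text; the rest of the fitted tree is not reproduced here, so the hypothetical function `predict_playoffs` returns `None` for any team that falls outside the described branches.

```python
def predict_playoffs(mov, opp_3pa):
    """Apply only the tree branches described in the text.

    mov     -- average margin of victory per game
    opp_3pa -- opponent 3-point attempts per game (o3PA)
    Returns True or False for a described branch, None otherwise.
    """
    if mov > 2.695:
        return True   # comfortably positive MOV: predicted playoff team
    if mov < -1.825:
        return False  # clearly negative MOV: predicted non-playoff team
    if -1.825 < mov < -0.54 and opp_3pa > 16.0732:
        return True   # slightly negative MOV, opponents shoot many threes
    return None       # branch not described in the text


print(predict_playoffs(5.0, 20.0))
print(predict_playoffs(-1.0, 17.0))
```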


What classifies a championship team?

The next question to analyze is what characteristics/features classify a championship team. Looking at the last 20 years of playoff data, we see that the following classification tree describes the championship criteria for a given NBA playoff team.


(The learning error rate was 1.172%, indicating an excellent fit to the data.) One sees that at the very top is the field goal percentage of a team’s opponents (OFG.). If the average per-game OFG% is greater than 44.95%, that team is predicted not to win a championship. Further, there are apparently three predicted paths to a championship:

  1. OFG% < 44.95 –> ORtg (Offensive Rating: team points scored per 100 possessions) < 108.55 –> FT% < 73.5% –> Opponent Offensive Rebounds per game (OORB) < 30.2405 –> Personal Fouls per game (PF) < 24.1467
  2. OFG% < 44.95 –> ORtg > 108.55 –> O3P% < 32.45%
  3. OFG% < 44.95 –> ORtg > 108.55 –> O3P% > 32.45% –> AST > 19.9076 –> OAST < 19.0938
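The three paths listed above translate directly into a small rule check. The Python sketch below is a hand-written transcription of those paths, not the fitted tree itself; the function name `championship_path`, the dictionary keys, and the sample team are hypothetical, and the handling of values exactly on a threshold is my own choice since the list does not specify it.

```python
def championship_path(s):
    """Return 1, 2 or 3 for the listed path a team satisfies, else None.

    `s` maps stat names to per-game values; percentages are in percent units.
    """
    if s["OFG%"] >= 44.95:
        return None  # opponents shoot too well from the field
    if s["ORtg"] < 108.55:
        # Path 1: remaining conditions on FT%, OORB and PF
        if s["FT%"] < 73.5 and s["OORB"] < 30.2405 and s["PF"] < 24.1467:
            return 1
        return None
    # ORtg at or above 108.55 (ties sent to this side by assumption)
    if s["O3P%"] < 32.45:
        return 2
    if s["AST"] > 19.9076 and s["OAST"] < 19.0938:
        return 3
    return None


# Hypothetical team, for illustration only.
team = {"OFG%": 43.8, "ORtg": 110.2, "FT%": 76.0, "OORB": 11.0,
        "PF": 20.5, "O3P%": 34.1, "AST": 22.3, "OAST": 18.7}
print(championship_path(team))
```

Note that none of the conditions involve the team's own three-point attempts, which is the point made in the surrounding text.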

This shows once again that the three-point shot is not at all relevant in winning a championship amongst playoff teams, in that shooting a lot of threes, or playing as a “modern” team, does not uniquely determine a team’s success. What is tremendously important is defense and offensive efficiency, and there are multiple ways to achieve these. One does not need to be a prolific three-point shooting team to achieve these metrics.



The increasing trend of teams shooting more threes and playing at a higher pace still does not uniquely determine whether a team will make the playoffs or win a championship, which is why I have called it a “delusion”. Indeed, the common statement that “nowadays, teams that make the playoffs also have the highest number of three-point shot attempts” is a very shallow one, and does not describe why teams actually make the playoffs, as this analysis very clearly shows. Further, attempting more three-point shots is not at all uniquely indicative of a team’s success in winning a championship.