What Do NBA Playoff Teams Have in Common?

I’ve been interested for some time in finding an analytical way to determine what characterizes an NBA team as a playoff team. Looking at the previous six seasons, I pulled together almost 65 different statistics that describe how a team plays, and then performed a classification tree analysis. I found the following result:

For the above tree, the misclassification error rate was 2.73%. MOV stands for margin of victory, o3PA is the number of opponent three-point attempts per game, DRtg is defensive rating (the number of points a team allows per 100 possessions), and so on. The data itself was taken from Basketball-Reference.com.

We see that the following patterns emerge among NBA playoff teams over these past six seasons.

  1. MOV > 2.695
  2. MOV < -0.54, MOV > -1.825, Opponent 3PA > 16.0732, Defensive Rating < 106.05
  3. MOV < -0.54, MOV > -1.825, Opponent 3PA > 16.0732, Defensive Rating > 106.05, FGA < 80.2195
  4. MOV < 2.695, Opponent FGA < 82.0671, MOV < 0.295, Opponent FT > 16.7866
  5. MOV < 2.695, Opponent FGA < 82.0671, MOV > 0.295
  6. MOV < 2.695, Opponent FGA > 82.0671, Opponent DRB > 29.7683, FGA < 83.128
  7. MOV < 2.695, Opponent FGA > 82.0671, Opponent DRB > 29.7683, FGA < 83.128, MOV < 2.17
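These decision rules can be read as nested if-statements. As a quick sketch, here are rules 1 and 2 encoded directly in Python, using the thresholds from the tree; the remaining branches follow the same pattern.

```python
def predicted_playoff_team(mov, o3pa, drtg):
    """Sketch of the top of the playoff classification tree.

    Encodes rule 1 (MOV > 2.695 -> playoffs) and rule 2
    (-1.825 < MOV < -0.54, opponent 3PA > 16.0732, DRtg < 106.05
    -> playoffs). The remaining branches follow the same
    nested-if pattern and are omitted in this sketch.
    """
    if mov > 2.695:
        return True   # rule 1: strong margin of victory
    if -1.825 < mov < -0.54 and o3pa > 16.0732 and drtg < 106.05:
        return True   # rule 2: forces threes, strong defense
    return False      # all other branches omitted here
```

For example, a team with MOV of -1.0, opponent 3PA of 17.0, and DRtg of 105.0 lands on rule 2 and is predicted to make the playoffs.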



The Three-Point Shot Delusion

The vast majority of NBA analysts today claim that the NBA has changed: it has become more fast-paced, with a significantly greater emphasis on teams attempting more three-point shots. The evidence for this is the oft-repeated fact that the average three-point attempt rate has increased over the last several years. An example of such an article can be found here.

It is my hypothesis that this is all based on a very shallow analysis of what is actually going on. In particular, there are more than 60 variables on Basketball-Reference.com that characterize each team’s play. It seems strange that analysts have picked out one statistic, noticed a trend, and drawn conclusions ushering in the “modern-day” NBA. As I will demonstrate below, using concepts from statistics and machine learning, many things have been missed in their analyses. Stranger still, an increasing number of articles claim that, for example, if teams do not shoot more three-point shots, they will probably not make the playoffs or win a championship. Examples of such articles can be found here, here, and here.

I will now demonstrate why all of these analyses are incomplete, and why their conclusions are wholly incorrect.

Using the great service provided by Basketball-Reference.com, I looked at the last 15 seasons of every NBA team, using more than 60 predictor variables that characterize each team’s performance in a season. Some of these included: MP FG FGA FG% 3P 3PA 3P% 2P 2PA 2P% FT FTA FT% ORB DRB TRB AST STL BLK TOV PF PTS PTS/G oG oMP oFG oFGA oFG% o3P o3PA o3P% o2P o2PA o2P% oFT oFTA oFT% oORB oDRB oTRB oAST oSTL oBLK oTOV oPF oPTS oPTS/G MOV SOS SRS ORtg DRtg Pace FTr 3PAr TOV% ORB% FT/FGA TOV% DRB% FT/FGA, where a small “o” indicates a team’s opponent’s statistics.

What classifies a playoff team?

I built a classification tree to analyze which factors specifically lead to a team making the playoffs in a given season. I found the following:


(For this classification tree, the misclassification error rate was 2.73%, indicating a good fit to the data.)


At the top of the tree, we see that the distinguishing factor is the average margin of victory (MOV) per game. Teams that on average beat their opponents by more than 2.695 points are predicted to make the playoffs, while teams that on average lose by more than 1.825 points are predicted to not make the playoffs. Further, the only factor relating to three-point shooting in this entire classification tree is o3PA, the number of opponent three-point attempts per game. For example, suppose a team has an average MOV between -1.825 and -0.54. If that team’s opponents attempt more than 16.0732 three-point shots per game, the team is expected to make the playoffs. In this particular case, getting your opponent to take a lot of three-point shots is indeed desirable, and leads to the expectation of a team making the playoffs.


What classifies a championship team?

The next question to analyze is what characteristics/features classify a championship team. Looking at the last 20 years of playoff data, we see that the following classification tree describes the championship criteria for a given NBA playoff team.


(The learning error rate was 1.172%, indicating an excellent fit to the data.) One sees that at the very top is a team opponent’s field goal percentage (OFG%). If the average per-game OFG% is greater than 44.95%, that team is predicted to not win a championship. Further, there are apparently three predicted paths to a championship:

  1. OFG% < 44.95 –> ORtg (Offensive Rating: points scored per 100 possessions) < 108.55 –> FT% < 73.5% –> Opponent Offensive Rebounds per game (OORB) < 30.2405 –> Personal Fouls per game (PF) < 24.1467
  2. OFG% < 44.95 –> ORtg > 108.55 –> O3P% < 32.45%
  3. OFG% < 44.95 –> ORtg > 108.55 –> O3P% > 32.45% –> AST > 19.9076 –> OAST < 19.0938

This shows once again that the three-point shot is not the deciding factor in winning a championship among playoff teams: shooting a lot of threes, or playing as a “modern” team, does not uniquely determine a team’s success. What is tremendously important is defense and offensive efficiency, and there are multiple ways to achieve these. One does not need to be a prolific three-point-shooting team to achieve these metrics.



The increasing trend of teams shooting more threes and playing at a higher pace still does not uniquely determine whether a team will make the playoffs or win a championship, which is why I have called it a “delusion”. Indeed, the common claim that “nowadays, teams that make the playoffs also have the highest number of three-point shot attempts” is a very shallow one; as this analysis clearly shows, it is not actually why teams make the playoffs. Nor is attempting more three-point shots uniquely indicative of a team’s success in winning a championship.

Ranking NBA Championship Teams

The first thing to note is that, just by looking at Basketball-Reference.com, there are 62 factors that characterize a team: MP FG FGA FG% 3P 3PA 3P% 2P 2PA 2P% FT FTA FT% ORB DRB TRB AST STL BLK TOV PF PTS OMP OFG OFGA OFG% O3P O3PA O3P% O2P O2PA O2P% OFT OFTA OFT% OORB ODRB OTRB OAST OSTL OBLK OTOV OPF OPTS PW PL MOV SOS SRS ORtg DRtg Pace FTr 3PAr eFG% TOV% ORB% FT/FGA eFG% TOV% DRB% FT/FGA, where OFGA indicates a given team’s opponent’s FGA per-game average for a specific season.
The reason it is not meaningful to look at a specific statistic, or a pair of statistics such as “three-point attempt rate”, is that

\boxed{\frac{62!}{2! \, 60!} = 1891}

possible pairwise comparisons can be made.
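This count is just “62 choose 2”, which can be checked directly:

```python
import math

# Number of distinct pairs among 62 team statistics: C(62, 2) = 62!/(2! 60!)
pairs = math.comb(62, 2)
print(pairs)   # 1891
```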

Because of this, a detailed statistical learning approach is required. I looked at the full-season statistics for the last twenty NBA champions, from the 1995-1996 Chicago Bulls to the 2014-2015 Golden State Warriors.

I employed principal component analysis (PCA) to reduce the number of dimensions and see which variables contribute most to the variance of the data set. I found that the first 7 of 20 principal components explained 88.52% of the variance. Therefore, we can effectively reduce the dimension of the data set from 63 to 7. This can be seen in the scree plot below:
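To make the notion of “variance explained” concrete, here is a toy sketch of the computation on a small made-up 2-D data set (not the actual 63-variable team data): the eigenvalues of the covariance matrix are the variances along the principal components, and each eigenvalue’s share of the total is its explained-variance fraction.

```python
# Toy illustration of variance explained by principal components,
# on a small 2-D dataset (illustrative values, not real team data).
import math

data = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2),
        (3.1, 3.0), (2.3, 2.7), (2.0, 1.6), (1.0, 1.1),
        (1.5, 1.6), (1.1, 0.9)]

n = len(data)
mx = sum(x for x, _ in data) / n
my = sum(y for _, y in data) / n

# Sample covariance matrix entries
sxx = sum((x - mx) ** 2 for x, _ in data) / (n - 1)
syy = sum((y - my) ** 2 for _, y in data) / (n - 1)
sxy = sum((x - mx) * (y - my) for x, y in data) / (n - 1)

# Eigenvalues of the 2x2 symmetric covariance matrix, in closed form.
# Each eigenvalue is the variance along one principal component.
tr, det = sxx + syy, sxx * syy - sxy ** 2
root = math.sqrt(tr ** 2 / 4 - det)
lam1, lam2 = tr / 2 + root, tr / 2 - root

explained = lam1 / (lam1 + lam2)   # fraction of variance on PC1
print(round(explained, 4))
```

A scree plot is just these explained-variance fractions plotted per component; one keeps components until the cumulative fraction is acceptably high (88.52% at 7 components in the analysis above).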

A visualization of the 63-variable data set is as follows:

A matrix visualization of the full 63-variable data set.
The power of principal component analysis reduces this high-dimensional data set to a more manageable (but perhaps still complicated) 7-dimensional data set, visualized as follows:

A visualization of the reduced-dimension dataset obtained via principle components analysis (PCA).
Next, I used the Euclidean distance metric to perform hierarchical clustering on these seven principal components. I obtained the following result:

NBA Championship teams from 1996-2015
We notice immediately that:

  1. The 2015 Golden State Warriors were very similar to the 2014 San Antonio Spurs.
  2. Not surprisingly, Phil Jackson’s 2000 and 2002 Lakers teams were very similar to each other but not to any other championship team, and similarly for his 2009 and 2010 Lakers teams.
  3. Interestingly, the two teams that stand out which are truly dissimilar to any other championship team are the 2008 Boston Celtics and the 1998 Chicago Bulls.

This analysis also eliminates the notion that a team has to play a specific style, for example “modern-day play”, to win a championship. In principle, there are many possible ways and styles that lead to a championship, and an analysis such as this one, probing the data deeply, shows this to be the case.
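For completeness, the agglomerative procedure behind a hierarchical clustering like the one above can be sketched in a few lines: repeatedly merge the two closest clusters under some linkage rule. This toy version uses single linkage and Euclidean distance on made-up 2-D points; the actual analysis clustered teams by their seven principal-component scores, but the procedure is the same.

```python
# Toy sketch of agglomerative hierarchical clustering with
# Euclidean distance and single linkage, on made-up 2-D points.
import math

def linkage(a, b):
    """Single-linkage distance: closest pair of points across clusters."""
    return min(math.dist(p, q) for p in a for q in b)

def agglomerate(points, k):
    """Repeatedly merge the two closest clusters until k remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))   # merge cluster j into i
    return clusters

pts = [(0, 0), (0.2, 0.1), (5, 5), (5.1, 4.9), (9, 0)]
clusters3 = agglomerate(pts, 3)
```

Recording the order and heights of the merges is exactly what a dendrogram visualizes.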

Ranking NBA Players

The 2015-2016 NBA season is upon us, and ESPN has been doing its usual #NBArank, ranking players based on the following non-rigorous methodology:

We asked, “Which player will be better in 2015-16?” To decide, voters had to consider both the quality and quantity of each player’s contributions to his team’s ability to win games. More than 100 voters weighed in on nearly 30,000 pairs of players.

Of course, while I suspect this sort of thing is just for fun, it has generated a great deal of controversy, with many arguments ensuing between fans. For example, Kobe Bryant being ranked 93rd overall in the NBA this year drew a fair deal of criticism from Stephen A. Smith on ESPN First Take.

In general, at least to me, it does not make any sense to rank players from different positions that bring different strengths to a team sport such as basketball. That is, what does it really mean for Tim Duncan to be better than Russell Westbrook (or vice-versa), or Kevin Love to be better than Mike Conley (or vice-versa), etc…

From a mathematical/data-science perspective, the only sensible thing to do is to take all the players in the league and apply a clustering algorithm such as K-means to group players of similar talents and contributions. This is not a trivial thing to do, but it is the sort of thing data scientists do all the time! For this analysis, I went to Basketball-Reference.com and pulled last season’s (2014-2015) per-game averages for every player in the league, looking at 25 statistical factors, from FGA and FG% to STL, BLK, and TOV. One can see that this is a 25-dimensional problem.

Denoting by C_{1}, \ldots, C_{K} the sets containing the observations in each cluster, our goal is to solve the optimization problem:

\mbox{minimize}_{C_{1},\ldots,C_{K}} \left\{\sum_{k=1}^{K} W(C_{k})\right\},

where W(C_{k}) is a measure of within-cluster variation. Using the squared Euclidean distance to define W, we then solve:

\mbox{minimize}_{C_{1},\ldots,C_{K}} \left\{\sum_{k=1}^{K} \frac{1}{|C_{k}|} \sum_{i, i' \in C_{k}} \sum_{j=1}^{p} (x_{ij} - x_{i'j})^{2}\right\}.
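This optimization is typically attacked with Lloyd’s algorithm: alternate between assigning each point to its nearest center and moving each center to the mean of its assigned points. A minimal pure-Python sketch, on toy 2-D points rather than the real 25-dimensional player data:

```python
# Minimal sketch of Lloyd's algorithm for K-means, minimizing the
# within-cluster sum of squares (WSS) with squared Euclidean distance.
# Toy 2-D points; the real player data is 25-dimensional.
import math

def kmeans(points, centers, iters=20):
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        groups = [[] for _ in centers]
        for p in points:
            i = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            groups[i].append(p)
        # Update step: move each center to the mean of its group.
        centers = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    # Within-cluster sum of squares for the final assignment.
    wss = sum(math.dist(p, centers[i]) ** 2
              for i, g in enumerate(groups) for p in g)
    return centers, groups, wss

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, groups, wss = kmeans(pts, centers=[(0, 0), (10, 10)])
```

Running this for several values of K and plotting the resulting WSS against K is exactly how the within-sum-of-squares (“elbow”) plot below is produced.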


The first thing to do is to decide how many clusters we want to use in our solution. This is done by looking at the within sum of squares (WSS) plot:


First, we will use 3 clusters in our K-means solution. (In this case, the between sum of squares versus total sum of squares ratio was 77.0%, indicating a good “fit”.) We begin with three clusters because, on visual inspection, the data clusters very nicely into three groups. The plots obtained were as follows:


The three clusters of players can be found in the following PDF File. Note that the blue circles represent Cluster 1, the red circles represent Cluster 2, and the green circles represent Cluster 3.

Next, we dramatically increase the number of clusters to 20 in our K-means solution.

Performing the K-means clustering, we obtain the following sets of scatter plots. (Note that it is a bit difficult to display a 25×25 plot here, so I have split it into a series of plots. Note also that the between sum of squares versus total sum of squares ratio was 94.8%, indicating a good “fit”.)



The cluster behaviour can be seen more clearly in three dimensions. We now display some examples:


 The 20 groups of players we obtained can be seen in the PDF file linked below:


The legend for the clusters obtained was:


Two sample clusters from our analysis are displayed in the table below. It is interesting that the algorithm placed Carmelo Anthony and Kobe Bryant in one group/cluster, while LaMarcus Aldridge, LeBron James, and Dwyane Wade belong in another.

Group 16 Group 19
Arron.Afflalo.1 Steven.Adams
Carmelo.Anthony LaMarcus.Aldridge
Patrick.Beverley Bradley.Beal
Chris.Bosh Andrew.Bogut
Kobe.Bryant Jimmy.Butler
Jose.Calderon DeMarre.Carroll
Michael.Carter.Williams.1 Michael.Carter.Williams
Darren.Collison Mike.Conley
Goran.Dragic.1 DeMarcus.Cousins
Langston.Galloway Anthony.Davis
Kevin.Garnett DeMar.DeRozan
Kevin.Garnett.1 Mike.Dunleavy
Jeff.Green.2 Rudy.Gay
George.Hill Eric.Gordon
Jrue.Holiday Blake.Griffin
Dwight.Howard Tobias.Harris
Brandon.Jennings Nene.Hilario
Enes.Kanter.1 Jordan.Hill
Michael.Kidd.Gilchrist Serge.Ibaka
Brandon.Knight.1 LeBron.James
Kevin.Martin Al.Jefferson
Timofey.Mozgov.2 Wesley.Johnson
Rajon.Rondo.2 Brandon.Knight
Derrick.Rose Kawhi.Leonard
J.R..Smith.2 Robin.Lopez
Jared.Sullinger Kyle.Lowry
Thaddeus.Young.1 Wesley.Matthews

If we use more clusters, players will obviously be placed into smaller groups. The following clustering results can be seen in the linked PDF files.

  1. 50 Clusters – (between_SS / total_SS =  97.4 %) – PDF File
  2. 70 Clusters – (between_SS / total_SS =  97.8 %) – PDF File
  3. 100 Clusters – (between_SS / total_SS =  98.3 %) – PDF File
  4. 200 Clusters (extreme case) – (between_SS / total_SS =  99.1 %) – PDF File

I did not include visualizations for these computations because they are quite difficult to display.

Looking at the 100 Clusters file, we see two interesting results:

  • In Cluster 16, we have: Carmelo Anthony, Chris Bosh, Kobe Bryant and Kevin Martin
  • In Cluster 74, we have: LaMarcus Aldridge, Anthony Davis, Rudy Gay, Blake Griffin, LeBron James and Russell Westbrook


We therefore see that it does not make much mathematical/statistical sense to compare arbitrary pairs of players. In my opinion, the only logical way to rank players is to rank within clusters. So, based on the above analysis, it makes sense to ask, for example, whether Carmelo is a better player than Kobe, or whether LeBron is a better player than Westbrook, etc… But, based on last season’s statistics, it doesn’t make much sense to ask whether Kobe is a better player than Westbrook, because they were placed in different clusters. I think ESPN could benefit tremendously from using a rigorous approach to these rankings, which spark many conversations because many people take them seriously.

Canadian Federal Election Predictions for 10/19/2015

Tomorrow is the date of the Canadian Federal Elections. Here are my predictions for the outcome:


That is, I predict the Liberals will win, with the NDP trailing far behind both parties.

Do More Gun Laws Prevent Gun Violence?

Update: March 16, 2018: I have received quite a few comments about my critique of Volokh’s WaPo article, and just as a summary of my reply back to those comments:

The main point I made and demonstrated below is that a correlation is only useful as a measure of linearity between the two variables being compared. ALL of the correlations Volokh computes are close to zero: 0.032 between the homicide rate (including gun accidents) and the Brady score, 0.065 between the intentional homicide rate and the Brady score, 0.0178 between the homicide rate (including gun accidents) and the National Journal score, and 0.0511 between the intentional homicide rate alone and the National Journal score. All of these numbers are completely *useless*. You cannot conclude anything from them; all you can conclude is that the relationship between the homicide rate (with or without gun accidents) and these scores is highly nonlinear. Since the relationships are nonlinear, I investigated them using data-science methodologies such as regression trees.

Article begins below:


  1. The number and quality of gun-control laws a state has drastically affects the number of gun-related deaths.
  2. Other factors, like median household income, play a smaller role in the number of gun-related deaths.
  3. Factors like the amount of money a state spends on mental-health care have a negligible effect on the number of gun-related deaths. This point is quite important, as a number of policy-makers consistently argue that the focus needs to be on the mentally ill and that this will curb the number of gun-related deaths.


  1. Critique of Recent Gun-Control Opposition Studies
  2. A more correct way to look at the Gun Deaths data using data science methodologies.

A Critique of Recent Gun-Control Opposition Studies

In light of the recent tragedy in Oregon, part of a disturbing upward trend in gun violence in the United States, we are once again in the aftermath: President Obama and most Democrats are advocating for more gun laws that they claim would help decrease gun violence, while their Republican counterparts argue, as usual, the precise opposite. Indeed, two very simplified “studies” presented in the media thus far have been cited frequently by gun advocates:

  1. Glenn Kessler’s so-called Fact-Checker Article
  2. Eugene Volokh’s opinion article in The Washington Post

I have singled out these two examples, but most of the studies claiming to “do statistics” follow a similar methodology, so I have listed them here. It should be noted that these studies are extremely simplified: they compute correlations while looking at only two factors (the gun death rate and a state’s “Brady grade”). As I show below, the question of interest, and any attempt to untangle causation and correlation, must depend on several state-dependent factors and hence requires deeper statistical learning methodologies, of which NONE of the second-amendment advocates seem to be aware.

The reason one cannot deduce anything significant from correlations, as is done in Volokh’s article, is that correlation coefficients are good “summary statistics” but hardly tell you anything deep about the data you are working with. For example, in Volokh’s article, he uses MS Excel to compute the correlations between pairs of variables, but Excel uses the Pearson correlation coefficient, which is essentially a measure of the linearity between two variables. If the underlying data exhibit a nonlinear relationship, the correlation coefficient will return a small value; this in no way means there is no relationship between the variables, only that it is not linear. Similarly, other correlation coefficients make other assumptions about the data, such as normality, which is strange to assume from the outset. (There is also the more technical issue that a state’s Brady grade is not exactly a random variable, so measuring the correlation between a supposed random variable (the number of homicides) and a non-random variable is not exactly a sound idea.)

A simple example of where the correlation calculation fails: consider two variables, x and y, with the following data:

x              y
-1.0000  0.2420
-0.9000  0.2661
-0.8000  0.2897
-0.7000  0.3123
-0.6000  0.3332
-0.5000  0.3521
-0.4000  0.3683
-0.3000  0.3814
-0.2000  0.3910
-0.1000  0.3970
0            0.3989
0.1000  0.3970
0.2000  0.3910
0.3000  0.3814
0.4000  0.3683
0.5000  0.3521
0.6000  0.3332
0.7000  0.3123
0.8000  0.2897
0.9000  0.2661
1.0000  0.2420

If one computes the correlation between x and y, one obtains a correlation coefficient of zero! (Try it!) A naive conclusion would be that there is therefore no dependence between x and y. But if one now makes a scatter plot of x and y, one gets:


Despite having zero correlation, there is clearly a very strong relationship between x and y. In fact, one can show that they obey the following relationship:

y = \frac{1}{\sqrt{2 \pi}} e^{-(x^2)/2},

that is, y is the standard normal density. So, in this example and in similar examples where there is a strong nonlinear relationship between two variables, the correlation, in particular the Pearson correlation, is meaningless. Strangely, despite this, Volokh uses the near-zero correlation of his data to argue that there is no relationship between a state’s gun score and the number of gun-related deaths, but this is not what his results show! He is misinterpreting his calculations.
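This example is easy to reproduce: the table above is exactly the standard normal density sampled on [-1, 1], and a direct Pearson computation returns (numerically) zero despite the obvious bell-shaped relationship.

```python
# Reproducing the example above: a symmetric, clearly nonlinear
# relationship whose Pearson correlation is (numerically) zero.
import math

xs = [i / 10 for i in range(-10, 11)]            # -1.0, -0.9, ..., 1.0
ys = [math.exp(-x * x / 2) / math.sqrt(2 * math.pi) for x in xs]

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

r = pearson(xs, ys)
print(abs(r) < 1e-9)   # True: no linear association, strong nonlinear one
```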

Indeed, looking at Volokh’s specific example of comparing the Brady score to the number of Homicides, one gets the following scatter plot:


Volokh computes the Pearson correlation between the two variables and obtains a result of 0.0323, quite close to zero, which leads him to conclude that there is no relationship between the two. But this is not what the result means: it only says there is no linear relationship; any relationship present must be nonlinear. Even a rough analysis of the two variables (and, as I’ve said above and demonstrate below, looking at just two variables per state is hardly useful, but for argument’s sake) suggests an approximate sinusoidal relationship between them:


In fact, the fit shown is an 8-term sum-of-sines curve with an R^2 of 0.5322. So it is not great, but there is at least some systematic behaviour relating the two variables. I will say again, though, that due to the clustering of points near zero on the x-axis above, no single-valued function will fit the points: the data contain repeated x-values with different y-values, which is problematic. So looking at two variables is not very useful, and what this calculation shows is that any relationship, if there is one, would be strongly nonlinear, so measuring the Pearson correlation doesn’t make any sense.

Therefore, one requires a much deeper analysis, which we attempt to provide below.

A more correct way to look at the Gun Homicide data using data science methodologies.

I wanted to analyze, using data-science methodologies, which side is correct. Due to limited time, I was only able to look at data from previous years (2010-2014), comparing state-by-state:

  1. # of Firearm deaths per 100,000 people (Data from: http://kff.org/other/state-indicator/firearms-death-rate-per-100000/)
  2. Total State Population (Obtained from Wikipedia)
  3. Population Density / Square Mile (Obtained from Wikipedia)
  4. Median Household Income (Obtained from Wikipedia)
  5. Gun Law Grade: This data was obtained from http://gunlawscorecard.org/, The Law Center to Prevent Gun Violence, which grades each state based on the number and quality of its gun laws using letter grades (A+, A, B+, F, etc.). To use this data in the data-science algorithms, I converted each letter grade to a numerical grade on the following scale: A+: 90, A-: 90, A: 85, B+: 77, B: 73, B-: 70, C+: 67, C: 63, C-: 60, D+: 57, D: 53, D-: 50, F: 0.
  6. State Mental Health Agency Per Capita Mental Health Services Expenditures (Obtained from: http://kff.org/other/state-indicator/smha-expenditures-per-capita/#table)
(Some data was available for some years and not for others, so there are very slight percentage changes from year to year; overall, this should have a negligible effect on the results.)
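The letter-to-number conversion described in item 5 can be written directly as a lookup table; this is just the mapping stated above, including the A+/A- values exactly as given.

```python
# Letter-grade to numeric-grade conversion, as stated in the text.
# (The A+ and A- values are both 90, as given above.)
GRADE_SCORE = {
    "A+": 90, "A": 85, "A-": 90,
    "B+": 77, "B": 73, "B-": 70,
    "C+": 67, "C": 63, "C-": 60,
    "D+": 57, "D": 53, "D-": 50,
    "F": 0,
}

def gun_law_grade(letter):
    """Map a Law Center letter grade to the numeric scale used here."""
    return GRADE_SCORE[letter]

print(gun_law_grade("B+"))   # 77
```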

This is what I found.

Using a boosted regression tree algorithm, I wanted to find which factors contribute most to the number of firearm deaths per 100,000 people, and found:


(The above numbers were calculated from a gradient boosted model with a Gaussian loss function; 5000 iterations were performed.)
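For readers curious what “gradient boosting with a Gaussian loss” means operationally: each round fits a small tree to the current residuals (the negative gradient of the squared-error loss) and adds a damped copy of it to the model. Here is a bare-bones sketch using depth-1 stumps on a made-up one-feature data set; the actual model used many predictors and 5000 iterations.

```python
# Bare-bones gradient boosting with a Gaussian (squared-error) loss:
# each round fits a depth-1 regression stump to the residuals.
# Toy one-feature data, for illustration only.
def fit_stump(xs, ys):
    """Best single split minimizing squared error."""
    best = None
    for split in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= split]
        right = [y for x, y in zip(xs, ys) if x > split]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((y - lm) ** 2 for y in left)
               + sum((y - rm) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, split, lm, rm)
    _, split, lm, rm = best
    return lambda x: lm if x <= split else rm

def boost(xs, ys, rounds=50, lr=0.1):
    pred = [0.0] * len(xs)
    stumps = []
    for _ in range(rounds):
        resid = [y - p for y, p in zip(ys, pred)]   # negative gradient
        stump = fit_stump(xs, resid)
        stumps.append(stump)
        pred = [p + lr * stump(x) for p, x in zip(pred, xs)]
    return lambda x: sum(lr * s(x) for s in stumps)

xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [1, 1, 1, 1, 9, 9, 9, 9]   # a clean step for the stumps to learn
model = boost(xs, ys)
```

The “relative influence” numbers reported by boosted-tree packages come from tallying how much each predictor’s splits reduce the loss across all of these rounds.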

One sees right away that the quality and number of gun laws a state has is the overwhelming factor in the number of gun-related deaths, with the amount of money a state spends on mental health services having a negligible effect.

Next, I created a regression tree to analyze this problem further. I found the following:


The numbers in the last level of the tree indicate the predicted number of gun-related deaths. One sees once again that where an individual state’s gun-law grade is above 73.5, that is, higher than a “B”, the number of gun-related deaths is at its lowest, at a predicted 5.7 / 100,000 people. (Note that the sum-of-squares error for this regression was found to be 3.838.) Interestingly, the regression tree also predicts that the highest numbers of gun-related deaths all occur for states that score an “F”!

In fact, using principal component analysis (PCA) and plotting the first two principal components, we find that:


One sees from this PCA that states with a high gun-law grade have a low death rate.

Finally, using K-means clustering, I found the following:


One sees from the above results, the states that have a very low “Gun Law grade” are clustered together in having the highest firearms death rate. (See the fourth column in this matrix). That is, zooming in:


What about Suicides? 

This question has been raised many times because the gun-deaths figure above includes self-inflicted gun deaths. The argument has been that if we filter out these data, the arguments in this article fall apart. As I now show, this is not the case. Using the state-by-state firearm suicide rate from (http://archinte.jamanetwork.com/article.aspx?articleid=1661390), I performed this filtering and obtained the following principal component analysis biplot:


One sees that the PCA puts approximately equal weight (loadings) on population density, gun-law grade, and median household income. It is quite clear that states with a very high gun-law grade have few gun murders, and vice versa.

One sees that the data show a very large anti-correlation between a state’s gun-law grade and the death rate. There is also a very small anti-correlation between how much a state spends on mental-health care and the death rate.

Therefore, the conclusions one can draw immediately are:

  1. The number and quality of gun-control laws a state has drastically affects the number of gun-related deaths.
  2. Other factors, like median household income, play a smaller role in the number of gun-related deaths.
  3. Factors like the amount of money a state spends on mental-health care have a negligible effect on the number of gun-related deaths. This point is quite important, as a number of policy-makers consistently argue that the focus needs to be on the mentally ill and that this will curb the number of gun-related deaths.
  4. It would be interesting to apply these methodologies to data from other years. I will perhaps pursue this at a later time.

Let’s not go overboard with this Trump stuff! 

It has certainly become the talk of the town, with some of the latest polls showing Donald Trump leading Hillary Clinton in a hypothetical 2016 matchup.

I decided to run my polling algorithm to simulate 100,000 election matchups between Clinton and Trump. I calibrated my model using a variety of data sources.
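The general shape of such a simulation is easy to sketch: draw each candidate’s vote share from a distribution around a poll average and count who wins across many simulated matchups. The poll means and error below are made-up placeholders for illustration, not my actual calibration.

```python
# Sketch of a Monte Carlo matchup simulation: draw each candidate's
# vote share from a normal distribution around a poll average and
# count wins. Poll numbers and error are hypothetical placeholders.
import random

random.seed(42)   # reproducible runs

def simulate(clinton_mean, trump_mean, sd, n=100_000):
    clinton_wins = 0
    for _ in range(n):
        c = random.gauss(clinton_mean, sd)   # Clinton vote share draw
        t = random.gauss(trump_mean, sd)     # Trump vote share draw
        if c > t:
            clinton_wins += 1
    return clinton_wins / n

# Hypothetical inputs: 46% vs 44% poll averages, 3-point polling error.
share = simulate(clinton_mean=46.0, trump_mean=44.0, sd=3.0)
```

With these placeholder inputs, Clinton wins roughly two-thirds of the simulated matchups; a real calibration would set the means and error from actual polling data.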

These were the results:

Based on these simulations, I conclude that:

I think in the era of the 24-hour news cycle, too much is made of one poll.