## The Three-Point Shot Delusion

Many NBA analysts today claim that the NBA has changed: it has become more fast-paced, and there is a significantly greater emphasis on teams attempting three-point shots. The evidence usually offered for this is the repeated observation that, over the last several years, the average three-point attempt rate has increased. An example of such an article can be found here.

My hypothesis is that this is all based on a very shallow analysis of what is actually going on. In particular, Basketball-Reference.com provides more than 60 variables that describe each team’s play, so it seems strange that analysts have picked out one statistic, noticed a trend, and drawn conclusions ushering in the “modern-day” NBA. As I will demonstrate below using concepts from statistical learning and machine learning, many things have been missed in these analyses. Stranger still, there has been an increasing number of articles claiming that, for example, if teams do not shoot more three-point shots, they will probably not make the playoffs or win a championship. Examples of such articles can be found here, here, and here.

I will now demonstrate why all of these analyses are incomplete, and why their conclusions are wholly incorrect.

Using the great service provided by Basketball-Reference.com, I looked at the last 15 seasons for every NBA team, using more than 60 predictor variables that describe each team’s performance in a season. Some of these included: MP, FG, FGA, FG%, 3P, 3PA, 3P%, 2P, 2PA, 2P%, FT, FTA, FT%, ORB, DRB, TRB, AST, STL, BLK, TOV, PF, PTS, PTS/G, oG, oMP, oFG, oFGA, oFG%, o3P, o3PA, o3P%, o2P, o2PA, o2P%, oFT, oFTA, oFT%, oORB, oDRB, oTRB, oAST, oSTL, oBLK, oTOV, oPF, oPTS, oPTS/G, MOV, SOS, SRS, ORtg, DRtg, Pace, FTr, 3PAr, the offensive Four Factors TOV%, ORB%, and FT/FGA, and the defensive Four Factors TOV%, DRB%, and FT/FGA, where a lowercase “o” prefix denotes the statistics of a team’s opponents.

### What classifies a playoff team?

To analyze which factors specifically lead to a team making the playoffs in a given season, I built a classification tree. I found the following:

(For this classification tree, the misclassification error rate was 2.73%, indicating a good fit to the data.)
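The post does not say which software or splitting criterion produced the tree, so the following is only a rough sketch of how a single split can be chosen: scan candidate thresholds on one predictor (here MOV), taking midpoints between consecutive observed values, and keep the one that minimizes the size-weighted Gini impurity of the two resulting groups. Gini is the default criterion in CART-style implementations such as R’s rpart; the actual tree may have used a different one. The sketch also computes the misclassification error rate, i.e. the fraction of team-seasons the rule gets wrong on the data it was fit to. The MOV values are made-up toy numbers, not Basketball-Reference data, and the function names are illustrative.

```python
from typing import List, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of 0/1 labels (1 = made the playoffs, 0 = missed)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(x: List[float], y: List[int]) -> Tuple[float, float]:
    """Best single split of the form `x > threshold`.

    Candidate thresholds are midpoints between consecutive distinct values of x;
    the split with the lowest size-weighted Gini impurity of the two children wins.
    Returns (threshold, weighted_impurity).
    """
    values = sorted(set(x))
    best_t, best_score = float("nan"), float("inf")
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2.0
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score


def misclassification_rate(predicted: List[int], actual: List[int]) -> float:
    """Fraction of team-seasons whose predicted playoff outcome differs from the actual one."""
    return sum(p != a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical toy data: average MOV for six made-up team-seasons, and whether each made the playoffs.
mov = [-6.1, -3.0, -1.2, 0.8, 3.5, 7.4]
made_playoffs = [0, 0, 0, 1, 1, 1]

threshold, impurity = best_split(mov, made_playoffs)
predicted = [1 if m > threshold else 0 for m in mov]
print(threshold, impurity, misclassification_rate(predicted, made_playoffs))
```

A full tree repeats this search over every predictor and then recursively within each resulting group. Note that an error rate computed on the same seasons used to build the tree measures how well it fits those seasons, not how well it would predict new ones.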

At the top of the tree, we see that the distinguishing factor is MOV, the average margin of victory per game. Teams that on average beat their opponents by more than 2.695 points are predicted to make the playoffs, while teams that on average lose by more than 1.825 points are predicted not to make the playoffs. Further, the only factor relating to three-point shooting in this entire classification tree is o3PA, the number of three-point attempts per game by a team’s opponents. For example, suppose a team has an average MOV between -1.825 and -0.54. If that team’s opponents attempt more than 16.0732 three-point shots per game, the team is predicted to make the playoffs. In this particular case, getting your opponents to take a lot of three-point shots is indeed desirable, and leads to the expectation that the team makes the playoffs.
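The branches described in the paragraph above can be written down directly as rules. This is a partial sketch: it encodes only the branches the text spells out and returns `None` for parts of the tree that are not described. The function name and the `None` convention are illustrative choices, not the fitted tree itself.

```python
from typing import Optional

# Thresholds quoted in the text.
MOV_PLAYOFF = 2.695       # win by more than this on average -> playoffs
MOV_NO_PLAYOFF = -1.825   # lose by more than 1.825 on average -> no playoffs
MOV_MIDDLE_UPPER = -0.54  # upper end of the MOV band used in the o3PA example
O3PA_THRESHOLD = 16.0732  # opponents' three-point attempts per game


def predict_playoffs(mov: float, o3pa: float) -> Optional[bool]:
    """Partial rule set read off the playoff tree described in the text.

    mov:  average margin of victory per game
    o3pa: opponents' three-point attempts per game
    Returns True/False for the branches the text describes, None otherwise.
    """
    if mov > MOV_PLAYOFF:
        return True
    if mov < MOV_NO_PLAYOFF:
        return False
    if MOV_NO_PLAYOFF < mov < MOV_MIDDLE_UPPER and o3pa > O3PA_THRESHOLD:
        return True
    return None  # branch not described in the text


print(predict_playoffs(-1.0, 17.0))  # True: narrow losing margin, opponents shoot many threes
```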

### What classifies a championship team?

The next question to analyze is which characteristics classify a championship team. Using the last 20 years of playoff data, we obtain the following classification tree, which describes the championship criteria for a given NBA playoff team.

(The training misclassification error rate was 1.172%, indicating an excellent fit to the data.) At the very top of the tree is the opponents’ field goal percentage (oFG%). If a team’s average per-game oFG% is greater than 44.95%, that team is predicted not to win a championship. Further, the tree gives three paths to a championship:

1. oFG% < 44.95% → ORtg (offensive rating: points scored per 100 possessions) < 108.55 → FT% < 73.5% → opponent offensive rebounds per game (oORB) < 30.2405 → personal fouls per game (PF) < 24.1467
2. oFG% < 44.95% → ORtg > 108.55 → opponent three-point percentage (o3P%) < 32.45%
3. oFG% < 44.95% → ORtg > 108.55 → o3P% > 32.45% → assists per game (AST) > 19.9076 → opponent assists per game (oAST) < 19.0938
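The three championship paths listed above can likewise be encoded as a single function. This is a sketch under stated assumptions: every leaf not on one of the three paths is taken to predict “no championship” (which is how the text describes the tree), all percentages are on a 0-100 scale to match the quoted thresholds, and ties at a threshold are handled arbitrarily because the text does not say. The dataclass and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PlayoffTeamSeason:
    # Hypothetical container; percentages on a 0-100 scale (e.g. 44.2, not 0.442).
    ofg_pct: float  # opponents' field goal percentage (oFG%)
    ortg: float     # ORtg, as used in the tree
    ft_pct: float   # free throw percentage (FT%)
    oorb: float     # oORB, as used in the tree
    pf: float       # personal fouls per game (PF)
    o3p_pct: float  # opponents' three-point percentage (o3P%)
    ast: float      # assists per game (AST)
    oast: float     # opponents' assists per game (oAST)


def predict_champion(s: PlayoffTeamSeason) -> bool:
    """True if the season follows one of the three championship paths in the text."""
    if s.ofg_pct >= 44.95:
        return False  # top split: opponents shoot too well -> predicted not to win
    if s.ortg < 108.55:
        # Path 1
        return s.ft_pct < 73.5 and s.oorb < 30.2405 and s.pf < 24.1467
    if s.o3p_pct < 32.45:
        return True  # Path 2
    # Path 3
    return s.ast > 19.9076 and s.oast < 19.0938
```

Notice that none of the inputs is the team’s own three-point attempt count.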

This shows once again that a team’s own three-point shooting is not relevant to winning a championship among playoff teams: none of the three paths involves a team’s own three-point attempts or makes, so shooting a lot of threes, or playing as a “modern” team, does not uniquely determine a team’s success. The only three-point variable in the tree is the opponents’ three-point percentage, which measures three-point defense. What is tremendously important is defense and offensive efficiency, and there are multiple ways to achieve these. One does not need to be a prolific three-point shooting team to achieve these metrics.

### Conclusions

The increasing trend of teams shooting more threes and playing at a higher pace still does not uniquely determine whether a team will make the playoffs or win a championship, which is why I have called it a “delusion”. Indeed, the common statement that “nowadays, teams that make the playoffs also have the highest number of three-point shot attempts” is a very shallow one, and, as this analysis shows, three-point volume is not actually why teams make the playoffs. Further, attempting more three-point shots is not, on its own, indicative of a team’s success in winning a championship.

## Ranking NBA Players

The 2015-2016 NBA season is almost upon us, and, as usual, ESPN has been running its #NBArank series, ranking players based on the following non-rigorous methodology:

> We asked, “Which player will be better in 2015-16?” To decide, voters had to consider both the quality and quantity of each player’s contributions to his team’s ability to win games. More than 100 voters weighed in on nearly 30,000 pairs of players.

Of course, while I suspect this type of ranking is meant to be just for fun, it has generated a great deal of controversy, with many arguments ensuing between fans. For example, Kobe Bryant’s ranking of 93rd overall this year drew a fair amount of criticism from Stephen A. Smith on ESPN First Take.

In general, at least to me, it does not make any sense to rank players who play different positions and bring different strengths to a team sport such as basketball. That is, what does it really mean for Tim Duncan to be better than Russell Westbrook (or vice versa), or for Kevin Love to be better than Mike Conley (or vice versa)?

From a mathematical/data science perspective, the only sensible thing to do is to take all the players in the league and apply a clustering algorithm, such as K-means clustering, to group players with similar talents and contributions. This is not a trivial thing to do, but it is the sort of thing that data scientists do all the time! For this analysis, I went to Basketball-Reference.com and pulled last season’s (2014-2015) per-game averages for every player in the league, looking at 25 statistical factors ranging from FGA and FG% to STL, BLK, and TOV. Each player is therefore a point in a 25-dimensional space, which makes this a 25-dimensional problem.

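Our goal, then, is the following. Denoting by $C_{1}, \ldots, C_{K}$ the sets containing the observations (players) in each of the $K$ clusters, we want to solve the optimization problem

$\mbox{minimize}_{C_{1},\ldots,C_{K}} \left\{\sum_{k=1}^{K} W(C_{k})\right\}$,

where $W(C_{k})$ is the within-cluster variation of cluster $C_{k}$. We use the squared Euclidean distance to define the within-cluster variation,

$W(C_{k}) = \frac{1}{|C_{k}|} \sum_{i,i' \in C_{k}} \sum_{j=1}^{p} (x_{ij} - x_{i'j})^{2}$,

where $x_{ij}$ is the value of the $j$th statistic for player $i$, $p = 25$ is the number of statistics, and $|C_{k}|$ is the number of players in $C_{k}$. We then solve

$\mbox{minimize}_{C_{1},\ldots,C_{K}} \left\{\sum_{k=1}^{K} \frac{1}{|C_{k}|} \sum_{i,i' \in C_{k}} \sum_{j=1}^{p} (x_{ij} - x_{i'j})^{2}\right\}$.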
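For readers who want to see this optimization in code, here is a minimal NumPy sketch of Lloyd’s algorithm, the standard heuristic for K-means. It only finds a local (not necessarily global) minimum, and the post does not say which implementation was actually used. The sketch assumes the within-cluster variation $W(C_k) = \frac{1}{|C_k|}\sum_{i,i' \in C_k}\sum_{j=1}^{p}(x_{ij}-x_{i'j})^2$ and checks numerically the identity $W(C_k) = 2\sum_{i \in C_k}\sum_{j=1}^{p}(x_{ij}-\bar{x}_{kj})^2$, where $\bar{x}_{kj}$ is the cluster mean of statistic $j$. This identity is why moving each centroid to its cluster mean never increases the objective. The data are random placeholders and the function names are illustrative.

```python
import numpy as np


def within_cluster_variation(points: np.ndarray) -> float:
    """W(C_k): (1/|C_k|) times the sum of squared Euclidean distances
    over all ordered pairs (i, i') of observations in the cluster."""
    diffs = points[:, None, :] - points[None, :, :]
    return float((diffs ** 2).sum() / len(points))


def nearest_centroid(X: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Index of the closest centroid (squared Euclidean distance) for every row of X."""
    return ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)


def kmeans(X: np.ndarray, k: int, n_iter: int = 100, seed: int = 0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid = cluster mean."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = nearest_centroid(X, centroids)
        # Move each centroid to the mean of its members (keep it in place if the cluster is empty).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return nearest_centroid(X, centroids), centroids


# Check the identity on 10 hypothetical players with 25 statistics.
rng = np.random.default_rng(42)
cluster = rng.normal(size=(10, 25))
pairwise = within_cluster_variation(cluster)
to_mean = 2 * ((cluster - cluster.mean(axis=0)) ** 2).sum()
print(np.isclose(pairwise, to_mean))  # True
```

Because the result depends on the random starting centroids, in practice one runs the algorithm from several starts and keeps the solution with the smallest objective.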

The first thing to do is to decide how many clusters to use in our solution. This is done by looking at a plot of the within-cluster sum of squares (WSS) against the number of clusters:
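A minimal sketch of how the numbers behind a WSS (“elbow”) plot can be computed: run K-means for each candidate number of clusters K, keep the best of several random restarts, and record the within-cluster sum of squares. One then looks for the K beyond which WSS stops dropping sharply. The data here are random placeholders, and the helper names and restart count are illustrative assumptions.

```python
import numpy as np


def wss(X: np.ndarray, labels: np.ndarray) -> float:
    """Within-cluster sum of squares: squared distances of each point to its cluster mean."""
    return float(sum(((X[labels == j] - X[labels == j].mean(axis=0)) ** 2).sum()
                     for j in np.unique(labels)))


def lloyd(X: np.ndarray, k: int, rng: np.random.Generator, n_iter: int = 100) -> np.ndarray:
    """One run of Lloyd's K-means algorithm from random initial centroids; returns labels."""
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centroids[None]) ** 2).sum(axis=2).argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return ((X[:, None, :] - centroids[None]) ** 2).sum(axis=2).argmin(axis=1)


def wss_curve(X: np.ndarray, max_k: int, n_restarts: int = 10, seed: int = 0) -> list:
    """Best WSS over several random restarts for K = 1, ..., max_k (the values an elbow plot shows)."""
    rng = np.random.default_rng(seed)
    return [min(wss(X, lloyd(X, k, rng)) for _ in range(n_restarts))
            for k in range(1, max_k + 1)]


# Placeholder for a players-by-statistics matrix (rows = players, columns = 25 statistics).
X = np.random.default_rng(0).normal(size=(60, 25))
for k, value in enumerate(wss_curve(X, 8), start=1):
    print(k, round(value, 1))
```

With K = 1 the WSS equals the total sum of squares, and with one cluster per player it drops to zero, so the curve is read for where its slope flattens rather than for its minimum.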

First, we use 3 clusters in our K-means solution. We begin with three clusters because, based on visual inspection, the data cluster very nicely into 3 groups. In this case, the ratio of the between-cluster sum of squares to the total sum of squares (between_SS / total_SS) was 77.0%, indicating a good “fit”. The plots obtained were as follows:
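As a sketch of what the between_SS / total_SS figure measures: the total sum of squares (squared distances of all players to the overall mean) splits exactly into a within-cluster part and a between-cluster part (cluster size times the squared distance of each cluster mean to the overall mean). The label between_SS / total_SS matches the summary line printed by R’s `kmeans()`, though the post does not say which software was used. The function name and toy data below are illustrative.

```python
import numpy as np


def ss_decomposition(X: np.ndarray, labels: np.ndarray):
    """Return (total_SS, within_SS, between_SS) for a given clustering.

    total_SS   : squared distances of all points to the overall mean
    within_SS  : squared distances of points to their own cluster mean
    between_SS : cluster size times squared distance of each cluster mean to the overall mean
    """
    overall = X.mean(axis=0)
    total_ss = float(((X - overall) ** 2).sum())
    within_ss = between_ss = 0.0
    for j in np.unique(labels):
        members = X[labels == j]
        centre = members.mean(axis=0)
        within_ss += float(((members - centre) ** 2).sum())
        between_ss += len(members) * float(((centre - overall) ** 2).sum())
    return total_ss, within_ss, between_ss


# Toy example: four hypothetical one-statistic "players" split into two clusters.
X = np.array([[0.0], [2.0], [10.0], [12.0]])
labels = np.array([0, 0, 1, 1])
total_ss, within_ss, between_ss = ss_decomposition(X, labels)
print(total_ss, within_ss, between_ss)                          # 104.0 4.0 100.0
print(f"between_SS / total_SS = {between_ss / total_ss:.1%}")  # 96.2%
```

Since total_SS = within_SS + between_SS, a high ratio means most of the spread between players lies between clusters rather than inside them.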

The three clusters of players can be found in the following PDF file. The blue circles represent Cluster 1, the red circles Cluster 2, and the green circles Cluster 3.

Next, we dramatically increase the number of clusters to 20 in our K-means solution.

Performing the K-means clustering, we obtain the following sets of scatter plots. (A 25×25 grid of scatter plots is difficult to display here, so I have split it into a series of plots. The between_SS / total_SS ratio was 94.8%, indicating a good “fit”.)

The cluster behaviour can be seen more clearly in three dimensions. We now display some examples:

The 20 groups of players we obtained can be seen in the PDF file linked below:

nbastatsnewclusters

The legend for the clusters obtained was:

Two sample clusters from our analysis are shown in the table below. Interestingly, the algorithm placed Carmelo Anthony and Kobe Bryant in one cluster, while LaMarcus Aldridge, LeBron James, and Dwyane Wade were placed in another.

| Group 16 | Group 19 |
|---|---|
| Arron.Afflalo.1 | Steven.Adams |
| Carmelo.Anthony | LaMarcus.Aldridge |
| Patrick.Beverley | Bradley.Beal |
| Chris.Bosh | Andrew.Bogut |
| Kobe.Bryant | Jimmy.Butler |
| Jose.Calderon | DeMarre.Carroll |
| Michael.Carter.Williams.1 | Michael.Carter.Williams |
| Darren.Collison | Mike.Conley |
| Goran.Dragic.1 | DeMarcus.Cousins |
| Langston.Galloway | Anthony.Davis |
| Kevin.Garnett | DeMar.DeRozan |
| Kevin.Garnett.1 | Mike.Dunleavy |
| Jeff.Green.2 | Rudy.Gay |
| George.Hill | Eric.Gordon |
| Jrue.Holiday | Blake.Griffin |
| Dwight.Howard | Tobias.Harris |
| Brandon.Jennings | Nene.Hilario |
| Enes.Kanter.1 | Jordan.Hill |
| Michael.Kidd.Gilchrist | Serge.Ibaka |
| Brandon.Knight.1 | LeBron.James |
| Kevin.Martin | Al.Jefferson |
| Timofey.Mozgov.2 | Wesley.Johnson |
| Rajon.Rondo.2 | Brandon.Knight |
| Derrick.Rose | Kawhi.Leonard |
| J.R..Smith.2 | Robin.Lopez |
| Jared.Sullinger | Kyle.Lowry |
| Thaddeus.Young.1 | Wesley.Matthews |
| | Luc.Mbah.a.Moute |
| | Khris.Middleton |
| | Greg.Monroe |
| | Donatas.Motiejunas |
| | Joakim.Noah |
| | Victor.Oladipo |
| | Tony.Parker |
| | Chandler.Parsons |
| | Zach.Randolph |
| | Andre.Roberson |
| | Rajon.Rondo |
| | P.J..Tucker |
| | Dwyane.Wade |
| | Kemba.Walker |
| | David.West |
| | Russell.Westbrook |
| | Deron.Williams |

If we use more clusters, players will obviously be placed into smaller groups. The following clustering results can be seen in the linked PDF files.

1. 50 clusters (between_SS / total_SS = 97.4%): PDF file
2. 70 clusters (between_SS / total_SS = 97.8%): PDF file
3. 100 clusters (between_SS / total_SS = 98.3%): PDF file
4. 200 clusters, an extreme case (between_SS / total_SS = 99.1%): PDF file
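As a sketch of why these ratios climb steadily as clusters are added: refining a partition can never increase the within-cluster sum of squares, and with one player per cluster it is zero, so between_SS / total_SS reaches 100% at that extreme even for data with no cluster structure at all. The data below are random noise, not NBA statistics, and the function name is illustrative.

```python
import numpy as np


def between_over_total(X: np.ndarray, labels: np.ndarray) -> float:
    """between_SS / total_SS, computed as 1 - within_SS / total_SS."""
    total = ((X - X.mean(axis=0)) ** 2).sum()
    within = sum(((X[labels == j] - X[labels == j].mean(axis=0)) ** 2).sum()
                 for j in np.unique(labels))
    return float(1.0 - within / total)


# Pure noise: 30 hypothetical "players" with 25 statistics and no real cluster structure.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 25))

# Nested partitions with 1, 2, 6 and 30 clusters (consecutive blocks of 30, 15, 5 and 1 rows).
for size in (30, 15, 5, 1):
    labels = np.arange(30) // size
    print(f"{30 // size:>2} clusters: between_SS / total_SS = {between_over_total(X, labels):.1%}")
```

So a high ratio at a large K partly reflects the number of clusters itself, which is one reason the WSS plot, rather than the ratio alone, guides the choice of K.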

I have not included plots for these solutions because they are quite difficult to display.

Looking at the 100 Clusters file, we see two interesting results:

- In Cluster 16, we have: Carmelo Anthony, Chris Bosh, Kobe Bryant, and Kevin Martin
- In Cluster 74, we have: LaMarcus Aldridge, Anthony Davis, Rudy Gay, Blake Griffin, LeBron James, and Russell Westbrook

### Conclusions

We therefore see that it does not make much mathematical/statistical sense to compare any two players. In my opinion, the only logical thing to do when ranking players is to rank them within clusters. So, based on the above analysis, it makes sense to ask, for example, whether Carmelo is a better player than Kobe, or whether LeBron is a better player than Westbrook. But, based on last season’s statistics, it does not make much sense to ask whether Kobe is a better player than Westbrook, because they were placed in different clusters. I think ESPN could benefit tremendously from a rigorous approach to rankings like these, which spark many conversations because many people take them seriously.
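The within-cluster ranking idea can be sketched as follows: two players are compared only if they share a cluster, and each cluster is then ordered by some score. The cluster labels come from the 100-cluster solution discussed above. The helper names are hypothetical, and the post does not say which score should drive the within-cluster ranking, so `score` is left as an input.

```python
from collections import defaultdict
from typing import Dict, List


def comparable(a: str, b: str, cluster_of: Dict[str, int]) -> bool:
    """Under the within-cluster approach, two players are ranked against each other only if they share a cluster."""
    return cluster_of[a] == cluster_of[b]


def rank_within_clusters(cluster_of: Dict[str, int], score: Dict[str, float]) -> Dict[int, List[str]]:
    """Group players by cluster label, then order each group by a score (highest first)."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for name, cluster in cluster_of.items():
        groups[cluster].append(name)
    return {c: sorted(names, key=lambda n: score[n], reverse=True) for c, names in groups.items()}


# Cluster labels from the 100-cluster solution.
cluster_of = {
    "Carmelo Anthony": 16, "Chris Bosh": 16, "Kobe Bryant": 16, "Kevin Martin": 16,
    "LaMarcus Aldridge": 74, "Anthony Davis": 74, "Rudy Gay": 74,
    "Blake Griffin": 74, "LeBron James": 74, "Russell Westbrook": 74,
}

print(comparable("Carmelo Anthony", "Kobe Bryant", cluster_of))    # True
print(comparable("Kobe Bryant", "Russell Westbrook", cluster_of))  # False
```

Any per-player metric can be passed as `score` to `rank_within_clusters`; the point of the design is that the ranking never crosses cluster boundaries.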