Ranking NBA Championship Teams

The first thing to note is that just by looking at Basketball-Reference.com there are 62 factors that uniquely classify a team: MP FG FGA FG% 3P 3PA 3P% 2P 2PA 2P% FT FTA FT% ORB DRB TRB AST STL BLK TOV PF PTS OMP OFG OFGA OFG% O3P O3PA O3P% O2P O2PA O2P% OFT OFTA OFT% OORB ODRB OTRB OAST OSTL OBLK OTOV OPF OPTS PW PL MOV SOS SRS ORtg DRtg Pace FTr 3PAr eFG% TOV% ORB% FT/FGA eFG% TOV% DRB% FT/FGA, where OFGA indicates a given team’s opponent’s FGA per game average for a specific season.
The reason it is not meaningful to look at a specific statistic or a pair of statistics such as “three-point attempt rate” is that,

$\boxed{\frac{62!}{2! 60!} = 1891}$ possible comparisons can be made.
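The count is easy to check. A minimal Python sketch using only the standard library (the variable names are mine):

```python
from math import comb, factorial

n_factors = 62  # number of team statistics listed above

# Number of unordered pairs of statistics: 62! / (2! * 60!)
n_pairs = comb(n_factors, 2)

# Same value from the factorial form used in the text
assert n_pairs == factorial(62) // (factorial(2) * factorial(60))

print(n_pairs)  # 1891
```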

Because of this, what is required is a detailed statistical learning approach. I looked at the full season statistics for the last twenty NBA champions from the 1995-1996 Chicago Bulls to the 2014-2015 Golden State Warriors.

I employed principal component analysis (PCA) to reduce the number of dimensions and to see which variables contribute most to the variance of the data set. I found that the first 7 of 20 principal components explained 88.52% of the variance. Therefore, we can effectively reduce the dimension of the data set from 62 to 7. This can be seen in the scree plot below:
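As a rough illustration of how such a cut-off is read off, the sketch below takes the variances (eigenvalues) of the principal components and returns the smallest number of leading components whose cumulative share of variance reaches a chosen threshold. The function name and the toy eigenvalues are assumptions for illustration only; they are not the championship data.

```python
from itertools import accumulate

def components_needed(eigenvalues, threshold):
    """Smallest number of leading components whose cumulative share of
    total variance reaches `threshold` (a fraction between 0 and 1)."""
    total = sum(eigenvalues)
    ordered = sorted(eigenvalues, reverse=True)
    for k, cumulative in enumerate(accumulate(ordered), start=1):
        if cumulative / total >= threshold:
            return k
    return len(ordered)

# Toy eigenvalues, made up for illustration (NOT the championship data).
toy_eigenvalues = [5.0, 3.0, 1.0, 0.5, 0.5]
print(components_needed(toy_eigenvalues, 0.85))  # 3
```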

A visualization of the 62-variable data set is as follows:

Principal component analysis reduced this high-dimensional data set to a more manageable (but perhaps still complicated) 7-dimensional data set, visualized as follows:

Next, I used the Euclidean distance metric to perform hierarchical clustering on these seven principal components. I obtained the following result:

We notice immediately that:

1. The 2015 Golden State Warriors were very similar to the 2014 San Antonio Spurs.
2. Not surprisingly, Phil Jackson’s 2000 and 2002 Lakers teams were very similar to each other but not to any other championship team, and similarly for his 2009 and 2010 Lakers teams.
3. Interestingly, the two teams that stand out which are truly dissimilar to any other championship team are the 2008 Boston Celtics and the 1998 Chicago Bulls.
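The clustering step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the computation behind the result above: the linkage rule is not stated in this post, so complete linkage is assumed here, and the four labelled points are made up (two coordinates standing in for principal component scores).

```python
from itertools import combinations
from math import dist

def agglomerate(points, n_clusters):
    """Merge the two closest clusters until `n_clusters` remain.
    Cluster distance is complete linkage (an assumption): the largest
    Euclidean distance between a member of one cluster and a member
    of the other."""
    clusters = [[name] for name in points]

    def linkage(a, b):
        return max(dist(points[p], points[q]) for p in a for q in b)

    merges = []  # (cluster a, cluster b, height at which they merged)
    while len(clusters) > n_clusters:
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
        merges.append((a, b, linkage(a, b)))
    return clusters, merges

# Hypothetical teams with made-up scores, for illustration only.
toy_scores = {"A": (0.0, 0.0), "B": (0.0, 1.0), "C": (5.0, 5.0), "D": (5.0, 6.5)}
toy_clusters, toy_merges = agglomerate(toy_scores, 2)
print(toy_clusters)
```

Teams that merge at a low height are the ones a dendrogram shows as most similar; teams that only join at the very top are the outliers.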

This analysis also dispels the notion that a team has to play a specific style, for example “modern-day play”, to win a championship. In principle, there are many possible ways and styles that lead to a championship, and an analysis such as this, which probes the data deeply, shows this to be the case.

Ranking NBA Players

The 2015-2016 NBA season is nearly upon us, and as usual, ESPN has been running its #NBArank, ranking players based on the following non-rigorous methodology:

We asked, “Which player will be better in 2015-16?” To decide, voters had to consider both the quality and quantity of each player’s contributions to his team’s ability to win games. More than 100 voters weighed in on nearly 30,000 pairs of players.

Of course, while I suspect this type of thing is meant to be just for fun, it has generated a great deal of controversy, with many arguments ensuing between fans. For example, Kobe Bryant being ranked 93rd overall in the NBA this year drew a fair deal of criticism from Stephen A. Smith on ESPN First Take.

In general, at least to me, it does not make any sense to rank players from different positions that bring different strengths to a team sport such as basketball. That is, what does it really mean for Tim Duncan to be better than Russell Westbrook (or vice-versa), or Kevin Love to be better than Mike Conley (or vice-versa), etc…

From a mathematical/data science perspective, the only sensible thing to do is to take all the players in the league, and apply a clustering algorithm such as K-means clustering to group players of similar talents and contributions into groups. This is not a trivial thing to do, but it is the sort of thing that data scientists do all the time! For this analysis, I went to Basketball-Reference.com, and pulled out last season’s (2014-2015) per game averages of every player in the league, looking at 25 statistical factors from FGA, FG% to STL, BLK, and TOV. One can see that this is a 25-dimensional problem.

Our goal then is to consider the following problem: denoting by $C_{1}, \ldots, C_{K}$ the sets containing the observations in each cluster, we want to solve the optimization problem

$\mbox{minimize}_{C_{1},\ldots,C_{K}} \left\{\sum_{k=1}^{K} W(C_{k})\right\}$,

where $W(C_{k})$ measures the variation within cluster $C_{k}$. We use the squared Euclidean distance to define this within-cluster variation, and then solve:
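In the usual textbook notation (the symbols $x_{ij}$, $p$ and $|C_{k}|$ are my assumption, since they are not defined elsewhere in this post), let $x_{ij}$ be the value of statistic $j$ for player $i$, $p = 25$ the number of statistics, and $|C_{k}|$ the number of players in cluster $k$. The within-cluster variation is then

$W(C_{k}) = \frac{1}{|C_{k}|} \sum_{i, i' \in C_{k}} \sum_{j=1}^{p} \left(x_{ij} - x_{i'j}\right)^{2}$,

so that the problem becomes

$\mbox{minimize}_{C_{1},\ldots,C_{K}} \left\{\sum_{k=1}^{K} \frac{1}{|C_{k}|} \sum_{i, i' \in C_{k}} \sum_{j=1}^{p} \left(x_{ij} - x_{i'j}\right)^{2}\right\}$.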

The first thing to do is to decide how many clusters we want to use in our solution. This is done by looking at the within sum of squares (WSS) plot:
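A minimal sketch of how the points of such a plot can be computed, assuming plain Lloyd's-algorithm K-means with a single random start. The function names and the nine toy points are illustrative assumptions, not the player data.

```python
import random
from math import dist

def kmeans(points, k, seed=0, max_iter=100):
    """Plain Lloyd's algorithm; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct observations
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an emptied cluster keeps its old center
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers

def total_wss(points, labels, centers):
    """Total within-cluster sum of squared Euclidean distances."""
    return sum(dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))

# Made-up 2-D points forming three loose groups (NOT the player data).
toy_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
              (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
              (20.0, 0.0), (20.0, 1.0), (21.0, 0.0)]

wss_by_k = {k: total_wss(toy_points, *kmeans(toy_points, k)) for k in range(1, 6)}
for k, wss in wss_by_k.items():
    print(k, round(wss, 2))
```

Plotting WSS against the number of clusters and looking for the "elbow" where the curve flattens is the usual way to read such a plot. A single random start can land in a local optimum, so in practice several restarts are typically compared.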

First, we use 3 clusters in our K-means solution because, based on visual inspection, the data separates very nicely into 3 clusters. In this case, the between sum of squares versus total sum of squares ratio was 77.0%, indicating a good “fit”. The plots obtained were as follows:

The three clusters of players can be found in the following PDF File. Note that the blue circles represent Cluster 1, the red circles represent Cluster 2, and the green circles represent Cluster 3.
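The between sum of squares versus total sum of squares ratio quoted for this solution can be computed from any cluster assignment. The sketch below is a plain-Python illustration; the function names and the four one-dimensional toy points are assumptions, not the player data.

```python
def centroid(rows):
    """Coordinate-wise mean of a list of equal-length tuples."""
    return tuple(sum(col) / len(rows) for col in zip(*rows))

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def between_total_ratio(points, labels):
    """between_SS / total_SS for a given cluster assignment."""
    grand = centroid(points)
    total_ss = sum(sq_dist(p, grand) for p in points)
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    within_ss = sum(sq_dist(p, centroid(g)) for g in groups.values() for p in g)
    # total_SS = within_SS + between_SS, so between_SS is the remainder.
    return (total_ss - within_ss) / total_ss

# Made-up one-dimensional example: two tight pairs far apart.
toy_points = [(0.0,), (2.0,), (10.0,), (12.0,)]
toy_labels = [0, 0, 1, 1]
print(round(between_total_ratio(toy_points, toy_labels), 4))  # 0.9615
```

This ratio tends to rise as clusters are added and reaches 100% when every player is its own cluster, so on its own it does not settle how many clusters to use.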

Next, we dramatically increase the number of clusters to 20 in our K-means solution.

Performing the K-means clustering, the between sum of squares versus total sum of squares ratio was 94.8%, again indicating a good “fit”. It is a bit difficult to display a 25×25 plot here, so I have split it into the following series of scatter plots:

The cluster behaviour can be seen more clearly in three dimensions. We now display some examples:

The 20 groups of players we obtained can be seen in the PDF file linked below:

nbastatsnewclusters

The legend for the clusters obtained was:

Two sample clusters from our analysis are displayed in the table below. It is interesting that the algorithm placed Carmelo Anthony and Kobe Bryant in one cluster, while LaMarcus Aldridge, LeBron James, and Dwyane Wade ended up in another.

| Group 16 | Group 19 |
| --- | --- |
| Arron.Afflalo.1 | Steven.Adams |
| Carmelo.Anthony | LaMarcus.Aldridge |
| Patrick.Beverley | Bradley.Beal |
| Chris.Bosh | Andrew.Bogut |
| Kobe.Bryant | Jimmy.Butler |
| Jose.Calderon | DeMarre.Carroll |
| Michael.Carter.Williams.1 | Michael.Carter.Williams |
| Darren.Collison | Mike.Conley |
| Goran.Dragic.1 | DeMarcus.Cousins |
| Langston.Galloway | Anthony.Davis |
| Kevin.Garnett | DeMar.DeRozan |
| Kevin.Garnett.1 | Mike.Dunleavy |
| Jeff.Green.2 | Rudy.Gay |
| George.Hill | Eric.Gordon |
| Jrue.Holiday | Blake.Griffin |
| Dwight.Howard | Tobias.Harris |
| Brandon.Jennings | Nene.Hilario |
| Enes.Kanter.1 | Jordan.Hill |
| Michael.Kidd.Gilchrist | Serge.Ibaka |
| Brandon.Knight.1 | LeBron.James |
| Kevin.Martin | Al.Jefferson |
| Timofey.Mozgov.2 | Wesley.Johnson |
| Rajon.Rondo.2 | Brandon.Knight |
| Derrick.Rose | Kawhi.Leonard |
| J.R..Smith.2 | Robin.Lopez |
| Jared.Sullinger | Kyle.Lowry |
| Thaddeus.Young.1 | Wesley.Matthews |
|  | Luc.Mbah.a.Moute |
|  | Khris.Middleton |
|  | Greg.Monroe |
|  | Donatas.Motiejunas |
|  | Joakim.Noah |
|  | Victor.Oladipo |
|  | Tony.Parker |
|  | Chandler.Parsons |
|  | Zach.Randolph |
|  | Andre.Roberson |
|  | Rajon.Rondo |
|  | P.J..Tucker |
|  | Dwyane.Wade |
|  | Kemba.Walker |
|  | David.West |
|  | Russell.Westbrook |
|  | Deron.Williams |

If we use more clusters, players will obviously be placed into smaller groups. The following clustering results can be seen in the linked PDF files.

1. 50 Clusters – (between_SS / total_SS =  97.4 %) – PDF File
2. 70 Clusters – (between_SS / total_SS =  97.8 %) – PDF File
3. 100 Clusters – (between_SS / total_SS =  98.3 %) – PDF File
4. 200 Clusters (extreme case) – (between_SS / total_SS =  99.1 %) – PDF File

I did not include the plots for these computations because, with so many clusters, they become very difficult to read.

Looking at the 100 Clusters file, we see two interesting results:

• In Cluster 16, we have: Carmelo Anthony, Chris Bosh, Kobe Bryant and Kevin Martin
• In Cluster 74, we have: LaMarcus Aldridge, Anthony Davis, Rudy Gay, Blake Griffin, LeBron James and Russell Westbrook

CONCLUSIONS:

We therefore see that it does not make much mathematical/statistical sense to compare just any two players. In my opinion, the only logical thing to do when ranking players is to decide on rankings within clusters. So, based on the above analysis, it makes sense to ask, for example, whether Carmelo is a better player than Kobe, or whether LeBron is a better player than Westbrook, etc… But, based on last season’s statistics, it doesn’t make much sense to ask whether Kobe is a better player than Westbrook, because they have been clustered differently. I think ESPN could benefit tremendously from a rigorous approach to these sorts of rankings, which spark many conversations because many people take them seriously.

Some Thoughts On Howard Beck’s Bleacher Report Article

Howard Beck had an interesting article today on Bleacher Report, basically suggesting that the NBA Finals, and in particular the current style of play embodied by the Golden State Warriors, are somehow a vindication of Mike D’Antoni’s basketball philosophies: “Shoot a lot of threes”, “Shoot in 7 seconds or less”, “Play small lineups”, etc…

While the Warriors have certainly embodied some of these philosophies, my personal opinion is that D’Antoni’s style of play can only be vindicated if there is a clear trend among championship teams that reflects these philosophies. As I show below, this is simply not the case.

I looked at the last 15 NBA Champions (from 2000-2014), and tried to see if there were any clear patterns in common between the teams. This is essentially what I found:

Two things that are immediately clear are:

1. There is very little that championship teams have in common!

2. The overwhelming thing that they do have in common is that 14 of the last 15 NBA champions have all been ranked in the Top 10 for Defensive Rating, something that Mike D’Antoni’s coaching philosophy has never really included throughout his years in Phoenix, New York, and Los Angeles.

This, I believe, is the grand point that no one seems to be interested in making, perhaps because the “mainstream” regards defense-oriented basketball as “less flashy”. Yet it is still the overwhelmingly common characteristic among championship-winning teams.

Perhaps the Warriors will win this year, but as I said above, I do not believe that one year is anywhere near enough to establish a trend and vindicate D’Antoni’s basketball philosophies.

Further, there were some other things in Beck’s article that I found to be a bit concerning:

He claimed: “Today, coaches speak enthusiastically about ‘positionless’ basketball, whereas 10 years ago, D’Antoni had to sell Marion and Stoudemire on the concept.”

This is not actually true. The triangle offense is the de facto example of “positionless” basketball, and has been around since the 1940s when Sam Barry introduced it at USC. Phil Jackson and Tex Winter’s Bulls and Lakers teams embodied the concept of positionless basketball. In fact, as can be seen from the diagram below (taken from http://khamel83.tripod.com/intro.htm), players don’t have set positions in the triangle offense. Rather, there are regions based on optimality and spacing:

Teams running the triangle offense offer many examples of guards posting up, big men coming out to shoot threes, etc…

Three-Point Shooting Teams and The 2014-2015 NBA Playoffs

Major Update: June 22, 2015.  I have now published a formal article on the arXiv proving many of the assertions made earlier in this blog post. It can be found here: http://arxiv.org/abs/1506.06687

Some controversy was stirred up today when Knicks President and basketball coaching legend Phil Jackson posted the following tweets about three-point shooting teams not doing so well in the second round of the playoffs:

Data Analytics and The 1995-1996 Chicago Bulls

It is without question that the greatest team in NBA history was the 1995-1996 Chicago Bulls. They went 72-10 that year and went on to win the NBA Championship against a top-notch Seattle SuperSonics team.

Phil Jackson’s system and first-class coaching were the major reasons why the Bulls were so good, but I wanted to analyze their reason for winning using data science methodologies.

The results that I found were very interesting. First, I mined through each individual game’s data to obtain patterns in the Bulls’ wins and losses, and this is what I found:

One sees that the Bulls were a defensive nightmare, and if you look at these results in detail, it makes sense that the Sonics were really the only team that ever posed a threat to them. This shows that to beat the Bulls, the opposing team would have to simultaneously:

1. Ensure Ron Harper had a FG% less than 44.95% in a game,
2. Ensure Dennis Rodman had fewer than 17 total rebounds in a game,
3. Ensure Luc Longley had fewer than 2 blocks in a game,
4. Ensure Michael Jordan had a FG% less than 46.55% in a game.

If any one of these conditions were not met, the Bulls would win!
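Written as code, the rule is a single conjunction of the four thresholds listed above. A minimal sketch follows; the function name, argument names and the sample stat lines are mine and purely illustrative.

```python
def bulls_could_lose(harper_fg_pct, rodman_trb, longley_blk, jordan_fg_pct):
    """True only when all four conditions from the list hold at once.
    Percentages are given on a 0-100 scale, as in the text."""
    return (
        harper_fg_pct < 44.95
        and rodman_trb < 17
        and longley_blk < 2
        and jordan_fg_pct < 46.55
    )

# Hypothetical stat lines, not taken from any actual game.
print(bulls_could_lose(40.0, 10, 1, 45.0))  # True: every condition holds
print(bulls_could_lose(50.0, 10, 1, 45.0))  # False: Harper shot too well
```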

This analysis on some level also dispels the notion espoused by several sports analysts, like Skip Bayless of ESPN, who continually claim that the Bulls’ sole reason for success was Michael Jordan. Ron Harper’s contributions, although of paramount importance, are rarely mentioned nowadays.

This analysis also shows that the key to the success of the Bulls was not necessarily the number of points that Jordan scored, but the incredible efficiency with which he scored them.

A boosting algorithm also allows us to deduce the most important characteristics in the Bulls’ quality of play and whether they would win or lose a game.  The results are as follows:

We see that a key feature of the Bulls’ quality of play is how efficient Ron Harper was in terms of his FG%.
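To give a feel for how a boosting algorithm ranks features, here is a small sketch. The post does not say which boosting method was used, so AdaBoost with one-feature threshold rules ("stumps") is assumed purely for illustration, and importance is measured as each feature's share of the total stump weight. The six toy games and both feature columns are made up.

```python
from math import exp, log

def best_stump(X, y, w):
    """Find the one-feature threshold rule with the lowest weighted error.
    The rule predicts +1 when polarity * (x[j] - t) > 0, else -1."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                preds = [1 if polarity * (row[j] - t) > 0 else -1 for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, preds)
    return best

def adaboost_importance(X, y, rounds=10):
    """Share of total stump weight (alpha) assigned to each feature."""
    n = len(X)
    w = [1.0 / n] * n
    importance = [0.0] * len(X[0])
    for _ in range(rounds):
        err, j, preds = best_stump(X, y, w)
        if err >= 0.5:
            break  # no stump beats guessing; stop early
        err = max(err, 1e-10)  # avoid dividing by zero on a perfect stump
        alpha = 0.5 * log((1 - err) / err)
        importance[j] += alpha
        # Re-weight: misclassified games gain weight, correct ones lose it.
        w = [wi * exp(-alpha * yi * p) for wi, yi, p in zip(w, y, preds)]
        total_w = sum(w)
        w = [wi / total_w for wi in w]
    total = sum(importance) or 1.0
    return [a / total for a in importance]

# Made-up games: (feature 0, feature 1); label +1 = win, -1 = loss.
X_toy = [(50.0, 12), (48.0, 15), (47.0, 10), (40.0, 14), (42.0, 11), (38.0, 16)]
y_toy = [1, 1, 1, -1, -1, -1]
print(adaboost_importance(X_toy, y_toy))
```

In this toy data the first feature separates wins from losses on its own, so it receives all of the weight; real game logs would spread the importance across several features.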

It is quite interesting that this analysis shows that winning a championship is not about one player. Sure, every team needs great players, but the Bulls were a great team, consisting of many great components working together.