What if Michael Jordan Played in Today’s NBA?

By: Dr. Ikjyot Singh Kohli

It seems that one cannot turn on ESPN or any YouTube channel nowadays without running into the ongoing debate of whether Michael Jordan is better than LeBron, what would happen if Michael Jordan played in today’s NBA, and so on. However, I have not seen a single scientific approach to this question. It is, admittedly, an almost impossible question to answer, but using data science I will try.

From a data science perspective, it only makes sense to look at Michael Jordan’s performance in a single season and try to predict, based on that season, how he would perform in the most recent NBA season. That being said, let’s look at Michael Jordan’s game-to-game performance in the 1995-1996 NBA season, when the Bulls went 72-10.

Using neural networks and Garson’s algorithm to regress against Michael Jordan’s per-game point total, we note the following:

In this plot, the “o” stands for opponent.


One can see from this variable importance plot that Michael’s points in a given game were most positively associated with teams that committed a high number of turnovers, followed by teams that made a lot of 3-point shots. Interestingly, there was no strongly negative factor on Michael’s points in a given game.
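Garson’s algorithm itself is simple: it apportions each input variable’s importance from the fitted network’s weights. Here is a minimal sketch for a single-hidden-layer network; the function name and the toy weight matrices are illustrative assumptions, not the actual fitted model from the post.

```python
import numpy as np

def garson_importance(w_in, w_out):
    """Garson's algorithm: relative importance of each input variable
    for a single-hidden-layer network.
    w_in  : (n_inputs, n_hidden) input-to-hidden weights
    w_out : (n_hidden,) hidden-to-output weights
    """
    # Contribution of input i through hidden unit j, scaled by that
    # hidden unit's output weight.
    c = np.abs(w_in) * np.abs(w_out)         # (n_inputs, n_hidden)
    c = c / c.sum(axis=0, keepdims=True)     # normalize within each hidden unit
    imp = c.sum(axis=1)                      # sum contributions across hidden units
    return imp / imp.sum()                   # relative importance, sums to 1

# Toy example with made-up weights (3 inputs, 2 hidden units)
w_in = np.array([[2.0, 0.5],
                 [0.1, 0.1],
                 [1.0, 1.5]])
w_out = np.array([1.0, -0.5])
print(garson_importance(w_in, w_out))
```

Inputs with large weight paths through the network (inputs 1 and 3 above) come out with high relative importance, which is exactly what the variable importance plot visualizes.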

Given this information, and the per-game league averages of the 2016-2017 season, we used this neural network to predict how many points Michael would average in today’s season:

Michael Jordan: 2017 NBA Season Prediction: 32.91 Points / Game (+/- 6.9)

It is interesting to note that Michael averaged 30.4 points/game in the 1995-1996 NBA season. We therefore conclude that the 1995-1996 Michael would post a higher points-per-game average if he played in today’s NBA.

As an aside, a plot of the neural network used to generate these variable importance plots and predictions is as follows:


What about the reverse question? What if the 2016-2017 LeBron James played in the 1995-1996 NBA? What would happen to his per-game point average? Using the same methodology as above, we used neural networks in combination with Garson’s algorithm to obtain a variable importance plot for LeBron James’ per-game point totals:



One sees from this plot that LeBron’s points in a given game were most positively impacted by teams that committed a high number of personal fouls, followed by teams that got a lot of offensive rebounds. There were no strongly negative factors affecting LeBron’s ability to score.

Using this neural network model, we then predicted how many points per game LeBron would score if he played in the 1995-1996 NBA season:

LeBron James: 1995-1996 NBA Season Prediction: 18.81 Points / Game (+/- 4.796)

This neural network model predicts that LeBron James would average 18.81 points/game if he played in the 1995-1996 NBA season, a drop from the 26.4 points/game he averaged in the most recent NBA season.

Therefore, at least from this neural network model, one concludes that LeBron’s per-game points would decrease if he played in the 1995-1996 season, while Michael’s numbers would increase slightly if he played in the 2016-2017 season.

Basketball – A Game of Geometry

In a previous post, I described the most optimal offensive strategy for the Knicks based on developing relevant joint probability density functions.

In this post, I attempt a solution to the following problem:

Given 5 players on the court, how can one determine (x,y) coordinates for each player such that the spacing / distance between every pair of players is maximized? Such an arrangement would be mathematically optimal from an offensive strategy standpoint: the idea is that it would always stretch the defense to the maximum.

The problem is then stated as follows. Let (x_i, y_i) be the x and y coordinates of player i on the court. We wish to simultaneously solve:

\max_{(x_i, y_i)} \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}, \quad \text{for all } 1 \leq i < j \leq 5

Problems of this type are known as multi-objective optimization problems, and in general are quite difficult to solve. Note that in setting up the coordinate system for this problem, we have for convenience placed the basket at (x,y) = (0,0), i.e., at the origin.

Now, to solve this problem we used the Non-dominated Sorting Genetic Algorithm-II (NSGA-II) as implemented in the mco package in R.
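The actual computation used NSGA-II, which handles all of the pairwise-distance objectives at once. As a much cruder single-objective proxy (my simplification for illustration, not the post’s method), one can maximize the minimum pairwise distance with a stochastic hill climb; the court dimensions below are placeholder values.

```python
import numpy as np

rng = np.random.default_rng(0)

def min_pairwise_dist(pts):
    """Smallest distance between any pair of players."""
    d = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1))
    return d[np.triu_indices(len(pts), k=1)].min()

def spread_players(n=5, xmax=50.0, ymax=47.0, iters=5000, step=2.0):
    """Stochastic hill climb: jitter the configuration, keep it if the
    minimum pairwise distance improves."""
    best = rng.uniform([0.0, 0.0], [xmax, ymax], size=(n, 2))
    best_val = min_pairwise_dist(best)
    for _ in range(iters):
        cand = np.clip(best + rng.normal(0.0, step, best.shape),
                       [0.0, 0.0], [xmax, ymax])
        val = min_pairwise_dist(cand)
        if val > best_val:
            best, best_val = cand, val
    return best, best_val

pts, spacing = spread_players()
print(round(spacing, 2))
```

Unlike NSGA-II, this collapses the multiple objectives into one and returns a single configuration per run rather than a Pareto front, but repeated runs from different random starts reproduce the qualitative finding: many distinct well-spread configurations exist.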

In general, what I found was that there are many possible solutions to this problem, all of which are Pareto optimal. Here are some of these results.





Here are some more plots of player coordinates clearly showing the origin point (which, as mentioned earlier, is the location of the basket):

Each plot above shows the x-y coordinates of players on the floor such that the distance between them is a maximum. Thus, these are some possible configurations of 5 players on the floor where the defense of the opposing team would be stretched to a maximum. What is even more interesting is that in each solution displayed above, and indeed in every numerical solution we found that is not displayed here, there is at least one triangle formation. It can therefore be said that the triangle offense is amongst the most optimal offensive strategies: it produces maximum spacing of offensive players while simultaneously stretching the defense to a maximum as well. Here is more on the unpredictability of the triangle offense and its structure.

Based on these coordinates, we obtained the following distance matrices showing the maximum / optimal possible distance between player i and player j:


For brevity, we show 5 of the several distance matrices generated. Looking at the fifth matrix, for example, one sees that players are at a maximum, optimal distance from each other if the distance between players 1 and 2 is 9.96 feet, the distance between players 3 and 4 is 18.703 feet, the distance between players 4 and 5 is 4.96 feet, and so on.
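Given any one of these solutions, the distance matrix itself is straightforward to compute. A small sketch, with made-up coordinates (in feet) standing in for one of the Pareto-optimal configurations:

```python
import numpy as np

# Hypothetical configuration of 5 players (coordinates are illustrative,
# not one of the actual NSGA-II solutions)
pts = np.array([[0.0, 0.0], [25.0, 0.0], [12.5, 21.7],
                [40.0, 30.0], [5.0, 40.0]])

# Pairwise Euclidean distance matrix between the 5 players
diff = pts[:, None, :] - pts[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))
print(np.round(dist, 2))
```

The matrix is symmetric with zeros on the diagonal, so entry (i, j) can be read directly as the spacing between players i and j, exactly as in the matrices above.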

Breaking Down the 2015-2016 NBA Season

In this article, I will use data science / machine learning methodologies to break down the real factors separating the playoff teams from the non-playoff teams. In particular, I used data from Basketball-Reference.com to associate 44 predictor variables with each team: “FG” “FGA” “FG.” “X3P” “X3PA” “X3P.” “X2P” “X2PA” “X2P.” “FT” “FTA” “FT.” “ORB” “DRB” “TRB” “AST” “STL” “BLK” “TOV” “PF” “PTS” “PS.G” “oFG” “oFGA” “oFG.” “o3P” “o3PA” “o3P.” “o2P” “o2PA” “o2P.” “oFT” “oFTA” “oFT.” “oORB” “oDRB” “oTRB” “oAST” “oSTL” “oBLK” “oTOV” “oPF” “oPTS” “oPS.G”

where the letter ‘o’ before the last 22 predictor variables indicates a defensive variable (‘o’ stands for opponent).

Using principal components analysis (PCA), I was able to project this 44-dimensional data set down to a 5-dimensional data set. That is, the first 5 principal components were found to explain 85% of the variance.
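This projection step can be sketched as follows; the matrix below is a random stand-in for the real 30-team by 44-variable data set (so the variance figures it produces are not the 85% reported above).

```python
import numpy as np

rng = np.random.default_rng(1)
# Stand-in for the 30-team x 44-variable matrix (the real data came from
# Basketball-Reference.com; these numbers are random placeholders)
X = rng.normal(size=(30, 44))

# PCA via SVD of the column-standardized data
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
explained = s ** 2 / (s ** 2).sum()      # variance share of each PC
scores = Xs @ Vt[:5].T                   # projection onto the first 5 PCs
print(round(float(explained[:5].sum()), 3))
```

The `scores` matrix gives each team a 5-dimensional coordinate, and plotting any two of its columns together with the loadings in `Vt` produces the biplots shown below.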

Here are the various biplots: 

In these plots, the teams are grouped according to whether they made the playoffs or not. 

One sees from this biplot of the first two principal components that the dominant component along the first PC is 3-point attempts, while the dominant component along the second PC is opponent points. CLE and TOR have a high negative score along the second PC, indicating a strong defensive performance. Indeed, one suspects that the final separating factor that led CLE to the championship was their defensive play, as opposed to 3-point shooting, which, all in all, didn’t do GSW any favours. This is in line with some of my previous analyses.

Optimal Positions for NBA Players

I was thinking about how one can use the NBA’s new SportVU system to figure out optimal positions for players on the court. One of the interesting things about the SportVU system is that it tracks player (x,y) coordinates on the court. Presumably, it also keeps track of whether or not a player located at (x,y) makes a shot or misses it. Let us denote a player making a shot by 1, and a player missing a shot by 0. Then, one essentially will have data in the form (x,y, \text{1/0}).

One can then use a logistic regression to determine the probability that a player at position (x,y) will make a shot:

p(x,y) = \frac{\exp\left(\beta_0 + \beta_1 x + \beta_2 y\right)}{1 +\exp\left(\beta_0 + \beta_1 x + \beta_2 y\right)}

The main idea is that the parameters \beta_0, \beta_1, \beta_2 uniquely characterize a given player’s probability of making a shot.

As a coaching staff, from an offensive perspective, suppose we wish to position players such that they have a very high probability of making a shot, say, for demonstration purposes, 99%. This means we must solve the problem:

\frac{\exp\left(\beta_0 + \beta_1 x + \beta_2 y\right)}{1 +\exp\left(\beta_0 + \beta_1 x + \beta_2 y\right)} = 0.99

\text{s.t. } 0 \leq x \leq 28, \quad 0 \leq y \leq 47

(The constraints are determined here by the x-y dimensions of a standard NBA court).

This has the following solutions (note that 4.59512 = \ln(99), the log-odds corresponding to p = 0.99):

x = \frac{4.59512 - \beta_0 - \beta_2 y}{\beta_1}, \quad y \geq \frac{4.59512 - \beta_0 - 28 \beta_1}{\beta_2}

with the following conditions:


One can also have:

x = \frac{4.59512 - \beta_0 - \beta_2 y}{\beta_1}, \quad y \leq 47

with the following conditions:


Another solution is:

x = \frac{4.59512 - \beta_0 - \beta_2 y}{\beta_1}

with the following conditions:


The fourth possible solution is:

x = \frac{4.59512 - \beta_0 - \beta_2 y}{\beta_1}

with the following conditions:


In practice, it should be noted that it is typically unlikely to have a player with a 99% probability of making a shot.

To put this example in more practical terms, I generated some random data (1000 points) for a player in terms of (x,y) coordinates and whether he made a shot from those coordinates or not. The following scatter plot shows the result of this simulation:


In this plot, the red dots indicate a player has made a shot (a response of 1.0) from the (x,y) coordinates given, while a purple dot indicates a player has missed a shot from the (x,y) coordinates given (a response of 0.0).

Performing a logistic regression on this data, we obtain that \beta_0 = 0, \beta_1 = 0.00066876, \beta_2 = -0.00210949.
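This fitting step can be sketched as follows. The data below is my own synthetic stand-in (the coefficients, seed, and decay rates are made up, not the post’s simulation), and the fit is a plain gradient-ascent logistic regression rather than any particular package routine.

```python
import numpy as np

def fit_logistic(X, y, lr=0.003, steps=15000):
    """Fit p = sigmoid(b0 + b1*x + b2*y) by gradient ascent on the
    Bernoulli log-likelihood."""
    Xb = np.hstack([np.ones((len(X), 1)), X])   # prepend intercept column
    beta = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(Xb @ beta)))
        beta += lr * Xb.T @ (y - p) / len(y)    # score (gradient) step
    return beta

# Synthetic shot data: make-probability decays with distance from the
# basket, which sits at (0, 0) in this coordinate system
rng = np.random.default_rng(2)
XY = rng.uniform([0.0, 0.0], [28.0, 47.0], size=(1000, 2))
true_logit = 1.0 - 0.05 * XY[:, 0] - 0.04 * XY[:, 1]
made = (rng.uniform(size=1000) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)

beta = fit_logistic(XY, made)
print(beta)
```

With data generated this way, the fitted \beta_1 and \beta_2 come out negative, i.e., the estimated make-probability falls off as the player moves away from the basket, and the fitted surface p(x,y) can then be evaluated anywhere on the court.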

Using the equations above, we see that this player has a maximum probability of 58.7149 \% of making a shot from a location of (x,y) = (0,23), and a minimum probability of 38.45 \% of making a shot from a location of (x,y) = (28,0).

The Three-Point Shot Myth Continued…

I’ve been ranting a lot about the so-called “value” of the three-point shot in “modern-day” basketball. I know! But, here is yet one more entry.

The common consensus is that teams are shooting more three-point shots, as discussed in the articles below:

  1. http://www.businessinsider.com/nba-three-point-shooting-2016-3
  2. http://www.nba.com/2014/news/features/john_schuhmann/11/07/history-of-the-three-point-shot/
  3. http://nyloncalculus.com/2016/03/08/three-pointers-and-skill-displacement/

There are several more where these came from. My issue is that, on the one hand, these analyses seem grossly oversimplified; on the other, none of them have looked at a per-team trend. From my reading of these articles, they are just looking at the total number of three-point shots taken/made every year over the past number of seasons.

Indeed, the standard approach is to look at the league averages from the past number of years and note that the average number of three-point shots made and attempted has increased (well, almost) year-to-year, but this is not entirely useful.

What one should do is look at the probability that any team attempts / makes more than a given number of three point shots per game in a given season. Below, we use a kernel density method to calculate these probabilities.

Just as a reference point, looking at the past 16 seasons of NBA data per team per season (courtesy of Basketball-Reference.com), one generates the following plot:


One sees that the actual number of three-point shots made has not really dramatically increased or decreased over the past number of seasons.

But let’s break this down even further. What one really needs to do is analyze the probability that a team will attempt/make more than a certain number of three-point shots per game in a given season. This is highly non-trivial. A first approach is to calculate the mean and standard deviation of the number of three-point shots attempted and made per season for each of the previous sixteen seasons. These generate time-dependent functions \mu(t) and \sigma(t).

One can in principle then solve a Fokker-Planck equation to obtain a time-dependent probability distribution p(x,t) for the number of three-point shots attempted, and another for the number of three-point shots made:

p(x,t)_t = -\left[\mu(t) p(x,t)\right]_x + \left[\frac{\sigma^2(t)}{2} p(x,t)\right]_{xx}


(where subscripts indicate partial derivatives). However, as one will quickly discover, this PDE is not separable!

My alternative approach, then, was to perform a non-parametric analysis, using a kernel density method to fit a cumulative distribution function to each season for the past sixteen seasons. The following set of plots was generated from this method:
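The per-season tail-probability calculation can be sketched as follows; the sample below is a random stand-in for one season’s 30 per-team values (the real inputs came from Basketball-Reference.com), so the printed probability is illustrative only.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Stand-in for one season's per-team 3-pointers made per game
# (30 teams; these values are randomly generated, not real data)
rng = np.random.default_rng(3)
threes_made = rng.normal(loc=8.0, scale=1.5, size=30)

# Fit a Gaussian kernel density to the season's values, then integrate
# the tail to get P(a team makes more than 10 threes per game)
kde = gaussian_kde(threes_made)
p_more_than_10 = kde.integrate_box_1d(10.0, np.inf)
print(round(float(p_more_than_10), 3))
```

Repeating this for each of the sixteen seasons gives exactly the season-by-season tail probabilities discussed below.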

One sees from this analysis, specifically from the density analysis above, that in a given season the probability that a given team makes more than 10 three-point shots per game never seems to exceed 10%. So, while the probability of a given team attempting more three-point shots may have increased, the probability of the same team making more than, say, 10 three-point shots per game has essentially stayed the same over the past number of years.

The question then remains: do only “good” / “efficient” teams attempt more three-point shots? In particular, does this aid in their attempt to make the playoffs or eventually become a championship-calibre team? This question has been analyzed in detail and has resulted in the following paper, which is now on the arXiv.

New Paper on Machine Learning and Basketball

A new and formal paper of mine describing how one can use machine learning methodologies to help determine which NBA teams will make the playoffs is now online:

  1. arXiv link
  2. SSRN link

Have a look!


Stephen Curry and Mahmoud Abdul-Rauf?

As usual, Phil Jackson made another interesting tweet today:

And, as usual, he received many criticisms from “experts” who just looked at the raw numbers from each player and saw that there is just no way such a statement is justified. But it is not that simple!

When you compare two players (or any two objects) who have very different raw feature values, it is not that they can’t be compared; rather, you must effectively normalize the data somehow to make the sets comparable.

In this case, I used data from Basketball-Reference.com to compare Chris Jackson’s 6 seasons in Denver to Stephen Curry’s last 6 seasons (including this one), took into account 45 different statistical measures, and came up with the following correlation matrix / similarity matrix plot:
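The normalize-then-correlate step can be sketched as follows. The two stat matrices below are synthetic stand-ins (6 seasons by 10 made-up categories sharing a common latent pattern), not the actual 45-measure Curry/Jackson data.

```python
import numpy as np

# Hypothetical per-season stat lines for two players (6 seasons x 10 stats;
# both are driven by a shared latent pattern plus noise, purely for illustration)
rng = np.random.default_rng(4)
base = rng.normal(size=(6, 10))
player_a = 10 + 2 * base + rng.normal(scale=0.5, size=(6, 10))
player_b = 5 + 1 * base + rng.normal(scale=0.5, size=(6, 10))

def zscore(m):
    """Normalize each statistical category to mean 0, sd 1 across seasons."""
    return (m - m.mean(axis=0)) / m.std(axis=0)

A, B = zscore(player_a), zscore(player_b)
# Season-by-season correlation between the two players' normalized stat
# lines: the diagonal of the cross-correlation matrix discussed in the post
season_corr = np.array([np.corrcoef(A[i], B[i])[0, 1] for i in range(6)])
print(np.round(season_corr, 2))
```

The point of the z-scoring is that two players with very different raw scoring volumes can still show strong correlations season-to-season once each category is put on a common scale, which is what the diagonal of the plot below examines.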


Dark blue circles indicate a strong correlation, while dark red circles indicate a weak correlation between two sets of features. 

What would be of interest in an analysis like this is to examine the diagonal of this matrix, which offers a direct comparison between the two players: 

One can see that there are many features that have strong correlation coefficients. 

Therefore, it is true that Stephen Curry and Chris Jackson do in fact share many strong similarities!