A very interesting result: after computing the players’ payoffs, the following diagram shows when it is optimal for a player to shoot a 2-point or a 3-point shot. One sees that it is hardly ever optimal for a player to shoot a 3-point shot, since the region corresponding to 3-point optimality is quite narrow. This can be interpreted as saying that for a 3-point attempt to be optimal, a player’s 2PT% must be roughly equal to his/her 3PT%, which is certainly not the case for the vast majority of even designated 3-point shooters in the NBA!
What if Michael Jordan Played in Today’s NBA?
It seems that one cannot turn on ESPN or any YouTube channel nowadays without running into the ongoing debate over whether Michael Jordan is better than LeBron, what would happen if Michael Jordan played in today’s NBA, and so on. However, I have not seen a single scientific approach to this question. Admittedly, it is something of an impossible question to answer, but using data science, I will try.
From a data science perspective, it only makes sense to look at Michael Jordan’s performance in a single season, and try to predict based on that season how he would perform in the most recent NBA season. That being said, let’s look at Michael Jordan’s game-to-game performance in the 1995-1996 NBA season, when the Bulls went 72-10.
Using neural networks and Garson’s algorithm to regress against Michael Jordan’s per-game point total, we note the following:
As one can see from this variable importance plot, Michael’s points in a given game were most positively associated with teams that committed a high number of turnovers, followed by teams that made a lot of 3-point shots. Interestingly, no factor had a strong negative effect on Michael’s points in a given game.
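To make the variable-importance step concrete, here is a minimal sketch of one common statement of Garson’s algorithm for a network with a single hidden layer and a single output. The function name `garson_importance` and the example weights are hypothetical; they are not taken from the network fitted in this post. Note that this classical form yields relative magnitudes only, not the sign of an association.

```python
def garson_importance(w_in_hidden, w_hidden_out):
    """Relative importance of each input for a one-hidden-layer network.

    w_in_hidden[i][j] is the weight from input i to hidden unit j;
    w_hidden_out[j] is the weight from hidden unit j to the single output.
    """
    n_inputs = len(w_in_hidden)
    raw = [0.0] * n_inputs
    for j, v_j in enumerate(w_hidden_out):
        # Total absolute weight flowing into hidden unit j.
        column_total = sum(abs(w_in_hidden[k][j]) for k in range(n_inputs))
        if column_total == 0.0:
            continue  # a hidden unit with no incoming weight contributes nothing
        for i in range(n_inputs):
            share = abs(w_in_hidden[i][j]) / column_total
            raw[i] += share * abs(v_j)
    total = sum(raw)
    if total == 0.0:
        raise ValueError("all weights are zero; importance is undefined")
    # Normalise so the importances sum to 1.
    return [r / total for r in raw]


# Hypothetical weights: 3 inputs, 2 hidden units (illustration only).
example_w = [[0.8, -0.1], [-0.3, 0.4], [0.1, 0.9]]
example_v = [1.5, -0.7]
importance = garson_importance(example_w, example_v)
print(importance)
```

The returned values are non-negative and sum to one, so they can be plotted directly as a variable importance chart.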
Given this information, and the per-game league averages of the 2017 season, we used this neural network to predict how many points Michael would average in today’s NBA:
Michael Jordan: 2017 NBA Season Prediction: 32.91 Points / Game (+/- 6.9)
It is interesting to note that Michael averaged 30.4 Points/Game in the 1995-1996 NBA Season. We therefore conclude that the 1995-1996 Michael would average more points per game if he played in today’s NBA.
As an aside, a plot of the neural network used to generate these variable importance plots and predictions is as follows:
What about the reverse question? What if the 2016-2017 LeBron James played in the 1995-1996 NBA? What would happen to his per-game point average? Using the same methodology as above, we used neural networks in combination with Garson’s algorithm to obtain a variable importance plot for LeBron James’ per-game point totals:
One sees from this plot that LeBron’s points in a given game were most positively impacted by teams that committed a lot of personal fouls, followed by teams that got a lot of offensive rebounds. There were no strong negative factors that affected LeBron’s ability to score.
Using this neural network model, we then tried to predict how many points per game LeBron would score if he played in the 1995-1996 NBA Season:
LeBron James: 1995-1996 NBA Season Prediction: 18.81 Points / Game (+/- 4.796)
This neural network model predicts that LeBron James would average 18.81 Points/Game if he played in the 1995-1996 NBA season, which is a drop from the 26.4 Points/Game he averaged this most recent NBA season.
Therefore, at least from this neural network model, one concludes that LeBron’s per-game points would decrease if he played in the 1995-1996 Season, while Michael’s number would increase slightly if he played in the 2016-2017 Season.
So, What’s Wrong with the Knicks?
As I write this post, the Knicks are currently 12th in the Eastern Conference with a record of 22-32. A plethora of people are offering their opinions on what is wrong with the Knicks, and of course, since most of it comes from ESPN and the New York media, most of it is incorrect/useless. Here are some examples:
 The Bulls are following the Knicks’ blueprint for failure and …
 Spike Lee ‘still believes’ in Melo, says time for Phil Jackson to go
 25 reasons being a New York Knicks fan is the most depressing …
 Carmelo Anthony needs to escape the Knicks
 Another Awful Week for Knicks
A while ago, I wrote this paper based on statistical learning that shows the common characteristics for NBA playoff teams. Basically, I obtained the following important result:
This classification tree shows, along with the arguments in the paper, that while the most important factor in teams making the playoffs tends to be the opponent’s number of assists per game, there are paths to the playoffs where teams are not necessarily strong in this area. Specifically, for the Knicks, as of today, we see that:
opp. Assists / game: 22.4 > 20.75, STL / game: 7.2 < 8.0061, TOV / game: 14.1 < 14.1585, DRB / game: 33.8 > 29.9024, opp. TOV / game: 13.0 < 13.1585.
So, one sees that what is keeping the Knicks out of the playoffs is specifically pressure defense: they are not forcing enough turnovers per game. Ironically, they are very close to the threshold, but it is not enough.
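The walk down the tree can be sketched in a few lines. The values and thresholds below are the ones quoted above; the dictionary keys, the function names, and the assumption that the final node sends teams above 13.1585 opp. TOV/game to the playoffs (as the discussion of the threshold in this post suggests) are mine, not part of the paper’s code.

```python
# Per-game values and split thresholds quoted in the text.
KNICKS = {"opp_ast": 22.4, "stl": 7.2, "tov": 14.1, "drb": 33.8, "opp_tov": 13.0}

# Splits the Knicks pass through before reaching the final node.
PATH = [
    ("opp_ast", ">", 20.75),
    ("stl", "<", 8.0061),
    ("tov", "<", 14.1585),
    ("drb", ">", 29.9024),
]
OPP_TOV_THRESHOLD = 13.1585  # assumed: playoff-bound teams exceed this value


def follows_path(stats):
    """True if the team satisfies every split on this branch of the tree."""
    for key, op, threshold in PATH:
        value = stats[key]
        ok = value > threshold if op == ">" else value < threshold
        if not ok:
            return False
    return True


def playoff_bound(stats):
    """Outcome at the final node, or None if the team is on another branch."""
    if not follows_path(stats):
        return None
    return stats["opp_tov"] > OPP_TOV_THRESHOLD


# How many more forced turnovers per game the Knicks would need.
shortfall = OPP_TOV_THRESHOLD - KNICKS["opp_tov"]
print(playoff_bound(KNICKS), round(shortfall, 4))
```

The shortfall makes the "very close to the threshold" remark quantitative: it is a fraction of one forced turnover per game.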
A probability density approximation of the Knicks’ Opp. TOV/G is as follows:
This PDF has the approximate functional form:
P(oTOV) =
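Therefore, by computing the tail probability

$$P(\mathrm{oTOV} > A) = \int_{A}^{\infty} P(x)\, dx,$$

one obtains an expression in terms of Erfc, the complementary error function, which is given by:

$$\mathrm{Erfc}(z) = \frac{2}{\sqrt{\pi}} \int_{z}^{\infty} e^{-t^{2}}\, dt.$$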
Given that the threshold for playoff-bound teams is more than 13.1585 opp. TOV/game, setting A = 13 above, we obtain: 0.435. This means that the Knicks have roughly a 43.5% chance of forcing more than 13 TOV in any single game. Similarly, setting A = 14, one obtains: 0.3177. This means that the Knicks have roughly a 31.77% chance of forcing more than 14 TOV in any single game, and so forth.
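As a rough consistency check, suppose the density is Gaussian. This is an assumption on my part, suggested only by the appearance of Erfc; the fitted expression itself is not reproduced here. Under that assumption, the two quoted tail probabilities pin down a mean and standard deviation, and the tail probability for any other cutoff follows from Erfc. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def fit_normal_from_tails(a1, p1, a2, p2):
    """Solve for (mu, sigma) given P(X > a1) = p1 and P(X > a2) = p2."""
    std = NormalDist()  # standard normal
    z1 = std.inv_cdf(1.0 - p1)
    z2 = std.inv_cdf(1.0 - p2)
    sigma = (a2 - a1) / (z2 - z1)
    mu = a1 - sigma * z1
    return mu, sigma


def tail_prob(a, mu, sigma):
    """P(X > a) for a normal variable, written with the complementary error function."""
    return 0.5 * math.erfc((a - mu) / (sigma * math.sqrt(2.0)))


# Tail probabilities quoted in the text for A = 13 and A = 14.
mu, sigma = fit_normal_from_tails(13, 0.435, 14, 0.3177)
print(mu, sigma)

# Chance of clearing the playoff threshold of 13.1585 under the same assumption.
print(tail_prob(13.1585, mu, sigma))
```

Because the parameters are backed out from only two quoted numbers, this is a sanity check on the quoted probabilities rather than a reconstruction of the original fit.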
Therefore, one concludes that while the Knicks’ problems are defense-oriented, they are specifically related to pressure defense and forcing turnovers.
By: Dr. Ikjyot Singh Kohli, About the Author
Basketball Machine Learning Paper Updated
I have now made a significant update to my applied machine learning paper on predicting patterns among NBA playoff and championship teams, which can be accessed here: arXiv Link.
The Most Optimal Strategy for the Knicks
In a previous article, I showed how one could use data in combination with advanced probability techniques to determine the optimal shot / court positions for LeBron James. I decided to use this algorithm on the Knicks’ starting 5, and obtained the following joint probability density contour plots:
One sees that the Knicks’ offensive strategy is optimal if and only if players get shots as close to the basket as possible. If this is the case, the players have a high probability of making shots even if defenders are playing them tightly. This means that the Knicks would be served best by driving in the paint, posting up, and Porzingis NOT attempting a multitude of 3-point shots.
By the way, a lot of people are convinced nowadays that someone like Porzingis attempting 3’s is a sign of a good offense, as it is an optimal way to space the floor. I am not convinced of this. Spacing the floor geometrically translates to a multi-objective nonlinear optimization problem. In particular, consider the (x, y)-coordinates of each player on the floor. Spreading the floor means one must maximize (simultaneously) each element of the following distance metric:
subject to . While a player attempting 3-point shots may be one way to solve this problem, I am not convinced that it is the unique solution to this optimization problem. In fact, I am convinced that there are multiple solutions.
This solution is slightly simpler if one realizes that the metric above is symmetric, so that there are only 11 independent components.
Analyzing LeBron James’ Offensive Play
Where is LeBron James most effective on the court?
From NBA.com, we obtained the following 2015-2016 data, which tracks LeBron’s FG% based on defender distance:
From Basketball-Reference.com, we then obtained data on LeBron’s FG% based on his shot distance from the basket:
Based on this data, we generated tens of thousands of sample data points to perform a Monte Carlo simulation to obtain relevant probability density functions. We found that the joint PDF was a very lengthy expression(!):
Graphically, this was:
A contour plot of the joint PDF was computed to be:
From this information, we can compute where/when LeBron has the highest probability of making a shot. Numerically, we found that the maximum probability occurs when LeBron’s defender is 0.829988 feet away, while LeBron is 1.59378 feet away from the basket. What is interesting is that this analysis shows that defending LeBron tightly doesn’t seem to be an effective strategy if his shot distance is within 5 feet of the basket. It is only an effective strategy further than 5 feet away from the basket. Therefore, opposing teams have the best chance of stopping LeBron from scoring by playing him tightly and forcing him as far away from the basket as possible.
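The numerical maximization step can be illustrated with a plain grid search. The sketch below is only that: `toy_joint_density` is a made-up placeholder with a single peak, not the lengthy joint PDF fitted above, and the search ranges are arbitrary. Swapping in the real density would give the real maximizer.

```python
import math


def toy_joint_density(defender_dist, shot_dist):
    """Placeholder density with a single peak; NOT LeBron's fitted joint PDF."""
    return math.exp(-((defender_dist - 2.0) ** 2 + (shot_dist - 3.0) ** 2))


def grid_argmax(density, x_range, y_range, steps=200):
    """Evaluate density on a (steps + 1) x (steps + 1) grid; return the best point."""
    (x_lo, x_hi), (y_lo, y_hi) = x_range, y_range
    best = (x_lo, y_lo, density(x_lo, y_lo))
    for i in range(steps + 1):
        x = x_lo + (x_hi - x_lo) * i / steps
        for j in range(steps + 1):
            y = y_lo + (y_hi - y_lo) * j / steps
            value = density(x, y)
            if value > best[2]:
                best = (x, y, value)
    return best


# Hypothetical search ranges in feet: defender distance, then shot distance.
best_defender, best_shot, best_value = grid_argmax(
    toy_joint_density, x_range=(0.0, 10.0), y_range=(0.0, 30.0)
)
print(best_defender, best_shot, best_value)
```

A grid search is crude but makes no smoothness assumptions; a finer grid or a local optimizer started from the grid maximum would sharpen the estimate.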
The Relationship Between The Electoral College and Popular Vote
An interesting machine learning problem: Can one figure out the relationship between the popular vote margin, voter turnout, and the percentage of electoral college votes a candidate wins? Going back to the election of John Quincy Adams, the raw data looks like this:
Electoral College Winner  Party  Popular Vote Margin (%)  Voter Turnout (%)  Percentage of EC 
John Quincy Adams  D.R.  0.1044  0.27  0.3218 
Andrew Jackson  Dem.  0.1225  0.58  0.68 
Andrew Jackson  Dem.  0.1781  0.55  0.7657 
Martin Van Buren  Dem.  0.14  0.58  0.5782 
William Henry Harrison  Whig  0.0605  0.80  0.7959 
James Polk  Dem.  0.0145  0.79  0.6182 
Zachary Taylor  Whig  0.0479  0.73  0.5621 
Franklin Pierce  Dem.  0.0695  0.70  0.8581 
James Buchanan  Dem.  0.12  0.79  0.5878 
Abraham Lincoln  Rep.  0.1013  0.81  0.5941 
Abraham Lincoln  Rep.  0.1008  0.74  0.9099 
Ulysses Grant  Rep.  0.0532  0.78  0.7279 
Ulysses Grant  Rep.  0.12  0.71  0.8195 
Rutherford Hayes  Rep.  0.03  0.82  0.5014 
James Garfield  Rep.  0.0009  0.79  0.5799 
Grover Cleveland  Dem.  0.0057  0.78  0.5461 
Benjamin Harrison  Rep.  0.0083  0.79  0.58 
Grover Cleveland  Dem.  0.0301  0.75  0.6239 
William McKinley  Rep.  0.0431  0.79  0.6063 
William McKinley  Rep.  0.0612  0.73  0.6532 
Theodore Roosevelt  Rep.  0.1883  0.65  0.7059 
William Taft  Rep.  0.0853  0.65  0.6646 
Woodrow Wilson  Dem.  0.1444  0.59  0.8192 
Woodrow Wilson  Dem.  0.0312  0.62  0.5217 
Warren Harding  Rep.  0.2617  0.49  0.7608 
Calvin Coolidge  Rep.  0.2522  0.49  0.7194 
Herbert Hoover  Rep.  0.1741  0.57  0.8362 
Franklin Roosevelt  Dem.  0.1776  0.57  0.8889 
Franklin Roosevelt  Dem.  0.2426  0.61  0.9849 
Franklin Roosevelt  Dem.  0.0996  0.63  0.8456 
Franklin Roosevelt  Dem.  0.08  0.56  0.8136 
Harry Truman  Dem.  0.0448  0.53  0.5706 
Dwight Eisenhower  Rep.  0.1085  0.63  0.8324 
Dwight Eisenhower  Rep.  0.15  0.61  0.8606 
John Kennedy  Dem.  0.0017  0.6277  0.5642 
Lyndon Johnson  Dem.  0.2258  0.6192  0.9033 
Richard Nixon  Rep.  0.01  0.6084  0.5595 
Richard Nixon  Rep.  0.2315  0.5521  0.9665 
Jimmy Carter  Dem.  0.0206  0.5355  0.55 
Ronald Reagan  Rep.  0.0974  0.5256  0.9089 
Ronald Reagan  Rep.  0.1821  0.5311  0.9758 
George H. W. Bush  Rep.  0.0772  0.5015  0.7918 
Bill Clinton  Dem.  0.0556  0.5523  0.6877 
Bill Clinton  Dem.  0.0851  0.4908  0.7045 
George W. Bush  Rep.  0.0051  0.51  0.5037 
George W. Bush  Rep.  0.0246  0.5527  0.5316 
Barack Obama  Dem.  0.0727  0.5823  0.6784 
Barack Obama  Dem.  0.0386  0.5487  0.6171 
Clearly, the percentage of electoral college votes a candidate wins depends nonlinearly on the voter turnout percentage and popular vote margin (%), as this nonparametric regression shows:
We therefore chose to perform a nonlinear regression using neural networks, for which our structure was:
As it turns out, this simple neural network structure with one hidden layer gave the lowest test error, which was 0.002496419 in this case.
Now, looking at the most recent national polls for the upcoming election, we see that Hillary Clinton has a 6.1% lead in the popular vote. Our neural network model then predicts the following:
Simulation  Popular Vote Margin  Percentage of Voter Turnout  Predicted Percentage of Electoral College Votes (+/- 0.04996417) 
1  0.061  0.30  0.6607371 
2  0.061  0.35  0.6647464 
3  0.061  0.40  0.6687115 
4  0.061  0.45  0.6726314 
5  0.061  0.50  0.6765048 
6  0.061  0.55  0.6803307 
7  0.061  0.60  0.6841083 
8  0.061  0.65  0.6878366 
9  0.061  0.70  0.6915149 
10  0.061  0.75  0.6951424 
One sees that even for an extremely low voter turnout (30%), at this point Hillary Clinton can expect to win between 61.077% and 71.07013% of the Electoral College, or 328 to 382 electoral college votes. Therefore, what seems like a relatively small lead in the popular vote (6.1%) translates, according to this neural network model, into a large margin of victory in the Electoral College.
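The conversion from a predicted share to a range of electoral votes is simple arithmetic, sketched below for the 30% turnout row. The predicted share and the error band are the values from the table; the 538-vote total is the size of the Electoral College, and the function name is hypothetical.

```python
TOTAL_ELECTORAL_VOTES = 538  # size of the Electoral College

predicted_share = 0.6607371  # simulation 1 (30% turnout) from the table
half_width = 0.04996417      # the +/- value quoted in the table header


def interval_in_votes(share, margin_of_error, total=TOTAL_ELECTORAL_VOTES):
    """Return the (low, high) share interval and the same interval in whole votes."""
    low = share - margin_of_error
    high = share + margin_of_error
    # Truncate to whole electoral votes.
    return (low, high), (int(low * total), int(high * total))


shares, votes = interval_in_votes(predicted_share, half_width)
print(shares, votes)
```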
One can see that the predicted percentage of electoral college votes really depends on popular vote margin and voter turnout. For example, if we reduce the popular vote margin to 1%, the results are less promising for the leading candidate:
Pop. Vote Margin  Voter Turnout %  E.C. % Win  E.C. % Win Worst Case  E.C. % Win Best Case 
0.01  0.30  0.5182854  0.4675000  0.5690708 
0.01  0.35  0.5244157  0.4736303  0.5752011 
0.01  0.40  0.5305820  0.4797967  0.5813674 
0.01  0.45  0.5367790  0.4859937  0.5875644 
0.01  0.50  0.5430013  0.4922160  0.5937867 
0.01  0.55  0.5492434  0.4984580  0.6000287 
0.01  0.60  0.5554995  0.5047141  0.6062849 
0.01  0.65  0.5617642  0.5109788  0.6125496 
0.01  0.70  0.5680317  0.5172463  0.6188171 
0.01  0.75  0.5742963  0.5235109  0.6250817 
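One sees that if the popular vote margin is just 1% for the leading candidate, that candidate is not in the clear unless voter turnout reaches about 60%, the first tabulated level at which even the lower end of the predicted range exceeds 50% of the Electoral College.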
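The turnout level at which the leading candidate is "in the clear" can be read off the table programmatically. The rows below copy the turnout, the predicted share, and the smaller of the two bounds from the 1% margin table; the function name and the 50% cutoff for a majority are my own framing.

```python
# (voter turnout, predicted EC share, lower end of the predicted range)
# for a 1% popular vote margin, copied from the table.
ROWS = [
    (0.30, 0.5182854, 0.4675000),
    (0.35, 0.5244157, 0.4736303),
    (0.40, 0.5305820, 0.4797967),
    (0.45, 0.5367790, 0.4859937),
    (0.50, 0.5430013, 0.4922160),
    (0.55, 0.5492434, 0.4984580),
    (0.60, 0.5554995, 0.5047141),
    (0.65, 0.5617642, 0.5109788),
    (0.70, 0.5680317, 0.5172463),
    (0.75, 0.5742963, 0.5235109),
]


def min_turnout_for_majority(rows, cutoff=0.5):
    """Smallest tabulated turnout whose lower bound still exceeds the cutoff."""
    for turnout, _predicted, lower in sorted(rows):
        if lower > cutoff:
            return turnout
    return None  # no tabulated turnout is safe


safe_turnout = min_turnout_for_majority(ROWS)
print(safe_turnout)
```

Since turnout is only tabulated in 5-point steps, the true crossover lies somewhere between 55% and 60%.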