So, What’s Wrong with the Knicks?

By: Dr. Ikjyot Singh Kohli

As I write this post, the Knicks are 12th in the Eastern Conference with a record of 22-32. A plethora of people are offering their opinions on what is wrong with the Knicks and, since most of it comes from ESPN and the New York media, most of it is incorrect or useless. Here are some examples:

  1. The Bulls are following the Knicks’ blueprint for failure and …
  2. Spike Lee ‘still believes’ in Melo, says time for Phil Jackson to go
  3. 25 reasons being a New York Knicks fan is the most depressing …
  4. Carmelo Anthony needs to escape the Knicks
  5. Another Awful Week for Knicks

A while ago, I wrote this paper based on statistical learning that shows the common characteristics for NBA playoff teams. Basically, I obtained the following important result:


This classification tree shows, along with the arguments in the paper, that while the most important factor in teams making the playoffs tends to be the opponent’s number of assists per game, there are paths to the playoffs where teams are not necessarily strong in this area. Specifically, for the Knicks, as of today, we see that:

opp. Assists / game: 22.4 > 20.75, STL / game: 7.2 < 8.0061, TOV / game: 14.1 < 14.1585, DRB / game: 33.8 > 29.9024, opp. TOV / game: 13.0 < 13.1585.

So, one sees that what is keeping the Knicks out of the playoffs is specifically pressure defense: they are not forcing enough turnovers per game. Ironically, they are very close to the threshold, but not close enough.
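The comparisons above can be tabulated in a few lines. This is only a sketch that reproduces the inequalities quoted in the text; it does not encode the structure of the classification tree itself, and the helper name `describe` is made up for illustration.

```python
# Knicks per-game values and tree thresholds, as quoted in the text.
# Each entry: (statistic, Knicks value, threshold).
comparisons = [
    ("opp. AST/G", 22.4, 20.75),
    ("STL/G", 7.2, 8.0061),
    ("TOV/G", 14.1, 14.1585),
    ("DRB/G", 33.8, 29.9024),
    ("opp. TOV/G", 13.0, 13.1585),
]


def describe(stat, value, threshold):
    """Summarize where a value sits relative to its threshold."""
    relation = ">" if value > threshold else "<="
    gap = value - threshold
    return f"{stat}: {value} {relation} {threshold} (gap {gap:+.4f})"


for row in comparisons:
    print(describe(*row))

# Shortfall in forced turnovers per game relative to the threshold.
otov_gap = 13.1585 - 13.0
print(f"opp. TOV/G shortfall: {otov_gap:.4f}")
```

The last line makes the "very close to the threshold" remark concrete: the shortfall is roughly 0.16 forced turnovers per game.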

A probability density approximation of the Knicks’ Opp. TOV/G is as follows:



This PDF has the approximate functional form:

P(oTOV) =


Therefore, by computing:

\int_{A}^{\infty} P(oTOV) d(oTOV),



where Erfc is the complementary error function, and is given by:

erfc(z) = \frac{2}{\sqrt{\pi}} \int_{z}^{\infty} e^{-t^2} dt


Given that the threshold for playoff-bound teams is more than 13.1585 opp. TOV/game, setting A = 13 above, we obtain: 0.435. This means that the Knicks have roughly a 43.5% chance of forcing more than 13 TOV in any single game. Similarly, setting A = 14, one obtains: 0.3177. This means that the Knicks have roughly a 31.77% chance of forcing more than 14 TOV in any single game, and so forth.
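To illustrate how such a tail probability is evaluated with the complementary error function, here is a minimal sketch. It assumes, purely for illustration, that opp. TOV/G follows a single normal distribution; the fitted PDF is not reproduced here, and the values of `MU` and `SIGMA` are hypothetical placeholders, so the printed numbers are not expected to match the ones quoted above.

```python
import math


def tail_probability(a, mu, sigma):
    """P(X > a) for X ~ Normal(mu, sigma), expressed through erfc."""
    z = (a - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * math.erfc(z)


# Hypothetical parameters, NOT the fitted values behind the post's figures.
MU = 12.5
SIGMA = 3.2

for a in (13, 14):
    print(f"P(oTOV > {a}) = {tail_probability(a, MU, SIGMA):.4f}")
```

For a normal density the integral from A to infinity reduces to one erfc call, which is why Erfc appears in the closed-form result.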

Therefore, one concludes that while the Knicks’ problems are defensive, they are specifically related to pressure defense and forcing turnovers.



Basketball Machine Learning Paper Updated 

I have now made a significant update to my applied machine learning paper on predicting patterns among NBA playoff and championship teams, which can be accessed here: arXiv Link.

The Most Optimal Strategy for the Knicks

In a previous article, I showed how one could use data in combination with advanced probability techniques to determine the optimal shot / court positions for LeBron James. I decided to use this algorithm on the Knicks’ starting 5, and obtained the following joint probability density contour plots:

One sees that the Knicks’ offensive strategy is optimal if and only if players get shots as close to the basket as possible. If this is the case, the players have a high probability of making shots even if defenders are playing them tightly. This means that the Knicks would be served best by driving in the paint, posting up, and Porzingis NOT attempting a multitude of three-point shots.

By the way, a lot of people are convinced nowadays that someone like Porzingis attempting 3’s is a sign of a good offense, as it is an optimal way to space the floor. I am not convinced of this. Spacing the floor geometrically translates to a multi-objective nonlinear optimization problem. In particular, let (x_i, y_i) represent the (x-y)-coordinates of a player on the floor. Spreading the floor means one must maximize (simultaneously) each element of the following distance metric:


subject to -14 \leq x_i \leq 14, 0 \leq y_i \leq 23.75. While a player attempting 3-point shots may be one way to solve this problem, I am not convinced that it is the unique solution. In fact, I am convinced that there are multiple solutions to this optimization problem.

The problem becomes slightly simpler once one realizes that the metric above is symmetric, so that there are only 11 independent components.
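As a rough illustration of the spacing problem, the sketch below builds a symmetric matrix of pairwise distances for five players. It assumes the metric is the matrix of pairwise Euclidean distances (the matrix itself is not reproduced here), the player positions are hypothetical, and the single-number summary `min_spacing` is a simplification swapped in for the full multi-objective problem.

```python
import math
from itertools import combinations

# Hypothetical (x, y) positions in feet for five players, chosen only to
# satisfy -14 <= x <= 14 and 0 <= y <= 23.75.
positions = [(-12.0, 2.0), (12.0, 2.0), (-8.0, 20.0), (8.0, 20.0), (0.0, 10.0)]


def pairwise_distances(points):
    """Return the symmetric matrix of Euclidean distances between points."""
    n = len(points)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist = math.hypot(points[i][0] - points[j][0],
                          points[i][1] - points[j][1])
        d[i][j] = d[j][i] = dist  # symmetry: d_ij = d_ji
    return d


def min_spacing(points):
    """Smallest pairwise distance: one crude scalar measure of spacing."""
    d = pairwise_distances(points)
    return min(d[i][j] for i, j in combinations(range(len(points)), 2))


for row in pairwise_distances(positions):
    print(" ".join(f"{value:6.2f}" for value in row))
print("minimum spacing:", round(min_spacing(positions), 2))
```

Many different arrangements can give the same minimum spacing, which is consistent with the point that the optimization problem need not have a unique solution.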

Analyzing LeBron James’ Offensive Play

Where is LeBron James most effective on the court?

Based on the 2015-2016 season, we obtained the following data, which tracks LeBron’s FG% based on defender distance:


We then obtained data on LeBron’s FG% based on his shot distance from the basket:


Based on this data, we generated tens of thousands of sample data points to perform a Monte Carlo simulation to obtain relevant probability density functions. We found that the joint PDF was a very lengthy expression(!):


Graphically, this is:


A contour plot of the joint PDF was computed to be:


From this information, we can compute where and when LeBron has the highest probability of making a shot. Numerically, we found that the maximum probability occurs when LeBron’s defender is 0.829988 feet away, while LeBron is 1.59378 feet away from the basket. What is interesting is that this analysis shows that defending LeBron tightly doesn’t seem to be an effective strategy if his shot distance is within 5 feet of the basket. It is only an effective strategy further than 5 feet away from the basket. Therefore, opposing teams have the best chance of stopping LeBron from scoring by playing him tightly and forcing him as far away from the basket as possible.


The Relationship Between The Electoral College and Popular Vote

An interesting machine learning problem: Can one figure out the relationship between the popular vote margin, voter turnout, and the percentage of electoral college votes a candidate wins? Going back to the election of John Quincy Adams, the raw data looks like this:

Electoral College Winner | Party | Popular Vote Margin | Voter Turnout | Percentage of EC

John Quincy Adams D.-R. -0.1044 0.27 0.3218
Andrew Jackson Dem. 0.1225 0.58 0.68
Andrew Jackson Dem. 0.1781 0.55 0.7657
Martin Van Buren Dem. 0.14 0.58 0.5782
William Henry Harrison Whig 0.0605 0.80 0.7959
James Polk Dem. 0.0145 0.79 0.6182
Zachary Taylor Whig 0.0479 0.73 0.5621
Franklin Pierce Dem. 0.0695 0.70 0.8581
James Buchanan Dem. 0.12 0.79 0.5878
Abraham Lincoln Rep. 0.1013 0.81 0.5941
Abraham Lincoln Rep. 0.1008 0.74 0.9099
Ulysses Grant Rep. 0.0532 0.78 0.7279
Ulysses Grant Rep. 0.12 0.71 0.8195
Rutherford Hayes Rep. -0.03 0.82 0.5014
James Garfield Rep. 0.0009 0.79 0.5799
Grover Cleveland Dem. 0.0057 0.78 0.5461
Benjamin Harrison Rep. -0.0083 0.79 0.58
Grover Cleveland Dem. 0.0301 0.75 0.6239
William McKinley Rep. 0.0431 0.79 0.6063
William McKinley Rep. 0.0612 0.73 0.6532
Theodore Roosevelt Rep. 0.1883 0.65 0.7059
William Taft Rep. 0.0853 0.65 0.6646
Woodrow Wilson Dem. 0.1444 0.59 0.8192
Woodrow Wilson Dem. 0.0312 0.62 0.5217
Warren Harding Rep. 0.2617 0.49 0.7608
Calvin Coolidge Rep. 0.2522 0.49 0.7194
Herbert Hoover Rep. 0.1741 0.57 0.8362
Franklin Roosevelt Dem. 0.1776 0.57 0.8889
Franklin Roosevelt Dem. 0.2426 0.61 0.9849
Franklin Roosevelt Dem. 0.0996 0.63 0.8456
Franklin Roosevelt Dem. 0.08 0.56 0.8136
Harry Truman Dem. 0.0448 0.53 0.5706
Dwight Eisenhower Rep. 0.1085 0.63 0.8324
Dwight Eisenhower Rep. 0.15 0.61 0.8606
John Kennedy Dem. 0.0017 0.6277 0.5642
Lyndon Johnson Dem. 0.2258 0.6192 0.9033
Richard Nixon Rep. 0.01 0.6084 0.5595
Richard Nixon Rep. 0.2315 0.5521 0.9665
Jimmy Carter Dem. 0.0206 0.5355 0.55
Ronald Reagan Rep. 0.0974 0.5256 0.9089
Ronald Reagan Rep. 0.1821 0.5311 0.9758
George H. W. Bush Rep. 0.0772 0.5015 0.7918
Bill Clinton Dem. 0.0556 0.5523 0.6877
Bill Clinton Dem. 0.0851 0.4908 0.7045
George W. Bush Rep. -0.0051 0.51 0.5037
George W. Bush Rep. 0.0246 0.5527 0.5316
Barack Obama Dem. 0.0727 0.5823 0.6784
Barack Obama Dem. 0.0386 0.5487 0.6171

Clearly, the percentage of electoral college votes a candidate wins depends nonlinearly on the voter turnout percentage and popular vote margin (%), as this non-parametric regression shows:


We therefore chose to perform a nonlinear regression using neural networks, for which our structure was:


As it turns out, this simple neural network structure with one hidden layer gave the lowest test error, which was 0.002496419 in this case.

Now, looking at the most recent national polls for the upcoming election, we see that Hillary Clinton has a 6.1% lead in the popular vote. Our neural network model then predicts the following:

Simulation Popular Vote Margin Percentage of Voter Turnout Predicted Percentage of Electoral College Votes (+/- 0.04996417)
1 0.061 0.30 0.6607371
2 0.061 0.35 0.6647464
3 0.061 0.40 0.6687115
4 0.061 0.45 0.6726314
5 0.061 0.50 0.6765048
6 0.061 0.55 0.6803307
7 0.061 0.60 0.6841083
8 0.061 0.65 0.6878366
9 0.061 0.70 0.6915149
10 0.061 0.75 0.6951424

One sees that even for an extremely low voter turnout (30%), at this point Hillary Clinton can expect to win between 61.078% and 71.07013% of the Electoral College, or 328 to 382 electoral college votes. Therefore, what seems like a relatively small lead in the popular vote (6.1%) translates, according to this neural network model, into a large margin of victory in the electoral college.
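The conversion from a predicted share to electoral votes is simple arithmetic, sketched below. It assumes the current total of 538 electors and rounds down to whole votes; both choices are assumptions made here, and the function name `elector_range` is hypothetical.

```python
import math

TOTAL_ELECTORS = 538          # assumed total number of electoral votes
predicted_share = 0.6607371   # model prediction at 30% turnout (from the table)
error = 0.04996417            # +/- band quoted in the table header


def elector_range(share, err, total=TOTAL_ELECTORS):
    """Convert share +/- err into a (low, high) range of whole electoral votes."""
    low = math.floor((share - err) * total)
    high = math.floor((share + err) * total)
    return low, high


low_votes, high_votes = elector_range(predicted_share, error)
print(f"share range: {predicted_share - error:.5f} to {predicted_share + error:.5f}")
print(f"electoral votes: {low_votes} to {high_votes}")
```

Since 270 votes are a majority of 538, both ends of this range sit comfortably above the winning line.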

One can see that the predicted percentage of electoral college votes really depends on popular vote margin and voter turnout. For example, if we reduce the popular vote margin to 1%, the results are less promising for the leading candidate:

Pop. Vote Margin | Voter Turnout | E.C. % Win | E.C. % Win Worst Case | E.C. % Win Best Case
0.01 0.30 0.5182854 0.4675000 0.5690708
0.01 0.35 0.5244157 0.4736303 0.5752011
0.01 0.40 0.5305820 0.4797967 0.5813674
0.01 0.45 0.5367790 0.4859937 0.5875644
0.01 0.50 0.5430013 0.4922160 0.5937867
0.01 0.55 0.5492434 0.4984580 0.6000287
0.01 0.60 0.5554995 0.5047141 0.6062849
0.01 0.65 0.5617642 0.5109788 0.6125496
0.01 0.70 0.5680317 0.5172463 0.6188171
0.01 0.75 0.5742963 0.5235109 0.6250817

One sees that if the popular vote margin is just 1% for the leading candidate, that candidate is not in the clear unless voter turnout is roughly 60% or higher, since only then does even the lowest predicted electoral college share exceed 50%.
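The turnout level at which the leading candidate is safe can be read off the table programmatically. The sketch below copies the table rows and treats the smaller of the two bounds in each row as the pessimistic case; the function name `first_safe_turnout` is made up for illustration.

```python
# Rows copied from the table above:
# (voter turnout, predicted E.C. share, lower bound, upper bound)
rows = [
    (0.30, 0.5182854, 0.4675000, 0.5690708),
    (0.35, 0.5244157, 0.4736303, 0.5752011),
    (0.40, 0.5305820, 0.4797967, 0.5813674),
    (0.45, 0.5367790, 0.4859937, 0.5875644),
    (0.50, 0.5430013, 0.4922160, 0.5937867),
    (0.55, 0.5492434, 0.4984580, 0.6000287),
    (0.60, 0.5554995, 0.5047141, 0.6062849),
    (0.65, 0.5617642, 0.5109788, 0.6125496),
    (0.70, 0.5680317, 0.5172463, 0.6188171),
    (0.75, 0.5742963, 0.5235109, 0.6250817),
]


def first_safe_turnout(table, majority=0.5):
    """Lowest tabulated turnout whose lower bound still exceeds a majority."""
    for turnout, _predicted, lower, _upper in table:
        if lower > majority:
            return turnout
    return None


print("first safe turnout:", first_safe_turnout(rows))
```

Because the table only samples turnout in 5% steps, the true crossover lies somewhere between 55% and 60%.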


Breaking Down the 2015-2016 NBA Season

In this article, I will use Data Science / Machine Learning methodologies to break down the real factors separating the playoff from non-playoff teams. In particular, I used the data to associate 44 predictor variables with each team: "FG" "FGA" "FG." "X3P" "X3PA" "X3P." "X2P" "X2PA" "X2P." "FT" "FTA" "FT." "ORB" "DRB" "TRB" "AST" "STL" "BLK" "TOV" "PF" "PTS" "PS.G" "oFG" "oFGA" "oFG." "o3P" "o3PA" "o3P." "o2P" "o2PA" "o2P." "oFT" "oFTA" "oFT." "oORB" "oDRB" "oTRB" "oAST" "oSTL" "oBLK" "oTOV" "oPF" "oPTS" "oPS.G", where a letter 'o' before the last 22 predictor variables indicates a defensive variable ('o' stands for opponent).

Using principal components analysis (PCA), I was able to project this 44-dimensional data set to a 5-dimensional data set. That is, the first 5 principal components were found to explain 85% of the variance.
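The rule for choosing how many components to keep can be sketched as follows. The component variances in `example` are hypothetical numbers invented to illustrate the cumulative-variance rule; they are not the values obtained from the NBA data, and `components_needed` is an illustrative helper rather than part of any library.

```python
def components_needed(variances, target=0.85):
    """Smallest number of leading components whose variance share reaches target."""
    total = sum(variances)
    running = 0.0
    for k, variance in enumerate(sorted(variances, reverse=True), start=1):
        running += variance
        if running / total >= target:
            return k
    return len(variances)


# Hypothetical component variances (NOT taken from the actual analysis).
example = [14.0, 9.0, 6.5, 4.5, 3.5, 2.0, 1.5, 1.0, 1.0, 1.0]

k = components_needed(example)
share = sum(sorted(example, reverse=True)[:k]) / sum(example)
print(f"{k} components explain {share:.1%} of the variance")
```

With real data, the variances would come from the PCA itself, for example the squared standard deviations of the principal components.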

Here are the various biplots: 

In these plots, the teams are grouped according to whether they made the playoffs or not. 

One sees from this biplot of the first two principal components that the dominant component along the first PC is 3-point attempts, while the dominant component along the second PC is opponent points. CLE and TOR have a high negative score along the second PC, indicating a strong defensive performance. Indeed, one suspects that the final separating factor that led CLE to the championship was their defensive play as opposed to 3-point shooting, which all in all didn’t do GSW any favours. This is in line with some of my previous analyses.

Optimal Positions for NBA Players

I was thinking about how one can use the NBA’s new SportVU system to figure out optimal positions for players on the court. One of the interesting things about the SportVU system is that it tracks player (x,y) coordinates on the court. Presumably, it also keeps track of whether or not a player located at (x,y) makes a shot or misses it. Let us denote a player making a shot by 1, and a player missing a shot by 0. Then, one essentially will have data in the form (x,y, \text{1/0}).

One can then use a logistic regression to determine the probability that a player at position (x,y) will make a shot:

p(x,y) = \frac{\exp\left(\beta_0 + \beta_1 x + \beta_2 y\right)}{1 +\exp\left(\beta_0 + \beta_1 x + \beta_2 y\right)}

The main idea is that the parameters \beta_0, \beta_1, \beta_2 uniquely characterize a given player’s probability of making a shot.
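A minimal sketch of this model in code is shown below. The coefficient values are hypothetical and chosen only so that the probability falls with distance from the origin; in practice they would be estimated from tracking data.

```python
import math


def shot_probability(x, y, b0, b1, b2):
    """Logistic model p(x, y) = exp(z) / (1 + exp(z)) with z = b0 + b1*x + b2*y."""
    z = b0 + b1 * x + b2 * y
    return 1.0 / (1.0 + math.exp(-z))  # algebraically the same as exp(z)/(1+exp(z))


# Hypothetical coefficients for one player (illustrative only).
B0, B1, B2 = 0.5, -0.02, -0.03

for x, y in [(0.0, 0.0), (10.0, 10.0), (28.0, 47.0)]:
    print(f"p({x}, {y}) = {shot_probability(x, y, B0, B1, B2):.4f}")
```

Because the exponent is linear in x and y, the curves of constant probability are straight lines on the court.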

From an offensive perspective, let us say that we, as a coaching staff, wish to position players so that they have a very high probability of making a shot, say 99% for demonstration purposes. This means we must solve the following constrained problem:

\frac{\exp\left(\beta_0 + \beta_1 x + \beta_2 y\right)}{1 +\exp\left(\beta_0 + \beta_1 x + \beta_2 y\right)} = 0.99

\text{s.t. } 0 \leq x \leq 28, \quad 0 \leq y \leq 47

(The constraints are determined here by the x-y dimensions of a standard NBA court).
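The constant 4.59512 that appears in the solutions is the log-odds of the 99% target. Writing z = \beta_0 + \beta_1 x + \beta_2 y as shorthand (the symbol z is introduced here only for convenience), the equation can be inverted as follows:

```latex
\frac{e^{z}}{1 + e^{z}} = 0.99
\;\Longleftrightarrow\;
e^{z} = \frac{0.99}{1 - 0.99} = 99
\;\Longleftrightarrow\;
z = \beta_0 + \beta_1 x + \beta_2 y = \ln 99 \approx 4.59512
```

The problem therefore reduces to finding the part of the line \beta_0 + \beta_1 x + \beta_2 y = 4.59512 that lies inside the court.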

This has the following solutions:

x = \frac{4.59512 - \beta_0 - \beta_2 y}{\beta_1}, \quad \frac{4.59512 - \beta_0 - 28 \beta_1}{\beta_2} \leq y

with the following conditions:


One can also have:

x = \frac{4.59512 - \beta_0 - \beta_2 y}{\beta_1}, \quad y \leq 47

with the following conditions:


Another solution is:

x = \frac{4.59512 - \beta_0 - \beta_2 y}{\beta_1}

with the following conditions:


The fourth possible solution is:

x = \frac{4.59512 - \beta_0 - \beta_2 y}{\beta_1}

with the following conditions:


Note that, in practice, it is unlikely for a player to have a 99% probability of making a shot.

To put this example in more practical terms, I generated some random data (1000 points) for a player in terms of (x,y) coordinates and whether he made a shot from that location or not. The following scatter plot shows the result of this simulation:


In this plot, the red dots indicate a player has made a shot (a response of 1.0) from the (x,y) coordinates given, while a purple dot indicates a player has missed a shot from the (x,y) coordinates given (a response of 0.0).

Performing a logistic regression on this data, we obtain that \beta_0 = 0, \beta_1 = 0.00066876, \beta_2 = -0.00210949.

Using the equations above, we see that this player has a maximum probability of 58.7149 \% of making a shot from a location of (x,y) = (0,23), and a minimum probability of 38.45 \% of making a shot from a location of (x,y) = (28,0).