## So, What’s Wrong with the Knicks?

As I write this post, the Knicks are 12th in the Eastern Conference with a record of 22-32. Plenty of people are offering their opinions on what is wrong with the Knicks and, since most of that commentary comes from ESPN and the New York media, most of it is incorrect or useless. Here are some examples:

A while ago, I wrote this paper, based on statistical learning, that identifies the common characteristics of NBA playoff teams. Basically, I obtained the following important result:

This classification tree, along with the arguments in the paper, shows that while the most important factor in a team making the playoffs tends to be the opponent's number of assists per game, there are paths to the playoffs where teams are not necessarily strong in this area. Specifically, for the Knicks, as of today, we see that:

opp. Assists / game: 22.4 > 20.75, STL / game: 7.2 < 8.0061, TOV / game: 14.1 < 14.1585, DRB / game: 33.8 > 29.9024, opp. TOV / game: 13.0 < 13.1585.
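The comparisons above can be checked mechanically. The following is only a minimal sketch: the values and thresholds are copied from the line above, while the dictionary layout and the function names are my own illustrative choices and not part of the paper's code. It reports which side of each split the Knicks fall on, without assuming anything about the shape of the tree itself.

```python
# Knicks per-game values and split thresholds, copied from the text above.
# The dictionary layout and the function names are illustrative assumptions.
KNICKS_VS_THRESHOLDS = {
    "opp. AST/game": (22.4, 20.75),
    "STL/game": (7.2, 8.0061),
    "TOV/game": (14.1, 14.1585),
    "DRB/game": (33.8, 29.9024),
    "opp. TOV/game": (13.0, 13.1585),
}


def side_of_split(value, threshold):
    """Return which side of a tree split a value falls on."""
    return "above" if value > threshold else "at or below"


def compare_to_thresholds(stats):
    """Map each statistic name to the side of its threshold."""
    return {name: side_of_split(v, t) for name, (v, t) in stats.items()}


if __name__ == "__main__":
    for name, side in compare_to_thresholds(KNICKS_VS_THRESHOLDS).items():
        value, threshold = KNICKS_VS_THRESHOLDS[name]
        print(f"{name}: {value} is {side} {threshold}")
```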

So, one sees that what is keeping the Knicks out of the playoffs is specifically pressure defense: they are not forcing enough turnovers per game. Ironically, they are very close to the threshold, but not close enough.

A probability density approximation of the Knicks’ Opp. TOV/G is as follows:

This PDF has the approximate functional form:

P(oTOV) =

Therefore, by computing:

$\int_{A}^{\infty} P(oTOV) d(oTOV)$,

=

,

where Erfc is the complementary error function, and is given by:

$erfc(z) = \frac{2}{\sqrt{\pi}} \int_{z}^{\infty} e^{-t^2} dt$

Given that the threshold for playoff-bound teams is more than 13.1585 opp. TOV/game, setting A = 13 above, we obtain: 0.435. This means that the Knicks have roughly a 43.5% chance of forcing more than 13 TOV in any single game. Similarly, setting A = 14, one obtains: 0.3177. This means that the Knicks have roughly a 31.77% chance of forcing more than 14 TOV in any single game, and so forth.
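To make the tail-probability calculation concrete, here is a minimal sketch that assumes the density is Gaussian, which is consistent with the complementary error function appearing in the result but is my assumption. The values of `MU` and `SIGMA` below are placeholders for illustration only, not the fitted parameters behind the numbers quoted above.

```python
import math


def tail_probability(a, mu, sigma):
    """P(X > a) for X ~ Normal(mu, sigma), written with erfc.

    Assumes a Gaussian density; this is an assumption of the sketch,
    not a statement about the exact functional form used in the post.
    """
    return 0.5 * math.erfc((a - mu) / (sigma * math.sqrt(2.0)))


# Placeholder parameters (hypothetical, for illustration only).
MU, SIGMA = 12.5, 3.2

if __name__ == "__main__":
    for a in (13, 14):
        print(f"P(oTOV > {a}) = {tail_probability(a, MU, SIGMA):.4f}")
```

Raising the cutoff `A` can only shrink the tail, which is why the probability for 14 forced turnovers is lower than the one for 13.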

Therefore, one concludes that while the Knicks’ problems are defensive, they are specifically related to pressure defense and forcing turnovers.

By: Dr. Ikjyot Singh Kohli

## Optimal Positions for NBA Players

I was thinking about how one can use the NBA’s new SportVU system to figure out optimal positions for players on the court. One of the interesting things about the SportVU system is that it tracks player $(x,y)$ coordinates on the court. Presumably, it also keeps track of whether or not a player located at $(x,y)$ makes a shot or misses it. Let us denote a player making a shot by $1$, and a player missing a shot by $0$. Then, one essentially will have data in the form $(x,y, \text{1/0})$.

One can then use a logistic regression to determine the probability that a player at position $(x,y)$ will make a shot:

$p(x,y) = \frac{\exp\left(\beta_0 + \beta_1 x + \beta_2 y\right)}{1 +\exp\left(\beta_0 + \beta_1 x + \beta_2 y\right)}$

The main idea is that the parameters $\beta_0, \beta_1, \beta_2$ uniquely characterize a given player’s probability of making a shot.
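As a small illustration, the logistic model above can be evaluated directly. This is a sketch only: the function name and the coefficient values in the usage lines are hypothetical and do not come from any real player's data.

```python
import math


def shot_probability(x, y, beta0, beta1, beta2):
    """Logistic model p(x, y) = exp(z) / (1 + exp(z)), z = b0 + b1*x + b2*y.

    Written in the algebraically equivalent form 1 / (1 + exp(-z)).
    """
    z = beta0 + beta1 * x + beta2 * y
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # Hypothetical coefficients, chosen only to show the call.
    b0, b1, b2 = 0.5, -0.02, -0.03
    print(shot_probability(10.0, 5.0, b0, b1, b2))
```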

From an offensive perspective, suppose a coaching staff wishes to position players so that they have a very high probability of making a shot, say 99% for demonstration purposes. This means we must solve the following constrained problem:

$\frac{\exp\left(\beta_0 + \beta_1 x + \beta_2 y\right)}{1 +\exp\left(\beta_0 + \beta_1 x + \beta_2 y\right)} = 0.99$

$\text{s.t. } 0 \leq x \leq 28, \quad 0 \leq y \leq 47$

(The constraints are determined here by the x-y dimensions of a standard NBA court).

This has the following solutions:

$x = \frac{4.59512 - \beta_0 - \beta_2 y}{\beta_1}, \quad \frac{4.59512 - \beta_0 - 28 \beta_1}{\beta_2} \leq y$

with the following conditions:

One can also have:

$x = \frac{4.59512 - \beta_0 - \beta_2 y}{\beta_1}, \quad y \leq 47$

with the following conditions:

Another solution is:

$x = \frac{4.59512 - \beta_0 - \beta_2 y}{\beta_1}$

with the following conditions:

The fourth possible solution is:

$x = \frac{4.59512 - \beta_0 - \beta_2 y}{\beta_1}$

with the following conditions:

In practice, it should be noted that a player is very unlikely to have a 99% probability of making a shot.
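The constant 4.59512 in the solutions above is the log-odds of the 99% target, $\ln(0.99/0.01) = \ln 99$. Setting the linear predictor equal to the log-odds gives a straight line in $(x,y)$, which the sketch below solves for $x$ at a given $y$. The function names and the coefficients in the usage lines are hypothetical; the court bounds are the ones stated in the constraints above.

```python
import math


def logit(p):
    """Log-odds of a probability p with 0 < p < 1."""
    return math.log(p / (1.0 - p))


def x_on_level_set(y, p, beta0, beta1, beta2):
    """Solve beta0 + beta1*x + beta2*y = logit(p) for x (requires beta1 != 0)."""
    if beta1 == 0:
        raise ValueError("beta1 must be nonzero to solve for x")
    return (logit(p) - beta0 - beta2 * y) / beta1


def on_court(x, y, x_max=28.0, y_max=47.0):
    """Check the box constraints 0 <= x <= x_max and 0 <= y <= y_max."""
    return 0.0 <= x <= x_max and 0.0 <= y <= y_max


if __name__ == "__main__":
    # Hypothetical coefficients, for illustration only.
    b0, b1, b2 = 3.0, 0.1, -0.05
    y = 10.0
    x = x_on_level_set(y, 0.99, b0, b1, b2)
    print(x, on_court(x, y))
```

A point on the line is a usable position only if it also passes `on_court`, which is where the case-by-case conditions on the coefficients come from.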

To put this example in more practical terms, I generated some random data (1000 points) for a player in terms of $(x,y)$ coordinates and whether or not he made a shot from that location. The following scatter plot shows the result of this simulation:

In this plot, the red dots indicate a player has made a shot (a response of 1.0) from the $(x,y)$ coordinates given, while a purple dot indicates a player has missed a shot from the $(x,y)$ coordinates given (a response of 0.0).

Performing a logistic regression on this data, we obtain that $\beta_0 = 0, \beta_1 = 0.00066876, \beta_2 = -0.00210949$.

Using the equations above, we see that this player has a maximum probability of $58.7149 \%$ of making a shot from a location of $(x,y) = (0,23)$, and a minimum probability of $38.45 \%$ of making a shot from a location of $(x,y) = (28,0)$.

## The Mathematics of “Filling the Triangle”

I’ve been fascinated by the triangle offense for a long time. I think it is a beautiful way to play basketball, and the right way to play in the half-court: a “system-based” way to play. For those of you who are interested, I highly recommend Tex Winter’s classic book on the topic.

There is this brief video as well where Tex Winter explains how the triangle offense and basketball are grounded in geometric principles:

I don’t think people recognize, though, how deep a geometry problem this actually is. Looking at when the triangle is filled, as in the video above, we have the following situation:

The problem I wanted to study was this: given five players’ random positions on the court, can a system of equations be solved to yield the $(x,y)$ coordinates where the players should “go” to fill the triangle?

Using simple geometry, from the diagram above, we see that each player’s position in the triangle offense is governed by the following system of nonlinear equations:

$\left(x_4-x_2\right) \left(x_4-x_5\right)+\left(y_4-y_2\right) \left(y_4-y_5\right)=\cos (a) \sqrt{\left(x_2-x_4\right){}^2+\left(y_2-y_4\right){}^2} \sqrt{\left(x_4-x_5\right){}^2+\left(y_4-y_5\right){}^2}$

$\left(x_4-x_2\right) \left(x_5-x_2\right)+\left(y_4-y_2\right) \left(y_5-y_2\right)=\cos (b) \sqrt{\left(x_2-x_4\right){}^2+\left(y_2-y_4\right){}^2} \sqrt{\left(x_2-x_5\right){}^2+\left(y_2-y_5\right){}^2}$

$\left(x_2-x_5\right) \left(x_4-x_5\right)+\left(y_2-y_5\right) \left(y_4-y_5\right)=\cos (c) \sqrt{\left(x_2-x_5\right){}^2+\left(y_2-y_5\right){}^2} \sqrt{\left(x_4-x_5\right){}^2+\left(y_4-y_5\right){}^2}$

$\left(x_2-x_1\right) \left(x_2-x_5\right)+\left(y_2-y_1\right) \left(y_2-y_5\right)=\cos (d) \sqrt{\left(x_1-x_2\right){}^2+\left(y_1-y_2\right){}^2} \sqrt{\left(x_2-x_5\right){}^2+\left(y_2-y_5\right){}^2}$

$\left(x_2-x_1\right) \left(x_5-x_1\right)+\left(y_2-y_1\right) \left(y_5-y_1\right)=\cos (e) \sqrt{\left(x_1-x_2\right){}^2+\left(y_1-y_2\right){}^2} \sqrt{\left(x_1-x_5\right){}^2+\left(y_1-y_5\right){}^2}$

$\left(x_1-x_5\right) \left(x_2-x_5\right)+\left(y_1-y_5\right) \left(y_2-y_5\right)=\cos (f) \sqrt{\left(x_1-x_5\right){}^2+\left(y_1-y_5\right){}^2} \sqrt{\left(x_2-x_5\right){}^2+\left(y_2-y_5\right){}^2}$

$\left(x_1-x_3\right) \left(x_1-x_5\right)+\left(y_1-y_3\right) \left(y_1-y_5\right)=\cos (h) \sqrt{\left(x_1-x_3\right){}^2+\left(y_1-y_3\right){}^2} \sqrt{\left(x_1-x_5\right){}^2+\left(y_1-y_5\right){}^2}$

$\left(x_1-x_3\right) \left(x_5-x_3\right)+\left(y_1-y_3\right) \left(y_5-y_3\right)=\cos (i) \sqrt{\left(x_1-x_3\right){}^2+\left(y_1-y_3\right){}^2} \sqrt{\left(x_3-x_5\right){}^2+\left(y_3-y_5\right){}^2}$

$\left(x_1-x_5\right) \left(x_3-x_5\right)+\left(y_1-y_5\right) \left(y_3-y_5\right)=\cos (g) \sqrt{\left(x_1-x_5\right){}^2+\left(y_1-y_5\right){}^2} \sqrt{\left(x_3-x_5\right){}^2+\left(y_3-y_5\right){}^2}$

Further, the angles obviously must satisfy the following constraints:

$a + b + c = \pi, \quad d + e + f = \pi, \quad g + h + i = \pi$
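Each equation in the system is the dot-product relation $\vec{u} \cdot \vec{v} = \cos(\theta)\, |\vec{u}|\, |\vec{v}|$ applied at one vertex of a triangle, and the constraints say the three interior angles of each triangle sum to $\pi$. A minimal sketch of that relation follows; the function names and the sample coordinates are hypothetical and are not taken from the diagram.

```python
import math


def interior_angle(vertex, p, q):
    """Interior angle at `vertex` between the rays to p and q, via the dot product."""
    ux, uy = p[0] - vertex[0], p[1] - vertex[1]
    vx, vy = q[0] - vertex[0], q[1] - vertex[1]
    dot = ux * vx + uy * vy
    norms = math.hypot(ux, uy) * math.hypot(vx, vy)
    # Clamp to [-1, 1] to guard against floating-point round-off.
    return math.acos(max(-1.0, min(1.0, dot / norms)))


def triangle_angles(p1, p2, p3):
    """Return the three interior angles of the triangle p1-p2-p3."""
    return (
        interior_angle(p1, p2, p3),
        interior_angle(p2, p1, p3),
        interior_angle(p3, p1, p2),
    )


if __name__ == "__main__":
    # Hypothetical player coordinates in feet.
    angles = triangle_angles((0.0, 0.0), (17.0, 0.0), (8.0, 15.0))
    print(angles, sum(angles))
```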

Finally, we require that each player be about 15-20 feet apart in the triangle offense (because the offense is predicated on spacing), and thus have some additional constraints:

$15\leq \sqrt{\left(x_2-x_4\right){}^2+\left(y_2-y_4\right){}^2}\leq 20$

$15\leq \sqrt{\left(x_4-x_5\right){}^2+\left(y_4-y_5\right){}^2}\leq 20$

$15\leq \sqrt{\left(x_2-x_5\right){}^2+\left(y_2-y_5\right){}^2}\leq 20$

$15\leq \sqrt{\left(x_1-x_2\right){}^2+\left(y_1-y_2\right){}^2}\leq 20$

$15\leq \sqrt{\left(x_1-x_5\right){}^2+\left(y_1-y_5\right){}^2}\leq 20$

$15\leq \sqrt{\left(x_1-x_3\right){}^2+\left(y_1-y_3\right){}^2}\leq 20$

$15\leq \sqrt{\left(x_3-x_5\right){}^2+\left(y_3-y_5\right){}^2}\leq 20$

Solving this highly nonlinear system of equations with constraints is not a trivial problem! In fact, because of the high degree of nonlinearity and the dimension of the problem, it is safe to assume that no closed-form solution exists, and therefore the system must be solved numerically.

For this task, I used MATLAB, and experimented with the lsqnonlin() and fsolve() commands. The only issue is that (as with all such numerical algorithms) convergence depends very strongly on the choice of initial conditions. It is very difficult to choose this many initial conditions a priori, so I wrote a script that randomized the initial conditions. I then ran several numerical experiments and obtained the following results:

In the plot above, I have labeled the plots that converged to the triangle formation with the title “this one”. In addition, the five black circles denote the initial positions of the players on the court before they fill the triangles in the offense. One sees from the diagram above how difficult such a problem is to solve mathematically, even with a numerical approach. Running more trials would perhaps yield better results, but it works! I am truly fascinated by this. In the coming days, I will work on optimizing the numerical algorithm, and post my updates as they come.
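The idea of randomizing the initial conditions can be sketched in a few lines. What follows is a deliberately simplified stand-in and not the MATLAB script described above: it handles a single triangle of three players, asks for all three pairwise distances to equal an assumed target of 17.5 feet (the midpoint of the 15-20 foot range), and uses plain gradient descent in place of lsqnonlin() or fsolve(). All names, the step size, and the number of restarts are my own assumptions.

```python
import math
import random

TARGET = 17.5  # assumed spacing in feet (midpoint of the 15-20 ft range)
PAIRS = [(0, 1), (1, 2), (0, 2)]  # the three sides of one triangle


def residual(points):
    """Sum of squared deviations of the side lengths from TARGET."""
    return sum((math.dist(points[i], points[j]) - TARGET) ** 2 for i, j in PAIRS)


def local_solve(points, step=0.05, iters=2000):
    """Plain gradient descent on `residual` from a given starting layout."""
    pts = [list(p) for p in points]
    for _ in range(iters):
        grads = [[0.0, 0.0] for _ in pts]
        for i, j in PAIRS:
            dx, dy = pts[i][0] - pts[j][0], pts[i][1] - pts[j][1]
            d = max(math.hypot(dx, dy), 1e-12)  # avoid division by zero
            g = 2.0 * (d - TARGET) / d
            grads[i][0] += g * dx
            grads[i][1] += g * dy
            grads[j][0] -= g * dx
            grads[j][1] -= g * dy
        for p, g in zip(pts, grads):
            p[0] -= step * g[0]
            p[1] -= step * g[1]
    return pts


def multistart(n_starts=20, seed=0):
    """Run local_solve from random starting layouts and keep the best result."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_starts):
        start = [(rng.uniform(0, 28), rng.uniform(0, 47)) for _ in range(3)]
        sol = local_solve(start)
        r = residual(sol)
        if best is None or r < best[0]:
            best = (r, start, sol)
    return best


if __name__ == "__main__":
    r, start, sol = multistart()
    print("best residual:", r)
    print("start:", start)
    print("solution:", sol)
```

The full problem has five players, nine angle equations, and seven spacing constraints, so it is much less forgiving than this toy case, which is why so many random starts fail to converge to the triangle formation.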

Here is an animation of one of the scenarios above when the algorithm converges correctly:

In this animation above, the black dots represent the positions of the players on the court. They begin at initial (random) positions and attempt to fill the triangles as described above.

## Metrics for GSW vs. OKC Game 6 Second Half

Continuing with the live metrics employed yesterday, here is an analysis of the second half of the Warriors-Thunder Game 6.

Here is a plot of the various time series of relevant statistical variables:

One can see from this plot, for example, the exact point in time when OKC loses control of the game.

Further, here are the correlation coefficients of the variables above:

One sees there is a tremendously strong anti-correlation between OKC’s lead and GSW 3PT%, while there is a somewhat strong correlation between OKC’s lead and their 2PT%. This perhaps means that for Game 7, OKC’s 3PT defense needs to greatly improve along with maintaining their 2PT%, which, as can be seen from the plot above, dropped off towards the end of the game.
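For reference, a correlation coefficient between two of these time series can be computed as in the sketch below, assuming the Pearson coefficient is the one intended. The two short series in the usage lines are made-up numbers for illustration only and are not the Game 6 data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Made-up sample values, NOT the actual game data.
    lead = [8, 7, 5, 4, 1, -2]
    opp_3pt_pct = [30.0, 32.0, 35.0, 36.0, 40.0, 44.0]
    print(pearson(lead, opp_3pt_pct))
```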

## Live Metrics for NBA Games

Yesterday for the first time, I took the playoff game between Cleveland and Toronto as an opportunity to test out a script I wrote in R that keeps track of key statistics during a game in real time (well, every 30 seconds). Based on previous work, it is evident that championship-calibre teams are the ones that have excellent 2PT-FG% and the ability to draw fouls, so I tracked these during the game, and I came up with the following plot of several time series:

One sees, for example, that while Toronto started the game with a much higher 2PT FG%, Cleveland ended up winning that battle towards the end.
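To show how one of these series is built, here is a minimal sketch in Python (the script itself was written in R). It assumes each 30-second poll yields cumulative made and attempted 2-point field goals; the tuple layout, the function name, and the sample numbers are all hypothetical.

```python
def two_point_pct_series(snapshots):
    """Turn cumulative (seconds, made_2pt, attempted_2pt) polls into a 2PT FG% series.

    Returns a list of (seconds, pct) pairs; pct is None until a shot is attempted.
    """
    series = []
    for seconds, made, attempted in snapshots:
        pct = 100.0 * made / attempted if attempted > 0 else None
        series.append((seconds, pct))
    return series


if __name__ == "__main__":
    # Hypothetical polls taken every 30 seconds.
    polls = [(0, 0, 0), (30, 1, 2), (60, 1, 3), (90, 3, 5)]
    for seconds, pct in two_point_pct_series(polls):
        print(seconds, pct)
```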

A video of this animation is as follows (set the YouTube player to 1080p + FullScreen for Max Quality!)

An interesting question to ask is how are these series correlated? Well, let’s see:

One sees immediately from the correlation plot above that there is a very strong correlation between Cleveland’s point difference and Toronto’s personal fouls, with some strong correlations attributed to Cleveland’s 2-Point FG% as well. The opposite is true for Toronto’s point difference. It seems that during a playoff game of this intensity, drawing fouls, combined with 2-Point field goal percentage, is a very important factor in determining which team leads and eventually wins the game.