The “Interference” of Phil Jackson

By: Dr. Ikjyot Singh Kohli

So, I came across this article today by Matt Moore on CBSSports, who has once again taken to the web to bash the Triangle Offense. Of course, much of what he claims (like much of the Knicks media) is flat-out wrong and based on very primitive and simplistic analysis, and I will point it out below. Further, much of this article seems to be motivated by several comments Carmelo Anthony made recently expressing his dismay at Jeff Hornacek moving away from the “high-paced” offense that the Knicks were running before the All-Star break:

“I think everybody was trying to figure everything out, what was going to work, what wasn’t going to work,” Anthony said in the locker room at the former Delta Center. “Early in the season, we were winning games, went on a little winning streak we had. We were playing a certain way. We went away from that, started playing another way. Everybody was trying to figure out: Should we go back to the way we were playing, or try to do something different?”

Anthony suggested he liked the Hornacek way.

“I thought earlier we were playing faster and more free-flow throughout the course of the game,” Anthony said. “We kind of slowed down, started settling it down. Not as fast. The pace slowed down for us, something we had to make an adjustment on the fly with limited practice time, in the course of a game. Once you get into the season, it’s hard to readjust a whole system.”

First, it is well-known that the Knicks have been implementing more of the triangle offense since the All-Star break. All-Star Weekend was Feb 17-19, 2017. The Knicks’ record before All-Star weekend was, amusingly, 23-34, which is 11 games below .500. This is mentioned nowhere in any of these articles, and is also not mentioned (realized?) by Carmelo.

Anyhow, the question is as follows. If Hornacek had been allowed to continue his non-triangle ways of pushing the ball at a higher pace (what Carmelo claims he liked), would the Knicks have made the playoffs? Probably not. I claim this to be the case based on a detailed machine-learning-based analysis of playoff-eligible teams that has been available for some time now. In fact, what is perhaps of most importance from this paper is the following classification tree that determines whether a team is playoff-eligible or not:

img_4304

So, these are the relevant factors in determining whether or not a team in a given season makes the playoffs. (Please see the paper linked above for details on the justification of these results.)

Here are these predictor variables for the Knicks up to the All-Star break:

  1. Opponent Assists/Game: 22.44
  2. Steals/Game: 7.26
  3. TOV/Game: 13.53
  4. DRB/Game: 33.65
  5. Opp.TOV/Game: 12.46

Since Opp.TOV/Game = 12.46 < 13.16, the Knicks would actually be predicted to miss the NBA Playoffs. In fact, if current trends were allowed to continue, the so-called “Hornacek trends”, one can compute the probability of the Knicks making the playoffs:

knickspdfoTOV1

From this probability density function, we can calculate that the probability of the Knicks making the playoffs was 36.84%. The classification tree also predicted that the Knicks would miss the playoffs. So, what is being missed by Carmelo, Matt Moore, and the like is the complete lack of pressure defense and, hence, the insufficient number of opponent TOV/G. So, it is completely incorrect to claim that the Knicks were somehow “destined for glory” under Hornacek’s way of doing things. This is exacerbated by the fact that the Knicks’ opponent AST/G pre-All-Star break was already pretty high at 22.44.

The question now is: how have the Knicks been doing since Phil Jackson’s supposed interference and since supposedly implementing the triangle in a more complete sense? (On a side note, I still don’t think you can partially implement the triangle; I think it needs a proper off-season implementation, as it is a complete system.)

Interestingly enough, the Knicks opponent assists per game (which, according to the machine learning analysis is the most relevant factor in determining whether a team makes the playoffs) from All-Star weekend to the present-day is an impressive 20.642/Game. By the classification tree above, this actually puts the Knicks safely in playoff territory, in the sense of being classified as a playoff team, but it is too little, too late.

The defense has actually improved significantly with respect to the key relevant statistic of opponent AST/G since the Knicks started to implement the triangle more completely. (Note that, as will be shown in a future article, DRTG and ORTG are largely useless statistics in determining a team’s playoff eligibility, another point completely missed in Moore’s article.)

The problem is that it is obviously too little, too late at this point. I would argue based on this analysis, that Phil Jackson should have actually interfered earlier in the season. In fact, if the Knicks keep their opponent Assists/game below 20.75/game next season (which is now very likely, if current trends continue), the Knicks would be predicted to make the playoffs by the above machine learning analysis. 

Finally, I will just make this point. It is interesting to look at Phil Jackson teams that were not filled/packed with dominant players. As the saying goes, unfortunately, “Phil Jackson’s success had nothing to do with the triangle, but, because he had Shaq/Kobe, Jordan/Pippen, etc… ”

Well, let’s first look at the 1994-1995 Chicago Bulls, a team that did not have Michael Jordan, but ran the triangle offense completely. Per the relevant statistics above:

  1. Opp. AST/G = 20.9
  2. STL/G = 9.7
  3. AST/G = 24.0
  4. Opp. TOV/G = 18.1

These are remarkable defensive numbers, which support Phil’s idea that the triangle offense leads to good defense.

 

 

So, What’s Wrong with the Knicks?

By: Dr. Ikjyot Singh Kohli

As I write this post, the Knicks are currently 12th in the Eastern Conference with a record of 22-32. A plethora of people are offering their opinions on what is wrong with the Knicks, and of course, most of it coming from ESPN and the New York media, most of it is incorrect/useless. Here are some examples:

  1. The Bulls are following the Knicks’ blueprint for failure and …
  2. Spike Lee ‘still believes’ in Melo, says time for Phil Jackson to go
  3. 25 reasons being a New York Knicks fan is the most depressing …
  4. Carmelo Anthony needs to escape the Knicks
  5. Another Awful Week for Knicks

A while ago, I wrote this paper based on statistical learning that shows the common characteristics for NBA playoff teams. Basically, I obtained the following important result:

img_4304

This classification tree shows, along with arguments in the paper, that while the most important factor in teams making the playoffs tends to be the opponent number of assists per game, there are paths to the playoffs where teams are not necessarily strong in this area. Specifically, for the Knicks, as of today, we see that:

opp. Assists/game: 22.4 > 20.75, STL/game: 7.2 < 8.0061, TOV/game: 14.1 < 14.1585, DRB/game: 33.8 > 29.9024, opp. TOV/game: 13.0 < 13.1585.

So, one sees that what is keeping the Knicks out of the playoffs is specifically pressure defense, in that, they are not forcing enough turnovers per game. Ironically, they are very close to the threshold, but, it is not enough.
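As a rough illustration of how the comparisons above translate into a prediction, here is a minimal Python sketch. It encodes only the two branches of the tree that are described in the text (the opponent-assists branch and the path the Knicks sit on); the full tree lives in the figure and the paper, so the function name `predict_playoffs` and the "unknown" fall-through are my own assumptions, not the actual model.

```python
from typing import Optional

# Split thresholds quoted in the text (all per-game values).
OPP_AST_SPLIT = 20.75
STL_SPLIT = 8.0061
TOV_SPLIT = 14.1585
DRB_SPLIT = 29.9024
OPP_TOV_SPLIT = 13.1585


def predict_playoffs(opp_ast: float, stl: float, tov: float,
                     drb: float, opp_tov: float) -> Optional[bool]:
    """Return True/False on the branches described in the text, else None."""
    # Branch 1: holding opponents under the assist threshold.
    if opp_ast < OPP_AST_SPLIT:
        return True
    # Branch 2: the path the Knicks are on, decided by forced turnovers.
    if stl < STL_SPLIT and tov < TOV_SPLIT and drb > DRB_SPLIT:
        return opp_tov > OPP_TOV_SPLIT
    # Remaining branches are not spelled out in the text (assumption: unknown).
    return None


knicks = predict_playoffs(opp_ast=22.4, stl=7.2, tov=14.1, drb=33.8, opp_tov=13.0)
print(knicks)  # False
```

Raising only `opp_tov` above 13.1585 in this call flips the result to True, which is the sense in which the Knicks are "very close to the threshold".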

A probability density approximation of the Knicks’ Opp. TOV/G is as follows:

tovpgameplot1

 

This PDF has the approximate functional form:

P(oTOV) =

knicksotovg

Therefore, by computing:

\int_{A}^{\infty} P(oTOV) d(oTOV),

=

knicksotoverfc,

where Erfc is the complementary error function, and is given by:

erfc(z) = \frac{2}{\sqrt{\pi}} \int_{z}^{\infty} e^{-t^2} dt

 

Given that the threshold for playoff-bound teams is more than 13.1585 opp. TOV/game, setting A = 13 above, we obtain: 0.435. This means that the Knicks have roughly a 43.5% chance of forcing more than 13 TOV in any single game. Similarly, setting A = 14, one obtains: 0.3177. This means that the Knicks have roughly a 31.77% chance of forcing more than 14 TOV in any single game, and so forth.
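For readers who want to reproduce this kind of tail probability, here is a small Python sketch. It assumes the density is Gaussian with mean mu and standard deviation sigma, in which case the integral from A to infinity equals (1/2) erfc((A - mu) / (sigma sqrt(2))). The values of `MU` and `SIGMA` below are placeholders chosen purely for illustration; they are not the fitted parameters behind the numbers quoted above.

```python
import math


def tail_probability(a: float, mu: float, sigma: float) -> float:
    """P(X > a) for X ~ Normal(mu, sigma^2), written with the complementary error function."""
    return 0.5 * math.erfc((a - mu) / (sigma * math.sqrt(2.0)))


# Placeholder parameters (assumption, not the fitted values).
MU, SIGMA = 12.5, 3.2

for a in (13.0, 14.0):
    print(f"P(oTOV > {a}) = {tail_probability(a, MU, SIGMA):.4f}")
```

Whatever the fitted parameters are, the probability must fall as A increases, which matches the drop from A = 13 to A = 14 reported above.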

Therefore, one concludes that while the Knicks’ problems are defensive in nature, they are specifically related to pressure defense and forcing turnovers.

 

By: Dr. Ikjyot Singh Kohli

The Most Optimal Strategy for the Knicks

In a previous article, I showed how one could use data in combination with advanced probability techniques to determine the optimal shot / court positions for LeBron James. I decided to use this algorithm on the Knicks’ starting 5, and obtained the following joint probability density contour plots:

One sees that the Knicks’ offensive strategy is optimal if and only if players get shots as close to the basket as possible. If this is the case, the players have a high probability of making shots even if defenders are playing them tightly. This means that the Knicks would be served best by driving in the paint, posting up, and Porzingis NOT attempting a multitude of three-point shots.

By the way, a lot of people are convinced nowadays that someone like Porzingis attempting 3’s is a sign of a good offense, as it is an optimal way to space the floor. I am not convinced of this. Spacing the floor geometrically translates to a multi-objective nonlinear optimization problem. In particular, let (x_i, y_i) represent the (x-y)-coordinates of a player on the floor. Spreading the floor means one must maximize (simultaneously) each element of the following distance matrix:

distancematrix

subject to -14 \leq x_i \leq 14, 0 \leq y_i \leq 23.75. While a player attempting 3-point shots may be one way to solve this problem, I am not convinced that it is a unique solution to this optimization problem. In fact, I am convinced that there are multiple solutions to this optimization problem.

This problem is slightly simpler if one realizes that the matrix above is symmetric with a zero diagonal, so that for five players there are only 10 independent components.
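To make the spacing problem concrete, here is a minimal Python sketch that builds the pairwise distances for five players and checks the box constraints quoted above. The function names, the two sample layouts, and the choice of "smallest pairwise distance" as a single summary score are all assumptions of mine for illustration; the multi-objective problem itself does not prescribe any particular scalarization.

```python
import math
from itertools import combinations
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]


def pairwise_distances(players: Sequence[Point]) -> Dict[Tuple[int, int], float]:
    """Distance for every unordered pair (i, j) with i < j."""
    return {(i, j): math.dist(players[i], players[j])
            for i, j in combinations(range(len(players)), 2)}


def in_bounds(players: Sequence[Point]) -> bool:
    """Box constraints from the text: -14 <= x <= 14, 0 <= y <= 23.75."""
    return all(-14.0 <= x <= 14.0 and 0.0 <= y <= 23.75 for x, y in players)


def spacing_score(players: Sequence[Point]) -> float:
    """One possible scalarization (assumption): the smallest pairwise distance."""
    return min(pairwise_distances(players).values())


# Two hypothetical layouts that both satisfy the constraints.
layout_a = [(-14.0, 0.0), (14.0, 0.0), (-14.0, 23.75), (14.0, 23.75), (0.0, 11.875)]
layout_b = [(-14.0, 0.0), (14.0, 0.0), (0.0, 23.75), (-14.0, 20.0), (14.0, 20.0)]

for name, layout in (("a", layout_a), ("b", layout_b)):
    print(name, in_bounds(layout), round(spacing_score(layout), 2))
```

Different layouts can trade one pairwise distance against another, which is why a single "best" answer should not be expected from a problem of this kind.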

The Mathematics of “Filling the Triangle”

I’ve been fascinated by the triangle offense for a long time. I think it is a beautiful way to play basketball, and the right way to play basketball, in the half-court, a “system-based” way to play. For those of you that are interested, I highly recommend Tex Winter’s classic book on the topic.

There is also this brief video in which Tex Winter explains how the triangle offense and basketball are grounded in geometric principles:

 

I don’t think people recognize, though, how deep a geometry problem this actually is. Looking at when the triangle is filled, as in the video above, we have the following situation:

trianglesetup
The 3 triangles that form when one triangle is filled involving all 5 players. The letters a,b,c,d,e,f,g,h,i denote the angles within the triangles. We are assuming NBA court dimensions where the 1/2 court is 47′ long and the team bench area which roughly corresponds to the top of the three-point line is 28′ from the baseline.

The problem I wanted to study was this: given 5 players’ random positions on the court, could a system of equations be solved to yield the (x,y) coordinates where players should “go” to fill the triangle?

Using simple geometry, from the diagram above, we see that each player’s position in the triangle offense is governed by the following system of nonlinear equations:

\left(x_4-x_2\right) \left(x_4-x_5\right)+\left(y_4-y_2\right) \left(y_4-y_5\right)=\cos (a) \sqrt{\left(x_2-x_4\right){}^2+\left(y_2-y_4\right){}^2} \sqrt{\left(x_4-x_5\right){}^2+\left(y_4-y_5\right){}^2}

\left(x_4-x_2\right) \left(x_5-x_2\right)+\left(y_4-y_2\right) \left(y_5-y_2\right)=\cos (b) \sqrt{\left(x_2-x_4\right){}^2+\left(y_2-y_4\right){}^2} \sqrt{\left(x_2-x_5\right){}^2+\left(y_2-y_5\right){}^2}

\left(x_2-x_5\right) \left(x_4-x_5\right)+\left(y_2-y_5\right) \left(y_4-y_5\right)=\cos (c) \sqrt{\left(x_2-x_5\right){}^2+\left(y_2-y_5\right){}^2} \sqrt{\left(x_4-x_5\right){}^2+\left(y_4-y_5\right){}^2}

\left(x_2-x_1\right) \left(x_2-x_5\right)+\left(y_2-y_1\right) \left(y_2-y_5\right)=\cos (d) \sqrt{\left(x_1-x_2\right){}^2+\left(y_1-y_2\right){}^2} \sqrt{\left(x_2-x_5\right){}^2+\left(y_2-y_5\right){}^2}

\left(x_2-x_1\right) \left(x_5-x_1\right)+\left(y_2-y_1\right) \left(y_5-y_1\right)=\cos (e) \sqrt{\left(x_1-x_2\right){}^2+\left(y_1-y_2\right){}^2} \sqrt{\left(x_1-x_5\right){}^2+\left(y_1-y_5\right){}^2}

\left(x_1-x_5\right) \left(x_2-x_5\right)+\left(y_1-y_5\right) \left(y_2-y_5\right)=\cos (f) \sqrt{\left(x_1-x_5\right){}^2+\left(y_1-y_5\right){}^2} \sqrt{\left(x_2-x_5\right){}^2+\left(y_2-y_5\right){}^2}

\left(x_1-x_3\right) \left(x_1-x_5\right)+\left(y_1-y_3\right) \left(y_1-y_5\right)=\cos (h) \sqrt{\left(x_1-x_3\right){}^2+\left(y_1-y_3\right){}^2} \sqrt{\left(x_1-x_5\right){}^2+\left(y_1-y_5\right){}^2}

\left(x_1-x_3\right) \left(x_5-x_3\right)+\left(y_1-y_3\right) \left(y_5-y_3\right)=\cos (i) \sqrt{\left(x_1-x_3\right){}^2+\left(y_1-y_3\right){}^2} \sqrt{\left(x_3-x_5\right){}^2+\left(y_3-y_5\right){}^2}

\left(x_1-x_5\right) \left(x_3-x_5\right)+\left(y_1-y_5\right) \left(y_3-y_5\right)=\cos (g) \sqrt{\left(x_1-x_5\right){}^2+\left(y_1-y_5\right){}^2} \sqrt{\left(x_3-x_5\right){}^2+\left(y_3-y_5\right){}^2}

Further, the angles obviously must satisfy the following constraints:

a + b + c = \pi, \quad d + e + f = \pi, \quad g + h + i = \pi

Finally, we require that each player be about 15-20 feet apart in the triangle offense (because the offense is predicated on spacing), and thus have some additional constraints:

15\leq \sqrt{\left(x_2-x_4\right){}^2+\left(y_2-y_4\right){}^2}\leq 20

15\leq \sqrt{\left(x_4-x_5\right){}^2+\left(y_4-y_5\right){}^2}\leq 20

15\leq \sqrt{\left(x_2-x_5\right){}^2+\left(y_2-y_5\right){}^2}\leq 20

15\leq \sqrt{\left(x_1-x_2\right){}^2+\left(y_1-y_2\right){}^2}\leq 20

15\leq \sqrt{\left(x_1-x_5\right){}^2+\left(y_1-y_5\right){}^2}\leq 20

15\leq \sqrt{\left(x_1-x_3\right){}^2+\left(y_1-y_3\right){}^2}\leq 20

15\leq \sqrt{\left(x_3-x_5\right){}^2+\left(y_3-y_5\right){}^2}\leq 20
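To check whether a candidate set of coordinates satisfies these conditions, one can evaluate the interior angles and the spacing constraints directly. The Python sketch below does this under my own reading of the equations: angle a sits at player 4, b and d at player 2, e and h at player 1, i at player 3, and c, f, g at player 5. The layout at the bottom is a made-up configuration used only to exercise the code, not an actual triangle-offense alignment.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]

# Angle label -> (vertex, neighbour, neighbour), read off the equations above.
ANGLE_VERTICES = {
    "a": (4, 2, 5), "b": (2, 4, 5), "c": (5, 2, 4),
    "d": (2, 1, 5), "e": (1, 2, 5), "f": (5, 1, 2),
    "g": (5, 1, 3), "h": (1, 3, 5), "i": (3, 1, 5),
}
# Player pairs that appear in the 15-20 ft spacing constraints.
SPACED_PAIRS = [(2, 4), (4, 5), (2, 5), (1, 2), (1, 5), (1, 3), (3, 5)]


def interior_angle(p: Dict[int, Point], vertex: int, u: int, v: int) -> float:
    """Angle at `vertex` between the rays toward players u and v, in radians."""
    ax, ay = p[u][0] - p[vertex][0], p[u][1] - p[vertex][1]
    bx, by = p[v][0] - p[vertex][0], p[v][1] - p[vertex][1]
    cos_t = (ax * bx + ay * by) / (math.hypot(ax, ay) * math.hypot(bx, by))
    return math.acos(max(-1.0, min(1.0, cos_t)))  # clamp against round-off


def check_layout(p: Dict[int, Point]) -> Tuple[Dict[str, float], bool]:
    """Return all nine angles and whether every spacing constraint holds."""
    angles = {name: interior_angle(p, *idx) for name, idx in ANGLE_VERTICES.items()}
    spacing_ok = all(15.0 <= math.dist(p[i], p[j]) <= 20.0 for i, j in SPACED_PAIRS)
    return angles, spacing_ok


# Hypothetical coordinates (feet), keyed by player number.
layout = {1: (8.0, 0.0), 2: (16.0, 14.0), 3: (-8.0, 0.0), 4: (8.0, 28.0), 5: (0.0, 14.0)}
angles, spacing_ok = check_layout(layout)
print("spacing satisfied:", spacing_ok)
print("a + b + c =", angles["a"] + angles["b"] + angles["c"])
```

Because the angles are computed from the coordinates, each triple automatically sums to pi for any non-degenerate triangle; the real difficulty is in hitting prescribed angle values and the spacing band at the same time.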

Solving this highly nonlinear system of equations with constraints is not a trivial problem! In fact, because of the high degree of nonlinearity and the dimension of the problem, it is safe to assume that no closed-form solution exists, and therefore the system must be solved numerically.

For this task, we used MATLAB, and experimented with the lsqnonlin() and fsolve() commands. The only issue is that (as with all such numerical algorithms) convergence depends very highly on the choice of initial conditions. It is very difficult to choose a priori this many initial conditions, so I wrote a script that randomized initial conditions. I then ran several numerical experiments and obtained the following results:

In the plot above, I have labeled the plots that converged to the triangle formation with the title “this one”. In addition, the five black circles denote the initial positions of the players on the court before they fill the triangles in the offense. One sees just by the diagram above, how difficult such a problem is to solve mathematically, even through a numerical approach. Running more trials would perhaps yield better results, but, it works! I am truly fascinated by this. In the coming days, I will work on optimizing the numerical algorithm, and post my updates as they come.
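The randomized-initial-condition idea can be sketched in a few lines. The Python below is not the MATLAB script used for the experiments above; it is a generic random-restart loop in which `solve` is a hypothetical stand-in for whatever nonlinear solver is used. The court bounds (50 ft wide, 47 ft deep half court) and the toy solver, which simply scores a guess by how far the first two players are from the 15-20 ft spacing band, are assumptions made so the example runs on its own.

```python
import math
import random
from typing import Callable, List, Tuple

Solver = Callable[[List[float]], Tuple[List[float], float]]


def random_restarts(solve: Solver, n_trials: int = 50,
                    tol: float = 1e-6, seed: int = 0) -> Tuple[List[float], float]:
    """Run `solve` from random starting positions and keep the best result."""
    rng = random.Random(seed)
    best_x: List[float] = []
    best_res = math.inf
    for _ in range(n_trials):
        # Flat vector [x1, y1, ..., x5, y5] of random half-court positions.
        x0: List[float] = []
        for _player in range(5):
            x0.append(rng.uniform(-25.0, 25.0))  # assumed sideline-to-sideline range
            x0.append(rng.uniform(0.0, 47.0))    # assumed baseline-to-midcourt range
        x, res = solve(x0)
        if res < best_res:
            best_x, best_res = x, res
        if best_res <= tol:
            break  # converged, no need for further trials
    return best_x, best_res


def toy_solver(x0: List[float]) -> Tuple[List[float], float]:
    """Placeholder solver: returns the guess and a single spacing residual."""
    d = math.dist(x0[0:2], x0[2:4])
    return x0, max(0.0, 15.0 - d, d - 20.0)


best_x, best_res = random_restarts(toy_solver, n_trials=200, seed=1)
print(len(best_x), best_res)
```

In practice `solve` would wrap a least-squares or root-finding routine applied to the full residual vector, and the number of trials controls how often a run happens to land near the triangle formation.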

Here is an animation of one of the scenarios above when the algorithm converges correctly:

In this animation above, the black dots represent the positions of the players on the court. They begin at initial (random) positions and attempt to fill the triangles as described above.

Thanks for reading!

How close were The Knicks to making the Playoffs?

It is another New York Knicks season where fans have to wait until next year to see if the Knicks will make the playoffs or not.

Yesterday, there was a lot of buzz around the idea that Phil Jackson may want to keep Kurt Rambis on as head coach, and as usual, there were numerous people who were very vocal in their criticism.

However, in actuality, the Knicks were much closer to the playoffs than people realize. A previous post of mine described in detail using data science methodologies the criteria a team must meet to have a high probability of making the playoffs. 

Using the decision tree generated in that post, I evaluated the Knicks’ playoff chances this season based on possible playoff criteria scenarios, and found the following:

knicksplayoffs

One sees that a big problem was the Knicks margin of victory, which was too negative. However, even in this case, there are possibilities that existed that would have allowed the Knicks to make the playoffs. For example, a slight increase in the Knicks’ opponent’s field goal attempts or a very slight decrease in the Knicks’ field goal attempts per game would have greatly impacted their playoff chances.

These metrics can easily be adjusted for the upcoming season which will likely require a more organized execution of the triangle offense and discipline on both ends of the floor. They really are almost there!

Stephen Curry and Mahmoud Abdul-Rauf?

As usual, Phil Jackson made another interesting tweet today:

And, as usual, he received much criticism from “experts” who just looked at the raw numbers for each player and concluded that there is no way such a statement is justified. But it is not that simple!

When you compare two players (or two objects) who have very different data feature values, it is not that they can’t be compared; rather, you must effectively normalize the data somehow to make the sets comparable.

In this case, I used the data from Basketball-Reference.com to compare Chris Jackson’s 6 seasons in Denver to Stephen Curry’s last 6 seasons (including this one) and took into account 45 different statistical measures, and came up with the following correlation matrix/similarity matrix plot:

  

 
Dark blue circles indicate a strong correlation, while dark red circles indicate a weak correlation between two sets of features. 

What would be of interest in an analysis like this is to examine the diagonal of this matrix, which offers a direct comparison between the two players: 

  
One can see that there are many features that have strong correlation coefficients. 

Therefore, it is true that Stephen Curry and Chris Jackson do in fact share many strong similarities! 
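To illustrate why normalization matters here, the following Python sketch computes a Pearson correlation coefficient between two players’ season-by-season values of a single statistic. The correlation subtracts each series’ mean and divides by its spread, so two players with very different raw numbers can still be strongly correlated. The two six-season series are invented for the example and are not taken from Basketball-Reference.com; I am also assuming that each diagonal entry of the matrix is a same-feature correlation of this kind.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up six-season values of one statistic for two hypothetical players.
player_a = [10.0, 12.0, 15.0, 14.0, 18.0, 20.0]
player_b = [20.0, 24.0, 30.0, 28.0, 36.0, 40.0]  # twice player_a's raw numbers

print(round(pearson(player_a, player_b), 6))
```

Here the raw numbers differ by a factor of two, yet the season-to-season pattern is identical, so the coefficient is essentially 1.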

Ranking NBA Players

The 2015-2016 NBA season is dawning upon us, and as usual, ESPN has been doing their usual #NBArank, where they are ranking players based on the following non-rigorous methodology:

We asked, “Which player will be better in 2015-16?” To decide, voters had to consider both the quality and quantity of each player’s contributions to his team’s ability to win games. More than 100 voters weighed in on nearly 30,000 pairs of players.

Of course, while I suspect this type of thing has to be just for fun, it has generated a great deal of controversy with many arguments ensuing between fans. For example, Kobe Bryant being ranked 93rd overall in the NBA this year drew a fair deal of criticism from Stephen A. Smith on ESPN First Take.

In general, at least to me, it does not make any sense to rank players from different positions that bring different strengths to a team sport such as basketball. That is, what does it really mean for Tim Duncan to be better than Russell Westbrook (or vice-versa), or Kevin Love to be better than Mike Conley (or vice-versa), etc…

From a mathematical/data science perspective, the only sensible thing to do is to take all the players in the league, and apply a clustering algorithm such as K-means clustering to group players of similar talents and contributions into groups. This is not a trivial thing to do, but it is the sort of thing that data scientists do all the time! For this analysis, I went to Basketball-Reference.com, and pulled out last season’s (2014-2015) per game averages of every player in the league, looking at 25 statistical factors from FGA, FG% to STL, BLK, and TOV. One can see that this is a 25-dimensional problem. 

Denoting by C_{1}, \ldots, C_{K} the sets containing the observations in each cluster, our goal is to solve the optimization problem:

\mbox{minimize}_{C_{1},\ldots,C_{K}} \left\{\sum_{k=1}^{K} W(C_{k})\right\},

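where W(C_{k}) measures the variation within cluster C_{k}. We use the squared Euclidean distance to define the within-cluster variation, and then solve:

\mbox{minimize}_{C_{1},\ldots,C_{K}} \left\{\sum_{k=1}^{K} \frac{1}{|C_{k}|} \sum_{i,i' \in C_{k}} \sum_{j=1}^{p} \left(x_{ij}-x_{i'j}\right)^{2}\right\},

where |C_{k}| is the number of players in cluster k, x_{ij} is the value of statistic j for player i, and p = 25 is the number of statistics.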
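As a small worked example of the within-cluster variation, here is a Python sketch that evaluates W(C_k) as the sum of squared Euclidean distances over all ordered pairs in a cluster, divided by the cluster size. This is the usual textbook definition and I am assuming it is the one intended here; the function names and the tiny two-point cluster are for illustration only.

```python
from typing import List, Sequence

Vector = Sequence[float]


def within_cluster_variation(cluster: Sequence[Vector]) -> float:
    """W(C_k): pairwise squared Euclidean distances summed, divided by |C_k|."""
    total = 0.0
    for xi in cluster:
        for xj in cluster:  # ordered pairs, so each unordered pair counts twice
            total += sum((a - b) ** 2 for a, b in zip(xi, xj))
    return total / len(cluster)


def objective(clusters: List[Sequence[Vector]]) -> float:
    """The quantity K-means tries to minimize: the sum of W(C_k) over clusters."""
    return sum(within_cluster_variation(c) for c in clusters)


# Two points 2 units apart: pairwise sum is 4 + 4 = 8, divided by 2 gives 4.
print(within_cluster_variation([(0.0, 0.0), (2.0, 0.0)]))  # 4.0
```

The same number can be obtained as twice the sum of squared distances to the cluster mean (here 2 * (1 + 1) = 4), which is why the algorithm can work with centroids instead of all pairs.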

The first thing to do is to decide how many clusters we want to use in our solution. This is done by looking at the within sum of squares (WSS) plot:

wssplotball

First, we will use 3 clusters in our K-means solution. (In this case, the between sum of squares versus total sum of squares ratio was 77.0%, indicating a good “fit”.) We use three clusters to begin with because, based on visual inspection, the data clusters very nicely into 3 clusters. The plots obtained were as follows:

3cluster3 3cluster2 3cluster1

The three clusters of players can be found in the following PDF File. Note that the blue circles represent Cluster 1, the red circles represent Cluster 2, and the green circles represent Cluster 3.
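For readers curious where a figure like 77.0% comes from, here is a minimal, standard-library Python sketch of K-means (Lloyd's algorithm) together with the between sum of squares to total sum of squares ratio. It is not the code used for the analysis above, and the nine two-dimensional points are invented stand-ins for the 25-dimensional player data.

```python
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: Sequence[Vector]) -> List[float]:
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(data: Sequence[Vector], k: int, n_iter: int = 100,
           seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's algorithm: alternate nearest-center assignment and mean update."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(list(data), k)]
    labels = [0] * len(data)
    for step in range(n_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in data]
        if step > 0 and new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = mean(members)
    return labels, centers


def between_over_total(data: Sequence[Vector], labels: List[int],
                       centers: List[List[float]]) -> float:
    """between_SS / total_SS, using between_SS = total_SS - within_SS."""
    grand_mean = mean(data)
    total_ss = sum(sq_dist(p, grand_mean) for p in data)
    within_ss = sum(sq_dist(p, centers[lab]) for p, lab in zip(data, labels))
    return (total_ss - within_ss) / total_ss


# Invented data: three loose groups of three points each.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
        (20.0, 0.0), (20.0, 1.0), (21.0, 0.0)]
labels, centers = kmeans(data, k=3)
print(labels, round(between_over_total(data, labels, centers), 3))
```

The ratio is 0 with a single cluster and approaches 1 as the number of clusters grows toward the number of observations, so a high value on its own does not say that a particular K is the right one.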

Next, we dramatically increase the number of clusters to 20 in our K-means solution.

Performing the K-means clustering, we obtain the following sets of scatter plots. (Note that, it is a bit difficult to display a 25×25 plot on here, so I have split them into a series of plots. Note also, that the between sum of squares versus total sum of squares ratio was 94.8 %, indicating a good “fit”):

clusterplot1

clusterplot4 clusterplot3 clusterplot2

The cluster behaviour can be seen more clearly in three dimensions. We now display some examples:

cluster3d1 cluster3d2

 The 20 groups of players we obtained can be seen in the PDF file linked below:

nbastatsnewclusters

The legend for the clusters obtained was:

cluster_legend

Two sample group clusters from our analysis are displayed below in the table. It is interesting that the algorithm found that Carmelo Anthony and Kobe Bryant belong in one group/cluster while LaMarcus Aldridge, LeBron James, and Dwyane Wade belong in another cluster.

Group 16                   Group 19
Arron.Afflalo.1            Steven.Adams
Carmelo.Anthony            LaMarcus.Aldridge
Patrick.Beverley           Bradley.Beal
Chris.Bosh                 Andrew.Bogut
Kobe.Bryant                Jimmy.Butler
Jose.Calderon              DeMarre.Carroll
Michael.Carter.Williams.1  Michael.Carter.Williams
Darren.Collison            Mike.Conley
Goran.Dragic.1             DeMarcus.Cousins
Langston.Galloway          Anthony.Davis
Kevin.Garnett              DeMar.DeRozan
Kevin.Garnett.1            Mike.Dunleavy
Jeff.Green.2               Rudy.Gay
George.Hill                Eric.Gordon
Jrue.Holiday               Blake.Griffin
Dwight.Howard              Tobias.Harris
Brandon.Jennings           Nene.Hilario
Enes.Kanter.1              Jordan.Hill
Michael.Kidd.Gilchrist     Serge.Ibaka
Brandon.Knight.1           LeBron.James
Kevin.Martin               Al.Jefferson
Timofey.Mozgov.2           Wesley.Johnson
Rajon.Rondo.2              Brandon.Knight
Derrick.Rose               Kawhi.Leonard
J.R..Smith.2               Robin.Lopez
Jared.Sullinger            Kyle.Lowry
Thaddeus.Young.1           Wesley.Matthews
                           Luc.Mbah.a.Moute
                           Khris.Middleton
                           Greg.Monroe
                           Donatas.Motiejunas
                           Joakim.Noah
                           Victor.Oladipo
                           Tony.Parker
                           Chandler.Parsons
                           Zach.Randolph
                           Andre.Roberson
                           Rajon.Rondo
                           P.J..Tucker
                           Dwyane.Wade
                           Kemba.Walker
                           David.West
                           Russell.Westbrook
                           Deron.Williams

If we use more clusters, players will obviously be placed into smaller groups. The following clustering results can be seen in the linked PDF files.

  1. 50 Clusters – (between_SS / total_SS =  97.4 %) – PDF File
  2. 70 Clusters – (between_SS / total_SS =  97.8 %) – PDF File
  3. 100 Clusters – (between_SS / total_SS =  98.3 %) – PDF File
  4. 200 Clusters (extreme case) – (between_SS / total_SS =  99.1 %) – PDF File

I did not include the visualizations for these computations because they are quite difficult to visualize.

Looking at the 100 Clusters file, we see two interesting results:

  • In Cluster 16, we have: Carmelo Anthony, Chris Bosh, Kobe Bryant and Kevin Martin
  • In Cluster 74, we have: LaMarcus Aldridge, Anthony Davis, Rudy Gay, Blake Griffin, LeBron James and Russell Westbrook

CONCLUSIONS:

We therefore see that it does not make much mathematical/statistical sense to compare any two players. In my opinion, the only logical thing to do when ranking players is to decide on rankings within clusters. So, based on the above analysis, it makes sense to ask, for example, whether Carmelo is a better player than Kobe or whether LeBron is a better player than Westbrook, etc. But, based on last season’s statistics, it doesn’t make much sense to ask whether Kobe is a better player than Westbrook, because they have been clustered differently. I think ESPN could benefit tremendously from using a rigorous approach to these sorts of things, which spark many conversations because many people take them seriously.