Mathematics Behind The Triangle Offense

It was pointed out to me recently that the articles I have written describing the detailed geometric structure behind the triangle offense are scattered in various places around my blog, so here is a list of them in one convenient place:

  • The Mathematics of Filling the Triangle (First article) 
  • Group Theory and Dynamical Systems Theory Behind The Triangle Offense 
  • A Demonstration That The Triangle Offense is the most efficient/optimal way for 5 players to space the floor.
  • By: Dr. Ikjyot Singh Kohli (About the Author)


    Basketball Machine Learning Paper Updated 

    I have now made a significant update to my applied machine learning paper on predicting patterns among NBA playoff and championship teams, which can be accessed here: arXiv Link.

    Basketball – A Game of Geometry

    In a previous post, I described the optimal offensive strategy for the Knicks based on developing relevant joint probability density functions.

    In this post, I attempt a solution to the following problem:

    Given 5 players on the court, how can one determine (x,y) coordinates for each player such that the spacing / distance between each pair of players is maximized? A solution would show mathematically which arrangements of these 5 players are optimal from an offensive strategy standpoint. The idea is that such an arrangement of these 5 players will always stretch the defense to the maximum.

    The problem is then stated as follows. Let (x_i, y_i) be the x and y coordinates of player i on the court. We wish to solve:

    optimproblem

    Problems of this type are known as multi-objective optimization problems, and in general are quite difficult to solve. Note that in setting up the coordinate system for this problem, we have for convenience placed the basket at (x,y) = (0,0), i.e., at the origin.

    To solve this problem, we used the Non-dominated Sorting Genetic Algorithm-II (NSGA-II) as implemented in the mco package in R.
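To make the multi-objective setup concrete, here is a minimal Python sketch. It is not the NSGA-II run described above: it uses plain random search, and the court bounds (-14 ≤ x ≤ 14, 0 ≤ y ≤ 23.75, in feet, basket at the origin) are an assumption borrowed from the constraints quoted later on this page. It treats the 10 pairwise distances as the objectives and keeps only the arrangements that no other sampled arrangement dominates, which is the sense in which a solution is Pareto optimal.

```python
import math
import random
from itertools import combinations

# Assumed court bounds in feet, with the basket at the origin.
X_BOUNDS = (-14.0, 14.0)
Y_BOUNDS = (0.0, 23.75)


def pairwise_distances(players):
    """Return the 10 distances between every pair of the 5 players."""
    return [math.dist(p, q) for p, q in combinations(players, 2)]


def dominates(a, b):
    """True if objective vector a is >= b in every entry and > b in at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def random_arrangement(rng):
    """Sample 5 player positions uniformly inside the assumed bounds."""
    return [(rng.uniform(*X_BOUNDS), rng.uniform(*Y_BOUNDS)) for _ in range(5)]


def pareto_front(candidates):
    """Keep the arrangements that no other candidate dominates."""
    scored = [(c, pairwise_distances(c)) for c in candidates]
    return [c for c, f in scored if not any(dominates(g, f) for _, g in scored)]


rng = random.Random(0)
candidates = [random_arrangement(rng) for _ in range(200)]
front = pareto_front(candidates)
print(len(front), "non-dominated arrangements out of", len(candidates))
```

With 10 objectives, most random arrangements tend to be non-dominated, which is one way to see why such a problem admits many Pareto-optimal solutions rather than a single best one.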

    In general, what I found was that there are many possible solutions to this problem, all of which are Pareto optimal. Here are some of these results.

     

    xyplot1

    xyplot2

    xyplot3

    Here are some more plots of player coordinates clearly showing the origin point (which, as mentioned earlier, is the location of the basket):

    Each plot above shows the x-y coordinates of players on the floor such that the distance between them is a maximum. Thus, these are some possible configurations of 5 players on the floor where the defense of the opposing team would be stretched to a maximum. What is even more interesting is that in each solution displayed above, and indeed in each numerical solution we found that is not displayed here, there is at least one triangle formation. It can therefore be said that the triangle offense is among the optimal offensive strategies that produce maximum spacing of offensive players while simultaneously stretching the defense to a maximum as well. Here is more on the unpredictability of the triangle offense and its structure.

    Based on these coordinates, we obtained the following distance matrices showing the maximum / optimal possible distance between player i and player j:

    distances

    For brevity, we show above only 5 of the several distance matrices generated. Looking at the fifth matrix, for example, one sees that players are at a maximum and optimal distance from each other if the distance between players 1 and 2 is 9.96 feet, the distance between players 3 and 4 is 18.703 feet, the distance between players 4 and 5 is 4.96 feet, and so on.
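For readers who want to build such a matrix themselves, here is a short Python sketch. The coordinates are hypothetical and are not one of the numerical solutions above; the only assumption is that the distance between two players is the usual Euclidean distance in feet.

```python
import math


def distance_matrix(players):
    """Return the matrix whose (i, j) entry is the distance between players i and j."""
    return [[math.dist(p, q) for q in players] for p in players]


# Hypothetical (x, y) coordinates in feet, with the basket at the origin.
players = [(0.0, 0.0), (3.0, 4.0), (-10.0, 5.0), (8.0, 15.0), (-6.0, 20.0)]
matrix = distance_matrix(players)

for row in matrix:
    print("  ".join(f"{d:6.2f}" for d in row))
```

The entry in row i and column j is read exactly as in the text: it is the distance between player i and player j, so the diagonal is zero and the matrix is symmetric.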

    The Most Optimal Strategy for the Knicks

    In a previous article, I showed how one could use data in combination with advanced probability techniques to determine the optimal shot / court positions for LeBron James. I decided to use this algorithm on the Knicks’ starting 5, and obtained the following joint probability density contour plots:

    One sees that the Knicks’ offensive strategy is optimal if and only if players get shots as close to the basket as possible. If this is the case, the players have a high probability of making shots even if defenders are playing them tightly. This means that the Knicks would be served best by driving in the paint, posting up, and Porzingis NOT attempting a multitude of three-point shots.

    By the way, a lot of people are convinced nowadays that someone like Porzingis attempting 3’s is a sign of a good offense, as it is an optimal way to space the floor. I am not convinced of this. Spacing the floor geometrically translates to a multi-objective nonlinear optimization problem. In particular, let (x_i, y_i) represent the (x,y) coordinates of player i on the floor. Spreading the floor means one must maximize (simultaneously) each element of the following distance matrix:

    distancematrix

    subject to -14 \leq x_i \leq 14, 0 \leq y_i \leq 23.75. While a player attempting 3-point shots may be one way to solve this problem, I am not convinced that it is the unique solution to this optimization problem. In fact, I am convinced that there are multiple solutions to this optimization problem.

    This problem is slightly simpler if one realizes that the matrix above is symmetric with a zero diagonal, so that for 5 players there are only 10 independent components, one for each pair of players.
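Since the image of the matrix did not survive here, the following is a sketch of the form it presumably takes, assuming the entries are ordinary Euclidean distances between players i and j (the symbols D and d_{ij} are notation introduced for this sketch):

```latex
D =
\begin{pmatrix}
0      & d_{12} & d_{13} & d_{14} & d_{15} \\
d_{12} & 0      & d_{23} & d_{24} & d_{25} \\
d_{13} & d_{23} & 0      & d_{34} & d_{35} \\
d_{14} & d_{24} & d_{34} & 0      & d_{45} \\
d_{15} & d_{25} & d_{35} & d_{45} & 0
\end{pmatrix},
\qquad
d_{ij} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}
```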

    Analyzing LeBron James’ Offensive Play

    Where is LeBron James most effective on the court?

    From NBA.com, we obtained the following 2015-2016 data, which tracks LeBron’s FG% based on defender distance:

    lebrondef

    From Basketball-Reference.com, we then obtained data on LeBron’s FG% based on his shot distance from the basket:

    lebronshotdist

    Based on this data, we generated tens of thousands of sample data points to perform a Monte Carlo simulation to obtain relevant probability density functions. We found that the joint PDF was a very lengthy expression(!):

    lebrondistro

    Graphically, this is:

    lebronjointplot

    A contour plot of the joint PDF was computed to be:

    lebroncontour

    From this information, we can compute where/when LeBron has the highest probability of making a shot. Numerically, we found that the maximum probability occurs when LeBron’s defender is 0.829988 feet away, while LeBron is 1.59378 feet away from the basket. What is interesting is that this analysis shows that defending LeBron tightly doesn’t seem to be an effective strategy if his shot distance is within 5 feet of the basket. It is only an effective strategy when he is more than 5 feet away from the basket. Therefore, opposing teams have the best chance of stopping LeBron from scoring by playing him tightly and forcing him as far away from the basket as possible.
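The numerical maximization step can be sketched in Python as a simple grid search over defender distance and shot distance. The density below is a made-up stand-in with a single peak, used only so the example runs; it is not the fitted joint PDF, and the grid ranges and step size are likewise assumptions.

```python
import math


def toy_joint_density(defender_dist, shot_dist):
    """Stand-in density (NOT the fitted PDF): one smooth peak at (1, 2) feet."""
    return math.exp(-((defender_dist - 1.0) ** 2 + (shot_dist - 2.0) ** 2))


def grid_argmax(density, d_max, s_max, step):
    """Evaluate the density on a regular grid and return the best point and value."""
    best_value, best_point = -math.inf, None
    for i in range(round(d_max / step) + 1):
        for j in range(round(s_max / step) + 1):
            d, s = i * step, j * step
            value = density(d, s)
            if value > best_value:
                best_value, best_point = value, (d, s)
    return best_point, best_value


point, value = grid_argmax(toy_joint_density, d_max=10.0, s_max=30.0, step=0.1)
print("maximum at defender distance", point[0], "and shot distance", point[1])
```

A grid search only locates the maximum to within the step size; a finer grid or a local optimizer started from the grid maximum would be needed to quote a location to several decimal places.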

     

    Breaking Down the 2015-2016 NBA Season

    In this article, I will use Data Science / Machine Learning methodologies to break down the real factors separating the playoff from non-playoff teams. In particular, I used the data from Basketball-Reference.com to associate 44 predictor variables with each team:

    “FG” “FGA” “FG.” “X3P” “X3PA” “X3P.” “X2P” “X2PA” “X2P.” “FT” “FTA” “FT.” “ORB” “DRB” “TRB” “AST” “STL” “BLK” “TOV” “PF” “PTS” “PS.G” “oFG” “oFGA” “oFG.” “o3P” “o3PA” “o3P.” “o2P” “o2PA” “o2P.” “oFT” “oFTA” “oFT.” “oORB” “oDRB” “oTRB” “oAST” “oSTL” “oBLK” “oTOV” “oPF” “oPTS” “oPS.G”

    where the letter ‘o’ before the last 22 predictor variables indicates a defensive variable (‘o’ stands for opponent).

    Using principal components analysis (PCA), I was able to project this 44-dimensional data set to a 5-dimensional data set. That is, the first 5 principal components were found to explain 85% of the variance.
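The rule used here, keeping the smallest number of components whose cumulative share of the variance reaches a threshold, can be sketched in a few lines of Python. The eigenvalues below are hypothetical and chosen only to illustrate the bookkeeping; they are not the eigenvalues of the NBA data set.

```python
def components_needed(eigenvalues, threshold):
    """Smallest number of leading components whose variance share reaches threshold."""
    total = sum(eigenvalues)
    cumulative = 0.0
    for k, value in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return k
    return len(eigenvalues)


# Hypothetical variances along each principal component (they sum to 10).
eigenvalues = [4.0, 2.0, 1.0, 1.0, 1.0, 0.5, 0.3, 0.2]
k = components_needed(eigenvalues, threshold=0.85)
print(k, "components reach 85% of the variance")
```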

    Here are the various biplots: 


    In these plots, the teams are grouped according to whether they made the playoffs or not. 

    One sees from this biplot of the first two principal components that the dominant component along the first PC is 3-point attempts, while the dominant component along the second PC is opponent points. CLE and TOR have a high negative score along the second PC, indicating a strong defensive performance. Indeed, one suspects that the final separating factor that led CLE to the championship was their defensive play as opposed to 3-point shooting, which all-in-all didn’t do GSW any favours. This is in line with some of my previous analyses.

    Optimal Positions for NBA Players

    I was thinking about how one can use the NBA’s new SportVU system to figure out optimal positions for players on the court. One of the interesting things about the SportVU system is that it tracks player (x,y) coordinates on the court. Presumably, it also keeps track of whether or not a player located at (x,y) makes a shot or misses it. Let us denote a player making a shot by 1, and a player missing a shot by 0. Then, one essentially will have data in the form (x,y, \text{1/0}).

    One can then use a logistic regression to determine the probability that a player at position (x,y) will make a shot:

    p(x,y) = \frac{\exp\left(\beta_0 + \beta_1 x + \beta_2 y\right)}{1 +\exp\left(\beta_0 + \beta_1 x + \beta_2 y\right)}

    The main idea is that the parameters \beta_0, \beta_1, \beta_2 uniquely characterize a given player’s probability of making a shot.

    As a coaching staff, from an offensive perspective, let us say we wish to position players so that they have a very high probability of making a shot, say, for demonstration purposes, 99%. This means we must solve the following constrained problem:

    \frac{\exp\left(\beta_0 + \beta_1 x + \beta_2 y\right)}{1 +\exp\left(\beta_0 + \beta_1 x + \beta_2 y\right)} = 0.99

    \text{s.t. } 0 \leq x \leq 28, \quad 0 \leq y \leq 47

    (The constraints are determined here by the x-y dimensions of a standard NBA court).

    This has the following solutions:

    x = \frac{4.59512 - \beta_0 - \beta_2 y}{\beta_1}, \quad \frac{4.59512 - \beta_0 - 28 \beta_1}{\beta_2} \leq y

    with the following conditions:

    constraints1

    One can also have:

    x = \frac{4.59512 - \beta_0 - \beta_2 y}{\beta_1}, \quad y \leq 47

    with the following conditions:

    constraints2

    Another solution is:

    x = \frac{4.59512 - \beta_0 - \beta_2 y}{\beta_1}

    with the following conditions:

    constraints3

    The fourth possible solution is:

    x = \frac{4.59512 - \beta_0 - \beta_2 y}{\beta_1}

    with the following conditions:

    constraints4
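The constant 4.59512 that appears in each solution is the log-odds of the 99% target, ln(0.99 / 0.01) = ln 99. The Python sketch below checks this and solves for x at a given y; the coefficient values are hypothetical and chosen only so that the resulting point lands inside the 0 ≤ x ≤ 28, 0 ≤ y ≤ 47 region used above.

```python
import math


def shot_probability(x, y, b0, b1, b2):
    """Logistic model p(x, y); 1 / (1 + exp(-z)) equals exp(z) / (1 + exp(z))."""
    z = b0 + b1 * x + b2 * y
    return 1.0 / (1.0 + math.exp(-z))


def x_on_contour(y, target, b0, b1, b2):
    """Solve b0 + b1*x + b2*y = log(target / (1 - target)) for x (requires b1 != 0)."""
    log_odds = math.log(target / (1.0 - target))
    return (log_odds - b0 - b2 * y) / b1


# Hypothetical coefficients for illustration only.
b0, b1, b2 = -2.0, 0.3, 0.05
y = 10.0
x = x_on_contour(y, 0.99, b0, b1, b2)
inside_region = 0.0 <= x <= 28.0 and 0.0 <= y <= 47.0

print("log-odds of 0.99:", round(math.log(0.99 / 0.01), 5))
print("x on the 99% contour at y = 10:", round(x, 3), "inside region:", inside_region)
print("check p(x, y):", round(shot_probability(x, y, b0, b1, b2), 6))
```

The four cases listed above then come from asking when this line actually passes through the rectangle, which depends on the signs and sizes of the coefficients.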

    In practice, it should be noted that a player is very unlikely to have a 99% probability of making a shot.

    To put this example in more practical terms, I generated some random data (1000 points) for a player in terms of (x,y) coordinates and whether he made a shot from that location or not. The following scatter plot shows the result of this simulation:

    bballoptim5

    In this plot, the red dots indicate a player has made a shot (a response of 1.0) from the (x,y) coordinates given, while a purple dot indicates a player has missed a shot from the (x,y) coordinates given (a response of 0.0).

    Performing a logistic regression on this data, we obtain that \beta_0 = 0, \beta_1 = 0.00066876, \beta_2 = -0.00210949.
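As a rough illustration of this step, here is a minimal pure-Python sketch that simulates 1000 shots and fits the three coefficients by gradient ascent on the log-likelihood. The "true" coefficients, the random seed, the step size, and the rescaling of court coordinates to [0, 1] are all assumptions made for the example; it is not the data or the software used to obtain the values quoted above.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mean_log_likelihood(beta, rows):
    """Average log-likelihood of the logistic model over (u, v, made) rows."""
    total = 0.0
    for u, v, made in rows:
        p = sigmoid(beta[0] + beta[1] * u + beta[2] * v)
        total += math.log(p if made else 1.0 - p)
    return total / len(rows)


def fit_logistic(rows, steps=500, rate=1.0):
    """Plain gradient ascent on the mean log-likelihood, starting from zero."""
    beta = [0.0, 0.0, 0.0]
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        for u, v, made in rows:
            err = made - sigmoid(beta[0] + beta[1] * u + beta[2] * v)
            grad[0] += err
            grad[1] += err * u
            grad[2] += err * v
        beta = [b + rate * g / len(rows) for b, g in zip(beta, grad)]
    return beta


# Simulate shots: u = x / 28 and v = y / 47 are court coordinates rescaled to [0, 1].
rng = random.Random(1)
TRUE_BETA = (0.5, 1.0, -2.0)  # hypothetical coefficients on the rescaled coordinates
rows = []
for _ in range(1000):
    u, v = rng.uniform(0, 28) / 28, rng.uniform(0, 47) / 47
    p = sigmoid(TRUE_BETA[0] + TRUE_BETA[1] * u + TRUE_BETA[2] * v)
    rows.append((u, v, 1 if rng.random() < p else 0))

beta = fit_logistic(rows)
per_foot = (beta[0], beta[1] / 28, beta[2] / 47)  # convert slopes back to per-foot units
print("fitted (beta_0, beta_1, beta_2) per foot:", per_foot)
```

Rescaling the coordinates keeps the gradient steps well behaved; dividing the fitted slopes by 28 and 47 afterwards expresses them per foot, as in the formula for p(x,y).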

    Using the equations above, we see that this player has a maximum probability of 58.7149 \% of making a shot from a location of (x,y) = (0,23), and a minimum probability of 38.45 \% of making a shot from a location of (x,y) = (28,0).