## Basketball Machine Learning Paper Updated

I have now made a significant update to my applied machine learning paper on predicting patterns among NBA playoff and championship teams, which can be accessed here: arXiv link.

## New Cosmology Paper

New #cosmology paper: https://arxiv.org/pdf/1609.01310.pdf

The paper uses a dynamical systems approach to provide a unifying framework for the AdS, Minkowski, and de Sitter universes. #physics #mathematics #science

## Breaking Down the 2015-2016 NBA Season

In this article, I use data science and machine learning methods to break down the real factors separating playoff from non-playoff teams. In particular, I used data from Basketball-Reference.com to associate 44 predictor variables with each team: “FG” “FGA” “FG.” “X3P” “X3PA” “X3P.” “X2P” “X2PA” “X2P.” “FT” “FTA” “FT.” “ORB” “DRB” “TRB” “AST” “STL” “BLK” “TOV” “PF” “PTS” “PS.G” “oFG” “oFGA” “oFG.” “o3P” “o3PA” “o3P.” “o2P” “o2PA” “o2P.” “oFT” “oFTA” “oFT.” “oORB” “oDRB” “oTRB” “oAST” “oSTL” “oBLK” “oTOV” “oPF” “oPTS” “oPS.G”, where the letter ‘o’ prefixing the last 22 predictor variables indicates a defensive variable (‘o’ stands for opponent).

Using principal components analysis (PCA), I was able to project this 44-dimensional data set onto a 5-dimensional one. That is, the first 5 principal components were found to explain 85% of the variance.
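To make the "85% of the variance" statement concrete, here is a minimal sketch of how the number of retained components can be read off the cumulative explained variance. It runs on randomly generated stand-in data, not the Basketball-Reference table, and it assumes the columns are standardized before the decomposition, which the post does not state. The function name `pca_explained_variance` is hypothetical.

```python
import numpy as np

def pca_explained_variance(X):
    """Return the fraction of variance explained by each principal component."""
    # Assumption: standardize each column so no statistic dominates by scale.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Squared singular values of the standardized data are proportional
    # to the variances along the principal components.
    s = np.linalg.svd(Z, compute_uv=False)
    var = s**2
    return var / var.sum()

# Hypothetical stand-in for a 30-team x 44-variable table (random, NOT NBA data).
rng = np.random.default_rng(0)
latent = rng.normal(size=(30, 5))
loadings = rng.normal(size=(5, 44))
X = latent @ loadings + 0.5 * rng.normal(size=(30, 44))

ratios = pca_explained_variance(X)
cumulative = np.cumsum(ratios)
# Smallest number of components whose cumulative share reaches 85%.
k = int(np.searchsorted(cumulative, 0.85) + 1)
print(k, cumulative[:k])
```

On real data, the same cumulative curve is what justifies keeping 5 components.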

Here are the various biplots:

In these plots, the teams are grouped according to whether they made the playoffs or not.

One sees from this biplot of the first two principal components that the dominant component along the first PC is 3-point attempts, while the dominant component along the second PC is opponent points. CLE and TOR have a high negative score along the second PC, indicating a strong defensive performance. Indeed, one suspects that the final separating factor that led CLE to the championship was their defensive play, as opposed to 3-point shooting, which all in all didn’t do GSW any favours. This is in line with some of my previous analyses.

## Optimal Positions for NBA Players

I was thinking about how one can use the NBA’s new SportVU system to figure out optimal positions for players on the court. One of the interesting things about the SportVU system is that it tracks player $(x,y)$ coordinates on the court. Presumably, it also keeps track of whether or not a player located at $(x,y)$ makes a shot or misses it. Let us denote a player making a shot by $1$, and a player missing a shot by $0$. Then, one essentially will have data in the form $(x,y, \text{1/0})$.

One can then use a logistic regression to determine the probability that a player at position $(x,y)$ will make a shot:

$p(x,y) = \frac{\exp\left(\beta_0 + \beta_1 x + \beta_2 y\right)}{1 +\exp\left(\beta_0 + \beta_1 x + \beta_2 y\right)}$

The main idea is that the parameters $\beta_0, \beta_1, \beta_2$ uniquely characterize a given player’s probability of making a shot.
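As a small illustration, the model can be written as a function of position and the three parameters. The coefficient values below are hypothetical, chosen only to show the call, and the name `shot_probability` is not from any library.

```python
import math

def shot_probability(x, y, beta0, beta1, beta2):
    """Logistic model p(x, y) for making a shot from court position (x, y)."""
    z = beta0 + beta1 * x + beta2 * y
    # 1 / (1 + exp(-z)) is algebraically the same as exp(z) / (1 + exp(z)).
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients, for illustration only.
p = shot_probability(10.0, 5.0, beta0=0.2, beta1=-0.03, beta2=-0.01)
print(round(p, 4))
```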

From an offensive perspective, suppose the coaching staff wishes to position players so that they have a very high probability of making a shot, say 99% for demonstration purposes. This means we must solve the following problem:

$\frac{\exp\left(\beta_0 + \beta_1 x + \beta_2 y\right)}{1 +\exp\left(\beta_0 + \beta_1 x + \beta_2 y\right)} = 0.99$

$\text{s.t. } 0 \leq x \leq 28, \quad 0 \leq y \leq 47$

(The constraints are determined here by the x-y dimensions of a standard NBA court).
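Taking the logit of both sides turns the 99% condition into the linear equation $\beta_0 + \beta_1 x + \beta_2 y = \ln(0.99/0.01) \approx 4.59512$, so the admissible positions lie on a straight line clipped to the court. A minimal sketch, assuming $\beta_1 \neq 0$ and using hypothetical coefficients and helper names:

```python
import math

def x_on_contour(y, beta0, beta1, beta2, p_target=0.99):
    """Solve beta0 + beta1*x + beta2*y = logit(p_target) for x (needs beta1 != 0)."""
    logit = math.log(p_target / (1.0 - p_target))  # about 4.59512 for 0.99
    return (logit - beta0 - beta2 * y) / beta1

def on_court(x, y, x_max=28.0, y_max=47.0):
    """Check the box constraints 0 <= x <= 28, 0 <= y <= 47 used in the text."""
    return 0.0 <= x <= x_max and 0.0 <= y <= y_max

# Hypothetical coefficients, chosen so that the contour crosses the court.
b0, b1, b2 = 1.0, 0.2, 0.01
y = 20.0
x = x_on_contour(y, b0, b1, b2)
print(x, on_court(x, y))
```

Whether any point of the line falls inside the box depends on the signs and sizes of the coefficients, which is why several cases arise.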

This has the following solutions:

$x = \frac{4.59512 - \beta_0 - \beta_2 y}{\beta_1}, \quad \frac{4.59512 - \beta_0 - 28 \beta_1}{\beta_2} \leq y$

with the following conditions:

One can also have:

$x = \frac{4.59512 - \beta_0 - \beta_2 y}{\beta_1}, \quad y \leq 47$

with the following conditions:

Another solution is:

$x = \frac{4.59512 - \beta_0 - \beta_2 y}{\beta_1}$

with the following conditions:

The fourth possible solution is:

$x = \frac{4.59512 - \beta_0 - \beta_2 y}{\beta_1}$

with the following conditions:

In practice, it should be noted that a player is very unlikely to have a 99% probability of making a shot.

To put this example in more practical terms, I generated some random data (1000 points) for a player, consisting of $(x,y)$ coordinates and whether he made a shot from that location or not. The following scatter plot shows the result of this simulation:

In this plot, the red dots indicate a player has made a shot (a response of 1.0) from the $(x,y)$ coordinates given, while a purple dot indicates a player has missed a shot from the $(x,y)$ coordinates given (a response of 0.0).

Performing a logistic regression on this data, we obtain that $\beta_0 = 0, \beta_1 = 0.00066876, \beta_2 = -0.00210949$.
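For readers who want to reproduce this kind of fit, here is a minimal sketch that simulates 1000 shots and estimates the three parameters by Newton's method. The simulated data, the "true" parameters, and the function name `fit_logistic` are all assumptions made for illustration; this is not the data set or the software used for the numbers quoted in this post.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Fit p = sigmoid(beta0 + beta1*x + beta2*y) by Newton's method."""
    A = np.column_stack([np.ones(len(X)), X])  # prepend the intercept column
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-A @ beta))
        grad = A.T @ (y - p)                          # log-likelihood gradient
        hess = A.T @ (A * (p * (1.0 - p))[:, None])   # negative Hessian
        beta = beta + np.linalg.solve(hess, grad)
    return beta

# Hypothetical simulated shots: 1000 random court positions, NOT SportVU data.
rng = np.random.default_rng(1)
xy = rng.uniform([0.0, 0.0], [28.0, 47.0], size=(1000, 2))
true_beta = np.array([0.5, -0.02, -0.03])  # assumed "true" player parameters
p_true = 1.0 / (1.0 + np.exp(-(true_beta[0] + xy @ true_beta[1:])))
made = (rng.uniform(size=1000) < p_true).astype(float)

beta_hat = fit_logistic(xy, made)
print(beta_hat)
```

With only 1000 simulated shots, the estimates scatter noticeably around the assumed parameters, so small fitted coefficients should be interpreted with care.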

Using the equations above, we see that this player has a maximum probability of $58.7149 \%$ of making a shot from a location of $(x,y) = (0,23)$, and a minimum probability of $38.45 \%$ of making a shot from a location of $(x,y) = (28,0)$.

## The Mathematics of The Triangle Offense, Continued…

In a previous post, I showed how 5 players at random positions on the court could “fill” the triangle. The main geometric constraint is that 5 players can form 3 triangles on the court and that, due to spacing requirements, these triangles are “optimal” if they are equilateral.

Given that we now know how to fill the triangle, the question this post addresses is how players can actually move within it. The key is symmetry: players must all move in such a way that the equilateral triangles remain invariant. Equilateral triangles have the dihedral symmetry group $D_{3}$ associated with them. They are therefore invariant under rotations by 0, 120, and 240 degrees, and under three reflections.

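This group therefore has six elements (it is generated by just two of them, one rotation and one reflection), which can be represented by the matrices: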
$\left( \begin{array}{cc} 1 & 0 \\ 0 & 1 \\ \end{array} \right), \left( \begin{array}{cc} -\frac{1}{2} & -\frac{\sqrt{3}}{2} \\ \frac{\sqrt{3}}{2} & -\frac{1}{2} \\ \end{array} \right),\left( \begin{array}{cc} -\frac{1}{2} & \frac{\sqrt{3}}{2} \\ -\frac{\sqrt{3}}{2} & -\frac{1}{2} \\ \end{array} \right), \left( \begin{array}{cc} \frac{1}{2} & \frac{\sqrt{3}}{2} \\ \frac{\sqrt{3}}{2} & -\frac{1}{2} \\ \end{array} \right),\left( \begin{array}{cc} -1 & 0 \\ 0 & 1 \\ \end{array} \right),\left( \begin{array}{cc} \frac{1}{2} & -\frac{\sqrt{3}}{2} \\ -\frac{\sqrt{3}}{2} & -\frac{1}{2} \\ \end{array} \right).$
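The six matrices above can be checked numerically. The sketch below verifies that their products stay within the set, that each one maps an equilateral triangle centred at the origin onto itself, and that a rotation and a reflection do not commute. The labels (`r120`, `f1`, and so on) and the choice of triangle are assumptions made for illustration.

```python
import itertools
import numpy as np

s = np.sqrt(3.0) / 2.0
elements = {
    "e":    np.array([[1.0, 0.0], [0.0, 1.0]]),   # identity (0 degree rotation)
    "r120": np.array([[-0.5, -s], [s, -0.5]]),    # 120 degree rotation
    "r240": np.array([[-0.5, s], [-s, -0.5]]),    # 240 degree rotation
    "f1":   np.array([[0.5, s], [s, -0.5]]),      # reflection
    "f2":   np.array([[-1.0, 0.0], [0.0, 1.0]]),  # reflection
    "f3":   np.array([[0.5, -s], [-s, -0.5]]),    # reflection
}

def name_of(M):
    """Return the label of the listed matrix equal to M, or None if absent."""
    for name, E in elements.items():
        if np.allclose(M, E):
            return name
    return None

# Closure: the product of any two listed matrices is again one of the six.
closed = all(name_of(A @ B) is not None
             for A, B in itertools.product(elements.values(), repeat=2))

# Invariance: each matrix permutes the vertices of an equilateral triangle
# centred at the origin with one vertex on the positive y-axis.
triangle = np.array([[0.0, 1.0], [-s, -0.5], [s, -0.5]])

def maps_triangle_to_itself(M):
    image = triangle @ M.T
    return all(any(np.allclose(v, w) for w in triangle) for v in image)

invariant = all(maps_triangle_to_itself(M) for M in elements.values())

# Non-commutativity: compare rotation-then-reflection with the reverse order.
commute = np.allclose(elements["r120"] @ elements["f2"],
                      elements["f2"] @ elements["r120"])
print(closed, invariant, commute)
```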

In fact, the Cayley graph for this group is as follows:

For now, I will discuss how players can move under the action of 120 degree rotations. As in the previous posting, let the $(x,y)$-coordinates of player $i$ be represented by $(x^{i}, y^{i})$, where $i = 1,2,3,4,5$. Then, under a 120 degree rotation, the player’s coordinates get shifted according to:

$\boxed{x^{i}_{t+1} = \frac{1}{2} \left(-x^{i}_{t} - \sqrt{3}y^{i}_{t}\right), \quad y^{i}_{t+1} = \frac{1}{2}\left(\sqrt{3}x^{i}_{t} - y^{i}_{t}\right)}$

This is a discrete dynamical system. In fact, it can be solved explicitly. Let $x^i_{0}, y^{i}_{0}$ represent the initial coordinates of player $i$. Then, one solves the above discrete system to obtain:

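$\boxed{x^i_t =\frac{1}{2} e^{-\frac{2 i \pi t}{3}} \left[\left(1+e^{\frac{4 i \pi t}{3}}\right) x^i_0+i \left(-1+e^{\frac{4 i \pi t}{3}}\right) y^i_0\right], \quad y^{i}_{t} =\frac{1}{2} e^{-\frac{2 i \pi t}{3}} \left[\left(1+e^{\frac{4 i \pi t}{3}}\right) y^i_0-i \left(-1+e^{\frac{4 i \pi t}{3}}\right) x^i_0\right]}$

Here the $i$ appearing in the exponents and as a prefactor is the imaginary unit, while the superscript $i$ is the player index. By Euler's formula, this is equivalent to $x^i_t = x^i_0 \cos\frac{2\pi t}{3} - y^i_0 \sin\frac{2\pi t}{3}$, $y^i_t = x^i_0 \sin\frac{2\pi t}{3} + y^i_0 \cos\frac{2\pi t}{3}$.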
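As a quick numerical check, the map can be iterated directly and compared with a rotation of the initial position by the angle $2\pi t/3$, which is what the explicit solution amounts to. The starting position below is hypothetical, and the function names are chosen for illustration.

```python
import numpy as np

def step(x, y):
    """One application of the 120 degree rotation map from the text."""
    return 0.5 * (-x - np.sqrt(3.0) * y), 0.5 * (np.sqrt(3.0) * x - y)

def closed_form(x0, y0, t):
    """Explicit solution written as a real rotation by 2*pi*t/3."""
    theta = 2.0 * np.pi * t / 3.0
    return (x0 * np.cos(theta) - y0 * np.sin(theta),
            x0 * np.sin(theta) + y0 * np.cos(theta))

# Hypothetical starting position for one player.
x0, y0 = 3.0, 4.0
x, y = x0, y0
trajectory = [(x, y)]
for t in range(1, 7):
    x, y = step(x, y)
    trajectory.append((x, y))
    # The iterated map and the explicit solution should agree at every step.
    assert np.allclose((x, y), closed_form(x0, y0, t))
print(trajectory[3])
```

Since three successive 120 degree rotations make a full turn, each player returns to the starting position every three steps.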

Now, we can simulate this to see actually how players move within the triangle offense, forming equilateral triangles in every sequence:

The simulation runs continuously, that is, endlessly. In future postings, I will update this to include the other symmetries of the dihedral $D_{3}$ group. However, the challenge is that this symmetry group is non-Abelian, so it will be interesting to implement pairs of consecutive symmetry operations in a simulation that would still result in invariant equilateral triangles.

Hopefully, this post also shows why teams cannot really run “parts” of the triangle, as one player’s movement necessarily affects everyone else’s. This is something that Charley Rosen also mentioned in an article of his own.

## The Possible Initial States of The Universe

Most people, when talking about cosmology, describe the universe in one context, namely as a particular solution to the Einstein field equations. Part of my research in mathematical cosmology is to try to determine whether the present-day universe, which we observe to be very close to spatially flat, homogeneous, and isotropic, could have emerged from a more general geometric state.

What is often not discussed adequately is that our universe has not only emerged from special initial conditions, but that these special initial conditions must also include the geometry of the early universe and the type of matter in it. Below, I have attached a simulation that shows how the early universe can evolve to different possible states depending on the type of physical matter, parametrized by an equation of state parameter $\gamma$. In particular, some examples are:

• $\gamma = 0$: Vacuum energy
• $\gamma = 4/3$: Radiation
• $\gamma = 2$: Stiff Fluid


In these simulations, we present phase plots of solutions to the Einstein field equations for spatially homogeneous and isotropic flat, hyperbolic, and closed universe geometries. The different points are:

1. dS: de Sitter universe – Inflationary epoch
2. M: Milne universe
3. F: spatially flat FLRW universe – our present-day universe
4. E: Einstein static universe

Note how, by changing the value of $\gamma$, the dynamics lead to different possible future states. Dynamical systems people will recognize that the problem at hand requires one to determine for which values of $\gamma$ the point F is a saddle or a stable node.