## The Risk of The 3-Point Shot

As more and more teams increase the number of threes they attempt, based on the misplaced notion that this somehow leads to an efficient offense, we show below that it is in fact in a team’s opponent’s interest for that team to attempt as many three-point shots as possible.

Looking at this season’s data, let us examine two things. The first is the number of points a team’s opponent is expected to score for every three-point shot the team attempts. Remarkably, we discovered that this quantity obeys a lognormal distribution:

$\boxed{P(X) = \frac{2.86089 e^{-25.713 (\log (X)-1.3119)^2}}{X}}$

This means that for every three-point shot your team attempts, the opposing team is expected to score

$\boxed{\int X P(X) dX = 1.87475\, -1.87475 \text{erf}(6.75099\, -5.0708 \log (X))}$

which comes out to about 3.7495 points. So, for every 3PA by a team, the opponent is expected to score more than 3 points based on the most recent NBA data. Keeping that in mind, by integrating $P(X)$ above we also see that there is a 99.99% probability that the opponent will score more than 2 points for every 3PA by a team, and a 93.693% probability that the opponent will score more than 3 points for every single 3PA by the other team.
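As a quick sanity check, the boxed density has the form of a standard lognormal with $\mu = 1.3119$ and $\sigma = \sqrt{1/(2 \times 25.713)} \approx 0.1394$, so the quoted mean and tail probabilities can be reproduced numerically. This is a `scipy` sketch of that check, not the original computation:

```python
import numpy as np
from scipy import stats

# Parameters recovered by matching the boxed density
#   P(X) = (2.86089/X) * exp(-25.713*(log(X) - 1.3119)^2)
# against the standard lognormal pdf.
mu = 1.3119
sigma = np.sqrt(1.0 / (2.0 * 25.713))

dist = stats.lognorm(s=sigma, scale=np.exp(mu))

expected_points = dist.mean()   # expected opponent points per 3PA (~3.75)
p_more_than_2 = dist.sf(2.0)    # P(opponent scores > 2 points per 3PA)
p_more_than_3 = dist.sf(3.0)    # P(opponent scores > 3 points per 3PA)
```

The survival function `sf` is exactly the tail integral of $P(X)$ referred to in the text.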

This would suggest a significant breakdown of defensive emphasis in the “modern-day” NBA where evidently teams are just interested in playing shot-for-shot basketball, but in a very risky way that is not optimal.

The work so far covered only three-point attempts, but what are the effects of missing a three-point shot? Remarkably, the number of opponent points per three-point miss also obeys a lognormal distribution:

$\boxed{P(X) = \frac{2.81227 e^{-24.8464 (\log (X)-1.7605)^2}}{X}}$

Therefore, for every three-point shot your team misses, the opposing team is expected to score:

$\boxed{\int X P(X) dX = 2.93707\, -2.93707 \text{erf}(8.87571\, -4.98461 \log (X))}$

which comes out to about 5.87345 points. This identifies a remarkable risk to a team missing a three-point shot. This computation shows that one three-point shot miss corresponds to about 6 points for the opposing team! Looking at probabilities by integrating the density function above, one can show that there is a 99.9999% probability that the opposing team would score more than two points for every three-point miss, a 99.998% probability that the opposing team would score more than three points for every three-point miss, a 99.583% probability that the opposing team would score more than four points for every three-point miss, and so on.
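The same numerical check works for the miss density, reading off $\mu = 1.7605$ and $\sigma = \sqrt{1/(2 \times 24.8464)} \approx 0.1419$ from the boxed formula (again a `scipy` sketch, not the original computation):

```python
import numpy as np
from scipy import stats

# Parameters read off the boxed density for opponent points per 3P miss.
mu = 1.7605
sigma = np.sqrt(1.0 / (2.0 * 24.8464))

dist = stats.lognorm(s=sigma, scale=np.exp(mu))

miss_expected = dist.mean()   # expected opponent points per 3P miss (~5.87)
p_miss_gt_3 = dist.sf(3.0)    # P(opponent scores > 3 per miss)
p_miss_gt_4 = dist.sf(4.0)    # P(opponent scores > 4 per miss)
```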

What these calculations demonstrate is that gearing a team’s offense toward attempting three-point shots is remarkably risky, especially when a team misses. Given that the average number of three-point attempts has been increasing over the last several years while the average number of makes has stayed relatively flat (see this older article: https://relativitydigest.com/2016/05/26/the-three-point-shot-myth-continued/), teams are exposing themselves to greater and greater risk of losing games by adopting this style of play.

## How to Beat the Golden State Warriors

The Golden State Warriors have posed quite the conundrum for opposing teams. They are quick, have a spectacular ability to move the ball, and play suffocating defense. Their play in the playoffs thus far has exemplified all of these qualities even more, to the point where it seems that they are unbeatable.

I wanted to take a somewhat simplified approach and see if opposing teams are missing something. That is, is there some weakness in their play that opposing teams can exploit, a “weakness in Helm’s Deep”?

The most obvious place to start, from a data science point of view, seemed to be to look at every single shot the Warriors took as a team in each game this season and compile a grand ensemble shot chart. Using data from Basketball-Reference.com and some data-scraping scripts I wrote in R, I obtained the following:

Certainly, on the surface, it seems that there is no discernible pattern between made shots and missed shots. This is where the machine learning comes in!

From here, I extracted the x and y coordinates of each shot and recorded a response variable of “made” or “missed” in a table, so that the coordinates were the predictor variables and the shot classification (made/missed) was the response variable. Altogether, we had 7104 observations. Splitting this dataset into a 70% training set and a 30% test set, I tried the following algorithms, recording the percentage of correctly classified observations:

| Algorithm | % of Correctly Predicted Observations |
| --- | --- |
| Logistic Regression | 56.43 |
| Gradient Boosted Decision Trees | 62.62 |
| Random Forests | 58.54 |
| Neural Networks with Entropy Fitting | 62.47 |
| Naive Bayes Classification with Kernel Density Estimation | 57.32 |

One sees that gradient boosted decision trees had the best performance, correctly classifying 62.62% of the test observations. Given how noisy the data is, this is not bad, and much better than expected. I should also mention that these numbers were obtained after tuning the models for optimal parameters using cross-validation.
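The classification pipeline above can be sketched in Python. The original scripts were in R, and the shot data below is a synthetic stand-in (coordinates and a made/missed label whose probability decays with distance from an assumed basket location), so the accuracy will not match the table:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for the scraped GSW shot chart: (x, y) court coordinates
# plus a made/missed label. The basket at (25, 0) is purely illustrative.
n = 7104
x = rng.uniform(0, 50, n)
y = rng.uniform(0, 47, n)
dist_to_basket = np.hypot(x - 25, y)
p_make = 1.0 / (1.0 + np.exp(0.15 * (dist_to_basket - 15.0)))
made = (rng.random(n) < p_make).astype(int)

# 70/30 train/test split, as in the text.
X = np.column_stack([x, y])
X_train, X_test, y_train, y_test = train_test_split(
    X, made, test_size=0.30, random_state=1)

gbm = GradientBoostingClassifier(random_state=1).fit(X_train, y_train)
accuracy = gbm.score(X_test, y_test)   # fraction of test shots correctly classified

# Probability surface over a grid of court locations, as used for a contour plot.
gx, gy = np.meshgrid(np.linspace(0, 50, 25), np.linspace(0, 47, 25))
grid = np.column_stack([gx.ravel(), gy.ravel()])
prob_make = gbm.predict_proba(grid)[:, 1].reshape(gx.shape)
```

In the real analysis, `prob_make` is what gets drawn as contour levels over the court diagram.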

Using the gradient boosted decision tree model, we made predictions for a vast number of (x,y)-coordinates on the basketball court. We obtained the following contour plot:

Overlaying this on top of the basketball court diagram, we got:

The contour plot levels denote the probabilities that the GSW will make a shot from a given (x,y) location on the court. As a sanity check, the lowest probabilities are close to the half-court line and beyond the three-point line. The highest probabilities lie, surprisingly, along very specific areas of the court: very close to the basket, a line from the basket to the left corner (extending up slightly), and a very narrow line from the basket to the right corner. Interestingly, the probabilities are low on the right side of the basket, specifically:

A map showing the probabilities more explicitly is as follows (upon uploading it, I realized it is a bit hard to read; I will re-upload a clearer version soon!):

In conclusion, it seems that, at least according to a first look at the data, the Warriors do indeed have several “weak spots” in their offense that opponents should certainly look to exploit by designing defensive schemes that force them to take shots in the aforementioned low-probability zones. As for future improvements, I think it would be interesting to add as predictor variables things like geographic location, crowd sizes, team opponent strengths, etc… I will look into making these improvements in the near future.

## So, What’s Wrong with the Knicks?

As I write this post, the Knicks sit 12th in the Eastern Conference with a record of 22-32. A plethora of people are offering their opinions on what is wrong with the Knicks and, since most of it comes from ESPN and the New York media, most of it is incorrect or useless.

A while ago, I wrote this paper based on statistical learning that shows the common characteristics for NBA playoff teams. Basically, I obtained the following important result:

This classification tree, along with the arguments in the paper, shows that while the most important factor in teams making the playoffs tends to be the opponent’s number of assists per game, there are paths to the playoffs for teams that are not necessarily strong in this area. Specifically, for the Knicks, as of today, we see that:

Opp. Assists/game: 22.4 > 20.75; STL/game: 7.2 < 8.0061; TOV/game: 14.1 < 14.1585; DRB/game: 33.8 > 29.9024; Opp. TOV/game: 13.0 < 13.1585.

So, one sees that what is keeping the Knicks out of the playoffs is specifically pressure defense: they are not forcing enough turnovers per game. Ironically, they are very close to the threshold, but it is not enough.

A probability density approximation of the Knicks’ Opp. TOV/G is as follows:

This PDF has an approximate closed functional form $P(oTOV)$. Computing the tail probability

$\int_{A}^{\infty} P(oTOV) \, d(oTOV)$

yields a closed-form expression in terms of the complementary error function, which is given by:

$\text{erfc}(z) = \frac{2}{\sqrt{\pi}} \int_{z}^{\infty} e^{-t^2} dt$

Given that the threshold for playoff-bound teams is more than 13.1585 opp. TOV/game, setting A = 13 above, we obtain: 0.435. This means that the Knicks have roughly a 43.5% chance of forcing more than 13 TOV in any single game. Similarly, setting A = 14, one obtains: 0.3177. This means that the Knicks have roughly a 31.77% chance of forcing more than 14 TOV in any single game, and so forth.

Therefore, one concludes that while the Knicks problems are defensive-oriented, it is specifically related to pressure defense and forcing turnovers.

By: Dr. Ikjyot Singh Kohli

## The Relationship Between The Electoral College and Popular Vote

An interesting machine learning problem: Can one figure out the relationship between the popular vote margin, voter turnout, and the percentage of electoral college votes a candidate wins? Going back to the election of John Quincy Adams, the raw data looks like this:

| Candidate | Party | Popular Vote Margin | Turnout | Percentage of EC |
| --- | --- | --- | --- | --- |
| John Quincy Adams | D.-R. | -0.1044 | 0.27 | 0.3218 |
| Andrew Jackson | Dem. | 0.1225 | 0.58 | 0.68 |
| Andrew Jackson | Dem. | 0.1781 | 0.55 | 0.7657 |
| Martin Van Buren | Dem. | 0.14 | 0.58 | 0.5782 |
| William Henry Harrison | Whig | 0.0605 | 0.80 | 0.7959 |
| James Polk | Dem. | 0.0145 | 0.79 | 0.6182 |
| Zachary Taylor | Whig | 0.0479 | 0.73 | 0.5621 |
| Franklin Pierce | Dem. | 0.0695 | 0.70 | 0.8581 |
| James Buchanan | Dem. | 0.12 | 0.79 | 0.5878 |
| Abraham Lincoln | Rep. | 0.1013 | 0.81 | 0.5941 |
| Abraham Lincoln | Rep. | 0.1008 | 0.74 | 0.9099 |
| Ulysses Grant | Rep. | 0.0532 | 0.78 | 0.7279 |
| Ulysses Grant | Rep. | 0.12 | 0.71 | 0.8195 |
| Rutherford Hayes | Rep. | -0.03 | 0.82 | 0.5014 |
| James Garfield | Rep. | 0.0009 | 0.79 | 0.5799 |
| Grover Cleveland | Dem. | 0.0057 | 0.78 | 0.5461 |
| Benjamin Harrison | Rep. | -0.0083 | 0.79 | 0.58 |
| Grover Cleveland | Dem. | 0.0301 | 0.75 | 0.6239 |
| William McKinley | Rep. | 0.0431 | 0.79 | 0.6063 |
| William McKinley | Rep. | 0.0612 | 0.73 | 0.6532 |
| Theodore Roosevelt | Rep. | 0.1883 | 0.65 | 0.7059 |
| William Taft | Rep. | 0.0853 | 0.65 | 0.6646 |
| Woodrow Wilson | Dem. | 0.1444 | 0.59 | 0.8192 |
| Woodrow Wilson | Dem. | 0.0312 | 0.62 | 0.5217 |
| Warren Harding | Rep. | 0.2617 | 0.49 | 0.7608 |
| Calvin Coolidge | Rep. | 0.2522 | 0.49 | 0.7194 |
| Herbert Hoover | Rep. | 0.1741 | 0.57 | 0.8362 |
| Franklin Roosevelt | Dem. | 0.1776 | 0.57 | 0.8889 |
| Franklin Roosevelt | Dem. | 0.2426 | 0.61 | 0.9849 |
| Franklin Roosevelt | Dem. | 0.0996 | 0.63 | 0.8456 |
| Franklin Roosevelt | Dem. | 0.08 | 0.56 | 0.8136 |
| Harry Truman | Dem. | 0.0448 | 0.53 | 0.5706 |
| Dwight Eisenhower | Rep. | 0.1085 | 0.63 | 0.8324 |
| Dwight Eisenhower | Rep. | 0.15 | 0.61 | 0.8606 |
| John Kennedy | Dem. | 0.0017 | 0.6277 | 0.5642 |
| Lyndon Johnson | Dem. | 0.2258 | 0.6192 | 0.9033 |
| Richard Nixon | Rep. | 0.01 | 0.6084 | 0.5595 |
| Richard Nixon | Rep. | 0.2315 | 0.5521 | 0.9665 |
| Jimmy Carter | Dem. | 0.0206 | 0.5355 | 0.55 |
| Ronald Reagan | Rep. | 0.0974 | 0.5256 | 0.9089 |
| Ronald Reagan | Rep. | 0.1821 | 0.5311 | 0.9758 |
| George H. W. Bush | Rep. | 0.0772 | 0.5015 | 0.7918 |
| Bill Clinton | Dem. | 0.0556 | 0.5523 | 0.6877 |
| Bill Clinton | Dem. | 0.0851 | 0.4908 | 0.7045 |
| George W. Bush | Rep. | -0.0051 | 0.51 | 0.5037 |
| George W. Bush | Rep. | 0.0246 | 0.5527 | 0.5316 |
| Barack Obama | Dem. | 0.0727 | 0.5823 | 0.6784 |
| Barack Obama | Dem. | 0.0386 | 0.5487 | 0.6171 |

Clearly, the percentage of electoral college votes a candidate wins depends nonlinearly on the voter turnout percentage and the popular vote margin (%), as this non-parametric regression shows:

We therefore chose to perform a nonlinear regression using neural networks, for which our structure was:

As it turns out, this simple neural network structure with one hidden layer gave the lowest test error, which was 0.002496419 in this case.
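A minimal sketch of this regression, fitting a one-hidden-layer network on a handful of rows from the historical table above. The hidden-layer size, solver, and other settings here are guesses, not the original configuration:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# A few rows from the historical table:
# [popular vote margin, turnout] -> percentage of EC votes.
data = np.array([
    [0.1013, 0.81, 0.5941],    # Lincoln 1860
    [0.2617, 0.49, 0.7608],    # Harding 1920
    [0.2426, 0.61, 0.9849],    # F. Roosevelt 1936
    [0.0017, 0.6277, 0.5642],  # Kennedy 1960
    [0.2315, 0.5521, 0.9665],  # Nixon 1972
    [0.1821, 0.5311, 0.9758],  # Reagan 1984
    [-0.0051, 0.51, 0.5037],   # G. W. Bush 2000
    [0.0727, 0.5823, 0.6784],  # Obama 2008
])
X, y = data[:, :2], data[:, 2]

# One hidden layer, as in the structure described in the text.
net = MLPRegressor(hidden_layer_sizes=(5,), solver='lbfgs',
                   max_iter=20000, random_state=3).fit(X, y)

# Predict the EC share for a 6.1% popular-vote margin at two turnout levels.
preds = net.predict([[0.061, 0.30], [0.061, 0.75]])
```

With only eight training rows this is illustrative; the post's model was trained on the full table.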

Now, looking at the most recent national polls for the upcoming election, we see that Hillary Clinton has a 6.1% lead in the popular vote. Our neural network model then predicts the following:

| Simulation | Popular Vote Margin | Voter Turnout | Predicted % of EC Votes (+/- 0.04996417) |
| --- | --- | --- | --- |
| 1 | 0.061 | 0.30 | 0.6607371 |
| 2 | 0.061 | 0.35 | 0.6647464 |
| 3 | 0.061 | 0.40 | 0.6687115 |
| 4 | 0.061 | 0.45 | 0.6726314 |
| 5 | 0.061 | 0.50 | 0.6765048 |
| 6 | 0.061 | 0.55 | 0.6803307 |
| 7 | 0.061 | 0.60 | 0.6841083 |
| 8 | 0.061 | 0.65 | 0.6878366 |
| 9 | 0.061 | 0.70 | 0.6915149 |
| 10 | 0.061 | 0.75 | 0.6951424 |

One sees that even for an extremely low voter turnout (30%), Hillary Clinton can at this point expect to win between 61.08% and 71.07% of the Electoral College, or roughly 328 to 382 electoral college votes. Therefore, what seems like a relatively small lead in the popular vote (6.1%) translates, according to this neural network model, into a large margin of victory in the electoral college.

One can see that the predicted percentage of electoral college votes really depends on popular vote margin and voter turnout. For example, if we reduce the popular vote margin to 1%, the results are less promising for the leading candidate:

| Pop. Vote Margin | Voter Turnout | E.C. % Win | E.C. % Win Worst Case | E.C. % Win Best Case |
| --- | --- | --- | --- | --- |
| 0.01 | 0.30 | 0.5182854 | 0.4675000 | 0.5690708 |
| 0.01 | 0.35 | 0.5244157 | 0.4736303 | 0.5752011 |
| 0.01 | 0.40 | 0.5305820 | 0.4797967 | 0.5813674 |
| 0.01 | 0.45 | 0.5367790 | 0.4859937 | 0.5875644 |
| 0.01 | 0.50 | 0.5430013 | 0.4922160 | 0.5937867 |
| 0.01 | 0.55 | 0.5492434 | 0.4984580 | 0.6000287 |
| 0.01 | 0.60 | 0.5554995 | 0.5047141 | 0.6062849 |
| 0.01 | 0.65 | 0.5617642 | 0.5109788 | 0.6125496 |
| 0.01 | 0.70 | 0.5680317 | 0.5172463 | 0.6188171 |
| 0.01 | 0.75 | 0.5742963 | 0.5235109 | 0.6250817 |

One sees that if the popular vote margin is just 1%, the leading candidate is not in the clear unless voter turnout exceeds 60%: below that level, the worst-case Electoral College share dips under 50%.

## Optimal Positions for NBA Players

I was thinking about how one can use the NBA’s new SportVU system to figure out optimal positions for players on the court. One of the interesting things about the SportVU system is that it tracks player $(x,y)$ coordinates on the court. Presumably, it also keeps track of whether or not a player located at $(x,y)$ makes a shot or misses it. Let us denote a player making a shot by $1$, and a player missing a shot by $0$. Then, one essentially will have data in the form $(x,y, \text{1/0})$.

One can then use a logistic regression to determine the probability that a player at position $(x,y)$ will make a shot:

$p(x,y) = \frac{\exp\left(\beta_0 + \beta_1 x + \beta_2 y\right)}{1 +\exp\left(\beta_0 + \beta_1 x + \beta_2 y\right)}$

The main idea is that the parameters $\beta_0, \beta_1, \beta_2$ uniquely characterize a given player’s probability of making a shot.

As a coaching staff, from an offensive perspective, suppose we wish to position players such that they have a very high probability of making a shot, say, for demonstration purposes, 99%. This means we must solve the optimization problem:

$\frac{\exp\left(\beta_0 + \beta_1 x + \beta_2 y\right)}{1 +\exp\left(\beta_0 + \beta_1 x + \beta_2 y\right)} = 0.99$

$\text{s.t. } 0 \leq x \leq 28, \quad 0 \leq y \leq 47$

(The constraints are determined here by the x-y dimensions of a standard NBA court).
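Inverting the logistic function reduces this to a linear equation: writing $\eta = \beta_0 + \beta_1 x + \beta_2 y$,

$\frac{e^{\eta}}{1 + e^{\eta}} = 0.99 \iff \eta = \ln\left(\frac{0.99}{0.01}\right) = \ln 99 \approx 4.59512,$

so the 99% level set is the straight line $\beta_0 + \beta_1 x + \beta_2 y = \ln 99$ intersected with the court rectangle.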

This has four solution branches, all of the form

$x = \frac{4.59512 - \beta_0 - \beta_2 y}{\beta_1}$

each accompanied by conditions on the signs of $\beta_1$ and $\beta_2$ and on which of the court constraints bind (for example, one branch holds for $\frac{4.59512 - \beta_0 - 28 \beta_1}{\beta_2} \leq y \leq 47$).

In practice, it should be noted, that it is typically unlikely to have a player that has a 99% probability of making a shot.

To put this example in more practical terms, I generated some random data (1000 points) for a player in terms of $(x,y)$ coordinates and whether he made a shot from those coordinates or not. The following scatter plot shows the result of this simulation:

In this plot, the red dots indicate a player has made a shot (a response of 1.0) from the $(x,y)$ coordinates given, while a purple dot indicates a player has missed a shot from the $(x,y)$ coordinates given (a response of 0.0).

Performing a logistic regression on this data, we obtain that $\beta_0 = 0, \beta_1 = 0.00066876, \beta_2 = -0.00210949$.

Using the equations above, we see that this player has a maximum probability of $58.7149 \%$ of making a shot from a location of $(x,y) = (0,23)$, and a minimum probability of $38.45 \%$ of making a shot from a location of $(x,y) = (28,0)$.
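The whole procedure can be sketched on simulated data as follows. The coefficients used to generate the shots below are illustrative, not the fitted values quoted above, and since a logistic model is monotone in $(x,y)$ the best location always lands on the court boundary:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)

# 1000 simulated shots: court coordinates and a make/miss outcome whose
# log-odds are linear in (x, y), mirroring the model in the text.
# The generating coefficients below are illustrative only.
n = 1000
x = rng.uniform(0, 28, n)
y = rng.uniform(0, 47, n)
true_eta = 0.8 - 0.02 * x - 0.03 * y
made = rng.random(n) < 1.0 / (1.0 + np.exp(-true_eta))

X = np.column_stack([x, y])
model = LogisticRegression().fit(X, made)

# Probability surface over the court; the optimal spot is the grid argmax.
gx, gy = np.meshgrid(np.linspace(0, 28, 29), np.linspace(0, 47, 48))
grid = np.column_stack([gx.ravel(), gy.ravel()])
probs = model.predict_proba(grid)[:, 1]
best_xy = grid[np.argmax(probs)]
```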

## 2016 Real-Time Election Predictions

Further to my original post on using physics to predict the outcome of the 2016 US Presidential elections, I have now written a cloud-based app using the powerful Wolfram Cloud to pull the most recent polling data on the web from The HuffPost Pollster, which “tracks thousands of public polls to give you the latest data on elections, political opinions and more”.  This app works in real-time and applies my PDE-solver / machine learning based algorithm to predict the probability of a candidate winning a state assuming the election is held tomorrow.

The app can be accessed by clicking the image below. (Note: if you get some type of server error, it means Wolfram’s server is busy; a refresh usually works. Also, results are only computed for states for which reliable polling data exists.)

## Some Thoughts on The US GDP

Here are some thoughts on the US GDP based on some data I’ve been looking at recently, mostly motivated by some Donald Trump supporters that have been criticizing President Obama’s record on the GDP and the economy.

First, analyzing the real GDP’s average growth per year, we obtain the following (based on a least squares regression analysis):

According to these calculations, President Clinton’s economic policies led to the best average GDP growth rate, at \$436 Billion/year. President Reagan and President Obama have almost identical average GDP growth rates, in the neighbourhood of \$320 Billion/year. However, an obvious caveat is that President Obama’s GDP record is still missing two years of data, so I will need to revisit these calculations in two years! It should also be noted that, historically, the US GDP has grown at an average of about \$184 Billion/year.

The second point I wanted to address concerns the several Trump supporters who keep comparing the average annual percentage change in real GDP under President Reagan with that under President Obama. Although they cite the averages, they never mention the standard deviations! Computing these, we find that:

Looking at these calculations, we find that Presidents Clinton and Obama had the most stable year-to-year growth in real GDP %. Presidents Bush and Reagan had highly unstable GDP growth, with President Bush’s being far worse than President Reagan’s. Further, Trump supporters and most Republicans are quick to point out the mean figure of 3.637% associated with President Reagan, but this figure is +/- 2.55%, which indicates high volatility in the GDP under President Reagan; this has not been the case under President Obama.

Another observation I would like to point out is that very few people have been mentioning the fact that the annual real US GDP % is in fact correlated to that of other countries. Based on data from the World Bank, one can compute the following correlations:

One sees that the correlation between the annual growth % of the US real GDP and Canada’s is 0.826, while for Estonia and the UK it is roughly 0.7. Evidently, then, any President who claims that his policies will increase the GDP is not being entirely truthful, since it is quite likely that these numbers also depend on those of other countries, which I am not convinced a US President has complete control over!

My final observation is with respect to the quarterly GDP numbers. There are some articles I have seen in recent days, in addition to several television segments, in which Trump supporters continuously cite how much better Reagan’s quarterly GDP numbers were compared to Obama’s. We now show that, in actuality, this is not the case.

The problem is that most of these “analysts” are just looking at the raw data, which at face value doesn’t tell you much since, as expected, it fluctuates. Below, we analyze the quarterly GDP % data during the tenures of Presidents Reagan and Obama, from 1982-1988 and 2010-2016 respectively, comparing data over the same length of time.

For Reagan, we obtain:

For Obama, we obtain:

The only way to reasonably compare these two data sets is to analyze the rate at which the GDP % changes in time. Since the data is nonlinear in time, this means we must calculate the derivative at each quarter. We first performed cubic spline interpolation to fit curves to these data sets, which gave extremely good results:

We then numerically computed the derivative of these curves at each quarter and obtained:

The dashed curves in the above plot are plots of the derivatives of each curve at each quarter. In terms of numbers, these were found to be:

Summarizing the table above in graphical format, we obtain:

As can be calculated easily, Obama has higher GDP quarterly growth numbers for 15/26 (57.69%) quarters. Therefore, even looking at the quarterly real GDP numbers, overall, President Obama outperforms President Reagan.
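The interpolation-and-differentiation step described above can be sketched as follows. The quarterly values here are hypothetical placeholders, not the actual Reagan or Obama data:

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Hypothetical quarterly GDP-growth readings (%); stand-ins for the
# 26 quarters of Reagan/Obama data analyzed in the post.
quarters = np.arange(8)
gdp_pct = np.array([2.1, 3.4, 4.0, 3.1, 2.5, 2.8, 3.6, 3.0])

# Fit a cubic spline through the quarterly data, then evaluate its
# derivative at each quarter: the rate of change of GDP % in time.
spline = CubicSpline(quarters, gdp_pct)
growth_rate = spline.derivative()(quarters)
```

Comparing `growth_rate` series quarter by quarter across two presidencies is what yields the 15/26 tally quoted above.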

Thanks to Hargun Singh Kohli, B.A. Honours, LL.B. for the data collection and processing part of this analysis.