## The Relationship Between The Electoral College and Popular Vote

An interesting machine learning problem: Can one figure out the relationship between the popular vote margin, voter turnout, and the percentage of electoral college votes a candidate wins? Going back to the election of John Quincy Adams, the raw data looks like this:

| Winner | Party | Popular vote margin (fraction) | Turnout (fraction) | Fraction of EC votes |
|---|---|---|---|---|
| John Quincy Adams | D.-R. | -0.1044 | 0.27 | 0.3218 |
| Andrew Jackson | Dem. | 0.1225 | 0.58 | 0.68 |
| Andrew Jackson | Dem. | 0.1781 | 0.55 | 0.7657 |
| Martin Van Buren | Dem. | 0.14 | 0.58 | 0.5782 |
| William Henry Harrison | Whig | 0.0605 | 0.80 | 0.7959 |
| James Polk | Dem. | 0.0145 | 0.79 | 0.6182 |
| Zachary Taylor | Whig | 0.0479 | 0.73 | 0.5621 |
| Franklin Pierce | Dem. | 0.0695 | 0.70 | 0.8581 |
| James Buchanan | Dem. | 0.12 | 0.79 | 0.5878 |
| Abraham Lincoln | Rep. | 0.1013 | 0.81 | 0.5941 |
| Abraham Lincoln | Rep. | 0.1008 | 0.74 | 0.9099 |
| Ulysses Grant | Rep. | 0.0532 | 0.78 | 0.7279 |
| Ulysses Grant | Rep. | 0.12 | 0.71 | 0.8195 |
| Rutherford Hayes | Rep. | -0.03 | 0.82 | 0.5014 |
| James Garfield | Rep. | 0.0009 | 0.79 | 0.5799 |
| Grover Cleveland | Dem. | 0.0057 | 0.78 | 0.5461 |
| Benjamin Harrison | Rep. | -0.0083 | 0.79 | 0.58 |
| Grover Cleveland | Dem. | 0.0301 | 0.75 | 0.6239 |
| William McKinley | Rep. | 0.0431 | 0.79 | 0.6063 |
| William McKinley | Rep. | 0.0612 | 0.73 | 0.6532 |
| Theodore Roosevelt | Rep. | 0.1883 | 0.65 | 0.7059 |
| William Taft | Rep. | 0.0853 | 0.65 | 0.6646 |
| Woodrow Wilson | Dem. | 0.1444 | 0.59 | 0.8192 |
| Woodrow Wilson | Dem. | 0.0312 | 0.62 | 0.5217 |
| Warren Harding | Rep. | 0.2617 | 0.49 | 0.7608 |
| Calvin Coolidge | Rep. | 0.2522 | 0.49 | 0.7194 |
| Herbert Hoover | Rep. | 0.1741 | 0.57 | 0.8362 |
| Franklin Roosevelt | Dem. | 0.1776 | 0.57 | 0.8889 |
| Franklin Roosevelt | Dem. | 0.2426 | 0.61 | 0.9849 |
| Franklin Roosevelt | Dem. | 0.0996 | 0.63 | 0.8456 |
| Franklin Roosevelt | Dem. | 0.08 | 0.56 | 0.8136 |
| Harry Truman | Dem. | 0.0448 | 0.53 | 0.5706 |
| Dwight Eisenhower | Rep. | 0.1085 | 0.63 | 0.8324 |
| Dwight Eisenhower | Rep. | 0.15 | 0.61 | 0.8606 |
| John Kennedy | Dem. | 0.0017 | 0.6277 | 0.5642 |
| Lyndon Johnson | Dem. | 0.2258 | 0.6192 | 0.9033 |
| Richard Nixon | Rep. | 0.01 | 0.6084 | 0.5595 |
| Richard Nixon | Rep. | 0.2315 | 0.5521 | 0.9665 |
| Jimmy Carter | Dem. | 0.0206 | 0.5355 | 0.55 |
| Ronald Reagan | Rep. | 0.0974 | 0.5256 | 0.9089 |
| Ronald Reagan | Rep. | 0.1821 | 0.5311 | 0.9758 |
| George H. W. Bush | Rep. | 0.0772 | 0.5015 | 0.7918 |
| Bill Clinton | Dem. | 0.0556 | 0.5523 | 0.6877 |
| Bill Clinton | Dem. | 0.0851 | 0.4908 | 0.7045 |
| George W. Bush | Rep. | -0.0051 | 0.51 | 0.5037 |
| George W. Bush | Rep. | 0.0246 | 0.5527 | 0.5316 |
| Barack Obama | Dem. | 0.0727 | 0.5823 | 0.6784 |
| Barack Obama | Dem. | 0.0386 | 0.5487 | 0.6171 |

Clearly, the percentage of electoral college votes a candidate wins depends nonlinearly on the voter turnout percentage and the popular vote margin, as this non-parametric regression shows:

We therefore chose to perform a nonlinear regression using neural networks, for which our structure was:

As it turns out, this simple neural network structure with one hidden layer gave the lowest test error, which was 0.002496419 in this case.

Now, looking at the most recent national polls for the upcoming election, we see that Hillary Clinton has a 6.1% lead in the popular vote. Our neural network model then predicts the following:

| Simulation | Popular Vote Margin | Percentage of Voter Turnout | Predicted Percentage of Electoral College Votes (+/- 0.04996417) |
|---|---|---|---|
| 1 | 0.061 | 0.30 | 0.6607371 |
| 2 | 0.061 | 0.35 | 0.6647464 |
| 3 | 0.061 | 0.40 | 0.6687115 |
| 4 | 0.061 | 0.45 | 0.6726314 |
| 5 | 0.061 | 0.50 | 0.6765048 |
| 6 | 0.061 | 0.55 | 0.6803307 |
| 7 | 0.061 | 0.60 | 0.6841083 |
| 8 | 0.061 | 0.65 | 0.6878366 |
| 9 | 0.061 | 0.70 | 0.6915149 |
| 10 | 0.061 | 0.75 | 0.6951424 |

One sees that even for an extremely low voter turnout (30%), at this point Hillary Clinton can expect to win between about 61.08% and 71.07% of the Electoral College, or 328 to 382 of the 538 electoral college votes. Therefore, what seems like a relatively small lead in the popular vote (6.1%) translates, according to this neural network model, into a large margin of victory in the electoral college.
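The conversion from a predicted share to a range of electoral votes is simple arithmetic. The following minimal sketch reproduces it for the 30% turnout row, assuming the quoted error of 0.04996417 is applied symmetrically and that fractional votes are truncated (the variable names are illustrative, not taken from the original analysis):

```python
# Values for the turnout = 0.30 row of the prediction table.
predicted_share = 0.6607371
model_error = 0.04996417
TOTAL_ELECTORAL_VOTES = 538  # size of the Electoral College

low = predicted_share - model_error
high = predicted_share + model_error

# Truncate to whole electoral votes (an assumption about the rounding used).
low_votes = int(low * TOTAL_ELECTORAL_VOTES)
high_votes = int(high * TOTAL_ELECTORAL_VOTES)

print(f"share of EC votes: {low:.4f} to {high:.4f}")
print(f"electoral votes:   {low_votes} to {high_votes}")
```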

One can see that the predicted percentage of electoral college votes really depends on popular vote margin and voter turnout. For example, if we reduce the popular vote margin to 1%, the results are less promising for the leading candidate:

| Pop. Vote Margin | Voter Turnout % | E.C. % Win | E.C. % Win Worst Case | E.C. % Win Best Case |
|---|---|---|---|---|
| 0.01 | 0.30 | 0.5182854 | 0.4675000 | 0.5690708 |
| 0.01 | 0.35 | 0.5244157 | 0.4736303 | 0.5752011 |
| 0.01 | 0.40 | 0.5305820 | 0.4797967 | 0.5813674 |
| 0.01 | 0.45 | 0.5367790 | 0.4859937 | 0.5875644 |
| 0.01 | 0.50 | 0.5430013 | 0.4922160 | 0.5937867 |
| 0.01 | 0.55 | 0.5492434 | 0.4984580 | 0.6000287 |
| 0.01 | 0.60 | 0.5554995 | 0.5047141 | 0.6062849 |
| 0.01 | 0.65 | 0.5617642 | 0.5109788 | 0.6125496 |
| 0.01 | 0.70 | 0.5680317 | 0.5172463 | 0.6188171 |
| 0.01 | 0.75 | 0.5742963 | 0.5235109 | 0.6250817 |

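One sees that if the popular vote margin is just 1% for the leading candidate, that candidate is not in the clear unless voter turnout reaches about 60%, the point at which even the lower end of the predicted range exceeds half of the electoral college votes.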
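The 60% figure can be read off the 1% margin table by looking for the first turnout at which the lower end of the predicted range passes 0.5. A short sketch of that lookup, with the rows copied from the table and a hypothetical `rows` layout of (turnout, prediction, lower end):

```python
# (turnout, predicted EC share, lower end of the predicted range)
# for a popular vote margin of 1%, copied from the table.
rows = [
    (0.30, 0.5182854, 0.4675000),
    (0.35, 0.5244157, 0.4736303),
    (0.40, 0.5305820, 0.4797967),
    (0.45, 0.5367790, 0.4859937),
    (0.50, 0.5430013, 0.4922160),
    (0.55, 0.5492434, 0.4984580),
    (0.60, 0.5554995, 0.5047141),
    (0.65, 0.5617642, 0.5109788),
    (0.70, 0.5680317, 0.5172463),
    (0.75, 0.5742963, 0.5235109),
]

# First turnout at which even the lower end is a majority of the EC.
threshold = next(turnout for turnout, _, low in rows if low > 0.5)
print(threshold)
```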

## Breaking Down the 2015-2016 NBA Season

In this article, I will use Data Science / Machine Learning methodologies to break down the real factors separating the playoff from non-playoff teams. In particular, I used the data from Basketball-Reference.com to associate 44 predictor variables with each team:

“FG” “FGA” “FG.” “X3P” “X3PA” “X3P.” “X2P” “X2PA” “X2P.” “FT” “FTA” “FT.” “ORB” “DRB” “TRB” “AST” “STL” “BLK” “TOV” “PF” “PTS” “PS.G” “oFG” “oFGA” “oFG.” “o3P” “o3PA” “o3P.” “o2P” “o2PA” “o2P.” “oFT” “oFTA” “oFT.” “oORB” “oDRB” “oTRB” “oAST” “oSTL” “oBLK” “oTOV” “oPF” “oPTS” “oPS.G”

where a letter ‘o’ before the last 22 predictor variables indicates a defensive variable (‘o’ stands for opponent).

Using principal components analysis (PCA), I was able to project this 44-dimensional data set onto a 5-dimensional one. That is, the first 5 principal components were found to explain 85% of the variance.
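For readers who want to see how such a reduction is carried out, here is a minimal sketch of PCA via the singular value decomposition. It assumes NumPy is available and uses random numbers as a stand-in for the 30 × 44 team table, so the number of components it reports will not match the 5 found for the real data:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for the real table: 30 teams x 44 statistics.
X = rng.normal(size=(30, 44))

# Standardize each column so no statistic dominates because of its units.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# PCA via the singular value decomposition of the standardized data.
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
explained = s**2 / np.sum(s**2)      # share of variance per component
cumulative = np.cumsum(explained)

# Smallest number of components whose cumulative share reaches 85%.
k = int(np.searchsorted(cumulative, 0.85) + 1)

# Coordinates of each team in the reduced k-dimensional space.
scores = Z @ Vt[:k].T
print(k, scores.shape)
```

Standardizing first is a design choice: the raw statistics mix counts, percentages, and per-game averages, and without rescaling the largest-valued columns would dominate the leading components.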

Here are the various biplots:

In these plots, the teams are grouped according to whether they made the playoffs or not.

One sees from this biplot of the first two principal components that the dominant component along the first PC is 3 point attempts, while the dominant component along the second PC is opponent points. CLE and TOR have a high negative score along the second PC, indicating a strong defensive performance. Indeed, one suspects that the final separating factor that led CLE to the championship was their defensive play, as opposed to 3-point shooting, which all-in-all didn’t do GSW any favours. This is in line with some of my previous analyses.

## Optimal Positions for NBA Players

I was thinking about how one can use the NBA’s new SportVU system to figure out optimal positions for players on the court. One of the interesting things about the SportVU system is that it tracks player $(x,y)$ coordinates on the court. Presumably, it also keeps track of whether or not a player located at $(x,y)$ makes a shot or misses it. Let us denote a player making a shot by $1$, and a player missing a shot by $0$. Then, one essentially will have data in the form $(x,y, \text{1/0})$.

One can then use a logistic regression to determine the probability that a player at position $(x,y)$ will make a shot:

$p(x,y) = \frac{\exp\left(\beta_0 + \beta_1 x + \beta_2 y\right)}{1 +\exp\left(\beta_0 + \beta_1 x + \beta_2 y\right)}$

The main idea is that the parameters $\beta_0, \beta_1, \beta_2$ uniquely characterize a given player’s probability of making a shot.

From an offensive perspective, suppose that as a coaching staff we wish to position players so that they have a very high probability of making a shot, say 99% for demonstration purposes. This means we must solve the optimization problem:

$\frac{\exp\left(\beta_0 + \beta_1 x + \beta_2 y\right)}{1 +\exp\left(\beta_0 + \beta_1 x + \beta_2 y\right)} = 0.99$

$\text{s.t. } 0 \leq x \leq 28, \quad 0 \leq y \leq 47$

(The constraints are determined here by the x-y dimensions of a standard NBA court).

This has the following solutions:

$x = \frac{4.59512 - \beta_0 - \beta_2 y}{\beta_1}, \quad \frac{4.59512 - \beta_0 - 28 \beta_1}{\beta_2} \leq y$

with the following conditions:

One can also have:

$x = \frac{4.59512 - \beta_0 - \beta_2 y}{\beta_1}, \quad y \leq 47$

with the following conditions:

Another solution is:

$x = \frac{4.59512 - \beta_0 - \beta_2 y}{\beta_1}$

with the following conditions:

The fourth possible solution is:

$x = \frac{4.59512 - \beta_0 - \beta_2 y}{\beta_1}$

with the following conditions:

In practice, it should be noted that a player is very unlikely to have a 99% probability of making a shot.
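The closed-form expressions above all come from inverting the logistic function: setting $p(x,y) = 0.99$ is equivalent to $\beta_0 + \beta_1 x + \beta_2 y = \ln(0.99/0.01) \approx 4.59512$. The sketch below illustrates this with hypothetical coefficients (the values of `beta0`, `beta1`, `beta2` and the function names are assumptions chosen only so that the solution lands on the court):

```python
import math

def logit(p):
    """Inverse of the logistic function."""
    return math.log(p / (1.0 - p))

def shot_probability(x, y, beta0, beta1, beta2):
    z = beta0 + beta1 * x + beta2 * y
    return 1.0 / (1.0 + math.exp(-z))

def x_on_contour(y, beta0, beta1, beta2, target=0.99):
    """x-coordinate at which the shot probability equals `target` for a given y."""
    return (logit(target) - beta0 - beta2 * y) / beta1

# Hypothetical player coefficients, for illustration only.
beta0, beta1, beta2 = 1.0, 0.2, -0.05

y = 10.0
x = x_on_contour(y, beta0, beta1, beta2)
on_court = 0 <= x <= 28 and 0 <= y <= 47   # constraints used in the text

print(round(logit(0.99), 5), round(x, 4), on_court)
```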

To put this example in more practical terms, I generated some random data (1000 points) for a player in terms of $(x,y)$ coordinates and whether he made a shot from that distance or not. The following scatter plot shows the result of this simulation:

In this plot, the red dots indicate a player has made a shot (a response of 1.0) from the $(x,y)$ coordinates given, while a purple dot indicates a player has missed a shot from the $(x,y)$ coordinates given (a response of 0.0).

Performing a logistic regression on this data, we obtain that $\beta_0 = 0, \beta_1 = 0.00066876, \beta_2 = -0.00210949$.

Using the equations above, we see that this player has a maximum probability of $58.7149 \%$ of making a shot from a location of $(x,y) = (0,23)$, and a minimum probability of $38.45 \%$ of making a shot from a location of $(x,y) = (28,0)$.
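A fit of this kind can be sketched with plain gradient ascent on the log-likelihood. The example below is not the simulation used in this post: it generates its own synthetic shots from hypothetical coefficients (`TRUE_BETA`), and it assumes the court coordinates have been rescaled to $[0,1]$ (that is, $x/28$ and $y/47$) so that a fixed step size behaves well.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(beta, data):
    total = 0.0
    for x, y, made in data:
        p = sigmoid(beta[0] + beta[1] * x + beta[2] * y)
        total += math.log(p) if made else math.log(1.0 - p)
    return total

def fit_logistic(data, steps=500, lr=1.0):
    """Gradient ascent on the mean log-likelihood; returns [b0, b1, b2]."""
    beta = [0.0, 0.0, 0.0]
    n = len(data)
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        for x, y, made in data:
            err = made - sigmoid(beta[0] + beta[1] * x + beta[2] * y)
            grad[0] += err
            grad[1] += err * x
            grad[2] += err * y
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

random.seed(0)
TRUE_BETA = (0.4, -1.5, 0.8)  # hypothetical, on rescaled coordinates
data = []
for _ in range(1000):
    x, y = random.random(), random.random()  # rescaled court position
    p = sigmoid(TRUE_BETA[0] + TRUE_BETA[1] * x + TRUE_BETA[2] * y)
    data.append((x, y, 1 if random.random() < p else 0))

beta_hat = fit_logistic(data)
print([round(b, 3) for b in beta_hat])
```

In practice one would likely use a statistics package rather than a hand-written loop; the loop is shown only to make the estimation step explicit.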

## Optimal Strategies for the Clinton/Trump Debate

Consider modelling the Clinton/Trump debate via a static game in which each candidate can choose between two strategies: $\{A,P\}$, where $A$ denotes predominantly “attacking” the other candidate, while $P$ denotes predominantly discussing policy positions.

Further, let us consider the mixed strategies $\sigma_1 = (p,1-p)$ for Clinton, and $\sigma_2 = (q,1-q)$ for Trump. That is, Clinton predominantly attacks Trump with probability $p$, and Trump predominantly attacks Clinton with probability $q$.

Let us first deal with the general case of arbitrary payoffs, thus, generating the following payoff matrix:

$\left( \begin{array}{cc} \{a,b\} & \{c,d\} \\ \{e,f\} & \{g,h\} \\ \end{array} \right)$

That is, if Clinton attacks Trump and Trump attacks Clinton, the payoff to Clinton is $a$, while the payoff to Trump is $b$. If Clinton attacks Trump, and Trump ignores the attack and discusses policy positions instead, the payoff to Clinton is $c$, while the payoff to Trump is $d$. If Clinton discusses policy positions while Trump attacks, the payoff to Clinton is $e$, while the payoff to Trump is $f$. If both candidates discuss policy positions instead of attacking each other, the payoffs to Clinton and Trump are $g$ and $h$ respectively.

With this information in hand, we can calculate the payoff to Clinton as:

$\pi_c(\sigma_1, \sigma_2) = a p q+c p (1-q)+e (1-p) q+g (1-p) (1-q)$

while the payoff to Trump is:

$\pi_t(\sigma_1,\sigma_2) = b p q+d p (1-q)+f (1-p) q+h (1-p) (1-q)$
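These two payoff functions are straightforward to evaluate numerically. A minimal sketch follows, in which the function name, the tuple layout of `payoffs`, and the numerical values are all illustrative assumptions:

```python
def expected_payoffs(p, q, payoffs):
    """Expected payoffs (Clinton, Trump) under mixed strategies p and q.

    payoffs = ((a, b), (c, d), (e, f), (g, h)), listed in the order
    (attack, attack), (attack, policy), (policy, attack), (policy, policy).
    """
    (a, b), (c, d), (e, f), (g, h) = payoffs
    weights = (p * q, p * (1 - q), (1 - p) * q, (1 - p) * (1 - q))
    clinton = sum(w * v for w, v in zip(weights, (a, c, e, g)))
    trump = sum(w * v for w, v in zip(weights, (b, d, f, h)))
    return clinton, trump

# Hypothetical payoffs, for illustration only.
example = ((2, 1), (0, 3), (3, 0), (1, 2))
print(expected_payoffs(0.5, 0.5, example))
```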

With these payoff functions, we can compute each candidate’s best response to the other candidate by solving the following equations:

$\hat{\sigma}_1 \in \text{argmax}_{\sigma_1} \pi_c(\sigma_1,\sigma_2)$

$\hat{\sigma}_{2} \in \text{argmax}_{\sigma_2} \pi_t(\sigma_1,\sigma_2)$

where $\hat{\sigma}_{1,2}$ indicates the best response strategy to a fixed strategy for the other player.

Solving these equations, we obtain the following:

If

$(a-e)\,q + (c-g)(1-q) = 0$

then,

Clinton’s best response is to choose $p = 1/2$.

If

$(a-e)\,q + (c-g)(1-q) > 0$

then,

Clinton’s best response is to choose $p = 1$.

Otherwise, her best response is to choose $p = 0$.

While for Trump, the best responses are computed as follows:

If

$(b-d)\,p + (f-h)(1-p) = 0$

then,

Trump’s best response is to choose $q = 1/2$.

If

$(b-d)\,p + (f-h)(1-p) > 0$

then,

Trump’s best response is to choose $q = 1$.

Otherwise, Trump’s best response is to choose $q = 0$.
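The case analysis follows from the fact that each payoff function is linear in the player's own mixing probability, so only the sign of its slope matters: $\partial \pi_c / \partial p = (a-e)q + (c-g)(1-q)$ and $\partial \pi_t / \partial q = (b-d)p + (f-h)(1-p)$. The sketch below encodes this. The function names and the `payoffs` layout are assumptions, and when the slope is zero (every mixture is then equally good) it follows the convention used in this post of reporting $1/2$:

```python
import math

def _respond(slope):
    if math.isclose(slope, 0.0, abs_tol=1e-12):
        return 0.5          # indifferent: convention used in the text
    return 1.0 if slope > 0 else 0.0

def best_response_clinton(q, payoffs):
    """Clinton's attack probability p, given Trump attacks with probability q."""
    (a, _), (c, _), (e, _), (g, _) = payoffs
    return _respond((a - e) * q + (c - g) * (1 - q))

def best_response_trump(p, payoffs):
    """Trump's attack probability q, given Clinton attacks with probability p."""
    (_, b), (_, d), (_, f), (_, h) = payoffs
    return _respond((b - d) * p + (f - h) * (1 - p))

# Payoffs listed as ((a, b), (c, d), (e, f), (g, h)); this is one of the
# example matrices discussed in this post.
payoffs = ((1, 2), (-1, 1), (0, 1), (0, 1))
for q in (0.25, 0.5, 0.75):
    print(q, best_response_clinton(q, payoffs))
```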

To demonstrate this, let us work out an example. Assume (for this example) that the payoffs for each candidate are to sway independent voters / voters that have not made up their minds. Further, let us assume that these voters are more interested in policy positions, and will take attacks negatively. Obviously, this is not necessarily true, and we have solved the general case above. We are just using the following payoff matrix for demonstration purposes:

$\left( \begin{array}{cc} \{-1,-1\} & \{-1,1\} \\ \{1,-1\} & \{1,1\} \\ \end{array} \right)$

Using the above equations, we see that if $0 \leq q \leq 1$, Clinton’s best response is to choose $p=0$. While, if $0 \leq p \leq 1$, Trump’s best response is to choose $q =0$. That is, no matter what Trump’s strategy is, it is always Clinton’s best response to discuss policy positions. No matter what Clinton’s strategy is, it is always Trump’s best response to discuss policy positions as well. The two candidates’ payoff functions take the following form:

What this shows for example is that there is a Nash equilibrium of:

$(\sigma_1^{*}, \sigma_{2}^{*}) = (0,0)$.

The expected payoffs for each candidate are evidently

$\pi_c = \pi_t = 1$.

Let us work out another example. This time, assume that if Clinton attacks Trump, she receives a payoff of $+1$, while if Trump attacks Clinton, he receives a payoff of $-1$. If Clinton discusses policy while being attacked by Trump, she receives a payoff of $+1$, while Trump receives a payoff of $-1$. On the other hand, if Trump discusses policy while being attacked by Clinton, he receives a payoff of $-1$, while Clinton receives a payoff of $+1$. If Clinton discusses policy while Trump discusses policy, she receives a payoff of $+1$, while Trump receives a payoff of $-1$. The payoff matrix is evidently:

$\left( \begin{array}{cc} \{1,-1\} & \{1,-1\} \\ \{1,-1\} & \{1,-1\} \\ \end{array} \right)$

In this case, if $0 \leq q \leq 1$, then Clinton’s best response is to choose $p = 1/2$. While, if $0 \leq p \leq 1$, then Trump’s best response is to choose $q = 1/2$. The Nash equilibrium is evidently

$(\sigma_1^{*}, \sigma_{2}^{*}) = (1/2,1/2)$.

The expected payoffs for each candidate are evidently

$\pi_c = 1, \pi_t = -1$.

In this example,  even though it is the optimal strategy for each candidate to play a mixed strategy of 50% attack, 50% discuss policy, Clinton is expected to benefit, while Trump is expected to lose.

Let us also consider an example where the audience is biased towards Trump. So, every time Trump attacks Clinton, he gains an additional point. Every time Trump discusses policy while Clinton does the same, he gains an additional point. Meanwhile, if Clinton attacks while Trump discusses policy positions, she will lose a point, and he gains a point. Such a payoff matrix can be given by:

$\left( \begin{array}{cc} \{1,2\} & \{-1,1\} \\ \{0,1\} & \{0,1\} \\ \end{array} \right)$

Solving the equations above, we find that if $q = 1/2$, Clinton’s best response is to choose $p =1/2$. If $1/2 < q \leq 1$, Clinton’s best response is to choose $p = 1$. Otherwise, her best response is to choose $p = 0$. On the other hand, if $p = 0$, Trump’s best response is to choose $q = 1/2$. While, if $0 < p \leq 1$, Trump’s best response is to choose $q = 1$. Evidently, there is a single Nash equilibrium (as long as $1/2 < p \leq 1$):

$(\sigma_1^{*}, \sigma_{2}^{*}) = (1,1)$.

Therefore, in this situation, it is each candidate’s best strategy to attack one another. It is interesting that even in an audience that is heavily biased towards Trump, Clinton’s best strategy is still to attack 100% of the time.

The interested reader is invited to experiment with different scenarios using the general results derived above.

Using data science / machine learning methodologies, it was shown that the most important factors in characterizing a team’s playoff eligibility are the opponent field goal percentage and the opponent points per game. This seems to suggest that defensive factors, as opposed to offensive factors, are the most important characteristics shared among NBA playoff teams. It was also shown that championship teams must have very strong defensive characteristics, in particular strong perimeter defense, in combination with an effective half-court offense that generates high-percentage two-point shots. A key part of this offensive strategy must also be the ability to draw fouls.

Some people have commented that despite this, teams who frequently attempt three point shots still can be considered to have an efficient offense as doing so leads to better rebounding, floor spacing, and higher percentage shots. We show below that this is not true. Looking at the last 16 years of all NBA teams (using the same data we used in the paper), we performed a correlation analysis of an individual NBA team’s 3-point attempts per game and other relevant variables, and discovered:

One sees that there is very little correlation between a team’s 3-point attempts per game and 2-point percentage, free throws, free throw attempts, and offensive rebounds. In fact, at best, there is a somewhat “medium” anti-correlation between 3-point attempts per game and a team’s 2-point attempts per game.
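For reference, the correlation coefficient used in an analysis like this is the Pearson coefficient, which can be computed as sketched below. The numbers are hypothetical per-team averages invented purely for illustration; they are not the data used in the analysis.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-team season averages, for illustration only.
three_pt_attempts = [18.4, 22.1, 24.0, 26.7, 31.6]
two_pt_attempts = [66.2, 63.5, 61.9, 58.8, 55.4]

r = pearson(three_pt_attempts, two_pt_attempts)
print(round(r, 3))
```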

## 2016 Real-Time Election Predictions

Further to my original post on using physics to predict the outcome of the 2016 US Presidential elections, I have now written a cloud-based app using the powerful Wolfram Cloud to pull the most recent polling data on the web from The HuffPost Pollster, which “tracks thousands of public polls to give you the latest data on elections, political opinions and more”.  This app works in real-time and applies my PDE-solver / machine learning based algorithm to predict the probability of a candidate winning a state assuming the election is held tomorrow.

The app can be accessed by clicking the image below: (Note: If you obtain some type of server error, it means Wolfram’s server is busy, a refresh usually works. Also, results are only computed for states for which there exists reliable polling data. )

## Will Donald Trump’s Proposed Immigration Policies Curb Terrorism in The US?

In recent days, Donald Trump proposed yet another iteration of his immigration policy which is focused on “Keeping America Safe” as part of his plan to “Make America Great Again!”. In this latest iteration, in addition to suspending visas from countries with terrorist ties, he is also proposing introducing an ideological test for those entering the US. As you can see in the BBC article, he is also fond of holding up bar graphs showing the number of refugees entering the US over a period of time, and somehow relates that to terrorist activities in the US, or at least insinuates it.

Let’s look at the facts behind these proposals using the available data from 2005-2014. Specifically, we analyzed:

1. The number of terrorist incidents per year from 2005-2014 from here (The Global Terrorism Database maintained by The University of Maryland)
2. The Department of Homeland Security Yearbook of Immigration Statistics, available here. Specifically, we looked at Persons Obtaining Lawful Permanent Resident Status by Region and Country of Birth (2005-2014) and Refugee Arrivals by Region and Country of Nationality (2005-2014).

Given these datasets, we focused on countries/regions labeled as terrorist safe havens and state sponsors of terror based on the criteria outlined here.

We found the following.

First, looking at naturalized citizens, these computations yielded:

| Country | Correlations | Percent of Variance Explained |
|---|---|---|
| Afghanistan | 0.61169 | 0.37416 |
| Egypt | 0.26597 | 0.07074 |
| Indonesia | -0.66011 | 0.43574 |
| Iran | -0.31944 | 0.10204 |
| Iraq | 0.26692 | 0.07125 |
| Lebanon | -0.35645 | 0.12706 |
| Libya | 0.59748 | 0.35698 |
| Malaysia | 0.39481 | 0.15587 |
| Mali | 0.20195 | 0.04079 |
| Pakistan | 0.00513 | 0.00003 |
| Philippines | -0.79093 | 0.62557 |
| Somalia | -0.40675 | 0.16544 |
| Syria | 0.62556 | 0.39132 |
| Yemen | -0.11707 | 0.01371 |

In graphical form:

The highest correlations are 0.62556 and 0.61169, from Syria and Afghanistan respectively. The highest anti-correlations were from Indonesia and the Philippines, at -0.66011 and -0.79093 respectively. Certainly, none of the positive correlations exceed 0.65, which indicates that there could be some relationship between the number of naturalized citizens from these particular countries and the number of terrorist incidents, but it is nowhere near conclusive. Further, looking at Syria, we see that the percentage of variance explained / coefficient of determination is 0.39132, which means that only about 39% of the variation in the number of terrorist incidents can be predicted from the relationship between where a naturalized citizen is born and the number of terrorist incidents in The United States.
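The "percent of variance explained" column is the square of the correlation coefficient, which is the coefficient of determination of a one-variable linear fit. A quick check on a few rows of the table (small differences in the last digit are expected, since the tabulated correlations are themselves rounded):

```python
# Correlations copied from the table above.
correlations = {
    "Syria": 0.62556,
    "Afghanistan": 0.61169,
    "Indonesia": -0.66011,
    "Philippines": -0.79093,
}

# Coefficient of determination: the square of the correlation coefficient.
r_squared = {country: r * r for country, r in correlations.items()}

for country, value in r_squared.items():
    print(f"{country}: {value:.5f}")
```

Note that a strong negative correlation, as for the Philippines, gives a large value as well, because the sign is lost on squaring.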

Second, looking at refugees, these computations yielded:

| Country | Correlations | Percent of Variance Explained |
|---|---|---|
| Afghanistan | 0.59836 | 0.35803 |
| Egypt | 0.66657 | 0.44432 |
| Iran | -0.29401 | 0.08644 |
| Iraq | 0.49295 | 0.24300 |
| Pakistan | 0.60343 | 0.36413 |
| Somalia | 0.14914 | 0.02224 |
| Syria | 0.56384 | 0.31792 |
| Yemen | -0.35438 | 0.12558 |
| Other | 0.54109 | 0.29278 |

In graphical form:

We see that the highest correlations are from Egypt (0.66657), Pakistan (0.60343), and Afghanistan (0.59836). This indicates there is some mild correlation between refugees from these countries and the number of terrorist incidents in The United States, but it is nowhere near conclusive. Further, the coefficients of determination from Egypt and Syria are 0.44432 and 0.31792 respectively. This means that in the case of Syrian refugees, for example, only 31.792% of the variation in terrorist incidents in the United States can be predicted from the relationship between a refugee’s country of origin and the number of terrorist incidents in The United States.

In conclusion, it is therefore unlikely that Donald Trump’s proposals would do anything to significantly curb the number of terrorist incidents in The United States. Further, repeatedly showing pictures like this:

at his rallies is doing nothing to address the issue at hand and is perhaps only serving as yet another fear tactic as has become all too common in his campaign thus far.

(Thanks to Hargun Singh Kohli, Honours B.A., LL.B. for the initial data mining and processing of the various datasets listed above.)

Note, further to the results of this article, I was recently made aware of this excellent article from The WSJ, which I have summarized below: