I have now made a significant update to my applied machine learning paper on predicting patterns among NBA playoff and championship teams, which can be accessed here: arXiv link.
The Trump Rally, Really?
Today, the Dow Jones Industrial Average (DJIA) surpassed the 20,000 mark for the first time in history. At the time of writing (12:31 PM on January 25), it stands at 20,058.29, so I am not sure whether it will close above 20,000 points. Nevertheless, many people are crediting this to Trump's presidency, and I am not so sure one can do that. First, the point must be made that it is really the Obama economic policies that set the stage for this. On January 20, 2009, when Obama was sworn in, the Dow closed at 7949.089844 points. On November 8, 2016, when Trump won the election, the Dow closed at 18332.74023 points. So, during the Obama administration, the Dow increased by approximately 130.63%. I just wanted to make that point.
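The 130.63% figure is simple percentage-change arithmetic on the two closing values quoted above; a minimal Python check:

```python
def percent_change(start: float, end: float) -> float:
    """Return the percentage change from start to end."""
    return (end - start) / start * 100.0

obama_inauguration_close = 7949.089844   # DJIA close, January 20, 2009
election_night_close = 18332.74023       # DJIA close, November 8, 2016

change = percent_change(obama_inauguration_close, election_night_close)
print(f"{change:.2f}%")  # 130.63%
```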
Now, the question I wanted to investigate was whether the Dow would have closed above 20,000 points had Trump not been elected president. That is, assuming that the Obama administration's policies and their effects on the Dow had been allowed to continue, would the Dow have surpassed 20,000 points?
For this, I looked at the DJIA data from January 20, 2009 (Obama's first inauguration) to November 8, 2016 (Trump's election). Specifically, I calculated the daily returns and, using a kernel density estimate, found that they are approximately normally distributed:
Importantly, from this data one can calculate the mean daily return, $\mu$, and the volatility (standard deviation) of daily returns, $\sigma$. Indeed, the volatility in daily returns for the DJIA was found to be relatively high during this period. Finally, the DJIA closed at 18332.74023 points on election night, November 8, 2016, which was 53 business days ago.
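The post does not say whether simple or logarithmic returns were used. The sketch below assumes simple daily returns, $r_t = S_t / S_{t-1} - 1$, and takes the sample standard deviation as the volatility. The helper name `return_stats` is hypothetical, and the three-point price series is made up purely to exercise the code.

```python
import numpy as np

def return_stats(closes):
    """Mean and sample standard deviation of simple daily returns.

    closes: sequence of consecutive daily closing prices.
    """
    closes = np.asarray(closes, dtype=float)
    returns = closes[1:] / closes[:-1] - 1.0   # r_t = S_t / S_{t-1} - 1
    mu = returns.mean()                        # mean daily return
    sigma = returns.std(ddof=1)                # sample volatility of daily returns
    return returns, mu, sigma

# Toy series (hypothetical, not DJIA data): +1% followed by -1%.
returns, mu, sigma = return_stats([100.0, 101.0, 99.99])
print(returns)     # approximately [0.01, -0.01]
print(mu, sigma)   # approximately 0.0 and 0.01414
```

With a real closing-price series, a kernel density estimate of `returns` (for example via `scipy.stats.gaussian_kde`) can be plotted against a normal density with the same mean and standard deviation to check the approximate normality described above.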
The daily dynamics of the DJIA can be modelled by the following stochastic differential equation (geometric Brownian motion):
$$dS_t = \mu S_t\,dt + \sigma S_t\,dW_t,$$
where $S_t$ is the level of the DJIA, $\mu$ and $\sigma$ are the mean and volatility of its daily returns, and $W_t$ denotes a Wiener/Brownian motion process. Simulating this on a computer, I ran 2,000,000 Monte Carlo simulations of the DJIA closing price 53 business days after November 8, 2016, that is, on January 25, 2017. The results of some of these simulations are shown below:
Our simulation gave the following prediction for the DJIA's close on January 25, 2017:
That is, the DJIA would be expected to close anywhere between 17,398.09 and 20,158.94. This range, albeit wide, reflects the high volatility of the DJIA's daily returns, but, as you can see, it is perfectly feasible that the DJIA would have surpassed 20,000 points had Trump not been elected president.
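A minimal sketch of the Monte Carlo step, assuming the SDE is geometric Brownian motion, $dS_t = \mu S_t\,dt + \sigma S_t\,dW_t$, with time measured in trading days. The post's fitted $\mu$ and $\sigma$ are not reproduced in this text, so the values below are hypothetical placeholders and the printed interval and probability will not match the figures above. The central 95% interval is also our choice; the post does not say how its range was defined.

```python
import math
import numpy as np

def simulate_terminal_closes(s0, mu, sigma, n_days, n_sims, seed=0):
    """Sample S_T for geometric Brownian motion dS = mu*S dt + sigma*S dW.

    Uses the exact solution S_T = S0 * exp((mu - sigma^2/2) T + sigma sqrt(T) Z),
    with T measured in trading days.
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_sims)
    drift = (mu - 0.5 * sigma**2) * n_days
    diffusion = sigma * math.sqrt(n_days) * z
    return s0 * np.exp(drift + diffusion)

# Hypothetical daily parameters, NOT the values fitted in the post.
MU, SIGMA = 0.0004, 0.009
S0, N_DAYS, TARGET = 18332.74023, 53, 20000.0

closes = simulate_terminal_closes(S0, MU, SIGMA, N_DAYS, n_sims=200_000)
low, high = np.percentile(closes, [2.5, 97.5])   # central 95% interval (our choice)
p_above = np.mean(closes > TARGET)               # Monte Carlo P(S_T > 20,000)
print(f"95% interval: {low:.2f} to {high:.2f}; P(close > 20,000) = {p_above:.4f}")
```

Sampling the exact GBM solution at the horizon avoids time-stepping error; full daily paths are only needed when the question depends on the whole path, such as whether the index crossed 20,000 at any time.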
Further, what is perhaps of more importance is the probability that the DJIA would surpass 20,000 points at any time during this 54-day period. We found the following:
One sees that there is an almost 20% (more precisely, 18.53%) probability that the DJIA would close above 20,000 points on January 25, 2017, had Trump not been elected president. Since, by all accounts, the DJIA exceeding 20,000 points is considered an extremely rare, historic event, a probability of almost 20% is actually quite significant, and shows that a Trump administration quite likely has little to do with the DJIA exceeding 20,000 points.
Although this simulation covered only the 53 working days from November 8, 2016, one can see that the probability of the DJIA closing above 20,000 is monotonically increasing with every passing day. It is therefore quite feasible to conclude that Trump being president actually has little to do with the DJIA exceeding 20,000 points; rather, one can really attribute it to the day-to-day volatility of the DJIA!
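The day-by-day probabilities can be sketched by simulating full daily paths under the same geometric Brownian motion assumption, again with hypothetical $\mu$ and $\sigma$. Here `p_close[t]` is the probability of closing above the target on a given day, and `p_ever[t]` is the probability of having closed above it on any day so far (the "at any time" quantity), which is non-decreasing by construction. Under geometric Brownian motion with $S_0 < K$ and $\mu \ge \sigma^2/2$, the closed-form probability $P(S_t > K) = \Phi\big((\ln(S_0/K) + (\mu - \sigma^2/2)t)/(\sigma\sqrt{t})\big)$ also increases with $t$, consistent with the monotone behaviour described above.

```python
import numpy as np

def daily_exceedance(s0, mu, sigma, n_days, target, n_sims=20_000, seed=1):
    """Per-day probabilities from simulated GBM paths (time step = 1 trading day).

    Returns two arrays of length n_days:
      p_close[t]: P(close on day t+1 > target)
      p_ever[t]:  P(some close on days 1..t+1 > target)
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_sims, n_days))
    log_steps = (mu - 0.5 * sigma**2) + sigma * z          # exact one-day GBM log-increments
    paths = s0 * np.exp(np.cumsum(log_steps, axis=1))      # shape (n_sims, n_days)
    above = paths > target
    p_close = above.mean(axis=0)
    p_ever = np.logical_or.accumulate(above, axis=1).mean(axis=0)
    return p_close, p_ever

# Hypothetical daily parameters, NOT the values fitted in the post.
MU, SIGMA = 0.0004, 0.009
p_close, p_ever = daily_exceedance(18332.74023, MU, SIGMA, 53, 20000.0)
print(p_close[-1], p_ever[-1])
```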
The Most Optimal Strategy for the Knicks
In a previous article, I showed how one could use data in combination with advanced probability techniques to determine the optimal shot/court positions for LeBron James. I decided to apply this algorithm to the Knicks' starting five and obtained the following joint probability density contour plots:
One sees that the Knicks' offensive strategy is optimal if and only if players get shots as close to the basket as possible. In that case, the players have a high probability of making shots even if defenders are playing them tightly. This means that the Knicks would be best served by driving into the paint, posting up, and Porzingis NOT attempting a multitude of three-point shots.
By the way, many people are convinced nowadays that someone like Porzingis attempting 3's is a sign of a good offense, as it is an optimal way to space the floor. I am not convinced of this. Geometrically, spacing the floor translates to a multi-objective nonlinear optimization problem. In particular, let $(x_i, y_i)$ represent the $(x, y)$-coordinates of player $i$ on the floor. Spreading the floor means one must maximize (simultaneously) each element of the following distance metric:
subject to the constraint that every player remains within the dimensions of the court. While a player attempting 3-point shots may be one way to solve this problem, I am not convinced that it is the unique solution to this optimization problem. In fact, I am convinced that there are multiple solutions.
Solving this problem is slightly simpler once one realizes that the metric above is symmetric, so that there are only 11 independent components.
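The post leaves open how the simultaneous objectives are combined. One common approach, swapped in here purely as an illustration, is to scalarize them by maximizing the smallest pairwise distance between the five players (a maximin criterion). The half-court box (47 ft by 50 ft), the random search, and the helper names are assumptions of this sketch. Reflecting the best placement across the court's long axis gives a different placement with the same objective value, a simple illustration that such problems need not have a unique solution.

```python
import numpy as np

# Hypothetical half-court box in feet: x from baseline to half-court, y sideline to sideline.
COURT_X, COURT_Y = 47.0, 50.0

def pairwise_distances(pos):
    """Symmetric matrix d[i, j] = distance between players i and j; pos has shape (5, 2)."""
    diff = pos[:, None, :] - pos[None, :, :]
    return np.sqrt((diff**2).sum(axis=-1))

def min_spacing(pos):
    """Scalarized objective: the smallest distance between any two distinct players."""
    d = pairwise_distances(pos)
    iu = np.triu_indices(len(pos), k=1)
    return d[iu].min()

def random_search(n_trials=20_000, seed=0):
    """Crude maximin search over random 5-player placements inside the box."""
    rng = np.random.default_rng(seed)
    cands = rng.uniform([0.0, 0.0], [COURT_X, COURT_Y], size=(n_trials, 5, 2))
    diff = cands[:, :, None, :] - cands[:, None, :, :]
    d = np.sqrt((diff**2).sum(axis=-1))            # shape (n_trials, 5, 5)
    iu = np.triu_indices(5, k=1)
    scores = d[:, iu[0], iu[1]].min(axis=1)        # min pairwise distance per candidate
    k = int(np.argmax(scores))
    return cands[k], scores[k]

best_pos, best_val = random_search()
mirrored = best_pos.copy()
mirrored[:, 1] = COURT_Y - mirrored[:, 1]          # reflect across the court's long axis
print(best_val, min_spacing(mirrored))             # equal objective, different placement
```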
Analyzing LeBron James' Offensive Play
Where is LeBron James most effective on the court?
From NBA.com, we obtained the following 2015-2016 data, which tracks LeBron's FG% by defender distance:
From Basketball-Reference.com, we then obtained LeBron's FG% by his shot distance from the basket:
Based on this data, we generated tens of thousands of sample data points to perform a Monte Carlo simulation and obtain the relevant probability density functions. We found that the joint PDF was a very lengthy expression (!):
Graphically, this was:
A contour plot of the joint PDF was computed to be:
From this information, we can compute where/when LeBron has the highest probability of making a shot. Numerically, we found that the maximum probability occurs when LeBron's defender is 0.829988 feet away and LeBron is 1.59378 feet from the basket. Interestingly, this analysis shows that defending LeBron tightly does not seem to be an effective strategy when he shoots from within 5 feet of the basket; it is only effective further than 5 feet away. Therefore, opposing teams have the best chance of stopping LeBron from scoring by playing him tightly and forcing him as far away from the basket as possible.
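The post's fitted joint PDF is not reproduced here, but locating the maximum of a joint PDF numerically can be sketched with a 2-D Gaussian kernel density estimate (Scott's-rule bandwidths, $n^{-1/6}$ times each coordinate's standard deviation) evaluated on a grid. The samples below are hypothetical stand-ins, not NBA.com or Basketball-Reference.com data, and `kde2d_mode` is a hypothetical helper name.

```python
import numpy as np

def kde2d_mode(samples, x_grid, y_grid):
    """Locate the maximum of a 2-D Gaussian KDE evaluated on a grid.

    samples: array of shape (n, 2). Uses a product Gaussian kernel with
    per-axis bandwidth h = std * n**(-1/6) (Scott's rule in two dimensions).
    """
    n = len(samples)
    h = samples.std(axis=0, ddof=1) * n ** (-1.0 / 6.0)
    best, best_xy = -np.inf, None
    for y in y_grid:                                   # one grid row at a time (memory-friendly)
        kx = np.exp(-0.5 * ((x_grid[:, None] - samples[None, :, 0]) / h[0]) ** 2)
        ky = np.exp(-0.5 * ((y - samples[:, 1]) / h[1]) ** 2)
        dens = (kx * ky[None, :]).sum(axis=1) / (n * 2.0 * np.pi * h[0] * h[1])
        i = int(np.argmax(dens))
        if dens[i] > best:
            best, best_xy = dens[i], (x_grid[i], y)
    return best_xy, best

rng = np.random.default_rng(0)
# Hypothetical (defender distance, shot distance) pairs in feet; distances clipped at 0.
samples = np.clip(rng.normal([3.0, 6.0], [0.8, 0.8], size=(2000, 2)), 0.0, None)
xs = np.linspace(0.0, 8.0, 81)
ys = np.linspace(0.0, 12.0, 121)
(mode_x, mode_y), peak = kde2d_mode(samples, xs, ys)
print(mode_x, mode_y, peak)
```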
The Relationship Between The Electoral College and Popular Vote
An interesting machine learning problem: Can one figure out the relationship between the popular vote margin, voter turnout, and the percentage of electoral college votes a candidate wins? Going back to the election of John Quincy Adams, the raw data looks like this:
| Electoral College Winner | Party | Popular Vote Margin | Voter Turnout | Percentage of EC Votes |
|---|---|---|---|---|
| John Quincy Adams | D.R. | -0.1044 | 0.27 | 0.3218 |
| Andrew Jackson | Dem. | 0.1225 | 0.58 | 0.68 |
| Andrew Jackson | Dem. | 0.1781 | 0.55 | 0.7657 |
| Martin Van Buren | Dem. | 0.14 | 0.58 | 0.5782 |
| William Henry Harrison | Whig | 0.0605 | 0.80 | 0.7959 |
| James Polk | Dem. | 0.0145 | 0.79 | 0.6182 |
| Zachary Taylor | Whig | 0.0479 | 0.73 | 0.5621 |
| Franklin Pierce | Dem. | 0.0695 | 0.70 | 0.8581 |
| James Buchanan | Dem. | 0.12 | 0.79 | 0.5878 |
| Abraham Lincoln | Rep. | 0.1013 | 0.81 | 0.5941 |
| Abraham Lincoln | Rep. | 0.1008 | 0.74 | 0.9099 |
| Ulysses Grant | Rep. | 0.0532 | 0.78 | 0.7279 |
| Ulysses Grant | Rep. | 0.12 | 0.71 | 0.8195 |
| Rutherford Hayes | Rep. | -0.03 | 0.82 | 0.5014 |
| James Garfield | Rep. | 0.0009 | 0.79 | 0.5799 |
| Grover Cleveland | Dem. | 0.0057 | 0.78 | 0.5461 |
| Benjamin Harrison | Rep. | -0.0083 | 0.79 | 0.58 |
| Grover Cleveland | Dem. | 0.0301 | 0.75 | 0.6239 |
| William McKinley | Rep. | 0.0431 | 0.79 | 0.6063 |
| William McKinley | Rep. | 0.0612 | 0.73 | 0.6532 |
| Theodore Roosevelt | Rep. | 0.1883 | 0.65 | 0.7059 |
| William Taft | Rep. | 0.0853 | 0.65 | 0.6646 |
| Woodrow Wilson | Dem. | 0.1444 | 0.59 | 0.8192 |
| Woodrow Wilson | Dem. | 0.0312 | 0.62 | 0.5217 |
| Warren Harding | Rep. | 0.2617 | 0.49 | 0.7608 |
| Calvin Coolidge | Rep. | 0.2522 | 0.49 | 0.7194 |
| Herbert Hoover | Rep. | 0.1741 | 0.57 | 0.8362 |
| Franklin Roosevelt | Dem. | 0.1776 | 0.57 | 0.8889 |
| Franklin Roosevelt | Dem. | 0.2426 | 0.61 | 0.9849 |
| Franklin Roosevelt | Dem. | 0.0996 | 0.63 | 0.8456 |
| Franklin Roosevelt | Dem. | 0.08 | 0.56 | 0.8136 |
| Harry Truman | Dem. | 0.0448 | 0.53 | 0.5706 |
| Dwight Eisenhower | Rep. | 0.1085 | 0.63 | 0.8324 |
| Dwight Eisenhower | Rep. | 0.15 | 0.61 | 0.8606 |
| John Kennedy | Dem. | 0.0017 | 0.6277 | 0.5642 |
| Lyndon Johnson | Dem. | 0.2258 | 0.6192 | 0.9033 |
| Richard Nixon | Rep. | 0.01 | 0.6084 | 0.5595 |
| Richard Nixon | Rep. | 0.2315 | 0.5521 | 0.9665 |
| Jimmy Carter | Dem. | 0.0206 | 0.5355 | 0.55 |
| Ronald Reagan | Rep. | 0.0974 | 0.5256 | 0.9089 |
| Ronald Reagan | Rep. | 0.1821 | 0.5311 | 0.9758 |
| George H. W. Bush | Rep. | 0.0772 | 0.5015 | 0.7918 |
| Bill Clinton | Dem. | 0.0556 | 0.5523 | 0.6877 |
| Bill Clinton | Dem. | 0.0851 | 0.4908 | 0.7045 |
| George W. Bush | Rep. | -0.0051 | 0.51 | 0.5037 |
| George W. Bush | Rep. | 0.0246 | 0.5527 | 0.5316 |
| Barack Obama | Dem. | 0.0727 | 0.5823 | 0.6784 |
| Barack Obama | Dem. | 0.0386 | 0.5487 | 0.6171 |
Clearly, the percentage of electoral college votes a candidate wins depends nonlinearly on the voter turnout percentage and the popular vote margin, as this nonparametric regression shows:
We therefore performed a nonlinear regression using a neural network with the following structure:
As it turns out, this simple neural network with one hidden layer gave the lowest test error, 0.002496419 in this case; its square root, 0.04996417, is the ± uncertainty attached to the predictions below.
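The post does not give the network's details beyond its single hidden layer. As a sketch only, here is a one-hidden-layer network in NumPy (three tanh hidden units and a linear output, both our choices) trained by full-batch gradient descent on mean squared error, using twelve rows from the table above. It reports training error only; reproducing a test error would need a held-out split, and this is not the author's model.

```python
import numpy as np

# Twelve rows from the table: (popular-vote margin, turnout) -> share of EC votes.
X = np.array([[0.0017, 0.6277], [0.2258, 0.6192], [0.0100, 0.6084], [0.2315, 0.5521],
              [0.0206, 0.5355], [0.0974, 0.5256], [0.1821, 0.5311], [0.0772, 0.5015],
              [0.0556, 0.5523], [0.0851, 0.4908], [0.0727, 0.5823], [0.0386, 0.5487]])
y = np.array([0.5642, 0.9033, 0.5595, 0.9665, 0.5500, 0.9089, 0.9758, 0.7918,
              0.6877, 0.7045, 0.6784, 0.6171])

def train_mlp(X, y, n_hidden=3, lr=0.5, epochs=5000, seed=0):
    """One hidden layer (tanh) + linear output, fit by gradient descent on MSE."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, size=(X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.5, size=n_hidden)
    b2 = float(y.mean())
    history = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                 # hidden activations, shape (n, n_hidden)
        err = H @ W2 + b2 - y                    # residuals of the linear output
        history.append(np.mean(err**2))
        g_pred = 2.0 * err / len(y)              # d(MSE)/d(prediction)
        g_W2 = H.T @ g_pred
        g_b2 = g_pred.sum()
        g_H = np.outer(g_pred, W2) * (1.0 - H**2)   # backprop through tanh
        g_W1 = X.T @ g_H
        g_b1 = g_H.sum(axis=0)
        W1 -= lr * g_W1
        b1 -= lr * g_b1
        W2 -= lr * g_W2
        b2 -= lr * g_b2
    return (W1, b1, W2, b2), history

def predict(params, X):
    W1, b1, W2, b2 = params
    return np.tanh(X @ W1 + b1) @ W2 + b2

params, history = train_mlp(X, y)
print(history[0], history[-1])
print(predict(params, np.array([[0.061, 0.55]])))
```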
Now, looking at the most recent national polls for the upcoming election, we see that Hillary Clinton has a 6.1% lead in the popular vote. Our neural network model then predicts the following:
| Simulation | Popular Vote Margin | Percentage of Voter Turnout | Predicted Percentage of Electoral College Votes (±0.04996417) |
|---|---|---|---|
| 1 | 0.061 | 0.30 | 0.6607371 |
| 2 | 0.061 | 0.35 | 0.6647464 |
| 3 | 0.061 | 0.40 | 0.6687115 |
| 4 | 0.061 | 0.45 | 0.6726314 |
| 5 | 0.061 | 0.50 | 0.6765048 |
| 6 | 0.061 | 0.55 | 0.6803307 |
| 7 | 0.061 | 0.60 | 0.6841083 |
| 8 | 0.061 | 0.65 | 0.6878366 |
| 9 | 0.061 | 0.70 | 0.6915149 |
| 10 | 0.061 | 0.75 | 0.6951424 |
One sees that even for an extremely low voter turnout (30%), Hillary Clinton can at this point expect to win between 61.08% and 71.07% of the electoral college votes, or 328 to 382 electoral votes. Therefore, what seems like a relatively small lead in the popular vote (6.1%) translates, according to this neural network model, into a large margin of victory in the electoral college.
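The ± half-width in the prediction table matches the square root of the reported test error, $\sqrt{0.002496419} \approx 0.04996417$. Converting the 30%-turnout band into electoral votes, assuming 538 in total and truncating to whole votes, reproduces the 328 to 382 range:

```python
import math

TEST_MSE = 0.002496419        # test error reported above
TOTAL_EV = 538                # total electoral votes

rmse = math.sqrt(TEST_MSE)    # half-width of the +/- band, about 0.04996417
point = 0.6607371             # predicted EC share at 6.1% margin, 30% turnout
lower, upper = point - rmse, point + rmse

print(f"{lower:.2%} to {upper:.2%}")                                # 61.08% to 71.07%
print(math.floor(TOTAL_EV * lower), math.floor(TOTAL_EV * upper))   # 328 382
```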
One can see that the predicted percentage of electoral college votes depends strongly on the popular vote margin and voter turnout. For example, if we reduce the popular vote margin to 1%, the results are less promising for the leading candidate:
| Popular Vote Margin | Voter Turnout | Predicted E.C. % Win | E.C. % Win (Lower Bound) | E.C. % Win (Upper Bound) |
|---|---|---|---|---|
| 0.01 | 0.30 | 0.5182854 | 0.4675000 | 0.5690708 |
| 0.01 | 0.35 | 0.5244157 | 0.4736303 | 0.5752011 |
| 0.01 | 0.40 | 0.5305820 | 0.4797967 | 0.5813674 |
| 0.01 | 0.45 | 0.5367790 | 0.4859937 | 0.5875644 |
| 0.01 | 0.50 | 0.5430013 | 0.4922160 | 0.5937867 |
| 0.01 | 0.55 | 0.5492434 | 0.4984580 | 0.6000287 |
| 0.01 | 0.60 | 0.5554995 | 0.5047141 | 0.6062849 |
| 0.01 | 0.65 | 0.5617642 | 0.5109788 | 0.6125496 |
| 0.01 | 0.70 | 0.5680317 | 0.5172463 | 0.6188171 |
| 0.01 | 0.75 | 0.5742963 | 0.5235109 | 0.6250817 |
One sees that if the popular vote margin is just 1% for the leading candidate, that candidate is not in the clear (that is, even the lower bound does not exceed 50% of the electoral college votes) unless voter turnout is at least 60%.
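Reading the threshold off the 1% table amounts to finding the first turnout level at which even the lower end of the band exceeds 50%; a minimal check using the table's values:

```python
# Turnout levels and lower-bound E.C. shares copied from the 1% table above.
turnout = [0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60, 0.65, 0.70, 0.75]
lower_bound = [0.4675000, 0.4736303, 0.4797967, 0.4859937, 0.4922160,
               0.4984580, 0.5047141, 0.5109788, 0.5172463, 0.5235109]

# First turnout at which even the pessimistic end of the band is a majority.
threshold = next(t for t, lb in zip(turnout, lower_bound) if lb > 0.5)
print(threshold)  # 0.6
```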
Breaking Down the 2015-2016 NBA Season
In this article, I will use Data Science / Machine Learning methodologies to break down the real factors separating the playoff from the non-playoff teams. In particular, I used data from Basketball-Reference.com to associate 44 predictor variables with each team: “FG” “FGA” “FG.” “X3P” “X3PA” “X3P.” “X2P” “X2PA” “X2P.” “FT” “FTA” “FT.” “ORB” “DRB” “TRB” “AST” “STL” “BLK” “TOV” “PF” “PTS” “PS.G” “oFG” “oFGA” “oFG.” “o3P” “o3PA” “o3P.” “o2P” “o2PA” “o2P.” “oFT” “oFTA” “oFT.” “oORB” “oDRB” “oTRB” “oAST” “oSTL” “oBLK” “oTOV” “oPF” “oPTS” “oPS.G”, where the letter ‘o’ before the last 22 predictor variables indicates a defensive variable (‘o’ stands for opponent).
Using principal component analysis (PCA), I was able to project this 44-dimensional data set onto a 5-dimensional one. That is, the first 5 principal components were found to explain 85% of the variance.
Here are the various biplots:
In these plots, the teams are grouped according to whether they made the playoffs or not.
One sees from the biplot of the first two principal components that the dominant variable along the first PC is 3-point attempts, while the dominant variable along the second PC is opponent points. CLE and TOR have large negative scores along the second PC, indicating strong defensive performance. Indeed, one suspects that the final separating factor that led CLE to the championship was their defensive play, as opposed to 3-point shooting, which, all in all, didn't do GSW any favours. This is in line with some of my previous analyses.
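A minimal NumPy sketch of the PCA steps described above: the explained-variance ratios, the smallest number of components reaching 85%, and the variable with the largest absolute loading on each of the first two components. Standardizing each variable first is our assumption (the post does not say), and the data matrix below is random stand-in data with placeholder variable names, not the 2015-2016 team statistics.

```python
import numpy as np

def pca(X):
    """PCA via SVD of the standardized data (rows = teams, columns = variables).

    Returns explained-variance ratios, loadings (one column per PC) and team scores.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # assumption: standardize each variable
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    loadings = Vt.T
    scores = Z @ loadings                              # coordinates used in a biplot
    return explained, loadings, scores

def n_components_for(explained, target=0.85):
    """Smallest k such that the first k PCs explain at least `target` of the variance."""
    return int(np.searchsorted(np.cumsum(explained), target) + 1)

# Random stand-in data: 30 teams x 44 variables (NOT the real 2015-2016 statistics).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 44))
names = [f"var{j}" for j in range(44)]     # placeholders for "FG", "FGA", ..., "oPS.G"

explained, loadings, scores = pca(X)
k = n_components_for(explained)
dominant = [names[int(np.argmax(np.abs(loadings[:, j])))] for j in range(2)]
print(k, dominant)
```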
Optimal Positions for NBA Players
I was thinking about how one can use the NBA's new SportVU system to figure out optimal positions for players on the court. One of the interesting things about the SportVU system is that it tracks player coordinates on the court. Presumably, it also keeps track of whether a player located at $(x, y)$ makes or misses a shot. Let us denote a player making a shot by $z = 1$ and a player missing a shot by $z = 0$. Then one essentially has data in the form $\{(x_i, y_i, z_i)\}_{i=1}^{N}$.
One can then use a logistic regression to determine the probability that a player at position $(x, y)$ will make a shot:
$$P(x, y) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x + \beta_2 y)}}.$$
The main idea is that the parameters $(\beta_0, \beta_1, \beta_2)$ uniquely characterize a given player's probability of making a shot.
From an offensive perspective, suppose the coaching staff wishes to position players so that they have a very high probability of making a shot, say, for demonstration purposes, 99%. This means we must solve the optimization problem:
(The constraints here are determined by the (x, y)-dimensions of a standard NBA court.)
This has the following solutions:
with the following conditions:
One can also have:
with the following conditions:
Another solution is:
with the following conditions:
The fourth possible solution is:
with the following conditions:
In practice, it should be noted that a player is unlikely to have a 99% probability of making a shot.
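The four solution cases above are not reproduced here, but under the assumption that the logistic model is linear in position, $P(x, y) = 1/(1 + e^{-(\beta_0 + \beta_1 x + \beta_2 y)})$, the condition $P(x, y) = 0.99$ reduces to the straight line $\beta_0 + \beta_1 x + \beta_2 y = \ln(0.99/0.01)$, which can then be clipped to the court. The coefficients, the 94 ft by 50 ft court box, and the helper names below are hypothetical.

```python
import math
import numpy as np

# Hypothetical court box in feet.
COURT_X, COURT_Y = 94.0, 50.0

def p_make(x, y, b0, b1, b2):
    """Logistic model P(x, y) = 1 / (1 + exp(-(b0 + b1*x + b2*y)))."""
    return 1.0 / (1.0 + np.exp(-(b0 + b1 * x + b2 * y)))

def locus(p, b0, b1, b2, n=200):
    """Court points where P(x, y) = p, assuming b2 != 0.

    P = p  <=>  b0 + b1*x + b2*y = log(p / (1 - p)), a straight line in (x, y).
    """
    logit = math.log(p / (1.0 - p))
    x = np.linspace(0.0, COURT_X, n)
    y = (logit - b0 - b1 * x) / b2
    on_court = (y >= 0.0) & (y <= COURT_Y)      # keep only points inside the box
    return x[on_court], y[on_court]

B0, B1, B2 = 6.0, -0.02, -0.01                  # hypothetical player coefficients
xs, ys = locus(0.99, B0, B1, B2)
print(len(xs), p_make(xs[:3], ys[:3], B0, B1, B2))
```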
To put this example in more practical terms, I generated some random data (1000 points) for a player, consisting of shot coordinates and whether he made a shot from that location. The following scatter plot shows the result of this simulation:
In this plot, the red dots indicate a player has made a shot (a response of 1.0) from the coordinates given, while a purple dot indicates a player has missed a shot from the coordinates given (a response of 0.0).
Performing a logistic regression on this data yields estimates of the model parameters. Using the equations above, one can then compute this player's maximum probability of making a shot and the location at which it occurs, as well as his minimum probability of making a shot and its location.
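The fit-then-optimize workflow can be sketched end to end: generate 1000 hypothetical shots, fit the logistic model $P(x, y) = 1/(1 + e^{-(\beta_0 + \beta_1 x + \beta_2 y)})$ by Newton's method (iteratively reweighted least squares), and then find where the fitted probability is largest and smallest. Because the logit is linear in $(x, y)$, those extremes on a rectangular court lie at its corners. The true coefficients, the court box, and the helper names are all assumptions for illustration.

```python
import numpy as np

COURT_X, COURT_Y = 94.0, 50.0   # hypothetical court box in feet

def fit_logistic(x, y, z, n_iter=25):
    """Fit P(z=1 | x, y) = sigmoid(b0 + b1*x + b2*y) by Newton's method (IRLS)."""
    A = np.column_stack([np.ones_like(x), x, y])
    beta = np.zeros(3)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-A @ beta))
        grad = A.T @ (z - p)                           # gradient of the log-likelihood
        info = A.T @ (A * (p * (1.0 - p))[:, None])    # Fisher information (negative Hessian)
        beta += np.linalg.solve(info, grad)            # Newton step
    return beta

def extreme_locations(beta):
    """A linear logit is monotone in x and y, so P's max and min on the box sit at corners."""
    corners = np.array([[0, 0], [COURT_X, 0], [0, COURT_Y], [COURT_X, COURT_Y]], dtype=float)
    p = 1.0 / (1.0 + np.exp(-(beta[0] + corners @ beta[1:])))
    return (corners[np.argmax(p)], p.max()), (corners[np.argmin(p)], p.min())

rng = np.random.default_rng(0)
TRUE_BETA = np.array([1.0, -0.03, -0.03])              # hypothetical "player" coefficients
x = rng.uniform(0.0, COURT_X, 1000)
y = rng.uniform(0.0, COURT_Y, 1000)
p_true = 1.0 / (1.0 + np.exp(-(TRUE_BETA[0] + TRUE_BETA[1] * x + TRUE_BETA[2] * y)))
z = (rng.uniform(size=1000) < p_true).astype(float)    # 1.0 = make, 0.0 = miss

beta_hat = fit_logistic(x, y, z)
(best_xy, p_max), (worst_xy, p_min) = extreme_locations(beta_hat)
print(beta_hat, best_xy, p_max, worst_xy, p_min)
```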