Optimal Strategies for Winning The Democratic Primaries

By: Dr. Ikjyot Singh Kohli

Election season is upon us again, and many people, from political analysts to campaign advisors, are making a huge deal about winning the Iowa caucuses. This seems to be the standard “wisdom”, so I decided to run some analysis on the data to see whether it holds up.

I looked at every Democratic primary since 1976 and tried to find which states are absolutely “must-win” for a candidate to become the Democratic presidential nominee. Because the dataset is small by data science standards, I ran Monte Carlo bootstrap sampling on it to come up with the results.
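As a rough illustration of the bootstrap idea described above, the sketch below resamples a tiny dataset with replacement and records which state gives the best root split each time. This is a minimal sketch, not the analysis used for the post: the rows, the state names `state_a` and `state_b`, and the Gini-based split rule are all assumptions made for illustration.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_root_split(rows, states):
    """Return the state whose win/lose split has the lowest weighted Gini impurity."""
    best_state, best_score = None, float("inf")
    for state in states:
        lost = [r["nominee"] for r in rows if r[state] == 0]
        won = [r["nominee"] for r in rows if r[state] == 1]
        score = (len(lost) * gini(lost) + len(won) * gini(won)) / len(rows)
        if score < best_score:  # ties keep the earlier state in the list
            best_state, best_score = state, score
    return best_state

def bootstrap_root_counts(rows, states, n_boot=200, seed=0):
    """Count how often each state is chosen as the root split across bootstrap samples."""
    rng = random.Random(seed)
    counts = {state: 0 for state in states}
    for _ in range(n_boot):
        sample = [rng.choice(rows) for _ in rows]  # resample with replacement
        counts[best_root_split(sample, states)] += 1
    return counts

# Hypothetical toy data: 1 = won the state / became nominee, 0 = did not.
rows = [
    {"state_a": 1, "state_b": 0, "nominee": 1},
    {"state_a": 0, "state_b": 1, "nominee": 0},
    {"state_a": 1, "state_b": 1, "nominee": 1},
    {"state_a": 0, "state_b": 0, "nominee": 0},
    {"state_a": 0, "state_b": 1, "nominee": 0},
    {"state_a": 1, "state_b": 0, "nominee": 1},
]
counts = bootstrap_root_counts(rows, ["state_a", "state_b"])
print(counts)
```

A state that keeps winning the root split across resamples is the kind of stable result the post refers to; a full classification tree would repeat the same split search inside each branch.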

Interestingly, irrespective of the number of bootstrap samples, three classification tree results kept coming up, which I now present:

Winning a certain state was encoded as a binary variable. “0” indicates a candidate losing the state, while “1” indicates a candidate won the state.

Very interestingly, the classification tree above shows that the most important state for a candidate to win, in order to have the highest probability of being the Democratic nominee, is Illinois. However, if that candidate loses Illinois, another possible path to the nomination is: Lose Illinois -> Win Hawaii -> Lose Alaska.

The other result from bootstrap sampling was as follows:

Here we see that winning Texas is of paramount importance. In fact, all subsequent paths to the nomination stem from winning Texas.

There is also a third result that came from the bootstrap simulation:

We see that in this simulation, once again Illinois is of prime importance. However, even if a candidate does lose Illinois, evidently a path to the nomination is still possible if that candidate wins Maryland and Arizona.

Conclusion: The data suggest that Iowa and New Hampshire are actually not very important for becoming the Democratic party nominee. Rather, winning Illinois and Texas matters much more for giving a candidate a high probability of being the Democratic nominee.

NBA Analytics Dashboard

Here is an embedded dashboard that shows a number of statistical insights for NBA teams, their opponents, and individual players. You can compare multiple teams and players. Navigate through the different pages by clicking the scrolling arrow below. (The data is based on the most recent season's per-game numbers.)

(If you cannot see the dashboard embedded below for whatever reason, click here to be taken directly to the dashboard in a separate page.)

So, What’s Wrong with the Knicks?

By: Dr. Ikjyot Singh Kohli

As I write this post, the Knicks are 12th in the Eastern Conference with a record of 22-32. A plethora of people are offering their opinions on what is wrong with the Knicks and, since most of it comes from ESPN and the New York media, most of it is incorrect or useless. Here are some examples:

  1. The Bulls are following the Knicks’ blueprint for failure and …
  2. Spike Lee ‘still believes’ in Melo, says time for Phil Jackson to go
  3. 25 reasons being a New York Knicks fan is the most depressing …
  4. Carmelo Anthony needs to escape the Knicks
  5. Another Awful Week for Knicks

A while ago, I wrote this paper based on statistical learning that shows the common characteristics for NBA playoff teams. Basically, I obtained the following important result:

[Figure: classification tree of the common characteristics of NBA playoff teams]

This classification tree, along with the arguments in the paper, shows that while the most important factor in teams making the playoffs tends to be the opponent's number of assists per game, there are paths to the playoffs for teams that are not necessarily strong in this area. Specifically, for the Knicks, as of today, we see that:

opp. Assists / game: 22.4 > 20.75, STL / game: 7.2 < 8.0061, TOV / game: 14.1 < 14.1585, DRB / game: 33.8 > 29.9024, opp. TOV / game: 13.0 < 13.1585.

So, one sees that what is keeping the Knicks out of the playoffs is specifically pressure defense: they are not forcing enough turnovers per game. Ironically, at 13.0 against a threshold of 13.1585 they are very close, but it is not enough.
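The comparison above can be tabulated in a few lines. The values and thresholds are the ones quoted in this post; the dictionary keys and the `compare` helper are names chosen here for illustration, and the sketch only reports which side of each threshold the Knicks fall on, since the branch structure of the tree itself is in the figure.

```python
# Knicks per-game numbers and tree thresholds as quoted in the post.
knicks = {"opp_ast": 22.4, "stl": 7.2, "tov": 14.1, "drb": 33.8, "opp_tov": 13.0}
thresholds = {"opp_ast": 20.75, "stl": 8.0061, "tov": 14.1585,
              "drb": 29.9024, "opp_tov": 13.1585}

def compare(stats, cuts):
    """Return {stat: (value, threshold, gap)} with gap = value - threshold."""
    return {k: (stats[k], cuts[k], round(stats[k] - cuts[k], 4)) for k in stats}

report = compare(knicks, thresholds)
for name, (value, cut, gap) in report.items():
    side = ">" if gap > 0 else "<"
    print(f"{name}: {value} {side} {cut} (gap {gap:+.4f})")
```

The gap for opponent turnovers is the smallest in absolute size, which is the "very close to the threshold" point made above.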

A probability density approximation of the Knicks’ Opp. TOV/G is as follows:

[Figure: probability density approximation of the Knicks' opp. TOV per game]

 

This PDF has the approximate functional form:

P(oTOV) = [equation image: knicksotovg]

The probability of forcing more than A turnovers in a game is then obtained by computing

\int_{A}^{\infty} P(oTOV) \, d(oTOV) = [equation image: knicksotoverfc],

where erfc is the complementary error function, given by

\mathrm{erfc}(z) = \frac{2}{\sqrt{\pi}} \int_{z}^{\infty} e^{-t^2} \, dt.

 

Given that the threshold for playoff-bound teams is more than 13.1585 opp. TOV/game, setting A = 13 above, we obtain: 0.435. This means that the Knicks have roughly a 43.5% chance of forcing more than 13 TOV in any single game. Similarly, setting A = 14, one obtains: 0.3177. This means that the Knicks have roughly a 31.77% chance of forcing more than 14 TOV in any single game, and so forth.
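For readers who want to reproduce this kind of tail probability, here is a minimal sketch. It assumes the density is Gaussian, which the appearance of erfc suggests but which is an assumption here, and the parameters `mu` and `sigma` are placeholders rather than the fitted values behind the numbers above.

```python
import math

def normal_tail(a, mu, sigma):
    """P(X > a) for X ~ Normal(mu, sigma), written with the complementary error function."""
    return 0.5 * math.erfc((a - mu) / (sigma * math.sqrt(2.0)))

# Placeholder parameters for illustration only, NOT the values fitted in the post.
mu, sigma = 13.0, 3.0
for a in (13, 14):
    print(f"P(oTOV > {a}) = {normal_tail(a, mu, sigma):.4f}")
```

With a Gaussian density, the integral from A to infinity reduces to one call to `math.erfc`, so no numerical integration is needed.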

Therefore, one concludes that while the Knicks' problems are defensive in nature, they are specifically related to pressure defense and forcing turnovers.

 

By: Dr. Ikjyot Singh Kohli

Some Thoughts on The US GDP

Here are some thoughts on the US GDP based on some data I’ve been looking at recently, mostly motivated by some Donald Trump supporters who have been criticizing President Obama’s record on the GDP and the economy.

First, a least squares regression of the real GDP against time gives the following average growth per year under each President:

According to these calculations, President Clinton’s economic policies led to the best average GDP growth rate at $436 Billion / year. President Reagan and President Obama have almost identical average GDP growth rates in the neighbourhood of $320 Billion / year. However, an obvious caveat is that President Obama’s GDP record is still missing two years of data, so I will need to revisit these calculations in two years! Also, it should be noted that, historically, the US GDP has grown at an average of about $184 Billion / year. 
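The "average growth per year" in a least squares sense is just the slope of the fitted line. A minimal sketch of that calculation follows; the `years` and `gdp` values are made-up placeholders, not the data behind the figures quoted above.

```python
def ols_slope(t, y):
    """Slope of the least squares line y = a + b*t."""
    n = len(t)
    t_bar = sum(t) / n
    y_bar = sum(y) / n
    sxy = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    return sxy / sxx

# Hypothetical series: years since the start of a term, real GDP in trillions of dollars.
years = [0, 1, 2, 3]
gdp = [10.0, 10.3, 10.7, 11.0]
print(f"average growth: {ols_slope(years, gdp) * 1000:.0f} billion / year")
```

Using the regression slope rather than (last value - first value) / years makes the estimate less sensitive to a single unusual year at either end of a term.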

The second point I wanted to address is that several Trump supporters keep comparing the average annual percentage change in real GDP between President Reagan and President Obama. Although they are citing the averages, they are not mentioning the standard deviations! Computing these, we find that:


Looking at these calculations, we find that Presidents Clinton and Obama had the most stable year-to-year real GDP growth rates. Presidents Bush and Reagan had highly unstable GDP growth, with President Bush’s being far worse than President Reagan’s. Further, Trump supporters and most Republicans seem quick to point out the mean of 3.637% associated with President Reagan, but the point is that this is +/- 2.55%, which indicates high volatility in the GDP under President Reagan, which has not been the case under President Obama.

Another observation is that very few people have been mentioning the fact that the annual growth rate of the real US GDP is in fact correlated with that of other countries. Based on data from the World Bank, one can compute the following correlations:


One sees that the correlation between the annual growth % of the real GDP of the US and that of Canada is 0.826, while for Estonia and the UK it is roughly 0.7. Therefore, evidently, any President who claims that his policies alone will increase the GDP is not being truthful, since it is quite likely that these numbers also depend on those of other countries, over which I am not entirely convinced a US President has complete control!
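For reference, a correlation of this kind is the Pearson coefficient between two series of annual growth rates. The sketch below shows the calculation; the two series are invented placeholders, not the World Bank data used above.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical annual real GDP growth rates (%), for illustration only.
country_a = [2.5, 1.8, -0.3, 2.9, 3.1]
country_b = [2.1, 1.5, -1.0, 3.2, 2.8]
print(f"correlation: {pearson(country_a, country_b):.3f}")
```

A value near 1 means the two economies tend to speed up and slow down in the same years; it does not by itself say which one drives the other.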

My final observation is with respect to the quarterly GDP numbers. In recent days I have seen some articles, in addition to several television segments, in which Trump supporters continuously cite how much better Reagan’s quarterly GDP numbers were compared to Obama’s. We now show that in actuality this is not the case.

The problem is that most of the “analysts” are just looking at the raw data, which at face value doesn’t tell you much since, as expected, it fluctuates. Below, we analyze the quarterly GDP % data during the tenures of both Presidents Reagan and Obama, from 1982-1988 and 2010-2016 respectively, comparing data from the same length of time.

For Reagan, we obtain: 


For Obama, we obtain:


The only way to reasonably compare these two data sets is to analyze the rate at which the GDP % has increased in time. Since the data is nonlinear in time, this means we must calculate the derivative at each quarter. We first performed cubic spline interpolation to fit curves to these data sets, which gave extremely good results:


We then numerically computed the derivative of these curves at each quarter and obtained: 

The dashed curves in the above plot are plots of the derivatives of each curve at each quarter. In terms of numbers, these were found to be: 


Summarizing the table above in graphical format, we obtain: 


As can be calculated easily, Obama has the higher quarterly GDP growth number in 15 of the 26 quarters (57.69%). Therefore, even looking at the quarterly real GDP numbers, overall, President Obama outperforms President Reagan.
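The spline-and-derivative step described above can be sketched in plain Python. This is a minimal illustration under stated assumptions: it uses a natural cubic spline (zero second derivative at both ends), which may differ from the boundary conditions of the fit used for the plots, and the quarterly values below are placeholders rather than the actual GDP data.

```python
def spline_second_derivs(x, y):
    """Second derivatives at the knots of a natural cubic spline through (x, y)."""
    n = len(x)
    h = [x[i + 1] - x[i] for i in range(n - 1)]
    m = [0.0] * n  # natural boundary: m[0] = m[n-1] = 0
    size = n - 2   # number of interior knots
    if size < 1:
        return m
    diag = [2.0 * (h[k] + h[k + 1]) for k in range(size)]
    rhs = [6.0 * ((y[k + 2] - y[k + 1]) / h[k + 1] - (y[k + 1] - y[k]) / h[k])
           for k in range(size)]
    # Thomas algorithm for the tridiagonal system (off-diagonals are h[k]).
    for k in range(1, size):
        w = h[k] / diag[k - 1]
        diag[k] -= w * h[k]
        rhs[k] -= w * rhs[k - 1]
    sol = [0.0] * size
    sol[-1] = rhs[-1] / diag[-1]
    for k in range(size - 2, -1, -1):
        sol[k] = (rhs[k] - h[k + 1] * sol[k + 1]) / diag[k]
    m[1:n - 1] = sol
    return m

def spline_derivs_at_knots(x, y):
    """First derivative of the natural cubic spline at every knot."""
    n = len(x)
    m = spline_second_derivs(x, y)
    d = []
    for i in range(n - 1):
        h = x[i + 1] - x[i]
        d.append((y[i + 1] - y[i]) / h - h * (2.0 * m[i] + m[i + 1]) / 6.0)
    h = x[-1] - x[-2]
    d.append((y[-1] - y[-2]) / h + h * (2.0 * m[-1] + m[-2]) / 6.0)
    return d

# Hypothetical quarterly GDP % values, indexed by quarter number.
quarters = [0, 1, 2, 3, 4, 5]
gdp_pct = [1.2, 2.0, 1.6, 2.8, 3.1, 2.4]
print([round(v, 3) for v in spline_derivs_at_knots(quarters, gdp_pct)])
```

Comparing two presidents then amounts to running this on each series and counting the quarters in which one derivative exceeds the other.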

Thanks to Hargun Singh Kohli, B.A. Honours, LL.B. for the data collection and processing part of this analysis. 

Stephen Curry and Mahmoud Abdul-Rauf?

As usual, Phil Jackson made another interesting tweet today:

And, as usual, he received many criticisms from “experts” who just looked at each player's raw numbers and concluded that there is no way such a statement is justified. But it is not that simple!

When you compare two players (or two objects) who have very different data feature values, it is not that they can’t be compared; rather, you must normalize the data somehow to make the sets comparable.

In this case, I used the data from Basketball-Reference.com to compare the 6 seasons that Mahmoud Abdul-Rauf (formerly Chris Jackson) spent in Denver to Stephen Curry’s last 6 seasons (including this one), took into account 45 different statistical measures, and came up with the following correlation matrix/similarity matrix plot:

  

 
Dark blue circles indicate a strong correlation, while dark red circles indicate a weak correlation between two sets of features. 

What would be of interest in an analysis like this is to examine the diagonal of this matrix, which offers a direct comparison between the two players: 

  
One can see that there are many features that have strong correlation coefficients. 

Therefore, it is true that Stephen Curry and Chris Jackson do in fact share many strong similarities! 
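To illustrate the normalization point, the sketch below z-scores each feature within each player's own seasons and then averages the products, which gives the per-feature correlation (the diagonal of the cross-correlation matrix). It assumes the correlation is taken across the six seasons, which is an interpretation rather than something stated above, and the feature names and values are invented for illustration.

```python
import statistics

def zscores(values):
    """Standardize a series to mean 0 and population standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # zero for a constant series, which would fail below
    return [(v - mean) / sd for v in values]

def feature_similarity(player_a, player_b):
    """Per-feature correlation across seasons between two players."""
    result = {}
    for feature in player_a:
        za = zscores(player_a[feature])
        zb = zscores(player_b[feature])
        result[feature] = sum(a * b for a, b in zip(za, zb)) / len(za)
    return result

# Hypothetical six-season series on very different scales.
player_a = {"pts": [10, 12, 14, 16, 18, 20], "tov": [1.5, 1.8, 2.0, 2.4, 2.1, 2.0]}
player_b = {"pts": [20, 24, 28, 32, 36, 40], "tov": [3.4, 3.1, 2.9, 2.6, 2.8, 3.0]}
print(feature_similarity(player_a, player_b))
```

Because each series is standardized first, a player who averages twice as many points can still show a correlation of 1 with another player if their season-to-season trajectories have the same shape.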

New Paper on Stochastic Eternal Inflation

Our new paper was accepted for publication in Physical Review D. The goal of the paper was to calculate the probability that a multiverse could emerge from a more general background spacetime, in this case, Bianchi Type I coupled to a chaotic inflaton potential. Basically, we found that a multiverse being generated from such a scenario has a small probability of occurring. Further, the fine-tuning problem that the multiverse / eternal inflation is supposed to solve is not actually solved, because fine-tuning is still required of the geometry of the background spacetime, the initial conditions, and most importantly, the amount of anisotropy.

[Figure: abstract of the Physical Review D paper]

The preprint can be read on the arXiv here.