Election season is upon us again, and many people, from political analysts to campaign advisors, are making a huge deal about winning the Iowa caucuses. This seems to be the standard “wisdom,” so I decided to analyze the data to see whether it holds up.
I looked at every Democratic primary since 1976 and tried to find which states are absolutely “must-win” for a candidate to become the Democratic presidential nominee. Because the data are scarce from a data science perspective, I used Monte Carlo bootstrap sampling on the dataset to obtain the results.
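To make the bootstrap-plus-tree idea concrete, here is a minimal sketch. It assumes each row records which states a candidate won (1) or lost (0) and whether that candidate became the nominee. The data in `ROWS` are hypothetical toy values, not the dataset used in this post. For brevity, each bootstrap sample is fit with a depth-one tree (a "stump") that splits on the state with the lowest weighted Gini impurity. This is a simplification of a full classification tree. The functions `gini`, `best_root_split`, and `bootstrap_root_splits` are illustrative names.

```python
import random
from collections import Counter

# HYPOTHETICAL toy data (not the real 1976+ primary results):
# each row is ({state: 1 if won else 0}, 1 if the candidate became the nominee else 0).
STATES = ["Iowa", "New Hampshire", "Illinois", "Texas"]
ROWS = [
    ({"Iowa": 1, "New Hampshire": 1, "Illinois": 1, "Texas": 1}, 1),
    ({"Iowa": 0, "New Hampshire": 1, "Illinois": 1, "Texas": 1}, 1),
    ({"Iowa": 1, "New Hampshire": 0, "Illinois": 0, "Texas": 1}, 1),
    ({"Iowa": 1, "New Hampshire": 0, "Illinois": 1, "Texas": 0}, 0),
    ({"Iowa": 0, "New Hampshire": 1, "Illinois": 0, "Texas": 0}, 0),
    ({"Iowa": 1, "New Hampshire": 1, "Illinois": 0, "Texas": 0}, 0),
    ({"Iowa": 0, "New Hampshire": 0, "Illinois": 0, "Texas": 0}, 0),
    ({"Iowa": 0, "New Hampshire": 0, "Illinois": 1, "Texas": 1}, 1),
]


def gini(labels):
    """Gini impurity of a list of 0/1 labels (0 means the group is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def best_root_split(sample):
    """Return the state whose won/lost split gives the lowest weighted Gini impurity.

    Ties go to the state listed first in STATES.
    """
    best_state, best_score = None, float("inf")
    for state in STATES:
        won = [label for wins, label in sample if wins[state] == 1]
        lost = [label for wins, label in sample if wins[state] == 0]
        score = (len(won) * gini(won) + len(lost) * gini(lost)) / len(sample)
        if score < best_score:
            best_state, best_score = state, score
    return best_state


def bootstrap_root_splits(rows, n_boot=1000, seed=0):
    """Resample rows with replacement n_boot times and tally each stump's root state."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_boot):
        sample = [rng.choice(rows) for _ in rows]
        counts[best_root_split(sample)] += 1
    return counts


print(bootstrap_root_splits(ROWS))
```

The tally shows how often each state ends up at the root across resamples. When a few different trees recur regardless of `n_boot`, that matches the kind of stability described next.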
Interestingly, regardless of the number of bootstrap samples, the same three classification trees kept coming up, which I now present:
Very interestingly, the classification tree above shows that the most important state for a candidate to win, in order to maximize the probability of becoming the Democratic nominee, is Illinois.
The second result from the bootstrap sampling was as follows:
Here we see that winning Texas is of paramount importance. In fact, all subsequent paths to the nomination stem from winning Texas.
There is also a third result that came from the bootstrap simulation:
In this simulation, Illinois is once again of prime importance. However, even if a candidate loses Illinois, a path to the nomination evidently remains if that candidate wins Maryland and Arizona.
Conclusion: We see from the data that Iowa and New Hampshire are actually not very important in securing the Democratic party nomination. Rather, Illinois and Texas matter much more in giving a candidate a high probability of becoming the Democratic nominee.
I decided to analyze this statement quantitatively. One can calculate the probability that an illegal immigrant will commit a crime within the United States as follows. Denote crime (or criminal) by C and illegal immigrant by ii. Then, by Bayes’ theorem, we have:
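In the notation just introduced, the textbook form of Bayes’ theorem reads as follows. My reading of the factors, which is an interpretation rather than a quotation of the original formula, is this: P(ii | C) is the fraction of criminals who are illegal immigrants, P(C) is the probability that a person in the United States is a criminal, and P(ii) is the probability that a person in the United States is an illegal immigrant.

```latex
P(C \mid ii) = \frac{P(ii \mid C)\, P(C)}{P(ii)}
```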
It is quite easy to find data associated with the various factors in this formula. For example, one finds that
Putting all of this together, we find that:
That is, the probability that an illegal immigrant will commit a crime (of any type) while in the United States is a very low 11.35%.
Therefore, Trump’s claim of “tremendous amounts of crime” being brought to the United States by illegal immigrants is incorrect.
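For readers who want to plug in their own figures, the Bayes’ theorem calculation can be sketched as a small function. The name `bayes_posterior` is hypothetical. The inputs in the example call are placeholders chosen only to show the arithmetic, not the factors used in this post. The function also rejects inputs where P(ii | C)·P(C) exceeds P(ii), because such inputs would give a "probability" above 1.

```python
def bayes_posterior(p_ii_given_c: float, p_c: float, p_ii: float) -> float:
    """Return P(C | ii) = P(ii | C) * P(C) / P(ii) by Bayes' theorem."""
    # Every input must be a valid probability.
    for name, p in (("P(ii | C)", p_ii_given_c), ("P(C)", p_c), ("P(ii)", p_ii)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {p}")
    if p_ii == 0.0:
        raise ValueError("P(ii) must be positive")
    posterior = p_ii_given_c * p_c / p_ii
    # Mutually inconsistent inputs can push the result above 1.
    if posterior > 1.0:
        raise ValueError("inputs are inconsistent: P(ii | C) * P(C) > P(ii)")
    return posterior


# Placeholder inputs only; substitute the factors from the sources.
print(f"{bayes_posterior(0.5, 0.2, 0.4):.2%}")  # 25.00%
```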
Note that the numerical factors used above were obtained from:
Over the past few weeks, a great deal of noise has been made about the surge in the polls of Donald Trump and Bernie Sanders. This has led some people to question whether Hillary Clinton will actually end up being the Democratic party nominee in 2016, a doubt reinforced by the fact that Sanders now leads Clinton in the latest New Hampshire polls.
However, based on an analysis of current polling data, I believe that, although it is very early, Hillary Clinton still has the best chance of being the Democratic party nominee. In fact, running some algorithms against the current data, I found that:
Hillary Clinton: chance of winning Democratic nomination.
Bernie Sanders: chance of winning Democratic nomination.
These numbers were deduced from an algorithm that used non-parametric methods to obtain the following probability density functions.
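The algorithm used here is not specified beyond being non-parametric. One common non-parametric way to estimate a probability density function is Gaussian kernel density estimation, which this sketch implements. The approach is an assumption and is not necessarily the method behind the numbers above. The poll shares in `polls` are hypothetical. The bandwidth uses the normal-reference rule of thumb, h = 1.06·σ·n^(-1/5). The helpers `gaussian_kde` and `mass_above` are illustrative names.

```python
import math


def gaussian_kde(samples, bandwidth=None):
    """Return a function estimating the density of `samples` with a Gaussian kernel."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = sum(samples) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in samples) / (n - 1))
    if bandwidth is None:
        if std == 0.0:
            raise ValueError("samples have zero spread; pass a bandwidth explicitly")
        # Normal-reference rule of thumb for the kernel width.
        bandwidth = 1.06 * std * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Average of Gaussian bumps centred on each observed sample.
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in samples)

    return density


def mass_above(density, threshold, upper=1.0, step=1e-4):
    """Approximate the integral of `density` over [threshold, upper] with a Riemann sum."""
    steps = int(round((upper - threshold) / step))
    return sum(density(threshold + (i + 0.5) * step) for i in range(steps)) * step


# HYPOTHETICAL poll shares for one candidate (fractions of the vote).
polls = [0.42, 0.45, 0.47, 0.44, 0.50, 0.43]
f = gaussian_kde(polls)
print(f"density at 45%: {f(0.45):.2f}")
print(f"estimated mass above 50%: {mass_above(f, 0.50):.3f}")
```

Reading a tail mass off the estimated density, as `mass_above` does, is just one illustrative use of such a curve.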
Thanks to Hargun Singh Kohli for data compilation and research.