The conventional wisdom among political pundits and analysts seeking to explain Joe Biden’s massive win in the 2020 South Carolina primary is that Jim Clyburn’s endorsement was the sole reason Biden won. (Here is just one article describing this.)
Using extensive polling data from RealClearPolitics, I looked at Biden’s margin of victory according to various polling samples before the Clyburn endorsement. I used Kernel Density Estimation to form the following probability density function of Biden’s predicted margin of victory (as a percentage/popular vote) in the 2020 South Carolina Primary:
Assuming this probability density function has the form $p(x)$, where $x$ is Biden’s margin of victory in percentage points, we notice some interesting properties:
The Expected Margin of Victory for Biden is given by: $E[x] = \int_{-\infty}^{\infty} x \, p(x) \, dx$. Using numerical integration, we find that this is . The error in this prediction is given by . This means that the predicted Biden margin of victory is . Clearly, the upper bound of this prediction is 28.89%. That is, according to the data before Clyburn’s endorsement, it was perfectly reasonable to expect that Biden’s margin of victory in South Carolina could have been around 29%. Indeed, Biden’s final margin of victory in South Carolina was 28.5%, which is within the prediction margin. Therefore, it seems unlikely that Jim Clyburn’s endorsement was what boosted Biden’s margin of victory in South Carolina.
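The calculation described above can be sketched in a few lines of Python. This is only a minimal illustration, not the original analysis: the `margins` list is made-up placeholder data rather than the RealClearPolitics polls, the Gaussian kernel and the normal-reference bandwidth rule are assumed choices, and the standard deviation is used as one possible measure of the prediction error.

```python
import math

# Hypothetical pre-endorsement poll margins in percentage points.
# Placeholder values for illustration only, not the actual polling data.
margins = [4.0, 8.0, 12.0, 15.0, 18.0, 20.0, 21.0, 24.0, 28.0, 30.0]

def rule_of_thumb_bandwidth(samples):
    """Normal-reference bandwidth for a Gaussian kernel (an assumed choice)."""
    n = len(samples)
    mean = sum(samples) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in samples) / (n - 1))
    return 1.06 * std * n ** (-1 / 5)

def kde_pdf(x, samples, h):
    """Gaussian kernel density estimate p(x) with bandwidth h."""
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)

def integrate(f, a, b, steps=4000):
    """Composite trapezoidal rule on [a, b]."""
    dx = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * dx)
    return total * dx

h = rule_of_thumb_bandwidth(margins)
# Integrate far enough into both tails that the truncated mass is negligible.
lo, hi = min(margins) - 8 * h, max(margins) + 8 * h

total_mass = integrate(lambda x: kde_pdf(x, margins, h), lo, hi)
expected = integrate(lambda x: x * kde_pdf(x, margins, h), lo, hi)
variance = integrate(lambda x: (x - expected) ** 2 * kde_pdf(x, margins, h), lo, hi)
spread = math.sqrt(variance)  # standard deviation as an assumed error measure

print(f"mass={total_mass:.4f} expected={expected:.2f} spread={spread:.2f}")
```

With a Gaussian kernel the expected value of the estimated density equals the sample mean, which gives a quick sanity check on the numerical integration.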
Given the density function above, we can make some more interesting calculations:
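P(Biden win > 5%) = $\int_{5}^{\infty} p(x)\,dx$ = 90.4%
P(Biden win > 10%) = $\int_{10}^{\infty} p(x)\,dx$ = 79.9%
P(Biden win > 15%) = $\int_{15}^{\infty} p(x)\,dx$ = 71.0%
P(Biden win > 20%) = $\int_{20}^{\infty} p(x)\,dx$ = 56.7%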
What these calculations show is that the probability that Biden would have won by more than 5% before Clyburn’s endorsement was 90.4%. The probability that Biden would have won by more than 10% before Clyburn’s endorsement was 79.9%. The probability that Biden would have won by more than 20% before Clyburn’s endorsement was 56.7%, and so on.
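Tail probabilities of this kind are straightforward to compute once a kernel density estimate is in hand. The sketch below assumes a Gaussian kernel, for which the tail integral has a closed form; the `margins` values and the `bandwidth` are hypothetical placeholders, so the printed numbers will not match the percentages quoted above.

```python
import math

# Hypothetical poll margins in percentage points (placeholders, not the real data).
margins = [4.0, 8.0, 12.0, 15.0, 18.0, 20.0, 21.0, 24.0, 28.0, 30.0]
bandwidth = 5.0  # assumed kernel width, chosen only for illustration

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_margin_exceeds(threshold, samples, h):
    """P(margin > threshold) under a Gaussian KDE.

    Each kernel contributes 1 - Phi((threshold - sample) / h);
    the tail probability of the KDE is the average of those terms.
    """
    return sum(1.0 - normal_cdf((threshold - s) / h) for s in samples) / len(samples)

tail_probs = {t: prob_margin_exceeds(t, margins, bandwidth) for t in (5, 10, 15, 20)}
for t, p in tail_probs.items():
    print(f"P(margin > {t}%) = {p:.1%}")
```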
Given these calculations, it actually seems unlikely that Clyburn’s endorsement made a huge impact on Biden’s win in South Carolina. This analysis shows that Biden would likely have won by more than 15%-20% regardless.
Election season is upon us again, and a number of people from political analysts to campaign advisors are making a huge deal about winning the Iowa caucuses. This seems to be the standard “wisdom”. I decided to run some analysis on the data to see if it was true.
I looked at every Democratic primary since 1976 and tried to find which states are absolutely “must-win” for a candidate to become the Democratic presidential nominee. Because the data is scarce from a data science perspective, I ran Monte Carlo bootstrap sampling on the dataset to come up with the results.
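To make the idea concrete, here is a heavily simplified sketch of bootstrap resampling combined with a tree split. It is not the analysis behind the results in this post: the `rows` data are invented, only four states are included, and instead of growing a full classification tree it records only which state would be chosen as the root split (by Gini impurity) in each resample.

```python
import random

STATES = ["Iowa", "New Hampshire", "Illinois", "Texas"]

# Hypothetical data, one row per candidate-season:
# (won Iowa, won New Hampshire, won Illinois, won Texas, became nominee)
rows = [
    (1, 0, 1, 1, 1),
    (0, 1, 1, 1, 1),
    (1, 1, 0, 0, 0),
    (0, 0, 1, 0, 1),
    (1, 0, 0, 1, 0),
    (0, 1, 0, 0, 0),
    (1, 1, 1, 1, 1),
    (0, 0, 0, 1, 1),
    (1, 0, 1, 0, 0),
    (0, 1, 1, 1, 1),
]

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_root_split(sample):
    """Return the state whose win/loss split gives the lowest weighted Gini impurity."""
    best_state, best_score = None, float("inf")
    for j, state in enumerate(STATES):
        won = [row[-1] for row in sample if row[j] == 1]
        lost = [row[-1] for row in sample if row[j] == 0]
        score = (len(won) * gini(won) + len(lost) * gini(lost)) / len(sample)
        if score < best_score:
            best_state, best_score = state, score
    return best_state

def bootstrap_root_counts(data, n_boot=500, seed=0):
    """Count how often each state is chosen as the root split across resamples."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_boot):
        sample = [rng.choice(data) for _ in data]  # resample with replacement
        root = best_root_split(sample)
        counts[root] = counts.get(root, 0) + 1
    return counts

print(bootstrap_root_counts(rows))
```

A state that appears as the root in most resamples is a stable "first question" for the tree, which is the sense in which a result can keep coming up regardless of the number of bootstrap samples.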
Interestingly, irrespective of the number of bootstrap samples, three classification tree results kept coming up, which I now present:
Very interestingly, the classification tree above shows that the most important state for a candidate to win, in order to have the highest probability of becoming the Democratic nominee, is Illinois.
The other result from bootstrap sampling was as follows:
Here we see that winning Texas is of paramount importance. In fact, all subsequent paths to the nomination stem from winning Texas.
There is also a third result that came from the bootstrap simulation:
We see that in this simulation, once again Illinois is of prime importance. However, even if a candidate does lose Illinois, evidently a path to the nomination is still possible if that candidate wins Maryland and Arizona.
Conclusion: From analyzing the data, we see that Iowa and New Hampshire are actually not very important to becoming the Democratic nominee. Rather, Illinois and Texas are much more important in giving a candidate a high probability of being the Democratic nominee.
An interesting machine learning problem: Can one figure out the relationship between the popular vote margin, voter turnout, and the percentage of electoral college votes a candidate wins? Going back to the election of John Quincy Adams, the raw data looks like this:
Predicted Percentage of Electoral College Votes (+/- 0.04996417)
One sees that even for an extremely low voter turnout (30%), at this point Hillary Clinton can expect to win between 61.078% and 71.07013% of the Electoral College, or 328 to 382 electoral college votes. Therefore, what seems like a relatively small lead in the popular vote (6.1%) translates, according to this neural network model, into a large margin of victory in the electoral college.
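The conversion from a predicted share to a vote count is simple arithmetic, sketched below. Rounding down to whole votes is an assumption on my part; it happens to reproduce the 328 and 382 figures quoted above.

```python
import math

TOTAL_ELECTORAL_VOTES = 538  # size of the Electoral College

def share_to_votes(share):
    """Convert a fractional Electoral College share to whole votes (rounded down, by assumption)."""
    return math.floor(share * TOTAL_ELECTORAL_VOTES)

low_share, high_share = 0.61078, 0.7107013  # bounds quoted in the text
print(share_to_votes(low_share), share_to_votes(high_share))  # prints "328 382"
```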
One can see that the predicted percentage of electoral college votes really depends on popular vote margin and voter turnout. For example, if we reduce the popular vote margin to 1%, the results are less promising for the leading candidate:
Voter Turnout % | E.C. % Win | E.C. % Win Best Case | E.C. % Win Worst Case
One sees that if the popular vote margin is just 1% for the leading candidate, that candidate is not in the clear unless voter turnout exceeds 60%.