## Did Clyburn Help Biden in South Carolina?

The conventional wisdom among the political pundits and analysts seeking to explain Joe Biden’s massive win in the 2020 South Carolina primary is that Jim Clyburn’s endorsement was the sole reason Biden won. (Here is just one article describing this.)

I wanted to analyze the data behind this claim and actually measure the size of the Clyburn effect. Clyburn formally endorsed Biden on February 26, 2020.

Using extensive polling data from RealClearPolitics, I looked at Biden’s margin of victory according to various polling samples before the Clyburn endorsement. I used Kernel Density Estimation to form the following probability density function of Biden’s predicted margin of victory (as a percentage/popular vote) in the 2020 South Carolina Primary:

Assuming this probability density function has the form $p(x)$, we notice some interesting properties:

• The expected margin of victory for Biden is given by $\int x p(x) dx$. Using numerical integration, we find that $\int x p(x) dx = 18.513 \%$. The uncertainty in this prediction follows from the variance, $var(x) = \int x^2 p(x) dx - (\int x p(x) dx)^2 = 107.79$, whose square root gives a standard deviation of $10.382$. This means that the predicted Biden margin of victory is $18.51 \pm 10.382$ percentage points, so the upper bound of this prediction is 28.89%. That is, according to the data before Clyburn’s endorsement, it was perfectly reasonable to expect that Biden’s margin of victory in South Carolina could have been around 29%. Indeed, Biden’s final margin of victory in South Carolina was 28.5%, which is within the prediction margin. Therefore, it seems unlikely that Jim Clyburn’s endorsement boosted Biden’s victory in South Carolina.
• Given the density function above, we can make some more interesting calculations:
• P(Biden win > 5%) = $1 - \int_{-\infty}^{5} p(x) dx = 0.904$ = 90.4%
• P(Biden win > 10%) = $1 - \int_{-\infty}^{10} p(x) dx = 0.799$ = 79.9%
• P(Biden win > 15%) = $1 - \int_{-\infty}^{15} p(x) dx = 0.710$ = 71.0%
• P(Biden win > 20%) = $1 - \int_{-\infty}^{20} p(x) dx = 0.567$ = 56.7%

What these calculations show is that, based on the polling before Clyburn’s endorsement, the probability that Biden would win by more than 5% was 90.4%, by more than 10% was 79.9%, by more than 15% was 71.0%, and by more than 20% was 56.7%.
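For readers who want to try this kind of calculation, here is a minimal sketch of how the expected margin, its spread, and the tail probabilities could be computed from a Gaussian kernel density estimate. It is not the code behind the numbers above: the poll margins in `margins` are hypothetical placeholders, and the bandwidth rule is an assumption on my part. With Gaussian kernels the integrals have closed forms, so no numerical integration is needed in this sketch.

```python
import math

# HYPOTHETICAL pre-endorsement poll margins (Biden lead, in points).
# Illustrative values only, NOT the RealClearPolitics data used above.
margins = [4.0, 8.0, 15.0, 20.0, 21.0, 24.0, 28.0, 36.0]

def rule_of_thumb_bandwidth(samples):
    """Normal-reference bandwidth h = 1.06 * s * n^(-1/5) (an assumed choice)."""
    n = len(samples)
    mean = sum(samples) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in samples) / (n - 1))
    return 1.06 * s * n ** (-0.2)

def kde_pdf(x, samples, h):
    """Gaussian KDE: p(x) = (1 / (n h)) * sum_i phi((x - x_i) / h)."""
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)

def kde_mean(samples):
    """Integral of x p(x) dx; for a Gaussian KDE this is the sample mean."""
    return sum(samples) / len(samples)

def kde_variance(samples, h):
    """Integral of x^2 p(x) dx minus mean^2 = population variance + h^2."""
    m = kde_mean(samples)
    return sum((s - m) ** 2 for s in samples) / len(samples) + h ** 2

def prob_margin_exceeds(t, samples, h):
    """P(X > t) = 1 - integral_{-inf}^{t} p(x) dx, averaged over the kernels."""
    tails = (0.5 * math.erfc((t - s) / (h * math.sqrt(2.0))) for s in samples)
    return sum(tails) / len(samples)

h = rule_of_thumb_bandwidth(margins)
print("expected margin:", kde_mean(margins))
print("standard deviation:", math.sqrt(kde_variance(margins, h)))
for t in (5, 10, 15, 20):
    print(f"P(margin > {t}%):", prob_margin_exceeds(t, margins, h))
```

The printed values depend entirely on the placeholder data and the bandwidth, so they will not reproduce the figures quoted above.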

Given these calculations, it actually seems unlikely that Clyburn’s endorsement made a huge impact on Biden’s win in South Carolina. This analysis shows that Biden would likely have won by more than 15%-20% regardless.

## Optimal Strategies for Winning The Democratic Primaries

Election season is upon us again, and a number of people from political analysts to campaign advisors are making a huge deal about winning the Iowa caucuses. This seems to be the standard “wisdom”. I decided to run some analysis on the data to see if it was true.

I looked at every Democratic primary since 1976 and tried to find which states are absolutely “must-win” for a candidate to become the Democratic presidential nominee. Because the data is scarce from a data science perspective, I ran Monte Carlo bootstrap sampling on the dataset to obtain the results.
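To make the idea concrete, here is a rough sketch of bootstrap resampling combined with the first split of a classification tree. It is only an illustration under stated assumptions: `RECORDS` is a toy dataset I made up (and deliberately constructed so that one state separates the outcomes), the Gini impurity criterion is an assumed choice, and the sketch finds only the root split rather than a full tree.

```python
import random
from collections import Counter

STATES = ["Iowa", "New Hampshire", "Illinois", "Texas"]

# HYPOTHETICAL toy records: (1 if the candidate won the state, 1 if nominee).
# This is NOT the real primary data since 1976.
RECORDS = [
    ({"Iowa": 1, "New Hampshire": 0, "Illinois": 1, "Texas": 1}, 1),
    ({"Iowa": 0, "New Hampshire": 1, "Illinois": 1, "Texas": 1}, 1),
    ({"Iowa": 1, "New Hampshire": 1, "Illinois": 1, "Texas": 0}, 1),
    ({"Iowa": 0, "New Hampshire": 0, "Illinois": 1, "Texas": 1}, 1),
    ({"Iowa": 1, "New Hampshire": 0, "Illinois": 0, "Texas": 0}, 0),
    ({"Iowa": 0, "New Hampshire": 1, "Illinois": 0, "Texas": 1}, 0),
    ({"Iowa": 1, "New Hampshire": 1, "Illinois": 0, "Texas": 0}, 0),
    ({"Iowa": 0, "New Hampshire": 0, "Illinois": 0, "Texas": 0}, 0),
]

def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 for an empty list)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def split_impurity(records, state):
    """Weighted Gini impurity after splitting on 'won this state or not'."""
    won = [y for x, y in records if x[state]]
    lost = [y for x, y in records if not x[state]]
    n = len(records)
    return len(won) / n * gini(won) + len(lost) / n * gini(lost)

def best_root_state(records):
    """State whose split gives the lowest impurity (the tree's root node)."""
    return min(STATES, key=lambda s: split_impurity(records, s))

def bootstrap_root_counts(records, n_boot, seed=0):
    """Resample with replacement n_boot times and tally the root state."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_boot):
        sample = [rng.choice(records) for _ in records]
        counts[best_root_state(sample)] += 1
    return counts

print(bootstrap_root_counts(RECORDS, n_boot=1000))
```

A state that keeps appearing as the root split across many resamples is a candidate "must-win" state in the sense used here.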

Interestingly, irrespective of the number of bootstrap samples, three classification tree results kept coming up, which I now present:

Very interestingly, the classification tree above shows that the most important state for a candidate to win, in order to have the highest probability of being the Democratic nominee, is Illinois.

The other result from bootstrap sampling was as follows:

Here we see that winning Texas is of paramount importance. In fact, all subsequent paths to the nomination stem from winning Texas.

There is also a third result that came from the bootstrap simulation:

We see that in this simulation, once again Illinois is of prime importance. However, even if a candidate does lose Illinois, evidently a path to the nomination is still possible if that candidate wins Maryland and Arizona.

Conclusion: Analyzing the data, we see that Iowa and New Hampshire are actually not very important to becoming the Democratic Party nominee. Rather, Illinois and Texas are much more important in giving a candidate a high probability of being the Democratic nominee.

## Will Donald Trump’s Proposed Immigration Policies Curb Terrorism in The US?

In recent days, Donald Trump proposed yet another iteration of his immigration policy, which is focused on “Keeping America Safe” as part of his plan to “Make America Great Again!”. In this latest iteration, in addition to suspending visas from countries with terrorist ties, he is also proposing an ideological test for those entering the US. As you can see in the BBC article, he is also fond of holding up bar graphs showing the number of refugees entering the US over a period of time, and somehow relating that to terrorist activities in the US, or at least insinuating it.

Let’s look at the facts behind these proposals using the available data from 2005-2014. Specifically, we analyzed:

1. The number of terrorist incidents per year from 2005-2014 from here (The Global Terrorism Database maintained by The University of Maryland)
2. The Department of Homeland Security Yearbook of Immigration Statistics, available here. Specifically, we looked at Persons Obtaining Lawful Permanent Resident Status by Region and Country of Birth (2005-2014) and Refugee Arrivals by Region and Country of Nationality (2005-2014).

Given these datasets, we focused on countries/regions labeled as terrorist safe havens and state sponsors of terror based on the criteria outlined here.

We found the following.

First, looking at naturalized citizens, these computations yielded:

| Country | Correlation | Fraction of Variance Explained |
|---|---|---|
| Afghanistan | 0.61169 | 0.37416 |
| Egypt | 0.26597 | 0.07074 |
| Indonesia | -0.66011 | 0.43574 |
| Iran | -0.31944 | 0.10204 |
| Iraq | 0.26692 | 0.07125 |
| Lebanon | -0.35645 | 0.12706 |
| Libya | 0.59748 | 0.35698 |
| Malaysia | 0.39481 | 0.15587 |
| Mali | 0.20195 | 0.04079 |
| Pakistan | 0.00513 | 0.00003 |
| Philippines | -0.79093 | 0.62557 |
| Somalia | -0.40675 | 0.16544 |
| Syria | 0.62556 | 0.39132 |
| Yemen | -0.11707 | 0.01371 |

In graphical form:

The highest correlations are 0.62556 and 0.61169, from Syria and Afghanistan respectively. The strongest anti-correlations are from Indonesia and the Philippines, at -0.66011 and -0.79093 respectively. Certainly, none of the positive correlations exceed 0.65, which indicates that there could be some relationship between the number of naturalized citizens from these particular countries and the number of terrorist incidents, but it is nowhere near conclusive. Further, looking at Syria, we see that the fraction of variance explained (the coefficient of determination) is 0.39132, which means that only about 39% of the variation in the number of terrorist incidents can be predicted from the relationship between where a naturalized citizen is born and the number of terrorist incidents in the United States.
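As a quick illustration of the two quantities in the table, here is a small sketch that computes the Pearson correlation $r$ between two yearly series and the coefficient of determination $r^2$. The two series below are hypothetical placeholders, not the DHS or Global Terrorism Database figures.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# HYPOTHETICAL yearly counts for 2005-2014, for illustration only.
arrivals = [120, 150, 130, 170, 160, 180, 210, 190, 220, 240]
incidents = [9, 6, 8, 18, 11, 17, 10, 20, 19, 25]

r = pearson_r(arrivals, incidents)
print("correlation r:", r)
print("fraction of variance explained r^2:", r ** 2)
```

Squaring a correlation from the table recovers its second column up to rounding: for Syria, $0.62556^2 \approx 0.3913$.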

Second, looking at refugees, these computations yielded:

| Country | Correlation | Fraction of Variance Explained |
|---|---|---|
| Afghanistan | 0.59836 | 0.35803 |
| Egypt | 0.66657 | 0.44432 |
| Iran | -0.29401 | 0.08644 |
| Iraq | 0.49295 | 0.24300 |
| Pakistan | 0.60343 | 0.36413 |
| Somalia | 0.14914 | 0.02224 |
| Syria | 0.56384 | 0.31792 |
| Yemen | -0.35438 | 0.12558 |
| Other | 0.54109 | 0.29278 |

In graphical form:

We see that the highest correlations are from Egypt (0.66657), Pakistan (0.60343), and Afghanistan (0.59836). This indicates there is some mild correlation between refugees from these countries and the number of terrorist incidents in the United States, but it is nowhere near conclusive. Further, the coefficients of determination for Egypt and Syria are 0.44432 and 0.31792 respectively. This means that in the case of Syrian refugees, for example, only 31.792% of the variation in terrorist incidents in the United States can be predicted from the relationship between a refugee’s country of origin and the number of terrorist incidents in the United States.

In conclusion, it is therefore unlikely that Donald Trump’s proposals would do anything to significantly curb the number of terrorist incidents in The United States. Further, repeatedly showing pictures like this:

at his rallies is doing nothing to address the issue at hand and is perhaps only serving as yet another fear tactic as has become all too common in his campaign thus far.

(Thanks to Hargun Singh Kohli, Honours B.A., LL.B. for the initial data mining and processing of the various datasets listed above.)

Note, further to the results of this article, I was recently made aware of this excellent article from The WSJ, which I have summarized below:

## Some Thoughts on The US GDP

Here are some thoughts on the US GDP based on some data I’ve been looking at recently, mostly motivated by some Donald Trump supporters that have been criticizing President Obama’s record on the GDP and the economy.

First, analyzing the real GDP’s average growth per year, we obtain that (based on a least squares regression analysis)

According to these calculations, President Clinton’s economic policies led to the best average GDP growth rate at \$436 Billion / year. President Reagan and President Obama have almost identical average GDP growth rates in the neighbourhood of \$320 Billion / year. However, an obvious caveat is that President Obama’s GDP record is still missing two years of data, so I will need to revisit these calculations in two years! Also, it should be noted that, historically, the US GDP has grown at an average of about \$184 Billion / year.
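For reference, an average growth rate of this kind is the slope of a least squares line fitted to real GDP against year. The sketch below shows the calculation on hypothetical numbers; the `gdp` values are made up for illustration and are not the data behind the figures above.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least squares line y = a*x + b."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

# HYPOTHETICAL real GDP levels (billions of dollars) over one 8-year term.
years = [1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000]
gdp = [9500, 9900, 10150, 10550, 11020, 11510, 12050, 12550]

slope, intercept = least_squares_line(years, gdp)
print("average growth (billions of dollars per year):", slope)
```

The slope uses every year in the term, so it is less sensitive to the first and last data points than simply dividing the total change by the number of years.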

The second point I wanted to address concerns several Trump supporters who keep comparing the average real GDP annual percentage change between President Reagan and President Obama. Although they are citing the averages, they are not mentioning the standard deviations! Computing these we find that:

Looking at these calculations, we find that Presidents Clinton and Obama had the most stable growth in year-to-year real GDP %. Presidents Bush and Reagan had highly unstable GDP growth, with President Bush’s being far worse than President Reagan’s. Further, Trump supporters and most Republicans seem quick to point out the mean figure of 3.637% associated with President Reagan, but the point is that this is +/- 2.55%, which indicates high volatility in the GDP under President Reagan, which has not been the case under President Obama.
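The mean and standard deviation comparison is straightforward to reproduce. Here is a short sketch using Python's `statistics` module, with two hypothetical series of annual growth percentages standing in for the real data, and assuming the sample standard deviation is the one intended.

```python
import statistics

def mean_and_sd(values):
    """Return (mean, sample standard deviation) of a list of numbers."""
    return statistics.mean(values), statistics.stdev(values)

# HYPOTHETICAL annual real GDP growth rates (%), for illustration only.
growth_volatile = [-1.9, 4.6, 7.2, 4.2, 3.5]
growth_stable = [2.5, 1.6, 2.2, 1.8, 2.5]

for name, series in (("volatile", growth_volatile), ("stable", growth_stable)):
    m, s = mean_and_sd(series)
    print(f"{name}: {m:.3f}% +/- {s:.3f}%")
```

A higher mean paired with a much larger standard deviation describes growth that is faster on average but far less predictable from year to year.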

Another observation I would like to point out is that very few people have been mentioning the fact that the annual real US GDP growth % is in fact correlated with that of other countries. Based on data from the World Bank, one can compute the following correlations:

One sees that the correlation between the annual growth % of the US real GDP and that of Canada is 0.826, while for Estonia and the UK it is roughly 0.7. Therefore, any President who claims that his policies will increase the GDP is not being truthful, since it is quite likely that these numbers also depend on those of other countries, over which I am not entirely convinced a US President has complete control!

My final observation is with respect to the quarterly GDP numbers. There are some articles that I have seen in recent days, in addition to several television segments, in which Trump supporters are continuously citing how much better Reagan’s quarterly GDP numbers were compared to Obama’s. We now show that in actuality this is not the case.

The problem is that most of the “analysts” are just looking at the raw data, which at face value doesn’t tell you much, since, as expected, it fluctuates. Below, we analyze the quarterly GDP % data during the tenure of both Presidents Reagan and Obama, from 1982-1988 and 2010-2016 respectively, comparing data from the same length of time.

For Reagan, we obtain:

For Obama, we obtain:

The only way to reasonably compare these two data sets is to analyze the rate at which the GDP % has increased in time. Since the data is nonlinear in time, this means we must calculate the derivatives at instants of time / each quarter. We first performed cubic spline interpolation to fit curves to these data sets, which gave extremely good results:

We then numerically computed the derivative of these curves at each quarter and obtained:

The dashed curves in the above plot are plots of the derivatives of each curve at each quarter. In terms of numbers, these were found to be:

Summarizing the table above in graphical format, we obtain:

As can be calculated easily, Obama has higher GDP quarterly growth numbers for 15/26 (57.69%) quarters. Therefore, even looking at the quarterly real GDP numbers, overall, President Obama outperforms President Reagan.
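The procedure described above (fit a cubic spline, differentiate it at each quarter, then count the quarters in which one series has the larger derivative) can be sketched as follows. This is an illustration under assumptions, not the original computation: I assume a natural cubic spline (zero second derivative at both ends), and both quarterly series are hypothetical placeholders.

```python
def natural_spline_second_derivatives(xs, ys):
    """Solve the tridiagonal system for the spline's second derivatives M_i,
    using natural boundary conditions M_0 = M_n = 0 (an assumed choice)."""
    n = len(xs) - 1  # number of intervals
    h = [xs[i + 1] - xs[i] for i in range(n)]
    slope = [(ys[i + 1] - ys[i]) / h[i] for i in range(n)]
    M = [0.0] * (n + 1)
    if n < 2:
        return M, h, slope
    diag = [2.0 * (h[k] + h[k + 1]) for k in range(n - 1)]
    rhs = [6.0 * (slope[k + 1] - slope[k]) for k in range(n - 1)]
    # Forward elimination (Thomas algorithm); the off-diagonals are h[k].
    for k in range(1, n - 1):
        w = h[k] / diag[k - 1]
        diag[k] -= w * h[k]
        rhs[k] -= w * rhs[k - 1]
    # Back substitution for the interior second derivatives M_1..M_{n-1}.
    M[n - 1] = rhs[n - 2] / diag[n - 2]
    for k in range(n - 3, -1, -1):
        M[k + 1] = (rhs[k] - h[k + 1] * M[k + 2]) / diag[k]
    return M, h, slope

def spline_derivatives_at_knots(xs, ys):
    """First derivative of the natural cubic spline at every data point."""
    M, h, slope = natural_spline_second_derivatives(xs, ys)
    n = len(xs) - 1
    d = [slope[i] - h[i] * (2.0 * M[i] + M[i + 1]) / 6.0 for i in range(n)]
    d.append(slope[n - 1] + h[n - 1] * (M[n - 1] + 2.0 * M[n]) / 6.0)
    return d

# HYPOTHETICAL quarterly GDP % series for two presidents, illustration only.
quarters = [0, 1, 2, 3, 4, 5]
series_a = [-1.5, 1.8, 5.4, 9.4, 8.2, 7.2]
series_b = [1.7, 3.9, 2.7, 2.5, -1.5, 2.9]

d_a = spline_derivatives_at_knots(quarters, series_a)
d_b = spline_derivatives_at_knots(quarters, series_b)
b_ahead = sum(1 for a, b in zip(d_a, d_b) if b > a)
print(f"series_b has the larger derivative in {b_ahead}/{len(quarters)} quarters")
```

Different boundary conditions (for example "not-a-knot") change the derivatives near the ends of each series, so the count can shift slightly depending on that choice.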

Thanks to Hargun Singh Kohli, B.A. Honours, LL.B. for the data collection and processing part of this analysis.

## 2016 Michigan Primary Predictions

Using the Monte Carlo techniques I have described in earlier posts, I ran several simulations today to try to predict who will win the 2016 Michigan primaries. Here is what I found:

For the Republican primaries, I predict:

Trump: 89.64% chance of winning

Cruz: 5.01% chance of winning

Kasich: 3.29% chance of winning

Rubio: 2.06% chance of winning
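For readers unfamiliar with this kind of simulation, here is one simple way a Monte Carlo win probability could be estimated. It is a sketch under my own assumptions and not necessarily the method used for the numbers above: each candidate's vote share is drawn independently from a normal distribution, and the poll means and standard deviations below are hypothetical.

```python
import random
from collections import Counter

def simulate_primary(poll_means, poll_sds, n_sims=10000, seed=0):
    """Estimate each candidate's chance of winning by repeated random draws.

    Assumes independent normal vote shares per candidate (a simplification).
    """
    rng = random.Random(seed)
    wins = Counter()
    for _ in range(n_sims):
        draws = {c: rng.gauss(poll_means[c], poll_sds[c]) for c in poll_means}
        wins[max(draws, key=draws.get)] += 1
    return {c: wins[c] / n_sims for c in poll_means}

# HYPOTHETICAL polling averages and spreads (in %), for illustration only.
poll_means = {"Trump": 37.0, "Cruz": 24.0, "Kasich": 23.0, "Rubio": 13.0}
poll_sds = {"Trump": 8.0, "Cruz": 8.0, "Kasich": 8.0, "Rubio": 8.0}

for candidate, p in simulate_primary(poll_means, poll_sds).items():
    print(f"{candidate}: {100 * p:.2f}% chance of winning")
```

The estimated probabilities are quite sensitive to the assumed standard deviations: wider spreads give the trailing candidates a noticeably better chance.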

The following plot is a histogram of the simulations: