Tomorrow is the date of the Canadian Federal Elections. Here are my predictions for the outcome:

*That is, I predict the Liberals will win, with the NDP trailing very far behind either party.*

# Category: Statistics

## Canadian Federal Election Predictions for 10/19/2015

## Do More Gun Laws Prevent Gun Violence?

## Let’s not go overboard with this Trump stuff!

## Hillary Clinton Still Has the Best Chance of Being The Democratic Party Nominee in 2016

## The “Evolution” of the 3-Point Shot in The NBA

### 1. Introduction

### 2. The Dynamical Equations

### 3. Fixed-Points Analysis

### 4. Global Stability and The Existence of Nash Equilibria

## Article on Three-Point Shooting in the Modern-Day NBA

## Mathematical Origins of Life

**Update: March 16, 2018: I have received quite a few comments about my critique of Volokh’s WaPo article, and just as a summary of my reply back to those comments:**

The main point that I made and demonstrated below is that the concept of a correlation is only useful as a measure of linearity between the two variables you are comparing. ALL of the correlations that Volokh computes are close to zero: 0.032 between the homicide rate (including gun accidents) and the Brady score, 0.065 between the intentional homicide rate and the Brady score, 0.0178 between the homicide rate (including gun accidents) and the National Journal score, and 0.0511 between the intentional homicide rate and the National Journal score. All of these numbers are completely *useless*. You cannot conclude anything from these scores. All you can conclude is that the relationship between homicide rate (including or not including gun accidents) and the Brady score is highly nonlinear. Since the relationship is nonlinear, I have investigated it using data science methodologies such as regression trees.

Article begins below:

**Abstract:**

- The number and quality of gun-control laws a state has drastically affects the number of gun-related deaths.
- Other factors like median household income play a smaller role in the number of gun-related deaths.
- Factors like the amount of money a state spends on mental-health care have a negligible effect on the number of gun-related deaths. This point is quite important, as there are a number of policy-makers who consistently argue that the focus needs to be on the mentally ill and that this will curb the number of gun-related deaths.

**Contents:**

- Critique of Recent Gun-Control Opposition Studies
- A more correct way to look at the Gun Deaths data using data science methodologies.

**A Critique of Recent Gun-Control Opposition Studies**

In light of the recent tragedy in Oregon, which is part of a disturbing trend of increasing gun violence in the United States, we are once again in an aftermath in which President Obama and most Democrats advocate for more gun laws that they claim would help decrease gun violence, while their Republican counterparts, as usual, argue the precise opposite. Indeed, two very simplified “studies” presented in the media thus far have been cited frequently by gun advocates:

- Glenn Kessler’s so-called Fact-Checker Article
- Eugene Volokh’s opinion article in The Washington Post

I have singled out these two examples, but most of the studies claiming to “do statistics” follow a similar methodology. It should be noted that these studies are extremely simplified: they compute correlations while looking at only two factors (the gun death rate and a state’s “Brady grade”). As we show below, the answer to the question of interest, and one that allows us to determine causation and correlation, must depend on several state-dependent factors and hence requires deeper statistical learning methodologies, of which NONE of the second amendment advocates seem to be aware.

The reason why one cannot deduce anything significant from correlations, as is done in Volokh’s article, is that correlation coefficients are good “summary statistics” but hardly tell you anything deep about the data you are working with. For example, in Volokh’s article, he uses MS Excel to compute the correlations between a pair of variables, but Excel itself uses the Pearson correlation coefficient, which essentially is a measure of the linearity between two variables. If the underlying data exhibits a nonlinear relationship, the correlation coefficient will return a small value, but this in no way means there is no relationship between the data; it just means the relationship is not linear. Similarly, other correlation coefficient computations make other assumptions about the data, such as coming from a normal distribution, which is strange to assume from the outset. (There is also the more technical issue that a state’s Brady grade is not exactly a *random* variable. So measuring the correlation between a supposed random variable (the number of homicides) and a non-random variable is not exactly a sound idea.)

A simple example of where the correlation calculation fails is the following set of data. Consider two variables, x and y, with the values:

| x | y |
| --- | --- |
| -1.0000 | 0.2420 |
| -0.9000 | 0.2661 |
| -0.8000 | 0.2897 |
| -0.7000 | 0.3123 |
| -0.6000 | 0.3332 |
| -0.5000 | 0.3521 |
| -0.4000 | 0.3683 |
| -0.3000 | 0.3814 |
| -0.2000 | 0.3910 |
| -0.1000 | 0.3970 |
| 0 | 0.3989 |
| 0.1000 | 0.3970 |
| 0.2000 | 0.3910 |
| 0.3000 | 0.3814 |
| 0.4000 | 0.3683 |
| 0.5000 | 0.3521 |
| 0.6000 | 0.3332 |
| 0.7000 | 0.3123 |
| 0.8000 | 0.2897 |
| 0.9000 | 0.2661 |
| 1.0000 | 0.2420 |

If one tries to compute the correlation between x and y, one will obtain that the correlation coefficient is zero! (Try it!) A simple conclusion would be that therefore there is no linear causation/dependence between x and y. But, if one now makes a scatter plot of x and y, one gets:

Despite having zero correlation, there is apparently a very strong relationship between x and y. In fact, after some analysis, one can show that they obey the following relationship:

$$y = \frac{1}{\sqrt{2\pi}} e^{-x^2/2},$$

that is, y is the standard normal probability density function evaluated at x. So, in this example and similar examples where there is a strong nonlinear relationship between the two variables, the correlation, in particular the Pearson correlation, is meaningless. Strangely, despite this, Volokh uses a near-zero correlation of his data to demonstrate that there is no correlation between a state’s gun score and the number of gun-related deaths, but this is not what his results show! He is misinterpreting his calculations.
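The zero-correlation claim for the table above is easy to check directly. The following is a minimal sketch that rebuilds the x values on the grid from -1 to 1, evaluates the standard normal density at each point, and computes the Pearson coefficient by hand; the helper name `pearson` is my own and not something from the original analysis.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Same grid as the table: x = -1.0, -0.9, ..., 1.0
xs = [i / 10 for i in range(-10, 11)]
# y is the standard normal density evaluated at x
ys = [math.exp(-x * x / 2) / math.sqrt(2 * math.pi) for x in xs]

r = pearson(xs, ys)
print(f"Pearson r = {r:.3e}")
```

Because y is an even function of x and the grid is symmetric about zero, the covariance terms cancel in pairs, so r vanishes up to floating-point rounding even though y is completely determined by x.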

Indeed, looking at Volokh’s specific example of comparing the Brady score to the number of Homicides, one gets the following scatter plot:

Volokh then computes the Pearson correlation between the two variables and obtains a result of 0.0323, that is, quite close to zero, which leads him to conclude that there is no correlation between the two. But this is *not* what this result means. What it is saying in this case is that there is a strong nonlinear relationship between the two. As I’ve said above and demonstrate below, looking at two variables for a state is hardly useful, but for argument’s sake, even a very rough analysis shows a rough sinusoidal relationship between the two variables:

In fact, this curve is an 8-term sum-of-sines fit with an R^2 of 0.5322. So, it’s not great, but there is clearly at least some causal behaviour between the two variables. But I will say again that, due to the clustering of points around zero on the x-axis above, there will be simply NO function that fits the points, because it will not be one-to-one and onto, that is, there are repeated x-points for the same y-value in the data, and this is problematic. So, looking at two variables is not useful at all, and what this calculation shows is that the relationship, if there is one, would be strongly *nonlinear*, so measuring the correlation doesn’t make any sense.

**Therefore, one requires a much deeper analysis, which we attempt to provide below.**

**A more correct way to look at the Gun Homicide data using data science methodologies.**

I wanted to analyze using data science methodologies which side is correct. Due to limited time resources, I was only able to look at data from previous years (2010-2014) and looked at state-by-state data comparing:

- # of Firearm deaths per 100,000 people (Data from: http://kff.org/other/state-indicator/firearms-death-rate-per-100000/)
- Total State Population (Obtained from Wikipedia)
- Population Density / Square Mile (Obtained from Wikipedia)
- Median Household Income (Obtained from Wikipedia)
- Gun Law Grade: This data was obtained from http://gunlawscorecard.org/, which is run by The Law Center to Prevent Gun Violence and grades each state based on the number and quality of its gun laws using letter grades, e.g., A, A+, B+, F, etc. To use this data in the data science algorithms, I converted each letter grade to a numerical grade based on the following scale: A+: 90, A-: 90, A: 85, B: 73, B-: 70, B+: 77, C: 63, C-: 60, C+: 67, D: 53, D-: 50, D+: 57, F: 0.
- State Mental Health Agency Per Capita Mental Health Services Expenditures (Obtained from: http://kff.org/other/state-indicator/smha-expenditures-per-capita/#table)
- Some data was available for some years and not for others, so there are very slight percentage changes from year-to-year, but overall, this should have a negligible effect on the results.
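The letter-to-number conversion described in the list above can be written as a simple lookup. This is a sketch only: the values are copied exactly as stated in the text (including A+ and A- both mapping to 90), and the names `GRADE_TO_SCORE` and `grade_to_score` are illustrative, not taken from the original analysis.

```python
# Numeric scale exactly as listed in the text; names are hypothetical.
GRADE_TO_SCORE = {
    "A+": 90, "A": 85, "A-": 90,
    "B+": 77, "B": 73, "B-": 70,
    "C+": 67, "C": 63, "C-": 60,
    "D+": 57, "D": 53, "D-": 50,
    "F": 0,
}

def grade_to_score(letter_grade):
    """Convert a scorecard letter grade to the numeric score used as a model feature."""
    return GRADE_TO_SCORE[letter_grade.strip().upper()]

print(grade_to_score("B+"))
```

Note that an F maps to 0 rather than continuing the roughly ten-point spacing of the other grades, so failing states sit far from every other state on this feature.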

This is what I found.

Using a boosted regression tree algorithm, I wanted to find which are the largest contributing factors to the number of firearm deaths per 100,000 people and found:

(The above numbers were calculated from a gradient boosted model with a gaussian loss function. 5000 iterations were performed.)

One sees right away that the quality and number of gun laws a state has is the overwhelming factor in the number of gun-related deaths, with the amount of money a state spends on mental health services having a negligible effect.

Next, I created a regression tree to analyze this problem further. I found the following:

The numbers in the very last level of each tree indicate the number of gun-related deaths. One sees once again that where the individual state’s gun law grade is above 73.5%, that is, higher than a “B”, the number of gun-related deaths is at its lowest, at a predicted 5.7 per 100,000 people. (Note that the sum of squares error for this regression was found to be 3.838.) Interestingly, the regression tree also predicts that the highest numbers of gun-related deaths all occur for states that score an “F”!

In fact, using a Principal Component Analysis (PCA), and plotting the first two principal components, we find that:

One sees from this PCA that states that have a high gun-law grade have a low death rate.

Finally, using K-means clustering, I found the following:

One sees from the above results, the states that have a very low “Gun Law grade” are clustered together in having the highest firearms death rate. (See the fourth column in this matrix). That is, zooming in:

**What about Suicides?**

This question has been raised many times because the gun deaths number above includes the number of self-inflicted gun deaths. The argument has been that if we filter out this data from the gun deaths above, the arguments in this article fall apart. As I now show, this is in fact not the case. Using the state-by-state firearm suicide rate from (http://archinte.jamanetwork.com/article.aspx?articleid=1661390), I performed this filtering to obtain the following principal component analysis biplot:

One sees that the PCA puts approximately equal weight (loadings) onto population density, gun-law grade, and median household income. It is quite clear that states that have a very high gun-law grade have a low number of gun murders, and vice-versa.

**One sees that the data shows that there is a very large anti-correlation between a state’s gun law grade and the death rate.** There is also a very small anti-correlation between how much a state spends on mental health care and the death rate.

Therefore, the conclusions one can draw immediately are:

- **The number and quality of gun-control laws a state has *drastically* affects the number of gun-related deaths.**
- **Other factors like median household income play a smaller role in the number of gun-related deaths.**
- **Factors like the amount of money a state spends on mental-health care have a negligible effect on the number of gun-related deaths. This point is quite important, as there are a number of policy-makers who consistently argue that the focus needs to be on the mentally ill and that this will curb the number of gun-related deaths.**
- It would be interesting to apply these methodologies to data from other years. I will perhaps pursue this at a later time.

It has certainly become the talk of the town with *some* of the latest polls showing that Donald Trump is leading Hillary Clinton in a hypothetical 2016 matchup.

I decided to run my polling algorithm to simulate 100,000 election matchups between Clinton and Trump. I calibrated my model using a variety of data sources.

These were the results:

Based on these simulations, I conclude that:

I think in the era of the 24-hour news cycle, too much is made of one poll.

A great deal of noise has been made in the previous weeks about the surge in the polls of Donald Trump and Bernie Sanders. This has led some people to question whether Hillary Clinton will actually end up being the Democratic party nominee in 2016. This was further evidenced by the fact that Sanders is now leading Clinton in the latest New Hampshire polls.

However, running an analysis on current polling data, I believe that even though it is very early, Hillary Clinton still has the best chance of being the Democratic party nominee. In fact, running some algorithms against the current data, I found that:

**Hillary Clinton: chance of winning Democratic nomination.**

**Bernie Sanders: chance of winning Democratic nomination.**

These numbers were deduced from an algorithm that used non-parametric methods to obtain the following probability density functions.

Thanks to Hargun Singh Kohli for data compilation and research.

The purpose of this post is to determine whether an offensive strategy that involves predominantly shooting three-point shots is stable and optimal for basketball teams. We employ a game-theoretical approach using techniques from dynamical systems theory to show that taking more three-point shots, to the point where an offensive strategy depends predominantly on shooting threes, is not necessarily optimal. Whether it is depends on a combination of payoff constraints: one can establish conditions, via the global stability of equilibrium points in addition to Nash equilibria, under which a predominant two-point offensive strategy would be optimal as well. We perform a detailed fixed-points analysis to establish the local stability of a given offensive strategy. We finally prove the existence of Nash equilibria via global stability techniques using the monotonicity principle. We believe that this work demonstrates that the claim that teams should attempt more three-point shots because a three-point shot is worth more than a two-point shot is a highly ambiguous statement.

We are currently living in the age of analytics in professional sports, with a strong trend of their use developing in professional basketball. Indeed, perhaps one of the most discussed results to come out of the analytics era thus far is the claim that teams should shoot as many three-point shots as possible, largely because three-point shots are worth more than two-point shots, and this somehow is indicative of a very efficient offense. These ideas were mentioned, for example, by Alex Rucker, who said “When you ask coaches what’s better between a 28 percent three-point shot and a 42 percent midrange shot, they’ll say the 42 percent shot. And that’s objectively false. It’s wrong. If LeBron James just jacked a three on every single possession, that’d be an exceptionally good offense. That’s a conversation we’ve had with our coaching staff, and let’s just say they don’t support that approach.” It was also claimed in the same article that “The analytics team is unanimous, and rather emphatic, that every team should shoot more 3s including the Raptors and even the Rockets, who are on pace to break the NBA record for most 3-point attempts in a season.” These assertions were repeated here. In an article by John Schuhmann, it was claimed that “It’s simple math. A made three is worth 1.5 times a made two. So you don’t have to be a great 3-point shooter to make those shots worth a lot more than a jumper from inside the arc. In fact, if you’re not shooting a layup, you might as well be beyond the 3-point line. Last season, the league made 39.4 percent of shots between the restricted area and the arc, for a value of 0.79 points per shot. It made 36.0 percent of threes, for a value of 1.08 points per shot.”

The purpose of this paper is to determine whether an offensive strategy that involves predominantly shooting three-point shots is stable and optimal for basketball teams. We will employ a game-theoretical approach using techniques from dynamical systems theory to show that taking more three-point shots, to the point where an offensive strategy depends predominantly on shooting threes, is not necessarily optimal. Whether it is depends on a combination of payoff constraints: one can establish conditions, via the global stability of equilibrium points in addition to Nash equilibria, under which a predominant two-point offensive strategy would be optimal as well. *(Article research and other statistics provided by: Hargun Singh Kohli)*
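The “simple math” in the quotations above is an expected-points-per-shot calculation: make rate times shot value. A minimal sketch of that arithmetic, using only the percentages quoted above (the function and variable names are my own):

```python
def points_per_shot(make_rate, shot_value):
    """Expected points from one attempt: probability of a make times its value."""
    return make_rate * shot_value

# League-wide figures quoted from Schuhmann
league_midrange = points_per_shot(0.394, 2)
league_three = points_per_shot(0.360, 3)

# The two shots in Rucker's example
rucker_three = points_per_shot(0.28, 3)
rucker_midrange = points_per_shot(0.42, 2)

print(f"league midrange: {league_midrange:.2f}, league three: {league_three:.2f}")
print(f"28% three: {rucker_three:.2f}, 42% midrange: {rucker_midrange:.2f}")
```

This per-shot measure ignores how an opponent responds to a team's shot selection, which is exactly the interaction the game-theoretic model in this article is meant to capture.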

For our model, we consider two types of NBA teams. The first type are teams that employ two point shots as the predominant part of their offensive strategy, while the other type consists of teams that employ three-point shots as the predominant part of their offensive strategy. There are therefore two predominant strategies, which we will denote as , such that we define

We then let represent the number of teams using , such that the total number of teams in the league is given by

which implies that the proportion of teams using strategy is given by

The state of the population of teams is then represented by . It can be shown that the proportions of individuals using a certain strategy change in time according to the following dynamical system

subject to

where we have defined the average payoff function as
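The displayed equations for this passage did not survive in the text. The description matches the standard replicator dynamics of evolutionary game theory, so the following is a sketch of that standard form; the symbols $s_i$, $n_i$, $N$, $x_i$, $\pi$ and $\bar{\pi}$ are assumed notation and may differ from the original article.

$$
N = n_1 + n_2, \qquad x_i = \frac{n_i}{N}, \qquad \mathbf{x} = (x_1, x_2),
$$

$$
\dot{x}_i = x_i \left[ \pi(s_i, \mathbf{x}) - \bar{\pi}(\mathbf{x}) \right], \qquad \sum_{i} x_i = 1, \qquad \bar{\pi}(\mathbf{x}) = \sum_{i} x_i \, \pi(s_i, \mathbf{x}).
$$

Here $\pi(s_i, \mathbf{x})$ is the payoff to a team using strategy $s_i$ when the league is in state $\mathbf{x}$: strategies that earn more than the league average grow in proportion, and those that earn less shrink.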

Now, let represent the proportion of teams that predominantly shoot two-point shots, and let represent the proportion of teams that predominantly shoot three-point shots. Further, denoting the game action set to be , where represents a predominant two-point shot strategy, and represents a predominant three-point shot strategy. As such, we assign the following payoffs:

We therefore have that

From (6), we further have that

From Eq. (4) the dynamical system is then given by

,

,

subject to the constraint

Indeed, because of the constraint (10), the dynamical system is actually one-dimensional, which we write in terms of as

From Eq. (11), we immediately notice some things of importance. First, we are able to deduce just from the form of the equation what the invariant sets are. We note that for a dynamical system with flow , if we define a function such that , where , then, the subsets of defined by , and are invariant sets of the flow . Applying this notion to Eq. (11), one immediately sees that , , and are invariant sets of the corresponding flow. Further, there also exists a symmetry such that , which implies that without loss of generality, we can restrict our attention to .
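To make the one-dimensional reduction concrete, here is a sketch for a generic two-strategy game. The payoff names `a`, `b`, `c`, `d` are hypothetical placeholders (payoff to a two-point team against a two-point team, a two-point team against a three-point team, a three-point team against a two-point team, and a three-point team against a three-point team); they are not the article's own symbols, and the numeric values are made up purely for illustration. With `x` the proportion of two-point teams, the standard replicator equation reduces to `dx/dt = x (1 - x) [(a - c) x + (b - d)(1 - x)]`.

```python
def replicator_rhs(x, a, b, c, d):
    """Right-hand side of the reduced two-strategy replicator equation."""
    return x * (1.0 - x) * ((a - c) * x + (b - d) * (1.0 - x))

def integrate(x0, payoffs, dt=0.01, steps=20000):
    """Forward-Euler integration of dx/dt = replicator_rhs(x) starting from x0."""
    x = x0
    for _ in range(steps):
        x += dt * replicator_rhs(x, *payoffs)
    return x

# Hypothetical payoffs where the two-point strategy always earns more.
two_point_wins = integrate(0.1, (3.0, 2.0, 1.0, 1.0))
# Hypothetical payoffs where the three-point strategy always earns more.
three_point_wins = integrate(0.9, (1.0, 1.0, 3.0, 2.0))

print(two_point_wins, three_point_wins)
```

The factor `x (1 - x)` is what makes the all-two-point state and the all-three-point state invariant: a league that starts at either extreme stays there, and which extreme attracts nearby states depends only on the signs of the payoff differences.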

With the dynamical system in hand, we are now in a position to perform a fixed-points analysis. There are precisely three fixed points, which are invariant manifolds and are given by:

Note that, actually contains and as special cases. Namely, when , , and when , . We will therefore just analyze, the stability of . represents a state of the population where all teams predominantly shoot three-point shots. Similarly, represents a state of the population where all teams predominantly shoot two-point shots, We additionally restrict

which implies the following conditions on the payoffs:

With respect to a stability analysis of , we note the following. The point is a: • Local sink if: , • Source if: , • Saddle: if: , or .

What this last calculation shows is that the condition which always corresponds to the point , which corresponds to a dominant 3-point strategy always exists as a saddle point! That is, there will NEVER be a league that dominantly adopts a three-point strategy, at best, some teams will go towards a 3-point strategy, and others will not irrespective of what the analytics people say. This also shows that a team's basketball strategy really should depend on its respective payoffs, and not current "trends". This behaviour is displayed in the following plot.

Further, the system exhibits some bifurcations as well. In the neighbourhood of , the linearized system takes the form

Therefore, destabilizes the system at . Similarly, destabilizes the system at . Therefore, bifurcations of the system occur on the lines and in the four-dimensional parameter space.

With the preceding fixed-points analysis completed, we are now interested in determining global stability conditions. The main motivation is to determine the existence of any Nash equilibria that occur for this game via the following theorem: If is an asymptotically stable fixed point, then the symmetric strategy pair , with is a Nash equilibrium. We will primarily make use of the monotonicity principle, which says let be a flow on with an invariant set. Let be a function whose range is the interval , where , and . If is decreasing on orbits in , then for all ,

,

.

Consider the function

Then, we have that

For the invariant set , we have that . One can then immediately see that in ,

Therefore, by the monotonicity principle,

Note that the conditions and correspond to above. In particular, for , , which implies that is globally stable. Therefore, under these conditions, the symmetric strategy is a Nash equilibrium. Now, consider the function

We can therefore see that

Clearly, in if for example and . Then, by the monotonicity principle, we obtain that

Note that the conditions and correspond to above. In particular, for , , which implies that is globally stable. Therefore, under these conditions, the symmetric strategy is a Nash equilibrium. In summary, we have just shown that for the specific case where and , the strategy is a Nash equilibrium. On the other hand, for the specific case where and , the strategy is a Nash equilibrium.

### 5. Discussion

In the previous section which describes global results, we first concluded that for the case where and , the strategy is a Nash equilibrium. The relevance of this is as follows. The condition on the payoffs thus requires that

That is, given the strategy adopted by the other team, neither team could increase their payoff by adopting another strategy if and only if the condition in (23) is satisfied. Given these conditions, if one team has a predominant two-point strategy, it would be the other team’s best response to also use a predominant two-point strategy. We also concluded that for the case where and , the strategy is a Nash equilibrium. The relevance of this is as follows. The condition on the payoffs thus requires that

That is, given the strategy adopted by the other team, neither team could increase their payoff by adopting another strategy if and only if the condition in (24) is satisfied. Given these conditions, if one team has a predominant three-point strategy, it would be the other team’s best response to also use a predominant three-point strategy. Further, we also showed that is globally stable under the conditions in (23). That is, if these conditions hold, every team in the NBA will eventually adopt an offensive strategy predominantly consisting of two-point shots. The conditions in (24) were shown to imply that the point is globally stable. This means that if these conditions now hold, every team in the NBA will eventually adopt an offensive strategy predominantly consisting of three-point shots. We also provided, through a careful stability analysis of the fixed points, criteria for the local stability of strategies. For example, we showed that a predominant three-point strategy is locally stable if , while it is unstable if . In addition, a predominant two-point strategy was found to be locally stable when , and unstable when . There is also the key point of which one of these strategies has the highest probability of being executed. We know that

That is, the payoff to a team using strategy in a league with profile is proportional to the probability of this team using strategy . We therefore see that a team’s optimal strategy would be that for which they could maximize their payoff, that is, for which is a maximum, while keeping in mind the strategy of the other team, hence, the existence of Nash equilibria. **Hopefully, this work also shows that the concept that teams should attempt more three-point shots because a three-point shot is worth more than a two-point shot is a highly ambiguous statement. In actuality, one needs to analyze what offensive strategy is optimal which is constrained by a particular set of payoffs.**

Continuing the debate of the value of three-point shooting in today’s NBA, my article analyzing this issue from a mathematical perspective has now been published on the arXiv, check it out!

The purpose of this post is to demonstrate some very beautiful (I think!) mathematics that arises from Darwinian evolutionary theory. It is a real shame that most courses and discussions dealing with evolution **never** introduce any type of mathematical formalism, which is very strange, since at the most fundamental levels, evolution must also be governed by quantum mechanics and electromagnetism, from which chemistry and biochemistry arise via top-down and bottom-up causation. See this article by George Ellis for more on the role of top-down causation in the universe and the hierarchy of physical matter. Indeed, my personal belief is that if some biologists and evolutionary biologists like Dawkins, Coyne, and others took the time to explain evolution with some modicum of mathematical formalism to properly describe the underlying mechanics instead of using it as an opportunity to attack religious people, the world would be a much better place, and the dialogue between science and religion would be much more smooth and intelligible.

In this post today, I will describe some formalism behind the phenomenon of *prebiotic evolution*. It turns out that there is a very good book by Claudius Gros on understanding evolution as a complex dynamical system (dynamical systems theory is my main area of research), and the interested reader should check out his book for more details on what follows below.

We can for simplicity consider a quasispecies as a system of macromolecules that have the ability to carry information, and consider the dynamics of the concentrations of the constituent molecules as the following dynamical system:

,

where are the concentrations of molecules, is the autocatalytic self-replication rate, and are mutation rates.

From this, we can consider the following catalytic reaction equations:

,

,

are the concentrations, are the autocatalytic growth rates, and are the transmolecular catalytic rates. We choose such that

.

Clearly:

,

that is, this quick calculation shows that the total concentration remains constant.
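The displayed equations in this passage were lost from the text. The surrounding definitions match the standard quasispecies and hypercycle equations, so the following is a sketch in assumed notation ($x_i$, $W_{ij}$, $\lambda_i$, $\kappa_{ij}$ and $\phi$ are my labels and may differ from the original). The quasispecies dynamics read

$$
\dot{x}_i = W_{ii} x_i + \sum_{j \neq i} W_{ij} x_j ,
$$

and the catalytic reaction equations read

$$
\dot{x}_i = x_i \left( \lambda_i + \sum_{j} \kappa_{ij} x_j - \phi \right), \qquad \phi = \sum_{k} x_k \left( \lambda_k + \sum_{j} \kappa_{kj} x_j \right).
$$

Summing over $i$ gives

$$
\frac{d}{dt} \sum_i x_i = \phi \left( 1 - \sum_i x_i \right),
$$

which vanishes whenever the concentrations are normalized to $\sum_i x_i = 1$. This is the sense in which the choice of $\phi$ keeps the total concentration constant.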

Let us consider now the case of homogeneous interactions such that

, , ,

which leads to

,

which becomes

.

This is a one-dimensional ODE with the following invariant submanifolds:

,

.

With homogeneous interactions, the concentrations with the largest growth rates will dominate, so there exists a such that where

,

.

The quantities and are determined via normalization conditions that give us a system of equations:

,

.

**For large , we obtain the approximation**

**,**

**which is the number of surviving species.**

Clearly, this is non-zero for a finite catalytic rate . This shows the formation of a hypercycle of molecules/quasispecies.
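The number of surviving species can be checked numerically. The sketch below assumes the standard homogeneous hypercycle model, since the displayed equations did not survive here: growth rates `lambda_i = alpha * i`, no self-catalysis, a common catalytic rate `kappa` between distinct species, and normalized concentrations, so that survivors satisfy `x_i = (lambda_i + kappa - phi) / kappa`. All names and parameter values are illustrative assumptions. Under these assumptions the survivor count works out to roughly `sqrt(2 * kappa / alpha)` for large systems, which I take to be the approximation referred to above.

```python
import math

def surviving_species(n_species, alpha, kappa):
    """Fixed point of the assumed homogeneous hypercycle model.

    Returns (number of survivors, flux phi, survivor concentrations).
    Survivors are the species with the largest growth rates.
    """
    growth = [alpha * i for i in range(1, n_species + 1)]
    best = None
    for m in range(1, n_species + 1):
        top = growth[-m:]  # the m fastest-growing species
        # Normalization sum(x_i) = 1 over the survivors fixes phi.
        phi = (sum(top) + (m - 1) * kappa) / m
        conc = [(lam + kappa - phi) / kappa for lam in top]
        if min(conc) > 0:
            best = (m, phi, conc)
        else:
            break  # adding slower species would force a negative concentration
    return best

count, phi, conc = surviving_species(n_species=50, alpha=1.0, kappa=50.0)
estimate = math.sqrt(2 * 50.0 / 1.0)
print(count, estimate)
```

For these values the exact count and the square-root estimate agree, and only the fastest-growing species end up with non-zero concentration while the rest die out.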

These computations clearly should be taken with a grain of salt. As pointed out in several sources, hypercycles describe closed systems, but life exists in an open system driven by an energy flux. The interesting thing is that, despite this, the very last calculation shows a clear division between molecules that can be considered a type of primordial life-form and those molecules belonging to the environment.