Coronavirus Predictions

By: Dr. Ikjyot Singh Kohli

I wrote an extensive script in R that takes the most recent available data on the number of new/confirmed COVID-19 cases per day by location and uses statistical learning to compute the probability that a selected location will observe a new COVID-19 case. You can access the dashboard by clicking the image below. (Beneath the screenshot are further examples of possible selections.)

Here, we see a map of all current COVID-19 locations, along with the ability to select a specific location. Further, there are two calculations at the bottom of the screen: the first is the probability that the selected location(s) will observe a new case, and the second is the current long-term trend of the daily growth rate of new cases for the selected location(s).
In this example, we have asked to return locations within the US that have more than an 87% probability of observing a new case. We can also see that for these locations, the long-term growth rate is trending towards 0.80.

Did Clyburn Help Biden in South Carolina?

By: Dr. Ikjyot Singh Kohli

The conventional wisdom among the political pundits/analysts seeking to explain Joe Biden’s massive win in the 2020 South Carolina primary is that Jim Clyburn’s endorsement was the sole reason why Biden won. (Here is just one article describing this.)

I wanted to analyze the data behind this and actually measure the Clyburn effect. Clyburn formally endorsed Biden on February 26, 2020.

Using extensive polling data from RealClearPolitics, I looked at Biden’s margin of victory according to various polling samples taken before the Clyburn endorsement. I used Kernel Density Estimation to form the following probability density function of Biden’s predicted margin of victory (as a percentage of the popular vote) in the 2020 South Carolina Primary:

Assuming this probability density function has the form p(x), we notice some interesting properties:

  • The Expected Margin of Victory for Biden is given by \int x p(x) dx. Using numerical integration, we find that \int x p(x) dx = 18.513 \%. The variance of this prediction is var(x) = \int x^2 p(x) dx - (\int x p(x) dx)^2 = 107.79, so its standard deviation is \sqrt{107.79} = 10.382. This means that the predicted Biden margin of victory is 18.51 \pm 10.38 \%, and the upper bound of this prediction is 28.89%. That is, according to the data before Clyburn’s endorsement, it was perfectly reasonable to expect that Biden’s victory in South Carolina could have been around 29%. Indeed, Biden’s final margin of victory in South Carolina was 28.5%, which is within one standard deviation of the prediction. Therefore, it seems unlikely that Jim Clyburn’s endorsement boosted Biden’s victory in South Carolina.
  • Given the density function above, we can make some more interesting calculations:
  • P(Biden win > 5%) = 1 - \int_{-\infty}^{5} p(x) dx = 0.904 = 90.4%
  • P(Biden win > 10%) = 1 - \int_{-\infty}^{10} p(x) dx = 0.799 = 79.9%
  • P(Biden win > 15%) = 1 - \int_{-\infty}^{15} p(x) dx = 0.710 = 71.0%
  • P(Biden win > 20%) = 1 - \int_{-\infty}^{20} p(x) dx = 0.567 = 56.7%

What these calculations show is that the probability that Biden would have won by more than 5% before Clyburn’s endorsement was 90.4%. The probability that Biden would have won by more than 10% before Clyburn’s endorsement was 79.9%. The probability that Biden would have won by more than 20% before Clyburn’s endorsement was 56.7%, and so on.
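The calculations above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `margins` list is hypothetical and is not the RealClearPolitics data, the kernel is assumed to be Gaussian, and the bandwidth uses the normal-reference rule of thumb, none of which are confirmed by the original analysis. With a Gaussian kernel, the mean, variance, and tail probabilities of p(x) have closed forms, so no numerical integration is needed.

```python
import math

# Hypothetical poll margins (Biden lead in percentage points).
# These are illustrative numbers, NOT the RealClearPolitics data.
margins = [4.0, 9.0, 13.0, 15.0, 18.0, 20.0, 24.0, 28.0, 35.0]


def bandwidth(xs):
    """Normal-reference rule of thumb: h = 1.06 * s * n^(-1/5) (an assumption)."""
    n = len(xs)
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return 1.06 * s * n ** (-0.2)


def kde_pdf(x, xs, h):
    """Gaussian kernel density estimate p(x)."""
    norm = len(xs) * h * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in xs) / norm


def kde_mean(xs):
    """Integral of x p(x) dx; for a Gaussian KDE this is the sample mean."""
    return sum(xs) / len(xs)


def kde_variance(xs, h):
    """var(x) of a Gaussian KDE: population variance of the sample plus h^2."""
    m = kde_mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs) + h ** 2


def kde_tail(t, xs, h):
    """P(X > t) = 1 - integral_{-inf}^{t} p(x) dx, averaged over the kernels."""
    z = [(t - xi) / (h * math.sqrt(2.0)) for xi in xs]
    return sum(0.5 * math.erfc(v) for v in z) / len(xs)


h = bandwidth(margins)
mean = kde_mean(margins)
std = math.sqrt(kde_variance(margins, h))
print(f"expected margin: {mean:.2f} +/- {std:.2f}")
for t in (5, 10, 15, 20):
    print(f"P(margin > {t}%) = {kde_tail(t, margins, h):.3f}")
```

Swapping in the actual poll margins would give the corresponding expected margin, standard deviation, and tail probabilities under these assumptions.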

Given these calculations, it actually seems unlikely that Clyburn’s endorsement made a huge impact on Biden’s win in South Carolina. This analysis shows that Biden would have likely won by 15%-20% or more regardless.

A Problem With Offensive Rating

Abstract: It is shown that the standard/common definition of team offensive rating/offensive efficiency implies that a team’s offensive rating increases as its opponent’s offensive rebounds increase, which, in principle, should not be the case.

Over the past several years, the advanced metric known as Offensive Rating has become the standard way of measuring a basketball team’s offensive efficiency. Broadly speaking, it is defined as points scored per 100 possessions. Specifically, for teams, it is defined as follows (see https://www.basketball-reference.com/about/ratings.html, https://www.nbastuffer.com/analytics101/possession/, and https://fansided.com/2015/12/21/nylon-calculus-101-possessions/):

[Equation image: definition of team offensive rating]

There is a significant issue with this definition, as I now demonstrate. Computing the partial derivative of this expression with respect to OppORB, we obtain:

[Equation image: partial derivative of offensive rating with respect to OppORB]

As the denominator is always positive, we would like to examine the numerator. The numerator is always negative due to physical constraints (i.e., can’t have negative points or rebounds!) and if OppFG < OppFGA, which makes intuitive sense. It is only positive if OppFG > OppFGA, which logically cannot happen. Therefore, this numerator is always negative (except for the rare case when OppFG = OppFGA of course), which means that the entire partial derivative is positive.
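The sign of this derivative can be checked numerically. The sketch below assumes the possession estimate given in the Basketball-Reference glossary, Poss = 0.5 * ((FGA + 0.4 FTA - 1.07 (ORB / (ORB + OppDRB)) (FGA - FG) + TOV) + the same expression with team and opponent swapped); whether this matches the expression used in this post exactly is an assumption. The box-score numbers are hypothetical and serve only to exercise the formula.

```python
def possessions(tm, opp):
    """Possession estimate (assumed Basketball-Reference form)."""
    def half(a, b):
        orb_share = a["ORB"] / (a["ORB"] + b["DRB"])
        missed = a["FGA"] - a["FG"]
        return a["FGA"] + 0.4 * a["FTA"] - 1.07 * orb_share * missed + a["TOV"]
    return 0.5 * (half(tm, opp) + half(opp, tm))


def ortg(tm, opp):
    """Offensive rating: points scored per 100 possessions."""
    return 100.0 * tm["PTS"] / possessions(tm, opp)


def d_ortg_d_opp_orb(tm, opp):
    """Analytic partial derivative of ORtg with respect to OppORB."""
    poss = possessions(tm, opp)
    missed = opp["FGA"] - opp["FG"]
    denom = poss ** 2 * (opp["ORB"] + tm["DRB"]) ** 2
    return 100.0 * tm["PTS"] * 0.5 * 1.07 * missed * tm["DRB"] / denom


def finite_difference(tm, opp, eps=1e-4):
    """Central finite difference in OppORB, as an independent check."""
    up = dict(opp, ORB=opp["ORB"] + eps)
    down = dict(opp, ORB=opp["ORB"] - eps)
    return (ortg(tm, up) - ortg(tm, down)) / (2.0 * eps)


# Hypothetical single-game box scores (illustrative only).
team = {"FGA": 88, "FG": 41, "FTA": 22, "ORB": 10, "DRB": 34, "TOV": 14, "PTS": 110}
opponent = {"FGA": 90, "FG": 40, "FTA": 20, "ORB": 11, "DRB": 33, "TOV": 13, "PTS": 104}

print("ORtg:", round(ortg(team, opponent), 2))
print("analytic derivative:", d_ortg_d_opp_orb(team, opponent))
print("finite difference:  ", finite_difference(team, opponent))
```

Under this assumed formula the derivative is positive whenever OppFGA > OppFG, so each additional opponent offensive rebound raises the team's computed offensive rating.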

This means that a team’s offensive rating / offensive efficiency increases as its opponent’s offensive rebounds increase. Intuitively, this shouldn’t be the case. If your opponent has a high number of offensive rebounds, this should give you fewer possessions and put pressure on you to score, thus resulting in fewer points overall. The more general definition of offensive efficiency is 100*(Points Scored)/(Possessions), which is obviously maximized when possessions are minimized. The problem is that the more detailed definition of possessions implies that this minimization of possessions occurs at the cost of maximizing opponent offensive rebounds, which intuitively should not be the case.

NBA Analytics Dashboard

Here is an embedded dashboard that shows a number of statistical insights for NBA teams, their opponents, and individual players. You can compare multiple teams and players. Navigate through the different pages by clicking the scrolling arrow below. (The data is based on the most recent season’s “per-game” numbers.)

(If you cannot see the dashboard embedded below for whatever reason, click here to be taken directly to the dashboard in a separate page.)

The Probability of An Illegal Immigrant Committing a Crime In The United States

Trump has once again put The U.S. on the world stage, this time at the expense of innocent children whose families are seeking asylum. The Trump administration’s justification is that:

“They want to have illegal immigrants pouring into our country, bringing with them crime, tremendous amounts of crime.”

I decided to try to analyze this statement quantitatively. Indeed, one can calculate the probability that an illegal immigrant will commit a crime within The United States as follows. Let us denote crime (or criminal) by C, while denoting illegal immigrant by ii. Then, by Bayes’ theorem, we have:

\boxed{P(C | ii) = \frac{P(ii | C) P(C)}{P(ii)}}

It is quite easy to find data associated with the various factors in this formula. For example, one finds that

  1. P(ii | C) = 0.21
  2. P(C) = 0.02
  3. P(ii) = 0.037

Putting all of this together, we find that:

P(C|ii) = 0.1135 = 11.35 \%
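The arithmetic behind this figure is a single application of Bayes’ theorem and can be reproduced directly. The function name `posterior` is introduced here purely for illustration; the three inputs are the values quoted above.

```python
def posterior(p_ii_given_c, p_c, p_ii):
    """Bayes' theorem: P(C | ii) = P(ii | C) * P(C) / P(ii)."""
    return p_ii_given_c * p_c / p_ii


# Inputs quoted in the text: P(ii | C) = 0.21, P(C) = 0.02, P(ii) = 0.037.
p = posterior(0.21, 0.02, 0.037)
print(f"P(C | ii) = {p:.4f} = {100 * p:.2f}%")  # P(C | ii) = 0.1135 = 11.35%
```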

That is, the probability that an illegal immigrant will commit a crime (of any type) while in The United States is a very low 11.35%.

Therefore, Trump’s claim of “tremendous amounts of crime” being brought to The United States by illegal immigrants is incorrect.

Note that the numerical factors used above were obtained from:

  1. https://www.justice.gov/opa/pr/departments-justice-and-homeland-security-release-data-incarcerated-aliens-94-percent-all
  2. https://www.washingtontimes.com/news/2017/aug/1/immigrants-22-percent-federal-prison-population/
  3. https://en.wikipedia.org/wiki/Incarceration_in_the_United_States

The Risk of The 3-Point Shot

As more and more teams increase the number of threes they attempt on the mistaken belief that this somehow leads to an efficient offense, we show below that it is in fact in a team’s opponent’s interest for that team to attempt as many three-point shots as possible.

Looking at this season’s data, let us examine two things. The first is the number of points a team’s opponent is expected to score for every three-point shot the other team attempts. We discovered that, remarkably, the number of points obeys a lognormal distribution:

\boxed{P(X) = \frac{2.86089 e^{-25.713 (\log (X)-1.3119)^2}}{X}}

This means that for every three-point shot your team attempts, the opposing team is expected to score

\boxed{\int X P(X) dX = 1.87475\, -1.87475 \text{erf}(6.75099\, -5.0708 \log (X))}

which, evaluated between X = 0 and X \to \infty, comes out to about 3.7495 points. So, for every 3PA by a team, the opponent is expected to score more than 3 points based on the most recent NBA data. Keeping that in mind, we also see by integrating P(X) above that there is a 99.99% probability that the opponent will score more than 2 points for every 3PA by a team, and a 93.693% probability that the opponent will score more than 3 points for every single 3PA by the other team.
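These figures can be reproduced from the closed-form properties of a lognormal distribution. The sketch below reads the parameters off the boxed density: the coefficient 25.713 in the exponent is taken to be 1/(2\sigma^2) and 1.3119 to be \mu. The function names are introduced here for illustration only.

```python
import math

# Parameters read off the density for opponent points per three-point attempt.
MU = 1.3119   # mean of log(X)
A = 25.713    # exponent coefficient, interpreted as 1 / (2 * sigma^2)
SIGMA = math.sqrt(1.0 / (2.0 * A))


def lognormal_mean(mu, sigma):
    """E[X] = exp(mu + sigma^2 / 2) for a lognormal distribution."""
    return math.exp(mu + 0.5 * sigma ** 2)


def lognormal_tail(t, mu, sigma):
    """P(X > t) = 0.5 * erfc((log t - mu) / (sigma * sqrt(2)))."""
    return 0.5 * math.erfc((math.log(t) - mu) / (sigma * math.sqrt(2.0)))


print(f"sigma          = {SIGMA:.5f}")
print(f"expected value = {lognormal_mean(MU, SIGMA):.4f}")
print(f"P(X > 2)       = {lognormal_tail(2.0, MU, SIGMA):.6f}")
print(f"P(X > 3)       = {lognormal_tail(3.0, MU, SIGMA):.5f}")
```

The prefactor 2.86089 in the density is 1/(\sigma \sqrt{2\pi}) for this value of \sigma, which is a quick consistency check on the reading of the parameters.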

This would suggest a significant breakdown of defensive emphasis in the “modern-day” NBA where evidently teams are just interested in playing shot-for-shot basketball, but in a very risky way that is not optimal.

The work so far covered just three-point attempts, but what are the effects of missing a three-point shot? The number of opponent points per three-point miss also, remarkably, obeys a lognormal distribution:

\boxed{P(X) = \frac{2.81227 e^{-24.8464 (\log (X)-1.7605)^2}}{X}}

Therefore, for every three-point shot your team misses, the opposing team is expected to score:

\boxed{\int X P(X) dX = 2.93707\, -2.93707 \text{erf}(8.87571\, -4.98461 \log (X))}

which, evaluated between X = 0 and X \to \infty, comes out to about 5.87 points. This identifies a remarkable risk to a team missing a three-point shot. This computation shows that one three-point shot miss corresponds to about 6 points for the opposing team! Looking at probabilities by integrating the density function above, one can show that there is a 99.9999% probability that the opposing team would score more than two points for every three-point miss, a 99.998% probability that the opposing team would score more than three points for every three-point miss, a 99.583% probability that the opposing team would score more than four points for every three-point miss, and so on.
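As an independent check that does not rely on the closed-form antiderivative, the density for opponent points per three-point miss can be integrated numerically. The sketch below uses composite Simpson's rule; the integration limits 0.5 and 30 are an assumption, chosen because the density is negligible outside that range, and the function names are illustrative.

```python
import math


def density(x):
    """Lognormal density for opponent points per three-point miss (from the text)."""
    return 2.81227 * math.exp(-24.8464 * (math.log(x) - 1.7605) ** 2) / x


def simpson(f, a, b, n=20000):
    """Composite Simpson's rule on [a, b] with an even number of subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


LO, HI = 0.5, 30.0  # assumed limits; the density is effectively zero beyond them

area = simpson(density, LO, HI)                     # should be close to 1
mean = simpson(lambda x: x * density(x), LO, HI)    # expected opponent points
tail4 = simpson(density, 4.0, HI)                   # P(X > 4)

print(f"normalization  = {area:.5f}")
print(f"expected value = {mean:.4f}")
print(f"P(X > 4)       = {tail4:.5f}")
```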

What these calculations demonstrate is that gearing a team’s offense to focus on attempting three-point shots is remarkably risky, especially if a team misses a three-point shot. Given that the average number of three-point attempts has been increasing over the last several years, while the average number of makes has stayed relatively the same (see this older article: https://relativitydigest.com/2016/05/26/the-three-point-shot-myth-continued/), teams are exposing themselves to greater and greater risk of losing games by adopting this style of play.

New Article Published in Journal of Geometry and Physics

Our new article was recently published in The Journal of Geometry and Physics. It is shown that, under certain conditions, The Einstein Field Equations have the same form as a fold bifurcation seen in Dynamical Systems theory, showing an even deeper connection between General Relativity and Dynamical Systems theory! (You can click the image below to be taken to the article.)