## NCAA March Madness 2017 Predictions

Update (March 18, 2017): In a stunning upset, Wisconsin just beat Villanova. It is easy to see why this happened based on the factor relevance list below. To win games, Villanova has relied heavily on moving the ball, while Wisconsin has relied heavily on opposing assists! Wisconsin had a mere 5 assists in the whole game today; great defense by them.

Original Article: March 16, 2017

So, I’m a bit late with these this year, but it’s only the first day of the tournament as I write this (teaching 2 courses in 1 semester tends to take up A LOT of one’s time!). Anyway, I used machine learning, specifically a neural network, to predict who is going to win the NCAA tournament this year.

To do this, I trained a neural network model on the last 17 seasons of NCAA regular-season team data.

The first thing I looked for was which predictor variables are most relevant to a team’s NCAA championship success:

1. Free Throws Made : 99.99% relevance
2. Opponent Assists : 55.86% relevance
3. Opponent Field Goal Attempts : 31.44% relevance
4. Free Throws Attempted : -83.13% relevance
5. Opponent Field Goals Made: -69.2% relevance

It is interesting that the most important factor in deciding whether or not a team wins the NCAA tournament is actually free throw percentage. In other words, schools that have a knack for shooting a high free throw percentage seem to have the highest probability of winning the NCAA tournament. (Points 1 and 4 in the list above together translate to a high free throw percentage: many free throws made relative to free throws attempted.) Of course, with a neural network the relationship between these predictors and the output is not necessarily linear, so other factors could play a strong role as well.

The neural network structure used looked like this:

Now, for the results:

| School Name | Probability of Winning Tournament |
| --- | --- |
| Villanova | 0.9294916774 |
| Gonzaga | 0.8076801 |
| Baylor | 0.716319 |
| Arizona | 0.5516670309 |
| Duke | 0.005617711 |
| Saint Mary’s | 0.0048923492 |
| Wichita St. | 0.001208123 |
| Purdue | 0.001180955 |
| SMU | 0.0008327729 |
| North Carolina | 0.0006080225 |
| UCLA | 0.0003794108 |
| S. Dakota St. | 0.0003186754 |
| Oregon | 0.0002288606 |
| Princeton | 0.0002107522 |
| Wisconsin | 0.000206285 |
| Northwestern | 0.0001878604 |
| Cincinnati | 0.0001875887 |
| Marquette | 0.0001828106 |
| Virginia | 0.0001532999 |
| Kent St. | 0.0001353252 |
| Miami | 0.0001338989 |
| Fla. Gulf Coast | 0.0001308963 |
| Vermont | 0.0001288239 |
| Notre Dame | 0.0001278009 |
| Minnesota | 0.0001277032 |
| New Mexico State | 0.0001276369 |
| USC | 0.0001274456 |
| Middle Tenn. | 0.0001268802 |
| Florida | 0.0001265646 |
| Texas Southern | 0.0001265547 |
| Xavier | 0.0001264269 |
| Vanderbilt | 0.0001262982 |
| Michigan | 0.0001261976 |
| East Tenn. St. | 0.0001261878 |
| Nevada | 0.0001261331 |
| Butler | 0.0001260504 |
| Louisville | 0.0001260042 |
| Troy | 0.0001259668 |
| Dayton | 0.0001259567 |
| Arkansas | 0.0001259387 |
| Michigan St. | 0.0001259298 |
| Oklahoma St. | 0.0001259287 |
| Winthrop | 0.0001259213 |
| Iona | 0.0001259197 |
| Jacksonville St. | 0.0001259174 |
| Creighton | 0.0001259092 |
| West Virginia | 0.0001259032 |
| North Carolina-Wilmington | 0.0001259012 |
| Northern Ky. | 0.0001259000 |
| Kansas | 0.0001258950 |
| Iowa St. | 0.0001258950 |
| Bucknell | 0.0001258945 |
| Florida St. | 0.0001258939 |
| Kentucky | 0.0001258939 |
| Virginia Tech | 0.0001258938 |
| Seton Hall | 0.0001258937 |
| Maryland | 0.0001258936 |
| North Dakota | 0.0001258936 |
| South Carolina | 0.0001258935 |
| Rhode Island | 0.0001258934 |
| Kansas St. | 0.0001258933 |
| Mount St. Mary’s | 0.0001258932 |
| VCU | 0.0001258931 |
| UC Davis | 0.0001258929 |

This neural network model predicts that the team most likely to win the NCAA tournament this year is Villanova with a score of 92.95%, followed by Gonzaga with 80.77%, Baylor with 71.63%, and Arizona with 55.17%. Note that these values do not sum to 1 across teams, so they are best read as per-team scores rather than as a single probability distribution over champions.
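One way to put the scores above on a common scale is to rescale them so that they sum to 1. The sketch below does this for the top five values from the table. The rescaling step is purely an illustrative assumption on my part and was not part of the model itself.

```python
# Top five raw network outputs, copied from the results table.
raw_scores = {
    "Villanova": 0.9294916774,
    "Gonzaga": 0.8076801,
    "Baylor": 0.716319,
    "Arizona": 0.5516670309,
    "Duke": 0.005617711,
}


def normalize(scores):
    """Rescale scores so they sum to 1 (illustrative assumption, not the original method)."""
    total = sum(scores.values())
    return {team: value / total for team, value in scores.items()}


relative = normalize(raw_scores)
for team, share in sorted(relative.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{team:10s} {share:.3f}")
```

The ranking is unchanged by the rescaling; only the scale of the numbers is.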

## So, What’s Wrong with the Knicks?

As I write this post, the Knicks are 12th in the Eastern Conference with a record of 22-32. A plethora of people are offering opinions on what is wrong with the Knicks and, of course, with most of it coming from ESPN and the New York media, most of it is incorrect or useless. Here are some examples:

A while ago, I wrote this paper, based on statistical learning, that shows the common characteristics of NBA playoff teams. Basically, I obtained the following important result:

This classification tree shows, along with arguments in the paper, that while the most important factor in teams making the playoffs tends to be the opponent’s number of assists per game, there are paths to the playoffs where teams are not necessarily strong in this area. Specifically, for the Knicks, as of today, we see that:

opp. Assists / game: 22.4 > 20.75, STL / game: 7.2 < 8.0061, TOV / game: 14.1 < 14.1585, DRB / game: 33.8 > 29.9024, opp. TOV / game: 13.0 < 13.1585.

So, one sees that what is keeping the Knicks out of the playoffs is specifically pressure defense: they are not forcing enough turnovers per game. Ironically, at 13.0 against a threshold of 13.1585, they are very close, but it is not enough.
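The comparisons above can be laid out in a few lines of Python. This is only a sketch: it checks each quoted Knicks value against its quoted threshold and reports the gap. It does not reproduce the structure of the classification tree itself, which is only available in the figure.

```python
# (Knicks value, tree threshold) pairs quoted in the text.
knicks_vs_threshold = {
    "opp. AST/game": (22.4, 20.75),
    "STL/game": (7.2, 8.0061),
    "TOV/game": (14.1, 14.1585),
    "DRB/game": (33.8, 29.9024),
    "opp. TOV/game": (13.0, 13.1585),
}


def compare(value, threshold):
    """Report which side of a split a value falls on, and by how much."""
    side = "above" if value > threshold else "below"
    return side, abs(value - threshold)


report = {stat: compare(v, t) for stat, (v, t) in knicks_vs_threshold.items()}
for stat, (side, gap) in report.items():
    print(f"{stat:15s} {side} threshold by {gap:.4f}")
```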

A probability density approximation of the Knicks’ Opp. TOV/G is as follows:

This PDF has the approximate functional form:

P(oTOV) =

Therefore, by computing:

$\int_{A}^{\infty} P(oTOV) d(oTOV)$,

=

,

where Erfc is the complementary error function, and is given by:

$erfc(z) = \frac{2}{\sqrt{\pi}} \int_{z}^{\infty} e^{-t^2} dt$

Given that the threshold for playoff-bound teams is more than 13.1585 opp. TOV/game, setting A = 13 above, we obtain 0.435. This means that the Knicks have roughly a 43.5% chance of forcing more than 13 TOV in any single game. Similarly, setting A = 14, one obtains 0.3177, which means that the Knicks have roughly a 31.77% chance of forcing more than 14 TOV in any single game, and so forth.

Therefore, one concludes that while the Knicks’ problems are defensive in nature, they are specifically related to pressure defense and forcing turnovers.
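The fitted density itself is not reproduced here, so the following is only a sketch under the assumption that it is Gaussian, which is what the appearance of the complementary error function suggests. Under that assumption, the tail probability is $\frac{1}{2}\,\mathrm{erfc}\left(\frac{A - \mu}{\sigma\sqrt{2}}\right)$, and the two values quoted above (0.435 at A = 13 and 0.3177 at A = 14) are enough to back out an implied mean and standard deviation. The names `mu` and `sigma` are my own, and the implied values are a back-of-the-envelope reconstruction, not the fitted parameters.

```python
import math
from statistics import NormalDist


def gaussian_tail(a, mu, sigma):
    """P(X > a) for X ~ Normal(mu, sigma), written with the complementary error function."""
    return 0.5 * math.erfc((a - mu) / (sigma * math.sqrt(2.0)))


# Tail probabilities quoted in the text for A = 13 and A = 14.
p13, p14 = 0.435, 0.3177

# ASSUMPTION: the density is Gaussian. Then (A - mu) / sigma equals the
# standard normal quantile of 1 - P(X > A); two such equations fix mu and sigma.
z13 = NormalDist().inv_cdf(1.0 - p13)
z14 = NormalDist().inv_cdf(1.0 - p14)
sigma = (14.0 - 13.0) / (z14 - z13)
mu = 13.0 - z13 * sigma

print(f"implied mu = {mu:.2f}, sigma = {sigma:.2f}")
print(f"P(oTOV > 13.1585) = {gaussian_tail(13.1585, mu, sigma):.3f}")
```

By construction, `gaussian_tail(13, mu, sigma)` and `gaussian_tail(14, mu, sigma)` return the two quoted values, and the last line evaluates the tail at the actual playoff threshold.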

By: Dr. Ikjyot Singh Kohli

## Some Thoughts on The US GDP

Here are some thoughts on the US GDP based on some data I’ve been looking at recently, mostly motivated by some Donald Trump supporters who have been criticizing President Obama’s record on the GDP and the economy.

First, analyzing the real GDP’s average growth per year, we obtain the following (based on a least squares regression analysis):

According to these calculations, President Clinton’s economic policies led to the best average GDP growth rate at \$436 billion / year. President Reagan and President Obama have almost identical average GDP growth rates in the neighbourhood of \$320 billion / year. However, an obvious caveat is that President Obama’s GDP record is still missing two years of data, so I will need to revisit these calculations in two years! Also, it should be noted that, historically, the US GDP has grown at an average of about \$184 billion / year.
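For readers who want to see how an average growth per year comes out of a least squares fit, here is a minimal sketch: the slope of the fitted line through (year, GDP level) points is the average growth per year. The series below is a placeholder made up for illustration, not actual GDP data, and the function name is my own.

```python
def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Placeholder series (NOT actual GDP figures): a level that rises by
# exactly 300 units per year, so the fitted slope should be 300.
years = [2001, 2002, 2003, 2004, 2005]
levels = [10_000 + 300 * (year - 2001) for year in years]

slope = least_squares_slope(years, levels)
print(f"average growth per year: {slope:.1f}")
```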

The second point I wanted to address concerns the several Trump supporters who keep comparing the average annual percentage change in real GDP under President Reagan and President Obama. Although they cite the averages, they do not mention the standard deviations! Computing these, we find that:

Looking at these calculations, we find that Presidents Clinton and Obama had the most stable year-to-year real GDP % growth. Presidents Bush and Reagan had highly unstable GDP growth, with President Bush’s being far worse than President Reagan’s. Further, Trump supporters and most Republicans seem quick to point out the mean figure of 3.637% associated with President Reagan, but the point is that this is ± 2.55%, which indicates high volatility in the GDP under President Reagan. That has not been the case under President Obama.
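To make the volatility point concrete, the short sketch below takes the two Reagan figures quoted above and computes the one-standard-deviation band around the mean. It also computes the coefficient of variation (standard deviation divided by mean), a summary measure I am adding here for illustration that was not part of the original calculation.

```python
# Figures quoted in the text for President Reagan's annual real GDP % change.
mean_growth = 3.637
std_growth = 2.55

low, high = mean_growth - std_growth, mean_growth + std_growth
coefficient_of_variation = std_growth / mean_growth  # illustrative addition

print(f"one-standard-deviation band: {low:.3f}% to {high:.3f}%")
print(f"coefficient of variation: {coefficient_of_variation:.2f}")
```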

Another observation I would like to point out is that very few people have been mentioning the fact that the annual real US GDP growth % is in fact correlated with that of other countries. Based on data from the World Bank, one can compute the following correlations:

One sees that the correlation between the annual growth % of the real GDP of the US and that of Canada is 0.826, while for Estonia and the UK it is roughly 0.7. Therefore, any President who claims that his policies alone will increase the GDP is not being entirely truthful, since it is quite likely that these numbers also depend on those of other countries, which I am not convinced a US President has complete control over!

My final observation is with respect to the quarterly GDP numbers. In recent days I have seen some articles, in addition to several television segments, in which Trump supporters continuously cite how much better Reagan’s quarterly GDP numbers were compared to Obama’s. We now show that in actuality this is not the case.

The problem is that most of the “analysts” are just looking at the raw data, which at face value doesn’t actually tell you much, since, as expected, it fluctuates. Below, we analyze the quarterly GDP % data during the tenure of both Presidents Reagan and Obama, from 1982-1988 and 2010-2016 respectively, comparing data from the same length of time.
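The correlations quoted above are ordinary Pearson correlation coefficients between two annual growth series. A minimal sketch of that calculation follows. The two series are placeholders made up for illustration, not World Bank data, and the function name is my own.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Placeholder annual growth series in percent (NOT World Bank data).
country_a = [2.0, 3.1, -0.5, 1.8, 2.6]
country_b = [1.7, 2.9, -1.0, 2.2, 2.4]

r = pearson(country_a, country_b)
print(f"correlation: {r:.3f}")
```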

For Reagan, we obtain:

For Obama, we obtain:

The only way to reasonably compare these two data sets is to analyze the rate at which the GDP % has changed over time. Since the data is nonlinear in time, this means we must calculate the derivative at each instant of time, that is, at each quarter. We first performed cubic spline interpolation to fit curves to these data sets, which gave extremely good results:

We then numerically computed the derivative of these curves at each quarter and obtained:

The dashed curves in the above plot are plots of the derivatives of each curve at each quarter. In terms of numbers, these were found to be:

Summarizing the table above in graphical format, we obtain:

As can be calculated easily, Obama has higher quarterly GDP growth numbers in 15 of 26 (57.69%) quarters. Therefore, even looking at the quarterly real GDP numbers, overall, President Obama outperforms President Reagan.
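The spline-and-differentiate step used in this comparison can be sketched in plain Python as follows. This is not the code used for the analysis above. It assumes a natural cubic spline (zero second derivative at both ends), since the boundary conditions used in the original fit are not stated, and the quarterly values are placeholders rather than actual GDP data.

```python
def natural_spline_derivatives(x, y):
    """First derivative of the natural cubic spline through (x, y) at each knot."""
    n = len(x)
    if n < 3 or n != len(y):
        raise ValueError("need at least three points and equal-length inputs")
    h = [x[i + 1] - x[i] for i in range(n - 1)]               # knot spacings
    secant = [(y[i + 1] - y[i]) / h[i] for i in range(n - 1)]  # secant slopes

    # Second derivatives m[i] at the knots; natural boundary: m[0] = m[n-1] = 0.
    # Interior rows: h[i-1]*m[i-1] + 2*(h[i-1]+h[i])*m[i] + h[i]*m[i+1] = rhs.
    m = [0.0] * n
    c_mod = [0.0] * n  # modified super-diagonal (Thomas algorithm)
    d_mod = [0.0] * n  # modified right-hand side
    for i in range(1, n - 1):
        rhs = 6.0 * (secant[i] - secant[i - 1])
        denom = 2.0 * (h[i - 1] + h[i]) - h[i - 1] * c_mod[i - 1]
        c_mod[i] = h[i] / denom
        d_mod[i] = (rhs - h[i - 1] * d_mod[i - 1]) / denom
    for i in range(n - 2, 0, -1):
        m[i] = d_mod[i] - c_mod[i] * m[i + 1]

    # Spline slope at the left end of each interval, then at the final knot.
    derivs = [secant[i] - h[i] * (2.0 * m[i] + m[i + 1]) / 6.0 for i in range(n - 1)]
    derivs.append(secant[-1] + h[-1] * (2.0 * m[-1] + m[-2]) / 6.0)
    return derivs


# Placeholder quarterly growth values in percent (NOT actual GDP data).
quarters = [0, 1, 2, 3, 4, 5]
growth = [1.0, 2.5, 0.5, 3.0, 2.0, 2.2]
for q, d in zip(quarters, natural_spline_derivatives(quarters, growth)):
    print(f"quarter {q}: derivative {d:+.3f}")
```

Comparing two presidents then amounts to computing these derivatives for each data set and counting the quarters in which one exceeds the other.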

Thanks to Hargun Singh Kohli, B.A. Honours, LL.B. for the data collection and processing part of this analysis.

## Stephen Curry and Mahmoud Abdul-Rauf?

As usual, Phil Jackson made another interesting tweet today:

And, as usual, he received many criticisms from “experts” who just looked at the raw numbers for each player and concluded that there is just no way such a statement is justified. But it is not that simple!

When you compare two players (or two objects) who have very different data feature values, it is not that they can’t be compared; rather, you must effectively normalize the data somehow to make the sets comparable.

In this case, I used the data from Basketball-Reference.com to compare the 6 seasons that Mahmoud Abdul-Rauf (formerly Chris Jackson) spent in Denver to Stephen Curry’s last 6 seasons (including this one). I took into account 45 different statistical measures and came up with the following correlation matrix/similarity matrix plot:
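One common way to normalize is to convert each series to z-scores, subtracting its mean and dividing by its standard deviation, so that series on very different scales become directly comparable. The sketch below illustrates the idea. It is not necessarily the normalization used for the plot in this post, and the two series are made-up placeholders, not real player statistics.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize a series to mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("a constant series cannot be standardized")
    return [(v - mu) / sigma for v in values]


# Hypothetical per-season values on very different scales (NOT real stats).
player_a = [1, 2, 3, 4, 5, 6]
player_b = [100, 250, 300, 420, 510, 580]

za = z_scores(player_a)
zb = z_scores(player_b)
for a, b in zip(za, zb):
    print(f"{a:+.3f}  {b:+.3f}")
```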

Dark blue circles indicate a strong correlation, while dark red circles indicate a weak correlation between two sets of features.

What would be of interest in an analysis like this is to examine the diagonal of this matrix, which offers a direct comparison between the two players:

One can see that there are many features that have strong correlation coefficients.

Therefore, it is true that Stephen Curry and Chris Jackson do in fact share many strong similarities!

## New Paper on Stochastic Eternal Inflation

Our new paper was accepted for publication in Physical Review D. The goal of the paper was to calculate the probability that a multiverse could emerge from a more general background spacetime, in this case, Bianchi Type I coupled to a chaotic inflaton potential. Basically, we found that a multiverse being generated from such a scenario has a small probability of occurring. Further, the fine-tuning problem that the multiverse / eternal inflation is supposed to solve is not actually resolved, because fine-tuning is still required of the geometry of the background spacetime, the initial conditions, and most importantly, the amount of anisotropy.

The preprint can be read on the arXiv here.

## Breaking Down the Knicks’ Season

Like many of my fellow Knicks fans, I am in an absolute state of shock and disappointment as the Knicks are currently 5-29 to start the new year! Many analysts from the standard outlets (ESPN, Yahoo! Sports, etc.) have given their share of reasons why the Knicks are playing the way they are. Being a mathematical physicist and data scientist, I decided to see if one could deduce any useful information from how the Knicks have been playing, to find the true reason why they are losing all of these games.

Here is what I found. Based on the data available at Basketball-Reference.com, I designed an algorithm in R to go through each game and fit regression trees (here is a link to more on regression trees if you are unfamiliar with the concept), and found the following:

1. The number of points the Knicks score per game:

From this regression tree, we see that if the Knicks, for example, make fewer than 33.5 FGs in a game and have a 3-point shooting percentage of less than 0.309, they will be expected to score no more than 79 points in that game. On the other hand, if they make more than 38.5 FGs in a game and also attempt more than 19 free throws, they can be expected to score more than 111 points.
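The two branches just described can be written as simple rules. The sketch below encodes only those two paths. Every other path through the tree is visible only in the figure, so the hypothetical function `expected_points_bound` returns `None` for them rather than guessing.

```python
def expected_points_bound(fg_made, three_pt_pct, ft_attempts):
    """Scoring bound for the two tree paths quoted in the text, else None.

    Only the two branches described in the post are encoded; all other
    paths through the regression tree are unknown here and yield None.
    """
    if fg_made < 33.5 and three_pt_pct < 0.309:
        return ("at most", 79)
    if fg_made > 38.5 and ft_attempts > 19:
        return ("more than", 111)
    return None


print(expected_points_bound(fg_made=30, three_pt_pct=0.25, ft_attempts=15))
print(expected_points_bound(fg_made=40, three_pt_pct=0.35, ft_attempts=25))
print(expected_points_bound(fg_made=36, three_pt_pct=0.35, ft_attempts=10))
```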

2. The number of points the Knicks’ opponents score per game:

In this regression tree, note first that “Tm” denotes how many points the Knicks score in a game. We see that, for example, if the Knicks have fewer than 28 defensive rebounds in a game, also score fewer than 98 points, and have fewer than 4-5 blocks, their opponents will slightly outscore them and win the game. In fact, if the Knicks get fewer than 28-29 defensive rebounds and score fewer than 98 points in a game, they will be expected to lose every game they play! Now, let’s say the Knicks do manage to get more than 28 defensive rebounds in a game. If they still only manage to score fewer than 89 points, they are almost guaranteed to lose as well.

Although many analysts have probably pointed these things out, the conclusion one draws from these regression tree analyses is that the Knicks have a significant problem with defensive rebounding, as that seems to be the number one factor in them not winning games. Further, they also have a significant problem with how many points they score per game, which is a direct result of this Knicks team still not running their offense correctly.

Would Tyson Chandler have made a difference? As the above analyses show, no single factor determines whether the Knicks win games or not. It is reasonable to assume that if Tyson Chandler were on the team, the Knicks would get more than 28-29 defensive rebounds in a game. But, according to the above analyses and the right branch of the previous regression tree, if they as a team still attempted more than 78-79 field goals, they would still be expected to lose every game. The question then remains: would Tyson Chandler’s presence increase the Knicks’ offensive efficiency? In principle, according to his career FG% stats, I would say yes. According to Basketball-Reference.com, Tyson Chandler had a FG% of 0.638 while in New York, and for his career has a FG% of 0.588, which is quite high by NBA standards. It is therefore quite reasonable to assume that the Knicks would have considerably fewer FGAs (certainly fewer than 78-79) in a game, and their opponents would be held to around 91.0 points per game. One would conclude that, from a statistical perspective, trading away Tyson Chandler was perhaps a mistake and had an overall negative impact on the team’s performance both defensively and offensively.