Is Vegas Beatable?

Harrison Gu
3 min read · Oct 22, 2021

Overview

As the sports betting industry gains steam, I want to sell NBA spread picks to sports bettors through a subscription service. I will use regression models to predict the point differential of each NBA game, then use that prediction to bet against the Vegas spread. Because Vegas typically takes a 10% rake on each bet, my picks have to beat the spread roughly 52.5% of the time to be profitable.
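To make the break-even arithmetic concrete, here is a minimal sketch. The two pricing conventions are assumptions, not something stated above: a standard -110 line (risk $110 to win $100) and a flat 10% rake on winnings (risk $1 to win $0.90). The 52.5% figure sits between the two results.

```python
def breakeven_win_rate(risk: float, win: float) -> float:
    """Win rate at which expected profit is zero when risking `risk` to win `win`."""
    # Solve p * win - (1 - p) * risk = 0 for p.
    return risk / (risk + win)


# Assumption: a standard -110 line (risk $110 to win $100).
standard_line = breakeven_win_rate(risk=110, win=100)

# Assumption: a flat 10% rake on winnings (risk $1 to win $0.90).
flat_rake = breakeven_win_rate(risk=1.0, win=0.90)

print(f"-110 line:            {standard_line:.2%}")  # 52.38%
print(f"10% rake on winnings: {flat_rake:.2%}")      # 52.63%
```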

The Data

I scraped my data from basketball-reference.com and sportsbookreviewonline.com using Beautiful Soup. The data set covers all regular season games from 2011–2020, a total of 11,656 games. The way the NBA game is played has changed dramatically since the early 2000s, with 3-point shooting becoming a much more critical part of the game. To train an accurate model for today’s game, I decided not to include games from the 2000s.

Modeling

For my first round of modeling, I cast a wide net, running 8 different baseline machine learning regression models. From there, I chose the 3 best-performing models, judged by lowest RMSE while keeping the training and testing RMSEs as close as possible. These models were linear regression, random forest, and gradient boosting. I then tuned each model with a grid search to find its optimal parameters. All 3 tuned models topped out at 61.5%-62.5% accuracy against the Vegas spread, which leads me to believe that this ceiling may reflect the inherent randomness of sports.
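"Accuracy against the Vegas spread" can be scored in more than one way. The sketch below shows one plausible convention, not necessarily the one used here: every number is a home-team margin (home points minus away points), a pick is whichever side of the Vegas line the model's prediction falls on, and pushes are left out. The function names and the sample games are made up for illustration.

```python
def side_of_line(margin, vegas_margin):
    """Return which side of the Vegas line a home margin falls on, or None if it lands on it."""
    if margin > vegas_margin:
        return "home"
    if margin < vegas_margin:
        return "away"
    return None


def ats_accuracy(games):
    """Share of decided bets won; `games` holds (predicted, vegas, actual) home margins."""
    wins = decided = 0
    for predicted, vegas, actual in games:
        pick = side_of_line(predicted, vegas)
        result = side_of_line(actual, vegas)
        if pick is None or result is None:
            continue  # no pick, or the game pushed
        decided += 1
        wins += pick == result
    return wins / decided if decided else float("nan")


# Illustrative values only, not real games.
games = [
    (7.0, 4.5, 10),   # pick home, home covers: win
    (-2.0, 1.5, 3),   # pick away, home covers: loss
    (3.0, 6.0, 2),    # pick away, away covers: win
    (5.0, 3.0, 3),    # push: excluded
]
print(f"{ats_accuracy(games):.1%}")  # 66.7%
```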

I also ran a neural network regression model and tuned its parameters using Talos. Again, the model topped out around 62.3%.

In an effort to increase prediction accuracy, I built an ensemble. My first rule was to make a prediction only when all 4 regression models agreed on the same side of the bet. This increased my accuracy to 69.1% while covering 62.28% of games. While this is a dramatic increase in accuracy, I would like to make predictions on a higher percentage of games. Digging further, I found that when exactly 3 models agreed, the accuracy ranged from 51.8% to 54.1% depending on which model disagreed. Because of the 52.5% threshold for profitability, I cannot use predictions when the random forest disagrees with the others, as those predictions are only 51.8% accurate. However, when linear regression, gradient boosting, or the neural network is the one that disagrees, the predictions are still usable. This increases the share of usable predictions to 81.5% of all games!
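The agreement rule described above can be sketched as a small function. This is an illustration under assumptions rather than the project's actual code: the model names, the home-margin convention, and the sample predictions are all hypothetical.

```python
# Models whose lone dissent makes a 3-of-4 pick unusable (per the 51.8% result above).
BLOCKED_DISSENTERS = {"random_forest"}


def ensemble_pick(predictions, vegas_margin):
    """Return "home", "away", or None (no bet) from per-model predicted home margins."""
    home = [name for name, p in predictions.items() if p > vegas_margin]
    away = [name for name, p in predictions.items() if p < vegas_margin]
    if len(home) + len(away) < len(predictions):
        return None  # a model sits exactly on the line, so skip the game
    if not away:
        return "home"  # unanimous
    if not home:
        return "away"  # unanimous
    majority, minority, side = (home, away, "home") if len(home) > len(away) else (away, home, "away")
    # Accept a single dissenter only if it is not a blocked model.
    if len(minority) == 1 and len(majority) > 1 and minority[0] not in BLOCKED_DISSENTERS:
        return side
    return None


# Illustrative predictions only.
vegas = 4.5
unanimous = {"linear": 6.0, "random_forest": 5.5, "gradient_boosting": 7.0, "neural_net": 6.2}
rf_dissents = {**unanimous, "random_forest": 3.0}
linear_dissents = {**unanimous, "linear": 3.0}
split = {**unanimous, "linear": 3.0, "neural_net": 2.0}

print(ensemble_pick(unanimous, vegas))        # home
print(ensemble_pick(rf_dissents, vegas))      # None
print(ensemble_pick(linear_dissents, vegas))  # home
print(ensemble_pick(split, vegas))            # None
```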

Analysis

The next trend that I explored was the absolute difference between the predicted spreads and Vegas’ spread. I found that the bigger the discrepancy, the more accurate the prediction was. This pattern will help me rank predictions based on how likely they are to hit, which is important to bettors, as it allows them to size their bets accordingly. Interestingly, when my model is more than 10 points away from the Vegas spread (0.5% of games), it predicts games at >90% accuracy.
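One way to check this pattern is to bucket bets by the absolute gap between the model's spread and the Vegas spread, then compute the hit rate in each bucket. In the sketch below the cut points (2, 5, and 10 points) and the sample bets are hypothetical, chosen only to show the mechanics.

```python
from collections import defaultdict


def gap_bucket(gap, edges=(2, 5, 10)):
    """Label the bucket containing `gap`; the cut points are hypothetical."""
    lower = 0
    for upper in edges:
        if gap < upper:
            return f"{lower}-{upper}"
        lower = upper
    return f"{lower}+"


def hit_rate_by_gap(bets):
    """bets: iterable of (predicted_margin, vegas_margin, won) tuples."""
    totals = defaultdict(lambda: [0, 0])  # bucket -> [wins, bets]
    for predicted, vegas, won in bets:
        bucket = gap_bucket(abs(predicted - vegas))
        totals[bucket][0] += int(won)
        totals[bucket][1] += 1
    return {bucket: wins / n for bucket, (wins, n) in totals.items()}


# Illustrative bets only, not real results.
bets = [
    (5.0, 4.0, False),
    (3.0, 4.5, True),
    (8.0, 4.5, True),
    (-1.0, 2.0, False),
    (12.0, 5.0, True),
    (15.0, 3.5, True),
]
rates = hit_rate_by_gap(bets)
for bucket, rate in rates.items():
    print(f"{bucket:>5} points: {rate:.0%}")
```

Sorting a day's picks by the same gap, largest first, gives bettors a simple ranking to size their bets against.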

I believe that this could be due to Vegas’ business model. Their goal is to set a line such that half of the money lands on each side, so that they simply collect the 10% juice with no risk. If public opinion differs from the “true” spread, Vegas will move their line away from their original prediction and closer to the public average. When public opinion is vastly different from the “true” spread, my model will be able to pick the correct side more frequently.

Final Outcome

My final model will be able to make useful predictions on 81.5% of games, at a weighted average accuracy of 65.2%. For the remaining 18.5% of games, I will refrain from making any predictions. This yields an expected value of 0.239, meaning that for every $1 bet using my model, the bettor is expected to make ~$0.24. Based on my research, the average sports bettor bets $216 per month. This means that my model gives the average sports bettor $51.84 of value per month. Because my target audience is serious bettors, I will price the subscription at $50/month.
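The arithmetic behind these figures can be reproduced with a short sketch. It assumes a winning $1 bet pays $0.90 (a 10% rake on winnings), which recovers the 0.239 figure. The -110 comparison is an added assumption included only for reference.

```python
def expected_value_per_dollar(win_rate, payout=0.90):
    """Expected profit per $1 risked, assuming a win pays `payout` and a loss costs $1."""
    return win_rate * payout - (1 - win_rate)


win_rate = 0.652        # weighted average accuracy from the article
monthly_handle = 216    # average monthly amount bet, from the article

ev = expected_value_per_dollar(win_rate)
monthly_value = monthly_handle * round(ev, 2)  # the article rounds EV to $0.24 first

print(f"EV per $1 bet: {ev:.3f}")             # 0.239
print(f"Monthly value: ${monthly_value:.2f}")  # $51.84

# For comparison, under an assumed standard -110 line a $1 win pays 100/110.
ev_standard_line = expected_value_per_dollar(win_rate, payout=100 / 110)
print(f"EV per $1 bet at -110: {ev_standard_line:.3f}")
```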
