Predicting Teams’ Points against Opponents

Let’s think of a scenario. Team A is a great offensive team and has averaged 110 points against its opponents. Team B, however, is a great defensive team and has allowed its opponents only 90 points on average. Given this information, can we predict how many points Team A will score against Team B?

I have pondered this question many times. For this analysis, I used NBA regular-season game data going back to the 1995-1996 season.

First, I wanted to see which variables correlated with how well teams scored. Instead of focusing on the total number of points scored, I decided to focus on regular-quarter points: points scored within the four quarters, excluding any points scored in overtime.

I asked myself, “What are some variables that I can use to predict the next game’s points?”

I figured how well a team scored in the past would be a good indicator of how the team would score in the future. The following came to mind:

  • 5-game rolling average of previous regular-quarter points scored (rqP_rollmean5_gen)
  • 10-game rolling average of previous regular-quarter points scored (rqP_rollmean10_gen)
  • Cumulative average of previous regular-quarter points scored (rqP_cummean_gen)
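As a minimal sketch, features like these might be computed with pandas as follows. The table layout (one row per team per game), the column names team, game_no, and rqP, and the toy numbers are assumptions for illustration, not the data used in this analysis. A real dataset would also group by season, and the window is shortened to 2 here only so the toy data produces values.

```python
import pandas as pd

# Hypothetical toy data: one row per team per game, in chronological order.
games = pd.DataFrame({
    "team": ["A"] * 4 + ["B"] * 4,
    "game_no": [1, 2, 3, 4] * 2,
    "rqP": [100, 110, 90, 120, 95, 85, 105, 99],  # regular-quarter points
})

def add_prior_averages(df, window=5):
    """Add rolling and cumulative averages that use only earlier games."""
    df = df.sort_values(["team", "game_no"]).copy()
    # shift(1) drops the current game, so each feature sees previous games only.
    prior = df.groupby("team")["rqP"].shift(1)
    grouped = prior.groupby(df["team"])
    df[f"rqP_rollmean{window}_gen"] = grouped.transform(
        lambda s: s.rolling(window).mean()
    )
    df["rqP_cummean_gen"] = grouped.transform(lambda s: s.expanding().mean())
    return df

features = add_prior_averages(games, window=2)
print(features)
```

The first game of each team has no earlier games, so its features come out as missing values rather than leaking the current game's score.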

It also occurred to me that how well a team scores would also depend on the strength of its opponent’s defense. Hence, I looked at the following variables as well:

  • opponent’s 5-game rolling average of previous regular-quarter points allowed (o_rqPA_rollmean5_gen)
  • opponent’s 10-game rolling average of previous regular-quarter points allowed (o_rqPA_rollmean10_gen)
  • opponent’s cumulative average of previous regular-quarter points allowed (o_rqPA_cummean_gen)
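Points allowed by a team in a game are simply the points its opponent scored in that game. A rough sketch of deriving them with a self-merge is shown below; the column names game_id, team, opponent, rqP, and rqPA and the toy numbers are hypothetical. Once rqPA exists, its rolling and cumulative averages can be built the same way as for points scored.

```python
import pandas as pd

# Hypothetical toy data: two rows per game, one from each team's perspective.
games = pd.DataFrame({
    "game_id": [1, 1, 2, 2],
    "team": ["A", "B", "A", "C"],
    "opponent": ["B", "A", "C", "A"],
    "rqP": [100, 90, 110, 95],  # regular-quarter points scored
})

# Relabel each row so it can be looked up from the opponent's side:
# the points a team scored are the points its opponent allowed.
allowed = games[["game_id", "team", "rqP"]].rename(
    columns={"team": "opponent", "rqP": "rqPA"}
)

# Attach regular-quarter points allowed (rqPA) to every team-game row.
games = games.merge(allowed, on=["game_id", "opponent"])
print(games)
```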

Then, it seemed logical that how well a team scores would depend on both the team’s offensive strength and the opponent’s defensive strength. Hence, I also looked at the following variables, each of which averages the two:

  • (rqP_rollmean5_gen + o_rqPA_rollmean5_gen) / 2
  • (rqP_rollmean10_gen + o_rqPA_rollmean10_gen) / 2
  • (rqP_cummean_gen + o_rqPA_cummean_gen) / 2
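To make the averaging method concrete, here is a small sketch applied to the opening scenario. The function name averaged_projection is made up for illustration, and the scenario's numbers are total-point averages rather than regular-quarter averages, so treat this as arithmetic only.

```python
def averaged_projection(team_points_avg, opp_points_allowed_avg):
    """Average a team's scoring average with its opponent's points-allowed average."""
    return (team_points_avg + opp_points_allowed_avg) / 2

# Team A averages 110 points; Team B allows 90 points on average.
projection = averaged_projection(110, 90)
print(projection)  # 100.0
```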

I measured how regular-quarter points correlated with the variables above.

| Metric Type | Variable | Correlation Coefficient | Mean Absolute Error (MAE) |
|---|---|---|---|
| Previous Offensive Performance | rqP_rollmean5_gen | 0.328 | 9.374 |
| Previous Offensive Performance | rqP_rollmean10_gen | 0.358 | 9.047 |
| Previous Offensive Performance | rqP_cummean_gen* | 0.374 | 8.895 |
| Previous Defensive Performance by Opponent | o_rqPA_rollmean5_gen | 0.328 | 9.377 |
| Previous Defensive Performance by Opponent | o_rqPA_rollmean10_gen | 0.365 | 9.015 |
| Previous Defensive Performance by Opponent | o_rqPA_cummean_gen* | 0.392 | 8.808 |
| Averaging Method | (rqP_rollmean5_gen + o_rqPA_rollmean5_gen) / 2 | 0.424 | 8.65 |
| Averaging Method | (rqP_rollmean10_gen + o_rqPA_rollmean10_gen) / 2 | 0.461 | 8.487 |
| Averaging Method | (rqP_cummean_gen + o_rqPA_cummean_gen) / 2 | 0.474 | 8.496 |

* Performance metrics for cumulative means were considered where there were at least 10 observations.

You needn’t worry if you don’t have a statistical background. A correlation coefficient can take on a value anywhere from -1 to 1 and indicates how strong the linear relationship between two variables is. A coefficient close to 1 or to -1 suggests a strong relationship, while a coefficient close to 0 suggests a weak one.

Mean absolute error (MAE), on the other hand, is simply an “average error.” A low mean absolute error means the predicted values are close to the actual values observed. For example, if our regular-quarter points predictions have a mean absolute error of 10, our predictions are off from the actual regular-quarter points by 10 points on average. We want to lower our mean absolute error.
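Both metrics are easy to compute. The sketch below uses NumPy on made-up numbers and assumes each variable is used directly as the prediction for the game’s regular-quarter points; the arrays are illustrative only and are not taken from the NBA data.

```python
import numpy as np

# Made-up actual regular-quarter points and the predictions for the same games.
actual = np.array([100.0, 95.0, 110.0, 105.0])
predicted = np.array([98.0, 99.0, 104.0, 107.0])

# Pearson correlation coefficient between predictions and actual points.
corr = np.corrcoef(predicted, actual)[0, 1]

# Mean absolute error: the average size of the miss, ignoring direction.
mae = np.mean(np.abs(predicted - actual))

print(f"correlation = {corr:.3f}, MAE = {mae:.1f}")
```

Here the misses are 2, 4, 6, and 2 points, so the MAE is 14 / 4 = 3.5.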

We notice a few patterns here.

  • Predicting based on rolling averages of the past 10 games performed better than predictions based on rolling averages of the past 5 games.
  • Predicting based on cumulative averages of all past games in the season also performed better than predictions based on rolling averages of the past 5 games.
  • Of the three metric types, the averaging method (in which we averaged team’s projected points and opponent’s projected points allowed) performed the best, as can be seen by the highest correlation coefficients and lowest mean absolute errors.

In the next blog post, I will attempt to improve our prediction by:

  • adjusting our projections for home/away differences
  • using site-specific rolling averages and cumulative averages

About the Author: Howard Song

I’m a data practitioner by day, a web developer by night, a semi-competent swimmer, an active basketball player, a collector of cool ideas, an aspiring entrepreneur, a college dropout but a lifelong learner, and a self-professed nice guy. I love all things basketball, data, programming, and entrepreneurship.