This is a riddle/challenge that I’m posting for the readers. I cannot figure this out on my own and I’m requesting your help.
I plotted these three charts when I was analyzing some basketball trends over the years:
- Average number of league-wide team points by season
- Average number of league-wide (estimated) team possessions by season
- Average league-wide team field goal percentage by season
The three charts are shown below.
When I plotted the charts, I was surprised at how similar the contours of the plots were. I had to check again to make sure that I wasn’t plotting the same chart three times.
It can be seen that the three variables (league-wide average of team points scored, team possessions, and team field goal percentage) are highly correlated with each other throughout the years. A simple correlation analysis confirmed the strength of relationships amongst the variables.
| | League-wide Average Team Points | League-wide Average Team Possessions | League-wide Average Team FGP |
|---|---|---|---|
| League-wide Average Team Points | 1 | | |
| League-wide Average Team Possessions | 0.959 | 1 | |
| League-wide Average Team FGP | 0.94 | 0.866 | 1 |
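For readers who want to reproduce this kind of matrix, here is a minimal sketch of the pairwise Pearson correlation calculation. The `pearson` and `correlation_matrix` helpers are names I made up for illustration, and the numbers in `season_averages` are placeholders, not the real league data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return {(name_a, name_b): r} for every pair of named columns."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


# Placeholder values, NOT the real league data: one entry per season.
season_averages = {
    "points": [108.0, 105.5, 101.2, 97.0, 99.8],
    "possessions": [102.1, 100.3, 96.5, 92.8, 94.9],
    "fgp": [0.485, 0.478, 0.462, 0.448, 0.455],
}

matrix = correlation_matrix(season_averages)
for (a, b), r in matrix.items():
    print(f"{a:12s} {b:12s} {r: .3f}")
```

Swapping the placeholder lists for one value per season (or one value per team-game) gives the corresponding matrix for either level of aggregation.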
To a degree, this made sense.
If teams record higher field goal percentages, then they’re likely to score more. So the correlation between the league-wide average of team field goal percentage and that of team points scored made sense.
Also, if teams record higher numbers of possessions, then they’re likely to score more. So the correlation between the league-wide average of team possessions and that of team points scored also made sense.
What struck me as odd was the strong relationship between the league-wide average of team field goal percentage and that of team possessions, because I was unable to find the same strong relationship in the non-aggregated, game-level data.
This is the correlation matrix I got when I used the raw game data (instead of league-wide averages) to examine the strength of the relationships among the three variables:
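| | Team Points | Team Possessions | Team FGP |
|---|---|---|---|
| Team Points | 1 | | |
| Team Possessions | 0.548 | 1 | |
| Team FGP | 0.696 | 0.076 | 1 |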
We still see relatively strong correlations between:
- Team Points vs. Team Possessions (r = 0.548)
- Team Points vs. Team Field Goal Percentage (r = 0.696)
However, notice the weak relationship between:
- Team Possessions vs. Team Field Goal Percentage (r = 0.076)
In league-wide averages, team field goal percentage and team possessions had a much stronger relationship (r = 0.866).
So my question is this: why is there a strong relationship between team possessions and team FGP in the league-wide averages, but not in the actual game-level data? Does averaging amplify the correlation between variables by removing game-to-game variance?
I’d love to know. Please enlighten me.
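For anyone who wants to experiment with the question, here is a toy simulation of one possible mechanism. It does not use the real data and does not settle what is actually happening in the league. It assumes, purely for illustration, that each season has a shared "era" effect that nudges both the average possessions and the average FGP, while individual games add large independent noise to each. Every number in it (the means, the spreads, the number of seasons and games) is an assumption I picked for the sketch.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


random.seed(0)  # fixed seed so the run is repeatable

N_SEASONS = 40           # assumed number of seasons
GAMES_PER_SEASON = 500   # assumed number of team-games per season

game_poss, game_fgp = [], []      # one entry per simulated team-game
season_poss, season_fgp = [], []  # one entry per simulated season average

for _ in range(N_SEASONS):
    # Shared season-level effect (assumption): moves both means together.
    era = random.gauss(0, 1)
    poss_mean = 95.0 + 2.0 * era
    fgp_mean = 0.46 + 0.01 * era

    # Game-level noise (assumption): independent for the two variables,
    # and much larger than the season-to-season shift.
    poss = [random.gauss(poss_mean, 8.0) for _ in range(GAMES_PER_SEASON)]
    fgp = [random.gauss(fgp_mean, 0.06) for _ in range(GAMES_PER_SEASON)]

    game_poss.extend(poss)
    game_fgp.extend(fgp)
    season_poss.append(sum(poss) / len(poss))
    season_fgp.append(sum(fgp) / len(fgp))

r_game = pearson(game_poss, game_fgp)
r_season = pearson(season_poss, season_fgp)

print(f"game-level r:     {r_game:.3f}")
print(f"season-average r: {r_season:.3f}")
```

In this setup the game-level correlation comes out weak and the season-average correlation comes out strong, because averaging cancels most of the independent game-to-game noise and leaves the shared season-level movement. Whether the real data follows the same pattern is exactly the open question.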