Overreacting to your NFL Quarterback? Some (non-)correlation visuals for QBs
Overreaction Monday in the NFL is more intense than any other overreaction I've witnessed. Overreactions to Game of Thrones episodes? Doesn't even compare. Overreactions to celebrity gossip? Not even in the same universe. NFL overreactions, mostly centered on quarterback play, are beyond wild. I'm here to (hopefully) quell them with data visualization.
No matter which statistic you look at, there is little to no correlation between Week 1 performance and the full season.
I'll be focusing on yards and TDs in the main post, but at the bottom you can see all of the quarterback graphs I made.
A little about the data: these are week 1 numbers from last season (2015) compared with each week 1 starter's full-season averages. You can find these data at ProFootballReference.
As I discussed in a previous fantasy football post, two of the most important statistics for a QB are yards per game and touchdowns per game. The graph above shows the yardage data two ways: as a bar chart and as a scatter plot. The bar chart is what many casual football statisticians look at when comparing one player to another, or one week to another for the same player. That presents a problem, though: when you isolate (or "cherry-pick") data this way, are you looking at representative data, meaning data that generalize beyond the handful of games you picked, or at non-representative data?
The scatter plot and lines of best fit give a broader view of what these yards represent. For visual clarity, I've plotted week 1 performances from worst to best (in blue). Unsurprisingly, that line has a positive slope; sorting the QBs this way guarantees it. But when I plot each QB's full-season yards-per-game average alongside his week 1 number, the slope is much flatter, close to neutral. If week 1 numbers tracked full-season numbers, both lines would slope upward at a similar steepness. Since they don't, week 1 data appear to be only weakly correlated with, and even less predictive of, what happens across a full season.
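For anyone who wants to run this check on their own numbers, here is a minimal Python sketch of the comparison (an illustration, not the code behind these graphs). It sorts QBs by their week 1 value, fits a least-squares line to each series against that rank, and also computes the Pearson correlation coefficient r between week 1 and full-season values, which measures the relationship directly without depending on the sort. The function names are hypothetical; the inputs would be matching lists of week 1 and full-season yards per game, e.g. pulled from ProFootballReference.

```python
from statistics import mean


def ols_slope(xs, ys):
    """Least-squares slope of ys on xs (the line of best fit)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def pearson_r(xs, ys):
    """Pearson correlation: +1 = perfect positive, 0 = none, -1 = perfect negative."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either list is constant")
    return sxy / (sxx * syy) ** 0.5


def compare_week1_to_season(week1, season_avg):
    """Sort QBs worst-to-best by week 1, then fit a line to each series vs. rank."""
    if len(week1) != len(season_avg) or len(week1) < 3:
        raise ValueError("need two matching lists with at least 3 QBs")
    # Sort on the week 1 value only; ties keep their input order.
    pairs = sorted(zip(week1, season_avg), key=lambda p: p[0])
    ranks = list(range(len(pairs)))
    return {
        "week1_slope": ols_slope(ranks, [w for w, _ in pairs]),
        "season_slope": ols_slope(ranks, [s for _, s in pairs]),
        "r": pearson_r(week1, season_avg),
    }
```

An r near 0, or a season slope much flatter than the week 1 slope, is what "little-to-no correlation" looks like numerically; an r near 1 would mean week 1 tracks the season closely. One caveat: each full-season average includes week 1 itself, which tends to nudge r slightly upward.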
Simply put, week 1 data is not enough to show much of anything. In other words, overreacting to week 1 is really wasted energy.
Here's another graph, this time of touchdowns from week 1 (last season). Seven quarterbacks threw zero touchdowns, including both Manning brothers and the "elite" Joe Flacco. Their full-season averages clearly show they threw 1-2 TDs per game over the season. On the other side, Alex Smith threw 3 TDs in the first week last season, and Marcus Mariota threw 4. Did they sustain that pace? Absolutely not! Both QBs averaged 1-1.5 TDs per game over the full season.
Judging by the lines of best fit, this graph looks much like the yards graph: even when QBs are sorted from worst to best by week 1 touchdowns, their full-season averages don't follow the same upward trend.
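One way to see why zero-TD and four-TD openers aren't alarming: suppose (an assumption for illustration, not something tested in this post) that a quarterback's touchdowns in a single game follow a Poisson distribution with a hypothetical "true" rate of 1.5 per game. Single-game results then swing widely on chance alone. A minimal sketch:

```python
from math import exp, factorial


def poisson_pmf(k, rate):
    """Probability of exactly k TDs in one game, given a Poisson rate per game."""
    return exp(-rate) * rate ** k / factorial(k)


def prob_at_least(k, rate):
    """Probability of k or more TDs in one game."""
    return 1.0 - sum(poisson_pmf(i, rate) for i in range(k))


if __name__ == "__main__":
    rate = 1.5  # hypothetical true TDs per game for a typical starter
    print(f"P(0 TDs)  = {poisson_pmf(0, rate):.3f}")    # 0.223
    print(f"P(3+ TDs) = {prob_at_least(3, rate):.3f}")  # 0.191
    print(f"P(4+ TDs) = {prob_at_least(4, rate):.3f}")  # 0.066
```

Under that simplified model, roughly one starter in five would throw zero TDs in any given week, so seven zero-TD openers out of about 32 starters is close to what chance alone would produce. Real QBs have different rates and TDs aren't strictly Poisson, so treat this as a rough intuition pump rather than a fitted model.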
If week 1 doesn't matter, then how many weeks do matter?
Unfortunately, the current NFL format doesn't give us as much data as we'd want for reliable predictions. There are only 16 games per season, which means only 16 data points per player (assuming the QB you're looking at plays all 16 games). That isn't much data for any one player; in statistics terms, the sample is too small to give a predictive model much statistical power. Even if you pooled every quarterback in the league (realistically, 32 starters, plus maybe three more who end up replacing a starter), that's still only about 512 data points per season (32 QBs times 16 games). And even if we pretend that's enough to build a model, the model would describe the average quarterback, not any specific quarterback. You couldn't take a model of the average QB and say, "Oh, this is how [insert any quarterback here] will perform on Sunday."
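A rough way to put numbers on "not a lot of data": if a QB's passing yards vary from game to game with some standard deviation, and games are treated as independent (a simplifying assumption), the uncertainty in his n-game average, the standard error, is that standard deviation divided by the square root of n. The 80-yard figure below is a hypothetical value chosen for illustration, not measured from the 2015 data, and `standard_error` is just an illustrative helper name.

```python
from math import sqrt


def standard_error(sd_per_game, n_games):
    """Standard error of an average over n independent games: sd / sqrt(n)."""
    if n_games < 1:
        raise ValueError("n_games must be at least 1")
    return sd_per_game / sqrt(n_games)


if __name__ == "__main__":
    sd = 80.0  # hypothetical game-to-game SD of passing yards
    for n in (1, 4, 8, 16):
        print(f"{n:>2} games: +/- {standard_error(sd, n):.1f} yards")
    print("League-wide starter games per season:", 32 * 16)  # 512
```

Under these assumptions, one game leaves the full +/- 80 yards of uncertainty, while a complete 16-game season narrows it to +/- 20 yards. Even a whole season is a fairly blurry picture, and a single week is far blurrier.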
This is a logical fallacy known as hasty generalization, closely related to what's sometimes called the "law of small numbers": the mistaken belief that a small sample will look like the whole population. For example, say you drive into a town and the first three cars you see are Ferraris. You might conclude that the people in this town are very rich (or that everyone there drives a Ferrari or some other expensive car). But three random cars aren't enough data to generalize about an entire town of thousands, tens of thousands, or hundreds of thousands of people. The same goes for sports data: one week is not enough information to generalize about an entire season.
Other QB graphs from 2015
Note: Adjusted yards per attempt (AY/A) is a ProFootballReference statistic. It is calculated as AY/A = (passing yards + 20 × passing TDs − 45 × interceptions) / pass attempts.
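For reference, here is a minimal Python sketch of the standard AY/A formula, (passing yards + 20 × passing TDs − 45 × interceptions) / pass attempts. The function name and the stat line in the example are hypothetical, not taken from any real game.

```python
def adjusted_yards_per_attempt(pass_yards, pass_tds, interceptions, attempts):
    """AY/A: each TD is worth a 20-yard bonus, each INT a 45-yard penalty."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    return (pass_yards + 20 * pass_tds - 45 * interceptions) / attempts


if __name__ == "__main__":
    # Hypothetical stat line: 300 yards, 2 TDs, 1 INT on 40 attempts.
    print(adjusted_yards_per_attempt(300, 2, 1, 40))  # 7.375
```

The TD bonus and INT penalty are what distinguish AY/A from plain yards per attempt: a QB with the same yardage but more interceptions ends up with a lower number.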