#DarwinsBalls, Bracketology and some half-cocked statistics.
Before you read, head over to Gerty-Z's blog post about #DarwinsBalls and see how you can help contribute to science in classrooms. I am a poor graduate student and I've donated $20 total thus far. I know you can too.
I love sports. Maybe more so, I love sports stats. They astound me. How people can be so involved with numbers and the beauty of statistics applied... it's awesome. I thank one calculus professor I had, Tom Falbo at Santa Rosa Junior College, who always jabbered about baseball and how baseball and stats are married. It wasn't until years after his class I found that out.
The place I like tinkering with stats the most is definitely the NCAA Division 1 Men's Basketball Championship Tournament, or as it's known in most places: March Madness. It's a handful of days (mostly weekends) in March where the best college basketball teams in the nation compete. It's also a testament to inter-conference play and how being the best in one conference (e.g. SEC, Big 10, PAC-12, etc.) might not mean you're the best in the nation.
The tournament itself consists of 64 teams in a single-elimination format: if you win, you play on; if you lose, you are eliminated. Many people try to pick the winners among these 64 teams, which is usually called "bracketing" or putting together your bracket (the tournament "bracket" refers to how the teams are matched up against each other and how winners advance to play subsequent winners). Most brackets are guesswork or subjective choices, but I have dug into each team's statistics and put together two analyses, one offensive and one defensive, to make my picks this year. Every year I tweak my formula a little. These are my 2014 formulas:
2014 Offensive Analysis
First, we have to set a limit on whom we include in the stats. In a game, only ~9 players per team rotate into the game. A typical college basketball roster has ~14 players, ranging from 11 to as many as 18. In most cases, the 9 players who score the most points account for >90% of the team statistics. So, all of my team stats are based on the top 9 players ranked by points scored over the season.
Before I continue, my stats all come from Sports Reference: College Basketball. It's the go-to place for free stats.
This year, my offensive analysis relies on minutes played (MP), shots (aka field goals) attempted (FGA) and total points made over the season (PTS). My theory was if a team is shooting the ball a lot and taking up a lot of time, and converting those shots to points, then the team with the best points per attempt over time should be the best offensive team.
Please don't laugh at me, real statisticians.
To create the team average:
Average FGA across the top 9 point-scorers on the team.
Average PTS across the same.
First, we create a denominator that includes FGA and MP. My logic here is that I wanted a ratio of attempts per minute (APM), and that's exactly what we get:
APM = FGA / MP
Now we divide PTS by APM. I assumed this would give us a number representing points gained relative to your attempts per minute. Thus, a larger number means a team had more quality possessions over time (either more points, or more time and fewer attempts). Here's the 2014 offensive analysis:
Offense = PTS / APM = (PTS × MP) / FGA
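The offensive steps above can be sketched in a few lines of Python. This is only a rough illustration of the description, not my actual spreadsheet: the function name, the dictionary keys and the roster are made up, and it assumes MP is averaged across the top 9 scorers the same way as FGA and PTS.

```python
from statistics import mean

def offensive_stat(players, top_n=9):
    """Offensive stat from per-player season totals (dicts with PTS, FGA, MP)."""
    # Keep only the top scorers, ranked by season points.
    top = sorted(players, key=lambda p: p["PTS"], reverse=True)[:top_n]
    avg_pts = mean(p["PTS"] for p in top)
    avg_fga = mean(p["FGA"] for p in top)
    # Assumption: MP is averaged the same way as FGA and PTS.
    avg_mp = mean(p["MP"] for p in top)
    apm = avg_fga / avg_mp  # attempts per minute
    return avg_pts / apm    # same as (PTS * MP) / FGA

# Hypothetical roster, not real data: nine regulars plus one bench player.
roster = [{"PTS": 200, "FGA": 100, "MP": 500} for _ in range(9)]
roster.append({"PTS": 10, "FGA": 50, "MP": 100})
print(offensive_stat(roster))
```

The bench player is dropped by the top-9 cut, so only the nine regulars feed the averages.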
2014 Defensive Analysis
Last year, I only used an offensive analysis with bits and pieces of defensive stats to create a single number to base my bracket on. I didn't crack the top 10 of my bracket pool. I figured it was because I wasn't factoring in defense heavily enough.
My defensive statistic is much more complete than my offensive statistic. It includes (these are all season totals):
Minutes played (MP)
Total rebounds (TRB)
Steals (STL)
Blocks (BLK)
Turnovers (TOV)
Personal fouls (PF)
The object on defense is to stop the other team from scoring while also limiting your own fouls or time on defense.
Again, don't laugh at me, real mathematicians. I'm just a noob playing with numbers.
The numerator I create is total defense (TOTDEF). I divide each of TRB, steals (STL) and blocks (BLK) by MP and add the results together:
TOTDEF = TRB/MP + STL/MP + BLK/MP
The concept here is that the more you take the ball from the other team (TRB and STL) or the more you keep the other team from scoring (BLK), the fewer opportunities the other team has to score. I make this dependent on time because players may play a minimal amount of time but contribute a lot defensively. We see that's the case for "situational" players, who come in when defense is needed over offense but play fewer minutes because they are not a large offensive threat.
The denominator I create involves turnovers (TOV) and PF, each divided by MP and summed together. This creates a sort of weakness statistic (WEAK):
WEAK = TOV/MP + PF/MP
The concept in this stat is that defense depends on your offense maintaining possession. Turnovers and fouls generally hurt you in the long run -- especially in close games in the second half. You hurt your defense overall when you give the ball back to the other team. So I make TOTDEF dependent on WEAK by dividing the first by the second, because the better your team is at defense and at maintaining possession, the better the opportunities you create for your team offensively:
Defense = TOTDEF / WEAK
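Here is a minimal Python sketch of the defensive stat as described above: TOTDEF is the numerator and WEAK is the denominator. The function name and the sample totals are hypothetical, and it assumes all six inputs are season totals for the same group of players.

```python
def defensive_stat(trb, stl, blk, tov, pf, mp):
    """Defensive stat: total defense per minute over weakness per minute."""
    totdef = trb / mp + stl / mp + blk / mp  # TOTDEF
    weak = tov / mp + pf / mp                # WEAK
    return totdef / weak

# Hypothetical season totals, not real data.
print(defensive_stat(trb=1000, stl=200, blk=100, tov=400, pf=600, mp=7000))
```

One thing worth noting: when the same MP value is used in every term, it cancels out and the result reduces to (TRB + STL + BLK) / (TOV + PF). Minutes only change the answer if the ratios are computed player by player before being combined.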
Deciding what stat is the better indicator
I can't really say for sure. There were clear cut choices for me, when statistically one team had greater numbers offensively and defensively. When a team had better numbers offensively but worse defensively, it was a coin flip.
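The rule of thumb above could be written as a tiny Python function. It's a sketch of the clear-cut case only; the function name is made up, and the numbers are the ones quoted for the games later in this post.

```python
def pick(name_a, stats_a, name_b, stats_b):
    """Each stats tuple is (offense, defense). Returns a name, or None for a coin flip."""
    off_a, def_a = stats_a
    off_b, def_b = stats_b
    if off_a > off_b and def_a > def_b:
        return name_a
    if off_b > off_a and def_b > def_a:
        return name_b
    return None  # split decision: one team leads on offense, the other on defense

print(pick("NDST", (968.59, 1.52), "Oklahoma", (949.37, 1.47)))  # NDST
print(pick("VCU", (851.92, 1.62), "SFA", (956.03, 1.35)))        # None
```

A `None` result is where subjective factors, like seeding, have to break the tie.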
Let's take a look at my bracket for some key selections I made:
ND State (12) over Oklahoma (5)
This was an easy choice. NDST had an offensive stat of 968.59 and Oklahoma had an offense of 949.37. Defensively, NDST had a stat of 1.52 versus Oklahoma's stat of 1.47. Two bigger numbers, easy choice. I didn't get to watch the game, but the score was 80-75, NDST wins.
Pittsburgh (9) over Colorado (8)
Another easy choice. Pitt offense: 1001.41. Pitt defense: 1.57. CU offense: 935.29. CU defense: 1.54. Defensively, this was much closer than the NDST vs Oklahoma game, but offensively this was a blow out. The score indicated as much as well: 77-48, Pitt.
Baylor (6) over Creighton (3)
Baylor offense: 1054.1
Baylor defense: 1.60
Creighton offense: 1018.07
Creighton defense: 1.33
Again, both numbers are larger for Baylor. I'm not exactly sure how to extrapolate a score prediction from these stats, but I assume an offensive stat of +36 over your opponent is a factor, as is a defensive stat of +0.27. This is similar to NDST vs Oklahoma (NDST offense: +19.22, NDST defense: +0.05), except the result was much more in favor of Baylor: 85-55.
Of course, the bracket is not without its faults:
SF Austin (12) upset VCU (5)
I had VCU winning.
VCU offense: 851.92
VCU defense: 1.62
SFA offense: 956.03
SFA defense: 1.35
This was a coin-flip situation: VCU had the better defense, SFA the better offense. An objective factor I took into account was that my defensive statistic is more complete than my offensive statistic. A subjective factor I took into account was seeding -- I would assume the people who seed these teams have better stats than I do. Defensively, VCU vs SFA was similar (statistically speaking) to Baylor vs Creighton. However, the offensive discrepancy was +104.11 in favor of SFA.
Kentucky (8) upsets Wichita State (1)
I had Wichita St over UK.
Wich St offense: 1030.72
Wich St defense: 1.78
UK offense: 970.22
UK defense: 1.62
Offensive discrepancy of +60.5 in favor of Wich St. This is similar to Pitt vs Colorado.
Defensive discrepancy of +0.16. I didn't run a standard deviation statistic on all of the defensive and offensive stats, so I'm not exactly sure if increments of +0.1 are large. By eyeball inspection, they seem to be large.
However, UK beat Wich St. It was a very, very close game (78-76, UK). One theory is that the stats aren't representative of inter-conference play. No offense to the Missouri Valley Conference, where Wichita State plays the majority of its regular season games, but Wichita State's "strength of schedule" (a measure of the quality of the opponents it faced) ranked 122nd out of the 351 NCAA Division 1 teams. Kentucky's strength of schedule ranked 11th. So the numbers for Wich St may be inflated by a weaker schedule than UK's.
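The question of whether a defensive gap of +0.1 is large could be checked by expressing it in standard deviations. The Python sketch below uses only the ten defensive stats quoted in this post, so it is a rough sanity check rather than a figure for the whole field; the helper name is made up.

```python
from statistics import stdev

# Defensive stats quoted in this post (ten teams only, not the full field).
defense = {
    "NDST": 1.52, "Oklahoma": 1.47, "Pitt": 1.57, "Colorado": 1.54,
    "Baylor": 1.60, "Creighton": 1.33, "VCU": 1.62, "SFA": 1.35,
    "Wichita St": 1.78, "Kentucky": 1.62,
}

def gap_in_sds(gap, values):
    """Express a difference between two teams in sample standard deviations."""
    return gap / stdev(values)

gap = defense["Wichita St"] - defense["Kentucky"]
print(gap_in_sds(gap, list(defense.values())))
```

Running the same calculation over all tournament teams would give a better sense of scale, and the same helper works for the offensive numbers.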
I am very proud to be a part of the #DarwinsBalls bracket this year, put together by my blue birdy friend MyTChondria. This bracket was initially created to raise money for a class that wanted to read The Immortal Life of Henrietta Lacks, a biographical account of the ethics and science surrounding HeLa cells, which have been used in medical research for decades. With the help of many scientists on Twitter, including the book's author, Rebecca Skloot, we were able to raise money for not just this class but for 5 other class projects via DonorsChoose.org. That's a lot of science for children who may not have been able to get exposure to these materials.
I highly recommend donating at least $5 (I've been donating $10) to a DonorsChoose project. I remember having teachers who sacrificed their money to benefit their students for science. I remember being able to dissect fetal pigs and make Rube Goldberg machines. I remember being able to learn about wave energy when someone donated a ton of Plexiglas to our school and we got to make giant solar ovens out of cardboard, Plexiglas and duct tape. I remember a lot of opportunities I got to have that many students don't get to have. All of that aided me to get on the path of science.
Check out Gerty-Z's blog about #DarwinsBalls. Even though bracketing is over, teaching and science are 24/7. Eat a peanut butter and jelly sandwich and drink water for lunch one day to save $10. Then use that to contribute to science in a classroom.