Dispatches from #SSAC16: Future points of interest
These posts are all about the 2016 Sloan Sports Analytics Conference (or SSAC16 for short). You can check out all of my posts related to SSAC16 by clicking this link. If you're attending, don't be shy about saying hi!
Many of these points came up in multiple panels, but there was no explicit call to action for any of them.
De-identification and handling of athlete data
In scientific research, there are many checks and balances to help protect the participants who contribute to our studies. The prevailing sentiment is that sports analytics currently serves the health and benefit of the athlete, which resembles the mission of most research ethics committees (maximize the benefits for participants while doing no harm). "Doing no harm" is difficult to achieve when sensitive data on biological and physical attributes could play a role in future salaries or endorsements. The best practice would be immediate de-identification of an athlete's data, but this raises questions: who controls the data, and how much control does an athlete have over their own data? Unlike publicly funded research, private industry can have different agreements for what data will be used and how. To this point, most "data" have come in the form of drug tests and physical fitness tests. Predictive models of performance are unlike drug tests and physicals in that the performance hasn't happened yet, so how can someone be penalized for something that hasn't occurred?
Another possible route is league intervention (or Players' Association intervention), in which a data ethics board is assembled to oversee how data are managed and used, in order to prevent potentially harmful identification of athletes. We've seen what happens when sensitive data is revealed in the media, and this will only occur more frequently if no safeguards are put in place soon.
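To make "immediate de-identification" a little more concrete, here is a minimal sketch of one way it could work: replace the athlete's ID with a keyed hash and drop the obvious direct identifiers. This is my own illustration, not something presented at the conference. The field names (`athlete_id`, `name`, `jersey_number`, `date_of_birth`) and the helper functions are hypothetical, and a real program would need a properly reviewed scheme.

```python
import hashlib
import hmac

# Hypothetical field names; a real dataset would define its own schema.
DIRECT_IDENTIFIERS = {"name", "jersey_number", "date_of_birth"}


def pseudonymize(athlete_id: str, secret_key: bytes) -> str:
    """Return a stable keyed hash (HMAC-SHA256) standing in for the athlete."""
    digest = hmac.new(secret_key, athlete_id.encode("utf-8"), hashlib.sha256)
    # Truncated for readability; this shortening is an illustrative choice.
    return digest.hexdigest()[:16]


def deidentify(record: dict, secret_key: bytes) -> dict:
    """Drop direct identifiers and swap the athlete ID for a pseudonym."""
    cleaned = {
        key: value
        for key, value in record.items()
        if key not in DIRECT_IDENTIFIERS and key != "athlete_id"
    }
    cleaned["subject"] = pseudonymize(record["athlete_id"], secret_key)
    return cleaned
```

Two caveats. First, this is pseudonymization rather than true anonymization: attributes left in the record (position, height, a distinctive injury history) can still point back to one person. Second, whoever holds the secret key can link pseudonyms back to athletes, which is exactly the "who controls the data" question in code form.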
Data collection and preprocessing: how to collect cleaner data/how to filter messy data
Data collection related to in-game or in-practice performance is emerging, but tool verification has not been discussed. Companies state that their product does XYZ, and a good number of them do, but for more sensitive data (like neural or muscular data), how do buyers know that what they think they are measuring is actually being measured accurately? In neuroscience, collecting data that are "clean," meaning they contain minimal signal unrelated to what you are trying to record, is of the utmost importance. With some techniques, movement means an entire dataset is ruined and unusable. It is hard to imagine collecting neural data from players during in-game performance and coming out with usable data. Better validation of tools is necessary for the advancement of biometrics in sports.
However, clean data collection is only one part of the data analysis process. Filtering data to eliminate artifacts is also needed, and this was not mentioned in any talk I attended. Many of the people coming into sports analytics come from very different fields, where filtering techniques vary and many are not standard across fields. As one person I talked to mentioned, analytics groups within teams probably have their own standards, but for aspiring analysts breaking into this field it would be advantageous for everyone to know a standard set of filtering techniques, to further ensure that the data we are trying to interpret are meaningful. On the flip side, if people are running analyses on unfiltered and possibly unusable data, what are they actually selling you?
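As a small illustration of what "filtering to eliminate artifacts" can mean, here is a sketch of amplitude-based epoch rejection: cut a signal into fixed windows and discard any window whose peak-to-peak range is implausibly large (a movement spike, for example). This is only one simple approach, and I am not claiming any team or vendor uses it. The function name, window length, and threshold are assumptions for the example, and sensible values depend entirely on the sensor.

```python
def reject_artifacts(samples, window, max_peak_to_peak):
    """Split samples into fixed-length windows and keep the plausible ones.

    Returns (kept_epochs, rejected_start_indices). A trailing partial
    window is dropped rather than evaluated.
    """
    kept, rejected = [], []
    for start in range(0, len(samples) - window + 1, window):
        epoch = samples[start:start + window]
        # Peak-to-peak amplitude: a crude but common sanity check.
        if max(epoch) - min(epoch) <= max_peak_to_peak:
            kept.append(epoch)
        else:
            rejected.append(start)
    return kept, rejected


# Hypothetical signal: the middle window contains a large spike.
signal = [0.1, 0.2, 0.1, 0.0,
          0.1, 5.0, 0.2, 0.1,
          0.0, 0.1, 0.2, 0.1]
kept, rejected = reject_artifacts(signal, window=4, max_peak_to_peak=1.0)
```

Here the middle window is rejected and the other two are kept. The point of a shared standard would be that two analysts given the same raw recording agree on which windows survive.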
Greater analytics transparency would help the academic sports scientist -- but no real benefit to the team sports scientist... right?
Many representatives of analytics companies were coy about what sort of analytics they used and instead talked about how their analytics were being applied. That's fine (application is important to know), but it doesn't help basic research scientists make larger advancements in the field. As many said, there have never been more advanced statistics and analyses available to sports at large than now. But looking at this field from the outside, and comparing it with neuroscience, there is a long way to go before academic sports analytics gets up to speed. In particular, access to data is both important and fragile: access to teams would be hard at the professional level, if not impossible. Even at the collegiate level, teams would be difficult to work with. But if there were openly available, de-identified data on standardized tasks from a team (or multiple teams), then data analytics in sports could begin to take off. But who does that benefit? And ultimately, how does that benefit teams and athletes?