RGC Grant Report - 2019-2021
INTRODUCTION
This document reports on research activities supported by a University of Alabama Small Grant Program (formerly RGC) grant awarded to Dr. Steven MacCall and Dr. Greg Bott to establish a new multi-phase research project in the area of sports information science. The specific focus of this research study is the development, implementation, and evaluation of a new data-driven method for indexing current and historical multimedia sports asset collections (i.e., still images and moving image clips), which is based on a UA patent written by Dr. MacCall (MacCall, 2020).
The data-driven method under study incorporates statistical play-by-play and player participation data into a semantic indexing process using linked data technologies and Wikibase software. The initial subject of this study is Alabama Crimson Tide football, and the statistical game data we are incorporating comes from various sources depending on the historical period and material origin of each game's play-by-play dataset:
- Category 1 games: JSON-formatted play-by-play dataset files for all plays occurring in every game from the 2017 Alabama Crimson Tide football season were acquired from a publicly available data source.
- Category 2 games: Typed paper-based play-by-play datasets for all plays occurring in 2 games from the 1992 Alabama Crimson Tide football season were transcribed:
- Alabama v
- Alabama v
- Category 3 games: For those games without complete play-by-play datasets due to gaps in the historical record, there are other documentary sources containing such data, including journalistic newspaper accounts of individual games, that can be mined to reconstruct play-by-play datasets for individual games. We report on such efforts for 2 games played by the 1961 Alabama Crimson Tide football team:
- Alab
- Ala
- We also had access to the complete play-by-play dataset from an additional game in the 1961 season:
- Alal
BACKGROUND
The recovery of data from the historical record for purposes of reconstructing the past in digital form is an active area of research across many fields, such as the recovery of climate data (Wheeler & García‐Herrera, 2008). For example, the RECLAIM Project (RECovery of Logbooks and International Marine data) is a concerted international effort to facilitate the recovery of archived marine weather observations that have been recorded in ships' logbooks for hundreds of years (Wilkinson et al., 2011). It is in the spirit of these and other studies on data recovery from the historical record that we apply a similar method to recover statistical play-by-play and player participation data from various efforts to “log” the activity occurring in a sample of Alabama football games. This data can then be incorporated into the process of indexing historical multimedia assets in a future research study (see below).
RESEARCH QUESTION AND METHODOLOGY OVERVIEW
The method for recovering statistical play-by-play and player participation data contained in a documentary collection depends on the format of the data source:
- Play-by-play statistical game data from our Category 1 games was recovered from JSON-encoded digital files using automated methods combined with a data processing pipeline.
- Play-by-play statistical game data from our Category 2 games was transcribed from structured data by volunteers.
- Play-by-play statistical game data from our Category 3 games was transcribed from unstructured data sources. Data from the 1961 and 1992 seasons is not in a digital file format, so for Category 3 games we deployed an approach in which volunteers read journalistic accounts of games from the Tuscaloosa News in order to extract statistical play-by-play and player participation data.
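To illustrate the Category 1 case, the following is a minimal sketch of how play records and player participation might be extracted from a JSON-encoded play-by-play file. The structure of the actual 2017 source files is not described in this report, so the field names (`plays`, `quarter`, `description`, `players`), the `Play` record, and the sample data are all hypothetical.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Play:
    """One play from a play-by-play log (hypothetical schema)."""
    quarter: int
    description: str
    players: list = field(default_factory=list)


def load_plays(raw_json: str) -> list:
    """Parse a JSON document into Play records, skipping incomplete entries."""
    document = json.loads(raw_json)
    plays = []
    for entry in document.get("plays", []):
        # An entry without a quarter or description cannot be placed in the game.
        if "quarter" not in entry or "description" not in entry:
            continue
        plays.append(
            Play(
                quarter=int(entry["quarter"]),
                description=entry["description"],
                players=list(entry.get("players", [])),
            )
        )
    return plays


def participants(plays: list) -> list:
    """Return the sorted, de-duplicated names of players appearing in any play."""
    return sorted({name for play in plays for name in play.players})


# Made-up sample data for illustration only; not taken from any real game.
SAMPLE = """
{
  "plays": [
    {"quarter": 1, "description": "Rush for 4 yards", "players": ["Player A"]},
    {"quarter": 1, "description": "Pass complete for 12 yards",
     "players": ["Player B", "Player C"]},
    {"description": "Entry with no quarter recorded"}
  ]
}
"""

if __name__ == "__main__":
    plays = load_plays(SAMPLE)
    print(len(plays), "plays;", "participants:", participants(plays))
```

Transcriptions of Category 2 and Category 3 games could be entered into the same record shape, which would make counts from all three categories directly comparable.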
RESEARCH QUESTIONS
Category 1 games from the 2017 season were used as a baseline against which to compare the results of transcribing activities of our volunteers working with Category 2 games and Category 3 games as follows:
- Research Question #1 – How much play-by-play data was recovered from Category 2 and Category 3 games compared to Category 1 games?
- Research Question #2 – How much player participation data was recovered from Category 2 and Category 3 games compared to Category 1 games?
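One possible way to quantify "how much" data was recovered is a recovery ratio: the mean number of items (plays or participating players) recovered per game in a category, divided by the mean number per game in the Category 1 baseline. The sketch below assumes this metric; the report does not specify one, and the function name and the numbers in the example are hypothetical placeholders, not study results.

```python
def recovery_ratio(recovered_per_game, baseline_per_game):
    """Mean recovered count per game divided by mean baseline count per game.

    Both arguments are lists of per-game counts (e.g. plays or players).
    """
    if not recovered_per_game or not baseline_per_game:
        raise ValueError("both lists must be non-empty")
    baseline_mean = sum(baseline_per_game) / len(baseline_per_game)
    if baseline_mean == 0:
        raise ValueError("baseline mean must be non-zero")
    recovered_mean = sum(recovered_per_game) / len(recovered_per_game)
    return recovered_mean / baseline_mean


if __name__ == "__main__":
    # Placeholder counts for illustration only; these are not study results.
    baseline = [100, 100]   # items per game in the baseline category
    recovered = [50, 30]    # items per game recovered by transcription
    print(f"recovery ratio: {recovery_ratio(recovered, baseline):.2f}")
```

Normalizing per game keeps the comparison meaningful even though the categories contain different numbers of games.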
RESULTS
kkk
FOLLOW UP RESEARCH
The results of this phase 1 research study will be used as justification for further grant applications. Our next objective is to apply for a federally funded research grant from the Institute of Museum and Library Services. The focus of this research will be to evaluate the scaling of our indexing method to include a much larger number of football seasons than is in the sample for this phase 1 research study.
CITED REFERENCES
- MacCall, S. L. (2020). Systems and methods for digital asset organization. U.S. Patent 10,534,812.
- Wheeler, D., & García‐Herrera, R. (2008). Ships' logbooks in climatological research. Annals of the New York Academy of Sciences, 1146, 1-15. doi:10.1196/annals.1446.006
- Wilkinson, C., Woodruff, S. D., Brohan, P., Claesson, S., Freeman, E., Koek, F., Lubker, S. J., Marzin, C., & Wheeler, D. (2011). Recovery of logbooks and international marine data: the RECLAIM project. International Journal of Climatology, 31, 968-979. doi:10.1002/joc.2102