RGC Grant Report - 2019-2021
INTRODUCTION
This document reports on research activities supported by a University of Alabama Small Grant Program (formerly RGC) grant awarded to Dr. Steven MacCall and Dr. Greg Bott to establish a new multi-phase research project in the area of sports information science. The specific focus of this research study is the development, implementation, and evaluation of a new data-driven method for indexing current and historical multimedia sports asset collections (i.e., still images and moving image clips), which is based on a UA patent application filed by Dr. MacCall (MacCall, 2015).
The data-driven method under study incorporates statistical play-by-play and player participation data into a semantic indexing process using linked data technologies and Wikibase software. Our initial focus in this study is Alabama Crimson Tide football, and the statistical game data we are incorporating comes from various sources, depending on the historical period and material form of each game's play-by-play record:
- Category 1 games: JSON-formatted play-by-play dataset files for all plays occurring in every game from the 2017 Alabama Crimson Tide football season were acquired from a publicly available data source.
- Category 2 games: Typed paper-based play-by-play datasets for all plays occurring in 2 games from the 1992 Alabama Crimson Tide football season were transcribed:
- Alabama v
- Alabama v
- Category 3 games: For games without complete play-by-play datasets due to gaps in the historical record, other documentary sources, including journalistic newspaper accounts of individual games, contain such data and can be mined to reconstruct game logs. We report on efforts to reconstruct the play-by-play datasets from two games played by the 1961 Alabama Crimson Tide football team:
- Alab
- Ala
- We also had access to the complete play-by-play dataset from an additional game in the 1961 season:
- Alal
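To illustrate how play-by-play and player participation data could feed a linked-data index of multimedia assets, the following is a minimal Python sketch. The identifiers and property names (`partOfGame`, `hasParticipant`, `depicts`) are hypothetical, and the plain (subject, predicate, object) tuples only stand in for Wikibase items and statements; this is neither the project's actual data model nor the Wikibase API.

```python
from dataclasses import dataclass, field

Triple = tuple[str, str, str]  # (subject, predicate, object)

@dataclass
class Play:
    """One play from a game's play-by-play record (illustrative fields only)."""
    play_id: str
    game_id: str
    quarter: int
    description: str
    participants: list[str] = field(default_factory=list)  # player identifiers

def play_to_triples(play: Play) -> list[Triple]:
    """Express a play and its participating players as linked-data statements."""
    triples = [
        (play.play_id, "partOfGame", play.game_id),
        (play.play_id, "quarter", str(play.quarter)),
        (play.play_id, "description", play.description),
    ]
    for player in play.participants:
        triples.append((play.play_id, "hasParticipant", player))
    return triples

def index_asset(asset_id: str, play: Play) -> list[Triple]:
    """Link a still image or clip to the play it depicts, plus that play's statements."""
    return [(asset_id, "depicts", play.play_id)] + play_to_triples(play)

def assets_with_player(triples: list[Triple], player: str) -> list[str]:
    """Find assets indexed by a player indirectly, via the plays they took part in."""
    plays = {s for s, p, o in triples if p == "hasParticipant" and o == player}
    return sorted({s for s, p, o in triples if p == "depicts" and o in plays})

if __name__ == "__main__":
    play = Play("play:001", "game:1992-01", 1, "Run for 5 yards", ["player:RB1", "player:OL1"])
    triples = index_asset("asset:photo-0001", play)
    print(assets_with_player(triples, "player:RB1"))  # ['asset:photo-0001']
```

In this sketch the asset is linked to a play rather than tagged with player names directly, so participant data recovered later (for example, from coaches' film) would extend what the asset can be found by without re-cataloging the asset itself.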
How much statistical game data can be recovered from the historical documentary records of the Bryant Museum?
The recovery of data from the historical record for the purpose of reconstructing the past in digital form is an active area of research in many fields, such as the recovery of climate data (Wheeler & García‐Herrera, 2008). For example, the RECLAIM Project (RECovery of Logbooks and International Marine data) is a concerted international effort to facilitate the recovery of archived marine weather observations that have been recorded in ships' logbooks for hundreds of years (Wilkinson et al., 2011). In the spirit of these and other studies of data recovery from the historical record, we apply a similar approach to recovering statistical play-by-play and player participation data from the various efforts to “log” the activity occurring in a sample of Alabama football games. This data can then be incorporated into the process of indexing historical multimedia assets in a future research study (see below). In this initial phase of the research, we investigate the availability of statistical play-by-play and player participation data in the large multimedia documentary collection of the Paul W. Bryant Museum at the University of Alabama.
RESEARCH QUESTION AND METHODOLOGY OVERVIEW
RQ: To what extent does the historical record, as reflected in the documentary collection of the Bryant Museum, contain recoverable statistical play-by-play and player participation data?
We will use trained volunteers to “mine” the data contained in historical documentary sources from two seasons of Alabama football history (1961 and 1992). The data recovered from these sample seasons will then be compared to a more recent “baseline” season (2017). The method for recovering statistical play-by-play and player participation data from a documentary collection depends on the format of the data source.
For example, play-by-play statistical game data from our baseline 2017 season will be recovered from JSON-encoded digital files using an automated method. By contrast, game data from our sample seasons (1961 and 1992) is not in a digital file format, so we deploy an approach in which volunteers read documentary materials or view documentary video or film in order to extract statistical play-by-play and player participation data.
SAMPLE
As noted in the previous section, the availability of recoverable game data in the Bryant Museum’s multimedia documentary collection is complicated by the collection's mix of digital and non-digital formats. We identified three categories of data sources to serve as the basis for our sample of historical Alabama football games:
- Category 1 games from the current JSON era (2017): Games from the 2017 football season were logged digitally, capturing play-by-play statistical data in a JSON-encoded file for each game of the season.
- Category 2 games from the previous handwritten/typewritten era (1992): Games from the 1992 football season were logged on paper, capturing play-by-play statistical data that must be transcribed by trained volunteer study participants.
- Category 3 games having no existing detailed game records (1961): Games from the 1961 football season were not formally logged (or no such records exist in the Bryant Museum collection). However, other documentary sources contain such data, including coaches’ film and journalistic newspaper accounts of games. Although these sources are likely to be more “spotty”, the data recovered from them by trained volunteer participants will allow us to reconstruct game logs.
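The automated recovery of Category 1 data described above could be sketched as follows. The JSON key names in `FIELD_MAP` (`period`, `yardsGained`, `playText`, and so on) and the optional top-level `"plays"` list are assumptions, since the structure of the 2017 files is not documented in this report. The idea is to flatten each game's plays into one fixed column layout, with missing values kept explicit, so that the same layout could also receive volunteer transcriptions from Category 2 and 3 games.

```python
import csv
import io
import json
from typing import Any

# ASSUMED source keys -> common column names. The real keys in the 2017 files
# would need to be checked and this mapping adjusted.
FIELD_MAP = {
    "period": "quarter",
    "down": "down",
    "distance": "distance",
    "yardsGained": "yards_gained",
    "playText": "description",
}
COLUMNS = ["game_id", "seq"] + list(FIELD_MAP.values())

def normalize_plays(raw_json: str, game_id: str) -> list[dict[str, Any]]:
    """Flatten one game's JSON play list into records with a fixed set of columns."""
    data = json.loads(raw_json)
    # Accept either a bare list of plays or an object with a "plays" list (assumption).
    plays = data["plays"] if isinstance(data, dict) else data
    records = []
    for seq, play in enumerate(plays, start=1):
        record: dict[str, Any] = {"game_id": game_id, "seq": seq}
        for src_key, column in FIELD_MAP.items():
            # A missing key becomes None, so gaps stay visible in the output.
            record[column] = play.get(src_key)
        records.append(record)
    return records

def records_to_csv(records: list[dict[str, Any]]) -> str:
    """Serialize records as CSV text; None values are written as empty cells."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()
```

Writing to a plain CSV layout is only one possible design choice; it has the advantage that volunteers transcribing paper sheets could fill in the same columns in a spreadsheet.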
DATA RECOVERY/REFORMATTING PROCEDURES
Overview
The following table summarizes the recovery and reformatting procedures for each category of games:
| Games | Play-by-Play Data | Player Participation Data |
|---|---|---|
| Category 1 | Data reformatting from JSON-encoded files | TBD* |
| Category 2 | Transcribed from paper-based game stats | Transcribed from game recording |
| Category 3 | Transcribed from newspaper game accounts | Transcribed from coaches’ film |
(*) TBD: data reformatting from digital files if available; otherwise, transcribed from game video.
Reformatting Data from Category 1 Games
Data sources:
1. Play-by-play data: JSON-encoded files for each game in the 2017 season, obtained from https://www.reddit.com/r/CFBAnalysis/comments/6htfc6/play_by_play_data_dump_20012016
2. Player participation data: We are seeking digital sources for this information. If none exist, we will use volunteers to recover player participation data using the methods for Category 2 and Category 3 games described below.
Data Recovery Procedures for Volunteer Participants: Category 2 and 3 Games
As noted above, Category 2 and Category 3 games differ from Category 1 games in that no JSON-encoded digital data sources are available for these football seasons. Therefore, we will recruit and train volunteers to recover play-by-play and player participation data from the following documentary sources held by the Bryant Museum:
1. Handwritten/typed play-by-play data sheets (1992 season). Volunteers will transcribe the existing handwritten/typed play-by-play data into digital form.
2. Newspaper accounts of games (1961 season). We have noted that journalists who wrote about games from that period tended to be much more descriptive about many of the plays that occurred in a given game. This will allow us to evaluate the amount of play-level data recoverable from journalistic accounts of games and to compare that amount to the baseline 2017 season.
3. Coaches’ film of historical games from the Museum’s documentary collection (1992 and 1961 seasons). The UA Athletics Department recorded valuable game footage over a period of decades. The earliest footage dates from the 1920s, although games were not filmed regularly until the 1940s, and coaches’ films were shot on 16mm film until the 1980s. These film/video recordings provide access to recoverable player participation data.
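One way to make the comparisons posed by the evaluation questions in the next section measurable is sketched below: for each season, count how many plays per game were recovered and how often each play-level field is filled in, then express the sample seasons relative to the 2017 baseline. These metrics and the field names in `FIELDS` are illustrative assumptions, not measures defined by the study; the same functions could be applied to player participation records for Evaluation Question #2.

```python
from collections import Counter
from statistics import mean
from typing import Any, Optional

# Hypothetical play-level fields; a real comparison would use whatever
# columns the shared record layout ends up with.
FIELDS = ["quarter", "down", "distance", "yards_gained", "description"]

def field_completeness(records: list[dict[str, Any]], fields: list[str] = FIELDS) -> dict[str, float]:
    """Fraction of play records with a non-empty value for each field."""
    if not records:
        return {f: 0.0 for f in fields}
    return {
        f: sum(1 for r in records if r.get(f) not in (None, "")) / len(records)
        for f in fields
    }

def relative_to_baseline(sample: dict[str, float],
                         baseline: dict[str, float]) -> dict[str, Optional[float]]:
    """Sample completeness as a share of baseline completeness (None if the baseline lacks the field)."""
    return {f: (sample[f] / baseline[f] if baseline.get(f) else None) for f in sample}

def mean_plays_per_game(records: list[dict[str, Any]]) -> float:
    """Average number of recovered plays per game."""
    counts = Counter(r["game_id"] for r in records)
    return float(mean(counts.values())) if counts else 0.0
```

Completeness is reported per field rather than as a single score because sources may differ in what they preserve; for example, a source might keep play descriptions and yardage but omit down and distance, and a single average would hide that difference.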
EVALUATION
We will use the games from the 2017 season (Category 1) as our baseline for comparison because the JSON files for each game, when combined with player participation data, will provide a complete play-by-play record for that season. When volunteers have finished recovering data from the 1992 (Category 2) and 1961 (Category 3) seasons, we will compare the results for those seasons with the baseline 2017 season as follows:
1. Evaluation Question #1: How much play-by-play data was recovered from games played in the Category 2 and Category 3 seasons compared to the Category 1 season?
2. Evaluation Question #2: How much player participation data was recovered from games played in the Category 2 and Category 3 seasons compared to the Category 1 season?
FOLLOW-UP RESEARCH
The results of this Phase 1 research study will be used as justification for further grant applications. Our next objective is to apply for a federally funded research grant from the Institute of Museum and Library Services. The focus of that research will be to evaluate how well our indexing method scales to a much larger number of football seasons than are included in the sample for this Phase 1 study.
CITED REFERENCES
MacCall, S. L. (2015). Systems and methods for digital asset organization. U.S. Utility Patent Application No. 14/971,463, filed December 15, 2015.
Wheeler, D., & García‐Herrera, R. (2008). Ships' logbooks in climatological research. Annals of the New York Academy of Sciences, 1146, 1-15. doi:10.1196/annals.1446.006
Wilkinson, C., Woodruff, S. D., Brohan, P., Claesson, S., Freeman, E., Koek, F., Lubker, S. J., Marzin, C., & Wheeler, D. (2011). Recovery of logbooks and international marine data: The RECLAIM project. International Journal of Climatology, 31, 968-979. doi:10.1002/joc.2102