Difference between revisions of "RGC Grant Report - 2019-2021"

From Wikibase.slis.ua.edu

Latest revision as of 02:50, 18 February 2020

INTRODUCTION

This document reports on research activities supported by a University of Alabama Small Grant Program (formerly RGC) grant awarded to Dr. Steven MacCall and Dr. Greg Bott to establish a new multi-phase research project in the area of sports information science. The specific focus of this research study is the development, implementation, and evaluation of a new data-driven method for indexing current and historical multimedia sports asset collections (i.e., still images and moving image clips), which is based on a UA patent written by Dr. MacCall (MacCall, 2020).

The data-driven method under study incorporates statistical play-by-play and player participation data into a semantic indexing process using linked data technologies and Wikibase software. The initial sport we are investigating is football, with the Alabama Crimson Tide football team as our case. The statistical game data we incorporate comes from various sources, depending on the historical period and material origin of each game's play-by-play dataset:

  1. Category 1 games (2017 season): JSON-formatted play-by-play dataset files for all plays occurring in every game from the 2017 Alabama Crimson Tide football season were acquired from a publicly available data source.
  2. Category 2 games (1992 season): Typed paper-based play-by-play datasets for all plays occurring in 2 games from the 1992 Alabama Crimson Tide football season were transcribed:
    1. Alabama v
    2. Alabama v
  3. Category 3 games (1961 season): For games without complete play-by-play datasets due to gaps in the historical record, other documentary sources, including journalistic newspaper accounts of individual games, can be mined to reconstruct play-by-play datasets. We report on such efforts for 2 games played by the 1961 Alabama Crimson Tide football team:
    1. Alab
    2. Ala
  4. We also had access to the complete play-by-play dataset from an additional game in the 1961 season:
    1. Alal
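The report does not describe how plays are modeled in its Wikibase instance. As a minimal sketch, assuming a play is modeled as an entity with a few statements, the following Python shows how one play record could be expressed as linked-data triples in N-Triples syntax. The base IRI `http://example.org/entity/`, the property names `partOfGame`, `playNumber`, `description`, and `participant`, and the fields of the `Play` class are hypothetical illustrations, not the project's actual data model.

```python
from dataclasses import dataclass, field

# Hypothetical base IRI; the project's real Wikibase IRIs are not given in the report.
BASE = "http://example.org/entity/"
XSD_INTEGER = "<http://www.w3.org/2001/XMLSchema#integer>"


@dataclass
class Play:
    game_id: str                 # hypothetical game identifier, e.g. "1992_game_1"
    play_number: int             # order of the play within the game
    description: str             # free-text description of the play
    players: list = field(default_factory=list)  # names of participating players


def iri(local: str) -> str:
    """Wrap a hypothetical local name in an absolute IRI (assumes no illegal IRI characters)."""
    return f"<{BASE}{local}>"


def literal(text: str) -> str:
    """Quote text as an N-Triples string literal, escaping backslashes, quotes, and line breaks."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def play_to_triples(play: Play) -> list:
    """Emit one N-Triples statement per fact about the play, plus one per participant."""
    subject = iri(f"{play.game_id}_play_{play.play_number}")
    triples = [
        f"{subject} {iri('partOfGame')} {iri(play.game_id)} .",
        f"{subject} {iri('playNumber')} \"{play.play_number}\"^^{XSD_INTEGER} .",
        f"{subject} {iri('description')} {literal(play.description)} .",
    ]
    for name in play.players:
        # Spaces in names are replaced so the player IRI stays valid.
        triples.append(f"{subject} {iri('participant')} {iri('player_' + name.replace(' ', '_'))} .")
    return triples


for line in play_to_triples(Play("g1", 1, "Rush for 4 yards", ["Joe Smith"])):
    print(line)
```

Keeping each play as its own entity would let multimedia assets later point at individual plays, which is the kind of linkage the indexing method relies on; in Wikibase proper the same facts would be entered as item statements rather than raw triples.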


BACKGROUND

The recovery of data from the historical record to reconstruct the past in digital form is an active area of research in many fields, such as the recovery of climate data (Wheeler & García‐Herrera, 2008). For example, the RECLAIM Project (RECovery of Logbooks and International Marine data) is a concerted international effort to recover archived marine weather observations that have been recorded in ships' logbooks for hundreds of years (Wilkinson et al., 2011). In the spirit of these and other studies on data recovery from the historical record, we apply a similar approach to recover statistical play-by-play and player participation data from past efforts to “log” the activity in a sample of Alabama football games. This data can then be incorporated into the process of indexing historical multimedia assets in a future research study (see below).

RESEARCH QUESTION AND METHODOLOGY OVERVIEW

The method for recovering statistical play-by-play and player participation data contained in a documentary collection depends on the format of the data source:

  1. Play-by-play statistical game data from our Category 1 games was recovered from JSON-encoded digital files using automated methods combined with a data processing pipeline.
  2. Play-by-play statistical game data from our Category 2 games was transcribed by volunteers from structured, typed paper-based play-by-play sheets.
  3. Play-by-play statistical game data from our Category 3 games (1961 season) is not available in a digital file format or as structured records, so it was transcribed from unstructured data sources: volunteers read journalistic accounts of games from the Tuscaloosa News in order to extract statistical play-by-play and player participation data.
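The report does not describe the Category 1 processing pipeline in detail. As a minimal sketch, assuming each game's JSON file holds a top-level `plays` array whose entries carry `quarter`, `down`, `distance`, and `text` keys (hypothetical names, not the documented schema of the source files), the following Python flattens one game into rows with a fixed set of columns and writes them to CSV. The function names `flatten_game` and `rows_to_csv` are also hypothetical; a common tabular shape like this is one way volunteer transcriptions of Category 2 and Category 3 games could be lined up against the baseline.

```python
import csv
import io
import json

# Hypothetical output columns; the real 2017 JSON schema is not described in the report.
FIELDS = ["game_id", "play_number", "quarter", "down", "distance", "play_text"]


def flatten_game(raw_json: str, game_id: str) -> list:
    """Turn one game's JSON document into flat play rows with a fixed set of columns."""
    game = json.loads(raw_json)
    rows = []
    for number, play in enumerate(game.get("plays", []), start=1):
        rows.append({
            "game_id": game_id,
            "play_number": number,              # position of the play in the file
            "quarter": play.get("quarter"),     # missing keys become None
            "down": play.get("down"),
            "distance": play.get("distance"),
            "play_text": (play.get("text") or "").strip(),
        })
    return rows


def rows_to_csv(rows: list) -> str:
    """Serialize play rows to CSV text; None values are written as empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


sample = '{"plays": [{"quarter": 1, "down": 1, "distance": 10, "text": " Rush for 4 yards "}]}'
print(rows_to_csv(flatten_game(sample, "2017_game_1")))
```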


RESEARCH QUESTIONS

Category 1 games from the 2017 season were used as a baseline against which to compare the results of our volunteers' transcription of Category 2 and Category 3 games:

  1. Research Question #1: How much play-by-play data was recovered from games played in the Category 2 and Category 3 seasons compared to the Category 1 season?
  2. Research Question #2: How much player participation data was recovered from games played in the Category 2 and Category 3 seasons compared to the Category 1 season?
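The report does not state how "how much data was recovered" is measured. One possible measure, sketched below in Python under that assumption, compares a sample season with the 2017 baseline in two ways: the ratio of plays recorded per game, and, for each field, the ratio of the share of plays in which that field is filled in. The metric and the function names `field_completeness` and `recovery_vs_baseline` are hypothetical illustrations, not the study's evaluation procedure.

```python
def field_completeness(rows: list, fields: list) -> dict:
    """Share of play rows in which each field has a non-empty value."""
    if not rows:
        return {f: 0.0 for f in fields}
    return {
        f: sum(1 for r in rows if r.get(f) not in (None, "")) / len(rows)
        for f in fields
    }


def recovery_vs_baseline(sample_rows: list, sample_games: int,
                         baseline_rows: list, baseline_games: int,
                         fields: list) -> dict:
    """Compare a sample season's plays per game and field completeness to the baseline season."""
    sample_ppg = len(sample_rows) / sample_games
    baseline_ppg = len(baseline_rows) / baseline_games
    sample_c = field_completeness(sample_rows, fields)
    base_c = field_completeness(baseline_rows, fields)
    return {
        "plays_per_game_ratio": sample_ppg / baseline_ppg,
        # None marks a field the baseline itself never records, so no ratio is defined.
        "field_ratio": {f: (sample_c[f] / base_c[f] if base_c[f] else None) for f in fields},
    }
```

A ratio near 1.0 would mean the sample season approaches the baseline's coverage; the same comparison could be run on player participation fields to address Research Question #2.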



RESULTS




FOLLOW UP RESEARCH

The results of this phase 1 research study will be used to justify further grant applications. Our next objective is to apply for a federally funded research grant from the Institute of Museum and Library Services. The focus of that research will be to evaluate how well our indexing method scales to a much larger number of football seasons than the sample used in this phase 1 study.

CITED REFERENCES

  1. MacCall, S. L. (2020). Systems and methods for digital asset organization. U.S. Patent 10,534,812.
  2. Wheeler, D., & García‐Herrera, R. (2008). Ships' logbooks in climatological research. Annals of the New York Academy of Sciences, 1146, 1-15. doi:10.1196/annals.1446.006
  3. Wilkinson, C., Woodruff, S. D., Brohan, P., Claesson, S., Freeman, E., Koek, F., Lubker, S. J., Marzin, C., & Wheeler, D. (2011). Recovery of logbooks and international marine data: The RECLAIM project. International Journal of Climatology, 31, 968-979. doi:10.1002/joc.2102