Chronology: Data-driven Sports Image Indexing Research
Introduction and Background
This chronology reports on research (theory, empirical, and grant/patent related), teaching, service, and entrepreneurial activities investigating data-driven sports image indexing that were led by Steven L. MacCall, PhD, associate professor in the School of Library and Information Studies (SLIS) at the University of Alabama.
This research program investigates the efficiency and effectiveness of a semantic indexing method designed to facilitate the search for individual game images using queries that include game statistical situations (e.g., retrieve images associated with plays occurring within a given season that resulted in touchdowns and occurred on 3rd down). The method, based on a recently issued UA patent, does not follow the traditional asset-by-asset approach, which creates an individual metadata record for each asset. Instead, it inverts the traditional process: each game is organized on a play-by-play basis using structured time segments, and each asset is then “attached” to the appropriate play via an actual or simulated time-based parameter. Because there are far fewer games to organize than individual images to index, this reduces the time spent organizing, which is the source of our claim for efficiency and effectiveness.
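The inversion described above can be illustrated with a minimal Python sketch. It is not the patented method or the project's software; the `Play` fields, the use of seconds as the time parameter, and the sample records are all assumptions made up for illustration. Each play is a time segment carrying its statistical situation, and each asset is attached to the play whose segment contains the asset's timestamp.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class Play:
    # All field names are hypothetical; times are seconds on an assumed game clock.
    start: float
    end: float
    season: int
    down: int
    result: str
    assets: list = field(default_factory=list)


def attach_assets(plays, assets):
    """Attach each (asset_id, timestamp) pair to the play whose segment contains it.

    Returns the ids of assets that fall between plays (no match).
    """
    ordered = sorted(plays, key=lambda p: p.start)
    starts = [p.start for p in ordered]
    unmatched = []
    for asset_id, ts in assets:
        i = bisect_right(starts, ts) - 1  # last play starting at or before ts
        if i >= 0 and ts <= ordered[i].end:
            ordered[i].assets.append(asset_id)
        else:
            unmatched.append(asset_id)
    return unmatched


def find_assets(plays, **situation):
    """Return assets attached to plays matching every given situation value."""
    return [
        asset
        for p in plays
        if all(getattr(p, key) == value for key, value in situation.items())
        for asset in p.assets
    ]


# Invented sample data, not taken from any real game.
plays = [
    Play(0, 8, 2017, 1, "rush"),
    Play(40, 47, 2017, 2, "incomplete"),
    Play(80, 91, 2017, 3, "touchdown"),
]
assets = [("img_001.jpg", 5), ("img_002.jpg", 85), ("img_003.jpg", 60)]

unmatched = attach_assets(plays, assets)
third_down_tds = find_assets(plays, season=2017, down=3, result="touchdown")
```

In this sketch the descriptive work is done once per play, and any number of images inherit that description through a single timestamp lookup, which is the intuition behind the efficiency claim.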
Indexing research is a long-standing area for information science investigators. Applications of such research include the improvement of systems design (efficiency and effectiveness) as well as informing the teaching of information professionals to better prepare them for the technological intensities of the modern metadata work environment. In a professional school such as SLIS, it is vitally important that full-time faculty members have teaching and service activities informed by their research.
Current Research Questions
One would expect that research questions would evolve over time, and in that spirit, these are the current primary research questions under investigation in this research program:
- Effectiveness: Can data-driven indexing methods serve a maximal indexing objective in which all identifiable named entities are indexed based on all of their observable features and attributes?
- Efficiency: Can data-driven indexing methods scale to meet the maximal indexing objective and also match the rate at which current sports images (photo and video) are captured by digital cameras and historical sports images (photo and video) are digitized from the historical record?
- Integration: Can current born digital sports images and digitized historical images be indexed in such a way that results in a single database application rather than in multiple silos?
- Identification: Can a maximal semantic indexing method incorporating statistical play-by-play datasets provide sufficient context for identifying images based on visible entities and attributes?
Research Result Highlights and Working Software Demonstrations
Semantic Indexing Era: Wikibase Software (2018 to present)
Since summer of 2018 (see entry for 2018 in chronology below), I have been able to take advantage of a state-of-the-art research and development software platform (Wikibase) that allowed me to investigate my research questions using semantic indexing methods rather than using conventional image indexing software (see subsection just below).
Since summer of 2018, we have achieved the following research milestones (follow links for more information about each):
- Research questions 1 and 2: Image indexing efficiency and effectiveness
- Games from the 2017 Alabama Crimson Tide football season: We have completed the semantic indexing of every single play that occurred in all games from the 2017 Alabama Crimson Tide football season using a semantic indexing method that incorporates JSON-formatted statistical play-by-play datasets into a semantic indexing process via the creation of a semi-automated data processing pipeline. This effort was partially funded by two 2018 grants that we received; see chronology below. (See 2018-19 Academic Year Research Report for detailed documentation leading to this milestone.)
- Games from the 1992 Alabama Crimson Tide football season: We have completed RGC grant-funded work (see 2018 entry in chronology below) that allowed us to investigate the recovery of play-by-play data from the historical record. For games in 1992, the historical record consisted of typed play-by-play datasets requiring manual transcription by volunteers, which allowed us to investigate the inclusion into our data processing pipeline of a crowdsourcing method for historical data transcription. A summary of our results (additional information in a conference paper presentation reported in chronology below):
- Alabama versus Tulane on October 10, 1992 at the Louisiana Superdome in New Orleans, LA; RECOVERED PLAY-BY-PLAY DATA: 176 plays from 26 drives
- Alabama versus LSU on November 7, 1992 at Tiger Stadium in Baton Rouge, LA; RECOVERED PLAY-BY-PLAY DATA: 185 plays from 26 drives
- Games from the 1961 Alabama Crimson Tide football season: As part of the above-mentioned RGC grant, we also investigated games from the 1961 season. For games from 1961, there were no historical records available containing play-by-play data. However, there are other documentary sources of such data, including journalistic newspaper accounts of those games. In our study, we investigated the inclusion into our data processing pipeline of a crowdsourcing method for historical data extraction from newspaper articles. We chose to use articles published in the Tuscaloosa News (available online). A summary of our results (additional information in a conference paper presentation reported in chronology below):
- Alabama versus Tulane on September 30, 1961 at Ladd Stadium in Mobile, AL; RECOVERED PLAY-BY-PLAY DATA: 104 plays and 22 drives
- Alabama versus Vanderbilt on October 7, 1961 at Dudley Field in Nashville, TN; RECOVERED PLAY-BY-PLAY DATA: 129 plays and 25 drives
- Special game from the 1961 Alabama Crimson Tide football season: When working with the 1961 season as described in the previous section, we came across a fully reconstructed game, the "Iron Bowl" game against Auburn on December 2, that had been converted into a television production using digitized coaches film and historical accounts maintained at the Paul W. Bryant Museum. Based on the availability of this digitized full-game content, we were able to index every play of that game as if it were a game from the 2017 season (described above).
- Research question 3: Integration: After completing the semantic indexing of individual plays from the three seasons of Alabama Crimson Tide football as described above, we were able to successfully evaluate the querying of plays and the photos and video clips associated with those plays as if they were all in the same semantic database.
- Research question 4: Identification: Crimson Tide Photos, a unit in UA Athletics, provided us with a random sample of born digital photos that were taken during games from the 2017 Alabama Crimson Tide football season, and we were able to identify each image in terms of the play context and the game statistical situation at the time that each photo was taken. (See Plays with Example UA Images May 2019 for documentation of this milestone.)
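The role of JSON-formatted play-by-play data in the milestones above, and the idea of querying several seasons as one database, can be sketched as follows. This is a rough illustration only: the record fields, the `demo-` identifiers, and the sample values are invented, and plain Python tuples stand in for Wikibase statements. It does not reproduce the project's actual pipeline or data model.

```python
import json

# Invented placeholder records; not real play-by-play data.
RAW = """
[
  {"play_id": "demo-2017-001", "season": 2017, "down": 3, "result": "touchdown"},
  {"play_id": "demo-1992-001", "season": 1992, "down": 3, "result": "touchdown"},
  {"play_id": "demo-1961-001", "season": 1961, "down": 1, "result": "rush"}
]
"""


def play_to_statements(play):
    """Turn one JSON play record into (subject, property, value) statements."""
    subject = play["play_id"]
    return [
        (subject, prop, value)
        for prop, value in play.items()
        if prop != "play_id"
    ]


def match(statements, **conditions):
    """Return sorted subjects whose statements satisfy every condition."""
    by_subject = {}
    for subject, prop, value in statements:
        by_subject.setdefault(subject, {})[prop] = value
    return sorted(
        subject
        for subject, props in by_subject.items()
        if all(props.get(key) == value for key, value in conditions.items())
    )


# One statement store for all seasons, rather than one silo per era.
statements = []
for record in json.loads(RAW):
    statements.extend(play_to_statements(record))

third_down_touchdowns = match(statements, down=3, result="touchdown")
```

Because born-digital and recovered historical plays end up as statements of the same shape, a single query spans 2017 and 1992 without knowing which source each play came from.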
Conventional Indexing Era: Omeka and ContentDM Software (2008 to 2018)
During this period, SLIS students in my annual spring semester Metadata course (LS 566) benefited from the application of my research as I was able to use it to inform my teaching. Specifically, my research served as the basis for their course project work, which involved the application of indexing theories by way of applying the standard/conventional approach to the indexing of a set of images. These images were provided by Ken Gaddy, Director of the Paul W. Bryant Museum, and they were both digitized black and white photos of Alabama Crimson Tide football games from the 1975 season and also born digital color images from the 2010 National Championship game.
The indexing of these images was accomplished using two different software applications.
- Omeka software (2011-2018): Here is a representative example of the end result of the last of such indexing projects from the Spring 2018 semester of this course.
- ContentDM (2008-2010)
Chronology of Research and Other Outputs
The entries below are in reverse chronological order and are coded as follows:
- RESEARCH
- RESEARCH - Theory
- RESEARCH - Empirical
- RESEARCH - Patent activity/actions
- RESEARCH - Grant activity
- TEACHING
- SERVICE
- Entrepreneurial activity
epiphany
2020
MacCall, S.L. (2020). Systems and methods for digital asset organization. U.S. Patent number 10,534,812.
Partnership established with Dr. Yu Gan, assistant professor of Electrical and Computer Engineering at the University of Alabama. Dr. Gan is an expert in digital image processing, and he is working with a master's student to extract player participation data from born digital and digitized historical video from a 2017 game and a game from 1961.
MacCall, S.L. (In review). Data-driven semantic DAM indexing incorporating statistical play-by-play game logs: A linked data application using Wikibase from the 2017 football season of the Alabama Crimson Tide. Conference paper submitted to 2020 LD4 Conference on Linked Data in Libraries, College Station, TX.
MacCall, S.L., Liu, H., & Anderson, C.M. (In review). Statistical data recovery from historical documentation of Alabama football games using Wikibase as a repository. Conference paper submitted to Connecting Collections as Data: Transforming Communities, Sharing Knowledge, and Building Networks with International GLAM Labs, Washington, DC.
Adopted Wikibase as image repository software for the semantic indexing project in my newly developed LS 590 Linked Data course at SLIS.
2019
Anderson, C.M., Liu, H., & MacCall, S.L. (2019). Crowdsourcing in a semantic indexing workflow for efficiently organizing historical multimedia sports collections. Poster accepted for the 2020 Annual Meeting of the Alabama Library Association, Birmingham, AL.
MacCall, S.L., Liu, H., & Anderson, C.M. (2019). How much statistical data can be recovered from Alabama football history? Piloting a crowdsourced approach using Wikibase as data repository. Conference paper presented at 2019 Digitorium Digital Humanities Conference, Tuscaloosa, AL. [UA Institutional Repository deposit: https://ir.ua.edu/handle/123456789/6574]
2018
MacCall, S.L. (2018). Investigation of a data-driven indexing method for multimedia asset collections in sports: Phase 2: Developing SLIS research capacity for key linked open data technologies. University of Alabama School of Library and Information Studies Research Fund - $1,000. Funded.
MacCall, S.L., & Bott, G. (2018). Investigation of a data-driven indexing method for multimedia asset collections in sports: Phase 1: How much data can be recovered from Alabama football history? University of Alabama Office of Research and Development Research Grants Committee Level 1 Program - $6,000. Funded.
2016
MacCall, S.L., McMillan, D.J., Vargo, C.J., Bradley, S.B., & Aversa, E.A. (2016). Efficiency, integration, interoperability: A 21st century approach to organizing sports digital assets for all libraries. Knight Foundation’s News Challenge for Libraries: How Might Libraries Serve 21st Century Information Needs? - pre-budget grant submission. Not funded.
2015
MacCall, S.L. (Filed December 15, 2015). Systems and methods for digital asset organization. U.S. Utility Patent Application number 14/971,463.
MacCall, S.L., Vargo, C.J., Bradley, S.B., & Aversa, E.A. (2015). Development of a novel digital asset organizing method in sports. National Science Foundation - Small Business Innovation Research (SBIR) Phase I Grant - $225,000 ($74,925 sub-award to University of Alabama). Not funded.
2014
MacCall, S.L. [Chief Scientist for MaxOrg, LLC], Aversa, E.A. [CEO for MaxOrg, LLC], & McMillan, D.J. [Technology Officer for MaxOrg, LLC]. (2014, 2015, 2016). Crimson Canvas - MaxOrg, LLC. Program participation: Commercial Development of Faculty Developed UA Intellectual Property sponsored by Alabama Innovation and Mentoring of Entrepreneurs (AIME).
MaxOrg, LLC formed as a faculty-led startup to contribute to the commercial development of UA intellectual property (see 2020 and 2015 entries above for patent issued and patent filed data, respectively).
2013
MacCall, S.L. & Gaddy, K. (2013). Optimal organizing of digital images in sports: A project of the Paul W. Bryant Museum and UA SLIS. Presented at the 2013 University of Alabama Program in Sports Communication Sports Symposium, Tuscaloosa, AL. [Slideshare: http://tinyurl.com/k875g5j]
2011
Adopted Omeka image repository software for the conventional indexing project in my LS 566 Metadata course at SLIS.
2008
Partnership begun with Ken Gaddy, director of the Paul W. Bryant Museum. Ken provided my students and me with a set of digitized black and white photos from the 1975 Alabama Crimson Tide football season and a set of born digital color images from the 2010 National Championship game. These images served as the training set for students in my LS 566 course on Metadata every year since 2008. Equally importantly, my own study of these images was instrumental in the development of my research program and subsequently led to the research output reported here.
Adopted use of ContentDM image repository software for the conventional indexing project in my LS 566 Metadata course at SLIS.
Team Members and Major Supporters
- Ken Gaddy, Director, Paul W. Bryant Museum
- Dr. Greg Bott, Assistant Professor, UA Culverhouse College of Business (co-author of RGC grant)
- David McMillan, Executive Director, Enterprise Development Application Support and MaxOrg, LLC (led install of Wikibase software and its software customization to this project)
- Dr. Yu Gan, Assistant Professor, UA Department of Electrical and Computer Engineering (digital image processing)
- Huapu Liu, MLIS, my former Graduate Research Assistant and frequent co-author (provided indispensable support from the very beginning of the Wikibase phase of this research project when we were faced with an empty database!)
- C. Melissa Anderson, my current Graduate Research Assistant and co-author (project manager for RGC grant)
- Christina Schultz-Richert, MLIS graduate (worked to extend the football related model to Public Television episode transcripts)
- Dr. Elizabeth Aversa, retired SLIS Director and MaxOrg, LLC