Chronology: Data-driven Sports Image Indexing Research
Introduction
This chronology reports on research (theoretical, empirical, and grant/patent related), teaching, service, and entrepreneurial activities investigating data-driven sports image indexing that have been led by Steven L. MacCall, PhD, associate professor in the School of Library and Information Studies (SLIS) at the University of Alabama since 2013.
Indexing research is a long-standing area for information science investigators. Applications of such research include improving systems design (efficiency and effectiveness) and informing the teaching of information professionals to better prepare them for the technological demands of the modern metadata work environment. In a professional school such as SLIS, it is vitally important that full-time faculty members have teaching and service activities informed by their research.
Current Research Questions
Research questions can be expected to evolve over time. In that spirit, these are the global research questions currently under investigation in this research program:
- Effectiveness: Can data-driven indexing methods serve a maximal indexing objective in which all identifiable named entities are indexed based on all of their observable features and attributes?
- Efficiency: Can data-driven indexing methods scale to meet the maximal indexing objective while also matching the rate at which current sports images (photo and video) are captured by digital cameras and historical sports images (photo and video) are digitized from the historical record?
- Integration: Can current born-digital sports images and digitized historical images be indexed in a way that results in a single database application rather than multiple silos?
Research Result Highlights and Working Software Demonstrations
Semantic Indexing Era: Wikibase Software (2018 to present)
Since the summer of 2018 (see the 2018 entry in the chronology below), I have been able to switch my research and development platform from conventional image indexing software (see the subsection just below) to Wikibase, a more current, state-of-the-art linked data software platform that supports experiments in semantic indexing.
Since 2018, we have achieved the following research milestones (follow links for more information about each):
- Games from the 2017 Alabama Crimson Tide football season: We have completed the semantic indexing of every play in all games from the 2017 Alabama Crimson Tide football season. The method incorporates JSON-formatted statistical play-by-play datasets into the indexing process by way of a semi-automated data processing pipeline. (This effort was partially funded by two 2018 grants that we received; see chronology below.)
- Games from the 1992 Alabama Crimson Tide football season: We have completed RGC grant-funded work (see 2018 entry in chronology below) that allowed us to investigate the recovery of play-by-play data from the historical record. For games in 1992, the historical record consisted of typed play-by-play datasets requiring manual transcription by volunteers, which allowed us to evaluate crowdsourcing as a historical data transcription method. A summary of our results (additional information in a conference paper presentation reported in chronology below):
- Alabama versus Tulane on October 10, 1992 at the Louisiana Superdome in New Orleans, LA; RECOVERED PLAY-BY-PLAY DATA: 176 plays from 26 drives
- Alabama versus LSU on November 7, 1992 at Tiger Stadium in Baton Rouge, LA; RECOVERED PLAY-BY-PLAY DATA: 185 plays from 26 drives
- Games from the 1961 Alabama Crimson Tide football season: As part of the above-mentioned RGC grant, we also investigated games from the 1961 season. For games from 1961, no historical record containing play-by-play data was available. However, there are other documentary sources, including journalistic newspaper accounts of those games. In our grant-funded study, we evaluated a data-extraction method for reconstructing play-by-play data based solely on the content of journalistic accounts published in the Tuscaloosa News (available online), which allowed us to pilot an evaluation of crowdsourcing as a historical data extraction method. A summary of our results (additional information in a conference paper presentation reported in chronology below):
- Alabama versus Tulane on September 30, 1961 at Ladd Stadium in Mobile, AL; RECOVERED PLAY-BY-PLAY DATA: 104 plays and 22 drives
- Alabama versus Vanderbilt on October 7, 1961 at Dudley Field in Nashville, TN; RECOVERED PLAY-BY-PLAY DATA: 129 plays and 25 drives
Notes to be completed: approximate share of plays recovered for each game (approx. XX% for the 1992 games; approx. XX% for the 1961 games); handling of a special situation (the Iron Bowl); demonstration of integration.
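As a rough illustration of the kind of step a semi-automated pipeline for JSON-formatted play-by-play data might include, the following Python sketch converts play records into subject-property-value statements of the sort a linked data store such as Wikibase holds. The JSON field names, property labels, identifiers, and sample values are all hypothetical; they are not taken from the actual datasets or pipeline, and no Wikibase API is called.

```python
import json

# Hypothetical play-by-play records. Field names and values are invented
# for illustration and do not come from any real game log.
SAMPLE_JSON = """
[
  {"game": "example-game-1", "drive": 1, "play": 1, "type": "rush", "player": "Player A", "yards": 4},
  {"game": "example-game-1", "drive": 1, "play": 2, "type": "pass", "player": "Player B", "yards": 12},
  {"game": "example-game-1", "drive": 2, "play": 1, "type": "punt", "player": "Player C", "yards": 40}
]
"""


def drive_id(record):
    """Build a hypothetical identifier for the drive a play belongs to."""
    return f"{record['game']}/drive-{record['drive']}"


def play_id(record):
    """Build a hypothetical identifier for a single play."""
    return f"{drive_id(record)}/play-{record['play']}"


def play_to_statements(record):
    """Turn one play record into (subject, property, value) statements.

    The property labels are assumptions, not an actual Wikibase schema.
    """
    subject = play_id(record)
    return [
        (subject, "instance of", "play"),
        (subject, "part of game", record["game"]),
        (subject, "part of drive", drive_id(record)),
        (subject, "play type", record["type"]),
        (subject, "player", record["player"]),
        (subject, "yards", record["yards"]),
    ]


def index_plays(json_text):
    """Parse a JSON play-by-play log and collect statements for every play."""
    statements = []
    for record in json.loads(json_text):
        statements.extend(play_to_statements(record))
    return statements


statements = index_plays(SAMPLE_JSON)
drives = {value for (_, prop, value) in statements if prop == "part of drive"}
```

In a sketch like this, each play becomes its own entity linked to its drive and game, so counts such as plays per drive can be derived from the statements rather than stored separately.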
Conventional Indexing Era: Omeka and ContentDM Software (2008 to 2018)
During this period, SLIS students in my annual spring semester Metadata course (LS 566) benefited from my research, which I used to inform my teaching. Specifically, my research served as the basis for their course project work, in which they applied indexing theories by using the standard/conventional approach to index a set of images. These images were provided by Ken Gaddy, Director of the Paul W. Bryant Museum, and included both digitized black-and-white photos of Alabama Crimson Tide football games from the 1975 season and born-digital color images from the 2010 National Championship game.
The indexing of these images was accomplished using two different software applications.
- Omeka software (2011-2018): Here is a representative example of the end result of the last of such indexing projects from the Spring 2018 semester of this course.
- ContentDM (2008-2010)
Chronology of Research and Other Outputs
The entries below are in reverse chronological order and are coded as follows:
- RESEARCH
- RESEARCH - Theory
- RESEARCH - Empirical
- RESEARCH - Patent activity/actions
- RESEARCH - Grant activity
- TEACHING
- SERVICE
- Entrepreneurial activity
2020
MacCall, S.L. (2020). Systems and methods for digital asset organization. U.S. Patent number 10,534,812.
MacCall, S.L. (In review). Data-driven semantic DAM indexing incorporating statistical play-by-play game logs: A linked data application using Wikibase from the 2017 football season of the Alabama Crimson Tide. Conference paper submitted to 2020 LD4 Conference on Linked Data in Libraries, College Station, TX.
MacCall, S.L., Liu, H., & Anderson, C.M. (In review). Statistical data recovery from historical documentation of Alabama football games using Wikibase as a repository. Conference paper submitted to Connecting Collections as Data: Transforming Communities, Sharing Knowledge, and Building Networks with International GLAM Labs, Washington, DC.
2019
Anderson, C.M., Liu, H., & MacCall, S.L. (2019). Crowdsourcing in a semantic indexing workflow for efficiently organizing historical multimedia sports collections. Poster accepted for the 2020 Annual Meeting of the Alabama Library Association, Birmingham, AL.
MacCall, S.L., Liu, H., & Anderson, C.M. (2019). How much statistical data can be recovered from Alabama football history? Piloting a crowdsourced approach using Wikibase as data repository. Conference paper presented at 2019 Digitorium Digital Humanities Conference, Tuscaloosa, AL. [UA Institutional Repository deposit: https://ir.ua.edu/handle/123456789/6574]
2018
MacCall, S.L. (2018). Investigation of a data-driven indexing method for multimedia asset collections in sports: Phase 2: Developing SLIS research capacity for key linked open data technologies. University of Alabama School of Library and Information Studies Research Fund - $1,000. Funded.
MacCall, S.L., & Bott, G. (2018). Investigation of a data-driven indexing method for multimedia asset collections in sports: Phase 1: How much data can be recovered from Alabama football history? University of Alabama Office of Research and Development Research Grants Committee Level 1 Program - $6,000. Funded.
2016
MacCall, S.L., McMillan, D.J., Vargo, C.J., Bradley, S.B., & Aversa, E.A. (2016). Efficiency, integration, interoperability: A 21st century approach to organizing sports digital assets for all libraries. Knight Foundation’s News Challenge for Libraries: How Might Libraries Serve 21st Century Information Needs? - pre-budget grant submission. Not funded.
2015
MacCall, S.L. (Filed December 15, 2015). Systems and methods for digital asset organization. U.S. Utility Patent Application number 14/971,463.
MacCall, S.L., Vargo, C.J., Bradley, S.B., & Aversa, E.A. (2015). Development of a novel digital asset organizing method in sports. National Science Foundation - Small Business Innovation Research (SBIR) Phase I Grant - $225,000 ($74,925 sub-award to University of Alabama). Not funded.
2014
MacCall, S.L. [Chief Scientist for MaxOrg, LLC], Aversa, E.A. [CEO for MaxOrg, LLC], & McMillan, D.J. [Technology Officer for MaxOrg, LLC]. (2014, 2015, 2016). Crimson Canvas - MaxOrg, LLC. Program participation: Commercial Development of Faculty Developed UA Intellectual Property sponsored by Alabama Innovation and Mentoring of Entrepreneurs (AIME).
MaxOrg, LLC was formed as a faculty-led startup to contribute to the commercial development of UA intellectual property (see the 2020 and 2015 entries for the issued patent and the filed patent application, respectively).
2013
MacCall, S.L. & Gaddy, K. (2013). Optimal organizing of digital images in sports: A project of the Paul W. Bryant Museum and UA SLIS. Presented at the 2013 University of Alabama Program in Sports Communication Sports Symposium, Tuscaloosa, AL. [Slideshare: http://tinyurl.com/k875g5j]