Chronology: Data-driven Sports Image Indexing Research

From Wikibase.slis.ua.edu

Revision as of 15:06, 21 February 2020

REPORT DATE: February 2020

Steven L. MacCall, PhD
Associate Professor
School of Library and Information Studies
University of Alabama


Introduction and Background

This chronology reports on research, teaching, service, and entrepreneurial activities investigating data-driven sports image indexing led by Steven L. MacCall, PhD, Associate Professor in the School of Library and Information Studies (SLIS) at the University of Alabama.

This research program investigates the effectiveness and efficiency of a semantic indexing method designed to support image search queries that target specific game statistical situations (e.g., retrieve all images associated with plays that resulted in touchdowns and occurred on 3rd down). The method, based on a recently issued UA patent, does not follow the traditional asset-by-asset approach, which creates an individual metadata record for each asset. Instead, it inverts the traditional process: each game is first organized on a play-by-play basis using structured time segments, and each asset is then “attached” to the appropriate play via an actual or simulated time-based parameter. Because there are far fewer games to organize than individual images to index, the resulting reduction in organizing time is the source of our claim for efficiency and effectiveness.
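
The inversion described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the `Play` record, its field names, the sample values, and the choice of elapsed seconds as the time-based parameter are all hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class Play:
    # Hypothetical play record; field names are illustrative only.
    play_id: str
    start: float   # seconds from kickoff at which the play's time segment begins
    end: float     # seconds from kickoff at which the segment ends
    down: int
    result: str
    assets: list = field(default_factory=list)


def attach_assets(plays, assets):
    """Attach each (name, timestamp) asset to the play whose segment contains it.

    Returns the names of assets that fall outside every segment.
    """
    ordered = sorted(plays, key=lambda p: p.start)
    starts = [p.start for p in ordered]
    unmatched = []
    for name, ts in assets:
        i = bisect_right(starts, ts) - 1  # last play starting at or before ts
        if i >= 0 and ts <= ordered[i].end:
            ordered[i].assets.append(name)
        else:
            unmatched.append(name)
    return unmatched


def find_assets(plays, down, result):
    """Retrieve assets by game situation rather than by per-asset metadata."""
    return [a for p in plays if p.down == down and p.result == result
            for a in p.assets]


# Invented sample data: three plays and four photos with capture times.
plays = [
    Play("p1", 0.0, 12.0, 1, "rush"),
    Play("p2", 40.0, 51.0, 2, "incomplete pass"),
    Play("p3", 80.0, 95.0, 3, "touchdown"),
]
photos = [("img_001.jpg", 5.0), ("img_002.jpg", 85.5),
          ("img_003.jpg", 90.0), ("img_004.jpg", 60.0)]

unmatched = attach_assets(plays, photos)
third_down_tds = find_assets(plays, down=3, result="touchdown")
```

In this sketch the indexing effort goes into describing the three plays once; the photos inherit the down and result of whichever play they land in, and a photo taken between plays is simply reported as unmatched.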

Indexing research is a long-standing area for information science investigators. Applications of such research include the improvement of systems design (efficiency and effectiveness) as well as informing the teaching of information professionals to better prepare them for the technological intensities of the modern metadata work environment. In a professional school such as SLIS, it is vitally important that full-time faculty members have teaching and service activities informed by their research.

Current Research Questions

One would expect that research questions would evolve over time, and in that spirit, these are the current primary research questions under investigation in this research program:

  1. Effectiveness: Can data-driven indexing methods serve a maximal indexing objective in which all identifiable named entities are indexed based on all of their observable features and attributes?
  2. Efficiency: Can data-driven indexing methods scale to meet the maximal indexing objective and also match the rate at which current sports images (photo and video) are captured by digital cameras and historical sports images (photo and video) are digitized from the historical record?
  3. Integration: Can current born digital sports images and digitized historical images be indexed in such a way that results in a single database application rather than in multiple silos?
  4. Identification: Can a maximal semantic indexing method incorporating statistical play-by-play datasets provide sufficient context for identifying images based on visible entities and attributes?

Research Result Highlights and Links to Working Software Demonstrations

Semantic Indexing Era: Wikibase Software (2018 to present)

Since summer of 2018 (see entry for 2018 in chronology below), I have been able to take advantage of a state-of-the-art research and development software platform (Wikibase) that allowed me to investigate my research questions using semantic indexing methods rather than conventional image indexing software (see subsection just below).

Since summer of 2018, we have achieved the following research milestones (follow links for more information about each):

  1. Research questions 1 and 2: Image indexing efficiency and effectiveness
    1. Games from the 2017 Alabama Crimson Tide football season: We have completed the semantic indexing of every single play that occurred in all games from the 2017 Alabama Crimson Tide football season using a semantic indexing method that incorporates JSON-formatted statistical play-by-play datasets into a semantic indexing process via the creation of a semi-automated data processing pipeline. This effort was partially funded by two 2018 grants that we received; see chronology below. See 2018-19 Academic Year Research Report for detailed documentation and software demonstration pertaining to the accomplishment of this milestone.
    2. Games from the 1992 Alabama Crimson Tide football season: We have completed RGC grant-funded work (see 2018 entry in chronology below) that allowed us to investigate the recovery of play-by-play data from the historical record. For games in 1992, the historical record consisted of typed play-by-play datasets requiring manual transcription by volunteers, which allowed us to investigate the inclusion into our data processing pipeline of a crowdsourcing method for historical data transcription. See 2019-20 Academic Year Research Progress Report for detailed documentation and software demonstration pertaining to the accomplishment of this milestone.
    3. Games from the 1961 Alabama Crimson Tide football season: As part of the above-mentioned RGC grant, we also investigated games from the 1961 season. For games from 1961, there were no historical records available containing play-by-play data. However, there are other documentary sources of such data, including journalistic newspaper accounts of those games. In our study, we investigated the inclusion into our data processing pipeline of a crowdsourcing method for historical data extraction from newspaper articles. We chose to use articles published in the Tuscaloosa News available online. See 2019-20 Academic Year Research Progress Report for detailed documentation and software demonstration pertaining to the accomplishment of this milestone.
    4. PLEASE NOTE: Additional game from the 1961 Alabama Crimson Tide football season: When working with the 1961 season, we came across a fully reconstructed game, the "Iron Bowl" game against Auburn on December 2, that had been converted into a television production using digitized coaches film and historical accounts maintained at the Paul W. Bryant Museum. Based on the availability of this digitized full-game content, we were able to index every play of that game as if it were a game from the 2017 season (described above). See 2019-20 Academic Year Research Progress Report for detailed documentation and software demonstration pertaining to this game.
  2. Research question 3: Integration: After completing the semantic indexing of individual plays from the three seasons of Alabama Crimson Tide football as described above, we were able to successfully evaluate the querying of plays as if they were all in the same semantic database. Here are example SPARQL queries demonstrating this capability. (PLEASE NOTE: To run each query, click on the Blue Arrow icon in lower left portion of screen after clicking on links below):
    1. All rushing touchdowns that went for over 50 yards during the 3 seasons of Alabama Crimson Tide football in our database (just those for which there are video clips)
    2. All interceptions returned for 15 or more yards during the 3 seasons of Alabama Crimson Tide football in our database (just those for which there are video clips)
    3. All successful field goals from 30 yards or more during the 3 seasons of Alabama Crimson Tide football in our database (just those successful field goals for which there are video clips)
  3. Research question 4: Identification: Crimson Tide Photos, a unit in UA Athletics, provided us with a random sample of born digital photos that were taken during games from the 2017 Alabama Crimson Tide football season, and we were able to identify each image in terms of the play context and the game statistical situation at the time that each photo was taken. (See Plays with Example UA Images May 2019 for documentation of this milestone.)
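
To make the pipeline and the cross-season queries above more concrete, here is a small Python sketch. It is only an illustration: the JSON field names (`season`, `play_type`, `yards`, `touchdown`, `video_clip`) and the sample records are invented for this example and are not the project's actual play-by-play schema, and plain Python filtering stands in for the SPARQL queries run against Wikibase.

```python
import json

# Invented sample records in a hypothetical play-by-play JSON layout.
RAW = """
[
  {"season": 2017, "play_type": "rush", "yards": 64,
   "touchdown": true, "video_clip": "clip_a.mp4"},
  {"season": 1992, "play_type": "rush", "yards": 58,
   "touchdown": true, "video_clip": null},
  {"season": 1961, "play_type": "interception", "yards": 22,
   "touchdown": false, "video_clip": "clip_b.mp4"},
  {"season": 2017, "play_type": "rush", "yards": 12,
   "touchdown": true, "video_clip": "clip_c.mp4"}
]
"""


def query(plays, play_type, min_yards, touchdown=None):
    """Return (season, clip) pairs for matching plays that have a video clip.

    min_yards is inclusive; touchdown=None means "do not filter on it".
    """
    hits = []
    for p in plays:
        if p["play_type"] != play_type or p["yards"] < min_yards:
            continue
        if touchdown is not None and p["touchdown"] != touchdown:
            continue
        if not p["video_clip"]:
            continue  # keep only plays for which there is a video clip
        hits.append((p["season"], p["video_clip"]))
    return hits


# All seasons live in one list, so one query spans born-digital and historical games.
plays = json.loads(RAW)

# "Over 50 yards" is expressed as an inclusive minimum of 51.
long_rushing_tds = query(plays, "rush", 51, touchdown=True)
long_interception_returns = query(plays, "interception", 15)
```

Because every play sits in a single collection regardless of season, the same filter returns results from 2017 and 1961 alike, which is the integration property that research question 3 asks about.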

Conventional Indexing Era: Omeka and ContentDM Software (2008 to 2018)

During this period, SLIS students in my annual spring semester Metadata course (LS 566) benefited from the application of my research as I was able to use it to inform my teaching. Specifically, my research served as the basis for their course project work, which involved the application of indexing theories by way of applying the standard/conventional approach to the indexing of a set of images. These images were provided by Ken Gaddy, Director of the Paul W. Bryant Museum, and they were both digitized black and white photos of Alabama Crimson Tide football games from the 1975 season and also born digital color images from the 2010 National Championship game.

The indexing of these images was accomplished using two different software applications.

  1. Omeka software (2011-2018): Here is a representative example of the end result of the last of such indexing projects from the Spring 2018 semester of this course.
  2. ContentDM (2008-2010)

Chronology of Research (and Other) Outputs

The entries below are arranged reverse chronologically:

2020

MacCall, S.L. (2020). Systems and methods for digital asset organization. U.S. Patent number 10,534,812.

Partnership established with Dr. Yu Gan, assistant professor of Electrical and Computer Engineering at the University of Alabama. Dr. Gan is an expert in digital image processing, and he is working with a master's student to extract player participation data from born digital and digitized historical video from the National Championship game from the 2017 season and the 1961 Iron Bowl game.

MacCall, S.L., et al. (In preparation). Shatford-Layne model and indexing efficiency: An experiment in linked data methods for the efficient indexing of sports images. Journal of Documentation.

MacCall, S.L. (In review). Data-driven semantic DAM indexing incorporating statistical play-by-play game logs: A linked data application using Wikibase from the 2017 football season of the Alabama Crimson Tide. Conference paper submitted to 2020 LD4 Conference on Linked Data in Libraries, College Station, TX.

MacCall, S.L., Liu, H., & Anderson, C.M. (In review). Statistical data recovery from historical documentation of Alabama football games using Wikibase as a repository. Conference paper submitted to Connecting Collections as Data: Transforming Communities, Sharing Knowledge, and Building Networks with International GLAM Labs, Washington, DC.

Used the Paul W. Bryant Museum image sets provided in 2008 by Ken Gaddy, along with Wikibase software, as the basis for the major indexing project in my brand new spring 2020 Linked Data course (LS 590)

2019

Anderson, C.M., Liu, H., & MacCall, S.L. (2019). Crowdsourcing in a semantic indexing workflow for efficiently organizing historical multimedia sports collections. Poster accepted for the 2020 Annual Meeting of the Alabama Library Association, Birmingham, AL.

MacCall, S.L., Liu, H., & Anderson, C.M. (2019). How much statistical data can be recovered from Alabama football history? Piloting a crowdsourced approach using Wikibase as data repository. Conference paper presented at 2019 Digitorium Digital Humanities Conference, Tuscaloosa, AL. [UA Institutional Repository deposit: https://ir.ua.edu/handle/123456789/6574]

Used the Paul W. Bryant Museum image sets provided in 2008 by Ken Gaddy, along with OmekaS software, as the basis for the major metadata indexing project in my spring 2019 Metadata course (LS 566)

2018

MacCall, S.L. (2018). Investigation of a data-driven indexing method for multimedia asset collections in sports: Phase 2: Developing SLIS research capacity for key linked open data technologies. University of Alabama School of Library and Information Studies Research Fund - $1,000. Funded.

MacCall, S.L., & Bott, G. (2018). Investigation of a data-driven indexing method for multimedia asset collections in sports: Phase 1: How much data can be recovered from Alabama football history? University of Alabama Office of Research and Development Research Grants Committee Level 1 Program - $6,000. Funded.

Used the Paul W. Bryant Museum image sets provided in 2008 by Ken Gaddy, along with OmekaS software, as the basis for the major metadata indexing project in my spring 2018 Metadata course (LS 566)

2017

Used the Paul W. Bryant Museum image sets provided in 2008 by Ken Gaddy, along with OmekaS software, as the basis for the major metadata indexing project in my spring 2017 Metadata course (LS 566)

2016

MacCall, S.L., McMillan, D.J., Vargo, C.J., Bradley, S.B., & Aversa, E.A. (2016). Efficiency, integration, interoperability: A 21st century approach to organizing sports digital assets for all libraries. Knight Foundation’s News Challenge for Libraries: How Might Libraries Serve 21st Century Information Needs? - pre-budget grant submission. Not funded.

Used the Paul W. Bryant Museum image sets provided in 2008 by Ken Gaddy, along with Omeka Classic software, as the basis for the major metadata indexing project in my spring 2016 Metadata course (LS 566)

2015

MacCall, S.L. (Filed December 15, 2015). Systems and methods for digital asset organization. U.S. Utility Patent Application number 14/971,463.

MacCall, S.L., Vargo, C.J., Bradley, S.B., & Aversa, E.A. (2015). Development of a novel digital asset organizing method in sports. National Science Foundation - Small Business Innovation Research (SBIR) Phase I Grant - $225,000 ($74,925 sub-award to University of Alabama). Not funded.

Used the Paul W. Bryant Museum image sets provided in 2008 by Ken Gaddy, along with Omeka Classic software, as the basis for the major metadata indexing project in my spring 2015 Metadata course (LS 566)

2014

MacCall, S.L. [Chief Scientist for MaxOrg, LLC], Aversa, E.A. [CEO for MaxOrg, LLC], & McMillan, D.J. [Technology Officer for MaxOrg, LLC]. (2014, 2015, 2016). Crimson Canvas - MaxOrg, LLC. Program participation: Commercial Development of Faculty Developed UA Intellectual Property sponsored by Alabama Innovation and Mentoring of Entrepreneurs (AIME).

MaxOrg, LLC formed as a faculty-led startup to contribute to the commercial development of UA intellectual property (see 2020 and 2015 entries above for patent issued and patent filed data respectively)

Used the Paul W. Bryant Museum image sets provided in 2008 by Ken Gaddy, along with Omeka Classic software, as the basis for the major metadata indexing project in my spring 2014 Metadata course (LS 566)

2013

MacCall, S.L. & Gaddy, K. (2013). Optimal organizing of digital images in sports: A project of the Paul W. Bryant Museum and UA SLIS. Presented at the 2013 University of Alabama Program in Sports Communication Sports Symposium, Tuscaloosa, AL. [Slideshare: http://tinyurl.com/k875g5j]

Used the Paul W. Bryant Museum image sets provided in 2008 by Ken Gaddy, along with Omeka Classic software, as the basis for the major metadata indexing project in my spring 2013 Metadata course (LS 566)

2011

Used the Paul W. Bryant Museum image sets provided in 2008 by Ken Gaddy, along with Omeka Classic software, as the basis for the major metadata indexing project in my spring 2011 Metadata course (LS 566)

2010

Used the Paul W. Bryant Museum image sets provided in 2008 by Ken Gaddy, along with ContentDM software, as the basis for the major metadata indexing project in my spring 2010 Metadata course (LS 566)

2009

Used the Paul W. Bryant Museum image sets provided in 2008 by Ken Gaddy, along with ContentDM software, as the basis for the major metadata indexing project in my spring 2009 Metadata course (LS 566)

2008

Partnership begun with Ken Gaddy, director of the Paul W. Bryant Museum. Ken provided my students and me with a set of digitized black and white photos from the 1975 Alabama Crimson Tide football season and a set of born digital color images from the 2010 National Championship game. These images served as the training set for students in my LS 566 course on Metadata every year since 2008. Equally importantly, my own study of these images was instrumental in the development of my research program and subsequently led to the research output reported here.

Adopted use of ContentDM image repository software for the metadata indexing project in my LS 566 Metadata course at SLIS.

Team Members and Major Supporters

  1. Ken Gaddy, Director, Paul W. Bryant Museum
  2. Dr. Greg Bott, Assistant Professor, UA Culverhouse College of Business (co-author of RGC grant)
  3. David McMillan, Executive Director, Enterprise Development Application Support and MaxOrg, LLC (led install of Wikibase software and its software customization to this project)
  4. Dr. Yu Gan, Assistant Professor, UA Department of Electrical and Computer Engineering (digital image processing)
  5. Huapu Liu, MLIS, my former Graduate Research Assistant and frequent co-author (provided indispensable support from the very beginning of the Wikibase phase of this research project when we were faced with an empty database!)
  6. C. Melissa Anderson, my current Graduate Research Assistant and co-author (project manager for RGC grant)
  7. Christina Schultz-Richert, MLIS graduate (worked to extend the football related model to Public Television episode transcripts)
  8. Dr. Elizabeth Aversa, retired SLIS Director and MaxOrg, LLC