Data-driven Semantic Indexing of Time-based Broadcast Media Series

From Wikibase.slis.ua.edu

Introduction

In this project, the Linked Data Research Group at UA SLIS seeks to generalize our semantic indexing method for sports image and video clip collections to collections ("series") of documentaries and other broadcast genres.

Our goal is to investigate models for the semantic indexing of broadcast series for granular access and semantic integration across the linked data cloud using data generated by AI/ML methods applied to the documentaries of the series.

At this time, we can report very preliminary results and example SPARQL queries over 5 AAPB-hosted documentaries in The Alabama Experience series, with data generated by the Microsoft Azure Video Analyzer service.
PLEASE NOTE: This is a proof of concept demo, and we acknowledge some issues with our semantic data model and with SPARQL query construction.

Method

We are investigating how data generated post-digitization by AI/ML processes can be incorporated into a semantic indexing method for a series of documentary episodes, using “deep hyperlinks” as locators that point directly within digital video. Our research question was: "Can named entities and features detected by AI/ML processes serve as a data source for the data-driven semantic indexing of documentary series?"
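A "deep hyperlink" of the kind described above can be sketched as a W3C Media Fragments temporal fragment (`#t=start,end`) appended to a video URL. The base URL below is a hypothetical placeholder, not an actual AAPB address:

```python
# Sketch: building a "deep hyperlink" that points directly at a time span
# within a digital video, using a W3C Media Fragments temporal fragment.
# The base URL used in the example is a hypothetical placeholder.

def deep_hyperlink(base_url: str, start_s: float, end_s: float) -> str:
    """Return a URL locating the span [start_s, end_s] (in seconds) in a video."""
    return f"{base_url}#t={start_s:g},{end_s:g}"

link = deep_hyperlink("https://example.org/video/roses-of-crimson", 125.0, 142.5)
print(link)  # https://example.org/video/roses-of-crimson#t=125,142.5
```

A compliant media player resolving such a URL would begin playback at 2:05 and stop at 2:22.5.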

A preliminary evaluation of this question led us to use the Microsoft Azure Video Analyzer service to generate JSON files describing various detected features. For this proof of concept, we concentrated on just a few of these features (what Microsoft refers to as “insights”): transcript lines and transcript blocks, as well as detected topics and recognized named entities (persons and locations). We extracted these features from the JSON output, transformed them into RDF triples, and loaded them into our local Wikibase instance for SPARQL querying.
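The extraction-and-transformation step can be sketched as below. The JSON shape, predicate names, and episode identifier are illustrative assumptions, not the exact Azure Video Analyzer schema or our Wikibase property IDs:

```python
# Sketch: flattening a few detected "insights" from an Azure Video
# Analyzer-style JSON file into simple (subject, predicate, object) triples.
# JSON structure and predicate names here are hypothetical placeholders.
import json

insights_json = """
{
  "topics": [
    {"name": "Alabama", "confidence": 0.95,
     "instances": [{"start": "0:02:05", "end": "0:02:22.5"}]}
  ],
  "namedLocations": [
    {"name": "Tuscaloosa", "confidence": 0.88,
     "instances": [{"start": "0:05:10", "end": "0:05:30"}]}
  ]
}
"""

def to_triples(insights: dict, episode: str) -> list:
    """Emit one triple per detected-feature instance, keyed to a time span."""
    triples = []
    for kind, predicate in [("topics", "hasTopic"),
                            ("namedLocations", "mentionsLocation")]:
        for item in insights.get(kind, []):
            for inst in item.get("instances", []):
                # Subject is a deep link into the episode at the instance's span.
                subject = f"{episode}#t={inst['start']},{inst['end']}"
                triples.append((subject, predicate, item["name"]))
    return triples

triples = to_triples(json.loads(insights_json), "ep:RosesOfCrimson")
print(triples[0])  # ('ep:RosesOfCrimson#t=0:02:05,0:02:22.5', 'hasTopic', 'Alabama')
```

In the actual workflow, such triples would be serialized to RDF and loaded into the local Wikibase instance for SPARQL querying.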

We indexed a sub-collection of 5 documentaries from The Alabama Experience series, all from the same year (1997):

  1. Roses of Crimson (60 minutes)
  2. Natural Assets (30 minutes)
  3. Miller’s Pottery (30 minutes)
  4. High Calling (30 minutes)
  5. A Season with the Forgotten Farmers (30 minutes)

SPARQL Queries

To run each query, click the large blue/white arrowhead in the lower corner after following each link (PLEASE NOTE: The deep links to video content at the AAPB are currently not functioning properly).

  1. This query retrieves deep hyperlinks pointing to all identified topics and named entities from the 5 episodes indexed from the series (The Alabama Experience) that have a computed confidence level of 0.9 or higher: https://tinyurl.com/2h4cn4mg (this link runs the same query with a lower threshold of 0.5: https://tinyurl.com/ygv7sg3n)
  2. This query retrieves all “Alabama (topic)” references identified across the 5 documentaries indexed from the series (The Alabama Experience): https://tinyurl.com/yg8723xo
  3. This query retrieves deep hyperlinks pointing to all identified named entities in one of the documentaries (Roses of Crimson): https://tinyurl.com/yzb9tdmm
  4. This query retrieves deep hyperlinks pointing to all identified topics in one of the documentaries (A Season with the Forgotten Farmers): https://tinyurl.com/yjatv89z
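The confidence-threshold query in (1) can be sketched in SPARQL along the following lines. The prefix and property names (`ex:partOfSeries`, `ex:deepLink`, `ex:confidence`) are illustrative placeholders, not our actual Wikibase property IDs:

```sparql
# Illustrative sketch only: retrieve deep hyperlinks for detected features
# whose computed confidence is at least 0.9. Vocabulary is hypothetical.
PREFIX ex:   <http://example.org/vocab/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?feature ?label ?deepLink ?confidence
WHERE {
  ?feature ex:partOfSeries ex:TheAlabamaExperience ;
           rdfs:label      ?label ;
           ex:deepLink     ?deepLink ;
           ex:confidence   ?confidence .
  FILTER(?confidence >= 0.9)
}
ORDER BY DESC(?confidence)
```

Lowering the value in the FILTER clause to 0.5 yields the broader result set of the second linked query.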

As noted above, this is a proof of concept demo, and we acknowledge some issues with our semantic data model and with SPARQL query construction. Also, note that this search experience is not optimized as a polished user interface (e.g., there is a lot of redundant content in the retrieved data); rather, it is intended to demonstrate the data query capabilities of this approach.

Conclusion

The small sample size prevents significant conclusions. We did show, however, that precision searching is possible using data generated by an AI/ML feature extraction and named entity recognition service. The Microsoft Azure Video Analyzer service, though, targets a broad base of corporate and other customers outside of GLAM. That said, we will expand our study to include more of the features generated by the Azure Video Analyzer service. We will also experiment with other AI/ML services, including those aimed at the cultural heritage sector, such as CLAMS.ai.