Main Page

From Wikibase.slis.ua.edu
Revision as of 16:12, 15 June 2021 by Smaccall

Welcome to the Linked Data Research Group!

Leadership

Steven L. MacCall, PhD
Associate Professor
School of Library and Information Studies
University of Alabama

Bibliography

Selected Reading Lists

Mission

Our digital libraries research mission is to apply linked data methods to the data-driven semantic indexing of text and time-based media collections. The goal is to facilitate precision search, via SPARQL queries and URL-based locators, of attribute-bearing items, item parts, and other granular components contained in those collections. We are investigating methods for deriving properties and for extracting named entities and features from collection items, where those entities and features have been either manually created (e.g., book indexes or video logs) or computationally detected (e.g., by Microsoft Azure Video Analyzer for Media, among other tools). The results of our semantic indexing method are SPARQL-queryable RDF knowledge graphs hosted on a local Wikibase instance managed by the Office of Information Technology (OIT) at the University of Alabama.

General Research Questions

There are two overarching empirical research questions that we are pursuing:

  1. Scalability of semantic indexing method: The data-driven aspects of our semantic indexing methods are key to scalability. Following semantic uplift methods, in which an ETL pipeline generates RDF triples during the transform phase, we have had demonstrable success using R and Python scripts, coupled with semantic data models and direct API uploads to our Wikibase instance, to improve batch management efficiency compared with QuickStatements. A scalable semantic indexing method would be an information-professional-in-the-loop method in which data-driven techniques enable precision collection indexing at scale.
  2. Queryability of semantic indexing method: The semantic indexing of both text and time-based media collections facilitates precision queries; compare a book index (more precise) with a table of contents (less precise). When RDF triples are loaded into a knowledge graph, the resulting triplestore can be queried with precision using SPARQL. But are these precise queries useful?
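The transform phase mentioned in the first research question can be illustrated with a minimal sketch. It is not the group's pipeline: the `https://example.org/` namespace, the record fields, and the function names are all assumptions made for illustration. It only shows the general idea of turning tabular records into RDF triples serialized as N-Triples, using the Python standard library.

```python
from urllib.parse import quote

# Hypothetical namespace for illustration; not the group's Wikibase instance.
BASE = "https://example.org/"


def iri(kind, local):
    """Build an N-Triples IRI term such as <https://example.org/item/img-001>."""
    return f"<{BASE}{kind}/{quote(str(local), safe='')}>"


def literal(value):
    """Serialize a value as a plain N-Triples string literal with basic escaping."""
    escaped = (
        str(value)
        .replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )
    return f'"{escaped}"'


def transform(records, id_field):
    """Emit one triple per non-empty field of each record (the 'T' in ETL)."""
    triples = []
    for rec in records:
        subject = iri("item", rec[id_field])
        for field, value in rec.items():
            if field == id_field or value in (None, ""):
                continue  # skip the identifier itself and empty cells
            triples.append(f"{subject} {iri('prop', field)} {literal(value)} .")
    return triples


# Hypothetical extracted rows standing in for a source dataset.
records = [{"id": "img-001", "depicts": "kickoff return", "quarter": 1}]
for line in transform(records, "id"):
    print(line)
```

In a Wikibase setting the load step would map these fields onto item and property identifiers through the Wikibase API rather than writing N-Triples directly; that step is omitted here.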

Projects

1st project: Sports DAM Knowledge Graph:

  1. We are applying linked data technologies to semantically index still images and video clips that document game action in sports by incorporating play-by-play datasets into the indexing process by way of a semantic data model and ETL pipeline process. The resulting knowledge graph can be queried using SPARQL, which allows for precision searching based on queries that incorporate game situation variables. Additional information:
    1. Video of presentation by Dr. MacCall to the 2020 Linked Data for Libraries (LD4L) conference
    2. Chronology: Data-driven Sports Image Indexing Research, which documents our work up until now.
  2. Basic research on philology graphs: We are investigating an ontology that would serve to integrate texts in library collections, extending the work of the Collections as Data research community.
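As a rough illustration of a query that incorporates game situation variables, the sketch below assembles a SPARQL SELECT query from a dictionary of filters. The property namespace and the variable names (`down`, `quarter`) are hypothetical and do not come from the group's data model; a real Wikibase query would use that instance's own property identifiers. Integer values are written as SPARQL integer literals, so the sketch assumes the graph stores those values as typed numbers.

```python
# Hypothetical property namespace for illustration only.
PROP = "https://example.org/prop/"


def sparql_term(value):
    """Render a Python value as a SPARQL literal (boolean, integer, or string)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_query(filters, limit=50):
    """Build a SELECT query with one triple pattern per game situation variable."""
    patterns = []
    for name, value in sorted(filters.items()):
        if not name.isidentifier():
            raise ValueError(f"unsafe property name: {name!r}")
        patterns.append(f"  ?item <{PROP}{name}> {sparql_term(value)} .")
    return "\n".join(["SELECT ?item WHERE {", *patterns, "}", f"LIMIT {int(limit)}"])


# Example: items indexed to third-down plays in the fourth quarter.
print(build_query({"down": 3, "quarter": 4}))
```

The resulting string could be sent to any SPARQL endpoint; the endpoint address and the HTTP request are left out because they depend on the deployment.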

Current researchers (Spring 2021):

  1. Dr. Steven L. MacCall
  2. Huapu Liu, CIS doctoral student
  3. Nicole Lewis, SLIS MLIS student: For her Fall 2020 directed research study, Nicole is applying what she learned in the Linked Data course from this past summer to extend her philology graph work from scientific article publishing to book publishing using the HathiTrust Research Center's Extracted Features Datasets. She is developing the "transform" portion of an ETL pipeline using back-of-the-book indexes as a source for named entities for chapter-level subject access and will evaluate her work with a set of SPARQL queries against a set of transformed books on the topic of cataloging.
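One way to picture the back-of-the-book index work described above is the following sketch, which parses index entries and maps their page locators to chapters. It is an assumption-laden illustration, not Nicole's code: the entry format ("term, page, page-page"), the chapter start pages, and the function names are hypothetical, and it does not read the HathiTrust Extracted Features format.

```python
import bisect
import re

# An entry is a term followed by a comma-separated run of page locators.
ENTRY = re.compile(r"^(.*?),\s*(\d[\d\s,\-\u2013]*)$")
LOCATOR = re.compile(r"(\d+)(?:\s*[-\u2013]\s*(\d+))?")


def parse_entry(line):
    """Split 'Term, 12, 45-47' into ('Term', [12, 45, 46, 47])."""
    match = ENTRY.match(line.strip())
    if not match:
        return line.strip(), []
    term, locators = match.groups()
    pages = []
    for start, end in LOCATOR.findall(locators):
        first = int(start)
        last = int(end) if end else first
        # Abbreviated ranges such as '145-47' are not expanded in this sketch.
        pages.extend(range(first, last + 1))
    return term.strip(), pages


def chapters_for(pages, chapter_starts):
    """Map page numbers to 1-based chapter numbers, given sorted first pages."""
    found = set()
    for page in pages:
        number = bisect.bisect_right(chapter_starts, page)
        if number > 0:  # pages before the first chapter are front matter
            found.add(number)
    return sorted(found)


# Hypothetical book whose chapters start on pages 1, 25, and 60.
term, pages = parse_entry("Cutter, Charles A., 30-31, 75")
print(term, chapters_for(pages, [1, 25, 60]))
```

Each (term, chapter) pair produced this way could then become a subject triple for chapter-level access.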


Affiliate researchers:

  1. Dr. Greg Bott in the UA Department of Information Systems, Statistics and Management Science for database design and Python programming guidance
    1. Austin Herriott, Dr. Bott's undergraduate student, provided Python programming for the first iteration of our sports ETL pipeline, funded with RGC grant monies in fall 2019
  2. Dr. Yu Gan in the UA Department of Electrical and Computer Engineering for digital image processing
    1. Alexander Ramey, Dr. Gan's master's student, is assisting in developing an algorithm for extracting player numbers visible in YouTube game video, which will be added to our knowledge graph as named entity data.

Previous SLIS students:

  1. C. Melissa Anderson, SLIS MLIS graduate
  2. Christine Schultz-Richert, MLIS graduate: For her Summer 2019 directed research study, Christine investigated the semantic enhancement of transcripts from a 1957 National Educational Television (NET) program (A Look at the Indian's Future) using linked data methods.
  3. Jessica Camano, SLIS MLIS graduate: For her Fall 2020 directed research study, Jessica investigated a "red letter" problem that emerged from our work in the summer Linked Data course, in which our subject terms remained without MediaWiki sitepages. Resolution of this problem involved researching the National Library of Medicine's linked data service (Medical Subject Headings RDF) to download metadata about MeSH entries as RDF triples.
  4. David Roby, SLIS MLIS graduate: For his Fall 2020 directed research study, David helped us investigate a research problem related to scaling the data management methods and ETL pipeline for our ongoing research into the semantic indexing of digital images and video clips documenting game action in sports.

Special thanks to David J. McMillan, Executive Director for Enterprise Development & Application Support in the UA Office of Information Technology (OIT)