Difference between revisions of "Main Page"

From Wikibase.slis.ua.edu
Line 11:
  <b>Huapu Liu, MLIS<br></b>
- Doctor Student<br>
+ Doctoral Student<br>
  College of Communication and Information Sciences<br>
  University of Alabama

Revision as of 01:48, 16 June 2021

Welcome to the Linked Data Research Group!
School of Library and Information Studies
University of Alabama

Leadership

Steven L. MacCall, PhD
Associate Professor
School of Library and Information Studies
University of Alabama

Huapu Liu, MLIS
Doctoral Student
College of Communication and Information Sciences
University of Alabama

Mission

Our digital libraries research mission is to apply linked data methods to the data-driven semantic indexing of text and time-based media collections, both born-digital and digitized historical. The aim is to facilitate precision search for attribute-bearing collection items, item parts, and other granular components via SPARQL queries and URL-based locators. We take a Collections as (Meta)data, closed-system indexing approach as we investigate methods for indexing collections by semantically mapping properties and extracted named entities and features that have been either manually created (e.g., book indexes or video logs) or computationally detected (e.g., by Microsoft Azure Video Analyzer for Media). The results of our semantic indexing method are SPARQL-queryable RDF knowledge graphs hosted on a local Wikibase instance managed by the Office of Information Technology (OIT) at the University of Alabama.
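
As an illustration of what a URL-based locator for an item part might look like, here is a minimal Python sketch. The mission statement does not say which locator scheme the group uses; this example assumes the temporal syntax of W3C Media Fragments (`#t=start,end`, in seconds), and the function name and example URL are hypothetical.

```python
from urllib.parse import urldefrag


def _seconds(value: float) -> str:
    """Format seconds with at most millisecond precision, no trailing zeros."""
    return f"{value:.3f}".rstrip("0").rstrip(".")


def segment_locator(media_url: str, start_s: float, end_s: float) -> str:
    """Return a URL that addresses one time interval of a media item.

    Assumption: the target player or server understands the Media Fragments
    temporal dimension (#t=start,end).
    """
    if start_s < 0 or end_s <= start_s:
        raise ValueError("need 0 <= start_s < end_s")
    base, _old_fragment = urldefrag(media_url)  # drop any existing fragment
    return f"{base}#t={_seconds(start_s)},{_seconds(end_s)}"


# Hypothetical clip URL, used only for illustration.
locator = segment_locator("https://example.org/media/game-clip.mp4", 75, 90.5)
print(locator)
```

A locator of this kind could be stored as a property value on the knowledge graph item that describes the segment, so that a query result links directly to the relevant part of the media.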

Publications, Presentations, Background Readings

General Research Questions

There are two overarching empirical research questions that we are pursuing:

  1. Scalability of semantic indexing method: The data-driven aspects of our semantic indexing method are key to addressing scalability issues. Following semantic uplift methods, which involve developing ETL pipelines that generate RDF triples during the transform phase, we have had demonstrable success using R and Python scripting coupled with semantic data models and direct API access to our Wikibase instance to upload transformed data efficiently at scale.
  2. Queryability of semantic indexing method: The semantic indexing of both text and time-based media collections facilitates precision queries, much as a book index is more precise than a table of contents. When RDF triples are loaded into a knowledge graph, the resulting triplestore can be queried with precision using SPARQL. But are these precise queries useful?
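
The transform phase and the kind of precise lookup described in the two questions above can be sketched in a few lines of Python. This is a simplified illustration, not the group's pipeline: the namespace, the column names, and the sample rows are all hypothetical, every value is emitted as a plain string literal, and the load step (upload through the Wikibase API) is left out.

```python
import csv
import io

# Hypothetical namespaces; a real Wikibase assigns its own item (Q) and
# property (P) identifiers.
BASE = "https://example.org/entity/"
PROP = "https://example.org/prop/"

# Made-up play-by-play rows standing in for an extracted dataset.
SAMPLE_CSV = """play_id,quarter,down,distance,player
play-001,1,3,7,Player A
play-002,4,1,10,Player B
"""


def literal(value: str) -> str:
    """Quote a value as an N-Triples string literal (minimal escaping)."""
    escaped = (value.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n"))
    return f'"{escaped}"'


def extract(text: str):
    """Extract: read CSV text into one dict per row."""
    return csv.DictReader(io.StringIO(text))


def transform(rows):
    """Transform: yield one (subject, predicate, object) triple per column."""
    for row in rows:
        subject = f"<{BASE}{row['play_id']}>"
        for column, value in row.items():
            if column != "play_id" and value:
                yield subject, f"<{PROP}{column}>", literal(value)


def match(triples, predicate: str, obj: str):
    """Return the subjects that have the given predicate and object."""
    return sorted({s for s, p, o in triples if p == predicate and o == obj})


triples = list(transform(extract(SAMPLE_CSV)))
for s, p, o in triples:
    print(f"{s} {p} {o} .")  # one N-Triples statement per line

# A precise question: which plays happened on third down?
third_downs = match(triples, f"<{PROP}down>", literal("3"))
print(third_downs)
```

In SPARQL, the same question would be a single triple pattern such as `?play <https://example.org/prop/down> "3" .`; combining several patterns on one `?play` variable is what allows queries to incorporate multiple attributes at once.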

Projects

  1. Sports DAM Knowledge Graphs resulting from the semantic indexing of time-based media collections: In our longest-standing project, we are investigating precision queries of collections of still images and video clips that document game action in sports. The primary data sources for our semantic indexing method are play-by-play datasets, and precision queries incorporate game situation variables. Additional information:
    1. Chronology: Data-driven Sports Image Indexing Research, which documents our work up until now.
    2. Video of presentation by Dr. MacCall to the 2020 Linked Data for Libraries (LD4L) conference
  2. Philology Graphs resulting from the semantic indexing of text collections: The rapid growth in the number of digital texts has challenged information professionals to make them more navigable and usable. These challenges have prompted researchers to develop different approaches to creating semantic descriptions of these resources, aiming at more efficient and effective granular access to specific textual content. Ongoing work includes the challenge of computationally recovering the typographical logic of book indexes that is lost when standard page-level OCR scanning techniques are deployed on index pages in historical books hosted at the HathiTrust Digital Library. Recovering typographical logic would aid in the provision of granular access to digital texts by improving textual navigation.
  3. Documentary Series Knowledge Graphs resulting from the semantic indexing of time-based documentary media collections (series): This is our most recently established project, in which we seek to generalize our semantic indexing method from sports image and video clip collections to collections ("series") of documentaries and other broadcast genres. Objectives:
    1. Our goal is to investigate models for the semantic indexing of broadcast series for granular access and semantic integration across the linked data cloud using data generated by ML/AI computational methods applied to the documentaries of the series.
    2. Our purpose is to extend and enhance the existing relationships of the AAPB and WGBH with the UA School of Library and Information Studies to benefit both EBSCO Scholars and SLIS students generally.
    3. Additional information: Data-driven Semantic Indexing of Time-based Broadcast Media Series
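
For the Philology Graphs project above, the target structure of a recovered book index can be illustrated with a short Python sketch. It is not the group's algorithm: it assumes the input lines still carry their leading indentation (which is exactly what page-level OCR tends to lose), that page ranges are written in full (e.g., 17-19, not 17-9), and the sample entries are invented.

```python
import re

# Trailing page locators such as ", 3, 17-19" at the end of an entry line.
LOCATORS = re.compile(r",\s*(\d+(?:-\d+)?(?:,\s*\d+(?:-\d+)?)*)\s*$")


def expand(locators: str) -> list:
    """Expand "3, 17-19" into [3, 17, 18, 19]."""
    pages = []
    for part in locators.split(","):
        part = part.strip()
        if "-" in part:
            first, last = (int(n) for n in part.split("-"))
            pages.extend(range(first, last + 1))
        else:
            pages.append(int(part))
    return pages


def parse_index(lines):
    """Return (main heading, subheading or None, pages) for each entry line.

    Indented lines are treated as subentries of the most recent main heading.
    """
    entries, main = [], None
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        found = LOCATORS.search(line)
        heading = line[:found.start()].strip() if found else line.strip()
        pages = expand(found.group(1)) if found else []
        if line[0].isspace() and main is not None:
            entries.append((main, heading, pages))
        else:
            main = heading
            entries.append((main, None, pages))
    return entries


# Invented sample lines, for illustration only.
SAMPLE = [
    "Alabama, 3, 17-19",
    "    rivers of, 18",
    "Libraries, 42",
]
entries = parse_index(SAMPLE)
for entry in entries:
    print(entry)
```

Each resulting tuple pairs a heading with explicit page numbers, which is the shape needed to express page-level subject access as triples linking a named entity to the pages that mention it.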

Current Researchers (Fall 2021)

  1. Dr. Steven L. MacCall
  2. Huapu Liu, CIS doctoral student

Affiliate Researchers

  1. Dr. Greg Bott from the UA Department of Information Systems, Statistics and Management Science for database design and Python programming guidance
    1. Austin Herriott, Dr. Bott's undergraduate student, provided Python programming for the first iteration of our sports ETL pipeline funded with RGC grant monies in fall 2019
  2. Dr. Yu Gan from the UA Department of Electrical and Computer Engineering for digital image processing
    1. Alexander Ramey, Dr. Gan's master's student, is assisting in developing an algorithm for extracting player numbers visible in YouTube game video, which will be added to our knowledge graph as named entity data.

Previous SLIS Students

  1. Nicole Lewis, SLIS MLIS student: For her Fall 2020 and Spring 2021 directed research studies, Nicole applied what she learned in her Linked Data course in Summer 2020. She worked with the HathiTrust Digital Library and their Research Center's Extracted Features Datasets. Nicole was instrumental in the initial design of the "transform" portion of an ETL pipeline using back-of-the-book indexes as a source for named entities for page-level subject access.
  2. C. Melissa Anderson, SLIS MLIS graduate: Melissa served as Dr. MacCall's graduate assistant during the 2019-20 academic year, assisting immeasurably with project development and management and serving in a student teaching role in various technology-intensive SLIS courses taught by Dr. MacCall. Melissa also served as research project manager for an RGC grant, for which she was instrumental in preparing materials for a Zooniverse-based crowdsourcing effort and in recruiting and supervising volunteer transcribers. This led to her inclusion as a co-author on a DH conference presentation.
  3. Jessica Camano, SLIS MLIS graduate: For her Fall 2020 directed research study, Jessica investigated a "red letter" problem that emerged from our work in the summer Linked Data course, in which our subject terms remained without MediaWiki site pages. Resolution of this problem involved researching the National Library of Medicine's linked data service (Medical Subject Headings RDF) to download metadata about MeSH entries as RDF triples.
  4. David Roby, SLIS MLIS graduate: For his Fall 2020 directed research study, David helped us investigate a research problem related to scaling the data management methods and ETL pipeline for our ongoing research into the semantic indexing of digital images and video clips documenting game action in sports.
  5. Christine Schultz-Richert, MLIS graduate: For her Summer 2019 directed research study, Christine investigated the semantic enhancement of transcripts from a 1957 National Educational Television (NET) program (A Look at the Indian's Future) using linked data methods.

Special Thanks

Special thanks to David J. McMillan, Executive Director for Enterprise Development & Application Support in the UA Office of Information Technology (OIT).