Difference between revisions of "Main Page"
Revision as of 19:17, 9 January 2023
Welcome to the Linked Data Research Group!
School of Library and Information Studies
College of Communication and Information Sciences
University of Alabama
For information, email Dr. Steven L. MacCall at smaccall@ua.edu
Publications, Presentations, Background Readings
- Publications and Presentations
- Selected Background Reading Lists
Current Projects
- Sports DAM Knowledge Graphs resulting from the semantic indexing of time-based media collections: In our longest-standing project, we are investigating precision queries of collections of still images and video clips that document game action in sports. The primary data sources for our semantic indexing method are play-by-play datasets. Precision queries incorporate game situation variables. Additional information:
- Overview of Project
- Research-related Accomplishments by Year
- Research Result Highlights and Links to Working Software Demonstrations
- Video of presentation by Dr. MacCall to the 2020 Linked Data for Libraries (LD4L) conference
- Philology Graphs resulting from the semantic indexing of text collections: The rapid growth in the number of digital texts has challenged information professionals to make them more navigable and usable. These challenges have prompted researchers to develop different approaches to creating semantic descriptions of these resources, aiming at more efficient and effective granular access to specific textual content. Ongoing work includes the challenge of computationally recovering the typographical logic of book indexes, which is lost when standard page-level OCR scanning techniques are applied to index pages in historical books hosted at the HathiTrust Digital Library. Recovering this typographical logic would aid in providing granular access to digital texts by improving textual navigation.
- Documentary Series Knowledge Graphs resulting from the semantic indexing of time-based documentary media collections (series): This is our most recently established project, in which we seek to generalize our semantic indexing method from sports image and video clip collections to collections ("series") of documentaries and other broadcast genres. Objectives:
- Our goal is to investigate models for the semantic indexing of broadcast series for granular access and semantic integration across the linked data cloud using data generated by ML/AI computational methods applied to the documentaries of the series.
- Our purpose is to extend and enhance the existing relationships of the AAPB and WGBH with the UA School of Library and Information Studies to benefit both EBSCO Scholars and SLIS students generally.
- Additional information: Data-driven Semantic Indexing of Time-based Broadcast Media Series
Leadership
Steven L. MacCall, PhD
Associate Professor
School of Library and Information Studies
University of Alabama
Huapu Liu, MLIS
Doctoral Student
College of Communication and Information Sciences
University of Alabama
Mission
Our digital libraries research mission is to apply linked data methods to investigate the data-driven semantic indexing of text and time-based media collections (born digital and digitized historical) to facilitate the precision search for attribute-bearing collection items, item parts, and other granular components via SPARQL queries and URL-based locators. We are using a Collections as (Meta)data and closed-system indexing approach as we investigate various methods for indexing collections based on semantically mapping properties and extracted named entities and features that have been either manually created (e.g., book indexes or video logs) or computationally detected (e.g., by Microsoft Azure Video Analyzer for Media, among others). The results of our semantic indexing method are SPARQL-queriable RDF knowledge graphs that are hosted on a local Wikibase instance managed by the Office of Information Technology (OIT) at the University of Alabama.
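As a rough illustration of what a precision search expressed as a SPARQL query and a URL-based locator might look like, the following minimal sketch builds a query that matches items carrying every requested attribute value and encodes it as a GET URL using the standard `query` parameter of the SPARQL 1.1 Protocol. The endpoint address, the property namespace, and the property names (`down`, `play_type`) are hypothetical placeholders, not the group's actual Wikibase configuration, and the sketch assumes attribute values are simple strings that need no escaping.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and property namespace (assumptions for illustration;
# the real Wikibase endpoint and property URIs are not given in this page).
ENDPOINT = "https://example.org/sparql"
PROP_NS = "https://example.org/prop/"


def build_query(filters: dict) -> str:
    """Build a SELECT query matching items that carry every given attribute value.

    Assumes values are plain strings without quotes or backslashes.
    """
    patterns = "\n".join(
        f'  ?item <{PROP_NS}{prop}> "{value}" .' for prop, value in filters.items()
    )
    return f"SELECT ?item WHERE {{\n{patterns}\n}}"


def query_url(filters: dict) -> str:
    """Encode the query as a GET URL using the SPARQL 1.1 Protocol 'query' parameter."""
    return ENDPOINT + "?" + urlencode({"query": build_query(filters)})


if __name__ == "__main__":
    # Hypothetical game situation variables: third-down passing plays.
    print(build_query({"down": "3", "play_type": "pass"}))
    print(query_url({"down": "3", "play_type": "pass"}))
```

Each additional attribute adds one triple pattern, which narrows the result set; this is the sense in which a query over indexed attributes is more precise than browsing a collection.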
General Research Questions
There are two overarching empirical research questions that we are pursuing:
- Scalability of semantic indexing method: Data-driven aspects of our semantic indexing method are key to addressing scalability issues. Following semantic uplift methods, which involve ETL pipeline development in which RDF triples are generated during the transform phase, we have had demonstrable success using R and Python scripting methods coupled with semantic data models. Direct API access to our Wikibase instance facilitates efficient upload of the transformed data at scale.
- Queriability of semantic indexing results: The semantic indexing of both text and time-based media collections facilitates precision queries, much as a book index gives more precise access than a table of contents. When RDF triples are loaded to a knowledge graph, the resulting triplestore can be queried with precision using SPARQL. But are these precise queries useful?
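The transform phase mentioned under the first research question can be sketched as follows. This is a minimal illustration under stated assumptions, not the group's actual pipeline: the CSV columns, the sample rows, and the `example.org` namespaces are all hypothetical, and the output is plain N-Triples text rather than a call to the Wikibase API. It uses only the Python standard library.

```python
import csv
import io

# Hypothetical namespaces (assumptions); the group's real Wikibase URIs are not given.
ITEM_NS = "https://example.org/entity/"
PROP_NS = "https://example.org/prop/"

# Hypothetical play-by-play extract: one row per indexed video clip.
SAMPLE_CSV = """clip_id,quarter,down,distance,play_type
clip001,1,3,7,pass
clip002,4,1,10,run
"""


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes, and line breaks for an N-Triples string literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def row_to_triples(row: dict) -> list:
    """Transform one play-by-play row into N-Triples lines, one per variable."""
    subject = f"<{ITEM_NS}{row['clip_id']}>"
    triples = []
    for column, value in row.items():
        if column == "clip_id" or value == "":
            continue  # the id becomes the subject; empty cells yield no triple
        triples.append(f'{subject} <{PROP_NS}{column}> "{escape_literal(value)}" .')
    return triples


def transform(csv_text: str) -> list:
    """Run the transform phase over every row of the extracted CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    triples = []
    for row in reader:
        triples.extend(row_to_triples(row))
    return triples


if __name__ == "__main__":
    for line in transform(SAMPLE_CSV):
        print(line)
```

Because each row is transformed independently, the same function applies unchanged to a dataset of any size, which is one reason a data-driven transform step helps with scalability. The load step (uploading through the Wikibase API) is deliberately left out of this sketch.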
Current Researchers (2022-23 Academic Year)
- Dr. Steven L. MacCall
- Huapu Liu, CIS doctoral candidate
- Duncan McCaskill, doctoral student
- William Blackerby, master's student
Affiliate Researchers
- Dr. Yu Gan from Stevens Institute of Technology for digital image processing
- Dr. Greg Bott from the UA Department of Information Systems, Statistics and Management Science for database design and Python programming guidance
Previous SLIS Students
- Nicole Lewis, SLIS MLIS student: For her Fall 2020 and Spring 2021 directed research studies, Nicole applied what she learned in her Linked Data course in Summer 2020. She worked with the HathiTrust Digital Library and their Research Center's Extracted Features Datasets. Nicole was instrumental in the initial design of the "transform" portion of an ETL pipeline using back-of-the-book indexes as a source for named entities for page-level subject access.
- C. Melissa Anderson, SLIS MLIS graduate: Melissa served as Dr. MacCall's graduate assistant during the 2019-20 academic year, assisting immeasurably in project development and management and in a student teaching role in various technology-intensive SLIS courses taught by Dr. MacCall. Melissa also served as research project manager for an RGC grant, for which she was instrumental in preparing materials for a Zooniverse-based crowdsourcing effort and in recruiting and supervising volunteer transcribers. This led to her inclusion as a co-author on a DH conference presentation.
- Jessica Camano, SLIS MLIS graduate: For her Fall 2020 directed research study, Jessica investigated a "red letter" problem that emerged from our work in the summer Linked Data course, in which our subject terms remained without MediaWiki sitepages. Resolution of this problem involved researching the National Library of Medicine's linked data service (Medical Subject Headings RDF) to download metadata about MeSH entries as RDF triples.
- David Roby, SLIS MLIS graduate: For his Fall 2020 directed research study, David helped us investigate a research problem related to scaling the data management methods and ETL pipeline for our ongoing research into the semantic indexing of digital images and video clips documenting game action in sports.
- Christine Schultz-Richert, MLIS graduate: For her Summer 2019 directed research study, Christine investigated the semantic enhancement of transcripts from a 1957 National Educational Television (NET) program (A Look at the Indian's Future) using linked data methods.
Special Thanks
Special thanks to David J. McMillan, former Executive Director for Enterprise Development & Application Support in the UA Office of Information Technology (OIT).