Difference between revisions of "Main Page"
Revision as of 16:53, 15 June 2021
Welcome to the Linked Data Research Group!
School of Library and Information Studies
University of Alabama
Leadership
Steven L. MacCall, PhD
Associate Professor
School of Library and Information Studies
University of Alabama
Mission
Our digital libraries research mission is to apply linked data methods to the data-driven semantic indexing of text and time-based media collections. The goal is precision search, via SPARQL queries and URL-based locators, of attribute-bearing items, item parts, and other granular components contained in those collections. We are investigating various methods for deriving properties, as well as deploying various methods for extracting named entities and features from collection items, where those entities and features have been either manually created (e.g., book indexes or video logs) or computationally detected (e.g., by Microsoft Azure Video Analyzer for Media, among others). The results of our semantic indexing method are SPARQL-queryable RDF knowledge graphs hosted on a local Wikibase instance managed by the Office of Information Technology (OIT) at the University of Alabama.
Publications, Presentations, Background Readings
- Publications and Presentations
- Selected Reading Lists
General Research Questions
There are two overarching empirical research questions that we are pursuing:
- Scalability of semantic indexing method: The data-driven aspects of our semantic indexing method are key to scalability. Following semantic uplift methods, in which an ETL pipeline generates RDF triples during the transform phase, we have had demonstrable success using R and Python scripts, coupled with semantic data models and direct API uploads to our Wikibase instance, to manage batches more efficiently than with QuickStatements. A scalable semantic indexing method would be an information-professional-in-the-loop method in which data-driven techniques enable precision collection indexing at scale.
- Queryability of semantic indexing method: The semantic indexing of both text and time-based media collections facilitates precision queries (compare a book index, which is more precise, with a table of contents, which is less precise). When RDF triples are loaded into a knowledge graph, the resulting triplestore can be queried with precision using SPARQL. But are these precise queries useful?
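The transform phase mentioned above can be illustrated with a minimal sketch. It is not the group's actual pipeline: the column names, the `https://example.org/` namespaces, and the sample rows are all hypothetical, and the output is plain N-Triples text rather than an upload to a Wikibase instance.

```python
import csv
import io

# Hypothetical namespaces, used only for illustration.
BASE = "https://example.org/entity/"
PROP = "https://example.org/prop/"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

# Made-up play-by-play rows standing in for a real dataset.
SAMPLE_CSV = """play_id,quarter,down,distance,play_type
p001,1,3,7,pass
p002,4,1,10,run
"""


def row_to_triples(row):
    """Turn one play-by-play row into a list of N-Triples lines."""
    subject = f"<{BASE}{row['play_id']}>"
    triples = []
    for column, value in row.items():
        if column == "play_id":
            continue  # the identifier becomes the subject, not a property
        predicate = f"<{PROP}{column}>"
        if value.isdigit():
            # Numeric game-situation variables become typed integer literals.
            obj = f'"{value}"^^<{XSD_INTEGER}>'
        else:
            # Everything else becomes a plain string literal, with escaping.
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        triples.append(f"{subject} {predicate} {obj} .")
    return triples


def transform(csv_text):
    """Run the transform step over every row of a CSV extract."""
    reader = csv.DictReader(io.StringIO(csv_text))
    triples = []
    for row in reader:
        triples.extend(row_to_triples(row))
    return triples


if __name__ == "__main__":
    for line in transform(SAMPLE_CSV):
        print(line)
```

Each row yields one triple per game-situation column, so batch size grows with the dataset rather than with manual effort, which is the scalability point made above.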
Projects
- Sports DAM Knowledge Graph (Time-based Media Collections): We are investigating precision queries of collections of still images and video clips that document game action in sports. The primary data sources for our semantic indexing method are play-by-play datasets. Precision queries incorporate game situation variables. Additional information:
  - Video of presentation (https://tinyurl.com/y5wcxqen) by Dr. MacCall to the 2020 Linked Data for Libraries (LD4L) conference
  - Chronology: Data-driven Sports Image Indexing Research, which documents our work up until now.
- Basic research on philology graphs: We are investigating an ontology that would serve to integrate texts in library collections, extending the work of the Collections as Data (https://osf.io/mx6uk/wiki/home/) research community.
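As a rough illustration of a precision query that incorporates game situation variables, the sketch below builds a SPARQL query string. The `ex:` prefix and the property names (`ex:depicts`, `ex:quarter`, `ex:down`, `ex:distance`) are assumptions made for this example and do not reflect the group's actual data model.

```python
def build_situation_query(quarter, down, min_distance):
    """Return a SPARQL SELECT for media items depicting a given game situation.

    All property names are hypothetical placeholders.
    """
    # Coerce to int so only numeric values are interpolated into the query.
    quarter, down, min_distance = int(quarter), int(down), int(min_distance)
    return (
        "PREFIX ex: <https://example.org/prop/>\n"
        "SELECT ?item ?play WHERE {\n"
        "  ?item ex:depicts ?play .\n"
        f"  ?play ex:quarter {quarter} ;\n"
        f"        ex:down {down} ;\n"
        "        ex:distance ?distance .\n"
        f"  FILTER(?distance >= {min_distance})\n"
        "}"
    )


if __name__ == "__main__":
    # Example: images or clips of fourth-quarter third downs with 5+ yards to go.
    print(build_situation_query(quarter=4, down=3, min_distance=5))
```

The resulting string could be submitted to any SPARQL endpoint; the endpoint itself is left out here because its address is not given in this page.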
Current Researchers (Fall 2021)
- Dr. Steven L. MacCall
- Huapu Liu, CIS doctoral student
Affiliate Researchers
- Dr. Greg Bott from the UA Department of Information Systems, Statistics and Management Science for database design and Python programming guidance
- Austin Herriott, Dr. Bott's undergraduate student, provided Python programming for the first iteration of our sports ETL pipeline funded with RGC grant monies in fall 2019
- Dr. Yu Gan from the UA Department of Electrical and Computer Engineering for digital image processing
- Alexander Ramey, Dr. Gan's master's student, is assisting in developing an algorithm for extracting player numbers visible in YouTube game video, which will be added to our knowledge graph as named entity data.
Previous SLIS Students
- Nicole Lewis, SLIS MLIS student: For her Fall 2020 and Spring 2021 directed research studies, Nicole applied what she learned in her Linked Data course in Summer 2020. She worked with the HathiTrust Digital Library and their Research Center's Extracted Features Datasets. Nicole was instrumental in the initial design of the "transform" portion of an ETL pipeline using back-of-the-book indexes as a source for named entities for page-level subject access.
- C. Melissa Anderson, SLIS MLIS graduate: Melissa served as Dr. MacCall's graduate assistant during the 2019-20 academic year, assisting immeasurably in project development and management and in a student teaching role in various technology-intensive SLIS courses taught by Dr. MacCall. Melissa also served as research project manager for an RGC grant, for which she was instrumental in preparing materials for a Zooniverse-based crowdsourcing effort and in recruiting and supervising volunteer transcribers. This led to her inclusion as a co-author on a DH conference presentation.
- Jessica Camano, SLIS MLIS graduate: For her Fall 2020 directed research study, Jessica investigated a "red letter" problem that emerged from our work in the summer Linked Data course, in which our subject terms remained without MediaWiki site pages. Resolving this problem involved researching the National Library of Medicine's linked data service (Medical Subject Headings RDF) to download metadata about MeSH entries as RDF triples.
- David Roby, SLIS MLIS graduate: For his Fall 2020 directed research study, David helped us investigate a research problem related to scaling the data management methods and ETL pipeline for our ongoing research into the semantic indexing of digital images and video clips documenting game action in sports.
- Christine Schultz-Richert, MLIS graduate: For her Summer 2019 directed research study, Christine investigated the semantic enhancement of transcripts from a 1957 National Educational Television (NET) program (A Look at the Indian's Future) using linked data methods.
Special Thanks
Special thanks to David J. McMillan, Executive Director for Enterprise Development & Application Support in the UA Office of Information Technology (OIT).