Difference between revisions of "Main Page"

Revision as of 19:46, 15 June 2021

Welcome to the Linked Data Research Group!
School of Library and Information Studies
University of Alabama

Leadership

Steven L. MacCall, PhD
Associate Professor
School of Library and Information Studies
University of Alabama

Mission

Our digital libraries research mission is to apply linked data methods to the data-driven semantic indexing of text and time-based media collections (both born-digital and digitized historical materials). The goal is to facilitate precision search, via SPARQL queries and URL-based locators, of attribute-bearing items, item parts, and other granular components contained in those collections. We are investigating various methods for deriving properties and deploying various methods for extracting named entities and features from collection items, drawing on sources that have been either manually created (e.g., book indexes or video logs) or computationally detected (e.g., by Microsoft Azure Video Analyzer for Media). The results of our semantic indexing method are SPARQL-queriable RDF knowledge graphs hosted on a local Wikibase instance managed by the Office of Information Technology (OIT) at the University of Alabama.

Publications, Presentations, Background Readings

General Research Questions

There are two overarching empirical research questions that we are pursuing:

  1. Scalability of semantic indexing method: The data-driven aspects of our semantic indexing methods are key to scalability. Following semantic uplift methods, which involve developing ETL pipelines whose transform phase generates RDF triples, we have had demonstrable success using R and Python scripts, coupled with semantic data models and direct API uploads to our Wikibase instance, to manage batches more efficiently than with QuickStatements. A scalable semantic indexing method would be an information-professional-in-the-loop method in which data-driven techniques enable precision collection indexing at scale.
  2. Queriability of semantic indexing method: The semantic indexing of both text and time-based media collections facilitates precision queries (compare a book index, which is more precise, with a table of contents, which is less precise). When RDF triples are loaded into a knowledge graph, the resulting triplestore can be queried with precision using SPARQL. But are these precise queries useful?
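
The two questions above can be illustrated with a minimal Python sketch. It is not the group's actual pipeline: the namespace URLs, the field names (down, quarter, play_type), and the sample rows are hypothetical, and the in-memory match function only imitates what a SPARQL basic graph pattern would do against a real triplestore. The transform step turns tabular records into RDF-style triples, and the match step selects the items that satisfy every stated constraint.

```python
# Hypothetical namespaces; a real Wikibase instance defines its own entity and property URIs.
BASE = "https://example.org/entity/"
PROP = "https://example.org/prop/"


def transform(rows):
    """Transform phase: turn tabular records into (subject, predicate, object) triples."""
    triples = []
    for row in rows:
        subject = BASE + row["id"]
        for key, value in row.items():
            # The id becomes the subject; empty cells produce no triple.
            if key == "id" or value in ("", None):
                continue
            triples.append((subject, PROP + key, str(value)))
    return triples


def to_ntriples(triples):
    """Serialize triples as N-Triples lines with plain string literals."""
    lines = []
    for s, p, o in triples:
        escaped = o.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{s}> <{p}> "{escaped}" .')
    return "\n".join(lines)


def match(triples, constraints):
    """Return subjects that satisfy every property/value constraint."""
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, set()).add((p, o))
    wanted = {(PROP + key, str(value)) for key, value in constraints.items()}
    return sorted(s for s, pairs in by_subject.items() if wanted <= pairs)


# Hypothetical play-by-play style records describing three images.
rows = [
    {"id": "img001", "down": 3, "quarter": 4, "play_type": "pass"},
    {"id": "img002", "down": 1, "quarter": 4, "play_type": "run"},
    {"id": "img003", "down": 3, "quarter": 2, "play_type": "pass"},
]

triples = transform(rows)
print(to_ntriples(triples[:3]))
print(match(triples, {"down": 3, "quarter": 4}))
```

In this sketch the precision comes from combining constraints: asking for third-down images in the fourth quarter returns a single item, whereas either constraint alone returns two.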

Projects

  1. Sports DAM Knowledge Graphs resulting from the semantic indexing of time-based media collections: In our longest-standing project, we are investigating precision queries of collections of still images and video clips that document game action in sports. The primary data sources for the semantic indexing method are play-by-play datasets. Precision queries incorporate game situation variables. Additional information:
    1. Chronology: Data-driven Sports Image Indexing Research, which documents our work up until now.
    2. Video of presentation by Dr. MacCall to the 2020 Linked Data for Libraries (LD4L) conference
  2. Philology Graphs resulting from the semantic indexing of text collections: The rapid growth in the number of digital texts has challenged information professionals to make them more navigable and usable. These challenges have prompted researchers to develop different approaches to creating semantic descriptions of these resources, aiming at more efficient and effective granular access to specific textual content. In this poster, we report on ongoing work to semantically describe book indexes and their logical structures. We address the challenge of computationally recovering the typographical logic of book indexes, which is lost when standard page-level OCR scanning techniques are applied to index pages in historical books hosted at the HathiTrust Digital Library. Recovering typographical logic would aid in providing granular access to digital texts by improving textual navigation.
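
As a rough illustration of what recovering the logical structure of a book index could involve, the Python sketch below rebuilds a heading/subheading hierarchy from flat OCR lines whose indentation has been lost. It is not the method used in the project. It rests on two loudly stated assumptions: that main headings begin with an uppercase letter while subheadings begin with a lowercase one, and that page locators are trailing comma-separated numbers or hyphenated ranges. The sample lines and function names are hypothetical.

```python
import re

# Assumption: a locator is a page number or a hyphenated page range such as 45-47.
LOCATOR = re.compile(r"^\d+(?:-\d+)?$")


def split_entry(line):
    """Split 'text, 12, 45-47' into ('text', ['12', '45-47'])."""
    parts = [part.strip() for part in line.split(",")]
    pages = []
    # Peel locators off the end so commas inside the heading text survive.
    while parts and LOCATOR.match(parts[-1]):
        pages.insert(0, parts.pop())
    return ", ".join(parts), pages


def recover_structure(lines):
    """Group flat index lines into main headings with nested subentries."""
    entries = []
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        text, pages = split_entry(line)
        if not text:
            # Lines carrying only locators are skipped in this sketch.
            continue
        # Assumption: a lowercase first letter marks a subheading of the last main heading.
        if text[0].islower() and current is not None:
            current["subentries"].append({"text": text, "pages": pages})
        else:
            current = {"heading": text, "pages": pages, "subentries": []}
            entries.append(current)
    return entries


# Hypothetical OCR output for part of an index page, with indentation already lost.
ocr_lines = [
    "Alabama, 12, 45-47",
    "education in, 46",
    "rivers of, 13",
    "Birmingham, 88",
]

index = recover_structure(ocr_lines)
for entry in index:
    print(entry["heading"], entry["pages"])
    for sub in entry["subentries"]:
        print("   ", sub["text"], sub["pages"])
```

The capitalization heuristic fails for indexes that capitalize subheadings or that list lowercase main headings, so real index pages would likely need additional cues such as line position from the OCR coordinates.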

Current Researchers (Fall 2021)

  1. Dr. Steven L. MacCall
  2. Huapu Liu, CIS doctoral student

Affiliate Researchers

  1. Dr. Greg Bott from the UA Department of Information Systems, Statistics and Management Science for database design and Python programming guidance
    1. Austin Herriott, Dr. Bott's undergraduate student, provided Python programming for the first iteration of our sports ETL pipeline funded with RGC grant monies in fall 2019
  2. Dr. Yu Gan from the UA Department of Electrical and Computer Engineering for digital image processing
    1. Alexander Ramey, Dr. Gan's master's student, is assisting in developing an algorithm for extracting players' numbers visible in YouTube game video, which will be added to our knowledge graph as named entity data.

Previous SLIS Students

  1. Nicole Lewis, SLIS MLIS student: For her Fall 2020 and Spring 2021 directed research studies, Nicole applied what she learned in her Linked Data course in Summer 2020. She worked with the HathiTrust Digital Library and their Research Center's Extracted Features Datasets. Nicole was instrumental in the initial design of the "transform" portion of an ETL pipeline using back-of-the-book indexes as a source for named entities for page-level subject access.
  2. C. Melissa Anderson, SLIS MLIS graduate: Melissa served as Dr. MacCall's graduate assistant during the 2019-20 academic year, assisting immeasurably with project development and management and in a student teaching role in various technology-intensive SLIS courses taught by Dr. MacCall. Melissa also served as research project manager for an RGC grant, for which she was instrumental in preparing materials for a Zooniverse-based crowdsourcing effort and in recruiting and supervising volunteer transcribers. This led to her inclusion as a co-author on a DH conference presentation.
  3. Jessica Camano, SLIS MLIS graduate: For her Fall 2020 directed research study, Jessica investigated a "red letter" problem that emerged from our work in the summer Linked Data course, in which our subject terms remain without MediaWiki sitepages. Resolving this problem involved researching the National Library of Medicine's linked data service (Medical Subject Headings RDF) to download metadata about MeSH entries as RDF triples.
  4. David Roby, SLIS MLIS graduate: For his Fall 2020 directed research study, David helped us investigate a research problem related to scaling the data management methods and ETL pipeline for our ongoing research into the semantic indexing of digital images and video clips documenting game action in sports.
  5. Christine Schultz-Richert, MLIS graduate: For her Summer 2019 directed research study, Christine investigated the semantic enhancement of transcripts from a 1957 National Educational Television (NET) program (A Look at the Indian's Future) using linked data methods.

Special Thanks

Special thanks to David J. McMillan, Executive Director for Enterprise Development & Application Support in the UA Office of Information Technology (OIT).