Difference between revisions of "Main Page"
(97 intermediate revisions by 2 users not shown) | |||
Line 1: | Line 1: | ||
− | + | ||
− | < | + | <font size="+2"><b>Welcome to the Linked Data Research Group!</b></font><br> |
− | <b>[https://smaccall.people.ua.edu Steven L. MacCall, PhD]<br> | + | Established in August 2018<br><br> |
+ | [https://slis.ua.edu/ School of Library and Information Studies]<br> | ||
+ | [https://cis.ua.edu/ College of Communication and Information Sciences]<br> | ||
+ | University of Alabama | ||
+ | |||
+ | For information, email [https://smaccall.people.ua.edu Dr. Steven L. MacCall] at [mailto:smaccall@ua.edu smaccall@ua.edu] <br><br> | ||
+ | |||
+ | ====Publications, Presentations, Background Readings==== | ||
+ | * [[Publications and Presentations]] | ||
+ | * Selected background [[Reading Lists]] | ||
+ | |||
+ | ====Current Projects==== | ||
+ | |||
+ | # <b>Sports DAM Knowledge Graphs resulting from the semantic indexing of time-based media collections</b>: Our longest standing project, we are investigating precision queries of collections of still images and video clips that document game action in sports. Primary data source for semantic indexing method are play-by-play datasets. Precision queries incorporate game situation variables. Additional information: | ||
+ | ## [https://wikibase.slis.ua.edu/wiki/Chronology:_Data-driven_Sports_Image_Indexing_Research Overview of Project] | ||
+ | ## [https://wikibase.slis.ua.edu/wiki/Chronology:_Data-driven_Sports_Image_Indexing_Research#Research-related_Accomplishments_by_Year Research-related Accomplishments by Year] | ||
+ | ## [https://wikibase.slis.ua.edu/wiki/Chronology:_Data-driven_Sports_Image_Indexing_Research#Research_Result_Highlights_and_Links_to_Working_Software_Demonstrations Research Result Highlights and Links to Working Software Demonstrations] | ||
+ | ## [https://tinyurl.com/y5wcxqen Video of presentation] by Dr. MacCall to the 2020 Linked Data for Libraries (LD4L) conference | ||
+ | # <b>Philology Graphs resulting from the semantic indexing of text collections</b>: The rapid growth in the number of digital texts has posed challenges for librarians to make them more navigable and usable. These challenges have prompted researchers to develop different approaches to create semantic descriptions for these resources aiming at more efficient and effective granular access to specific textual content. Ongoing work includes the challenge of computationally recovering the typographical logic of book indexes that is lost when standard page-level OCR scanning techniques are deployed on index pages in historical books hosted at the HathiTrust Digital Library. Recovering typographical logic would aid in the provision of granular access to digital texts through improving textual navigation. Additional information: | ||
+ | ## Liu, H., & MacCall, S.L. (2022). Historical Text Datafication and Loss: Computational Recovery of Typographical Layout Logic on an RDF Graph Featuring ML Methods. Conference paper presented at the 2022 ASIS&T Annual Meeting Pre-conference AI Workshop: [https://asistaiworkshop.web.illinois.edu/program-2022/ AI in the Real World: Strengthening Connections Between LIS Research and Practice] (<b>[http://smaccall.people.ua.edu/uploads/1/0/1/9/101978326/2022_asist_ai_workshop_presentation-liu_and_maccall-final.pdf presentation slides and speaker notes]</b>). | ||
+ | ## Liu, H., MacCall, S.L., & Lewis, N.A. (2021). Knowledge Graph as Navigational Paratext: Returning Structural Semantics to a Closed System Index through the Computational Recovery of the Typographical Logic of Digitized Historical Book Indexes. Conference paper presented at Wissensorganisation 2021, biennial conferences of the German chapter of the International Society of Knowledge Organization (<b>[http://smaccall.people.ua.edu/uploads/1/0/1/9/101978326/ppt-final_liu-maccall-lewis-wissenpresoconf-presentation-2021.pdf presentation slides]</b>). | ||
+ | ## Liu, H., MacCall, S.L., & Lewis, N.A. (2021). Exploring the computational recovery of the typographical logic of book indexes as paratext for improving navigation within digitized historical texts using semantic methods. Poster presented at the 2021 annual meeting of the American Society for Information Science and Technology. (<b>[http://smaccall.people.ua.edu/uploads/1/0/1/9/101978326/finalposter_2021asistannualmeeting.pdf Poster PDF]</b>). | ||
+ | # <b>Documentary Series Knowledge Graphs resulting from the semantic indexing of time-based documentary media collections (series)</b>: This our most recently established project as we seek to generalize our semantic indexing method from sports image and video clip collections to collections ("series") of documentaries and other broadcast genres. Objectives: | ||
+ | ## Our goal is to investigate models for the semantic indexing of broadcast series for granular access and semantic integration across the linked data cloud using data generated by ML/AI computational methods applied to the documentaries of the series. | ||
+ | ## Our purpose is to extend and enhance the existing relationships of the AAPB and WGBH with the UA [https://slis.ua.edu/ School of Library and Information Studies] to benefit both EBSCO Scholars and SLIS students generally. | ||
+ | ## Additional information: [[Data-driven Semantic Indexing of Time-based Broadcast Media Series]] | ||
+ | |||
+ | ====Leadership==== | ||
+ | <b>[https://smaccall.people.ua.edu Steven L. MacCall, PhD]<br></b> | ||
Associate Professor<br> | Associate Professor<br> | ||
School of Library and Information Studies<br> | School of Library and Information Studies<br> | ||
− | University of Alabama | + | University of Alabama |
− | + | ||
− | <b>< | + | <b>Huapu Liu, MLIS<br></b> |
− | <br> | + | Doctoral Student<br> |
+ | College of Communication and Information Sciences<br> | ||
+ | University of Alabama | ||
+ | |||
====Mission==== | ====Mission==== | ||
− | Our digital libraries research mission is | + | Our digital libraries research mission is to apply linked data methods to investigate the <b>data-driven semantic indexing of text and time-based media collections</b> (born digital and digitized historical) to facilitate the precision search for attribute-bearing collection items, item parts, and other granular components via SPARQL queries and URL-based locators. We are using a [https://osf.io/mx6uk/wiki/home/ Collections as (Meta)data] and [https://www.theindexer.org/files/23-1/23-1_023.pdf closed-system indexing approach] as we investigate various methods for indexing collections based on semantically mapping properties and extracted named entities and features that have been either manually created (e.g., book indexes or video logs) or computationally detected (e.g., among others, [https://docs.microsoft.com/en-us/azure/azure-video-analyzer/video-analyzer-docs/ Microsoft Azure Video Analyzer for Media]). The result of our semantic indexing method are SPARQL queriable RDF knowledge graphs that are hosted on a local Wikibase instance managed by the Office of Information Technology (OIT) at the University of Alabama. |
====General Research Questions==== | ====General Research Questions==== | ||
There are two overarching empirical research questions that we are pursuing: | There are two overarching empirical research questions that we are pursuing: | ||
− | # Scalability of semantic indexing method: Data-driven | + | # <b>Scalability of semantic indexing method</b>: Data-driven aspects of our semantic indexing method are key to addressing scalability issues. Following semantic uplift methods, which involve ETL pipeline development in which RDF triples are generated during the transform phase, we have had demonstrable success using R and Python scripting methods coupled with semantic data models and direct API access to our Wikibase instance facilitate efficient transformed data upload at scale. |
− | # Queriability of semantic indexing | + | # <b>Queriability of semantic indexing results</b>: The semantic indexing of both text and time-based media collections facilitates precision queries (compare a book index (more precise) to a table of contents (less precise)). When RDF triples are loaded to a knowledge graph, the resulting triplestore can be queried with precision using SPARQL. But are these precise queries useful? |
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | + | ====Current Researchers (2022-23 Academic Year)==== | |
# [http://smaccall.people.ua.edu/ Dr. Steven L. MacCall] | # [http://smaccall.people.ua.edu/ Dr. Steven L. MacCall] | ||
− | # Huapu Liu, CIS doctoral student | + | # Huapu Liu, CIS doctoral candidate |
− | # | + | # Duncan McCaskill, CIS doctoral student |
− | + | # William Blackerby, SLIS master's student | |
− | < | + | ====Affiliate Researchers==== |
− | # [https://www.ua.edu/news/ua-theme-expert/gregory-bott/ Dr. Greg Bott] | + | # <b>[https://www.stevens.edu/profile/ygan5 Dr. Yu Gan]</b> from Stevens Institute of Technology for digital image processing |
− | + | # <b>[https://www.ua.edu/news/ua-theme-expert/gregory-bott/ Dr. Greg Bott]</b> from the UA Department of Information Systems, Statistics and Management Science for database design and Python programming guidance | |
− | |||
− | |||
− | < | + | ====Previous SLIS Students==== |
− | # C. Melissa Anderson, SLIS MLIS graduate | + | # <b>Nicole Lewis, SLIS MLIS graduate</b>: For her Fall 2020 and Spring 2021 directed research studies, Nicole applied what she learned in her Linked Data course in Summer 2020. She worked with the HathiTrust Digital Library Research Center's [https://analytics.hathitrust.org/datasets Extracted Features Datasets]. Nicole was instrumental in the initial design of the "transform" portion of an ETL pipeline using back-of-the-book indexes as a source for named entities for page-level subject access. |
− | + | # <b>C. Melissa Anderson, SLIS MLIS graduate</b>: Melissa served as Dr. MacCall's graduate assistant during the 2019-20 academic year assisting immeasurably in project development, management, and student teaching role in various technology-intensive SLIS courses taught by Dr. MacCall. Melissa also served as research project manager for an RGC grant for which she was instrumental in preparing materials for a Zooniverse-based crowdsourcing effort and in recruiting and supervising volunteer transcribers. This led to her inclusion as a co-author on a [https://ir.ua.edu/handle/123456789/6574 DH conference presentation]. | |
− | # Jessica Camano, SLIS MLIS graduate: For her Fall 2020 directed research study, Jessica is investigating a "red letter" problem that emerged from our work in the summer Linked Data course in which our subject terms remain without MediaWiki sitepages. Resolution of this problem involved researching the National Library of Medicine's linked data service ([https://id.nlm.nih.gov/mesh/ Medical Subject Headings RDF]) to download metadata about MeSH entries as RDF triples. | + | # <b>Jessica Camano, SLIS MLIS graduate</b>: For her Fall 2020 directed research study, Jessica is investigating a "red letter" problem that emerged from our work in the summer Linked Data course in which our subject terms remain without MediaWiki sitepages. Resolution of this problem involved researching the National Library of Medicine's linked data service ([https://id.nlm.nih.gov/mesh/ Medical Subject Headings RDF]) to download metadata about MeSH entries as RDF triples. |
− | # David Roby, SLIS MLIS graduate: For his Fall 2020 directed research study, David is helping us investigate a research problem related to scaling the data management methods and ETL pipeline for our ongoing research into the semantic indexing of digital images and video clips documenting game action in sports. | + | # <b>David Roby, SLIS MLIS graduate</b>: For his Fall 2020 directed research study, David is helping us investigate a research problem related to scaling the data management methods and ETL pipeline for our ongoing research into the semantic indexing of digital images and video clips documenting game action in sports. |
+ | # <b>Christine Schultz-Richert, SLIS MLIS graduate</b>: For her Summer 2019 directed research study, Christine investigated the semantic enhancement of transcripts from a 1957 [https://americanarchive.org/special_collections/net-catalog National Educational Television (NET)] program (A Look at the Indian's Future) using linked data methods. | ||
− | + | ====Special Thanks==== | |
+ | Special thanks to David J. McMillan, former Executive Director for Enterprise Development & Application Support in the UA Office of Information Technology (OIT) |
Latest revision as of 16:45, 14 July 2023
Welcome to the Linked Data Research Group!
Established in August 2018
School of Library and Information Studies
College of Communication and Information Sciences
University of Alabama
For information, email Dr. Steven L. MacCall at smaccall@ua.edu
Publications, Presentations, Background Readings
- Publications and Presentations
- Selected background Reading Lists
Current Projects
- Sports DAM Knowledge Graphs resulting from the semantic indexing of time-based media collections: In our longest-standing project, we are investigating precision queries of collections of still images and video clips that document game action in sports. The primary data sources for our semantic indexing method are play-by-play datasets. Precision queries incorporate game situation variables. Additional information:
- Overview of Project
- Research-related Accomplishments by Year
- Research Result Highlights and Links to Working Software Demonstrations
- Video of presentation by Dr. MacCall to the 2020 Linked Data for Libraries (LD4L) conference
- Philology Graphs resulting from the semantic indexing of text collections: The rapid growth in the number of digital texts has challenged librarians to make these texts more navigable and usable. These challenges have prompted researchers to develop different approaches to creating semantic descriptions for these resources, with the aim of more efficient and effective granular access to specific textual content. Ongoing work includes the challenge of computationally recovering the typographical logic of book indexes that is lost when standard page-level OCR scanning techniques are deployed on index pages in historical books hosted at the HathiTrust Digital Library. Recovering typographical logic would aid in the provision of granular access to digital texts by improving textual navigation. Additional information:
- Liu, H., & MacCall, S.L. (2022). Historical Text Datafication and Loss: Computational Recovery of Typographical Layout Logic on an RDF Graph Featuring ML Methods. Conference paper presented at the 2022 ASIS&T Annual Meeting Pre-conference AI Workshop: AI in the Real World: Strengthening Connections Between LIS Research and Practice (presentation slides and speaker notes).
- Liu, H., MacCall, S.L., & Lewis, N.A. (2021). Knowledge Graph as Navigational Paratext: Returning Structural Semantics to a Closed System Index through the Computational Recovery of the Typographical Logic of Digitized Historical Book Indexes. Conference paper presented at Wissensorganisation 2021, the biennial conference of the German chapter of the International Society for Knowledge Organization (presentation slides).
- Liu, H., MacCall, S.L., & Lewis, N.A. (2021). Exploring the computational recovery of the typographical logic of book indexes as paratext for improving navigation within digitized historical texts using semantic methods. Poster presented at the 2021 annual meeting of the Association for Information Science and Technology (Poster PDF).
- Documentary Series Knowledge Graphs resulting from the semantic indexing of time-based documentary media collections (series): This is our most recently established project, in which we seek to generalize our semantic indexing method from sports image and video clip collections to collections ("series") of documentaries and other broadcast genres. Objectives:
- Our goal is to investigate models for the semantic indexing of broadcast series for granular access and semantic integration across the linked data cloud using data generated by ML/AI computational methods applied to the documentaries of the series.
- Our purpose is to extend and enhance the existing relationships of the AAPB and WGBH with the UA School of Library and Information Studies to benefit both EBSCO Scholars and SLIS students generally.
- Additional information: Data-driven Semantic Indexing of Time-based Broadcast Media Series
Leadership
Steven L. MacCall, PhD
Associate Professor
School of Library and Information Studies
University of Alabama
Huapu Liu, MLIS
Doctoral Student
College of Communication and Information Sciences
University of Alabama
Mission
Our digital libraries research mission is to apply linked data methods to investigate the data-driven semantic indexing of text and time-based media collections (born digital and digitized historical) to facilitate the precision search for attribute-bearing collection items, item parts, and other granular components via SPARQL queries and URL-based locators. We are using a Collections as (Meta)data and closed-system indexing approach as we investigate various methods for indexing collections based on semantically mapping properties and extracted named entities and features that have been either manually created (e.g., book indexes or video logs) or computationally detected (e.g., by Microsoft Azure Video Analyzer for Media, among other tools). The results of our semantic indexing method are SPARQL-queriable RDF knowledge graphs hosted on a local Wikibase instance managed by the Office of Information Technology (OIT) at the University of Alabama.
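To make the idea of a precision search concrete, the following is a minimal Python sketch of how a SPARQL query combining several item attributes might be assembled. It is an illustration only: the prefix IRI, the property identifiers (P1, P2), and the attribute values are hypothetical placeholders, not identifiers from our Wikibase instance, and the function name is invented for this example.

```python
# Hypothetical sketch: the prefix IRI, property IDs, and values below are
# illustrative placeholders, not identifiers from an actual Wikibase instance.

def build_precision_query(filters, prefix_iri="https://example.org/prop/direct/", limit=50):
    """Return a SPARQL SELECT query matching items that carry every property/value pair."""
    lines = [f"PREFIX ex: <{prefix_iri}>", "SELECT ?item WHERE {"]
    for prop, value in sorted(filters.items()):
        # Escape backslashes and double quotes so the value is a valid string literal.
        literal = str(value).replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'  ?item ex:{prop} "{literal}" .')
    lines.append("}")
    lines.append(f"LIMIT {int(limit)}")
    return "\n".join(lines)


# Each added attribute narrows the result set, which is what makes the query "precise".
query = build_precision_query({"P1": "3rd down", "P2": "red zone"})
print(query)
```

Each property/value pair becomes one triple pattern, so adding attributes (for example, game situation variables) narrows the set of matching items.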
General Research Questions
There are two overarching empirical research questions that we are pursuing:
- Scalability of semantic indexing method: Data-driven aspects of our semantic indexing method are key to addressing scalability issues. Following semantic uplift methods, which involve ETL pipeline development in which RDF triples are generated during the transform phase, we have had demonstrable success using R and Python scripting methods coupled with semantic data models and direct API access to our Wikibase instance to facilitate the efficient upload of transformed data at scale.
- Queriability of semantic indexing results: The semantic indexing of both text and time-based media collections facilitates precision queries (compare the greater precision of a book index with the lesser precision of a table of contents). When RDF triples are loaded to a knowledge graph, the resulting triplestore can be queried with precision using SPARQL. But are these precise queries useful?
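The transform phase mentioned in the first question can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not our production pipeline: the namespace, the column names, the property identifiers (P1, P2), and the sample rows are all hypothetical, and the sketch emits N-Triples text rather than calling the Wikibase API. It also assumes row identifiers are safe to embed in an IRI.

```python
import csv
import io

# Hypothetical namespace; a real pipeline would use the IRIs of its own knowledge graph.
BASE = "https://example.org/"


def escape_literal(text):
    """Escape characters that are not allowed raw inside an N-Triples string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def transform(rows, id_field, property_map):
    """Yield one N-Triples line per non-empty mapped cell in each source row."""
    for row in rows:
        subject = f"<{BASE}item/{row[id_field]}>"
        for column, prop in property_map.items():
            value = row.get(column)
            if not value:
                continue  # skip missing or empty cells instead of emitting empty literals
            yield f'{subject} <{BASE}prop/{prop}> "{escape_literal(value)}" .'


# Extract: a tiny in-memory stand-in for a play-by-play dataset (invented values).
SAMPLE = "play_id,down,quarter\n17,3,2\n18,,2\n"
rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# Transform: map source columns to (hypothetical) property identifiers.
triples = list(transform(rows, "play_id", {"down": "P1", "quarter": "P2"}))
for line in triples:
    print(line)
```

Keeping the column-to-property mapping in a plain dictionary is one way to let the same script be reused across datasets by changing only the mapping; the load step (uploading to the knowledge graph) is deliberately left out of this sketch.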
Current Researchers (2022-23 Academic Year)
- Dr. Steven L. MacCall
- Huapu Liu, CIS doctoral candidate
- Duncan McCaskill, CIS doctoral student
- William Blackerby, SLIS master's student
Affiliate Researchers
- Dr. Yu Gan from Stevens Institute of Technology for digital image processing
- Dr. Greg Bott from the UA Department of Information Systems, Statistics and Management Science for database design and Python programming guidance
Previous SLIS Students
- Nicole Lewis, SLIS MLIS graduate: For her Fall 2020 and Spring 2021 directed research studies, Nicole applied what she learned in her Linked Data course in Summer 2020. She worked with the HathiTrust Digital Library Research Center's Extracted Features Datasets. Nicole was instrumental in the initial design of the "transform" portion of an ETL pipeline using back-of-the-book indexes as a source for named entities for page-level subject access.
- C. Melissa Anderson, SLIS MLIS graduate: Melissa served as Dr. MacCall's graduate assistant during the 2019-20 academic year, assisting immeasurably with project development and management and serving in a student teaching role in various technology-intensive SLIS courses taught by Dr. MacCall. Melissa also served as research project manager for an RGC grant, for which she was instrumental in preparing materials for a Zooniverse-based crowdsourcing effort and in recruiting and supervising volunteer transcribers. This led to her inclusion as a co-author on a DH conference presentation.
- Jessica Camano, SLIS MLIS graduate: For her Fall 2020 directed research study, Jessica investigated a "red letter" problem that emerged from our work in the summer Linked Data course, in which our subject terms remained without MediaWiki sitepages. Resolution of this problem involved researching the National Library of Medicine's linked data service (Medical Subject Headings RDF) to download metadata about MeSH entries as RDF triples.
- David Roby, SLIS MLIS graduate: For his Fall 2020 directed research study, David helped us investigate a research problem related to scaling the data management methods and ETL pipeline for our ongoing research into the semantic indexing of digital images and video clips documenting game action in sports.
- Christine Schultz-Richert, SLIS MLIS graduate: For her Summer 2019 directed research study, Christine investigated the semantic enhancement of transcripts from a 1957 National Educational Television (NET) program (A Look at the Indian's Future) using linked data methods.
Special Thanks
Special thanks to David J. McMillan, former Executive Director for Enterprise Development & Application Support in the UA Office of Information Technology (OIT).