Automatically Cataloging Scholarly Articles using Library of Congress Subject Headings

Nazmul Kazi, Nathaniel Lane, Indika Kahanda

Student Research Workshop, Long Paper

Gather-2F: Apr 22 (13:00-15:00 UTC)

Abstract: Institutions are required to catalog their articles with proper subject headings so that users can easily retrieve relevant articles from institutional repositories. However, because the number of articles in these repositories is growing rapidly, it is becoming difficult to catalog newly added articles manually at the same pace. To address this challenge, we explore the feasibility of automatically annotating articles with Library of Congress Subject Headings (LCSH). We first use web scraping to extract keywords for a collection of articles from the Repository Analytics and Metrics Portal (RAMP). We then map these keywords to LCSH names to develop a gold-standard dataset. As a case study, using the subset of Biology-related LCSH concepts, we formulate the task as a multi-label classification problem and develop predictive models. Our experimental results demonstrate the viability of this approach for predicting LCSH for scholarly articles.
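The abstract describes two technical steps: mapping scraped keywords to LCSH names to build labeled data, and treating subject assignment as multi-label classification, where one article can receive several headings at once. The sketch below illustrates both under clearly stated assumptions; it is not the authors' pipeline. The keyword table `KEYWORD_TO_LCSH`, the case-insensitive exact-match mapping rule, the toy articles, and the `BinaryRelevancePerceptron` class are all hypothetical. Binary relevance (one independent perceptron per heading) is a deliberately simple, swapped-in technique, since the abstract does not say which models were used.

```python
# A minimal sketch, NOT the authors' implementation: keyword-to-LCSH mapping by
# exact match, then multi-label classification via binary relevance
# (one independent perceptron per LCSH label). All names and data are hypothetical.

# Hypothetical keyword -> LCSH name table (the paper's actual mapping is not shown).
KEYWORD_TO_LCSH = {
    "gene expression": "Gene expression",
    "ecology": "Ecology",
    "microbiology": "Microbiology",
}


def map_keywords(keywords):
    """Map scraped keywords to LCSH names by case-insensitive exact match."""
    labels = set()
    for kw in keywords:
        heading = KEYWORD_TO_LCSH.get(kw.strip().lower())
        if heading is not None:  # unmatched keywords are simply dropped
            labels.add(heading)
    return labels


def tokenize(text):
    """Bag-of-words features: lowercase whitespace tokens."""
    return text.lower().split()


class BinaryRelevancePerceptron:
    """Multi-label classifier that trains one binary perceptron per label."""

    def __init__(self, labels, epochs=10):
        self.labels = sorted(labels)
        self.epochs = epochs
        self.weights = {lab: {} for lab in self.labels}
        self.bias = {lab: 0.0 for lab in self.labels}

    def _score(self, label, tokens):
        w = self.weights[label]
        return self.bias[label] + sum(w.get(t, 0.0) for t in tokens)

    def fit(self, texts, label_sets):
        docs = [tokenize(t) for t in texts]
        for _ in range(self.epochs):
            for tokens, gold in zip(docs, label_sets):
                for lab in self.labels:
                    target = 1 if lab in gold else -1
                    # Update only when this label's decision is wrong or on the boundary.
                    if target * self._score(lab, tokens) <= 0:
                        w = self.weights[lab]
                        for t in tokens:
                            w[t] = w.get(t, 0.0) + target
                        self.bias[lab] += target
        return self

    def predict(self, text):
        """Return every label whose perceptron scores the text above zero."""
        tokens = tokenize(text)
        return {lab for lab in self.labels if self._score(lab, tokens) > 0}


# Toy "gold-standard" data: (article text, scraped keywords). Entirely made up.
articles = [
    ("Regulation of gene expression in yeast", ["Gene Expression", "yeast"]),
    ("Soil microbiology and nutrient cycling", ["Microbiology"]),
    ("Forest ecology under climate change", ["Ecology"]),
    ("Microbial gene expression in soil", ["gene expression", "microbiology"]),
]
texts = [text for text, _ in articles]
gold = [map_keywords(kws) for _, kws in articles]

model = BinaryRelevancePerceptron(set(KEYWORD_TO_LCSH.values())).fit(texts, gold)
print(sorted(model.predict("gene expression in soil bacteria")))
# prints ['Gene expression', 'Microbiology']
```

Binary relevance keeps each heading's decision independent, which makes it easy to add or drop headings but ignores correlations between headings that tend to co-occur. Keywords with no LCSH match, such as `yeast` in the toy data, contribute no label.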

Connected Papers in EACL2021

Similar Papers

CD²CR: Co-reference resolution across documents and domains
James Ravenscroft, Amanda Clare, Arie Cattan, Ido Dagan, Maria Liakata
LOME: Large Ontology Multilingual Extraction
Patrick Xia, Guanghui Qin, Siddharth Vashishtha, Yunmo Chen, Tongfei Chen, Chandler May, Craig Harman, Kyle Rawlins, Aaron Steven White, Benjamin Van Durme
Forum 4.0: An Open-Source User Comment Analysis Framework
Marlo Haering, Jakob Smedegaard Andersen, Chris Biemann, Wiebke Loosen, Benjamin Milde, Tim Pietz, Christian Stöcker, Gregor Wiedemann, Olaf Zukunft, Walid Maalej
Identifying Named Entities as they are Typed
Ravneet Arora, Chen-Tse Tsai, Daniel Preotiuc-Pietro