Award Date
5-1-2025
Degree Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
Department
Computer Science
First Committee Member
Kazem Taghva
Second Committee Member
Mingon Kang
Third Committee Member
Laxmi Gewali
Fourth Committee Member
Emma Regentova
Fifth Committee Member
Fatma Nasoz
Sixth Committee Member
Ashok Singh
Number of Pages
74
Abstract
Machine reading comprehension is a critical step in the development of applications that require semantic understanding of speech-to-text-driven input. Smart home devices such as the Amazon Echo Dot and Google Home, and smart assistants such as Apple Siri and Microsoft Cortana, are examples of these applications. The comprehension task involves a deeper understanding and recognition of named entities such as person names, locations, medical codes, quantities, abbreviations, and acronyms in speech or text data. In this dissertation, we explore and extend the approaches and techniques in modern research that tackle the recognition and definition of acronyms and abbreviations. We also offer techniques for disambiguating abbreviations, where the ambiguity is caused by the abundance and frequent introduction of new abbreviations. We provide the following contributions: 1) A historical background on the rule-based and statistical methods for finding acronyms and their definitions. 2) A method based on the Bidirectional Encoder Representations from Transformers (BERT) question answering model to find acronym definitions in each document. Our experiments show that this model can correctly predict 94% of acronym expansions, assuming a Jaro–Winkler threshold distance greater than 0.8. 3) An exploration of the different approaches and techniques to solve the problem of ambiguous abbreviations and their definitions. We reverse engineered the process of creating ad hoc abbreviations and found some preliminary statistics on what makes them easier or harder to define. In addition to the recognition and definition of acronyms and abbreviations, this dissertation contributes a systematic generative method to create datasets and use them to build a corpus for acronym expansion. Our data generation approach can be used in many applications where no standard datasets exist.
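To illustrate the evaluation criterion mentioned in the abstract, the following is a minimal sketch of Jaro–Winkler scoring applied to a predicted acronym expansion. It is not the dissertation's code. It assumes the 0.8 threshold refers to the Jaro–Winkler similarity score (where 1.0 means identical strings), uses the conventional prefix scale of 0.1 with a maximum prefix of 4, and lowercases both strings before comparison. The function names `jaro`, `jaro_winkler`, and `is_correct_expansion` are hypothetical.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means the strings are identical."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are equal and within this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2  # number of transpositions
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1,
                 max_prefix: int = 4) -> float:
    """Jaro similarity boosted for strings that share a common prefix."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return j + prefix * prefix_scale * (1 - j)


def is_correct_expansion(predicted: str, gold: str,
                         threshold: float = 0.8) -> bool:
    """Assumed criterion: a prediction counts as correct when its
    similarity to the reference expansion exceeds the threshold."""
    return jaro_winkler(predicted.lower(), gold.lower()) > threshold
```

Under these assumptions, a call such as `is_correct_expansion("machine reading comprehension", "Machine Reading Comprehension")` returns `True`, while a prediction sharing no characters with the reference scores 0.0 and is rejected. A similarity threshold of this kind tolerates small spelling or casing differences between the predicted and reference expansions.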
Keywords
Abbreviations; Acronyms; Machine Learning; Machine Reading Comprehension; Natural Language Processing
Disciplines
Artificial Intelligence and Robotics | Computer Engineering | Computer Sciences
File Format
File Size
851 KB
Degree Grantor
University of Nevada, Las Vegas
Language
English
Repository Citation
Choi, Sing, "Developments on Abbreviations Towards Machine Reading Comprehension" (2025). UNLV Theses, Dissertations, Professional Papers, and Capstones. 5255.
https://oasis.library.unlv.edu/thesesdissertations/5255
Rights
IN COPYRIGHT. For more information about this rights statement, please visit http://rightsstatements.org/vocab/InC/1.0/