Award Date

5-1-2025

Degree Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Computer Science

First Committee Member

Kazem Taghva

Second Committee Member

Mingon Kang

Third Committee Member

Laxmi Gewali

Fourth Committee Member

Emma Regentova

Fifth Committee Member

Fatma Nasoz

Sixth Committee Member

Ashok Singh

Number of Pages

74

Abstract

Machine reading comprehension is a critical step in the development of applications that require semantic understanding of text, including text produced from human speech. Smart home devices such as the Amazon Echo Dot and Google Home, and smart assistants such as Apple Siri and Microsoft Cortana, are examples of these applications. The comprehension task involves deeper understanding and recognition of named entities such as person names, locations, medical codes, quantities, abbreviations, and acronyms in speech or text data. In this dissertation, we explore and extend the approaches and techniques in modern research that tackle the problem of recognizing and defining acronyms and abbreviations. We also offer techniques for disambiguating abbreviations, whose ambiguity is caused by the abundance and frequent introduction of new abbreviations. We provide the following contributions: 1) A historical background on the rule-based and statistical methods for finding acronyms and their definitions. 2) A method based on the Bidirectional Encoder Representations from Transformers (BERT) question answering model to find acronym definitions in each document. Our experiments show that this model can correctly predict 94% of acronym expansions, assuming a Jaro–Winkler distance threshold greater than 0.8. 3) An exploration of the approaches and techniques for solving the problem of ambiguous abbreviations and their definitions. We reverse engineered the process of creating ad hoc abbreviations and found some preliminary statistics on what makes them easier or harder to define. In addition to the recognition and definition of acronyms and abbreviations, this dissertation contributes a systematic generative method to create datasets and use them to build a corpus for acronym expansion. Our approach to data generation can be used in many applications where there are no standard datasets.
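The 0.8 threshold in the abstract can be illustrated with a minimal sketch of Jaro–Winkler scoring. This is not the dissertation's evaluation code. It assumes that the reported score is the usual Jaro–Winkler similarity (1.0 for identical strings), that strings are lowercased before comparison, and that the standard prefix scale of 0.1 with a four-character prefix cap applies. The function names `jaro`, `jaro_winkler`, and `is_correct_expansion` are hypothetical.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means the strings are identical."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only within this sliding window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1,
                 max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix of up to max_prefix chars."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return j + prefix * prefix_scale * (1 - j)


def is_correct_expansion(predicted: str, gold: str,
                         threshold: float = 0.8) -> bool:
    """Assumed rule: a prediction counts as correct if its score exceeds 0.8."""
    return jaro_winkler(predicted.lower(), gold.lower()) > threshold
```

Under these assumptions, a predicted expansion that differs from the reference only by case or a small spelling variation would still count as correct, while an unrelated string would not.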

Keywords

Abbreviations; Acronyms; Machine Learning; Machine Reading Comprehension; Natural Language Processing

Disciplines

Artificial Intelligence and Robotics | Computer Engineering | Computer Sciences

File Format

pdf

File Size

851 KB

Degree Grantor

University of Nevada, Las Vegas

Language

English

Rights

IN COPYRIGHT. For more information about this rights statement, please visit http://rightsstatements.org/vocab/InC/1.0/

