Award Date

August 2025

Degree Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Computer Science

First Committee Member

Kazem Taghva

Second Committee Member

Laxmi Gewali

Third Committee Member

Wolfgang Bein

Fourth Committee Member

Mingon Kang

Fifth Committee Member

Emma Regentova

Number of Pages

110

Abstract

This dissertation demonstrates that carefully adapted language-model pipelines can transform unstructured clinical-trial and pharmacological prose into reliable, low-latency structured data. Four interconnected studies support this claim.

Tri-AL platform. An open-source dashboard ingests all 440k+ ClinicalTrials.gov records, including every historical revision, into a normalized schema and parses the 20 GB XML archive more than 10x faster than a BeautifulSoup baseline, while exposing hooks for demographic analytics and supporting integration of user-defined modules.

Clinical trial summarization. An encoder–decoder model is trained on 57k description–summary pairs to condense clinical trials into a few sentences. ROUGE evaluation shows a 20% improvement over the baseline, while graph-based evaluation indicates the model preserves 71% of critical biomedical entities, yielding concise yet informative summaries suitable for evidence scans.

Mechanism-of-action (MoA) classification. A collection of models, including traditional classifiers (decision trees, random forests, XGBoost) and contrastively fine-tuned masked-language-model variants, achieves a macro F1 of 97%, effectively handling class imbalance and drug-class sparsity while also providing interpretable insights.

Scalable medical NER pipeline. A dynamic and scalable pipeline is introduced for training lightweight Named Entity Recognition (NER) models adaptable to different entity types. Knowledge distillation compresses the large teacher model into a 110M-parameter student that retains 70% of gold-label accuracy (F1 = 0.61) while running 1000x faster and consuming just 6% of the memory.

Collectively, these contributions provide scalable tools and empirical evidence that domain-specific NLP methods can be integrated to accelerate trial discovery, enhance drug-development analytics, and support data-driven clinical decision-making.

Keywords

clinical trials; information extraction; Tri-AL

Disciplines

Artificial Intelligence and Robotics | Computer Engineering

File Format

pdf

Degree Grantor

University of Nevada, Las Vegas

Language

English

Rights

IN COPYRIGHT. For more information about this rights statement, please visit http://rightsstatements.org/vocab/InC/1.0/

