Abstract

Traffic crashes remain a persistent global public health challenge. Traditional statistical and econometric approaches to crash analysis are fundamentally limited by rigid functional forms and assumptions that fail to capture complex, nonlinear, and high-order interactions in crash data. Machine learning methods, particularly tree ensembles, often achieve better inference accuracy, but they remain constrained by their reliance on structured data, which lacks contextual information and therefore yields an incomplete understanding of underlying mechanisms in safety-critical contexts. This research presents a systematic, three-stage progression of Large Language Model (LLM)-oriented frameworks for traffic crash modeling and analysis. First, the Tab-Text framework is introduced, demonstrating the viability of a multi-modal approach that integrates structured tabular data with algorithmically generated textual descriptions. Using ELECTRA, a transformer-based model, this approach improves inference performance over traditional baselines while maintaining consistent identification of influential factors. Second, the study systematically benchmarks state-of-the-art foundation LLMs, including GPT-3.5-turbo, LLaMA3-8B, and LLaMA3-70B, using prompting techniques such as zero-shot, few-shot, and Chain-of-Thought reasoning. This benchmarking establishes baseline capabilities and identifies best practices for eliciting domain-specific insights from general-purpose LLMs. Third, the research introduces CrashSage, a domain-specialized, end-to-end LLM framework for traffic crash modeling and analysis. CrashSage features a robust tabular-to-text generation pipeline that includes relational data integration, context-aware data augmentation, and supervised fine-tuning (SFT) of an LLM (LLaMA3-8B) with specialized domain knowledge.
Empirical evaluation on real-world crash datasets shows that CrashSage achieves state-of-the-art inference accuracy, outperforming both conventional models and larger, general-purpose LLMs. Importantly, this work advances beyond prediction by incorporating a novel, gradient-based explainable AI (XAI) method. It delivers transparent, word-level attributions of model predictions and enables a co-occurrence analysis of high-risk factors. By elucidating complex interdependencies between human, vehicle, environmental, and infrastructure elements, this framework turns the LLM from a "black box" into an interpretable engine for safety discovery. Ultimately, this dissertation provides a methodological blueprint for applying LLMs in safety-critical domains. It paves the way for transportation agencies to shift from retrospective analysis toward proactive, data-driven interventions. Future directions involve bridging the gap between transportation and public health data systems, integrating multi-modal visual and sensor data, advancing from correlational analysis to robust causal inference, and developing continual learning capabilities to adapt to evolving environments.
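
The tabular-to-text step mentioned in the abstract can be illustrated with a minimal sketch. This is not the dissertation's pipeline: the field names (`weather`, `light`, `speed_limit`, `driver_age`), the sentence templates, and the function `row_to_text` are all hypothetical, chosen only to show how one structured crash record might be rendered as a short description that a language model can consume.

```python
from typing import Mapping

# Hypothetical field names and phrasing templates; the actual schema and
# templates used in the dissertation are not given in the abstract.
TEMPLATES = {
    "weather": "The weather was {}.",
    "light": "The lighting condition was {}.",
    "speed_limit": "The posted speed limit was {} mph.",
    "driver_age": "The driver was {} years old.",
}


def row_to_text(row: Mapping[str, object]) -> str:
    """Render one tabular crash record as a short textual description."""
    sentences = []
    for field, template in TEMPLATES.items():
        value = row.get(field)
        # Skip missing values rather than inventing text for them.
        if value is None or value == "":
            continue
        sentences.append(template.format(value))
    return " ".join(sentences)


example = {"weather": "rain", "light": "dark", "speed_limit": 45, "driver_age": None}
print(row_to_text(example))
# prints: The weather was rain. The lighting condition was dark. The posted speed limit was 45 mph.
```

Skipping missing fields, as done here, is one possible design choice; a real pipeline might instead state explicitly that a value is unknown, since missingness can itself be informative in crash records.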
