Abstract
Despite substantial declines in cardiovascular disease (CVD) mortality across counties in the United States from 2009 to 2018, notable racial/ethnic, socioeconomic, and regional disparities persist. These disparities are closely linked to social determinants of health (SDOH), so addressing SDOH domains through targeted strategies is vital for reducing disparities and improving CVD outcomes. Longitudinal SDOH data pose two challenges: observations from the same subject are correlated, and response patterns may vary over time. It is therefore crucial to use statistical models that account for within-subject correlation and time-dependent covariate effects. Models providing population-averaged effects or individual-specific estimates have been developed to address these challenges. Missing data often arise in longitudinal studies and are generally assumed to be missing at random, conditional on relevant observed information. Modern longitudinal studies are also often high-dimensional. Variable selection and regularization methods address this setting by shrinking coefficients to prevent overfitting and by selecting variables within groups. The Exclusive Lasso handles grouped variables, ensuring that at least one predictor from each predefined group is selected. Given the high-dimensional and longitudinal nature of county-level SDOH data, advanced clustering methods are necessary to reveal variation in the longitudinal relationship between SDOH domains and CVD mortality. Different subpopulations can behave differently over time, which calls for clustering techniques that identify more homogeneous groups. In this dissertation, I develop a novel approach that integrates the Exclusive Lasso into penalized weighted generalized estimating equations to facilitate domain-specific variable selection under the missing-at-random assumption.
Furthermore, I propose a model-based clustering extension for high-dimensional longitudinal data that uses the Exclusive Lasso to identify subpopulations of counties influenced by distinct covariates within each domain. Finally, to enhance this approach, I employ the model-based clustering method with the Exclusive Lasso to refine the understanding of county-level variation within each state. Integrating an additional algorithm that accounts for this variation allows counties to be categorized by their unique characteristics.
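
As a point of reference for the Exclusive Lasso penalty named above, the following minimal sketch computes its standard form: the sum over groups of the squared L1 norm of each group's coefficients. It is an illustration only, not the estimation procedure developed in the dissertation; the function name, the coefficient values, and the group labels ("economic", "education") are hypothetical.

```python
from typing import Dict, List, Sequence


def exclusive_lasso_penalty(beta: Sequence[float],
                            groups: Dict[str, List[int]]) -> float:
    """Sum over groups of (L1 norm of that group's coefficients) squared."""
    total = 0.0
    for indices in groups.values():
        group_l1 = sum(abs(beta[j]) for j in indices)
        total += group_l1 ** 2
    return total


# Hypothetical coefficients for two illustrative SDOH domains.
beta = [0.5, 0.0, 0.0, -1.0, 2.0]
groups = {"economic": [0, 1, 2], "education": [3, 4]}
penalty = exclusive_lasso_penalty(beta, groups)  # 0.5**2 + 3.0**2 = 9.25

# Two unit coefficients cost less when they sit in different groups
# than when they share one group.
two_groups = {"a": [0, 1], "b": [2, 3]}
spread_across = exclusive_lasso_penalty([1.0, 0.0, 1.0, 0.0], two_groups)  # 2.0
packed_within = exclusive_lasso_penalty([1.0, 1.0, 0.0, 0.0], two_groups)  # 4.0
```

Because the squared group norm grows faster when several nonzero coefficients share a group, the penalty discourages multiple selections within a domain while still leaving each domain represented.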