Abstract
Symbolic data analysis, first introduced by Diday (1987), offers an alternative to classical data analysis when the data have a more complex structure and take forms such as lists, intervals, or histograms. There is an increasing need to develop and improve techniques for handling and making inferences about symbolic data while keeping the results efficient and interpretable. Focusing solely on interval-valued data, where each observation is represented by a lower and an upper bound, an approach for partial least squares path modeling is proposed. Partial least squares path modeling (PLS-PM), also known as a variance-based method for structural equation modeling (SEM), aims to quantify and estimate the directional relationships between latent and manifest variables. PLS-PM uses a two-step iterative process to estimate the latent variables, followed by successive linear regressions to estimate the parameters of the structural and measurement models.
A partial least squares path modeling method for interval-valued variables is proposed, using two of the currently available regression methods for interval-valued variables: the center method and the symbolic covariance method. The PLS-PM for interval-valued data is illustrated with an example using data from the past ten years of National Football League (NFL) games. Simulation studies are then conducted to understand how the parameter estimates in both the structural and measurement models behave when different aspects vary. Among other findings, the simulations show that, in general, the symbolic covariance method produces more variable estimates than the center method and is also more affected by collinearity in the structural model. Furthermore, for both regression methods, wider interval-valued variables tend to produce estimates with larger absolute relative biases. The PLS-PM process and the algorithm for randomly generating interval-valued data with the SEM structure are described in detail in the dissertation.
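As a rough illustration of the center method mentioned above, the sketch below fits an ordinary least squares regression on interval midpoints, which is how that method is usually described in the interval-valued regression literature. The function name, argument names, and toy data are hypothetical and are not taken from the dissertation; the symbolic covariance method and the PLS-PM iterations are not shown.

```python
import numpy as np


def center_method_fit(x_lower, x_upper, y_lower, y_upper):
    """Fit OLS on interval midpoints (hypothetical helper, not the author's code).

    Returns the coefficient vector [intercept, slope_1, ..., slope_p].
    """
    # Reduce each interval to its midpoint (center).
    x_mid = (np.asarray(x_lower, dtype=float) + np.asarray(x_upper, dtype=float)) / 2.0
    y_mid = (np.asarray(y_lower, dtype=float) + np.asarray(y_upper, dtype=float)) / 2.0

    # Accept a single predictor passed as a 1-D array.
    if x_mid.ndim == 1:
        x_mid = x_mid[:, None]

    # Design matrix with an intercept column, then classical least squares.
    design = np.column_stack([np.ones(len(x_mid)), x_mid])
    coef, *_ = np.linalg.lstsq(design, y_mid, rcond=None)
    return coef


# Toy interval data (assumed for illustration): midpoints satisfy y = 1 + 2x.
x_lo = np.array([0.0, 1.0, 2.0, 3.0])
x_hi = np.array([2.0, 3.0, 4.0, 5.0])
y_lo = np.array([2.0, 4.0, 6.0, 8.0])
y_hi = np.array([4.0, 6.0, 8.0, 10.0])

coef = center_method_fit(x_lo, x_hi, y_lo, y_hi)
print(coef)
```

Because only the midpoints enter the fit, the interval widths play no role in the estimated coefficients in this sketch.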