Abstract

Real-world data often come in forms to which conventional statistical theories and methods cannot be directly applied; symbolic data are one such form. The observations in a symbolic data set are described by categorical variables, intervals, histograms, distributions and so forth, rather than by single values, so symbolic data require novel methods of analysis. In this dissertation, we develop divisive hierarchical clustering methodologies for interval-valued data, the most commonly used type of symbolic data. We first propose three monothetic divisive clustering algorithms and a weighted symbolic principal component analysis (PCA) method, all for interval-valued data. The first algorithm is based on the symbolic covariance PCA method proposed by Le-Rademacher (2008) and Le-Rademacher and Billard (2012). The second is based on our proposed weighted symbolic PCA method. The third is based on the endpoints of the intervals. We then propose two mixed-strategy algorithms that combine these three algorithms. A series of simulations compares these algorithms with an existing monothetic divisive algorithm proposed by Chavent (1998, 2000). The two mixed-strategy algorithms outperform all the other monothetic algorithms in these simulations, and they are also applied to real-world data to further validate their effectiveness. Furthermore, we propose a polythetic divisive clustering algorithm for interval-valued data based on minimum spanning trees. The effectiveness of this algorithm is also verified through several simulations and real-data applications.
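
To make the notion of a monothetic divisive step on interval-valued data concrete, the following is a minimal Python sketch. It is not the dissertation's algorithm, nor Chavent's. It assumes that each observation is a list of (lower, upper) intervals, that a split is a single binary cut on the midpoints of one variable, and that the split criterion is the within-cluster sum of squares computed on the interval endpoints. The names `endpoint_vector`, `within_ss` and `best_monothetic_split` are hypothetical and chosen only for this illustration.

```python
from typing import List, Tuple

Interval = Tuple[float, float]   # (lower, upper); representation assumed for this sketch
Observation = List[Interval]     # one interval per variable


def endpoint_vector(obs: Observation) -> List[float]:
    """Flatten an observation into [l1, u1, l2, u2, ...]."""
    return [x for interval in obs for x in interval]


def within_ss(cluster: List[Observation]) -> float:
    """Sum of squared distances from each endpoint vector to the cluster mean."""
    if not cluster:
        return 0.0
    vectors = [endpoint_vector(o) for o in cluster]
    n = len(vectors)
    mean = [sum(col) / n for col in zip(*vectors)]
    return sum((x - m) ** 2 for v in vectors for x, m in zip(v, mean))


def best_monothetic_split(data: List[Observation]):
    """Try every cut on every single variable; keep the one with the lowest total SS.

    "Monothetic" here means each candidate split uses exactly one variable.
    The cut rule (halfway between consecutive distinct midpoints) is an assumption.
    """
    best = None
    for j in range(len(data[0])):
        mids = sorted({(o[j][0] + o[j][1]) / 2 for o in data})
        for a, b in zip(mids, mids[1:]):
            cut = (a + b) / 2
            left = [o for o in data if (o[j][0] + o[j][1]) / 2 <= cut]
            right = [o for o in data if (o[j][0] + o[j][1]) / 2 > cut]
            score = within_ss(left) + within_ss(right)
            if best is None or score < best[0]:
                best = (score, j, cut, left, right)
    if best is None:
        raise ValueError("need at least two distinct midpoints to split")
    _, j, cut, left, right = best
    return j, cut, left, right


# Toy data: variable 0 separates two groups, variable 1 is nearly constant.
data = [
    [(1, 2), (5, 6)],
    [(2, 3), (5, 7)],
    [(10, 11), (5, 6)],
    [(11, 12), (6, 7)],
]
variable, cut, left, right = best_monothetic_split(data)
print(variable, cut, len(left), len(right))
```

Applying such a split recursively to the resulting clusters would yield a divisive hierarchy. A polythetic method, by contrast, would use all variables at once when dividing a cluster.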
