Abstract
The Resource Description Framework (RDF) has been used extensively in recent years to represent data for the Semantic Web. Because RDF datasets are large, it is difficult to store one on a single system and query it with SPARQL. Instead, the data can be partitioned into subsets and queried with federated SPARQL queries. Distributed querying poses many challenges: for instance, the processing time of a query increases in proportion to the number of distributed joins. We present a study of the impact of query-adaptive partitioning of RDF data. We introduce a system called RePart that shuffles data among the nodes of a cluster according to the incoming query workload in order to reduce the number of distributed joins during querying. Our evaluation on several benchmarks demonstrates that the performance of federated queries improves after the triples are repartitioned according to the query workload.
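To make the idea of workload-driven repartitioning concrete, the following is a minimal sketch and not RePart's actual algorithm. It rests on several simplifying assumptions that are not taken from the paper: all triples sharing a predicate live on one node, a query is a chain of predicates in which consecutive predicates are joined, and repartitioning is a greedy exchange of two predicates between nodes. The names `distributed_joins`, `workload_cost` and `repartition` are hypothetical.

```python
from itertools import combinations

# Assumption: placement maps each predicate to the node holding its triples.
Placement = dict[str, int]
# Assumption: a query is a chain of predicates; neighbours in the chain are joined.
Workload = list[list[str]]


def distributed_joins(query: list[str], placement: Placement) -> int:
    """Count the joins in one query whose two sides sit on different nodes."""
    return sum(placement[a] != placement[b] for a, b in zip(query, query[1:]))


def workload_cost(workload: Workload, placement: Placement) -> int:
    """Total number of distributed joins over the whole workload."""
    return sum(distributed_joins(q, placement) for q in workload)


def repartition(workload: Workload, placement: Placement) -> Placement:
    """Greedy sketch: swap two predicates between nodes whenever that lowers cost.

    Swapping (rather than moving) keeps the number of predicates per node
    unchanged, so the data cannot collapse onto a single node.
    """
    placement = dict(placement)  # work on a copy
    improved = True
    while improved:
        improved = False
        for a, b in combinations(sorted(placement), 2):
            if placement[a] == placement[b]:
                continue
            before = workload_cost(workload, placement)
            placement[a], placement[b] = placement[b], placement[a]
            if workload_cost(workload, placement) < before:
                improved = True
            else:
                # No gain: undo the swap.
                placement[a], placement[b] = placement[b], placement[a]
    return placement


# Toy workload: the author/title join is asked twice, cites/year once.
workload = [["author", "title"], ["author", "title"], ["cites", "year"]]
initial = {"author": 0, "title": 1, "cites": 0, "year": 1}
adapted = repartition(workload, initial)

print(workload_cost(workload, initial))  # 3 distributed joins
print(workload_cost(workload, adapted))  # 0 distributed joins
```

In this toy setting every join in the workload becomes local after the swaps, while each node still holds two predicates. A real system would also have to account for data volume per node, the cost of moving triples, and workloads that change over time.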