Abstract
RDF data is a labeled directed graph. SPARQL is an RDF query language used to extract information from an RDF graph. Several RDF engines exist, such as Sesame, RDF-3X, OWLIM, and Jena. Jena is the most popular of these and is widely used. However, the Jena in-memory model does not scale to large RDF datasets, while Jena SDB and Jena TDB have high latencies. In this thesis we propose a new system, RGIS (RDF Graph Split and Index), for processing SPARQL queries on RDF data. RGIS is not only scalable but also faster than Jena and OWLIM-SE (BigOWLIM). RGIS uses a custom data format and a novel indexing technique to store the RDF data. Our custom format splits the RDF data into different files based on the classes and object properties present in the data. Each of these files is then assigned an index, and each instance within these files is assigned a unique index value. We have also developed an RDF structure-aware query planner that uses the topology of the RDF graph to intelligently schedule query operations. When compared with Jena TDB, OWLIM, and Mulgara on LUBM datasets, RGIS not only had faster response times but also had lower memory overhead.
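The abstract only summarizes the storage format, so the following Python sketch illustrates the general idea of splitting triples by class and object property and indexing them. It is a hypothetical illustration, not the actual RGIS file format: the names `Triple` and `split_and_index` are invented here, "files" are modeled as in-memory lists, and triples with literal objects (datatype properties) are simply skipped.

```python
from collections import defaultdict
from typing import NamedTuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


class Triple(NamedTuple):
    s: str
    p: str
    o: str
    literal: bool = False  # True if the object is a literal (datatype property)


def split_and_index(triples):
    """Hypothetical sketch: group triples into per-class and per-object-property
    "files", assign each file an index, and give each instance a unique index."""
    file_index = {}      # class or property URI -> file index
    instance_index = {}  # instance URI -> unique integer index
    class_files = defaultdict(list)     # class URI -> instance indexes
    property_files = defaultdict(list)  # property URI -> (subject, object) index pairs

    def file_id(name):
        # setdefault keeps an existing id, or assigns the next free one
        return file_index.setdefault(name, len(file_index))

    def inst_id(uri):
        return instance_index.setdefault(uri, len(instance_index))

    for t in triples:
        if t.p == RDF_TYPE:
            # rdf:type triple: the subject is an instance of class t.o
            file_id(t.o)
            class_files[t.o].append(inst_id(t.s))
        elif not t.literal:
            # Object property: store the edge as a pair of instance indexes
            file_id(t.p)
            property_files[t.p].append((inst_id(t.s), inst_id(t.o)))
        # Literal-valued (datatype property) triples are skipped in this sketch.

    return file_index, instance_index, dict(class_files), dict(property_files)


if __name__ == "__main__":
    example = [
        Triple("ex:alice", RDF_TYPE, "ub:GraduateStudent"),
        Triple("ex:bob", RDF_TYPE, "ub:Professor"),
        Triple("ex:alice", "ub:advisor", "ex:bob"),
        Triple("ex:alice", "ub:name", "Alice", literal=True),
    ]
    for part in split_and_index(example):
        print(part)
```

For the small LUBM-style example, `ub:GraduateStudent`, `ub:Professor`, and `ub:advisor` receive file indexes 0, 1, and 2, `ex:alice` and `ex:bob` receive instance indexes 0 and 1, and the `ub:advisor` file holds the single pair `(0, 1)`.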