

Abstract

Large-scale knowledge graphs with billions of nodes and edges are increasingly common in many application areas. Their size often exceeds the limits of systems that store a graph in a centralized data store, especially in main memory. To overcome some of these size-related problems, a knowledge graph can be partitioned into multiple subgraphs and distributed as shards among many computing nodes for further processing. Querying these shards increases communication cost because of distributed joins, but reducing edge cuts and continually re-partitioning based on the given or changing query workload maintains good average processing time. Graph embedding converts a knowledge graph into a vector space in which the structure and inherent properties of the graph may be preserved. Many embedding algorithms are based on random walks, and executing such algorithms against partitioned graphs poses new challenges, such as preserving the network neighborhood of nodes.

This thesis introduces a novel method of knowledge graph partitioning that considers a set of queries (a workload), as well as an adaptive partitioning method for large knowledge graphs that adapts the partitioning in response to changes in the query workload. The resulting partitioning aims to reduce the number of distributed joins needed to answer the queries in the workload and thus improves workload performance. Critical features identified in the query workload and the knowledge graph are used to cluster the queries and then partition the graph based on the clusters. Workload queries are rewritten to account for the graph partitioning. The thesis also introduces a novel method for embedding partitioned knowledge graphs. After partitioning, the method performs (partial) random walks in each partition in parallel; complete walks are then assembled before the embedding is created.
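The idea of assembling complete walks from per-partition partial walks can be illustrated with a minimal sketch. This is not the thesis's implementation: it assumes a naive hash partitioning by node id and single-threaded hand-offs between shards, and all names (`partition_graph`, `partial_walk`, `full_walk`) are illustrative.

```python
# Illustrative sketch (not the thesis's code): each shard walks locally until
# the walk ends or crosses a partition boundary; boundary crossings hand the
# walk off to the owning shard, and fragments are stitched into one walk.
import random
from collections import defaultdict

def partition_graph(edges, num_parts):
    """Naive hash partitioning: each edge is stored in its source's shard."""
    owner = lambda n: hash(n) % num_parts
    adj = [defaultdict(list) for _ in range(num_parts)]
    for u, v in edges:
        adj[owner(u)][u].append(v)
    return adj, owner

def partial_walk(adj, owner, part_id, start, remaining, rng):
    """Walk inside one partition until steps run out or the walk leaves it."""
    walk, node = [start], start
    while remaining > 0:
        neighbors = adj[part_id].get(node)
        if not neighbors:          # dead end in this shard
            break
        node = rng.choice(neighbors)
        walk.append(node)
        remaining -= 1
        if owner(node) != part_id:  # crossed a boundary: hand off
            break
    return walk, remaining

def full_walk(adj, owner, start, length, rng):
    """Assemble a complete walk of `length` steps from partial walks."""
    walk, remaining = partial_walk(adj, owner, owner(start), start, length, rng)
    while remaining > 0:
        node = walk[-1]
        fragment, remaining = partial_walk(adj, owner, owner(node), node,
                                           remaining, rng)
        if len(fragment) == 1:      # no outgoing edges anywhere: stop
            break
        walk.extend(fragment[1:])   # drop the duplicated boundary node
    return walk
```

In a distributed setting the hand-off would be a message to the shard owning the boundary node, and many partial walks would run in parallel within each partition; the assembled walks then feed a standard skip-gram-style embedding model.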
Our evaluation results demonstrate improvements in query workload processing time, as well as further improvements after dynamically adapting the partitioning of the knowledge graph. A further evaluation shows that the performance of knowledge graph embedding improves after partitioning the graph, while the quality of the generated embedding remains similar to that produced on the complete, unpartitioned graph.
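The workload-aware partitioning described in the abstract can also be sketched under simplifying assumptions: represent each query by the predicates it touches, cluster overlapping queries greedily, and co-locate the triples matching each cluster's predicates so those queries need fewer distributed joins. The feature choice, the greedy clustering, and all names here are illustrative assumptions, not the thesis's actual algorithm.

```python
# Illustrative sketch (not the thesis's code) of workload-driven partitioning:
# cluster queries by shared predicates, then place triples by predicate so a
# cluster's queries are answerable mostly within one partition.
from collections import defaultdict

def query_features(queries):
    """Represent each query by the set of predicates it uses."""
    return [frozenset(q["predicates"]) for q in queries]

def cluster_queries(features):
    """Greedy clustering: a query joins the first cluster it shares a
    predicate with; otherwise it starts a new cluster."""
    clusters = []
    for f in features:
        target = next((c for c in clusters if c & f), None)
        if target is None:
            clusters.append(set(f))
        else:
            target |= f
    return clusters

def partition_triples(triples, clusters):
    """Assign each (s, p, o) triple to the partition whose cluster owns p;
    predicates no query mentions go to a fallback partition."""
    parts = defaultdict(list)
    fallback = len(clusters)
    for s, p, o in triples:
        pid = next((i for i, c in enumerate(clusters) if p in c), fallback)
        parts[pid].append((s, p, o))
    return dict(parts)
```

A real implementation would weight predicates by query frequency, bound partition sizes, and re-run the clustering as the workload drifts, which is where the adaptive re-partitioning described above comes in.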
