

Abstract

Large-scale knowledge graphs with billions of nodes and edges are increasingly common in many application areas. Their size often exceeds the limits of systems that store a graph in a centralized data store, especially in main memory. To overcome some of these size-related problems, a knowledge graph can be partitioned into multiple subgraphs and distributed as shards among many computing nodes for further processing. Querying these shards increases communication cost because of distributed joins, but reducing edge cuts and continually re-partitioning based on the given or changing query workload maintains good average processing time. Graph embedding converts a knowledge graph into a vector space in which the structure and inherent properties of the graph may be preserved. Many embedding algorithms are based on random walks, and executing such algorithms against partitioned graphs poses new challenges, such as preserving the network neighborhood of nodes.

This thesis introduces a novel method of knowledge graph partitioning that considers a set of queries (a workload), as well as an adaptive partitioning method for large knowledge graphs that adapts the partitioning in response to changes in the query workload. The resulting partitioning aims to reduce the number of distributed joins needed to answer the queries in the workload and thus improves workload performance. Critical features identified in the query workload and the knowledge graph are used to cluster the queries and then partition the graph based on the clusters. Workload queries are rewritten to account for the graph partitioning. The thesis also introduces a novel method for embedding partitioned knowledge graphs. After partitioning, the method performs (partial) random walks in each partition in parallel; complete walks are then assembled before the embedding is created.
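The idea of assembling complete walks from per-partition partial walks can be illustrated with a minimal sketch. This is not the thesis's implementation: it assumes a naive hash partitioning by node id and single-threaded hand-offs between shards, and all names (`partition_graph`, `partial_walk`, `full_walk`) are illustrative.

```python
# Illustrative sketch (not the thesis's code): each shard walks locally until
# the walk ends or crosses a partition boundary; boundary crossings hand the
# walk off to the owning shard, and fragments are stitched into one walk.
import random
from collections import defaultdict

def partition_graph(edges, num_parts):
    """Naive hash partitioning: each edge is stored in its source's shard."""
    owner = lambda n: hash(n) % num_parts
    adj = [defaultdict(list) for _ in range(num_parts)]
    for u, v in edges:
        adj[owner(u)][u].append(v)
    return adj, owner

def partial_walk(adj, owner, part_id, start, remaining, rng):
    """Walk inside one partition until steps run out or the walk leaves it."""
    walk, node = [start], start
    while remaining > 0:
        neighbors = adj[part_id].get(node)
        if not neighbors:          # dead end in this shard
            break
        node = rng.choice(neighbors)
        walk.append(node)
        remaining -= 1
        if owner(node) != part_id:  # crossed a boundary: hand off
            break
    return walk, remaining

def full_walk(adj, owner, start, length, rng):
    """Assemble a complete walk of `length` steps from partial walks."""
    walk, remaining = partial_walk(adj, owner, owner(start), start, length, rng)
    while remaining > 0:
        node = walk[-1]
        fragment, remaining = partial_walk(adj, owner, owner(node), node,
                                           remaining, rng)
        if len(fragment) == 1:      # no outgoing edges anywhere: stop
            break
        walk.extend(fragment[1:])   # drop the duplicated boundary node
    return walk
```

In a distributed setting the hand-off would be a message to the shard owning the boundary node, and many partial walks would run in parallel within each partition; the assembled walks then feed a standard skip-gram-style embedding model.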
Our evaluation results demonstrate improvements in query workload processing time, as well as further improvements after dynamically adapting the partitioning of the knowledge graph. A further evaluation shows that the performance of knowledge graph embedding improves after partitioning the graph, while the quality of the generated embedding remains similar to that produced on the complete, unpartitioned graph.
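The workload-aware partitioning described in the abstract can also be sketched under simplifying assumptions: represent each query by the predicates it touches, cluster overlapping queries greedily, and co-locate the triples matching each cluster's predicates so those queries need fewer distributed joins. The feature choice, the greedy clustering, and all names here are illustrative assumptions, not the thesis's actual algorithm.

```python
# Illustrative sketch (not the thesis's code) of workload-driven partitioning:
# cluster queries by shared predicates, then place triples by predicate so a
# cluster's queries are answerable mostly within one partition.
from collections import defaultdict

def query_features(queries):
    """Represent each query by the set of predicates it uses."""
    return [frozenset(q["predicates"]) for q in queries]

def cluster_queries(features):
    """Greedy clustering: a query joins the first cluster it shares a
    predicate with; otherwise it starts a new cluster."""
    clusters = []
    for f in features:
        target = next((c for c in clusters if c & f), None)
        if target is None:
            clusters.append(set(f))
        else:
            target |= f
    return clusters

def partition_triples(triples, clusters):
    """Assign each (s, p, o) triple to the partition whose cluster owns p;
    predicates no query mentions go to a fallback partition."""
    parts = defaultdict(list)
    fallback = len(clusters)
    for s, p, o in triples:
        pid = next((i for i, c in enumerate(clusters) if p in c), fallback)
        parts[pid].append((s, p, o))
    return dict(parts)
```

A real implementation would weight predicates by query frequency, bound partition sizes, and re-run the clustering as the workload drifts, which is where the adaptive re-partitioning described above comes in.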
