Abstract
Unifying information across organizational data silos that lack documentation, structure, and automated semantic discovery has attracted intense interest in recent years. Although many data analysis tools capture statistical, quality, and provenance information within an enterprise, these vendor tools work in isolation. Deriving meaningful and consistent information from a landscape of different tools requires a holistic view over all extracted metadata. The knowledge graph is a common tool for data integration and knowledge discovery, and it has become a backbone for APIs that demand access to structured knowledge. Constructing a knowledge graph involves many steps, from extracting database metadata to storing the relationships in a unified ontology. In this research, we present our Universal Metadata Repository (UMR), applied to three in-flight use cases that combine the power of technical and business views using knowledge graphs for searching, traceability, enforcing accessibility standards, and providing a consistent organizational architecture.
Knowledge graphs provide a harmonized data management platform. One of their applications is greater automation in querying RDBMSes and SQL-based data virtualization search engines. We propose a query rewriting method to find nontrivial inter-database associations of relational data. Our approach enriches an input SPARQL query that lacks statements about nontrivial database associations and translates it into a completed SQL query, using a knowledge graph that captures the relational databases' metadata.
Large organizations maintain heterogeneous databases, such as relational and NoSQL databases. Concept extraction from the growing volume of this isolated, semi-structured data has applications in augmenting knowledge graphs and boosting knowledge discovery. However, the major challenge in extracting topics from semi-structured data is the lack of a corpus or description.
In this research, we investigate the impact of self-supplementation of words and documents on probabilistic topic modeling of semi-structured data. Another contribution of this research is identifying the tuning of probabilistic topic modeling that best fits semi-structured data. The extracted topics are potential summaries and concepts of the dataset. Moreover, to improve the precision of topic extraction, we propose a selection heuristic for effectively identifying topics hidden in various data sources.