Abstract
An immense number of articles containing important information are published every day. We have developed a generalized text mining system that automatically extracts relationships between concepts from free text and presents them in a user-specified format. As training input, the system requires example sentences in which the user has annotated the entities of interest. The system uses the SPARQL query language as an interface for identifying grammatical patterns in a sentence, which helps in extracting relationships. A curatorial system can be used to verify the extracted relationships. To improve performance, an additional module was developed that generates SPARQL query patterns from expert feedback collected through the curatorial system and adds them to the set of extraction patterns. Similar patterns are combined to reduce the overall number of distinct patterns and speed up the extraction process. Additionally, the module improves system accuracy over time.
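To make the idea of matching grammatical patterns with graph queries more concrete, the following is a minimal sketch and not the system's actual implementation. It assumes a sentence's dependency parse is stored as subject-predicate-object triples and uses a small pure-Python matcher as a stand-in for a SPARQL basic graph pattern. The example sentence, the relation names (`nsubj`, `dobj`, `annotated_as`), the entity types, and all function names are hypothetical and chosen only for illustration.

```python
# Hypothetical dependency parse of "Aspirin inhibits COX-1", stored as
# (subject, predicate, object) triples in the spirit of an RDF graph.
TRIPLES = {
    ("inhibits", "nsubj", "Aspirin"),
    ("inhibits", "dobj", "COX-1"),
    ("Aspirin", "annotated_as", "Drug"),
    ("COX-1", "annotated_as", "Protein"),
}

# Illustrative extraction pattern. Terms starting with "?" are variables,
# as in a SPARQL query of the form:
#   SELECT ?drug ?verb ?protein WHERE { ?verb :nsubj ?drug . ... }
PATTERN = [
    ("?verb", "nsubj", "?drug"),
    ("?verb", "dobj", "?protein"),
    ("?drug", "annotated_as", "Drug"),
    ("?protein", "annotated_as", "Protein"),
]


def unify(pattern_triple, data_triple, bindings):
    """Extend bindings so pattern_triple equals data_triple, or return None."""
    new = dict(bindings)
    for p, d in zip(pattern_triple, data_triple):
        if p.startswith("?"):
            # Bind the variable, or check it against its existing value.
            if new.setdefault(p, d) != d:
                return None
        elif p != d:
            return None
    return new


def match(pattern, triples, bindings=None):
    """Yield every variable binding that satisfies all pattern triples."""
    bindings = {} if bindings is None else bindings
    if not pattern:
        yield bindings
        return
    head, rest = pattern[0], pattern[1:]
    for triple in triples:
        extended = unify(head, triple, bindings)
        if extended is not None:
            yield from match(rest, triples, extended)


def extract_relations(pattern, triples):
    """Return sorted (drug, verb, protein) tuples, one per match."""
    return sorted(
        (b["?drug"], b["?verb"], b["?protein"]) for b in match(pattern, triples)
    )


print(extract_relations(PATTERN, TRIPLES))
```

In a setup like this, each curated relationship could in principle be turned into a new pattern by replacing the annotated entities with variables, which is one plausible reading of how feedback-driven pattern generation might work.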