Abstract
A crucial concept in linguistics is dependency: linguistic units are connected to and depend upon one another. This dissertation is concerned with incremental language processing, in minds and machines, viewed through the lens of dependency. It develops three ideas spanning natural language processing, computational linguistics, and psycholinguistics. First, I contribute a novel dependency parsing algorithm for cognitive modeling. It assigns labeled dependencies to natural language in a manner that incorporates cognitively plausible biases: it operates incrementally, with only marginal lookahead; it is generative, modeling both explicit syntactic structure and word probability; and it is left-corner, making predictions about grammatical relations for unseen words and operating with working memory demands that mimic those of the human sentence processor. Second, predicting processing difficulty effects for syntactically complex garden path constructions remains an unsolved challenge in computational psycholinguistics; thus far, surprisal derived from large language models comes up short. Hypothesizing that what is needed is a more precise estimate of the syntactic update work performed to accommodate the next word, I contribute a novel information-theoretic complexity metric: the Kullback-Leibler divergence between distributions over partial syntactic analyses at a word before and after the generation of that word. I demonstrate that this metric, calculated using the dependency parser developed here, outperforms surprisal in predicting self-paced reading times for direct object/sentential complement and transitive/intransitive garden path sentences.
Third, while many accounts of sentence comprehension assume word-by-word processing, and while traditional experimental methods for investigating written language processing implement serial presentation, recent research has investigated sentence processing using the rapid parallel visual presentation (RPVP) paradigm, in which short sentences are displayed all at once, but only briefly. I test the hypothesis that, despite parallel presentation, sentences in this paradigm are processed in a serial, left-to-right manner. Surprisal derived from the dependency parser developed here is correlated with electroencephalogram data recorded as participants engage in an RPVP task. Evidence is found for syntactically informed serial, left-to-right processing, a result that diverges dramatically from, but is not necessarily inconsistent with, previous results supporting a conclusion of parallel processing.