Inductive Dependency Parsing of Natural Language Text
Abstract: This thesis investigates new methods for syntactic parsing of unrestricted natural language text under requirements of robustness and disambiguation. A parsing system is required to assign to every sentence in a text at least one analysis (robustness) and at most one analysis (disambiguation). The single analysis should be correct as often as possible (accuracy), and the computation should consume as little time and memory as possible (efficiency).

The parsing methods proposed are formalized in a general framework of inductive dependency parsing, where dependency parsing is defined as the derivation of labeled dependency graphs satisfying such constraints as single-headedness, acyclicity, connectedness and projectivity, and where inductive machine learning is used to guide the parser at nondeterministic choice points.

The main contribution is a new algorithm for deterministic parsing that derives labeled projective dependency graphs in a single left-to-right pass over the input. The algorithm is proven optimal with respect to robustness, disambiguation and efficiency, meaning that it derives a single dependency graph for every input sentence in time that is linear in the length of the sentence.

The parsing algorithm is combined with inductive machine learning using a history-based model in which the next parser action is predicted from features of the current parser configuration. These features include the part of speech, word form and dependency type of relevant input tokens. It is shown how memory-based learning and classification can be used to solve the learning problem by inducing classifiers from treebank data that predict the next parser action.

The memory-based deterministic dependency parser is evaluated using treebank data from Swedish and English. The results show parsing accuracy close to the state of the art while maintaining robustness, disambiguation and good efficiency.
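To make the idea of a deterministic, single-pass parser guided by a classifier more concrete, the following is a minimal sketch and not the thesis's actual implementation. The abstract does not spell out the transition system, so the sketch assumes an arc-eager-style stack-and-buffer scheme. The names `parse`, `scripted` and `predict`, the artificial root token 0, and the dependency labels are all hypothetical; `predict` stands in for the memory-based classifier and is here replaced by a fixed script of actions.

```python
from typing import Callable, List, Optional, Tuple

# (transition name, dependency label or None); all names are illustrative.
Action = Tuple[str, Optional[str]]
Predictor = Callable[[List[int], int, List[Optional[int]]], Action]


def parse(words: List[str], predict: Predictor):
    """Derive one labeled dependency graph in a single left-to-right pass.

    Tokens are numbered 1..n; index 0 is an artificial root (an assumption
    of this sketch). Invalid predictions fall back to SHIFT, so every input
    receives exactly one analysis.
    """
    n = len(words)
    heads: List[Optional[int]] = [None] * (n + 1)
    labels: List[Optional[str]] = [None] * (n + 1)
    stack = [0]   # partially processed tokens, root at the bottom
    nxt = 1       # next input token
    steps = 0
    while nxt <= n:
        name, label = predict(stack, nxt, heads)
        top = stack[-1]
        if name == "LEFT-ARC" and top != 0 and heads[top] is None:
            # next token becomes head of the stack top; pop the dependent
            heads[top], labels[top] = nxt, label
            stack.pop()
        elif name == "RIGHT-ARC":
            # stack top becomes head of the next token; push the dependent
            heads[nxt], labels[nxt] = top, label
            stack.append(nxt)
            nxt += 1
        elif name == "REDUCE" and heads[top] is not None:
            # stack top already has a head and is no longer needed
            stack.pop()
        else:
            # SHIFT, also used as the fallback for invalid predictions
            stack.append(nxt)
            nxt += 1
        steps += 1
    # Attach any token left without a head to the artificial root.
    for i in range(1, n + 1):
        if heads[i] is None:
            heads[i], labels[i] = 0, "ROOT"
    return heads[1:], labels[1:], steps


def scripted(actions: List[Action]) -> Predictor:
    """Stand-in for a trained classifier: replays a fixed action sequence."""
    it = iter(actions)
    return lambda stack, nxt, heads: next(it)


words = ["Economic", "news", "had", "effect"]
script = [
    ("SHIFT", None), ("LEFT-ARC", "amod"),
    ("SHIFT", None), ("LEFT-ARC", "subj"),
    ("RIGHT-ARC", "root"), ("RIGHT-ARC", "obj"),
]
heads, labels, steps = parse(words, scripted(script))
```

Every step in this sketch either consumes an input token or pops the stack, and each token is pushed at most once, so at most 2n transitions are needed for a sentence of n tokens. This is one way to see how a linear time bound of the kind stated in the abstract can arise, provided each prediction takes constant time.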