  1. Inductive Dependency Parsing of Natural Language Text

    Authors: Joakim Nivre; Walter Daelemans; Växjö universitet
    Keywords: NATURAL SCIENCES; natural language parsing; dependency parsing; memory-based learning; treebank parsing; Systems engineering; Computer and Information Sciences Computer Science

    Abstract: This thesis investigates new methods for syntactic parsing of unrestricted natural language text under requirements of robustness and disambiguation. A parsing system is required to assign to every sentence in a text at least one analysis (robustness) and at most one analysis (disambiguation).

  2. MaltParser -- An Architecture for Inductive Labeled Dependency Parsing

    Authors: Johan Hall; Joakim Nivre; Welf Löwe; Martin Volk; Växjö universitet
    Keywords: NATURAL SCIENCES; Dependency Parsing; Support Vector Machines; Machine Learning; Language technology; Computer and Information Sciences Computer Science

    Abstract: This licentiate thesis presents a software architecture for inductive labeled dependency parsing of unrestricted natural language text. The architecture achieves a strict modularization of parsing algorithm, feature model and learning method, so that these parameters can be varied independently. It is based on the theoretical framework of inductive dependency parsing by Nivre (2006) and has been realized in MaltParser, a system that supports several parsing algorithms and learning methods, for which complex feature models can be defined in a special description language.

  3. The Multilingual Forest: Investigating High-quality Parallel Corpus Development

    Authors: Yvonne Adesam; Martin Volk; Joakim Nivre; Koenraad de Smedt; Stockholms universitet
    Keywords: NATURAL SCIENCES; treebank; syntax; alignment; corpus; annotation projection; multilingual; tagging; parsing; Computational Linguistics

    Abstract: This thesis explores the development of parallel treebanks: collections of language data consisting of texts and their translations, with syntactic annotation and with alignment that links words, phrases, and sentences to show translation equivalence. We describe the semi-manual annotation of the SMULTRON parallel treebank, consisting of 1,000 sentences in English, German and Swedish.

  4. Principal Word Vectors

    Authors: Ali Basirat; Joakim Nivre; Hinrich Schütze; Uppsala universitet
    Keywords: HUMANITIES; NATURAL SCIENCES; ENGINEERING AND TECHNOLOGY; word; context; word embedding; principal component analysis; PCA; sparse matrix; singular value decomposition; SVD; entropy

    Abstract: Word embedding is a technique for associating the words of a language with real-valued vectors, enabling us to use algebraic methods to reason about their semantic and grammatical properties. This thesis introduces a word embedding method called principal word embedding, which uses principal component analysis (PCA) to train a set of word embeddings for the words of a language.

  5. Automatic and unsupervised methods in natural language processing

    Authors: Johnny Bigert; Viggo Kann; Joakim Nivre; KTH
    Keywords: NATURAL SCIENCES; Computer science

    Abstract: Natural language processing (NLP) means the computer-aided processing of language produced by a human. But human language is inherently irregular, and the most reliable results are obtained when a human is involved in at least some part of the processing. However, manual work is time-consuming and expensive.