Search: "corpora"

Showing results 1-5 of 107 theses containing the word corpora.

  1. Studies in Corpora and Idioms : Getting the cat out of the bag

    Authors: David Minugh; Nils-Lennart Johannesson; Maria Kuteeva; Karin Aijmer; Stockholms universitet
    Keywords: HUMANITIES; Coll corpus; corpora; corpus creation; idioms; idiom variation; idiom-breaking; online newspapers; student newspapers; college newspapers; English language; English

    Abstract: “Idiomatic” expressions, usually called “idioms”, such as a dime a dozen, a busman’s holiday, or to have bats in your belfry are a curious part of any language: they usually have a fixed lexical (why a busman?) and structural composition (only dime and dozen in direct conjunction mean ‘common, ordinary’), can be semantically obscure (why bats?), yet are widely recognized in the speech community, in spite of being so rare that only large corpora can provide us with access to sufficient empirical data on their use. In this compilation thesis, four published studies focusing on idioms in corpora are presented.

  2. Recycling Translations : Extraction of Lexical Data from Parallel Corpora and their Application in Natural Language Processing

    Authors: Jörg Tiedemann; Anna Sågvall Hein; Joakim Nivre; Uppsala universitet
    Keywords: NATURAL SCIENCES; Computational linguistics; word alignment; parallel corpora; translation corpora; computational lexicography; machine translation

    Abstract: The focus of this thesis is on re-using translations in natural language processing. It involves the collection of documents and their translations in an appropriate format, the automatic extraction of translation data, and the application of the extracted data to different tasks in natural language processing.

  3. Morphosyntactic Corpora and Tools for Persian

    Authors: Mojgan Seraji; Joakim Nivre; Carina Jahani; Jan Hajic; Uppsala universitet
    Keywords: NATURAL SCIENCES; Persian; language technology; corpus; treebank; preprocessing; segmentation; part-of-speech tagging; dependency parsing; Computational Linguistics

    Abstract: This thesis presents open source resources in the form of annotated corpora and modules for automatic morphosyntactic processing and analysis of Persian texts. More specifically, the resources consist of an improved part-of-speech tagged corpus and a dependency treebank, as well as tools for text normalization, sentence segmentation, tokenization, part-of-speech tagging, and dependency parsing for Persian.

  4. Extracting Clinical Findings from Swedish Health Record Text

    Authors: Maria Skeppstedt; Hercules Dalianis; Gunnar Nilsson; Maria Kvist; Tapio Salakoski; Stockholms universitet
    Keywords: SOCIAL SCIENCES; Named entity recognition; Corpora development; Clinical text processing; Distributional semantics; Random indexing; Vocabulary expansion; Assertion classification; Clinical text mining; Electronic health records; Swedish; Computer and Systems Sciences

    Abstract: Information contained in the free text of health records is useful for the immediate care of patients as well as for medical knowledge creation. Advances in clinical language processing have made it possible to automatically extract this information, but most research has, until recently, been conducted on clinical text written in English.

  5. Marqueurs corrélatifs en français et en suédois : Étude sémantico-fonctionnelle de d’une part… d’autre part, d’un côté… de l’autre et de non seulement… mais en contraste

    Authors: Maria Svensson; Kerstin Jonasson; Coco Norén; Henning Nølke; Uppsala universitet
    Keywords: HUMANITIES; discourse markers; text organisation; French; Swedish; contrastive analysis; parallel corpora; Geneva model of discourse organization; Rhetorical Structure Theory; French language; Romance Languages

    Abstract: This thesis deals with the correlative markers d’une part… d’autre part, d’un côté… de l’autre and non seulement… mais in French and their Swedish counterparts dels… dels, å ena sidan… å andra sidan and inte bara… utan. These markers are composed of two separate parts generally occurring together, and announce a series of at least two textual units to be considered together.