Search: "tokenization"

Found 3 theses containing the word "tokenization".

1. A robust text processing technique applied to lexical error recovery

Author: Peter Ingels; Linköpings universitet
Keywords: NATURAL SCIENCES

Abstract: This thesis addresses automatic lexical error recovery and tokenization of corrupt text input. We propose a technique that can automatically correct misspellings, segmentation errors and real-word errors in a unified framework that uses both a model of language production and a model of the typing behavior, and which makes tokenization part of the recovery process.

2. Morphosyntactic Corpora and Tools for Persian

Authors: Mojgan Seraji; Joakim Nivre; Carina Jahani; Jan Hajic; Uppsala universitet
Keywords: NATURAL SCIENCES; Persian; language technology; corpus; treebank; preprocessing; segmentation; part-of-speech tagging; dependency parsing; Computational Linguistics

Abstract: This thesis presents open source resources in the form of annotated corpora and modules for automatic morphosyntactic processing and analysis of Persian texts. More specifically, the resources consist of an improved part-of-speech tagged corpus and a dependency treebank, as well as tools for text normalization, sentence segmentation, tokenization, part-of-speech tagging, and dependency parsing for Persian.

3. The Search for Syntax: Investigating the Syntactic Knowledge of Neural Language Models Through the Lens of Dependency Parsing

Authors: Artur Kulmizev; Joakim Nivre; Roger Levy; Uppsala universitet
Keywords: NATURAL SCIENCES; syntax; language models; dependency parsing; universal dependencies; Computational Linguistics

Abstract: Syntax, the study of the hierarchical structure of language, has long featured as a prominent research topic in the field of natural language processing (NLP). Traditionally, its role in NLP was confined to developing parsers: supervised algorithms tasked with predicting the structure of utterances (often for use in downstream applications).
