Sökning: "tokenization"
Hittade 3 avhandlingar innehållade ordet tokenization.
1. A robust text processing technique applied to lexical error recovery
Sammanfattning : This thesis addresses automatic lexical error recovery and tokenization of corrupt text input. We propose a technique that can automatically correct misspellings, segmentation errors and real-word errors in a unified framework that uses both a model of language production and a model of the typing behavior, and which makes tokenization part of the recovery process. LÄS MER
2. Morphosyntactic Corpora and Tools for Persian
Sammanfattning : This thesis presents open source resources in the form of annotated corpora and modules for automatic morphosyntactic processing and analysis of Persian texts. More specifically, the resources consist of an improved part-of-speech tagged corpus and a dependency treebank, as well as tools for text normalization, sentence segmentation, tokenization, part-of-speech tagging, and dependency parsing for Persian. LÄS MER
3. The Search for Syntax : Investigating the Syntactic Knowledge of Neural Language Models Through the Lens of Dependency Parsing
Sammanfattning : Syntax — the study of the hierarchical structure of language — has long featured as a prominent research topic in the field of natural language processing (NLP). Traditionally, its role in NLP was confined towards developing parsers: supervised algorithms tasked with predicting the structure of utterances (often for use in downstream applications). LÄS MER