A Resource-Light Approach to Morpho-Syntactic Tagging by Anna Feldman

While supervised corpus-based methods are highly accurate for various NLP tasks, including morphological tagging, they are difficult to port to other languages because they require resources that are expensive to create. As a result, many languages have no realistic prospect of morpho-syntactic annotation in the foreseeable future. The method presented in this book aims to overcome this problem by significantly limiting the necessary data and instead extrapolating the relevant information from another, related language. The approach has been tested on Catalan, Portuguese, and Russian. Although these languages are only relatively resource-poor, the same method can in principle be applied to any inflected language, as long as an annotated corpus of a related language is available. The time needed to adjust the system to a new language is a fraction of the time needed for systems with extensive, manually created resources: days instead of years. This book touches upon a number of topics: typology, morphology, corpus linguistics, contrastive linguistics, linguistic annotation, computational linguistics and Natural Language Processing (NLP). Researchers and students who are interested in these scientific areas, as well as in cross-lingual studies and applications, will greatly benefit from this work. Scholars and practitioners in computer science and linguistics are the prospective readers of this book.

Similar study & teaching books

DICTIONNAIRE RUSSE-FRANÇAIS D'ÉTHYMOLOGIE COMPARÉE: Correspondances lexicales historiques (French Edition)

Words have a secret "memory": their history is that of our cultures, of our unconscious collective thought. In this respect, Russian has great pedagogical value, comparable to that of Latin or Greek, for helping a French speaker understand the formation and workings of their own language, French, and for acquainting them better with the other main European languages.

Corpora and Language Teaching (Studies in Corpus Linguistics)

The articles in this edited volume cover a broad range of areas. They discuss the role and effectiveness of corpora and corpus-linguistic techniques for language teaching, but also deal with broader issues such as the relationship between corpora and second language teaching and how the different perspectives of foreign language teachers and applied linguists can be reconciled.

Study Abroad and Second Language Use: Constructing the Self

Language plays an essential role in how we portray our personalities. Through social interaction, others develop a picture of us based on our linguistic cues. However, when we interact in a foreign language and in a new country, limitations in linguistic and cultural knowledge can make self-presentation a more difficult task.

Communication Skills for Foreign and Mobile Medical Professionals

This is an evidence-based communication resource book for medical professionals who work in foreign countries, cultures and languages. It offers a wealth of insights into doctor-patient communication, structured around the phases of a consultation.

Additional resources for A Resource-Light Approach to Morpho-Syntactic Tagging

Example text

However, more clusters obviously take more bits to encode. Since the cost function captures the length of the code for both data and clusters, minimizing this function (which maximizes the goodness of clustering) will determine both the number of clusters and how to assign objects to clusters. The primary goal of using MDL is to induce lexemes from boundaryless speech-like streams. The MDL approach is based on the insight that a good grammar can be used to most compactly describe the corpus.
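The trade-off between the cost of the model and the cost of the data can be illustrated with a minimal Python sketch. It is not the book's formulation: the function `description_length`, the flat `bits_per_char` cost for spelling out lexicon entries, and the toy stream are all assumptions made for illustration. The code scores three candidate segmentations of the same unsegmented stream and picks the one with the shortest total code.

```python
import math
from collections import Counter

def description_length(segmented_corpus, bits_per_char=5.0):
    """Two-part code length in bits: lexicon cost plus data cost.

    Assumption (not from the source): each lexicon entry costs a flat
    bits_per_char per character, plus one end-of-entry marker.
    """
    tokens = [tok for utterance in segmented_corpus for tok in utterance]
    counts = Counter(tokens)
    total = len(tokens)
    # Model cost: spell out every distinct unit once.
    lexicon_bits = sum((len(unit) + 1) * bits_per_char for unit in counts)
    # Data cost: an ideal code spends -log2 p(unit) bits per token.
    data_bits = -sum(n * math.log2(n / total) for n in counts.values())
    return lexicon_bits + data_bits

# Hypothetical boundaryless stream and three ways of segmenting it.
stream = ["thedog", "thecat", "thedog", "thecat"]
candidates = {
    "letters": [list(u) for u in stream],          # every character is a unit
    "words": [[u[:3], u[3:]] for u in stream],     # "the" + "dog" / "cat"
    "utterances": [[u] for u in stream],           # no segmentation at all
}
costs = {name: description_length(seg) for name, seg in candidates.items()}
best = min(costs, key=costs.get)
print(costs, best)
```

On this toy input the word-level segmentation has the smallest total: the letter-level lexicon is cheap per entry but the data become long, while whole utterances make the data short but the lexicon expensive.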

With this method (as well as with the stacked classifiers, discussed below), a tag suggested by a minority (or even none) of the taggers still has a (slight) chance to win.

Stacked classifiers

The practice of feeding the outputs of a number of classifiers as features to a next learner is usually called stacking. The algorithm works as follows. Let the training examples be (x1, y1), ..., (xm, ym). Each of L learning algorithms is applied to these data to produce classifiers h1, ..., hL. The goal is to learn a combining classifier h∗ such that the final classification is computed as h∗(h1(x), ..., hL(x)). Wolpert (1992) proposed a scheme for learning h∗ using a form of leave-one-out cross-validation.
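The stacking scheme can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the taggers used in the book: the helper `train_lookup`, the three lookup-based base learners in `BASE_LEARNERS`, the `"UNK"` fallback tag, and the data in `TOY` are all hypothetical. The combiner is trained on base-classifier outputs produced with the current example held out, in the spirit of the leave-one-out scheme.

```python
from collections import Counter, defaultdict

def train_lookup(pairs, key, default="UNK"):
    """Return a classifier giving the most frequent label seen for key(x)."""
    votes = defaultdict(Counter)
    for x, y in pairs:
        votes[key(x)][y] += 1
    table = {k: c.most_common(1)[0][0] for k, c in votes.items()}
    return lambda x: table.get(key(x), default)

# Hypothetical base learners: each maps training pairs to a classifier.
BASE_LEARNERS = [
    lambda pairs: train_lookup(pairs, key=lambda w: w),       # full word form
    lambda pairs: train_lookup(pairs, key=lambda w: w[-2:]),  # 2-letter suffix
    lambda pairs: train_lookup(pairs, key=lambda w: w[-1:]),  # 1-letter suffix
]

def train_stacked(pairs):
    # Level-1 data: base outputs for x_i from classifiers trained without i.
    level1 = []
    for i, (x, y) in enumerate(pairs):
        held_out = pairs[:i] + pairs[i + 1:]
        outputs = tuple(learn(held_out)(x) for learn in BASE_LEARNERS)
        level1.append((outputs, y))
    # The combiner learns a mapping from base-output tuples to gold labels.
    combiner = train_lookup(level1, key=lambda outputs: outputs)
    # Final base classifiers are retrained on all of the data.
    base = [learn(pairs) for learn in BASE_LEARNERS]
    return lambda x: combiner(tuple(h(x) for h in base))

# Toy training set (an assumption for illustration only).
TOY = [("walked", "VERB"), ("talked", "VERB"), ("walked", "VERB"),
       ("cats", "NOUN"), ("dogs", "NOUN"), ("cats", "NOUN")]
tagger = train_stacked(TOY)
print(tagger("jumped"), tagger("birds"))
```

For the unseen word "birds", only the one-letter-suffix learner proposes NOUN while the other two return the fallback tag, yet the combiner still outputs NOUN: it has learned which pattern of base outputs to trust rather than counting votes.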

This is made possible by the wider availability of parallel corpora with better alignment methods at paragraph, sentence, and word level. Examples of knowledge induction tasks include learning morphology, part-of-speech tags, and grammatical gender, as well as the development of wordnets for many languages using, as a starting point, knowledge transfer from the Princeton WordNet (Miller 1990). This section summarizes some of the relevant work in cross-language applications.

Cross-language knowledge transfer using parallel texts

It is a common situation to find a dominant language with some language technology resources and a lesser-known language lacking one or all of these resources, but a fair amount of (machine-readable) parallel texts in the two languages.
