Jernej Vičič (2012) A fast implementation of rules based machine translation systems for similar natural languages. PhD thesis.
Abstract
This work presents an overview of systems and methods for machine translation of natural languages, focusing primarily on the translation of related languages. Most of the presented systems belong to the Shallow Parse and Transfer Rule-Based Machine Translation paradigm, which is better suited to implementing a translation system for related languages. The major problem of rule-based translation systems built with the classical approach is the costly manual production of dictionaries and translation rules. The work gives an overview of a collection of selected and new methods for automatically producing the materials needed to set up systems based on translation rules. The methods were tested on a case study: the implementation of a fully functioning translation system for related languages. Four systems were used as the basis: Slovenian-Serbian, Slovenian-Czech, Slovenian-English and Slovenian-Estonian. The evaluation focused on the quality of the translations and on an estimate of the time needed to implement a new system.

The dissertation presents a method that extends the basic Statistical Machine Translation by Parsing paradigm to languages with limited language-technology support; the learning phase uses an aligned corpus instead of a full treebank. It describes a method for the automatic creation of morphologies, which includes automatic paradigm tagging, automatic paradigm construction for highly inflected languages, and automatic production of bilingual dictionaries. It also presents a method for selecting and assessing structural transfer rules. Methods for the automatic construction of structural transfer rules often produce a large set of rules that compete with each other (multiple rules can apply to the same part of the text). The best rules are chosen on the basis of the target-language corpus.
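
As a rough illustration of choosing among competing structural transfer rules with a target-language corpus, the sketch below scores each rule's output by how many of its word bigrams are attested in a small target-language corpus and keeps the highest-scoring rule. It is not the method from the dissertation: the bigram scoring, the toy corpus, and the names `bigram_counts`, `attested_bigrams`, `choose_rule`, `keep_order` and `swap_last_two` are all assumptions made for this example.

```python
from collections import Counter


def bigram_counts(sentences):
    """Count word bigrams (with sentence boundary markers) in a target-language corpus."""
    counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        counts.update(zip(tokens, tokens[1:]))
    return counts


def attested_bigrams(tokens, counts):
    """Return how many bigrams of a candidate output occur in the corpus."""
    padded = ["<s>"] + list(tokens) + ["</s>"]
    return sum(1 for bigram in zip(padded, padded[1:]) if counts[bigram] > 0)


def choose_rule(source_tokens, rules, counts):
    """Apply every competing rule and return the name of the best-scoring one."""
    return max(
        rules,
        key=lambda name: attested_bigrams(rules[name](source_tokens), counts),
    )


# Two hypothetical competing rules for the same span of text.
def keep_order(tokens):
    return list(tokens)


def swap_last_two(tokens):
    return list(tokens[:-2]) + [tokens[-1], tokens[-2]]


# Toy target-language corpus (an assumption, for illustration only).
TARGET_CORPUS = ["the red car is fast", "a red house"]
RULES = {"keep_order": keep_order, "swap_last_two": swap_last_two}
COUNTS = bigram_counts(TARGET_CORPUS)

best = choose_rule(["the", "car", "red"], RULES, COUNTS)
print(best)
```

In this toy setting the reordering rule is selected because its output, "the red car", shares more bigrams with the corpus than "the car red" does. A practical system would more plausibly use a smoothed language model and aggregate scores over many sentences instead of a single example.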
Item Type: | Thesis (PhD thesis) |
Keywords: | rbmt, machine translation, machine translation of related languages, speeding up the implementation of machine translation systems |
Number of Pages: | 165 |
Language of Content: | Slovenian |
Mentor / Comentors: |
Name and Surname | ID | Function |
prof. dr. Igor Kononenko | 237 | Mentor |
doc. dr. Tomaž Erjavec | | Comentor |
Link to COBISS: | http://www.cobiss.si/scripts/cobiss?command=search&base=50070&select=(ID=00009339732) |
Institution: | University of Ljubljana |
Department: | Faculty of Computer and Information Science |
Item ID: | 1778 |
Date Deposited: | 17 Aug 2012 12:04 |
Last Modified: | 05 Sep 2012 13:25 |
URI: | http://eprints.fri.uni-lj.si/id/eprint/1778 |