Ljupche Milosheski (2019) Cross-lingual mappings of contextual word embedding ELMo. EngD thesis.
Abstract
To work with textual data, machine learning algorithms, in particular neural networks, require word embeddings: vector representations of words in a high-dimensional space. Many languages have only a small amount of available resources. Cross-lingual embeddings, obtained by aligning the embeddings of one language with the vector space of another, make it possible to exploit knowledge from well-resourced languages for under-resourced ones. Existing alignment methods are intended for context-independent embeddings, where every word has a single representation. We propose a method, based on a dictionary and a parallel corpus, that aligns contextual embeddings, which capture more information about the context in which words appear. The proposed method requires only a small amount of bilingual data, which is available for many language pairs. We empirically show that it outperforms a baseline obtained by aligning context-independent embeddings.
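To illustrate the kind of vector space mapping the abstract and keywords refer to, here is a minimal sketch of the standard orthogonal Procrustes alignment computed with singular value decomposition. It is an assumption that this matches the thesis's method; the thesis may differ in how pairs are chosen and how the mapping is computed. `procrustes_map` is a hypothetical helper name, and the rows of `src` and `tgt` are assumed to be embeddings of translation pairs (for example, contextual vectors of words aligned across parallel sentences or taken from dictionary entries). The data below is synthetic.

```python
import numpy as np


def procrustes_map(src: np.ndarray, tgt: np.ndarray) -> np.ndarray:
    """Return the orthogonal matrix W minimizing ||src @ W - tgt||_F.

    src, tgt: (n, d) arrays; row i of src and row i of tgt are assumed
    to be embeddings of the same translation pair (hypothetical setup).
    """
    # SVD of the d x d cross-covariance matrix src^T tgt = U S V^T
    u, _, vt = np.linalg.svd(src.T @ tgt)
    # The optimal orthogonal mapping is U V^T
    return u @ vt


# Synthetic check: the target space is an exact rotation of the source space
rng = np.random.default_rng(0)
d, n = 8, 200
x = rng.normal(size=(n, d))                     # "source language" vectors
q, _ = np.linalg.qr(rng.normal(size=(d, d)))    # random orthogonal matrix
y = x @ q                                       # "target language" vectors
w = procrustes_map(x, y)
print(np.allclose(w, q))
```

Restricting W to be orthogonal preserves distances and angles within the source space, a common design choice for cross-lingual mappings; on real embeddings the two spaces are not exact rotations of each other, so the learned mapping is only approximate.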
Item Type: | Thesis (EngD thesis) |
Keywords: | cross-lingual word embeddings, contextual word embeddings, vector word embeddings, word translation, parallel corpus, vector space mappings, singular value decomposition |
Number of Pages: | 42 |
Language of Content: | English |
Mentor / Co-mentors: | prof. dr. Marko Robnik Šikonja (ID: 276), Mentor |
Link to COBISS: | http://www.cobiss.si/scripts/cobiss?command=search&base=51012&select=(ID=1538288835) |
Institution: | University of Ljubljana |
Department: | Faculty of Computer and Information Science |
Item ID: | 4444 |
Date Deposited: | 23 Jul 2019 15:04 |
Last Modified: | 08 Aug 2019 10:50 |
URI: | http://eprints.fri.uni-lj.si/id/eprint/4444 |