ePrints.FRI - University of Ljubljana, Faculty of Computer and Information Science

Cross–lingual mappings of contextual word embedding ELMo

Ljupche Milosheski (2019) Cross–lingual mappings of contextual word embedding ELMo. EngD thesis.


    To work with textual data, machine learning algorithms, and neural networks in particular, require word embeddings: vector representations of words in a high-dimensional space. Some languages have only a small amount of available resources. Cross-lingual embeddings make it possible to exploit knowledge from well-resourced languages for under-resourced ones by aligning the embeddings of one language with the vector space of another. Existing alignment methods are intended for context-independent embeddings, where every word has a single representation. We propose a method, based on a dictionary and a parallel corpus, that aligns contextual embeddings, which capture more information about the context in which words appear. The proposed method requires only a small amount of bilingual data, which is available for many language pairs. We empirically show that the proposed method outperforms a baseline obtained by aligning context-independent embeddings.
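
    As an illustration of what aligning one embedding space with another can look like, the sketch below fits an orthogonal mapping with singular value decomposition (the orthogonal Procrustes solution). This is a generic technique, not necessarily the procedure used in the thesis; the function name `fit_orthogonal_map`, the dimensions, and the synthetic data are assumptions made for the example. Each row pair stands in for the vectors of a source word and its dictionary translation.

```python
import numpy as np


def fit_orthogonal_map(src, tgt):
    """Return the orthogonal matrix W minimizing ||src @ W - tgt||_F.

    src, tgt: arrays of shape (n_pairs, dim); row i of src and row i of tgt
    are assumed to be embeddings of a translation pair.
    """
    # Orthogonal Procrustes: W = U V^T, where src^T tgt = U S V^T.
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt


# Synthetic stand-in for dictionary-paired embeddings (hypothetical data):
# the "target" space is an exact rotation of the "source" space.
rng = np.random.default_rng(0)
dim, n_pairs = 8, 50
true_rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
src_vectors = rng.normal(size=(n_pairs, dim))
tgt_vectors = src_vectors @ true_rotation

W = fit_orthogonal_map(src_vectors, tgt_vectors)
mapped = src_vectors @ W  # source vectors expressed in the target space
```

    On real embeddings the two spaces are not exact rotations of each other, so the mapped vectors only approximate their translations. For contextual embeddings, one plausible setup is that the paired rows come from occurrences of dictionary-matched words in aligned sentences of a parallel corpus, but the abstract does not specify this.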

    Item Type: Thesis (EngD thesis)
    Keywords: cross-lingual word embeddings, contextual word embeddings, vector word embeddings, word translation, parallel corpus, vector space mappings, singular value decomposition
    Number of Pages: 42
    Language of Content: English
    Mentor / Comentors:
    Name and Surname                  ID     Function
    prof. dr. Marko Robnik Šikonja    276    Mentor
    Link to COBISS: http://www.cobiss.si/scripts/cobiss?command=search&base=51012&select=(ID=1538288835)
    Institution: University of Ljubljana
    Department: Faculty of Computer and Information Science
    Item ID: 4444
    Date Deposited: 23 Jul 2019 15:04
    Last Modified: 08 Aug 2019 10:50
    URI: http://eprints.fri.uni-lj.si/id/eprint/4444
