Tadej Štajner (2009) Entity Resolution in Texts Using Machine Learning and Background Knowledge. EngD thesis.
Abstract
Machine learning methods are successfully applied in text mining. Because natural languages are inherently ambiguous, determining the actual identities of the entities mentioned in a document is a challenge. Entity resolution methods address this disambiguation problem. This thesis studies ways of improving entity resolution performance using various types of background knowledge and statistical learning. We compare the precision and recall of pair-wise entity resolution with those of collective resolution. To employ background knowledge, we define a multi-relational entity resolution approach capable of representing implicit as well as explicit relationships. We find that using entity co-occurrences and content similarities as implicit relationships is beneficial. We also propose an approach capable of handling such heterogeneous relations in collective entity resolution.
Item Type: | Thesis (EngD thesis) |
Keywords: | entity resolution, ontology, background knowledge, statistical learning, natural language processing, text mining |
Number of Pages: | 43 |
Language of Content: | Slovenian |
Mentor / Comentors: |
Name and Surname | ID | Function |
---|---|---|
doc. dr. Janez Demšar | 257 | Mentor |
doc. dr. Dunja Mladenić | | Comentor |
Link to COBISS: | http://www.cobiss.si/scripts/cobiss?command=search&base=50070&select=(ID=7149908) |
Institution: | University of Ljubljana |
Department: | Faculty of Computer and Information Science |
Item ID: | 864 |
Date Deposited: | 10 Jun 2009 08:17 |
Last Modified: | 13 Aug 2011 00:35 |
URI: | http://eprints.fri.uni-lj.si/id/eprint/864 |