ePrints.FRI - University of Ljubljana, Faculty of Computer and Information Science

Named entity recognition in legal documents

Matic Di Batista (2013) Named entity recognition in legal documents. EngD thesis.

PDF (922 KB)

    Abstract

    Named entity recognition (NER) in natural language text is becoming increasingly important because it helps users process and manipulate text. Technologies developed over the last decades achieve very good results in extracting information from natural language text. In this thesis we briefly present the available solutions for named entity recognition in legal texts. Our goal is to recognize as many named entities as possible so that they can be used to create hyperlinks to the documents they refer to. By combining multiple named entities, we can also obtain additional information about the document under observation. We describe the properties of the available NER solutions. We then tested named entity recognition on Slovenian legal texts with two solutions: Stanford CoreNLP and our own application, NERInLaw, which uses CRFsuite. Both solutions were tested on manually annotated legal texts in which we had marked multiple named entities. We split the texts into a training set and a test set so that the results could be evaluated. Tests were run with different sets of attribute (feature) functions, so that we could compare the results and determine which functions matter most to the system. Another important aspect of testing was the speed of each solution: with large datasets, results must be obtained as quickly as possible. Our implementation achieved very good results with basic settings, and we believe that future work could improve them further. In addition, with minor changes the current implementation could easily be adapted to languages other than Slovenian.
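    The abstract describes training a CRF tagger with CRFsuite on annotated texts, split into training and test sets, while varying the set of attribute functions. The thesis's actual feature set and label scheme are not given on this page, so the following is only a minimal Python sketch under stated assumptions: the BIO labels (e.g. `B-LAW`), the toy sentence, and the helpers `token_features`, `to_crfsuite_lines` and `split` are hypothetical and are not part of NERInLaw. The sketch writes tokens in CRFsuite's plain-text data format (label, then tab-separated attributes, with a blank line ending each sentence).

```python
import random
from typing import List, Tuple

# Hypothetical data model: a sentence is a list of (word, BIO label) pairs.
Token = Tuple[str, str]
Sentence = List[Token]


def escape(attr: str) -> str:
    # CRFsuite uses ':' to separate an attribute from its weight,
    # so ':' and '\' inside attribute names must be escaped.
    return attr.replace("\\", "\\\\").replace(":", "\\:")


def token_features(words: List[str], i: int, feature_set: str = "full") -> List[str]:
    """Hypothetical attribute functions for the i-th word.
    'basic' vs 'full' mimics comparing different sets of attribute functions."""
    w = words[i]
    feats = [f"w[0]={w.lower()}"]
    if feature_set == "basic":
        return feats
    feats += [
        f"suf3={w[-3:].lower()}",      # last three characters
        f"cap={w[:1].isupper()}",      # starts with a capital letter
        f"digit={w.isdigit()}",        # token is a number (e.g. article number)
    ]
    feats.append(f"w[-1]={words[i - 1].lower()}" if i > 0 else "BOS")
    feats.append(f"w[+1]={words[i + 1].lower()}" if i < len(words) - 1 else "EOS")
    return feats


def to_crfsuite_lines(sentences: List[Sentence], feature_set: str = "full") -> List[str]:
    """Convert labelled sentences into lines of CRFsuite's text format."""
    lines: List[str] = []
    for sent in sentences:
        words = [w for w, _ in sent]
        for i, (_, label) in enumerate(sent):
            attrs = [escape(a) for a in token_features(words, i, feature_set)]
            lines.append("\t".join([label] + attrs))
        lines.append("")  # blank line marks the end of a sequence
    return lines


def split(sentences: list, test_ratio: float = 0.2, seed: int = 0) -> Tuple[list, list]:
    """Shuffle and split into (training set, test set)."""
    shuffled = list(sentences)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    # Toy, hand-labelled example (hypothetical labels, not data from the thesis).
    data = [
        [("Zakon", "B-LAW"), ("o", "I-LAW"), ("sodiščih", "I-LAW"), ("velja", "O"), (".", "O")],
    ]
    print(to_crfsuite_lines(data)[0])
```

    The resulting training and test files could then be passed to the CRFsuite command-line tool, for example `crfsuite learn -m model train.txt` to train and `crfsuite tag -qt -m model test.txt` to report performance on the test set. Switching `feature_set` between `"basic"` and `"full"` is one simple way to compare attribute-function sets, as the abstract describes.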

    Item Type: Thesis (EngD thesis)
    Keywords: named entity recognition, part of speech, conditional random fields, Stanford CoreNLP, CRFsuite
    Number of Pages: 59
    Language of Content: Slovenian
    Mentor / Co-mentors:
    Name and Surname | ID | Function
    Assoc. Prof. Dr. Marko Bajec | 245 | Mentor
    Link to COBISS: http://www.cobiss.si/scripts/cobiss?command=search&base=50070&select=(ID=10232660)
    Institution: University of Ljubljana
    Department: Faculty of Computer and Information Science
    Item ID: 2235
    Date Deposited: 11 Oct 2013 14:35
    Last Modified: 05 Nov 2013 14:35
    URI: http://eprints.fri.uni-lj.si/id/eprint/2235
