ePrints.FRI - University of Ljubljana, Faculty of Computer and Information Science

Semantic approaches to domain template construction and opinion mining from natural language

Mitja Trampuš (2015) Semantic approaches to domain template construction and opinion mining from natural language. PhD thesis.



    Most text mining algorithms in use today are based on a lexical representation of the input texts, such as bag of words. A possible alternative is to first convert text into a semantic representation, one that captures the content of the text in a structured way using only a set of pre-agreed labels. This thesis explores the feasibility of such an approach for two tasks on document collections: identifying common structure in input documents (»domain template construction«) and helping users find differing opinions in input documents (»opinion mining«). We first discuss ways of converting natural language text into a semantic representation. We propose and compare two new methods that differ in the complexity of their target representations. The first, more promising method converts dependency parser output into lightweight semantic frames, with role fillers aligned to WordNet. The second method structures text using Semantic Role Labeling techniques and aligns the output to the Cyc ontology. Building on the first of these representations, we next propose and evaluate two methods for constructing frame-based templates for documents from a given domain (e.g. news reports on bombing attacks). A template is the set of all salient attributes of the domain (e.g. attacker, number of casualties, …). Both methods construct abstract frames for which more specific instances (according to the WordNet hierarchy) can be found in the input documents; fragments of these abstract frames represent the sought-for attributes. We achieve state-of-the-art performance and additionally provide detailed type constraints for the attributes, which competing methods cannot do. Finally, we propose a software system for exposing differing opinions in the news. For any given event, we present the user with all known articles on the topic and let them navigate the articles by three semantic properties simultaneously: sentiment, topical focus and geographic origin. The result is a dynamically reranked set of relevant articles and a near-real-time focused summary of those articles. The summary, too, is computed from the semantic text representation discussed above. We conducted a user study of the whole system with very positive results.
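The abstract describes template construction as building abstract frames whose more specific instances, according to the WordNet hierarchy, occur in the input documents. The following is a minimal sketch of that generalization idea only, not the thesis's actual method, and it rests on several assumptions: a tiny hand-written hypernym table stands in for WordNet, a frame is a (predicate, role-to-filler) pair, abstract frames are formed by taking the lowest common hypernym of role fillers for every pair of same-predicate frames, and candidates are ranked by how many input frames they cover. All names (`HYPERNYM`, `generalize`, `covers`, `candidate_templates`, `min_support`) and the example frames are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy hypernym table (child -> parent) standing in for WordNet.
HYPERNYM = {
    "car_bomb": "bomb", "suicide_bomb": "bomb",
    "bomb": "explosive_device", "explosive_device": "device",
    "militant": "person", "insurgent": "person",
    "soldier": "person", "civilian": "person",
    "person": "entity", "device": "entity",
}

def ancestors(concept):
    """The concept followed by all its hypernyms, most specific first."""
    chain = [concept]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain

def lowest_common_hypernym(a, b):
    """Most specific concept that is an ancestor-or-self of both a and b."""
    b_ancestors = set(ancestors(b))
    for concept in ancestors(a):
        if concept in b_ancestors:
            return concept
    return None

def generalize(f1, f2):
    """Abstract frame covering f1 and f2, or None if predicates differ."""
    (pred1, roles1), (pred2, roles2) = f1, f2
    if pred1 != pred2:
        return None
    roles = {}
    for role in roles1.keys() & roles2.keys():
        common = lowest_common_hypernym(roles1[role], roles2[role])
        if common is not None:
            roles[role] = common
    # Sorted tuple of (role, filler) pairs keeps abstract frames hashable.
    return (pred1, tuple(sorted(roles.items())))

def covers(abstract, frame):
    """True if frame is a more specific instance of the abstract frame."""
    pred, roles = abstract
    fpred, froles = frame
    return pred == fpred and all(
        role in froles and filler in ancestors(froles[role])
        for role, filler in roles)

def candidate_templates(frames, min_support=2):
    """Abstract frames with at least min_support instances, best first."""
    candidates = set()
    for f1, f2 in combinations(frames, 2):
        g = generalize(f1, f2)
        if g is not None and g[1]:  # skip frames with no shared role
            candidates.add(g)
    support = Counter({c: sum(covers(c, f) for f in frames) for c in candidates})
    return [(c, n) for c, n in support.most_common() if n >= min_support]

frames = [
    ("attack", {"agent": "militant", "instrument": "car_bomb"}),
    ("attack", {"agent": "insurgent", "instrument": "suicide_bomb"}),
    ("attack", {"agent": "insurgent", "instrument": "bomb", "patient": "soldier"}),
    ("kill", {"agent": "bomb", "patient": "civilian"}),
]
templates = candidate_templates(frames)
for (pred, roles), n in templates:
    print(pred, dict(roles), n)
# attack {'agent': 'person', 'instrument': 'bomb'} 3
# attack {'agent': 'insurgent', 'instrument': 'bomb'} 2
```

In this toy run, the top abstract frame reads as an attack template with an attacker attribute typed `person` and a weapon attribute typed `bomb`, which loosely mirrors the abstract's point that the attributes come with type constraints. Real WordNet synsets have multiple hypernym paths, so a faithful version would need a richer notion of common ancestor than this single-parent table.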

    Item Type: Thesis (PhD thesis)
    Keywords: data mining, text mining, ontologies, natural language processing
    Number of Pages: 145
    Language of Content: English
    Mentor / Co-mentors:
    Name and Surname | ID | Function
    Assoc. Prof. Dr. Dunja Mladenić | | Mentor
    Assoc. Prof. Dr. Janez Demšar | 257 | Co-mentor
    Link to COBISS: http://www.cobiss.si/scripts/cobiss?command=search&base=51012&select=(ID=280354304)
    Institution: University of Ljubljana
    Department: Faculty of Computer and Information Science
    Item ID: 2992
    Date Deposited: 05 Jun 2015 12:34
    Last Modified: 11 Aug 2015 15:07
    URI: http://eprints.fri.uni-lj.si/id/eprint/2992