ePrints.FRI - University of Ljubljana, Faculty of Computer and Information Science

Automatic prediction of company's characteristics based on their website

Žan Anderle (2017) Automatic prediction of company's characteristics based on their website. MSc thesis.



    Our main objective is to predict a company's characteristics (industry, age, number of employees) from the company's website. We present several prediction models, each of which extracts information from the website in a different way, and we show which website features are useful for each prediction task. We find that the website's content text and meta tag text are often the most relevant. Each of these two texts yields a separate prediction model, and the two can also be combined into an ensemble model. The ensemble was used to predict the company's industry and achieved satisfactory results. We also tested alternative ways of describing a website, using various metadata that can be extracted from it. This is useful when the computational cost of text analysis must be avoided. We used a model built on these metadata features to predict the company's age and number of employees, but it was not particularly successful. We also discuss the problem of obtaining an appropriate dataset for developing the aforementioned prediction models, and find that solving this problem is crucial for achieving better results.
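The idea of training one model on content text and one on meta tag text, then combining them, can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract does not name the classifiers or the combination rule, so the multinomial naive Bayes model, the probability averaging, the names `NaiveBayesText` and `ensemble_predict`, and the toy training data are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesText:
    """Multinomial naive Bayes over whitespace tokens (assumed model choice)."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict_proba(self, text):
        total_docs = sum(self.class_counts.values())
        log_scores = {}
        for c in self.classes:
            total_words = sum(self.word_counts[c].values())
            score = math.log(self.class_counts[c] / total_docs)
            for w in text.lower().split():
                if w in self.vocab:  # words never seen in training are skipped
                    # Laplace (add-one) smoothing
                    score += math.log(
                        (self.word_counts[c][w] + 1) / (total_words + len(self.vocab))
                    )
            log_scores[c] = score
        # Convert log scores to normalized probabilities in a stable way.
        top = max(log_scores.values())
        exp_scores = {c: math.exp(s - top) for c, s in log_scores.items()}
        norm = sum(exp_scores.values())
        return {c: v / norm for c, v in exp_scores.items()}


def ensemble_predict(content_model, meta_model, content_text, meta_text):
    """Average the two models' class probabilities (assumed combination rule)."""
    p_content = content_model.predict_proba(content_text)
    p_meta = meta_model.predict_proba(meta_text)
    avg = {c: (p_content[c] + p_meta[c]) / 2 for c in p_content}
    return max(avg, key=avg.get), avg


# Hypothetical toy data: (content text, meta tag text, industry label).
sites = [
    ("we build cloud software and apps", "software cloud saas", "IT"),
    ("fresh bread and pastry baked daily", "bakery bread cakes", "Food"),
    ("custom software development and hosting", "software development hosting", "IT"),
    ("organic cakes and coffee in our bakery", "bakery coffee organic", "Food"),
]
labels = [s[2] for s in sites]
content_model = NaiveBayesText().fit([s[0] for s in sites], labels)
meta_model = NaiveBayesText().fit([s[1] for s in sites], labels)

label, avg = ensemble_predict(
    content_model,
    meta_model,
    "our team develops software for mobile apps",
    "software apps",
)
print(label, avg)
```

Averaging probabilities lets either text source outvote the other when it is more confident. A weighted average or a meta-classifier trained on the two models' outputs would be equally plausible alternatives.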

    Item Type: Thesis (MSc thesis)
    Keywords: website classification, machine learning, website information
    Number of Pages: 70
    Language of Content: Slovenian
    Mentor / Comentors:
    Name and Surname                 ID    Function
    Assoc. Prof. Dr. Janez Demšar    257   Mentor
    Link to COBISS: http://www.cobiss.si/scripts/cobiss?command=search&base=51012&select=(ID=1537370051)
    Institution: University of Ljubljana
    Department: Faculty of Computer and Information Science
    Item ID: 3783
    Date Deposited: 09 Feb 2017 13:01
    Last Modified: 06 Mar 2017 13:36
    URI: http://eprints.fri.uni-lj.si/id/eprint/3783
