ePrints.FRI - University of Ljubljana, Faculty of Computer and Information Science

Source code similarity

Marko Lugarič (2008) Source code similarity. EngD thesis.

Full text: PDF (740 KB)

    Abstract

    We propose a method for detecting plagiarised source code in programs written by students. The purpose of this work is to present state-of-the-art solutions, evaluate them, and then construct a new method for detecting plagiarism in source code. The constructed method uses ideas and data representations from text mining: every program is represented by the keywords it contains, and the number of variables and their types are included as additional attributes. The method extracts subsets of programs (clusters of plagiarism) such that every program within a particular subset has been derived from the same original. We create a set of plagiarised programming assignments to test and improve the method. Finally, the method is tested on real programming assignments; the results are analysed and improvements are suggested.
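    The abstract describes the approach only at a high level. The following is a minimal sketch under stated assumptions, not the thesis's actual method. It assumes C/Java-like source, a hypothetical keyword vocabulary (`KEYWORDS`) and type list (`TYPES`), cosine similarity between feature vectors, and single-linkage grouping (connected components of pairs whose similarity reaches a hypothetical `threshold` of 0.95) in place of whichever clustering algorithm the thesis uses. All function names are illustrative.

```python
import math
import re
from collections import Counter

# Hypothetical keyword vocabulary (a small subset of C/Java keywords).
KEYWORDS = ["if", "else", "for", "while", "do", "switch", "case",
            "return", "break", "continue", "new", "class"]
# Hypothetical list of types whose declarations are counted as attributes.
TYPES = ["int", "double", "char", "boolean", "String"]

TOKEN_RE = re.compile(r"[A-Za-z_]\w*")


def features(source: str) -> list[float]:
    """Keyword counts followed by per-type declaration counts."""
    counts = Counter(TOKEN_RE.findall(source))
    vec = [float(counts[k]) for k in KEYWORDS]
    # Crude heuristic: every "type identifier" pair counts as one declaration,
    # so function return types (e.g. "int main") are counted as well.
    for t in TYPES:
        vec.append(float(len(re.findall(rf"\b{t}\s+[A-Za-z_]\w*", source))))
    return vec


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def plagiarism_clusters(programs: dict[str, str],
                        threshold: float = 0.95) -> list[set[str]]:
    """Single-linkage grouping: link every pair at or above the threshold."""
    names = list(programs)
    vecs = {n: features(programs[n]) for n in names}
    parent = {n: n for n in names}  # union-find forest

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(vecs[a], vecs[b]) >= threshold:
                parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for n in names:
        groups.setdefault(find(n), set()).add(n)
    # Only groups with at least two programs are suspicious.
    return [g for g in groups.values() if len(g) > 1]


if __name__ == "__main__":
    submissions = {
        "alice": "int main() { int s = 0; for (int i = 0; i < 10; i++) { s += i; } return s; }",
        "bob": "int main() { int total = 0; for (int k = 0; k < 10; k++) { total += k; } return total; }",
        "carol": "double f(double x) { while (x > 1) { x = x / 2; } if (x < 0) return 0; else return x; }",
    }
    print(plagiarism_clusters(submissions))
```

    Because identifiers are ignored, renaming variables (as between `alice` and `bob`) leaves the feature vector unchanged, so those two submissions fall into one cluster while `carol` stays separate. Renaming is a common disguise that this kind of representation is insensitive to; deeper structural rewrites would need richer attributes.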

    Item Type: Thesis (EngD thesis)
    Keywords: source code similarity, plagiarism, clustering, attribute construction, data mining
    Number of Pages: 32
    Language of Content: Slovenian
    Mentor / Co-mentors:
    doc. dr. Marko Robnik Šikonja (ID 276), Mentor
    Link to COBISS: http://www.cobiss.si/scripts/cobiss?command=search&base=50070&select=(ID=6745428)
    Institution: University of Ljubljana
    Department: Faculty of Computer and Information Science
    Item ID: 255
    Date Deposited: 21 Oct 2008 10:54
    Last Modified: 13 Aug 2011 00:32
    URI: http://eprints.fri.uni-lj.si/id/eprint/255