Marko Lugarič (2008) Source code similarity. EngD thesis.
Abstract
We propose a method for detecting plagiarised source code in programs written by students. The purpose of this work is to present state-of-the-art solutions, evaluate them, and then construct a new method for detecting plagiarism in source code. The constructed method uses ideas and data representations from text mining. Every program is represented by the keywords it contains; the number of variables and their types are included as attributes. The method extracts subsets of programs (the clusters of plagiarism) such that each program within a particular subset has been derived from the same original. A set of plagiarised programming assignments is created to test and improve our method. Finally, the method is tested on real programming assignments. The results are analysed and improvements are suggested.
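As a rough illustration of the keyword-based representation and the grouping into clusters described in the abstract, the following minimal sketch may help. It is not the method from the thesis. The keyword list, the cosine similarity measure, the threshold of 0.95, and the function names `keyword_vector`, `cosine` and `clusters` are all assumptions made for this example, and the counting of variables per type is omitted.

```python
import re
from collections import Counter
from math import sqrt

# Assumed keyword list (a small subset of C keywords), for illustration only.
KEYWORDS = {"int", "float", "char", "for", "while", "if", "else", "return"}


def keyword_vector(source):
    """Count how often each keyword occurs in a program's source text."""
    tokens = re.findall(r"[A-Za-z_]\w*", source)
    return Counter(t for t in tokens if t in KEYWORDS)


def cosine(a, b):
    """Cosine similarity between two keyword-count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def clusters(programs, threshold=0.95):
    """Group programs whose similarity reaches the (assumed) threshold.

    Programs are linked transitively with a union-find structure, so each
    returned group is a connected component of the similarity graph.
    Groups with a single member are dropped.
    """
    names = list(programs)
    vectors = {n: keyword_vector(programs[n]) for n in names}
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path compression
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return [sorted(g) for g in groups.values() if len(g) > 1]
```

Under these assumptions, two submissions that differ only in variable names produce identical keyword vectors and fall into the same cluster, while a structurally different program stays outside it.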