Kaja Zupanc (2018) Semantics-based automated essay evaluation. PhD thesis.
Abstract
Automated essay evaluation (AEE) is a widely used practical solution for replacing the time-consuming manual grading of student essays. Automated systems are used alongside human graders in high-stakes assessments as well as in classrooms. Over the 50 years since the field's beginnings, many challenges have arisen, including evaluating semantic content, providing automated feedback, determining the reliability of grades, making the field more open, and others. In this dissertation we address several of these challenges and propose novel solutions for semantics-based essay evaluation. Most AEE research has been conducted by commercial organizations that protect their investments by releasing proprietary systems whose details are not publicly available. We provide a comparison (as detailed as possible) of 20 state-of-the-art approaches for automated essay evaluation, and we propose a new automated essay evaluation system named SAGE (Semantic Automated Grader for Essays), with all technological details revealed to the scientific community. Lack of consideration of text semantics is one of the main weaknesses of existing state-of-the-art systems. We address the evaluation of essay semantics from the perspectives of essay coherence and semantic error detection. Coherence describes the flow of information in an essay and allows us to evaluate connections within the discourse. We propose two groups of coherence attributes: attributes obtained in a high-dimensional semantic space and attributes obtained from sentence-similarity networks. Furthermore, we propose the Automated Error Detection (AED) system, which evaluates essay semantics from the perspective of essay consistency. The system detects semantic errors using information extraction and logical reasoning and is able to provide semantic feedback to the writer.
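The sentence-similarity-network idea can be illustrated with a minimal sketch: represent each sentence as a bag-of-words vector, connect sentence pairs whose cosine similarity exceeds a threshold, and read coherence attributes off the resulting graph. The threshold, the attribute names, and the specific statistics below are illustrative assumptions, not the attribute set defined in the thesis.

```python
# Hedged sketch of coherence attributes from a sentence-similarity network.
# Assumes bag-of-words sentence vectors and cosine similarity; the threshold
# and the returned attributes are illustrative, not the thesis's exact set.
from collections import Counter
import math

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def coherence_attributes(sentences, threshold=0.2):
    """Build a sentence-similarity graph and return simple network attributes."""
    vecs = [Counter(s.lower().split()) for s in sentences]
    n = len(vecs)
    # Edges connect sentence pairs that are sufficiently similar.
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)
             if cosine(vecs[i], vecs[j]) >= threshold]
    degree = Counter()
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    density = 2 * len(edges) / (n * (n - 1)) if n > 1 else 0.0
    avg_degree = sum(degree.values()) / n if n else 0.0
    # Mean similarity of adjacent sentences approximates local flow.
    adjacent = [cosine(vecs[k], vecs[k + 1]) for k in range(n - 1)]
    local = sum(adjacent) / len(adjacent) if adjacent else 0.0
    return {"density": density, "avg_degree": avg_degree, "local_coherence": local}

essay = ["The cat sat on the mat.",
         "The mat was red and the cat liked it.",
         "Quantum computers use qubits."]
attrs = coherence_attributes(essay)
```

In a full system, such graph-level attributes would be fed to the grading model alongside other essay features; an incoherent essay tends to produce a sparser, more fragmented network.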
The proposed system, SAGE, achieves significantly higher grading accuracy than other state-of-the-art automated essay evaluation systems. In the last part of the dissertation we address the question of the reliability of grades. Despite unified grading rules, human graders introduce bias into scores; consequently, a grading model has to implement a grading logic that may be a mixture of the grading logics of various graders. We propose an approach, based on an explanation methodology and clustering, for separating a set of essays into subsets that represent different graders. The results show that learning from the ensemble of separated models significantly improves average prediction accuracy on both artificial and real-world datasets.
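The grader-separation idea can be sketched as follows: compute a per-essay explanation vector (e.g., attribute contributions to the predicted grade), cluster those vectors so that essays graded by the same implicit logic fall into the same subset, and then train one model per subset. Everything below is a hedged toy illustration with synthetic explanation vectors and a plain k-means; it is not the thesis's actual explanation methodology.

```python
# Hedged sketch of grader separation via clustering of explanation vectors.
# The explanation vectors are synthetic: two simulated graders, one weighting
# attribute 0 and one weighting attribute 1. Names are illustrative.
import random

def kmeans(points, k, iters=20):
    """Plain k-means with deterministic, evenly spaced initialization."""
    centers = [points[i * len(points) // k] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            c = min(range(k),
                    key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])))
            clusters[c].append(p)
        centers = [tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[i]
                   for i, cl in enumerate(clusters)]
    return centers

def assign(p, centers):
    """Index of the nearest cluster center for point p."""
    return min(range(len(centers)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])))

rng = random.Random(1)
# Grader A's explanations load on attribute 0, grader B's on attribute 1.
explanations = [(rng.gauss(1.0, 0.1), rng.gauss(0.0, 0.1)) for _ in range(30)] + \
               [(rng.gauss(0.0, 0.1), rng.gauss(1.0, 0.1)) for _ in range(30)]
centers = kmeans(explanations, k=2)
labels = [assign(p, centers) for p in explanations]
# Each cluster would then get its own grading model, and the final
# prediction would come from the ensemble of these separated models.
```

With well-separated grading logics, the two simulated graders are recovered as two clean clusters; in the real setting each cluster's essays would train a separate grading model before ensembling.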