The Accuracy of Automatic Qualitative Analyses of Constructed-Response Solutions to Algebra Word Problems. GRE Board Professional Report No. 91-03P.

Author(s)	Bennett, Randy Elliot; Sebrechts, Marc M.
Institution	Educational Testing Service, Princeton, NJ.
Title	The Accuracy of Automatic Qualitative Analyses of Constructed-Response Solutions to Algebra Word Problems. GRE Board Professional Report No. 91-03P.
Source	(1994), (111 pages); full text available as a free PDF
Language	English
Document type	print; online; monograph
Keywords	Algebra; Automation; Classification; College Entrance Examinations; College Students; Computer Assisted Testing; Constructed Response; Educational Diagnosis; Expert Systems; Higher Education; Qualitative Research; Scoring; Test Construction; Word Problems (Mathematics); Graduate Record Examinations; Classification system; Entrance examination; Pedagogical diagnostics; Higher education system; Assessment
Abstract	This study evaluated expert system diagnoses of examinees' solutions to complex constructed-response algebra word problems. Problems were presented to three samples (30 college students each), each of which had taken the Graduate Record Examinations General Test. One sample took the problems in paper-and-pencil form and the other two on computer. Responses were then diagnostically analyzed by an expert system, GIDE, and by four Educational Testing Service mathematics test developers. Results were highly consistent across the samples. Human judges generally agreed in describing responses as right or wrong, but concurred at lower levels in categorizing the specific bugs they detected in incorrect solutions. The expert system agreed highly with the judges' right/wrong decisions, but less closely with bug categorizations that judges agreed on. Causes of machine-rater disagreement were identified, and suggested remedies were proposed. These results suggest that highly accurate diagnostic analysis through knowledge-based understanding of complex responses may be difficult to achieve at the fine-grained level used by GIDE. Increasing accuracy is discussed. Appendixes A, B, and C present probabilities and canonical solutions for each of the samples; and Appendixes D, E, and F contain Sample 2 judges' instructions, and Sample 2 and Sample 3 Bug Classification Scheme and Detailed Error Descriptions with Examples. Twenty-one tables present study data. (Contains 13 references.) (Author/SLD)
Recorded by	ERIC (Education Resources Information Center), Washington, DC