Adaptive Tutor Prediction

3 Months (February 2018-May 2018)

I created a machine learning model to predict student correctness on questions from an English language intelligent tutoring system. I cleaned and reformatted the data, engineered new features from the data, performed error analysis of the machine learning model's performance to inform new feature engineering, and tuned the parameters of the model to maximize performance.

Skills:

Machine Learning · Weka · LightSIDE · Data Cleaning · Data Analysis · Python · Error Analysis · Parameter Tuning

Advisor:

Carolyn Rosé (Applied Machine Learning)

Deliverables:

Written Report (see above) · Python Code (see above)

Educational data mining is a field of data science dedicated to analyzing data from educational sources and using it to improve educational experiences. Personalized educational technology is more effective than standard, one-size-fits-all versions, but building the architecture for personalization is very time-consuming. Machine learning can provide a fast and efficient way to estimate a student's current knowledge state in order to better target instruction.

I used data from an intelligent tutor that teaches English articles, built with the Cognitive Tutor Authoring Tools (CTAT). The data was accessed through DataShop, and its format reflects the structure of CTAT. I performed extensive data cleaning to fix bugs in the data and to remove features that either provided no useful information or were too directly informative about the outcome to be used for predictions at the question level. I then created new features based on the data, including learner demographics, prior performance, and the order in which questions were answered. Finally, I determined the best algorithm to use and tuned its parameters to increase performance, using the Weka and LightSIDE machine learning software packages.
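As an illustration of the prior-performance and question-order features described above, here is a minimal Python sketch. It is not the project's actual code: the function name `add_history_features` and the field names `student`, `correct`, `question_order`, and `prior_accuracy` are hypothetical, and the input is assumed to be a list of attempts already sorted chronologically, not a real DataShop export.

```python
from collections import defaultdict


def add_history_features(rows):
    """Add per-student history features to chronologically ordered attempts.

    Each row is a dict with (hypothetical) keys 'student' and 'correct'
    (1 for a correct answer, 0 otherwise). Features use only attempts
    made before the current one, so the current outcome never leaks in.
    """
    attempts_seen = defaultdict(int)   # attempts so far, per student
    correct_seen = defaultdict(int)    # correct attempts so far, per student
    enriched = []
    for row in rows:
        student = row["student"]
        seen = attempts_seen[student]
        features = dict(row)  # copy so the input rows stay unchanged
        # Position of this question in the student's sequence (1-based).
        features["question_order"] = seen + 1
        # Share of earlier attempts answered correctly; 0.0 with no history.
        features["prior_accuracy"] = correct_seen[student] / seen if seen else 0.0
        enriched.append(features)
        # Update the running totals only after the features are computed.
        attempts_seen[student] += 1
        correct_seen[student] += row["correct"]
    return enriched
```

Updating the running totals only after each row's features are computed keeps the current answer out of its own predictors, which is the same concern that motivated removing overly informative features.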

The final model reached 77 percent correct (an accuracy of 0.77) with a kappa of 0.57. Although the intelligent tutor is no longer running, the machine learning model I made is applicable to other types of software that use data in the DataShop format. Through this project I learned not only how to manipulate data and create machine learning models, but also how to make sense of data and determine how to use it effectively in an educational context.
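For readers unfamiliar with the two metrics reported above, the sketch below shows how accuracy and Cohen's kappa can be computed from a list of actual and predicted labels. It is an illustrative sketch using only the Python standard library, not the Weka or LightSIDE implementation, and the function name `accuracy_and_kappa` is hypothetical. Kappa adjusts accuracy for the agreement expected by chance, which is why it is lower than the raw accuracy.

```python
from collections import Counter


def accuracy_and_kappa(actual, predicted):
    """Return (accuracy, Cohen's kappa) for two equal-length label lists."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(actual)
    # Observed agreement: fraction of predictions that match the true label.
    observed = sum(a == p for a, p in zip(actual, predicted)) / n
    # Chance agreement: for each label, the product of how often it occurs
    # in the actual labels and how often it occurs in the predictions.
    actual_counts = Counter(actual)
    predicted_counts = Counter(predicted)
    expected = sum(
        actual_counts[label] * predicted_counts[label] for label in actual_counts
    ) / (n * n)
    if expected == 1.0:
        # Only one label appears in both lists; kappa is undefined, report 0.0.
        return observed, 0.0
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa
```

For example, with actual labels `[1, 1, 0, 0]` and predictions `[1, 0, 0, 0]`, observed agreement is 0.75 and chance agreement is 0.5, giving a kappa of 0.5.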