Abstract
The Pinyin Tutor has been used for the past few years at over thirty institutions around the world to teach students to transcribe spoken Chinese phrases into Pinyin. Large amounts of data have been collected from this program on the types of errors students make on this task. We analyze these data to discover what makes this task difficult and use our findings to iteratively improve the tutor. For instance, is a particular set of consonants, vowels, or tones causing the most difficulty? Or do certain challenges arise from the context in which these sounds are spoken? Since each Pinyin phrase can be broken down into a set of features (for example, consonants, vowel sounds, and tones), we apply machine learning techniques to uncover the most confounding aspects of this task. We then exploit what we learned to construct and maintain an accurate representation of what the student knows, so that instruction can be tailored to the individual. Our goal is to allow the learner to focus on the aspects of the task with which he or she is having the most difficulty, thereby accelerating his or her understanding of spoken Chinese beyond what would be possible without such focused “intelligent” instruction.
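The feature decomposition mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `split_syllable` and `phrase_features`, the tone-number input format (`zhong1`), the use of `5` for a neutral tone, and the treatment of `y` and `w` as initials are all assumptions made here for illustration.

```python
# Hypothetical sketch: split tone-numbered Pinyin syllables into
# (initial consonant, final vowel sound, tone) features.

# Two-letter initials come first so "zh" is matched before "z".
# Assumption: "y" and "w" are treated as initials for simplicity.
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w"]


def split_syllable(syllable):
    """Split a syllable such as 'zhong1' into (initial, final, tone)."""
    syllable = syllable.lower()
    tone = 5  # assumption: 5 marks a neutral tone when no digit is given
    if syllable and syllable[-1].isdigit():
        tone = int(syllable[-1])
        syllable = syllable[:-1]
    for initial in INITIALS:
        if syllable.startswith(initial):
            return initial, syllable[len(initial):], tone
    # No initial consonant, for example 'an4'.
    return "", syllable, tone


def phrase_features(phrase):
    """Decompose a space-separated Pinyin phrase into per-syllable features."""
    return [split_syllable(s) for s in phrase.split()]
```

Under these assumptions, `phrase_features("zhong1 guo2")` yields one `(initial, final, tone)` triple per syllable, which could then be encoded as indicator features for a model of student errors.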
Original language | English |
---|---|
Pages (from-to) | 3-27 |
Number of pages | 25 |
Journal | Journal of Educational Data Mining |
Volume | 6 |
Issue number | 1 |
Publication status | Published - 2014 |
Externally published | Yes |
Keywords
- Pinyin Tutor
- least angle regression (LARS)
- LIBLINEAR-trained model
- understanding of spoken Chinese
- knowledge tracing
- Hidden Markov Model