Statistical modeling of student performance to improve Chinese dictation skills with an intelligent tutor

John Kowalski, Yanhui Zhang, Geoffrey Gordon

Research output: Journal Publication › Article › peer-review


The Pinyin Tutor has been used for the past few years at over thirty institutions around the world to teach students to transcribe spoken Chinese phrases into Pinyin. Large amounts of data have been collected from this program on the types of errors students make on this task. We analyze these data to discover what makes the task difficult and use our findings to iteratively improve the tutor. For instance, is a particular set of consonants, vowels, or tones causing the most difficulty? Or do certain challenges arise from the context in which these sounds are spoken? Since each Pinyin phrase can be broken down into a set of features (for example, consonants, vowel sounds, and tones), we apply machine learning techniques to uncover the most confounding aspects of this task. We then use what we learned to construct and maintain an accurate representation of what each student knows, so that instruction can be tailored to the individual. Our goal is to allow learners to focus on the aspects of the task with which they are having the most difficulty, thereby accelerating their understanding of spoken Chinese beyond what would be possible without such focused “intelligent” instruction.
Original language: English
Pages (from-to): 3-27
Number of pages: 25
Journal: Journal of Educational Data Mining
Issue number: 1
Publication status: Published - 2014
Externally published: Yes


  • Pinyin Tutor
  • least angle regression (LARS)
  • LIBLINEAR-trained model
  • understanding of spoken Chinese
  • knowledge tracing
  • Hidden Markov Model

