How effective are lexical richness measures for differentiations of vocabulary proficiency? a comprehensive examination with clustering analysis

Yanhui Zhang, Weiping Wu

    Research output: Journal Publication, Article, peer-review

    Abstract

    This study proposed an innovative automated approach to differentiation of the vocabulary proficiency of Chinese speakers. A robust K-means algorithm was designed to compare the oral proficiency between L1 and L2 Chinese speakers regarding lexical richness and how relatively effective the various lexical measures were in performing the differentiation task. Eighteen lexical richness measures were surveyed and compared using the clustering analysis. The effectiveness of each selected measure as well as an overall evaluation of all the measures for the concerned differentiation tasks were comprehensively calibrated. The results demonstrate that, while the L1 versus L2 group difference in lexical richness was observed with statistical significance for each of the chosen measures, the clustering and membership prediction accuracy of individual speakers varied greatly from one measure to another. The implication is that a more fully defined metric of lexical richness is still a worthwhile endeavor for language proficiency assessment, with optimal directions for such endeavors discussed in the concluding remarks.
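The clustering idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' robust K-means algorithm, and it does not reproduce their eighteen measures. It assumes a single measure (type-token ratio), a plain two-cluster K-means on scalar scores, and a tiny set of made-up token lists with hypothetical group labels. The function names are illustrative only.

```python
from typing import List, Sequence


def type_token_ratio(tokens: Sequence[str]) -> float:
    """Distinct tokens divided by total tokens (one common lexical richness measure)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def kmeans_1d_two_clusters(values: Sequence[float], max_iter: int = 100) -> List[int]:
    """Plain (non-robust) two-cluster K-means on scalar scores.

    Returns one label per value: 0 for the lower-centroid cluster, 1 for the higher.
    """
    if not values:
        raise ValueError("values must be non-empty")
    low, high = min(values), max(values)  # initialise centroids at the extremes
    labels = None
    for _ in range(max_iter):
        # Assignment step: each score joins the nearer centroid.
        new_labels = [0 if abs(v - low) <= abs(v - high) else 1 for v in values]
        if new_labels == labels:  # assignments stable, so the algorithm has converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        group0 = [v for v, lab in zip(values, labels) if lab == 0]
        group1 = [v for v, lab in zip(values, labels) if lab == 1]
        if group0:
            low = sum(group0) / len(group0)
        if group1:
            high = sum(group1) / len(group1)
    return labels


def clustering_accuracy(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Share of speakers placed in the right group, allowing for swapped cluster names."""
    agree = sum(p == a for p, a in zip(predicted, actual)) / len(actual)
    return max(agree, 1.0 - agree)


# Hypothetical toy transcripts (placeholder tokens, not data from the study).
toy_transcripts = [
    "a b c d e f".split(),
    "a b c d e a".split(),
    "a a b b a a".split(),
    "a b a b a c".split(),
]
toy_groups = [1, 1, 0, 0]  # hypothetical group membership, for illustration only

scores = [type_token_ratio(t) for t in toy_transcripts]
predicted_groups = kmeans_1d_two_clusters(scores)
accuracy = clustering_accuracy(predicted_groups, toy_groups)
```

Running the same procedure once per measure and comparing the resulting accuracies is one way to read the abstract's point that group-level differences can be significant while speaker-level membership prediction varies from measure to measure.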

    Original language: English
    Article number: 15
    Journal: Language Testing in Asia
    Volume: 11
    Issue number: 1
    Publication status: Published - Dec 2021

    Keywords

    • Chinese as a foreign language
    • Clustering analysis
    • L1 and L2 comparison
    • Language proficiency
    • Language testing
    • Lexical richness measures

    ASJC Scopus subject areas

    • Language and Linguistics
    • Linguistics and Language
