How effective are lexical richness measures for differentiations of vocabulary proficiency? A comprehensive examination with clustering analysis

Yanhui Zhang; Weiping Wu

doi:10.1186/s40468-021-00133-6

Research output: Journal Publication › Article › peer-review

12 Citations (Scopus)

Abstract

This study proposed an innovative automated approach to differentiation of the vocabulary proficiency of Chinese speakers. A robust K-means algorithm was designed to compare the oral proficiency between L1 and L2 Chinese speakers regarding lexical richness and how relatively effective the various lexical measures were in performing the differentiation task. Eighteen lexical richness measures were surveyed and compared using the clustering analysis. The effectiveness of each selected measure as well as an overall evaluation of all the measures for the concerned differentiation tasks were comprehensively calibrated. The results demonstrate that, while the L1 versus L2 group difference in lexical richness was observed with statistical significance for each of the chosen measures, the clustering and membership prediction accuracy of individual speakers varied greatly from one measure to another. The implication is that a more fully defined metric of lexical richness is still a worthwhile endeavor for language proficiency assessment, with optimal directions for such endeavors discussed in the concluding remarks.
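The abstract names neither the eighteen measures nor the details of the robust K-means design, so the following is only a minimal sketch of the general idea: score each speaker's token list with a lexical richness measure, split the scores into two clusters, and check how well the clusters match the known L1/L2 grouping. The two measures shown (type-token ratio and Guiraud's root TTR), the plain two-cluster Lloyd's algorithm, every function name, and the toy token lists are assumptions made for illustration. None of them is taken from the paper.

```python
import math


def type_token_ratio(tokens):
    """Distinct tokens divided by total tokens (TTR)."""
    return len(set(tokens)) / len(tokens)


def root_ttr(tokens):
    """Guiraud's index: distinct tokens divided by sqrt(total tokens)."""
    return len(set(tokens)) / math.sqrt(len(tokens))


def kmeans_1d(values, iterations=100):
    """Plain two-cluster Lloyd's algorithm on scalar scores.

    Centroids start at the minimum and maximum score, so label 0 is the
    lower-scoring cluster and label 1 the higher-scoring one.
    This is ordinary K-means, not the robust variant the study describes.
    """
    centroids = [min(values), max(values)]
    labels = []
    for _ in range(iterations):
        new_labels = [
            0 if abs(v - centroids[0]) <= abs(v - centroids[1]) else 1
            for v in values
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for k in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:
                centroids[k] = sum(members) / len(members)
    return labels


def clustering_accuracy(predicted, actual):
    """Share of speakers placed in the right group, ignoring label naming."""
    agree = sum(p == a for p, a in zip(predicted, actual))
    return max(agree, len(actual) - agree) / len(actual)


# Hypothetical toy data: (group, tokens), with 1 = L1 speaker, 0 = L2 speaker.
speakers = [
    (1, "a b c d e f g h".split()),
    (1, "a b c d e f a g".split()),
    (0, "a b a b c a b c".split()),
    (0, "a a b b a c a b".split()),
]
scores = [type_token_ratio(tokens) for _, tokens in speakers]
predicted = kmeans_1d(scores)
accuracy = clustering_accuracy(predicted, [group for group, _ in speakers])
print(scores, predicted, accuracy)
```

Running the same loop once per measure would give one accuracy figure per measure, which is the kind of per-measure comparison the abstract describes. Real transcripts would need word segmentation and length normalization first, since plain TTR is sensitive to text length.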

Original language	English
Article number	15
Journal	Language Testing in Asia
Volume	11
Issue number	1
DOIs	https://doi.org/10.1186/s40468-021-00133-6
Publication status	Published - Dec 2021

Keywords

Chinese as a foreign language
Clustering analysis
L1 and L2 comparison
Language proficiency
Language testing
Lexical richness measures

ASJC Scopus subject areas

Language and Linguistics
Linguistics and Language

Cite this

@article{27e4f240700c4e4697489e21b918cb0d,

title = "How effective are lexical richness measures for differentiations of vocabulary proficiency? A comprehensive examination with clustering analysis",

abstract = "This study proposed an innovative automated approach to differentiation of the vocabulary proficiency of Chinese speakers. A robust K-means algorithm was designed to compare the oral proficiency between L1 and L2 Chinese speakers regarding lexical richness and how relatively effective the various lexical measures were in performing the differentiation task. Eighteen lexical richness measures were surveyed and compared using the clustering analysis. The effectiveness of each selected measure as well as an overall evaluation of all the measures for the concerned differentiation tasks were comprehensively calibrated. The results demonstrate that, while the L1 versus L2 group difference in lexical richness was observed with statistical significance for each of the chosen measures, the clustering and membership prediction accuracy of individual speakers varied greatly from one measure to another. The implication is that a more fully defined metric of lexical richness is still a worthwhile endeavor for language proficiency assessment, with optimal directions for such endeavors discussed in the concluding remarks.",

keywords = "Chinese as a foreign language, Clustering analysis, L1 and L2 comparison, Language proficiency, Language testing, Lexical richness measures",

author = "Yanhui Zhang and Weiping Wu",

note = "Publisher Copyright: {\textcopyright} 2021, The Author(s).",

year = "2021",

month = dec,

doi = "10.1186/s40468-021-00133-6",

language = "English",

volume = "11",

journal = "Language Testing in Asia",

issn = "2229-0443",

publisher = "SpringerOpen",

number = "1",

}

TY - JOUR

T1 - How effective are lexical richness measures for differentiations of vocabulary proficiency? A comprehensive examination with clustering analysis

AU - Zhang, Yanhui

AU - Wu, Weiping

PY - 2021/12

Y1 - 2021/12

N2 - This study proposed an innovative automated approach to differentiation of the vocabulary proficiency of Chinese speakers. A robust K-means algorithm was designed to compare the oral proficiency between L1 and L2 Chinese speakers regarding lexical richness and how relatively effective the various lexical measures were in performing the differentiation task. Eighteen lexical richness measures were surveyed and compared using the clustering analysis. The effectiveness of each selected measure as well as an overall evaluation of all the measures for the concerned differentiation tasks were comprehensively calibrated. The results demonstrate that, while the L1 versus L2 group difference in lexical richness was observed with statistical significance for each of the chosen measures, the clustering and membership prediction accuracy of individual speakers varied greatly from one measure to another. The implication is that a more fully defined metric of lexical richness is still a worthwhile endeavor for language proficiency assessment, with optimal directions for such endeavors discussed in the concluding remarks.

KW - Chinese as a foreign language

KW - Clustering analysis

KW - L1 and L2 comparison

KW - Language proficiency

KW - Language testing

KW - Lexical richness measures

UR - http://www.scopus.com/inward/record.url?scp=85113959410&partnerID=8YFLogxK

U2 - 10.1186/s40468-021-00133-6

DO - 10.1186/s40468-021-00133-6

M3 - Article

AN - SCOPUS:85113959410

SN - 2229-0443

VL - 11

JO - Language Testing in Asia

JF - Language Testing in Asia

IS - 1

M1 - 15

ER -
