A corpus based analysis of lexical richness of Beijing Mandarin speakers: Variable identification and model construction

Research output: Journal Publication, Article, peer-review

12 Citations (Scopus)

Abstract

This work concerns the lexical richness of Beijing Mandarin speakers, measured by entropy. The data used for the study are the Beijing Mandarin Spoken Corpora, a conversational and spontaneous speech corpus of contemporary Beijing Mandarin speakers. Based on sociovariational linguistic hypotheses and data analysis, the study attempts to identify and explain the key demographic and socioeconomic parameters that affect the entropy of each subject's spoken texts. Both one-dimensional and multi-dimensional statistical models are proposed to quantify the relationships between the pertinent measure of lexical richness and the prominent indicative variables, including age, level of education, and profession premium. A multi-dimensional nonlinear model encompassing these findings is designed and calibrated with statistical estimation methods. Possible future directions and applications in relevant fields of applied linguistics are provided.
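As a rough illustration of how entropy can serve as a lexical richness measure, the sketch below computes the Shannon entropy of a speaker's word-frequency distribution. The abstract does not state the exact entropy definition or tokenization used in the study, so the function name `lexical_entropy`, the base-2 logarithm, and the toy token list are all assumptions made for illustration only.

```python
import math
from collections import Counter


def lexical_entropy(tokens):
    """Shannon entropy (in bits) of the word-frequency distribution.

    Assumption: tokens are already segmented words; the study's own
    segmentation and entropy definition may differ.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # an empty text carries no lexical information
    # H = -sum(p * log2(p)) over the relative frequency p of each word type
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical, pre-segmented sample utterance (not taken from the corpus)
sample = ["我", "去", "北京", "我", "去", "学校"]
print(lexical_entropy(sample))
```

Under this definition, a speaker who spreads usage evenly over many word types scores higher than one who repeats a few words, which is why entropy can stand in for lexical richness.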

Original language: English
Pages (from-to): 60-69
Number of pages: 10
Journal: Language Sciences
Volume: 44
Publication status: Published - Jul 2014
Externally published: Yes

Keywords

  • Beijing Mandarin
  • Corpus linguistics
  • Entropy
  • Lexical richness
  • Sociovariational analysis
  • Statistical modeling

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language
