A corpus based analysis of lexical richness of Beijing Mandarin speakers: Variable identification and model construction

Yanhui Zhang

doi:10.1016/j.langsci.2013.12.003

Research output: Journal Publication › Article › peer-review

12 Citations (Scopus)

Abstract

This work concerns the lexical richness of Beijing Mandarin speakers measured by entropy. The data used for the study are the Beijing Mandarin Spoken Corpora, a conversational and spontaneous speech corpus of contemporary Beijing Mandarin speakers. Based on the sociovariational linguistic hypotheses and data analysis, the study attempts to identify and explain the key demographical and socioeconomic parameters that impact the entropy of each subject's spoken texts. Both one-dimensional and multi-dimensional statistical models are proposed to quantify the relationships between the pertinent measure of lexical richness and the prominent indicative variables, including age, level of education, and profession premium. A multi-dimensional nonlinear model encompassing these findings is designed and calibrated with statistical estimation methods. Possible future directions and applications in relevant field of applied linguistics are provided.
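The abstract names entropy as the measure of lexical richness but does not state the formula. As a minimal sketch, assuming the standard Shannon entropy over a speaker's word-frequency distribution, H = -Σ p(w) log2 p(w), the quantity could be computed as below. The function name, the toy transcripts, and the whitespace tokenization are illustrative assumptions; real Mandarin speech data would need a word segmenter first, and the paper's exact definition may differ.

```python
import math
from collections import Counter


def shannon_entropy(tokens):
    """Shannon entropy (in bits) of the empirical word distribution.

    `tokens` is a list of already-segmented words for one speaker.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # an empty transcript carries no information
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical toy transcripts (not taken from the corpus), pre-segmented
# by whitespace purely for illustration.
repetitive = "好 好 好 好".split()
varied = "我 今天 去 公园".split()

h_repetitive = shannon_entropy(repetitive)  # one word type only
h_varied = shannon_entropy(varied)          # four equally frequent word types
```

Under this definition, a speaker who repeats a single word scores zero, while a speaker whose words are spread evenly over many types scores higher. Such a per-speaker score would be the response variable that predictors like age, education, and profession are modeled against.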

Original language	English
Pages (from-to)	60-69
Number of pages	10
Journal	Language Sciences
Volume	44
DOIs	https://doi.org/10.1016/j.langsci.2013.12.003
Publication status	Published - Jul 2014
Externally published	Yes

Keywords

Beijing Mandarin
Corpus linguistics
Entropy
Lexical richness
Sociovariational analysis
Statistical modeling

ASJC Scopus subject areas

Language and Linguistics
Linguistics and Language

Cite this

@article{6ac639be457d4a99913b213245e2f20f,

title = "A corpus based analysis of lexical richness of Beijing Mandarin speakers: Variable identification and model construction",

abstract = "This work concerns the lexical richness of Beijing Mandarin speakers measured by entropy. The data used for the study are the Beijing Mandarin Spoken Corpora, a conversational and spontaneous speech corpus of contemporary Beijing Mandarin speakers. Based on the sociovariational linguistic hypotheses and data analysis, the study attempts to identify and explain the key demographical and socioeconomic parameters that impact the entropy of each subject's spoken texts. Both one-dimensional and multi-dimensional statistical models are proposed to quantify the relationships between the pertinent measure of lexical richness and the prominent indicative variables, including age, level of education, and profession premium. A multi-dimensional nonlinear model encompassing these findings is designed and calibrated with statistical estimation methods. Possible future directions and applications in relevant field of applied linguistics are provided.",

keywords = "Beijing Mandarin, Corpus linguistics, Entropy, Lexical richness, Sociovariational analysis, Statistical modeling",

author = "Yanhui Zhang",

year = "2014",

month = jul,

doi = "10.1016/j.langsci.2013.12.003",

language = "English",

volume = "44",

pages = "60--69",

journal = "Language Sciences",

issn = "0388-0001",

publisher = "Elsevier Ltd.",

}

TY - JOUR

T1 - A corpus based analysis of lexical richness of Beijing Mandarin speakers

T2 - Variable identification and model construction

AU - Zhang, Yanhui

PY - 2014/7

Y1 - 2014/7

N2 - This work concerns the lexical richness of Beijing Mandarin speakers measured by entropy. The data used for the study are the Beijing Mandarin Spoken Corpora, a conversational and spontaneous speech corpus of contemporary Beijing Mandarin speakers. Based on the sociovariational linguistic hypotheses and data analysis, the study attempts to identify and explain the key demographical and socioeconomic parameters that impact the entropy of each subject's spoken texts. Both one-dimensional and multi-dimensional statistical models are proposed to quantify the relationships between the pertinent measure of lexical richness and the prominent indicative variables, including age, level of education, and profession premium. A multi-dimensional nonlinear model encompassing these findings is designed and calibrated with statistical estimation methods. Possible future directions and applications in relevant field of applied linguistics are provided.

AB - This work concerns the lexical richness of Beijing Mandarin speakers measured by entropy. The data used for the study are the Beijing Mandarin Spoken Corpora, a conversational and spontaneous speech corpus of contemporary Beijing Mandarin speakers. Based on the sociovariational linguistic hypotheses and data analysis, the study attempts to identify and explain the key demographical and socioeconomic parameters that impact the entropy of each subject's spoken texts. Both one-dimensional and multi-dimensional statistical models are proposed to quantify the relationships between the pertinent measure of lexical richness and the prominent indicative variables, including age, level of education, and profession premium. A multi-dimensional nonlinear model encompassing these findings is designed and calibrated with statistical estimation methods. Possible future directions and applications in relevant field of applied linguistics are provided.

KW - Beijing Mandarin

KW - Corpus linguistics

KW - Entropy

KW - Lexical richness

KW - Sociovariational analysis

KW - Statistical modeling

UR - http://www.scopus.com/inward/record.url?scp=84899872939&partnerID=8YFLogxK

U2 - 10.1016/j.langsci.2013.12.003

DO - 10.1016/j.langsci.2013.12.003

M3 - Article

AN - SCOPUS:84899872939

SN - 0388-0001

VL - 44

SP - 60

EP - 69

JO - Language Sciences

JF - Language Sciences

ER -
