Modelling the lexical complexity of homogenous texts: a time series approach

Yanhui Zhang

doi:10.1007/s11135-022-01451-4

School of Education and English

Research output: Journal Publication › Article › peer-review

1 Citation (Scopus)

Abstract

Lexical complexity of homogeneous texts, especially when produced by an institutional author over time, exhibits a generally observed increasing trend with local random fluctuations. Such an irreversible entropic process fits very cogently into the dynamical complexity system theory, where the social, economic, and cultural missions such texts set to serve constitute the underlying driving momentum for the texts to adapt themselves from low to high complexity. Structural equations have been shown effective in modeling such macroscopic behavior of the entropic process of the homogeneous texts. The current work formulates the problem from a time series modeling approach applied to a large sociolinguistic corpus in written Chinese. The findings show that such an alternative approach not only produces as valid models with strong goodness of fit as the structural equation approach, but also exhibits, by design, additional benefits in explaining the entropic process of homogeneous texts in the dynamical complexity system framework. Some technical challenges, such as phase change in model calibration, are also solved with less cost using the newly proposed approach. Further directions are pointed out to more fully compare these approaches in the setup of the current study and corpus linguistics in general.

Original language	English
Pages (from-to)	2033-2052
Number of pages	20
Journal	Quality and Quantity
Volume	57
Issue number	3
DOIs	https://doi.org/10.1007/s11135-022-01451-4
Publication status	Published - 2022

Keywords

ARIMA model
Dynamical complexity
Entropy
Homogeneous texts
Time series

ASJC Scopus subject areas

Statistics and Probability
General Social Sciences

Cite this

@article{c488e69d5b1240bea5ac59f92842754d,
  title = "Modelling the lexical complexity of homogenous texts: a time series approach",
  abstract = "Lexical complexity of homogeneous texts, especially when produced by an institutional author over time, exhibits a generally observed increasing trend with local random fluctuations. Such an irreversible entropic process fits very cogently into the dynamical complexity system theory, where the social, economic, and cultural missions such texts set to serve constitute the underlying driving momentum for the texts to adapt themselves from low to high complexity. Structural equations have been shown effective in modeling such macroscopic behavior of the entropic process of the homogeneous texts. The current work formulates the problem from a time series modeling approach applied to a large sociolinguistic corpus in written Chinese. The findings show that such an alternative approach not only produces as valid models with strong goodness of fit as the structural equation approach, but also exhibits, by design, additional benefits in explaining the entropic process of homogeneous texts in the dynamical complexity system framework. Some technical challenges, such as phase change in model calibration, are also solved with less cost using the newly proposed approach. Further directions are pointed out to more fully compare these approaches in the setup of the current study and corpus linguistics in general.",
  keywords = "ARIMA model, Dynamical complexity, Entropy, Homogeneous texts, Time series",
  author = "Yanhui Zhang",
  note = "Publisher Copyright: {\textcopyright} 2022, The Author(s), under exclusive licence to Springer Nature B.V.",
  year = "2022",
  doi = "10.1007/s11135-022-01451-4",
  language = "English",
  volume = "57",
  pages = "2033--2052",
  journal = "Quality and Quantity",
  issn = "0033-5177",
  publisher = "Springer Netherlands",
  number = "3",
}

TY  - JOUR
T1  - Modelling the lexical complexity of homogenous texts
T2  - a time series approach
AU  - Zhang, Yanhui
PY  - 2022
Y1  - 2022
N2  - Lexical complexity of homogeneous texts, especially when produced by an institutional author over time, exhibits a generally observed increasing trend with local random fluctuations. Such an irreversible entropic process fits very cogently into the dynamical complexity system theory, where the social, economic, and cultural missions such texts set to serve constitute the underlying driving momentum for the texts to adapt themselves from low to high complexity. Structural equations have been shown effective in modeling such macroscopic behavior of the entropic process of the homogeneous texts. The current work formulates the problem from a time series modeling approach applied to a large sociolinguistic corpus in written Chinese. The findings show that such an alternative approach not only produces as valid models with strong goodness of fit as the structural equation approach, but also exhibits, by design, additional benefits in explaining the entropic process of homogeneous texts in the dynamical complexity system framework. Some technical challenges, such as phase change in model calibration, are also solved with less cost using the newly proposed approach. Further directions are pointed out to more fully compare these approaches in the setup of the current study and corpus linguistics in general.
AB  - Lexical complexity of homogeneous texts, especially when produced by an institutional author over time, exhibits a generally observed increasing trend with local random fluctuations. Such an irreversible entropic process fits very cogently into the dynamical complexity system theory, where the social, economic, and cultural missions such texts set to serve constitute the underlying driving momentum for the texts to adapt themselves from low to high complexity. Structural equations have been shown effective in modeling such macroscopic behavior of the entropic process of the homogeneous texts. The current work formulates the problem from a time series modeling approach applied to a large sociolinguistic corpus in written Chinese. The findings show that such an alternative approach not only produces as valid models with strong goodness of fit as the structural equation approach, but also exhibits, by design, additional benefits in explaining the entropic process of homogeneous texts in the dynamical complexity system framework. Some technical challenges, such as phase change in model calibration, are also solved with less cost using the newly proposed approach. Further directions are pointed out to more fully compare these approaches in the setup of the current study and corpus linguistics in general.
KW  - ARIMA model
KW  - Dynamical complexity
KW  - Entropy
KW  - Homogeneous texts
KW  - Time series
UR  - http://www.scopus.com/inward/record.url?scp=85132178935&partnerID=8YFLogxK
U2  - 10.1007/s11135-022-01451-4
DO  - 10.1007/s11135-022-01451-4
M3  - Article
AN  - SCOPUS:85132178935
SN  - 0033-5177
VL  - 57
SP  - 2033
EP  - 2052
JO  - Quality and Quantity
JF  - Quality and Quantity
IS  - 3
ER  -
