TY - JOUR
T1 - Modelling the lexical complexity of homogenous texts
T2 - a time series approach
AU - Zhang, Yanhui
N1 - Publisher Copyright: © 2022, The Author(s), under exclusive licence to Springer Nature B.V.
PY - 2022
Y1 - 2022
AB - The lexical complexity of homogeneous texts, especially those produced by an institutional author over time, generally exhibits an increasing trend with local random fluctuations. Such an irreversible entropic process fits naturally into dynamical complexity system theory, in which the social, economic, and cultural missions that such texts are set to serve constitute the underlying driving momentum for the texts to adapt from low to high complexity. Structural equations have been shown to be effective in modeling this macroscopic behavior of the entropic process of homogeneous texts. The current work formulates the problem as a time series model applied to a large sociolinguistic corpus of written Chinese. The findings show that this alternative approach not only produces models that are as valid as those of the structural equation approach, with strong goodness of fit, but also offers, by design, additional benefits in explaining the entropic process of homogeneous texts within the dynamical complexity system framework. Some technical challenges, such as phase change in model calibration, are also solved at lower cost with the newly proposed approach. Further directions are pointed out for comparing these approaches more fully, both in the setup of the current study and in corpus linguistics in general.
KW - ARIMA model
KW - Dynamical complexity
KW - Entropy
KW - Homogeneous texts
KW - Time series
UR - http://www.scopus.com/inward/record.url?scp=85132178935&partnerID=8YFLogxK
U2 - 10.1007/s11135-022-01451-4
DO - 10.1007/s11135-022-01451-4
M3 - Article
AN - SCOPUS:85132178935
SN - 0033-5177
JO - Quality and Quantity
JF - Quality and Quantity
ER -