Modelling the lexical complexity of homogenous texts: a time series approach

Research output: Journal Publication › Article › peer-review


The lexical complexity of homogeneous texts, especially those produced by an institutional author over time, generally shows an increasing trend with local random fluctuations. Such an irreversible entropic process fits naturally into dynamical complexity system theory, in which the social, economic, and cultural missions that such texts are meant to serve constitute the underlying momentum driving the texts to adapt from low to high complexity. Structural equations have been shown to be effective in modeling this macroscopic behavior of the entropic process of homogeneous texts. The current work formulates the problem as a time series modeling task applied to a large sociolinguistic corpus in written Chinese. The findings show that this alternative approach not only produces valid models with goodness of fit as strong as that of the structural equation approach, but also offers, by design, additional benefits in explaining the entropic process of homogeneous texts within the dynamical complexity system framework. Some technical challenges, such as phase change in model calibration, are also solved at lower cost using the newly proposed approach. Further directions are pointed out for comparing these approaches more fully, both in the setup of the current study and in corpus linguistics in general.
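As a rough illustration of what a time series formulation of "an increasing trend with local random fluctuations" can look like, the sketch below fits an ARIMA(1,1,0) model with drift by ordinary least squares on the first differences. Everything in it is an assumption made for illustration: the abstract does not state the model orders, the estimation method, or the complexity measure used in the study, and the series here is synthetic rather than drawn from the corpus. The function names (`difference`, `fit_ar1_with_constant`, `forecast`, `simulate`) are hypothetical.

```python
import random


def difference(series):
    """First difference of a series (the d=1 step of an ARIMA model)."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1_with_constant(x):
    """Least-squares fit of x[t] = c + phi * x[t-1] + e[t]; returns (c, phi)."""
    prev, curr = x[:-1], x[1:]
    n = len(curr)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    cov = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(prev, curr))
    var = sum((p - mean_prev) ** 2 for p in prev)
    phi = cov / var
    c = mean_curr - phi * mean_prev
    return c, phi


def forecast(series, c, phi, steps):
    """Iterate the AR(1) on the differences and add them back to the level."""
    level = series[-1]
    diff = series[-1] - series[-2]
    out = []
    for _ in range(steps):
        diff = c + phi * diff
        level += diff
        out.append(level)
    return out


def simulate(n, c, phi, sigma, seed=0):
    """Synthetic stand-in for a complexity series: drift plus correlated noise."""
    rng = random.Random(seed)
    level, diff = 0.0, 0.0
    series = [level]
    for _ in range(n - 1):
        diff = c + phi * diff + rng.gauss(0.0, sigma)
        level += diff
        series.append(level)
    return series


# Assumed parameters, chosen only so the example has a visible upward trend.
synthetic = simulate(n=2000, c=0.05, phi=0.4, sigma=0.1)
c_hat, phi_hat = fit_ar1_with_constant(difference(synthetic))
predicted = forecast(synthetic, c_hat, phi_hat, steps=5)
```

In this sketch the positive constant `c` plays the role of the long-run upward drift in complexity, while `phi` captures how strongly one period's change carries over into the next. A real analysis would select the model orders from the data and check residual diagnostics, typically with an established statistics library rather than hand-written estimation.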

Original language: English
Pages (from-to): 2033-2052
Number of pages: 20
Journal: Quality and Quantity
Issue number: 3
Publication status: Published - 2022


Keywords

  • ARIMA model
  • Dynamical complexity
  • Entropy
  • Homogeneous texts
  • Time series

ASJC Scopus subject areas

  • Statistics and Probability
  • Social Sciences (all)

