Evidence-based measures informed by advanced data analysis and prediction can help control COVID-19 effectively. However, although valuable insights are available, robust and rigorous research on the factors that shape COVID-19 transmission at the city cluster level remains scarce. To bridge this gap, we adopted a data-driven hierarchical modeling approach to identify the most influential factors shaping COVID-19 transmission across Chinese cities and city clusters. The data come from official Chinese sources, and the conclusions drawn from the hierarchical modeling are systematic, multifaceted, and comprehensive. To further improve rigor, the study uses SPSS, Python, and RStudio to conduct multiple linear regression and polynomial best subset regression (PBSR) analyses for the hierarchical modeling. The regression models use the magnitudes of the relevant factors in nine Chinese city clusters, comprising 45 cities at different cluster levels, to examine transmission at the city cluster scale and to explore the correlations among these factors across cities. The 12 initial factors are ‘Urban population ratio’, ‘Retail sales of consumer goods’, ‘Number of tourists’, ‘Tourism Income’, ‘Ratio of the elderly population (> 60 years old) in this city’, ‘population density’, ‘Mobility scale (move in/inbound) during the spring festival’, ‘Ratio of Population and Health facilities’, ‘Jobless rate (%)’, ‘The straight-line distance from original epicenter Wuhan to this city’, ‘urban per capita GDP’, and ‘the prevalence of the COVID-19’. The results provide rigorously tested, evidence-based insights into the most instrumental factors shaping COVID-19 transmission across cities and regions in China.
Overall, the study found that per capita GDP and population mobility rates were the most influential factors in the prevalence of COVID-19 in a city. These findings could help health experts and government officials design and develop evidence-based, effective public health policies to curb the spread of the COVID-19 pandemic.
- multiple linear regression
- polynomial best subset regression
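
To make the PBSR idea concrete, the following is a minimal sketch of one way such a procedure could be set up. It is not the study's actual pipeline. It assumes that each factor is expanded into powers up to degree 2 without cross terms, that every subset of up to three terms is fitted by ordinary least squares, and that subsets are ranked by adjusted R². The function names, the selection criterion, and the three factor labels are illustrative assumptions, and the 45 rows are randomly generated synthetic values, not the study's data.

```python
import itertools
import numpy as np


def adjusted_r2(y, y_hat, n_terms):
    """Adjusted R^2 for a fit with n_terms predictors plus an intercept."""
    n = len(y)
    ss_res = float(np.sum((y - y_hat) ** 2))
    ss_tot = float(np.sum((y - np.mean(y)) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_terms - 1)


def polynomial_best_subset(X, y, names, degree=2, max_terms=3):
    """Exhaustively search subsets of polynomial terms (hypothetical helper)."""
    # Expand each factor into powers 1..degree (no cross terms: a simplification).
    cols, labels = [], []
    for j, name in enumerate(names):
        for d in range(1, degree + 1):
            cols.append(X[:, j] ** d)
            labels.append(name if d == 1 else f"{name}^{d}")
    Z = np.column_stack(cols)

    best_score, best_terms, best_coef = -np.inf, (), None
    for k in range(1, max_terms + 1):
        for subset in itertools.combinations(range(Z.shape[1]), k):
            # Design matrix: intercept column plus the chosen terms.
            A = np.column_stack([np.ones(len(y)), Z[:, list(subset)]])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            s = adjusted_r2(y, A @ coef, k)
            if s > best_score:
                best_score = s
                best_terms = tuple(labels[i] for i in subset)
                best_coef = coef
    return best_score, best_terms, best_coef


# Synthetic illustration only: 45 rows mirror the number of cities,
# but every value here is random and carries no empirical meaning.
rng = np.random.default_rng(0)
names = ["gdp", "mobility", "density"]  # illustrative labels
X = rng.normal(size=(45, 3))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=45)

score, terms, coef = polynomial_best_subset(X, y, names)
print(terms, round(score, 3))
```

Exhaustive search is feasible here because the candidate pool is small (6 terms, 41 subsets). With all 11 predictors and higher degrees the number of subsets grows combinatorially, so a cap on subset size or a different search strategy would be needed.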