A molecular descriptor-based correlation with the composition of acid-pretreated cornstalk cultivation medium for biohydrogen production using a machine learning approach

Xiyue Zhang, Yixiao Wang, Jing Hu, Qingyue Zhang, Xiaoting Xuan, Lufang Shi, Yong Sun

Research output: Journal Publication › Article › peer-review

3 Citations (Scopus)

Abstract

In this work, machine learning (ML) was used to examine the relationship between the physicochemical properties and concentration levels of 50 typical compounds derived from cornstalk acid hydrolysates during lignocellulosic pretreatment. These compounds, selected to represent the chemical matrix (with <32 % similarity), were analyzed using RDKit's MolecularDescriptorCalculator (MDC), which effectively reduced the extended-connectivity fingerprint (ECFP4) representation from 366 chemical descriptors to 19 key descriptors. Notably, compounds such as glucose, fructose, furfural, lactic acid, acetate, formic acid, 4-hydroxy-3-methoxycinnamic acid, and citric acid exhibited consistent hierarchical clustering in the cultivation media before (Con_int) and after (Con_aft) fermentation. The Gasteiger charge and LogP descriptors were effective in illustrating subtle differences among those compounds. In regression model evaluation, the TensorFlow (TF) model demonstrated a stronger correlation (R² > 75 %) between chemical descriptors and pre-fermentation concentrations (Con_int) than with post-fermentation concentrations (Con_aft). SHapley Additive exPlanations (SHAP) analysis was applied to the TF model to interpret the chemical properties that influence the levels of compounds in the fermentation cultivation medium, with LogP, Gasteiger charge, and aromatic ring count being the most influential for Con_int, and Kappa1, radius of gyration, and hydrogen donors for Con_aft. A lignocellulosic acid hydrolysate compound library (LAHCL) was also constructed for future cheminformatics-based exploration of potential compounds during biohydrogen fermentation. This cheminformatics approach offers valuable insights into predicting compound concentrations, biological activity, and the pool of relevant compounds for dark fermentation with reasonable accuracy.
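
Two quantities in the abstract, the "<32 % similarity" selection criterion and the R² regression score, can be illustrated with a minimal standard-library sketch. It assumes that similarity means the Tanimoto coefficient between fingerprint on-bit sets, which the abstract does not state. The function names and toy fingerprints are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def max_pairwise_similarity(fingerprints: dict) -> float:
    """Largest Tanimoto similarity over all pairs of compounds."""
    return max(
        tanimoto(fingerprints[x], fingerprints[y])
        for x, y in combinations(fingerprints, 2)
    )


def r_squared(observed, predicted) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Toy on-bit sets (hypothetical; real ECFP4 bits would come from a
# cheminformatics toolkit such as RDKit).
toy_fps = {
    "compound_a": {1, 4, 9, 16},
    "compound_b": {2, 4, 8, 32},
    "compound_c": {3, 5, 7, 11},
}

# A compound set satisfies the assumed criterion when no pair exceeds 0.32.
is_diverse = max_pairwise_similarity(toy_fps) < 0.32

# Toy observed vs. predicted concentrations (arbitrary units).
score = r_squared([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```

Under this reading, an R² above 0.75 corresponds to the "R² > 75 %" threshold reported for the pre-fermentation concentrations.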

Original language: English
Pages (from-to): 307-320
Number of pages: 14
Journal: International Journal of Hydrogen Energy
Volume: 123
Publication status: Published - 29 Apr 2025

Keywords

  • Biohydrogen
  • Chemical descriptors
  • Cornstalk acid pretreatment
  • Machine learning

ASJC Scopus subject areas

  • Renewable Energy, Sustainability and the Environment
  • Fuel Technology
  • Condensed Matter Physics
  • Energy Engineering and Power Technology
