Abstract
In this work, machine learning (ML) was used to examine the relationship between the physicochemical properties and concentration levels of 50 typical compounds derived from cornstalk acid hydrolysates during lignocellulosic pretreatment. These compounds were selected to represent the chemical matrix (with <32 % similarity) and were characterized with RDKit's MolecularDescriptorCalculator (MDC) and extended-connectivity fingerprints (ECFP4). The 366 computed chemical descriptors were reduced to 19 key descriptors. Notably, compounds such as glucose, fructose, furfural, lactic acid, acetate, formic acid, 4-hydroxy-3-methoxycinnamic acid, and citric acid grouped consistently in hierarchical clustering of the cultivation media both before (Con_int) and after (Con_aft) fermentation. The Gasteiger charge and LogP descriptors captured subtle differences among these compounds. In the regression model evaluation, the TensorFlow (TF) model showed a stronger correlation (R² > 75 %) between chemical descriptors and pre-fermentation concentrations (Con_int) than between descriptors and post-fermentation concentrations (Con_aft). SHapley Additive exPlanations (SHAP) analysis was applied to the TF model to interpret which chemical properties influence compound concentrations in the fermentation medium: LogP, Gasteiger charge, and aromatic ring count were the most influential for Con_int, while Kappa1, radius of gyration, and hydrogen-bond donor count were the most influential for Con_aft. The lignocellulosic acid hydrolysates compounds library (LAHCL) was also constructed to support future cheminformatics exploration of potential compounds for biohydrogen fermentation. This cheminformatics approach offers insight into predicting compound concentrations, biological activity, and the pool of relevant compounds for dark fermentation with reasonable accuracy.
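The descriptor calculation and similarity screening summarized above can be sketched with RDKit. This is a minimal illustration, not the authors' pipeline, and it requires RDKit to be installed. The six-compound subset, the hand-written SMILES (stereochemistry omitted), the descriptor names (RDKit's Crippen `MolLogP`, the Gasteiger-based `MaxPartialCharge`/`MinPartialCharge`, `NumAromaticRings`, `NumHDonors`, `Kappa1`), the helper names, and the reading of the 32 % cut-off as a greedily applied ECFP4 Tanimoto threshold are all assumptions for illustration.

```python
from itertools import combinations

from rdkit import Chem, DataStructs
from rdkit.Chem import rdFingerprintGenerator
from rdkit.ML.Descriptors import MoleculeDescriptors

# Illustrative subset of compounds named in the abstract; SMILES written by hand,
# stereochemistry omitted.
COMPOUNDS = {
    "glucose": "OCC1OC(O)C(O)C(O)C1O",  # pyranose form
    "furfural": "O=Cc1ccco1",
    "lactic acid": "CC(O)C(=O)O",
    "formic acid": "OC=O",
    "citric acid": "OC(=O)CC(O)(CC(=O)O)C(=O)O",
    "4-hydroxy-3-methoxycinnamic acid": "COc1cc(/C=C/C(=O)O)ccc1O",  # ferulic acid
}

# Hypothetical descriptor subset mirroring LogP, Gasteiger charge, aromatic ring count,
# hydrogen-bond donors and Kappa1 from the abstract; NOT the study's 19 key descriptors.
DESCRIPTOR_NAMES = [
    "MolLogP", "MaxPartialCharge", "MinPartialCharge",
    "NumAromaticRings", "NumHDonors", "Kappa1",
]

# Assumption: "<32 % similarity" interpreted as an ECFP4 Tanimoto threshold.
SIMILARITY_CUTOFF = 0.32

# ECFP4-style fingerprint: Morgan radius 2 (diameter 4), folded to 2048 bits.
_MORGAN = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)


def compute_descriptors(smiles_by_name):
    """Return {name: {descriptor: value}} using RDKit's MolecularDescriptorCalculator."""
    calc = MoleculeDescriptors.MolecularDescriptorCalculator(DESCRIPTOR_NAMES)
    table = {}
    for name, smiles in smiles_by_name.items():
        mol = Chem.MolFromSmiles(smiles)
        if mol is None:
            raise ValueError(f"could not parse SMILES for {name}: {smiles}")
        table[name] = dict(zip(DESCRIPTOR_NAMES, calc.CalcDescriptors(mol)))
    return table


def ecfp4(smiles):
    """Return the ECFP4-style bit vector for one SMILES string."""
    return _MORGAN.GetFingerprint(Chem.MolFromSmiles(smiles))


def greedy_diverse_subset(smiles_by_name, cutoff=SIMILARITY_CUTOFF):
    """Keep a compound only if its Tanimoto similarity to every kept compound is below cutoff."""
    kept = {}
    for name, smiles in smiles_by_name.items():
        fp = ecfp4(smiles)
        if all(DataStructs.TanimotoSimilarity(fp, other) < cutoff for other in kept.values()):
            kept[name] = fp
    return list(kept)


if __name__ == "__main__":
    for name, row in compute_descriptors(COMPOUNDS).items():
        print(name, {key: round(value, 3) for key, value in row.items()})
    print("diverse subset:", greedy_diverse_subset(COMPOUNDS))
```

The greedy filter depends on input order, so a different ordering can keep a different subset. Radius of gyration, also named in the abstract, is left out because it needs a 3D conformer produced by an embedding step.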
| Original language | English |
|---|---|
| Pages (from-to) | 307-320 |
| Number of pages | 14 |
| Journal | International Journal of Hydrogen Energy |
| Volume | 123 |
| Publication status | Published - 29 Apr 2025 |
Keywords
- Biohydrogen
- Chemical descriptors
- Cornstalk acid pretreatment
- Machine learning
ASJC Scopus subject areas
- Renewable Energy, Sustainability and the Environment
- Fuel Technology
- Condensed Matter Physics
- Energy Engineering and Power Technology