Abstract
Mining and expanding high-quality genetic parts for synthetic biology and bioengineering are urgent needs in the research and development of next-generation biotechnology. However, gene mining has relied on sequence homology or ample expert knowledge, which fundamentally limits the establishment of a comprehensive genetic part catalog. In this work, we propose SYMPLEX (synthetic biological part mining platform by large language model–enabled knowledge extraction), a universal gene-mining platform based on large language models. We applied SYMPLEX to mine enzymes responsible for messenger RNA (mRNA) capping, a key process in eukaryotic posttranscriptional modification, and obtained thousands of diverse candidates with traceable evidence from biomedical literature and databases. Of the 46 experimentally tested integral capping enzyme candidates, 14 demonstrated in vivo cross-species capping activity, and 2 displayed superior in vitro activity over the commercial vaccinia capping enzymes currently used in mRNA vaccine production. SYMPLEX provides a distinct paradigm for functional gene mining and offers powerful tools to facilitate knowledge discovery in fundamental research.
Original language | English |
---|---|
Article number | eadt0402 |
Journal | Science Advances |
Volume | 11 |
Issue number | 15 |
Publication status | Published - 11 Apr 2025 |
Externally published | Yes |
ASJC Scopus subject areas
- General