Discovery of diverse and high-quality mRNA capping enzymes through a language model–enabled platform

Tianze Wang, Bowen R. Qin, Sihong Li, Zimo Wang, Xuejian Li, Yuanxu Jiang, Chenrui Qin, Qi Ouyang, Chunbo Lou, Long Qian

Research output: Journal Publication › Article › peer-review

Abstract

Mining and expanding high-quality genetic parts for synthetic biology and bioengineering are urgent needs in the research and development of next-generation biotechnology. However, gene mining has relied on sequence homology or ample expert knowledge, which fundamentally limits the establishment of a comprehensive genetic part catalog. In this work, we propose SYMPLEX (synthetic biological part mining platform by large language model–enabled knowledge extraction), a universal gene-mining platform based on large language models. We applied SYMPLEX to mine enzymes responsible for messenger RNA (mRNA) capping, a key process in eukaryotic posttranscriptional modification, and obtained thousands of diverse candidates with traceable evidence from biomedical literature and databases. Of the 46 experimentally tested integral capping enzyme candidates, 14 demonstrated in vivo cross-species capping activity, and 2 displayed superior in vitro activity over the commercial vaccinia capping enzymes currently used in mRNA vaccine production. SYMPLEX provides a distinct paradigm for functional gene mining and offers powerful tools to facilitate knowledge discovery in fundamental research.

Original language: English
Article number: eadt0402
Journal: Science Advances
Volume: 11
Issue number: 15
Publication status: Published - 11 Apr 2025
Externally published: Yes

ASJC Scopus subject areas

  • General
