Abstract
Integrating diverse biomedical knowledge is essential to enhance the accuracy and efficiency of medical diagnoses, facilitate personalized treatment plans, and ultimately improve patient outcomes. However, Biomedical Information Integration (BII) faces significant challenges due to variations in terminology and the complex structure of entity descriptions across different datasets. A critical step in BII is biomedical entity alignment, which involves accurately identifying and matching equivalent entities across diverse datasets to ensure seamless data integration. In recent years, Large Language Models (LLMs), such as Bidirectional Encoder Representations from Transformers (BERT) models, have emerged as valuable tools for discerning heterogeneous biomedical data due to their deep contextual embeddings and bidirectionality. However, different LLMs capture different nuances and levels of complexity within biomedical data, and no single LLM is effective across all heterogeneous entity matching tasks. To address this issue, we propose a novel Two-Stage LLM construction (TSLLM) framework that adaptively selects and combines LLMs for BII. First, a Multi-Objective Genetic Programming (MOGP) algorithm is proposed to generate versatile high-level LLMs. Then, a Single-Objective Genetic Algorithm (SOGA) that employs a confidence-based strategy is presented to combine the built LLMs, which further improves the power to discriminate between heterogeneous entities. The experiments use OAEI's entity matching datasets, i.e., Benchmark and Conference, along with the LargeBio and Disease and Phenotype datasets, to test the performance of TSLLM. The experimental results validate the efficiency of TSLLM in adaptively differentiating heterogeneous biomedical entities and show that it significantly outperforms leading entity matching techniques.
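The second stage described above, in which a single-objective genetic algorithm combines several matchers and keeps only confident matches, can be illustrated with a minimal sketch. This is not the TSLLM implementation: the abstract does not specify the encoding, the fitness function, or the confidence rule. The sketch assumes that each matcher yields a similarity matrix, that matrices are combined by a weighted average, that fitness is the F-measure against a reference alignment, and that confidence is a fixed threshold. All function names and the toy data are hypothetical.

```python
import random

def combine(sims, weights):
    """Weighted average of several similarity matrices (lists of lists)."""
    total = sum(weights) or 1.0  # avoid division by zero if all weights are 0
    rows, cols = len(sims[0]), len(sims[0][0])
    return [[sum(w * s[i][j] for w, s in zip(weights, sims)) / total
             for j in range(cols)] for i in range(rows)]

def select_alignment(sim, threshold):
    """Assumed confidence rule: keep a source entity's best target only if
    its combined score reaches the threshold."""
    pairs = set()
    for i, row in enumerate(sim):
        j = max(range(len(row)), key=row.__getitem__)
        if row[j] >= threshold:
            pairs.add((i, j))
    return pairs

def f_measure(pred, gold):
    """Harmonic mean of precision and recall against a reference alignment."""
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)

def evolve_weights(sims, gold, threshold=0.5, pop_size=20, generations=30, seed=0):
    """Tiny elitist GA over matcher weights; the single objective is F-measure."""
    rng = random.Random(seed)
    n = len(sims)

    def fitness(w):
        return f_measure(select_alignment(combine(sims, w), threshold), gold)

    # Seed the population with uniform weights plus random candidates.
    population = [[1.0] * n] + [[rng.random() for _ in range(n)]
                                for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        elite = population[: pop_size // 2]  # elitism: the best half survives
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = [rng.choice(genes) for genes in zip(a, b)]  # uniform crossover
            k = rng.randrange(n)  # mutate one weight, clamped to [0, 1]
            child[k] = min(1.0, max(0.0, child[k] + rng.gauss(0, 0.1)))
            children.append(child)
        population = elite + children
    return max(population, key=fitness)

# Toy data (hypothetical): two matchers scoring 3 source vs. 3 target entities.
sim_a = [[0.9, 0.1, 0.2], [0.2, 0.4, 0.6], [0.1, 0.2, 0.8]]
sim_b = [[0.7, 0.3, 0.1], [0.1, 0.9, 0.2], [0.3, 0.1, 0.6]]
gold = {(0, 0), (1, 1), (2, 2)}
best_weights = evolve_weights([sim_a, sim_b], gold)
best_alignment = select_alignment(combine([sim_a, sim_b], best_weights), 0.5)
```

In this toy setting the first matcher alone mismatches the second entity, while the combined matrix recovers the reference pair. This is the kind of complementarity that motivates combining models rather than relying on a single one.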
Original language | English |
---|---|
Journal | IEEE Journal of Biomedical and Health Informatics |
DOIs | |
Publication status | Accepted/In press - 2024 |
Keywords
- Biomedical Information Integration
- Genetic Algorithm
- Genetic Programming
- Large Language Model
ASJC Scopus subject areas
- Computer Science Applications
- Health Informatics
- Electrical and Electronic Engineering
- Health Information Management