Biomedical Information Integration via Adaptive Large Language Model Construction

Xingsi Xue, Mu En Wu, Fazlullah Khan

Research output: Journal Publication, Article, peer-review

Abstract

Integrating diverse biomedical knowledge is essential to enhance the accuracy and efficiency of medical diagnoses, facilitate personalized treatment plans, and ultimately improve patient outcomes. However, Biomedical Information Integration (BII) faces significant challenges due to variations in terminology and the complex structure of entity descriptions across different datasets. A critical step in BII is biomedical entity alignment, which involves accurately identifying and matching equivalent entities across diverse datasets to ensure seamless data integration. In recent years, Large Language Models (LLMs), such as Bidirectional Encoder Representations from Transformers (BERT), have emerged as valuable tools for discerning heterogeneous biomedical data due to their deep contextual embeddings and bidirectionality. However, different LLMs capture different nuances and levels of complexity within biomedical data, and no single LLM is guaranteed to be effective across all heterogeneous entity matching tasks. To address this issue, we propose a novel Two-Stage LLM construction (TSLLM) framework that adaptively selects and combines LLMs for BII. First, a Multi-Objective Genetic Programming (MOGP) algorithm is proposed for generating versatile high-level LLMs. Then, a Single-Objective Genetic Algorithm (SOGA) employing a confidence-based strategy is presented to combine the built LLMs, which further improves the discriminative power for distinguishing heterogeneous entities. The experiments use OAEI's entity matching datasets, i.e., Benchmark and Conference, along with the LargeBio and Disease and Phenotype datasets, to test the performance of TSLLM. The experimental findings validate the efficiency of TSLLM in adaptively differentiating heterogeneous biomedical entities, and show that it significantly outperforms the leading entity matching techniques.
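
The second stage described in the abstract (a single-objective genetic algorithm that combines several models into one match decision) can be illustrated with a toy sketch. This is not the authors' TSLLM or SOGA: the three "matchers", the similarity scores, the gold labels, the 0.5 confidence threshold, the weighted-average combination, and the choice of F1 as the objective are all assumptions made purely for illustration.

```python
import random

# Made-up data: each row holds similarity scores from three hypothetical
# matchers for one candidate entity pair, followed by a gold label (1 = match).
PAIRS = [
    ((0.9, 0.8, 0.7), 1),
    ((0.8, 0.9, 0.4), 1),
    ((0.7, 0.6, 0.9), 1),
    ((0.3, 0.4, 0.9), 0),
    ((0.2, 0.1, 0.6), 0),
    ((0.4, 0.3, 0.9), 0),
]
THRESHOLD = 0.5  # assumed confidence cut-off for accepting a match


def combine(weights, scores):
    """Weighted average of matcher scores, read here as a match confidence."""
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(w * s for w, s in zip(weights, scores)) / total


def f1(weights):
    """Single objective: F1 of the thresholded combined confidences."""
    tp = fp = fn = 0
    for scores, label in PAIRS:
        predicted = combine(weights, scores) >= THRESHOLD
        if predicted and label:
            tp += 1
        elif predicted:
            fp += 1
        elif label:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def evolve(pop_size=20, generations=30, seed=0):
    """Evolve one weight per matcher with a small elitist genetic algorithm."""
    rng = random.Random(seed)
    n = len(PAIRS[0][0])
    population = [[rng.random() for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=f1, reverse=True)
        elite = population[: pop_size // 2]  # truncation selection
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(n)  # mutate one weight, clamped to [0, 1]
            child[i] = min(1.0, max(0.0, child[i] + rng.gauss(0, 0.2)))
            children.append(child)
        population = elite + children
    return max(population, key=f1)


if __name__ == "__main__":
    best = evolve()
    print("weights:", [round(w, 2) for w in best], "F1:", round(f1(best), 2))
```

In this toy data the third matcher is deliberately unreliable, so weighting all matchers equally yields two false positives while down-weighting the third separates the pairs cleanly. The search only has to discover that weighting, which is the general idea of tuning a combination against a single objective.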

Original language: English
Journal: IEEE Journal of Biomedical and Health Informatics
Publication status: Accepted/In press - 2024

Keywords

  • Biomedical Information Integration
  • Genetic Algorithm
  • Genetic Programming
  • Large Language Model

ASJC Scopus subject areas

  • Computer Science Applications
  • Health Informatics
  • Electrical and Electronic Engineering
  • Health Information Management
