Fine-Grained Image-Text Alignment in Medical Imaging Enables Explainable Cyclic Image-Report Generation

Wenting Chen; Linlin Shen; Jingyang Lin; Jiebo Luo; Xiang Li; Yixuan Yuan

doi:10.18653/v1/2024.acl-long.514

Research output: Chapter in Book/Conference proceeding › Conference contribution › peer-review

5 Citations (Scopus)

Abstract

Fine-grained vision-language models (VLM) have been widely used for inter-modality local alignment between the predefined fixed patches and textual words. However, in medical analysis, lesions exhibit varying sizes and positions, and using fixed patches may cause incomplete representations of lesions. Moreover, these methods provide explainability by using heatmaps to show the general image areas potentially associated with texts rather than specific regions, making their explanations not explicit and specific enough. To address these issues, we propose a novel Adaptive patch-word Matching (AdaMatch) model to correlate chest X-ray (CXR) image regions with words in medical reports and apply it to CXR-report generation to provide explainability for the generation process. AdaMatch exploits the fine-grained relation between adaptive patches and words to provide explanations of specific image regions with corresponding words. To capture the abnormal regions of varying sizes and positions, we introduce an Adaptive Patch extraction (AdaPatch) module to acquire adaptive patches for these regions adaptively. Aiming to provide explicit explainability for the CXR-report generation task, we propose an AdaMatch-based bidirectional LLM for Cyclic CXR-report generation (AdaMatch-Cyclic). It employs AdaMatch to obtain the keywords for CXR images and 'keypatches' for medical reports as hints to guide CXR-report generation. Extensive experiments on two publicly available CXR datasets validate the effectiveness of our method and its superior performance over existing methods.

Original language	English
Title of host publication	Long Papers
Editors	Lun-Wei Ku, Andre F. T. Martins, Vivek Srikumar
Publisher	Association for Computational Linguistics (ACL)
Pages	9494-9509
Number of pages	16
ISBN (Electronic)	9798891760943
DOIs	https://doi.org/10.18653/v1/2024.acl-long.514
Publication status	Published - 2024
Externally published	Yes
Event	62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024 - Bangkok, Thailand; Duration: 11 Aug 2024 → 16 Aug 2024

Publication series

Name	Proceedings of the Annual Meeting of the Association for Computational Linguistics
Volume	1
ISSN (Print)	0736-587X

Conference

Conference	62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024
Country/Territory	Thailand
City	Bangkok
Period	11/08/24 → 16/08/24

ASJC Scopus subject areas

Computer Science Applications
Linguistics and Language
Language and Linguistics

Access to Document

10.18653/v1/2024.acl-long.514

Cite this

Chen, W., Shen, L., Lin, J., Luo, J., Li, X., & Yuan, Y. (2024). Fine-Grained Image-Text Alignment in Medical Imaging Enables Explainable Cyclic Image-Report Generation. In L.-W. Ku, A. F. T. Martins, & V. Srikumar (Eds.), Long Papers (pp. 9494-9509). (Proceedings of the Annual Meeting of the Association for Computational Linguistics; Vol. 1). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2024.acl-long.514

Chen, Wenting ; Shen, Linlin ; Lin, Jingyang et al. / Fine-Grained Image-Text Alignment in Medical Imaging Enables Explainable Cyclic Image-Report Generation. Long Papers. editor / Lun-Wei Ku ; Andre F. T. Martins ; Vivek Srikumar. Association for Computational Linguistics (ACL), 2024. pp. 9494-9509 (Proceedings of the Annual Meeting of the Association for Computational Linguistics).

@inproceedings{1a1fa082659b4316bd7edcd7060143ff,

title = "Fine-Grained Image-Text Alignment in Medical Imaging Enables Explainable Cyclic Image-Report Generation",

abstract = "Fine-grained vision-language models (VLM) have been widely used for inter-modality local alignment between the predefined fixed patches and textual words. However, in medical analysis, lesions exhibit varying sizes and positions, and using fixed patches may cause incomplete representations of lesions. Moreover, these methods provide explainability by using heatmaps to show the general image areas potentially associated with texts rather than specific regions, making their explanations not explicit and specific enough. To address these issues, we propose a novel Adaptive patch-word Matching (AdaMatch) model to correlate chest X-ray (CXR) image regions with words in medical reports and apply it to CXR-report generation to provide explainability for the generation process. AdaMatch exploits the fine-grained relation between adaptive patches and words to provide explanations of specific image regions with corresponding words. To capture the abnormal regions of varying sizes and positions, we introduce an Adaptive Patch extraction (AdaPatch) module to acquire adaptive patches for these regions adaptively. Aiming to provide explicit explainability for the CXR-report generation task, we propose an AdaMatch-based bidirectional LLM for Cyclic CXR-report generation (AdaMatch-Cyclic). It employs AdaMatch to obtain the keywords for CXR images and 'keypatches' for medical reports as hints to guide CXR-report generation. Extensive experiments on two publicly available CXR datasets validate the effectiveness of our method and its superior performance over existing methods.",

author = "Wenting Chen and Linlin Shen and Jingyang Lin and Jiebo Luo and Xiang Li and Yixuan Yuan",

note = "Publisher Copyright: {\textcopyright} 2024 Association for Computational Linguistics.; 62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024 ; Conference date: 11-08-2024 Through 16-08-2024",

year = "2024",

doi = "10.18653/v1/2024.acl-long.514",

language = "English",

series = "Proceedings of the Annual Meeting of the Association for Computational Linguistics",

publisher = "Association for Computational Linguistics (ACL)",

pages = "9494--9509",

editor = "Lun-Wei Ku and Andre F. T. Martins and Vivek Srikumar",

booktitle = "Long Papers",

address = "United States",

}

Chen, W, Shen, L, Lin, J, Luo, J, Li, X & Yuan, Y 2024, Fine-Grained Image-Text Alignment in Medical Imaging Enables Explainable Cyclic Image-Report Generation. in L-W Ku, AFT Martins & V Srikumar (eds), Long Papers. Proceedings of the Annual Meeting of the Association for Computational Linguistics, vol. 1, Association for Computational Linguistics (ACL), pp. 9494-9509, 62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024, Bangkok, Thailand, 11/08/24. https://doi.org/10.18653/v1/2024.acl-long.514

Fine-Grained Image-Text Alignment in Medical Imaging Enables Explainable Cyclic Image-Report Generation. / Chen, Wenting; Shen, Linlin; Lin, Jingyang et al.
Long Papers. ed. / Lun-Wei Ku; Andre F. T. Martins; Vivek Srikumar. Association for Computational Linguistics (ACL), 2024. p. 9494-9509 (Proceedings of the Annual Meeting of the Association for Computational Linguistics; Vol. 1).

TY - GEN

T1 - Fine-Grained Image-Text Alignment in Medical Imaging Enables Explainable Cyclic Image-Report Generation

AU - Chen, Wenting

AU - Shen, Linlin

AU - Lin, Jingyang

AU - Luo, Jiebo

AU - Li, Xiang

AU - Yuan, Yixuan

PY - 2024

Y1 - 2024

N2 - Fine-grained vision-language models (VLM) have been widely used for inter-modality local alignment between the predefined fixed patches and textual words. However, in medical analysis, lesions exhibit varying sizes and positions, and using fixed patches may cause incomplete representations of lesions. Moreover, these methods provide explainability by using heatmaps to show the general image areas potentially associated with texts rather than specific regions, making their explanations not explicit and specific enough. To address these issues, we propose a novel Adaptive patch-word Matching (AdaMatch) model to correlate chest X-ray (CXR) image regions with words in medical reports and apply it to CXR-report generation to provide explainability for the generation process. AdaMatch exploits the fine-grained relation between adaptive patches and words to provide explanations of specific image regions with corresponding words. To capture the abnormal regions of varying sizes and positions, we introduce an Adaptive Patch extraction (AdaPatch) module to acquire adaptive patches for these regions adaptively. Aiming to provide explicit explainability for the CXR-report generation task, we propose an AdaMatch-based bidirectional LLM for Cyclic CXR-report generation (AdaMatch-Cyclic). It employs AdaMatch to obtain the keywords for CXR images and 'keypatches' for medical reports as hints to guide CXR-report generation. Extensive experiments on two publicly available CXR datasets validate the effectiveness of our method and its superior performance over existing methods.

AB - Fine-grained vision-language models (VLM) have been widely used for inter-modality local alignment between the predefined fixed patches and textual words. However, in medical analysis, lesions exhibit varying sizes and positions, and using fixed patches may cause incomplete representations of lesions. Moreover, these methods provide explainability by using heatmaps to show the general image areas potentially associated with texts rather than specific regions, making their explanations not explicit and specific enough. To address these issues, we propose a novel Adaptive patch-word Matching (AdaMatch) model to correlate chest X-ray (CXR) image regions with words in medical reports and apply it to CXR-report generation to provide explainability for the generation process. AdaMatch exploits the fine-grained relation between adaptive patches and words to provide explanations of specific image regions with corresponding words. To capture the abnormal regions of varying sizes and positions, we introduce an Adaptive Patch extraction (AdaPatch) module to acquire adaptive patches for these regions adaptively. Aiming to provide explicit explainability for the CXR-report generation task, we propose an AdaMatch-based bidirectional LLM for Cyclic CXR-report generation (AdaMatch-Cyclic). It employs AdaMatch to obtain the keywords for CXR images and 'keypatches' for medical reports as hints to guide CXR-report generation. Extensive experiments on two publicly available CXR datasets validate the effectiveness of our method and its superior performance over existing methods.

UR - http://www.scopus.com/inward/record.url?scp=85203839354&partnerID=8YFLogxK

U2 - 10.18653/v1/2024.acl-long.514

DO - 10.18653/v1/2024.acl-long.514

M3 - Conference contribution

AN - SCOPUS:85203839354

T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics

SP - 9494

EP - 9509

BT - Long Papers

A2 - Ku, Lun-Wei

A2 - Martins, Andre F. T.

A2 - Srikumar, Vivek

PB - Association for Computational Linguistics (ACL)

T2 - 62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024

Y2 - 11 August 2024 through 16 August 2024

ER -

Chen W, Shen L, Lin J, Luo J, Li X, Yuan Y. Fine-Grained Image-Text Alignment in Medical Imaging Enables Explainable Cyclic Image-Report Generation. In Ku LW, Martins AFT, Srikumar V, editors, Long Papers. Association for Computational Linguistics (ACL). 2024. p. 9494-9509. (Proceedings of the Annual Meeting of the Association for Computational Linguistics). doi: 10.18653/v1/2024.acl-long.514
