ADATS: Adaptive RoI-Align based Transformer for End-to-End Text Spotting

Zepeng Huang; Qi Wan; Junliang Chen; Xiaodong Zhao; Kai Ye; Linlin Shen

doi:10.1109/ICME55011.2023.00243

Research output: Chapter in Book/Conference proceeding › Conference contribution › peer-review

3 Citations (Scopus)

Abstract

Scene text spotting has attracted great attention in recent years. Compared with two-stage approaches that locate scene texts in the first stage and recognize them in the second stage, the advantages of joint location and recognition training are not fully explored. In this paper, we present an ADaptive RoI-Align based transformer for end-to-end Text Spotting (ADATS), which simultaneously locates and recognizes text with a single forward pass. By employing an Adaptive RoI-Align, the text features are extracted from the feature extraction network with the original aspect ratio, such that less information is lost during the alignment of arbitrarily-shaped scene text. Attention-based segmentation and recognition heads allow us to simultaneously optimize detection and recognition. Experiments on ICDAR 2015, MSRA-TD500, Total-Text, and CTW1500 demonstrate the effectiveness of our method.

Original language	English
Title of host publication	Proceedings - 2023 IEEE International Conference on Multimedia and Expo, ICME 2023
Publisher	IEEE Computer Society
Pages	1403-1408
Number of pages	6
ISBN (Electronic)	9781665468916
DOIs	https://doi.org/10.1109/ICME55011.2023.00243
Publication status	Published - 2023
Externally published	Yes
Event	2023 IEEE International Conference on Multimedia and Expo, ICME 2023 - Brisbane, Australia; Duration: 10 Jul 2023 → 14 Jul 2023

Publication series

Name	Proceedings - IEEE International Conference on Multimedia and Expo
Volume	2023-July
ISSN (Print)	1945-7871
ISSN (Electronic)	1945-788X

Conference

Conference	2023 IEEE International Conference on Multimedia and Expo, ICME 2023
Country/Territory	Australia
City	Brisbane
Period	10/07/23 → 14/07/23

Keywords

End-to-end text spotting
segmentation
text detection
text recognition

ASJC Scopus subject areas

Computer Networks and Communications
Computer Science Applications

Access to Document

10.1109/ICME55011.2023.00243

Cite this

Huang, Z., Wan, Q., Chen, J., Zhao, X., Ye, K., & Shen, L. (2023). ADATS: Adaptive RoI-Align based Transformer for End-to-End Text Spotting. In Proceedings - 2023 IEEE International Conference on Multimedia and Expo, ICME 2023 (pp. 1403-1408). (Proceedings - IEEE International Conference on Multimedia and Expo; Vol. 2023-July). IEEE Computer Society. https://doi.org/10.1109/ICME55011.2023.00243

@inproceedings{08e08f1d53be40948a65b8c450e2ad90,
title = "ADATS: Adaptive RoI-Align based Transformer for End-to-End Text Spotting",
abstract = "Scene text spotting has attracted great attention in recent years. Compared with two-stage approaches that locate scene texts in the first stage and recognize them in the second stage, the advantages of joint location and recognition training are not fully explored. In this paper, we present an ADaptive RoI-Align based transformer for end-to-end Text Spotting (ADATS), which simultaneously locates and recognizes text with a single forward pass. By employing an Adaptive RoI-Align, the text features are extracted from the feature extraction network with the original aspect ratio, such that less information is lost during the alignment of arbitrarily-shaped scene text. Attention-based segmentation and recognition heads allow us to simultaneously optimize detection and recognition. Experiments on ICDAR 2015, MSRA-TD500, Total-Text, and CTW1500 demonstrate the effectiveness of our method.",
keywords = "End-to-end text spotting, segmentation, text detection, text recognition",
author = "Zepeng Huang and Qi Wan and Junliang Chen and Xiaodong Zhao and Kai Ye and Linlin Shen",
note = "Publisher Copyright: {\textcopyright} 2023 IEEE.; 2023 IEEE International Conference on Multimedia and Expo, ICME 2023 ; Conference date: 10-07-2023 Through 14-07-2023",
year = "2023",
doi = "10.1109/ICME55011.2023.00243",
language = "English",
series = "Proceedings - IEEE International Conference on Multimedia and Expo",
publisher = "IEEE Computer Society",
pages = "1403--1408",
booktitle = "Proceedings - 2023 IEEE International Conference on Multimedia and Expo, ICME 2023",
address = "United States",
}

Huang, Z, Wan, Q, Chen, J, Zhao, X, Ye, K & Shen, L 2023, ADATS: Adaptive RoI-Align based Transformer for End-to-End Text Spotting. in Proceedings - 2023 IEEE International Conference on Multimedia and Expo, ICME 2023. Proceedings - IEEE International Conference on Multimedia and Expo, vol. 2023-July, IEEE Computer Society, pp. 1403-1408, 2023 IEEE International Conference on Multimedia and Expo, ICME 2023, Brisbane, Australia, 10/07/23. https://doi.org/10.1109/ICME55011.2023.00243

ADATS: Adaptive RoI-Align based Transformer for End-to-End Text Spotting. / Huang, Zepeng; Wan, Qi; Chen, Junliang et al.
Proceedings - 2023 IEEE International Conference on Multimedia and Expo, ICME 2023. IEEE Computer Society, 2023. p. 1403-1408 (Proceedings - IEEE International Conference on Multimedia and Expo; Vol. 2023-July).

TY  - GEN
T1  - ADATS
T2  - 2023 IEEE International Conference on Multimedia and Expo, ICME 2023
AU  - Huang, Zepeng
AU  - Wan, Qi
AU  - Chen, Junliang
AU  - Zhao, Xiaodong
AU  - Ye, Kai
AU  - Shen, Linlin
PY  - 2023
Y1  - 2023
N2  - Scene text spotting has attracted great attention in recent years. Compared with two-stage approaches that locate scene texts in the first stage and recognize them in the second stage, the advantages of joint location and recognition training are not fully explored. In this paper, we present an ADaptive RoI-Align based transformer for end-to-end Text Spotting (ADATS), which simultaneously locates and recognizes text with a single forward pass. By employing an Adaptive RoI-Align, the text features are extracted from the feature extraction network with the original aspect ratio, such that less information is lost during the alignment of arbitrarily-shaped scene text. Attention-based segmentation and recognition heads allow us to simultaneously optimize detection and recognition. Experiments on ICDAR 2015, MSRA-TD500, Total-Text, and CTW1500 demonstrate the effectiveness of our method.
AB  - Scene text spotting has attracted great attention in recent years. Compared with two-stage approaches that locate scene texts in the first stage and recognize them in the second stage, the advantages of joint location and recognition training are not fully explored. In this paper, we present an ADaptive RoI-Align based transformer for end-to-end Text Spotting (ADATS), which simultaneously locates and recognizes text with a single forward pass. By employing an Adaptive RoI-Align, the text features are extracted from the feature extraction network with the original aspect ratio, such that less information is lost during the alignment of arbitrarily-shaped scene text. Attention-based segmentation and recognition heads allow us to simultaneously optimize detection and recognition. Experiments on ICDAR 2015, MSRA-TD500, Total-Text, and CTW1500 demonstrate the effectiveness of our method.
KW  - End-to-end text spotting
KW  - segmentation
KW  - text detection
KW  - text recognition
UR  - http://www.scopus.com/inward/record.url?scp=85171178317&partnerID=8YFLogxK
U2  - 10.1109/ICME55011.2023.00243
DO  - 10.1109/ICME55011.2023.00243
M3  - Conference contribution
AN  - SCOPUS:85171178317
T3  - Proceedings - IEEE International Conference on Multimedia and Expo
SP  - 1403
EP  - 1408
BT  - Proceedings - 2023 IEEE International Conference on Multimedia and Expo, ICME 2023
PB  - IEEE Computer Society
Y2  - 10 July 2023 through 14 July 2023
ER  -
