Arbitrary-Shape Scene Text Detection via Visual-Relational Rectification and Contour Approximation

Chengpei Xu; Wenjing Jia; Tingcheng Cui; Ruomei Wang; Yuan Fang Zhang; Xiangjian He

doi:10.1109/TMM.2022.3171085

School of Computer Science

Research output: Journal Publication › Article › peer-review

7 Citations (Scopus)

Abstract

One trend in the latest bottom-up approaches for arbitrary-shape scene text detection is to determine the links between text segments using Graph Convolutional Networks (GCNs). However, the performance of these bottom-up methods is still inferior to that of state-of-the-art top-down methods even with the help of GCNs. We argue that a cause of this is that bottom-up methods fail to make proper use of visual-relational features, which results in accumulated false detection, as well as the error-prone route-finding used for grouping text segments. In this paper, we improve classic bottom-up text detection frameworks by fusing the visual-relational features of text with two effective false positive/negative suppression (FPNS) mechanisms and developing a new shape-approximation strategy. First, dense overlapping text segments depicting the 'characterness' and 'streamline' properties of text are constructed and used in weakly supervised node classification to filter the falsely detected text segments. Then, relational features and visual features of text segments are fused with a novel Location-Aware Transfer (LAT) module and Fuse Decoding (FD) module to jointly rectify the detected text segments. Finally, a novel multiple-text-map-aware contour-approximation strategy is developed based on the rectified text segments, instead of the error-prone route-finding process, to generate the final contour of the detected text. Experiments conducted on five benchmark datasets demonstrate that our method outperforms the state-of-the-art performance when embedded in a classic text detection framework, which revitalizes the strengths of bottom-up methods.

Original language	English
Pages (from-to)	4052-4066
Number of pages	15
Journal	IEEE Transactions on Multimedia
Volume	25
DOIs	https://doi.org/10.1109/TMM.2022.3171085
Publication status	Published - 2022

Keywords

Arbitrary-shape scene text detection
bottom-up method
false positive/negative suppression
relational reasoning

ASJC Scopus subject areas

Signal Processing
Electrical and Electronic Engineering
Media Technology
Computer Science Applications

Cite this

@article{c1eeabfa734c4cddaaf4ed0e4b7d25c5,
title = "Arbitrary-Shape Scene Text Detection via Visual-Relational Rectification and Contour Approximation",
abstract = "One trend in the latest bottom-up approaches for arbitrary-shape scene text detection is to determine the links between text segments using Graph Convolutional Networks (GCNs). However, the performance of these bottom-up methods is still inferior to that of state-of-the-art top-down methods even with the help of GCNs. We argue that a cause of this is that bottom-up methods fail to make proper use of visual-relational features, which results in accumulated false detection, as well as the error-prone route-finding used for grouping text segments. In this paper, we improve classic bottom-up text detection frameworks by fusing the visual-relational features of text with two effective false positive/negative suppression (FPNS) mechanisms and developing a new shape-approximation strategy. First, dense overlapping text segments depicting the 'characterness' and 'streamline' properties of text are constructed and used in weakly supervised node classification to filter the falsely detected text segments. Then, relational features and visual features of text segments are fused with a novel Location-Aware Transfer (LAT) module and Fuse Decoding (FD) module to jointly rectify the detected text segments. Finally, a novel multiple-text-map-aware contour-approximation strategy is developed based on the rectified text segments, instead of the error-prone route-finding process, to generate the final contour of the detected text. Experiments conducted on five benchmark datasets demonstrate that our method outperforms the state-of-the-art performance when embedded in a classic text detection framework, which revitalizes the strengths of bottom-up methods.",

keywords = "Arbitrary-shape scene text detection, bottom-up method, false positive/negative suppression, relational reasoning",
author = "Chengpei Xu and Wenjing Jia and Tingcheng Cui and Ruomei Wang and Zhang, {Yuan Fang} and Xiangjian He",
note = "Publisher Copyright: IEEE",
year = "2022",
doi = "10.1109/TMM.2022.3171085",
language = "English",
volume = "25",
pages = "4052--4066",
journal = "IEEE Transactions on Multimedia",
issn = "1520-9210",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
}

TY - JOUR
T1 - Arbitrary-Shape Scene Text Detection via Visual-Relational Rectification and Contour Approximation
AU - Xu, Chengpei
AU - Jia, Wenjing
AU - Cui, Tingcheng
AU - Wang, Ruomei
AU - Zhang, Yuan Fang
AU - He, Xiangjian
N1 - Publisher Copyright: IEEE
PY - 2022
Y1 - 2022
N2 - One trend in the latest bottom-up approaches for arbitrary-shape scene text detection is to determine the links between text segments using Graph Convolutional Networks (GCNs). However, the performance of these bottom-up methods is still inferior to that of state-of-the-art top-down methods even with the help of GCNs. We argue that a cause of this is that bottom-up methods fail to make proper use of visual-relational features, which results in accumulated false detection, as well as the error-prone route-finding used for grouping text segments. In this paper, we improve classic bottom-up text detection frameworks by fusing the visual-relational features of text with two effective false positive/negative suppression (FPNS) mechanisms and developing a new shape-approximation strategy. First, dense overlapping text segments depicting the 'characterness' and 'streamline' properties of text are constructed and used in weakly supervised node classification to filter the falsely detected text segments. Then, relational features and visual features of text segments are fused with a novel Location-Aware Transfer (LAT) module and Fuse Decoding (FD) module to jointly rectify the detected text segments. Finally, a novel multiple-text-map-aware contour-approximation strategy is developed based on the rectified text segments, instead of the error-prone route-finding process, to generate the final contour of the detected text. Experiments conducted on five benchmark datasets demonstrate that our method outperforms the state-of-the-art performance when embedded in a classic text detection framework, which revitalizes the strengths of bottom-up methods.

KW - Arbitrary-shape scene text detection
KW - bottom-up method
KW - false positive/negative suppression
KW - relational reasoning
UR - http://www.scopus.com/inward/record.url?scp=85129420527&partnerID=8YFLogxK
U2 - 10.1109/TMM.2022.3171085
DO - 10.1109/TMM.2022.3171085
M3 - Article
AN - SCOPUS:85129420527
SN - 1520-9210
VL - 25
SP - 4052
EP - 4066
JO - IEEE Transactions on Multimedia
JF - IEEE Transactions on Multimedia
ER -
