ReELFA: A scene text recognizer with encoded location and focused attention

Qingqing Wang; Wenjing Jia; Xiangjian He; Yue Lu; Michael Blumenstein; Ye Huang; Shujing Lyu

doi:10.1109/ICDARW.2019.40084

Research output: Contribution to conference › Paper › peer-review

5 Citations (Scopus)

Abstract

LSTM and attention mechanism have been widely used for scene text recognition. However, the existing LSTM-based recognizers usually convert 2D feature maps into 1D space by flattening or pooling operations, resulting in the neglect of spatial information of text images. Additionally, the attention drift problem, where models fail to align targets at proper feature regions, has a serious impact on the recognition performance of existing models. To tackle the above problems, in this paper, we propose a scene text Recognizer with Encoded Location and Focused Attention, i.e., ReELFA. Our ReELFA utilizes one-hot encoded coordinates to indicate the spatial relationship of pixels and character center masks to help focus attention on the right feature areas. Experiments conducted on the benchmarking datasets IIIT5K, SVT, CUTE and IC15 demonstrate that the proposed method achieves comparable performance on the regular, low-resolution and noisy text images, and outperforms state-of-the-art approaches on the more challenging curved text images.
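The abstract does not spell out how the one-hot encoded coordinates are attached to the feature maps. As a minimal sketch only, assuming each position's row and column indices are one-hot encoded and concatenated onto that position's feature vector, the idea could look like the following. The function names `one_hot`, `encode_locations` and `append_locations` and the nested-list layout are hypothetical and are not taken from the paper.

```python
def one_hot(index, size):
    """Return a list of length `size` with a single 1 at `index`."""
    return [1 if i == index else 0 for i in range(size)]


def encode_locations(height, width):
    """Build a height x width grid of location codes.

    Each cell holds one_hot(row) + one_hot(col), a vector of length
    height + width. This layout is an assumption for illustration,
    not the encoding confirmed by the paper.
    """
    return [
        [one_hot(r, height) + one_hot(c, width) for c in range(width)]
        for r in range(height)
    ]


def append_locations(feature_map):
    """Concatenate the location code onto every feature vector.

    `feature_map` is a nested list indexed as [row][col][channel].
    """
    height, width = len(feature_map), len(feature_map[0])
    codes = encode_locations(height, width)
    return [
        [feature_map[r][c] + codes[r][c] for c in range(width)]
        for r in range(height)
    ]
```

With a 2 x 3 feature map of one channel, every position ends up with 1 + 2 + 3 = 6 values, so the 2D position of each feature vector stays recoverable even if the map is later flattened into a 1D sequence for an LSTM.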

Original language	English
Pages	71-76
Number of pages	6
DOIs	https://doi.org/10.1109/ICDARW.2019.40084
Publication status	Published - 2019
Externally published	Yes
Event	2nd International Workshop on Machine Learning, WML 2019 - ICDAR 2019 Workshop - Sydney, Australia
Duration	21 Sept 2019 → 22 Sept 2019

Conference

Conference	2nd International Workshop on Machine Learning, WML 2019 - ICDAR 2019 Workshop
Country/Territory	Australia
City	Sydney
Period	21/09/19 → 22/09/19

Keywords

Attention LSTM
Attention drift
Center masks
Encoded location

ASJC Scopus subject areas

Artificial Intelligence
Computer Vision and Pattern Recognition
Signal Processing
Media Technology

Access to Document

10.1109/ICDARW.2019.40084

Cite this

@conference{369ac2deb8104c2e8a100432b6881268,
  title = "ReELFA: A scene text recognizer with encoded location and focused attention",
  abstract = "LSTM and attention mechanism have been widely used for scene text recognition. However, the existing LSTM-based recognizers usually convert 2D feature maps into 1D space by flattening or pooling operations, resulting in the neglect of spatial information of text images. Additionally, the attention drift problem, where models fail to align targets at proper feature regions, has a serious impact on the recognition performance of existing models. To tackle the above problems, in this paper, we propose a scene text Recognizer with Encoded Location and Focused Attention, i.e., ReELFA. Our ReELFA utilizes one-hot encoded coordinates to indicate the spatial relationship of pixels and character center masks to help focus attention on the right feature areas. Experiments conducted on the benchmarking datasets IIIT5K, SVT, CUTE and IC15 demonstrate that the proposed method achieves comparable performance on the regular, low-resolution and noisy text images, and outperforms state-of-the-art approaches on the more challenging curved text images.",
  keywords = "Attention LSTM, Attention drift, Center masks, Encoded location",
  author = "Qingqing Wang and Wenjing Jia and Xiangjian He and Yue Lu and Michael Blumenstein and Ye Huang and Shujing Lyu",
  note = "Publisher Copyright: {\textcopyright} 2019 IEEE.; 2nd International Workshop on Machine Learning, WML 2019 - ICDAR 2019 Workshop ; Conference date: 21-09-2019 Through 22-09-2019",
  year = "2019",
  doi = "10.1109/ICDARW.2019.40084",
  language = "English",
  pages = "71--76",
}

TY - CONF

T1 - ReELFA: A scene text recognizer with encoded location and focused attention

T2 - 2nd International Workshop on Machine Learning, WML 2019 - ICDAR 2019 Workshop

AU - Wang, Qingqing

AU - Jia, Wenjing

AU - He, Xiangjian

AU - Lu, Yue

AU - Blumenstein, Michael

AU - Huang, Ye

AU - Lyu, Shujing

PY - 2019

Y1 - 2019

N2 - LSTM and attention mechanism have been widely used for scene text recognition. However, the existing LSTM-based recognizers usually convert 2D feature maps into 1D space by flattening or pooling operations, resulting in the neglect of spatial information of text images. Additionally, the attention drift problem, where models fail to align targets at proper feature regions, has a serious impact on the recognition performance of existing models. To tackle the above problems, in this paper, we propose a scene text Recognizer with Encoded Location and Focused Attention, i.e., ReELFA. Our ReELFA utilizes one-hot encoded coordinates to indicate the spatial relationship of pixels and character center masks to help focus attention on the right feature areas. Experiments conducted on the benchmarking datasets IIIT5K, SVT, CUTE and IC15 demonstrate that the proposed method achieves comparable performance on the regular, low-resolution and noisy text images, and outperforms state-of-the-art approaches on the more challenging curved text images.

AB - LSTM and attention mechanism have been widely used for scene text recognition. However, the existing LSTM-based recognizers usually convert 2D feature maps into 1D space by flattening or pooling operations, resulting in the neglect of spatial information of text images. Additionally, the attention drift problem, where models fail to align targets at proper feature regions, has a serious impact on the recognition performance of existing models. To tackle the above problems, in this paper, we propose a scene text Recognizer with Encoded Location and Focused Attention, i.e., ReELFA. Our ReELFA utilizes one-hot encoded coordinates to indicate the spatial relationship of pixels and character center masks to help focus attention on the right feature areas. Experiments conducted on the benchmarking datasets IIIT5K, SVT, CUTE and IC15 demonstrate that the proposed method achieves comparable performance on the regular, low-resolution and noisy text images, and outperforms state-of-the-art approaches on the more challenging curved text images.

KW - Attention LSTM

KW - Attention drift

KW - Center masks

KW - Encoded location

UR - http://www.scopus.com/inward/record.url?scp=85095363245&partnerID=8YFLogxK

U2 - 10.1109/ICDARW.2019.40084

DO - 10.1109/ICDARW.2019.40084

M3 - Paper

AN - SCOPUS:85095363245

SP - 71

EP - 76

Y2 - 21 September 2019 through 22 September 2019

ER -
