Hierarchical ConViT with Attention-Based Relational Reasoner for Visual Analogical Reasoning

Wentao He; Jialu Zhang; Jianfeng Ren; Ruibin Bai; Xudong Jiang

doi:10.1609/aaai.v37i1.25072

School of Computer Science

Research output: Chapter in Book/Conference proceeding › Conference contribution › peer-review

17 Citations (Scopus)

Abstract

Raven’s Progressive Matrices (RPMs) have been widely used to evaluate the visual reasoning ability of humans. To tackle the challenges of visual perception and logic reasoning on RPMs, we propose a Hierarchical ConViT with Attention-based Relational Reasoner (HCV-ARR). Traditional solution methods often apply relatively shallow convolution networks to visually perceive shape patterns in RPM images, which may not fully model the long-range dependencies of complex pattern combinations in RPMs. The proposed ConViT consists of a convolutional block to capture the low-level attributes of visual patterns, and a transformer block to capture the high-level image semantics such as pattern formations. Furthermore, the proposed hierarchical ConViT captures visual features from multiple receptive fields, where the shallow layers focus on the image fine details while the deeper layers focus on the image semantics. To better model the underlying reasoning rules embedded in RPM images, an Attention-based Relational Reasoner (ARR) is proposed to establish the underlying relations among images. The proposed ARR well exploits the hidden relations among question images through the developed element-wise attentive reasoner. Experimental results on three RPM datasets demonstrate that the proposed HCV-ARR achieves a significant performance gain compared with the state-of-the-art models. The source code is available at: https://github.com/wentaoheunnc/HCV-ARR.
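The abstract mentions an "element-wise attentive reasoner" without giving details. The following is a minimal sketch of one possible reading of element-wise attention, written in plain Python: several feature vectors are fused with a separate softmax weight for every feature dimension. The function names, the toy inputs and the externally supplied `scores` are assumptions for illustration only; this is not the authors' implementation, which is available at the repository linked above.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def elementwise_attention(features, scores):
    """Fuse N feature vectors using one attention weight per element.

    features: list of N vectors, each of length D.
    scores:   list of N vectors of raw attention scores, same shape.
              (Hypothetical input: in a trained model these would be
              produced by a learned layer, which is omitted here.)
    Returns a single fused vector of length D.
    """
    dim = len(features[0])
    fused = []
    for d in range(dim):
        # Normalise the scores of dimension d across the N vectors.
        weights = softmax([s[d] for s in scores])
        fused.append(sum(w * f[d] for w, f in zip(weights, features)))
    return fused


# Toy example: two 2-dimensional feature vectors with equal scores,
# so each dimension is simply averaged.
panel_features = [[1.0, 0.0], [3.0, 4.0]]
uniform_scores = [[0.0, 0.0], [0.0, 0.0]]
print(elementwise_attention(panel_features, uniform_scores))  # [2.0, 2.0]
```

Because the weights are computed per dimension, one vector can dominate some elements of the fused result while another dominates the rest, which is what distinguishes this from a single scalar weight per vector.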

Original language	English
Title of host publication	AAAI-23 Technical Tracks 1
Editors	Brian Williams, Yiling Chen, Jennifer Neville
Publisher	AAAI Press
Pages	22-30
Number of pages	9
ISBN (Electronic)	9781577358800
DOIs	https://doi.org/10.1609/aaai.v37i1.25072
Publication status	Published - 27 Jun 2023
Event	37th AAAI Conference on Artificial Intelligence, AAAI 2023 - Washington, United States
Duration	7 Feb 2023 → 14 Feb 2023

Publication series

Name	Proceedings of the 37th AAAI Conference on Artificial Intelligence, AAAI 2023
Volume	37

Conference

Conference	37th AAAI Conference on Artificial Intelligence, AAAI 2023
Country/Territory	United States
City	Washington
Period	7/02/23 → 14/02/23

ASJC Scopus subject areas

Artificial Intelligence

Access to Document

10.1609/aaai.v37i1.25072

Cite this

He, W., Zhang, J., Ren, J., Bai, R., & Jiang, X. (2023). Hierarchical ConViT with Attention-Based Relational Reasoner for Visual Analogical Reasoning. In B. Williams, Y. Chen, & J. Neville (Eds.), AAAI-23 Technical Tracks 1 (pp. 22-30). (Proceedings of the 37th AAAI Conference on Artificial Intelligence, AAAI 2023; Vol. 37). AAAI Press. https://doi.org/10.1609/aaai.v37i1.25072

@inproceedings{2f467b1989444fe39b59c9ba626151fa,

title = "Hierarchical ConViT with Attention-Based Relational Reasoner for Visual Analogical Reasoning",

abstract = "Raven{\textquoteright}s Progressive Matrices (RPMs) have been widely used to evaluate the visual reasoning ability of humans. To tackle the challenges of visual perception and logic reasoning on RPMs, we propose a Hierarchical ConViT with Attention-based Relational Reasoner (HCV-ARR). Traditional solution methods often apply relatively shallow convolution networks to visually perceive shape patterns in RPM images, which may not fully model the long-range dependencies of complex pattern combinations in RPMs. The proposed ConViT consists of a convolutional block to capture the low-level attributes of visual patterns, and a transformer block to capture the high-level image semantics such as pattern formations. Furthermore, the proposed hierarchical ConViT captures visual features from multiple receptive fields, where the shallow layers focus on the image fine details while the deeper layers focus on the image semantics. To better model the underlying reasoning rules embedded in RPM images, an Attention-based Relational Reasoner (ARR) is proposed to establish the underlying relations among images. The proposed ARR well exploits the hidden relations among question images through the developed element-wise attentive reasoner. Experimental results on three RPM datasets demonstrate that the proposed HCV-ARR achieves a significant performance gain compared with the state-of-the-art models. The source code is available at: https://github.com/wentaoheunnc/HCV-ARR.",

author = "Wentao He and Jialu Zhang and Jianfeng Ren and Ruibin Bai and Xudong Jiang",

note = "Publisher Copyright: Copyright {\textcopyright} 2023, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.; 37th AAAI Conference on Artificial Intelligence, AAAI 2023 ; Conference date: 07-02-2023 Through 14-02-2023",

year = "2023",

month = jun,

day = "27",

doi = "10.1609/aaai.v37i1.25072",

language = "English",

series = "Proceedings of the 37th AAAI Conference on Artificial Intelligence, AAAI 2023",

publisher = "AAAI Press",

pages = "22--30",

editor = "Brian Williams and Yiling Chen and Jennifer Neville",

booktitle = "AAAI-23 Technical Tracks 1",

}

He, W, Zhang, J, Ren, J, Bai, R & Jiang, X 2023, Hierarchical ConViT with Attention-Based Relational Reasoner for Visual Analogical Reasoning. in B Williams, Y Chen & J Neville (eds), AAAI-23 Technical Tracks 1. Proceedings of the 37th AAAI Conference on Artificial Intelligence, AAAI 2023, vol. 37, AAAI Press, pp. 22-30, 37th AAAI Conference on Artificial Intelligence, AAAI 2023, Washington, United States, 7/02/23. https://doi.org/10.1609/aaai.v37i1.25072

Hierarchical ConViT with Attention-Based Relational Reasoner for Visual Analogical Reasoning. / He, Wentao; Zhang, Jialu; Ren, Jianfeng et al.
AAAI-23 Technical Tracks 1. ed. / Brian Williams; Yiling Chen; Jennifer Neville. AAAI Press, 2023. p. 22-30 (Proceedings of the 37th AAAI Conference on Artificial Intelligence, AAAI 2023; Vol. 37).

TY - GEN

T1 - Hierarchical ConViT with Attention-Based Relational Reasoner for Visual Analogical Reasoning

AU - He, Wentao

AU - Zhang, Jialu

AU - Ren, Jianfeng

AU - Bai, Ruibin

AU - Jiang, Xudong

PY - 2023/6/27

Y1 - 2023/6/27

AB - Raven’s Progressive Matrices (RPMs) have been widely used to evaluate the visual reasoning ability of humans. To tackle the challenges of visual perception and logic reasoning on RPMs, we propose a Hierarchical ConViT with Attention-based Relational Reasoner (HCV-ARR). Traditional solution methods often apply relatively shallow convolution networks to visually perceive shape patterns in RPM images, which may not fully model the long-range dependencies of complex pattern combinations in RPMs. The proposed ConViT consists of a convolutional block to capture the low-level attributes of visual patterns, and a transformer block to capture the high-level image semantics such as pattern formations. Furthermore, the proposed hierarchical ConViT captures visual features from multiple receptive fields, where the shallow layers focus on the image fine details while the deeper layers focus on the image semantics. To better model the underlying reasoning rules embedded in RPM images, an Attention-based Relational Reasoner (ARR) is proposed to establish the underlying relations among images. The proposed ARR well exploits the hidden relations among question images through the developed element-wise attentive reasoner. Experimental results on three RPM datasets demonstrate that the proposed HCV-ARR achieves a significant performance gain compared with the state-of-the-art models. The source code is available at: https://github.com/wentaoheunnc/HCV-ARR.

UR - http://www.scopus.com/inward/record.url?scp=85150326245&partnerID=8YFLogxK

U2 - 10.1609/aaai.v37i1.25072

DO - 10.1609/aaai.v37i1.25072

M3 - Conference contribution

AN - SCOPUS:85150326245

T3 - Proceedings of the 37th AAAI Conference on Artificial Intelligence, AAAI 2023

SP - 22

EP - 30

BT - AAAI-23 Technical Tracks 1

A2 - Williams, Brian

A2 - Chen, Yiling

A2 - Neville, Jennifer

PB - AAAI Press

T2 - 37th AAAI Conference on Artificial Intelligence, AAAI 2023

Y2 - 7 February 2023 through 14 February 2023

ER -
