Hierarchical Perceptual and Predictive Analogy-Inference Network for Abstract Visual Reasoning

Wentao He; Jianfeng Ren; Ruibin Bai; Xudong Jiang

doi:10.1145/3664647.3681246

School of Computer Science

Research output: Chapter in Book/Conference proceeding › Conference contribution › peer-review

3 Citations (Scopus)

Abstract

Advances in computer vision research enable human-like high-dimensional perceptual induction over analogical visual reasoning problems, such as Raven's Progressive Matrices (RPMs). In this paper, we propose a Hierarchical Perception and Predictive Analogy-Inference network (HP^2AI), consisting of three major components that tackle key challenges of RPM problems. Firstly, in view of the limited receptive fields of shallow networks in most existing RPM solvers, a perceptual encoder is proposed, consisting of a series of hierarchically coupled Patch Attention and Local Context (PALC) blocks, which capture local attributes at early stages and the global panel layout at deep stages. Secondly, most methods seek object-level similarities to map the context images directly to the answer image, while failing to extract the underlying analogies. The proposed reasoning module, Predictive Analogy-Inference (PredAI), consists of a set of Analogy-Inference Blocks (AIBs) to model and exploit the inherent analogical reasoning rules instead of object similarity. Lastly, the Squeeze-and-Excitation Channel-wise Attention (SECA) in the proposed PredAI discriminates essential attributes and analogies from irrelevant ones. Extensive experiments over four benchmark RPM datasets show that the proposed HP^2AI consistently achieves significant performance gains over state-of-the-art methods on all four datasets.
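
For readers unfamiliar with the squeeze-and-excitation idea that SECA is named after, the following is a minimal sketch of generic squeeze-and-excitation channel gating. It is not the authors' SECA module: the function name `squeeze_excite`, the bias-free weight matrices `w1` and `w2`, and the reduction ratio `r` are assumptions made for illustration, since the abstract does not specify these details.

```python
import math
import random


def squeeze_excite(feature_maps, w1, w2):
    """Reweight the channels of `feature_maps` (C x H x W nested lists).

    Hypothetical, bias-free weights: w1 is (C/r) x C, w2 is C x (C/r).
    Returns the rescaled feature maps and the per-channel gates.
    """
    # Squeeze: global average pooling gives one scalar per channel.
    squeezed = [
        sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
        for fmap in feature_maps
    ]
    # Excitation, step 1: bottleneck linear layer followed by ReLU.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    # Excitation, step 2: expand back to C channels, then a sigmoid gate in (0, 1).
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[g * v for v in row] for row in fmap]
        for fmap, g in zip(feature_maps, gates)
    ]
    return scaled, gates


if __name__ == "__main__":
    random.seed(0)
    C, H, W, r = 4, 3, 3, 2  # illustrative sizes only
    x = [[[random.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
    w1 = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(C // r)]
    w2 = [[random.uniform(-1, 1) for _ in range(C // r)] for _ in range(C)]
    _, gates = squeeze_excite(x, w1, w2)
    print(gates)  # one gate per channel, each strictly between 0 and 1
```

Channels whose gate is close to 1 pass through almost unchanged, while channels with a gate close to 0 are suppressed; this is the general mechanism by which channel-wise attention can emphasise some attributes over others.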

Original language	English
Title of host publication	MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia
Publisher	Association for Computing Machinery, Inc
Pages	4841-4850
Number of pages	10
ISBN (Electronic)	9798400706868
DOIs	https://doi.org/10.1145/3664647.3681246
Publication status	Published - 28 Oct 2024
Event	32nd ACM International Conference on Multimedia, MM 2024 - Melbourne, Australia
Duration	28 Oct 2024 → 1 Nov 2024

Publication series

Name	MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia

Conference

Conference	32nd ACM International Conference on Multimedia, MM 2024
Country/Territory	Australia
City	Melbourne
Period	28/10/24 → 1/11/24

Keywords

analogical visual reasoning
intelligence quotient test
predicting-and-verifying
raven's progressive matrix
transformer

ASJC Scopus subject areas

Artificial Intelligence
Computer Graphics and Computer-Aided Design
Human-Computer Interaction
Software

Access to Document

10.1145/3664647.3681246

Cite this

He, W., Ren, J., Bai, R., & Jiang, X. (2024). Hierarchical Perceptual and Predictive Analogy-Inference Network for Abstract Visual Reasoning. In MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia (pp. 4841-4850). (MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia). Association for Computing Machinery, Inc. https://doi.org/10.1145/3664647.3681246

@inproceedings{be0f71d3660546ba8d4432fc051e8936,
  title = "Hierarchical Perceptual and Predictive Analogy-Inference Network for Abstract Visual Reasoning",
  abstract = "Advances in computer vision research enable human-like high-dimensional perceptual induction over analogical visual reasoning problems, such as Raven's Progressive Matrices (RPMs). In this paper, we propose a Hierarchical Perception and Predictive Analogy-Inference network (HP^2AI), consisting of three major components that tackle key challenges of RPM problems. Firstly, in view of the limited receptive fields of shallow networks in most existing RPM solvers, a perceptual encoder is proposed, consisting of a series of hierarchically coupled Patch Attention and Local Context (PALC) blocks, which could capture local attributes at early stages and capture the global panel layout at deep stages. Secondly, most methods seek for object-level similarities to map the context images directly to the answer image, while failing to extract the underlying analogies. The proposed reasoning module, Predictive Analogy-Inference (PredAI), consists of a set of Analogy-Inference Blocks (AIBs) to model and exploit the inherent analogical reasoning rules instead of object similarity. Lastly, the Squeeze-and-Excitation Channel-wise Attention (SECA) in the proposed PredAI discriminates essential attributes and analogies from irrelevant ones. Extensive experiments over four benchmark RPM datasets show that the proposed HP^2AI achieves significant performance gains over all the state-of-the-art methods consistently on all four datasets.",
  keywords = "analogical visual reasoning, intelligence quotient test, predicting-and-verifying, raven's progressive matrix, transformer",
  author = "Wentao He and Jianfeng Ren and Ruibin Bai and Xudong Jiang",
  note = "Publisher Copyright: {\textcopyright} 2024 ACM.; 32nd ACM International Conference on Multimedia, MM 2024 ; Conference date: 28-10-2024 Through 01-11-2024",
  year = "2024",
  month = oct,
  day = "28",
  doi = "10.1145/3664647.3681246",
  language = "English",
  series = "MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia",
  publisher = "Association for Computing Machinery, Inc",
  pages = "4841--4850",
  booktitle = "MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia",
}

He, W, Ren, J, Bai, R & Jiang, X 2024, Hierarchical Perceptual and Predictive Analogy-Inference Network for Abstract Visual Reasoning. in MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia. MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia, Association for Computing Machinery, Inc, pp. 4841-4850, 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, Australia, 28/10/24. https://doi.org/10.1145/3664647.3681246

Hierarchical Perceptual and Predictive Analogy-Inference Network for Abstract Visual Reasoning. / He, Wentao; Ren, Jianfeng; Bai, Ruibin et al.
MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia. Association for Computing Machinery, Inc, 2024. p. 4841-4850 (MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia).

TY  - GEN
T1  - Hierarchical Perceptual and Predictive Analogy-Inference Network for Abstract Visual Reasoning
AU  - He, Wentao
AU  - Ren, Jianfeng
AU  - Bai, Ruibin
AU  - Jiang, Xudong
PY  - 2024/10/28
Y1  - 2024/10/28
N2  - Advances in computer vision research enable human-like high-dimensional perceptual induction over analogical visual reasoning problems, such as Raven's Progressive Matrices (RPMs). In this paper, we propose a Hierarchical Perception and Predictive Analogy-Inference network (HP^2AI), consisting of three major components that tackle key challenges of RPM problems. Firstly, in view of the limited receptive fields of shallow networks in most existing RPM solvers, a perceptual encoder is proposed, consisting of a series of hierarchically coupled Patch Attention and Local Context (PALC) blocks, which could capture local attributes at early stages and capture the global panel layout at deep stages. Secondly, most methods seek for object-level similarities to map the context images directly to the answer image, while failing to extract the underlying analogies. The proposed reasoning module, Predictive Analogy-Inference (PredAI), consists of a set of Analogy-Inference Blocks (AIBs) to model and exploit the inherent analogical reasoning rules instead of object similarity. Lastly, the Squeeze-and-Excitation Channel-wise Attention (SECA) in the proposed PredAI discriminates essential attributes and analogies from irrelevant ones. Extensive experiments over four benchmark RPM datasets show that the proposed HP^2AI achieves significant performance gains over all the state-of-the-art methods consistently on all four datasets.
AB  - Advances in computer vision research enable human-like high-dimensional perceptual induction over analogical visual reasoning problems, such as Raven's Progressive Matrices (RPMs). In this paper, we propose a Hierarchical Perception and Predictive Analogy-Inference network (HP^2AI), consisting of three major components that tackle key challenges of RPM problems. Firstly, in view of the limited receptive fields of shallow networks in most existing RPM solvers, a perceptual encoder is proposed, consisting of a series of hierarchically coupled Patch Attention and Local Context (PALC) blocks, which could capture local attributes at early stages and capture the global panel layout at deep stages. Secondly, most methods seek for object-level similarities to map the context images directly to the answer image, while failing to extract the underlying analogies. The proposed reasoning module, Predictive Analogy-Inference (PredAI), consists of a set of Analogy-Inference Blocks (AIBs) to model and exploit the inherent analogical reasoning rules instead of object similarity. Lastly, the Squeeze-and-Excitation Channel-wise Attention (SECA) in the proposed PredAI discriminates essential attributes and analogies from irrelevant ones. Extensive experiments over four benchmark RPM datasets show that the proposed HP^2AI achieves significant performance gains over all the state-of-the-art methods consistently on all four datasets.
KW  - analogical visual reasoning
KW  - intelligence quotient test
KW  - predicting-and-verifying
KW  - raven's progressive matrix
KW  - transformer
UR  - http://www.scopus.com/inward/record.url?scp=85209358673&partnerID=8YFLogxK
U2  - 10.1145/3664647.3681246
DO  - 10.1145/3664647.3681246
M3  - Conference contribution
AN  - SCOPUS:85209358673
T3  - MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia
SP  - 4841
EP  - 4850
BT  - MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia
PB  - Association for Computing Machinery, Inc
T2  - 32nd ACM International Conference on Multimedia, MM 2024
Y2  - 28 October 2024 through 1 November 2024
ER  -

He W, Ren J, Bai R, Jiang X. Hierarchical Perceptual and Predictive Analogy-Inference Network for Abstract Visual Reasoning. In MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia. Association for Computing Machinery, Inc. 2024. p. 4841-4850. (MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia). doi: 10.1145/3664647.3681246
