TY - GEN
T1 - Hierarchical Perceptual and Predictive Analogy-Inference Network for Abstract Visual Reasoning
AU - He, Wentao
AU - Ren, Jianfeng
AU - Bai, Ruibin
AU - Jiang, Xudong
N1 - Publisher Copyright:
© 2024 ACM.
PY - 2024/10/28
Y1 - 2024/10/28
N2 - Advances in computer vision research enable human-like high-dimensional perceptual induction over analogical visual reasoning problems, such as Raven's Progressive Matrices (RPMs). In this paper, we propose a Hierarchical Perception and Predictive Analogy-Inference network (HP^2AI), consisting of three major components that tackle key challenges of RPM problems. Firstly, in view of the limited receptive fields of shallow networks in most existing RPM solvers, a perceptual encoder is proposed, consisting of a series of hierarchically coupled Patch Attention and Local Context (PALC) blocks, which could capture local attributes at early stages and capture the global panel layout at deep stages. Secondly, most methods seek for object-level similarities to map the context images directly to the answer image, while failing to extract the underlying analogies. The proposed reasoning module, Predictive Analogy-Inference (PredAI), consists of a set of Analogy-Inference Blocks (AIBs) to model and exploit the inherent analogical reasoning rules instead of object similarity. Lastly, the Squeeze-and-Excitation Channel-wise Attention (SECA) in the proposed PredAI discriminates essential attributes and analogies from irrelevant ones. Extensive experiments over four benchmark RPM datasets show that the proposed HP^2AI achieves significant performance gains over all the state-of-the-art methods consistently on all four datasets.
AB - Advances in computer vision research enable human-like high-dimensional perceptual induction over analogical visual reasoning problems, such as Raven's Progressive Matrices (RPMs). In this paper, we propose a Hierarchical Perception and Predictive Analogy-Inference network (HP^2AI), consisting of three major components that tackle key challenges of RPM problems. Firstly, in view of the limited receptive fields of shallow networks in most existing RPM solvers, a perceptual encoder is proposed, consisting of a series of hierarchically coupled Patch Attention and Local Context (PALC) blocks, which could capture local attributes at early stages and capture the global panel layout at deep stages. Secondly, most methods seek for object-level similarities to map the context images directly to the answer image, while failing to extract the underlying analogies. The proposed reasoning module, Predictive Analogy-Inference (PredAI), consists of a set of Analogy-Inference Blocks (AIBs) to model and exploit the inherent analogical reasoning rules instead of object similarity. Lastly, the Squeeze-and-Excitation Channel-wise Attention (SECA) in the proposed PredAI discriminates essential attributes and analogies from irrelevant ones. Extensive experiments over four benchmark RPM datasets show that the proposed HP^2AI achieves significant performance gains over all the state-of-the-art methods consistently on all four datasets.
KW - analogical visual reasoning
KW - intelligence quotient test
KW - predicting-and-verifying
KW - raven's progressive matrix
KW - transformer
UR - http://www.scopus.com/inward/record.url?scp=85209358673&partnerID=8YFLogxK
U2 - 10.1145/3664647.3681246
DO - 10.1145/3664647.3681246
M3 - Conference contribution
AN - SCOPUS:85209358673
T3 - MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia
SP - 4841
EP - 4850
BT - MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia
PB - Association for Computing Machinery, Inc
T2 - 32nd ACM International Conference on Multimedia, MM 2024
Y2 - 28 October 2024 through 1 November 2024
ER -