Two-stage Rule-induction visual reasoning on RPMs with an application to video prediction

Wentao He; Jianfeng Ren; Ruibin Bai; Xudong Jiang

doi:10.1016/j.patcog.2024.111151

School of Computer Science

Research output: Journal Publication › Article › peer-review

1 Citation (Scopus)

Abstract

Raven's Progressive Matrices (RPMs) are frequently used to evaluate human visual reasoning ability. Researchers have made considerable efforts to develop systems that automatically solve RPM problems, often through a black-box end-to-end convolutional neural network that handles both visual recognition and logical reasoning. Based on the intrinsic nature of the RPM problem, we propose a Two-stage Rule-Induction Visual Reasoner (TRIVR), which consists of a perception module and a reasoning module, to tackle the challenges of real-world visual recognition and subsequent logical reasoning, respectively. For the reasoning module, we further propose a “2+1” formulation that models human thinking in solving RPMs and significantly reduces model complexity. It derives a reasoning rule from each RPM sample, which is not feasible for existing methods. As a result, the proposed reasoning module can yield a set of reasoning rules that model how humans solve RPM problems. To validate the proposed method on real-world applications, an RPM-like Video Prediction (RVP) dataset is constructed, in which visual reasoning is conducted on RPMs built from real-world video frames. Experimental results on various RPM-like datasets demonstrate that the proposed TRIVR achieves a significant and consistent performance gain over state-of-the-art models.

Original language	English
Article number	111151
Journal	Pattern Recognition
Volume	160
DOIs	https://doi.org/10.1016/j.patcog.2024.111151
Publication status	Published - Apr 2025

Keywords

Raven's progressive matrices
Video prediction
Visual reasoning

ASJC Scopus subject areas

Software
Signal Processing
Computer Vision and Pattern Recognition
Artificial Intelligence

Cite this

@article{c028cafa85484c93bb751438eef9c61c,

title = "Two-stage Rule-induction visual reasoning on RPMs with an application to video prediction",

abstract = "Raven's Progressive Matrices (RPMs) are frequently used in evaluating human's visual reasoning ability. Researchers have made considerable efforts in developing systems to automatically solve the RPM problem, often through a black-box end-to-end convolutional neural network for both visual recognition and logical reasoning tasks. Based on the intrinsic natures of RPM problem, we propose a Two-stage Rule-Induction Visual Reasoner (TRIVR), which consists of a perception module and a reasoning module, to tackle the challenges of real-world visual recognition and subsequent logical reasoning tasks, respectively. For the reasoning module, we further propose a “2+1” formulation that models human's thinking in solving RPMs and significantly reduces the model complexity. It derives a reasoning rule from each RPM sample, which is not feasible for existing methods. As a result, the proposed reasoning module is capable of yielding a set of reasoning rules modeling human in solving the RPM problems. To validate the proposed method on real-world applications, an RPM-like Video Prediction (RVP) dataset is constructed, where visual reasoning is conducted on RPMs constructed using real-world video frames. Experimental results on various RPM-like datasets demonstrate that the proposed TRIVR achieves a significant and consistent performance gain compared with state-of-the-art models.",

keywords = "Raven's progressive matrices, Video prediction, Visual reasoning",

author = "Wentao He and Jianfeng Ren and Ruibin Bai and Xudong Jiang",

note = "Publisher Copyright: {\textcopyright} 2024 The Authors",

year = "2025",

month = apr,

doi = "10.1016/j.patcog.2024.111151",

language = "English",

volume = "160",

journal = "Pattern Recognition",

issn = "0031-3203",

publisher = "Elsevier Ltd.",

}

TY - JOUR

T1 - Two-stage Rule-induction visual reasoning on RPMs with an application to video prediction

AU - He, Wentao

AU - Ren, Jianfeng

AU - Bai, Ruibin

AU - Jiang, Xudong

PY - 2025/4

Y1 - 2025/4

N2 - Raven's Progressive Matrices (RPMs) are frequently used in evaluating human's visual reasoning ability. Researchers have made considerable efforts in developing systems to automatically solve the RPM problem, often through a black-box end-to-end convolutional neural network for both visual recognition and logical reasoning tasks. Based on the intrinsic natures of RPM problem, we propose a Two-stage Rule-Induction Visual Reasoner (TRIVR), which consists of a perception module and a reasoning module, to tackle the challenges of real-world visual recognition and subsequent logical reasoning tasks, respectively. For the reasoning module, we further propose a “2+1” formulation that models human's thinking in solving RPMs and significantly reduces the model complexity. It derives a reasoning rule from each RPM sample, which is not feasible for existing methods. As a result, the proposed reasoning module is capable of yielding a set of reasoning rules modeling human in solving the RPM problems. To validate the proposed method on real-world applications, an RPM-like Video Prediction (RVP) dataset is constructed, where visual reasoning is conducted on RPMs constructed using real-world video frames. Experimental results on various RPM-like datasets demonstrate that the proposed TRIVR achieves a significant and consistent performance gain compared with state-of-the-art models.

AB - Raven's Progressive Matrices (RPMs) are frequently used in evaluating human's visual reasoning ability. Researchers have made considerable efforts in developing systems to automatically solve the RPM problem, often through a black-box end-to-end convolutional neural network for both visual recognition and logical reasoning tasks. Based on the intrinsic natures of RPM problem, we propose a Two-stage Rule-Induction Visual Reasoner (TRIVR), which consists of a perception module and a reasoning module, to tackle the challenges of real-world visual recognition and subsequent logical reasoning tasks, respectively. For the reasoning module, we further propose a “2+1” formulation that models human's thinking in solving RPMs and significantly reduces the model complexity. It derives a reasoning rule from each RPM sample, which is not feasible for existing methods. As a result, the proposed reasoning module is capable of yielding a set of reasoning rules modeling human in solving the RPM problems. To validate the proposed method on real-world applications, an RPM-like Video Prediction (RVP) dataset is constructed, where visual reasoning is conducted on RPMs constructed using real-world video frames. Experimental results on various RPM-like datasets demonstrate that the proposed TRIVR achieves a significant and consistent performance gain compared with state-of-the-art models.

KW - Raven's progressive matrices

KW - Video prediction

KW - Visual reasoning

UR - http://www.scopus.com/inward/record.url?scp=85209374373&partnerID=8YFLogxK

U2 - 10.1016/j.patcog.2024.111151

DO - 10.1016/j.patcog.2024.111151

M3 - Article

AN - SCOPUS:85209374373

SN - 0031-3203

VL - 160

JO - Pattern Recognition

JF - Pattern Recognition

M1 - 111151

ER -
