TY - GEN
T1 - DARR
T2 - 39th Annual AAAI Conference on Artificial Intelligence, AAAI 2025
AU - Li, Chengtai
AU - Tan, Yee Yang
AU - He, Yuting
AU - Ren, Jianfeng
AU - Bai, Ruibin
AU - Zhao, Yitian
AU - Yu, Heng
AU - Jiang, Xudong
N1 - Publisher Copyright: Copyright © 2025, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2025/4/11
Y1 - 2025/4/11
AB - Abstract visual reasoning (AVR) is a critical human ability and has been widely studied, but arithmetic visual reasoning, an AVR task that requires reasoning over number sense, has received less attention in the literature. To facilitate this research, we construct a Machine Number Reasoning (MNR) dataset to assess a model's ability in arithmetic visual reasoning over number sense and spatial layouts. To solve the MNR tasks, we propose a Dual-branch Arithmetic Regression Reasoning (DARR) framework, which includes an Intra-Image Arithmetic Regression Reasoning (IIARR) module and a Cross-Image Arithmetic Regression Reasoning (CIARR) module. The IIARR includes a set of Intra-Image Regression Blocks to identify the correct number orders and the underlying arithmetic rules within individual images, and an Order Gate to determine the correct number order. The CIARR establishes the arithmetic relations across different images through a '3-to-1' regressor and a set of '2-to-1' regressors, with a Selection Gate to select the most suitable '2-to-1' regressor and a gated fusion to combine the two kinds of regressors. Experiments on the MNR dataset show that the DARR outperforms state-of-the-art models for arithmetic visual reasoning.
UR - http://www.scopus.com/inward/record.url?scp=105003993551&partnerID=8YFLogxK
U2 - 10.1609/aaai.v39i2.32127
DO - 10.1609/aaai.v39i2.32127
M3 - Conference contribution
AN - SCOPUS:105003993551
T3 - Proceedings of the AAAI Conference on Artificial Intelligence
SP - 1373
EP - 1382
BT - Special Track on AI Alignment
A2 - Walsh, Toby
A2 - Shah, Julie
A2 - Kolter, Zico
PB - Association for the Advancement of Artificial Intelligence
Y2 - 25 February 2025 through 4 March 2025
ER -