Learning Visual Prior via Generative Pre-Training

Jinheng Xie; Kai Ye; Yudong Li; Yuexiang Li; Kevin Qinghong Lin; Yefeng Zheng; Linlin Shen; Mike Zheng Shou

Research output: Chapter in Book/Conference proceeding › Conference contribution › peer-review

4 Citations (Scopus)

Abstract

Various stuff and things in visual data possess specific traits, which can be learned by deep neural networks and are implicitly represented as the visual prior, e.g., object location and shape, in the model. Such prior potentially impacts many vision tasks. For example, in conditional image synthesis, spatial conditions failing to adhere to the prior can result in visually inaccurate synthetic results. This work aims to explicitly learn the visual prior and enable the customization of sampling. Inspired by advances in language modeling, we propose to learn Visual prior via Generative Pre-Training, dubbed VISORGPT. By discretizing visual locations, e.g., bounding boxes, human pose, and instance masks, into sequences, VISORGPT can model visual prior through likelihood maximization. Besides, prompt engineering is investigated to unify various visual locations and enable customized sampling of sequential outputs from the learned prior. Experimental results demonstrate the effectiveness of VISORGPT in modeling visual prior and extrapolating to novel scenes, potentially motivating that discrete visual locations can be integrated into the learning paradigm of current language models to further perceive visual world.
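The abstract's central mechanism is discretizing visual locations (here, bounding boxes) into a token sequence that a language-model-style network can learn from. The sketch below illustrates only that serialization step. It is not VISORGPT's implementation. The bin count (512), the `<k>` spelling of coordinate tokens, the `box`/instance-count prefix, and all function names are hypothetical choices made for illustration.

```python
from typing import List, Tuple

# (x_min, y_min, x_max, y_max) in pixel coordinates
Box = Tuple[float, float, float, float]


def quantize(value: float, size: float, num_bins: int) -> int:
    """Map a coordinate in [0, size] to an integer bin in [0, num_bins - 1]."""
    clipped = min(max(value, 0.0), size)  # clamp out-of-image coordinates
    return min(int(clipped / size * num_bins), num_bins - 1)


def dequantize(index: int, size: float, num_bins: int) -> float:
    """Map a bin index back to the centre of its interval, in pixels."""
    return (index + 0.5) * size / num_bins


def boxes_to_tokens(labels: List[str], boxes: List[Box],
                    width: float, height: float, num_bins: int = 512) -> List[str]:
    """Serialize labelled boxes into a flat token sequence.

    The prefix ("box", instance count) is a hypothetical prompt template,
    not the exact format used by VISORGPT.
    """
    tokens = ["box", str(len(boxes))]
    for label, (x0, y0, x1, y1) in zip(labels, boxes):
        tokens.append(label)
        # x coordinates are normalized by width, y coordinates by height
        for value, size in ((x0, width), (y0, height), (x1, width), (y1, height)):
            tokens.append(f"<{quantize(value, size, num_bins)}>")
    return tokens


if __name__ == "__main__":
    seq = boxes_to_tokens(["person", "kite"],
                          [(10.0, 20.0, 110.0, 220.0), (300.0, 40.0, 340.0, 80.0)],
                          width=640.0, height=480.0)
    print(" ".join(seq))
```

Running the script prints `box 2 person <8> <21> <88> <234> kite <240> <42> <272> <85>`. A generative pre-trained decoder fed such sequences would presumably be trained to minimize the negative log-likelihood −Σ_t log p(x_t | x_<t), which is one concrete reading of "likelihood maximization"; applying `dequantize` to sampled tokens recovers approximate pixel coordinates.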

Original language	English
Title of host publication	Advances in Neural Information Processing Systems 36 - 37th Conference on Neural Information Processing Systems, NeurIPS 2023
Editors	A. Oh, T. Neumann, A. Globerson, K. Saenko, M. Hardt, S. Levine
Publisher	Neural information processing systems foundation
ISBN (Electronic)	9781713899921
Publication status	Published - 2023
Externally published	Yes
Event	37th Conference on Neural Information Processing Systems, NeurIPS 2023 - New Orleans, United States
Duration	10 Dec 2023 → 16 Dec 2023

Publication series

Name	Advances in Neural Information Processing Systems
Volume	36
ISSN (Print)	1049-5258

Conference

Conference	37th Conference on Neural Information Processing Systems, NeurIPS 2023
Country/Territory	United States
City	New Orleans
Period	10/12/23 → 16/12/23

ASJC Scopus subject areas

Computer Networks and Communications
Information Systems
Signal Processing

Cite this

Xie, J., Ye, K., Li, Y., Li, Y., Lin, K. Q., Zheng, Y., Shen, L., & Shou, M. Z. (2023). Learning Visual Prior via Generative Pre-Training. In A. Oh, T. Neumann, A. Globerson, K. Saenko, M. Hardt, & S. Levine (Eds.), Advances in Neural Information Processing Systems 36 - 37th Conference on Neural Information Processing Systems, NeurIPS 2023 (Advances in Neural Information Processing Systems; Vol. 36). Neural information processing systems foundation.

Xie, Jinheng ; Ye, Kai ; Li, Yudong et al. / Learning Visual Prior via Generative Pre-Training. Advances in Neural Information Processing Systems 36 - 37th Conference on Neural Information Processing Systems, NeurIPS 2023. editor / A. Oh ; T. Neumann ; A. Globerson ; K. Saenko ; M. Hardt ; S. Levine. Neural information processing systems foundation, 2023. (Advances in Neural Information Processing Systems).

@inproceedings{e8e46158f29b4c34a2750e0afd2d51b3,
  title = "Learning Visual Prior via Generative Pre-Training",
  abstract = "Various stuff and things in visual data possess specific traits, which can be learned by deep neural networks and are implicitly represented as the visual prior, e.g., object location and shape, in the model. Such prior potentially impacts many vision tasks. For example, in conditional image synthesis, spatial conditions failing to adhere to the prior can result in visually inaccurate synthetic results. This work aims to explicitly learn the visual prior and enable the customization of sampling. Inspired by advances in language modeling, we propose to learn Visual prior via Generative Pre-Training, dubbed VISORGPT. By discretizing visual locations, e.g., bounding boxes, human pose, and instance masks, into sequences, VISORGPT can model visual prior through likelihood maximization. Besides, prompt engineering is investigated to unify various visual locations and enable customized sampling of sequential outputs from the learned prior. Experimental results demonstrate the effectiveness of VISORGPT in modeling visual prior and extrapolating to novel scenes, potentially motivating that discrete visual locations can be integrated into the learning paradigm of current language models to further perceive visual world.",
  author = "Jinheng Xie and Kai Ye and Yudong Li and Yuexiang Li and Lin, {Kevin Qinghong} and Yefeng Zheng and Linlin Shen and Shou, {Mike Zheng}",
  note = "Publisher Copyright: {\textcopyright} 2023 Neural information processing systems foundation. All rights reserved.; 37th Conference on Neural Information Processing Systems, NeurIPS 2023 ; Conference date: 10-12-2023 Through 16-12-2023",
  year = "2023",
  language = "English",
  series = "Advances in Neural Information Processing Systems",
  volume = "36",
  publisher = "Neural information processing systems foundation",
  editor = "A. Oh and T. Neumann and A. Globerson and K. Saenko and M. Hardt and S. Levine",
  booktitle = "Advances in Neural Information Processing Systems 36 - 37th Conference on Neural Information Processing Systems, NeurIPS 2023",
  address = "United States",
}

Xie, J, Ye, K, Li, Y, Li, Y, Lin, KQ, Zheng, Y, Shen, L & Shou, MZ 2023, Learning Visual Prior via Generative Pre-Training. in A Oh, T Neumann, A Globerson, K Saenko, M Hardt & S Levine (eds), Advances in Neural Information Processing Systems 36 - 37th Conference on Neural Information Processing Systems, NeurIPS 2023. Advances in Neural Information Processing Systems, vol. 36, Neural information processing systems foundation, 37th Conference on Neural Information Processing Systems, NeurIPS 2023, New Orleans, United States, 10/12/23.

Learning Visual Prior via Generative Pre-Training. / Xie, Jinheng; Ye, Kai; Li, Yudong et al.
Advances in Neural Information Processing Systems 36 - 37th Conference on Neural Information Processing Systems, NeurIPS 2023. ed. / A. Oh; T. Neumann; A. Globerson; K. Saenko; M. Hardt; S. Levine. Neural information processing systems foundation, 2023. (Advances in Neural Information Processing Systems; Vol. 36).

TY  - GEN
T1  - Learning Visual Prior via Generative Pre-Training
AU  - Xie, Jinheng
AU  - Ye, Kai
AU  - Li, Yudong
AU  - Li, Yuexiang
AU  - Lin, Kevin Qinghong
AU  - Zheng, Yefeng
AU  - Shen, Linlin
AU  - Shou, Mike Zheng
PY  - 2023
Y1  - 2023
N2  - Various stuff and things in visual data possess specific traits, which can be learned by deep neural networks and are implicitly represented as the visual prior, e.g., object location and shape, in the model. Such prior potentially impacts many vision tasks. For example, in conditional image synthesis, spatial conditions failing to adhere to the prior can result in visually inaccurate synthetic results. This work aims to explicitly learn the visual prior and enable the customization of sampling. Inspired by advances in language modeling, we propose to learn Visual prior via Generative Pre-Training, dubbed VISORGPT. By discretizing visual locations, e.g., bounding boxes, human pose, and instance masks, into sequences, VISORGPT can model visual prior through likelihood maximization. Besides, prompt engineering is investigated to unify various visual locations and enable customized sampling of sequential outputs from the learned prior. Experimental results demonstrate the effectiveness of VISORGPT in modeling visual prior and extrapolating to novel scenes, potentially motivating that discrete visual locations can be integrated into the learning paradigm of current language models to further perceive visual world.
AB  - Various stuff and things in visual data possess specific traits, which can be learned by deep neural networks and are implicitly represented as the visual prior, e.g., object location and shape, in the model. Such prior potentially impacts many vision tasks. For example, in conditional image synthesis, spatial conditions failing to adhere to the prior can result in visually inaccurate synthetic results. This work aims to explicitly learn the visual prior and enable the customization of sampling. Inspired by advances in language modeling, we propose to learn Visual prior via Generative Pre-Training, dubbed VISORGPT. By discretizing visual locations, e.g., bounding boxes, human pose, and instance masks, into sequences, VISORGPT can model visual prior through likelihood maximization. Besides, prompt engineering is investigated to unify various visual locations and enable customized sampling of sequential outputs from the learned prior. Experimental results demonstrate the effectiveness of VISORGPT in modeling visual prior and extrapolating to novel scenes, potentially motivating that discrete visual locations can be integrated into the learning paradigm of current language models to further perceive visual world.
UR  - http://www.scopus.com/inward/record.url?scp=85180177516&partnerID=8YFLogxK
M3  - Conference contribution
AN  - SCOPUS:85180177516
T3  - Advances in Neural Information Processing Systems
BT  - Advances in Neural Information Processing Systems 36 - 37th Conference on Neural Information Processing Systems, NeurIPS 2023
A2  - Oh, A.
A2  - Neumann, T.
A2  - Globerson, A.
A2  - Saenko, K.
A2  - Hardt, M.
A2  - Levine, S.
PB  - Neural information processing systems foundation
T2  - 37th Conference on Neural Information Processing Systems, NeurIPS 2023
Y2  - 10 December 2023 through 16 December 2023
ER  - 

Xie J, Ye K, Li Y, Li Y, Lin KQ, Zheng Y et al. Learning Visual Prior via Generative Pre-Training. In Oh A, Neumann T, Globerson A, Saenko K, Hardt M, Levine S, editors, Advances in Neural Information Processing Systems 36 - 37th Conference on Neural Information Processing Systems, NeurIPS 2023. Neural information processing systems foundation. 2023. (Advances in Neural Information Processing Systems).
