Flaws can be Applause: Unleashing Potential of Segmenting Ambiguous Objects in SAM

Chenxin Li; Yuzhihuang; Wuyang Li; Hengyu Liu; Xinyu Liu; Qing Xu; Zhen Chen; Yue Huang; Yixuan Yuan

School of Computer Science

Research output: Chapter in Book/Conference proceeding › Conference contribution › peer-review

Abstract

As vision foundation models like the Segment Anything Model (SAM) demonstrate potent universality, they also present challenges in giving ambiguous and uncertain predictions. Significant variations in the model output and granularity can occur with only subtle changes in the prompt, contradicting the consensus requirement for the robustness of a model. While some established works have been dedicated to stabilizing and fortifying the prediction of SAM, this paper takes a unique path to explore how this flaw can be inverted into an advantage when modeling inherently ambiguous data distributions. We introduce an optimization framework based on a conditional variational autoencoder, which jointly models the prompt and the granularity of the object with a latent probability distribution. This approach enables the model to adaptively perceive and represent the real ambiguous label distribution, taming SAM to produce a series of diverse, convincing, and reasonable segmentation outputs controllably. Extensive experiments on several practical deployment scenarios involving ambiguity demonstrate the exceptional performance of our framework.

Original language	English
Title of host publication	The Thirty-eighth Annual Conference on Neural Information Processing Systems
Publication status	Published - 2024

Keywords

Foundation Model
Ambiguous Segmentation
Uncertainty

Cite this

@inproceedings{48f06bdb64a0429e83998e61ff311ef1,

title = "Flaws can be Applause: Unleashing Potential of Segmenting Ambiguous Objects in SAM",

abstract = "As vision foundation models like the Segment Anything Model (SAM) demonstrate potent universality, they also present challenges in giving ambiguous and uncertain predictions. Significant variations in the model output and granularity can occur with only subtle changes in the prompt, contradicting the consensus requirement for the robustness of a model. While some established works have been dedicated to stabilizing and fortifying the prediction of SAM, this paper takes a unique path to explore how this flaw can be inverted into an advantage when modeling inherently ambiguous data distributions. We introduce an optimization framework based on a conditional variational autoencoder, which jointly models the prompt and the granularity of the object with a latent probability distribution. This approach enables the model to adaptively perceive and represent the real ambiguous label distribution, taming SAM to produce a series of diverse, convincing, and reasonable segmentation outputs controllably. Extensive experiments on several practical deployment scenarios involving ambiguity demonstrate the exceptional performance of our framework.",

keywords = "Foundation Model, Ambiguous Segmentation, Uncertainty",

author = "Chenxin Li and Yuzhihuang and Wuyang Li and Hengyu Liu and Xinyu Liu and Qing Xu and Zhen Chen and Yue Huang and Yixuan Yuan",

year = "2024",

language = "English",

booktitle = "The Thirty-eighth Annual Conference on Neural Information Processing Systems",

}

TY - GEN

T1 - Flaws can be Applause: Unleashing Potential of Segmenting Ambiguous Objects in SAM

AU - Li, Chenxin

AU - Yuzhihuang

AU - Li, Wuyang

AU - Liu, Hengyu

AU - Liu, Xinyu

AU - Xu, Qing

AU - Chen, Zhen

AU - Huang, Yue

AU - Yuan, Yixuan

PY - 2024

Y1 - 2024

N2 - As vision foundation models like the Segment Anything Model (SAM) demonstrate potent universality, they also present challenges in giving ambiguous and uncertain predictions. Significant variations in the model output and granularity can occur with only subtle changes in the prompt, contradicting the consensus requirement for the robustness of a model. While some established works have been dedicated to stabilizing and fortifying the prediction of SAM, this paper takes a unique path to explore how this flaw can be inverted into an advantage when modeling inherently ambiguous data distributions. We introduce an optimization framework based on a conditional variational autoencoder, which jointly models the prompt and the granularity of the object with a latent probability distribution. This approach enables the model to adaptively perceive and represent the real ambiguous label distribution, taming SAM to produce a series of diverse, convincing, and reasonable segmentation outputs controllably. Extensive experiments on several practical deployment scenarios involving ambiguity demonstrate the exceptional performance of our framework.

AB - As vision foundation models like the Segment Anything Model (SAM) demonstrate potent universality, they also present challenges in giving ambiguous and uncertain predictions. Significant variations in the model output and granularity can occur with only subtle changes in the prompt, contradicting the consensus requirement for the robustness of a model. While some established works have been dedicated to stabilizing and fortifying the prediction of SAM, this paper takes a unique path to explore how this flaw can be inverted into an advantage when modeling inherently ambiguous data distributions. We introduce an optimization framework based on a conditional variational autoencoder, which jointly models the prompt and the granularity of the object with a latent probability distribution. This approach enables the model to adaptively perceive and represent the real ambiguous label distribution, taming SAM to produce a series of diverse, convincing, and reasonable segmentation outputs controllably. Extensive experiments on several practical deployment scenarios involving ambiguity demonstrate the exceptional performance of our framework.

KW - Foundation Model

KW - Ambiguous Segmentation

KW - Uncertainty

UR - https://openreview.net/forum?id=vJSNsSFO95

M3 - Conference contribution

BT - The Thirty-eighth Annual Conference on Neural Information Processing Systems

ER -
