TY - GEN
T1 - Flaws can be Applause: Unleashing Potential of Segmenting Ambiguous Objects in SAM
AU - Li, Chenxin
AU - Huang, Yuzhi
AU - Li, Wuyang
AU - Liu, Hengyu
AU - Liu, Xinyu
AU - Xu, Qing
AU - Chen, Zhen
AU - Huang, Yue
AU - Yuan, Yixuan
PY - 2024
Y1 - 2024
AB - As the vision foundation models like the Segment Anything Model (SAM) demonstrate potent universality, they also present challenges in giving ambiguous and uncertain predictions. Significant variations in the model output and granularity can occur with simply subtle changes in the prompt, contradicting the consensus requirement for the robustness of a model. While some established works have been dedicated to stabilizing and fortifying the prediction of SAM, this paper takes a unique path to explore how this flaw can be inverted into an advantage when modeling inherently ambiguous data distributions. We introduce an optimization framework based on a conditional variational autoencoder, which jointly models the prompt and the granularity of the object with a latent probability distribution. This approach enables the model to adaptively perceive and represent the real ambiguous label distribution, taming SAM to produce a series of diverse, convincing, and reasonable segmentation outputs controllably. Extensive experiments on several practical deployment scenarios involving ambiguity demonstrate the exceptional performance of our framework.
KW - Foundation Model
KW - Ambiguous Segmentation
KW - Uncertainty
UR - https://openreview.net/forum?id=vJSNsSFO95
M3 - Conference contribution
BT - The Thirty-eighth Annual Conference on Neural Information Processing Systems
ER -