Deep RGB-D Saliency Detection Without Depth

Yuan Fang Zhang; Jiangbin Zheng; Wenjing Jia; Wenfeng Huang; Long Li; Nian Liu; Fei Li; Xiangjian He

doi:10.1109/TMM.2021.3058788

Research output: Journal Publication › Article › peer-review

13 Citations (Scopus)

Abstract

Existing saliency detection models based on RGB colors leverage only appearance cues to detect salient objects. Depth information also plays a very important role in visual saliency detection and can supply complementary cues. Although many RGB-D saliency models have been proposed, they require depth data, which is expensive and not easy to acquire. In this paper, we propose to estimate depth information from monocular RGB images and leverage the intermediate depth features to enhance saliency detection performance in a deep neural network framework. Specifically, we first use an encoder network to extract common features from each RGB image and then build two decoder networks for depth estimation and saliency detection, respectively. The depth decoder features can be fused with the RGB saliency features to enhance their capability. Furthermore, we propose a novel dense multiscale fusion model to densely fuse multiscale depth and RGB features based on the dense ASPP model. A new global context branch is also added to boost the multiscale features. Experimental results demonstrate that the added depth cues and the proposed fusion model both improve saliency detection performance. Finally, our model not only outperforms state-of-the-art RGB saliency models but also achieves results comparable to those of state-of-the-art RGB-D saliency models.
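
The fusion idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes 1-D feature maps stored as plain Python lists, fixed averaging filters in place of learned convolutions, and hypothetical function names (`dilated_conv`, `global_context`, `dense_multiscale_fusion`). It shows only the data flow: RGB and depth features are concatenated, passed through densely connected dilated filters in the spirit of dense ASPP, and joined by a global context branch.

```python
from typing import List, Sequence

Signal = List[float]        # one channel of a 1-D feature map
FeatureMap = List[Signal]   # list of channels, all of the same length


def dilated_conv(x: FeatureMap, dilation: int) -> Signal:
    """3-tap averaging filter with the given dilation, averaged over channels.

    Zero padding keeps the output as long as the input. The fixed weights
    are an assumption standing in for learned convolution kernels.
    """
    n = len(x[0])
    out = [0.0] * n
    for channel in x:
        for i in range(n):
            for offset in (-dilation, 0, dilation):
                j = i + offset
                if 0 <= j < n:
                    out[i] += channel[j] / 3.0
    return [v / len(x) for v in out]


def global_context(x: FeatureMap) -> Signal:
    """Mean over all channels and positions, broadcast to full length."""
    n = len(x[0])
    mean = sum(sum(c) for c in x) / (n * len(x))
    return [mean] * n


def dense_multiscale_fusion(rgb: FeatureMap, depth: FeatureMap,
                            dilations: Sequence[int] = (1, 2, 4)) -> FeatureMap:
    """Fuse RGB and depth channels with densely connected dilated filters."""
    fused_input = rgb + depth           # channel-wise concatenation
    stack = list(fused_input)
    for d in dilations:
        # Dense connectivity: each layer sees the input and all earlier outputs.
        stack = stack + [dilated_conv(stack, d)]
    # The global context branch is computed from the fused input only.
    return stack + [global_context(fused_input)]


rgb_features = [[1.0, 2.0, 3.0, 4.0]]
depth_features = [[0.0, 0.0, 0.0, 0.0]]
fused = dense_multiscale_fusion(rgb_features, depth_features)
print(len(fused))  # 2 input channels + 3 dilated outputs + 1 global context
```

Each dilation rate adds one output channel, so later layers see progressively larger receptive fields built on top of earlier ones; the global context channel supplies a whole-signal summary that no local filter provides.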

Original language	English
Pages (from-to)	755-767
Number of pages	13
Journal	IEEE Transactions on Multimedia
Volume	24
DOIs	https://doi.org/10.1109/TMM.2021.3058788
Publication status	Published - 2022
Externally published	Yes

Keywords

Convolutional neural network
depth estimation
feature fusion
saliency detection

ASJC Scopus subject areas

Signal Processing
Media Technology
Computer Science Applications
Electrical and Electronic Engineering

Cite this

@article{680a205b0f4b4fd8b71a919791f70f63,

title = "Deep RGB-D Saliency Detection Without Depth",

abstract = "The existing saliency detection models based on RGB colors only leverage appearance cues to detect salient objects. Depth information also plays a very important role in visual saliency detection and can supply complementary cues for saliency detection. Although many RGB-D saliency models have been proposed, they require to acquire depth data, which is expensive and not easy to get. In this paper, we propose to estimate depth information from monocular RGB images and leverage the intermediate depth features to enhance the saliency detection performance in a deep neural network framework. Specifically, we first use an encoder network to extract common features from each RGB image and then build two decoder networks for depth estimation and saliency detection, respectively. The depth decoder features can be fused with the RGB saliency features to enhance their capability. Furthermore, we also propose a novel dense multiscale fusion model to densely fuse multiscale depth and RGB features based on the dense ASPP model. A new global context branch is also added to boost the multiscale features. Experimental results demonstrate that the added depth cues and the proposed fusion model can both improve the saliency detection performance. Finally, our model not only outperforms state-of-the-art RGB saliency models, but also achieves comparable results compared with state-of-the-art RGB-D saliency models.",

keywords = "Convolutional neural network, depth estimation, feature fusion, saliency detection",

author = "Zhang, {Yuan Fang} and Jiangbin Zheng and Wenjing Jia and Wenfeng Huang and Long Li and Nian Liu and Fei Li and Xiangjian He",

note = "Publisher Copyright: {\textcopyright} 1999-2012 IEEE.",

year = "2022",

doi = "10.1109/TMM.2021.3058788",

language = "English",

volume = "24",

pages = "755--767",

journal = "IEEE Transactions on Multimedia",

issn = "1520-9210",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - JOUR

T1 - Deep RGB-D Saliency Detection Without Depth

AU - Zhang, Yuan Fang

AU - Zheng, Jiangbin

AU - Jia, Wenjing

AU - Huang, Wenfeng

AU - Li, Long

AU - Liu, Nian

AU - Li, Fei

AU - He, Xiangjian

PY - 2022

Y1 - 2022

N2 - The existing saliency detection models based on RGB colors only leverage appearance cues to detect salient objects. Depth information also plays a very important role in visual saliency detection and can supply complementary cues for saliency detection. Although many RGB-D saliency models have been proposed, they require to acquire depth data, which is expensive and not easy to get. In this paper, we propose to estimate depth information from monocular RGB images and leverage the intermediate depth features to enhance the saliency detection performance in a deep neural network framework. Specifically, we first use an encoder network to extract common features from each RGB image and then build two decoder networks for depth estimation and saliency detection, respectively. The depth decoder features can be fused with the RGB saliency features to enhance their capability. Furthermore, we also propose a novel dense multiscale fusion model to densely fuse multiscale depth and RGB features based on the dense ASPP model. A new global context branch is also added to boost the multiscale features. Experimental results demonstrate that the added depth cues and the proposed fusion model can both improve the saliency detection performance. Finally, our model not only outperforms state-of-the-art RGB saliency models, but also achieves comparable results compared with state-of-the-art RGB-D saliency models.

KW - Convolutional neural network

KW - depth estimation

KW - feature fusion

KW - saliency detection

UR - http://www.scopus.com/inward/record.url?scp=85101745317&partnerID=8YFLogxK

U2 - 10.1109/TMM.2021.3058788

DO - 10.1109/TMM.2021.3058788

M3 - Article

AN - SCOPUS:85101745317

SN - 1520-9210

VL - 24

SP - 755

EP - 767

JO - IEEE Transactions on Multimedia

JF - IEEE Transactions on Multimedia

ER -
