Detecting and segmenting spatio-temporal foreground objects in videos is significant for motion pattern modelling and video content analysis. Extensive efforts have been made over the past decades. Nevertheless, video-based saliency detection and foreground segmentation remain challenging. On the one hand, the performance of image-based saliency detection algorithms is limited on complex content, and the temporal connectivity between frames is not well resolved. On the other hand, compared with the prosperous image-based datasets, the datasets for video-level saliency detection and segmentation are usually smaller in scale and less diverse in content. Towards a better understanding of video-level semantics, this thesis investigates foreground estimation and segmentation at both the image level and the video level.
First, this thesis demonstrates the effectiveness of traditional features in video foreground estimation and segmentation. Motion patterns obtained from optical flow are used to produce coarse estimates of the foreground objects. These coarse estimates are refined by aligning motion boundaries with the actual contours of the foreground objects with the aid of the HOG descriptor. A precise segmentation of the foreground is then computed from the refined foreground estimates and the video-level colour distribution.
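The first stage of this pipeline, deriving a coarse foreground estimate from motion, can be illustrated with a minimal NumPy sketch. The function name `coarse_foreground_mask` and the relative-magnitude threshold are illustrative assumptions, not the thesis's exact method:

```python
import numpy as np

def coarse_foreground_mask(flow, rel_threshold=0.5):
    """Coarse foreground estimate from a dense optical flow field.

    flow: (H, W, 2) array of per-pixel displacement vectors,
    as produced by a dense optical flow algorithm.
    Pixels whose motion magnitude exceeds a fraction of the
    maximum magnitude in the frame are marked as foreground.
    """
    magnitude = np.linalg.norm(flow, axis=2)
    threshold = rel_threshold * magnitude.max()
    return magnitude > threshold

# Synthetic example: a static scene with one moving 4x4 block.
flow = np.zeros((8, 8, 2))
flow[2:6, 2:6] = [3.0, 0.0]   # block moving 3 px to the right
mask = coarse_foreground_mask(flow)
```

In the sketch, only the moving block exceeds the threshold, so `mask` marks exactly those 16 pixels; the thesis's refinement stage would then align this rough mask with object contours.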
Second, a deep convolutional neural network (CNN) for image saliency detection, named HReSNet, is proposed. To improve the accuracy of saliency prediction, an independent feature refining network is implemented. A Euclidean distance loss is integrated into the loss function to sharpen the saliency predictions near object contours. The experimental results demonstrate that our network obtains competitive results compared with state-of-the-art algorithms.
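The idea of a Euclidean (L2) loss term emphasised near object contours can be sketched as follows. The contour weight map and the mixing factor `alpha` are hypothetical illustrations, not HReSNet's published formulation:

```python
import numpy as np

def saliency_loss(pred, target, contour_weight, alpha=1.0):
    """Euclidean (squared-error) saliency loss with extra weight
    near object contours.

    pred, target: (H, W) saliency maps with values in [0, 1]
    contour_weight: (H, W) map, larger near ground-truth contours
    alpha: strength of the contour emphasis (illustrative)
    """
    sq_err = (pred - target) ** 2
    weighted = sq_err * (1.0 + alpha * contour_weight)
    return weighted.mean()

# One mispredicted pixel that lies on a contour is penalised
# twice as heavily as the same error elsewhere (with alpha=1).
pred = np.zeros((2, 2))
target = np.zeros((2, 2))
target[0, 0] = 1.0
weight = np.zeros((2, 2))
weight[0, 0] = 1.0
loss = saliency_loss(pred, target, weight)
```

Up-weighting errors near contours pushes the network to produce crisp saliency boundaries rather than blurry transitions, which is the stated motivation for the contour-aware loss.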
Third, a large-scale dataset for video saliency detection and foreground segmentation is built to enrich the diversity of existing video-based foreground segmentation datasets. A supervised framework is also proposed as the baseline, which integrates our HReSNet, Long Short-Term Memory (LSTM) networks and a hierarchical segmentation network.
Fourth, practical change detection requires distinguishing semantically expected changes from unexpected ones. Therefore, a new CNN design is proposed to detect changes in multi-temporal high-resolution urban images. Experimental results show that our change detection network outperforms the competing algorithms by a significant margin.
|Date of Award||16 Nov 2019|
- University of Nottingham
|Supervisor||Guoping Qiu (Supervisor), Xu Sun (Supervisor) & Michel Valstar (Supervisor)|
- Foreground segmentation
- Foreground estimation
- Image saliency detection