In this thesis, we study two video surveillance problems: people counting and person re-identification.
To address people counting, we first propose a method called Random Projection Forest that exploits rich hand-crafted features.
For computational efficiency and scalability, we use a random forest as the regression model, since its tree structure is intrinsically fast and scalable. Unlike traditional approaches to random forest construction, we embed random projection in the tree nodes, which both combats the curse of dimensionality and introduces randomness into tree construction, making the method efficient and effective.
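As an illustration of the idea of embedding random projection in a tree node, the following is a minimal sketch and not the thesis implementation. The Gaussian projection matrix, the number of projected dimensions, the random threshold sampling, and the squared-error split criterion are all assumptions chosen for the example, and the function name `random_projection_split` is hypothetical.

```python
import numpy as np

def random_projection_split(X, y, n_components=8, n_thresholds=16, seed=0):
    """Choose a split for one regression-tree node in a randomly projected space.

    X: (n_samples, n_features) high-dimensional hand-crafted features.
    y: (n_samples,) regression targets, e.g. people counts.
    Returns the projection matrix R and the best (sse, dimension, threshold),
    or None in place of the tuple if no valid split was found.
    """
    rng = np.random.default_rng(seed)
    # Assumed: a dense Gaussian random projection to a low-dimensional subspace.
    R = rng.standard_normal((X.shape[1], n_components)) / np.sqrt(n_components)
    Z = X @ R

    best = None
    for j in range(n_components):
        lo, hi = Z[:, j].min(), Z[:, j].max()
        for t in rng.uniform(lo, hi, size=n_thresholds):
            left = Z[:, j] <= t
            n_left = int(left.sum())
            if n_left == 0 or n_left == len(y):
                continue  # skip degenerate splits
            # Sum of squared errors around each child's mean target.
            sse = ((y[left] - y[left].mean()) ** 2).sum() \
                + ((y[~left] - y[~left].mean()) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, float(t))
    return R, best
```

Because each node only searches over a handful of projected dimensions rather than the full feature space, the cost of a split no longer scales with the original dimensionality beyond the single matrix product, and drawing a fresh projection per node is what supplies the randomness between trees.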
We have also developed a deep learning model for people counting: a multi-task model that simultaneously predicts the number of people and the level of crowd density, which makes our method invariant to the image scale. To deal with the problem of insufficient training data, we propose an "ambiguous labelling" strategy to create various labels for the training images. In a series of experiments, we show that creating "ambiguous labels" is a simple but effective way to improve not only the deep learning model but also the Random Projection Forest model based on hand-crafted features.
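The multi-task objective can be pictured as a count regression term plus a density-level classification term. The sketch below is only one plausible form: the squared-error and cross-entropy losses, the `weight` balancing factor, and the function name `multitask_loss` are assumptions for illustration and are not taken from the thesis.

```python
import numpy as np

def multitask_loss(count_pred, count_true, density_logits, density_true, weight=1.0):
    """Joint loss for predicting people count and crowd density level.

    count_pred, count_true: (n,) predicted and true people counts.
    density_logits: (n, n_levels) unnormalised scores per density level.
    density_true: (n,) integer index of the true density level.
    weight: assumed factor balancing the two tasks.
    """
    # Regression term: mean squared error on the count.
    reg = np.mean((count_pred - count_true) ** 2)
    # Classification term: cross-entropy via a numerically stable log-softmax.
    z = density_logits - density_logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    ce = -np.mean(log_probs[np.arange(len(density_true)), density_true])
    return float(reg + weight * ce)
```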
For the problem of person re-identification, we have developed a novel deep learning framework called Deep Augmented Attribute Network (DAAN) to learn augmented attribute features. We first manually label two large datasets with pre-defined mid-level semantic attributes. We then construct a deep neural network with two output branches. The first branch predicts the attributes of the input image, while the second branch generates complementary features that are fused with the output of the first branch to form the augmented attributes of the input image. We optimize the attribute branch with a multi-label classification loss and apply a "Siamese" network structure to ensure that the augmented attributes of images of the same person are close to each other whilst those of different persons are far apart. The learned augmented attribute features are then used for person re-identification based on Euclidean distance. As manually labelling images is time-consuming, we have also extended our method to datasets that have person ID information but no attribute labels. Comprehensive experiments show that our method outperforms state-of-the-art methods.
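The "close for the same person, far apart for different persons" constraint is commonly expressed as a contrastive loss on feature pairs. The sketch below uses that standard formulation as a stand-in; the abstract does not state the exact loss, so the margin form, the `margin` value, and the helper names `contrastive_loss` and `rank_gallery` are assumptions.

```python
import numpy as np

def contrastive_loss(f1, f2, same_person, margin=1.0):
    """Contrastive loss over pairs of augmented attribute features.

    f1, f2: (n_pairs, dim) features of the two images in each pair.
    same_person: (n_pairs,) 1 if both images show the same person, else 0.
    margin: assumed minimum distance enforced between different persons.
    """
    d = np.linalg.norm(f1 - f2, axis=1)  # Euclidean distance per pair
    pull = same_person * d ** 2                               # same person: shrink distance
    push = (1 - same_person) * np.maximum(margin - d, 0.0) ** 2  # different: push past margin
    return float(np.mean(pull + push))

def rank_gallery(query, gallery):
    """Return gallery indices sorted by Euclidean distance to the query feature."""
    d = np.linalg.norm(gallery - query, axis=1)
    return np.argsort(d)
```

At test time only `rank_gallery` is needed: the gallery image whose feature is nearest to the query is the top re-identification candidate.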
As labelling identities and attributes for person images is time-consuming, we also propose an unsupervised method for person re-identification and apply it to a more challenging problem called partial person re-identification. We first use an established image segmentation method to generate superpixels and construct an Attributed Region Adjacency Graph (ARAG), in which nodes correspond to superpixels and edges represent correlations between superpixels. We then apply region-based Normalized Cut to the graph to merge similar neighbouring superpixels into natural image regions corresponding to various body parts and backgrounds. To extract features from the segmented patches, we apply a Denoising Autoencoder to learn a discriminative representation of the image patches in each node of the graph. Finally, the similarity of an image pair is measured by the Earth Mover's Distance (EMD) between the robust image signatures of the nodes in the corresponding ARAGs.
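To make the graph construction step concrete, the following minimal sketch builds a region adjacency graph from a superpixel label map. It is illustrative only: the mean-colour node attribute, the 4-connected adjacency rule, the exponential edge weight, and the function name `build_arag` are assumptions, and the superpixel labels are taken as given from whatever segmentation method is used.

```python
import numpy as np

def build_arag(labels, image):
    """Build a simple attributed region adjacency graph.

    labels: (H, W) integer superpixel id per pixel.
    image: (H, W, C) float image.
    Returns (nodes, edges): nodes maps superpixel id -> mean colour vector,
    edges maps (u, v) with u < v -> similarity weight in (0, 1].
    """
    # Node attribute (assumed): mean colour of the superpixel.
    nodes = {int(sp): image[labels == sp].mean(axis=0) for sp in np.unique(labels)}

    pairs = set()
    # Compare each pixel with its right neighbour and its lower neighbour.
    for a, b in ((labels[:, :-1], labels[:, 1:]), (labels[:-1, :], labels[1:, :])):
        boundary = a != b
        for u, v in zip(a[boundary].tolist(), b[boundary].tolist()):
            pairs.add((min(u, v), max(u, v)))

    # Edge weight (assumed): similarity decaying with colour difference.
    edges = {
        (u, v): float(np.exp(-np.linalg.norm(nodes[u] - nodes[v])))
        for u, v in pairs
    }
    return nodes, edges
```

A graph in this form is what a merging step such as Normalized Cut would then operate on, grouping strongly connected neighbouring superpixels into larger regions.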
|Date of Award||5 Jul 2018|
- University of Nottingham
|Supervisor||Guoping Qiu (Supervisor) & Jon Garibaldi (Supervisor)|
- Person Re-identification
- People counting
- Deep learning
- Random forest