Self-taught learning of a deep invariant representation for visual tracking via temporal slowness principle

Jason Kuen, Kian Ming Lim, Chin Poo Lee

Research output: Journal Publication, Article, peer-review

38 Citations (Scopus)

Abstract

Visual representation is crucial to the performance of visual tracking methods. Conventionally, the visual representations adopted in visual tracking rely on hand-crafted computer vision descriptors. These descriptors were developed generically, without considering tracking-specific information. In this paper, we propose to learn complex-valued invariant representations from tracked sequential image patches, via a strong temporal slowness constraint and stacked convolutional autoencoders. The deep slow local representations are learned offline on unlabeled data and transferred to the observational model of our proposed tracker. The proposed observational model retains old training samples to alleviate drift, and collects negative samples that are coherent with the target's motion pattern for better discriminative tracking. With the learned representation and online training samples, a logistic regression classifier is adopted to distinguish the target from the background, and is retrained online to adapt to appearance changes. Subsequently, the observational model is integrated into a particle filter framework to perform visual tracking. Experimental results on various challenging benchmark sequences demonstrate that the proposed tracker performs favorably against several state-of-the-art trackers.
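
As a rough illustration of the ideas named in the abstract, the following sketch shows a temporal slowness penalty, a logistic regression score, and one particle filter resampling step. It is not the authors' implementation: the abstract does not give the loss, so the L1 form of the penalty is an assumption, real-valued features stand in for the paper's complex-valued ones, and all function names and the `motion_std` parameter are hypothetical.

```python
import numpy as np


def temporal_slowness_penalty(features):
    """Mean L1 distance between features of consecutive tracked patches.

    features: array of shape (T, D), one row per patch in temporal order.
    A small value means the representation changes slowly over time.
    The L1 form is an assumption, not taken from the paper.
    """
    features = np.asarray(features, dtype=float)
    if features.shape[0] < 2:
        return 0.0
    diffs = features[1:] - features[:-1]
    return float(np.abs(diffs).sum(axis=1).mean())


def logistic_score(features, weights, bias):
    """Probability that each feature vector belongs to the target."""
    z = np.asarray(features, dtype=float) @ weights + bias
    return 1.0 / (1.0 + np.exp(-z))


def particle_filter_step(particles, scores, rng, motion_std=2.0):
    """Resample particles in proportion to their scores, then diffuse them.

    particles: array of shape (N, S) holding N candidate states.
    scores: array of shape (N,) with positive observation scores.
    motion_std is a hypothetical Gaussian random-walk motion parameter.
    """
    probs = scores / scores.sum()
    idx = rng.choice(len(particles), size=len(particles), p=probs)
    noise = rng.normal(0.0, motion_std, size=particles.shape)
    return particles[idx] + noise
```

In a tracker of this kind, `logistic_score` would play the role of the observational model that weights each particle, while the slowness penalty would only be used during offline representation learning.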

Original language: English
Pages (from-to): 2964-2982
Number of pages: 19
Journal: Pattern Recognition
Volume: 48
Issue number: 10
Publication status: Published - 1 Oct 2015
Externally published: Yes

Keywords

  • Deep learning
  • Invariant representation
  • Self-taught learning
  • Temporal slowness
  • Visual tracking

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Computer Vision and Pattern Recognition
  • Artificial Intelligence
