Self-taught learning of a deep invariant representation for visual tracking via temporal slowness principle

Jason Kuen; Kian Ming Lim; Chin Poo Lee

doi:10.1016/j.patcog.2015.02.012

Research output: Journal Publication › Article › peer-review

40 Citations (Scopus)

Abstract

Visual representation is crucial to the performance of visual tracking methods. Conventionally, the visual representations adopted in visual tracking rely on hand-crafted computer vision descriptors. These descriptors were developed generically, without considering tracking-specific information. In this paper, we propose to learn complex-valued invariant representations from tracked sequential image patches via a strong temporal slowness constraint and stacked convolutional autoencoders. The deep slow local representations are learned offline on unlabeled data and transferred to the observational model of our proposed tracker. The proposed observational model retains old training samples to alleviate drift and collects negative samples that are coherent with the target's motion pattern for better discriminative tracking. With the learned representation and online training samples, a logistic regression classifier is adopted to distinguish the target from the background and is retrained online to adapt to appearance changes. Subsequently, the observational model is integrated into a particle filter framework to perform visual tracking. Experimental results on various challenging benchmark sequences demonstrate that the proposed tracker performs favorably against several state-of-the-art trackers.
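
The abstract does not state the exact form of the temporal slowness constraint. As a rough illustration only, the sketch below uses one common formulation, the mean L1 change between the feature vectors of consecutive frames. The function name `temporal_slowness_penalty` and the toy feature values are hypothetical and are not taken from the paper, which works with complex-valued representations learned by stacked convolutional autoencoders.

```python
from typing import Sequence


def temporal_slowness_penalty(features: Sequence[Sequence[float]]) -> float:
    """Mean L1 change between feature vectors of consecutive frames.

    Assumption: this is a generic slowness penalty, not the paper's
    exact objective. Lower values mean the representation changes
    more slowly over time.
    """
    if len(features) < 2:
        # A single frame (or none) has no temporal change to penalize.
        return 0.0
    total = 0.0
    for prev, curr in zip(features, features[1:]):
        if len(prev) != len(curr):
            raise ValueError("all feature vectors must have the same length")
        # L1 distance between the features of two adjacent frames.
        total += sum(abs(c - p) for p, c in zip(prev, curr))
    return total / (len(features) - 1)


# Hypothetical feature sequences for three consecutive frames.
slow = [[1.0, 2.0], [1.0, 2.1], [1.1, 2.1]]
fast = [[1.0, 2.0], [3.0, 0.0], [0.0, 4.0]]

print(temporal_slowness_penalty(slow) < temporal_slowness_penalty(fast))
```

Under this assumed penalty, a representation that varies little across adjacent frames of a tracked patch scores lower than one that jumps between frames, which is the behavior a slowness constraint is meant to encourage.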

Original language	English
Pages (from-to)	2964-2982
Number of pages	19
Journal	Pattern Recognition
Volume	48
Issue number	10
DOIs	https://doi.org/10.1016/j.patcog.2015.02.012
Publication status	Published - 1 Oct 2015
Externally published	Yes

Keywords

Deep learning
Invariant representation
Self-taught learning
Temporal slowness
Visual tracking

ASJC Scopus subject areas

Software
Signal Processing
Computer Vision and Pattern Recognition
Artificial Intelligence

Cite this

@article{3a999c3a575649dfb52f787d65fa4ce1,
  title = "Self-taught learning of a deep invariant representation for visual tracking via temporal slowness principle",
  abstract = "Visual representation is crucial for visual tracking method's performances. Conventionally, visual representations adopted in visual tracking rely on hand-crafted computer vision descriptors. These descriptors were developed generically without considering tracking-specific information. In this paper, we propose to learn complex-valued invariant representations from tracked sequential image patches, via strong temporal slowness constraint and stacked convolutional autoencoders. The deep slow local representations are learned offline on unlabeled data and transferred to the observational model of our proposed tracker. The proposed observational model retains old training samples to alleviate drift, and collect negative samples which are coherent with target's motion pattern for better discriminative tracking. With the learned representation and online training samples, a logistic regression classifier is adopted to distinguish target from background, and retrained online to adapt to appearance changes. Subsequently, the observational model is integrated into a particle filter framework to perform visual tracking. Experimental results on various challenging benchmark sequences demonstrate that the proposed tracker performs favorably against several state-of-the-art trackers.",
  keywords = "Deep learning, Invariant representation, Self-taught learning, Temporal slowness, Visual tracking",
  author = "Kuen, Jason and Lim, {Kian Ming} and Lee, {Chin Poo}",
  year = "2015",
  month = oct,
  day = "1",
  doi = "10.1016/j.patcog.2015.02.012",
  language = "English",
  volume = "48",
  pages = "2964--2982",
  journal = "Pattern Recognition",
  issn = "0031-3203",
  publisher = "Elsevier Ltd.",
  number = "10",
}

TY - JOUR
T1 - Self-taught learning of a deep invariant representation for visual tracking via temporal slowness principle
AU - Kuen, Jason
AU - Lim, Kian Ming
AU - Lee, Chin Poo
PY - 2015/10/1
Y1 - 2015/10/1
N2 - Visual representation is crucial for visual tracking method's performances. Conventionally, visual representations adopted in visual tracking rely on hand-crafted computer vision descriptors. These descriptors were developed generically without considering tracking-specific information. In this paper, we propose to learn complex-valued invariant representations from tracked sequential image patches, via strong temporal slowness constraint and stacked convolutional autoencoders. The deep slow local representations are learned offline on unlabeled data and transferred to the observational model of our proposed tracker. The proposed observational model retains old training samples to alleviate drift, and collect negative samples which are coherent with target's motion pattern for better discriminative tracking. With the learned representation and online training samples, a logistic regression classifier is adopted to distinguish target from background, and retrained online to adapt to appearance changes. Subsequently, the observational model is integrated into a particle filter framework to perform visual tracking. Experimental results on various challenging benchmark sequences demonstrate that the proposed tracker performs favorably against several state-of-the-art trackers.

KW - Deep learning
KW - Invariant representation
KW - Self-taught learning
KW - Temporal slowness
KW - Visual tracking
UR - http://www.scopus.com/inward/record.url?scp=84931572897&partnerID=8YFLogxK
U2 - 10.1016/j.patcog.2015.02.012
DO - 10.1016/j.patcog.2015.02.012
M3 - Article
AN - SCOPUS:84931572897
SN - 0031-3203
VL - 48
SP - 2964
EP - 2982
JO - Pattern Recognition
JF - Pattern Recognition
IS - 10
ER -
