Novelty Detection and Online Learning for Chunk Data Streams

Yi Wang; Yi Ding; Xiangjian He; Xin Fan; Chi Lin; Fengqi Li; Tianzhu Wang; Zhongxuan Luo; Jiebo Luo

doi:10.1109/TPAMI.2020.2965531

Research output: Journal Publication › Article › peer-review

18 Citations (Scopus)

Abstract

Data stream analysis aims at extracting discriminative information for classification from continuously incoming samples. It is extremely challenging to detect novel data while incrementally updating the model efficiently and stably, especially for high-dimensional and/or large-scale data streams. This paper proposes an efficient framework for novelty detection and incremental learning for unlabeled chunk data streams. First, an accurate factorization-free kernel discriminative analysis (FKDA-X) is put forward through solving a linear system in the kernel space. FKDA-X produces a Reproducing Kernel Hilbert Space (RKHS), in which unlabeled chunk data can be detected and classified by multiple known classes in a single decision model with a deterministic classification boundary. Moreover, based on FKDA-X, two optimal methods, FKDA-CX and FKDA-C, are proposed. FKDA-CX uses the micro-cluster centers of the original data as its input to achieve excellent performance in novelty detection. FKDA-C and incremental FKDA-C (IFKDA-C), which use the class centers of the original data as their input, are extremely fast in online learning. Theoretical analysis and experimental validation on under-sampled and large-scale real-world datasets demonstrate that the proposed algorithms make it possible to learn unlabeled chunk data streams with significantly lower computational costs than, and accuracies comparable to, the state-of-the-art approaches.
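
The general idea in the abstract (solve a linear system in the kernel space, then detect novel samples and classify known ones in a single decision model) can be illustrated with a rough sketch. This is not the paper's FKDA-X: it substitutes plain regularized kernel least squares with one-hot targets as a stand-in, and flags a sample as novel when its projection lies far from every known class center. All names (`rbf_kernel`, `fit_kernel_model`, `predict_chunk`) and parameters (`gamma`, `reg`, `threshold`) are hypothetical choices for illustration only.

```python
import numpy as np


def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def fit_kernel_model(X, y, gamma=1.0, reg=1e-2):
    """Stand-in model (assumption): regularized kernel least squares on one-hot targets."""
    classes = np.unique(y)
    Y = (y[:, None] == classes[None, :]).astype(float)  # one-hot class indicators
    K = rbf_kernel(X, X, gamma)
    # Solve the regularized linear system (K + reg * I) alpha = Y in the kernel space.
    alpha = np.linalg.solve(K + reg * np.eye(len(X)), Y)
    Z = K @ alpha  # projections of the training samples
    centers = np.stack([Z[y == c].mean(axis=0) for c in classes])
    return {"X": X, "alpha": alpha, "classes": classes,
            "centers": centers, "gamma": gamma}


def predict_chunk(model, chunk, threshold):
    """Label each sample of an unlabeled chunk and flag the ones far from all known classes."""
    Z = rbf_kernel(chunk, model["X"], model["gamma"]) @ model["alpha"]
    # Distance from every projected sample to every known class center.
    dist = np.linalg.norm(Z[:, None, :] - model["centers"][None, :, :], axis=2)
    labels = model["classes"][dist.argmin(axis=1)]
    is_novel = dist.min(axis=1) > threshold  # hypothetical fixed distance threshold
    return labels, is_novel
```

In this sketch, a label returned for a sample flagged as novel is only the nearest known class and should be ignored. The fixed `threshold` is a simplification; how the paper derives its decision boundary is not described in the abstract.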

Original language	English
Article number	8955936
Pages (from-to)	2400-2412
Number of pages	13
Journal	IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume	43
Issue number	7
DOIs	https://doi.org/10.1109/TPAMI.2020.2965531
Publication status	Published - 1 Jul 2021
Externally published	Yes

Keywords

Data stream
feature selection
novelty detection
online learning

ASJC Scopus subject areas

Software
Computer Vision and Pattern Recognition
Computational Theory and Mathematics
Artificial Intelligence
Applied Mathematics

Access to Document

10.1109/TPAMI.2020.2965531

Cite this

@article{4920096d11d44465ac91a41151fc46f9,

title = "Novelty Detection and Online Learning for Chunk Data Streams",

abstract = "Data stream analysis aims at extracting discriminative information for classification from continuously incoming samples. It is extremely challenging to detect novel data while incrementally updating the model efficiently and stably, especially for high-dimensional and/or large-scale data streams. This paper proposes an efficient framework for novelty detection and incremental learning for unlabeled chunk data streams. First, an accurate factorization-free kernel discriminative analysis (FKDA-X) is put forward through solving a linear system in the kernel space. FKDA-X produces a Reproducing Kernel Hilbert Space (RKHS), in which unlabeled chunk data can be detected and classified by multiple known classes in a single decision model with a deterministic classification boundary. Moreover, based on FKDA-X, two optimal methods, FKDA-CX and FKDA-C, are proposed. FKDA-CX uses the micro-cluster centers of the original data as its input to achieve excellent performance in novelty detection. FKDA-C and incremental FKDA-C (IFKDA-C), which use the class centers of the original data as their input, are extremely fast in online learning. Theoretical analysis and experimental validation on under-sampled and large-scale real-world datasets demonstrate that the proposed algorithms make it possible to learn unlabeled chunk data streams with significantly lower computational costs than, and accuracies comparable to, the state-of-the-art approaches.",

keywords = "Data stream, feature selection, novelty detection, online learning",

author = "Yi Wang and Yi Ding and Xiangjian He and Xin Fan and Chi Lin and Fengqi Li and Tianzhu Wang and Zhongxuan Luo and Jiebo Luo",

note = "Publisher Copyright: {\textcopyright} 1979-2012 IEEE.",

year = "2021",

month = jul,

day = "1",

doi = "10.1109/TPAMI.2020.2965531",

language = "English",

volume = "43",

pages = "2400--2412",

journal = "IEEE Transactions on Pattern Analysis and Machine Intelligence",

issn = "0162-8828",

publisher = "IEEE Computer Society",

number = "7",

}

TY - JOUR

T1 - Novelty Detection and Online Learning for Chunk Data Streams

AU - Wang, Yi

AU - Ding, Yi

AU - He, Xiangjian

AU - Fan, Xin

AU - Lin, Chi

AU - Li, Fengqi

AU - Wang, Tianzhu

AU - Luo, Zhongxuan

AU - Luo, Jiebo

PY - 2021/7/1

Y1 - 2021/7/1

N2 - Data stream analysis aims at extracting discriminative information for classification from continuously incoming samples. It is extremely challenging to detect novel data while incrementally updating the model efficiently and stably, especially for high-dimensional and/or large-scale data streams. This paper proposes an efficient framework for novelty detection and incremental learning for unlabeled chunk data streams. First, an accurate factorization-free kernel discriminative analysis (FKDA-X) is put forward through solving a linear system in the kernel space. FKDA-X produces a Reproducing Kernel Hilbert Space (RKHS), in which unlabeled chunk data can be detected and classified by multiple known classes in a single decision model with a deterministic classification boundary. Moreover, based on FKDA-X, two optimal methods, FKDA-CX and FKDA-C, are proposed. FKDA-CX uses the micro-cluster centers of the original data as its input to achieve excellent performance in novelty detection. FKDA-C and incremental FKDA-C (IFKDA-C), which use the class centers of the original data as their input, are extremely fast in online learning. Theoretical analysis and experimental validation on under-sampled and large-scale real-world datasets demonstrate that the proposed algorithms make it possible to learn unlabeled chunk data streams with significantly lower computational costs than, and accuracies comparable to, the state-of-the-art approaches.

AB - Data stream analysis aims at extracting discriminative information for classification from continuously incoming samples. It is extremely challenging to detect novel data while incrementally updating the model efficiently and stably, especially for high-dimensional and/or large-scale data streams. This paper proposes an efficient framework for novelty detection and incremental learning for unlabeled chunk data streams. First, an accurate factorization-free kernel discriminative analysis (FKDA-X) is put forward through solving a linear system in the kernel space. FKDA-X produces a Reproducing Kernel Hilbert Space (RKHS), in which unlabeled chunk data can be detected and classified by multiple known classes in a single decision model with a deterministic classification boundary. Moreover, based on FKDA-X, two optimal methods, FKDA-CX and FKDA-C, are proposed. FKDA-CX uses the micro-cluster centers of the original data as its input to achieve excellent performance in novelty detection. FKDA-C and incremental FKDA-C (IFKDA-C), which use the class centers of the original data as their input, are extremely fast in online learning. Theoretical analysis and experimental validation on under-sampled and large-scale real-world datasets demonstrate that the proposed algorithms make it possible to learn unlabeled chunk data streams with significantly lower computational costs than, and accuracies comparable to, the state-of-the-art approaches.

KW - Data stream

KW - feature selection

KW - novelty detection

KW - online learning

UR - http://www.scopus.com/inward/record.url?scp=85108023969&partnerID=8YFLogxK

U2 - 10.1109/TPAMI.2020.2965531

DO - 10.1109/TPAMI.2020.2965531

M3 - Article

C2 - 31940520

AN - SCOPUS:85108023969

SN - 0162-8828

VL - 43

SP - 2400

EP - 2412

JO - IEEE Transactions on Pattern Analysis and Machine Intelligence

JF - IEEE Transactions on Pattern Analysis and Machine Intelligence

IS - 7

M1 - 8955936

ER -
