An improved system for sentence-level novelty detection in textual streams

Xinyu Fu, Eugene Ch'Ng, Uwe Aickelin, Lanyun Zhang

Research output: Chapter in Book/Conference proceeding > Conference contribution > peer-review

2 Citations (Scopus)

Abstract

Novelty detection in news events has long been a difficult problem. A number of models have performed well on specific data streams, but certain issues remain far from solved, particularly in large data streams from the WWW, where the unpredictability of new terms requires adaptation of the vector space model. We present a novel event detection system based on incremental Term Frequency-Inverse Document Frequency (TF-IDF) weighting combined with Locality Sensitive Hashing (LSH). The system adapts efficiently and effectively to new terms appearing in the data stream by continually updating the vector space model. In terms of miss probability, the proposed novelty detection framework outperforms a recognised baseline system by approximately 16% when evaluated on a benchmark dataset from Google News.
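The combination described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the LSH family, the IDF formula, or the scoring rule, so the random-hyperplane hashing, the smoothed IDF `log(1 + N/df)`, and every name below (`IncrementalTfidf`, `HyperplaneLSH`, `novelty_scores`) are assumptions chosen for illustration. The sketch shows how document-frequency counts and hash projections can both grow lazily as unseen terms arrive, and how a sentence might be scored as novel when no similar sentence shares its hash bucket.

```python
import math
import random
from collections import Counter, defaultdict


class IncrementalTfidf:
    """Document-frequency counts that are updated as each sentence arrives."""

    def __init__(self):
        self.n_docs = 0
        self.df = Counter()

    def update_and_weight(self, tokens):
        # Update the statistics first, so every term in this sentence has df >= 1.
        self.n_docs += 1
        self.df.update(set(tokens))
        tf = Counter(tokens)
        # Assumed smoothed IDF; the abstract does not give the exact formula.
        vec = {t: c * math.log(1.0 + self.n_docs / self.df[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        return {t: w / norm for t, w in vec.items()}


class HyperplaneLSH:
    """Random-hyperplane LSH over sparse vectors with an open vocabulary."""

    def __init__(self, n_bits=4, seed=0):
        self.n_bits = n_bits
        self.rng = random.Random(seed)
        self.planes = {}                  # term -> one Gaussian component per bit
        self.buckets = defaultdict(list)  # signature -> vectors seen so far

    def signature(self, vec):
        sums = [0.0] * self.n_bits
        for term, weight in vec.items():
            comps = self.planes.get(term)
            if comps is None:
                # A new term extends every hyperplane by one lazily drawn component.
                comps = [self.rng.gauss(0.0, 1.0) for _ in range(self.n_bits)]
                self.planes[term] = comps
            for i, c in enumerate(comps):
                sums[i] += weight * c
        return tuple(s >= 0.0 for s in sums)


def cosine(a, b):
    # Both inputs are L2-normalised sparse dicts, so the dot product is the cosine.
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def novelty_scores(stream, n_bits=4, seed=0):
    """Return one score per sentence: 1.0 means nothing similar was found."""
    tfidf, lsh = IncrementalTfidf(), HyperplaneLSH(n_bits, seed)
    scores = []
    for text in stream:
        vec = tfidf.update_and_weight(text.lower().split())
        key = lsh.signature(vec)
        # Only sentences in the same bucket are compared (approximate search).
        nearest = max((cosine(vec, old) for old in lsh.buckets[key]), default=0.0)
        scores.append(max(0.0, 1.0 - nearest))
        # Simplification: stored vectors keep the IDF weights of their arrival time.
        lsh.buckets[key].append(vec)
    return scores
```

A score could then be compared against a threshold to flag a sentence as a first story. A single hash table is used here for brevity, so similar sentences that fall into different buckets are missed. LSH-based systems commonly use several tables to reduce such misses, which is relevant to the miss probability metric mentioned in the abstract.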

Original language: English
Title of host publication: IET Conference Publications
Publisher: Institution of Engineering and Technology
Pages: 1-6
Number of pages: 6
Edition: CP672
ISBN (Electronic): 9781785610325
Publication status: Published - 2015
Event: 2015 International Conference on Smart and Sustainable City and Big Data, ICSSC 2015 - Shanghai, China
Duration: 26 Jul 2015 → 27 Jul 2015

Publication series

Name: IET Conference Publications
Number: CP672
Volume: 2015

Conference

Conference: 2015 International Conference on Smart and Sustainable City and Big Data, ICSSC 2015
Country/Territory: China
City: Shanghai
Period: 26/07/15 → 27/07/15

Keywords

  • Big data
  • First story detection
  • Locality sensitive hashing
  • Novelty detection
  • Text mining

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
