Exploring the influence of sampling on pattern support distribution

Xu Luofeng, Marsland Stephen, Wang Ruili

Research output: Chapter in Book/Conference proceeding › Conference contribution › peer-review

Abstract

Identifying the pattern support distribution (PSD) of a dataset is useful for many data mining tasks, such as market basket analysis. The support of a pattern is the frequency of its occurrence in a dataset. Calculating the distribution of these supports over an entire dataset is computationally expensive; this cost can be reduced by sampling from the dataset and computing the PSD on a relatively small sample. However, sampling may miscount patterns and cause significant changes in the identified distribution. Because the PSD shows a power-law relationship, in this paper we investigate the influence of sampling on the characteristics of that relationship. We consider the effect of sampling on this relationship under two assumptions: a uniform distribution of pattern supports, and independent, identically distributed (i.i.d.) distributions. We experimentally evaluate this influence on four real-world transaction datasets.
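
As a rough illustration of the quantities the abstract describes, the sketch below counts pattern supports on a toy transaction list, builds the PSD (how many patterns share each support value), and repeats the computation on a uniform random sample. It is not the authors' method: the function names, the toy data, the restriction to itemsets of at most two items, and the 50% sampling fraction are all assumptions made for this example.

```python
import random
from collections import Counter
from itertools import combinations


def pattern_supports(transactions, max_len=2):
    """Count how many transactions contain each itemset of up to max_len items."""
    supports = Counter()
    for t in transactions:
        items = sorted(set(t))  # canonical order so each itemset has one key
        for k in range(1, max_len + 1):
            supports.update(combinations(items, k))
    return supports


def support_distribution(supports):
    """Map each support value to the number of patterns with that support."""
    return Counter(supports.values())


def sample_transactions(transactions, fraction, seed=0):
    """Draw a uniform random sample of transactions without replacement."""
    rng = random.Random(seed)
    n = max(1, round(fraction * len(transactions)))
    return rng.sample(transactions, n)


# Toy data, invented for illustration only (hypothetical, not from the paper).
transactions = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
    ["bread", "milk", "eggs"],
    ["eggs"],
]

full_psd = support_distribution(pattern_supports(transactions))
sample = sample_transactions(transactions, fraction=0.5)
sample_psd = support_distribution(pattern_supports(sample))

print("full PSD:  ", sorted(full_psd.items()))
print("sample PSD:", sorted(sample_psd.items()))
```

Comparing the two printed distributions shows the kind of distortion the abstract refers to: patterns missing from the sample drop out of the PSD entirely, and the remaining supports are counted over fewer transactions.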

Original language: English
Title of host publication: Proceedings - 8th IEEE International Conference on Computer and Information Technology Workshops, CIT Workshops 2008
Pages: 66-71
Number of pages: 6
Publication status: Published - 2008
Externally published: Yes
Event: 8th IEEE International Conference on Computer and Information Technology Workshops, CIT Workshops 2008 - Sydney, Australia
Duration: 8 Jul 2008 to 11 Jul 2008

ASJC Scopus subject areas

  • General Computer Science
  • Computer Graphics and Computer-Aided Design
  • Information Systems
  • Electrical and Electronic Engineering
