Succinct representations in collaborative filtering: A case study using wavelet tree on 1,000 cores

Xiangjun Peng, Qingfeng Wang, Xu Sun, Chunye Gong, Yaohua Wang

Research output: Chapter in Book/Conference proceeding › Conference contribution › peer-review


Abstract

The User-Item (U-I) matrix has been the dominant data infrastructure of Collaborative Filtering (CF). To reduce the runtime and storage space consumption caused by data sparsity and by the growing need to accommodate side information in CF design, one needs to go beyond the U-I matrix. In this paper, we present a case study of succinct representations in Collaborative Filtering as an alternative to the U-I matrix. Our key insight is to introduce succinct data structures as a new infrastructure for CF. To this end, we implemented a user-based K-Nearest-Neighbor (KNN) CF prototype on a wavelet tree. First, we designed Accessible Compressed Documents (ACD), a scheme that compresses U-I data in the wavelet tree and is efficient in both storage and runtime. Then, we showed that ACD's characteristics can be exploited to build an efficient intersection algorithm that operates without decompression. We evaluated our design on 1,000 cores of the Tianhe-II supercomputer using ml-20m, one of the largest public data sets. The results showed that our prototype delivered results in 3.7 minutes on average.
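The abstract does not spell out how ACD lays out data or how its intersection algorithm works. As a hedged illustration only, the sketch below shows one standard way a wavelet tree can answer the question a user-based KNN needs ("which items do users u and v share?") directly on the encoded sequence: concatenate each user's item IDs into one integer sequence, build a wavelet tree over it, and intersect two users' position ranges by descending both ranges at once (the usual wavelet-tree range-intersection technique). All names (`WaveletTree`, `build`, `common_items`, `knn`) are hypothetical, cosine similarity on binary interactions is an assumed choice, and the bitvectors are plain prefix-count lists rather than succinct rank structures, so this demonstrates the query logic, not the paper's ACD format or its space savings.

```python
from typing import Iterator, List, Optional, Sequence, Tuple


class WaveletTree:
    """Pointer-based wavelet tree over integers in [lo, hi).

    Assumption: each internal node stores `ones`, where ones[i] is the number
    of 1-bits (symbol >= mid) before position i. This gives O(1) rank but is
    NOT succinct; a real implementation would use compressed rank bitvectors.
    """

    def __init__(self, seq: Sequence[int], lo: int, hi: int) -> None:
        self.lo, self.hi = lo, hi
        self.left: Optional["WaveletTree"] = None
        self.right: Optional["WaveletTree"] = None
        self.ones: List[int] = [0]
        if hi - lo <= 1 or not seq:
            return  # leaf, or empty subtree that no query can reach
        mid = (lo + hi) // 2
        for x in seq:
            self.ones.append(self.ones[-1] + (1 if x >= mid else 0))
        self.left = WaveletTree([x for x in seq if x < mid], lo, mid)
        self.right = WaveletTree([x for x in seq if x >= mid], mid, hi)

    def access(self, i: int) -> int:
        """Return the symbol at position i without storing the raw sequence."""
        node = self
        while node.hi - node.lo > 1:
            if node.ones[i + 1] - node.ones[i]:  # bit is 1: go right
                i = node.ones[i]
                node = node.right
            else:                                # bit is 0: go left
                i = i - node.ones[i]
                node = node.left
        return node.lo

    def intersect(self, l1: int, r1: int, l2: int, r2: int) -> Iterator[int]:
        """Yield, in ascending order, symbols occurring in both [l1, r1) and [l2, r2)."""
        if l1 >= r1 or l2 >= r2:
            return  # prune: one range is empty in this subtree
        if self.hi - self.lo == 1:
            yield self.lo
            return
        o = self.ones
        # Map each range into the left child (rank0) and right child (rank1).
        yield from self.left.intersect(l1 - o[l1], r1 - o[r1], l2 - o[l2], r2 - o[r2])
        yield from self.right.intersect(o[l1], o[r1], o[l2], o[r2])


def build(users: List[List[int]], n_items: int) -> Tuple[WaveletTree, List[int]]:
    """Concatenate each user's distinct item IDs; starts[u] marks user u's range."""
    seq: List[int] = []
    starts = [0]
    for items in users:
        seq.extend(sorted(items))
        starts.append(len(seq))
    return WaveletTree(seq, 0, n_items), starts


def common_items(wt: WaveletTree, starts: List[int], u: int, v: int) -> List[int]:
    return list(wt.intersect(starts[u], starts[u + 1], starts[v], starts[v + 1]))


def knn(wt: WaveletTree, starts: List[int], u: int, k: int) -> List[int]:
    """Top-k neighbours of u by cosine similarity on binary interactions (assumed)."""
    size = lambda w: starts[w + 1] - starts[w]
    scores = []
    for v in range(len(starts) - 1):
        if v == u:
            continue
        denom = (size(u) * size(v)) ** 0.5
        sim = len(common_items(wt, starts, u, v)) / denom if denom else 0.0
        scores.append((sim, v))
    scores.sort(key=lambda s: (-s[0], s[1]))  # highest similarity, then lowest id
    return [v for _, v in scores[:k]]
```

For example, with `wt, starts = build([[1, 3, 5], [3, 5, 7], [0, 2], [1, 5, 6]], 8)`, `common_items(wt, starts, 0, 1)` returns `[3, 5]` and `knn(wt, starts, 0, 2)` returns `[1, 3]`. Because the left subtree is visited before the right, shared items come out in ascending order, and a whole subtree is skipped as soon as either user's range becomes empty, which is what lets the intersection run without decompressing either user's list.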

Original language: English
Title of host publication: Proceedings - 2019 20th International Conference on Parallel and Distributed Computing, Applications and Technologies, PDCAT 2019
Editors: Hui Tian, Hong Shen, Wee Lum Tan
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 427-432
Number of pages: 6
ISBN (Electronic): 9781728126166
Publication status: Published - Dec 2019
Event: 20th International Conference on Parallel and Distributed Computing, Applications and Technologies, PDCAT 2019 - Gold Coast, Australia
Duration: 5 Dec 2019 to 7 Dec 2019

Publication series

Name: Proceedings - 2019 20th International Conference on Parallel and Distributed Computing, Applications and Technologies, PDCAT 2019

Conference

Conference: 20th International Conference on Parallel and Distributed Computing, Applications and Technologies, PDCAT 2019
Country/Territory: Australia
City: Gold Coast
Period: 5/12/19 to 7/12/19

Keywords

  • Collaborative filtering
  • Succinct data structures
  • Supercomputing

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications
  • Hardware and Architecture
