TY - GEN
T1 - Succinct representations in collaborative filtering
T2 - 20th International Conference on Parallel and Distributed Computing, Applications and Technologies, PDCAT 2019
AU - Peng, Xiangjun
AU - Wang, Qingfeng
AU - Sun, Xu
AU - Gong, Chunye
AU - Wang, Yaohua
N1 - Publisher Copyright:
© 2019 IEEE.
PY - 2019/12
Y1 - 2019/12
N2 - User-Item (U-I) matrix has been used as the dominant data infrastructure of Collaborative Filtering (CF). To reduce space consumption in runtime and storage, caused by data sparsity and growing need to accommodate side information in CF design, one needs to go beyond the U-I Matrix. In this paper, we took a case study of Succinct Representations in Collaborative Filtering, rather than using a U-I Matrix. Our key insight is to introduce Succinct Data Structures as a new infrastructure of CF. Towards this, we implemented a User-based K-Nearest-Neighbor CF prototype via Wavelet Tree, by first designing a Accessible Compressed Documents (ACD) to compress U-I data in Wavelet Tree, which is efficient in both storage and runtime. Then, we showed that ACD can be applied to develop an efficient intersection algorithm without decompression, by taking advantage of ACD's characteristics. We evaluated our design on 1,000 cores of Tianhe-II supercomputer, with one of the largest public data set ml-20m. The results showed that our prototype could achieve 3.7 minutes on average to deliver the results.
AB - User-Item (U-I) matrix has been used as the dominant data infrastructure of Collaborative Filtering (CF). To reduce space consumption in runtime and storage, caused by data sparsity and growing need to accommodate side information in CF design, one needs to go beyond the U-I Matrix. In this paper, we took a case study of Succinct Representations in Collaborative Filtering, rather than using a U-I Matrix. Our key insight is to introduce Succinct Data Structures as a new infrastructure of CF. Towards this, we implemented a User-based K-Nearest-Neighbor CF prototype via Wavelet Tree, by first designing a Accessible Compressed Documents (ACD) to compress U-I data in Wavelet Tree, which is efficient in both storage and runtime. Then, we showed that ACD can be applied to develop an efficient intersection algorithm without decompression, by taking advantage of ACD's characteristics. We evaluated our design on 1,000 cores of Tianhe-II supercomputer, with one of the largest public data set ml-20m. The results showed that our prototype could achieve 3.7 minutes on average to deliver the results.
KW - Collaborative filtering
KW - Succinct data structures
KW - Supercomputing
UR - http://www.scopus.com/inward/record.url?scp=85083157730&partnerID=8YFLogxK
U2 - 10.1109/PDCAT46702.2019.00083
DO - 10.1109/PDCAT46702.2019.00083
M3 - Conference contribution
AN - SCOPUS:85083157730
T3 - Proceedings - 2019 20th International Conference on Parallel and Distributed Computing, Applications and Technologies, PDCAT 2019
SP - 427
EP - 432
BT - Proceedings - 2019 20th International Conference on Parallel and Distributed Computing, Applications and Technologies, PDCAT 2019
A2 - Tian, Hui
A2 - Shen, Hong
A2 - Tan, Wee Lum
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 5 December 2019 through 7 December 2019
ER -