Cross-modal Short Video Recommendation

Zhitao Wan, Yuanwei Xu, Miao Yang, Xiuping Hua

Research output: Chapter in Book/Conference proceeding > Conference contribution > peer-review

Abstract

With the rapid development of short video platforms, providing accurate short video recommendations for users has become increasingly important. However, due to the multimodal nature of short videos, effectively utilizing this information to improve recommendation quality remains challenging. This paper proposes a cross-modal short video recommendation method that comprehensively utilizes text, image, and audio information. The method involves multimodal processing of short videos, including segmentation of text, image, and audio, and multimodal alignment. It extracts multimodal features of short videos and fuses these features into a unified short video representation. Recommendations are made based on the similarity between user profiles created from their text browsing history and the short video representations. Finally, a cultural short video recommendation experiment based on users' text reading history is presented. Experimental results show that the cross-modal feature-based recommendation can effectively improve the personalized recommendation accuracy for users in other modalities, especially for cold start users. The proposed method outperforms existing methods in terms of accuracy, recall, and other metrics.
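
The pipeline described in the abstract (per-modality features, fusion into one video representation, similarity against a text-based user profile) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that text, image, and audio features have already been extracted and aligned into a shared embedding space of equal dimension, and it swaps in a simple weighted average for fusion and cosine similarity for ranking. All function names and parameters below (`fuse_modalities`, `build_user_profile`, `recommend`, `weights`, `k`) are hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to (approximately) unit length."""
    x = np.asarray(x, dtype=float)
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def fuse_modalities(text_vec, image_vec, audio_vec, weights=(1.0, 1.0, 1.0)):
    """Fuse three aligned modality vectors into one video representation.

    Assumption: all three vectors live in the same d-dimensional space.
    Fusion here is a weighted average of the normalized vectors, which is
    only one of many possible fusion strategies.
    """
    stacked = l2_normalize(np.stack([text_vec, image_vec, audio_vec]))
    w = np.asarray(weights, dtype=float)[:, None]
    return l2_normalize((w * stacked).sum(axis=0) / w.sum())


def build_user_profile(history_text_vecs):
    """Average the embeddings of the texts a user has read (assumed design)."""
    return l2_normalize(l2_normalize(history_text_vecs).mean(axis=0))


def recommend(user_profile, video_reprs, k=2):
    """Return the k videos most similar to the profile as (index, score)."""
    scores = l2_normalize(video_reprs) @ l2_normalize(user_profile)
    order = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in order]
```

Because the user profile is built only from text history, a sketch like this can score videos for a user who has never watched one, which is the cold-start setting the abstract highlights. The quality of the result depends entirely on how well the modalities are aligned, which this sketch takes as given.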

Original language: English
Title of host publication: Proceedings - 2023 International Conference on Machine Vision, Image Processing and Imaging Technology, MVIPIT 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 117-122
Number of pages: 6
ISBN (Electronic): 9798350306545
Publication status: Published - 2023
Event: 2023 International Conference on Machine Vision, Image Processing and Imaging Technology, MVIPIT 2023 - Hangzhou, China
Duration: 22 Sept 2023 to 24 Sept 2023

Publication series

Name: Proceedings - 2023 International Conference on Machine Vision, Image Processing and Imaging Technology, MVIPIT 2023

Conference

Conference: 2023 International Conference on Machine Vision, Image Processing and Imaging Technology, MVIPIT 2023
Country/Territory: China
City: Hangzhou
Period: 22/09/23 to 24/09/23

Keywords

  • cross-modal
  • multi-modal
  • video recommendation

ASJC Scopus subject areas

  • Computer Science Applications
  • Computer Vision and Pattern Recognition
  • Signal Processing
  • Artificial Intelligence
