TY - GEN
T1 - Cross-modal Short Video Recommendation
AU - Wan, Zhitao
AU - Xu, Yuanwei
AU - Yang, Miao
AU - Hua, Xiuping
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
AB - With the rapid development of short video platforms, providing accurate short video recommendations for users has become increasingly important. However, due to the multimodal nature of short videos, effectively utilizing this information to improve recommendation quality remains challenging. This paper proposes a cross-modal short video recommendation method that comprehensively utilizes text, image, and audio information. The method involves multimodal processing of short videos, including segmentation of text, image, and audio, and multimodal alignment. It extracts multimodal features of short videos and fuses these features into a unified short video representation. Recommendations are made based on the similarity between user profiles created from their text browsing history and the short video representations. Finally, a cultural short video recommendation experiment based on users' text reading history is presented. Experimental results show that the cross-modal feature-based recommendation can effectively improve the personalized recommendation accuracy for users in other modalities, especially for cold-start users. The proposed method outperforms existing methods in terms of accuracy, recall, and other metrics.
KW - cross-modal
KW - multi-modal
KW - video recommendation
UR - http://www.scopus.com/inward/record.url?scp=85198923685&partnerID=8YFLogxK
U2 - 10.1109/MVIPIT60427.2023.00025
DO - 10.1109/MVIPIT60427.2023.00025
M3 - Conference contribution
AN - SCOPUS:85198923685
T3 - Proceedings - 2023 International Conference on Machine Vision, Image Processing and Imaging Technology, MVIPIT 2023
SP - 117
EP - 122
BT - Proceedings - 2023 International Conference on Machine Vision, Image Processing and Imaging Technology, MVIPIT 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2023 International Conference on Machine Vision, Image Processing and Imaging Technology, MVIPIT 2023
Y2 - 22 September 2023 through 24 September 2023
ER -