TencentPretrain: A scalable and flexible toolkit for pre-training models of different modalities

Zhe Zhao, Yudong Li, Cheng Hou, Jing Zhao, Rong Tian, Weijie Liu, Yiren Chen, Ningyuan Sun, Haoyan Liu, Weiquan Mao, Han Guo, Weigang Guo, Taiqiang Wu, Tao Zhu, Wenhang Shi, Chen Chen, Shan Huang, Sihong Chen, Liqun Liu, Feifei Li, Xiaoshuai Chen, Xingwu Sun, Zhanhui Kang, Xiaoyong Du, Linlin Shen, Kimmo Yan

Research output: Chapter in Book/Conference proceeding › Conference contribution › peer-review

2 Citations (Scopus)

Abstract

Recently, the success of pre-training in the text domain has been fully extended to vision, audio, and cross-modal scenarios. The proposed pre-training models of different modalities are showing a rising trend of homogeneity in their model structures, which brings the opportunity to implement different pre-training models within a uniform framework. In this paper, we present TencentPretrain, a toolkit supporting pre-training models of different modalities. The core feature of TencentPretrain is its modular design. The toolkit uniformly divides pre-training models into 5 components: embedding, encoder, target embedding, decoder, and target. As almost all common modules are provided in each component, users can choose the desired modules from different components to build a complete pre-training model. The modular design enables users to efficiently reproduce existing pre-training models or build brand-new ones. We test the toolkit on text, vision, and audio benchmarks and show that it can match the performance of the original implementations.
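The five-component composition described in the abstract can be sketched in plain Python. Note that the class names, registry, and `build_model` function below are illustrative assumptions, not TencentPretrain's actual API; only the component vocabulary (embedding, encoder, target, etc.) comes from the abstract.

```python
# Hypothetical sketch of a modular pre-training toolkit: models are
# assembled by picking one module per component. The registry and
# classes here are invented for illustration; they are NOT the real
# TencentPretrain API.

class Stage:
    """A stand-in for a pluggable module (embedding, encoder, target, ...)."""
    def __init__(self, name):
        self.name = name

    def __call__(self, trace):
        # Record which module the input passed through.
        return trace + [self.name]

# One registry entry per component; each maps a module name to a module.
REGISTRY = {
    "embedding": {"word": Stage("word_embedding")},
    "encoder":   {"transformer": Stage("transformer_encoder")},
    "target":    {"mlm": Stage("mlm_target")},
}

def build_model(config):
    """Assemble a model by selecting one module from each component."""
    stages = [REGISTRY[component][name] for component, name in config.items()]
    def model(inputs):
        for stage in stages:
            inputs = stage(inputs)
        return inputs
    return model

# A BERT-like model: word embedding -> transformer encoder -> MLM target.
bert_like = build_model({"embedding": "word",
                         "encoder": "transformer",
                         "target": "mlm"})
print(bert_like([]))  # ['word_embedding', 'transformer_encoder', 'mlm_target']
```

Swapping the `"target"` entry (say, for a sequence-to-sequence target plus a decoder component) would yield a different pre-training model from the same parts, which is the reuse the modular design enables.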

Original language: English
Title of host publication: System Demonstrations
Publisher: Association for Computational Linguistics (ACL)
Pages: 217-225
Number of pages: 9
ISBN (Electronic): 9781959429708
Publication status: Published - 2023
Externally published: Yes
Event: 61st Annual Meeting of the Association for Computational Linguistics, ACL-DEMO 2023 - Toronto, Canada
Duration: 10 Jul 2023 - 12 Jul 2023

Publication series

Name: Proceedings of the Annual Meeting of the Association for Computational Linguistics
Volume: 3
ISSN (Print): 0736-587X

Conference

Conference: 61st Annual Meeting of the Association for Computational Linguistics, ACL-DEMO 2023
Country/Territory: Canada
City: Toronto
Period: 10/07/23 - 12/07/23

ASJC Scopus subject areas

  • Computer Science Applications
  • Linguistics and Language
  • Language and Linguistics
