Abstract
Self-supervised representation learning (SSRL) has gained increasing attention in point cloud understanding as a way to address the challenges posed by 3D data scarcity and high annotation costs. This paper presents PCExpert, a novel SSRL approach that reinterprets point clouds as “specialized images”. This conceptual shift allows PCExpert to leverage knowledge derived from the large-scale image modality more directly and deeply, by extensively sharing parameters with a pretrained image encoder in a multi-way Transformer architecture. This parameter-sharing strategy, combined with an additional pretext task for pre-training, i.e., transformation estimation, enables PCExpert to outperform the state of the art on a variety of tasks, with a remarkable reduction in the number of trainable parameters. Notably, PCExpert’s performance under LINEAR fine-tuning (e.g., a 90.02% overall accuracy on ScanObjectNN) already closely approximates the results obtained with FULL model fine-tuning (92.66%), demonstrating its effective representation capability.
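To make the two ideas in the abstract concrete, the following is a minimal PyTorch sketch, not the authors’ released implementation: a multi-way Transformer block whose self-attention is shared between the image branch and a point cloud “expert” branch while each modality keeps its own feed-forward network, plus a small transformation-estimation head for the extra pretext task. All module names, dimensions, and the 3-DoF rotation target are illustrative assumptions.

```python
# A minimal sketch of the abstract's two ideas (assumed, not the paper's code):
# (1) a multi-way Transformer block sharing self-attention across modalities,
# (2) a transformation-estimation pretext head for the point cloud branch.
import torch
import torch.nn as nn


class MultiWayBlock(nn.Module):
    """Transformer block: shared attention, per-modality FFN experts."""

    def __init__(self, dim: int = 768, num_heads: int = 12):
        super().__init__()
        # Shared across modalities; in the paper's setting these weights
        # would come from a pretrained image encoder (possibly frozen).
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Modality-specific "experts": one FFN per modality.
        self.norm2 = nn.LayerNorm(dim)
        self.experts = nn.ModuleDict({
            "image": self._ffn(dim),
            "point": self._ffn(dim),
        })

    @staticmethod
    def _ffn(dim: int) -> nn.Sequential:
        return nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x: torch.Tensor, modality: str) -> torch.Tensor:
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]  # shared attention
        x = x + self.experts[modality](self.norm2(x))      # expert FFN
        return x


class TransformEstimationHead(nn.Module):
    """Pretext head: regress the rigid transform applied to the input cloud.

    A 3-DoF rotation target is assumed here; the paper's exact
    parameterization may differ."""

    def __init__(self, dim: int = 768):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, 3))

    def forward(self, cls_token: torch.Tensor) -> torch.Tensor:
        return self.mlp(cls_token)


if __name__ == "__main__":
    block = MultiWayBlock()
    tokens = torch.randn(2, 65, 768)                # [batch, 1 + patches, dim]
    point_feats = block(tokens, modality="point")
    angles = TransformEstimationHead()(point_feats[:, 0])  # CLS token
    print(point_feats.shape, angles.shape)          # [2, 65, 768], [2, 3]
```

Because only the per-modality experts (and the pretext head) are new, the point cloud branch adds few trainable parameters on top of the shared image encoder, which is consistent with the parameter reduction the abstract reports.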
| Original language | English |
|---|---|
| Pages (from-to) | 10755-10765 |
| Number of pages | 11 |
| Journal | IEEE Transactions on Multimedia |
| Volume | 26 |
| DOIs | |
| Publication status | Published - 2024 |
Keywords
- Cross-modal learning
- point cloud understanding
- self-supervision
- transfer learning
ASJC Scopus subject areas
- Signal Processing
- Media Technology
- Computer Science Applications
- Electrical and Electronic Engineering