Abstract
Self-supervised representation learning (SSRL) has gained increasing attention in point cloud understanding as a way to address the challenges posed by 3D data scarcity and high annotation costs. This paper presents PCExpert, a novel SSRL approach that reinterprets point clouds as “specialized images”. This conceptual shift allows PCExpert to leverage knowledge derived from the large-scale image modality more directly and deeply, by extensively sharing parameters with a pretrained image encoder in a multi-way Transformer architecture. This parameter-sharing strategy, combined with an additional pretext task for pre-training, i.e., transformation estimation, enables PCExpert to outperform the state of the art on a variety of tasks, with a remarkable reduction in the number of trainable parameters. Notably, PCExpert’s performance under LINEAR fine-tuning (e.g., a 90.02% overall accuracy on ScanObjectNN) already closely approximates the results obtained with FULL model fine-tuning (92.66%), demonstrating its effective representation capability.
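To make the two ideas in the abstract concrete, the following is a minimal PyTorch sketch, not the authors’ released implementation: a multi-way Transformer block whose self-attention is shared between the image branch and a point cloud “expert” branch while each modality keeps its own feed-forward network, plus a small transformation-estimation head for the extra pretext task. All module names, dimensions, and the 3-DoF rotation target are illustrative assumptions.

```python
# A minimal sketch of the abstract's two ideas (assumed, not the paper's code):
# (1) a multi-way Transformer block sharing self-attention across modalities,
# (2) a transformation-estimation pretext head for the point cloud branch.
import torch
import torch.nn as nn


class MultiWayBlock(nn.Module):
    """Transformer block: shared attention, per-modality FFN experts."""

    def __init__(self, dim: int = 768, num_heads: int = 12):
        super().__init__()
        # Shared across modalities; in the paper's setting these weights
        # would come from a pretrained image encoder (possibly frozen).
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Modality-specific "experts": one FFN per modality.
        self.norm2 = nn.LayerNorm(dim)
        self.experts = nn.ModuleDict({
            "image": self._ffn(dim),
            "point": self._ffn(dim),
        })

    @staticmethod
    def _ffn(dim: int) -> nn.Sequential:
        return nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x: torch.Tensor, modality: str) -> torch.Tensor:
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]  # shared attention
        x = x + self.experts[modality](self.norm2(x))      # expert FFN
        return x


class TransformEstimationHead(nn.Module):
    """Pretext head: regress the rigid transform applied to the input cloud.

    A 3-DoF rotation target is assumed here; the paper's exact
    parameterization may differ."""

    def __init__(self, dim: int = 768):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, 3))

    def forward(self, cls_token: torch.Tensor) -> torch.Tensor:
        return self.mlp(cls_token)


if __name__ == "__main__":
    block = MultiWayBlock()
    tokens = torch.randn(2, 65, 768)                # [batch, 1 + patches, dim]
    point_feats = block(tokens, modality="point")
    angles = TransformEstimationHead()(point_feats[:, 0])  # CLS token
    print(point_feats.shape, angles.shape)          # [2, 65, 768], [2, 3]
```

Because only the per-modality experts (and the pretext head) are new, the point cloud branch adds few trainable parameters on top of the shared image encoder, which is consistent with the parameter reduction the abstract reports.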
| Original language | English |
|---|---|
| Pages (from-to) | 10755-10765 |
| Number of pages | 11 |
| Journal | IEEE Transactions on Multimedia |
| Volume | 26 |
| DOIs | |
| Publication status | Published - 2024 |
Keywords
- Cross-modal learning
- point cloud understanding
- self-supervision
- transfer learning
ASJC Scopus subject areas
- Signal Processing
- Media Technology
- Computer Science Applications
- Electrical and Electronic Engineering