Abstract
Occluded person re-identification (Re-ID) is a challenging problem because incomplete body part images and interference from occluded regions leave few notable discriminative features. Recently, some transformer-based methods have demonstrated excellent capabilities in resolving this problem. However, these methods cannot precisely focus on the non-occluded body parts, nor can they capture fine-grained local features. To address these limitations, we propose a Mask-Aware Hierarchical Aggregation TrAnsforMer (MAHATMA) method to enhance occluded person Re-ID. Specifically, we propose a Mask Information Embedding (MIE) module, which directs the model to focus on non-occluded body parts by incorporating the mask semantic information of the human body. Furthermore, to effectively capture fine-grained local features, we propose a Hierarchical Feature Aggregation (HFA) module that mines more exploitable, high-quality detail information by aggregating hierarchical image patch representations. To further alleviate the feature loss problem, we propose a Diverse Feature Completion (DFC) module, which completes global features through multi-path feature integration. Extensive experimental evaluations demonstrate that our method achieves superior performance on both occluded and holistic person Re-ID datasets.
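The abstract does not specify how the MIE and HFA modules are implemented, so the following is only a toy sketch of the two general ideas it names: weighting image patches by a body mask, and aggregating patch representations from several transformer layers. The function names `mask_weighted_pool` and `aggregate_layers`, the plain averaging across layers, and the toy data are all assumptions made for illustration, not the MAHATMA design.

```python
def mask_weighted_pool(tokens, mask, eps=1e-6):
    """Pool patch tokens into one vector, weighting each patch by its mask value.

    tokens: list of patch vectors (one list of floats per patch)
    mask:   list of visibility weights in [0, 1], one per patch (assumed given)
    """
    total = sum(mask) + eps  # eps avoids division by zero for an all-zero mask
    dim = len(tokens[0])
    return [
        sum(m * tok[d] for m, tok in zip(mask, tokens)) / total
        for d in range(dim)
    ]


def aggregate_layers(layer_tokens, mask):
    """Pool each layer's patch tokens with the mask, then average over layers.

    Plain averaging is a stand-in chosen for this sketch; the paper's
    aggregation scheme is not described in the abstract.
    """
    pooled = [mask_weighted_pool(tokens, mask) for tokens in layer_tokens]
    dim = len(pooled[0])
    return [sum(p[d] for p in pooled) / len(pooled) for d in range(dim)]


# Toy input: two layers, three patches each, two feature dimensions.
# The third patch is marked as occluded (mask value 0.0).
layers = [
    [[1.0, 0.0], [3.0, 2.0], [100.0, 100.0]],
    [[2.0, 2.0], [4.0, 0.0], [-50.0, 7.0]],
]
mask = [1.0, 1.0, 0.0]

feature = aggregate_layers(layers, mask)
print(feature)  # approximately [2.5, 1.0]
```

In this toy setup the occluded patch has no influence on the pooled feature, whatever values it holds. That is the behavior a mask-aware model aims for, although a learned module would typically use soft, predicted masks rather than a fixed binary one.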
Original language | English |
---|---|
Journal | IEEE Transactions on Circuits and Systems for Video Technology |
Publication status | Accepted/In press - 2025 |
Keywords
- feature aggregation
- feature completion
- Occluded person re-identification
- vision transformer
ASJC Scopus subject areas
- Media Technology
- Electrical and Electronic Engineering