Abstract
Text-based person search is a critical task in intelligent security that aims to locate a person of interest using textual descriptions. The primary challenge is to bridge the significant gap between the text and image domains while extracting the discriminative features crucial for accurately identifying individuals. Existing methods have made effective attempts by conducting cross-modal matching at the fine-grained representation level. However, these approaches frequently overlook two crucial factors: (i) the presence of noise in local features during information fusion, and (ii) the lack of intra-modal matching when measuring feature similarity. To address these issues, we propose a novel local-enhanced representation framework. Specifically, to suppress noise in local features, we design a relation-based cross-modal local-enhanced fusion module that filters out weakly related information through relation assessment. In addition, we explore an intra-cross modal projection strategy to overcome the limitations of existing cross-modal projection methods; it jointly applies intra-modal and cross-modal matching constraints to the feature distribution. Finally, experiments on three mainstream datasets verify the superiority of our method over existing state-of-the-art approaches.
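The abstract describes the two mechanisms only at a high level. As a rough illustration, below is a minimal PyTorch sketch of how a relation-based filtering fusion and a joint intra-/cross-modal projection loss could look. All names (`RelationGatedFusion`, `keep_ratio`, `intra_cross_projection_loss`), the top-k filtering rule, and the CMPM-style KL formulation are assumptions made for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelationGatedFusion(nn.Module):
    """Hypothetical sketch: weight local features of one modality by their
    relation to the other modality's global feature, discard weakly related
    locals, and aggregate the rest into a single fused vector."""

    def __init__(self, dim: int, keep_ratio: float = 0.5):
        super().__init__()
        self.proj = nn.Linear(dim, dim)   # maps locals into a shared relation space
        self.keep_ratio = keep_ratio      # fraction of locals kept after filtering

    def forward(self, local_feats: torch.Tensor, global_other: torch.Tensor):
        # local_feats: (B, N, D) local features of one modality
        # global_other: (B, D) global feature of the other modality
        q = F.normalize(self.proj(local_feats), dim=-1)       # (B, N, D)
        g = F.normalize(global_other, dim=-1).unsqueeze(-1)   # (B, D, 1)
        relation = torch.bmm(q, g).squeeze(-1)                # (B, N) cosine relations

        # relation assessment: keep only the top-k most related locals
        k = max(1, int(self.keep_ratio * local_feats.size(1)))
        topk = relation.topk(k, dim=1).indices
        mask = torch.zeros_like(relation).scatter_(1, topk, 1.0)

        # softmax over surviving locals; filtered ones get zero weight
        weights = F.softmax(relation.masked_fill(mask == 0, float('-inf')), dim=1)
        fused = torch.bmm(weights.unsqueeze(1), local_feats).squeeze(1)  # (B, D)
        return fused

def intra_cross_projection_loss(img: torch.Tensor, txt: torch.Tensor,
                                labels: torch.Tensor, eps: float = 1e-8):
    """Sketch of jointly applying cross-modal and intra-modal matching
    constraints, following a CMPM-style projection matching formulation."""
    def cmpm(a, b):
        # project a onto the normalized b features, then match the resulting
        # compatibility distribution to the label-sharing distribution
        sim = a @ F.normalize(b, dim=-1).t()                  # (B, B)
        p = F.softmax(sim, dim=1)
        match = (labels.unsqueeze(1) == labels.unsqueeze(0)).float()
        q = match / match.sum(dim=1, keepdim=True)
        return (p * (torch.log(p + eps) - torch.log(q + eps))).sum(dim=1).mean()

    # cross-modal terms in both directions, plus intra-modal terms
    return cmpm(img, txt) + cmpm(txt, img) + cmpm(img, img) + cmpm(txt, txt)
```

Here the top-k gate stands in for whatever relation-assessment rule the paper uses to discard weakly related locals, and the two intra-modal loss terms realize "jointly applying intra-modal and cross-modal matching constraints" alongside the usual bidirectional cross-modal terms.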
| Original language | English |
| --- | --- |
| Article number | 111247 |
| Journal | Pattern Recognition |
| Volume | 161 |
| DOIs | |
| Publication status | Published - May 2025 |
Keywords
- Cross-modal retrieval
- Local representation
- Person re-identification
ASJC Scopus subject areas
- Software
- Signal Processing
- Computer Vision and Pattern Recognition
- Artificial Intelligence