Abstract
Multi-view scene matching refers to the establishment of a mapping relationship between images captured from different perspectives, such as those taken by unmanned aerial vehicles (UAVs) and satellites. This technology is crucial for the geo-localization of UAV views. However, the geometric discrepancies between images from different perspectives, combined with the inherent computational constraints of UAVs, present significant challenges for matching UAV and satellite images. Additionally, the imbalance between positive and negative samples across drone and satellite images during model training can lead to instability. To address these challenges, this study proposes a novel and efficient cross-view geo-localization framework called MSM-Transformer. The framework employs the Dual Attention Vision Transformer (DaViT) as the core architecture for feature extraction, which significantly enhances the modeling capacity for global features and the contextual relevance of adjacent regions. The weight-sharing mechanism in MSM-Transformer effectively reduces model complexity, making it well suited for deployment on embedded devices such as UAVs and satellites. Furthermore, the framework introduces a symmetric Decoupled Contrastive Learning (DCL) loss function, which effectively mitigates the issue of sample imbalance between satellite and UAV images. Experimental validation on the University-1652 dataset demonstrates that MSM-Transformer achieves outstanding performance, delivering optimal matching results with a minimal number of parameters.
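The decoupling idea behind a DCL-style loss is that the positive pair is removed from the InfoNCE denominator, so the gradient from the matched pair no longer competes with the negatives; averaging the UAV-to-satellite and satellite-to-UAV directions makes it symmetric. The sketch below illustrates this in NumPy under stated assumptions (L2-normalized embeddings, cross-view negatives only, a temperature `tau`); it is a simplified illustration, not the paper's exact implementation.

```python
import numpy as np

def symmetric_dcl_loss(u, v, tau=0.1):
    """Sketch of a symmetric Decoupled Contrastive Learning (DCL) loss.

    u: (N, D) L2-normalized UAV-view embeddings
    v: (N, D) L2-normalized satellite-view embeddings (row i matches row i of u)

    Unlike standard InfoNCE, the positive pair is excluded from the
    denominator, which decouples positive and negative gradients and
    reduces sensitivity to the positive/negative sample imbalance.
    """
    sim = u @ v.T / tau                    # (N, N) cross-view similarity matrix
    n = sim.shape[0]
    pos = np.diag(sim)                     # similarities of matched pairs
    mask = ~np.eye(n, dtype=bool)          # drop the positive from each row

    # log-sum-exp over negatives only, in both matching directions
    neg_u2v = np.log(np.exp(sim)[mask].reshape(n, n - 1).sum(axis=1))
    neg_v2u = np.log(np.exp(sim.T)[mask].reshape(n, n - 1).sum(axis=1))

    # symmetric average of the two retrieval directions
    return ((neg_u2v - pos) + (neg_v2u - pos)).mean() / 2
```

With well-matched embeddings the positive term dominates and the loss drops well below the value obtained for unrelated pairs, which is the behavior the training objective exploits.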
| Original language | English |
|---|---|
| Article number | 3039 |
| Number of pages | 18 |
| Journal | Remote Sensing |
| Volume | 16 |
| Issue number | 16 |
| DOIs | |
| Publication status | Published - Aug 2024 |
Keywords
- contrastive learning
- geo-localization
- image retrieval
- multi-view scene matching
- transformer
ASJC Scopus subject areas
- General Earth and Planetary Sciences