Abstract
The state-of-the-art YOLO detection algorithms still suffer from the issue of redundant extraction of similar features during feature propagation, and the simplistic stacking approach of connecting different features limits the flexibility of feature fusion. We propose a new feature recombination mechanism involving refining feature extraction and flexible concatenation. It includes the HFConv (Hybrid Flexibility Convolution) module, the MFD (Multivariate Flexibility Downsampling) module, and the DFSPP (Deformable and Flexible Spatial Pyramid Pooling) module. Specifically, the HFConv module employs feature refinement and flexible connection strategies to optimize feature representation and reduce redundancy in a dynamic way, acquiring diverse feature information from local and surrounding regions. The MFD module leverages multiple downsampling methods to address the issue of feature redundancy that may arise from a single downsampling method, thereby enhancing feature diversity. The DFSPP module learns an offset corresponding to the pooling kernel size, allowing for the extraction of the most critical information in a dynamic manner. By incorporating these modules into the YOLO architecture, we develop a more robust network called FRFCNet, and the experimental results show a notable 4.1% and 2.8% improvement in AP values on the VOC2012 and COCO2017 datasets, respectively, compared to the baseline (YOLOV7-Tiny-SiLu), outperforming current one-stage detectors.
| Original language | English |
|---|---|
| Pages (from-to) | 8498-8509 |
| Number of pages | 12 |
| Journal | IEEE Transactions on Multimedia |
| Volume | 27 |
| DOIs | |
| Publication status | Published - 2025 |
Keywords
- Object detection
- deep learning
- feature fusion
ASJC Scopus subject areas
- Signal Processing
- Media Technology
- Computer Science Applications
- Electrical and Electronic Engineering