Feature flow: In-network feature flow estimation for video object detection

Ruibing Jin; Guosheng Lin; Changyun Wen; Jianliang Wang; Fayao Liu

doi:10.1016/j.patcog.2021.108323

10 Citations (Scopus)

Abstract

Optical flow, which expresses pixel displacement, is widely used in computer vision tasks to provide pixel-level motion information. However, with the remarkable progress of convolutional neural networks, recent state-of-the-art approaches solve problems directly at the feature level. Since the displacement of feature vectors is not consistent with pixel displacement, a common approach is to feed optical flow into a neural network and fine-tune this network on the task dataset, in the expectation that the fine-tuned network will produce tensors encoding feature-level motion information. In this paper, we rethink this de facto paradigm and analyze its drawbacks in the video object detection task. To mitigate these issues, we propose a novel network (IFF-Net) with an In-network Feature Flow estimation module (IFF module) for video object detection. Without resorting to pre-training on any additional dataset, our IFF module directly produces feature flow, which indicates the feature displacement. Our IFF module is built from a shallow module that shares features with the detection branches. This compact design enables IFF-Net to detect objects accurately while maintaining a fast inference speed. Furthermore, we propose a transformation residual loss (TRL) based on self-supervision, which further improves the performance of IFF-Net. IFF-Net outperforms existing methods and achieves new state-of-the-art performance on ImageNet VID.
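The abstract describes feature flow as a per-position displacement of feature vectors between frames. As a hedged illustration of how such a displacement field is typically consumed by flow-guided video detectors, the sketch below backward-warps a reference-frame feature map with bilinear sampling and computes a label-free residual against the current frame. This is a generic NumPy sketch, not the IFF module or the paper's TRL, neither of which is specified here; the names `warp_features` and `warp_residual`, the (C, H, W) layout, the dx/dy channel order, and zero padding at the borders are all assumptions.

```python
import numpy as np


def warp_features(feat: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Backward-warp a (C, H, W) feature map with a (2, H, W) displacement field.

    Assumed convention: flow[0] is the horizontal offset (dx) and flow[1] the
    vertical offset (dy), both in feature-map cells. Output cell (y, x) is
    bilinearly sampled from feat at (y + dy, x + dx); corners outside the map
    contribute zero.
    """
    _, h, w = feat.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    sx = xs + flow[0]  # sampling x-coordinates in the reference map
    sy = ys + flow[1]  # sampling y-coordinates in the reference map
    x0 = np.floor(sx).astype(int)
    y0 = np.floor(sy).astype(int)
    x1, y1 = x0 + 1, y0 + 1
    wx, wy = sx - x0, sy - y0  # fractional parts act as interpolation weights

    def gather(yi: np.ndarray, xi: np.ndarray) -> np.ndarray:
        # Read feat at integer corners, zeroing any corner that falls off the map.
        valid = (yi >= 0) & (yi < h) & (xi >= 0) & (xi < w)
        vals = feat[:, np.clip(yi, 0, h - 1), np.clip(xi, 0, w - 1)]
        return vals * valid

    return (gather(y0, x0) * ((1 - wx) * (1 - wy))
            + gather(y0, x1) * (wx * (1 - wy))
            + gather(y1, x0) * ((1 - wx) * wy)
            + gather(y1, x1) * (wx * wy))


def warp_residual(feat_ref: np.ndarray, feat_cur: np.ndarray, flow: np.ndarray) -> float:
    """Hypothetical label-free residual: mean squared gap between the warped
    reference features and the current features. Not the paper's TRL."""
    return float(np.mean((warp_features(feat_ref, flow) - feat_cur) ** 2))


# Usage: the current frame's content has moved one cell to the right.
rng = np.random.default_rng(0)
ref = rng.standard_normal((4, 5, 6))
cur = np.zeros_like(ref)
cur[:, :, 1:] = ref[:, :, :-1]

flow = np.zeros((2, 5, 6))
flow[0] = -1.0  # each current cell looks one cell to the left in the reference
warped = warp_features(ref, flow)
print(warp_residual(ref, cur, flow))  # 0.0: the flow explains the motion exactly
```

Backward warping (each output cell pulls from the reference map) is used here because it leaves no holes in the output; the residual only needs the two frames' features, which is what makes a loss of this shape self-supervised.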

Original language	English
Article number	108323
Journal	Pattern Recognition
Volume	122
DOIs	https://doi.org/10.1016/j.patcog.2021.108323
Publication status	Published - Feb 2022
Externally published	Yes

Keywords

Deep convolutional neural network (DCNN)
Feature flow
Object detection
Video analysis
Video object detection

ASJC Scopus subject areas

Software
Signal Processing
Computer Vision and Pattern Recognition
Artificial Intelligence

Cite this

@article{d02da925894b482cbbf3b26c4608934a,

title = "Feature flow: In-network feature flow estimation for video object detection",

abstract = "Optical flow, which expresses pixel displacement, is widely used in many computer vision tasks to provide pixel-level motion information. However, with the remarkable progress of the convolutional neural network, recent state-of-the-art approaches are proposed to solve problems directly on feature-level. Since the displacement of feature vector is not consistent with the pixel displacement, a common approach is to forward optical flow to a neural network and fine-tune this network on the task dataset. With this method, they expect the fine-tuned network to produce tensors encoding feature-level motion information. In this paper, we rethink about this de facto paradigm and analyze its drawbacks in the video object detection task. To mitigate these issues, we propose a novel network (IFF-Net) with an In-network Feature Flow estimation module (IFF module) for video object detection. Without resorting to pre-training on any additional dataset, our IFF module is able to directly produce feature flow which indicates the feature displacement. Our IFF module consists of a shallow module, which shares the features with the detection branches. This compact design enables our IFF-Net to accurately detect objects, while maintaining a fast inference speed. Furthermore, we propose a transformation residual loss (TRL) based on self-supervision, which further improves the performance of our IFF-Net. Our IFF-Net outperforms existing methods and achieves new state-of-the-art performance on ImageNet VID.",

keywords = "Deep convolutional neural network (DCNN), Feature flow, Object detection, Video analysis, Video object detection",

author = "Ruibing Jin and Guosheng Lin and Changyun Wen and Jianliang Wang and Fayao Liu",

note = "Publisher Copyright: {\textcopyright} 2021",

year = "2022",

month = feb,

doi = "10.1016/j.patcog.2021.108323",

language = "English",

volume = "122",

journal = "Pattern Recognition",

issn = "0031-3203",

publisher = "Elsevier Ltd.",

}

TY  - JOUR
T1  - Feature flow
T2  - In-network feature flow estimation for video object detection
AU  - Jin, Ruibing
AU  - Lin, Guosheng
AU  - Wen, Changyun
AU  - Wang, Jianliang
AU  - Liu, Fayao
PY  - 2022/2
Y1  - 2022/2
N2  - Optical flow, which expresses pixel displacement, is widely used in many computer vision tasks to provide pixel-level motion information. However, with the remarkable progress of the convolutional neural network, recent state-of-the-art approaches are proposed to solve problems directly on feature-level. Since the displacement of feature vector is not consistent with the pixel displacement, a common approach is to forward optical flow to a neural network and fine-tune this network on the task dataset. With this method, they expect the fine-tuned network to produce tensors encoding feature-level motion information. In this paper, we rethink about this de facto paradigm and analyze its drawbacks in the video object detection task. To mitigate these issues, we propose a novel network (IFF-Net) with an In-network Feature Flow estimation module (IFF module) for video object detection. Without resorting to pre-training on any additional dataset, our IFF module is able to directly produce feature flow which indicates the feature displacement. Our IFF module consists of a shallow module, which shares the features with the detection branches. This compact design enables our IFF-Net to accurately detect objects, while maintaining a fast inference speed. Furthermore, we propose a transformation residual loss (TRL) based on self-supervision, which further improves the performance of our IFF-Net. Our IFF-Net outperforms existing methods and achieves new state-of-the-art performance on ImageNet VID.
AB  - Optical flow, which expresses pixel displacement, is widely used in many computer vision tasks to provide pixel-level motion information. However, with the remarkable progress of the convolutional neural network, recent state-of-the-art approaches are proposed to solve problems directly on feature-level. Since the displacement of feature vector is not consistent with the pixel displacement, a common approach is to forward optical flow to a neural network and fine-tune this network on the task dataset. With this method, they expect the fine-tuned network to produce tensors encoding feature-level motion information. In this paper, we rethink about this de facto paradigm and analyze its drawbacks in the video object detection task. To mitigate these issues, we propose a novel network (IFF-Net) with an In-network Feature Flow estimation module (IFF module) for video object detection. Without resorting to pre-training on any additional dataset, our IFF module is able to directly produce feature flow which indicates the feature displacement. Our IFF module consists of a shallow module, which shares the features with the detection branches. This compact design enables our IFF-Net to accurately detect objects, while maintaining a fast inference speed. Furthermore, we propose a transformation residual loss (TRL) based on self-supervision, which further improves the performance of our IFF-Net. Our IFF-Net outperforms existing methods and achieves new state-of-the-art performance on ImageNet VID.
KW  - Deep convolutional neural network (DCNN)
KW  - Feature flow
KW  - Object detection
KW  - Video analysis
KW  - Video object detection
UR  - http://www.scopus.com/inward/record.url?scp=85115428517&partnerID=8YFLogxK
U2  - 10.1016/j.patcog.2021.108323
DO  - 10.1016/j.patcog.2021.108323
M3  - Article
AN  - SCOPUS:85115428517
SN  - 0031-3203
VL  - 122
JO  - Pattern Recognition
JF  - Pattern Recognition
M1  - 108323
ER  - 
