ConTrans-Detect: A Multi-Scale Convolution-Transformer Network for DeepFake Video Detection

Weirong Sun, Yujun Ma, Hong Zhang, Ruili Wang

Research output: Chapter in Book/Conference proceeding › Conference contribution › peer-review

1 Citation (Scopus)

Abstract

Advances in generative deep learning have made DeepFakes, synthetic images or videos produced by manipulation such as swapping a person's face in one video with a face from another video, easy to generate and hard to detect. Existing methods have used Convolutional Neural Networks (CNNs) to identify manipulated regions for DeepFake video detection. However, these methods may not fully address the difficulties of learning low-level spatial features and capturing temporal variations, both of which are crucial for face forgery detection. Therefore, we propose a Convolution-Transformer Deepfake Detection (ConTrans-Detect) model, comprising a multi-scale CNN module for spatial feature representation and a multi-branch Transformer for temporal feature modeling. The multi-scale CNN module uses a 3D Inception block to extract multi-scale low-level features (e.g., edges, corners, and angles) from videos. The multi-branch Transformer module consists of multi-stream Transformer layers, each taking a different temporal resolution and spatial feature dimension as input to perceive various motion variations. Our model achieves an AUC of 0.929 and an F1 score of 0.920 on the DeepFake Detection Challenge (DFDC) dataset, surpassing several state-of-the-art methods.
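
For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how AUC and F1 score are commonly computed for a binary real/fake classifier. It is not the authors' evaluation code: the function names `roc_auc` and `f1_score`, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration, and the abstract does not state which threshold or tooling was used.

```python
# Illustrative only: toy data and a 0.5 threshold are assumptions,
# not values taken from the paper.

def roc_auc(labels, scores):
    """AUC as the probability that a random fake (label 1) receives a
    higher score than a random real (label 0); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_score(labels, scores, threshold=0.5):
    """F1 = harmonic mean of precision and recall for the fake class."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical per-video outputs: 1 = fake, 0 = real.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]

print(f"AUC = {roc_auc(labels, scores):.3f}")
print(f"F1  = {f1_score(labels, scores):.3f}")
```

AUC is threshold-free and measures how well the scores rank fakes above reals, whereas F1 depends on the chosen decision threshold, which is why the two are often reported together.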

Original language: English
Title of host publication: 2023 29th International Conference on Mechatronics and Machine Vision in Practice, M2VIP 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350325621
Publication status: Published - 2023
Externally published: Yes
Event: 29th International Conference on Mechatronics and Machine Vision in Practice, M2VIP 2023 - Queenstown, New Zealand
Duration: 21 Nov 2023 → 24 Nov 2023

Publication series

Name: 2023 29th International Conference on Mechatronics and Machine Vision in Practice, M2VIP 2023

Conference

Conference: 29th International Conference on Mechatronics and Machine Vision in Practice, M2VIP 2023
Country/Territory: New Zealand
City: Queenstown
Period: 21/11/23 → 24/11/23

Keywords

  • Convolutional neural network
  • DeepFake video detection
  • Privacy
  • Security
  • Vision transformer

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
  • Electrical and Electronic Engineering
  • Mechanical Engineering
  • Modelling and Simulation
