Time-Frequency Filter Bank: A Simple Approach for Audio and Music Separation

Ning Yang; Muhammad Usman; Xiangjian He; Mian Ahmad Jan; Liming Zhang

doi:10.1109/ACCESS.2017.2761741

Research output: Journal Publication › Article › peer-review

13 Citations (Scopus)

Abstract

Blind Source Separation techniques have long been used in wireless communication to extract signals of interest from a mixture of signals without training data. In this paper, we investigate the problem of separating the human voice from a mixture of human voice and sounds from different musical instruments. The human voice may be a singing voice in a song or part of a news broadcast with background music. This paper proposes a generalized Short Time Fourier Transform (STFT)-based technique, combined with a filter bank, to extract vocals from background music. The main purpose is to design a filter bank that eliminates background aliasing errors under the best reconstruction conditions, with approximated scaling factors. Stereo signals in the time-frequency domain are used in the experiments. The input stereo signals are processed as frames and passed through the proposed STFT-based technique. The output of the STFT-based technique is then passed through the filter bank to minimize the background aliasing errors. For reconstruction, an inverse STFT is applied first, and the signals are then reconstructed by the OverLap-Add method to obtain the final output, which contains vocals only. The experiments show that the proposed approach performs better than other state-of-the-art approaches in terms of Signal-to-Interference Ratio (SIR) and Signal-to-Distortion Ratio (SDR).
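The frame, transform, filter, and reconstruct pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' method: the paper's filter bank and scaling factors are not given in this record, so a simple stereo similarity mask (keeping time-frequency bins where the left and right channels nearly agree) is swapped in as a stand-in. The function names, frame length, hop size, and threshold are all assumptions chosen for illustration.

```python
import numpy as np


def stft(x, frame_len=1024, hop=256):
    """Split x into overlapping Hann-windowed frames and FFT each frame."""
    window = np.hanning(frame_len + 1)[:-1]  # periodic Hann window
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1), window


def istft(spec, window, hop, length):
    """Inverse FFT per frame, then weighted overlap-add reconstruction."""
    frame_len = len(window)
    frames = np.fft.irfft(spec, n=frame_len, axis=1) * window
    out = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        start = i * hop
        out[start : start + frame_len] += frame
        norm[start : start + frame_len] += window ** 2
    covered = norm > 1e-8  # avoid dividing where the windows sum to ~0
    out[covered] /= norm[covered]
    return out


def separate_center(stereo, frame_len=1024, hop=256, threshold=0.1):
    """Hypothetical stand-in for the paper's filter bank.

    Keeps bins where left and right are nearly identical, on the
    assumption (not from the paper) that vocals are mixed to the center.
    """
    left, window = stft(stereo[0], frame_len, hop)
    right, _ = stft(stereo[1], frame_len, hop)
    diff = np.abs(left - right)
    total = np.abs(left) + np.abs(right) + 1e-12
    mask = (diff / total) < threshold
    center_spec = mask * 0.5 * (left + right)
    return istft(center_spec, window, hop, stereo.shape[1])


if __name__ == "__main__":
    sr = 8000
    t = np.arange(8192) / sr
    center = np.sin(2 * np.pi * 440 * t)   # present in both channels
    side = np.sin(2 * np.pi * 2000 * t)    # present in the left channel only
    stereo = np.stack([center + side, center])
    estimate = separate_center(stereo)
    print(estimate.shape)
```

The overlap-add step divides by the summed squared window so that an unmodified spectrum reconstructs the input wherever the frames overlap; samples at the very edges, where the window is close to zero, are not recovered reliably.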

Original language	English
Article number	8063868
Pages (from-to)	27114-27125
Number of pages	12
Journal	IEEE Access
Volume	5
DOIs	https://doi.org/10.1109/ACCESS.2017.2761741
Publication status	Published - 9 Oct 2017
Externally published	Yes

Keywords

Blind Source Separation
OverLap-Add
SDR
SIR
Short Time Fourier Transform

ASJC Scopus subject areas

General Computer Science
General Materials Science
General Engineering

Access to Document

10.1109/ACCESS.2017.2761741

Cite this

@article{2d6e5b4c8bf04b32b90208516a07df7f,

title = "Time-Frequency Filter Bank: A Simple Approach for Audio and Music Separation",

abstract = "Blind Source Separation techniques are widely used in the field of wireless communication for a very long time to extract signals of interest from a set of multiple signals without training data. In this paper, we investigate the problem of separation of the human voice from a mixture of human voice and sounds from different musical instruments. The human voice may be a singing voice in a song or may be a part of some news, broadcast by a channel with background music. This paper proposes a generalized Short Time Fourier Transform (STFT)-based technique, combined with filter bank to extract vocals from background music. The main purpose is to design a filter bank and to eliminate background aliasing errors with best reconstruction conditions, having approximated scaling factors. Stereo signals in time-frequency domain are used in experiments. The input stereo signals are processed in the form of frames and passed through the proposed STFT-based technique. The output of the STFT-based technique is passed through the filter bank to minimize the background aliasing errors. For reconstruction, first an inverse STFT is applied and then the signals are reconstructed by the OverLap-Add method to get the final output, containing vocals only. The experiments show that the proposed approach performs better than the other state-of-the-art approaches, in terms of Signal-to-Interference Ratio (SIR) and Signal-to-Distortion Ratio (SDR), respectively.",

keywords = "Blind Source Separation, OverLap-Add, SDR, SIR, Short Time Fourier Transform",

author = "Ning Yang and Muhammad Usman and Xiangjian He and Jan, {Mian Ahmad} and Liming Zhang",

note = "Publisher Copyright: {\textcopyright} 2017 IEEE.",

year = "2017",

month = oct,

day = "9",

doi = "10.1109/ACCESS.2017.2761741",

language = "English",

volume = "5",

pages = "27114--27125",

journal = "IEEE Access",

issn = "2169-3536",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - JOUR

T1 - Time-Frequency Filter Bank

T2 - A Simple Approach for Audio and Music Separation

AU - Yang, Ning

AU - Usman, Muhammad

AU - He, Xiangjian

AU - Jan, Mian Ahmad

AU - Zhang, Liming

PY - 2017/10/9

Y1 - 2017/10/9

N2 - Blind Source Separation techniques are widely used in the field of wireless communication for a very long time to extract signals of interest from a set of multiple signals without training data. In this paper, we investigate the problem of separation of the human voice from a mixture of human voice and sounds from different musical instruments. The human voice may be a singing voice in a song or may be a part of some news, broadcast by a channel with background music. This paper proposes a generalized Short Time Fourier Transform (STFT)-based technique, combined with filter bank to extract vocals from background music. The main purpose is to design a filter bank and to eliminate background aliasing errors with best reconstruction conditions, having approximated scaling factors. Stereo signals in time-frequency domain are used in experiments. The input stereo signals are processed in the form of frames and passed through the proposed STFT-based technique. The output of the STFT-based technique is passed through the filter bank to minimize the background aliasing errors. For reconstruction, first an inverse STFT is applied and then the signals are reconstructed by the OverLap-Add method to get the final output, containing vocals only. The experiments show that the proposed approach performs better than the other state-of-the-art approaches, in terms of Signal-to-Interference Ratio (SIR) and Signal-to-Distortion Ratio (SDR), respectively.

KW - Blind Source Separation

KW - OverLap-Add

KW - SDR

KW - SIR

KW - Short Time Fourier Transform

UR - http://www.scopus.com/inward/record.url?scp=85040288641&partnerID=8YFLogxK

U2 - 10.1109/ACCESS.2017.2761741

DO - 10.1109/ACCESS.2017.2761741

M3 - Article

AN - SCOPUS:85040288641

SN - 2169-3536

VL - 5

SP - 27114

EP - 27125

JO - IEEE Access

JF - IEEE Access

M1 - 8063868

ER -
