Using Multiple Level Fusion for Improving Performance of Speaker Recognition

Liu Di, Cho Siu Yeung, Sun Dongmei, Qiu Zhengding

Research output: Journal Publication › Article › peer-review

Abstract

This paper presents a multiple level fusion framework that can be applied to automatic speaker recognition systems to improve their performance. Within the framework, several multiple level fusion methods are defined: one strong multiple level fusion and three weak multiple level fusions. To examine the effectiveness of the framework, a two-feature combination scheme is considered. After investigating whether strong and weak multiple level fusions are applicable to this scheme, the framework adopts a weak multiple level fusion method that combines two fusion levels, i.e., matching-score level fusion and decision-making level fusion. At the matching-score level, a commonly used method, score vector fusion, is adopted. At the decision-making level, kernel combination, also known as Multiple Kernel Learning (MKL), is chosen. Both techniques can be embedded into many automatic speaker recognition systems. Two sets of experiments on the NIST 2001 corpus show that the two-feature combination scheme with multiple level fusion outperforms both the traditional matching-score level fusion and unimodal methods. These results demonstrate that the multiple level fusion framework is an effective way to fuse features for speaker recognition applications.
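The abstract does not give implementation details, so the following is only a minimal sketch, under assumptions of our own, of how a weak multiple level fusion could chain the two levels it names. It assumes each verification trial yields one matching score per feature subsystem; these two scores are stacked into a score vector (matching-score level), and the decision is made with a kernel formed as a convex combination of one base kernel per score component (decision-making level). The helper names (`score_vector`, `rbf`, `combined_kernel`, `decide`), the Gaussian base kernels, the fixed weights and the mean-similarity decision rule are all hypothetical. In real Multiple Kernel Learning the weights are learned jointly with a classifier such as an SVM, which this sketch does not do.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical helpers for illustration only; not the authors' implementation.

def score_vector(score_a: float, score_b: float) -> Tuple[float, float]:
    """Matching-score level (assumed form): stack the per-trial scores of
    the two feature subsystems into a single score vector."""
    return (score_a, score_b)

def rbf(x: float, y: float, gamma: float) -> float:
    """Gaussian (RBF) base kernel on one score component."""
    return math.exp(-gamma * (x - y) ** 2)

def combined_kernel(u: Sequence[float], v: Sequence[float],
                    betas: Sequence[float] = (0.5, 0.5),
                    gammas: Sequence[float] = (1.0, 1.0)) -> float:
    """Decision level (assumed form): K = b1*K1 + b2*K2, one base kernel
    per score component. MKL would learn the weights b; here they are
    fixed, and must be non-negative and sum to 1 (a convex combination)."""
    if any(b < 0 for b in betas) or abs(sum(betas) - 1.0) > 1e-9:
        raise ValueError("betas must be non-negative and sum to 1")
    return sum(b * rbf(ui, vi, g) for b, ui, vi, g in zip(betas, u, v, gammas))

def decide(trial: Sequence[float],
           genuine: List[Tuple[float, float]],
           impostor: List[Tuple[float, float]],
           betas: Sequence[float] = (0.5, 0.5)) -> bool:
    """Accept the claimed identity if the trial is, on average, more similar
    under the combined kernel to genuine training vectors than to impostor
    ones. A simple stand-in for an SVM trained with MKL."""
    g = sum(combined_kernel(trial, x, betas) for x in genuine) / len(genuine)
    i = sum(combined_kernel(trial, x, betas) for x in impostor) / len(impostor)
    return g > i
```

A usage example with made-up scores: given `genuine = [score_vector(2.0, 1.5), score_vector(1.8, 1.7)]` and `impostor = [score_vector(-1.0, -0.5), score_vector(-1.2, -0.8)]`, the call `decide(score_vector(1.9, 1.2), genuine, impostor)` accepts the trial, while `decide(score_vector(-1.1, -0.4), genuine, impostor)` rejects it. Keeping one base kernel per feature is what lets a learned weighting decide how much each feature's scores contribute to the final decision.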

Original language: English
Pages (from-to): 39-48
Number of pages: 10
Journal: Transactions Hong Kong Institution of Engineers
Volume: 18
Issue number: 4
Publication status: Published - 2011

Keywords

  • Decision-making Level Fusion
  • Feature Level Fusion
  • Fusion Techniques
  • Matching-score Level Fusion
  • Multiple Kernel Learning
  • Multiple Level Fusion
  • Speaker Recognition

ASJC Scopus subject areas

  • General Engineering

