Department of Business Development and Technology

Ramjee Prasad

Convex Combination of Multiple Statistical Models with Application to VAD

Publication: Contribution to journal/Conference contribution in journal/Contribution to newspaper, Journal article, Research, peer-reviewed

  • Theodoros Petsatodis, Denmark
  • Christos Boukis, Greece
  • Fotios Talantzis, Greece
  • Zheng-Hua Tan, Denmark
  • Ramjee Prasad
This paper proposes a robust Voice Activity Detector (VAD) based on the observation that the distribution of speech captured with far-field microphones is highly varying, depending on the noise and reverberation conditions. The proposed VAD employs a convex combination scheme comprising three statistical distributions - a Gaussian, a Laplacian, and a two-sided Gamma - to effectively model captured speech. This scheme shows increased ability to adapt to dynamic acoustic environments. The contribution of each distribution to this convex combination is automatically adjusted based on the statistical characteristics of the instantaneous audio input. To further improve the performance of the system, an adaptive threshold is introduced, while a decision-smoothing scheme caters to the intra-frame correlation of speech signals. Extensive experiments under realistic scenarios support the proposed approach of combining several models for increased adaptation and performance.
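The convex combination described in the abstract can be illustrated with a minimal sketch. It assumes zero-mean models sharing a single standard deviation `sigma`, and it uses the two-sided Gamma form with shape 1/2 that is common in the speech literature. The weight rule shown here (a softmax over each model's mean log-likelihood on the frame) is an illustrative assumption, not the adaptation rule from the paper, and all function names are hypothetical.

```python
import math

def gaussian_logpdf(x, sigma):
    # Zero-mean Gaussian with standard deviation sigma.
    return -x * x / (2 * sigma * sigma) - math.log(math.sqrt(2 * math.pi) * sigma)

def laplacian_logpdf(x, sigma):
    # Zero-mean Laplacian; scale b chosen so the variance equals sigma**2.
    b = sigma / math.sqrt(2)
    return -abs(x) / b - math.log(2 * b)

def gamma_logpdf(x, sigma, eps=1e-12):
    # Two-sided Gamma with shape 1/2 (assumed form); theta gives variance sigma**2.
    ax = max(abs(x), eps)  # guard against the singularity at x = 0
    theta = 2 * sigma / math.sqrt(3)
    return -ax / theta - math.log(2) - 0.5 * math.log(math.pi * theta * ax)

MODELS = (gaussian_logpdf, laplacian_logpdf, gamma_logpdf)

def convex_weights(frame, sigma):
    # ASSUMPTION: softmax of mean log-likelihoods, used only for illustration.
    # Any rule works as long as weights are non-negative and sum to one.
    scores = [sum(m(x, sigma) for x in frame) / len(frame) for m in MODELS]
    top = max(scores)
    expd = [math.exp(s - top) for s in scores]
    total = sum(expd)
    return [e / total for e in expd]

def mixture_pdf(x, sigma, weights):
    # Convex combination: weighted sum of the three component densities.
    return sum(w * math.exp(m(x, sigma)) for w, m in zip(weights, MODELS))

frame = [0.1, -0.5, 1.2, -0.05, 0.3, -2.0]
sigma = math.sqrt(sum(x * x for x in frame) / len(frame))
weights = convex_weights(frame, sigma)
density = mixture_pdf(0.3, sigma, weights)
```

Because the weights are non-negative and sum to one, the combined density always lies between the smallest and largest component density at any point, which is what lets the mixture lean toward whichever model fits the current frame best.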
Original language: English
Journal: IEEE Transactions on Audio, Speech and Language Processing
Volume: 19
Issue: 8
Pages (from-to): 2314-2327
ISSN: 1558-7916
DOI
Status: Published - Nov. 2011
Externally published: Yes


ID: 171379612