Robust Voice Activity Detection based on a Gated Recurrent Unit

Han Il, The Center for Advanced Technology Research and Development, Kim Il Sung University

2024.8.14.

Voice activity detection (VAD) is an important front-end step in many speech applications. Because real-world audio signals are frequently corrupted by various kinds of noise, a VAD that stays reliable under noise is essential.

In this paper, we propose an efficient deep neural network that combines a time-delay neural network (TDNN) with gated recurrent units (GRUs) to overcome the shortcomings of traditional VAD methods in strongly noisy environments.

We use 40-dimensional Mel-frequency cepstral coefficients (MFCCs) as input features.
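For reference, a 40-dimensional MFCC vector for a single frame can be computed as in the pure-Python sketch below. The abstract states only the dimensionality, so every other setting here is an assumption: 16 kHz audio, a 25 ms (400-sample) frame, pre-emphasis of 0.97, a Hamming window, a 512-point DFT, 40 triangular mel filters between 20 Hz and Nyquist, and an orthonormal DCT-II that keeps all 40 cepstra. The function name mfcc_frame is hypothetical, and a naive DFT is used only to keep the example dependency-free.

```python
import cmath
import math

# Hypothetical settings: the paper states only that 40-dim MFCCs are used.
def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_frame(frame, sample_rate=16000, n_fft=512, n_mels=40, n_ceps=40, preemph=0.97):
    """Return n_ceps MFCCs for one frame of raw samples (len(frame) <= n_fft)."""
    if len(frame) > n_fft:
        raise ValueError("frame longer than n_fft")
    # Pre-emphasis, then a Hamming window
    x = [frame[0]] + [frame[i] - preemph * frame[i - 1] for i in range(1, len(frame))]
    N = len(x)
    x = [v * (0.54 - 0.46 * math.cos(2 * math.pi * i / (N - 1))) for i, v in enumerate(x)]
    x += [0.0] * (n_fft - N)  # zero-pad to n_fft samples
    # Power spectrum of bins 0..n_fft/2 via a naive DFT (an FFT would be used in practice)
    n_bins = n_fft // 2 + 1
    power = []
    for k in range(n_bins):
        s = sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_fft) for n in range(n_fft))
        power.append(abs(s) ** 2 / n_fft)
    # Triangular filters with centres equally spaced on the mel scale
    low, high = hz_to_mel(20.0), hz_to_mel(sample_rate / 2)
    edges = [mel_to_hz(low + (high - low) * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_freqs = [k * sample_rate / n_fft for k in range(n_bins)]
    log_energies = []
    for m in range(1, n_mels + 1):
        left, center, right = edges[m - 1], edges[m], edges[m + 1]
        e = 0.0
        for f, p in zip(bin_freqs, power):
            if left < f < right:
                w = (f - left) / (center - left) if f <= center else (right - f) / (right - center)
                e += w * p
        log_energies.append(math.log(max(e, 1e-10)))  # floor avoids log(0)
    # Orthonormal DCT-II of the log mel energies gives the cepstral coefficients
    ceps = []
    for c in range(n_ceps):
        scale = math.sqrt(1.0 / n_mels) if c == 0 else math.sqrt(2.0 / n_mels)
        ceps.append(scale * sum(e * math.cos(math.pi * c * (m + 0.5) / n_mels)
                                for m, e in enumerate(log_energies)))
    return ceps
```

Because the DCT is orthonormal, scaling the input only shifts the 0th coefficient; the remaining 39 describe the spectral envelope. Frame shift (commonly 10 ms) and any normalization are likewise not stated in the abstract.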

The proposed VAD network consists of three TDNN layers and two stacked GRU layers. In the experiments, we used speech files from the MUSAN corpus, and all models were trained with the TensorFlow framework.
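The layer stack described above can be illustrated with a forward pass in plain Python. This is a sketch under stated assumptions, not the authors' TensorFlow model: the weights are random, and the TDNN context offsets, hidden size, edge padding, GRU formulation, and sigmoid output layer are all hypothetical choices. Each TDNN layer splices frames at fixed time offsets and applies an affine transform with ReLU; the two GRU layers then run over the resulting sequence, and a final layer maps each hidden state to a per-frame speech probability.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def affine(W, b, x):
    # y = W x + b, with W stored as a list of rows
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def init(rows, cols, rng):
    # Small random weights and zero biases (untrained, for illustration only)
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)], [0.0] * rows

class TDNNLayer:
    """Affine + ReLU applied to frames spliced at fixed (hypothetical) time offsets."""
    def __init__(self, in_dim, out_dim, offsets, rng):
        self.offsets = offsets
        self.W, self.b = init(out_dim, in_dim * len(offsets), rng)

    def __call__(self, frames):
        T = len(frames)
        out = []
        for t in range(T):
            spliced = []
            for o in self.offsets:
                # Replicate the first/last frame at the utterance edges (an assumption)
                spliced.extend(frames[min(max(t + o, 0), T - 1)])
            out.append([max(0.0, v) for v in affine(self.W, self.b, spliced)])
        return out

class GRULayer:
    """Unidirectional GRU over the frame sequence (one common GRU formulation)."""
    def __init__(self, in_dim, hid_dim, rng):
        self.hid_dim = hid_dim
        self.Wz, self.bz = init(hid_dim, in_dim + hid_dim, rng)  # update gate
        self.Wr, self.br = init(hid_dim, in_dim + hid_dim, rng)  # reset gate
        self.Wn, self.bn = init(hid_dim, in_dim + hid_dim, rng)  # candidate state

    def __call__(self, frames):
        h = [0.0] * self.hid_dim
        out = []
        for x in frames:
            z = [sigmoid(v) for v in affine(self.Wz, self.bz, x + h)]  # x + h concatenates
            r = [sigmoid(v) for v in affine(self.Wr, self.br, x + h)]
            gated = [ri * hi for ri, hi in zip(r, h)]
            n = [math.tanh(v) for v in affine(self.Wn, self.bn, x + gated)]
            h = [(1.0 - zi) * hi + zi * ni for zi, hi, ni in zip(z, h, n)]
            out.append(h)
        return out

class VADNet:
    """Three TDNN layers -> two stacked GRU layers -> per-frame speech probability."""
    def __init__(self, feat_dim=40, hidden=64, seed=0):
        rng = random.Random(seed)
        self.tdnn = [TDNNLayer(feat_dim, hidden, [-2, -1, 0, 1, 2], rng),
                     TDNNLayer(hidden, hidden, [-1, 0, 1], rng),
                     TDNNLayer(hidden, hidden, [-3, 0, 3], rng)]
        self.gru = [GRULayer(hidden, hidden, rng), GRULayer(hidden, hidden, rng)]
        self.Wo, self.bo = init(1, hidden, rng)

    def __call__(self, features):
        x = features  # list of T feature vectors, e.g. 40-dim MFCCs
        for layer in self.tdnn + self.gru:
            x = layer(x)
        return [sigmoid(affine(self.Wo, self.bo, h)[0]) for h in x]
```

In TensorFlow, a TDNN layer is commonly expressed as a tf.keras.layers.Conv1D with a dilation_rate, and the recurrent part as stacked tf.keras.layers.GRU layers with return_sequences=True; the abstract does not give the exact configuration used.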

Ten noise types, including factory and babble noise, were used in the experiments.
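Noisy test material at a target SNR (the experiments use 10, 5, 0, and -5 dB) is typically produced by scaling a noise recording and adding it to clean speech. The sketch below assumes the SNR is defined over the whole signal as 10*log10(P_speech / P_noise), with P the mean power; the abstract does not describe how the noisy data were constructed, and the helper names mix_at_snr and snr_db are hypothetical.

```python
import math

def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so that 10*log10(P_speech / P_noise) == snr_db.

    Assumes global (whole-signal) power; some setups measure power on speech frames only.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    # Target noise power is P_speech / 10^(SNR/10); the gain is applied to amplitudes
    gain = math.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return [s + gain * n for s, n in zip(speech, noise)]

def snr_db(speech, noisy):
    """Measure the SNR of a mixture, given the clean speech it was built from."""
    residual = [y - s for s, y in zip(speech, noisy)]
    p_s = sum(s * s for s in speech)
    p_n = sum(r * r for r in residual)
    return 10.0 * math.log10(p_s / p_n)
```

At -5 dB the scaled noise carries about 3.2 times the power of the speech, which is why that condition is the hardest of the four.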

In the experiments, the proposed method was compared with traditional methods under various noise types at SNRs of 10, 5, 0, and -5 dB. Performance was evaluated with the receiver operating characteristic (ROC) curve, which plots the true positive rate against the false positive rate. The experimental results show that the proposed method is more effective than traditional VAD methods under the noise conditions considered; that is, the TDNN-GRU network improves detection performance.
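Frame-level ROC points can be obtained by sweeping a decision threshold over the per-frame speech scores: each threshold yields one (false positive rate, true positive rate) pair. The sketch below assumes binary frame labels (1 = speech, 0 = non-speech); the function names are hypothetical, and the trapezoidal area under the curve (AUC) is included only as a common one-number summary that the abstract does not report.

```python
def roc_curve(scores, labels):
    """Return (fpr, tpr) lists by sweeping the threshold from high to low.

    labels: 1 for speech frames, 0 for non-speech frames.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both speech and non-speech frames")
    # Sort by descending score; lowering the threshold admits frames one by one
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for i, (score, label) in enumerate(pairs):
        if label:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last frame sharing this score (handles ties)
        if i == len(pairs) - 1 or pairs[i + 1][0] != score:
            fpr.append(fp / neg)
            tpr.append(tp / pos)
    return fpr, tpr

def auc(fpr, tpr):
    """Trapezoidal area under the ROC curve."""
    return sum((fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2 for i in range(1, len(fpr)))
```

A curve that rises more steeply toward the top-left corner means more speech frames are detected for the same rate of false alarms on non-speech frames.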

The results of this study were published in the journal "Multimedia Tools and Applications" under the title "A Gated Recurrent Unit based robust voice activity detector" (https://doi.org/10.1007/s11042-023-17123-w).