The Sound of Silence: How Traditional and Deep Learning Based Voice Activity Detection Influences Speech Quality Monitoring

Files in This Item:
aics_17.pdf (951.6 kB, Adobe PDF)
Title: The Sound of Silence: How Traditional and Deep Learning Based Voice Activity Detection Influences Speech Quality Monitoring
Authors: Jaiswal, Rahul; Hines, Andrew
Permanent link: http://hdl.handle.net/10197/11367
Date: 7-Dec-2018
Online since: 2020-05-05T14:51:49Z
Abstract: Real-time speech quality assessment is important for VoIP applications such as Google Hangouts, Microsoft Skype, and Apple FaceTime. Conventionally, subjective listening tests are used to quantify speech quality, but they are impractical for real-time monitoring scenarios. Objective speech quality assessment metrics can instead predict human judgement of perceived speech quality. Originally designed for narrow-band telephony applications, ITU-T P.563 is a single-ended (non-intrusive) speech quality assessment metric that predicts speech quality without access to a reference signal. This paper investigates the suitability of P.563 in Voice over Internet Protocol (VoIP) scenarios, and specifically the influence of silences on the predicted speech quality. The performance of P.563 was evaluated using the TCD-VoIP dataset, which contains speech with degradations commonly experienced in VoIP. The predictive capability of P.563 was established by comparing its scores with subjective listening test results. The effect of pre-processing the signal to remove silences using Voice Activity Detection (VAD) was evaluated for five acoustic feature-based VAD algorithms (energy; energy and spectral centroid; Mahalanobis distance; weighted energy; weighted spectral centroid) and four deep learning model-based VAD algorithms (Deep Neural Network, Boosted Deep Neural Network, Long Short-Term Memory, and Adaptive Context Attention Model). The analysis shows that P.563 prediction accuracy improves across the different VoIP speech conditions when silences are removed by a VAD. The improvements varied with the input content, highlighting the potential to switch the VAD used based on the input and create a content-aware speech quality monitoring system.
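The pre-processing pipeline described in the abstract, removing silences with a VAD before scoring the signal with a non-intrusive metric, can be illustrated with a minimal sketch. The example below assumes a simple energy-threshold VAD; the frame length, threshold ratio, and the `p563_predict` wrapper are illustrative assumptions, not the paper's implementation or settings.

```python
import numpy as np

def energy_vad_mask(signal, frame_len=320, threshold_ratio=0.1):
    """Flag each frame as speech (True) or silence (False) by short-term energy.

    frame_len=320 samples corresponds to 20 ms at 16 kHz (assumed here);
    the threshold is a fraction of the maximum frame energy. Both values
    are illustrative, not the settings evaluated in the paper.
    """
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    energy = np.sum(frames.astype(np.float64) ** 2, axis=1)
    threshold = threshold_ratio * energy.max()
    return energy > threshold

def remove_silences(signal, frame_len=320, threshold_ratio=0.1):
    """Keep only the frames flagged as speech and concatenate them."""
    mask = energy_vad_mask(signal, frame_len, threshold_ratio)
    frames = signal[: len(mask) * frame_len].reshape(len(mask), frame_len)
    return frames[mask].reshape(-1)

# Usage sketch: strip silences before handing the degraded signal to a
# non-intrusive quality model such as P.563.
# speech_only = remove_silences(degraded_signal)
# score = p563_predict(speech_only)  # hypothetical wrapper, not a real API
```

Swapping `energy_vad_mask` for a feature-based or deep learning VAD, as the paper does, only changes how the speech/silence mask is produced; the silence-removal and scoring steps stay the same.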
Funding Details: European Commission - European Regional Development Fund; Science Foundation Ireland
Type of material: Conference Publication
Copyright (published version): 2018 the Authors
Keywords: Speech quality; Voice activity detection
Other versions: http://ceur-ws.org/Vol-2259/
Language: en
Status of Item: Peer reviewed
Is part of: Brennan, R. B., Beel, J., Byrne, R., Debattista, J. and Crotti Junior, A. (eds.). Proceedings for the 26th AIAI Irish Conference on Artificial Intelligence and Cognitive Science, Trinity College Dublin
Conference Details: The 26th Irish Conference on Artificial Intelligence and Cognitive Science, Trinity College Dublin, Ireland, 6-7 December 2018
Appears in Collections: Computer Science Research Collection

This item is available under the Attribution-NonCommercial-NoDerivs 3.0 Ireland licence. No item may be reproduced for commercial purposes. For other possible restrictions on use, please refer to the publisher's URL where this is made available, or to notes contained in the item itself. Other terms may apply.