The Sound of Silence: How Traditional and Deep Learning Based Voice Activity Detection Influences Speech Quality Monitoring
Author(s)
Date Issued
2018-12-07
Date Available
2020-05-05T14:51:49Z
Abstract
Real-time speech quality assessment is important for VoIP applications such as Google Hangouts, Microsoft Skype, and Apple FaceTime. Conventionally, subjective listening tests are used to quantify speech quality, but they are impractical for real-time monitoring scenarios. Objective speech quality assessment metrics can predict human judgement of perceived speech quality. Originally designed for narrow-band telephony applications, ITU-T P.563 is a single-ended (non-intrusive) speech quality assessment metric that predicts speech quality without access to a reference signal. This paper investigates the suitability of P.563 in Voice over Internet Protocol (VoIP) scenarios and specifically the influence of silences on the predicted speech quality. The performance of P.563 was evaluated using the TCD-VoIP dataset, which contains speech with degradations commonly experienced with VoIP. The predictive capability of P.563 was established by comparing its predictions with subjective listening test results. The effect of pre-processing the signal to remove silences using Voice Activity Detection (VAD) was evaluated for five acoustic feature-based VAD algorithms (energy, energy and spectral centroid, Mahalanobis distance, weighted energy, and weighted spectral centroid) and four deep learning model-based VAD algorithms (Deep Neural Network, Boosted Deep Neural Network, Long Short-Term Memory, and Adaptive Context Attention Model). Analysis shows that P.563 prediction accuracy improves for different VoIP speech conditions when silences are removed by a VAD. The improvements varied with input content, highlighting the potential to switch the VAD based on the input to create a content-aware speech quality monitoring system.
Sponsorship
European Commission - European Regional Development Fund
Science Foundation Ireland
Type of Material
Conference Publication
Copyright (Published Version)
2018 the Authors
Language
English
Status of Item
Peer reviewed
Journal
Brennan, R. B., Beel, J., Byrne, R., Debattista, J. and Crotti Junior, A. (eds.). Proceedings for the 26th AIAI Irish Conference on Artificial Intelligence and Cognitive Science Trinity College Dublin
Conference Details
The 26th Irish Conference on Artificial Intelligence and Cognitive Science, Trinity College Dublin, Ireland, 6-7 December 2018
ISSN
1613-0073
This item is made available under a Creative Commons License
File(s)
Name
aics_17.pdf
Size
951.6 KB
Format
Adobe PDF
Checksum (MD5)
2d33f85b629236d5d24e018bee35920f
Owning collection