Repository logo
  • Log In
    New user? Click here to register.Have you forgotten your password?
University College Dublin
    Colleges & Schools
    Statistics
    All of DSpace
  • Log In
    New user? Click here to register.Have you forgotten your password?
  1. Home
  2. College of Science
  3. School of Computer Science
  4. Computer Science Research Collection
  5. The Sound of Silence: How Traditional and Deep Learning Based Voice Activity Detection Influences Speech Quality Monitoring
 
  • Details
Options

The Sound of Silence: How Traditional and Deep Learning Based Voice Activity Detection Influences Speech Quality Monitoring

Author(s)
Jaiswal, Rahul  
Hines, Andrew  
Uri
http://hdl.handle.net/10197/11367
Date Issued
2018-12-07
Date Available
2020-05-05T14:51:49Z
Abstract
Real-time speech quality assessment is important for VoIP applications such as Google Hangouts, Microsoft Skype, and Apple Face-Time. Conventionally, subjective listening tests are used to quantify speech quality but are impractical for real-time monitoring scenarios. Objective speech quality assessment metrics can predict human judgement of perceived speech quality. Originally designed for narrow-band telephony applications, ITU-T P.563 is a single-ended or non-intrusive speech quality assessment that predicts speech quality without access to a reference signal. This paper investigates the suitability of P.563 in Voice over Internet Protocol (VoIP) scenarios and specifically the influence of silences on the predicted speech quality. The performance of P.563 was evaluated using TCD-VoIP dataset, containing speech with degradations commonly experienced with VoIP. The predictive capability of P.563 was established by comparing with subjective listening test results. The effect of pre-processing the signal to remove silences using Voice Activity Detection (VAD) was evaluated for five acoustic feature-based VAD algorithms: energy, energy and spectral centroid, Mahalanobis distance, weighted energy, weighted spectral centroid and four Deep learning model-based VAD algorithms: Deep Neural Network, Boosted Deep Neural Network, Long Short-Term Memory and Adaptive context attention model. Analysis shows P.563 prediction accuracy improves for different speech conditions of VoIP when the silences were removed by a VAD. The improvements varied with input content highlighting a potential to switch the VAD used based on the input to create a content aware speech quality monitoring system.
Sponsorship
European Commission - European Regional Development Fund
Science Foundation Ireland
Type of Material
Conference Publication
Copyright (Published Version)
2018 the Authors
Subjects

Speech quality

Voice activity detect...

Web versions
http://ceur-ws.org/Vol-2259/
Language
English
Status of Item
Peer reviewed
Journal
Brennan, R. B., Beel, J., Byrne, R., Debattista, J. and Crotti Junior, A. (eds.). Proceedings for the 26th AIAI Irish Conference on Artificial Intelligence and Cognitive Science Trinity College Dublin
Conference Details
The 26th Irish Conference on Artificial Intelligence and Cognitive Science, Trinity College Dublin, Ireland, 6-7 December 2018
ISSN
1613-0073
This item is made available under a Creative Commons License
https://creativecommons.org/licenses/by-nc-nd/3.0/ie/
File(s)
Loading...
Thumbnail Image
Name

aics_17.pdf

Size

951.6 KB

Format

Adobe PDF

Checksum (MD5)

2d33f85b629236d5d24e018bee35920f

Owning collection
Computer Science Research Collection

Item descriptive metadata is released under a CC-0 (public domain) license: https://creativecommons.org/public-domain/cc0/.
All other content is subject to copyright.

For all queries please contact research.repository@ucd.ie.

Built with DSpace-CRIS software - Extension maintained and optimized by 4Science

  • Cookie settings
  • Privacy policy
  • End User Agreement