  • Publication
    Exploring Composite Dataset Biases for Heart Sound Classification
    (CEUR Workshop Proceedings, 2020-12-08)
    In the last few years, the automatic classification of heart sounds has been widely studied as a screening method for heart disease. Some of these studies have achieved high accuracies in heart abnormality prediction. However, for such models to assist clinicians in the detection of heart abnormalities, it is of critical importance that they are generalisable, working on unseen real-world data. Despite the importance of generalisability, the presence of bias in the leading heart sound datasets used in these studies has remained unexplored. In this paper, we explore the presence of potential bias in heart sound datasets. Using a small set of spectral features for heart sound representation, we demonstrate experimentally that it is possible to detect sub-datasets of PhysioNet, the leading dataset of the field, with 98% accuracy. We also show that sensors which have been used to capture recordings of each dataset are likely the main cause of the bias in these datasets. Lack of awareness of this bias works against generalised models for heart sound diagnostics. Our findings call for further research on the bias issue in heart sound datasets and its impact on the generalisability of heart abnormality prediction models.
  • Publication
    Assessing the Appetite for Trustworthiness and the Regulation of Artificial Intelligence in Europe
    (CEUR Workshop Proceedings, 2020-12-08)
    While Artificial Intelligence (AI) is near-ubiquitous, there is no effective control framework within which it is being advanced. Without a control framework, trustworthiness of AI is impacted. This negatively affects adoption of AI and reduces its potential for social benefit. For international trade and technology cooperation, effective regulatory frameworks need to be created. This study presents a thematic analysis of national AI strategies for European countries in order to assess the appetite for an AI regulatory framework. A Declaration of Cooperation on AI was signed by EU members and non-members in 2018. Many of the signatories have adopted national strategies on AI. In general, there is a high level of homogeneity in the national strategies. An expectation of regulation, in some form, is expressed in the strategies, though a reference to AI-specific legislation is not universal. With the exception of some outliers, international cooperation is supported. The shape of effective AI regulation has not been agreed upon by stakeholders, but governments are expecting and seeking regulatory frameworks. This indicates an appetite for regulation. The international focus has been on regulating AI solutions and not on the regulation of individuals. The introduction of a professional regulation system may be a complementary or alternative regulatory strategy. Whether the appetite and priorities seen in Europe are mirrored worldwide will require a broader study of the national AI strategy landscape.
  • Publication
    The Sound of Silence: How Traditional and Deep Learning Based Voice Activity Detection Influences Speech Quality Monitoring
    Real-time speech quality assessment is important for VoIP applications such as Google Hangouts, Microsoft Skype, and Apple FaceTime. Conventionally, subjective listening tests are used to quantify speech quality but are impractical for real-time monitoring scenarios. Objective speech quality assessment metrics can predict human judgement of perceived speech quality. Originally designed for narrow-band telephony applications, ITU-T P.563 is a single-ended or non-intrusive speech quality assessment method that predicts speech quality without access to a reference signal. This paper investigates the suitability of P.563 in Voice over Internet Protocol (VoIP) scenarios and specifically the influence of silences on the predicted speech quality. The performance of P.563 was evaluated using the TCD-VoIP dataset, which contains speech with degradations commonly experienced with VoIP. The predictive capability of P.563 was established by comparing with subjective listening test results. The effect of pre-processing the signal to remove silences using Voice Activity Detection (VAD) was evaluated for five acoustic feature-based VAD algorithms (energy, energy and spectral centroid, Mahalanobis distance, weighted energy, and weighted spectral centroid) and four deep learning model-based VAD algorithms (Deep Neural Network, Boosted Deep Neural Network, Long Short-Term Memory, and Adaptive Context Attention Model). Analysis shows that P.563 prediction accuracy improves for different VoIP speech conditions when the silences are removed by a VAD. The improvements varied with input content, highlighting a potential to switch the VAD used based on the input to create a content-aware speech quality monitoring system.
  • Publication
    UnB-AV: An Audio-Visual Database for Multimedia Quality Research
    In this paper, we present the UnB-AV database, a database of audio-visual sequences and quality scores aimed at multimedia quality research. The database contains a total of 140 source contents with diverse semantic content, in terms of both the video and audio components. It also contains 2,320 test sequences with audio and video degradations, along with the corresponding quality and content subjective scores. The subjective scores were collected by performing three different psycho-physical experiments using the Immersive Methodology. The three experiments have been presented individually in previous studies. In the first experiment, only the video component of the audio-visual sequences was degraded, with compression (H.264 and H.265) and transmission (packet-loss and frame freezing) distortions. In the second experiment, only the audio component of the audio-visual sequences was degraded, with common audio distortions (clip, echo, chop, and background noise). Finally, in the third experiment, the audio and video degradations were combined to degrade both audio and video components. The UnB-AV database is available for download from the site of the Laboratory of Digital Signal Processing of the University of Brasilia and The Consumer Digital Video Library (CDVL).
      Scopus© Citations 9
  • Publication
    Assessment of QoE for Video and Audio in WebRTC Applications Using Full-Reference Models
    WebRTC is a set of standard technologies that allows exchanging video and audio in real time on the Web. As with other media-related applications, the user-perceived audiovisual quality can be estimated using Quality of Experience (QoE) measurements. This paper analyses the behavior of different objective Full-Reference (FR) models for video and audio in WebRTC applications. FR models calculate the video and audio quality by comparing an original media reference with the degraded signal. To compute these models, we have created an open-source benchmark in which different types of reference media inputs are sent browser to browser while simulating different kinds of network conditions in terms of packet loss and jitter. Our benchmark provides recording capabilities for the impaired WebRTC streams. Then, we apply different existing FR metrics for video (VMAF, VIFp, SSIM, MS-SSIM, PSNR, PSNR-HVS, and PSNR-HVS-M) and audio (PESQ, ViSQOL, and POLQA) to the recordings together with their references. Moreover, we use the same recordings to carry out a subjective analysis in which real users rate the video and audio quality using a Mean Opinion Score (MOS). Finally, we calculate the correlations between the objective and subjective results to find the objective models that better correspond with the subjective outcome, which is considered the ground truth QoE. We find that some of the studied objective models, such as VMAF, VIFp, and POLQA, show a strong correlation with the subjective results in packet loss scenarios.
      Scopus© Citations 26
  • Publication
    ViSQOL v3: An Open Source Production Ready Objective Speech and Audio Metric
    Estimation of perceptual quality in audio and speech is possible using a variety of methods. The combined v3 release of ViSQOL and ViSQOLAudio (for speech and audio, respectively) provides improvements upon previous versions, in terms of both design and usage. As an open source C++ library or binary with permissive licensing, ViSQOL can now be deployed beyond the research context into production usage. The feedback from internal production teams at Google has helped to improve this new release, and serves to show cases where it is most applicable, as well as to highlight limitations. The new model is benchmarked against real-world data for evaluation purposes. The trends and direction of future work are discussed.
      Scopus© Citations 49
  • Publication
    Development of a Speech Quality Database Under Uncontrolled Conditions
    Objective audio quality assessment is preferred to avoid time-consuming and costly listening tests. The development of objective quality metrics depends on the availability of datasets appropriate to the application under study. Currently, a suitable human-annotated dataset for developing quality metrics in archive audio is missing. Given the online availability of archival recordings, we propose to develop a real-world audio quality dataset. We present a methodology used to curate a speech quality database using the archive recordings from the Apollo Space Program. The proposed procedure is based on two steps: a pilot listening test and an exploratory data analysis. The pilot listening test shows that we can extract audio clips through the control of speech-to-text performance metrics to prevent data repetition. Through unsupervised exploratory data analysis, we explore the characteristics of the degradations. We classify distinct degradations and study spectral, intensity, tonality and overall quality properties of the data through clustering techniques. These results provide the necessary foundation to support the subsequent development of large-scale crowdsourced datasets for audio quality.
      Scopus© Citations 3
  • Publication
    Audio Impairment Recognition using a Correlation-Based Feature Representation
    Audio impairment recognition is based on finding noise in audio files and categorising the impairment type. Recently, significant performance improvement has been obtained thanks to the usage of advanced deep learning models. However, feature robustness is still an unresolved issue and it is one of the main reasons why we need powerful deep learning architectures. In the presence of a variety of musical styles, hand-crafted features are less efficient in capturing audio degradation characteristics. They are prone to failure when recognising audio impairments and could mistakenly learn musical concepts rather than impairment types. In this paper, we propose a new representation of hand-crafted features that is based on the correlation of feature pairs. We experimentally compare the proposed correlation-based feature representation with a typical raw feature representation used in machine learning and we show superior performance in terms of compact feature dimensionality and improved computational speed in the test stage whilst achieving comparable accuracy.
      Scopus© Citations 2
  • Publication
    Fusion confusion: Exploring ambisonic spatial localisation for audio-visual immersion using the McGurk effect
    Virtual Reality (VR) is attracting the attention of application developers for purposes beyond entertainment, including serious games, health, education and training. By including 3D audio, the overall VR quality of experience (QoE) will be enhanced through greater immersion. A better understanding of the perception of spatial audio localisation in audio-visual immersion is needed, especially in streaming applications where bandwidth is limited and compression is required. This paper explores the impact of audio-visual fusion on speech due to mismatches in a perceived talker location and the corresponding sound, using a phenomenon known as the McGurk effect and binaurally rendered Ambisonic spatial audio. The illusion of the McGurk effect happens when the sound of one syllable paired with a video of a second syllable gives the perception of a third syllable. For instance, the sound of /ba/ dubbed onto a video of /ga/ will lead to the illusion of hearing /da/. Several studies have investigated factors involved in the McGurk effect, but little has been done to understand the audio spatial effect on this illusion. 3D spatial audio generated with Ambisonics has been shown to provide satisfactory QoE with respect to localisation of sound sources, which makes it suitable for VR applications, but not for audio-visual talker scenarios. In order to test the perception of the McGurk effect at different directions of arrival (DOA) of sound, we rendered Ambisonics signals at azimuths of 0°, 30°, 60°, and 90° to both the left and right of the video source. The results show that the audio-visual fusion significantly affects the perception of the speech. Yet the spatial audio does not significantly impact the illusion. This finding suggests that precise localisation of speech audio might not be as critical for speech intelligibility. It was found that a more significant factor was the intelligibility of speech itself.
      Scopus© Citations 6
  • Publication
    Micro-Benchmarking Property Preserving Encryption: Balancing Performance, Security and Functionality
    (IEEE, 2018-06-22)
    Practical encryption systems with new and more flexible capabilities have been enabled by recent advances in computing hardware performance and Property Preserving Encryption (PPE) schemes. PPE schemes allow limited and preselected operations to be performed on encrypted data, allowing system designers to trade off between performance, security and functionality. This paper uses micro-benchmarking to evaluate three interdependent factors of PPE: performance, security and functionality. The findings validate the efficacy of this technique and provide guidance to application designers and technology evaluators seeking to understand these interdependent relationships for PPE database applications. Experiments were performed using the CryptDB research system. Results validate the previous assessments of CryptDB and provide supplemental detail on performance, security and functionality.