Now showing 1 - 10 of 21
  • Publication
    Streaming VR for Immersion: Quality aspects of Compressed Spatial Audio
    (International Society on Virtual Systems and MultiMedia, 2017-11-05) ; ; ; ;
    Delivering a 360-degree soundscape that matches full sphere visuals is an essential aspect of immersive VR. Ambisonics is a full sphere surround sound technique that takes into account the azimuth and elevation of sound sources, portraying source location above and below as well as around the horizontal plane of the listener. In contrast to channel-based methods, ambisonics representation offers the advantage of being independent of a specific loudspeaker set-up. Streaming ambisonics over networks requires efficient encoding techniques that compress the raw audio content without compromising quality of experience (QoE). This work investigates the effect of audio channel compression via the OPUS 1.2 codec on the quality of spatial audio as perceived by listeners. In particular we evaluate the listening quality and localization accuracy of first-order ambisonic audio (FOA) and third-order ambisonic audio (HOA) compressed at various bitrates (i.e. 32, 64, 128 and 128, 256, 512kbps respectively). To assess the impact of OPUS compression on spatial audio a number of subjective listening tests were carried out. The sample set for the tests comprises both recorded and synthetic audio clips with a wide range of time-frequency characteristics. In order to evaluate localization accuracy of compressed audio a number of fixed and dynamic (moving vertically and horizontally) source positions were selected for the test samples. The results show that for compressed spatial audio, perceived quality and localization accuracy are influenced more by compression scheme, bitrate and ambisonic order than by sample content. The insights provided by this work into factors and parameters influencing QoE will guide future development of a objective spatial audio quality metric.
  • Publication
    Micro-Benchmarking Property Preserving Encryption: Balancing Performance, Security and Functionality
    (IEEE, 2018-06-22) ;
    Practical encryption systems with new and more flexible capabilities have been enabled by recent advances in computing hardware performance and Property Preserving Encryption (PPE) schemes. PPE schemes allow limited and preselected operations to be performed on encrypted data allowing system designers to trade-off between performance, security and functionality. This paper uses micro-benchmark to evaluate three interdependent factors of PPE: performance, security and functionality. The findings validate the efficacy of this technique and provide guidance to application designers and technology evaluators seeking to understand these interdependent relationships for PPE database applications. Experiments were performed using the CryptDB research system. Results validate the previous assessments of CryptDB and provide supplemental detail on performance, security and functionality.
  • Publication
    Assessing the Appetite for Trustworthiness and the Regulation of Artificial Intelligence in Europe
    (CEUR Workshop Proceedings, 2020-12-08) ; ;
    While Artificial Intelligence (AI) is near ubiquitous, there is no effective control framework within which it is being advanced. Without a control framework, trustworthiness of AI is impacted. This negatively affects adoption of AI and reduces its potential for social benefit. For international trade and technology cooperation, effective regulatory frameworks need to be created. This study presents a thematic analysis of national AI strategies for European countries in order to assess the appetite for an AI regulatory framework. A Declaration of Cooperation on AI was signed by EU members and non-members in 2018. Many of the signatories have adopted national strategies on AI. In general there is a high level of homogeneity in the national strategies. An expectation of regulation, in some form, is expressed in the strategies, though a reference to AI specific legislation is not universal. With the exception of some outliers, international cooperation is supported. The shape of effective AI regulation has not been agreed upon by stakeholders but governments are expecting and seeking regulatory frameworks. This indicates an appetite for regulation. The international focus has been on regulating AI solutions and not on the regulation of individuals. The introduction of a professional regulation system may be a complementary or alternative regulatory strategy. Whether the appetite and priorities seen in Europe are mirrored worldwide will require a broader study of the national AI strategy landscape.
  • Publication
    The Sound of Silence: How Traditional and Deep Learning Based Voice Activity Detection Influences Speech Quality Monitoring
    Real-time speech quality assessment is important for VoIP applications such as Google Hangouts, Microsoft Skype, and Apple Face-Time. Conventionally, subjective listening tests are used to quantify speech quality but are impractical for real-time monitoring scenarios. Objective speech quality assessment metrics can predict human judgement of perceived speech quality. Originally designed for narrow-band telephony applications, ITU-T P.563 is a single-ended or non-intrusive speech quality assessment that predicts speech quality without access to a reference signal. This paper investigates the suitability of P.563 in Voice over Internet Protocol (VoIP) scenarios and specifically the influence of silences on the predicted speech quality. The performance of P.563 was evaluated using TCD-VoIP dataset, containing speech with degradations commonly experienced with VoIP. The predictive capability of P.563 was established by comparing with subjective listening test results. The effect of pre-processing the signal to remove silences using Voice Activity Detection (VAD) was evaluated for five acoustic feature-based VAD algorithms: energy, energy and spectral centroid, Mahalanobis distance, weighted energy, weighted spectral centroid and four Deep learning model-based VAD algorithms: Deep Neural Network, Boosted Deep Neural Network, Long Short-Term Memory and Adaptive context attention model. Analysis shows P.563 prediction accuracy improves for different speech conditions of VoIP when the silences were removed by a VAD. The improvements varied with input content highlighting a potential to switch the VAD used based on the input to create a content aware speech quality monitoring system.
  • Publication
    Audio Impairment Recognition using a Correlation-Based Feature Representation
    Audio impairment recognition is based on finding noise in audio files and categorising the impairment type. Recently, significant performance improvement has been obtained thanks to the usage of advanced deep learning models. However, feature robustness is still an unresolved issue and it is one of the main reasons why we need powerful deep learning architectures. In the presence of a variety of musical styles, handcrafted features are less efficient in capturing audio degradation characteristics and they are prone to failure when recognising audio impairments and could mistakenly learn musical concepts rather than impairment types. In this paper, we propose a new representation of hand-crafted features that is based on the correlation of feature pairs. We experimentally compare the proposed correlation-based feature representation with a typical raw feature representation used in machine learning and we show superior performance in terms of compact feature dimensionality and improved computational speed in the test stage whilst achieving comparable accuracy.
      164Scopus© Citations 2
  • Publication
    ViSQOL v3: An Open Source Production Ready Objective Speech and Audio Metric
    Estimation of perceptual quality in audio and speech is possible using a variety of methods. The combined v3 release of ViSQOL and ViSQOLAudio (for speech and audio, respectively,) provides improvements upon previous versions, in terms of both design and usage. As an open source C++ library or binary with permissive licensing, ViSQOL can now be deployed beyond the research context into production usage. The feedback from internal production teams at Google has helped to improve this new release, and serves to show cases where it is most applicable, as well as to highlight limitations. The new model is benchmarked against real-world data for evaluation purposes. The trends and direction of future work is discussed.
      170Scopus© Citations 35
  • Publication
    Assessment of QoE for Video and Audio in WebRTC Applications Using Full-Reference Models
    WebRTC is a set of standard technologies that allows exchanging video and audio in real time on the Web. As with other media-related applications, the user-perceived audiovisual quality can be estimated using Quality of Experience (QoE) measurements. This paper analyses the behavior of different objective Full-Reference (FR) models for video and audio in WebRTC applications. FR models calculate the video and audio quality by comparing some original media reference with the degraded signal. To compute these models, we have created an open-source benchmark in which different types of reference media inputs are sent browser to browser while simulating different kinds of network conditions in terms of packet loss and jitter. Our benchmark provides recording capabilities of the impairment WebRTC streams. Then, we use different existing FR metrics for video (VMAF, VIFp, SSIM, MS-SSIM, PSNR, PSNR-HVS, and PSNR-HVS-M) and audio (PESQ, ViSQOL, and POLQA) recordings together with their references. Moreover, we use the same recordings to carry out a subjective analysis in which real users rate the video and audio quality using a Mean Opinion Score (MOS). Finally, we calculate the correlations between the objective and subjective results to find the objective models that better correspond with the subjective outcome, which is considered the ground truth QoE. We find that some of the studied objective models, such as VMAF, VIFp, and POLQA, show a strong correlation with the subjective results in packet loss scenarios.
      184Scopus© Citations 15
  • Publication
    UnB-AV: An Audio-Visual Database for Multimedia Quality Research
    In this paper we present the UnB-AV database, which is a database of audio-visual sequences and quality scores aimed at multimedia quality research. The database contains a total of 140 source content, with a diverse semantic content, both in terms of the video and audio components. It also contains 2,320 test sequences with audio and video degradations, along with the corresponding quality and content subjective scores. The subjective scores were collected by performing 3 different psycho-physical experiments using the Immersive Methodology. The three experiments have been presented individually in previous studies. In the first experiment, only the video component of the audio-visual sequences were degraded with compression (H.264 and H.265) and transmission (packet-loss and frame freezing) distortions. In the second experiment, only the audio component of the audio-visual sequences were degraded with common audio distortions (clip, echo, chop, and background noise). Finally, in the third experiment the audio and video degradations were combined to degrade both audio and video components. The UnB-AV database is available for download from the site of the Laboratory of Digital Signal Processing of the University of Brasilia and The Consumer Digital Video Library (CDVL).
      120Scopus© Citations 9
  • Publication
    You Drive Me Crazy! Interactive QoE Assessment for Telepresence Robot Control
    Telepresence robots (TPRs) are versatile, remotely controlled vehicles that enable physical presence and human-to-human interaction over a distance. Thanks to improving hardware and dropping price points, TPRs enjoy the growing interest in various industries and application domains. Still, a satisfying experience remains key for their acceptance and successful adoption, not only in terms of enabling remote communication with others, but also in terms of managing robot mobility by means of remote navigation. This paper focuses on the latter aspect of remote operation which has been hitherto neglected. We present the results of an extensive subjective study designed to systematically assess remote navigation Quality of Experience (QoE) in the context of using a TPR live over the Internet. Participants were ‘beamed’ into a remote office space and asked to perform characteristic TPR remote operation tasks (driving, turning, parking). Visual and control dimensions of their experience were systematically impaired by altering network characteristics (bandwidth, delay and packet loss rate) in a controlled fashion. Our results show that users can differentiate well between visual and navigation/control aspects of their experience. Furthermore, QoE impairment sensitivity varies with the actual task at hand.
      154Scopus© Citations 4
  • Publication
    Towards Application-Aware Networking: ML-Based End-to-End Application KPI/QoE Metrics Characterization in SDN
    Software Defined Networking (SDN) presents a unique networking paradigm that facilitates the development of network innovations. This paper aims to improve application awareness by incorporating Machine Learning (ML) techniques within an open source SDN architecture. The paper explores how end-to-end application Key Performance Indicator (KPI) metrics can be designed and utilized for the purpose of application awareness in networks. The main goal of this research is to characterize application KPI metrics using a suitable ML approach based on available network data. Resource allocation and network orchestration tasks can be automated based on the findings. A key facet of this research is introducing a novel feedback interface to the SDN's Northbound Interface that receives realtime performance feedback from applications. This paper aim to show how could we exploit the applications feedback to determine useful characteristics of an application's traffic. A mapping application with a defined KPI is used for experimentation. Linear multiple regression is used to derive a characteristic relationship between the application KPI and the network metrics.
      652Scopus© Citations 17