  • Publication
    Micro-Benchmarking Property Preserving Encryption: Balancing Performance, Security and Functionality
    (IEEE, 2018-06-22)
    Practical encryption systems with new and more flexible capabilities have been enabled by recent advances in computing hardware performance and Property Preserving Encryption (PPE) schemes. PPE schemes allow limited, preselected operations to be performed on encrypted data, allowing system designers to trade off between performance, security and functionality. This paper uses micro-benchmarks to evaluate three interdependent factors of PPE: performance, security and functionality. The findings validate the efficacy of this technique and provide guidance to application designers and technology evaluators seeking to understand these interdependent relationships for PPE database applications. Experiments were performed using the CryptDB research system. The results validate previous assessments of CryptDB and provide supplemental detail on performance, security and functionality.
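The equality operations PPE enables can be illustrated with a small sketch. CryptDB's deterministic layer lets a server answer equality queries over ciphertexts; below, a keyed deterministic tag (HMAC) stands in for a deterministic cipher. This is an illustration of the equality-preserving property only, not CryptDB's actual implementation (a real deterministic cipher is reversible with the key; an HMAC tag is not).

```python
import hmac
import hashlib

def equality_token(key: bytes, value: str) -> bytes:
    """Deterministic keyed tag: equal plaintexts yield equal tokens.

    Stands in for a deterministic cipher (illustration only): it preserves
    equality, so a server can match rows without learning plaintexts.
    """
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()

key = b"\x01" * 32  # demo key; use a randomly generated key in practice

# "Encrypted" column stored server-side: the server never sees plaintexts,
# yet it can answer equality queries by comparing tokens.
rows = [equality_token(key, name) for name in ["alice", "bob", "alice"]]

query = equality_token(key, "alice")
matches = [i for i, tok in enumerate(rows) if hmac.compare_digest(tok, query)]
print(matches)  # rows 0 and 2 hold the same plaintext -> [0, 2]
```

The trade-off the paper benchmarks is visible even here: revealing which rows are equal is exactly the functionality gained and the security surrendered.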
  • Publication
    The Sound of Silence: How Traditional and Deep Learning Based Voice Activity Detection Influences Speech Quality Monitoring
    Real-time speech quality assessment is important for VoIP applications such as Google Hangouts, Microsoft Skype, and Apple FaceTime. Conventionally, subjective listening tests are used to quantify speech quality but are impractical for real-time monitoring scenarios. Objective speech quality assessment metrics can instead predict human judgement of perceived speech quality. Originally designed for narrow-band telephony applications, ITU-T P.563 is a single-ended or non-intrusive speech quality assessment metric that predicts speech quality without access to a reference signal. This paper investigates the suitability of P.563 in Voice over Internet Protocol (VoIP) scenarios and specifically the influence of silences on the predicted speech quality. The performance of P.563 was evaluated using the TCD-VoIP dataset, which contains speech with degradations commonly experienced in VoIP. The predictive capability of P.563 was established by comparing its predictions with subjective listening test results. The effect of pre-processing the signal to remove silences using Voice Activity Detection (VAD) was evaluated for five acoustic-feature-based VAD algorithms (energy; energy and spectral centroid; Mahalanobis distance; weighted energy; weighted spectral centroid) and four deep-learning-based VAD algorithms (Deep Neural Network, Boosted Deep Neural Network, Long Short-Term Memory and Adaptive Context Attention Model). The analysis shows that P.563 prediction accuracy improves across different VoIP speech conditions when silences are removed by a VAD. The improvements varied with input content, highlighting the potential to switch the VAD based on the input and so create a content-aware speech quality monitoring system.
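The simplest of the acoustic-feature VADs compared above is the energy-based one. A minimal sketch of energy-based silence removal follows; the frame size and threshold are illustrative choices, not the parameters used in the paper.

```python
import math

def energy_vad(samples, frame_len=160, threshold=0.01):
    """Flag each frame as speech (True) when its mean energy exceeds a
    fixed threshold -- the simplest acoustic-feature VAD."""
    flags = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        flags.append(energy > threshold)
    return flags

def drop_silences(samples, flags, frame_len=160):
    """Pre-processing step: keep only voiced frames before handing the
    signal to a quality model such as P.563."""
    kept = []
    for i, voiced in enumerate(flags):
        if voiced:
            kept.extend(samples[i * frame_len:(i + 1) * frame_len])
    return kept

# Synthetic demo: 0.5 s of a 440 Hz tone (standing in for speech)
# followed by 0.5 s of silence, at an 8 kHz sample rate.
sr = 8000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr // 2)]
signal = tone + [0.0] * (sr // 2)

flags = energy_vad(signal)
trimmed = drop_silences(signal, flags)
print(sum(flags), len(flags))     # 25 of 50 frames are voiced
print(len(trimmed), len(signal))  # 4000 of 8000 samples survive
```

Real speech requires a hangover scheme and adaptive thresholds, which is precisely why the paper compares more sophisticated feature-based and deep-learning VADs.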
  • Publication
    Fusion confusion: Exploring ambisonic spatial localisation for audio-visual immersion using the McGurk effect
    Virtual Reality (VR) is attracting the attention of application developers for purposes beyond entertainment, including serious games, health, education and training. By including 3D audio, the overall VR quality of experience (QoE) is enhanced through greater immersion. A better understanding of the perception of spatial audio localisation in audio-visual immersion is needed, especially in streaming applications where bandwidth is limited and compression is required. This paper explores the impact of audio-visual fusion on speech due to mismatches between a perceived talker location and the corresponding sound, using the phenomenon known as the McGurk effect and binaurally rendered Ambisonic spatial audio. The McGurk illusion occurs when the sound of one syllable is paired with a video of a second syllable, giving the perception of a third syllable. For instance, the sound /ba/ dubbed onto video of /ga/ leads to the illusion of hearing /da/. Several studies have investigated factors involved in the McGurk effect, but little has been done to understand the effect of audio spatialisation on this illusion. 3D spatial audio generated with Ambisonics has been shown to provide satisfactory QoE with respect to localisation of sound sources, which makes it suitable for VR applications, but it has not been assessed for audio-visual talker scenarios. In order to test the perception of the McGurk effect at different directions of arrival (DOA) of sound, we rendered Ambisonic signals at azimuths of 0°, 30°, 60°, and 90° to both the left and right of the video source. The results show that audio-visual fusion significantly affects the perception of speech, yet the spatial audio does not significantly impact the illusion. This finding suggests that precise localisation of speech audio might not be as critical for speech intelligibility; a more significant factor was found to be the intelligibility of the speech itself.
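The rendered source directions can be expressed as first-order B-format panning gains. A sketch assuming the ACN channel order with SN3D normalisation (the abstract does not state which convention was used):

```python
import math

def foa_encode_gains(azimuth_deg, elevation_deg=0.0):
    """First-order ambisonic (B-format) panning gains for a point source,
    in ACN order (W, Y, Z, X) with SN3D normalisation.
    The convention is an assumption; the paper does not specify one."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = 1.0                              # omnidirectional component
    y = math.cos(el) * math.sin(az)      # left-right
    z = math.sin(el)                     # up-down
    x = math.cos(el) * math.cos(az)      # front-back
    return (w, y, z, x)

# Gains for the azimuths used in the experiment (positive = listener's left).
for az in (0, 30, 60, 90):
    w, y, z, x = foa_encode_gains(az)
    print(f"az={az:3d}  W={w:.3f} Y={y:.3f} Z={z:.3f} X={x:.3f}")
```

At 0° all energy sits in W and X (frontal); at 90° it moves entirely into Y, which is what shifts the perceived talker location relative to the video.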
  • Publication
    Streaming VR for Immersion: Quality aspects of Compressed Spatial Audio
    (International Society on Virtual Systems and MultiMedia, 2017-11-05)
    Delivering a 360-degree soundscape that matches full-sphere visuals is an essential aspect of immersive VR. Ambisonics is a full-sphere surround sound technique that takes into account the azimuth and elevation of sound sources, portraying source locations above and below as well as around the horizontal plane of the listener. In contrast to channel-based methods, the ambisonic representation offers the advantage of being independent of a specific loudspeaker set-up. Streaming ambisonics over networks requires efficient encoding techniques that compress the raw audio content without compromising quality of experience (QoE). This work investigates the effect of audio channel compression via the OPUS 1.2 codec on the quality of spatial audio as perceived by listeners. In particular, we evaluate the listening quality and localization accuracy of first-order ambisonic audio (FOA) and third-order ambisonic audio (HOA) compressed at various bitrates (32, 64 and 128 kbps, and 128, 256 and 512 kbps, respectively). To assess the impact of OPUS compression on spatial audio, a number of subjective listening tests were carried out. The sample set for the tests comprises both recorded and synthetic audio clips with a wide range of time-frequency characteristics. In order to evaluate the localization accuracy of compressed audio, a number of fixed and dynamic (moving vertically and horizontally) source positions were selected for the test samples. The results show that for compressed spatial audio, perceived quality and localization accuracy are influenced more by the compression scheme, bitrate and ambisonic order than by sample content. The insights provided by this work into the factors and parameters influencing QoE will guide future development of an objective spatial audio quality metric.
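Localization accuracy of B-format audio can be checked with a simplified direction-of-arrival estimate from the time-averaged active intensity vector. This is an illustrative sketch under assumed conventions (horizontal-only, SN3D-style gains), not the subjective-test methodology used in the study:

```python
import math

def encode_foa(source, azimuth_deg):
    """Encode a mono signal into horizontal B-format (W, X, Y) using
    SN3D-style gains; an assumed convention for illustration."""
    az = math.radians(azimuth_deg)
    return ([s for s in source],
            [s * math.cos(az) for s in source],
            [s * math.sin(az) for s in source])

def estimate_azimuth(w, x, y):
    """Estimate the direction of arrival from the time-averaged active
    intensity components (W*X, W*Y) -- a simplified localisation check
    one could apply before and after codec compression."""
    ix = sum(wi * xi for wi, xi in zip(w, x))
    iy = sum(wi * yi for wi, yi in zip(w, y))
    return math.degrees(math.atan2(iy, ix))

# A 440 Hz tone panned to 60 degrees; the estimator recovers the azimuth.
source = [math.sin(2 * math.pi * 440 * n / 48000) for n in range(4800)]
w, x, y = encode_foa(source, 60.0)
print(round(estimate_azimuth(w, x, y), 1))  # -> 60.0
```

Comparing the estimated azimuth of a compressed stream against the uncompressed original gives a crude objective proxy for the localization-accuracy degradation the listening tests measure subjectively.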
  • Publication
    Assessing the Appetite for Trustworthiness and the Regulation of Artificial Intelligence in Europe
    (CEUR Workshop Proceedings, 2020-12-08)
    While Artificial Intelligence (AI) is near-ubiquitous, there is no effective control framework within which it is being advanced. Without a control framework, the trustworthiness of AI is impacted. This negatively affects adoption of AI and reduces its potential for social benefit. For international trade and technology cooperation, effective regulatory frameworks need to be created. This study presents a thematic analysis of national AI strategies for European countries in order to assess the appetite for an AI regulatory framework. A Declaration of Cooperation on AI was signed by EU members and non-members in 2018, and many of the signatories have since adopted national strategies on AI. In general, there is a high level of homogeneity in the national strategies. An expectation of regulation, in some form, is expressed in the strategies, though a reference to AI-specific legislation is not universal. With the exception of some outliers, international cooperation is supported. The shape of effective AI regulation has not been agreed upon by stakeholders, but governments are expecting and seeking regulatory frameworks. This indicates an appetite for regulation. The international focus has been on regulating AI solutions and not on the regulation of individuals; the introduction of a professional regulation system may be a complementary or alternative regulatory strategy. Whether the appetite and priorities seen in Europe are mirrored worldwide will require a broader study of the national AI strategy landscape.
  • Publication
    Establishing Waiting Time Thresholds in Interactive Web Mapping Applications for Network QoE Management
    Customer expectations will continue to drive communication service developers to optimise their use of network resources based on user satisfaction. Thus, network platforms need to be remodelled from Quality of Service (QoS)-centric to Quality of Experience (QoE)-aware platforms. The perceived QoE for interactive web mapping applications such as Google Maps or OpenStreetMap is dominated by waiting time, i.e. the perceived time to render the page and map. Studies have explored waiting time estimation for Web QoE applications (e.g. email, downloads, web pages), but perceived waiting time for web mapping applications has been less comprehensively explored. The relationship between perceived waiting time and network QoS is a key QoE management factor in enabling QoE-aware networks. In this paper, we review the principles of network QoE management and the perception of waiting times. We present an experimental design and methodology that facilitate the identification of waiting time thresholds for web applications, using web maps as a use case. We outline our results along with a statistical analysis and a discussion interpreting the results and their applications. Finally, we discuss follow-up experiments and how they could be developed and applied in network QoE management.
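Waiting-time thresholds of this kind are commonly derived from a logarithmic mapping between waiting time and mean opinion score (the "WQL" hypothesis in the Web QoE literature). The sketch below uses illustrative coefficients, not values fitted to this study's data:

```python
import math

def mos_from_waiting_time(t_seconds, a=4.5, b=0.9, lo=1.0, hi=5.0):
    """Logarithmic waiting-time-to-MOS mapping in the spirit of the
    Web QoE WQL hypothesis: MOS falls with the log of waiting time.
    Coefficients a and b are illustrative assumptions."""
    mos = a - b * math.log(max(t_seconds, 1e-3))
    return max(lo, min(hi, mos))  # clamp to the 1..5 MOS scale

def threshold_for_mos(target_mos, a=4.5, b=0.9):
    """Invert the mapping: the longest waiting time still achieving a
    target MOS -- the kind of threshold a QoE-aware network could enforce."""
    return math.exp((a - target_mos) / b)

for t in (0.5, 1, 2, 5, 10):
    print(f"{t:5.1f} s -> MOS {mos_from_waiting_time(t):.2f}")
print(f"max wait for MOS >= 3: {threshold_for_mos(3.0):.1f} s")
```

A network operator can run the inverse mapping to translate a QoE target into a QoS budget (maximum tolerable page-and-map render time), which is the management loop the paper's thresholds feed into.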
  • Publication
    UnB-AV: An Audio-Visual Database for Multimedia Quality Research
    In this paper we present the UnB-AV database, a database of audio-visual sequences and quality scores aimed at multimedia quality research. The database contains a total of 140 source sequences with diverse semantic content in both their video and audio components. It also contains 2,320 test sequences with audio and video degradations, along with the corresponding subjective quality and content scores. The subjective scores were collected in three different psycho-physical experiments using the Immersive Methodology; the three experiments have been presented individually in previous studies. In the first experiment, only the video component of the audio-visual sequences was degraded, with compression (H.264 and H.265) and transmission (packet loss and frame freezing) distortions. In the second experiment, only the audio component was degraded, with common audio distortions (clip, echo, chop, and background noise). Finally, in the third experiment the audio and video degradations were combined to degrade both components. The UnB-AV database is available for download from the site of the Laboratory of Digital Signal Processing of the University of Brasilia and from The Consumer Digital Video Library (CDVL).
  • Publication
    Analyzing the performance of autoencoder-based objective quality metrics on audio-visual content
    (Society for Imaging Science and Technology, 2020-01-30)
    The development of audio-visual quality models faces a number of challenges, including the integration of the audio and video sensory channels and the modeling of their interaction characteristics. Commonly, objective quality metrics estimate the quality of a single component (audio or video) of the content. Machine learning techniques, such as autoencoders, offer a very promising alternative for developing objective assessment models. This paper studies the performance of a group of autoencoder-based objective quality metrics on a diverse set of audio-visual content. To perform this test, we use a large dataset of audio-visual content (the UnB-AV database), which contains degradations in both the audio and video components. The database has accompanying subjective scores collected in three separate subjective experiments. We compare our autoencoder-based methods, which take into account both audio and video components (multi-modal), against several objective single-modal audio and video quality metrics. The main goal of this work is to verify the gain or loss in performance of these single-modal metrics when tested on audio-visual sequences.
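The core idea behind autoencoder-based quality metrics is that degradations push content off the manifold the autoencoder learned from pristine material, raising reconstruction error. A minimal sketch with hard-coded weights standing in for a trained encoder (the real metrics are learned, multi-layer, and operate on audio-visual features):

```python
def reconstruct(W, x):
    """x_hat = W^T (W x): encode the feature vector into the latent
    space, then decode it back with the tied (transposed) weights."""
    h = [sum(W[k][i] * x[i] for i in range(len(x))) for k in range(len(W))]
    return [sum(W[k][i] * h[k] for k in range(len(W))) for i in range(len(x))]

def reconstruction_error(W, x):
    """Squared reconstruction error: the quality proxy -- larger error
    means the input looks less like the pristine training material."""
    xr = reconstruct(W, x)
    return sum((a - b) ** 2 for a, b in zip(x, xr))

# Stand-in for an autoencoder trained on pristine content: its 2-D latent
# space spans the first two feature dimensions (hard-coded for brevity).
W = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0]]

clean = [0.8, -0.3, 0.0, 0.0]      # lies on the learned manifold
degraded = [0.8, -0.3, 0.4, -0.2]  # degradation pushes it off the manifold

print(reconstruction_error(W, clean))     # ~0: resembles training data
print(reconstruction_error(W, degraded))  # larger -> lower predicted quality
```

Mapping reconstruction error to a quality score (and fusing the audio and video errors) is exactly where the multi-modal metrics studied in the paper differ from single-modal ones.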
  • Publication
    Perceived quality of audio-visual stimuli containing streaming audio degradations
    Multimedia services play an important role in modern human communication. Understanding the impact of multisensory input (audio and video) on perceived quality is important for optimizing the delivery of these services. This work explores the impact of audio degradations on audio-visual quality. With this goal, we present a new dataset that contains audio-visual sequences with distortions only in the audio component (Im-AV-Exp2). The degradations in this new dataset correspond to commonly encountered streaming degradations, matching those found in the audio-only TCD-VoIP dataset. Using the Immersive Methodology, we perform a subjective experiment with the Im-AV-Exp2 dataset. We analyze the experimental data and compare the quality scores of the Im-AV-Exp2 and TCD-VoIP datasets. The results show that the video component acts as a masking factor for certain classes of audio degradations (e.g. echo), indicating an interaction between video and audio quality that may depend on content.
  • Publication
    AMBIQUAL - a full reference objective quality metric for ambisonic spatial audio
    Streaming spatial audio over networks requires efficient encoding techniques that compress the raw audio content without compromising quality of experience. Streaming service providers such as YouTube need a perceptually relevant objective audio quality metric to monitor users' perceived quality and spatial localization accuracy. In this paper we introduce a full reference objective spatial audio quality metric, AMBIQUAL, which assesses both Listening Quality and Localization Accuracy. In our solution both metrics are derived directly from the B-format Ambisonic audio. The metric extends and adapts the algorithm used in ViSQOLAudio, a full reference objective metric designed for assessing speech and audio quality. In particular, Listening Quality is derived from the omnidirectional channel and Localization Accuracy is derived from a weighted sum of similarities from the B-format directional channels. This paper evaluates whether the proposed AMBIQUAL metric can predict two factors, Listening Quality and Localization Accuracy, by comparing its predictions with results from MUSHRA subjective listening tests. In particular, we evaluated the Listening Quality and Localization Accuracy of First and Third-Order Ambisonic audio compressed with the OPUS 1.2 codec at various bitrates (i.e. 32, 128 and 256, 512 kbps respectively). The sample set for the tests comprised both recorded and synthetic audio clips with a wide range of time-frequency characteristics. To evaluate the Localization Accuracy of compressed audio, a number of fixed and dynamic (moving vertically and horizontally) source positions were selected for the test samples. The results showed a strong correlation (PCC=0.919 and Spearman=0.882 for Listening Quality; PCC=0.854 and Spearman=0.842 for Localization Accuracy) between objective quality scores derived from the B-format Ambisonic audio using AMBIQUAL and subjective scores obtained during MUSHRA listening tests. AMBIQUAL displays very promising quality assessment predictions for spatial audio. Future work will optimise the algorithm to generalise and validate it for higher-order ambisonic formats.
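The PCC and Spearman figures above measure how well the objective scores track the subjective MUSHRA means. A stdlib-only sketch of both coefficients on illustrative (invented) numbers, not the paper's data:

```python
def pearson(xs, ys):
    """Pearson correlation coefficient (PCC) between two score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def ranks(xs):
    """Rank positions of each value (ties not averaged in this sketch)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    for rank, i in enumerate(order):
        r[i] = float(rank)
    return r

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Hypothetical objective metric outputs vs. subjective MUSHRA means
# (illustrative numbers only).
objective = [0.91, 0.78, 0.62, 0.55, 0.40, 0.33]
subjective = [88, 75, 70, 52, 45, 30]

print(round(pearson(objective, subjective), 3))
print(round(spearman(objective, subjective), 3))
```

Pearson rewards a linear relationship between objective and subjective scores, while Spearman only requires a monotonic one; reporting both, as the paper does, separates prediction accuracy from rank-ordering ability.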