  • Publication
    How Deep is Your Encoder: An Analysis of Features Descriptors for an Autoencoder-Based Audio-Visual Quality Metric
    The development of audio-visual quality assessment models poses a number of challenges. One of these challenges is modelling the complex interaction between audio and visual stimuli and how this interaction is interpreted by human users. The No-Reference Audio-Visual Quality Metric Based on a Deep Autoencoder (NAViDAd) deals with this problem from a machine learning perspective. The metric receives two sets of audio and video feature descriptors and produces a low-dimensional set of features used to predict the audio-visual quality. A basic implementation of NAViDAd was able to produce accurate predictions when tested on a range of different audio-visual databases. The current work performs an ablation study on the base architecture of the metric. Several modules are removed or re-trained using different configurations to gain a better understanding of the metric's functionality. The results presented in this study provide important feedback that allows us to understand the real capacity of the metric's architecture and eventually develop a much better audio-visual quality metric.
  • Publication
    Analyzing the performance of autoencoder-based objective quality metrics on audio-visual content
    (Society for Imaging Science and Technology, 2020-01-30)
    The development of audio-visual quality models faces a number of challenges, including the integration of audio and video sensory channels and the modeling of their interaction characteristics. Commonly, objective quality metrics estimate the quality of a single component (audio or video) of the content. Machine learning techniques, such as autoencoders, offer a very promising alternative for developing objective assessment models. This paper studies the performance of a group of autoencoder-based objective quality metrics on a diverse set of audio-visual content. To perform this test, we use a large dataset of audio-visual content (the UnB-AV database), which contains degradations in both the audio and video components. The database has accompanying subjective scores collected in three separate subjective experiments. We compare our autoencoder-based methods, which take into account both audio and video components (multi-modal), against several objective (single-modal) audio and video quality metrics. The main goal of this work is to verify the gain or loss in performance of these single-modal metrics when tested on audio-visual sequences.
  • Publication
    Perceived quality of audio-visual stimuli containing streaming audio degradations
    Multimedia services play an important role in modern human communication. Understanding the impact of multisensory input (audio and video) on perceived quality is important for optimizing the delivery of these services. This work explores the impact of audio degradations on audio-visual quality. With this goal, we present a new dataset that contains audio-visual sequences with distortions only in the audio component (Im-AV-Exp2). The degradations in this new dataset correspond to commonly encountered streaming degradations, matching those found in the audio-only TCD-VoIP dataset. Using the Immersive Methodology, we perform a subjective experiment with the Im-AV-Exp2 dataset. We analyze the experimental data and compare the quality scores of the Im-AV-Exp2 and TCD-VoIP datasets. Results show that the video component acts as a masking factor for certain classes of audio degradations (e.g. echo), indicating an interaction of video and audio quality that may depend on content.
  • Publication
    UnB-AV: An Audio-Visual Database for Multimedia Quality Research
    In this paper we present the UnB-AV database, a database of audio-visual sequences and quality scores aimed at multimedia quality research. The database contains a total of 140 source sequences with diverse semantic content, in terms of both the video and audio components. It also contains 2,320 test sequences with audio and video degradations, along with the corresponding quality and content subjective scores. The subjective scores were collected by performing 3 different psycho-physical experiments using the Immersive Methodology. The three experiments have been presented individually in previous studies. In the first experiment, only the video component of the audio-visual sequences was degraded, with compression (H.264 and H.265) and transmission (packet-loss and frame freezing) distortions. In the second experiment, only the audio component of the audio-visual sequences was degraded, with common audio distortions (clip, echo, chop, and background noise). Finally, in the third experiment the audio and video degradations were combined to degrade both components. The UnB-AV database is available for download from the site of the Laboratory of Digital Signal Processing of the University of Brasilia and The Consumer Digital Video Library (CDVL).