Fusion confusion: Exploring ambisonic spatial localisation for audio-visual immersion using the McGurk effect

Files in This Item:
MMVE_2019___Ambisonic_McGurk_Preprint.pdf (1.02 MB, Adobe PDF)
Title: Fusion confusion: Exploring ambisonic spatial localisation for audio-visual immersion using the McGurk effect
Authors: Siddig, Abubakr; Ragano, Alessandro; Jahromi, Hamed Z.; Hines, Andrew
Permanent link: http://hdl.handle.net/10197/11365
Date: 21-Jun-2019
Online since: 2020-05-05T14:19:20Z
Abstract: Virtual Reality (VR) is attracting the attention of application developers for purposes beyond entertainment, including serious games, health, education and training. Including 3D audio enhances the overall VR quality of experience (QoE) through greater immersion. A better understanding of how spatial audio localisation is perceived in audio-visual immersion is needed, especially in streaming applications where bandwidth is limited and compression is required. This paper explores the impact of audio-visual fusion on speech due to mismatches between a perceived talker location and the corresponding sound, using a phenomenon known as the McGurk effect and binaurally rendered Ambisonic spatial audio. The McGurk illusion occurs when the sound of one syllable, paired with a video of a second syllable, gives the perception of a third syllable. For instance, the sound of /ba/ dubbed onto a video of /ga/ leads to the illusion of hearing /da/. Several studies have investigated factors involved in the McGurk effect, but little has been done to understand the effect of audio spatialisation on this illusion. 3D spatial audio generated with Ambisonics has been shown to provide satisfactory QoE with respect to localisation of sound sources, which makes it suitable for VR applications, but this has not been shown for audio-visual talker scenarios. In order to test the perception of the McGurk effect at different directions of arrival (DOA) of sound, we rendered Ambisonics signals at azimuths of 0°, 30°, 60° and 90° to both the left and right of the video source. The results show that audio-visual fusion significantly affects the perception of speech, yet the spatial audio does not significantly impact the illusion. This finding suggests that precise localisation of speech audio might not be as critical for speech intelligibility. It was found that a more significant factor was the intelligibility of speech itself.
Type of material: Conference Publication
Publisher: ACM
Copyright (published version): 2019 ACM
Keywords: Human-centered computing; Virtual reality; Ambisonics; McGurk effect
DOI: 10.1145/3304113.3326112
Other versions: http://www.mmsys2019.org/participation/workshops/mmve/
Language: en
Status of Item: Peer reviewed
Is part of: Proceedings of the 11th ACM Workshop on Immersive Mixed and Virtual Environment Systems, MMVE 2019
Conference Details: The 11th ACM Workshop on Immersive Mixed and Virtual Environment Systems (MMVE 2019), Massachusetts, United States of America, 18-21 June 2019
ISBN: 9781450362993
Appears in Collections: Computer Science Research Collection

This item is available under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Ireland licence. No item may be reproduced for commercial purposes. For other possible restrictions on use, please refer to the publisher's URL where this is made available, or to notes contained in the item itself. Other terms may apply.