Fusion confusion: Exploring ambisonic spatial localisation for audio-visual immersion using the McGurk effect
Files in This Item:
|MMVE_2019___Ambisonic_McGurk_Preprint.pdf||1.02 MB||Adobe PDF||Download|
|Title:||Fusion confusion: Exploring ambisonic spatial localisation for audio-visual immersion using the McGurk effect|
|Authors:||Siddig, Abubakr; Ragano, Alessandro; Jahromi, Hamed Z.; Hines, Andrew|
|Permanent link:||http://hdl.handle.net/10197/11365|
|Date:||21-Jun-2019|
|Online since:||2020-05-05T14:19:20Z|
|Abstract:||Virtual Reality (VR) is attracting the attention of application developers for purposes beyond entertainment, including serious games, health, education and training. By including 3D audio, the overall VR quality of experience (QoE) will be enhanced through greater immersion. A better understanding of the perception of spatial audio localisation in audio-visual immersion is needed, especially in streaming applications where bandwidth is limited and compression is required. This paper explores the impact of audio-visual fusion on speech due to mismatches between a perceived talker location and the corresponding sound, using a phenomenon known as the McGurk effect and binaurally rendered Ambisonic spatial audio. The illusion of the McGurk effect happens when the sound of one syllable, paired with a video of a second syllable, gives the perception of a third syllable. For instance, the sound of /ba/ dubbed onto a video of /ga/ leads to the illusion of hearing /da/. Several studies have investigated factors involved in the McGurk effect, but little has been done to understand the spatial audio effect on this illusion. 3D spatial audio generated with Ambisonics has been shown to provide satisfactory QoE with respect to localisation of sound sources, which makes it suitable for VR applications but not for audio-visual talker scenarios. In order to test the perception of the McGurk effect at different directions of arrival (DOA) of sound, we rendered Ambisonics signals at azimuths of 0°, 30°, 60°, and 90° to both the left and right of the video source. The results show that audio-visual fusion significantly affects the perception of speech, yet spatial audio does not significantly impact the illusion. This finding suggests that precise localisation of speech audio might not be as critical for speech intelligibility. It was found that a more significant factor was the intelligibility of speech itself.|
|Type of material:||Conference Publication|
|Publisher:||ACM|
|Copyright (published version):||2019 ACM|
|Keywords:||Human-centered computing; Virtual reality; Ambisonics; McGurk effect|
|DOI:||10.1145/3304113.3326112|
|Other versions:||http://www.mmsys2019.org/participation/workshops/mmve/|
|Language:||en|
|Status of Item:||Peer reviewed|
|Is part of:||Proceedings of the 11th ACM Workshop on Immersive Mixed and Virtual Environment Systems, MMVE 2019|
|Conference Details:||The 11th ACM Workshops on Immersive Mixed and Virtual Environment Systems (MMVE 2019), Massachusetts, United States of America, 18-31 June 2019|
|ISBN:||9781450362993|
|Appears in Collections:||Computer Science Research Collection|
This item is available under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Ireland licence. No item may be reproduced for commercial purposes. For other possible restrictions on use, please refer to the publisher's URL where this is made available, or to notes contained in the item itself. Other terms may apply.