Fusion confusion: Exploring ambisonic spatial localisation for audio-visual immersion using the McGurk effect
Date Issued
2019-06-21
Date Available
2020-05-05T14:19:20Z
Abstract
Virtual Reality (VR) is attracting the attention of application developers for purposes beyond entertainment, including serious games, health, education and training. By including 3D audio, the overall VR quality of experience (QoE) will be enhanced through greater immersion. A better understanding of the perception of spatial audio localisation in audio-visual immersion is needed, especially in streaming applications where bandwidth is limited and compression is required. This paper explores the impact of audio-visual fusion on speech due to mismatches between a perceived talker location and the corresponding sound, using a phenomenon known as the McGurk effect and binaurally rendered Ambisonic spatial audio. The illusion of the McGurk effect occurs when the sound of one syllable, paired with a video of a second syllable, gives the perception of a third syllable. For instance, the sound of /ba/ dubbed onto a video of /ga/ leads to the illusion of hearing /da/. Several studies have investigated factors involved in the McGurk effect, but little has been done to understand the effect of spatial audio on this illusion. 3D spatial audio generated with Ambisonics has been shown to provide satisfactory QoE with respect to localisation of sound sources, which makes it suitable for VR applications, but not for audio-visual talker scenarios. In order to test the perception of the McGurk effect at different directions of arrival (DOA) of sound, we rendered Ambisonics signals at azimuths of 0°, 30°, 60°, and 90° to both the left and right of the video source. The results show that audio-visual fusion significantly affects the perception of speech, yet spatial audio does not significantly impact the illusion. This finding suggests that precise localisation of speech audio might not be as critical for speech intelligibility. A more significant factor was found to be the intelligibility of the speech itself.
Type of Material
Conference Publication
Publisher
ACM
Copyright (Published Version)
2019 ACM
Language
English
Status of Item
Peer reviewed
Journal
Proceedings of the 11th ACM Workshop on Immersive Mixed and Virtual Environment Systems, MMVE 2019
Conference Details
The 11th ACM Workshop on Immersive Mixed and Virtual Environment Systems (MMVE 2019), Massachusetts, United States of America, 18-31 June 2019
ISBN
9781450362993
This item is made available under a Creative Commons License
File(s)
Name
MMVE_2019___Ambisonic_McGurk_Preprint.pdf
Size
1022 KB
Format
Adobe PDF
Checksum (MD5)
42b479aa8a9b0c703e3f049af4be24df
Owning collection