Following the Embedding: Identifying Transition Phenomena in Wav2vec 2.0 Representations of Speech Audio
Date Issued
2024-04-19
Date Available
2024-07-11T09:21:29Z
Abstract
Although transformer-based models have improved the state-of-the-art in speech recognition, it is still not well understood what information from the speech signal these models encode in their latent representations. This study investigates the potential of using labelled data (TIMIT) to probe wav2vec 2.0 embeddings for insights into the encoding and visualisation of speech signal information at phone boundaries. Our experiment involves training probing models to detect phone-specific articulatory features in the hidden layers based on IPA classifications. Furthermore, we propose an analysis framework for visualising the probabilities of the detected articulatory features in every layer and frame vector. Our primary focus is to probe and better understand the structure of speech signal information in the embeddings learned by unsupervised transformers, with a view to contributing to more explainable speech processing systems.
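The layer-wise probing idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes a linear (logistic-regression) probe per hidden layer and a single binary articulatory feature, and it uses synthetic frame vectors in place of real wav2vec 2.0 hidden states and TIMIT labels. The helper names (`train_probe`, `probability_grid`, `make_synthetic_layers`) are hypothetical. The output is a layers × frames grid of feature probabilities, the kind of matrix the proposed framework visualises around a phone boundary.

```python
import math
import random


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_probe(frames, labels, lr=0.1, epochs=200):
    """Fit a binary logistic-regression probe with batch gradient descent.

    frames: frame vectors (lists of floats) taken from ONE hidden layer.
    labels: 0/1 per frame, 1 if the (hypothetical) feature is present.
    """
    dim, n = len(frames[0]), len(frames)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(frames, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * gi / n for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(probe, x):
    w, b = probe
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def probability_grid(layers, labels):
    """Train one probe per layer and return probs[layer][frame]."""
    grid = []
    for frames in layers:
        probe = train_probe(frames, labels)
        grid.append([predict(probe, x) for x in frames])
    return grid


def make_synthetic_layers(n_layers=3, n_frames=40, dim=4, seed=0):
    # Synthetic stand-in for hidden states: layer k carries a class signal of
    # strength k (layer 0 is pure noise). The label switches from 0 to 1 at
    # mid-utterance, mimicking a single phone boundary.
    rng = random.Random(seed)
    labels = [0] * (n_frames // 2) + [1] * (n_frames - n_frames // 2)
    layers = []
    for k in range(n_layers):
        sign = lambda y: 1.0 if y else -1.0
        layers.append([[k * sign(y) + rng.gauss(0.0, 1.0) for _ in range(dim)]
                       for y in labels])
    return layers, labels


if __name__ == "__main__":
    layers, labels = make_synthetic_layers()
    grid = probability_grid(layers, labels)
    for k, row in enumerate(grid):
        print(k, " ".join(f"{p:.2f}" for p in row))
```

Printing one row per layer gives a crude text version of the layer-by-frame visualisation. With real data, each row would show how sharply a given layer's probe output changes at the phone boundary.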
Sponsorship
Science Foundation Ireland
Type of Material
Conference Publication
Publisher
IEEE
Start Page
6685
End Page
6689
Language
English
Status of Item
Peer reviewed
Part of
ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Conference Details
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Seoul, Korea, 14-19 April 2024
This item is made available under a Creative Commons License
Scopus© citations
1 (acquired Oct 8, 2024)
Views
74 (last month: 29; acquired Oct 8, 2024)
Downloads
49 (last month: 12; acquired Oct 8, 2024)