Conference Proceeding

A Data Driven Approach to Audiovisual Speech Mapping



Abel A, Marxer R, Hussain A, Barker J, Watt R, Whitmer B & Derleth P (2016) A Data Driven Approach to Audiovisual Speech Mapping. In: Liu C, Hussain A, Luo B, Tan K, Zeng Y & Zhang Z (eds.) Advances in Brain Inspired Cognitive Systems. Lecture Notes in Computer Science, 10023. BICS 2016: International Conference on Brain Inspired Cognitive Systems, Beijing, China, 28.11.2016-30.11.2016. Cham, Switzerland: Springer, pp. 331-342.

The use of visual information in audio speech processing has attracted significant recent interest. This paper presents a data-driven approach that estimates audio speech acoustics from temporal visual information alone, without considering linguistic features such as phonemes and visemes. Audio (log filterbank) and visual (2D-DCT) features are extracted, and various MLP configurations and datasets are evaluated to identify the best-performing setup. The results show that, given a sequence of prior visual frames, a reasonably accurate estimate of the corresponding audio frame can be produced.
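As a rough illustration of the visual side of such a pipeline, the sketch below computes a 2D-DCT of a small image patch, keeps its low-frequency coefficients, and stacks the feature vectors of several prior frames into one input vector of the kind an MLP could map to an audio frame. The function names, the `keep` and `context` parameters, and the unnormalised DCT-II are assumptions made for this example; the abstract does not state the paper's actual settings, and the MLP itself is not shown.

```python
import math


def dct_1d(values):
    """Unnormalised DCT-II of a sequence of numbers."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * (i + 0.5) * k / n) for i, v in enumerate(values))
        for k in range(n)
    ]


def dct_2d(image):
    """Separable 2D DCT-II: transform every row, then every column."""
    rows = [dct_1d(row) for row in image]
    cols = [dct_1d(col) for col in zip(*rows)]  # cols[j] is transformed column j
    return [list(r) for r in zip(*cols)]        # transpose back to row-major


def visual_features(image, keep=2):
    """Flatten the top-left keep x keep (low-frequency) DCT coefficients.

    `keep` is a hypothetical parameter chosen for this sketch.
    """
    coeffs = dct_2d(image)
    return [coeffs[r][c] for r in range(keep) for c in range(keep)]


def stack_prior_frames(features, t, context):
    """Concatenate feature vectors of frames t-context+1 .. t (inclusive).

    `context` (number of frames in the window) is a hypothetical parameter.
    """
    if t - context + 1 < 0:
        raise ValueError("not enough prior frames")
    window = features[t - context + 1 : t + 1]
    return [x for frame in window for x in frame]
```

In this sketch, each video frame would pass through `visual_features`, and the output of `stack_prior_frames` would serve as the regression input, with the log filterbank vector of the matching audio frame as the target.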

Audiovisual; Speech processing; Speech mapping; ANNs

Funders: Engineering and Physical Sciences Research Council
Title of series: Lecture Notes in Computer Science
Number in series: 10023
Publication date: 31/12/2016
Publication date online: 30/11/2016
Place of publication: Cham, Switzerland
ISSN of series: 0302-9743
Conference: BICS 2016: International Conference on Brain Inspired Cognitive Systems
Conference location: Beijing, China

People (1)


Professor Roger Watt

Emeritus Professor, Psychology