Robust speech separation using visually constructed speech signals
Parham Aarabi, Negar Habibi Khameneh
6 March 2002
Abstract
A technique for virtually recreating speech signals entirely from the visual lip motions of a speaker is proposed. Using six geometric lip parameters obtained from the Tulips1 database, a virtual speech signal is reconstructed from a 3.6 s audiovisual training segment. It is shown that the envelope of the virtual speech signal is directly related to the envelope of the original acoustic signal. This visually reconstructed envelope is then used as the basis for robust speech separation when the visual parameters of all the speakers are available. It is shown that, unlike previous signal separation techniques, which require an ideal mixture of independent signals, the proposed technique estimates the mixture coefficients very accurately even in non-ideal situations.
© 2002 Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Parham Aarabi and Negar Habibi Khameneh "Robust speech separation using visually constructed speech signals", Proc. SPIE 4731, Sensor Fusion: Architectures, Algorithms, and Applications VI, (6 March 2002); https://doi.org/10.1117/12.458389
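The abstract alone does not specify the reconstruction or separation procedure. As a minimal sketch of the two ideas, the following Python snippet assumes a nearest-neighbour mapping from six-parameter lip frames to envelope values of the training segment, followed by a least-squares fit of the mixture envelope to the speakers' virtual envelopes; the function names, frame counts, and least-squares formulation are illustrative assumptions, not the authors' published method.

```python
import numpy as np

def virtual_envelope(test_lips, train_lips, train_env):
    """Map lip-parameter frames to a virtual speech envelope.

    test_lips  : (T, 6) lip parameters for the segment to reconstruct
    train_lips : (N, 6) lip parameters from the audiovisual training segment
    train_env  : (N,)   acoustic envelope values aligned with train_lips
    """
    # For each test frame, copy the envelope value of the training frame
    # whose six geometric lip parameters are closest in Euclidean distance.
    d = np.linalg.norm(test_lips[:, None, :] - train_lips[None, :, :], axis=2)
    return train_env[d.argmin(axis=1)]

def estimate_mixture_coeffs(mix_env, virtual_envs):
    """Least-squares fit of the mixture envelope to the virtual envelopes.

    mix_env      : (T,)   envelope of the observed mixture
    virtual_envs : (T, K) one virtual-envelope column per speaker
    """
    coeffs, *_ = np.linalg.lstsq(virtual_envs, mix_env, rcond=None)
    return coeffs

if __name__ == "__main__":
    # Synthetic stand-in data (the paper uses the Tulips1 database).
    rng = np.random.default_rng(0)
    train_lips, train_env = rng.random((90, 6)), rng.random(90)
    lips_a, lips_b = rng.random((40, 6)), rng.random((40, 6))
    env_a = virtual_envelope(lips_a, train_lips, train_env)
    env_b = virtual_envelope(lips_b, train_lips, train_env)
    mix_env = 0.7 * env_a + 0.3 * env_b  # hypothetical two-speaker mixture
    print(estimate_mixture_coeffs(mix_env, np.column_stack([env_a, env_b])))
```

Under these assumptions, the recovered coefficients could feed a standard demixing step; fitting in the envelope domain is what would allow coefficient estimation without requiring an ideal mixture of independent signals.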
KEYWORDS
Visualization, Databases, Acoustics, Information visualization, Video, Lips, Mouth
