Multimodal system supporting acoustic communication with computers

Audiovisual speech recognition

Voice-based interfaces are increasingly popular as a medium for human-computer interaction. In mobile conditions, e.g. inside a vehicle, noise becomes a crucial issue for speech recognition and can significantly degrade the system's accuracy. One technique for enhancing recognition accuracy is to supplement the acoustic signal with visual information in the form of lip images. The MODALITY project investigates innovative techniques for improving speech recognition by adding visual signal analysis:

  • recording a multimodal database of speech signals for the English language;
  • analysis of images from cameras with high framerate (fps > 100);
  • employment of stereo, thermal imaging and Time-of-Flight cameras;
  • development of feature extraction methods for the purpose of audiovisual speech recognition;
  • assessment of the accuracy of speech recognition based on additional modalities.
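As a rough illustration of how an additional visual modality can be combined with the acoustic stream, the sketch below shows feature-level (early) fusion: per-frame audio features (e.g. MFCCs) are concatenated with lip-region visual features after aligning the two frame rates. The feature dimensions, function name, and nearest-frame alignment strategy are illustrative assumptions, not the MODALITY project's actual method.

```python
import numpy as np

def early_fusion(audio_feats: np.ndarray, visual_feats: np.ndarray) -> np.ndarray:
    """Concatenate per-frame audio and visual features (early fusion).

    audio_feats:  (T, Da) array, e.g. MFCCs per audio frame.
    visual_feats: (Tv, Dv) array, e.g. lip-region features per video frame.

    Because the video frame rate and the audio frame rate usually differ,
    the visual stream is resampled to the audio frame count by
    nearest-frame indexing before concatenation.
    """
    T = audio_feats.shape[0]
    Tv = visual_feats.shape[0]
    # Map each audio frame index to the nearest video frame index.
    idx = np.minimum(np.arange(T) * Tv // T, Tv - 1)
    aligned_visual = visual_feats[idx]
    return np.concatenate([audio_feats, aligned_visual], axis=1)

# Toy example: 100 audio frames of 13-dim MFCCs,
# 30 video frames of 20-dim lip-region features.
audio = np.random.randn(100, 13)
video = np.random.randn(30, 20)
fused = early_fusion(audio, video)
print(fused.shape)  # (100, 33)
```

A recognizer can then be trained on the fused feature vectors; alternatively, late fusion combines the decisions of separate audio and visual recognizers.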

Photo: Bartosz Kunka