Audio-visual speech processing has attracted significant interest over the past 15 years. Relevant research has focused on recruiting visual speech information, extracted from the speaker's mouth region, as a means to improve the robustness of traditional, unimodal, acoustic-only speech processing. Nevertheless, to date, most work has been limited to ideal-case scenarios, whe ...