Speech recognition systems are now used in a wide variety of domains. They have recently been introduced in cars for hands-free control of radio, cell-phone, and navigation applications. However, due to the ambient noise in the car, recognition errors are relatively frequent. This paper tackles the problem of detecting, from the driver’s reaction, when such recognition errors occur. Automatic detection of communication errors in dialogue-based systems has been explored extensively in the speech community. The detection is most often based on prosodic cues such as intensity and pitch. However, recent perceptual studies indicate that the detection can be improved significantly if both acoustic and visual modalities are taken into account. To this end, we present a framework for automatic audio-visual detection of communication errors.