Normal-hearing students (n= 72) performed sentence, consonant, and word identification in either A (auditory), V (visual), or AV (audiovisual) modality. The auditory signal had difficult speech-to-noise relations. Talker (human vs. synthetic), topic (no cue vs. cue-words), and emotion (no cue vs. facially displayed vs. cue-words) were varied within groups. After the first block, effects of modality, face, topic, and emotion on initial appraisal and motivation were assessed. After the entire session, effects of modality on longer-term appraisal and motivation were assessed. The results from both assessments showed that V identification was more positively appraised than A identification. Correlations were tentatively interpreted such that evaluation of self-rated performance possibly depends on subjective standard and is reflected on motivation (if below subjective standard, AV group), or on appraisal (if above subjective standard, A group). Suggestions for further research are presented.