We propose an appearance-based machine learning architecture that estimates and tracks, in real time, large-range head yaw from a single non-calibrated monocular grayscale low-resolution image sequence of the head. The architecture is composed of five parallel template detectors, a Radial Basis Function network and two Kalman filters. The template detectors are five view-specific images of the head spanning full profiles in discrete steps of 45 degrees. The Radial Basis Function network interpolates the response vector obtained from the normalized correlation of the input image with the five template detectors. The first Kalman filter models the position and velocity of the response vector in five-dimensional space. The second is a running average that filters the scalar output of the network. We assume that the head image has been closely detected and segmented, that it undergoes only limited roll and pitch, and that there are no sharp contrasts in illumination. The architecture is person independent and is robust to changes in appearance, gesture and global illumination. The goals of this paper are, one, to measure the performance of the architecture; two, to assess the impact that the temporal information gained from video has on accuracy and stability; and three, to determine the effects of relaxing our assumptions.
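The pipeline described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the class names, the Gaussian RBF width `sigma`, the exponential-average coefficient `alpha`, and the toy center/weight values are all assumptions made for demonstration; the paper's Kalman filter over the 5-D response vector is omitted here and only the scalar running-average stage is shown.

```python
import numpy as np

# Yaw angles (degrees) of the five view-specific templates, 45-degree steps
# across full profiles, as described in the abstract.
YAW_TEMPLATES = np.array([-90.0, -45.0, 0.0, 45.0, 90.0])


def normalized_correlation(image, template):
    """Zero-mean normalized correlation between two equal-size grayscale patches."""
    a = np.asarray(image, dtype=float) - np.mean(image)
    b = np.asarray(template, dtype=float) - np.mean(template)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a.ravel() @ b.ravel() / denom) if denom > 0 else 0.0


class RBFYawEstimator:
    """Maps a 5-D template-response vector to a scalar yaw estimate.

    `centers` are prototype response vectors, `weights` the linear output
    layer; both would be fit on training data in practice (values here are
    illustrative only).
    """

    def __init__(self, centers, weights, sigma=0.5):
        self.centers = np.asarray(centers, dtype=float)   # (n_centers, 5)
        self.weights = np.asarray(weights, dtype=float)   # (n_centers,)
        self.sigma = sigma

    def predict(self, response):
        # Gaussian RBF activations over squared distances in response space.
        d2 = np.sum((self.centers - np.asarray(response)) ** 2, axis=1)
        phi = np.exp(-d2 / (2.0 * self.sigma ** 2))
        return float(phi @ self.weights)


class RunningAverage:
    """Exponential running average smoothing the network's scalar yaw output."""

    def __init__(self, alpha=0.3):
        self.alpha = alpha
        self.value = None

    def update(self, x):
        self.value = x if self.value is None else (
            self.alpha * x + (1.0 - self.alpha) * self.value
        )
        return self.value
```

With centers placed at the unit response vectors and weights set to the template yaws, a response that matches the frontal template exactly yields a zero-degree estimate by symmetry; real centers and weights would be learned from labeled head images.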