Many robotic tasks, such as autonomous navigation, human-machine collaboration, and object manipulation and grasping, rely on visual information. Robustness and flexibility are among the major research and system design issues for such visual systems. In this paper, we present a number of visual strategies for robotic object manipulation tasks in natural, domestic environments. Given a complex fetch-and-carry type of task, the issues related to the whole detect-approach-grasp loop are considered. Our vision system integrates a number of algorithms using monocular and binocular cues to achieve robustness in realistic settings. The cues are considered and used in connection with both foveal and peripheral vision to provide depth information, segment the object(s) of interest in the scene, and perform object recognition, tracking, and pose estimation. One important property of the system is that the step from object recognition to pose estimation is completely automatic, combining both appearance and geometric models. Rather than concentrating on integration issues, our primary goal is to investigate the importance and effect of the camera configuration, and the number and type of cameras, on the choice and design of the underlying visual algorithms. Experimental evaluation is performed in a realistic indoor environment with occlusions, clutter, and changing lighting and background conditions.