A mobile robot system operating in a domestic environment has to integrate components from a number of key research areas, such as recognition, visual tracking, visual servoing, object grasping, and robot localization. There also has to be an underlying methodology to facilitate the integration. We have previously shown that, by sequencing basic skills provided by the above-mentioned competencies, the system can carry out flexible grasping for fetch-and-carry tasks in realistic environments. A flexible system is achieved through careful fusion of reactive and deliberative control and the use of multiple sensory modalities. However, our previous work has mostly concentrated on pick-and-place tasks, leaving limited room for generalization. Currently, we are interested in more complex tasks, such as collaborating with and helping humans in their everyday tasks, opening doors and cupboards, and building maps of the environment that include objects automatically recognized by the system. In this paper, we present some of our current results on these tasks. Most systems for simultaneous localization and mapping (SLAM) build maps that are only used for localizing the robot. Such maps are typically based on grids or on different types of features, such as points and lines. Here we augment the process with an object recognition system that detects objects in the environment and places them in the map generated by the SLAM system. The metric map is also split into topological entities corresponding to rooms. In this way, the user can command the robot to retrieve a certain object from a certain room.
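
To make the last idea concrete, the following is a minimal sketch of how an object-augmented map with room-level topological entities might be queried for a command such as "fetch the cup from the kitchen". It is an illustration under stated assumptions, not the implementation used in this work: the names `Room`, `MapObject`, `SemanticMap`, `room_of`, `add_detection`, and `find` are hypothetical, rooms are modelled as axis-aligned rectangles in the metric map frame, and the coordinates are invented example values.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Room:
    """Topological entity: a named region of the metric map.

    Assumption: each room is an axis-aligned rectangle (metres, map frame).
    """
    name: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, x: float, y: float) -> bool:
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


@dataclass
class MapObject:
    """A recognized object placed in the metric map (hypothetical record)."""
    label: str
    x: float
    y: float


@dataclass
class SemanticMap:
    """Metric object positions plus a room partition (illustrative only)."""
    rooms: List[Room] = field(default_factory=list)
    objects: List[MapObject] = field(default_factory=list)

    def room_of(self, x: float, y: float) -> Optional[Room]:
        """Return the first room containing the point, or None."""
        for room in self.rooms:
            if room.contains(x, y):
                return room
        return None

    def add_detection(self, label: str, x: float, y: float) -> None:
        """Store a detection reported by the recognition system."""
        self.objects.append(MapObject(label, x, y))

    def find(self, label: str, room_name: str) -> List[MapObject]:
        """Return all objects with the given label located in the named room."""
        matches = []
        for obj in self.objects:
            room = self.room_of(obj.x, obj.y)
            if obj.label == label and room is not None and room.name == room_name:
                matches.append(obj)
        return matches


# Example: two rooms and three detections (all values are made up).
semantic_map = SemanticMap(rooms=[
    Room("kitchen", 0.0, 0.0, 4.0, 3.0),
    Room("living room", 4.0, 0.0, 9.0, 3.0),
])
semantic_map.add_detection("cup", 1.2, 0.8)
semantic_map.add_detection("cup", 6.0, 2.0)
semantic_map.add_detection("book", 5.5, 1.0)

# "Retrieve the cup from the kitchen" reduces to a lookup for a goal position.
targets = semantic_map.find("cup", "kitchen")
print(targets)
```

In this sketch the room partition acts as a filter over metric object positions, so a user command naming an object and a room resolves to one or more goal coordinates that a navigation and grasping pipeline could consume.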