This thesis builds on the observation that robots cannot be programmed to handle every possible situation in the world. Like humans, they need mechanisms to deal with previously unseen situations and unknown objects. One of the skills humans rely on to cope with the unknown is the ability to learn by observing others. This thesis addresses the challenge of enabling a robot to learn from a human instructor, with a particular focus on objects: How can a robot find previously unseen objects? How can it track them with its gaze? How can they be employed in activities? Throughout this thesis, these questions are addressed with the end goal of allowing a robot to observe a human instructor and learn how to perform an activity. The robot is assumed to know very little about the world and is expected to discover objects autonomously. Given a visual input, object hypotheses are formulated by leveraging contextual cues commonly used by humans (e.g. gravity, compactness, convexity). Moreover, unknown objects are tracked and their appearance models are updated over time, since initially only a small fraction of each object is visible to the robot. Finally, object functionality is inferred by observing how the human instructor manipulates objects and how objects are used in relation to one another. All the methods included in this thesis have been evaluated on datasets that are publicly available or that we collected ourselves, demonstrating the importance of these learning abilities.