In this letter, we investigate learning forward dynamics models and multi-step prediction of state variables (long-term prediction) for contact-rich manipulation. The problems are formulated in the context of model-based reinforcement learning (MBRL). We focus on two aspects, discontinuous dynamics and data efficiency, both of which are important in the identified scope and pose significant challenges to state-of-the-art methods. We contribute to closing this gap by proposing a method that explicitly adopts a specific hybrid structure for the model while leveraging the uncertainty representation and data efficiency of Gaussian processes. Our experiments on an illustrative moving-block task and a 7-DOF robot demonstrate a clear advantage over popular baselines in low-data regimes.
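For readers unfamiliar with the setup, the following is a minimal sketch of long-term prediction with a learned forward model: a GP is fit to one-step transitions and then iterated, feeding its own mean prediction back as input. All names are illustrative, and the letter's actual method additionally imposes a hybrid (mode-switching) structure on the model, which is not shown here.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def fit_dynamics(X, Y):
    """Fit one GP per state dimension to one-step transitions.

    X: (N, state_dim + action_dim) inputs; Y: (N, state_dim) state increments.
    """
    kernel = RBF(length_scale=np.ones(X.shape[1])) + WhiteKernel(1e-4)
    return [GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, Y[:, d])
            for d in range(Y.shape[1])]

def rollout(gps, s0, actions):
    """Naive multi-step (long-term) prediction: iterate the one-step model on
    its own mean output. Predictive uncertainty is not propagated here; doing
    so properly requires e.g. moment matching or particle sampling."""
    s, traj = np.asarray(s0, dtype=float), []
    for a in actions:
        x = np.concatenate([s, a])[None, :]
        s = s + np.array([gp.predict(x)[0] for gp in gps])  # add predicted state increment
        traj.append(s.copy())
    return np.array(traj)
```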
Deep reinforcement learning (DRL) has been successfully used to solve various robotic manipulation tasks. However, most of the existing works do not address the issue of control stability. This is in sharp contrast to the control theory community, where the well-established norm is to prove stability whenever a control law is synthesized. What makes traditional stability analysis difficult for DRL are the uninterpretable nature of neural network policies and unknown system dynamics. In this work, stability is obtained by deriving an interpretable deep policy structure based on the energy-shaping control of Lagrangian systems. Then, stability during physical interaction with an unknown environment is established based on passivity. The result is a stability-guaranteeing DRL method in a model-free framework that is general enough for contact-rich manipulation tasks. With an experiment on a peg-in-hole task, we demonstrate, to the best of our knowledge, the first DRL method with a stability guarantee on a real robotic manipulator.
We present an automatic approach for the task of reconstructing a 2-D floor plan from unstructured point clouds of building interiors. Our approach emphasizes accurate and robust detection of building structural elements and, unlike previous approaches, does not require prior knowledge of scanning device poses. The reconstruction task is formulated as a multiclass labeling problem that we approach using energy minimization. We use intuitive priors to define the costs for the energy minimization problem and rely on accurate wall and opening detection algorithms to ensure robustness. We provide detailed experimental evaluation results, both qualitative and quantitative, against state-of-the-art methods and labeled ground-truth data.
Algorithms for autonomous robotic exploration usually focus on optimizing time and coverage, often in a greedy fashion. However, obstacle inflation is conservative and might limit mapping capabilities and even prevent the robot from moving through narrow, important places. This letter proposes a method to influence the way the robot moves through the environment by taking into account a user-defined spatial preference formulated in a fragment of signal temporal logic (STL). We propose to guide the motion planning toward minimizing the violation of such a preference through a cost function that integrates the quantitative semantics, i.e., the robustness, of STL. To demonstrate the effectiveness of the proposed approach, we integrate it into the autonomous exploration planner (AEP). Results from simulations and real-world experiments are presented, highlighting the benefits of our approach.
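To make the quantitative semantics concrete: for a discrete-time signal, the robustness of a simple STL fragment can be computed recursively with min/max operations, and a negative value measures how badly the preference is violated, which is exactly the quantity the planner penalizes. Below is a minimal sketch covering only atomic predicates, "always" (G), and "eventually" (F); the names and windowing convention are illustrative.

```python
import numpy as np

def rho_pred(mu, x):
    """Robustness of an atomic predicate mu(state) >= 0 along trajectory x."""
    return np.array([mu(xt) for xt in x])

def rho_always(r, a, b):
    """G_[a,b]: worst case of the sub-formula robustness r over each window."""
    return np.array([r[t + a:t + b + 1].min() for t in range(len(r) - b)])

def rho_eventually(r, a, b):
    """F_[a,b]: best case of the sub-formula robustness over each window."""
    return np.array([r[t + a:t + b + 1].max() for t in range(len(r) - b)])

# Example preference: "always stay at least 0.5 m from the wall at x = 0".
traj = np.linspace(1.0, 0.2, 50).reshape(-1, 1)
rho = rho_always(rho_pred(lambda p: p[0] - 0.5, traj), 0, 49)
print(rho[0])  # negative: the preference is violated; the magnitude is the severity
```

A planner can then add max(0, -rho) (or a smooth surrogate) to its cost to steer exploration toward preference-satisfying paths.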
Rich geometric understanding of the world is an important component of many robotic applications such as planning and manipulation. In this paper, we present a modular pipeline for pose and shape estimation of objects from RGB-D images given their category. The core of our method is a generative shape model, which we integrate with a novel initialization network and a differentiable renderer to enable 6D pose and shape estimation from a single or multiple views. We investigate the use of discretized signed distance fields as an efficient shape representation for fast analysis-by-synthesis optimization. Our modular framework enables multi-view optimization and extensibility. We demonstrate the benefits of our approach over state-of-the-art methods in several experiments on both synthetic and real data. We open-source our approach at https://github.com/roym899/sdfest.
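To make the representation concrete, here is a minimal, hedged sketch of what querying a discretized signed distance field looks like: distances are stored on a voxel grid and interpolated trilinearly, so the query (and hence a rendering loss built on it) is differentiable almost everywhere. The grid layout and names are illustrative, not the SDFEst API.

```python
import numpy as np

def query_sdf(grid, origin, voxel_size, points):
    """Trilinearly interpolate a discretized SDF (queries assumed inside the grid).

    grid: (Nx, Ny, Nz) array of signed distances at voxel centers.
    origin: world coordinates of voxel (0, 0, 0); voxel_size: edge length.
    points: (M, 3) query points in world coordinates.
    """
    p = (np.asarray(points) - origin) / voxel_size
    i0 = np.clip(np.floor(p).astype(int), 0, np.array(grid.shape) - 2)
    f = p - i0                                  # fractional offsets in [0, 1)
    d = 0.0
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                # weight of each of the 8 surrounding voxel centers
                w = (f[:, 0] if dx else 1 - f[:, 0]) * \
                    (f[:, 1] if dy else 1 - f[:, 1]) * \
                    (f[:, 2] if dz else 1 - f[:, 2])
                d = d + w * grid[i0[:, 0] + dx, i0[:, 1] + dy, i0[:, 2] + dz]
    return d  # ~0 on the surface, < 0 inside, > 0 outside
```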
The purpose of this benchmark is to evaluate the planning and control aspects of robotic in-hand manipulation systems. The goal is to assess the system's ability to change the pose of a hand-held object by using the fingers, the environment, or a combination of both. Given an object surface mesh from the YCB dataset, we provide examples of initial and goal states (i.e., static object poses and fingertip locations) for various in-hand manipulation tasks. We further propose metrics that measure the error in reaching the goal state from a specific initial state, which, when aggregated across all tasks, also serve as a measure of the system's in-hand manipulation capability. We provide supporting software, task examples, and evaluation results associated with the benchmark.
Accurate and robust extrinsic calibration is necessary for deploying autonomous systems that need multiple sensors for perception. In this letter, we present a robust system for real-time extrinsic calibration of multiple lidars in the vehicle base frame without the need for any fiducial markers or features. We base our approach on matching absolute GNSS (Global Navigation Satellite System) and estimated lidar poses in real time. Comparing rotation components allows us to improve the robustness of the solution over the traditional least-squares approach, which compares translation components only. Additionally, instead of comparing all corresponding poses, we select the poses comprising maximum mutual information based on our novel observability criteria. This allows us to identify a subset of the poses helpful for real-time calibration. We also provide stopping criteria to ensure calibration completion. To validate our approach, extensive tests were carried out on data collected using Scania test vehicles (7 sequences for a total of approximately 6.5 km). The results presented in this letter show that our approach is able to accurately determine the extrinsic calibration for various combinations of sensor setups.
To develop robust manipulation policies, quantifying robustness is essential. Evaluating robustness in general manipulation, however, poses significant challenges due to complex hybrid dynamics, the combinatorial explosion of possible contact interactions, global geometry, etc. This paper introduces an approach for evaluating manipulation robustness through energy margins and caging-based analysis. Our method assesses manipulation robustness by measuring the energy margin to failure and extends traditional caging concepts for dynamic manipulation. This global analysis is facilitated by a kinodynamic planning framework that naturally integrates global geometry, contact changes, and robot compliance. We validate the effectiveness of our approach in simulated and real-world experiments across multiple dynamic manipulation scenarios, highlighting its potential to predict manipulation success and robustness.
We propose UFOExplorer, a fast and efficient exploration method that scales well with the environment size. An exploration paradigm driven by map updates is proposed to enable the robot to react more quickly and to always move toward the optimal exploration goal. For each map update, a dense graph-based planning structure is updated and extended. The planning structure is then used to generate a path using a simple exploration heuristic, which guides the robot toward the closest exploration goal. The proposed method scales well with the environment size, as the planning cost is amortized when updating and extending the planning structure. The simple exploration heuristic performs on par with the most recent state-of-the-art methods in smaller environments and outperforms them in larger environments, both in terms of exploration speed and computational efficiency. The implementation of the method is made available for future research.
3D models are an essential part of many robotic applications. In applications where the environment is unknown a priori, or where only a part of the environment is known, it is important that the 3D model can handle the unknown space efficiently. Path planning, exploration, and reconstruction all fall into this category. In this letter we present an extension to OctoMap which we call UFOMap. UFOMap uses an explicit representation of all three states in the map, i.e., unknown, free, and occupied. This, surprisingly, gives a more memory-efficient representation. We provide methods that allow for significantly faster insertions into the octree. Furthermore, UFOMap supports fast queries based on occupancy state using so-called indicators, and based on location by exploiting the octree structure and bounding volumes. This enables real-time colored octree mapping at high resolution (below 1 cm). UFOMap is contributed as a C++ library that can be used standalone but is also integrated into ROS.
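The idea behind indicator-based queries can be illustrated with a tiny sketch: each inner node caches whether any leaf below it is unknown, free, or occupied, so a query for one state can prune whole subtrees. This is an illustrative Python model of the concept, not the UFOMap C++ API.

```python
from enum import Enum

class State(Enum):
    UNKNOWN, FREE, OCCUPIED = 0, 1, 2

class Node:
    """Illustrative tri-state octree node with cached indicator bits."""
    def __init__(self, state=State.UNKNOWN):
        self.state = state                       # leaf occupancy state
        self.children = None                     # None for leaves, list of 8 otherwise
        self.indicators = {s: state is s for s in State}

    def update_indicators(self):
        """Propagate 'contains unknown/free/occupied' bits up from children."""
        if self.children:
            for s in State:
                self.indicators[s] = any(c.indicators[s] for c in self.children)

def collect(node, wanted, out):
    """Collect all leaves of a given state, pruning subtrees via indicators."""
    if not node.indicators[wanted]:
        return                                   # nothing of this state below: skip subtree
    if node.children is None:
        out.append(node)
    else:
        for c in node.children:
            collect(c, wanted, out)
```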
In this letter, we tackle the challenge of predicting the unseen walls of a partially observed environment as a set of 2D line segments, conditioned on occupancy grids integrated along the trajectory of a 360° lidar sensor. A dataset of such occupancy grids and their corresponding target wall segments is collected by navigating a virtual robot between a set of randomly sampled waypoints in a collection of office-scale floor plans from a university campus. The line segment prediction task is formulated as an autoregressive sequence prediction task, and an attention-based deep network is trained on the dataset. The sequence-based autoregressive formulation is evaluated through predicted information gain, as in frontier-based autonomous exploration, demonstrating significant improvements over both non-predictive estimation and convolution-based image prediction found in the literature. Ablations on key components are evaluated, as are the effects of sensor range and the occupancy grid's metric area. Finally, model generality is validated by predicting walls in a novel floor plan reconstructed on-the-fly in a real-world office environment.
In this article we present and evaluate a system which allows a mobile robot to autonomously detect, model and re-recognize objects in everyday environments. Whilst other systems have demonstrated one of these elements, to our knowledge we present the first system which is capable of doing all of these things, all without human interaction, in normal indoor scenes. Our system detects objects to learn by modelling the static part of the environment and extracting dynamic elements. It then creates and executes a view plan around a dynamic element to gather additional views for learning. Finally these views are fused to create an object model. The performance of the system is evaluated on publicly available datasets as well as on data collected by the robot in both controlled and uncontrolled scenarios.
Imitation learning has shown great potential for enabling robots to acquire complex manipulation behaviors. However, these algorithms suffer from high sample complexity in long-horizon tasks, where compounding errors accumulate over the task horizon. We present PRIME (PRimitive-based IMitation with data Efficiency), a behavior-primitive-based framework designed to improve the data efficiency of imitation learning. PRIME scaffolds robot tasks by decomposing task demonstrations into primitive sequences, followed by learning a high-level control policy to sequence primitives through imitation learning. Our experiments demonstrate that PRIME achieves a significant performance improvement in multi-stage manipulation tasks, with 10-34% higher success rates in simulation over state-of-the-art baselines and 20-48% on physical hardware.
Cloth manipulation is a challenging task that, despite its importance, has received relatively little attention compared to rigid object manipulation. In this letter, we provide three benchmarks for evaluation and comparison of different approaches towards three basic tasks in cloth manipulation: spreading a tablecloth over a table, folding a towel, and dressing. The tasks can be executed on any bimanual robotic platform and the objects involved in the tasks are standardized and easy to acquire. We provide several complexity levels for each task, and describe the quality measures to evaluate task execution. Furthermore, we provide baseline solutions for all the tasks and evaluate them according to the proposed metrics.
We present a reinforcement learning based framework for human-centered collaborative systems. The framework is proactive and balances the benefits of timely actions with the risk of taking improper actions by minimizing the total time spent to complete the task. The framework is learned end-to-end in an unsupervised fashion addressing the perception uncertainties and decision making in an integrated manner. The framework is shown to provide more time-efficient coordination between human and robot partners on an example task of packaging compared to alternatives for which perception and decision-making systems are learned independently, using supervised learning. Two important benefits of the proposed approach are that tedious annotation of motion data is avoided, and the learning is performed on-line.
Planning algorithms are powerful at solving long-horizon decision-making problems but require that environment dynamics are known. Model-free reinforcement learning has recently been merged with graph-based planning to increase the robustness of trained policies in state-space navigation problems. Recent ideas suggest using planning to provide intermediate waypoints guiding the policy in long-horizon tasks. Yet, it is not always practical to describe a problem in the setting of state-to-state navigation. Often, the goal is defined by one or multiple disjoint sets of valid states, or implicitly using an abstract task description. Building upon previous efforts, we introduce a novel algorithm called Planning-Augmented Hierarchical Reinforcement Learning (PAHRL) which translates the concept of hybrid planning/RL to such problems with implicitly defined goals. Using a hierarchical framework, we divide the original task, formulated as a Markov Decision Process (MDP), into a hierarchy of shorter-horizon MDPs. Actor-critic agents are trained in parallel for each level of the hierarchy. During testing, a planner then determines useful subgoals on a state graph constructed at the bottom level of the hierarchy. The effectiveness of our approach is demonstrated for a set of continuous control problems in simulation, including robot arm reaching tasks and the manipulation of a deformable object.
We present a tracking controller for mobile multi-robot systems based on dual quaternion pose representations applied to formations of robots in a leader-follower configuration, by using a cluster-space state approach. The proposed controller improves system performance with respect to previous works by reducing steady-state tracking errors. The performance is evaluated through experimental field tests with a formation of an unmanned ground vehicle (UGV) and an unmanned aerial vehicle (UAV), as well as a formation of two UAVs.
A solution to the perimeter surveillance problem for one intruder and multiple surveillance robots, based on set invariance, is presented. The surveillance robots, constrained to move on the perimeter of a polygonal region, intercept the intruder as it crosses the perimeter. The proposed closed-form control laws depend only on the maximum speed of the robots and their distances to the endpoints of the line segments that make up the sides of the polygon. The presented results allow groups of robots with members of different characteristics, such as size and maximum speed, to defend polygonal regions. Simulations are used to show the application and effectiveness of the theoretical results.
We consider the problem of finding grasp contacts that are optimal under a given grasp quality function on arbitrary objects. Our approach formulates contact-level grasping as a path-finding problem in the space of supercontact grasps. The initial supercontact grasp contains all grasps, and in each step along a path, grasps are removed. For this, we introduce and formally characterize the search space structure and the cost functions under which minimum-cost paths correspond to optimal grasps. Our formulation avoids expensive exhaustive search and reduces the computational cost by several orders of magnitude. We present admissible heuristic functions and exploit approximate heuristic search to further reduce the computational cost while maintaining bounded suboptimality for the resulting grasps. We exemplify our formulation with point-contact grasping, for which we define domain-specific heuristics and demonstrate optimality and bounded suboptimality by comparing against exhaustive and uniform-cost search on example objects. Furthermore, we explain how to restrict the search graph to satisfy grasp constraints for modeling hand kinematics. We also analyze our algorithm empirically in terms of created and visited search states and the resulting effective branching factor.
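The search structure can be sketched generically: start from the supercontact grasp (all candidate contacts), remove one contact per expansion, and guide the search with an admissible heuristic. The edge cost below (quality lost per removal) is a hypothetical placeholder; the paper characterizes the cost functions for which this search is actually optimal.

```python
import heapq, itertools

def grasp_search(contacts, quality, heuristic, k):
    """Best-first (A*-style) search in the space of supercontact grasps.

    contacts: iterable of candidate contact ids; quality: grasp -> float;
    heuristic: admissible estimate of remaining cost; k: target contact count.
    """
    start = frozenset(contacts)
    tie = itertools.count()                       # heap tie-breaker
    frontier = [(heuristic(start), next(tie), 0.0, start)]
    best = {start: 0.0}
    while frontier:
        _, _, g, grasp = heapq.heappop(frontier)
        if len(grasp) == k:
            return grasp, g                       # first goal popped is optimal
        for c in grasp:
            child = grasp - {c}
            cost = g + max(0.0, quality(grasp) - quality(child))  # hypothetical edge cost
            if cost < best.get(child, float("inf")):
                best[child] = cost
                heapq.heappush(frontier, (cost + heuristic(child), next(tie), cost, child))
    return None, float("inf")
```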
A novel robust and slip-aware speed estimation framework is developed and experimentally verified for mobile robot navigation by designing proprioceptive robust observers at each wheel. The observer for each corner is proved to be consistent, in the sense that it provides an upper bound on the mean-square estimation error (MSE) in real time. Under proper conditions, the MSE is proved to be uniformly bounded. A covariance intersection fusion method is used to fuse the wheel-level estimates, such that the updated estimate remains consistent. The estimated slips at each wheel are then used for a robust consensus to improve the reliability of speed estimation in harsh and combined-slip scenarios. As confirmed by indoor and outdoor experiments under different surface conditions, the developed framework addresses state estimation challenges for mobile robots that experience uneven torque distribution or large slip. The novel proprioceptive observer can also be integrated with existing tightly coupled visual-inertial navigation systems.
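For reference, covariance intersection fuses two consistent estimates (x̂₁, P₁) and (x̂₂, P₂) with unknown cross-correlation via the standard rule below, where ω ∈ [0, 1] is typically chosen to minimize, e.g., the trace or determinant of the fused covariance. This is the textbook form, shown for concreteness rather than as the letter's exact implementation.

```latex
P_{\mathrm{CI}}^{-1} = \omega P_1^{-1} + (1-\omega)\, P_2^{-1}, \qquad
\hat{x}_{\mathrm{CI}} = P_{\mathrm{CI}} \left( \omega P_1^{-1}\hat{x}_1 + (1-\omega)\, P_2^{-1}\hat{x}_2 \right)
```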
Planning smooth and energy-efficient paths for wheeled mobile robots is a central task for applications ranging from autonomous driving to service and intralogistic robotics. Over the past decades, several sampling-based motion-planning algorithms, extend functions and post-smoothing algorithms have been introduced for such motion-planning systems. Choosing the best combination of components for an application is a tedious exercise, even for expert users. We therefore present Bench-MR, the first open-source motion-planning benchmarking framework designed for sampling-based motion planning for nonholonomic, wheeled mobile robots. Unlike related software suites, Bench-MR is an easy-to-use and comprehensive benchmarking framework that provides a large variety of sampling-based motion-planning algorithms, extend functions, collision checkers, post-smoothing algorithms and optimization criteria. It aids practitioners and researchers in designing, testing, and evaluating motion-planning systems, and comparing them against the state of the art on complex navigation scenarios through many performance metrics. Through several experiments, we demonstrate how Bench-MR can be used to gain extensive insights from the benchmarking results it generates.
The problem of robot joint position and velocity tracking with prescribed performance guarantees is considered. The proposed controller is able to guarantee a prescribed transient and steady state behavior for the position and the velocity tracking errors without utilizing either the robot dynamic model or any approximation structures. Its performance is demonstrated and assessed via experiments with a KUKA LWR4+ arm.
Behavior trees offer a modular approach to developing an overall controller from a set of sub-controllers that solve different sub-problems. These sub-controllers can be created using various methods, such as classical model-based control or reinforcement learning (RL). To achieve the overall goal, each sub-controller must satisfy the preconditions of the next sub-controller. Although every sub-controller may be locally optimal in achieving the preconditions of the next one, given some performance metric like completion time, the overall controller may still not be optimal with respect to the same performance metric. In this paper, we demonstrate how the performance of the overall controller can be improved if we use approximations of value functions to inform the design of a sub-controller of the needs of the next controller. We also show how, under certain assumptions, this leads to a globally optimal controller when the process is executed on all sub-controllers. Finally, this result also holds when some of the sub-controllers are already given. This means that if we are constrained to use some existing sub-controllers, the overall controller will be globally optimal, given this constraint.
Reinforcement learning (RL) has had its fair share of success in contact-rich manipulation tasks but it still lags behind in benefiting from advances in robot control theory such as impedance control and stability guarantees. Recently, the concept of variable impedance control (VIC) was adopted into RL with encouraging results. However, the more important issue of stability remains unaddressed. To clarify the challenge in stable RL, we introduce the term all-the-time-stability, which unambiguously means that every possible rollout should be stability certified. Our contribution is a model-free RL method that not only adopts VIC but also achieves all-the-time-stability. Building on a recently proposed stable VIC controller as the policy parameterization, we introduce a novel policy search algorithm that is inspired by the Cross-Entropy Method and inherently guarantees stability. Our experimental studies confirm the feasibility and usefulness of the stability guarantee and also feature, to the best of our knowledge, the first successful application of RL with all-the-time-stability on the benchmark problem of peg-in-hole.
We propose to leverage a real-world, human-activity RGB dataset to teach a robot Task-Oriented Grasping (TOG). We develop a model that takes as input an RGB image and outputs a hand pose and configuration as well as an object pose and shape. We follow the insight that jointly estimating hand and object poses increases accuracy compared to estimating these quantities independently. Given the trained model, we process an RGB dataset to automatically obtain the data needed to train a TOG model. This model takes as input an object point cloud and outputs a suitable region for task-specific grasping. Our ablation study shows that training an object pose predictor with hand pose information (and vice versa) is better than training without this information. Furthermore, our results on a real-world dataset show the applicability and competitiveness of our method over the state of the art. Experiments with a robot demonstrate that our method enables a robot to perform TOG on novel objects.
We present AdaFold, a model-based feedback-loop framework for optimizing folding trajectories. AdaFold extracts a particle-based representation of cloth from RGB-D images and feeds the representation back to a model predictive controller to re-plan the folding trajectory at every time step. A key component of AdaFold that enables feedback-loop manipulation is the use of semantic descriptors extracted from geometric features. These descriptors enhance the particle representation of the cloth to distinguish between ambiguous point clouds of differently folded cloths. Our experiments demonstrate AdaFold's ability to adapt folding trajectories to cloths with varying physical properties and to generalize from simulated training to real-world execution.
Despite the successes of deep reinforcement learning (RL), it is still challenging to obtain safe policies. Formal verification approaches ensure safety at all times but usually overly restrict the agent's behaviors, since they assume adversarial behavior of the environment. Instead of assuming adversarial behavior, we suggest focusing on perceived safety, i.e., policies that avoid undesired behaviors while having a desired level of conservativeness. To obtain policies that are perceived as safe, we propose a shield synthesis framework with two distinct loops: (1) an inner loop that trains policies with a set of actions constrained by shields whose conservativeness is parameterized, and (2) an outer loop that presents example rollouts of the policy to humans and collects their feedback to update the parameters of the shields in the inner loop. We demonstrate our approach on an RL benchmark of lunar landing and on a scenario in which a mobile robot navigates around humans. For the latter, we conducted two user studies to obtain policies that were perceived as safe. Our results indicate that our framework converges to policies that are perceived as safe, is robust against noisy feedback, and can query feedback for multiple policies at the same time.
Three-dimensional (3-D) tracking of microrobots is demonstrated using stereo holographic projections. The method detects the lateral position of a microrobot in two orthogonal in-line holography images and triangulates to obtain the 3-D position in an observable volume of 1 cm³. The algorithm is capable of processing holograms at 25 Hz on a desktop computer and has an accuracy of 24.7 μm and 15.2 μm in the two independent directions and 7.3 μm in the shared direction of the two imaging planes. This is the first use of stereo holograms to track an object in real time, and the method does not rely on the computationally expensive process of holographic reconstruction.
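The triangulation step itself is simple geometry; as a hedged sketch (coordinate conventions assumed here, not taken from the paper): each orthogonal in-line view localizes the robot laterally in its own image plane, and the axis shared by both planes is measured twice and can be averaged, which is consistent with its higher reported accuracy.

```python
import numpy as np

def triangulate(view_xz, view_yz):
    """Fuse two orthogonal in-line holography detections into a 3-D position.

    view_xz: (x, z) lateral position from camera 1 (image plane assumed x-z).
    view_yz: (y, z) lateral position from camera 2 (image plane assumed y-z).
    The shared z axis is observed by both views, so the two measurements
    are averaged, reducing the error in that direction.
    """
    x, z1 = view_xz
    y, z2 = view_yz
    return np.array([x, y, 0.5 * (z1 + z2)])
```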
While feature association to a global map has significant benefits, to keep the computations from growing exponentially, most lidar-based odometry and mapping methods opt to associate features with local maps at one voxel scale. Taking advantage of the fact that surfels (surface elements) at different voxel scales can be organized in a tree-like structure, we propose an octree-based global map of multi-scale surfels that can be updated incrementally. This alleviates the need to repeatedly recalculate, for example, a k-d tree of the whole map. The system can also take input from a single sensor or several sensors, reinforcing robustness in degenerate cases. We also propose a point-to-surfel (PTS) association scheme, continuous-time optimization on PTS and IMU preintegration factors, along with loop closure and bundle adjustment, making a complete framework for lidar-inertial continuous-time odometry and mapping. Experiments on public and in-house datasets demonstrate the advantages of our system compared to other state-of-the-art methods.
A key challenge in robotics is the efficient generation of optimal robot motion with safety guarantees in cluttered environments. Recently, deterministic optimal sampling-based motion planners have been shown to achieve good performance towards this end, in particular in terms of planning efficiency, final solution cost, quality guarantees as well as non-probabilistic completeness. Yet their application is still limited to relatively simple systems (i.e., linear, holonomic, Euclidean state spaces). In this work, we extend this technique to the class of symmetric and optimal driftless systems by presenting Dispertio, an offline dispersion optimization technique for computing sampling sets, aware of differential constraints, for sampling-based robot motion planning. We prove that the approach, when combined with PRM*, is deterministically complete and retains asymptotic optimality. Furthermore, in our experiments we show that the proposed deterministic sampling technique outperforms several baselines and alternative methods in terms of planning efficiency and solution cost.
This letter describes an extension of the classic Lazy Probabilistic Roadmaps algorithm (Lazy PRM), which results from pairing PRM and a novel Branch-And-Cut (BC) algorithm. Cuts are dynamically generated constraints that are imposed on minimum cost paths over the geometric graphs selected by PRM. Cuts eliminate paths that cannot be mapped into smooth plans that satisfy suitably defined geometric and differential constraints. We generate candidate smooth plans by fitting splines to vertices in a minimum-cost path. Plans are validated with a recently proposed algorithm that maps them into finite traces, without the need to choose a fixed discretization step. A trace records the exact sequence of constraint boundaries crossed by the plan, modulo arithmetic precision. We evaluate several planners using our methods over the recently proposed BARN benchmark, reporting evidence of the scalability of our approach.
In this letter, we propose a control strategy for human-robot cooperative manipulation under the ambiguous collaboration of a human agent. To cope with this uncertainty, an adaptive update law is used that infers the human contribution to the system dynamics from basic perception feedback through the human arm stiffness. Furthermore, the robustness and accuracy of the approach are enhanced by redundantly tracking the shared load references and the associated end-effector position references. To validate the control strategy, both a theoretical Lyapunov stability analysis and experimental results (employing two robot manipulators with 6 degrees of freedom under external disturbances) are provided.
Reinforcement learning (RL) is capable of sophisticated motion planning and control for robots in uncertain environments. However, state-of-the-art deep RL approaches typically lack safety guarantees, especially when the robot and environment models are unknown. To justify widespread deployment, robots must respect safety constraints without sacrificing performance. Thus, we propose a Black-box Reachability-based Safety Layer (BRSL) with three main components: (1) data-driven reachability analysis for a black-box robot model, (2) a trajectory rollout planner that predicts future actions and observations using an ensemble of neural networks trained online, and (3) a differentiable polytope collision check between the reachable set and obstacles that enables correcting unsafe actions. In simulation, BRSL outperforms other state-of-the-art safe RL methods on a Turtlebot 3, a quadrotor, a trajectory-tracking point mass, and a hexarotor in wind with an unsafe set adjacent to the area of highest reward.
Exploration is an important aspect of robotics, whether it is for mapping, rescue missions, or path planning in an unknown environment. Frontier Exploration planning (FEP) and Receding Horizon Next-Best-View planning (RH-NBVP) are two different approaches with different strengths and weaknesses. FEP explores a large environment consisting of separate regions with ease, but is slow at reaching full exploration due to moving back and forth between regions. RH-NBVP shows great potential and efficiently explores individual regions, but has the disadvantage that it can get stuck in large environments not exploring all regions. In this letter, we present a method that combines both approaches, with FEP as a global exploration planner and RH-NBVP for local exploration. We also present techniques to estimate potential information gain faster, to cache previously estimated gains and to exploit these to efficiently estimate new queries.
We address the problem of cooperative manipulation of an object whose tasks are specified by a Signal Temporal Logic (STL) formula. We employ the Prescribed Performance Control (PPC) methodology to guarantee predefined transient and steady-state performance of the object trajectory in order to satisfy the STL formula. More specifically, we first provide a way to translate the problem of satisfying an STL task into the problem of state evolution within a user-defined time-varying funnel. We then design a control strategy for the robotic agents that guarantees compliance with this funnel. The control strategy is decentralized, in the sense that each agent calculates its own control signal, and does not use any information on the agents' and object's dynamic terms, which are assumed to be unknown. We experimentally verify the results on two manipulator arms cooperatively working to manipulate an object based on an STL formula.
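For concreteness, the prescribed-performance funnel typically used in PPC confines each error coordinate e(t) inside an exponentially shrinking envelope; satisfying an STL deadline then amounts to choosing the funnel parameters so that the envelope is tight enough by the required time. A standard form is shown below (illustrative; the paper constructs its funnels from the STL task):

```latex
-\rho(t) < e(t) < \rho(t), \qquad
\rho(t) = (\rho_0 - \rho_\infty)\, e^{-l t} + \rho_\infty ,
```

where ρ₀ > |e(0)| bounds the initial error, ρ∞ is the steady-state bound, and l > 0 sets the convergence rate.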
The manuscript "Tactile-based Blind Grasping: A Discrete-Time Object Manipulation Controller for Robotic Hands"; contains results that are dependent on two references. Since publication, we realized that one of these references has inconsistent results regarding continuity of quadratic programs. The second reference has updated conditions that are not completely reflected in the original manuscript. We take the time here to replace the inconsistencies and update this manuscript to preserve the theoretical guarantees of the proposed controller. We note that this correction does not change the proposed control law, and is made to formally ensure that the guarantees hold.
This study develops a multi-hypothesis extended Kalman filter (MH-EKF) for the online estimation of the bending angle of a 3D-printed soft sensor attached to soft actuators. Despite the advantages of compliance and low interference, the 3D-printed soft sensor is susceptible to hysteresis and nonlinear effects. Improving measurement accuracy for sensors with hysteresis is a common challenge. Current studies mainly apply complex models and highly nonlinear functions to characterize the hysteresis, requiring a complicated parameter identification process and hindering real-time application. This study improves model simplicity and real-time performance for hysteresis characterization. We identify the hysteresis by combining multiple polynomial functions and improve the sensor estimation with the proposed MH-EKF. We examine the performance of the filter in a real-time closed-loop control system. Compared with baseline methods, the proposed approach improves estimation accuracy with low computational complexity.
In this letter, we present a deep learning-based network, GCNv2, for the generation of keypoints and descriptors. GCNv2 is built on our previous method, GCN, a network trained for 3D projective geometry. GCNv2 is designed with a binary descriptor vector in the same format as the ORB feature so that it can easily replace ORB in systems such as ORB-SLAM2. GCNv2 significantly improves the computational efficiency over GCN, which was only able to run on desktop hardware. We show how a modified version of ORB-SLAM2 using GCNv2 features runs on a Jetson TX2, an embedded low-power platform. Experimental results show that GCNv2 retains accuracy comparable to GCN and that it is robust enough to use for control of a flying drone. Source code is available at: https://github.com/jiexiong2016/GCNv2_SLAM.
In this paper, we propose a new learning scheme for generating geometric correspondences to be used for visual odometry. A convolutional neural network (CNN) combined with a recurrent neural network (RNN) is trained to detect the location of keypoints as well as to generate corresponding descriptors in one unified structure. The network is optimized by warping points from the source frame to the reference frame with a rigid-body transform; essentially, it learns from warping. The overall training is focused on movements of the camera rather than movements within the image, which leads to better consistency in the matching and ultimately better motion estimation. Experimental results show that the proposed method achieves better results than both related deep learning and hand-crafted methods. Furthermore, as a demonstration of the promise of our method, we use a naive SLAM implementation based on these keypoints and obtain performance on par with ORB-SLAM.
In this letter, we propose a new deep-learning-based dense monocular simultaneous localization and mapping (SLAM) method. Compared to existing methods, the proposed framework constructs a dense three-dimensional (3-D) model via sparse-to-dense mapping using learned surface normals. With single-view learned depth estimation as a prior for monocular visual odometry, we obtain both accurate positioning and high-quality depth reconstruction. The depth and normals are predicted by a single network trained in a tightly coupled manner. Experimental results show that our method significantly improves the performance of visual tracking and depth prediction in comparison to the state of the art in deep monocular dense SLAM.
Rao-Blackwellized particle filter (RBPF) SLAM solutions with Gaussian process (GP) maps can both maintain multiple hypotheses of a vehicle pose estimate and perform implicit data association for loop closure detection in continuous terrain representations. Both qualities are of particular interest for SLAM with autonomous underwater vehicles (AUVs) in the open sea, where distinguishable features are scarce. However, the applicability of GP regression to parallel, real-time mapping in an RBPF framework remains limited by the size of the area to survey and the computational cost of GP training. To overcome these constraints, in this letter we propose the adaptation of Stochastic Variational GP (SVGP) regression to online mapping, in combination with a novel, efficient particle trajectory storage scheme in the RBPF. We show how the resulting RBPF-SVGP framework can achieve real-time performance on an embedded platform on two AUV surveys containing millions of points. We further test the framework on a live AUV mission, and we make the implementation publicly available.
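As a hedged sketch of the mapping component: a stochastic variational GP trains on minibatches of (x, y) → depth points, so the per-step cost is dominated by the number of inducing points rather than the survey size. The snippet below follows GPyTorch's standard SVGP pattern and is illustrative of the idea, not the authors' implementation.

```python
import torch
import gpytorch

class SVGPMap(gpytorch.models.ApproximateGP):
    """Sparse variational GP terrain map: inputs are (x, y), targets are depths."""
    def __init__(self, inducing):                       # inducing: (M, 2) tensor
        var_dist = gpytorch.variational.CholeskyVariationalDistribution(inducing.size(0))
        strategy = gpytorch.variational.VariationalStrategy(
            self, inducing, var_dist, learn_inducing_locations=True)
        super().__init__(strategy)
        self.mean = gpytorch.means.ConstantMean()
        self.cov = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel(ard_num_dims=2))

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(self.mean(x), self.cov(x))

def train_map(model, likelihood, loader, num_data, epochs=10, lr=0.01):
    """Minibatch ELBO training: per-step cost is independent of survey size."""
    model.train(); likelihood.train()
    opt = torch.optim.Adam(list(model.parameters()) + list(likelihood.parameters()), lr=lr)
    mll = gpytorch.mlls.VariationalELBO(likelihood, model, num_data=num_data)
    for _ in range(epochs):
        for xb, yb in loader:                           # minibatches of survey points
            opt.zero_grad()
            loss = -mll(model(xb), yb)
            loss.backward(); opt.step()
```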
Registration methods for point clouds have become a key component of many SLAM systems on autonomous vehicles. However, an accurate estimate of the uncertainty of such registration is a key requirement for consistently fusing this kind of measurement in a SLAM filter. This estimate, which is normally given as a covariance of the transformation computed between point cloud reference frames, has been modelled following different approaches, among which the most accurate is considered to be the Monte Carlo method. However, a Monte Carlo approximation is cumbersome to use inside a time-critical application such as online SLAM. Efforts have been made to estimate this covariance via machine learning using carefully designed features to abstract the raw point clouds. However, the performance of this approach is sensitive to the features chosen. We argue that it is possible to learn the features along with the covariance by working with the raw data, and thus we propose a new approach based on PointNet. We train this network using, as the loss, the KL divergence between the learned uncertainty distribution and one computed by the Monte Carlo method. We test the performance of the resulting general model by applying it to our target use case of SLAM with an autonomous underwater vehicle (AUV), restricted to the 2-dimensional registration of 3D bathymetric point clouds.
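For zero-mean Gaussians over the registration error, the KL divergence between the Monte Carlo covariance Σ_mc and the network-predicted covariance Σ_p has a closed form that is differentiable in the network output, which is what makes it usable as a training loss. A minimal PyTorch sketch follows; the names, dimensions, and the choice of divergence direction are illustrative (the paper specifies its own).

```python
import torch

def kl_gaussian(sigma_p, sigma_mc):
    """KL( N(0, sigma_mc) || N(0, sigma_p) ) for k-dimensional covariances.

    Closed form: 0.5 * ( tr(sigma_p^{-1} sigma_mc) - k
                         + log det sigma_p - log det sigma_mc ).
    Differentiable w.r.t. the network-predicted covariance sigma_p.
    """
    k = sigma_p.shape[-1]
    sp_inv = torch.linalg.inv(sigma_p)
    trace = torch.einsum('...ii->...', sp_inv @ sigma_mc)   # batched trace
    logdet = torch.logdet(sigma_p) - torch.logdet(sigma_mc)
    return 0.5 * (trace - k + logdet)
```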
Gaussian processes (GPs) are becoming a standard tool for building terrain representations thanks to their capacity to model map uncertainty. This effectively yields a reliability measure over the areas of the map, which can be directly utilized by Bayes filtering algorithms in robot localization problems. A key factor is that this map uncertainty can incorporate the noise intrinsic to the terrain surveying process through the GPs' ability to train on uncertain inputs (UIs). However, existing techniques to build GP maps with UIs in a tractable manner are restricted in the form and degree of the input noise. In this letter, we propose a flexible and efficient framework to build large-scale GP maps with UIs based on Stochastic Variational GPs and Monte Carlo sampling of the UI distributions. We validate our mapping approach on a large bathymetric survey collected with an autonomous underwater vehicle (AUV) and analyze its performance against the use of deterministic inputs (DI). Finally, we show how using UI SVGP maps yields more accurate particle filter localization results than DI SVGP on a real AUV mission over an entirely predicted area.
Ensuring safety in real-world robotic systems is often challenging due to unmodeled disturbances and noisy sensors. To account for such stochastic uncertainties, many robotic systems leverage probabilistic state estimators such as Kalman filters to obtain a robot's belief, i.e. a probability distribution over possible states. We propose belief control barrier functions (BCBFs) to enable risk-aware control, leveraging all information provided by state estimators. This allows robots to stay in predefined safety regions with desired confidence under these stochastic uncertainties. BCBFs are general and can be applied to a variety of robots that use extended Kalman filters as state estimator. We demonstrate BCBFs on a quadrotor that is exposed to external disturbances and varying sensing conditions. Our results show improved safety compared to traditional state-based approaches while allowing control frequencies of up to 1 kHz.
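For orientation: a control barrier function h defines a safe set {x : h(x) ≥ 0}, and safety is enforced by constraining the control input so that ḣ(x, u) ≥ -α(h(x)) for a class-K function α. A belief-space variant evaluates this condition on the filter's belief N(μ, Σ); informally, tightening the barrier by an uncertainty margin, as in the illustrative form below, keeps the robot in the safety region with a desired confidence (the paper's exact BCBF construction may differ).

```latex
h_b(\mu, \Sigma) \;=\; h(\mu) \;-\; c_\delta \sqrt{\nabla h(\mu)^{\top} \Sigma\, \nabla h(\mu)} \;\ge\; 0,
\qquad \dot{h}_b \;\ge\; -\alpha\!\left(h_b\right)
```

where c_δ is a confidence-dependent scaling (e.g., a Gaussian quantile).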
Interactive perception enables robots to manipulate the environment and objects to bring them into states that benefit the perception process. Deformable objects pose challenges to this due to manipulation difficulty and occlusion in vision-based perception. In this work, we address such a problem with a setup involving both an active camera and an object manipulator. Our approach is based on a sequential decision-making framework and explicitly considers the motion regularity and structure in coupling the camera and manipulator. We contribute a method for constructing and computing a subspace, called Dynamic Active Vision Space (DAVS), for effectively utilizing the regularity in motion exploration. The effectiveness of the framework and approach are validated in both a simulation and a real dual-arm robot setup. Our results confirm the necessity of an active camera and coordinative motion in interactive perception for deformable objects.
We present an efficient multi-sensor odometry system for mobile platforms that jointly optimizes visual, lidar, and inertial information within a single integrated factor graph. This runs in real-time at full framerate using fixed lag smoothing. To perform such tight integration, a new method to extract 3D line and planar primitives from lidar point clouds is presented. This approach overcomes the suboptimality of typical frame-to-frame tracking methods by treating the primitives as landmarks and tracking them over multiple scans. True integration of lidar features with standard visual features and IMU is made possible using a subtle passive synchronization of lidar and camera frames. The lightweight formulation of the 3D features allows for real-time execution on a single CPU. Our proposed system has been tested on a variety of platforms and scenarios, including underground exploration with a legged robot and outdoor scanning with a dynamically moving handheld device, for a total duration of 96 min and 2.4 km traveled distance. In these test sequences, using only one exteroceptive sensor leads to failure due to either underconstrained geometry (affecting lidar) or textureless areas caused by aggressive lighting changes (affecting vision). In these conditions, our factor graph naturally uses the best information available from each sensor modality without any hard switches.
Multimodal sensor fusion methods for 3D object detection have been revolutionizing the autonomous driving research field. Nevertheless, most of these methods heavily rely on dense LiDAR data and accurately calibrated sensors which is often not the case in real-world scenarios. Data from LiDAR and cameras often come misaligned due to the miscalibration, decalibration, or different frequencies of the sensors. Additionally, some parts of the LiDAR data may be occluded and parts of the data may be missing due to hardware malfunction or weather conditions. This work presents a novel fusion step that addresses data corruptions and makes sensor fusion for 3D object detection more robust. Through extensive experiments, we demonstrate that our method performs on par with state-of-the-art approaches on normal data and outperforms them on misaligned data.
This research addresses the challenge of estimating bathymetry from imaging sonars where the state-of-the-art works have primarily relied on either supervised learning with ground-truth labels or surface rendering based on the Lambertian assumption. In this letter, we propose a novel, self-supervised framework based on volume rendering for reconstructing bathymetry using forward-looking sonar (FLS) data collected during standard surveys. We represent the seafloor as a neural heightmap encapsulated with a parametric multi-resolution hash encoding scheme and model the sonar measurements with a differentiable renderer using sonar volumetric rendering employed with hierarchical sampling techniques. Additionally, we model the horizontal and vertical beam patterns and estimate them jointly with the bathymetry. We evaluate the proposed method quantitatively on simulation and field data collected by remotely operated vehicles (ROVs) during low-altitude surveys. Results show that the proposed method outperforms the current state-of-the-art approaches that use imaging sonars for seabed mapping. We also demonstrate that the proposed approach can potentially be used to increase the resolution of a low-resolution prior map with FLS data from low-altitude surveys.
The computational load associated with computer vision is often prohibitive, and limits the capacity for on-board image analysis in compact mobile robots. Replicating the kind of feature detection and neural processing that animals excel at remains a challenge in most biomimetic aquatic robots. Event-driven sensors use a biologically inspired sensing strategy to eliminate the need for complete frame capture. Systems employing event-driven cameras enjoy reduced latencies, power consumption, bandwidth, and benefit from a large dynamic range. However, to the best of our knowledge, no work has been done to evaluate the performance of these devices in underwater robotics. This work proposes a robotic lamprey design capable of supporting computer vision, and uses this system to validate a computational neuron model for driving anguilliform swimming. The robot is equipped with two different types of cameras: frame-based and event-based cameras. These were used to stimulate the neural network, yielding goal-oriented swimming. Finally, a study is conducted comparing the performance of the computational model when driven by the two different types of camera. It was observed that event-based cameras improved the accuracy of swimming trajectories and led to significant improvements in the rate at which visual inputs were processed by the network.
A robot control system is often composed of a set of low level continuous controllers and a switching policy that decides which of those continuous controllers to apply at each time instant. The switching policy can be either a Finite State Machine (FSM), a Behavior Tree (BT) or some other structure. In previous work we have shown how to create BTs using a backward chained approach that results in a reactive goal directed policy. This policy can be thought of as providing disturbance rejection at the task level in the sense that if a disturbance changes the state in such a way that the currently running continuous controller cannot handle it, the policy will switch to the appropriate continuous controller. In this letter we show how to provide convergence guarantees for such policies.