High-performance GPU-accelerated particle filter methods are critical for object detection applications ranging from autonomous driving and robot localization to time-series prediction. In this work, we investigate the design, development and optimization of particle filters using half-precision arithmetic on CUDA cores and compare their performance and accuracy with single- and double-precision baselines on Nvidia V100, A100, A40 and T4 GPUs. To mitigate numerical instability and precision loss, we introduce algorithmic changes to the particle filters. Using half precision yields a performance improvement of 1.5–2× and 2.5–4.6× over the single- and double-precision baselines, respectively, at the cost of a relatively small loss of accuracy.
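The abstract does not specify which algorithmic changes are introduced. As an illustration only, the sketch below shows one common source of instability when particle weights are stored in half precision, and one widely used remedy (subtracting the maximum log-likelihood before exponentiating). It is not necessarily the authors' method. Half precision is emulated on the CPU by rounding through Python's `struct` format `e` (IEEE 754 binary16) rather than run on CUDA cores, and the names `to_half`, `half_sum`, `naive_weights` and `shifted_weights` are hypothetical.

```python
import math
import struct


def to_half(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 binary16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def half_sum(values):
    """Accumulate a sum, rounding to half precision after every addition."""
    total = 0.0
    for v in values:
        total = to_half(total + v)
    return total


def naive_weights(log_likes):
    """Exponentiate directly; small likelihoods underflow to zero in half."""
    w = [to_half(math.exp(l)) for l in log_likes]
    total = half_sum(w)
    if total == 0.0:
        return None  # every weight underflowed, normalization is impossible
    return [to_half(x / total) for x in w]


def shifted_weights(log_likes):
    """Subtract the largest log-likelihood first so the top weight is 1."""
    m = max(log_likes)
    w = [to_half(math.exp(to_half(l - m))) for l in log_likes]
    total = half_sum(w)  # at least 1.0, so the division below is safe
    return [to_half(x / total) for x in w]


# Hypothetical log-likelihoods for three particles (assumed values).
log_likes = [-20.0, -21.0, -22.0]
naive = naive_weights(log_likes)
shifted = shifted_weights(log_likes)
```

With these assumed inputs, direct exponentiation produces values below the smallest half-precision subnormal (about 6e-8), so the naive path has nothing left to normalize, while the shifted path keeps the relative ordering of the particles.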
Part of proceedings ISBN: 978-303150683-3
QC 20240520