The growing number of unmanned aerial vehicles (UAVs) in urban environments calls for strategies that minimize their environmental impact, in terms of both energy efficiency and noise reduction. This in turn requires novel approaches to predictive modeling and flight-planning optimization. Our goal is to develop deep reinforcement learning (DRL) algorithms that enable UAVs to navigate autonomously in complex urban environments while optimizing trajectories to reduce both energy consumption and noise. The UAV is trained as an agent interacting with an urban environment represented by fluid flow simulations. In this work, the domain is a two-dimensional flow field with obstacles, ideally representing buildings, extracted from a three-dimensional high-fidelity numerical simulation. The presented methodology, which combines proximal policy optimization (PPO) with long short-term memory (LSTM) cells, was validated by reproducing a simple but fundamental navigation problem, namely Zermelo's problem, which deals with the trajectory optimization of a vessel navigating in a turbulent flow. The method shows a significant improvement over both a vanilla PPO and a twin delayed deep deterministic policy gradient (TD3) algorithm: the PPO + LSTM trained policy achieves a success rate (SR) of 98.7 % and a crash rate (CR) of 0.1 %, outperforming both PPO (SR = 75.6 %, CR = 18.6 %) and TD3 (SR = 77.4 %, CR = 14.5 %). This is a first step towards DRL strategies that will guide UAVs in a three-dimensional flow field using real-time signals, making navigation efficient in terms of flight time and avoiding damage to the vehicle.
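To make the validation case concrete, the kinematics behind Zermelo's problem can be sketched as follows. This is only an illustrative sketch, not the setup used in this work: the uniform current in `flow_velocity`, the vessel speed, the time step, the target tolerance, and the naive `aim_at_target` policy are all assumptions introduced here. In a DRL setting, a trained policy would replace `aim_at_target`, and the flow would come from the simulated field.

```python
import math


def flow_velocity(x, y):
    """Hypothetical background flow: a uniform current along +x.

    This is an assumption for illustration, not the flow field of the paper.
    """
    return 0.5, 0.0


def zermelo_step(x, y, heading, speed=1.0, dt=0.1, flow=flow_velocity):
    """Advance the vessel by one explicit Euler step.

    Kinematics: dx/dt = V cos(theta) + u(x, y),
                dy/dt = V sin(theta) + v(x, y),
    where V is the vessel speed relative to the fluid and theta the heading.
    """
    u, v = flow(x, y)
    x_new = x + dt * (speed * math.cos(heading) + u)
    y_new = y + dt * (speed * math.sin(heading) + v)
    return x_new, y_new


def aim_at_target(x, y, target):
    """Naive baseline policy: point straight at the target, ignoring the flow."""
    return math.atan2(target[1] - y, target[0] - x)


def simulate(start, target, policy, max_steps=500, tol=0.1):
    """Roll out a heading policy until the target is reached or steps run out.

    Returns (reached, steps_taken, final_position).
    """
    x, y = start
    for step in range(1, max_steps + 1):
        heading = policy(x, y, target)
        x, y = zermelo_step(x, y, heading)
        if math.hypot(target[0] - x, target[1] - y) < tol:
            return True, step, (x, y)
    return False, max_steps, (x, y)


if __name__ == "__main__":
    reached, steps, position = simulate((0.0, 0.0), (5.0, 0.0), aim_at_target)
    print(reached, steps, position)
```

The heading is the only control input here, which mirrors the classical formulation of Zermelo's problem. A learned policy would aim to choose headings that exploit or counteract the flow rather than simply pointing at the target.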