Dynamic human order picking (HOP) is challenged by real-time changes and complex constraints such as operator workload and cart capacity. While deep reinforcement learning (DRL) is well suited to dynamic problems, effectively leveraging the graph structure of the warehouse remains an open opportunity. This paper proposes a novel DRL architecture built on a graph attention network (GAT) encoder-decoder to address dynamic HOP. The GAT encoder explicitly models spatial and task-related dependencies within the warehouse graph. The decoder uses a specialized attention mechanism that separates the context query from the dynamic state information embedded in the keys and values. The architecture is designed to account for real-time factors including remaining orders, cart weight, and operator workload. The primary contribution of this work is the architectural design and its motivation, with anticipated improvements in scalability, generalization, and dynamic adaptability over existing attention-aware reinforcement learning (RL) models. While this paper focuses on presenting the theoretical architecture, ongoing empirical validation aims to quantify these potential benefits through direct comparison with the results of prior work.
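To make the decoder idea concrete, the following is a minimal sketch of one decoding step in which the query is built from the context alone while the dynamic state enters only through the keys and values. It is an illustration under stated assumptions, not the architecture proposed in the paper: the function name `decoder_step`, the single attention head, the feasibility mask, the hand-picked weight matrices, and the toy feature values are all hypothetical, and the node embeddings merely stand in for the output of a GAT encoder.

```python
import math


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax; a score of -inf maps to probability 0."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def decoder_step(context, node_embs, dyn_feats, feasible, W_q, W_k, W_v):
    """One single-head attention step over the warehouse nodes.

    context   : context vector, used only to build the query
    node_embs : per-node embeddings (stand-in for GAT encoder output)
    dyn_feats : per-node dynamic state, used only in keys and values
    feasible  : per-node booleans; infeasible nodes get zero weight
    """
    if not any(feasible):
        raise ValueError("no feasible node to attend to")
    q = matvec(W_q, context)
    scale = math.sqrt(len(q))
    scores, values = [], []
    for emb, dyn, ok in zip(node_embs, dyn_feats, feasible):
        kv_input = emb + dyn  # concatenate static embedding and dynamic state
        k = matvec(W_k, kv_input)
        values.append(matvec(W_v, kv_input))
        score = sum(a * b for a, b in zip(q, k)) / scale
        scores.append(score if ok else -math.inf)
    weights = softmax(scores)
    # Attention output: weighted sum of the value vectors.
    glimpse = [
        sum(w * v[i] for w, v in zip(weights, values))
        for i in range(len(values[0]))
    ]
    return weights, glimpse


# Toy example (all numbers are made up for illustration).
W_q = [[1.0, 0.0], [0.0, 1.0]]
W_k = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]
W_v = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
context = [1.0, 0.0]
node_embs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
dyn_feats = [[1.0], [0.0], [1.0]]  # assumed: 1.0 if the node has open order lines
feasible = [True, True, False]     # assumed: node 2 would exceed cart capacity

weights, glimpse = decoder_step(
    context, node_embs, dyn_feats, feasible, W_q, W_k, W_v
)
print(weights, glimpse)
```

Because the dynamic features appear only in `kv_input`, a change in cart weight or remaining orders would alter the keys and values without requiring the query to be rebuilt, which is one plausible reading of the separation described above.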
Part of ISBN 9783032035141
QC 20251003