Reinforcement learning has emerged as a powerful paradigm in machine learning, witnessing remarkable progress in recent years. Among reinforcement learning algorithms, Q-learning stands out, enabling agents to learn quickly from past actions. This study aims to investigate and enhance Q-learning methodologies, with a specific focus on tabular Q-learning. In particular, it addresses Q-learning with an action space containing actions that require different amounts of time to execute. With such an action space, the algorithm may converge to a suboptimal solution when using a constant discount factor, since discounting occurs per action rather than per time step. We refer to this issue as the non-temporal discounting (NTD) problem. By introducing a time-normalised discounting function, we were able to address the NTD problem. In addition, we stabilised the solution by introducing a cost for specific actions. As a result, the model converged to the expected solution. Building on these results, a natural next step is to implement time-normalised discounting in a state-of-the-art reinforcement learning model such as deep Q-learning.
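As a rough illustration of the idea described above, the sketch below shows a tabular Q-learning update in which the discount factor is raised to the power of each action's duration, so that discounting is applied per time step rather than per action, and an optional per-action cost is subtracted from the reward to stabilise the solution. The environment interface, variable names, and hyperparameters are illustrative assumptions, not the implementation used in this study.

```python
# Sketch of tabular Q-learning with time-normalised discounting.
# Assumed (hypothetical) environment interface: env.reset() -> state,
# env.step(action) -> (next_state, reward, done).

import numpy as np

def q_learning_time_normalised(env, n_states, n_actions, action_duration,
                               action_cost=None, alpha=0.1, gamma=0.95,
                               epsilon=0.1, episodes=1000):
    """Tabular Q-learning where discounting scales with action duration.

    action_duration[a] is the number of time steps action a takes; the
    discount applied to that transition is gamma ** action_duration[a],
    which counteracts the non-temporal discounting (NTD) bias.
    """
    if action_cost is None:
        action_cost = np.zeros(n_actions)
    Q = np.zeros((n_states, n_actions))

    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # epsilon-greedy action selection
            if np.random.rand() < epsilon:
                action = np.random.randint(n_actions)
            else:
                action = int(np.argmax(Q[state]))

            next_state, reward, done = env.step(action)

            # Time-normalised discount: gamma raised to the action's duration.
            discount = gamma ** action_duration[action]
            # Optional per-action cost used to stabilise the solution.
            reward = reward - action_cost[action]

            target = reward if done else reward + discount * np.max(Q[next_state])
            Q[state, action] += alpha * (target - Q[state, action])
            state = next_state
    return Q
```

Note that this sketch applies the duration-dependent discount only to the bootstrapped next-state value; a full semi-Markov treatment would also discount rewards accrued within a long action, a detail omitted here for brevity.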