Edge computing (EC) is expected to provide low-latency access to computing and storage resources to autonomous Wireless Devices (WDs). Pricing and resource allocation in EC thus have to cope with stochastic workloads, on the one hand offering resources at a price that is attractive to WDs, and on the other hand ensuring revenue for the edge operator. In this paper, we formulate the strategic interaction between an edge operator and WDs as a Bayesian Stackelberg Markov game. We characterize the optimal strategy of the WDs that minimizes their costs. We then show that the operator's problem can be formulated as a Markov Decision Process and propose a model-based reinforcement learning approach, based on a novel approximation of the workload dynamics of the edge cell environment. The proposed approximation leverages two Bayesian Neural Networks (BNNs) to facilitate efficient policy learning, and enables sample-efficient transfer learning from simulated environments to a real edge environment. Our extensive simulation results demonstrate the superiority of our approach in terms of sample efficiency, outperforming state-of-the-art methods by a factor of 30 in terms of learning rate and by 50% in terms of operator revenue.
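To make the core idea concrete, the following is a minimal, illustrative sketch of a BNN workload-dynamics model of the kind the abstract describes, not the authors' implementation. It assumes an MC-dropout approximation to a BNN, and all names, dimensions, and the placeholder training data are hypothetical.

```python
# Illustrative sketch (hypothetical, not the paper's code): an MC-dropout BNN
# modeling edge-cell workload transitions p(next_state | state, price).
import torch
import torch.nn as nn

class WorkloadBNN(nn.Module):
    """Approximate BNN via MC dropout: keeping dropout active at inference
    turns repeated forward passes into samples from a predictive posterior."""
    def __init__(self, state_dim=4, action_dim=1, hidden=64, p_drop=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, state_dim),  # predicts the next workload state
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

    @torch.no_grad()
    def sample_next_states(self, state, action, n_samples=20):
        self.train()  # keep dropout active to draw posterior samples
        return torch.stack([self(state, action) for _ in range(n_samples)])

# Fit on (state, action, next_state) transitions observed in the edge cell.
model = WorkloadBNN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
states = torch.randn(256, 4)       # placeholder workload features
actions = torch.rand(256, 1)       # placeholder posted prices
next_states = torch.randn(256, 4)  # placeholder observed transitions

for _ in range(100):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(states, actions), next_states)
    loss.backward()
    opt.step()

# Posterior samples of the next workload can drive model-based policy learning.
samples = model.sample_next_states(states[:1], actions[:1])
print(samples.mean(0), samples.std(0))  # predictive mean and uncertainty
```

In such a setup, the predictive uncertainty from the sampled transitions is what makes the learned model usable for sample-efficient planning and for transfer from simulated to real environments; a second BNN of the same form could model a complementary part of the dynamics.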