Reinforcement Learning in Multiple-UAV Networks: Deployment and Movement Design
My Takeaway: This is a multiple-UAV network, and the setting (an emergency network) fits what I am looking for; however, there is no multi-agent scenario. All agents work independently, and their performances do not influence one another. I am looking for cooperative agents and topology changes in the network.
The goal of this experiment is to provide ground users with the best performance, measured by mean opinion score (MOS), using drones as base stations. The agent's altitude is introduced as a parameter that influences MOS (via the LoS feature). The experiment is divided into three phases: 1. GAK-means algorithm for cell partition of the ground users, 2. Q-learning for 3D deployment of the UAVs, 3. Q-learning for dynamic movement design of the UAVs.
Each UAV serves one cluster derived from the first phase.
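For reference, the first phase amounts to a distance-based cell partition of the ground users. Below is a minimal sketch using plain K-means as a stand-in (the genetic-algorithm refinement in GAK-means is omitted, and the function name and parameters are my own assumptions, not from the paper):

```python
import numpy as np

def kmeans_partition(users, n_uavs, n_iters=100, seed=0):
    """Partition ground users into one cell per UAV.

    Plain K-means stand-in for the GAK-means phase; `users` is an (N, 2)
    array of ground coordinates, `n_uavs` the number of clusters/UAVs.
    """
    rng = np.random.default_rng(seed)
    centers = users[rng.choice(len(users), n_uavs, replace=False)]
    for _ in range(n_iters):
        # Assign each user to the nearest cluster center (Euclidean distance).
        labels = np.argmin(np.linalg.norm(users[:, None] - centers[None], axis=2), axis=1)
        # Recompute each center as the mean of its assigned users.
        new_centers = np.array([users[labels == k].mean(axis=0) if np.any(labels == k)
                                else centers[k] for k in range(n_uavs)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```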
In the second phase, users are static. The state space is the 3D coordinate of the agent; the action space has 7 options: (0, 0, 0), (±1, 0, 0), (0, ±1, 0), (0, 0, ±1); the reward depends on comparing the new MOS with the old one (three reward values for higher, equal, and lower). The policy is ε-greedy.
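A minimal tabular sketch of this deployment phase, under my own naming (the MOS model `mos(pos)` is assumed and left abstract, and the hyperparameters are placeholders, not the paper's values):

```python
import random
from collections import defaultdict

# The 7 actions: stay, or move one step along ±x, ±y, ±z.
ACTIONS = [(0, 0, 0), (1, 0, 0), (-1, 0, 0),
           (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def q_learning_deployment(mos, start, episodes=500, steps=50,
                          alpha=0.1, gamma=0.9, eps=0.1):
    """Tabular Q-learning for 3D UAV placement.

    `mos(pos)` is an assumed callback returning the cluster's mean opinion
    score for UAV position `pos`; the reward only encodes whether MOS
    improved, stayed equal, or dropped, as in the paper's three-level design.
    """
    Q = defaultdict(lambda: [0.0] * len(ACTIONS))
    for _ in range(episodes):
        pos = start
        for _ in range(steps):
            # ε-greedy action selection over the 7 moves.
            if random.random() < eps:
                a = random.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: Q[pos][i])
            new_pos = tuple(p + d for p, d in zip(pos, ACTIONS[a]))
            # Three-level reward: higher / equal / lower MOS than before.
            old, new = mos(pos), mos(new_pos)
            reward = 1.0 if new > old else (0.0 if new == old else -1.0)
            # Standard Q-learning update.
            Q[pos][a] += alpha * (reward + gamma * max(Q[new_pos]) - Q[pos][a])
            pos = new_pos
    return Q
```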
The author compares RL with two other algorithms: GAK-means and the IGK algorithm. GAK-means does not consider parameters other than Euclidean distance, and the complexity of IGK is higher than that of RL.
In the third phase, I don't understand how the ground users move, because the state space is only expanded by two parameters: (x_user, y_user). I assume that all users in the same cluster move uniformly, which is however very unrealistic. The author mentions introducing agent-user association, expanding the state space, and using a DRL algorithm to reduce the dimension of the Q-table as possible future steps.
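To see why the author points to DRL as future work, a back-of-the-envelope count of the Q-table size once user coordinates enter the state is telling (the grid resolutions below are my own placeholder assumptions, not values from the paper):

```python
# Rough Q-table size once (x_user, y_user) join the state.
GRID_X, GRID_Y, GRID_Z = 100, 100, 30   # assumed UAV position resolution
USER_X, USER_Y = 100, 100               # assumed user-coordinate resolution
N_ACTIONS = 7

states = GRID_X * GRID_Y * GRID_Z * USER_X * USER_Y
print(f"Q-table entries: {states * N_ACTIONS:,}")  # ~21 billion entries
```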