Journal - Biweekly Meeting 6/13 21.Feb-03.Apr 2022
Meeting
- My thoughts wander too far, and my reading is not helping. I should just ask my supervisor for support in constructing the reinforcement learning model and voice all my concerns; he has experience.
- Never stop programming. Reading alone cannot raise the questions that only come up in practice. But keep reading as well. Programming and reading, do both!!
- The idea of converging to an optimal solution is right. However, when papers show statistics on how well their solutions converge to an optimum, it means the problem setting is simple, because there is only a single optimum.
- Start with a simple environment to test the model (see the sketch after this list).
- Read the PPO paper, which is also an example of policy optimization.
- Put in more time!!! You need to graduate!
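A minimal sketch of the two points above, assuming the stable-baselines3 and gymnasium packages (tooling that was not named in the meeting); CartPole-v1 only stands in for "simple environment" and PPO for the policy-optimization example:

```python
import gymnasium as gym
from stable_baselines3 import PPO

# Train PPO (clipped-surrogate policy optimization) on a toy task first,
# so model and plumbing bugs surface before the drone environment exists.
env = gym.make("CartPole-v1")
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10_000)  # short run: a sanity check, not a result

# Quick rollout on the wrapped (vectorized) env to eyeball the learned policy.
vec_env = model.get_env()
obs = vec_env.reset()
for _ in range(200):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, info = vec_env.step(action)
```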
Background Description
- At each step, only 5-10% of the nodes have communication demands. This mimics low communication intensity, so that the drones need to move around (rapid on-demand reconfiguration of the network topology). A quick sketch of this demand model is below.
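A minimal sketch of the traffic model above, assuming the per-step demand intensity is drawn uniformly from 5-10%; the names (N_NODES, sample_demands) are illustrative and not from the notes:

```python
import numpy as np

rng = np.random.default_rng(0)
N_NODES = 100  # placeholder network size

def sample_demands() -> np.ndarray:
    """Boolean mask of nodes that raise a communication demand this step."""
    p = rng.uniform(0.05, 0.10)     # low traffic intensity, 5-10% per step
    return rng.random(N_NODES) < p  # each node demands independently
```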
Problem
- How do I get the drones to cooperate? E.g., bridging communication between router groups that are far apart (a toy relay-placement sketch follows).
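One toy way to picture the bridging part, as my own sketch and not a chosen method: place relay drones evenly along the segment between two group centers so that every hop stays within an assumed radio range.

```python
import numpy as np

def bridge_positions(a: np.ndarray, b: np.ndarray, radio_range: float) -> np.ndarray:
    """Relay positions between group centers a and b (2D) with hop length <= radio_range."""
    gap = float(np.linalg.norm(b - a))
    n_relays = int(np.ceil(gap / radio_range)) - 1  # hops needed, minus one
    ts = np.linspace(0.0, 1.0, n_relays + 2)[1:-1]  # interior points of the segment
    return a + ts[:, None] * (b - a)

# Example: groups 10 units apart, range 3 -> relays at x = 2.5, 5.0, 7.5.
print(bridge_positions(np.array([0.0, 0.0]), np.array([10.0, 0.0]), 3.0))
```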
Research
- Emergency network structures: two-tier and centralized; agents connect directly to ground users / centralized user groups; no mesh network (agents and users are not peers).
- Multi-UAV reinforcement learning: a single-agent model even when multiple agents are deployed; the environment does not change / is not affected by the other agents.
- Finding a problem: the solution to my problem does not converge to one final optimal version.
Keywords
(1. emergency mesh network)
- reinforcement learning in a changing environment
- mesh network reinforcement learning
- read one example of policy optimization
Scenario
- Routers are at given 2D positions, and agents fly around to activate them. Because agents need to recharge, and one agent can only activate a small portion of the routers per charge, the problem becomes NP-hard, and the solution can converge to an optimum with respect to total flying time (see the baseline sketch after this list). Add a comparison against other algorithms. Match the router positions to a real map.
Defect: this problem setting is only an initial phase of the entire scenario, and it is simple. The network functionality of the simulation, which is already implemented, would go unused. I could find a better problem setting, or this could become a part of my final solution.
- Two-tier: routers do not connect with one another; one agent covers a certain number of routers; the agents form a network among themselves (acting as gateways?). Use deep reinforcement learning to improve agent placement / the routing protocol? Consider agent-router association, which is mostly tied to channel quality. Read more. But I am more interested in mesh networks / D2D, utilizing the existing devices.
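For the first bullet, a minimal greedy baseline of my own (not a meeting decision): fly nearest-router-first and recharge whenever the remaining range would not cover the next leg plus the trip home. The names (greedy_tour, charger, battery) and the simple range-as-battery model are assumptions for illustration; an RL or exact solution would be compared against its recharge count and total flying distance.

```python
import numpy as np

def greedy_tour(routers: np.ndarray, charger: np.ndarray, battery: float):
    """Visit routers nearest-first, recharging at `charger` when range runs out.

    routers: (N, 2) router positions, charger: (2,) charging position,
    battery: flight range per full charge (same distance units as positions).
    Returns the visiting order and the number of recharges used.
    """
    remaining = list(range(len(routers)))
    pos, left = charger.astype(float), float(battery)
    order, recharges = [], 0
    while remaining:
        i = min(remaining, key=lambda j: float(np.linalg.norm(routers[j] - pos)))
        out = np.linalg.norm(routers[i] - pos)       # leg to the next router
        back = np.linalg.norm(routers[i] - charger)  # worst case: fly home after it
        if out + back <= left:       # can visit and still make it back
            pos, left = routers[i].astype(float), left - out
            order.append(i)
            remaining.remove(i)
        elif left < battery:         # skip the leg, recharge first
            pos, left = charger.astype(float), float(battery)
            recharges += 1
        else:                        # unreachable even on a full charge
            raise ValueError(f"router {i} is out of range")
    return order, recharges
```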