Paper
1 March 2023 Adaptive exploration network policy for effective exploration in reinforcement learning
Min Li, William Zhu
Proceedings Volume 12588, International Conference on Artificial Intelligence, Virtual Reality, and Visualization (AIVRV 2022); 1258818 (2023) https://doi.org/10.1117/12.2667206
Event: International Conference on Artificial Intelligence, Virtual Reality, and Visualization (AIVRV 2022), 2022, Chongqing, China
Abstract
Achieving effective exploration is a key issue in training reinforcement learning agents. A common exploration policy addresses this issue by adding noise to the policy to guide the agent's exploration. However, it has two problems: 1) the exploration scale adapts poorly to the training stability because the added noise is drawn from a fixed distribution, and 2) the policy learned after training may be locally optimal because exploration is insufficient. The adaptive exploration policy addresses the first problem by adjusting the noise scale according to the training stability, but the learned policy may still be locally optimal. In this paper, we propose an adaptive exploration network policy that addresses the second problem by considering the exploration direction. The motivation is that the agent should explore in the direction that increases sample diversity, so as to avoid the local optimum caused by insufficient exploration. First, we construct a prediction network to predict the next state after the agent makes a decision in the current state. Second, we propose an exploration network to generate the exploration direction. To increase sample diversity, this network is trained by maximizing the distance between the next state predicted by the prediction network and the current state. Third, we adjust the exploration scale to adapt to the training stability. Finally, we propose the adaptive exploration network policy, based on new noise constructed from the generated exploration direction and the adaptive exploration scale. Experiments illustrate the effectiveness of our method.
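The four steps in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not specify the network architectures or the scale rule. Everything below is an assumption made for illustration: the prediction network is replaced by a hypothetical linear model `s' = s + B a`, the exploration network is replaced by plain gradient ascent on a unit-norm direction `d`, and `adaptive_scale` is a placeholder rule that grows the noise scale with the relative spread of recent returns.

```python
import math
import random


def predict_next_state(state, action, B):
    # Hypothetical linear stand-in for the prediction network: s' = s + B a.
    return [s + sum(B[i][j] * action[j] for j in range(len(action)))
            for i, s in enumerate(state)]


def normalize(v):
    # Scale v to unit Euclidean norm (zero vectors are returned unchanged).
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def exploration_direction(state, action, B, steps=50, lr=0.1, seed=0):
    # Assumed substitute for the exploration network: projected gradient
    # ascent on ||s'_pred(s, a + d) - s||^2 over unit-norm directions d.
    rng = random.Random(seed)
    d = normalize([rng.gauss(0.0, 1.0) for _ in action])
    for _ in range(steps):
        perturbed = [a + x for a, x in zip(action, d)]
        nxt = predict_next_state(state, perturbed, B)
        delta = [n - s for n, s in zip(nxt, state)]  # equals B (a + d)
        # Gradient of ||delta||^2 with respect to d is 2 B^T delta.
        grad = [2.0 * sum(B[i][j] * delta[i] for i in range(len(state)))
                for j in range(len(action))]
        d = normalize([x + lr * g for x, g in zip(d, grad)])
    return d


def adaptive_scale(recent_returns, base=0.2, lo=0.05, hi=0.5):
    # Placeholder rule (an assumption, not from the paper): the scale grows
    # with the relative standard deviation of recent returns, then is clipped.
    if len(recent_returns) < 2:
        return base
    mean = sum(recent_returns) / len(recent_returns)
    var = sum((r - mean) ** 2 for r in recent_returns) / len(recent_returns)
    rel = math.sqrt(var) / (abs(mean) + 1e-8)
    return min(hi, max(lo, base * (1.0 + rel)))


def explore_action(state, action, B, recent_returns):
    # New noise = adaptive scale * generated direction, added to the action.
    d = exploration_direction(state, action, B)
    scale = adaptive_scale(recent_returns)
    return [a + scale * x for a, x in zip(action, d)]
```

In this toy setting, with `B = [[2.0, 0.0], [0.0, 0.5]]` and zero state and action, the direction aligns (up to sign) with the first action axis, since that axis moves the predicted next state farthest from the current state.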
© (2023) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Min Li and William Zhu "Adaptive exploration network policy for effective exploration in reinforcement learning", Proc. SPIE 12588, International Conference on Artificial Intelligence, Virtual Reality, and Visualization (AIVRV 2022), 1258818 (1 March 2023); https://doi.org/10.1117/12.2667206
KEYWORDS
Education and training
Statistical analysis
Decision making
Machine learning
Technology
Visualization