Adaptive exploration network policy for effective exploration in reinforcement learning

Min Li; William Zhu

doi:10.1117/12.2667206

1 March 2023

Proceedings Volume 12588, International Conference on Artificial Intelligence, Virtual Reality, and Visualization (AIVRV 2022); 1258818 (2023) https://doi.org/10.1117/12.2667206
Event: International Conference on Artificial Intelligence, Virtual Reality, and Visualization (AIVRV 2022), 2022, Chongqing, China

Abstract

Achieving effective exploration is a key issue in training reinforcement learning agents. Existing exploration policies address it by adding noise to the policy to guide the agent's exploration. However, this approach has two problems: 1) the exploration scale adapts poorly to training stability because the added noise is drawn from a fixed distribution, and 2) the policy learned after training may be only locally optimal because exploration is insufficient. Adaptive exploration policies address the first problem by adjusting the noise scale according to training stability, but the learned policy may still be locally optimal. In this paper, we propose an adaptive exploration network policy that addresses the second problem by also considering the exploration direction. The motivation is that the agent should explore in the direction that increases sample diversity, avoiding the local optima caused by insufficient exploration. First, we construct a prediction network to predict the next state after the agent makes a decision in the current state. Second, we propose an exploration network to generate the exploration direction; to increase sample diversity, this network is trained by maximizing the distance between the next state predicted by the prediction network and the current state. Third, we adjust the exploration scale to adapt to training stability. Finally, we build the adaptive exploration network policy on a new noise term constructed from the generated exploration direction and the adaptive exploration scale. Experiments illustrate the effectiveness of our method.
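
The four steps in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the abstract gives no architectures, losses, or update rules, so everything concrete here is an assumption. A fixed linear map `B` stands in for the prediction network, plain gradient ascent on a unit vector stands in for the exploration network, and `adaptive_scale` uses a hypothetical stability rule (shrink the noise when recent returns vary a lot). All function and parameter names are illustrative only.

```python
import math
import statistics


def predict_next_state(state, action, B):
    """Toy stand-in for the prediction network (assumed linear): s' = s + B a."""
    return [s + sum(b * a for b, a in zip(row, action))
            for s, row in zip(state, B)]


def displacement_sq(state, action, direction, B):
    """Squared distance between the predicted next state and the current state."""
    perturbed = [a + d for a, d in zip(action, direction)]
    nxt = predict_next_state(state, perturbed, B)
    return sum((n - s) ** 2 for n, s in zip(nxt, state))


def normalize(v):
    """Scale v to unit length (returned unchanged if it is the zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def exploration_direction(state, action, B, steps=50, lr=0.1, eps=1e-4):
    """Stand-in for the exploration network: gradient ascent on a unit
    direction d that maximizes the predicted displacement from `state`."""
    d = normalize([1.0] * len(action))
    for _ in range(steps):
        grad = []
        for i in range(len(d)):
            up, down = list(d), list(d)
            up[i] += eps
            down[i] -= eps
            # Central finite difference of the objective w.r.t. d[i].
            grad.append((displacement_sq(state, action, up, B)
                         - displacement_sq(state, action, down, B)) / (2 * eps))
        d = normalize([x + lr * g for x, g in zip(d, grad)])
    return d


def adaptive_scale(recent_returns, base_scale=0.2):
    """Hypothetical stability rule (an assumption, not from the paper):
    the larger the spread of recent returns, the smaller the noise scale."""
    return base_scale / (1.0 + statistics.pstdev(recent_returns))


def exploratory_action(state, action, B, recent_returns):
    """Perturb the policy's action with noise = adaptive scale * direction."""
    direction = exploration_direction(state, action, B)
    scale = adaptive_scale(recent_returns)
    return [a + scale * d for a, d in zip(action, direction)]
```

For example, with `B = [[1.0, 0.0], [0.0, 0.1]]` the first action component moves the state far more than the second, so `exploration_direction([0.0, 0.0], [0.0, 0.0], B)` converges to a direction dominated by the first component. In a real agent both networks would be learned from sampled transitions rather than fixed in advance.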

Citation

Min Li and William Zhu "Adaptive exploration network policy for effective exploration in reinforcement learning", Proc. SPIE 12588, International Conference on Artificial Intelligence, Virtual Reality, and Visualization (AIVRV 2022), 1258818 (1 March 2023); https://doi.org/10.1117/12.2667206
