马琼雄, 余润笙, 石振宇, 黄晁星, 李腾龙. Autonomous Underwater Vehicles Optimal Trajectory Control Based on Deep Reinforcement Learning[J]. Journal of South China Normal University (Natural Science Edition), 2018, 50(1): 118-123.

Autonomous Underwater Vehicles Optimal Trajectory Control Based on Deep Reinforcement Learning

  • Abstract: To enable autonomous underwater vehicles (AUVs) to track complex trajectories with high accuracy and stability, an optimal trajectory control method based on deep reinforcement learning is proposed. First, an AUV control model is built on two deep neural networks, an Actor network and a Critic network: the Actor network selects actions, and the Critic network evaluates the training outcome of the Actor network. Second, a suitable reward signal is constructed so that the deep reinforcement learning algorithm fits the dynamics model of the AUV. Finally, a criterion for judging successful network training, based on the standard deviation of the reward signal, is proposed so that the AUV maintains stability while guaranteeing accuracy. Simulation results show that, for complex trajectory tracking within a given accuracy, the algorithm outperforms conventional PID control.
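
The abstract gives no implementation details, but the two-network structure it describes matches a standard actor-critic (DDPG-style) setup. The following is a minimal, hypothetical PyTorch sketch of such a control model; the state and action dimensions, layer widths, and the tanh-bounded control output are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn as nn

STATE_DIM = 6   # assumed: e.g. AUV position/velocity tracking errors
ACTION_DIM = 2  # assumed: e.g. thruster commands

class Actor(nn.Module):
    """Selects an action (control command) from the current state."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, ACTION_DIM), nn.Tanh(),  # bounded control output
        )

    def forward(self, state):
        return self.net(state)

class Critic(nn.Module):
    """Evaluates the Actor's choice: estimates Q(state, action)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

# Smoke test: the Actor picks an action, the Critic scores it.
actor, critic = Actor(), Critic()
state = torch.randn(1, STATE_DIM)
action = actor(state)
q_value = critic(state, action)
```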


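The third step, judging training success from the standard deviation of the reward signal, could be checked with something like the sketch below; the window length and threshold are hypothetical placeholders, since the abstract gives no concrete values.

```python
import statistics

def training_succeeded(episode_rewards, window=100, std_threshold=0.5):
    """Declare training successful once the reward signal has stabilised:
    a low standard deviation over the last `window` episodes indicates
    the tracking behaviour is no longer fluctuating. Both `window` and
    `std_threshold` are illustrative values, not taken from the paper."""
    if len(episode_rewards) < window:
        return False
    return statistics.stdev(episode_rewards[-window:]) < std_threshold
```

Gating on the spread of recent rewards rather than only their mean reflects the abstract's emphasis on stability as well as accuracy.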

