Affiliated with RC: false
Status: Published
Off-Policy Deep Reinforcement Learning Based on Steffensen Value Iteration
Cheng, Yuhu [1,2]; Chen, Lin [1,2]; Chen, C. L. Philip [3,4]; Wang, Xuesong [1,2]
2021-12-01
Source Publication: IEEE Transactions on Cognitive and Developmental Systems
ISSN: 2379-8920
Volume: 13, Issue: 4, Pages: 1023-1032
Abstract

As an important machine learning method, deep reinforcement learning (DRL) has developed rapidly in recent years and has achieved breakthrough results in many fields, such as video games, natural language processing, and robot control. However, due to the inherent trial-and-error learning mechanism of reinforcement learning and the time-consuming training of deep neural networks, the convergence speed of DRL is very slow, which limits its real-world applications. In this article, aiming to improve the convergence speed of DRL, we propose a novel Steffensen value iteration (SVI) method by applying the Steffensen iteration to the value function iteration of off-policy DRL from the perspective of fixed-point iteration. The proposed SVI is theoretically proven to converge, and to converge faster than Bellman value iteration. SVI is also versatile: it can be easily combined with existing off-policy RL algorithms. We propose two accelerated off-policy DRL algorithms by combining SVI with DDQN and TD3, respectively, namely, SVI-DDQN and SVI-TD3. Experiments on several discrete-action and continuous-action tasks from the Atari 2600 and MuJoCo platforms demonstrate that the proposed SVI-based DRL algorithms achieve higher average reward in a shorter time than the comparison algorithms.
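The abstract's core idea is to accelerate a fixed-point iteration with the Steffensen scheme. The paper's full algorithm is not reproduced in this record, but the classical Steffensen update (Aitken's delta-squared extrapolation applied to a fixed-point map g) can be sketched on a toy scalar contraction resembling a Bellman backup; the function names and the toy map below are illustrative assumptions, not the authors' implementation:

```python
# Minimal sketch (assumed form, not the paper's algorithm): Steffensen
# iteration accelerates the fixed-point map x = g(x) via Aitken's
# delta-squared update, the same idea the paper applies to the Bellman
# value-iteration operator.

def steffensen(g, x0, tol=1e-10, max_iter=100):
    """Solve x = g(x) with Steffensen's accelerated iteration."""
    x = x0
    for n in range(1, max_iter + 1):
        gx = g(x)
        ggx = g(gx)
        denom = ggx - 2.0 * gx + x
        if abs(denom) < 1e-15:          # numerically at the fixed point
            return gx, n
        x_new = x - (gx - x) ** 2 / denom
        if abs(x_new - x) < tol:
            return x_new, n
        x = x_new
    return x, max_iter

def plain_iteration(g, x0, tol=1e-10, max_iter=10_000):
    """Ordinary fixed-point (Bellman-style) iteration, for comparison."""
    x = x0
    for n in range(1, max_iter + 1):
        x_new = g(x)
        if abs(x_new - x) < tol:
            return x_new, n
        x = x_new
    return x, max_iter

# Toy Bellman-like contraction v = r + gamma * v, fixed point r / (1 - gamma) = 10.
r, gamma = 1.0, 0.9
g = lambda v: r + gamma * v

v_plain, n_plain = plain_iteration(g, 0.0)
v_stef, n_stef = steffensen(g, 0.0)
```

On this linear map, the plain iteration converges at the contraction rate gamma and needs hundreds of steps, while the Steffensen update is exact for affine maps and lands on the fixed point almost immediately; the paper's convergence-speed claim is the analogue of this effect for the value-function operator.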

Keywords: Convergence Speed; Deep Reinforcement Learning (DRL); Off-Policy; Steffensen Iteration; Value Iteration (VI)
DOI: 10.1109/TCDS.2020.3034452
Indexed By: SCIE
Language: English
WOS Research Area: Computer Science; Neurosciences & Neurology
WOS Subject: Computer Science, Artificial Intelligence; Robotics; Neurosciences
WOS ID: WOS:000728925200028
Scopus ID: 2-s2.0-85096098282
Citation Statistics
Cited Times [WOS]: 1
Document Type: Journal article
Collection: University of Macau
Affiliation:
1. Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, Xuzhou Key Laboratory of Artificial Intelligence and Big Data, Xuzhou, 221116, China
2. School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
3. School of Computer Science and Engineering, South China University of Technology, Guangzhou, 510006, China
4. Department of Computer and Information Science, Faculty of Science and Technology, University of Macau, Macao
Recommended Citation
GB/T 7714: Cheng, Yuhu, Chen, Lin, Chen, C. L. Philip, et al. Off-Policy Deep Reinforcement Learning Based on Steffensen Value Iteration[J]. IEEE Transactions on Cognitive and Developmental Systems, 2021, 13(4): 1023-1032.
APA: Cheng, Yuhu, Chen, Lin, Chen, C. L. Philip, & Wang, Xuesong. (2021). Off-Policy Deep Reinforcement Learning Based on Steffensen Value Iteration. IEEE Transactions on Cognitive and Developmental Systems, 13(4), 1023-1032.
MLA: Cheng, Yuhu, et al. "Off-Policy Deep Reinforcement Learning Based on Steffensen Value Iteration." IEEE Transactions on Cognitive and Developmental Systems 13.4 (2021): 1023-1032.

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.