Citation: Linghui Meng, Muning Wen, Chenyang Le, Xiyun Li, Dengpeng Xing, Weinan Zhang, Ying Wen, Haifeng Zhang, Jun Wang, Yaodong Yang, Bo Xu. Offline Pre-trained Multi-agent Decision Transformer. Machine Intelligence Research, vol. 20, no. 2, pp. 233-248, 2023. https://doi.org/10.1007/s11633-022-1383-7

Offline Pre-trained Multi-agent Decision Transformer

doi: 10.1007/s11633-022-1383-7
More Information
  • Author Bio:

    Linghui Meng received the B. Sc. degree in rail traffic signal control from Beijing Jiaotong University, China in 2019. He is currently a Ph. D. degree candidate in pattern recognition and intelligent system at both Institute of Automation, Chinese Academy of Sciences, China, and University of Chinese Academy of Sciences, China. His interests include theoretical research on reinforcement learning, pre-training models, multi-agent system and speech recognition. E-mail: menglinghui2019@ia.ac.cn ORCID iD: 0000-0002-5826-8072

    Muning Wen is currently a Ph.D. degree candidate in computer science and technology at Shanghai Jiao Tong University, China. His research interests include reinforcement learning, deep learning and multi-agent system. E-mail: muning.wen@sjtu.edu.cn

    Chenyang Le is an undergraduate student in computer science and technology at Shanghai Jiao Tong University, China. His research interests include reinforcement learning, automatic speech recognition and speech translation. E-mail: nethermanpro@sjtu.edu.cn

    Xiyun Li is currently a Ph. D. degree candidate in pattern recognition and intelligent system at Institute of Automation, Chinese Academy of Sciences, China. His research interests include reinforcement learning and brain-inspired cognitive models. E-mail: lixiyun2020@ia.ac.cn

    Dengpeng Xing received the B. Sc. degree in mechanical electronics and the M. Sc. degree in mechanical manufacturing and automation from Tianjin University, China in 2002 and 2006, respectively, and the Ph. D. degree in control science and engineering from Shanghai Jiao Tong University, China in 2010. He is currently an associate professor in the Research Center for Brain-inspired Intelligence, Institute of Automation, Chinese Academy of Sciences, China. His research interests include reinforcement learning and brain-inspired robotics. E-mail: dengpeng.xing@ia.ac.cn

    Weinan Zhang received the B. Eng. in computer science and technology from ACM Class of Shanghai Jiao Tong University, China in 2011, and the Ph. D. degree in computer science and technology from University College London, UK in 2016. He is now a tenure-track associate professor at Shanghai Jiao Tong University, China. He has published over 150 research papers on international conferences and journals and has been serving as an area chair or (senior) PC member at ICML, NeurIPS, ICLR, KDD, AAAI, IJCAI, SIGIR, etc, and a reviewer at JMLR, TOIS, TKDE, TIST, etc. His research interests include reinforcement learning, deep learning and data science with various real-world applications of recommender systems, search engines, text mining & generation, knowledge graphs and game AI. E-mail: wnzhang@sjtu.edu.cn

    Ying Wen received the B. Eng. degree from Beijing University of Posts and Telecommunications, China, the B. Eng. degree with first class honors from Queen Mary, University of London, UK in 2015, the M. Sc. degree with distinction from University College London, UK in 2016, and the Ph. D. degree from the Department of Computer Science, University College London, UK. He is currently a tenure-track assistant professor with the John Hopcroft Center for Computer Science, Shanghai Jiao Tong University, China. He has published over 20 research papers about machine learning at top-tier international conferences (ICML, NeurIPS, ICLR, IJCAI, and AAMAS). He has been serving as a PC member at ICML, NeurIPS, ICLR, AAAI, IJCAI, ICAPS and a reviewer at TIFS, Operational Research, etc. He was granted the Best Paper Award in the AAMAS 2021 Blue Sky Track and the Best System Paper Award in CoRL 2020. His research interests include machine learning, multi-agent systems and human-centered interactive systems. E-mail: ying.wen@sjtu.edu.cn

    Haifeng Zhang received the B. Sc. degree in computer science and economics and the Ph. D. degree in computer science from Peking University, China in 2012 and 2018, respectively. He was a visiting scientist at Center on Frontiers of Computing Studies (CFCS), Peking University, China. Earlier, he was a research fellow at University College London, UK. He is an associate professor at Institute of Automation, Chinese Academy of Sciences (CASIA), China. He has published research papers on international conferences ICML, NeurIPS, AAAI, IJCAI, AAMAS, etc. He has served as a Reviewer for AAAI, IJCAI, TNNLS, Acta Automatica Sinica, and Co-Chair for IJCAI competition, IJTCS, DAI Workshop, etc. His research interests include reinforcement learning, game AI, game theory and computational advertising. E-mail: haifeng.zhang@ia.ac.cn

    Jun Wang received the B. Sc. degree from Southeast University, China in 1997, the M. Sc. degree from National University of Singapore, Singapore in 2003, and the Ph. D. degree from Delft University of Technology, The Netherlands in 2007. He is currently a professor with the Computer Science Department, University College London, UK. He has published over 200 research articles. His team won the First Global Real-Time Bidding Algorithm Contest with more than 80 participants worldwide. He was the winner of multiple best paper awards. He was a recipient of the Beyond Search - Semantic Computing and Internet Economics Award by Microsoft Research and the Yahoo! FREP Faculty Award. He has served as an Area Chair for ACM CIKM and ACM SIGIR. His recent service includes the Co-Chair for Artificial Intelligence, Semantics, and Dialog in ACM SIGIR 2018. His research interests are in the areas of AI and intelligent systems, covering (multi-agent) reinforcement learning, deep generative models, and their diverse applications in information retrieval, recommender systems and personalization, data mining, smart cities, bot planning, and computational advertising. E-mail: jun.wang@cs.ucl.ac.uk (Corresponding author) ORCID iD: 0000-0001-9006-7951

    Yaodong Yang received the B. Sc. degree in electronic engineering & information science from University of Science and Technology of China, China in 2013, the M. Sc. degree in quantitative biology and biostatistics from Imperial College London, UK in 2014, and the Ph. D. degree in computer science from University College London, UK in 2021. He is a machine learning researcher with ten years of working experience in both academia and industry. Currently, he is an assistant professor at Peking University, China. Before joining Peking University, he was an assistant professor at King's College London, UK. Before KCL, he was a principal research scientist at Huawei UK. Before Huawei, he was a senior research manager at AIG, working on AI applications in finance. He has maintained a track record of more than forty publications at top conferences and journals, along with the Best System Paper Award at CoRL 2020 and the Best Blue-sky Paper Award at AAMAS 2021. His research interests include reinforcement learning and multi-agent systems. E-mail: yaodong.yang@pku.edu.cn

    Bo Xu received the B. Sc. degree in electrical engineering from Zhejiang University, China in 1988, and the M. Sc. and Ph. D. degrees in pattern recognition and intelligent system from Institute of Automation, Chinese Academy of Sciences, China in 1992 and 1997, respectively. He is a professor and the director of the Institute of Automation, Chinese Academy of Sciences, China, and also the deputy director of the Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, China. His research interests include brain-inspired intelligence, brain-inspired cognitive models, natural language processing and understanding, and brain-inspired robotics. E-mail: xubo@ia.ac.cn (Corresponding author) ORCID iD: 0000-0002-1111-1529

  • Received Date: 2022-07-06
  • Accepted Date: 2022-10-18
  • Publish Date: 2023-04-01
  • Offline reinforcement learning leverages previously collected offline datasets to learn optimal policies without needing to access the real environment. Such a paradigm is also desirable for multi-agent reinforcement learning (MARL) tasks, given the combinatorially increased interactions among agents and with the environment. However, in MARL, the paradigm of offline pre-training with online fine-tuning has not been studied, nor are datasets or benchmarks for offline MARL research available. In this paper, we facilitate the research by providing large-scale datasets and using them to examine the usage of the decision transformer in the context of MARL. We investigate the generalization of MARL offline pre-training in three aspects: 1) between single agents and multiple agents, 2) from offline pre-training to online fine-tuning, and 3) to multiple downstream tasks with few-shot and zero-shot capabilities. We start by introducing the first offline MARL dataset with diverse quality levels based on the StarCraft II environment, and then propose the novel architecture of multi-agent decision transformer (MADT) for effective offline learning. MADT leverages the transformer's ability for sequence modelling and integrates it seamlessly with both offline and online MARL tasks. A significant benefit of MADT is that it learns generalizable policies that can transfer between different types of agents under different task scenarios. On the StarCraft II offline dataset, MADT outperforms state-of-the-art offline reinforcement learning (RL) baselines, including BCQ and CQL. When applied to online tasks, the pre-trained MADT significantly improves sample efficiency and enjoys strong performance in both few-shot and zero-shot cases. To the best of our knowledge, this is the first work that studies and demonstrates the effectiveness of offline pre-trained models in terms of sample efficiency and generalizability enhancements for MARL.
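  • The sketch below gives a concrete, hypothetical picture of the sequence-modelling idea summarised in the abstract: per-timestep return-to-go, observation and action tokens are interleaved into one causal sequence, and a transformer is trained by supervised learning on offline trajectories to predict each action. It is a minimal PyTorch sketch, not the authors' MADT implementation; the class name, tensor shapes, discrete-action assumption and the single flattened agent stream are illustrative assumptions.

    # Minimal, hypothetical sketch of decision-transformer-style offline training (NOT the authors' MADT code).
    import torch
    import torch.nn as nn

    class DecisionTransformerSketch(nn.Module):
        def __init__(self, obs_dim, act_dim, embed_dim=64, n_layers=2, n_heads=4, max_len=64):
            super().__init__()
            self.embed_rtg = nn.Linear(1, embed_dim)        # return-to-go token
            self.embed_obs = nn.Linear(obs_dim, embed_dim)  # observation token
            self.embed_act = nn.Embedding(act_dim, embed_dim)
            self.pos = nn.Parameter(torch.zeros(1, 3 * max_len, embed_dim))
            layer = nn.TransformerEncoderLayer(embed_dim, n_heads, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, n_layers)
            self.head = nn.Linear(embed_dim, act_dim)       # action logits

        def forward(self, rtg, obs, act):
            # rtg: (B, T, 1), obs: (B, T, obs_dim), act: (B, T) discrete action ids
            B, T, _ = obs.shape
            # Interleave (rtg_t, obs_t, act_t) into one token sequence of length 3T.
            tokens = torch.stack(
                [self.embed_rtg(rtg), self.embed_obs(obs), self.embed_act(act)], dim=2
            ).reshape(B, 3 * T, -1) + self.pos[:, : 3 * T]
            # Causal mask: each token may only attend to earlier positions.
            causal = torch.triu(torch.full((3 * T, 3 * T), float("-inf")), diagonal=1)
            h = self.encoder(tokens, mask=causal)
            return self.head(h[:, 1::3])                    # predict the action at each observation token

    # Supervised (behaviour-cloning style) training step on a batch of offline trajectories.
    model = DecisionTransformerSketch(obs_dim=10, act_dim=5)
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    rtg, obs = torch.rand(8, 20, 1), torch.rand(8, 20, 10)  # random stand-ins for dataset tensors
    act = torch.randint(0, 5, (8, 20))
    logits = model(rtg, obs, act)
    loss = nn.functional.cross_entropy(logits.reshape(-1, 5), act.reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()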

     

  • *These authors contributed equally to this work
  • [1]
    Y. D. Yang, J. Wang. An overview of multi-agent reinforcement learning from game theoretical perspective. [Online], Available: https://arxiv.org/abs/2011.00583, 2020.
    [2]
    S. Shalev-Shwartz, S. Shammah, A. Shashua. Safe, multi-agent, reinforcement learning for autonomous driving. [Online], Available: https://arxiv.org/abs/1610.03295, 2016.
    [3]
    M. Zhou, J. Luo, J. Villella, Y. D. Yang, D. Rusu, J. Y. Miao, W. N. Zhang, M. Alban, I. Fadakar, Z. Chen, A. C. Huang, Y. Wen, K. Hassanzadeh, D. Graves, D. Chen, Z. B. Zhu, N. Nguyen, M. Elsayed, K. Shao, S. Ahilan, B. K. Zhang, J. N. Wu, Z. G. Fu, K. Rezaee, P. Yadmellat, M. Rohani, N. P. Nieves, Y. H. Ni, S. Banijamali, A. C. Rivers, Z. Tian, D. Palenicek, H. bou Ammar, H. B. Zhang, W. L. Liu, J. Y. Hao, J. Wang. SMARTS: Scalable multi-agent reinforcement learning training school for autonomous driving. [Online], Available: https://arxiv.org/abs/2010.09776, 2020.
    [4]
    H. F. Zhang, W. Z. Chen, Z. R. Huang, M. N. Li, Y. D. Yang, W. N. Zhang, J. Wang. Bi-level actor-critic for multi-agent coordination. In Proceedings of the 34th AAAI Conference on Artificial Intelligence, New York, USA, pp. 7325–7332, 2020.
    [5]
    M. N. Li, Z. W. Qin, Y. Jiao, Y. D. Yang, J. Wang, C. X. Wang, G. B. Wu, J. P. Ye. Efficient ridesharing order dispatching with mean field multi-agent reinforcement learning. In Proceedings of World Wide Web Conference, ACM, San Francisco, USA, pp. 983–994, 2019. DOI: 10.1145/3308558.3313433.
    [6]
    Y. D. Yang, R. Luo, M. N. Li, M. Zhou, W. N. Zhang, J. Wang. Mean field multi-agent reinforcement learning. In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, pp. 5571–5580, 2018.
    [7]
    Y. D. Yang, L. T. Yu, Y. W. Bai, Y. Wen, W. N. Zhang, J. Wang. A study of AI population dynamics with million-agent reinforcement learning. In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, ACM, Stockholm, Sweden, pp. 2133–2135, 2018.
    [8]
    P. Peng, Y. Wen, Y. D. Yang, Q. Yuan, Z. K. Tang, H. T. Long, J. Wang. Multiagent bidirectionally-coordinated nets: Emergence of human-level coordination in learning to play StarCraft combat games. [Online], Available: https://arxiv.org/abs/1703.10069, 2017.
    [9]
    M. Zhou, Z. Y. Wan, H. J. Wang, M. N. Wen, R. Z. Wu, Y. Wen, Y. D. Yang, W. N. Zhang, J. Wang. MALib: A parallel framework for population-based multi-agent reinforcement learning. [Online], Available: https://arxiv.org/abs/2106.07551, 2021.
    [10]
    X. T. Deng, Y. H. Li, D. H. Mguni, J. Wang, Y. D. Yang. On the complexity of computing Markov perfect equilibrium in general-sum stochastic games. [Online], Available: https://arxiv.org/abs/2109.01795, 2021.
    [11]
    T. Haarnoja, A. Zhou, K. Hartikainen, G. Tucker, S. Ha, J. Tan, V. Kumar, H. Zhu, A. Gupta, P. Abbeel, S. Levine. Soft actor-critic algorithms and applications. [Online], Available: https://arxiv.org/abs/1812.05905, 2018.
    [12]
    R. Munos, T. Stepleton, A. Harutyunyan, M. G. Bellemare. Safe and efficient off-policy reinforcement learning. In Proceedings of the 30th Conference on Neural Information Processing Systems, Barcelona, Spain, pp. 1054–1062, 2016.
    [13]
    L. Espeholt, R. Marinier, P. Stanczyk, K. Wang, M. Michalski. SEED RL: Scalable and efficient deep-RL with accelerated central inference. In Proceedings of the 8th International Conference on Learning Representations, Addis Ababa, Ethiopia, 2020.
    [14]
    L. Espeholt, H. Soyer, R. Munos, K. Simonyan, V. Mnih, T. Ward, Y. Doron, V. Firoiu, T. Harley, I. Dunning, S. Legg, K. Kavukcuoglu. IMPALA: Scalable distributed deep-RL with importance weighted actor-learner architectures. In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, pp. 1407–1416, 2018.
    [15]
    K. M. He, X. L. Chen, S. N. Xie, Y. H. Li, P. Dollár, R. Girshick. Masked autoencoders are scalable vision learners. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, New Orleans, USA, pp. 15979–15988, 2022. DOI: 10.1109/CVPR52688.2022.01553.
    [16]
    Z. Liu, Y. T. Lin, Y. Cao, H. Hu, Y. X. Wei, Z. Zhang, S. Lin, B. N. Guo. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of IEEE/CVF International Conference on Computer Vision, IEEE, Montreal, Canada, pp. 9992–10002, 2021. DOI: 10.1109/ICCV48922.2021.00986.
    [17]
    S. Kim, J. Kim, H. W. Chun. Wave2Vec: Vectorizing electroencephalography bio-signal for prediction of brain disease. International Journal of Environmental Research and Public Health, vol. 15, no. 8, Article number 1750, 2018. DOI: 10.3390/ijerph15081750.
    [18]
    T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, D. Amodei. Language models are few-shot learners. In Proceedings of the 34th International Conference on Neural Information Processing Systems, Vancouver, Canada, Article number 159, 2020. DOI: 10.5555/3495724.3495883.
    [19]
    L. L. Chen, K. Lu, A. Rajeswaran, K. Lee, A. Grover, M. Laskin, P. Abbeel, A. Srinivas, I. Mordatch. Decision transformer: Reinforcement learning via sequence modeling. [Online], Available: https://arxiv.org/abs/2106.01345, 2021.
    [20]
    Y. D. Yang, J. Luo, Y. Wen, O. Slumbers, D. Graves, H. bou Ammar, J. Wang, M. E. Taylor. Diverse auto-curriculum is critical for successful real-world multiagent learning systems. In Proceedings of the 20th International Conference on Autonomous Agents and Multi-agent Systems, ACM, pp. 51–56, 2021.
    [21]
    N. Perez-Nieves, Y. D. Yang, O. Slumbers, D. H. Mguni, Y. Wen, J. Wang. Modelling behavioural diversity for learning in open-ended games. In Proceedings of the 38th International Conference on Machine Learning, pp. 8514–8524, 2021.
    [22]
    X. Y. Liu, H. T. Jia, Y. Wen, Y. J. Hu, Y. F. Chen, C. J. Fan, Z. P. Hu, Y. D. Yang. Unifying behavioral and response diversity for open-ended learning in zero-sum games. In Proceedings of the 35th Conference on Neural Information Processing Systems, pp. 941–952, 2021.
    [23]
    S. Levine, A. Kumar, G. Tucker, J. Fu. Offline reinforcement learning: Tutorial, review, and perspectives on open problems. [Online], Available: https://arxiv.org/abs/2005.01643, 2020.
    [24]
    R. Sanjaya, J. Wang, Y. D. Yang. Measuring the non-transitivity in chess. Algorithms, vol. 15, no. 5, Article number 152, 2022. DOI: 10.3390/a15050152.
    [25]
    X. D. Feng, O. Slumbers, Y. D. Yang, Z. Y. Wan, B. Liu, S. McAleer, Y. Wen, J. Wang. Discovering multi-agent auto-curricula in two-player zero-sum games. [Online], Available: https://arxiv.org/abs/2106.02745, 2021.
    [26]
    M. Samvelyan, T. Rashid, C. S. de Witt, G. Farquhar, N. Nardelli, T. G. J. Rudner, C. M. Hung, P. H. S. Torr, J. Foerster, S. Whiteson. The StarCraft multi-agent challenge. In Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, Montreal, Canada, pp. 2186–2188, 2019.
    [27]
    Z. Li, S. R. Xue, X. H. Yu, H. J. Gao. Controller optimization for multirate systems based on reinforcement learning. International Journal of Automation and Computing, vol. 17, no. 3, pp. 417–427, 2020. DOI: 10.1007/s11633-020-1229-0.
    [28]
    Y. Li, D. Xu. Skill learning for robotic insertion based on one-shot demonstration and reinforcement learning. International Journal of Automation and Computing, vol. 18, no. 3, pp. 457–467, 2021. DOI: 10.1007/s11633-021-1290-3.
    [29]
    C. Berner, G. Brockman, B. Chan, V. Cheung, P. Debiak, C. Dennison, D. Farhi, Q. Fischer, S. Hashme, C. Hesse, et al. Dota 2 with large scale deep reinforcement learning. [Online], Available: https://arxiv.org/abs/1912.06680, 2019.
    [30]
    A. Kumar, J. Fu, G. Tucker, S. Levine. Stabilizing off-policy Q-learning via bootstrapping error reduction. In Proceedings of the 33rd Conference on Neural Information Processing Systems, Vancouver, Canada, pp. 11761–11771, 2019.
    [31]
    A. Kumar, A. Zhou, G. Tucker, S. Levine. Conservative Q-learning for offline reinforcement learning. In Proceedings of the 34th International Conference on Neural Information Processing Systems, Vancouver, Canada, Article number 100, 2020. DOI: 10.5555/3495724.3495824.
    [32]
    S. Fujimoto, D. Meger, D. Precup. Off-policy deep reinforcement learning without exploration. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, USA, pp. 2052–2062, 2019.
    [33]
    T. Matsushima, H. Furuta, Y. Matsuo, O. Nachum, S. X. Gu. Deployment-efficient reinforcement learning via model-based offline optimization. In Proceedings of the 9th International Conference on Learning Representations, 2021.
    [34]
    D. J. Su, J. D. Lee, J. M. Mulvey, H. V. Poor. MUSBO: Model-based uncertainty regularized and sample efficient batch optimization for deployment constrained reinforcement learning. [Online], Available: https://arxiv.org/abs/2102.11448, 2021.
    [35]
    Y. Q. Yang, X. T. Ma, C. H. Li, Z. W. Zheng, Q. Y. Zhang, G. Huang, J. Yang, Q. C. Zhao. Believe what you see: Implicit constraint approach for offline multi-agent reinforcement learning. [Online], Available: https://arxiv.org/abs/2106.03400, 2021.
    [36]
    J. C. Jiang, Z. Q. Lu. Offline decentralized multi-agent reinforcement learning. [Online], Available: https://arxiv.org/abs/2108.01832, 2021.
    [37]
    A. Nair, M. Dalal, A. Gupta, S. Levine. Accelerating online reinforcement learning with offline datasets. [Online], Available: https://arxiv.org/abs/2006.09359, 2020.
    [38]
    M. Janner, Q. Y. Li, S. Levine. Offline reinforcement learning as one big sequence modeling problem. [Online], Available: https://arxiv.org/abs/2106.02039, 2021.
    [39]
    L. C. Dinh, Y. D. Yang, S. McAleer, Z. Tian, N. P. Nieves, O. Slumbers, D. H. Mguni, H. bou Ammar, J. Wang. Online double oracle. [Online], Available: https://arxiv.org/abs/2103.07780, 2021.
    [40]
    D. H. Mguni, Y. T. Wu, Y. L. Du, Y. D. Yang, Z. Y. Wang, M. N. Li, Y. Wen, J. Jennings, J. Wang. Learning in nonzero-sum stochastic games with potentials. In Proceedings of the 38th International Conference on Machine Learning, pp. 7688–7699, 2021.
    [41]
    Y. D. Yang, Y. Wen, J. Wang, L. H. Chen, K. Shao, D. Mguni, W. N. Zhang. Multi-agent determinantal Q-learning. In Proceedings of the 37th International Conference on Machine Learning, pp. 10757–10766, 2020.
    [42]
    T. Rashid, M. Samvelyan, C. Schroeder, G. Farquhar, J. Foerster, S. Whiteson. QMIX: Monotonic value function factorisation for deep multi-agent reinforcement learning. In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, pp. 4295–4304, 2018.
    [43]
    Y. Wen, Y. D. Yang, R. Luo, J. Wang, W. Pan. Probabilistic recursive reasoning for multi-agent reinforcement learning. In Proceedings of the 7th International Conference on Learning Representations, New Orleans, USA, 2019.
    [44]
    Y. Wen, Y. D. Yang, J. Wang. Modelling bounded rationality in multi-agent interactions by generalized recursive reasoning. In Proceedings of the 29th International Joint Conference on Artificial Intelligence, Yokohama, Japan, pp. 414–421, 2020. DOI: 10.24963/ijcai.2020/58.
    [45]
    S. Hu, F. D. Zhu, X. J. Chang, X. D. Liang. UPDeT: Universal multi-agent reinforcement learning via policy decoupling with transformers. [Online], Available: https://arxiv.org/abs/2101.08001, 2021.
    [46]
    K. Son, D. Kim, W. J. Kang, D. E. Hostallero, Y. Yi. QTRAN: Learning to factorize with transformation for cooperative multi-agent reinforcement learning. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, USA, pp. 5887–5896, 2019.
    [47]
    J. G. Kuba, M. N. Wen, L. H. Meng, S. D. Gu, H. F. Zhang, D. H. Mguni, J. Wang, Y. D. Yang. Settling the variance of multi-agent policy gradients. In Proceedings of the 35th Conference on Neural Information Processing Systems, pp. 13458–13470, 2021.
    [48]
    J. G. Kuba, R. Q. Chen, M. N. Wen, Y. Wen, F. L. Sun, J. Wang, Y. D. Yang. Trust region policy optimisation in multi-agent reinforcement learning. In Proceedings of the 10th International Conference on Learning Representations, 2022.
    [49]
    S. D. Gu, J. G. Kuba, M. N. Wen, R. Q. Chen, Z. Y. Wang, Z. Tian, J. Wang, A. Knoll, Y. D. Yang. Multi-agent constrained policy optimisation. [Online], Available: https://arxiv.org/abs/2110.02793, 2021.
    [50]
    A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, I. Polosukhin. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, USA, pp. 6000–6010, 2017. DOI: 10.5555/3295222.3295349.
    [51]
    I. Sutskever, O. Vinyals, Q. V. Le. Sequence to sequence learning with neural networks. In Proceedings of the 27th International Conference on Neural Information Processing Systems, Montreal, Canada, pp. 3104–3112, 2014. DOI: 10.5555/2969033.2969173.
    [52]
    Q. Wang, B. Li, T. Xiao, J. B. Zhu, C. L. Li, D. F. Wong, L. S. Chao. Learning deep transformer models for machine translation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, pp. 1810–1822, 2019. DOI: 10.18653/v1/P19-1176.
    [53]
    L. H. Dong, S. Xu, B. Xu. Speech-transformer: A no-recurrence sequence-to-sequence model for speech recognition. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, Calgary, Canada, pp. 5884–5888, 2018. DOI: 10.1109/ICASSP.2018.8462506.
    [54]
    K. Han, Y. H. Wang, H. T. Chen, X. H. Chen, J. Y. Guo, Z. H. Liu, Y. H. Tang, A. Xiao, C. J. Xu, Y. X. Xu, Z. H. Yang, Y. M. Zhang, D. C. Tao. A survey on vision transformer. [Online], Available: https://arxiv.org/abs/2012.12556, 2020.
    [55]
    A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. H. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, N. Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. In Proceedings of the 9th International Conference on Learning Representations, 2021.
    [56]
    R. J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, vol. 8, no. 3, pp. 229–256, 1992. DOI: 10.1007/BF00992696.
    [57]
    I. Mordatch, P. Abbeel. Emergence of grounded compositional language in multi-agent populations. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence and Thirtieth Innovative Applications of Artificial Intelligence Conference and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence, New Orleans, USA, Article number 183, 2018. DOI: 10.5555/3504035.3504218.
    [58]
    C. Yu, A. Velu, E. Vinitsky, J. X. Gao, Y. Wang, A. Bayen, Y. Wu. The surprising effectiveness of PPO in cooperative, multi-agent games. [Online], Available: https://arxiv.org/abs/2103.01955, 2021.
    [59]
    J. Fu, A. Kumar, O. Nachum, G. Tucker, S. Levine. D4RL: Datasets for deep data-driven reinforcement learning. [Online], Available: https://arxiv.org/abs/2004.07219, 2020.
    [60]
    Z. D. Zhu, K. X. Lin, A. K. Jain, J. Zhou. Transfer learning in deep reinforcement learning: A survey. [Online], Available: https://arxiv.org/abs/2009.07888, 2020.
