Xiyun Li, Tielin Zhang, Chenghao Liu, Shuang Xu, Bo Xu. Theory of Mind Inspired Large Reasoning Language Model Improved Multi-agent Reinforcement Learning Algorithm for Robust and Adaptive Partner Modelling[J]. Machine Intelligence Research, 2025, 22(6): 1088-1101. DOI: 10.1007/s11633-025-1547-3

Theory of Mind Inspired Large Reasoning Language Model Improved Multi-agent Reinforcement Learning Algorithm for Robust and Adaptive Partner Modelling

  • Cooperative multi-agent reinforcement learning (MARL) has made remarkable progress, but state-of-the-art methods still face substantial challenges in real-world applications. A significant direction for improving cooperative MARL and addressing these challenges is robust and adaptive partner modelling. Reasoning about the beliefs of partners, such as their intentions and behaviors, is known as theory of mind (ToM) in cognitive science and is crucial for partner modelling. In animals, ToM reasoning in the prefrontal cortex (PFC) precedes decision-making and plays an important role in survival in complex environments. However, the biological PFC is too complex to be incorporated directly into conventional artificial neural networks (ANNs), either functionally or structurally. Large reasoning language models (LRMs) have recently demonstrated significant human-like reasoning abilities and impressive performance. We therefore propose an improved LRM framework that simulates the PFC for robust and adaptive partner modelling. Despite the excellent performance of LRMs in various fields, their ToM reasoning capabilities remain limited in complex MARL scenarios, so we further propose a ToM reasoner to enhance them. Our framework is robust and adaptable across LRMs of various sizes: it improves the ToM reasoning ability of agents and facilitates more effective partner modelling, thereby achieving higher performance scores on cooperative benchmarks.