Citation: Yuxian Gu, Jiaxin Wen, Hao Sun, Yi Song, Pei Ke, Chujie Zheng, Zheng Zhang, Jianzhu Yao, Lei Liu, Xiaoyan Zhu, Minlie Huang. EVA2.0: Investigating Open-domain Chinese Dialogue Systems with Large-scale Pre-training. Machine Intelligence Research, vol. 20, no. 2, pp. 207–219, 2023. https://doi.org/10.1007/s11633-022-1387-3

[1]
X. Han, Z. Y. Zhang, N. Ding, Y. X. Gu, X. Liu, Y. Q. Huo, J. Z. Qiu, Y. Yao, A. Zhang, L. Zhang, W. T. Han, M. L. Huang, Q. Jin, Y. Y. Lan, Y. Liu, Z. Y. Liu, Z. W. Lu, X. P. Qiu, R. H. Song, J. Tang, J. R. Wen, J. H. Yuan, W. X. Zhao, J. Zhu. Pre-trained models: Past, present and future. AI Open, vol. 2, pp. 225–250, 2021. DOI: 10.1016/j.aiopen.2021.08.002.

[2]
T. X. Sun, X. Y. Liu, X. P. Qiu, X. J. Huang. Paradigm shift in natural language processing. Machine Intelligence Research, vol. 19, no. 3, pp. 169–183, 2022. DOI: 10.1007/s11633-022-1331-6.

[3]
Y. Z. Zhang, S. Q. Sun, M. Galley, Y. C. Chen, C. Brockett, X. Gao, J. F. Gao, J. J. Liu, B. Dolan. DIALOGPT: Large-scale generative pre-training for conversational response generation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp. 270–278, 2020. DOI: 10.18653/v1/2020.acl-demos.30.

[4]
D. Adiwardana, M. T. Luong, D. R. So, J. Hall, N. Fiedel, R. Thoppilan, Z. Yang, A. Kulshreshtha, G. Nemade, Y. F. Lu, Q. V. Le. Towards a human-like open-domain chatbot. [Online], Available: https://arxiv.org/abs/2001.09977, 2020.

[5]
S. Roller, E. Dinan, N. Goyal, D. Ju, M. Williamson, Y. H. Liu, J. Xu, M. Ott, E. M. Smith, Y. L. Boureau, J. Weston. Recipes for building an open-domain chatbot. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pp. 300–325, 2021. DOI: 10.18653/v1/2021.eacl-main.24.

[6]
H. Zhou, P. Ke, Z. Zhang, Y. X. Gu, Y. H. Zheng, C. J. Zheng, Y. D. Wang, C. H. Wu, H. Sun, X. C. Yang, B. S. Wen, X. Y. Zhu, M. L. Huang, J. Tang. EVA: An open-domain Chinese dialogue system with large-scale generative pre-training. [Online], Available: https://arxiv.org/abs/2108.01547, 2021.

[7]
S. Q. Bao, H. He, F. Wang, H. Wu, H. F. Wang, W. Q. Wu, Z. Guo, Z. B. Liu, X. C. Xu. PLATO-2: Towards building an open-domain chatbot via curriculum learning. In Proceedings of the Findings of the Association for Computational Linguistics, pp. 2513–2525, 2021. DOI: 10.18653/v1/2021.findings-acl.222.

[8]
S. Q. Bao, H. He, F. Wang, H. Wu, H. F. Wang, W. Q. Wu, Z. H. Wu, Z. Guo, H. Lu, X. X. Huang, X. Tian, X. C. Xu, Y. Z. Lin, Z. Y. Niu. PLATO-XL: Exploring the large-scale pre-training of dialogue generation. [Online], Available: https://arxiv.org/abs/2109.09519, 2021.

[9]
A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, I. Sutskever. Language Models are Unsupervised Multitask Learners, OpenAI Technical Report, San Francisco, USA, [Online], Available: https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf, 2019.

[10]
J. Devlin, M. W. Chang, K. Lee, K. Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, USA, pp. 4171–4186, 2019. DOI: 10.18653/v1/N19-1423.

[11]
C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Q. Zhou, W. Li, P. J. Liu. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, vol. 21, no. 1, Article number 140, 2020.

[12]
A. Radford, K. Narasimhan, T. Salimans, I. Sutskever. Improving Language Understanding By Generative Pre-training, OpenAI Technical Report, San Francisco, USA, [Online], Available: https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf, 2018.

[13]
T. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, D. Amodei. Language models are few-shot learners. In Proceedings of the 34th International Conference on Neural Information Processing Systems, Article number 159, 2020.

[14]
Z. L. Yang, Z. H. Dai, Y. M. Yang, J. Carbonell, R. Salakhutdinov, Q. V. Le. XLNet: Generalized autoregressive pretraining for language understanding. In Proceedings of the 33rd International Conference on Neural Information Processing Systems, Vancouver, Canada, Article number 517, 2019.

[15]
M. Lewis, Y. H. Liu, N. Goyal, M. Ghazvininejad, A. Mohamed, O. Levy, V. Stoyanov, L. Zettlemoyer. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 7871–7880, 2020. DOI: 10.18653/v1/2020.acl-main.703.

[16]
J. Kaplan, S. McCandlish, T. Henighan, T. B. Brown, B. Chess, R. Child, S. Gray, A. Radford, J. Wu, D. Amodei. Scaling laws for neural language models. [Online], Available: https://arxiv.org/abs/2001.08361, 2020.

[17]
Y. H. Liu, M. Ott, N. Goyal, J. F. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, V. Stoyanov. RoBERTa: A robustly optimized BERT pretraining approach. [Online], Available: https://arxiv.org/abs/1907.11692, 2019.

[18]
Z. Y. Zhang, Y. X. Gu, X. Han, S. Q. Chen, C. J. Xiao, Z. B. Sun, Y. Yao, F. C. Qi, J. Guan, P. Ke, Y. Z. Cai, G. Y. Zeng, Z. X. Tan, Z. Y. Liu, M. L. Huang, W. T. Han, Y. Liu, X. Y. Zhu, M. S. Sun. CPM-2: Large-scale cost-effective pre-trained language models. AI Open, vol. 2, pp. 216–224, 2021. DOI: 10.1016/j.aiopen.2021.12.003.

[19]
Z. Y. Zhang, X. Han, H. Zhou, P. Ke, Y. X. Gu, D. M. Ye, Y. J. Qin, Y. S. Su, H. Z. Ji, J. Guan, F. C. Qi, X. Z. Wang, Y. N. Zheng, G. Y. Zeng, H. Q. Cao, S. Q. Chen, D. X. Li, Z. B. Sun, Z. Y. Liu, M. L. Huang, W. T. Han, J. Tang, J. Z. Li, X. Y. Zhu, M. S. Sun. CPM: A large-scale generative Chinese pre-trained language model. AI Open, vol. 2, pp. 93–99, 2021. DOI: 10.1016/j.aiopen.2021.07.001.

[20]
W. Zeng, X. Z. Ren, T. Su, H. Wang, Y. Liao, Z. W. Wang, X. Jiang, Z. Z. Yang, K. S. Wang, X. D. Zhang, C. Li, Z. Y. Gong, Y. F. Yao, X. J. Huang, J. Wang, J. F. Yu, Q. Guo, Y. Yu, Y. Zhang, J. Wang, H. T. Tao, D. S. Yan, Z. X. Yi, F. Peng, F. Q. Jiang, H. Zhang, L. F. Deng, Y. H. Zhang, Z. Lin, C. Zhang, S. J. Zhang, M. Y. Guo, S. Z. Gu, G. J. Fan, Y. W. Wang, X. F. Jin, Q. Liu, Y. H. Tian. Pangu-α: Large-scale autoregressive pretrained Chinese language models with auto-parallel computation. [Online], Available: https://arxiv.org/abs/2104.12369, 2021.

[21]
S. H. Wu, X. D. Zhao, T. Yu, R. G. Zhang, C. Shen, H. L. Liu, F. Li, H. Zhu, J. G. Luo, L. Xu, X. W. Zhang. Yuan 1.0: Large-scale pre-trained language model in zero-shot and few-shot learning. [Online], Available: https://arxiv.org/abs/2110.04725, 2021.

[22]
Z. S. Zhang, H. Q. Zhang, K. M. Chen, Y. H. Guo, J. Y. Hua, Y. L. Wang, M. Zhou. Mengzi: Towards lightweight yet ingenious pre-trained models for Chinese. [Online], Available: https://arxiv.org/abs/2110.06696, 2021.

[23]
Y. F. Shao, Z. C. Geng, Y. T. Liu, J. Q. Dai, H. Yan, F. Yang, L. Zhe, H. J. Bao, X. P. Qiu. CPT: A pre-trained unbalanced transformer for both Chinese language understanding and generation. [Online], Available: https://arxiv.org/abs/2109.05729, 2021.

[24]
S. H. Wang, Y. Sun, Y. Xiang, Z. H. Wu, S. Y. Ding, W. B. Gong, S. K. Feng, J. Y. Shang, Y. B. Zhao, C. Pang, J. X. Liu, X. Y. Chen, Y. X. Lu, W. X. Liu, X. Wang, Y. F. Bai, Q. L. Chen, L. Zhao, S. Y. Li, P. Sun, D. H. Yu, Y. J. Ma, H. Tian, H. Wu, T. Wu, W. Zeng, G. Li, W. Gao, H. F. Wang. ERNIE 3.0 Titan: Exploring larger-scale knowledge enhanced pre-training for language understanding and generation. [Online], Available: https://arxiv.org/abs/2112.12731, 2021.

[25]
R. Thoppilan, D. De Freitas, J. Hall, N. Shazeer, A. Kulshreshtha, H. T. Cheng, A. Jin, T. Bos, L. Baker, Y. Du, Y. G. Li, H. Lee, H. S. Zheng, A. Ghafouri, M. Menegali, Y. P. Huang, M. Krikun, D. Lepikhin, J. Qin, D. H. Chen, Y. Z. Xu, Z. F. Chen, A. Roberts, M. Bosma, V. Zhao, Y. Q. Zhou, C.C. Chang, I. Krivokon, W. Rusch, M. Pickett, P. Srinivasan, L. Man, K. Meier-Hellstern, M. R. Morris, T. Doshi, R. D. Santos, T. Duke, J. Soraker, B. Zevenbergen, V. Prabhakaran, M. Diaz, B. Hutchinson, K. Olson, A. Molina, E. Hoffman-John, J. Lee, L. Aroyo, R. Rajakumar, A. Butryna, M. Lamm, V. Kuzmina, J. Fenton, A. Cohen, R. Bernstein, R. Kurzweil, B. Aguera-Arcas, C. Cui, M. Croak, E. Chi, Q. Le. LaMDA: Language models for dialog applications. [Online], Available: https://arxiv.org/abs/2201.08239, 2022.

[26]
S. Z. Zhang, E. Dinan, J. Urbanek, A. Szlam, D. Kiela, J. Weston. Personalizing dialogue agents: I have a dog, do you have pets too? In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Melbourne, Australia, pp. 2204–2213, 2018. DOI: 10.18653/v1/P18-1205.

[27]
E. Dinan, S. Roller, K. Shuster, A. Fan, M. Auli, J. Weston. Wizard of wikipedia: Knowledge-powered conversational agents. In Proceedings of the 7th International Conference on Learning Representations, New Orleans, USA, 2019.

[28]
H. Rashkin, E. M. Smith, M. Li, Y. L. Boureau. Towards empathetic open-domain conversation models: A new benchmark and dataset. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Florence, Italy, pp. 5370–5381, 2019. DOI: 10.18653/v1/P19-1534.

[29]
Y. Wang, P. Ke, Y. Zheng, K. Huang, Y. Jiang, X. Zhu, M. Huang. A large-scale Chinese short-text conversation dataset. In Proceedings of the 9th CCF International Conference on Natural Language Processing and Chinese Computing, Springer, Zhengzhou, China, pp. 91–103, 2020. DOI: 10.1007/978-3-030-60450-9_8.

[30]
S. Q. Bao, H. He, F. Wang, H. Wu, H. F. Wang. PLATO: Pre-trained dialogue generation model with discrete latent variable. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 85–96, 2020. DOI: 10.18653/v1/2020.acl-main.9.

[31]
M. Chen, R. X. Liu, L. Shen, S. Z. Yuan, J. Y. Zhou, Y. Z. Wu, X. D. He, B. W. Zhou. The JDDC corpus: A large-scale multi-turn Chinese dialogue dataset for E-commerce customer service. In Proceedings of the 12th Language Resources and Evaluation Conference, Marseille, France, pp. 459–466, 2020.

[32]
K. Lee, D. Ippolito, A. Nystrom, C. Y. Zhang, D. Eck, C. Callison-Burch, N. Carlini. Deduplicating training data makes language models better. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Dublin, Ireland, pp. 8424–8445, 2022. DOI: 10.18653/v1/2022.acl-long.577.

[33]
I. Sutskever, O. Vinyals, Q. V. Le. Sequence to sequence learning with neural networks. In Proceedings of the 27th International Conference on Neural Information Processing Systems, Montreal, Canada, pp. 3104–3112, 2014.

[34]
A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, USA, pp. 6000–6010, 2017.

[35]
S. Rajbhandari, J. Rasley, O. Ruwase, Y. X. He. ZeRO: Memory optimizations toward training trillion parameter models. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, IEEE, Atlanta, USA, Article number 20, 2020. DOI: 10.1109/SC41405.2020.00024.

[36]
J. Rasley, S. Rajbhandari, O. Ruwase, Y. X. He. DeepSpeed: System optimizations enable training deep learning models with over 100 billion parameters. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 3505–3506, 2020. DOI: 10.1145/3394486.3406703.

[37]
J. Kirkpatrick, R. Pascanu, N. Rabinowitz, J. Veness, G. Desjardins, A. A. Rusu, K. Milan, J. Quan, T. Ramalho, A. Grabska-Barwinska, D. Hassabis, C. Clopath, D. Kumaran, R. Hadsell. Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences of the United States of America, vol. 114, no. 13, pp. 3521–3526, 2017. DOI: 10.1073/pnas.1611835114.

[38]
S. Panigrahi, A. Nanda, T. Swarnkar. A survey on transfer learning. In Proceedings of International Conference on Innovative Computing and Communication 2019: Intelligent and Cloud Computing, Springer, Singapore, pp. 781–789, 2021. DOI: 10.1007/978-981-15-5971-6_83.

[39]
A. Holtzman, J. Buys, L. Du, M. Forbes, Y. Choi. The curious case of neural text degeneration. In Proceedings of the 8th International Conference on Learning Representations, Addis Ababa, Ethiopia, 2020.

[40]
A. Graves. Sequence transduction with recurrent neural networks. [Online], Available: https://arxiv.org/abs/1211.3711, 2012.

[41]
M. Li, J. Weston, S. Roller. ACUTE-EVAL: Improved dialogue evaluation with optimized questions and multi-turn comparisons. [Online], Available: https://arxiv.org/abs/1909.03087, 2019.

[42]
Y. X. Nie, M. Williamson, M. Bansal, D. Kiela, J. Weston. I like fish, especially dolphins: Addressing contradictions in dialogue modeling. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pp. 1699–1713, 2021. DOI: 10.18653/v1/2021.acl-long.134.

[43]
M. Li, S. Roller, I. Kulikov, S. Welleck, Y. L. Boureau, K. Cho, J. Weston. Don't say that! Making inconsistent dialogue unlikely with unlikelihood training. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 4715–4728, 2020. DOI: 10.18653/v1/2020.acl-main.428.

[44]
H. Zhou, C. J. Zheng, K. L. Huang, M. L. Huang, X. Y. Zhu. KdConv: A Chinese multi-domain dialogue dataset towards multi-turn knowledge-driven conversation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 7098–7108, 2020. DOI: 10.18653/v1/2020.acl-main.635.

[45]
P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N. Goyal, H. Küttler, M. Lewis, W. T. Yih, T. Rocktäschel, S. Riedel, D. Kiela. Retrieval-augmented generation for knowledge-intensive NLP tasks. In Proceedings of the 34th International Conference on Neural Information Processing Systems, Vancouver, Canada, Article number 793, 2020.

[46]
H. Sun, G. X. Xu, J. W. Deng, J. L. Cheng, C. J. Zheng, H. Zhou, N. Y. Peng, X. Y. Zhu, M. L. Huang. On the safety of conversational models: Taxonomy, dataset, and benchmark. In Proceedings of the Findings of the Association for Computational Linguistics, Dublin, Ireland, pp. 3906–3923, 2022. DOI: 10.18653/v1/2022.findings-acl.308.

[47]
P. Lison, J. Tiedemann. OpenSubtitles2016: Extracting large parallel corpora from movie and TV subtitles. In Proceedings of the 10th International Conference on Language Resources and Evaluation, Portorož, Slovenia, pp. 923–929, 2016.

[48]
J. Guan, Z. E. Feng, Y. M. Chen, R. L. He, X. X. Mao, C. J. Fan, M. L. Huang. LOT: A story-centric benchmark for evaluating Chinese long text understanding and generation. Transactions of the Association for Computational Linguistics, vol. 10, pp. 434–451, 2022. DOI: 10.1162/tacl_a_00469.

[49]
W. Q. Wu, Z. Guo, X. Y. Zhou, H. Wu, X. Y. Zhang, R. Z. Lian, H. F. Wang. Proactive human-machine conversation with explicit conversation goal. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, pp. 3794–3804, 2019. DOI: 10.18653/v1/P19-1369.

[50]
Z. M. Liu, H. F. Wang, Z. Y. Niu, H. Wu, W. X. Che, T. Liu. Towards conversational recommendation over multi-type dialogs. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 1036–1049, 2020. DOI: 10.18653/v1/2020.acl-main.98.

[51]
X. Y. Wang, C. Li, J. Q. Zhao, D. Yu. NaturalConv: A Chinese dialogue dataset towards multi-turn topic-driven conversation. In Proceedings of the 35th AAAI Conference on Artificial Intelligence, pp. 14006–14014, 2021. DOI: 10.1609/aaai.v35i16.17649.

[52]
D. P. Kingma, J. Ba. Adam: A method for stochastic optimization. [Online], Available: https://arxiv.org/abs/1412.6980, 2015.