Hubei Agricultural Sciences, 2022, Vol. 61, Issue (16): 186-192. doi: 10.14088/j.cnki.issn0439-8114.2022.16.036

• Information Engineering •

Research on question generation technology in the agricultural field based on the NEZHA-UniLM model

LI Fen, FEI Fan, PENG Lin

  1. College of Big Data, Yunnan Agricultural University, Kunming 650000, China
  • Received: 2022-06-17 Online: 2022-08-25 Published: 2022-09-14
  • Corresponding author: PENG Lin (1978-), female, from Dengzhou, Henan, associate professor; research interests: question answering systems and knowledge graphs; (e-mail) dapengjiao@163.com.
  • About the author: LI Fen (1996-), female, from Bijie, Guizhou, master's student; research interest: question generation; (phone) 13098501521 (e-mail) 1902556879@qq.com.
  • Funding:
    Major Science and Technology Special Program of Yunnan Province (202002AD080002)


Abstract: To address the lack of question-answer datasets in the agricultural domain and the fact that existing work mostly applies end-to-end models to the question generation task, a question generation dataset for the agricultural domain was constructed through data crawling, cleaning, filtering, and annotation, and question generation based on the NEZHA-UniLM pre-trained model was studied; to mitigate the cumulative error caused by exposure bias, adversarial training was introduced to generate perturbed samples. Compared with other baseline models, the NEZHA-UniLM model achieved a BLEU-4 of 0.383 0 and a Rouge-L of 0.583 9. Compared with the pre-trained model without adversarial training, BLEU-4 and Rouge-L improved by 0.068 9 and 0.113 8, respectively; compared with the baseline model NQG, they improved by 0.195 3 and 0.151 7, respectively. The results show that the model not only effectively alleviates low matching between generated questions and answers, missing or extra words in generated questions, and exposure bias, but also effectively improves the quality of the generated questions.

Keywords: natural language processing, NEZHA-UniLM pre-trained model, adversarial training, question generation
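The abstract credits adversarial training on perturbed samples with part of the score gains. The paper's exact method is not specified on this page; a common choice for BERT-family encoders such as NEZHA is FGM (Fast Gradient Method), which perturbs the word-embedding matrix in the direction of the loss gradient, scaled to a fixed norm. A minimal sketch of that perturbation step, with NumPy standing in for the framework's tensors:

```python
import numpy as np

def fgm_perturb(grad, epsilon=1.0):
    """FGM-style adversarial perturbation of an embedding:
    delta = epsilon * g / ||g||, i.e. a step of fixed L2 length epsilon
    in the gradient direction. In training, delta is added to the
    embedding for a second forward/backward pass, then removed."""
    norm = np.linalg.norm(grad)
    if norm == 0:
        return np.zeros_like(grad)
    return epsilon * grad / norm

# Toy gradient for a single token embedding: the perturbation keeps the
# gradient's direction but is rescaled to length epsilon.
g = np.array([3.0, 4.0])
delta = fgm_perturb(g, epsilon=0.5)
```

In a full training loop, the clean loss and the loss on the perturbed embedding are both backpropagated, which regularizes the model against small input shifts; the `epsilon` value and the choice of FGM itself are assumptions here, not details from the paper.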

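The reported Rouge-L scores are LCS-based F-measures between each generated question and its reference. As background for how that metric is computed (this sketch is not from the paper; the weight beta=1.2 follows the common convention from the original ROUGE definition):

```python
def lcs_len(a, b):
    """Length of the longest common subsequence of two token sequences,
    computed by standard dynamic programming."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def rouge_l(candidate, reference, beta=1.2):
    """Rouge-L: F-measure over LCS-based precision and recall."""
    lcs = lcs_len(candidate, reference)
    if lcs == 0:
        return 0.0
    p = lcs / len(candidate)   # precision: LCS / candidate length
    r = lcs / len(reference)   # recall: LCS / reference length
    return (1 + beta**2) * p * r / (r + beta**2 * p)

# Two questions differing in one token share an LCS of 3 out of 4 tokens.
score = rouge_l(["what", "crop", "is", "this"], ["what", "crop", "is", "that"])
```

BLEU-4, the other reported metric, instead averages modified 1- to 4-gram precisions with a brevity penalty, so it rewards exact n-gram overlap more strictly than the subsequence matching above.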
