Hubei Agricultural Sciences, 2022, Vol. 61, Issue (16): 186-192. doi: 10.14088/j.cnki.issn0439-8114.2022.16.036

• Information Engineering •

Research on question generation technology in the agricultural field based on the NEZHA-UniLM model

LI Fen, FEI Fan, PENG Lin

  1. College of Big Data, Yunnan Agricultural University, Kunming 650000, China
  • Received: 2022-06-17 Online: 2022-08-25 Published: 2022-09-14
  • Corresponding author: PENG Lin (1978-), female, from Dengzhou, Henan, associate professor; research interests: question answering systems and knowledge graphs; (e-mail) dapengjiao@163.com.
  • About the author: LI Fen (1996-), female, from Bijie, Guizhou, master's student; research interest: question generation; (phone) 13098501521 (e-mail) 1902556879@qq.com.
  • Funding:
    Major Science and Technology Special Program of Yunnan Province (202002AD080002)


Abstract: To address the lack of question-answer datasets in the agricultural domain and the fact that existing work mostly applies end-to-end models to the question generation task, a question generation dataset for the agricultural domain was constructed through data crawling, cleaning, filtering, and annotation, and question generation based on the NEZHA-UniLM pre-trained model was studied; to mitigate the cumulative error caused by exposure bias, adversarial training was introduced to generate perturbed samples. Compared with other baseline models, the NEZHA-UniLM model achieved a BLEU-4 of 0.383 0 and a Rouge-L of 0.583 9. Compared with the pre-trained model without adversarial training, BLEU-4 and Rouge-L improved by 0.068 9 and 0.113 8, respectively; compared with the baseline model NQG, they improved by 0.195 3 and 0.151 7, respectively. The results show that the model not only effectively alleviates low matching between generated questions and answers, missing or extra words in generated questions, and exposure bias, but also effectively improves the quality of the generated questions.

Keywords: natural language processing, NEZHA-UniLM pre-trained model, adversarial training, question generation
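The abstract credits adversarial training on perturbed samples with part of the score gains. The paper's exact method is not specified on this page; a common choice for BERT-family encoders such as NEZHA is FGM (Fast Gradient Method), which perturbs the word-embedding matrix in the direction of the loss gradient, scaled to a fixed norm. A minimal sketch of that perturbation step, with NumPy standing in for the framework's tensors:

```python
import numpy as np

def fgm_perturb(grad, epsilon=1.0):
    """FGM-style adversarial perturbation of an embedding:
    delta = epsilon * g / ||g||, i.e. a step of fixed L2 length epsilon
    in the gradient direction. In training, delta is added to the
    embedding for a second forward/backward pass, then removed."""
    norm = np.linalg.norm(grad)
    if norm == 0:
        return np.zeros_like(grad)
    return epsilon * grad / norm

# Toy gradient for a single token embedding: the perturbation keeps the
# gradient's direction but is rescaled to length epsilon.
g = np.array([3.0, 4.0])
delta = fgm_perturb(g, epsilon=0.5)
```

In a full training loop, the clean loss and the loss on the perturbed embedding are both backpropagated, which regularizes the model against small input shifts; the `epsilon` value and the choice of FGM itself are assumptions here, not details from the paper.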

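The reported Rouge-L scores are LCS-based F-measures between each generated question and its reference. As background for how that metric is computed (this sketch is not from the paper; the weight beta=1.2 follows the common convention from the original ROUGE definition):

```python
def lcs_len(a, b):
    """Length of the longest common subsequence of two token sequences,
    computed by standard dynamic programming."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def rouge_l(candidate, reference, beta=1.2):
    """Rouge-L: F-measure over LCS-based precision and recall."""
    lcs = lcs_len(candidate, reference)
    if lcs == 0:
        return 0.0
    p = lcs / len(candidate)   # precision: LCS / candidate length
    r = lcs / len(reference)   # recall: LCS / reference length
    return (1 + beta**2) * p * r / (r + beta**2 * p)

# Two questions differing in one token share an LCS of 3 out of 4 tokens.
score = rouge_l(["what", "crop", "is", "this"], ["what", "crop", "is", "that"])
```

BLEU-4, the other reported metric, instead averages modified 1- to 4-gram precisions with a brevity penalty, so it rewards exact n-gram overlap more strictly than the subsequence matching above.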
