Hubei Agricultural Sciences ›› 2022, Vol. 61 ›› Issue (22): 169-173. doi: 10.14088/j.cnki.issn0439-8114.2022.22.030

• Information Engineering •

Research on data mining of big data of targeted poverty alleviation based on Hadoop

ZHANG Xiang-rong

  1. Shangluo Vocational and Technical College, Shangluo 726000, Shaanxi, China
  • Received: 2022-03-08 Online: 2022-11-25 Published: 2023-01-11
  • About the author: ZHANG Xiang-rong (1975-), female, from Shangluo, Shaanxi; associate professor, master's degree; mainly engaged in research on agricultural information technology. (Tel.) 13991449608, (E-mail) y.g.chen@foxmail.com.
  • Supported by:
    Shaanxi Province 2018 college-level key project (SLZY2018005)

Abstract: A distributed retrieval architecture for poverty-stricken households based on Hadoop is proposed. It combines feature-item extraction with text clustering to aggregate similar texts, builds the corresponding text feature vector space according to the required query precision, and filters out weakly related data so that it does not take part in the search, thereby improving the system's execution efficiency and reducing execution time. The results show that, compared with full-node traversal retrieval, the proposed retrieval algorithm achieves higher recall and precision while reducing the number of data sources accessed and saving overall computing and network resources. The algorithm can also be tuned through its parameter settings to suit different application scenarios, and has considerable value for wider application.
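The filtering idea in the abstract (skip data sources whose feature vector is weakly related to the query, then search only the rest) can be illustrated with a minimal single-process sketch. This is not the paper's implementation and does not use Hadoop: the node names, the whitespace tokenizer, the raw term-count vectors, the cosine measure and the `threshold` parameter are all assumptions chosen for illustration. Real Chinese text would need a word segmenter, and a real deployment would distribute the per-node work.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: whitespace tokenization stands in for real feature-item extraction.
    return text.lower().split()


def term_vector(texts):
    # Aggregate term counts over all texts on one node into a single profile vector.
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_nodes(query, profiles, threshold):
    # Keep only nodes whose profile is similar enough to the query.
    q = Counter(tokenize(query))
    return [name for name, profile in profiles.items()
            if cosine(q, profile) >= threshold]


def search(query, nodes, threshold=0.1):
    # Build one profile per node, filter out weakly related nodes,
    # then score and rank documents on the remaining nodes only.
    profiles = {name: term_vector(docs) for name, docs in nodes.items()}
    selected = select_nodes(query, profiles, threshold)
    q = Counter(tokenize(query))
    hits = []
    for name in selected:
        for doc in nodes[name]:
            score = cosine(q, Counter(tokenize(doc)))
            if score > 0:
                hits.append((score, name, doc))
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return selected, hits


if __name__ == "__main__":
    # Hypothetical data: two nodes holding unrelated kinds of records.
    nodes = {
        "node_a": ["household income low crop failure",
                   "household income subsidy rural"],
        "node_b": ["school enrollment student dropout",
                   "student scholarship education"],
    }
    selected, hits = search("household income", nodes)
    print(selected)
    for score, name, doc in hits:
        print(round(score, 3), name, doc)
```

Raising `threshold` trades recall for fewer nodes visited, which is one way to read the abstract's remark that the vector space is built according to the required query precision.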

Key words: Hadoop, big data search, data mining, characteristic item, sorting result sets

CLC number: