Hubei Agricultural Sciences ›› 2026, Vol. 65 ›› Issue (5): 179-186. doi: 10.14088/j.cnki.issn0439-8114.2026.05.028

• Agricultural Engineering •

Multimodal recognition of tobacco curing states by fusing Swin Transformer and LSTM

LUO Zheng-shan1, WANG Yue-qiu1, GAO Yi-qiong1,2

  1. School of Management, Xi'an University of Architecture and Technology, Xi'an 710055, China;
  2. Longdong University, Qingyang 745000, Gansu, China
  • Received: 2026-02-19 Published: 2026-05-25 Online: 2026-05-26
  • Corresponding author: WANG Yue-qiu (1999-), female, from Xi'an, Shaanxi, master's degree; her research covers intelligent perception, state prediction, and process decision optimization for tobacco curing. E-mail: 754589662@qq.com.
  • About the author: LUO Zheng-shan (1969-), male, from Hanzhong, Shaanxi, PhD, professor; his research covers corrosion protection, risk assessment, and modeling of oil and gas pipelines.
  • Funding:
    2023 Science and Technology Project of the Tongchuan Municipal Company, Shaanxi Provincial Tobacco Company (TCYCKJ-2023-01)

Abstract: Accurate recognition of tobacco curing states is key to improving tobacco quality and curing efficiency. To address the shortcomings of existing methods in global image feature extraction, temporal dependency modeling, and multimodal fusion, a recognition model is proposed that combines a Swin Transformer with a long short-term memory (LSTM) network and introduces a gated cross-modal attention mechanism. The Swin Transformer captures local and global visual features of tobacco leaf images, while the LSTM models time-series sensor data, such as dry-bulb and wet-bulb temperatures and moisture content, to capture their long-term dependencies. On this basis, the gated cross-modal attention mechanism dynamically and adaptively fuses the visual features from images with the physical features from sensors, improving the accuracy and robustness of state recognition. The proposed model reached a recognition accuracy of 94.52% on the test set, outperforming several baseline models, and ablation experiments further verified the contribution of each module. In addition, the evolution of the stage probabilities output by the model was used to automatically predict the key process "turning point" (转火点), with a mean absolute error of 0.47 h, providing reliable technical support for intelligent control and process decision-making in tobacco curing.

Key words: tobacco curing, state recognition, Swin Transformer, LSTM, feature fusion
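
The abstract names a gated cross-modal attention mechanism but does not give its equations. The following is a minimal sketch of one common formulation, not the authors' implementation. It assumes that the sensor feature vector (a stand-in for the LSTM output) acts as the query over a set of image tokens (stand-ins for Swin Transformer outputs), and that a sigmoid gate then mixes the attended image context with the sensor vector. The function name `gated_cross_modal_fusion`, the parameters `gate_weights` and `gate_bias`, and all numeric inputs are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_cross_modal_fusion(image_tokens, sensor_vec, gate_weights, gate_bias):
    """Fuse image tokens with one sensor feature vector (illustrative only).

    image_tokens: list of d-dimensional vectors (assumed image features)
    sensor_vec:   d-dimensional vector (assumed sensor sequence summary)
    gate_weights: d rows of length 2*d, gate_bias: length d (hypothetical
                  parameters that would be learned in a real model)
    """
    d = len(sensor_vec)
    # Cross-modal attention: the sensor vector queries the image tokens.
    scores = [dot(sensor_vec, tok) / math.sqrt(d) for tok in image_tokens]
    weights = softmax(scores)
    context = [sum(w * tok[j] for w, tok in zip(weights, image_tokens))
               for j in range(d)]
    # Gate: a per-dimension mixing coefficient computed from both modalities.
    joint = context + sensor_vec  # list concatenation, length 2*d
    gate = [sigmoid(dot(row, joint) + b)
            for row, b in zip(gate_weights, gate_bias)]
    fused = [g * c + (1.0 - g) * s
             for g, c, s in zip(gate, context, sensor_vec)]
    return fused, weights, gate


# Toy inputs: random numbers, not data from the paper.
random.seed(0)
d = 4
image_tokens = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(6)]
sensor_vec = [random.uniform(-1, 1) for _ in range(d)]
gate_weights = [[random.uniform(-0.5, 0.5) for _ in range(2 * d)]
                for _ in range(d)]
gate_bias = [0.0] * d

fused, attn, gate = gated_cross_modal_fusion(
    image_tokens, sensor_vec, gate_weights, gate_bias)
print("attention weights:", attn)
print("gate values:", gate)
print("fused feature:", fused)
```

In this sketch a gate value near 1 lets the image context dominate a given feature dimension, and a value near 0 lets the sensor feature dominate, which is one way to read the "dynamic, adaptive" fusion described in the abstract.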

CLC Number:
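
The abstract reports that the evolution of the stage probabilities is used to predict the "turning point", with a mean absolute error (MAE) of 0.47 h, but it does not state the decision rule. The sketch below illustrates one plausible rule under a stated assumption: the turning point is taken as the first moment at which the predicted probability of the next curing stage overtakes that of the current stage, located by linear interpolation between samples. The function names and all numbers are hypothetical and are not taken from the paper.

```python
def crossover_time(times, p_current, p_next):
    """Return the first time at which p_next overtakes p_current, or None.

    times:     sample times in hours, strictly increasing
    p_current: predicted probability of the current stage at each time
    p_next:    predicted probability of the next stage at each time
    The crossing is located by linear interpolation between the two
    samples that bracket it (an assumed rule, not the authors' method).
    """
    for i in range(1, len(times)):
        d_prev = p_next[i - 1] - p_current[i - 1]
        d_curr = p_next[i] - p_current[i]
        if d_prev < 0 <= d_curr:
            frac = -d_prev / (d_curr - d_prev)
            return times[i - 1] + frac * (times[i] - times[i - 1])
    return None


def mean_absolute_error(predicted, actual):
    """Mean of the absolute differences between paired values."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Toy probability curves sampled every 2 h (made-up values).
times = [0.0, 2.0, 4.0]
p_current = [0.75, 0.75, 0.25]
p_next = [0.25, 0.25, 0.75]
predicted_turn = crossover_time(times, p_current, p_next)
print("predicted turning point (h):", predicted_turn)

# MAE over two hypothetical batches: predicted versus recorded times.
mae = mean_absolute_error([predicted_turn, 5.0], [2.5, 5.5])
print("MAE (h):", mae)
```

Other rules, such as a fixed probability threshold or a smoothed slope test, would fit the same interface. The crossover rule is used here only because it needs no tuning parameter.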