HUBEI AGRICULTURAL SCIENCES ›› 2026, Vol. 65 ›› Issue (5): 179-186. doi: 10.14088/j.cnki.issn0439-8114.2026.05.028

• Agricultural Engineering •

Multimodal recognition of tobacco curing states by fusing Swin Transformer and LSTM

LUO Zheng-shan1, WANG Yue-qiu1, GAO Yi-qiong1,2   

  1. School of Management, Xi'an University of Architecture and Technology, Xi'an 710055, China;
  2. Longdong University, Qingyang 745000, Gansu, China
  • Received: 2026-02-19 Online: 2026-05-25 Published: 2026-05-26

Abstract: Accurate identification of tobacco curing states is crucial for improving tobacco quality and curing efficiency. To address the limitations of existing methods in global image feature extraction, temporal dependency modeling, and multimodal fusion, a recognition model was proposed that integrated Swin Transformer with a Long Short-Term Memory (LSTM) network and introduced a gated cross-modal attention mechanism. The model used Swin Transformer to capture both local and global visual features from tobacco leaf images, and LSTM to model sequential sensor data, such as dry- and wet-bulb temperatures and moisture content, and extract their long-term dependencies. The gated cross-modal attention mechanism then adaptively fused the visual features from images with the physical features from sensors, improving the accuracy and robustness of state identification. The proposed model achieved a state identification accuracy of 94.52% on the test set, outperforming the baseline models it was compared with, and ablation experiments verified the effectiveness of each module. In addition, the evolution of the stage probabilities output by the model was applied to the automatic prediction of the key process "turning point", yielding a mean absolute error of 0.47 h. These results provide reliable technical support for intelligent control and process decision-making in tobacco curing.
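
The abstract does not give the equations of the gated cross-modal attention mechanism, so the following is only a minimal sketch of one common form of the idea, not the authors' implementation. It assumes that a single image feature vector (standing in for the Swin Transformer output) acts as the query over a sequence of sensor hidden states (standing in for the LSTM outputs), and that a per-dimension sigmoid gate blends the image feature with the attended sensor context. The function names `cross_modal_attention` and `gated_fusion`, the gate parameters, and the toy numbers are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def cross_modal_attention(visual, sensor_states):
    """Scaled dot-product attention: the visual vector is the query,
    each sensor hidden state (one per time step) is both key and value."""
    scale = math.sqrt(len(visual))
    weights = softmax([dot(visual, h) / scale for h in sensor_states])
    context = [
        sum(w * h[i] for w, h in zip(weights, sensor_states))
        for i in range(len(visual))
    ]
    return context, weights


def gated_fusion(visual, context, gate_w_visual, gate_w_context, gate_bias):
    """Per-dimension gate g in (0, 1); fused = g * visual + (1 - g) * context.
    Assumption: a diagonal gate, chosen here only to keep the sketch short."""
    fused, gates = [], []
    for v, c, wv, wc, b in zip(visual, context, gate_w_visual,
                               gate_w_context, gate_bias):
        g = sigmoid(wv * v + wc * c + b)
        gates.append(g)
        fused.append(g * v + (1.0 - g) * c)
    return fused, gates


# Toy inputs (made up): a 3-d image feature and three sensor time steps.
visual = [0.2, -0.1, 0.5]
sensor_states = [[0.1, 0.0, 0.3], [0.4, -0.2, 0.6], [0.0, 0.1, -0.2]]

context, weights = cross_modal_attention(visual, sensor_states)
fused, gates = gated_fusion(visual, context,
                            gate_w_visual=[1.0, 1.0, 1.0],
                            gate_w_context=[-1.0, -1.0, -1.0],
                            gate_bias=[0.0, 0.0, 0.0])
print("attention weights:", weights)
print("gates:", gates)
print("fused feature:", fused)
```

In a trained model the gate parameters would be learned, so the balance between image and sensor information could shift as the input changes. In this sketch they are fixed by hand purely to show the data flow.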

Key words: tobacco curing, state recognition, Swin Transformer, LSTM, feature fusion
