A non-contact estimation method for feed residue based on dual-modal MobileViTv2

doi:10.14088/j.cnki.issn0439-8114.2026.02.030

Abstract

Traditional feed residue detection methods rely on contact sensors, are costly, and require modification of feeding troughs. To address these problems, a lightweight convolutional fusion regression model based on dual-modal MobileViTv2 (dual-modal MobileViTv2 + CMFIM + SE) was proposed for non-contact, high-precision automatic estimation of feed residue. Taking RGB images and depth images as input, the model extracted multi-scale features from each modality through the dual-modal MobileViTv2 and introduced a cross-modal multi-scale feature interaction module (CMFIM) at four levels to achieve spatial-channel dual interaction between RGB and depth features. A squeeze-and-excitation (SE) module was employed to adaptively calibrate channel weights and enhance high-level semantic representation capability. The prediction was output through a multilayer perceptron regression head. On the self-built dataset, the mean absolute error (MAE) and root mean square error (RMSE) of the dual-modal MobileViTv2 + CMFIM + SE model were 98.24 g and 140.21 g, respectively, which were 21.65% and 16.73% lower than those of the dual-modal MobileViTv2 model without the CMFIM and SE modules, and the model had only 9.9×10⁶ parameters. The model combined high accuracy, strong robustness, and a lightweight design, providing a feasible technical pathway for precision feeding in intelligent livestock farming.
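The two error metrics reported in the abstract, and the relative reduction used to compare models, can be illustrated with a minimal Python sketch. The function names and the sample weights below are hypothetical and chosen only for illustration; they are not taken from the paper's dataset or code.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error between measured and predicted weights (g)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted weights (g)."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))

def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline

# Hypothetical feed residue weights in grams (illustrative values only).
measured = [1200.0, 850.0, 430.0, 2100.0]
predicted = [1100.0, 900.0, 500.0, 2000.0]

print(f"MAE  = {mae(measured, predicted):.2f} g")
print(f"RMSE = {rmse(measured, predicted):.2f} g")
```

Because RMSE squares each error before averaging, it weights large errors more heavily than MAE does, so RMSE is never smaller than MAE on the same predictions.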

Key words: dual-modal MobileViTv2, feed residue, non-contact estimation, RGB images, depth images

CLC Number: TP391

CAI Xiao-jin, BAI Tao, LI Xiang, QIAO Rui-qiang. A non-contact estimation method for feed residue based on dual-modal MobileViTv2[J]. HUBEI AGRICULTURAL SCIENCES, 2026, 65(2): 202-208.
