
Instance segmentation and Stacking ensemble learning-based method for estimating residual feed in rabbit cage feeders

Abstract: To address the heavy labor burden of manual inspection in intensive meat rabbit farming, the high cost of existing 3D vision monitoring, and the slow edge-side inference and insufficient regression accuracy of conventional 2D deep learning algorithms, this study designed and validated a residual feed estimation method for embedded terminals. An autonomous inspection robot platform equipped with an industrial camera was built, and a two-stage decoupled "segmentation-regression" architecture was adopted. In the first stage, an improved YOLOv8-seg instance segmentation model was used; its native C2f module and decoupled head structure improve inference speed while preserving the segmentation accuracy of feed edge contours. In the second stage, geometric features of the segmented region, including area, perimeter, and the length and width of the minimum bounding rectangle, were extracted to construct a Stacking ensemble regression model based on XGBoost, Random Forest, and a BP neural network, establishing a robust mapping between 2D image features and 3D feed mass. Finally, the algorithm was quantized and accelerated with TensorRT and deployed on an NVIDIA Jetson Xavier NX edge computing platform. Experimental results showed that the YOLOv8-seg model achieved an average precision (AP50) of 0.995 and a mean pixel accuracy (MPA) of 0.9695 in the segmentation task, and that the Stacking ensemble model achieved a mean absolute error (MAE) of 2.3458 g and a coefficient of determination (R2) of 0.9904 in mass estimation. After quantization and acceleration, the edge device reached an inference speed of 29.2 frames/s while retaining FP32 image precision, meeting the real-time monitoring requirement of the inspection robot traveling at 0.2 m/s. This study validated the effectiveness of combining instance segmentation with ensemble learning for resolving the "feature ambiguity" problem in unstructured agricultural scenes, providing reliable technical support for the engineering application of precision feeding systems for meat rabbits.

     

    Abstract: Manual inspection of residual feed levels in intensive meat rabbit farming was historically labor-intensive and prone to human error, while emerging 3D vision monitoring solutions remained prohibitively expensive and fragile in harsh breeding environments. Furthermore, traditional 2D deep learning algorithms often struggled to balance inference speed with regression accuracy on resource-constrained edge devices, failing to address the strong nonlinearity between 2D projected features and 3D feed weight. To address these challenges, this study aimed to develop and validate a robust, low-cost, and real-time residual feed estimation system deployed on an autonomous inspection robot. The research methodology adopted a "segmentation-regression" decoupled two-stage architecture designed to resolve the conflict between visual perception complexity and limited edge computing power. In the first stage, the YOLOv8-seg instance segmentation model was employed to rapidly and accurately extract the feed region from the complex background of the rabbit cage. This model was selected for its advanced single-stage anchor-free architecture, which utilized the C2f module to optimize gradient flow and a decoupled head structure to separate classification from localization, thereby ensuring high-precision segmentation of irregular feed boundaries without the computational overhead of two-stage models like Mask R-CNN. In the second stage, rather than relying solely on pixel area, the study extracted four key geometric features—projection area (S), contour perimeter (Z), and the length (L) and width (K) of the minimum bounding rectangle—to construct a multidimensional feature vector. These specific features were chosen to physically constrain the ambiguity of 2D-to-3D mapping caused by the inclined side walls of the feeders and the irregular "crater-like" accumulation shapes formed by rabbit foraging behaviors. 
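As a concrete illustration of the second-stage feature extraction, the sketch below derives the four geometric features (S, Z, L, K) from a binary segmentation mask. It is a minimal NumPy approximation, not the paper's implementation: the perimeter is estimated by counting boundary pixels rather than tracing a sub-pixel contour, and an axis-aligned bounding box stands in for the minimum-area rotated rectangle.

```python
import numpy as np

def extract_geometric_features(mask: np.ndarray) -> dict:
    """Approximate the paper's four geometric features from a binary mask
    (1 = feed pixel): projection area S, contour perimeter Z, and the
    length L and width K of the bounding rectangle (axis-aligned here)."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return {"S": 0.0, "Z": 0.0, "L": 0.0, "K": 0.0}
    area = float(mask.sum())  # S: foreground pixel count
    # Z approximated as the number of foreground pixels that touch at least
    # one background (or image-border) pixel in the 4-neighbourhood.
    padded = np.pad(mask, 1)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1]
                & padded[1:-1, :-2] & padded[1:-1, 2:])
    perimeter = float(mask.sum() - (mask & interior).sum())
    h = float(ys.max() - ys.min() + 1)
    w = float(xs.max() - xs.min() + 1)
    return {"S": area, "Z": perimeter, "L": max(h, w), "K": min(h, w)}

# Example: a 4x6 solid rectangle of feed pixels
mask = np.zeros((8, 10), dtype=np.uint8)
mask[2:6, 2:8] = 1
feats = extract_geometric_features(mask)
print(feats)  # S = 24, Z = 16 (boundary pixels), L = 6, K = 4
```

In practice these measurements would be taken with contour operations on the model's predicted mask; the point of the four-dimensional vector is that Z, L, and K constrain depth-related shape variation that the area S alone cannot resolve.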
To map these geometric features to weight, a Stacking ensemble regression model was constructed, integrating three heterogeneous base learners: eXtreme Gradient Boosting (XGBoost), Random Forest (RF), and Back Propagation Neural Network (BPNN). A linear regression meta-learner was utilized to combine the predictions of these base models, effectively leveraging the inductive bias of tree-based models for tabular data while mitigating the overfitting risks associated with single neural networks. The final algorithm was optimized using TensorRT and deployed on an NVIDIA Jetson Xavier NX embedded platform, where a rigorous trade-off analysis between FP32 and INT8 quantization modes was conducted to ensure geometric fidelity. Experimental results indicated that the proposed system achieved exceptional performance on the constructed dataset. The YOLOv8-seg model attained an Average Precision (AP50) of 0.995 and a Mean Pixel Accuracy (MPA) of 0.9695, demonstrating its capability to capture fine-grained edge details essential for feature extraction. In the regression task, the Stacking ensemble model achieved a Mean Absolute Error (MAE) of 2.3458 g and a coefficient of determination (R2) of 0.9904. A rigorous internal baseline comparison revealed that the Stacking strategy reduced the MAE by 53.2% compared to a single BPNN model under identical experimental conditions, proving the effectiveness of the ensemble strategy in handling small-sample geometric regression tasks. Regarding system deployment, tests showed that while INT8 quantization offered higher speeds, it introduced "sawtooth" edge noise that degraded regression accuracy; conversely, the FP32 precision mode achieved an inference speed of 29.2 frames per second (FPS). This speed provided nearly three-fold computational redundancy relative to the 10 FPS requirement of the inspection robot traveling at 0.2 m/s, successfully realizing "on-the-fly" detection.
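The Stacking design described above can be wired up with scikit-learn's StackingRegressor. The snippet below is an illustrative stand-in, not the paper's code: GradientBoostingRegressor replaces XGBoost to keep the example dependency-free, the [S, Z, L, K] feature data are synthetic, and all hyperparameters are placeholders.

```python
import numpy as np
from sklearn.ensemble import (GradientBoostingRegressor, RandomForestRegressor,
                              StackingRegressor)
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the paper's data: each row is the feature vector
# [S, Z, L, K] (projection area, perimeter, bounding-rectangle length/width);
# the target is residual feed mass in grams, with an L*K interaction term so
# that the mapping is nonlinear, as described in the abstract.
rng = np.random.default_rng(0)
X = rng.uniform([100, 40, 10, 5], [2000, 300, 60, 40], size=(300, 4))
y = 0.05 * X[:, 0] + 0.1 * X[:, 2] * X[:, 3] + rng.normal(0.0, 2.0, 300)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

stack = StackingRegressor(
    estimators=[
        # GradientBoostingRegressor stands in for XGBoost in this sketch.
        ("gbdt", GradientBoostingRegressor(random_state=0)),
        ("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
        # A small MLP plays the role of the BP neural network base learner;
        # features are standardized so the network trains stably.
        ("bpnn", make_pipeline(StandardScaler(),
                               MLPRegressor(hidden_layer_sizes=(32,),
                                            max_iter=2000, random_state=0))),
    ],
    final_estimator=LinearRegression(),  # linear meta-learner, as in the paper
    cv=5,  # meta-learner trains on out-of-fold base predictions
)
stack.fit(X_tr, y_tr)
pred = stack.predict(X_te)
print(f"test R^2 = {stack.score(X_te, y_te):.3f}")
```

The cross-validated stacking (cv=5) is what lets the linear meta-learner weight the three heterogeneous base models without overfitting to their in-sample errors; on the paper's real features this combination is what yields the reported 53.2% MAE reduction over a single BPNN, while the synthetic numbers here only demonstrate the wiring.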
In conclusion, this study validates a lightweight, decoupled machine vision framework that effectively overcomes the inherent ambiguity of 2D features through feature engineering and ensemble learning. By unifying high precision, strong robustness, and low hardware costs, the proposed system offers a theoretically grounded and practically viable technical solution for the engineering application of precision feeding systems in standardized meat rabbit farming.

     
