Abstract:
Manual inspection of residual feed levels in intensive meat rabbit farming was historically labor-intensive and prone to human error, while emerging 3D vision monitoring solutions remained prohibitively expensive and fragile in harsh breeding environments. Furthermore, traditional 2D deep learning algorithms often struggled to balance inference speed with regression accuracy on resource-constrained edge devices, failing to address the strong nonlinearity between 2D projected features and 3D feed weight. To address these challenges, this study aimed to develop and validate a robust, low-cost, real-time residual feed estimation system deployed on an autonomous inspection robot. The methodology adopted a decoupled two-stage "segmentation-regression" architecture designed to resolve the conflict between visual perception complexity and limited edge computing power. In the first stage, the YOLOv8-seg instance segmentation model was employed to rapidly and accurately extract the feed region from the complex background of the rabbit cage. This model was selected for its single-stage anchor-free architecture, which uses the C2f module to optimize gradient flow and a decoupled head to separate classification from localization, thereby ensuring high-precision segmentation of irregular feed boundaries without the computational overhead of two-stage models such as Mask R-CNN. In the second stage, rather than relying solely on pixel area, the study extracted four key geometric features, namely projection area (S), contour perimeter (Z), and the length (L) and width (K) of the minimum bounding rectangle, to construct a multidimensional feature vector. These features were chosen to physically constrain the ambiguity of the 2D-to-3D mapping caused by the inclined side walls of the feeders and the irregular "crater-like" accumulation shapes formed by rabbit foraging behavior. To map these geometric features to weight, a Stacking ensemble regression model was constructed, integrating three heterogeneous base learners: eXtreme Gradient Boosting (XGBoost), Random Forest (RF), and Back Propagation Neural Network (BPNN). A linear regression meta-learner combined the predictions of these base models, leveraging the inductive bias of tree-based models for tabular data while mitigating the overfitting risk of a single neural network. The final algorithm was optimized using TensorRT and deployed on an NVIDIA Jetson Xavier NX embedded platform, where a trade-off analysis between FP32 and INT8 quantization modes was conducted to ensure geometric fidelity. Experimental results indicated that the proposed system achieved high accuracy on the constructed dataset. The YOLOv8-seg model attained an Average Precision (AP50) of 0.995 and a Mean Pixel Accuracy (MPA) of 0.9695, demonstrating its capability to capture the fine-grained edge details essential for feature extraction. In the regression task, the Stacking ensemble model achieved a Mean Absolute Error (MAE) of 2.3458 g and a coefficient of determination (R²) of 0.9904. An internal baseline comparison revealed that the Stacking strategy reduced the MAE by 53.2% compared to a single BPNN model under identical experimental conditions, demonstrating the effectiveness of the ensemble strategy in small-sample geometric regression tasks. Deployment tests showed that although INT8 quantization offered higher speed, it introduced "sawtooth" edge noise that degraded regression accuracy; the FP32 precision mode, by contrast, achieved an inference speed of 29.2 frames per second (FPS). This speed provided a nearly three-fold computational redundancy relative to the 10 FPS requirement of the inspection robot traveling at 0.2 m/s, successfully realizing "on-the-fly" detection. In conclusion, this study validates a lightweight, decoupled machine vision framework that overcomes the inherent ambiguity of 2D features through feature engineering and ensemble learning. By unifying high precision, strong robustness, and low hardware cost, the proposed system offers a theoretically grounded and practically viable technical solution for the engineering application of precision feeding systems in standardized meat rabbit farming.
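
As an illustration of the second-stage feature vector named in the abstract (projection area S, contour perimeter Z, and the length L and width K of the minimum bounding rectangle), the following is a minimal pure-Python sketch. It assumes the segmentation mask has already been reduced to a closed, non-empty polygon contour in pixel coordinates; the function names are hypothetical and the abstract does not state how the authors computed these quantities.

```python
import math


def polygon_area(pts):
    """Shoelace formula: area enclosed by a closed polygon (pixel^2)."""
    n = len(pts)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = pts[i]
        x2, y2 = pts[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def polygon_perimeter(pts):
    """Sum of edge lengths of the closed polygon (pixels)."""
    n = len(pts)
    return sum(math.dist(pts[i], pts[(i + 1) % n]) for i in range(n))


def convex_hull(pts):
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(pts))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def min_bounding_rect(pts):
    """(length, width) of the minimum-area bounding rectangle.

    One side of the optimal rectangle is collinear with a hull edge,
    so it suffices to test the direction of every hull edge.
    """
    hull = convex_hull(pts)
    best = None
    for i in range(len(hull)):
        x1, y1 = hull[i]
        x2, y2 = hull[(i + 1) % len(hull)]
        theta = math.atan2(y2 - y1, x2 - x1)
        c, s = math.cos(theta), math.sin(theta)
        # Project hull points onto the edge direction and its normal.
        us = [x * c + y * s for x, y in hull]
        vs = [-x * s + y * c for x, y in hull]
        w, h = max(us) - min(us), max(vs) - min(vs)
        if best is None or w * h < best[0] * best[1]:
            best = (w, h)
    return max(best), min(best)


def feature_vector(contour):
    """Return (S, Z, L, K) for a closed contour given as (x, y) pairs."""
    pts = [tuple(p) for p in contour]
    length, width = min_bounding_rect(pts)
    return polygon_area(pts), polygon_perimeter(pts), length, width
```

For an axis-aligned 4 x 2 pixel rectangle, `feature_vector([(0, 0), (4, 0), (4, 2), (0, 2)])` yields an area of 8, a perimeter of 12, and a bounding rectangle of 4 by 2. In practice an image-processing library would likely supply these measurements directly from the mask.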
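
The Stacking structure described in the abstract (three heterogeneous base learners combined by a linear regression meta-learner) can be sketched with scikit-learn as follows. This is an illustrative sketch under several assumptions, not the authors' implementation: `GradientBoostingRegressor` stands in for XGBoost, `MLPRegressor` stands in for the BPNN, all hyperparameters are arbitrary, and the data are synthetic values generated only to make the example runnable.

```python
import numpy as np
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_synthetic_data(n=300, seed=0):
    """Synthetic (S, Z, L, K) -> weight data; NOT the paper's dataset."""
    rng = np.random.default_rng(seed)
    length = rng.uniform(50, 200, n)                  # L, pixels
    width = rng.uniform(30, 120, n)                   # K, pixels
    area = rng.uniform(0.6, 0.9, n) * length * width  # S, pixel^2
    perimeter = 2 * (length + width) * rng.uniform(0.9, 1.1, n)  # Z
    # Assumed nonlinear area-to-weight relation plus measurement noise.
    weight = 1e-4 * area ** 1.3 + rng.normal(0, 0.5, n)
    return np.column_stack([area, perimeter, length, width]), weight


def build_stacking_model(seed=0):
    """Three base learners whose out-of-fold predictions feed a linear meta-learner."""
    base_learners = [
        ("gbr", GradientBoostingRegressor(random_state=seed)),  # XGBoost stand-in
        ("rf", RandomForestRegressor(n_estimators=100, random_state=seed)),
        ("mlp", make_pipeline(                                   # BPNN stand-in
            StandardScaler(),
            MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000,
                         random_state=seed),
        )),
    ]
    return StackingRegressor(
        estimators=base_learners,
        final_estimator=LinearRegression(),
        cv=5,  # meta-learner is trained on 5-fold out-of-fold predictions
    )


def evaluate_stacking(seed=0):
    """Fit on a train split and return (MAE, R^2) on the held-out split."""
    X, y = make_synthetic_data(seed=seed)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    model = build_stacking_model(seed).fit(X_tr, y_tr)
    pred = model.predict(X_te)
    return mean_absolute_error(y_te, pred), r2_score(y_te, pred)
```

Training the meta-learner on out-of-fold predictions, rather than on predictions for data the base learners have already seen, is what keeps the linear combination from simply rewarding whichever base model overfits most. Scores from this synthetic example say nothing about the MAE and R² reported in the abstract.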
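
The deployment and baseline figures quoted in the abstract can be cross-checked with simple arithmetic. The sketch below uses only the numbers stated above (29.2 FPS, 10 FPS, 0.2 m/s, 2.3458 g, 53.2%); the helper names are hypothetical, and the single-BPNN MAE is an implied value derived from the rounded reduction percentage, not a figure reported in the abstract.

```python
def frame_spacing_m(speed_m_s, fps):
    """Distance the robot travels between consecutive processed frames."""
    return speed_m_s / fps


def redundancy(measured_fps, required_fps):
    """Ratio of achieved throughput to the required throughput."""
    return measured_fps / required_fps


def implied_baseline_mae(ensemble_mae, relative_reduction):
    """Baseline MAE implied by MAE_ensemble = MAE_baseline * (1 - reduction)."""
    return ensemble_mae / (1.0 - relative_reduction)


# At the required 10 FPS the robot moves 0.02 m (2 cm) per frame.
required_spacing = frame_spacing_m(0.2, 10.0)
# At the measured 29.2 FPS it moves about 0.0068 m per frame.
measured_spacing = frame_spacing_m(0.2, 29.2)
# 29.2 / 10 = 2.92, the "nearly three-fold" redundancy.
fps_margin = redundancy(29.2, 10.0)
# 2.3458 / (1 - 0.532) is roughly 5.01 g for the single BPNN (implied).
bpnn_mae = implied_baseline_mae(2.3458, 0.532)
```

The 2.92 ratio is consistent with the "nearly three-fold" wording, and the implied single-BPNN error of roughly 5 g is approximate because the 53.2% figure is itself rounded.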