Abstract:
Feed residue state in troughs is a key indicator for optimizing feed conversion, reducing waste, and giving early warnings of health abnormalities. In commercial pig production, feed costs account for 60%–70% of total expenses, which makes accurate monitoring essential. However, accurate residue detection remains challenging. Residue levels show high inter-class visual similarity, and boundaries between adjacent categories are subtle. Environmental factors add further interference, including uneven illumination, suspended feed dust, and trough reflection. Existing deep learning models lack the fine-grained feature extraction needed to separate visually similar states such as small and medium residue. This study developed SWF-YOLO (Spatial and Channel Synergistic Attention Weight Fusion YOLO), an improved YOLOv11n model for end-to-end classification of four residue states. Three innovations were integrated into the baseline. The C2f modules in the backbone were replaced with C2f-SCSA modules for fine-grained feature extraction. A weight-fusion strategy replaced neck concatenation for adaptive multi-scale fusion. A C2PSA module was added to capture long-range spatial dependencies. A dataset of 7,748 annotated samples was constructed from 308 growing pigs across 52 pens. A genetic algorithm then optimized 16 hyperparameters over 300 iterations. SWF-YOLO achieved a mean average precision at an IoU threshold of 0.5 (mAP50) of 93.78% on the test set. Its precision was 84.21%, recall was 89.40%, and F1-score was 86.73%. Compared with the YOLOv11n baseline, it improved mAP50, recall, and F1-score by 2.19, 4.95, and 2.19 percentage points, respectively. The model also became more compact. Parameters were reduced by 26.74% to 1.89 M, the computational cost was 3.51 GFLOPs, and the inference speed reached 50.2 frames per second. These figures meet the requirements of real-time edge deployment. Ablation experiments clarified the role of each component. 
The SCSA module contributed the largest precision gain of 6.26 percentage points. The weight-fusion strategy achieved the greatest efficiency improvement, reducing computational cost by 16.14%. The full three-module combination yielded the best balance between accuracy and efficiency. Fine-grained discrimination also improved markedly. The confusion rate between the easily misclassified small- and medium-residue categories dropped from 12.3% in the baseline to 5.1%, a 58.54% relative reduction. After genetic algorithm optimization, class-specific F1-scores reached 94.91%, 88.19%, 84.56%, and 80.23% for the no-, small-, medium-, and large-residue categories, respectively. In comparative experiments, SWF-YOLO outperformed Faster R-CNN, YOLOv8n, YOLOv9t, YOLOv10n, and YOLOv12n. The mAP50 improvements were 2.54, 2.53, 1.95, 2.56, and 2.80 percentage points, respectively. These results confirm its advantage over both two-stage and mainstream one-stage detectors. They also indicate that SWF-YOLO maintains a favorable accuracy-efficiency trade-off, making it well suited for resource-constrained edge devices in commercial pig farms. Grad-CAM visualization further showed that the SCSA mechanism directed attention more precisely toward critical trough regions than the SE, CBAM, ECA, SGA, and CA attention modules. SWF-YOLO addresses the fine-grained classification challenges of pig trough feed residue detection. It synergistically integrates spatial and channel attention, adaptive weighted feature fusion, and position-sensitive attention. The model achieves the highest mAP50 among the compared detectors while remaining lightweight. It supports feed-waste control and early health warning in precision livestock farming. These findings provide a practical reference for developing vision-based intelligent monitoring equipment in commercial pig production.
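Several of the reported figures are derived quantities, so a reader can reproduce them from the other numbers in the abstract. The short Python sketch below is offered only as a consistency check under standard definitions: F1 as the harmonic mean of precision and recall, and relative reduction as (before - after) / before. The function names are illustrative and not part of the study's code, and the baseline parameter count is inferred from the stated 26.74% reduction rather than quoted from the paper.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (inputs and output in percent)."""
    return 2 * precision * recall / (precision + recall)


def relative_reduction(before, after):
    """Percentage drop from `before` to `after`, relative to `before`."""
    return (before - after) / before * 100


def value_before_reduction(after, reduction_pct):
    """Invert a percentage reduction to recover the starting value."""
    return after / (1 - reduction_pct / 100)


# Reported precision and recall of SWF-YOLO on the test set.
f1 = f1_score(84.21, 89.40)

# Reported small/medium confusion rates: baseline vs. SWF-YOLO.
confusion_drop = relative_reduction(12.3, 5.1)

# Inferred (not reported) baseline size, from 1.89 M after a 26.74% cut.
baseline_params_m = value_before_reduction(1.89, 26.74)

print(f"F1-score: {f1:.2f}%")                                # F1-score: 86.73%
print(f"Confusion reduction: {confusion_drop:.2f}%")         # Confusion reduction: 58.54%
print(f"Implied baseline size: {baseline_params_m:.2f} M")   # Implied baseline size: 2.58 M
```

The first two results match the reported F1-score of 86.73% and the 58.54% reduction in confusion rate, which confirms that the latter is a relative change rather than a difference in percentage points.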