Abstract:
Accurate and high-throughput detection of rapeseed pods is a fundamental prerequisite for automated phenotypic analysis and genetic breeding. However, the slender morphology, unpredictable spatial orientation, and high-density overlap of pods in natural field environments pose significant challenges for conventional horizontal object detection frameworks. These traditional methods often suffer from severe background interference and feature fragmentation and fail to maintain bounding box integrity for elongated structures, which leads to labor-intensive manual post-processing. To address these limitations, this study proposes SwinPodDet, a rotated object detection framework designed specifically for slender agricultural targets. The primary objective is to provide a non-destructive and automated pipeline for precisely localizing and quantifying individual pods directly from high-resolution field imagery, thereby bypassing the constraints of axis-aligned detectors. The proposed architecture introduces three key innovations to handle geometric and topological complexity. First, it utilizes a specialized backbone, R-SwinTransformer, which integrates an Adaptive Shifting Window Multi-Head Self-Attention (ASW-MHSA) mechanism. By dynamically adjusting the window aspect ratio and shift offsets, the backbone captures the long-range dependencies of elongated pods, bridging the structural gap between local surface textures and global symmetry. Second, an Elongated Feature Enhancer (EFE) module is embedded to reinforce directional sensitivity. The EFE employs anisotropic depthwise separable convolutions with asymmetric 1×11 and 11×1 kernels, combined with a dual-attention mechanism, to selectively amplify features along the pod's principal axis while suppressing orthogonal environmental noise and branch interference. Third, a Multi-Scale Context Channel Attention (MSCAA) module is integrated into the feature-pyramid neck. 
By utilizing four parallel heterogeneous branches, ranging from local average pooling to dilated separable convolutions, MSCAA adaptively fuses multi-scale contextual information through learnable weights, significantly mitigating missed detections in dense, overlapping clusters where boundary definitions are often blurred. The model was trained and validated on the newly constructed Rotated Bounding Box Rapeseed Pod Dataset (RBRD), a benchmark containing 8,505 manually annotated rotated boxes that capture the crop across multiple growth stages. Experimental results demonstrate that SwinPodDet achieves a precision (P) of 98.50%, a recall (R) of 80.76%, and a mean average precision (mAP50) of 81.74%. Notably, SwinPodDet outperformed the baseline Oriented R-CNN by 13.42%, 0.42%, and 1.21% in P, R, and mAP50, respectively. Furthermore, compared to the rotated object detection network with a standard Swin-Transformer backbone, the proposed model achieved improvements of 2.62%, 0.12%, and 0.18%, respectively, on the same metrics. The framework also remains computationally efficient, with a parameter count of 53.32 M and an inference speed of 19.0 FPS, striking a favorable balance between accuracy and deployment cost. Ablation studies and visualization analyses confirm that the proposed modules effectively resolve the "feature fracture" issue in high-overlap scenarios, achieving 95.51% counting accuracy at the whole-plant level. This robust performance across different maturation stages, from the green, succulent phase to the yellow-gray, shriveled phase, confirms the model's adaptability to variable field conditions. By addressing the core challenges of rapeseed pod detection, namely dense overlap, extreme elongation, and orientation variability, SwinPodDet establishes a reliable, scalable, end-to-end tool for high-throughput phenotyping. 
This work provides a solid algorithmic foundation for future automated yield estimation platforms and large-scale agronomic monitoring systems in the era of precision agriculture.
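
The motivation for rotated rather than axis-aligned boxes can be illustrated with a small geometric sketch. It is not part of SwinPodDet: the (cx, cy, w, h, theta) parameterization, the function names, and the 100 × 10 pixel pod size are assumptions chosen for illustration, and the RBRD annotation format may differ. The code computes how much of the enclosing horizontal box is actually covered by a slender rotated box.

```python
import math


def rotated_box_corners(cx, cy, w, h, theta):
    """Corners of a box centred at (cx, cy); side w lies along angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    half_extents = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    # Rotate each corner offset by theta, then translate to the centre.
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in half_extents]


def enclosing_horizontal_box(corners):
    """Smallest axis-aligned box (x_min, y_min, x_max, y_max) containing the corners."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return min(xs), min(ys), max(xs), max(ys)


def fill_ratio(w, h, theta):
    """Fraction of the enclosing horizontal box that the rotated box covers."""
    corners = rotated_box_corners(0.0, 0.0, w, h, theta)
    x0, y0, x1, y1 = enclosing_horizontal_box(corners)
    return (w * h) / ((x1 - x0) * (y1 - y0))


if __name__ == "__main__":
    # Hypothetical pod: 100 px long, 10 px wide, at several orientations.
    for degrees in (0, 15, 45, 90):
        ratio = fill_ratio(100.0, 10.0, math.radians(degrees))
        print(f"{degrees:>2} deg: fill ratio = {ratio:.3f}")
```

Under these assumptions, a pod lying along an image axis fills its horizontal box completely, while the same pod at 45 degrees covers only about 17% of it; the remainder is background or neighbouring pods, which is the interference that axis-aligned detectors face in dense clusters.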