Rotated-box rapeseed pod detection and counting analysis based on SwinPodDet

李婕; 陈诗雨; 李勉同; 孟萱; 涂静敏; 乔江伟

doi:10.11975/j.issn.1002-6819.202508202

SwinPodDet-based rapepod detection and counting with rotated bounding boxes

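Abstract: Pods are the key organ determining rapeseed yield, and efficient, accurate pod detection is an important basis for phenotypic analysis of rapeseed yield traits. Because existing object detection methods cope poorly with severe occlusion, slender shapes, and the varied orientations of rapeseed pods, this study proposes an automated, high-throughput pod detection method. First, a rotated-box pod detection network, SwinPodDet, is built on an R-SwinTransformer (Rapepod-SwinTransformer) backbone that introduces anisotropic shifted-window multi-head self-attention (adaptive shifted window multi-head self-attention, ASW-MSA) to capture both the global shape and the local detail of slender targets. Second, an elongated feature enhancer (EFE) module is proposed to sharpen the model's attention to slender targets with a pronounced orientation, improving its representation of pod direction. Finally, a multi-scale context-aware module (multi scale context channel attention, MSCAA) is designed; by capturing detail features across directions and scales, it reduces false and missed detections of densely occluded or morphologically variable pods. Results show that on the self-built rotated bounding box rapepod dataset (RBRD) the model reaches a detection precision (P) of 98.50% and a mean average precision (mAP) of 81.74%, with an inference time of only 52.8 ms per image; in a whole-plant pod detection application, the counting ratio reaches 95.51%. The model maintains high accuracy while remaining efficient, establishing a convenient, efficient, and accurate rapeseed pod detection method and providing an efficient technical means for rapeseed pod detection and counting.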
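To make the rotated-box idea concrete, the following minimal sketch converts a box given as centre, width, height, and angle into its four corners, then compares its area with the enclosing axis-aligned box. The parameterisation `(cx, cy, w, h, theta)`, the counter-clockwise angle convention, and all function names are assumptions for illustration; the abstract does not specify the annotation format used in RBRD, and angle conventions differ between detection toolkits.

```python
import math


def rotated_box_corners(cx, cy, w, h, theta):
    """Return the four corners of a rotated box as (x, y) tuples.

    Assumed convention: theta is the rotation of the width axis in radians,
    counter-clockwise in standard mathematical axes.
    """
    c, s = math.cos(theta), math.sin(theta)
    half_extents = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in half_extents]


def polygon_area(corners):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(corners)
    for i in range(n):
        x0, y0 = corners[i]
        x1, y1 = corners[(i + 1) % n]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


def axis_aligned_hull(corners):
    """Smallest horizontal box (x_min, y_min, x_max, y_max) enclosing the corners."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return min(xs), min(ys), max(xs), max(ys)


# Hypothetical slender pod: 100 px long, 10 px wide, tilted by 45 degrees.
corners = rotated_box_corners(50.0, 50.0, 100.0, 10.0, math.pi / 4)
rotated_area = polygon_area(corners)
x_min, y_min, x_max, y_max = axis_aligned_hull(corners)
hull_area = (x_max - x_min) * (y_max - y_min)
background_fraction = 1.0 - rotated_area / hull_area
```

For this hypothetical pod the rotated box has area 1000 while the enclosing horizontal box has area 6050, so about 83% of the horizontal box would be background. That is the kind of interference a rotated-box detector is meant to avoid.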

Abstract: Accurate, high-throughput detection of rapeseed pods is a prerequisite for automated phenotypic analysis and genetic breeding. However, the slender morphology, unpredictable orientation, and dense overlap of pods in natural field environments pose significant challenges for conventional horizontal (axis-aligned) object detection frameworks. These methods often suffer from severe background interference and feature fragmentation and fail to maintain bounding box integrity for elongated structures, which leads to labor-intensive manual post-processing. To address these limitations, this study proposes SwinPodDet, a rotated object detection framework designed specifically for slender agricultural targets. The primary objective is to provide a non-destructive, automated pipeline for precisely localizing and counting individual pods directly from high-resolution field imagery, thereby bypassing the constraints of axis-aligned detectors.

The proposed architecture introduces three key innovations to handle geometric and topological complexity. First, it uses a specialized backbone, R-SwinTransformer, which integrates an Adaptive Shifting Window Multi-Head Self-Attention (ASW-MHSA) mechanism. By dynamically adjusting the window aspect ratio and shift offsets, the backbone captures the long-range dependencies of elongated pods, bridging local surface textures and global symmetry. Second, an Elongated Feature Enhancer (EFE) module is embedded to reinforce directional sensitivity. The EFE employs anisotropic depthwise separable convolutions with asymmetric 1×11 and 11×1 kernels, combined with a dual-attention mechanism, to selectively amplify features along the pod's principal axis while suppressing orthogonal environmental noise and branch interference. Third, a Multi-Scale Context Channel Attention (MSCAA) module is integrated into the feature-pyramid neck. Using four parallel heterogeneous branches, ranging from local average pooling to dilated separable convolutions, MSCAA adaptively fuses multi-scale contextual information through learnable weights, mitigating missed detections in dense, overlapping clusters where boundaries are often blurred.

The model was trained and validated on the newly constructed Rotated Bounding Box Rapeseed Pod Dataset (RBRD), a benchmark containing 8,505 manually annotated rotated boxes that capture the crop across multiple growth stages. Experimental results show that SwinPodDet achieves a precision (P) of 98.50%, a recall (R) of 80.76%, and a mean average precision (mAP50) of 81.74%. SwinPodDet outperformed the baseline Oriented R-CNN by 13.42%, 0.42%, and 1.21% in P, R, and mAP50, respectively. Compared with the rotated object detection network using a standard Swin-Transformer backbone, the proposed model achieved improvements of 2.62%, 0.12%, and 0.18% on the same metrics.

The framework also maintains high computational efficiency, with 53.32 M parameters and an inference speed of 19.0 FPS, striking a favorable balance between accuracy and deployment cost.

Ablation studies and visualization analyses confirm that the proposed modules effectively resolve the "feature fracture" issue in high-overlap scenarios, achieving 95.51% counting accuracy at the whole-plant level. This performance holds across maturation stages, from the green, succulent phase to the yellow-gray, shriveled phase, which supports the model's adaptability to variable field conditions. By addressing the core challenges of rapeseed pod detection, namely dense overlap, extreme elongation, and orientation variability, SwinPodDet establishes a reliable, scalable, end-to-end tool for high-throughput phenotyping. This work provides an algorithmic foundation for future automated yield estimation platforms and large-scale agronomic monitoring systems in precision agriculture.
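The directional selectivity attributed to the asymmetric 1×11 and 11×1 kernels can be illustrated with a toy example. The sketch below is not the EFE module: it uses fixed uniform weights in place of learned depthwise weights, omits the pointwise convolution and the dual-attention mechanism, and all names (`conv2d_same`, `line_image`, the 21×21 test image) are hypothetical. It shows only that a strip-shaped kernel responds strongly to a line aligned with it and weakly to the same line seen through the orthogonal kernel.

```python
def conv2d_same(image, kernel):
    """Zero-padded 'same' 2-D cross-correlation on nested lists of floats."""
    height, width = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    pad_h, pad_w = kh // 2, kw // 2
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    yy, xx = y + i - pad_h, x + j - pad_w
                    if 0 <= yy < height and 0 <= xx < width:
                        acc += image[yy][xx] * kernel[i][j]
            out[y][x] = acc
    return out


def line_image(size, horizontal):
    """Square image containing a one-pixel-wide line through the centre."""
    img = [[0.0] * size for _ in range(size)]
    mid = size // 2
    for t in range(size):
        if horizontal:
            img[mid][t] = 1.0
        else:
            img[t][mid] = 1.0
    return img


K = 11
strip_1x11 = [[1.0 / K] * K]                 # one row, eleven columns
strip_11x1 = [[1.0 / K] for _ in range(K)]   # eleven rows, one column

# A horizontal "pod" seen through both strip kernels, sampled at the centre pixel.
pod = line_image(21, horizontal=True)
aligned_response = conv2d_same(pod, strip_1x11)[10][10]
orthogonal_response = conv2d_same(pod, strip_11x1)[10][10]
```

With these uniform weights the aligned kernel covers eleven line pixels and returns a response close to 1, while the orthogonal kernel crosses the line at a single pixel and returns about 1/11. A learned module would weight and combine such responses rather than use fixed averages.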
