Real-time segmentation of non-traversable regions at crop row ends in hilly farmland based on improved BiSeNetV2

苏锐; 刘骋; 朱振涛; 高磊; 王玲; 陈度

doi:10.11975/j.issn.1002-6819.202601098

Real-time segmentation of non-traversable regions at crop row ends in hilly farmland based on improved BiSeNetV2

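Abstract

Abstract (translated from the Chinese): Effective identification of non-traversable regions at crop row ends in hilly and mountainous farmland is key to enabling intelligent agricultural machinery to achieve automatic navigation, path planning, and autonomous operation. Row-end operating scenes in such farmland pose several perception difficulties: poorly defined non-traversable regions, severe background interference, and blurred boundaries. The edge devices of intelligent agricultural machinery also impose a pressing need for lightweight, real-time models. To address both, this study proposes a real-time segmentation method for non-traversable regions at crop row ends in hilly and mountainous farmland based on an improved BiSeNetV2. First, using a self-developed data acquisition platform, images of crop row-end scenes were collected across multiple regions and scenes to build a dedicated dataset of non-traversable regions. Second, building on the lightweight BiSeNetV2 architecture, an efficient channel attention (ECA) module was introduced to strengthen the cross-channel feature representation of the detail branch, and a guided multi-scale aggregation (GMSA) branch was designed to fuse semantic and spatial detail information at different scales such as 1/2, 1/4, and 1/8, improving the model's ability to segment non-traversable regions that are weed-covered, have blurred boundaries, or are geometrically irregular. Model evaluation showed that the improved BiSeNetV2 has a model size of only 2.5 MB. Its mIoU, mPA, Precision, Recall, and F1 score reached 92.58%, 93.18%, 93.05%, 94.25%, and 93.64%, respectively, which are 3.62, 3.93, 4.02, 4.23, and 4.12 percentage points higher than those of the baseline model. Inference speed reached 31.78 frames/s, and overall performance exceeded that of classic segmentation models such as PSPNet, HRNet, SegNet, U-Net, and DeepLabv3+. The model was further deployed on the edge device of a collaborative harvesting robot, and a real-time perception system for non-traversable regions at crop row ends was developed. The system's average online processing rate reached 14.68 frames/s, which meets the real-time navigation requirements of agricultural robots operating in hilly and mountainous farmland. The results can provide technical support for row-end row-changing path planning of agricultural robots in hilly and mountainous areas.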

Abstract: Accurate segmentation of non-traversable regions at crop row ends is essential for autonomous navigation, row-changing path planning, and safe operation of agricultural machinery in hilly farmland. Row-end scenes in hilly farmland present complex visual conditions, including weed-covered bunds, irregular terrain, blurred boundaries, weak texture, slopes, and complex soil–vegetation backgrounds. As a result, non-traversable regions are easily missed or falsely segmented in the field, especially along the boundary between the row-end area and the adjacent field. In addition, perception models for agricultural robots must combine high segmentation accuracy with a lightweight design, because they run on edge devices with limited computational resources. This study proposes a real-time segmentation method for non-traversable regions at crop row ends in hilly farmland based on an improved BiSeNetV2 network. A row-end image dataset was constructed across multiple regions and scenes using a self-developed data acquisition platform. After image screening, duplicate removal, and pixel-level annotation, 1280 valid images were retained: 784 for training, 196 for validation, and 300 for testing. To avoid data leakage caused by the high similarity between adjacent frames in continuous videos, the dataset was split at the source-video level, with the video segment as the basic unit. Data augmentation was applied only to the training set, while the validation and testing sets were kept as original images. Augmentation expanded the training set from 784 to 5100 samples, which increased the diversity of training samples while preserving the independence of the evaluation.
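The segment-level split described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `split_by_segment`, the `(segment_id, image_name)` input format, and the 0.6/0.15/0.25 target fractions are assumptions made for the example (the reported counts are 784/196/300). Whole video segments are assigned to a single subset, so near-duplicate adjacent frames cannot end up on both sides of the train/evaluation boundary.

```python
import random
from collections import defaultdict


def split_by_segment(frames, fractions=(("train", 0.6), ("val", 0.15), ("test", 0.25)), seed=0):
    """Split frames so that each video segment lands in exactly one subset.

    frames: iterable of (segment_id, image_name) pairs (hypothetical input format).
    fractions: target share of frames per subset; the last subset takes the remainder.
    """
    frames = list(frames)
    by_segment = defaultdict(list)
    for segment_id, image_name in frames:
        by_segment[segment_id].append(image_name)

    # Shuffle segments, not frames, so adjacent frames stay together.
    segment_ids = sorted(by_segment)
    random.Random(seed).shuffle(segment_ids)

    names = [name for name, _ in fractions]
    splits = {name: [] for name in names}
    current = 0
    quota = fractions[0][1] * len(frames)
    for segment_id in segment_ids:
        # Advance to the next subset once the current one has met its quota.
        while current < len(names) - 1 and len(splits[names[current]]) >= quota:
            current += 1
            quota = fractions[current][1] * len(frames)
        splits[names[current]].extend(by_segment[segment_id])
    return splits


# Toy example: 20 segments of 10 frames each (file names are made up).
demo_frames = [(f"seg{s:02d}", f"seg{s:02d}_frame{i:03d}.jpg")
               for s in range(20) for i in range(10)]
demo_splits = split_by_segment(demo_frames)
```

In a pipeline of this kind, augmentation would be applied to `demo_splits["train"]` only, leaving the validation and test images untouched, which matches the procedure the abstract describes.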
Two improvements were made to the lightweight BiSeNetV2 architecture: (1) An efficient channel attention (ECA) module was embedded to enhance local cross-channel feature interaction, improving the representation of fine spatial information such as boundaries and texture gradients. (2) A guided multi-scale aggregation (GMSA) branch was designed to fuse semantic and spatial features at the 1/2, 1/4, and 1/8 scales. With GMSA and auxiliary supervision, the network learns the semantic structure and geometric boundary features of non-traversable regions more effectively, which improves segmentation under weed coverage, blurred boundaries, and irregular target shapes. Experimental results showed that the improved BiSeNetV2 achieved an mIoU of 92.58%, mPA of 93.18%, Precision of 93.05%, Recall of 94.25%, and F1 score of 93.64%, which were 3.62, 3.93, 4.02, 4.23, and 4.12 percentage points higher than those of the baseline BiSeNetV2, respectively. The model size was only 2.5 MB, the computational cost was 9.8 GFLOPs, and the inference speed reached 31.78 frames/s. Compared with PSPNet, HRNet, SegNet, U-Net, and DeepLabv3+, the improved model achieved better overall performance across segmentation accuracy, model compactness, computational cost, and inference efficiency. Visual comparison further indicated that it produced more complete and continuous segmentation masks in complex row-end scenes. The model was then deployed on a Jetson Orin NX edge device mounted on a collaborative harvesting robot, and a real-time perception system for non-traversable regions at crop row ends was developed. In the online field test, the system reached an average processing speed of 14.68 frames/s. Segmentation remained stable during dynamic operation, including when the scale and position of non-traversable regions changed in the camera view.
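For readers unfamiliar with ECA, the mechanism named above can be sketched in a few lines. The sketch follows the general ECA-Net recipe (global average pooling per channel, a small 1D convolution across neighbouring channels, a sigmoid gate, then channel-wise rescaling). The abstract does not state the kernel size, the exact insertion point in the network, or any learned weights, so the function names, the kernel weights, and the use of nested Python lists in place of tensors are all assumptions made for illustration.

```python
import math


def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive odd kernel size as commonly implemented for ECA-Net."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1


def eca(feature, weights):
    """Apply efficient channel attention to a feature map.

    feature: list of C channels, each an H x W nested list of floats.
    weights: odd-length 1D kernel shared across channels (placeholder values;
             in a trained network these are learned).
    """
    pad = len(weights) // 2

    # Squeeze: one descriptor per channel via global average pooling.
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature]

    # Local cross-channel interaction: zero-padded 1D convolution over channels.
    padded = [0.0] * pad + pooled + [0.0] * pad
    gates = []
    for c in range(len(pooled)):
        z = sum(w * padded[c + j] for j, w in enumerate(weights))
        gates.append(1.0 / (1.0 + math.exp(-z)))  # sigmoid gate in (0, 1)

    # Excite: rescale every pixel of a channel by that channel's gate.
    return [[[g * v for v in row] for row in ch] for g, ch in zip(gates, feature)]


# Toy feature map: 2 channels of size 2 x 2 (values are made up).
toy_feature = [[[0.0, 0.0], [0.0, 0.0]], [[2.0, 2.0], [2.0, 2.0]]]
toy_output = eca(toy_feature, weights=[0.0, 1.0, 0.0])
```

Because the kernel spans only a few neighbouring channels, the module adds a handful of parameters per layer, which is consistent with the lightweight design goal stated in the abstract. A practical implementation would use a deep learning framework's pooling and 1D convolution layers rather than nested lists.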
The findings can provide technical support for row-switching and turning path planning of agricultural machinery in hilly farmland.
