
Real-time segmentation of non-traversable regions at crop row ends in hilly farmland based on improved BiSeNetV2

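  • Summary: Effective identification of non-traversable regions at crop row ends in hilly farmland is key to enabling intelligent agricultural machinery to achieve automated navigation, path planning, and autonomous operation. To address the perception difficulties of row-end operation scenes in hilly farmland (poorly defined non-traversable regions, severe background interference, and blurred boundaries), as well as the pressing need for lightweight, real-time models on the edge devices of intelligent agricultural machinery, this study proposes a real-time segmentation method for non-traversable regions at crop row ends in hilly farmland based on an improved BiSeNetV2. First, using a self-developed data acquisition platform, images of crop row-end scenes were collected across multiple regions and scenes to build a dedicated dataset of non-traversable regions. Second, building on the lightweight BiSeNetV2 architecture, an efficient channel attention (ECA) module was introduced to strengthen cross-channel feature representation in the detail branch, and a guided multi-scale aggregation (GMSA) branch was designed to fuse semantic and spatial detail information at scales such as 1/2, 1/4, and 1/8, improving the model's ability to segment non-traversable regions that are weed-covered, have blurred boundaries, or are geometrically irregular. Evaluation results show that the improved BiSeNetV2 has a model size of only 2.5 MB and achieves mIoU, mPA, Precision, Recall, and F1 Score values of 92.58%, 93.18%, 93.05%, 94.25%, and 93.64%, respectively, which are 3.62, 3.93, 4.02, 4.23, and 4.12 percentage points higher than the baseline model. Its inference speed reaches 31.78 frames/s, and its overall performance exceeds that of classic segmentation models such as PSPNet, HRNet, SegNet, U-Net, and DeepLabv3+. Furthermore, the model was deployed on the edge device of a collaborative harvesting robot, and a real-time perception system for non-traversable regions at crop row ends was developed. The system's average online processing rate reached 14.68 frames/s, which meets the real-time requirements of agricultural robots operating in hilly farmland. The results can provide technical support for row-end row-changing path planning of agricultural robots in hilly areas.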

     

    Abstract: Accurate segmentation of non-traversable regions at crop row ends is important for autonomous navigation, row-changing path planning, and safe operation of agricultural machinery in hilly farmland. Compared with relatively regular field environments, row-end scenes in hilly farmland usually present more complex visual characteristics, such as weed-covered bunds, irregular terrain, blurred boundaries, weak texture features, slopes, and complex soil–vegetation backgrounds. These factors can easily lead to missed or false segmentation of non-traversable regions, especially when the boundary between the row-end area and the adjacent field is unclear. In addition, agricultural robots are usually equipped with edge computing devices with limited computational resources, so the perception model must combine high segmentation accuracy with a lightweight design. To address these problems, a real-time segmentation method for non-traversable regions at crop row ends in hilly farmland was proposed based on an improved BiSeNetV2 network. A dedicated row-end image dataset was constructed using a self-developed data acquisition platform under multi-region and multi-scene field conditions. After image screening, duplicate removal, and pixel-level annotation, 1280 valid images were retained for model training and evaluation: 784 for training, 196 for validation, and 300 for testing. To avoid data leakage caused by the high similarity between adjacent frames extracted from continuous videos, the dataset was divided at the original-image level, with video segments as the basic unit. Data augmentation was applied only to the training set, while the validation and testing sets were kept as original images. Augmentation expanded the training set from 784 to 5100 samples, which increased sample diversity while preserving the independence of model evaluation.
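The leakage-avoiding split described above can be illustrated with a short sketch. It is not the authors' code: the `(image_name, segment_id)` sample format, the function name `split_by_segment`, and the 0.6/0.15 fractions are assumptions made for illustration, and the toy data does not reproduce the 784/196/300 split.

```python
import random
from collections import defaultdict

def split_by_segment(samples, train_frac=0.6, val_frac=0.15, seed=0):
    """Split (image_name, segment_id) pairs so that no video segment spans two subsets."""
    by_segment = defaultdict(list)
    for image_name, segment_id in samples:
        by_segment[segment_id].append(image_name)

    segment_ids = sorted(by_segment)           # fixed order before shuffling
    random.Random(seed).shuffle(segment_ids)   # reproducible shuffle of whole segments

    n_train = int(len(segment_ids) * train_frac)
    n_val = int(len(segment_ids) * val_frac)
    groups = {
        "train": segment_ids[:n_train],
        "val": segment_ids[n_train:n_train + n_val],
        "test": segment_ids[n_train + n_val:],
    }
    # Expand each group of segments back into its image names.
    return {name: [img for seg in ids for img in by_segment[seg]]
            for name, ids in groups.items()}

# Hypothetical data: 20 video segments with 5 frames each.
samples = [(f"seg{s:02d}_frame{f}.jpg", s) for s in range(20) for f in range(5)]
splits = split_by_segment(samples)
# Augmentation would then be applied to splits["train"] only.
```

Because whole segments are assigned to a subset, near-duplicate adjacent frames cannot end up on both sides of the train/evaluation boundary.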
Based on the lightweight structure of BiSeNetV2, two improvements were introduced. First, an efficient channel attention (ECA) module was embedded into the detail branch to enhance local cross-channel feature interaction and improve the representation of boundary changes, texture gradients, and other fine spatial information. Second, a guided multi-scale aggregation (GMSA) branch was designed to fuse semantic and spatial detail features at 1/2, 1/4, and 1/8 scales. Through multi-scale guided aggregation and auxiliary supervision, the improved network strengthened the learning of semantic structure and geometric boundary features of non-traversable regions, thereby improving segmentation performance under weed coverage, blurred boundaries, and irregular target shapes. The experimental results showed that the improved BiSeNetV2 achieved an mIoU of 92.58%, mPA of 93.18%, Precision of 93.05%, Recall of 94.25%, and F1 Score of 93.64%, which were 3.62, 3.93, 4.02, 4.23, and 4.12 percentage points higher than those of the baseline BiSeNetV2, respectively. The model size was only 2.5 MB, the computational cost was 9.8 GFLOPs, and the inference speed reached 31.78 frames/s. Compared with PSPNet, HRNet, SegNet, U-Net, and DeepLabv3+, the proposed model achieved better overall performance in segmentation accuracy, model compactness, computational cost, and inference efficiency. Visual comparison results further indicated that the improved model produced more complete and continuous segmentation masks in complex row-end scenes. The improved model was further deployed on a Jetson Orin NX edge device mounted on a collaborative harvesting robot, and a real-time perception system for non-traversable regions at crop row ends was developed. The online field test showed that the system achieved an average processing speed of 14.68 frames/s. 
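Metrics of the kind reported above can be derived from a pixel-level confusion matrix. The sketch below is one plausible reading, not the authors' evaluation code: it assumes mIoU and mPA are averaged over classes while Precision, Recall, and F1 are computed for a single positive class (here the hypothetical index 1 for non-traversable pixels). The abstract does not state these definitions, and the toy matrix is invented.

```python
def segmentation_metrics(conf, positive=1):
    """conf[i][j] = number of pixels with true class i predicted as class j.

    Assumes every class occurs at least once in the ground truth and predictions.
    """
    n = len(conf)
    ious, pixel_accs = [], []
    for c in range(n):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp                        # true c, predicted as another class
        fp = sum(conf[r][c] for r in range(n)) - tp   # predicted c, truly another class
        ious.append(tp / (tp + fp + fn))
        pixel_accs.append(tp / (tp + fn))

    tp = conf[positive][positive]
    precision = tp / sum(conf[r][positive] for r in range(n))
    recall = tp / sum(conf[positive])
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "mIoU": sum(ious) / n,
        "mPA": sum(pixel_accs) / n,
        "Precision": precision,
        "Recall": recall,
        "F1": f1,
    }

# Toy 2-class matrix (rows: ground truth, columns: prediction); values are invented.
conf = [[90, 10],
        [5, 95]]
metrics = segmentation_metrics(conf)
```

An improvement stated in percentage points, such as the 3.62-point mIoU gain, is the plain difference between two such values expressed in percent, not a relative change.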
During dynamic operation, the system maintained stable segmentation results when the scale and position of non-traversable regions changed continuously in the camera view, meeting the real-time perception requirements of agricultural robots in hilly farmland. The proposed method can provide technical support for row-changing and turning path planning of agricultural machinery in complex hilly farmland environments.
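An average online processing rate such as the one reported is commonly obtained by timing the whole per-frame pipeline over many frames. The following is a minimal sketch of that idea, not the authors' measurement procedure: `process_frame` stands in for capture, inference, and post-processing, and `fake_pipeline` is a hypothetical placeholder.

```python
import time

def average_fps(process_frame, frames, clock=time.perf_counter):
    """Return the number of frames processed per second over the whole sequence."""
    start = clock()
    count = 0
    for frame in frames:
        process_frame(frame)
        count += 1
    elapsed = clock() - start
    return count / elapsed if elapsed > 0 else float("inf")

# Hypothetical stand-in for the real per-frame pipeline.
def fake_pipeline(frame):
    return frame

fps = average_fps(fake_pipeline, range(100))
```

Timing the full loop rather than the network forward pass alone is one reason an online rate can be lower than a standalone inference speed.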

     
