Abstract:
Accurate segmentation of non-traversable regions at crop row ends is important for autonomous navigation, row-changing path planning, and safe operation of agricultural machinery in hilly farmland. Compared with relatively regular field environments, row-end scenes in hilly farmland present more complex visual characteristics, such as weed-covered bunds, irregular and sloping terrain, blurred boundaries, weak texture features, and complex soil–vegetation backgrounds. These factors can easily lead to missed or false segmentation of non-traversable regions, especially when the boundary between the row-end area and the adjacent field is unclear. In addition, agricultural robots are usually equipped with edge computing devices with limited computational resources, so the perception model must be both accurate and lightweight. To address these problems, a real-time segmentation method for non-traversable regions at crop row ends in hilly farmland was proposed based on an improved BiSeNetV2 network. A dedicated row-end image dataset was constructed using a self-developed data acquisition platform under multi-region and multi-scene field conditions. After image screening, duplicate removal, and pixel-level annotation, 1280 valid images were retained for model training and evaluation, comprising 784 training images, 196 validation images, and 300 testing images. To avoid data leakage caused by the high similarity between adjacent frames extracted from continuous videos, the dataset was divided at the original-image level with video segments as the basic unit. Data augmentation was applied only to the training set, while the validation and testing sets were kept as original images. Augmentation expanded the training set from 784 to 5100 samples, which increased sample diversity while preserving the independence of model evaluation. Two improvements were made to the lightweight BiSeNetV2 structure. First, an efficient channel attention (ECA) module was embedded into the detail branch to enhance local cross-channel feature interaction and improve the representation of boundary changes, texture gradients, and other fine spatial information. Second, a guided multi-scale aggregation (GMSA) branch was designed to fuse semantic and spatial detail features at 1/2, 1/4, and 1/8 scales. Through multi-scale guided aggregation and auxiliary supervision, the improved network learned the semantic structure and geometric boundary features of non-traversable regions more effectively, which improved segmentation under weed coverage, blurred boundaries, and irregular target shapes. Experimental results showed that the improved BiSeNetV2 achieved a mean intersection over union (mIoU) of 92.58%, a mean pixel accuracy (mPA) of 93.18%, a precision of 93.05%, a recall of 94.25%, and an F1 score of 93.64%, which were 3.62, 3.93, 4.02, 4.23, and 4.12 percentage points higher, respectively, than those of the baseline BiSeNetV2. The model size was only 2.5 MB, the computational cost was 9.8 GFLOPs, and the inference speed reached 31.78 frames/s. Compared with PSPNet, HRNet, SegNet, U-Net, and DeepLabv3+, the proposed model achieved better overall performance in segmentation accuracy, model compactness, computational cost, and inference efficiency. Visual comparisons further indicated that the improved model produced more complete and continuous segmentation masks in complex row-end scenes. The improved model was then deployed on a Jetson Orin NX edge device mounted on a collaborative harvesting robot, and a real-time perception system for non-traversable regions at crop row ends was developed. In online field tests, the system achieved an average processing speed of 14.68 frames/s. During dynamic operation, the system maintained stable segmentation results while the scale and position of non-traversable regions changed continuously in the camera view, meeting the real-time perception requirements of agricultural robots in hilly farmland. The proposed method can provide technical support for row-changing and turning path planning of agricultural machinery in complex hilly farmland environments.
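
The segment-level dataset division described in the abstract can be illustrated with a minimal sketch. It assumes each frame is tagged with the identifier of the video segment it came from; the function name `split_by_segment`, the `(segment_id, image_name)` input layout, and the split ratios are hypothetical choices for illustration, not the authors' implementation.

```python
import random
from collections import defaultdict


def split_by_segment(frames, ratios=(0.6, 0.15, 0.25), seed=0):
    """Split frames into train/val/test so that no video segment is shared.

    frames: list of (segment_id, image_name) pairs (assumed layout).
    ratios: illustrative train/val/test fractions, applied to segments.
    """
    # Group image names by the segment they were extracted from.
    by_segment = defaultdict(list)
    for segment_id, image_name in frames:
        by_segment[segment_id].append(image_name)

    # Shuffle whole segments, never individual frames.
    segment_ids = sorted(by_segment)
    random.Random(seed).shuffle(segment_ids)

    n_train = round(len(segment_ids) * ratios[0])
    n_val = round(len(segment_ids) * ratios[1])
    groups = {
        "train": segment_ids[:n_train],
        "val": segment_ids[n_train:n_train + n_val],
        "test": segment_ids[n_train + n_val:],
    }
    # Expand each group of segments back into its frames.
    return {
        name: [(s, img) for s in ids for img in by_segment[s]]
        for name, ids in groups.items()
    }
```

Augmentation would then be applied only to the frames returned under `"train"`, so near-duplicate adjacent frames cannot appear on both sides of the evaluation boundary.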
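
The ECA step mentioned in the abstract can be sketched in plain Python as follows. The sequence shown (global average pooling, a 1D convolution across neighbouring channels, a sigmoid gate, channel rescaling) and the adaptive kernel-size rule follow the published ECA-Net design. The kernel weights are fixed here purely for illustration, whereas they are learned in a real network, and the nested-list tensor layout is an assumption; this is not the authors' code.

```python
import math


def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive odd kernel size from ECA-Net (gamma=2, b=1 are its defaults)."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1


def eca_gate(feature_map, weights):
    """Apply ECA-style channel gating.

    feature_map: list of C channels, each an H x W nested list of floats.
    weights: 1D kernel of odd length k (illustrative fixed values).
    Returns the rescaled feature map and the per-channel gates.
    """
    pad = len(weights) // 2
    # 1) Global average pooling: one scalar descriptor per channel.
    pooled = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_map]
    # 2) 1D convolution over neighbouring channels (zero padding, no bias).
    padded = [0.0] * pad + pooled + [0.0] * pad
    mixed = [
        sum(w * padded[c + j] for j, w in enumerate(weights))
        for c in range(len(pooled))
    ]
    # 3) Sigmoid gate per channel, then rescale every pixel of that channel.
    gates = [1.0 / (1.0 + math.exp(-m)) for m in mixed]
    scaled = [
        [[g * v for v in row] for row in ch]
        for g, ch in zip(gates, feature_map)
    ]
    return scaled, gates
```

Because the gate depends only on a small neighbourhood of channels, the module adds very few parameters, which is consistent with the lightweight goal stated in the abstract.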
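
For reference, the reported metrics can be computed from a pixel-level confusion matrix as in the sketch below. It uses the standard definitions of per-class IoU and pixel accuracy, averaged over classes for mIoU and mPA. The abstract does not state whether precision, recall, and F1 were averaged over classes or computed for the non-traversable class only; this sketch assumes the latter, and the function name and flat-list input format are hypothetical.

```python
def segmentation_metrics(labels, preds, num_classes=2, positive=1):
    """Compute mIoU, mPA, precision, recall and F1 from flat label lists.

    labels, preds: equal-length sequences of integer class ids per pixel.
    positive: class id treated as "non-traversable" (an assumption).
    """
    # cm[t][p] counts pixels with true class t predicted as class p.
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(labels, preds):
        cm[t][p] += 1

    def counts(c):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp
        fp = sum(cm[r][c] for r in range(num_classes)) - tp
        return tp, fp, fn

    ious, accs = [], []
    for c in range(num_classes):
        tp, fp, fn = counts(c)
        ious.append(tp / (tp + fp + fn) if tp + fp + fn else 0.0)
        accs.append(tp / (tp + fn) if tp + fn else 0.0)

    tp, fp, fn = counts(positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "mIoU": sum(ious) / num_classes,
        "mPA": sum(accs) / num_classes,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

The improvements quoted in the abstract are differences in percentage points between two such values, for example the improved and baseline mIoU, rather than relative percentage changes.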