Abstract:
Detection of potholes on unstructured farm roads is a critical prerequisite for the autonomous navigation of agricultural machinery, yet the task remains challenging owing to complex field environments, blurred target boundaries, irregular morphologies, and large scale variations. To address these issues, this study proposed POT-YOLOv11n, a pothole detection model developed through systematic improvements to the baseline YOLOv11n architecture. The model incorporates three key components: a Multi-scale Edge Information Enhancement module, which integrates multi-branch convolutional layers with channel-wise attention to strengthen edge feature extraction and improve boundary discrimination for potholes with vague contours; a Pyramid Pooling and Large Kernel Attention Fusion module, which combines large-kernel separable attention with spatial pyramid pooling to enhance multi-scale contextual feature capture; and a Feature Screening and Contextual Anchor Attention mechanism in the neck network, which optimizes feature fusion and contextual modeling by focusing on salient features while suppressing irrelevant information from complex backgrounds. For static evaluation, a farm road pothole dataset was constructed containing samples with varying scales, irregular shapes, and diverse lighting conditions. On this dataset, POT-YOLOv11n achieved a mean Average Precision (mAP) of 83.20%, an improvement of 2.47 percentage points over the baseline YOLOv11n, and an F1-score of 80.30%, indicating balanced precision and recall. The model remains compact, with only 2.22 million parameters, and reaches an inference speed of 248.83 frames per second, satisfying the real-time requirements of autonomous navigation applications.
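To illustrate the core idea behind the edge enhancement module, the following is a minimal NumPy sketch of multi-branch edge-cue extraction followed by SE-style channel-wise attention. The branch structure (finite differences standing in for the paper's multi-branch convolutions), fusion by summation, and the weight shapes are illustrative assumptions, not the paper's exact design:

```python
import numpy as np

def edge_branches(x):
    """Cheap multi-branch edge cues for a feature map x of shape (C, H, W).

    Branch 1: identity; branches 2-3: horizontal/vertical finite differences
    (stand-ins for multi-branch edge convolutions). Fusion by summation is an
    assumption made for illustration only.
    """
    gx = np.zeros_like(x)
    gx[:, :, 1:] = x[:, :, 1:] - x[:, :, :-1]   # horizontal gradient
    gy = np.zeros_like(x)
    gy[:, 1:, :] = x[:, 1:, :] - x[:, :-1, :]   # vertical gradient
    return x + gx + gy

def channel_attention(x, w1, w2):
    """SE-style channel-wise attention: reweight the channels of x (C, H, W)."""
    z = x.mean(axis=(1, 2))                     # squeeze: global average pool -> (C,)
    h = np.maximum(0.0, w1 @ z)                 # excitation: bottleneck FC + ReLU
    s = 1.0 / (1.0 + np.exp(-(w2 @ h)))         # sigmoid gate in (0, 1) per channel
    return x * s[:, None, None]                 # scale each channel by its weight

def edge_enhance(x, w1, w2):
    """Edge-cue branches followed by channel reweighting."""
    return channel_attention(edge_branches(x), w1, w2)
```

In this sketch, the gradient branches amplify boundary responses while the channel gate learns to emphasize edge-carrying channels, which is the mechanism the module uses to sharpen vague pothole contours.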
Beyond static evaluation, dynamic field tests were conducted on a representative farm road segment containing typical potholes to assess practical applicability under realistic operating conditions. The agricultural vehicle performed continuous image acquisition and online detection at three conventional working speeds: low (5 km/h), medium (10 km/h), and high (15 km/h). As vehicle speed increased, motion blur intensified and detection accuracy declined. At 5 km/h, the model achieved a mean Average Precision (mAP) of 80.8%, slightly below the static result, with a precision of 82.1% and a recall of 79.6%; the F1-score remained consistent with static testing, confirming that detection capability transfers effectively to real-world deployment. At 10 km/h, the mAP decreased to 79.8%, with a precision of 80.9% and a recall of 78.8%. At 15 km/h, the mAP declined further to 75.5% and the recall dropped to 74.2%, indicating more missed detections under severe motion blur. Nevertheless, maintaining an mAP of 75.5% at this speed suggests that the multi-scale edge enhancement module partially compensates for motion blur, preserving detection performance beyond what conventional architectures achieve.
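The reported precision and recall at 5 km/h are consistent with the claim that the F1-score stayed close to the static value of 80.30%; a quick arithmetic check using the standard F1 formula (not code from the paper) confirms this:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (values given in percent)."""
    return 2 * precision * recall / (precision + recall)

# 5 km/h test: P = 82.1, R = 79.6 -> F1 ≈ 80.8, close to the static F1 of 80.30
print(round(f1(82.1, 79.6), 1))  # -> 80.8
```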
These findings demonstrate that POT-YOLOv11n detects potholes on unstructured farm roads effectively across different operating speeds, addressing boundary ambiguity, scale variation, and motion blur through its integrated multi-scale edge enhancement, large-kernel attention fusion, and feature screening mechanisms, while retaining a lightweight architecture and a high inference speed suitable for practical deployment. The consistent performance across static and dynamic tests, particularly the robust results at low and medium speeds and the compensated performance at high speed, underscores the practical reliability of the approach. POT-YOLOv11n can therefore provide reliable support for the autonomous navigation of agricultural machinery in complex field environments, contributing to the advancement of smart agriculture and automated farming operations. Its combination of accuracy, efficiency, and robustness to motion blur makes it a promising solution for real-time obstacle detection in vision-based navigation systems for agricultural applications.