Abstract:
A perception system is one of the most important components for the autonomous driving of agricultural machinery. However, only a few perception datasets are specifically designed for agricultural scenarios, which differ from the typical urban scenarios of previous studies. In contrast to urban environments, agricultural applications can involve harsh working conditions, placing higher demands on perception sensors and algorithms. In this study, a low-cost perception system was proposed for two-stage 3D object detection using a millimeter-wave radar and a monocular camera, targeting the autonomous driving of agricultural machinery in agricultural scenarios. Firstly, a multimodal perception dataset of agricultural scenes was constructed, incorporating LiDAR (light detection and ranging), INS (inertial navigation system), camera, and millimeter-wave radar data with hardware-level data synchronization and target-level data annotation. Then a middle-fusion strategy was used to build a neural network model, termed CFPNet. Preliminary detection of the target was implemented with an improved center-point detection network. Furthermore, radar point cloud features were extracted from the frustum region of interest to supplement the image features. Finally, the preliminary detection information and the radar features were combined to perform a secondary detection, and the 3D object attributes (depth, direction, and velocity) were regressed concurrently. The results show that the mAP (mean average precision) of CFPNet on the self-built multimodal agricultural perception dataset was 86.5%, which was 5.5 percentage points higher than the baseline, while the mATE (mean average translation error) was 0.197 m lower than the baseline. An additional experiment on small object detection was conducted to verify the effectiveness of CFPNet.
A recall rate of 1 was achieved for the selected small objects, 0.3 higher than before the improvement, indicating better detection performance. Deployment experiments were conducted to test the applicability of CFPNet: a frame rate of 7.4 frames per second was achieved in low-computing-power agricultural scenarios, 211% of the baseline. Experiments on public datasets were conducted to test CFPNet in other scenarios. Favorable performance was achieved on the nuScenes public dataset, with mATE, mASE, and mAVE of 0.792 m, 0.236, and 0.52 m/s, respectively. Since CFPNet was specifically designed for monocular cameras, its mAP lagged behind. Furthermore, CFPNet can directly provide the speed of the target without relying on preceding and following frames. These findings can provide a feasible solution and technical support for 3D object detection in agricultural scenarios, especially under low computing power.
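The frustum region-of-interest step described in the abstract can be illustrated with a minimal sketch: given a preliminary 2D detection box from the image branch, keep only the radar points whose image projection falls inside that box (i.e., inside the camera frustum it subtends). This is not the authors' implementation; the function name `frustum_radar_points`, the camera projection matrix `K`, and the assumption that radar points are already expressed in the camera frame are all hypothetical choices made for illustration.

```python
import numpy as np

def frustum_radar_points(points_xyz, box_2d, K):
    """Select radar points inside the frustum of a 2D detection box.

    points_xyz: (N, 3) radar points, assumed already in the camera frame.
    box_2d: (x1, y1, x2, y2) preliminary image detection box in pixels.
    K: (3, 4) camera projection matrix (assumed known from calibration).
    Returns the subset of points whose image projection lies in the box.
    """
    x1, y1, x2, y2 = box_2d
    # Keep only points in front of the camera (positive depth).
    pts = points_xyz[points_xyz[:, 2] > 0]
    # Project to the image plane via homogeneous coordinates.
    homo = np.hstack([pts, np.ones((len(pts), 1))])  # (M, 4)
    proj = (K @ homo.T).T                            # (M, 3)
    u = proj[:, 0] / proj[:, 2]
    v = proj[:, 1] / proj[:, 2]
    # A point belongs to the frustum ROI if its pixel falls inside the box.
    mask = (u >= x1) & (u <= x2) & (v >= y1) & (v <= y2)
    return pts[mask]
```

In the two-stage scheme summarized above, the points returned by such a filter would be the ones whose features supplement the image features before the secondary detection and the regression of depth, direction, and velocity.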