Abstract:
Litchi, renowned as the "King of Tropical Fruits", is widely cultivated, with fresh consumption accounting for approximately 95% of the total. However, harvesting still relies predominantly on manual labor, which limits operational efficiency. To address this, the present study developed a dual-camera collaborative vision detection system based on Intel RealSense D455 and D435i depth cameras, integrating hardware and software to enable automated harvesting. At the software level, the system operates in two stages: far-view fruit detection and near-view pedicel detection. For the far-view stage, a dedicated dataset was annotated with LabelImg, and the decoupled detection head of YOLOv8n was simplified for the single-class litchi detection task by removing redundant structures, yielding a lighter model. For the near-view stage, an oriented bounding box (OBB) detection approach was adopted to minimize background interference; a near-view dataset was annotated with roLabelImg, and mainstream OBB-capable models were trained and compared. YOLOv8n-OBB was ultimately selected as the pedicel detection model because it offered the best balance between accuracy and real-time performance. At the hardware level, the two cameras cover different perceptual ranges. The D455, with its larger optimal depth perception range, serves as the far-view camera mounted on the mobile platform; working with the lightweight detection model, it enables rapid fruit recognition and extraction of 3D centroid coordinates, which are then grouped by density-based spatial clustering of applications with noise (DBSCAN) to generate the global motion path for the robotic arm. The D435i, with its smaller optimal depth perception range, is deployed as the near-view camera above the end-effector. It fits pedicel morphology via OBBs and applies a novel optimal matching and selection algorithm (OMSA), which quantifies the spatial association strength between fruits and pedicels to precisely screen the target pedicel in complex backgrounds. Experimental results show that the system achieves a mean average precision (mAP) of 80.1% in far-view detection at 110 frames per second; relative to the baseline model, the parameter count and computational load were reduced by 12.5% and 29.2%, respectively, while detection speed increased by 18.3%. In near-view detection, recognition precision reached 99.2% for fruits and 87.7% for pedicels. The proposed OMSA achieved pedicel screening success rates of 88% for clustered litchi and 98% for single fruits, outperforming conventional nearest-neighbor matching. The overall harvesting success rate reached 83%, with an average cycle time of 7.04 s per cluster, confirming the system's effectiveness and practicality in natural environments.
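To illustrate the far-view stage described above, the following is a minimal sketch (not the authors' exact pipeline) of how detected fruit centroids could be grouped with DBSCAN and ordered into a coarse visiting path for the robotic arm. The function name, the eps and min_samples values, and the nearest-first ordering along the depth axis are illustrative assumptions; only the use of DBSCAN on 3D centroids is taken from the abstract.

```python
# Minimal sketch: cluster 3D fruit centroids with DBSCAN and order cluster
# centres to form a coarse motion path. Parameter values are assumptions.
import numpy as np
from sklearn.cluster import DBSCAN

def plan_cluster_path(centroids_m: np.ndarray, eps: float = 0.08, min_samples: int = 1):
    """centroids_m: (N, 3) fruit centroids in metres from the far-view camera."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(centroids_m)
    targets = []
    for k in sorted(set(labels)):
        if k == -1:  # noise label: skip isolated, unreliable detections
            continue
        # One visit point per litchi cluster: the mean centroid of its fruits.
        targets.append(centroids_m[labels == k].mean(axis=0))
    # Visit clusters from nearest to farthest along the camera depth (z) axis.
    return sorted(targets, key=lambda p: p[2])

# Example: three detected fruits; the first two are close enough to form one cluster.
path = plan_cluster_path(np.array([[0.10, 0.02, 0.55],
                                   [0.12, 0.03, 0.56],
                                   [-0.20, 0.10, 0.80]]))
```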