Abstract:
Pruning is one of the most critical operations in fruit tree cultivation. Recent pruning robots can recognize side branches and locate pruning points, but effective end-effector pose estimation for intelligent selective pruning is still lacking. This study proposes pruning point localization and end-effector pose estimation from RGB-D images, taking dormant high-spindle apple trees as the research object. An Intel RealSense D435i depth camera was used to capture RGB and depth data, and a point-to-plane mapping was introduced to derive the 3D position and orientation of the pruning pose from the detected pixel coordinates and depth information, predicting the orientation of the cutting plane relative to the pruning point, a key requirement for autonomous robotic pruning. In the perception pipeline, an improved YOLOv8-seg model segmented the trunk and primary branch regions from the RGB images. Because the unconventional annotation left the branch-base masks without clear boundary features, the original YOLOv8-seg model failed to accurately locate and segment these regions. A Global Attention Mechanism (GAM) module was therefore introduced into the neck of YOLOv8-seg and integrated with each C2f block across all feature levels; the feature maps were recalibrated by channel-wise multiplication to enhance salient features while suppressing irrelevant ones, strengthening multi-scale representation and improving segmentation accuracy. The improved YOLOv8-seg achieved a mask-level precision of 95.31%, recall of 93.79%, and mAP@0.5 of 93.86%, outperforming the original YOLOv8-seg by 0.79, 2.63, and 1.47 percentage points, respectively. Once the trunk and primary branches were segmented, OpenCV-based image processing computed branch diameters and spacing, and potential pruning points were identified by fitting rectangles around the base regions of the side branches according to empirical pruning rules. Field trials validated the effectiveness of this approach: pruning point decisions reached an accuracy of 88.3% at an average processing speed of 2.1 s per image, and the point-to-plane pose estimation achieved a success rate of 89.9% at an average of 3.3 s per image, giving a total of 5.4 s per image from image acquisition to pose estimation. In conclusion, this work presents a framework that integrates deep learning with image processing for the intelligent selective pruning of apple trees from RGB-D input, realizing accurate pruning point localization and end-effector pose estimation. The point-to-plane mapping determines the spatial location of each pruning point, and the normal vector of the cutting plane is derived from the detected pruning point and the surrounding branch structure so that the cutting orientation meets horticultural requirements for tree health; the manipulator's reachability and safety distances are considered when generating feasible pruning poses for practical execution. The estimated pruning end-effector poses provide strong support for the development of robotic pruning.
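The abstract does not give the exact form of the point-to-plane mapping; the first step, recovering a 3D camera-frame point from a detected pixel and its depth value, can be sketched with the standard pinhole deprojection model. The intrinsics below (fx, fy, cx, cy) are illustrative placeholders, not values reported for the D435i in this work:

```python
import numpy as np

def deproject(u, v, depth, fx, fy, cx, cy):
    """Map a pixel (u, v) with metric depth (m) to a 3D point in the
    camera frame using the pinhole model: X = (u - cx) * Z / fx, etc."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.array([x, y, depth])

# Illustrative 640x480 intrinsics (hypothetical, not from the paper):
# a pixel at the principal point deprojects straight along the optical axis.
p = deproject(320, 240, 1.0, fx=615.0, fy=615.0, cx=320.0, cy=240.0)
```

In practice the RealSense SDK provides an equivalent deprojection using the calibrated intrinsics streamed by the camera.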
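The channel-wise recalibration described for the GAM-enhanced neck can be illustrated with a minimal gating sketch: a global channel descriptor is passed through a small MLP and the resulting sigmoid gates multiply each channel of the feature map. This is a simplified SE-style stand-in, not the full GAM (which also applies spatial attention), and all weights here are hypothetical:

```python
import numpy as np

def channel_recalibrate(feat, w1, b1, w2, b2):
    """Recalibrate a (C, H, W) feature map by channel-wise multiplication.
    Gates in (0, 1) come from a two-layer MLP over the globally averaged
    channel descriptor, amplifying salient channels and damping others."""
    desc = feat.mean(axis=(1, 2))                        # (C,) global context
    hidden = np.maximum(0.0, w1 @ desc + b1)             # ReLU bottleneck
    gates = 1.0 / (1.0 + np.exp(-(w2 @ hidden + b2)))    # sigmoid, shape (C,)
    return feat * gates[:, None, None]                   # channel-wise product

# Toy example with C=4 channels and random (illustrative) weights.
rng = np.random.default_rng(0)
feat = np.ones((4, 2, 2))
w1, b1 = rng.normal(size=(2, 4)), np.zeros(2)
w2, b2 = rng.normal(size=(4, 2)), np.zeros(4)
out = channel_recalibrate(feat, w1, b1, w2, b2)   # same (4, 2, 2) shape
```

Because the gates lie strictly in (0, 1), the operation rescales rather than redistributes features, which is why it can suppress irrelevant responses without altering the feature-map resolution.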
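The abstract states that the cutting plane's normal vector is derived from the detected pruning point and surrounding branch structure, without specifying the method. One common way to obtain such a normal from a local neighborhood of 3D branch points is a least-squares plane fit via SVD; the sketch below assumes an (N, 3) array of points already recovered from the depth map:

```python
import numpy as np

def plane_normal(points):
    """Unit normal of the least-squares plane through an (N, 3) point set.
    After centering, the right singular vector with the smallest singular
    value is the direction of least variance, i.e. the plane normal."""
    centered = points - points.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    n = vt[-1]
    return n / np.linalg.norm(n)

# Points sampled on the z = 0 plane yield a normal along +/- z.
pts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0], [1.0, 1.0, 0.0],
                [0.5, 0.3, 0.0]])
n = plane_normal(pts)
```

The sign of the normal is ambiguous from the fit alone; a real pipeline would orient it consistently, for example toward the camera or perpendicular to the branch axis, before generating the end-effector pose.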