Abstract:
Daylilies are among the most popular vegetables in Asia. However, mechanical harvesting of daylilies under field conditions remains challenging because of subtle differences among maturity stages, large size variations within the same class (especially among unripe buds), and severe occlusion caused by densely overlapping flowers and stems. Real-world applications therefore require accurate and robust localization of picking points. In this study, an integrated framework combining multi-task semantic segmentation with adaptive model compression was proposed to achieve high-precision, efficient 3D picking point localization in dense daylily environments. A multi-task segmentation network, named DaylilyPick-Net, simultaneously segmented the daylilies and their picking regions. Several components of the network were designed specifically to enhance performance under these challenging conditions. An OIEM (occlusion information estimation module) equipped with Multi-path Weighted Coordinate Attention improved feature discrimination in occluded regions. A DIEM (dense information estimation module) facilitated multi-scale feature integration across network layers. Dynamic convolution was employed in the TSDS-Head (task synergistic dynamic segmentation head) to handle the varied morphological characteristics within dense clusters. Furthermore, a DLWS (dynamic loss weighting strategy) was introduced to automatically balance learning between the multi-class segmentation task (classifying daylilies into maturity stages) and the single-class segmentation task (identifying picking regions), thereby mitigating the adverse effects of task imbalance. To make deployment on resource-constrained devices practical, an ACSP (adaptive collaborative self-optimizing pruning) strategy was proposed.
The strategy comprised two stages: AD-Lamp (adaptive dynamic lamp), which reduced model complexity using importance scores, and GL-SBOA (global-enhanced secretary bird optimization algorithm), which fine-tuned the pruned model. GL-SBOA improved the Secretary Bird Optimization Algorithm by incorporating cubic chaotic mapping for population initialization and adaptive convergence factors, and by refining the search strategies used in its different phases, so that the high-dimensional hyperparameter space could be navigated effectively. Finally, Daylily-3DPAC (daylily 3D picking-point positioning and angle calculation) was developed to translate the segmentation results into actionable 3D picking information. Segmentation masks from DaylilyPick-Net were combined with depth information captured by an RGB-D camera. The 3D coordinates of each picking point were calculated by estimating the centroid of the intersection between the maturity-specific mask and the picking-region mask and then projecting that centroid using the depth data. Additionally, the best cutting angle was determined from the orientation of the minor axis of the minimum bounding rectangle of the target region. A systematic evaluation was performed to verify the effectiveness of the framework. DaylilyPick-Net achieved a mean Average Precision at 50% IoU (mAP@50) of 59.62% with 13.01 M parameters, 71.7 G floating-point operations (FLOPs), and a detection frame rate of 57 frames per second (fps). Compared with Mask R-CNN, SOLOv2, and the YOLO series, DaylilyPick-Net improved the mAP@50 by 1.98 to 16.38 percentage points and the detection frame rate by 16 to 45 fps, while reducing the parameter count by 14.57 to 75.26 M and the FLOPs by 42.3 to 278.2 G.
The pruning strategy achieved a strong balance between compression and performance: at a 40% pruning ratio, the pruned model reached a mAP@50 of 60.31%, surpassing the accuracy of the original model. Most importantly, the 3D localization achieved picking point positioning errors of less than 0.30 cm in simulated environments, meeting the precision requirements of practical applications. These findings provide an accurate and efficient solution for 3D picking point localization in dense daylily harvesting scenarios. DaylilyPick-Net segmented daylilies effectively under complex field conditions, and the compression strategy facilitated deployment while maintaining high accuracy. The framework can offer a robust technological foundation for intelligent harvesting robots. Future work may integrate 3D reconstruction to achieve higher spatial accuracy in real-world robotic harvesting.
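The localization step summarized above (mask intersection, centroid estimation, depth projection, and minor-axis orientation) can be illustrated with a minimal sketch. It is not the Daylily-3DPAC implementation. The function names, the pinhole intrinsics `fx`, `fy`, `cx`, `cy`, the use of the median depth over the overlap region, and the metre-valued depth map are all assumptions made for illustration. The minor axis is estimated here by principal component analysis of the mask pixels, a stand-in for the minimum bounding rectangle that gives the same direction for clean, elongated regions.

```python
import numpy as np

def picking_point_3d(maturity_mask, picking_mask, depth_m, fx, fy, cx, cy):
    """Return (X, Y, Z) in the camera frame, or None if no usable overlap.

    Hypothetical helper: depth_m is assumed to be a depth map in metres
    aligned with the masks; fx, fy, cx, cy are assumed pinhole intrinsics.
    """
    overlap = maturity_mask.astype(bool) & picking_mask.astype(bool)
    vs, us = np.nonzero(overlap)            # row (v) and column (u) indices
    if us.size == 0:
        return None
    u, v = us.mean(), vs.mean()             # pixel centroid of the overlap
    depths = depth_m[vs, us]
    depths = depths[depths > 0]             # drop invalid (zero) depth readings
    if depths.size == 0:
        return None
    z = float(np.median(depths))            # assumption: median for robustness
    x = (u - cx) * z / fx                   # pinhole back-projection
    y = (v - cy) * z / fy
    return np.array([x, y, z])

def minor_axis_angle_deg(mask):
    """Angle of the region's minor axis in image coordinates, in [0, 180).

    PCA-based approximation of the minor axis of the minimum bounding
    rectangle (a substitute technique, not the rectangle fit itself).
    """
    vs, us = np.nonzero(mask)
    if us.size < 2:
        return None
    pts = np.stack([us, vs], axis=1).astype(float)
    pts -= pts.mean(axis=0)
    cov = np.cov(pts, rowvar=False)
    _, eigvecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    dx, dy = eigvecs[:, 0]                  # smallest variance = minor axis
    return float(np.degrees(np.arctan2(dy, dx)) % 180.0)
```

In practice the angle would be computed on the target region mask and the point on the overlap of the two masks. A rectangle fit such as OpenCV's `cv2.minAreaRect` could replace the PCA step when the region is irregular.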