Abstract:
The automated harvesting of daylilies under field conditions remains challenging for three reasons: maturity stages differ only subtly, size varies significantly within a class (especially among unripe buds), and densely overlapping flowers and stems cause severe occlusion. These factors collectively hinder accurate and robust localization of picking points in real-world applications. To address these challenges, this paper proposes an integrated framework that combines multi-task semantic segmentation with adaptive model compression to achieve high-precision, efficient 3D picking point localization in dense daylily environments. The core of the framework is a novel multi-task segmentation network, DaylilyPick-Net, which simultaneously segments daylilies and their corresponding picking regions. The network incorporates three specialized modules: an OIEM (occlusion information estimation module) equipped with Multi-path Weighted Coordinate Attention to improve feature discrimination in occluded regions, a DIEM (dense information estimation module) that facilitates multi-scale feature integration across network layers, and a TSDS-Head (task synergistic dynamic segmentation head) that employs dynamic convolution to adapt to the varied morphology of dense clusters. Furthermore, a DLWS (dynamic loss weighting strategy) is introduced to automatically balance learning between the multi-class segmentation task (classifying daylilies into maturity stages) and the single-class segmentation task (identifying picking regions), thereby mitigating the adverse effects of task imbalance. To make deployment on resource-constrained devices practical, an ACSP (adaptive collaborative self-optimizing pruning) strategy is proposed. 
This strategy consists of two main stages: first, an AD-Lamp (adaptive dynamic lamp) pruning step that reduces model complexity based on importance scores; and second, a GL-SBOA (global-enhanced secretary bird optimization algorithm) that fine-tunes the pruned model. GL-SBOA, an improved version of the Secretary Bird Optimization Algorithm, incorporates Cubic chaotic mapping for population initialization, adaptive convergence factors, and refined search strategies across different phases to navigate the high-dimensional hyperparameter space effectively. Finally, the Daylily-3DPAC (daylily 3D picking-point positioning and angle calculation) algorithm is developed to translate the segmentation results into actionable 3D picking information. By integrating the segmentation masks obtained from DaylilyPick-Net with depth information captured by an RGB-D camera, this algorithm computes the 3D coordinates of each picking point: it estimates the centroid of the intersection between the maturity-specific mask and the picking region mask, then projects that centroid into 3D using the depth data. The optimal cutting angle is then determined from the orientation of the minor axis of the minimum bounding rectangle of the target region. Comprehensive evaluations on a dedicated dataset demonstrate the effectiveness of the proposed framework. DaylilyPick-Net achieves a mean Average Precision at 50% IoU (mAP@50) of 59.62% with 13.01 M parameters, 71.7 G floating-point operations (FLOPs), and a detection frame rate of 57 frames per second (fps). Compared with Mask R-CNN, SOLOv2, and models from the YOLO series, it improves mAP@50 by 1.98 to 16.38 percentage points, reduces the parameter count by 14.57 to 75.17 M and FLOPs by 42.3 to 252.4 G, and raises the detection frame rate by 16 to 45 fps. 
The pruning strategy achieves a favorable balance between compression and performance: at a 40% pruning ratio, the pruned model reaches an mAP@50 of 60.31%, surpassing the accuracy of the original model. Most importantly, the proposed 3D localization method achieves picking point positioning errors of less than 0.30 cm in simulated environments, meeting the precision requirements of practical applications. This research provides an accurate and efficient solution for 3D picking point localization in dense daylily harvesting scenarios. The proposed DaylilyPick-Net handles segmentation effectively under complex field conditions, while the model compression strategy facilitates deployment without sacrificing accuracy. The overall framework offers a robust technological foundation for the development of intelligent harvesting robots. Future work will focus on integrating 3D reconstruction technologies for enhanced spatial accuracy and implementing the system on real-world robotic platforms.
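
As an illustration of the picking-point step summarized in this abstract, the following is a minimal sketch, not the authors' Daylily-3DPAC implementation. It assumes binary masks given as nested lists, a depth map in metres aligned with the masks, and a pinhole camera with intrinsics fx, fy, cx, cy. The function name picking_point_3d and all parameter names are hypothetical. The abstract derives the cutting angle from the minimum bounding rectangle; this sketch swaps in a simpler principal-axis (second-moment) estimate of the minor-axis orientation, which agrees with the rectangle's minor axis for roughly elongated regions but is not the same computation.

```python
import math

def picking_point_3d(maturity_mask, picking_mask, depth_m, fx, fy, cx, cy):
    """Return ((X, Y, Z), minor_axis_angle_deg), or None if no valid point.

    All names here are illustrative assumptions, not the paper's API.
    """
    # Collect pixels (u = column, v = row) where both masks are set.
    pixels = [(u, v)
              for v, (row_m, row_p) in enumerate(zip(maturity_mask, picking_mask))
              for u, (m, p) in enumerate(zip(row_m, row_p))
              if m and p]
    if not pixels:
        return None

    # Centroid of the intersection region in image coordinates.
    n = len(pixels)
    u_c = sum(u for u, _ in pixels) / n
    v_c = sum(v for _, v in pixels) / n

    # Depth at the pixel nearest the centroid (for a non-convex region
    # this pixel may fall outside the intersection).
    z = depth_m[int(round(v_c))][int(round(u_c))]
    if z <= 0:
        return None  # missing or invalid depth reading

    # Pinhole back-projection from pixel coordinates to camera coordinates.
    x = (u_c - cx) * z / fx
    y = (v_c - cy) * z / fy

    # Second central moments give the principal (major) axis orientation;
    # the minor axis is perpendicular to it.
    sxx = sum((u - u_c) ** 2 for u, _ in pixels) / n
    syy = sum((v - v_c) ** 2 for _, v in pixels) / n
    sxy = sum((u - u_c) * (v - v_c) for u, v in pixels) / n
    major = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    minor_deg = math.degrees(major + math.pi / 2.0) % 180.0

    return (x, y, z), minor_deg
```

In practice the masks would come from the segmentation network and the intrinsics from the RGB-D camera's calibration. The angle is measured in image coordinates, from the u (column) axis toward the v (row) axis.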