Abstract:
In complex natural environments, orchid flowers frequently bend, tilt, and occlude one another at random. These irregular shapes and postures cause severe occlusion and overlap of the flower organs (sepals and petals), making it difficult to extract phenotypic parameters such as length, width, and area directly and accurately. Furthermore, manual measurement can no longer meet the demands of large-scale production, because it is labor-intensive, subjective, and liable to damage fragile specimens. Precise phenotypic extraction under these unconstrained conditions is also difficult to achieve with conventional computer vision. This study proposes a systematic deep learning framework for phenotypic extraction that combines the MAS-YOLO instance segmentation model with the Pix2PixHD-CA generative adversarial network. The framework has two stages: accurate segmentation of occluded organs, followed by morphological restoration of incomplete organs. In the first stage (instance segmentation), the MAS-YOLO model was built on the YOLO11s-seg architecture. The original backbone was replaced with a MobileNetV4 backbone to enable lightweight deployment on edge devices with limited computing resources. Universal Inverted Bottleneck (UIB) blocks were used to reduce computational redundancy while retaining strong feature extraction. An Adaptive Spatial Fusion (ASF) framework was integrated to weight and fuse features from different scales, minimizing the loss of small-target information. In addition, a Spatial Dynamic Integration (SDI) module was introduced to sharpen the distinction in feature response between the orchid organs and the complex background. A dataset of 520 natural images (3865 annotated instances) was used to train and validate the segmentation model.
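To make the target quantities concrete, the following minimal sketch shows one way length, width, and area could be read from a single binary organ mask. It is an illustration only, not the method used in this study: the function name `mask_parameters`, the axis-aligned bounding-box definition of length and width, and the calibration factor `mm_per_pixel` are all assumptions introduced here.

```python
from typing import List, Tuple


def mask_parameters(
    mask: List[List[int]], mm_per_pixel: float
) -> Tuple[float, float, float]:
    """Return (length_mm, width_mm, area_mm2) for a binary organ mask.

    Assumptions (not from the study): the organ is roughly axis-aligned,
    length is the longer bounding-box side, width is the shorter one, and
    mm_per_pixel is a known calibration factor.
    """
    # Collect the (row, column) coordinates of all foreground pixels.
    coords = [
        (r, c)
        for r, row in enumerate(mask)
        for c, value in enumerate(row)
        if value
    ]
    if not coords:
        return 0.0, 0.0, 0.0

    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    height_px = max(rows) - min(rows) + 1
    width_px = max(cols) - min(cols) + 1

    length_mm = max(height_px, width_px) * mm_per_pixel
    width_mm = min(height_px, width_px) * mm_per_pixel
    # Area is the pixel count scaled by the area of one pixel.
    area_mm2 = len(coords) * mm_per_pixel ** 2
    return length_mm, width_mm, area_mm2


# Hypothetical toy mask (1 = organ pixel), not data from the study.
toy_mask = [
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
]
length, width, area = mask_parameters(toy_mask, mm_per_pixel=0.5)
```

A bent or occluded organ would give a biased bounding box and pixel count under this scheme, which is the kind of deviation the restoration stage is meant to reduce.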
In the second stage (parameter extraction), a Pix2PixHD-CA generative model was developed to learn the morphological deviation between the segmented occluded organs and their real flattened states. A Coordinate Attention (CA) mechanism was embedded into the generator trunk of the Pix2PixHD network. Unlike standard channel attention, the CA mechanism decomposes channel attention into two parallel 1D feature encodings, so that the network attends jointly to the channel and spatial-coordinate dimensions. This captures long-range dependencies while preserving the precise positional information needed for shape reconstruction. The mapping from the "deviated organ" to the "complete flattened organ" was then learned from 1500 aligned image pairs. The generated images retained high fidelity in texture and edge trends. The framework performed well in both segmentation and parameter extraction. In segmentation, the F1 score of the MAS-YOLO model increased from 0.840 (baseline) to 0.962, indicating high accuracy in identifying occluded and bent organs. At the same time, the parameter count fell from 10.08 M to 8.95 M, indicating a better balance between segmentation accuracy and computational efficiency. After generative restoration, the phenotypic parameters extracted from the flattened images generated by Pix2PixHD-CA were compared with the measured values. The coefficient of determination (R²) reached 0.928, 0.895, and 0.937 for organ length, width, and area, respectively. The corresponding root mean square errors (RMSE) were 1.45 mm, 0.25 mm, and 14.87 mm². Compared with the original Pix2PixHD model without the attention mechanism, the R² values for length, width, and area increased by 8.79%, 1.59%, and 3.65%, respectively, while the RMSE values decreased by 3.97%, 16.66%, and 23.34%, respectively. The CA mechanism thus improved the learned mapping and reduced the systematic errors caused by shape distortion. Furthermore, in a field test on independent samples, the R² values remained above 0.877 for all three phenotypic parameters, indicating robust generalization of the model in real-world scenarios. By combining the lightweight, high-precision segmentation of MAS-YOLO with the morphological restoration of Pix2PixHD-CA, the pipeline reduces the interference of morphological occlusion in natural habitats. It extracts the organ phenotypic parameters of orchid flowers accurately and reduces the labor intensity and subjective error of manual measurement. These findings can provide technical support and high-quality data for orchid genetic breeding, ecological statistics, and evolutionary biology.
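For readers who want to reproduce the agreement metrics, the sketch below computes R² and RMSE between measured and extracted values. It assumes the common definition R² = 1 - SS_res / SS_tot; the study may instead have used the squared correlation coefficient, which can give a different number. The function names and the toy values are hypothetical and are not data from the study. The only figures taken from the abstract are the two parameter counts in the last line.

```python
import math
from typing import Sequence


def rmse(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between measured and predicted values."""
    squared_errors = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared_errors) / len(measured))


def r_squared(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination, assuming R^2 = 1 - SS_res / SS_tot."""
    mean_measured = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


def relative_change(old: float, new: float) -> float:
    """Signed relative change from old to new, as a fraction of old."""
    return (new - old) / old


# Hypothetical organ lengths in mm, for illustration only.
measured_mm = [10.0, 12.0, 14.0, 16.0]
predicted_mm = [11.0, 12.0, 13.0, 16.0]

toy_r2 = r_squared(measured_mm, predicted_mm)
toy_rmse = rmse(measured_mm, predicted_mm)

# Parameter counts reported in the abstract (in millions): 10.08 -> 8.95.
param_change = relative_change(10.08, 8.95)
```

Under these assumptions, `param_change` is about -0.112, meaning the reported drop from 10.08 M to 8.95 M is a reduction of roughly 11% in parameter count.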