Phenotypic parameter extraction method of orchid floral organs based on MAS-YOLO segmentation and Pix2PixHD-CA generation

杨意; 朱文鼎; 张观康; 陈章宇文; 黄灿增; 郑洁; 李欣; 杨凤玺; 辜松

doi:10.11975/j.issn.1002-6819.202508047

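Abstract

Summary: Conventional plant phenotyping methods struggle with orchids in their natural growth state: organ segmentation accuracy is low in complex environments, and occluded or bent organs are difficult to restore to their true shape with high fidelity. To address these challenges, this study proposes an orchid organ extraction and generative restoration method based on MAS-YOLO and Pix2PixHD-CA. First, the MAS-YOLO model is built on YOLO11s-seg by integrating MobileNetv4, an adaptive spatial fusion (ASF) module, and a multi-level feature fusion (spatial dynamic integration, SDI) module, so that orchid organs can be segmented accurately against complex backgrounds. Second, the Pix2PixHD-CA model is built by introducing coordinate attention (CA) into the Pix2PixHD generative model, in order to reduce local structural distortion of the floral organs and to learn a mapping to the complete flattened form, further improving the accuracy of phenotypic parameter extraction. Experimental results show that, compared with the baseline model, the F1 score of MAS-YOLO rises to 0.962 and the parameter count falls to 8.95 M. Using flattened orchid organ images generated by Pix2PixHD-CA, the extracted length, width, and area have R² values of 0.928, 0.895, and 0.937 against the true values, with RMSE of 1.45 mm, 0.25 mm, and 14.87 mm², respectively. Compared with the model before improvement, R² increases by 8.79%, 1.59%, and 3.65%, and RMSE decreases by 3.97%, 16.66%, and 23.34%, respectively. In field tests, R² for every parameter remained above 0.877. The method can accurately extract phenotypic parameters of orchid organs in natural environments and offers a methodological reference for phenotyping plants with complex morphology.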

Abstract: In complex natural environments, orchid flowers often bend, tilt, and occlude one another at random. These irregular shapes and postures cause severe occlusion and overlap of the floral organs (sepals and petals), making it difficult to extract phenotypic parameters such as length, width, and area directly and accurately. Manual measurement cannot meet the demands of large-scale production because it is labor-intensive, subjective, and liable to damage fragile specimens, and conventional computer vision methods struggle to extract phenotypes precisely under such unconstrained conditions. This study proposes a systematic deep learning framework that combines the MAS-YOLO instance segmentation model with the Pix2PixHD-CA generative adversarial network. The framework has two stages: accurate segmentation of occluded organs and morphological restoration of incomplete organs. In the instance segmentation stage, the MAS-YOLO model was built on the YOLO11s-seg architecture. The original backbone was replaced with MobileNetv4 to allow lightweight deployment on edge devices with limited computing resources, and its Universal Inverted Bottleneck (UIB) blocks reduce computational redundancy while retaining strong feature extraction. An Adaptive Spatial Fusion (ASF) framework was integrated to weight and fuse features from different scales, minimizing the loss of small-target information. In addition, a Spatial Dynamic Integration (SDI) module was introduced to sharpen the difference in feature response between the orchid organs and the complex background. The segmentation model was trained and validated on a dataset of 520 natural images containing 3865 annotated instances.
In the parameter extraction stage, the Pix2PixHD-CA generation model was developed to correct the morphological deviation between the segmented, occluded organs and their true flattened states. A Coordinate Attention (CA) mechanism was embedded into the generator trunk of the Pix2PixHD network. Unlike standard channel attention, CA decomposes channel attention into two parallel 1D feature encodings, so that the network attends jointly to the channel and spatial coordinate dimensions. This captures long-range dependencies while preserving the precise positional information needed for shape reconstruction. The mapping from the "deviated organ" to the "complete flattened organ" was learned from 1500 aligned image pairs, and the generated images retained high fidelity in texture and edge trends. The results showed strong performance in both segmentation and parameter extraction. In segmentation, the F1 score of the MAS-YOLO model rose from 0.840 (baseline) to 0.962, indicating accurate identification of occluded and bent organs, while the parameter count fell from 10.08 M to 8.95 M, indicating a favorable balance between segmentation accuracy and computational efficiency. After generative restoration, the phenotypic parameters extracted from the flattened images produced by Pix2PixHD-CA were compared with the measured values. The coefficients of determination (R²) for organ length, width, and area were 0.928, 0.895, and 0.937, and the root mean square errors (RMSE) were 1.45 mm, 0.25 mm, and 14.87 mm², respectively. Compared with the original Pix2PixHD model without the attention mechanism, the R² values for length, width, and area increased by 8.79%, 1.59%, and 3.65%, and the RMSE values decreased by 3.97%, 16.66%, and 23.34%, respectively.
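The coordinate attention idea described above (two parallel 1D encodings that gate each position by its row and its column) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the weight matrices `w_reduce`, `w_h`, and `w_w` are hypothetical stand-ins for learned 1x1 convolutions, batch normalization is omitted, and ReLU is assumed as the intermediate nonlinearity for simplicity.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def pointwise(weights, vectors):
    """Apply a 1x1 convolution: mix channels independently at each position.

    weights is an (out_ch x in_ch) matrix, vectors is (in_ch x length);
    the result is (out_ch x length).
    """
    length = len(vectors[0])
    return [
        [sum(w * vectors[i][p] for i, w in enumerate(row)) for p in range(length)]
        for row in weights
    ]


def coordinate_attention(x, w_reduce, w_h, w_w):
    """Reweight a C x H x W nested list with row-wise and column-wise gates."""
    C, H, W = len(x), len(x[0]), len(x[0][0])

    # Two parallel 1D encodings: average over the width (one value per row)
    # and average over the height (one value per column).
    pool_h = [[sum(x[c][i]) / W for i in range(H)] for c in range(C)]
    pool_w = [[sum(x[c][i][j] for i in range(H)) / H for j in range(W)]
              for c in range(C)]

    # Concatenate both encodings along the spatial axis and apply a shared
    # channel-reducing transform followed by a nonlinearity (ReLU assumed).
    joint = [pool_h[c] + pool_w[c] for c in range(C)]
    mid = [[max(0.0, v) for v in row] for row in pointwise(w_reduce, joint)]

    # Split back into the row part and the column part, then map each to
    # C channels and squash to (0, 1) to obtain the two attention gates.
    mid_h = [row[:H] for row in mid]
    mid_w = [row[H:] for row in mid]
    gate_h = [[sigmoid(v) for v in row] for row in pointwise(w_h, mid_h)]
    gate_w = [[sigmoid(v) for v in row] for row in pointwise(w_w, mid_w)]

    # Each position is scaled by the gate of its row and the gate of its column.
    return [
        [[x[c][i][j] * gate_h[c][i] * gate_w[c][j] for j in range(W)]
         for i in range(H)]
        for c in range(C)
    ]


# Toy feature map with 2 channels, 2 rows, 3 columns (illustrative values only).
x = [
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    [[-1.0, 0.0, 1.0], [2.0, -2.0, 0.5]],
]
w_reduce = [[0.5, -0.25]]   # hypothetical: 2 channels -> 1 channel
w_h = [[1.0], [-1.0]]       # hypothetical: 1 channel -> 2 channels (row gate)
w_w = [[0.5], [0.3]]        # hypothetical: 1 channel -> 2 channels (column gate)
y = coordinate_attention(x, w_reduce, w_h, w_w)
```

Because the gates are indexed by row and by column separately, positional information along each axis survives the pooling, which is the property the abstract attributes to CA for shape reconstruction.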
The CA mechanism improved the learned mapping and reduced the systematic errors caused by shape distortion. In a field test on independent samples, the R² values remained above 0.877 for all three phenotypic parameters, indicating that the model generalizes robustly to real-world scenarios. By combining the lightweight, high-precision segmentation of MAS-YOLO with the morphological restoration of Pix2PixHD-CA, the pipeline reduces the interference of occlusion and deformation in natural habitats. The phenotypic parameters of orchid floral organs were extracted accurately, reducing the labor intensity and subjective error of manual measurement. The findings can provide technical support and high-quality data for orchid genetic breeding, ecological statistics, and evolutionary biology.
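For readers who want to reproduce the kind of agreement statistics reported above, the following is a minimal sketch of R² and RMSE between measured and extracted values. It assumes R² is defined as 1 - SS_res/SS_tot; the paper may instead report the squared correlation of a linear fit. The sample lengths are made-up illustrative numbers, not data from the study.

```python
import math


def rmse(measured, predicted):
    """Root mean square error, in the same unit as the inputs."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)


def r_squared(measured, predicted):
    """Coefficient of determination, assuming the 1 - SS_res / SS_tot definition."""
    mean_m = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


# Made-up organ lengths in mm (illustrative only, not from the paper).
measured_mm = [30.1, 28.4, 35.0, 31.7, 26.9]
extracted_mm = [29.5, 28.9, 33.8, 32.4, 27.6]

print(f"R2   = {r_squared(measured_mm, extracted_mm):.3f}")
print(f"RMSE = {rmse(measured_mm, extracted_mm):.2f} mm")
```

The same two functions apply unchanged to width (mm) and area (mm²), since RMSE simply inherits the unit of its inputs.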
