Abstract:
Plant height is one of the most important agronomic traits, reflecting the growth status, health, and overall vigor of crops. Accurate and efficient measurement of plant height is essential for crop monitoring and yield estimation in precision agriculture. Conventional manual measurement is time-consuming, labor-intensive, and prone to human error. Recent advances in computer vision and deep learning offer a route to non-destructive plant phenotyping. Advanced feature extraction, attention mechanisms, and depth-to-height conversion can be integrated to estimate plant height with high precision. In this study, a monocular height regression method (MHRM) was proposed to estimate the height of winter wheat from single-camera images. RGB images captured under field conditions served as the input. A crop target detection module was first applied to locate the relevant plant regions. In this way, the MHRM effectively reduced interference from soil, background vegetation, and non-crop objects. The local crop regions were then fed into a refined feature depth network, which consisted of a feature extraction module, a feature refinement module, and a depth prediction module. The feature extraction module combined convolutional neural networks with channel attention mechanisms to enhance the representational capacity of the plant features. The feature refinement module further improved feature quality using multi-scale convolutions, depthwise separable convolutions, and efficient channel attention mechanisms, thereby making depth estimation more robust to varying illumination and backgrounds. Finally, the depth prediction module generated pixel-level depth maps. Subsequently, the depth maps were converted into real-world plant heights.
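The efficient channel attention step named in the feature refinement module can be illustrated with a minimal sketch. It assumes the common design in which each channel is globally average-pooled, a small 1D convolution is run across the channel axis, and a sigmoid gate rescales each channel. The function name, the plain-list data layout, and the fixed kernel weights are hypothetical choices for illustration only; in a trained network the kernel would be learned, and this is not necessarily the exact configuration used in the MHRM.

```python
import math


def efficient_channel_attention(feature_map, kernel):
    """Rescale channels with a lightweight channel-attention gate.

    feature_map: list of C channels, each a list of H rows of W floats.
    kernel: odd-length list of 1D convolution weights (hypothetical values;
            a real network would learn them during training).
    Returns (scaled_feature_map, gates).
    """
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd")
    num_channels = len(feature_map)

    # Squeeze: global average pooling gives one descriptor per channel.
    pooled = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]

    # Local cross-channel interaction: 1D convolution over the channel
    # axis with zero padding, followed by a sigmoid gate.
    half = len(kernel) // 2
    gates = []
    for c in range(num_channels):
        acc = 0.0
        for offset, weight in enumerate(kernel):
            neighbor = c + offset - half
            if 0 <= neighbor < num_channels:
                acc += weight * pooled[neighbor]
        gates.append(1.0 / (1.0 + math.exp(-acc)))

    # Excite: multiply every value in a channel by that channel's gate.
    scaled = [
        [[value * gates[c] for value in row] for row in channel]
        for c, channel in enumerate(feature_map)
    ]
    return scaled, gates


# Toy input: three 2x2 channels with increasing activation.
toy = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[2.0, 2.0], [2.0, 2.0]],
    [[4.0, 4.0], [4.0, 4.0]],
]
scaled_toy, toy_gates = efficient_channel_attention(toy, [0.25, 0.5, 0.25])
print(toy_gates)
```

Because the gate for each channel depends only on a few neighboring channel descriptors, the mechanism adds very few parameters, which is one plausible reason to pair it with depthwise separable convolutions in a lightweight refinement stage.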
Furthermore, joint supervision was employed during training, combining a pixel-level reconstruction loss with a scale-consistency loss. This dual-loss configuration improved the precision of depth estimation, so that the extracted plant heights were consistent with real-world measurements. A field experiment was conducted at the Shandong Taian Agricultural Meteorological Experimental Station to evaluate performance. Winter wheat images were collected under natural lighting in field conditions. Four representative monocular depth estimation models were selected as baselines: From Big to Small: Multi-Scale Local Planar Guidance (BTS), Fully Convolutional Residual Network (FCRN), Deep Ordinal Regression Network (DORN), and Dense Prediction Transformer (DPT). Quantitative results indicated that the refined feature depth network outperformed all baseline models, with high robustness and applicability. Specifically, it achieved a lower root mean square error (2.759), logarithmic root mean square error (0.157), relative error (0.152), and squared relative error (0.907) than every baseline. Subsequently, the estimated depth maps were transformed into plant height measurements and compared with manually collected ground-truth data. The method achieved an extraction accuracy of 98.74%, outperforming BTS (92.68%), FCRN (97.17%), DORN (97.44%), and DPT (98.40%). The results demonstrated that the MHRM reliably captured crop height information, even under complex field conditions. In conclusion, monocular depth estimation with attention-enhanced feature extraction can provide an accurate, efficient, and non-destructive solution for winter wheat height measurement. The approach also shows promising potential for broader applications in precision agriculture, crop phenotyping, and field monitoring.
The method can serve as a practical tool to support decision-making for high crop productivity in modern agriculture. The findings also highlight the value of attention mechanisms and multi-scale feature refinement for depth prediction in agricultural scenarios.
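The four error measures reported above can be made concrete with a short sketch. It assumes the definitions conventionally used in monocular depth estimation benchmarks, and it assumes that "relative error" means the mean absolute relative error. The abstract does not spell out the formulas, so the function name and these definitions are illustrative assumptions rather than the exact evaluation code behind the reported numbers.

```python
import math


def depth_error_metrics(predicted, ground_truth):
    """Compute four common depth-estimation errors over paired samples.

    predicted, ground_truth: equal-length sequences of positive depths.
    Definitions are assumed (standard benchmark conventions):
      rmse     = sqrt(mean((p - g)^2))
      rmse_log = sqrt(mean((ln p - ln g)^2))
      abs_rel  = mean(|p - g| / g)
      sq_rel   = mean((p - g)^2 / g)
    """
    if len(predicted) != len(ground_truth) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(predicted)
    sum_sq = sum_log_sq = sum_abs_rel = sum_sq_rel = 0.0
    for p, g in zip(predicted, ground_truth):
        if p <= 0 or g <= 0:
            raise ValueError("depth values must be positive")
        diff = p - g
        sum_sq += diff * diff
        sum_log_sq += (math.log(p) - math.log(g)) ** 2
        sum_abs_rel += abs(diff) / g
        sum_sq_rel += diff * diff / g

    return {
        "rmse": math.sqrt(sum_sq / n),
        "rmse_log": math.sqrt(sum_log_sq / n),
        "abs_rel": sum_abs_rel / n,
        "sq_rel": sum_sq_rel / n,
    }


# Hypothetical depth samples, for illustration only.
example = depth_error_metrics([2.0, 4.0, 5.5], [1.8, 4.0, 6.0])
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

All four measures are zero for a perfect prediction and grow with the error, so lower values indicate a better depth map. The two relative measures divide by the ground-truth depth, which makes errors at short range count more than equal errors at long range.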