Abstract:
Conventional mass estimation of banana hands and fingers is hampered by their complex, interlocked structure and frequent occlusions. This research aimed to develop a non-destructive, accurate, and rapid method for estimating the mass of entire banana hands and individual fingers, using imaging and computational techniques as an alternative to manual or inadequate automated approaches, thereby facilitating more efficient post-harvest processing and grading in the banana industry. First, morphological and geometric information was acquired to predict mass: registered color and depth (RGB-D) images of banana hands were captured from the convex and concave viewpoints using a controlled laboratory setup. Second, individual fingers were precisely segmented using the Segment Anything Model (SAM), a zero-shot instance segmentation approach that handles occlusions without requiring task-specific training datasets. The depth information was converted into three-dimensional (3D) point clouds, whose computational cost was reduced by voxel grid down-sampling and whose integrity was preserved by statistical outlier removal. Third, comprehensive features were extracted: two-dimensional (2D) morphology and size descriptors (such as pixel area, contour perimeter, and aspect ratio) from the color images, and 3D geometric properties (including principal dimensions, surface area, and convex hull volume) from the processed point clouds. These multi-modal features were generated for the whole banana hands and for each segmented finger. Finally, mass prediction models were developed, with a multiple linear regression (MLR) model as a baseline and five non-linear machine learning algorithms: support vector regression (SVR), k-nearest neighbors (KNN), gradient boosting (GB), random forest (RF), and backpropagation neural network (BPNN).
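The point-cloud preprocessing named above (voxel grid down-sampling followed by statistical outlier removal) can be sketched as follows. This is a minimal NumPy/SciPy illustration with assumed parameter values (voxel size, neighbor count, standard-deviation ratio); libraries such as Open3D provide equivalent built-in routines.

```python
import numpy as np
from scipy.spatial import cKDTree

def voxel_downsample(points, voxel_size=0.005):
    """Replace all points falling in the same voxel with their centroid."""
    keys = np.floor(points / voxel_size).astype(np.int64)
    # group points by voxel index
    _, inverse, counts = np.unique(keys, axis=0,
                                   return_inverse=True, return_counts=True)
    centroids = np.zeros((counts.size, 3))
    np.add.at(centroids, inverse, points)
    return centroids / counts[:, None]

def remove_statistical_outliers(points, nb_neighbors=20, std_ratio=2.0):
    """Drop points whose mean distance to their nearest neighbors exceeds
    the global mean by more than std_ratio standard deviations."""
    tree = cKDTree(points)
    # k+1 neighbors because each point's nearest neighbor is itself
    dists, _ = tree.query(points, k=nb_neighbors + 1)
    mean_d = dists[:, 1:].mean(axis=1)
    threshold = mean_d.mean() + std_ratio * mean_d.std()
    return points[mean_d <= threshold]

# toy example: a dense noisy blob plus two gross outliers
rng = np.random.default_rng(0)
cloud = rng.normal(scale=0.01, size=(5000, 3))
cloud = np.vstack([cloud, [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0]]])
down = voxel_downsample(cloud, voxel_size=0.005)
clean = remove_statistical_outliers(down)
print(cloud.shape[0], down.shape[0], clean.shape[0])
```

The down-sampling step is what makes the later 3D feature extraction tractable; the outlier filter removes isolated depth-noise points before hull and surface computations.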
Model performance was assessed using the coefficient of determination (R²), root mean squared error (RMSE), and mean absolute percentage error (MAPE). Furthermore, recursive feature elimination guided by feature-importance analysis, focusing primarily on the RF model, was carried out to identify the most influential predictors and to construct optimized models. Comparative analysis showed that the RF model outperformed the others, indicating that non-linear approaches are required to capture the relationships between the extracted features and banana mass. For whole-hand mass estimation, the concave view achieved the better performance
(R² = 0.984, RMSE = 77.78 g, and MAPE = 5.37%) with the optimized RF model. The 3D features, particularly surface area and convex hull volume, were the most important for accurate prediction of hand mass. For individual finger mass, the RF model was more accurate for the exposed outer fingers (viewed convexly; optimized RF:
R² = 0.794, RMSE = 13.14 g, MAPE = 6.12%) than for the occluded inner fingers (viewed concavely; optimized RF:
R² = 0.668, RMSE = 17.47 g, MAPE = 9.07%). Interestingly, the 2D features (such as pixel area and contour perimeter) dominated mass prediction for the outer fingers, revealing a differential feature importance that depends on finger position and visibility. Two mass-estimation strategies were also evaluated: deriving an average finger mass (the predicted total hand mass divided by the actual finger count) and directly predicting individual finger mass. Both the derived average finger mass (using the best hand model) and the directly predicted outer finger mass achieved high accuracy (relative error <10% for ~80% of samples). The average mass method was better suited to assessing overall quality, whereas direct prediction offered detailed data for the accessible outer fingers. In terms of computational efficiency, direct finger mass estimation was faster (~7.7 s per hand, including the SAM segmentation step of ~1 s) than average mass estimation on the complex point cloud of the entire hand (~76.6 s per hand), because computing 3D features from multiple simple point clouds is less demanding than from a single, large, intricate one; this is the "divide and conquer" benefit of SAM-based segmentation. In summary, RGB-D imaging and machine learning were integrated and validated for accurate, non-destructive mass determination of banana hands and fingers. The results demonstrate the utility of SAM for complex fruit segmentation in agricultural contexts, identify RF as the best modeling choice among those tested, and quantify the differential contributions of 2D and 3D features across viewpoints. Direct estimation of individual finger mass offers both detailed information and higher computational efficiency, supporting the development of banana grading systems that enhance post-harvest operations.
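As a sketch of the modeling step, the fragment below trains a random forest regressor inside scikit-learn's recursive feature elimination on synthetic stand-in features. The feature names, data-generating relationship, and coefficients are illustrative assumptions, not the paper's dataset or results.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 300
# synthetic stand-ins for the extracted descriptors (names are illustrative)
surface_area = rng.uniform(200, 900, n)
hull_volume = rng.uniform(500, 4000, n)
pixel_area = rng.uniform(1e4, 8e4, n)
noise_feat = rng.normal(size=(n, 3))          # uninformative columns
X = np.column_stack([surface_area, hull_volume, pixel_area, noise_feat])
# assumed (not the paper's) relationship: mass driven by the 3D features
y = 0.8 * surface_area + 0.3 * hull_volume + rng.normal(scale=20, size=n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rf = RandomForestRegressor(n_estimators=200, random_state=0)
# recursively drop the least important feature until 3 remain
selector = RFE(rf, n_features_to_select=3).fit(X_tr, y_tr)
pred = selector.predict(X_te)
r2 = r2_score(y_te, pred)
rmse = mean_squared_error(y_te, pred) ** 0.5
print("kept features:", np.flatnonzero(selector.support_),
      f"R2={r2:.3f} RMSE={rmse:.1f}")
```

On this synthetic data, RFE retains the two informative 3D-style features, mirroring the abstract's finding that surface area and hull volume drive hand-mass prediction.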
However, 3D feature computation from complex point clouds remains the primary bottleneck for real-time industrial applications.
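For reference, the 3D geometric descriptors discussed above (principal dimensions, convex hull surface area, and convex hull volume) can be computed from a point cloud roughly as follows. This is an illustrative sketch; the paper's exact surface-area method is not specified here and may differ.

```python
import numpy as np
from scipy.spatial import ConvexHull

def geometric_features(points):
    """3D descriptors of a point cloud: extents along the principal axes,
    plus convex-hull surface area and volume."""
    centered = points - points.mean(axis=0)
    # principal axes via SVD (PCA): extents along the singular vectors
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    proj = centered @ vt.T
    length, width, height = proj.max(axis=0) - proj.min(axis=0)
    hull = ConvexHull(points)           # hull.area is surface area in 3D
    return {"length": length, "width": width, "height": height,
            "surface_area": hull.area, "volume": hull.volume}

# sanity check on a unit cube: volume 1, surface area 6
cube = np.array([[x, y, z] for x in (0, 1) for y in (0, 1) for z in (0, 1)],
                dtype=float)
feats = geometric_features(cube)
print(feats)
```

Because hull and surface computations scale with cloud size and complexity, running them on several small per-finger clouds can be cheaper than on one large whole-hand cloud, consistent with the timing gap reported above.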