
Prediction of the spatial distribution of soil organic matter based on two-point machine learning method

Wang Yuxue, Yang Ke, Gao Bingbo, Feng Aiping, Tian Juan, Jiang Chuanliang, Yang Jianyu

Citation: Wang Yuxue, Yang Ke, Gao Bingbo, Feng Aiping, Tian Juan, Jiang Chuanliang, Yang Jianyu. Prediction of the spatial distribution of soil organic matter based on two-point machine learning method[J]. Transactions of the Chinese Society of Agricultural Engineering, 2022, 38(12): 65-73. DOI: 10.11975/j.issn.1002-6819.2022.12.008

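Funding: National Key Research and Development Program of China (2021YFE0102300); Survey Project of the Black Soil Surface Substrate Layer in the Hailun Area of the Songnen Plain (DD20211589)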

Prediction of the spatial distribution of soil organic matter based on two-point machine learning method

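  • Abstract: Accurately predicting the spatial distribution of soil organic matter (SOM) is important for precision agriculture, farmland quality improvement, ecological and environmental protection, and carbon sequestration and emission reduction. This study examined whether the two-point machine learning (TPML) method can improve the prediction of SOM spatial distribution. Hailun City, Heilongjiang Province, was taken as the study area, with climate, topography and landform, socio-economic factors, and spatial location as auxiliary variables. TPML makes full use of spatial location information and attribute similarity to handle the heterogeneity of SOM spatial distribution and of its relationship with the auxiliary variables, thereby improving prediction accuracy. Random forest, random-forest-based regression kriging, inverse distance weighting, and ordinary kriging (OK) served as comparison methods, and mean absolute error (MAE), root mean square error (RMSE), the correlation coefficient between predicted and true values (r), and the coefficient of determination (R²) served as evaluation metrics in multiple comparative experiments under different sample sizes. The results showed that: 1) SOM content in the study area ranged from 1.775 to 7.188 g/kg, with a mean of 3.179 g/kg; its spatial distribution was uneven, higher in the east and lower in the west. 2) Under all sample sizes, TPML achieved the highest prediction accuracy of all models, with the smallest MAE (0.088-0.097 g/kg) and RMSE (0.116-0.139 g/kg) and the highest r (0.992-0.996) and R² (0.971-0.985). 3) The standard deviation of the prediction error (theoretical error) showed a spatial pattern similar to that of the actual error, indicating that TPML can provide reasonable uncertainty estimates for its predictions. In summary, TPML improves prediction accuracy by exploiting spatial autocorrelation and attribute similarity simultaneously, and it is suited to predicting resource and environmental variables that show some spatial autocorrelation and have auxiliary data available.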
    Abstract: Accurate prediction of the spatial distribution of Soil Organic Matter (SOM) is important for precision agriculture, farmland quality improvement, ecological and environmental protection, and soil carbon sequestration. However, prediction accuracy is strongly affected by the heterogeneity of the SOM spatial distribution and of its relationship with auxiliary variables. Taking Hailun City, Heilongjiang Province (126°14′-127°45′ E, 46°58′-47°52′ N), in northeast China as the study area, this study examined whether a Two-Point Machine Learning (TPML) method can predict the SOM spatial distribution more accurately, using climate, topography, socio-economic factors, and spatial location as auxiliary variables. Spatial location and attribute similarity were used together to handle the heterogeneity of the SOM spatial distribution and the heterogeneity of its relationship with the auxiliary variables. TPML was compared against Random Forest (RF), RF regression kriging, inverse distance weighting, and Ordinary Kriging (OK). All models were evaluated with samples of different sizes using the Mean Absolute Error (MAE), Root Mean Square Error (RMSE), the correlation coefficient between predicted and true values (r), and the coefficient of determination (R²). The results show that: 1) SOM content in the study area ranged from 1.775 to 7.188 g/kg, with a mean of 3.179 g/kg. Its spatial distribution was uneven, higher in the east and lower in the west. SOM content was positively correlated with the normalized difference vegetation index (NDVI), elevation, and mean annual precipitation, negatively correlated with gross domestic product, mean annual air temperature, and topographic wetness index, and significantly related to land use, landform, vegetation, and soil type. 2) TPML achieved the highest prediction accuracy under all sample sizes, with the lowest MAE (0.088-0.097 g/kg) and RMSE (0.116-0.139 g/kg) and the highest r (0.992-0.996) and R² (0.971-0.985). Compared with the widely used OK, TPML reduced MAE and RMSE by more than 0.7 g/kg and raised r and R² by more than 0.2 and 0.9, respectively. 3) The standard deviation of the prediction errors (theoretical errors) showed a spatial pattern similar to that of the actual errors, indicating that TPML provides reasonable uncertainty estimates for its predictions. In summary, TPML improves spatial prediction accuracy by exploiting spatial autocorrelation and attribute similarity at the same time, and it is suited to predicting resource and environmental variables that show some spatial autocorrelation and have auxiliary data available.
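The four evaluation metrics named in the abstract can be computed as in the minimal sketch below. It is not the authors' code: the function name `evaluate` and the sample values are hypothetical, and R² is assumed to be defined as 1 − SS_res/SS_tot (the abstract does not state which definition was used).

```python
import math


def evaluate(y_true, y_pred):
    """Return MAE, RMSE, Pearson r, and R^2 for paired true/predicted values.

    Assumption: R^2 = 1 - SS_res / SS_tot. The source does not give its formula.
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")

    errors = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)           # residual sum of squares
    mae = sum(abs(e) for e in errors) / n         # mean absolute error
    rmse = math.sqrt(ss_res / n)                  # root mean square error

    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    ss_pred = sum((p - mean_p) ** 2 for p in y_pred)
    if ss_tot == 0 or ss_pred == 0:
        raise ValueError("r and R^2 are undefined for constant inputs")

    # Pearson correlation between true and predicted values
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    r = cov / math.sqrt(ss_tot * ss_pred)

    r2 = 1.0 - ss_res / ss_tot                    # coefficient of determination
    return {"MAE": mae, "RMSE": rmse, "r": r, "R2": r2}


# Hypothetical SOM values (same unit as the observations), for illustration only.
observed = [2.0, 3.0, 4.0, 5.0]
predicted = [2.1, 2.9, 4.2, 4.8]
metrics = evaluate(observed, predicted)
print(metrics)
```

Under this definition R² can be lower than r², because it penalizes bias as well as scatter, which is why the two are reported separately.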
Publication history
  • Received: 2022-04-05
  • Revised: 2022-05-21
  • Published: 2022-06-29
