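An Analysis of the Factors in Total Water Consumption Based on Random Forest Regression Algorithm: A Case Study of Guangdong Province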

LI Ning, WANG Lina

Citation: LI Ning, WANG Lina. An Analysis of the Factors in Total Water Consumption Based on Random Forest Regression Algorithm: A Case Study of Guangdong Province[J]. Journal of South China Normal University (Natural Science Edition), 2021, 53(1): 78-84. DOI: 10.6054/j.jscnun.2021012


Funding: National Natural Science Foundation of China (41501021)

Corresponding author: WANG Lina, Email: linawang2004@163.com

  • CLC number: TV213.4

An Analysis of the Factors in Total Water Consumption Based on Random Forest Regression Algorithm: A Case Study of Guangdong Province

  • Abstract: A hierarchical evaluation system is constructed that comprises four factors (population, water resources, technology and economy) and nine elements (total resident population, population density, total water resources, rainfall, water consumption per 10 000 yuan of GDP, water consumption per 10 000 yuan of industrial added value, and the gross products of the primary, secondary and tertiary industries). Using the entropy method and the random forest regression algorithm, the factors influencing total water consumption are analyzed for the 21 prefecture-level cities of Guangdong Province. The results show that: (1) at the element level, total resident population, gross product of the tertiary industry and gross product of the primary industry are the main elements influencing total water consumption in Guangdong Province, while rainfall has the least influence on the total water consumption of its prefecture-level cities; (2) at the factor level, the influence of the four factors on total water consumption, in descending order, is economy, population, water resources and technology; (3) combining the element-level and factor-level analyses, the most influential element within the population, water resources, technology and economy factors is, respectively, total resident population, total water resources, water consumption per 10 000 yuan of industrial added value, and gross product of the tertiary industry.
  • Figure 1. Spatial distribution of total water consumption across the prefecture-level cities of Guangdong Province in 2018

    Figure 2. Relationship between the number of decision trees and the error for the elements influencing total water consumption in Guangdong Province

    Figure 3. Relative importance of the elements influencing total water consumption in Guangdong Province

    Figure 4. Scores of the factors influencing total water consumption in the 21 prefecture-level cities of Guangdong Province

    Figure 5. Relative importance of the factors influencing total water consumption in Guangdong Province

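    Table 1. Factors influencing total water consumption

    | Factor layer | Element layer (symbol, unit) | Description |
    | --- | --- | --- |
    | Population (A1) | Total resident population (A11, 10^4 persons) | Positive element; the total resident population of a region during the statistical period. |
    | Population (A1) | Population density (A12, persons·km^-2) | Positive element; the population per unit area of a region during the statistical period. |
    | Water resources (A2) | Total water resources (A21, 10^8 m^3) | Positive element; the total water resources of a region during the statistical period. |
    | Water resources (A2) | Rainfall (A22, mm) | Positive element; the amount of precipitation in a region during the statistical period. |
    | Technology (A3) | Water consumption per 10 000 yuan of GDP (A31, m^3 per 10^4 yuan) | Negative element; the water used to produce the same amount of GDP. |
    | Technology (A3) | Water consumption per 10 000 yuan of industrial added value (A32, m^3 per 10^4 yuan) | Negative element; the water used to produce the same industrial added value. |
    | Economy (A4) | Gross product of the primary industry (A41, 10^4 yuan) | Positive element; the gross product of the primary industry of a region during the statistical period. |
    | Economy (A4) | Gross product of the secondary industry (A42, 10^4 yuan) | Positive element; the gross product of the secondary industry of a region during the statistical period. |
    | Economy (A4) | Gross product of the tertiary industry (A43, 10^4 yuan) | Positive element; the gross product of the tertiary industry of a region during the statistical period. |

    Note: A31 and A32 are negative elements, i.e., the larger the element value, the smaller the factor-layer value; all other elements are positive, i.e., the larger the element value, the larger the factor-layer value. The direction of each element is marked so that the entropy method can quantify each factor layer from the values of its elements.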
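    The positive/negative labelling in Table 1 matters at the normalization step of the entropy method. The sketch below shows one common way to handle it: direction-aware min-max scaling followed by the Shannon entropy of each element. Both the min-max form and the function names `normalize` and `entropy` are assumptions made for illustration; the page does not state which normalization the authors used, and the city values are hypothetical, not data from the paper.

```python
import math

def normalize(values, positive=True):
    """Min-max scale to [0, 1]; negative elements are reversed so that
    a larger raw value yields a smaller scaled value."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    if positive:
        return [(v - lo) / (hi - lo) for v in values]
    return [(hi - v) / (hi - lo) for v in values]

def entropy(scaled):
    """Shannon entropy of one element's column, divided by ln(n) so the
    result lies in [0, 1]."""
    n = len(scaled)
    total = sum(scaled)
    if total == 0:
        return 1.0  # no variation between cities, so no information
    props = [s / total for s in scaled]
    # 0 * ln(0) is taken as 0 by convention, so zero proportions are skipped.
    return -sum(p * math.log(p) for p in props if p > 0) / math.log(n)

# Hypothetical raw values for four cities (illustration only).
population = [1200.0, 900.0, 500.0, 200.0]   # a positive element such as A11
water_per_gdp = [18.0, 8.0, 30.0, 75.0]      # a negative element such as A31

pop_scaled = normalize(population, positive=True)
wpg_scaled = normalize(water_per_gdp, positive=False)
print(entropy(pop_scaled), entropy(wpg_scaled))
```

    An element whose values are spread unevenly across cities gets an entropy below 1, and it is this shortfall from 1 that the entropy method turns into a weight.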

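    Table 2. Entropy value and weight of each element

    | Element | A11 | A12 | A21 | A22 | A31 | A32 | A41 | A42 | A43 |
    | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
    | Factor | Population (A1) | Population (A1) | Water resources (A2) | Water resources (A2) | Technology (A3) | Technology (A3) | Economy (A4) | Economy (A4) | Economy (A4) |
    | Entropy | 0.8717 | 0.7684 | 0.8723 | 0.9210 | 0.9540 | 0.9552 | 0.8941 | 0.7381 | 0.6570 |
    | Weight/% | 35.64 | 64.36 | 61.78 | 38.22 | 50.69 | 49.31 | 14.89 | 36.86 | 48.26 |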
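    The weights in Table 2 sum to 100% within each factor, which is consistent with the usual entropy-weight rule w_j = (1 - e_j) / Σ_k (1 - e_k) applied separately to each factor's elements. The sketch below recomputes the weights from the tabulated entropy values under that assumption. The helper `factor_score`, a weighted sum of normalized element values, is a hypothetical illustration of how a factor-layer score could be formed; the page does not give the authors' scoring formula.

```python
# Entropy values copied from Table 2, grouped by factor.
ENTROPY = {
    "A1": {"A11": 0.8717, "A12": 0.7684},
    "A2": {"A21": 0.8723, "A22": 0.9210},
    "A3": {"A31": 0.9540, "A32": 0.9552},
    "A4": {"A41": 0.8941, "A42": 0.7381, "A43": 0.6570},
}

def weights_within_factor(entropies):
    """w_j = (1 - e_j) / sum_k (1 - e_k), in percent, over one factor's elements."""
    divergence = {name: 1.0 - e for name, e in entropies.items()}
    total = sum(divergence.values())
    return {name: 100.0 * d / total for name, d in divergence.items()}

WEIGHTS = {factor: weights_within_factor(e) for factor, e in ENTROPY.items()}

def factor_score(scaled_elements, weights):
    """Weighted sum of normalized element values for one city and one factor
    (assumed form, for illustration)."""
    return sum(weights[name] / 100.0 * scaled_elements[name] for name in weights)

for factor, w in WEIGHTS.items():
    print(factor, {name: round(value, 2) for name, value in w.items()})
```

    The recomputed weights agree with Table 2 to within about 0.05 percentage points; the small differences arise because the tabulated entropy values are rounded to four decimals.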
Publication history
  • Received:  2020-09-23
  • Published online:  2021-03-23
  • Issue date:  2021-02-24
