TY - JOUR
T1 - Benchmarking the performance and uncertainty of machine learning models in estimating scour depth at sluice outlets
AU - Le, Xuan Hien
AU - Hien, Le Thi Thu
AU - Ho, Hung Viet
AU - Lee, Giha
N1 - Publisher Copyright:
© 2024 The Authors.
PY - 2024/7/1
Y1 - 2024/7/1
N2 - This study investigates the performance of six machine learning (ML) models – Random Forest (RF), Adaptive Boosting (ADA), CatBoost (CAT), Support Vector Machine (SVM), Lasso Regression (LAS), and Artificial Neural Network (ANN) – against traditional empirical formulas for estimating maximum scour depth after sluice gates. Our findings indicate that ML models generally outperform empirical formulas, with correlation coefficients (CORR) ranging from 0.882 to 0.944 for ML models compared with 0.835–0.847 for empirical methods. Notably, ANN exhibited the highest performance, followed closely by CAT, with a CORR of 0.936. RF, ADA, and SVM performed competitive metrics around 0.928. Variable importance assessments highlighted the dimensionless densimetric Froude number (Fd) as significantly influential, particularly in RF, CAT, and LAS models. Furthermore, SHAP value analysis provided insights into each predictor’s impact on model outputs. Uncertainty assessment through Monte Carlo (MC) and Bootstrap (BS) methods, with 1,000 iterations, indicated ML’s capability to produce reliable uncertainty maps. ANN leads in performance with higher mean values and lower standard deviations, followed by CAT. MC results trend towards optimistic predictions compared with BS, as reflected in median values and interquartile ranges. This analysis underscores the efficacy of ML models in providing precise and reliable scour depth predictions.
AB - This study investigates the performance of six machine learning (ML) models – Random Forest (RF), Adaptive Boosting (ADA), CatBoost (CAT), Support Vector Machine (SVM), Lasso Regression (LAS), and Artificial Neural Network (ANN) – against traditional empirical formulas for estimating maximum scour depth after sluice gates. Our findings indicate that ML models generally outperform empirical formulas, with correlation coefficients (CORR) ranging from 0.882 to 0.944 for ML models compared with 0.835–0.847 for empirical methods. Notably, ANN exhibited the highest performance, followed closely by CAT, with a CORR of 0.936. RF, ADA, and SVM performed competitive metrics around 0.928. Variable importance assessments highlighted the dimensionless densimetric Froude number (Fd) as significantly influential, particularly in RF, CAT, and LAS models. Furthermore, SHAP value analysis provided insights into each predictor’s impact on model outputs. Uncertainty assessment through Monte Carlo (MC) and Bootstrap (BS) methods, with 1,000 iterations, indicated ML’s capability to produce reliable uncertainty maps. ANN leads in performance with higher mean values and lower standard deviations, followed by CAT. MC results trend towards optimistic predictions compared with BS, as reflected in median values and interquartile ranges. This analysis underscores the efficacy of ML models in providing precise and reliable scour depth predictions.
KW - Monte Carlo simulation
KW - SHAP values
KW - machine learning
KW - scour depth estimation
KW - sluice gate
KW - uncertainty quantification
UR - http://www.scopus.com/inward/record.url?scp=85199620044&partnerID=8YFLogxK
U2 - 10.2166/hydro.2024.297
DO - 10.2166/hydro.2024.297
M3 - Article
AN - SCOPUS:85199620044
SN - 1464-7141
VL - 26
SP - 1572
EP - 1588
JO - Journal of Hydroinformatics
JF - Journal of Hydroinformatics
IS - 7
ER -