FOSS4G-Asia 2024

Venkatesh Raghavan


Sessions

12-16, 16:30, 15 min
Comparative Evaluation of Machine Learning Models for Zoning Slope Failure Susceptibility: A Case Study of Yen Bai Province, Vietnam
Tran Tung Lam, Tatsuya Nemoto, Venkatesh Raghavan, Xuan Quang Truong

In Yen Bai Province in northern Vietnam, the Mu Cang Chai (MCC) and Van Yen (VY) districts are highly susceptible to slope failure due to rugged terrain, high rainfall and anthropogenic activities. In this research, MCC was used as the area for training and testing the machine learning (ML) models, while VY, which has similar topographic and geological conditions, was used for model validation.

The methodology treats slope failure prediction as a binary classification task (landslide/no-landslide). The MCC dataset is balanced, with 286 landslide and 286 non-landslide points, and covers 16 contributing factors (topographic, geologic, hydrologic, anthropogenic and vegetation factors) derived from open data sources, existing databases and previous research on the area. Principal Component Analysis (PCA) and Pearson correlation coefficients are used to refine the dataset: by evaluating correlated factors and removing the least important ones, the size of the training dataset can be reduced while maintaining the performance of the ML models. Four ML models are trained and evaluated, with hyperparameters tuned for each: Random Forest (RF), Support Vector Machine (SVM), Logistic Regression (LR) and Extreme Gradient Boosting (XGBoost). Model performance is assessed using confusion matrices, accuracy scores, receiver operating characteristic (ROC) curves and the area under the ROC curve (AUC).
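The correlation-based pruning of factors can be illustrated with a minimal sketch in plain Python. It assumes that each factor is a list of values sampled at the same landslide/non-landslide points, that a |r| threshold of 0.7 marks a pair as strongly correlated, and that an importance ranking for the factors is supplied from elsewhere (for example, PCA loadings or a model's feature importances). The threshold, factor names, importance scores and helper names are hypothetical and not taken from the study, and the PCA step itself is not shown.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        # A constant factor has undefined correlation; treat it as uncorrelated.
        return 0.0
    return cov / (sx * sy)


def prune_correlated(factors, importance, threshold=0.7):
    """Greedily drop the less important factor of each pair with |r| >= threshold.

    factors:    dict of factor name -> list of values (one per sample point)
    importance: dict of factor name -> score, higher = more important
                (hypothetical ranking supplied by the caller)
    Returns the kept factor names, most important first.
    """
    # Sort so that in every pair (a, b) below, a is at least as important as b.
    names = sorted(factors, key=lambda name: importance[name], reverse=True)
    dropped = set()
    for a, b in combinations(names, 2):
        if a in dropped or b in dropped:
            continue
        if abs(pearson(factors[a], factors[b])) >= threshold:
            dropped.add(b)  # keep the more important factor of the pair
    return [name for name in names if name not in dropped]


if __name__ == "__main__":
    # Toy values at five sample points (hypothetical, not study data).
    factors = {
        "slope": [10, 20, 30, 40, 50],
        "elevation": [100, 210, 290, 405, 500],  # nearly collinear with slope
        "ndvi": [0.5, 0.1, 0.4, 0.2, 0.3],
    }
    importance = {"slope": 0.9, "elevation": 0.4, "ndvi": 0.6}
    print(prune_correlated(factors, importance))  # ['slope', 'ndvi']
```

Dropping the lower-ranked member of each correlated pair, rather than an arbitrary one, is one common design choice; with a different importance ranking a different subset of factors would survive.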

The results show that the models perform effectively in MCC, with an average accuracy of 0.74 across all models. The models, trained with tuned hyperparameters on the MCC data, were then validated on the VY dataset, which uses the same 16 factors and contains 308 landslide/non-landslide points. RF and XGBoost achieve the highest accuracy in both the training and testing area (MCC) and the validation area (VY), with XGBoost scoring slightly higher (accuracy 0.83) than RF (0.80).
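The evaluation measures named in the methodology can be computed on held-out validation points (VY in this study) once a trained model has produced predictions. The sketch below is a minimal, dependency-free illustration in plain Python. It assumes labels coded as 1 = landslide and 0 = non-landslide, model outputs given as landslide probabilities, and a 0.5 decision threshold. The function names and toy values are hypothetical and are not results from the study; in practice, scikit-learn's `confusion_matrix`, `accuracy_score` and `roc_auc_score` would usually be used instead.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tn, fp, fn, tp) for labels 0 = no landslide, 1 = landslide."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def accuracy(y_true, y_pred):
    """Fraction of points whose predicted class matches the observed class."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred)
    return (tp + tn) / (tn + fp + fn + tp)


def roc_auc(y_true, scores):
    """AUC as the probability that a random landslide point scores higher
    than a random non-landslide point (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy validation points (hypothetical, not study data).
    y_true = [1, 1, 1, 0, 0, 0]
    scores = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]  # predicted landslide probability
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold
    print(confusion_matrix(y_true, y_pred))    # (2, 1, 1, 2)
    print(round(accuracy(y_true, y_pred), 3))  # 0.667
    print(round(roc_auc(y_true, scores), 3))   # 0.889
```

Accuracy depends on the chosen decision threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two are usually reported together. The pairwise AUC computation is O(n²), which is acceptable for a few hundred points; larger prediction grids call for a rank-based formula.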

The XGBoost model produces good results and could be further optimized to achieve even better zonation in future studies. The machine learning workflow can be applied to other areas that are prone to slope failure. Additional geologic and weathering factors could be included in the analysis to further improve the model.

FOSS4G-Asia 2024 - Abstracts - General Track
Auditorium Hall 1