Assessing the effectiveness of alternative landslide partitioning in machine learning methods for landslide prediction in the complex Himalayan terrain

Maria Teresa Brunetti
2022

Abstract

Several devastating landslides have occurred in the NW Himalayas, prompting researchers to improve landslide susceptibility modelling (LSM) methodologies. This research analyzes the effectiveness of alternative landslide partitioning techniques on LSM in the landslide-prone district of Muzaffarabad, Pakistan. We developed an inventory of 961 landslides and divided it in the traditional way into training (672; 70%) and testing (289; 30%) samples. The training samples (672) were processed with the Average Nearest Neighbour Index (ANNI) method to estimate the spatial pattern of the landslides. The resulting ANNI ratio of 0.672 confirms that the landslide distribution is clustered in the complex Himalayan terrain of Muzaffarabad. Of the 672 training landslides, the majority (529; 79%) show clustered behaviour, while 143 (21%) show random behaviour. To evaluate the effectiveness of clustered landslide samples in prediction, five machine learning algorithms (MLAs), namely K-Nearest Neighbour (KNN), Naïve Bayes (NB), Random Forest (RF), Extreme Gradient Boosting (XGBoost) and Logistic Regression (LR), were run with the proposed cluster (529) and traditional random (672) training samples along with 17 geo-environmental factors. The testing samples (289; 30%) separated at the initial stage were kept the same in both cases to check model effectiveness. The area under the receiver operating characteristic curve (AUC-ROC), sensitivity, specificity, Kappa index and accuracy (ACC) were used to evaluate the MLAs' performance. The alternative (cluster) partitioning technique shows the highest predictive power, with AUC-ROC values ranging from 0.86 to 0.96, Kappa index from 0.60 to 0.76 and ACC from 0.83 to 0.90. By contrast, the random partitioning approach performs less well, with AUC-ROC values ranging from 0.83 to 0.95, Kappa index from 0.49 to 0.70 and ACC from 0.80 to 0.87.
In comparison, the RF model based on cluster sampling outperforms the other models and their random-sampling counterparts. It achieved the highest accuracy (0.902), AUC (0.962) and Kappa index (0.755), followed by XGBoost with ACC (0.885), AUC (0.95) and Kappa index (0.724), both using the proposed cluster training samples. The traditional random training samples yield comparatively low ACC for RF (0.868) and XGBoost (0.862). These results confirm that cluster training sampling performs well in obtaining reliable and precise LSMs for complex Himalayan terrain. Although random landslide partitioning for training datasets is seldom utilized in LSM, this study highlights that cluster partitioning for landslide training datasets might be a realistic and reliable approach.
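The ANNI ratio mentioned above can be illustrated with a minimal sketch. It assumes the standard nearest-neighbour formulation (observed mean nearest-neighbour distance divided by the distance expected for a random pattern of the same density), which may differ in detail from the tool used in the study. The function name, the coordinates and the study-area value are hypothetical and are not taken from the paper.

```python
import math


def average_nearest_neighbour_index(points, area):
    """Return the ANNI ratio for planar points inside a region of given area.

    Assumed formulation: ratio = observed mean nearest-neighbour distance
    divided by 0.5 / sqrt(n / area), the mean distance expected under
    complete spatial randomness. Ratio < 1 suggests clustering, ratio > 1
    suggests dispersion.
    """
    n = len(points)
    if n < 2 or area <= 0:
        raise ValueError("need at least two points and a positive area")

    total = 0.0
    for i, (xi, yi) in enumerate(points):
        # Brute-force search for the nearest other point (fine for small n).
        nearest = min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        total += nearest

    observed = total / n
    expected = 0.5 / math.sqrt(n / area)
    return observed / expected


# Hypothetical coordinates: two tight groups inside a 10 x 10 region.
clustered = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
             (10.0, 10.0), (10.1, 10.0), (10.0, 10.1)]
# Hypothetical coordinates: a regular 2 x 2 grid inside a 2 x 2 region.
grid = [(0.5, 0.5), (1.5, 0.5), (0.5, 1.5), (1.5, 1.5)]

clustered_ratio = average_nearest_neighbour_index(clustered, area=100.0)
grid_ratio = average_nearest_neighbour_index(grid, area=4.0)
```

Under this formulation, a ratio below 1, such as the 0.672 reported here, indicates that points lie closer together than a random pattern would predict.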
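As a reading aid for the reported scores, the sketch below shows how accuracy, sensitivity, specificity and a Kappa index can be derived from a binary confusion matrix. It assumes that the Kappa index is Cohen's kappa, which the abstract does not state explicitly. The function name and the counts are hypothetical and do not reproduce the study's results.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute ACC, sensitivity, specificity and Cohen's kappa.

    tp/fn: landslide cells predicted correctly / missed.
    fp/tn: non-landslide cells flagged wrongly / predicted correctly.
    """
    total = tp + fn + fp + tn
    if total == 0:
        raise ValueError("confusion matrix is empty")

    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")

    # Agreement expected by chance, from the row and column totals.
    p_chance = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / total ** 2
    kappa = (accuracy - p_chance) / (1 - p_chance) if p_chance != 1 else float("nan")

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=40, fn=10, fp=5, tn=45)
```

Kappa discounts the agreement expected by chance, which is consistent with the reported Kappa values sitting well below the corresponding ACC values.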
Istituto di Ricerca per la Protezione Idrogeologica - IRPI
Prediction performance
landslide partitioning
average nearest neighbour index
random forest
machine learning
Muzaffarabad
Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/419173