Assessing the effectiveness of alternative landslide partitioning in machine learning methods for landslide prediction in the complex Himalayan terrain
Maria Teresa Brunetti
2022
Abstract
Several devastating landslides have occurred in the NW Himalayas, prompting researchers to improve landslide susceptibility modelling (LSM) methodologies. This research analyzes the effectiveness of alternative landslide partitioning techniques for LSM in the landslide-prone district of Muzaffarabad, Pakistan. We developed an inventory of 961 landslides and divided it in the traditional way into training (672; 70%) and testing (289; 30%) samples. The training samples (672) were processed with the Average Nearest Neighbour Index (ANNI) method to estimate the spatial pattern of the landslides. The resulting ANNI ratio of 0.672 confirms that the landslide distribution is clustered in the complex Himalayan terrain of Muzaffarabad. Of the 672 training landslides, the majority (529; 79%) show cluster behaviour, while the remaining 143 (21%) show random behaviour. To evaluate the effectiveness of the clustered samples in prediction, five machine learning algorithms (MLAs), namely K-Nearest Neighbour (KNN), Naïve Bayes (NB), Random Forest (RF), Extreme Gradient Boosting (XGBoost) and Logistic Regression (LR), were run with the proposed cluster (529) and the traditional random (672) training samples along with 17 geo-environmental factors. The testing samples (289; 30%) separated at the initial stage were kept the same for both approaches to check the models' effectiveness. The area under the ROC curve (AUC-ROC), sensitivity, specificity, Kappa index and accuracy (ACC) were used to evaluate the MLAs' performance. The alternative (cluster) partitioning technique shows the highest predictive power, with AUC-ROC values ranging from 0.86 to 0.96, Kappa index from 0.60 to 0.76 and ACC from 0.83 to 0.90. Conversely, the random partitioning approach performs less well, with AUC-ROC values ranging from 0.83 to 0.95, Kappa index from 0.49 to 0.70 and ACC from 0.80 to 0.87.
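As a rough illustration of the ANNI step, the ratio can be computed as the observed mean nearest-neighbour distance divided by the mean distance expected for a random pattern of the same density, 0.5 / sqrt(n / A). The sketch below is a minimal pure-Python version under stated assumptions: the function name, the synthetic point sets and the 1000 x 1000 study area are hypothetical and do not come from the study, and no edge correction is applied. Ratios below 1 suggest clustering, which is how a value such as 0.672 would be read.

```python
import math
import random


def average_nearest_neighbour_ratio(points, area):
    """Return the observed / expected mean nearest-neighbour distance.

    Ratios below 1 suggest clustering, ratios near 1 a random pattern,
    and ratios above 1 a dispersed pattern.
    """
    n = len(points)
    if n < 2 or area <= 0:
        raise ValueError("need at least two points and a positive area")

    # Observed mean distance from each point to its nearest neighbour
    # (brute force, O(n^2); acceptable for a few hundred inventory points).
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        total += min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
    observed = total / n

    # Expected mean distance for a random pattern with the same density.
    expected = 0.5 / math.sqrt(n / area)
    return observed / expected


# Hypothetical data: a 1000 x 1000 unit study area (not from the source).
rng = random.Random(42)
AREA = 1000.0 * 1000.0

# 200 points scattered uniformly over the area.
uniform_points = [
    (rng.uniform(0, 1000), rng.uniform(0, 1000)) for _ in range(200)
]

# 200 points grouped tightly around 5 centres.
centres = [(rng.uniform(100, 900), rng.uniform(100, 900)) for _ in range(5)]
clustered_points = [
    (rng.gauss(cx, 10.0), rng.gauss(cy, 10.0))
    for cx, cy in centres
    for _ in range(40)
]

uniform_ratio = average_nearest_neighbour_ratio(uniform_points, AREA)
clustered_ratio = average_nearest_neighbour_ratio(clustered_points, AREA)
print(f"uniform:   {uniform_ratio:.3f}")
print(f"clustered: {clustered_ratio:.3f}")
```

The ratio depends directly on the area supplied, so the same points can look more or less clustered depending on how the study boundary is drawn.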
In comparison, the RF model trained on cluster samples outperforms the other models and their random-sample counterparts. With the proposed cluster training samples, the RF model achieved the highest accuracy (0.902), AUC (0.962) and Kappa index (0.755), followed by XGBoost with an ACC of 0.885, AUC of 0.95 and Kappa index of 0.724. The traditional random training samples yield a comparatively low ACC for both RF (0.868) and XGBoost (0.862). These results confirm that cluster training sampling performs well in obtaining reliable and precise LSMs for complex Himalayan terrain. Although random landslide partitioning for training datasets is seldom utilized in LSM, this study highlights that cluster partitioning for landslide training datasets might be a realistic and reliable approach.
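The evaluation measures named in the abstract (ACC, sensitivity, specificity, Kappa index and AUC-ROC) can all be derived from a binary confusion matrix and the ranking of predicted scores. The following is a minimal sketch, assuming labels coded as 1 (landslide) and 0 (non-landslide); the function names, the toy labels and scores, and the 0.5 threshold are illustrative assumptions, not the study's data or code.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity and Cohen's kappa for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")

    n = tp + tn + fp + fn
    accuracy = (tp + tn) / n
    # Agreement expected by chance, from the row and column totals.
    chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    return {
        "accuracy": accuracy,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "kappa": (accuracy - chance) / (1 - chance),
    }


def auc_roc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present in y_true")
    # Ties between a positive and a negative score count as half a win.
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values): 4 landslide and 4 non-landslide cells.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
auc = auc_roc(y_true, scores)
print(metrics, auc)
```

Because the same held-out test set is scored for both partitioning strategies, metrics computed this way are directly comparable between the cluster-trained and random-trained models.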