Open Access Paper
13 September 2024 Tree-based explainable clustering for drought severity predictions in United States
Stelios P. Neophytides, Michalis Mavrovouniotis, Marinos Eliades, Felix Bachofer, Diofantos G. Hadjimitsis
Proceedings Volume 13212, Tenth International Conference on Remote Sensing and Geoinformation of the Environment (RSCy2024); 132121B (2024) https://doi.org/10.1117/12.3037320
Event: Tenth International Conference on Remote Sensing and Geoinformation of the Environment (RSCy2024), 2024, Paphos, Cyprus
Abstract
Climate change is driving more extreme weather events. Increased air, land surface and canopy surface temperatures affect agriculture in different ways. Significant crop damage and losses are emerging and spreading throughout different regions, accompanied by water scarcity and restrictions imposed on farmers' water usage. The Eastern Mediterranean, Middle East, and North Africa (EMMENA) region is one of the most affected areas globally. The United States (US) developed a system for monitoring droughts in different counties and classifying them into six categories (i.e., no drought, abnormally dry, moderate drought, severe drought, extreme drought, and exceptional drought) based on the assigned drought score. To predict drought scores, Artificial Intelligence (AI) methodologies are applied to a dataset that combines meteorological variables from the NASA Langley Research Center with drought scores from the US drought monitor system. The main objective of this work is to propose a novel explainable AI technique based on unsupervised learning for drought severity predictions and to raise awareness of drought events in the wider EMMENA region.

INTRODUCTION

Numerous extreme weather events are the outcome of the climate crisis. Among these events, droughts are a major threat to global water security, agricultural productivity, and ecosystem health. Accurate prediction of drought events is essential for developing mitigation strategies [1], [2]. By leveraging the ability of Artificial Intelligence (AI) to analyze vast datasets of environmental and climate variables, advanced methods are being developed for drought prediction.

Different clustering algorithms have been tested for segmenting Southern Italy into drought regions, with regression models then used for drought time-series forecasting. The hybrid M5P-SVR model achieved an R-squared (R2 or coefficient of determination) of 0.91 [3]. Another work uses well-known drought indices such as the standardized precipitation index (SPI), the standardized precipitation evapotranspiration index (SPEI) and the standardized runoff index (SRI) to assess drought in semi-arid environments. Several machine learning (ML) models are utilized for the prediction of those indices using different meteorological variables (e.g., temperature, rainfall, etc.). The hybrid wavelet-GPR achieved the highest accuracy with an R2 of 0.809 [4]. A similar work showed that a wavelet-enhanced multi-layer perceptron neural network (NN) achieved the highest accuracy in drought prediction [5]. Additional work uses a hybrid deep learning model which consists of a convolutional NN as the feature extractor and a long short-term memory NN as the temporal predictor for droughts [6].

However, all the aforementioned methodologies lack explainability and interpretability [7], [8]. Only a few works examine the use of explainable AI (XAI) to understand which features contribute positively to accurate predictions in drought monitoring. A study conducted in New South Wales, Australia showed that it is important to interpret such models per region and over shorter time periods instead of relying on decade-based explanations [9]. An extension of this study uses SHAP-based (SHapley Additive exPlanations) explanations to show how important climatic variables are at a monthly scale, as well as their varying annual ranges [10]. An extreme gradient boosting (XGB) model is used to explore drought impacts in the United States (US). Specifically, the patterns between the SPI and drought impacts reveal that negative values of the index push the model towards predicting complex drought impacts [11]. A study conducted for Canadian droughts using the interpretable XGB model achieved an overall accuracy of 71.3% in predicting drought maps.

The application of SHAP-based explanations identified the relation between the drought event that took place in 2015 in the Canadian Prairies and the El Niño event, which reduced water availability [12]. An extension of this work, which also employed remote sensing data, suggests that the satellite-based evaporative stress index, soil moisture and groundwater levels are effective features for drought onset and intensification [13]. Countries and regions with advanced technologies and infrastructures like the U.S. [14], Germany [15] and North America [16] have developed expert systems for drought monitoring at high temporal resolution. Similar systems have not yet been developed for Cyprus and the rest of the EMMENA region, which is characterized as a climate change hotspot. Thus, there is an urgent need for systematic monitoring of such extreme events [17], [18]. However, there is a lack of data related to drought severity in the EMMENA region.

In this work, a tree-based explainable clustering methodology is proposed using data from the US drought monitor system. This methodology can help predict drought severity from meteorological data in the EMMENA region and thereby raise awareness. Clustering ML models are characterized as black boxes [19]. Therefore, various methodologies have been developed to add explainability to clustering algorithms like k-means through decision trees [20]. Such methodologies propose iterative techniques to extract highly distinct clusters [21]. Similar techniques are also applied in kernel clustering [22] and k-medians clustering [23]. The proposed methodology uses the k-means algorithm and SHAP-based XAI techniques applied to drought monitoring, and thus can be effective in cases where ground truth labels are missing and/or interpretability is necessary. The rest of the paper is structured as follows. Section 2 describes the proposed methodology and the dataset used in this study. Section 3 presents the experimental setup and the evaluation strategy. Section 4 gives an overview of the experimental results regarding the model’s performance and the insights derived from the SHAP-based explanations. Finally, Section 5 concludes this work.

MATERIALS AND METHODS

2.1

Drought monitoring dataset

For this study, a Kaggle dataset (https://www.kaggle.com/datasets/cdminix/us-drought-meteorological-data/data, visited on 05/03/2024) is used. The dataset contains meteorological variables acquired from the NASA Langley Research Center POWER Project and annotated with the drought scores from the US drought monitor system. The measurements cover the period from January 2000 to January 2020. Furthermore, each sample in the dataset is matched with the observation’s date and the US county. Tables 1 and 2 describe the meteorological variables used and provide descriptive statistics for the dataset. Table 3 provides an overview of the classes defined by the US drought monitor, which serve as the labels for this study.

Table 1.

Description of each meteorological variable of the dataset.

| Variable | Description | Unit |
|---|---|---|
| PRECTOT | Precipitation | mm/day |
| PS | Surface Pressure | kPa |
| Q2VM | Specific Humidity at 2 Meters | g/kg |
| T2MDEW | Dew/Frost Point at 2 Meters | °C |
| T2MWET | Wet Bulb Temperature at 2 Meters | °C |
| T2M | Temperature at 2 Meters | °C |
| T2M_MAX | Maximum Temperature at 2 Meters | °C |
| T2M_MIN | Minimum Temperature at 2 Meters | °C |
| T2M_RANGE | Range of Temperature at 2 Meters | °C |
| TS | Earth Skin Temperature | °C |
| WS10M | Wind Speed at 10 Meters | m/s |
| WS10M_MAX | Maximum Wind Speed at 10 Meters | m/s |
| WS10M_MIN | Minimum Wind Speed at 10 Meters | m/s |
| WS10M_RANGE | Range of Wind Speed at 10 Meters | m/s |
| WS50M | Wind Speed at 50 Meters | m/s |
| WS50M_MAX | Maximum Wind Speed at 50 Meters | m/s |
| WS50M_MIN | Minimum Wind Speed at 50 Meters | m/s |
| WS50M_RANGE | Range of Wind Speed at 50 Meters | m/s |

Table 2.

Basic descriptive statistics for all the meteorological data in which min, max, mean, and stdev represent the minimum, maximum, mean, and standard deviation values of the dataset.

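| Variable | min | max | mean | stdev |
|---|---|---|---|---|
| PRECTOT | 0.000 | 21.440 | 1.764 | 3.702 |
| PS | 80.140 | 103.270 | 96.931 | 4.717 |
| Q2VM | 0.560 | 20.570 | 8.003 | 4.573 |
| T2MDEW | -21.910 | 25.420 | 7.526 | 9.855 |
| T2MWET | -20.920 | 25.420 | 7.567 | 9.783 |
| T2M | -18.930 | 38.910 | 15.119 | 11.023 |
| T2M_MAX | -15.03 | 46.15 | 21.590 | 11.735 |
| T2M_MIN | -23.63 | 30.51 | 9.311 | 10.498 |
| T2M_RANGE | 0.47 | 23.03 | 12.279 | 4.014 |
| TS | -19.51 | 40.75 | 15.331 | 11.370 |
| WS10M | 0.65 | 8.79 | 3.319 | 1.552 |
| WS10M_MAX | 0.96 | 12.36 | 4.869 | 2.214 |
| WS10M_MIN | 0.01 | 5.57 | 1.770 | 1.129 |
| WS10M_RANGE | 0.39 | 8.6 | 3.099 | 1.640 |
| WS50M | 1.09 | 11.37 | 5.203 | 1.971 |
| WS50M_MAX | 1.69 | 14.79 | 7.435 | 2.375 |
| WS50M_MIN | 0.02 | 8.63 | 2.856 | 1.863 |
| WS50M_RANGE | 0.74 | 10.26 | 4.578 | 1.850 |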

Table 3.

Drought severity classes as defined by US Drought Monitor

| Drought Score | Description | Label in dataset |
|---|---|---|
| ND | No Drought | 0 |
| D0 | Abnormally Dry | 1 |
| D1 | Moderate Drought | 2 |
| D2 | Severe Drought | 3 |
| D3 | Extreme Drought | 4 |
| D4 | Exceptional Drought | 5 |
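
As a small illustration of how the classes in Table 3 could be encoded in code, the sketch below maps each US Drought Monitor category to its description and integer label. The names `USDM_CLASSES`, `LABEL_TO_DESCRIPTION` and `describe_label` are hypothetical and are not part of the dataset or of the authors' code.

```python
# Hypothetical lookup built from Table 3: USDM category -> (description, integer label).
USDM_CLASSES = {
    "ND": ("No Drought", 0),
    "D0": ("Abnormally Dry", 1),
    "D1": ("Moderate Drought", 2),
    "D2": ("Severe Drought", 3),
    "D3": ("Extreme Drought", 4),
    "D4": ("Exceptional Drought", 5),
}

# Reverse lookup: integer label -> human-readable description.
LABEL_TO_DESCRIPTION = {label: desc for desc, label in USDM_CLASSES.values()}


def describe_label(label: int) -> str:
    """Return the Table 3 description for an integer label (0 to 5)."""
    return LABEL_TO_DESCRIPTION[label]


print(describe_label(4))
```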

2.2

Explaining clustering through decision trees

The k-means algorithm [24] is an unsupervised machine learning algorithm that partitions samples (data points) into clusters according to their similarity in the data space. The algorithm depends on the value of k, which must be defined before the model is executed. The k-means algorithm assigns each sample to the cluster with the nearest mean (the cluster’s centroid). As a result, the data space is split into Voronoi cells.

For a set of observations (x1, x2, …, xn), where each observation is a d-dimensional real vector, the algorithm aims to split the observations into k clusters (k ≤ n), denoted as S = {S1, S2, …, Sk}, so as to minimize the intra-cluster variance, measured as the within-cluster sum of squares defined in equation 1:

$$\underset{S}{\arg\min} \sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert^2 \tag{1}$$

where μi is the mean (cluster’s centroid) of the points in Si, calculated with equation 2:

$$\mu_i = \frac{1}{|S_i|} \sum_{x \in S_i} x \tag{2}$$

where |Si| is the size of Si.
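
To make equations 1 and 2 concrete, the following is a minimal pure-Python sketch of the standard Lloyd iteration for k-means. It is an illustration only, not the implementation used in this study; the random initialisation from the data, the `seed` parameter and the toy points are assumptions made for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Equation 2: component-wise mean of the points in one cluster."""
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(len(points[0]))]


def wcss(points, labels, centroids):
    """Equation 1 objective: within-cluster sum of squared distances."""
    return sum(squared_distance(p, centroids[l]) for p, l in zip(points, labels))


def kmeans(points, k, max_iter=500, seed=0):
    """Lloyd's algorithm with initial centroids sampled from the data (an assumption)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # converged: assignments stopped changing
            break
        labels = new_labels
        # Update step: recompute each centroid; keep the old one if a cluster is empty.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = centroid(members)
    return labels, centroids


# Toy 2-D data with two well-separated groups (illustrative values only).
data = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
labels, centroids = kmeans(data, k=2)
print(labels, wcss(data, labels, centroids))
```

The loop stops either when the assignments no longer change or when `max_iter` is reached, which mirrors the role of the maximum-iteration setting in k-means.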

After the clustering has been executed and its agreement with the ground-truth classes evaluated, a decision tree is trained on all the samples, using the cluster assignments as the target. Decision trees are non-parametric supervised learning methods for classification and regression tasks. A tree is generated to predict the values of an output variable by learning decision rules from the features. Trees are in general interpretable and explainable models. Thus, they ease the interpretation and explanation of clustering algorithms like k-means.

Following the training of the decision tree model, SHAP-based explanations are applied. SHAP is an XAI methodology based on cooperative game theory and consequently uses Shapley values. Each feature is considered a “player”, and the Shapley value of each player represents its contribution to the output value. Shapley values are calculated by equation 3:

$$\varphi_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(n - |S| - 1)!}{n!} \left( v(S \cup \{i\}) - v(S) \right) \tag{3}$$

where N is the set of players (features), n = |N| is the total number of players, i is the current player, and v is the characteristic function defined on subsets of players. For a set of players S, v(S) is the total worth of the coalition S, i.e. the expected sum of payoffs that the members of S can obtain by cooperating.
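
Equation 3 can be evaluated exactly for a small number of players by enumerating every coalition. The sketch below does this for a hypothetical three-feature game; the worths in `BASE` and the interaction bonus are invented for illustration and have no connection to the fitted model. In practice SHAP libraries use faster, model-specific algorithms rather than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, v):
    """Exact Shapley values for a characteristic function v over frozensets of players."""
    n = len(players)
    values = {}
    for i in players:
        others = [p for p in players if p != i]
        phi = 0.0
        # Sum over every coalition S that does not contain player i.
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi += weight * (v(s | {i}) - v(s))
        values[i] = phi
    return values


# Hypothetical game: each feature adds a fixed worth, and PRECTOT and
# T2M_RANGE together earn an extra interaction bonus of 2.
BASE = {"PRECTOT": 3.0, "T2M_RANGE": 1.0, "PS": 0.5}


def worth(coalition):
    total = sum(BASE[p] for p in coalition)
    if {"PRECTOT", "T2M_RANGE"} <= coalition:
        total += 2.0
    return total


phi = shapley_values(list(BASE), worth)
print(phi)
```

The values sum to the worth of the full coalition, and the interaction bonus is shared equally between the two features that create it.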

EXPERIMENTAL SETUP

The k-means algorithm is employed for clustering. A trial-and-error strategy is used to determine the maximum number of iterations allowed for the algorithm to reach convergence. During tuning, the algorithm is tested with 200, 300, 400 and 500 maximum iterations. Based on this tuning, the maximum number of iterations is set to 500. The number of clusters is set to 6, matching the number of drought scores defined by the U.S. Drought Monitor. The decision tree used for the explainability assessment is configured as follows: the Gini criterion is selected, the best split is selected at each node, and the nodes are expanded until all leaves are pure. In contrast to typical ML methodologies, a train/validation/test split is not necessary in this study. The objective of training the decision tree is to use its explainability to calculate the SHAP values and proceed with the exploration of the clusters. Therefore, all the available data are used.
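
The configuration described above could be expressed with scikit-learn roughly as follows. This is a sketch under stated assumptions, not the authors' code: the paper does not name its software, the feature matrix `X` is synthetic random data standing in for the meteorological variables, and `n_init` and `random_state` are choices made here only for reproducibility.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the meteorological feature matrix (not the real data).
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 4))

# k-means with the settings stated in the text: 6 clusters, at most 500 iterations.
kmeans = KMeans(n_clusters=6, max_iter=500, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X)

# Decision tree trained on every sample to reproduce the cluster assignment:
# Gini criterion, best split at each node, grown until all leaves are pure.
tree = DecisionTreeClassifier(
    criterion="gini", splitter="best", max_depth=None, random_state=0
)
tree.fit(X, cluster_labels)

# A fully grown tree reproduces the cluster labels on the data it was fitted on.
print("training accuracy:", tree.score(X, cluster_labels))
print("impurity-based importances:", tree.feature_importances_)
```

Because no hold-out set is used, the training accuracy here only confirms that the tree has memorised the cluster assignment; it says nothing about generalisation.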

3.1

Evaluation Metrics

In this study, four distinct metrics are used to evaluate the accuracy of the methodology proposed in subsection 2.2. The agreement between the ground-truth classes and the predicted clusters of the k-means algorithm is defined as accuracy. Furthermore, the Silhouette score is calculated to understand how distinct the clusters are from each other. The first metric is the Rand Index (RI) (or Rand Score), defined in equation 4, which quantifies the similarity between two data clusterings.

$$RI = \frac{TP + TN}{TP + TN + FP + FN} \tag{4}$$

where TP is the number of True Positive pairs (both points are in the same predicted cluster and in the same ground truth class), TN is the number of True Negative pairs (the points are in different predicted clusters and in different ground truth classes), FP is the number of False Positive pairs (both points are in the same predicted cluster but in different ground truth classes) and FN is the number of False Negative pairs (the points are in different predicted clusters but in the same ground truth class).

The second metric is the adjusted Rand index (ARI), defined in equation 5, which measures the similarity between predicted clusters and ground truth classes over all pairs of samples in a dataset and corrects the RI for the agreement expected by chance.

$$ARI = \frac{RI - \mathbb{E}[RI]}{\max(RI) - \mathbb{E}[RI]} \tag{5}$$

where RI is as defined in equation 4, 𝔼[RI] is the expected RI and max(RI) is the maximum RI.

The third metric, the Fowlkes-Mallows (FM) index, computes the similarity between two clusterings by comparing pairs of points and is defined in equation 6.

$$FM = \sqrt{\frac{TP}{TP + FP} \cdot \frac{TP}{TP + FN}} \tag{6}$$

where TP, FP and FN are the numbers of True Positive, False Positive and False Negative pairs, as defined for equation 4.
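
The three pair-counting metrics (equations 4 to 6) can be computed directly from the TP, TN, FP and FN pair counts, as in the minimal sketch below. The function names are hypothetical. For the ARI, max(RI) is taken as 1 and 𝔼[RI] is computed under random labelings with fixed cluster sizes, which is the usual convention but is an assumption here since the text does not state it. The quadratic loop over pairs is only suitable for small inputs.

```python
from itertools import combinations
from math import sqrt


def pair_counts(truth, pred):
    """Count TP, TN, FP, FN over all unordered pairs of samples."""
    tp = tn = fp = fn = 0
    for i, j in combinations(range(len(truth)), 2):
        same_truth = truth[i] == truth[j]
        same_pred = pred[i] == pred[j]
        if same_pred and same_truth:
            tp += 1
        elif not same_pred and not same_truth:
            tn += 1
        elif same_pred:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def rand_index(truth, pred):
    """Equation 4."""
    tp, tn, fp, fn = pair_counts(truth, pred)
    return (tp + tn) / (tp + tn + fp + fn)


def adjusted_rand_index(truth, pred):
    """Equation 5 with max(RI) = 1 and E[RI] under fixed cluster sizes."""
    tp, tn, fp, fn = pair_counts(truth, pred)
    m = tp + tn + fp + fn
    a, b = tp + fp, tp + fn  # pairs grouped together by pred / by truth
    expected_ri = (a * b + (m - a) * (m - b)) / (m * m)
    if expected_ri == 1.0:  # degenerate case: both labelings are trivial
        return 1.0
    return ((tp + tn) / m - expected_ri) / (1.0 - expected_ri)


def fowlkes_mallows(truth, pred):
    """Equation 6, written as TP / sqrt((TP + FP)(TP + FN))."""
    tp, _, fp, fn = pair_counts(truth, pred)
    return tp / sqrt((tp + fp) * (tp + fn)) if tp else 0.0


truth = [0, 0, 1, 1]
pred = [0, 0, 1, 2]
print(rand_index(truth, pred), adjusted_rand_index(truth, pred), fowlkes_mallows(truth, pred))
```

All three functions depend only on which samples are grouped together, so renaming the cluster identifiers does not change the scores.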

The final metric is the homogeneity score, which assesses whether each cluster contains only data points from a single ground truth class. The homogeneity score is defined in equation 7.

$$h = 1 - \frac{H(C \mid K)}{H(C)} \tag{7}$$

where H(C|K) is the conditional entropy of the class distribution given the cluster assignments and H(C) is the entropy of the class distribution.
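
As an illustration of equation 7, the sketch below computes H(C), H(C|K) and the homogeneity score from two label sequences using only the standard library. The function names are hypothetical, and returning 1 when H(C) is zero is a convention assumed here for the single-class case.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy H(C) of a label sequence (natural logarithm)."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def conditional_entropy(classes, clusters):
    """H(C|K): entropy of the classes within each cluster, weighted by cluster size."""
    n = len(classes)
    h = 0.0
    for k, size in Counter(clusters).items():
        members = [c for c, kk in zip(classes, clusters) if kk == k]
        h += (size / n) * entropy(members)
    return h


def homogeneity(classes, clusters):
    """Equation 7: 1 - H(C|K) / H(C)."""
    h_c = entropy(classes)
    if h_c == 0.0:  # a single class is trivially homogeneous (assumed convention)
        return 1.0
    return 1.0 - conditional_entropy(classes, clusters) / h_c


# Pure clusters score 1; a single cluster mixing both classes scores 0.
print(homogeneity([0, 0, 1, 1], [0, 1, 2, 3]))
print(homogeneity([0, 0, 1, 1], [0, 0, 0, 0]))
```

Note that homogeneity alone rewards over-segmentation: splitting every class into many pure clusters still scores 1.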

RI, FM and homogeneity range from 0 to 1, where 0 indicates no similarity between clusters and ground truth classes and 1 indicates perfect agreement. The ARI has the same upper bound of 1, but it is close to 0 for random labelings and can become negative when the agreement is worse than expected by chance. Another metric, used to determine how similar each data point is to the other data points in its own cluster compared with points in other clusters, is the Silhouette, which is defined by equations 8 and 9.

$$s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}} \tag{8}$$

where s(i) is the silhouette score of a single data point i, a(i) is the average distance from the point to the other points in the same cluster, and b(i) is the minimum, over all other clusters, of the average distance from the point to the points of that cluster.

$$Silhouette = \frac{1}{n} \sum_{i=1}^{n} s(i) \tag{9}$$

where Silhouette is the overall silhouette score of the clustering analysis and n is the total number of data points.
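
Equations 8 and 9 translate into a short brute-force routine, sketched below with Euclidean distance. The function names are hypothetical; the code assumes at least two clusters and that the points are not all identical, and it scores a point in a singleton cluster as 0, which is a common convention rather than something stated in the text.

```python
from math import dist


def silhouette_samples(points, labels):
    """Equation 8: s(i) = (b(i) - a(i)) / max(a(i), b(i)) for every point."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:  # assumed convention: a singleton cluster scores 0
            scores.append(0.0)
            continue
        # a(i): mean distance to the other points of the same cluster.
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b(i): smallest mean distance to the points of any other cluster.
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        scores.append((b - a) / max(a, b))
    return scores


def silhouette_score(points, labels):
    """Equation 9: mean of s(i) over all n points."""
    s = silhouette_samples(points, labels)
    return sum(s) / len(s)


pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
print(silhouette_score(pts, [0, 0, 1, 1]))  # compact, well-separated clusters
print(silhouette_score(pts, [0, 1, 0, 1]))  # clusters that cut across the groups
```

A negative s(i) means the point is, on average, closer to another cluster than to its own, which is how negative silhouette values are usually read.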

EXPERIMENTAL RESULTS

4.1

Model’s performance

The metrics defined in equations 4-7 and 9 are used to evaluate the agreement between predicted clusters and ground truth classes. Table 4 gives the experimental results of the proposed model. According to the evaluation metrics, the clustering achieves full agreement between the predicted clusters and the ground truth classes, i.e., RI = 1.0, ARI = 1.0, FM = 1.0 and Homogeneity = 1.0. On the other hand, the Silhouette analysis achieves an average score of 0.19, which means that some samples assigned to a specific cluster are similar to samples in a different cluster. Figure 1 shows the results of the silhouette analysis for the predicted clusters. Every cluster contains points whose silhouette value exceeds the average silhouette score (shown with a red dashed line); at the same time, every cluster except cluster “1” contains data points with a negative silhouette value, which suggests that those data points may have been assigned to the wrong cluster.

Figure 1.

Silhouette of clusters. X-axis represents the silhouette coefficient value while the y-axis represents each cluster. The red dashed line shows the overall silhouette score.

00051_PSISDG13212_132121B_page_6_1.jpg

Table 4.

Experimental results of the k-means algorithm

| RI | ARI | FM | Homogeneity | Silhouette |
|---|---|---|---|---|
| 1.0 | 1.0 | 1.0 | 1.0 | 0.19 |

A typical strategy before clustering is to determine the optimal number of clusters. There are different techniques for this, such as NbClust [25], which suggests the optimal number of clusters based on an exhaustive analysis of 30 different evaluation indices. However, this strategy is not applicable to this study, since the number of different drought states for each county is defined by the US drought monitor system.

4.2

Explainability

At each step of building a decision tree, the algorithm chooses the most informative feature to split the data. In this specific case, the quality of the separation is measured by Gini impurity [26]. Once the tree is built, the contribution of each feature to the decrease in impurity can be estimated. Figure 2 shows the importance score of each feature in terms of impurity reduction.
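
To illustrate the splitting criterion, the sketch below computes the Gini impurity of a set of labels and the weighted impurity decrease obtained by thresholding one feature. It uses the textbook definition of Gini impurity (one minus the sum of squared class proportions); the function names and toy values are hypothetical. Impurity-based feature importance accumulates such decreases over all the splits in which a feature is used.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_c p_c^2 of a list of class labels (0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(values, labels, threshold):
    """Weighted Gini decrease from splitting the samples at `value <= threshold`."""
    left = [l for v, l in zip(values, labels) if v <= threshold]
    right = [l for v, l in zip(values, labels) if v > threshold]
    n = len(labels)
    children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - children


# Toy feature values and cluster labels (illustrative only).
feature = [1.0, 2.0, 3.0, 4.0]
clusters = [0, 0, 1, 1]
print(impurity_decrease(feature, clusters, 2.5))  # split that separates the clusters
print(impurity_decrease(feature, clusters, 1.5))  # less informative split
```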

Figure 2.

Feature importance according to the Decision Tree. Features are ordered from left to right, from the most important to the least important. The x-axis shows each feature and the y-axis shows its importance score.

00051_PSISDG13212_132121B_page_7_1.jpg

According to the feature importance, as estimated by the supervised algorithm, the range of air temperature at 2 meters (i.e., T2M_RANGE) and the precipitation (i.e., PRECTOT) are the most important features in discriminating the samples into the clusters. Conversely, the minimum wind speed at 10 meters (i.e., WS10M_MIN) and the wet bulb temperature at 2 meters (i.e., T2MWET) are the least important features. From the importance scores, it can be observed that no single feature is highly important, but the gap between the most and least important features and the rest of the features is noticeable.

A positive Shapley (or SHAP) value for a specific feature means that the feature pushes the model towards the examined class, whereas a negative Shapley value means the opposite. The SHAP summary plots for the classes “No Drought” and “Exceptional Drought” are presented in Figures 3 and 4, respectively. Both plots present the contribution of each feature (y-axis) to the model’s decisions according to the Shapley value (x-axis). The majority of high surface pressure (i.e., PS) values have a positive Shapley value, which means that they help the model classify samples as “No Drought”. In contrast, high surface pressure values have a negative impact on the model for the class “Exceptional Drought”. Furthermore, higher maximum temperature (i.e., T2M_MAX) values tend to push the model away from the “No Drought” class, whereas lower values of the same feature push the model towards this class. From Figures 3 and 4 it can be seen that higher recorded T2M_MAX values do not lead the model to the “Exceptional Drought” class either. For the “No Drought” class, low precipitation (i.e., PRECTOT) values do not help the model, as the majority have a negative Shapley value. In contrast, most of the low PRECTOT values have a positive contribution towards the “Exceptional Drought” class. Lower surface temperature (TS) values lead the model to classify data points as “No Drought”, while they have a negative impact on the “Exceptional Drought” class. It is clear that a higher temperature range (i.e., T2M_RANGE) drives the model towards the “Exceptional Drought” class, but its effect on the “No Drought” class is inconclusive.

Figure 3.

SHAP summary plot for samples classified as “No drought”. The x-axis shows the calculated Shapley value for each data point and the y-axis lists the features of the dataset. The colour encodes the feature value of each point (low values are blue and high values are red).

00051_PSISDG13212_132121B_page_8_1.jpg

Figure 4.

SHAP summary plot for samples classified as “Exceptional drought”. The x-axis shows the calculated Shapley value for each data point and the y-axis lists the features of the dataset. The colour encodes the feature value of each point (low values are blue and high values are red).

00051_PSISDG13212_132121B_page_9_1.jpg

CONCLUSION

In this work, a novel XAI strategy based on decision trees is applied to the unsupervised learning algorithm k-means. It is observed that the k-means clustering algorithm with a maximum of 500 iterations is in full agreement with the ground truth classes of the dataset, while at the same time the silhouette evaluation metric suggests that the clusters are not clearly distinct from each other. According to the SHAP-based XAI applied to the decision tree trained on the clusters, the most effective predictive features are the maximum air temperature at 2 meters, the range of air temperature at 2 meters, the precipitation, the surface temperature, and the surface pressure. XAI techniques make it easier to understand the importance of these features in classifying extreme classes like “No Drought” and “Exceptional Drought”. This work will be examined further by incorporating data related to soil variables.

ACKNOWLEDGEMENTS

This work was partially supported by the European Union’s HORIZON Research and Innovation Programme under grant agreement No 101120657, project ENFIELD (European Lighthouse to Manifest Trustworthy and Green AI), the AI-OBSERVER project funded from the European Union’s Horizon Europe Framework Programme HORIZON WIDERA-2021-ACCESS-03 (Twinning) under the Grant Agreement No 101079468, and the ‘EXCELSIOR’: ERATOSTHENES: Excellence Research Centre for Earth Surveillance and Space-Based Monitoring of the Environment H2020 Widespread Teaming project (www.excelsior2020.eu). The ‘EXCELSIOR’ project has received funding from the European Union’s Horizon 2020 research and innovation programme under Grant Agreement No 857510, from the Government of the Republic of Cyprus through the Directorate General for the European Programmes, Coordination and Development and the Cyprus University of Technology.

REFERENCES

[1] J. Qiu, Z. Shen, and H. Xie, “Drought impacts on hydrology and water quality under climate change,” Science of The Total Environment, 858 159854 (2023). https://doi.org/10.1016/j.scitotenv.2022.159854

[2] K. Furtak and A. Wolińska, “The impact of extreme weather events as a consequence of climate change on the soil moisture and on the quality of the soil environment and agriculture – A review,” CATENA, 231 107378 (2023). https://doi.org/10.1016/j.catena.2023.107378

[3] F. Di Nunno and F. Granata, “Spatio-temporal analysis of drought in Southern Italy: a combined clustering-forecasting approach based on SPEI index and artificial intelligence algorithms,” Stoch Environ Res Risk Assess, 37 (6), 2349–2375 (2023). https://doi.org/10.1007/s00477-023-02390-8

[4] M. Achite, O. M. Katipoglu, S. Şenocak, N. Elshaboury, O. Bazrafshan, and H. Y. Dalkiliç, “Modeling of meteorological, agricultural, and hydrological droughts in semi-arid environments with various machine learning and discrete wavelet transform,” Theor Appl Climatol, 154 (1–2), 413–451 (2023). https://doi.org/10.1007/s00704-023-04564-4

[5] S. M. E. Azimi, S. J. Sadatinejad, A. Malekian, and M. H. Jahangir, “Application of artificial intelligence hybrid models for meteorological drought prediction,” Nat Hazards, (2022). https://doi.org/10.1007/s11069-022-05779-w

[6] A. Danandeh Mehr, A. Rikhtehgar Ghiasi, Z. M. Yaseen, A. U. Sorman, and L. Abualigah, “A novel intelligent deep learning predictive model for meteorological drought forecasting,” J Ambient Intell Human Comput, 14 (8), 10441–10455 (2023). https://doi.org/10.1007/s12652-022-03701-7

[7] F. Bodria, F. Giannotti, R. Guidotti, F. Naretto, D. Pedreschi, and S. Rinzivillo, “Benchmarking and survey of explanation methods for black box models,” Data Min Knowl Disc, 37 (5), 1719–1778 (2023). https://doi.org/10.1007/s10618-023-00933-9

[8] V. Hassija et al., “Interpreting Black-Box Models: A Review on Explainable Artificial Intelligence,” Cogn Comput, 16 (1), 45–74 (2024). https://doi.org/10.1007/s12559-023-10179-8

[9] A. Dikshit and B. Pradhan, “Explainable AI in drought forecasting,” Machine Learning with Applications, 6 100192 (2021). https://doi.org/10.1016/j.mlwa.2021.100192

[10] A. Dikshit and B. Pradhan, “Interpretable and explainable AI (XAI) model for spatial drought prediction,” Science of The Total Environment, 801 149797 (2021). https://doi.org/10.1016/j.scitotenv.2021.149797

[11] B. Zhang, F. K. Abu Salem, M. J. Hayes, K. H. Smith, T. Tadesse, and B. D. Wardlow, “Explainable machine learning for the prediction and assessment of complex drought impacts,” Science of The Total Environment, 898 165509 (2023). https://doi.org/10.1016/j.scitotenv.2023.165509

[12] J. Mardian, C. Champagne, B. Bonsal, and A. Berg, “A Machine Learning Framework for Predicting and Understanding the Canadian Drought Monitor,” Water Resources Research, 59 (8), e2022WR033847 (2023). https://doi.org/10.1029/2022WR033847

[13] J. Mardian, C. Champagne, B. Bonsal, and A. Berg, “Understanding the Drivers of Drought Onset and Intensification in the Canadian Prairies: Insights from Explainable Artificial Intelligence (XAI),” Journal of Hydrometeorology, 24 (11), 2035–2055 (2023). https://doi.org/10.1175/JHM-D-23-0036.1

[14] Y. Kuwayama, A. Thompson, R. Bernknopf, B. Zaitchik, and P. Vail, “Estimating the Impact of Drought on Agriculture Using the U.S. Drought Monitor,” American J Agri Economics, 101 (1), 193–210 (2019). https://doi.org/10.1093/ajae/aay037

[15] M. Zink et al., “The German drought monitor,” Environ. Res. Lett., 11 (7), 074002 (2016). https://doi.org/10.1088/1748-9326/11/7/074002

[16] M. D. Svoboda, M. J. Hayes, D. A. Wilhite, and T. Tadesse, “Recent Advances in Drought Monitoring,”

[17] K. Themistocleous et al., “Cyprus enters the space arena with “Excelsior” H2020 Teaming project and the Eratosthenes Centre of Excellence: Why Cyprus? Why Excelsior? What are the needs and opportunities?,” [Online], (2024) https://meetingorganizer.copernicus.org/EGU2020/EGU2020-21801.html

[18] M. Eliades et al., “Earth Observation in the EMMENA Region: Scoping Review of Current Applications and Knowledge Gaps,” Remote Sensing, 15 (17), 4202 (2023). https://doi.org/10.3390/rs15174202

[19] M. Louhichi, R. Nesmaoui, M. Mbarek, and M. Lazaar, “Shapley Values for Explaining the Black Box Nature of Machine Learning Model Clustering,” Procedia Computer Science, 220 806–811 (2023). https://doi.org/10.1016/j.procs.2023.03.107

[20] C. Kingsford and S. L. Salzberg, “What are decision trees?,” Nat Biotechnol, 26 (9), 1011–1013 (2008). https://doi.org/10.1038/nbt0908-1011

[21] E. Laber, L. Murtinho, and F. Oliveira, “Shallow decision trees for explainable k-means clustering,” Pattern Recognition, 137 109239 (2023). https://doi.org/10.1016/j.patcog.2022.109239

[22] M. Fleissner, L. C. Vankadara, and D. Ghoshdastidar, “Explaining Kernel Clustering via Decision Trees,” arXiv [Online], (2024) http://arxiv.org/abs/2402.09881

[23] S. Dasgupta, N. Frost, M. Moshkovitz, and C. Rashtchian, “Explainable k-Means and k-Medians Clustering,” arXiv [Online], (2020) http://arxiv.org/abs/2002.12538

[24] A. M. Ikotun, A. E. Ezugwu, L. Abualigah, B. Abuhaija, and J. Heming, “K-means clustering algorithms: A comprehensive review, variants analysis, and advances in the era of big data,” Information Sciences, 622 178–210 (2023). https://doi.org/10.1016/j.ins.2022.11.139

[25] M. Charrad, N. Ghazzali, V. Boiteau, and A. Niknafs, “NbClust: An R Package for Determining the Relevant Number of Clusters in a Data Set,” J. Stat. Soft., 61 (6), (2014). https://doi.org/10.18637/jss.v061.i06

[26] Y. Yuan, L. Wu, and X. Zhang, “Gini-Impurity Index Analysis,” IEEE Trans. Inform. Forensic Secur., 16 3154–3169 (2021). https://doi.org/10.1109/TIFS.2021.3076932
(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Stelios P. Neophytides, Michalis Mavrovouniotis, Marinos Eliades, Felix Bachofer, and Diofantos G. Hadjimitsis "Tree-based explainable clustering for drought severity predictions in United States", Proc. SPIE 13212, Tenth International Conference on Remote Sensing and Geoinformation of the Environment (RSCy2024), 132121B (13 September 2024); https://doi.org/10.1117/12.3037320
KEYWORDS
Decision trees

Artificial intelligence

Environmental monitoring

Machine learning

Agriculture

Water