GIS implementation and clusterization of potential blood donors using the agglomerative hierarchical clustering method

The blood needs of PMI (Indonesian Red Cross) in the Surabaya City area are sometimes erratic, which becomes a problem when blood demand continues to increase while the blood supply is running low. The main objective of this research is to apply data mining to cluster the blood donor data at UTD-PMI Surabaya City Center, in order to distinguish potential from non-potential donors, and to visualize the donor distribution pattern in a Geographic Information System (GIS). Agglomerative Hierarchical Clustering was applied to the records of 8,757 existing donors. The experimental results show that the cluster quality is quite good, with a Silhouette Coefficient of 0.6065410. One interesting finding is that male private-sector employees with blood type O who live in the eastern part of Surabaya City are the most potential donors.


INTRODUCTION
The Indonesian Red Cross (PMI) is an independent and neutral organization in Indonesia whose activities cover the social and humanitarian fields. In all its activities, PMI adheres to the seven Fundamental Principles of the International Red Cross and Red Crescent Movement, namely humanity, voluntary service, neutrality, impartiality, independence, unity and universality (Raufun et al., 2019). Accordingly, the Indonesian Red Cross does not discriminate between people, but gives priority to victims who urgently need help to save their lives.
Blood donation is a humanitarian activity that aims to help community members who need blood, and blood donation activities are organized and managed by the Indonesian Red Cross. The blood supply is often not constant because the number of donors fluctuates, which becomes a problem when blood demand increases while the supply is running low (Atmaja et al., 2018). PMI in the City of Surabaya regularly conducts outreach to raise public awareness of regular blood donation, disseminating information to all elements of society regardless of age, profession or region. This approach was considered ineffective because donors from different segments of society differ in how they receive the information presented.
It is hoped that, using the existing donor data at the Surabaya City Center PMI, the donors can be clustered with the Agglomerative Hierarchical Clustering (AHC) method and the results implemented in a Geographical Information System (GIS) visualization that shows the distribution pattern of potential donors by region. Information dissemination can then be focused on the regions where it is needed, making it more efficient.

RESEARCH METHOD
This study follows several stages and methods to address the research problem. The system design is shown in Figure 1.

Data mining
Data mining is a series of processes for extracting added value from a database in the form of information that could not be found manually (Atmaja et al., 2018). Data mining methods have been known since 1990 (Iqbal, 2019). Data mining is the core of the Knowledge Discovery in Databases (KDD) process, which uses algorithms to explore data, build models and find previously unknown patterns.

Clustering
Clustering, or grouping, is a method for finding and grouping data objects that have similar characteristics. There are two types of data clustering: partitional clustering and hierarchical clustering (Dani et al., 2019). Partitional clustering divides data objects into non-overlapping subsets (clusters), so that each data object belongs to exactly one subset. Hierarchical clustering, by contrast, produces nested clusters organized as a hierarchical tree.

Min-max normalization
Min-Max Normalization is a normalization method that applies a linear transformation to the original data, so that the relative comparison of values between data is preserved before and after processing (A. Nasution. H. Khotimah. N. Chamidah, 2019). The method is given in equation (1):
$$x' = \frac{x - x_{min}}{x_{max} - x_{min}} \qquad (1)$$
where x' is the normalized data value, x is the actual data value, x_min is the smallest data value and x_max is the largest data value.
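As a minimal sketch of equation (1), the Python below normalizes each attribute (column) independently, which is how the donor dataset is normalized later in the study. The function names and the example rows are hypothetical, and mapping a constant attribute to 0 is an assumption the text does not address.

```python
def min_max_normalize(values):
    """Rescale numbers to [0, 1] with x' = (x - x_min) / (x_max - x_min), equation (1)."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        # Assumption: a constant attribute is mapped to 0 to avoid division by zero.
        return [0.0 for _ in values]
    return [(x - x_min) / (x_max - x_min) for x in values]


def normalize_columns(rows):
    """Normalize each attribute (column) of a row-oriented dataset independently."""
    columns = [list(col) for col in zip(*rows)]
    normalized = [min_max_normalize(col) for col in columns]
    # Transpose back to one list per record.
    return [list(row) for row in zip(*normalized)]


# Hypothetical encoded donor rows: [gender code, age, profession code]
rows = [[1, 20, 2], [2, 45, 8], [1, 60, 5]]
print(normalize_columns(rows))
# [[0.0, 0.0, 0.0], [1.0, 0.625, 1.0], [0.0, 1.0, 0.5]]
```

Normalizing per column keeps an attribute with a wide range, such as age, from dominating the Euclidean distances used by the clustering step.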

Agglomerative hierarchical clustering method
Agglomerative Hierarchical Clustering is a clustering method that groups the objects in a dataset into a hierarchy (Bachtiar et al., 2017). Hierarchical clustering has two types of grouping: agglomerative (bottom-up) and divisive (top-down). Agglomerative Hierarchical Clustering is the bottom-up approach, which successively combines clusters until they form a single cluster.
The process starts by treating each data object as its own cluster, then repeatedly finds the closest pair of clusters and merges them into a larger cluster (Suhirman & Wintolo, 2019). This is repeated until a complete hierarchy is formed. The distances between data in the Agglomerative Hierarchical Clustering method are calculated with the distance matrix formulas in equations (2) and (3).
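The merge loop described above can be sketched in plain Python. This is a naive illustration that assumes Euclidean distance and the Average Linkage criterion used later in this study; the function names are hypothetical. Its roughly cubic running time makes it unsuitable for all 8,757 records, where a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="average"` would normally be used.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def average_linkage(c1, c2, points):
    """Mean distance over all pairs (i, j) with i in cluster c1 and j in cluster c2."""
    total = sum(euclidean(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerative(points, n_clusters=1):
    """Naive bottom-up clustering with average linkage.

    Every point starts as its own cluster; the closest pair of clusters is
    merged repeatedly until n_clusters remain. Returns the final clusters
    (lists of point indices) and the merge history as
    (cluster_a, cluster_b, distance) tuples, i.e. the dendrogram steps.
    """
    clusters = [[i] for i in range(len(points))]
    history = []
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], points)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        history.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
        clusters[a] = merged
    return clusters, history


# Hypothetical normalized points forming two obvious groups.
points = [[0.0, 0.0], [0.0, 0.2], [1.0, 1.0], [1.0, 0.9]]
clusters, history = agglomerative(points, n_clusters=2)
print(clusters)  # [[0, 1], [2, 3]]
```

Calling `agglomerative(points)` with the default `n_clusters=1` records every merge, which is the information a dendrogram draws. Stopping at two clusters corresponds to separating potential from non-potential donors.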

Evaluation of the silhouette coefficient
The Silhouette Coefficient (SC) is used to test the quality of the clusters formed (Wardani et al., 2019) and is a common validation method in clustering research. The silhouette coefficient of an object is calculated with equation (7):
$$S_i = \frac{b_i - a_i}{\max(a_i, b_i)} \qquad (7)$$
where S_i is the silhouette coefficient of the ith object, a_i is the average distance between the ith object and all other objects in the same cluster, and b_i is the smallest average distance between the ith object and the objects of any other cluster. The overall coefficient is the mean of S_i over all objects. Silhouette Coefficient values range from -1 to 1: a positive value indicates good clustering (Dani et al., 2019) and a negative value indicates poor clustering.
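Equation (7) can be checked with a small from-scratch implementation. This sketch assumes Euclidean distance and sets s_i = 0 for an object that is alone in its cluster, a common convention that scikit-learn's `sklearn.metrics.silhouette_score` also follows; the function name is hypothetical.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def silhouette(points, labels):
    """Mean of s_i = (b_i - a_i) / max(a_i, b_i) over all objects, equation (7)."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("the silhouette coefficient needs at least two clusters")
    scores = []
    for i, p in enumerate(points):
        # Average distance from object i to the other members of each cluster.
        avg = {}
        for c in clusters:
            members = [j for j, lab in enumerate(labels) if lab == c and j != i]
            if members:
                avg[c] = sum(euclidean(p, points[j]) for j in members) / len(members)
        own = labels[i]
        if own not in avg:
            # Object is alone in its cluster: s_i = 0 by convention.
            scores.append(0.0)
            continue
        a_i = avg[own]
        b_i = min(d for c, d in avg.items() if c != own)  # nearest other cluster
        denom = max(a_i, b_i)
        scores.append(0.0 if denom == 0 else (b_i - a_i) / denom)
    return sum(scores) / len(scores)


pts = [[0.0], [1.0], [4.0], [5.0]]
print(silhouette(pts, [0, 0, 1, 1]))  # about 0.746: well-separated clusters
print(silhouette(pts, [0, 1, 0, 1]))  # -0.375: poor assignment
```

The two calls show how the sign tracks cluster quality: the same points score positive when grouped sensibly and negative when each object sits closer to the other cluster.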

Geographical information system
A Geographical Information System (GIS) is a computer-based system for storing and manipulating geographic information. GIS can also present information or data graphically, using maps as the interface (Teknik et al., 2016). The stages carried out in this research are as follows.

Data Collection
This study uses donor data obtained from the Indonesian Red Cross Blood Transfusion Unit (UTD-PMI) Surabaya City Center, totaling 8,757 donor records from 2013 to 2018. The data are stored as *.xlsx files with six supporting attributes, as shown in Table 1.

Pre-Processing Data
Data pre-processing consists of two processes: data cleaning and data normalization.
a. Data Cleaning: data cleaning includes identifying entries that contain no data and entries with missing data.
b. Data Normalization: data normalization brings all variables in the study into the same value range, which reduces the differences in scale between research variables. This study uses Min-Max Normalization, given in equation (1).
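The cleaning step can be sketched as a filter over donor records. The attribute names below are hypothetical (the actual six attributes are listed in Table 1), and the sketch assumes that incomplete records are dropped rather than imputed, since the text only says that empty and missing entries are identified.

```python
# Hypothetical attribute names; the actual six attributes are listed in Table 1.
REQUIRED_FIELDS = ("gender", "blood_group", "age", "profession", "region")


def is_complete(record, required=REQUIRED_FIELDS):
    """True if every required attribute is present and not blank."""
    for field in required:
        value = record.get(field)
        if value is None or str(value).strip() == "":
            return False
    return True


def clean(records, required=REQUIRED_FIELDS):
    """Drop records with empty or missing attribute values (assumed policy: no imputation)."""
    return [r for r in records if is_complete(r, required)]


records = [
    {"gender": "M", "blood_group": "O", "age": 30, "profession": "private", "region": "East"},
    {"gender": "F", "blood_group": "A", "age": None, "profession": "student", "region": "West"},
    {"gender": "M", "blood_group": "B", "age": 41, "profession": "others", "region": "  "},
]
print(len(clean(records)))  # 1
```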

Data Processing (AHC)
Data processing in this study uses the Agglomerative Hierarchical Clustering method to cluster the donor data into potential and non-potential donors. A distance matrix is computed from the Euclidean distances between data, equation (2), and clusters are merged with the AHC Average Linkage criterion (average distance), equation (6).
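The Euclidean distance and Average Linkage equations referred to here are not reproduced in this copy of the text. For reference, the standard forms are shown below: the Euclidean distance between objects x_i and x_j described by m normalized attributes, and the average-linkage distance between clusters C_u and C_v. The symbols m, C_u and C_v are introduced here, and their exact correspondence to the paper's equations (2) and (6) is an assumption.

```latex
d(x_i, x_j) = \sqrt{\sum_{k=1}^{m} \left(x_{ik} - x_{jk}\right)^2}

d_{\mathrm{avg}}(C_u, C_v) = \frac{1}{|C_u|\,|C_v|} \sum_{x_i \in C_u} \sum_{x_j \in C_v} d(x_i, x_j)
```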

Cluster Validation Test
The next stage tests cluster validity, to assess the quality of the cluster analysis results. This study uses the Silhouette Coefficient (SC), equation (7), as the cluster validation measure.

GIS implementation
The final stage is the implementation in a Geographical Information System (GIS), which aims to locate the distribution of potential blood donors using the donors' region attribute within the city of Surabaya.

Data discretization formation
The first stage in forming the potential and non-potential donor clusters is to determine the variables used to build the cluster model: gender, blood group, number, age and profession. Categorical data are converted into discrete values, as shown in Table 2. For profession, the codes are: government employees = 1, private = 2, army/police = 3, students = 4, farmers/factory workers = 5, housewife = 6, self-employed = 7 and others = 8.
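The profession encoding can be expressed as a lookup table. The codes below are the ones listed for Table 2; the exact label spellings used as keys and the fallback of unrecognized labels to "others" (code 8) are assumptions for illustration.

```python
# Profession codes as listed for Table 2 in the text.
# The label spellings used as keys are assumptions for illustration.
PROFESSION_CODES = {
    "government employee": 1,
    "private": 2,
    "army/police": 3,
    "student": 4,
    "farmer/factory worker": 5,
    "housewife": 6,
    "self-employed": 7,
    "others": 8,
}


def encode_profession(label):
    """Map a profession label to its discrete code.

    Assumption: labels not in the table fall back to "others" (8).
    """
    return PROFESSION_CODES.get(label.strip().lower(), PROFESSION_CODES["others"])


print([encode_profession(p) for p in ["Private", " student ", "Pilot"]])  # [2, 4, 8]
```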

Normalization of data (pre-processing)
Many attributes have different value ranges, so normalization is required. The donor dataset is transformed with min-max normalization using the minimum and maximum values of each attribute, which maps every attribute to the range 0 to 1, as shown in Table 3.

Establishment of a donor data clustering model
At the donor data processing stage, the first step is to determine the number of clusters to be formed and to calculate the distances between data, after ensuring that the dataset consists of discrete data. Distances between data are calculated with the Euclidean distance. The next step is cluster formation, which uses Agglomerative Hierarchical Clustering with Average Linkage: the closeness of two clusters is measured as the average distance between pairs of data drawn from the two different clusters.
These three steps produce a dendrogram of the potential and non-potential donor clusters, as illustrated in Figure 2. The resulting cluster assignments can then be appended to the processed donor dataset as a cluster attribute, as in Table 4, to reveal the patterns among donors labeled 1 and 0. This makes it easier to determine the status of each donor.

Evaluation of cluster results
As the final stage of donor data processing, cluster validity is evaluated with the Silhouette Coefficient (SC). The SC value obtained is 0.6065410, as shown in Figure 4.

Visualization of geographic information systems
The program developed in this study visualizes the donor distribution in a Geographical Information System (GIS) using the donors' region attribute. Five areas of Surabaya are defined, namely North, West, East, South and Center, each placed on the map by its latitude and longitude.
Figure 5 shows the login page of the web-based potential donor distribution system. The results of the GIS visualization are illustrated in Figure 6: the map of Surabaya shows five points, one for each potential donor area. Each region can also display the percentages of potential and non-potential donors, and their breakdown by gender, blood type and occupation. As an example, this discussion uses the eastern part of Surabaya. Figure 7 shows that in the Eastern Region, 49% of donors are potential donors and 51% are non-potential donors.
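The per-region percentages shown in the map can be computed by filtering donor records by region (and optionally by cluster label) and counting attribute values. This sketch assumes records are dicts with hypothetical keys "region", "cluster", "gender" and "blood_group", and that label 1 marks potential donors; the text labels the clusters 1 and 0 but does not state which is which here.

```python
from collections import Counter


def percentage_breakdown(records, region, attribute, label=None):
    """Percentage of each value of `attribute` among donors in `region`.

    If `label` is given, only donors with that cluster label are counted.
    Percentages are rounded to one decimal place.
    """
    subset = [
        r[attribute]
        for r in records
        if r["region"] == region and (label is None or r["cluster"] == label)
    ]
    if not subset:
        return {}
    counts = Counter(subset)
    return {value: round(100 * n / len(subset), 1) for value, n in counts.items()}


# Hypothetical labeled records (assumption: cluster 1 = potential, 0 = non-potential).
records = [
    {"region": "East", "cluster": 1, "gender": "male", "blood_group": "O"},
    {"region": "East", "cluster": 1, "gender": "male", "blood_group": "A"},
    {"region": "East", "cluster": 0, "gender": "female", "blood_group": "O"},
    {"region": "West", "cluster": 1, "gender": "male", "blood_group": "B"},
]
print(percentage_breakdown(records, "East", "cluster"))  # {1: 66.7, 0: 33.3}
print(percentage_breakdown(records, "East", "gender", label=1))  # {'male': 100.0}
```

Passing `"cluster"` as the attribute gives the potential versus non-potential split for a region; passing `"gender"` or `"blood_group"` with a label gives the per-group breakdowns.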

Figure 7. Percentage of potential and non-potential donors
Figure 8 shows the percentage of blood donors by gender in the Eastern Region: potential donors are 5% women and 95% men, while non-potential donors are 16% women and 84% men.
Figure 8. Percentage of potential and non-potential donors by gender
Figure 9 shows the percentage of blood donors by blood type in the Eastern Region. Among potential donors, 23% have blood group A, 8% AB, 31% B and 38% O; among non-potential donors, 20% have blood group A, 7% AB, 31% B and 42% O.
Figure 9. Percentage of potential and non-potential donors by blood group
Figure 10 shows the occupations of donors in the Eastern Region. Potential donors are 30% civil servants (PNS), 46% private sector, 2% army/police, 8% self-employed, 1% housewives, 4% others and 9% students; non-potential donors are 9% civil servants, 65% private sector, 7% army/police, 6% self-employed, 1% housewives, 4% others and 8% students.
Figure 10. Percentage of potential and non-potential donors by occupation

CONCLUSIONS
The experiments used 8,757 blood donor records obtained from UTD-PMI Surabaya City Center. Clustering with the Agglomerative Hierarchical Clustering method gave quite good results, with a Silhouette Coefficient (SC) of 0.6065410. One interesting conclusion drawn from the clustering analysis of the donor data is that male private-sector employees with blood type O from the Eastern Region of Surabaya City are the most potential donors.