Prospect Prediction Model Of Indonesian Telematics Medium Large Size Enterprises Using Deep Learning Approach

Analysis of business prospects is an important part of predicting a country's economic conditions. Currently, the prediction of prospects for medium large enterprises (MLE) in the telematics sector has not been widely researched or represented as a factor of economic development in Indonesia. In fact, in line with the development of the Industrial Revolution 4.0, the telematics business sector is one of the priority pillars to be developed in Indonesia. The main purpose of this study is to construct a prediction model for prospects in the Indonesian telematics MLE sector using a deep learning approach. We used 2500 preprocessed records from the 2016 National Economic Census. The deep learning approach in this study used a multilayer perceptron (MLP) architecture with 17 attributes, 3 hidden layers and 5 target classes. The attributes include province, business owner education, legal entity status, length of operation, business network, total assets, business profit, number of workers, difficulties, partnerships, marketing innovations, comparison of profit with the previous year, and development plans. The target prospect classes are excellent, good, neutral, bad and very bad. The optimal results were achieved at 50 epochs with a learning rate of 0.2, giving an accuracy of 98.80%. The prediction model can be used as a reference for the development of MLE in the telematics sector in Indonesia. The model still lacks visualization and an analysis of the attributes that affect the classification of prospects for Indonesian telematics MLE. Further research can integrate a whitebox model into the deep learning model and add a web-based graphical user interface (GUI), to make it easier for stakeholders to develop strategies based on the strength of the attributes that affect the prospects of Indonesian telematics MLE. This is expected to boost the competitiveness of Indonesian telematics MLE.
Keywords: Deep learning, Prediction model, Prospects, Telematics enterprises, Whitebox


Introduction
The development of Information and Communication Technology (ICT), or telematics, has become an important requirement for solving problems in various fields effectively and efficiently. Telematics is a combination of communication network systems and information systems, and the Indonesian government has made it one of the priority areas for Indonesia's economic development. Therefore, to increase competitiveness in the field of telematics, the government has used the Economic Census of the Small Medium Enterprises (SME) and Medium Large Enterprises (MLE) sectors in several regions to determine business prospects for economic development. Analysis of business prospects is an important part of predicting future economic conditions. In addition, it is useful for mapping the business sectors with increasing prospects, with the aim of supporting growth in Indonesia's economy. Several studies of Indonesian telematics have been carried out to determine the classification model of telematics service businesses, especially telematics MSEs, through decision rules and hybrid mining approaches (Tosida et al. 2018). Furthermore, Tosida et al. (2019a; 2019b) reviewed the telematics development of Micro and Small Enterprises (MSEs) that is needed to keep pace with current industrial developments. This is intended to provide input to the government, especially the related ministries and agencies, in formulating policies for the development of telematics for Small and Medium Enterprises (SMEs) in Indonesia.
Statistics Indonesia (BPS) has published the results of the 2016 Economic Census related to business prospects for Medium Large Enterprises (MLE), but there has not been an in-depth study of the prospects and economic development of telematics MLE in Indonesia. In measuring business prospects, attributes that have been tested statistically are fundamental for providing an overview of current and future economic conditions (Said, Candraningtyas 2006; Popova et al. 2019; Al-Busaidi, Al-Muharrami 2020). The attributes used to analyze the prospects of telematics MLE are those available in the Economic Census data, including area, years of operation, capital, constraints, and business development plans.
This telematics MLE classification groups enterprises into business prospect classes based on predetermined parameters. This is done to determine the similarities and differences in the characteristics of telematics MLE within a data group relative to a larger data group. Therefore, a technique is needed to determine the tendency patterns of telematics MLE characteristics. Tosida et al. (2020a) implemented human resource business intelligence for telematics businesses in Indonesia based on the Balanced Scorecard for the customer and internal business aspects, through clustering with pipeline and data lake approaches. Wu et al. (2016) implemented the Deep Belief Network (DBN) algorithm using restricted Boltzmann machines and compared it with the Multilayer Perceptron (MLP) and Support Vector Machine (SVM) algorithms. Their results show that the flexibility of a deep learning model can provide strong support for fully informed credit scoring and very complex credit risk assessment. This study aims to classify Indonesian telematics MLE using a deep learning approach. The data used for the MLE classification process are the 2016 Economic Census data, limited to 6 provinces: North Sumatra, Banten, DKI Jakarta, DI Yogyakarta, Gorontalo, and Central Kalimantan.

Method
This study follows the data mining stages, also called Knowledge Discovery in Databases (KDD), with the stages presented in Figure 1.

Stage 1: Data Cleaning
Data cleaning is carried out by filling in blank values with the mean (for numeric data) or the mode (for categorical data) and by removing inconsistent data and outliers, following Tosida et al. (2015). The source data comprised 3961 records; after cleaning, 3400 records remained. The data used in this study consist of 17 attributes and 5 target classes as benchmarks (parameters). The attribute descriptions can be seen in Table 1.
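The imputation rule described for the cleaning stage can be illustrated with a minimal sketch. It assumes records are held as dictionaries with `None` marking a blank value; the field names `workers` and `legal_entity` and the sample values are hypothetical and are not taken from the census data.

```python
from statistics import mean, mode

def impute(records, numeric_cols, categorical_cols):
    """Fill None values: column mean for numeric, column mode for categorical."""
    fill = {}
    for col in numeric_cols:
        fill[col] = mean(r[col] for r in records if r[col] is not None)
    for col in categorical_cols:
        fill[col] = mode(r[col] for r in records if r[col] is not None)
    # Replace a blank only in columns for which a fill value was computed
    return [
        {k: (fill[k] if v is None and k in fill else v) for k, v in r.items()}
        for r in records
    ]

# Hypothetical records; field names and values are illustrative only.
rows = [
    {"workers": 20, "legal_entity": "PT"},
    {"workers": None, "legal_entity": "PT"},
    {"workers": 40, "legal_entity": None},
]
cleaned = impute(rows, ["workers"], ["legal_entity"])
```

Removal of inconsistent records and outliers is not shown, since the criteria used for it are not stated here.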

Stage 2: Data Selection
The data selection stage is carried out on the attributes related to internet and computer use. Telematics business is basically carried out using computers and the internet (Sugeng 2020). Therefore, records of businesses that do not use the internet or do not use computers are excluded, leaving 2500 records for the classification process.

Stage 3: Data Transformation
In the data transformation stage, the years-of-operation attribute, which originally had 35 options, is reduced to 2 options as shown in Table 1, following Tosida et al. (2015). The 2015 and 2016 asset data, total profit, and number of workers are transformed into categorical data following Sugeng (2020) and Tosida et al. (2015).
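Turning a numeric attribute into a categorical one amounts to binning against cut points. The sketch below shows the idea; the cut points and labels are hypothetical, since the actual category boundaries come from Sugeng (2020) and Tosida et al. (2015) and are not reproduced here.

```python
from bisect import bisect_right

def to_category(value, thresholds, labels):
    """Map a numeric value to a label using ascending cut points.

    A value equal to a cut point falls into the higher category.
    """
    return labels[bisect_right(thresholds, value)]

# Hypothetical cut points for the number of workers (not from the source)
WORKER_CUTS = [20, 100]
WORKER_LABELS = ["small", "medium", "large"]

categories = [to_category(n, WORKER_CUTS, WORKER_LABELS) for n in (5, 20, 250)]
```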

Stage 4: Mining Process
The mining process is carried out with classification techniques and a deep learning approach using the Multilayer Perceptron (MLP) model of Tosida et al. (2020b). The architecture consists of four layers, described in Figure 5: 1) input layer (4 neurons), 2) first hidden layer (3 neurons), 3) second hidden layer (4 neurons), 4) output layer (1 neuron). The learning rate (η) is 0.25 and the maximum number of epochs is 500. Classification using MLP considers the class values of the response data, so stratification is needed when dividing the data into training and testing sets. This ensures that every class in the class attribute is represented in both the testing data and the training data. The MLP classification model is then built, followed by testing the accuracy of the model obtained. The Indonesian telematics MLE classification is carried out through the architecture shown in Figure 2.
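A stratified split can be sketched as follows. The 20% test fraction, the fixed seed and the toy labels are assumptions made for illustration; the split ratio used in the study is not stated.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), splitting each class separately."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # At least one test sample per class, so every class is tested
        n_test = max(1, round(len(indices) * test_fraction))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Toy labels with an imbalanced class distribution
labels = ["very good"] * 10 + ["good"] * 5 + ["bad"] * 5
train_idx, test_idx = stratified_split(labels)
```

Splitting per class rather than over the whole data set is what keeps a rare class, such as "very bad", from ending up entirely on one side of the split.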

Stage 5: Pattern Evaluation
This stage identifies interesting patterns to be included in the knowledge base. The results of the data mining techniques, in the form of distinctive patterns and predictive models, are evaluated to assess model performance.

Stage 6: Knowledge Presentation
This stage identifies interesting patterns to be included in the knowledge base. The results of the data mining techniques, in the form of distinctive patterns and predictive models, are evaluated to assess model performance.

Multilayer Perceptron (MLP) Algorithm
The multilayer perceptron (MLP), also called the multilayer feed-forward neural network, is the most widely used algorithm (Ajay, Mahmood 2019). The MLP consists of an input layer, hidden layers and an output layer (Tika, Adiwijaya 2019). Each layer is explained below:
1. The input layer receives the input values from each record in the data. The number of input nodes equals the number of predictor variables.
2. The hidden layer transforms the input values within the network. Each node in a hidden layer is connected to the nodes of the previous hidden layer or the input layer, and to the nodes of the next hidden layer or the output layer. The number of hidden layers can be arbitrary.
3. The output layer receives connections from the hidden layer or the input layer and returns the output value corresponding to the prediction variable. The output of the output layer is usually a floating-point value between 0 and 1.
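The layer structure above can be made concrete with a small forward pass. This is a sketch under stated assumptions rather than the model used in the study: the layer sizes 4, 3, 4, 1 follow the architecture listed in Stage 4, while the bias terms, the random initial weights in [-0.5, 0.5] and the sample input are illustrative choices.

```python
import math
import random

def sigmoid(v):
    """Binary sigmoid: squashes v into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))

def forward(x, layers):
    """Propagate input vector x through the network, layer by layer.

    Each layer is a list of neurons; each neuron is a list of weights
    whose last entry is the bias (an assumption of this sketch).
    """
    activation = x
    for layer in layers:
        activation = [
            sigmoid(sum(w * a for w, a in zip(neuron[:-1], activation)) + neuron[-1])
            for neuron in layer
        ]
    return activation

def random_layer(n_in, n_out, rng):
    """Create n_out neurons, each with n_in weights plus one bias."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]

rng = random.Random(0)
sizes = [4, 3, 4, 1]  # input, first hidden, second hidden, output
network = [random_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
output = forward([0.0, 1.0, 0.5, 0.25], network)
```

Because the output neuron uses a sigmoid, `output` is a single value strictly between 0 and 1, matching the description of the output layer.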
Initialize the weights for each input x_i, i = 1, 2, ..., n, set the activations of the input units, and initialize the weights on each connection to the hidden neurons h1, h2, and so on. The net input v of a neuron is the weighted sum of its inputs:

$$v = \sum_{i=1}^{r} x_i w_i \qquad (1)$$

where r is the number of inputs (features), x_i is the value of feature i, and w_i is its weight. The value of v is then activated to produce an output signal. The activation function used is the binary sigmoid or the bipolar sigmoid. The binary sigmoid activation function is

$$y = \frac{1}{1 + e^{-v}} \qquad (2)$$

where e is the exponential constant and y is the sigmoid output. To propagate error signals, start from the output layer and work back to the hidden layer. The error signal of output neuron y at iteration p is the difference between its desired output and its actual output.
The same procedure is used to update the weights on the connections from the hidden layer to the output layer.
The inputs of a neuron in the output layer differ from the inputs x_i of a neuron in the input layer. Therefore, the output signal y_j of neuron j in the hidden layer replaces x_i when calculating the weight correction: the weight correction in MLP is the product of the learning rate, the hidden neuron output y_j and the error gradient.
η is the learning rate, while ∆(p) is the error gradient of neuron y in the output layer at iteration p. For the binary sigmoid activation function, the error gradient is the derivative of the activation, y(1 − y), multiplied by the error signal.
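For reference, the output-layer update described above matches the standard backpropagation rule for a sigmoid neuron, which can be written as follows. The indices j (hidden neuron) and k (output neuron) and the symbol y_{d,k} for the desired output are assumptions introduced here; the error gradient is written as \delta_k(p) to keep it distinct from the weight correction \Delta w_{jk}(p).

```latex
\begin{align}
e_k(p) &= y_{d,k}(p) - y_k(p) \\
\delta_k(p) &= y_k(p)\,\bigl[1 - y_k(p)\bigr]\,e_k(p) \\
\Delta w_{jk}(p) &= \eta \, y_j(p) \, \delta_k(p) \\
w_{jk}(p+1) &= w_{jk}(p) + \Delta w_{jk}(p)
\end{align}
```

The factor y_k(p)[1 - y_k(p)] is the derivative of the binary sigmoid evaluated at the neuron's output, which is why the gradient takes this form for that activation function.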

Confusion Matrix Evaluation
A confusion matrix is a table that counts the rows of test data that the classification model predicts correctly and incorrectly. In this study, the system is assessed on how accurately it predicts the target prospect class of Indonesian telematics MLE using a confusion matrix. Table 2 shows the confusion matrix that can be used for the calculations (Pangestu et al. 2020).
2. Recall is the proportion of enterprises actually belonging to a prospect class that are correctly predicted as that class.
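The measures derived from a confusion matrix can be computed as in the sketch below. It assumes rows hold actual classes and columns hold predicted classes; the three-class toy labels are illustrative and are not results from the study.

```python
def confusion_matrix(actual, predicted, classes):
    """Count records: rows are actual classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix

def accuracy(matrix):
    """Share of all records that lie on the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total

def recall(matrix, i):
    """Share of records actually in class i that were predicted as class i."""
    row_total = sum(matrix[i])
    return matrix[i][i] / row_total if row_total else 0.0

def precision(matrix, i):
    """Share of records predicted as class i that actually are class i."""
    col_total = sum(row[i] for row in matrix)
    return matrix[i][i] / col_total if col_total else 0.0

# Toy example with three of the prospect classes
classes = ["good", "neutral", "bad"]
actual = ["good", "good", "neutral", "bad", "bad", "good"]
predicted = ["good", "neutral", "neutral", "bad", "good", "good"]
cm = confusion_matrix(actual, predicted, classes)
```

For a multi-class problem such as the five prospect classes, recall and precision are computed per class and can then be averaged to a single figure.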

Discussion
The experiments were carried out in 3 scenarios. Scenario 1 uses mixed data (categorical and numeric), without any data transformation. Scenario 2 applies data transformation (numerical data are transformed into categorical data). Scenario 3 applies both transformation and attribute selection. The results of applying the deep learning approach to the classification of Indonesian telematics MLE based on business prospects, using 5 target classes, can be seen in Figures 3 and 4, and the results of the evaluation using the confusion matrix are shown in Figure 5. In Figure 4, the blue line shows the training process and the orange line shows the testing process. The curves are unstable, and the accuracy is poor, at 44.32%. The contingency table shows weak agreement between the actual data and the predicted data, which can be seen in the diagonal of the matrix tending towards purple. Figure 6 shows the number of records in each prospect class from the classification of Indonesian telematics MLE before the transformation is carried out: 1) Very good (1457 records), 2) Good (28 records), 3) Bad (177 records), 4) Very bad (27 records), 5) Neutral (811 records). The implemented system uses a deep learning model designed to process data in the form of a grid (Luo et al. 2016). This type of neural network was created to classify labeled data using supervised learning, and is able to classify images and recognize sounds. Before the transformation, the telematics MLE data contain several continuous numerical attributes.
The experimental results of the Indonesian telematics MLE classification model using scenario 2 are shown in Figures 7, 8, and 9. The deep learning model was run in scenario 2 with 20 epochs and a learning rate (η) of 0.25. Figure 7 shows the relationship between training data and testing data in terms of sensitivity and specificity, and indicates the cut-off point between sensitivity and specificity. The plot has two dimensions: the true positive rate lies on the Y axis, while the false positive rate lies on the X axis. The closer a point is to the lower left (0,0), the more the classification tends towards negative, whereas the closer a point is to the upper right (1,1), the more the classification tends towards positive. In Figure 7, the blue line shows the accuracy on the training data and the orange line shows the testing data; the accuracy is 94.73%. Figure 8 shows the results of the confusion matrix evaluation, from which it can be concluded that the relationship between predicted and actual values is very strong. The predicted values are close to the actual values, as seen in the cream and pink colors on the diagonal of the matrix, which indicates good accuracy. Figure 9 visualizes the results of classifying Indonesian telematics MLE using the deep learning approach. The classification results for each class are: 1) Very good (1296 records), 2) Good (487 records), 3) Bad (76 records), 4) Very bad (90 records), 5) Neutral (551 records). The classification results show that Indonesian telematics MLE is dominated by businesses with very good prospects. This potential can be used to strengthen Indonesia's economic competitiveness, supported by empowering telematics businesses in economic development.
However, the classification results also show that there are still 4.7% of telematics MLEs with bad and very bad prospects. The results of this classification can be used as a prediction model to determine the performance of telematics MLE in Indonesia, so that business strengthening strategies can be mapped (Tosida et al. 2019; Popova et al. 2019).
In previous research, Tosida et al. (2020b) used a deep learning method to map the strength of Indonesian telematics MSEs, with the aim of predicting the eligibility of beneficiaries among Indonesian telematics MSEs. The data also came from the 2016 Economic Census. The algorithm used was the Convolutional Neural Network, with the single-dimension data converted (Shresta et al. 2019): the data matrix, originally 16 x 1, was reshaped to 4 x 4. This strategy increased the accuracy up to 99.03%.
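The conversion from a 16 x 1 record to a 4 x 4 grid can be sketched as below. The row-major ordering and the placeholder values are assumptions; the way attributes were arranged in the grid in the cited study is not described here.

```python
def reshape_to_grid(vector, rows, cols):
    """Reshape a flat list into rows x cols, filling row by row."""
    if len(vector) != rows * cols:
        raise ValueError("vector length must equal rows * cols")
    return [vector[r * cols:(r + 1) * cols] for r in range(rows)]

# Placeholder record with 16 attribute values (illustrative only)
record = list(range(16))
grid = reshape_to_grid(record, 4, 4)
```

A grid layout is what allows a convolutional layer to slide a two-dimensional kernel over the record, at the cost of treating neighbouring attributes as spatially related.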
The classification model in scenario 3 was built after selecting attributes according to Tosida et al. (2015). In that study, attribute selection was carried out for Indonesian telematics MSEs (UMK) in 2006 using the AHP decision model and involved three telematics experts from academia, practice and government. The 10 highest-ranked attributes from the AHP results were selected (Tosida et al. 2015). This attribute selection technique aims to provide information for updating the output of the prospect classification of Indonesian telematics MLE. The results of the classification using scenario 3 are shown in Figures 10, 11 and 12.

[Figure captions: Sensitivity level of model 2; Figure 11. Evaluation table of model 3; Figure 12. Composition of model 3]

Figure 10 shows an accuracy rate of 98.80%. The results of the confusion matrix evaluation are shown in Figure 11. The relationship between predicted and actual values is very strong: the predicted values are close to the actual values, as seen in the cream and pink colors on the diagonal of the matrix, which indicates good accuracy. Figure 12 shows the number of records in each class from the classification of Indonesian telematics MLE in scenario 3: 1) Very good (1290 records), 2) Good (482 records), 3) Bad (78 records), 4) Very bad (163 records), 5) Neutral (477 records). The scenario 1 classification model shows that the single-dimensional deep learning approach has not been able to map the data well, even though the composition of its classification results shows the same tendency as those of scenarios 2 and 3. The results of all experiments with the telematics MLE classification model are shown in Table 3. The learning rate regulates how large an update is made to the parameter values. If the learning rate is small enough, the error function value can be expected to decrease after each update, so the value is usually set small. Based on several trials, the best learning rates for scenarios 1, 2 and 3 were 0.30, 0.25 and 0.20, respectively. An epoch is one complete pass through the training data. Based on the trial results, the best numbers of epochs were 200, 20 and 50, respectively. Precision is the proportion of predictions for a class that agree with the actual data, and the best value was obtained with the scenario 3 model (Flach 2015). Success in classification can also be measured by the recall value, and the highest recall was obtained in scenario 3. Validation loss measures the error of the model's predictions on the validation data; the lower it is, the smaller the error. The smallest error was produced in scenario 3. Accuracy is the percentage of records in the dataset that are classified correctly. The highest accuracy was also obtained in scenario 3.
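The effect of the learning rate on the error can be seen in a toy case. The sketch below runs gradient descent on the one-parameter function f(w) = w², which is an illustrative stand-in and not the MLP loss from the experiments; the two learning rates are chosen only to contrast a small step with an overly large one.

```python
def descend(lr, steps=5, w=1.0):
    """Gradient descent on f(w) = w**2, whose gradient is 2*w.

    Returns the loss before the first step and after each step.
    """
    losses = [w * w]
    for _ in range(steps):
        w -= lr * 2 * w
        losses.append(w * w)
    return losses

small = descend(0.2)  # each step multiplies w by 0.6, so the loss shrinks
large = descend(1.2)  # each step multiplies w by -1.4, so the loss grows
```

With a real network the loss surface is not this simple, which is why the best learning rate for each scenario was found by trial.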
The weakness of this research is that it has not been able to show how the strength of each attribute affects the formation of the classification model. Therefore, further research is needed that integrates the whitebox approach of Bohanec et al. (2017). This whitebox technique is very promising for this study, because it would make the resulting classification model easier for stakeholders to interpret.

Conclusion
The classification of Indonesian telematics MLE using the deep learning approach gives good results. The data used in this study are the Indonesian telematics MLE data from the 2016 Economic Census conducted by BPS Indonesia. The target class is the prospect of Indonesian telematics MLE, which consists of 5 classes: very good, good, neutral, bad and very bad. The Indonesian telematics MLE classification model uses an architecture with 4 neurons in the first hidden layer, 3 in the second hidden layer and 4 in the third hidden layer. This is the most optimal configuration. The highest accuracy was obtained in scenario 3, which involved both transformation and attribute selection. The accuracy of the model is 98.80%, recall is 95.00%, precision is 98.01% and validation loss is 0.012. These results were obtained at a learning rate of 0.2 and 50 epochs. However, the deep learning approach has not been successfully used to build a telematics MLE classification model from a mixture of numerical and categorical data. This research needs to be developed by integrating the deep learning model with the whitebox approach, so that the model is easier to interpret and analyze with respect to the strength of the attributes that affect the classification model. This is very useful for stakeholders in formulating strategies to strengthen telematics MLE as a sector that plays an important role in Indonesia's development.