Implementation of Deep Learning to Detect Indonesian Hoax News with Convolutional Neural Network Method
Abstract
This study builds and tests a model that classifies Indonesian news as valid or hoax. The model is a Convolutional Neural Network (CNN) that takes Word2Vec word embeddings as input. The research stages are data collection, pre-processing, word embedding, model construction, and evaluation of the results. The dataset consists of 958 news articles. With 80% of the data used for training, 20% for testing, and 5 training epochs, the resulting model separates valid news from hoax news well. The best model uses 400-dimensional word vectors as input and multiple filter sizes of 3, 4, and 5; its accuracy, precision, and recall are all 0.91. These results are influenced by the dimension chosen for the Word2Vec output vectors, the filter sizes chosen for the convolution layer, and the addition of the Indonesian Wikipedia corpus to the corpus used.
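The multiple-filter-size convolution named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes ReLU activation and max-over-time pooling (a common setup for text CNNs, not confirmed by the abstract), uses random rather than trained filters, and shrinks the embedding dimension from 400 to 8 for readability. The names `make_filters`, `conv_max_pool`, and `extract_features`, and the number of filters per size, are hypothetical.

```python
import random

FILTER_SIZES = (3, 4, 5)   # filter sizes named in the abstract
EMBED_DIM = 8              # toy value; the abstract's best model used 400
FILTERS_PER_SIZE = 2       # assumption: not stated in the abstract


def make_filters(rng, sizes=FILTER_SIZES, dim=EMBED_DIM, per_size=FILTERS_PER_SIZE):
    """Create random filters; each one spans `size` words x `dim` embedding values."""
    return {
        size: [
            [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(size)]
            for _ in range(per_size)
        ]
        for size in sizes
    }


def conv_max_pool(sentence, filt):
    """Slide one filter along the word axis, apply ReLU, and keep the maximum."""
    size = len(filt)
    best = 0.0  # ReLU output is never negative, so 0.0 is a safe floor
    for start in range(len(sentence) - size + 1):
        window = sentence[start:start + size]
        activation = sum(
            w * x
            for filt_row, word_vec in zip(filt, window)
            for w, x in zip(filt_row, word_vec)
        )
        best = max(best, activation)
    return best


def extract_features(sentence, filters):
    """Concatenate the pooled output of every filter of every size."""
    return [
        conv_max_pool(sentence, filt)
        for size in sorted(filters)
        for filt in filters[size]
    ]


rng = random.Random(0)
filters = make_filters(rng)
# Stand-in for one news text: 12 words, each already mapped to a vector.
sentence = [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in range(12)]
features = extract_features(sentence, filters)
print(len(features))  # 3 filter sizes x 2 filters each = 6
```

In a full model, the concatenated feature vector would feed a dense layer that outputs the valid/hoax decision. Using several filter sizes lets the network respond to word patterns of different lengths in the same pass.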
Copyright (c) 2021 Cheevin Yoviananda, Aryo Nugroho, Tresna Maulana Fahrudin
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
- Share — copy and redistribute the material in any medium or format
- Adapt — remix, transform, and build upon the material for any purpose, even commercially.