Content-Feature-Based Email Spam Detection Using Naïve Bayes (Deteksi Spam Pada Email Berbasis Fitur Konten Menggunakan Naïve Bayes)

Nur Qodariyah Fitriyah; Hardian Oktavianto; Hasbullah Hasbullah

doi:10.32528/justindo.v5i1.3414

Abstract

Research indicates that there are more than 3 billion email accounts worldwide, with roughly 205 to 294 billion emails sent every day. One problem arising from this enormous volume is spam email. One way to address it is spam filtering, which can be approached as a learning-based classification task. This study applies the Naive Bayes algorithm to classify the emails in a dataset into two groups: spam and non-spam. Using k-fold cross validation to split the data into training and test sets, the tests show that on average 3903 emails were classified correctly and 698 were classified incorrectly, for an average accuracy of 84.8%; the average precision and recall were 0.86 and 0.85, respectively. The highest accuracy, precision, and recall were obtained with k = 9.

Keywords: detection, classification, spam email, naive bayes
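
The abstract does not say which Naive Bayes variant or tooling the authors used. As a rough illustration only, the sketch below assumes a Gaussian variant over continuous features, runs on synthetic two-feature data rather than the study's dataset, and pools the confusion counts across folds to compute accuracy, plus precision and recall for the spam class alone. All function names are hypothetical and are not taken from the paper.

```python
import math
import random


def fit_gaussian_nb(X, y):
    """Estimate a prior, per-feature means and variances for each class."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        n = len(rows)
        cols = list(zip(*rows))
        means = [sum(col) / n for col in cols]
        # A small constant keeps every variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(cols, means)]
        model[label] = (n / len(y), means, variances)
    return model


def predict(model, x):
    """Return the class with the highest log posterior for sample x."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for v, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(X, y, k, positive=1):
    """Run k-fold cross validation; return (accuracy, precision, recall)."""
    tp = fp = fn = tn = 0
    for fold in k_fold_indices(len(X), k):
        held_out = set(fold)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        model = fit_gaussian_nb(train_X, train_y)
        for i in fold:
            pred = predict(model, X[i])
            if pred == positive and y[i] == positive:
                tp += 1
            elif pred == positive:
                fp += 1
            elif y[i] == positive:
                fn += 1
            else:
                tn += 1
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Synthetic stand-in data (an assumption, not the study's dataset):
# label 1 = spam, label 0 = non-spam, two made-up continuous features.
rng = random.Random(42)
X = ([[rng.gauss(2.0, 0.5), rng.gauss(1.5, 0.5)] for _ in range(100)]
     + [[rng.gauss(0.2, 0.5), rng.gauss(0.3, 0.5)] for _ in range(100)])
y = [1] * 100 + [0] * 100

acc, prec, rec = cross_validate(X, y, k=9)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f}")
```

Pooling the counts over all folds is one of several reasonable conventions; averaging per-fold scores, or weighting precision and recall across both classes, would give slightly different figures, and the abstract does not say which convention its reported values follow.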

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
