Detecting Phishing Websites using Decision Trees: A Machine Learning Approach

Ashar Ahmed Fazal, Maryam Daud

  • Ashar Ahmed Fazal Department of Criminology and Forensic Sciences, Lahore Garrison University, Lahore
  • Maryam Daud University of Engineering and Technology, Lahore

Abstract

This study emphasises the value of feature selection and preprocessing in improving model performance and demonstrates the efficiency of decision trees in identifying phishing websites. Internet users are significantly threatened by phishing websites, hence a strong detection strategy is required. The Phishing Websites Dataset from the UCI Machine Learning Repository, which contains 30 website-related features, is used in the study together with a decision tree classifier from the scikit-learn package. The dataset is preprocessed to remove invalid and missing values, and the most pertinent features are chosen for model training. 80% of the dataset is utilised to train the model, while the remaining 20% is used for testing. The findings demonstrate the decision tree classifier's precision in detecting phishing websites, scoring 95.97% accurate and showing a high true positive rate (96.64%) and a negligible (3.04%) false positive rate using the confusion matrix. This study highlights the significance of feature selection and preprocessing for optimal model performance in addition to validating the efficacy of decision trees in phishing detection. The method described here can be helpful for businesses and individuals looking to protect themselves from phishing assaults, and the given data visualisations make it easier to understand datasets and assess models.

Published
2023-07-03