Detecting Phishing URLs Using an LSTM-CNN Hybrid Deep Learning Model
Abstract
Phishing is the act of deceiving users into disclosing sensitive information through fraudulent websites. Conventional detection methods, such as blacklists and rule-based systems, are often ineffective against newly created or obfuscated phishing URLs. This work proposes a deep learning approach built on a hybrid of Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN) layers. The LSTM captures sequential dependencies across the URL, whereas the CNN extracts local features, so the combined model is more effective than either component used alone. The goal is to classify URLs as phishing or legitimate at the character level. Labelled URLs are tokenized into characters, padded to a fixed length, and passed through the model. The hybrid architecture is trained for binary classification and evaluated using accuracy, precision, recall, F1-score, balanced accuracy, the Matthews correlation coefficient (MCC), and ROC-AUC. The results indicate that the hybrid model outperforms the baseline models because it learns both spatial and sequential patterns. Because it is scalable and accurate, the architecture shows strong potential for real-time phishing detection. It also provides a promising foundation for future proactive, automated phishing prevention systems.
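To make the preprocessing and evaluation steps named in the abstract concrete, the following is a minimal Python sketch rather than the authors' implementation. The helper names (`build_vocab`, `encode`, `binary_metrics`), the reserved IDs, right-side padding, truncation of long URLs, and the example URLs are all illustrative assumptions; the abstract does not specify them. ROC-AUC is omitted because it requires predicted scores rather than hard labels.

```python
import math

PAD_ID = 0  # assumed ID reserved for padding positions
UNK_ID = 1  # assumed ID for characters absent from the vocabulary


def build_vocab(urls):
    """Assign an integer ID (starting at 2) to each distinct character."""
    chars = sorted({ch for url in urls for ch in url})
    return {ch: idx + 2 for idx, ch in enumerate(chars)}


def encode(url, vocab, max_len):
    """Turn a URL into exactly max_len character IDs.

    Longer URLs are truncated; shorter ones are right-padded with PAD_ID.
    """
    ids = [vocab.get(ch, UNK_ID) for ch in url[:max_len]]
    return ids + [PAD_ID] * (max_len - len(ids))


def _ratio(num, den):
    """Divide, returning 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def binary_metrics(y_true, y_pred):
    """Label-based metrics, treating 1 as phishing and 0 as legitimate."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)       # true positive rate
    specificity = _ratio(tn, tn + fp)  # true negative rate
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "f1": _ratio(2 * precision * recall, precision + recall),
        "balanced_accuracy": (recall + specificity) / 2,
        "mcc": _ratio(tp * tn - fp * fn, mcc_den),
    }


# Hypothetical usage: encode two made-up URLs into fixed-length sequences.
example_urls = ["http://example.com", "http://example.com/login"]
example_vocab = build_vocab(example_urls)
example_batch = [encode(u, example_vocab, max_len=32) for u in example_urls]
```

Each row of `example_batch` has the same length, which is what an embedding layer feeding a CNN or LSTM would expect as input. In practice the vocabulary would be built from the training split only, so that characters unseen at training time map to `UNK_ID` during evaluation.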