Enhanced Ensemble Learning Approaches for Malicious URL Detection: A Comparative Analysis of Advanced Hybrid Models

Imran Ahmad; Sunal Faraz Hayat; Muhammad Arshad; Khalil Aslam; Shazia Yousaf; Hafiz Muneeb Ahmad; Amara Javed

doi:10.54692/ijeci.2025.0902/261

Authors

Imran Ahmad Riphah Institute of Informatics, Riphah International University Malakand Campus, Lower Dir, Pakistan
Sunal Faraz Hayat Pakistan Navy, Islamabad, Pakistan
Muhammad Arshad University of Layyah, Layyah, Pakistan
Khalil Aslam Sharif College of Engineering and Technology, Lahore, Pakistan
Shazia Yousaf Fazaia College of Education for Women, Lahore, Pakistan
Hafiz Muneeb Ahmad IITECH College of Computer Sciences, IITECH Gujranwala, Pakistan
Amara Javed University of Gujrat, Gujrat, Pakistan

DOI:

https://doi.org/10.54692/ijeci.2025.0902/261

Keywords:

obfuscation, PICOS-based methodological framing, algorithmic URL generators, AdaBoost, XGBoost, malicious URL detection systems

Abstract

Malicious URLs are a persistent cybersecurity threat, serving as entry points for phishing campaigns, malware distribution, and identity theft. Conventional blacklist and heuristic-based systems are becoming less effective at detecting such dynamic URLs, especially those that rely on domain obfuscation, fast-flux hosting, and algorithmic URL generators. Machine learning (ML) for URL classification has been studied extensively, but little comparative evidence exists on advanced ensemble learning methods. This paper experimentally compares five ensemble algorithms (Random Forest, Gradient Boosting, XGBoost, Stacking Classifier, and AdaBoost) on the Malicious Webpages Dataset, which contains 1,781 samples and 21 lexical, host-based, DNS, and network features. Systematic preprocessing, PICOS-based methodological framing, and a PRISMA-based literature synthesis strengthen the rigor of the study. XGBoost achieved the best results, with an accuracy of 98.31%, precision of 97.85%, recall of 98.77%, and F1-score of 98.31%, outperforming the baseline AdaBoost accuracy of 96.89%. Confusion matrices, ROC curves, computational efficiency indicators, and feature importance rankings further confirm XGBoost's strong performance and its suitability for real-time operation. The study contributes a comprehensive comparative analysis, greater methodological clarity, and practical considerations for building efficient malicious URL detection systems.
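
The reported F1-score can be cross-checked against the reported precision and recall, since F1 is the harmonic mean of the two. The short sketch below is illustrative only and is not taken from the paper's code; the function name f1_score and the variable names are assumptions introduced here, and the only inputs are the XGBoost precision and recall quoted in the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given as fractions in [0, 1])."""
    if precision + recall == 0:
        # Avoid division by zero when the classifier finds no true positives.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# XGBoost values quoted in the abstract, converted from percentages.
xgb_precision = 97.85 / 100
xgb_recall = 98.77 / 100

xgb_f1 = f1_score(xgb_precision, xgb_recall)
print(f"F1 = {xgb_f1 * 100:.2f}%")  # prints "F1 = 98.31%"
```

The result agrees with the F1-score of 98.31% stated in the abstract, so the three reported figures are mutually consistent.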
