Lazy Learning Paradigms for Malicious URL Classification: A Comprehensive Evaluation of Instance-Based Detection Models

Sehrush Seemab Awan; Imran Ahmad; Abdul Wahab Waseem; Ali Raza Latif; Ayesha Tariq; Taqadas Ur Rehman; Saddam Ali

doi:10.54692/ijeci.2025.0902/263

Authors

Sehrush Seemab Awan Department of Computer Science, UMEABIC, Leeds, United Kingdom
Imran Ahmad International Collaborative Research Group, Lahore, Pakistan
Abdul Wahab Waseem International Collaborative Research Group, Lahore, Pakistan
Ali Raza Latif International Collaborative Research Group, Lahore, Pakistan
Ayesha Tariq International Collaborative Research Group, Peshawar, Pakistan
Taqadas Ur Rehman International Collaborative Research Group, Lahore, Pakistan
Saddam Ali International Collaborative Research Group, Lahore, Pakistan

DOI:

https://doi.org/10.54692/ijeci.2025.0902/263

Keywords:

Lazy Learning Algorithms, K-Nearest Neighbors Classification, Malicious URL Detection, Instance-Based Learning, Cybersecurity Threat Mitigation, Locally Weighted Learning, Case-Based Reasoning Systems

Abstract

Malicious URLs remain persistent tools of cyberattack, enabling phishing, ransomware execution, and credential harvesting. Conventional detection methods based on signature databases and rule-based heuristics are ineffective against polymorphic attacks and zero-day exploits. Although eager learning algorithms have received considerable attention, lazy learning algorithms, which defer generalization until query time, remain largely unexplored for URL threat detection. This study presents a rigorous comparative evaluation of three lazy learning algorithms, K-Nearest Neighbors (KNN), Locally Weighted Learning (LWL), and Case-Based Reasoning (CBR), on the Malicious Webpages Dataset. The base data consisted of 1,781 instances; the comparative evaluation was conducted on the balanced set of 2,260 instances with 21 distinct features covering lexical properties, host characteristics, DNS attributes, and network behavior patterns. Experiments show that KNN with optimized distance measures achieves the best classification performance, with 97.47% accuracy, 96.92% precision, 98.15% recall, and a 97.53% F1-score, compared with LWL (96.34% accuracy) and CBR (95.69% accuracy). The study contributes empirical evidence on instance-based classification techniques and provides a benchmark for future work on adaptive learning applications in cybersecurity.
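To make the lazy learning idea concrete, the following is a minimal sketch of a KNN classifier that stores instances and does all of its work at query time. It is not the authors' implementation: the two features (URL length and dot count), the toy instances, the choice of Euclidean distance, and the names `knn_predict` and `f1_score` are all assumptions made for illustration, and the abstract does not specify which distance measures were optimized. The last line checks that the reported F1-score follows from the reported precision and recall.

```python
import math
from collections import Counter


def knn_predict(stored, query, k=3):
    """Classify `query` by majority vote among its k nearest stored instances."""
    # Lazy learning: nothing is fitted in advance. The stored instances are
    # only consulted, and distances only computed, when a query arrives.
    nearest = sorted(stored, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy instances: ((url_length, dot_count), label), 1 = malicious.
# These values are made up and are not drawn from the Malicious Webpages Dataset.
stored = [
    ((12.0, 1.0), 0), ((15.0, 1.0), 0), ((18.0, 2.0), 0),
    ((70.0, 5.0), 1), ((82.0, 6.0), 1), ((95.0, 7.0), 1),
]

print(knn_predict(stored, (75.0, 5.0)))  # prints 1 (malicious)
print(knn_predict(stored, (14.0, 1.0)))  # prints 0 (benign)

# Consistency check on the reported metrics: precision 96.92%, recall 98.15%.
print(round(100 * f1_score(0.9692, 0.9815), 2))  # prints 97.53
```

In practice the features would likely need scaling before distances are computed, since a raw length value can dominate a small count; the sketch omits this for brevity.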
