Lazy Learning Paradigms for Malicious URL Classification: A Comprehensive Evaluation of Instance-Based Detection Models

Authors

  • Sehrush Seemab Awan, Department of Computer Science, UMEABIC, Leeds, United Kingdom
  • Imran Ahmad, International Collaborative Research Group, Lahore, Pakistan
  • Abdul Wahab Waseem, International Collaborative Research Group, Lahore, Pakistan
  • Ali Raza Latif, International Collaborative Research Group, Lahore, Pakistan
  • Ayesha Tariq, International Collaborative Research Group, Peshawar, Pakistan
  • Taqadas Ur Rehman, International Collaborative Research Group, Lahore, Pakistan
  • Saddam Ali, International Collaborative Research Group, Lahore, Pakistan

DOI:

https://doi.org/10.54692/ijeci.2025.0902/263

Keywords:

Lazy Learning Algorithms, K-Nearest Neighbors Classification, Malicious URL Detection, Instance-Based Learning, Cybersecurity Threat Mitigation, Locally Weighted Learning, Case-Based Reasoning Systems

Abstract

Malicious URLs remain persistent vehicles for cyberattacks, enabling phishing, ransomware delivery, and credential harvesting. Conventional detection methods based on signature databases and rule-based heuristics are ineffective against polymorphic attacks and zero-day exploits. Considerable effort has gone into eager learning algorithms for detecting URL threats. Lazy learning algorithms, which defer generalization until query time, have received little attention. This study presents a rigorous comparative evaluation of three lazy learning algorithms on the Malicious Webpages Dataset: K-Nearest Neighbors (KNN), Locally Weighted Learning (LWL), and Case-Based Reasoning (CBR). The base data consist of 1,781 instances. The comparison was conducted on a balanced set of 2,260 instances described by 21 distinct features covering lexical properties, host characteristics, DNS attributes, and network behavior patterns. In the experiments, KNN with optimized distance measures achieves the best classification performance: 97.47 % accuracy, 96.92 % precision, 98.15 % recall, and a 97.53 % F1-score. LWL reaches 96.34 % accuracy and CBR 95.69 %. The study contributes empirical evidence on instance-based classification techniques and provides a baseline for future benchmarks of adaptive learning applications in cybersecurity.
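The abstract compares KNN, which takes a majority vote among the k stored instances nearest to a query, with LWL, which weights stored instances by their distance to the query. The following Python sketch illustrates both ideas and the four reported metrics. It is an assumption-laden illustration, not the authors' implementation, and it has three limits:

- The toy feature vectors, function names, Gaussian kernel, and bandwidth are hypothetical, and the data are not taken from the Malicious Webpages Dataset.
- `lwl_predict` shows only the simplest form of locally weighted learning, a kernel-weighted vote, rather than fitting a weighted local model.
- CBR is not sketched.

```python
import math
from collections import Counter

# Hypothetical toy data (NOT from the Malicious Webpages Dataset): each row is a
# standardized URL feature vector, e.g. [URL length, dot count, special-char count].
# Label 1 = malicious, 0 = benign.
TRAIN_X = [
    [0.2, 0.1, 0.0],
    [0.3, 0.2, 0.1],
    [0.1, 0.0, 0.2],
    [2.1, 1.8, 2.5],
    [1.9, 2.2, 2.0],
    [2.4, 1.9, 2.2],
]
TRAIN_Y = [0, 0, 0, 1, 1, 1]


def minkowski(a, b, p=2):
    """Minkowski distance: p=1 gives Manhattan, p=2 gives Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def knn_predict(query, X, y, k=3, p=2):
    """Lazy KNN: nothing is learned in advance; at query time the k closest
    stored instances are found and their labels are put to a majority vote."""
    neighbours = sorted(zip((minkowski(query, xi, p) for xi in X), y))
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


def lwl_predict(query, X, y, bandwidth=1.0, p=2):
    """Simplest form of locally weighted learning: every stored instance votes,
    with a Gaussian weight that decays with its distance from the query.
    (Assumption: a kernel-weighted vote, not a weighted local model fit.)"""
    scores = {}
    for xi, yi in zip(X, y):
        d = minkowski(query, xi, p)
        weight = math.exp(-0.5 * (d / bandwidth) ** 2)
        scores[yi] = scores.get(yi, 0.0) + weight
    return max(scores, key=scores.get)


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1, treating `positive` as the malicious class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and q == positive for t, q in pairs)
    fp = sum(t != positive and q == positive for t, q in pairs)
    fn = sum(t == positive and q != positive for t, q in pairs)
    accuracy = sum(t == q for t, q in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall, f1_score(precision, recall)


if __name__ == "__main__":
    query = [2.0, 2.0, 2.1]  # hypothetical unseen URL
    print(knn_predict(query, TRAIN_X, TRAIN_Y, k=3))  # → 1
    print(lwl_predict(query, TRAIN_X, TRAIN_Y))  # → 1
    # Consistency check with the abstract's reported precision and recall.
    print(round(f1_score(0.9692, 0.9815), 4))  # → 0.9753
```

Neither predictor builds a model in advance. Adding a newly labelled URL only means appending a row to `TRAIN_X` and `TRAIN_Y`, which is what makes lazy learners attractive for fast-changing threat data. The cost is one distance computation against every stored instance per query. Passing the abstract's precision (96.92 %) and recall (98.15 %) to `f1_score` gives 0.9753, which matches the reported 97.53 % F1-score.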


Published

2025-12-30

Section

Articles