A Stacked Heterogeneous Ensemble Learning Framework for Ransomware Network Traffic Detection Using WEKA: An Empirical Study on CIC-IDS2018

Authors

  • Saddam Ali, International Collaborative Research Group, Lahore, Pakistan
  • Muhammad Tayyab Waqar, Department of Computer Sciences, University of Management and Technology, Lahore, Pakistan
  • Fareeha Akbar, International Collaborative Research Group, Lahore, Pakistan
  • Abdul Wahid Soomro, Department of Computer Systems and Technology, Universiti Malaya, Kuala Lumpur 50603, Malaysia
  • Khalid Ali, Department of Computer Science, University College of Dera Murad Jamali, Lasbela University of Agriculture, Water and Marine Science, Uthal, Balochistan, Pakistan
  • Muhammad Ali, International Collaborative Research Group, Islamabad, Pakistan

DOI:

https://doi.org/10.54692/ijeci.2026.1001/270

Keywords:

Ransomware detection, WEKA, stacked generalisation, CIC-IDS2018, network intrusion detection, machine learning

Abstract

Ransomware-related network intrusions are growing in both number and sophistication, so a detection framework that is transparent, reproducible, and generalizable across diverse network traffic patterns is essential. This study develops and validates a stacked heterogeneous ensemble in WEKA 3.8.6 on CIC-IDS2018, a ransomware-associated network-traffic classification dataset. Random Forest, J48, Naïve Bayes, SMO and k-NN serve as base learners and Logistic Regression as the meta-learner, and per-fold predictions were exported for further statistical analysis. A preprocessing pipeline following the CRISP-DM framework (cleaning, normalisation and correlation-based feature selection) reduced the eighty raw attributes to twenty-two. Within each cross-validation fold, SMOTE was applied independently to the training partition to balance the minority class. Under stratified ten-fold cross-validation repeated with three independent seeds (1, 7 and 42), the stacked ensemble achieved the highest accuracy, 99.18 ± 0.18%, together with an F1-score of 0.940, an AUC of 0.978 and an MCC of 0.936. Its improvements over the best single classifier (Random Forest; accuracy 98.3%, F1-score 0.910) were statistically significant according to paired t-tests (p = 0.00021) and the Friedman–Nemenyi procedure (χ² = 37.9, p < 0.001, CD = 1.62). The results indicate that heterogeneous stacking captures complementary decision boundaries that individual WEKA classifiers do not. The proposed pipeline is transparent and open-source, making it suitable for resource-constrained network intrusion detection and digital-forensic triage.
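The fold-wise use of SMOTE described in the abstract can be illustrated with a minimal sketch. This is not the study's WEKA configuration: it is a standard-library Python illustration that assumes a binary problem with numeric features, and every function name and default in it (`stratified_folds`, `smote`, `balanced_training_sets`, `k_neighbors=5`, `minority_label=1`) is hypothetical. It shows only the ordering that matters: the data are split first, and synthetic minority samples are generated from the training partition alone, so the held-out fold contains real instances only.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed):
    """Split indices into k folds with roughly the same class mix in each."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def smote(minority, n_new, k_neighbors, rng):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k_neighbors]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic


def balanced_training_sets(X, y, k=10, seed=1, minority_label=1, k_neighbors=5):
    """Yield (X_train, y_train, test_indices) for each fold.

    Oversampling happens after the split and uses training rows only,
    so no synthetic point is derived from the held-out fold.
    """
    rng = random.Random(seed)
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        minority = [x for x, lab in zip(X_train, y_train) if lab == minority_label]
        # Binary case: majority count minus minority count.
        deficit = len(y_train) - 2 * len(minority)
        if deficit > 0 and len(minority) > 1:
            new_points = smote(minority, deficit, k_neighbors, rng)
            X_train = X_train + new_points
            y_train = y_train + [minority_label] * len(new_points)
        yield X_train, y_train, test_idx


if __name__ == "__main__":
    # Toy data (made up for illustration): 10 minority rows out of 100.
    X = [[float(i), float(i % 7)] for i in range(100)]
    y = [1 if i < 10 else 0 for i in range(100)]
    for seed in (1, 7, 42):  # the three seeds named in the abstract
        for X_tr, y_tr, test_idx in balanced_training_sets(X, y, k=10, seed=seed):
            print(seed, len(test_idx), y_tr.count(0), y_tr.count(1))
```

Oversampling the full dataset before splitting would let synthetic points interpolated from test-fold instances leak into training and inflate the reported metrics, which is the reason for the per-fold ordering sketched here.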

Published

2026-07-01

Section

Articles