This is an outdated version published on 2025-06-30. Read the most recent version.

Mining the Shadows: A Hybrid NLP Framework for Dark Web Cybercrime Investigation

Authors

  • Muhammad Bilal Khan, Department of Computer Science, National College of Business Administration and Economics, Pakistan
  • Ans Riaz, School of Physics, Engineering and Computer Science, University of Hertfordshire, UK
  • Kusar Perveen, University of Engineering and Technology, Lahore 54890, Pakistan

DOI:

https://doi.org/10.54692/ijeci.2025.0901/246

Keywords:

Dark Web, digital forensics teams, malicious software, Natural Language Processing, cybercrime

Abstract

The Dark Web is a central hub of cybercrime, where threat actors discuss campaigns, trade illegal materials, and sell malware. Traditional auditing of these environments is inefficient and does not scale: it is limited by the sheer volume of content, linguistic diversity, and intentional content obfuscation. This article proposes a hybrid Natural Language Processing (NLP) system for automated cybercrime investigation on Dark Web forums. The system builds on earlier research and combines transformer-based models such as BERT and RoBERTa with standard preprocessing steps. Custom components handle named-entity recognition (NER), topic modeling, sentiment and intent classification, and threat-keyword extraction. Stylometric profiling based on lexical and behavioral features supports author tracking across aliases. Experimental analyses show high precision in entity identification, clustering of cybercriminal dialogue, and intent categorization, exceeding baseline models on precision and recall. The system is further distinguished by a rule-aware ethical scraping protocol and an IRB-friendly data-processing layer. By converting raw, noisy forum text into structured threat intelligence, the framework enables scalable, real-time surveillance of cybercriminal ecosystems and provides actionable intelligence to cybersecurity researchers, digital forensics experts, commercial law-enforcement agencies, and other downstream consumers of threat data.
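
To make the idea of stylometric author tracking concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a purely lexical profile built from character trigram counts and compares two aliases by cosine similarity; the abstract also mentions behavioral features, which are not modeled here. The function names (char_ngrams, cosine_similarity, alias_similarity) are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in lowercased, whitespace-normalized text."""
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def alias_similarity(posts_a: list[str], posts_b: list[str], n: int = 3) -> float:
    """Compare two aliases by the n-gram profiles of all their posts combined.

    Assumption: a lexical-only profile; real stylometry would add further features.
    """
    profile_a = char_ngrams(" ".join(posts_a), n)
    profile_b = char_ngrams(" ".join(posts_b), n)
    return cosine_similarity(profile_a, profile_b)
```

A score near 1 suggests similar writing habits and a score near 0 suggests little lexical overlap. Any threshold for treating two aliases as the same author would need to be calibrated on labeled data, and short posts give unreliable profiles.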

Published

2025-06-30

Section

Articles