Enhanced Malicious Email Detection Using Large Language Models and Web-Based URL Scraping

Khalil, Ibrahim

URI: http://dspace.daffodilvarsity.edu.bd:8080/handle/123456789/17267

Date: 2025-01-15

Abstract:

This study enhances malicious email detection by integrating spam classification, URL analysis, and content-based risk assessment using large language models (LLMs). Traditional methods often treat spam detection and URL detection separately, which limits their effectiveness against sophisticated threats. To bridge this gap, a unified approach was developed: a model was trained and fine-tuned for both spam and URL classification, with additional functionality to scrape and analyze the web content behind embedded URLs. The initial model showed moderate performance, with accuracies of 78.4% for spam classification and 74.4% for URL classification. After fine-tuning, accuracies rose to 98.0% and 90.2%, respectively, gains of 19.6 and 15.8 percentage points. The study also highlights the potential of LLMs to analyze web-scraped content and provide interpretable explanations of risks such as phishing, malware, or fraud, so that users are informed about potential threats. The objectives of this research are to enhance LLM-based email detection by combining spam and URL detection methods and to add a further security layer by examining the contents of linked pages. The results indicate that LLMs not only improve detection accuracy but also communicate potential risks effectively, paving the way for more robust and interpretable email security solutions. This research contributes to advancing the use of LLMs in secure and intelligent email threat detection systems.
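
The pipeline outlined in the abstract (find embedded URLs, reduce the linked pages to text, ask an LLM for an interpretable risk explanation) can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the URL pattern, the character limit, and the prompt wording are all assumptions made for illustration, and both page fetching and the LLM call itself are left out so the example runs offline.

```python
import re
from html.parser import HTMLParser

# Hypothetical pattern: good enough for a sketch, not a full URL grammar.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")


def extract_urls(email_body: str) -> list[str]:
    """Return unique URLs in order of first appearance."""
    urls: list[str] = []
    for match in URL_PATTERN.findall(email_body):
        url = match.rstrip(".,;:!?")  # drop sentence punctuation stuck to the URL
        if url not in urls:
            urls.append(url)
    return urls


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str, max_chars: int = 2000) -> str:
    """Reduce scraped HTML to plain text, truncated to max_chars (assumed limit)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)[:max_chars]


def build_risk_prompt(email_body: str, pages: dict[str, str]) -> str:
    """Assemble an LLM prompt from the email and the scraped HTML of each URL.

    `pages` maps URL -> raw HTML; the prompt wording is illustrative only.
    """
    parts = [
        "Classify the email below as spam or not spam, classify each URL as",
        "malicious or benign, and explain any phishing, malware, or fraud risk.",
        "",
        "EMAIL:",
        email_body,
    ]
    for url in extract_urls(email_body):
        parts += ["", f"URL: {url}", "PAGE TEXT:", visible_text(pages.get(url, ""))]
    return "\n".join(parts)
```

In a real deployment the `pages` mapping would be filled by fetching each URL in an isolated environment, and the resulting prompt would be passed to the fine-tuned model; both steps depend on infrastructure the abstract does not describe.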