Building an Intelligent Defense: Machine Learning-Driven Phishing Detection in a Web-Based Solution

Hossain, Md. Anowar

URI: http://dspace.daffodilvarsity.edu.bd:8080/handle/123456789/17368

Date: 2025-01-13

Abstract:

Phishing attacks remain a critical threat to online security, and traditional detection methods are no longer effective enough in a landscape where technology changes rapidly. In this research, we aim to develop an advanced phishing detection system that leverages machine learning (ML) techniques. The dataset is drawn from several sources, including the OpenPhish dataset for phishing URLs and Majestic Million's list of one million websites for legitimate URLs. To keep the dataset balanced, short and long URLs are included in equal proportions. Our approach focuses on minimizing feature redundancy to increase detection accuracy and reduce computational complexity: we select the subset of features most relevant to phishing detection, optimizing both performance and dataset size. We then apply several classifiers, including Logistic Regression, K-Nearest Neighbors, Gradient Boosting, AdaBoost, and a hybrid machine learning model, to identify the most effective algorithm for detecting phishing websites, with an emphasis on classification features that support immediate detection. The core of this study is an assessment of the efficacy of these ML algorithms and feature sets. We measure each classifier's accuracy, precision, recall, and F1 score to determine the best combination for phishing detection, and we also prioritize prediction speed to ensure real-time detection capability. The proposed system aims to address existing limitations in phishing detection, such as high latency and the lack of comparative analysis between algorithms. By leveraging a diverse dataset and optimizing feature selection, we aim to develop a robust and efficient phishing detection model that accurately identifies malicious URLs while minimizing false positives.
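The evaluation described in the abstract (accuracy, precision, recall, F1 score, and prediction speed per classifier) can be illustrated with a minimal sketch. It is not the report's implementation. The function names `evaluate` and `toy_rule`, the hand-written rule, and the six example URLs are all hypothetical and exist only to show how the metrics and the per-URL latency might be computed for any candidate classifier.

```python
import time
from typing import Callable, Dict, Sequence


def evaluate(predict: Callable[[str], int],
             urls: Sequence[str],
             labels: Sequence[int]) -> Dict[str, float]:
    """Score a URL classifier (1 = phishing, 0 = legitimate) and time it."""
    start = time.perf_counter()
    preds = [predict(u) for u in urls]
    elapsed = time.perf_counter() - start

    # Confusion-matrix counts.
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    tn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 0)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / len(labels),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "seconds_per_url": elapsed / len(urls),
    }


def toy_rule(url: str) -> int:
    """Hypothetical stand-in for a trained model: flag '@' or many hyphens."""
    return 1 if "@" in url or url.count("-") >= 2 else 0


# Made-up illustrative URLs, not taken from OpenPhish or Majestic Million.
urls = [
    "http://example.com/",
    "http://login.example.com@evil.example/verify",
    "http://secure-update-account.example/login",
    "http://example.org/docs/my-page",
    "http://paypa1.example/signin",
    "http://my-cool-blog.example/post",
]
labels = [0, 1, 1, 0, 1, 0]

report = evaluate(toy_rule, urls, labels)
for name, value in report.items():
    print(f"{name}: {value:.6f}")
```

In a comparison like the one the abstract describes, the same `evaluate` call would be repeated with each trained model's prediction function in place of `toy_rule`, so that all candidates are scored on identical data and their per-URL latency can be compared directly.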