Abstract:
Phishing attacks remain a critical threat to online security. Traditional
detection methods are losing effectiveness as technology changes rapidly,
which creates a need for more effective detection methods.
In this research, we aim to develop an advanced phishing detection system
leveraging machine learning (ML) techniques. The dataset used in this study
is drawn from several sources, including the OpenPhish dataset for phishing URLs
and Majestic Million’s list of one million websites for legitimate URLs. To ensure dataset
balance, equal proportions of short and long URLs are included. Our approach
focuses on minimizing feature redundancy to increase detection accuracy and
reduce computational complexity. We select a critical subset of features that are
most relevant for phishing detection, optimizing both performance and dataset
size. Furthermore, we apply several classifiers, including Logistic Regression,
K-Nearest Neighbors, Gradient Boosting, AdaBoost, and a hybrid machine learning
model, to identify the most effective algorithm for detecting phishing websites,
with a focus on classification features that facilitate immediate detection. The
core of this study is an assessment of the efficacy of various ML algorithms and
feature sets. We measure
each classifier's accuracy, precision, recall, and F1 score to determine the optimal
combination for phishing detection. Additionally, we prioritize prediction speed
to ensure real-time detection capabilities. The proposed system aims to address
existing limitations in phishing detection, such as high latency and the lack of
comparative analysis between algorithms. By leveraging a diverse dataset and
optimizing feature selection, we aim to develop a robust and efficient phishing
detection model capable of accurately identifying malicious URLs while
minimizing false positives.
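
As an illustration of the evaluation described above, the following minimal
Python sketch computes accuracy, precision, recall, and F1 score for a URL
classifier and records its average prediction time per URL. The names
evaluate, Report, and toy_rule, as well as the sample URLs and labels, are
hypothetical and serve only to show the calculation; they are not the
classifiers, features, or data used in this study.

```python
import time
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Report:
    accuracy: float
    precision: float
    recall: float
    f1: float
    seconds_per_url: float


def evaluate(predict: Callable[[str], int],
             urls: Sequence[str],
             labels: Sequence[int]) -> Report:
    """Score a URL classifier (1 = phishing, 0 = legitimate) and time it."""
    start = time.perf_counter()
    predictions = [predict(url) for url in urls]
    elapsed = time.perf_counter() - start

    # Confusion-matrix counts, with phishing as the positive class.
    pairs = list(zip(predictions, labels))
    tp = sum(1 for p, y in pairs if p == 1 and y == 1)
    fp = sum(1 for p, y in pairs if p == 1 and y == 0)
    fn = sum(1 for p, y in pairs if p == 0 and y == 1)
    tn = sum(1 for p, y in pairs if p == 0 and y == 0)

    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    seconds_per_url = elapsed / len(urls) if urls else 0.0
    return Report(accuracy, precision, recall, f1, seconds_per_url)


def toy_rule(url: str) -> int:
    """Hypothetical stand-in classifier, not a model from this study:
    flags a URL that contains '@' or more than three dots."""
    return int("@" in url or url.count(".") > 3)


# Made-up sample for illustration only (1 = phishing, 0 = legitimate).
urls = [
    "http://example.com/login",
    "http://login.example.com.secure.account.example.net/verify",
    "http://user@198.51.100.7/update",
    "http://docs.example.org/a.b.c.d.html",
    "http://example.org/reset-password",
    "https://example.com/",
]
labels = [0, 1, 1, 0, 1, 0]

report = evaluate(toy_rule, urls, labels)
print(report)
```

Passing the classifier as a plain callable means the same function could
score any of the models compared in this study, so the quality metrics and
the per-URL prediction time are measured in the same way for each one.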