dc.description.abstract |
Phishing has emerged as a prevalent method that attackers use to deceive users and gain
unauthorized access to their personal information. These attacks trick users into revealing
sensitive information such as passwords, credit card details, or Social Security numbers. The
attackers frequently impersonate reputable organizations, such as banks, email service providers,
or online retailers, to mislead unsuspecting victims. Machine learning plays a crucial role in
phishing detection, and researchers have proposed many machine-learning-based solutions. However,
several features obtained through web scraping may hinder the effectiveness of machine learning
algorithms, and reliance on features that depend on third-party services poses challenges for
machine learning models in real-time phishing detection. This paper presents a methodology for
identifying distinctive characteristics of URLs that are not affiliated with the target website,
which can be used to detect fraudulent attempts to obtain sensitive information promptly. We
evaluated the approach on a total of 40,980 URLs, both legitimate and phishing, obtained from
various sources. We explored a range of feature selection techniques and classification
algorithms for detecting phishing URLs; of all the approaches, the Random Forest classifier
achieved the highest accuracy, 99.98%. |
en_US |