Imbalance Data Classification to Identify Fraudulent Transactions

Karim, Rafat; Mahmud, Md. Rifat; Maksuda; Jannatus Saiyem, MD.

dc.contributor.author	Karim, Rafat
dc.contributor.author	Mahmud, Md. Rifat
dc.contributor.author	Maksuda
dc.contributor.author	Jannatus Saiyem, MD.
dc.date.accessioned	2020-10-12T08:12:09Z
dc.date.available	2020-10-12T08:12:09Z
dc.date.issued	2019-12-10
dc.identifier.uri	http://dspace.daffodilvarsity.edu.bd:8080/handle/123456789/4661
dc.description	Data is considered to be the oil or fuel of the next generation. Data mining is one of the most widely used methods for extracting hidden information from large datasets. The main goal of mining is knowledge discovery from databases, which is known as KDD. Mining and discovery are closely related in the domain of data mining: we discover knowledge by mining the databases. The question is then how to learn from the dataset. The answer is that the data mining field offers several classification algorithms, including the Support Vector Machine, Decision Tree, Artificial Neural Network, and AdaBoost algorithms. We train these algorithms on the dataset, and the resulting models classify records for us. The accuracy of these algorithms can differ from one scenario to another. The class imbalance problem arises most often in two-class datasets, but it can also occur in multi-class datasets. Another important distinction is between supervised and unsupervised learning. In supervised learning the class labels are known, and in unsupervised learning they are unknown. Our work on this dataset is supervised learning, because the algorithms we used are best suited to supervised learning.	en_US
dc.description.abstract	Because of the expansion of social media and globalization, petabyte-scale data is now being generated every second. Data mining is the process of extracting knowledge from this huge amount of data. Data mining applications are becoming more useful and a key prerequisite for any kind of business scenario. However, in certain supervised learning applications, a lack of sufficient data for some classes creates a data imbalance problem. For example, in a credit card fraud detection application, most transactions are not fraudulent and only a few are. In our research, we applied several classification techniques to an imbalanced dataset. We tested on synthetic data from a financial payment system, because obtaining a real dataset is a great challenge. Synthetic data is artificially constructed to mimic real-world events. We tested the Decision Tree, Support Vector Machine, Artificial Neural Network, and AdaBoost algorithms on the class imbalance problem. Among these algorithms, AdaBoost achieved promising accuracy compared with the others. The main aim of this paper is therefore to determine which classification algorithm performs better on an imbalanced dataset.	en_US
dc.language.iso	en	en_US
dc.publisher	Daffodil International University	en_US
dc.subject	Data processing	en_US
dc.subject	Data Mining	en_US
dc.subject	Making Change (Money Transaction)	en_US
dc.title	Imbalance Data Classification to Identify Fraudulent Transactions	en_US
dc.type	Other	en_US
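
The class imbalance problem described in the abstract can be illustrated with a small sketch. This is not the authors' code or data: the transaction counts (990 legitimate, 10 fraudulent) and the helper name accuracy_and_recall are hypothetical, chosen only to show why a classifier that ignores the rare class can still report a high accuracy on an imbalanced dataset.

```python
from typing import List, Tuple


def accuracy_and_recall(y_true: List[int], y_pred: List[int]) -> Tuple[float, float]:
    """Return (accuracy, recall of the positive class) for binary labels.

    Hypothetical helper for illustration; label 1 means "fraud".
    """
    pairs = list(zip(y_true, y_pred))
    correct = sum(1 for t, p in pairs if t == p)
    true_positives = sum(1 for t, p in pairs if t == 1 and p == 1)
    actual_positives = sum(y_true)
    accuracy = correct / len(y_true)
    # Guard against a dataset that contains no positive examples.
    recall = true_positives / actual_positives if actual_positives else 0.0
    return accuracy, recall


# Assumed toy dataset: 990 legitimate (0) and 10 fraudulent (1) transactions.
y_true = [0] * 990 + [1] * 10

# A trivial baseline that labels every transaction as legitimate.
y_pred = [0] * len(y_true)

baseline_accuracy, baseline_recall = accuracy_and_recall(y_true, y_pred)
print(f"accuracy={baseline_accuracy:.2f}, fraud recall={baseline_recall:.2f}")
```

Under these assumed counts the baseline is correct on every legitimate transaction and misses every fraudulent one, which is why a comparison of classifiers on imbalanced data is often read alongside a per-class measure such as recall.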