dc.contributor.author |
Karim, Rafat |
|
dc.contributor.author |
Mahmud, Md. Rifat |
|
dc.contributor.author |
Maksuda |
|
dc.contributor.author |
Jannatus Saiyem, MD. |
|
dc.date.accessioned |
2020-10-12T08:12:09Z |
|
dc.date.available |
2020-10-12T08:12:09Z |
|
dc.date.issued |
2019-12-10 |
|
dc.identifier.uri |
http://dspace.daffodilvarsity.edu.bd:8080/handle/123456789/4661 |
|
dc.description |
Data is considered to be the oil or fuel for the next generation. Data mining is one of
the most widely used methods to extract hidden information from large datasets. The
main goal of mining is knowledge discovery from databases, which is known as KDD.
Mining and discovery are quite similar in the domain of data mining: we discover
knowledge by mining the databases.
Now the question is how to learn from the dataset. The answer is that there are several
classification algorithms in the data mining field, such as the Support Vector Machine
algorithm, the Decision tree algorithm, the Artificial Neural Network algorithm and the
Adaboost algorithm. We train these algorithms on the dataset, and they then
classify the data for us. Depending on the scenario, these algorithms'
accuracy can differ. Most of the time this problem occurs for binary-class datasets,
but it can also occur for multi-class datasets. Another important distinction is between
supervised and unsupervised learning. In supervised learning the class labels are known,
and in unsupervised learning the class labels are unknown. Our work falls under
supervised learning, because the algorithms we have used are best suited to
supervised learning. |
en_US |
dc.description.abstract |
Because of the expansion of social media and globalization nowadays, petabyte-scale
data is being generated every second. Data mining is the process of
extracting knowledge from this huge amount of data. Data mining applications are
becoming more useful and a key prerequisite for any kind of business scenario.
However, for certain applications in supervised learning, a lack of sufficient data for
certain classes creates a data imbalance problem. For example, in a credit card fraud
detection application, most of the transactions are not fraudulent and only a few of them
are. In our research, we have applied several classification techniques to an
imbalanced dataset. We have tested synthetic data from a financial payment system
because it is a great challenge to obtain a real dataset. Synthetic data is artificially
constructed data that mimics real-world events. We have tested the Decision tree, Support
Vector Machine, Artificial Neural Network and Adaboost algorithms to deal with the
class imbalance problem. Among these algorithms, we find promising accuracy from
Adaboost compared to the others. In this paper, our main aim is to determine which
classification algorithm performs better on an imbalanced dataset. |
en_US |
dc.language.iso |
en |
en_US |
dc.publisher |
Daffodil International University |
en_US |
dc.subject |
Data processing |
en_US |
dc.subject |
Data Mining |
en_US |
dc.subject |
Making Change (Money Transaction) |
en_US |
dc.title |
Imbalance Data Classification to Identify Fraudulent Transactions |
en_US |
dc.type |
Other |
en_US |