Abstract:
Drug addiction is the inability to refrain from using a legal or illegal drug or other
substance, or from engaging in an activity, despite harmful consequences. It can lead to a
wide range of complications that harm personal relationships, professional goals, and overall
health. It is one of the deadliest problems for a country like Bangladesh, which has a
large population of young people. We therefore need to watch over the young generation of
our country before they become addicted to drugs, and we must take effective steps to
prevent drug addiction. In this paper, we predict an individual's risk of drug addiction
using machine learning classification algorithms. First, we studied
related journal articles and papers and then talked to doctors, counselors, and drug-addicted
people. As a result, we identified some primary risk factors for drug addiction. We then
obtained a dataset from Kaggle on the risk of drug addiction, but it did not contain enough
data for the study. We therefore created a questionnaire based on the features of the
Kaggle dataset. We collected data from a couple of drug rehabilitation centers in Dhaka,
Bangladesh, including FERA Rehabilitation Center and AMI Addiction Management Institute.
We also collected data from a few colleges and universities. Our dataset includes some
notable features such as age, gender, various psychological problems, lack of family ties,
satisfaction with work or education, living with drug users, the influence of friends, and
staying at a friend's house at night. The dataset contains both addicted and non-addicted
samples. The target variable has two classes: 'Yes' (addicted) and 'No' (non-addicted).
After collecting the data, we preprocessed it, applied six machine learning algorithms to
the processed dataset, and compared their results. The algorithms we used are Logistic
Regression, Decision Tree, Random Forest, Naive Bayes, Support Vector Machine (SVM),
and k-Nearest Neighbor (kNN). Among these algorithms, Naive Bayes achieved the highest
accuracy, 90.9%, and Decision Tree the lowest, 77.68%. Moreover, using the chi-square
feature selection technique, we identified the most influential factors associated with
drug addiction.
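
As an illustration of the chi-square feature selection mentioned above, the following minimal sketch scores categorical features against a binary 'Yes'/'No' label using the Pearson chi-square statistic of the contingency table. It is an assumption-laden example, not the implementation used in this study: the function name chi_square_score, the feature names, and the eight toy records are all hypothetical, and the exact chi-square variant applied to the real dataset is not specified here.

```python
from collections import Counter


def chi_square_score(feature, labels):
    """Pearson chi-square statistic for one categorical feature vs. the class label."""
    n = len(labels)
    observed = Counter(zip(feature, labels))   # joint counts per (value, class) cell
    feature_totals = Counter(feature)          # row totals
    label_totals = Counter(labels)             # column totals
    score = 0.0
    for value, value_count in feature_totals.items():
        for cls, cls_count in label_totals.items():
            # Expected count if the feature and the label were independent
            expected = value_count * cls_count / n
            score += (observed[(value, cls)] - expected) ** 2 / expected
    return score


# Hypothetical toy records, invented for illustration only (not the study's data)
labels = ["Yes", "Yes", "Yes", "Yes", "No", "No", "No", "No"]
features = {
    "lives_with_drug_users": ["yes", "yes", "yes", "no", "no", "no", "no", "yes"],
    "gender": ["m", "f", "m", "f", "m", "f", "m", "f"],
}

# Rank features from most to least associated with the label
ranking = sorted(
    ((chi_square_score(values, labels), name) for name, values in features.items()),
    reverse=True,
)
for score, name in ranking:
    print(f"{name}: chi2 = {score:.2f}")
```

On this toy input the hypothetical feature lives_with_drug_users scores 2.00 and gender scores 0.00, so the former ranks first. A higher score indicates a stronger statistical association with the label, not a causal effect.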