Abstract:
This paper predicts depression in individuals by analyzing demographic, social, and economic variables that reflect both their everyday lives and their economic circumstances. The dataset includes sex, age, marital status, family size, education level, asset status, source of income, spending, and investment behavior; these variables were chosen because they serve as indirect indicators of emotional and psychological stress. Before analysis, the data were preprocessed by imputing missing values, encoding categorical variables, and scaling numerical variables so that all features were on a comparable scale. Fourteen machine learning (ML) models were trained on both the original dataset and a SMOTE-balanced version, so that performance could be compared under both imbalanced and balanced class distributions. Each model was evaluated using accuracy, precision, recall, F1-score, and AUC-ROC. Random Forest, XGBoost, LightGBM, Stacking, and Voting Classifier performed best, achieving the highest accuracy (0.9755) on the original dataset along with high precision and F1-scores. Ensemble-based models also performed well on the SMOTE-balanced dataset, with LightGBM and Random Forest exceeding 0.97 accuracy. These results indicate that ensemble learning applied to structured socioeconomic and lifestyle data predicts depression effectively. Overall, the findings suggest that ML can support early detection of depression risk, particularly in settings where psychological assessment resources are scarce.
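The abstract does not say how oversampling or scoring were implemented. As an illustration only, the dependency-free Python sketch below shows the standard SMOTE idea (synthetic minority samples interpolated between a minority sample and one of its nearest minority-class neighbours) together with the evaluation metrics listed above. The helper names `smote`, `binary_metrics`, and `roc_auc` are hypothetical and are not the authors' code; a real pipeline would more typically rely on a library implementation such as `imblearn.over_sampling.SMOTE` and the functions in `sklearn.metrics`.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority-class samples (SMOTE-style sketch).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [minority[j] for j in range(len(minority)) if j != i]
        # k nearest neighbours of `base` by Euclidean distance
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for 0/1 labels (1 = depressed)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, scores):
    """AUC-ROC as the probability that a random positive is scored above a
    random negative (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    minority = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.2]]
    print(len(smote(minority, n_new=4, k=2)))  # 4
    print(binary_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))
    print(roc_auc([1, 0, 1, 0], [0.9, 0.4, 0.35, 0.1]))  # 0.75
```

SMOTE is normally fitted on the training split only, so that synthetic points derived from held-out samples cannot inflate the reported scores. The AUC-ROC helper uses the rank (Mann-Whitney) interpretation, which requires predicted scores or probabilities rather than hard class labels.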