Abstract:
Liver diseases are a serious global health problem influenced by a range of genetic, environmental, and lifestyle risk factors. Early detection is crucial to avoiding severe health consequences. This article proposes a machine learning-based approach that predicts liver disease and identifies its key risk factors using SHAP (SHapley Additive exPlanations) values. Using a large dataset, we train several machine learning algorithms, namely Random Forest, K-Nearest Neighbors, Support Vector Machine, Logistic Regression, Naive Bayes, Decision Tree, and XGBoost, and compare their predictive performance. The pipeline comprises rigorous data preprocessing, feature selection, and model evaluation using precision, recall, F1-score, and accuracy. XGBoost performed best, with an average ROC-AUC of 99.9%, closely followed by Random Forest with a ROC-AUC of 99.3%. SHAP values are used to interpret each feature's contribution to the predictions and to reveal the most important risk factors. Our findings suggest that incorporating SHAP values considerably improves the model's interpretability and performance. The objective of this study is to give clinicians a reliable tool for the early identification of at-risk individuals, so that timely and personalized treatment can be administered to alleviate the symptoms of liver disease.
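
To make the notion of a SHAP value concrete, the following is a minimal sketch of an exact Shapley value computation for a single prediction. It is an illustration only, not the implementation used in this study: the model `toy_model`, its three features (bilirubin, ALT, age), the coefficients, and the sample and baseline values are all hypothetical. Features outside a coalition are replaced by their baseline value, which is one common convention and is assumed here.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for one sample `x` relative to `baseline`.

    Features not in a coalition are set to their baseline value.
    Cost grows as 2^n, so this is only practical for a few features.
    """
    n = len(x)
    features = range(n)

    def value(coalition):
        # Evaluate the model with only the coalition's features "switched on".
        point = [x[i] if i in coalition else baseline[i] for i in features]
        return model(point)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(p):
    """Hypothetical risk score; names and coefficients are made up."""
    bilirubin, alt, age = p
    return 0.5 * bilirubin + 0.02 * alt + 0.1 * bilirubin * (age > 50)


x = [3.0, 80.0, 62.0]          # hypothetical patient
baseline = [1.0, 30.0, 45.0]   # hypothetical reference point
phi = shapley_values(toy_model, x, baseline)

for name, contribution in zip(["bilirubin", "alt", "age"], phi):
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the prediction for `x` and the prediction for the baseline, which is the additivity property that makes SHAP values readable as per-feature explanations. Exhaustive enumeration is exponential in the number of features, so practical SHAP tooling relies on model-specific or sampling-based algorithms instead.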