Abstract:
Hormone-Binding Proteins (HBPs) play a critical role in regulating the transport, stability, and delivery of hormones. Despite the availability of existing computational tools, accurate HBP prediction remains difficult because of redundant datasets, limited feature diversity, and poor model generalization. In this paper, we introduce an ensemble learning framework, optimized using Differential Evolution (DE), that identifies HBPs directly from protein sequences with high accuracy. We evaluated the model on a comprehensive benchmark dataset. We examined six feature-encoding schemes, of which Dipeptide Composition (DPC) proved the most discriminative. We then combined seven machine learning classifiers in a weighted ensemble, using DE to determine the optimal weights. On a test set, the final model achieved an accuracy of 98.00%, a sensitivity of 100%, and a Matthews correlation coefficient (MCC) of 0.9608. The framework offers a low-cost, highly accurate computational approach that is well suited to large-scale proteomic analysis and the identification of hormone-related biomarkers.
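As a minimal illustration of two components named above, the following Python sketch computes a DPC feature vector (the normalized frequencies of the 400 possible dipeptides) and searches for soft-voting weights with a basic DE/rand/1/bin loop. It is a sketch under stated assumptions, not the implementation used in this work: the function names, the three stand-in classifiers with toy probabilities, the DE settings (`F`, `CR`, population size, generations), and the use of accuracy as the fitness function are all hypothetical choices made for illustration.

```python
import itertools
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in itertools.product(AMINO_ACIDS, repeat=2)]
DIPEPTIDE_INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}


def dpc(sequence):
    """Return the 400-dimensional dipeptide composition of a protein sequence."""
    counts = [0] * len(DIPEPTIDES)
    total = 0
    for a, b in zip(sequence, sequence[1:]):
        idx = DIPEPTIDE_INDEX.get(a + b)
        if idx is not None:  # assumption: pairs with non-standard residues are skipped
            counts[idx] += 1
            total += 1
    # For a sequence of L standard residues, total equals L - 1.
    return [c / total for c in counts] if total else [0.0] * len(DIPEPTIDES)


def normalize(raw):
    """Clip a raw vector at zero and rescale it so the weights sum to 1."""
    clipped = [max(x, 0.0) for x in raw]
    s = sum(clipped)
    return [x / s for x in clipped] if s > 0 else [1.0 / len(raw)] * len(raw)


def ensemble_accuracy(weights, probs, labels):
    """Accuracy of a weighted soft vote; probs[k][i] is classifier k's P(HBP) for sample i."""
    correct = 0
    for i, y in enumerate(labels):
        p = sum(w * probs[k][i] for k, w in enumerate(weights))
        correct += int((p >= 0.5) == bool(y))
    return correct / len(labels)


def de_weights(probs, labels, pop_size=20, generations=50, F=0.5, CR=0.9, seed=0):
    """Search ensemble weights with a plain DE/rand/1/bin loop (illustrative settings)."""
    rng = random.Random(seed)
    n = len(probs)

    def fitness(vec):
        return ensemble_accuracy(normalize(vec), probs, labels)

    pop = [[rng.random() for _ in range(n)] for _ in range(pop_size)]
    scores = [fitness(v) for v in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: combine three distinct individuals other than i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(n)  # guarantees at least one mutated coordinate
            trial = [
                pop[a][d] + F * (pop[b][d] - pop[c][d])
                if (rng.random() < CR or d == j_rand)
                else pop[i][d]
                for d in range(n)
            ]
            # Selection: keep the trial if it is at least as good as the parent.
            s = fitness(trial)
            if s >= scores[i]:
                pop[i], scores[i] = trial, s
    best = max(range(pop_size), key=scores.__getitem__)
    return normalize(pop[best]), scores[best]


# Toy validation data: 1 = HBP, 0 = non-HBP; three hypothetical classifiers.
labels = [1, 1, 1, 0, 0, 0]
probs = [
    [0.9, 0.8, 0.4, 0.3, 0.2, 0.6],
    [0.6, 0.7, 0.9, 0.4, 0.1, 0.2],
    [0.3, 0.6, 0.7, 0.7, 0.4, 0.1],
]
weights, score = de_weights(probs, labels)
print("weights:", [round(w, 3) for w in weights], "accuracy:", score)
```

In practice the weights would be fitted on validation predictions rather than on the test set, and a fitness such as MCC could replace accuracy if class balance is a concern.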