Abstract:
This work presents a data-intensive framework for assessing seedling growth in different soil media by predicting plant performance from key environmental conditions. Crop yields are often limited by poor use of data and by conventional, trial-and-error soil selection. To address this problem, growth measurements were recorded for three plant varieties, Lal Shak (Red Amaranth), Dhoniya (Coriander), and Lau (Bottle Gourd), cultivated in several soil media. Environmental factors, namely temperature, humidity, luminosity, soil moisture, and NPK (nitrogen, phosphorus, potassium) nutrient levels, were recorded with digital sensors. Three machine learning models, Linear Regression, Random Forest, and XGBoost, were trained to predict plant growth. XGBoost achieved the highest predictive accuracy (R² = 0.91). However, Random Forest (R² = 0.89) was selected as the optimal model for this study because it offered a strong balance of high accuracy, better interpretability, and robustness against overfitting, which makes it a more practical and reliable tool for generating actionable insights. The Linear Regression baseline performed worse because it could not capture complex, nonlinear interactions among the variables. Nutrient levels, light, and soil moisture emerged as the most important factors, and the optimal growth medium differed among the varieties. These results show how predictive models can support data-driven choices of soil and growing conditions and thereby improve agricultural performance. The contribution of this work is a proof-of-concept framework that has the potential to scale and is practical for smallholder farmers and agricultural researchers seeking to optimize seedling cultivation through data analytics rather than manual observation alone.
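The R² values quoted for the three models are assumed here to follow the standard definition of the coefficient of determination, R² = 1 − SS_res / SS_tot. The abstract does not state how the evaluation data were split. The following minimal standard-library sketch shows how that metric is computed. The function name `r_squared` and the seedling-height values are hypothetical illustrations, not code or data from this study.

```python
from statistics import fmean


def r_squared(y_true: list[float], y_pred: list[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot (standard definition, assumed)."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_y = fmean(y_true)
    # Residual sum of squares: error left over after the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical seedling heights in cm (illustrative only, not study data).
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]
print(round(r_squared(observed, predicted), 2))  # 0.95
```

An R² of 1.0 means the predictions match the observations exactly. An R² of 0.0 means the model does no better than always predicting the mean, so values such as 0.91 and 0.89 indicate that most of the variance in growth is explained.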