Abstract:
This thesis develops and evaluates an end-to-end machine learning pipeline for forecasting next-day realized volatility and volatility spikes in six major cryptocurrencies (ADA, BTC, DOGE, ETC, ETH and LINK) using hourly OHLCV data aggregated to daily frequency. The pipeline performs data cleaning, feature engineering, temporal splitting, model training and diagnostic reporting. Its feature set includes HAR-style lagged realized volatility, rolling statistics, session-specific volatility, market-relative indicators and cyclical time encodings. Regression models (HAR baseline, Ridge, Elastic Net, Random Forest, Gradient Boosting and Histogram-based Gradient Boosting) are trained to predict next-day realized volatility, while parallel classification models (Logistic Regression, Random Forest, Gradient Boosting and Histogram-based Gradient Boosting) identify extreme volatility spikes defined by rolling 90th-percentile thresholds. Models are estimated on a strictly chronological split with training up to 2022, validation in 2023 and out-of-sample testing from 2024 onward, with optional walk-forward cross-validation to assess temporal robustness. The best regression model, HistGradientBoosting, attains a test RMSE of 0.00382 and an R² of 0.093, while the best classifier, HistGradientBoostingClassifier, achieves an F1 score of 0.38 and a ROC-AUC of 0.75, with a substantially higher F1 score (0.60) in high-volatility regimes. These forecasts are embedded in an inverse-volatility trading strategy overlaid with a moving-average directional signal, which produces a 7.98% cumulative return, an annualized Sharpe ratio of 0.52 and a maximum drawdown of 10.9% over the test period. Per-coin and per-regime diagnostics reveal heterogeneous predictability, with stronger performance for smaller coins such as DOGE and ADA and during turbulent market conditions.
Overall, the thesis contributes a transparent, reusable multi-asset pipeline that links machine learning-based cryptocurrency volatility forecasts to economically interpretable, risk-adjusted trading performance.
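
As an illustration only, the following sketch shows one way the HAR-style features, the rolling 90th-percentile spike label and the chronological split described above could be assembled for a single coin. It is not the thesis code: the function names, the 7-day and 30-day averaging windows, the 60-day threshold window and the synthetic volatility series are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def har_features(rv: pd.Series) -> pd.DataFrame:
    """HAR-style predictors: daily value plus weekly and monthly averages.

    The 7- and 30-day windows are assumed here (crypto trades every day).
    """
    return pd.DataFrame({
        "rv_d": rv,
        "rv_w": rv.rolling(7).mean(),
        "rv_m": rv.rolling(30).mean(),
    })


def spike_labels(rv: pd.Series, window: int = 60, q: float = 0.90) -> pd.Series:
    """1.0 if RV exceeds the rolling q-quantile of the previous `window` days.

    shift(1) keeps the current day out of its own threshold; days without a
    full history are left as NaN. The window length is an assumption.
    """
    threshold = rv.rolling(window).quantile(q).shift(1)
    return (rv > threshold).astype(float).where(threshold.notna())


def chronological_split(df: pd.DataFrame):
    """Train through 2022, validate on 2023, test from 2024 onward."""
    train = df.loc[:"2022-12-31"]
    val = df.loc["2023-01-01":"2023-12-31"]
    test = df.loc["2024-01-01":]
    return train, val, test


# Synthetic daily realized volatility, used only to exercise the functions.
dates = pd.date_range("2021-01-01", "2024-06-30", freq="D")
rng = np.random.default_rng(0)
rv = pd.Series(np.abs(rng.normal(0.02, 0.01, len(dates))), index=dates)

data = har_features(rv)
data["rv_next"] = rv.shift(-1)                   # regression target
data["spike_next"] = spike_labels(rv).shift(-1)  # classification target
data = data.dropna()

train, val, test = chronological_split(data)
```

Both targets are shifted by one day so that each row pairs information available at the close of day t with the outcome on day t+1, and the split never shuffles rows across the year boundaries.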