Smart Meter Forecasting with Prophet Algorithm and PySpark MLlib Pipeline
Project Overview
This project builds a scalable, accurate time series forecasting pipeline that integrates Apache Spark and Facebook Prophet to forecast the total energy cost recorded by Smart KWH hardware sensors.
Apache Spark's distributed computing lets the project process large volumes of sensor data efficiently. Custom Spark MLlib pipelines handle data cleaning, outlier removal, and formatting to prepare the dataset for forecasting.
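This README does not describe the outlier rule the pipeline uses. The following is a minimal, hypothetical sketch of one common choice, interquartile range (IQR) filtering, written in plain Python so that it runs on its own. The function names and the 1.5 multiplier are assumptions for illustration. In Spark, the quartiles could instead be estimated with DataFrame.approxQuantile before filtering.

```python
# Hypothetical illustration of IQR-based outlier removal for one sensor's readings.
# This is NOT the project's Spark MLlib cleaning code; names and k=1.5 are assumptions.
import statistics


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles (needs >= 2 values)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outliers(values, k=1.5):
    """Keep only readings that fall inside the IQR fences, preserving order."""
    lower, upper = iqr_bounds(values, k)
    return [v for v in values if lower <= v <= upper]


readings = [10, 11, 12, 11, 10, 12, 11, 95]  # made-up cost readings with one spike
print(remove_outliers(readings))
```

Here the spike of 95 lies far above the upper fence and is dropped, while the remaining readings are kept in their original order.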
Facebook Prophet, a time series forecasting model, is then fitted separately to each sensor (DSN) to predict its future energy cost. Prophet models trend changes, seasonality, and holiday effects, which makes it suitable for complex, non-stationary data.
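Prophet expects each training series as a table with a timestamp column named ds and a value column named y. The sketch below splits raw (DSN, timestamp, cost) records into one sorted ds/y series per sensor, which is the shape needed before fitting one model per DSN. The record layout, function name, and sensor IDs are hypothetical. In the real pipeline, each series would presumably become a pandas DataFrame passed to its own Prophet().fit(...), for example via Spark's groupBy(...).applyInPandas(...), though the README does not confirm that mechanism.

```python
# Hypothetical sketch: group (dsn, timestamp, cost) records into one series per sensor,
# shaped like Prophet's ds/y columns. Record layout and sensor IDs are assumptions.
from collections import defaultdict
from datetime import datetime


def split_by_sensor(records):
    """Return {dsn: [{"ds": timestamp, "y": cost}, ...]} with each list sorted by ds."""
    series = defaultdict(list)
    for dsn, ds, y in records:
        series[dsn].append({"ds": ds, "y": y})
    for rows in series.values():
        rows.sort(key=lambda row: row["ds"])  # Prophet works on time-ordered data
    return dict(series)


records = [
    ("sensor_b", datetime(2024, 1, 2), 7.0),
    ("sensor_a", datetime(2024, 1, 2), 5.5),
    ("sensor_a", datetime(2024, 1, 1), 5.0),
]
for dsn, rows in split_by_sensor(records).items():
    print(dsn, [row["y"] for row in rows])
```

Keeping one independent series per DSN is what allows each sensor to get its own trend and seasonality fit, rather than one model averaging over sensors with different consumption patterns.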
Each model is evaluated with the mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE), and mean absolute percentage error (MAPE), and visualizations compare the predicted values against actual historical data. Among all sensors, the model for SMART KWH SBY SN3 performed best, with the lowest MAPE of 5.34%.
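The four metrics follow their standard definitions. The minimal sketch below computes them from aligned lists of actual and predicted values. It is an illustration and not the project's evaluation code, and the function name is hypothetical.

```python
# Minimal sketch of MAE, MSE, RMSE and MAPE using their standard definitions.
# Not the project's own evaluation code; forecast_metrics is a hypothetical name.
import math


def forecast_metrics(actual, predicted):
    """Return MAE, MSE, RMSE and MAPE (in percent) for aligned sequences."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equal in length")
    if any(a == 0 for a in actual):
        # MAPE divides by each actual value, so it is undefined when one is zero.
        raise ValueError("MAPE is undefined when an actual value is 0")
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape}


print(forecast_metrics([100, 200, 400], [110, 190, 400]))
```

For actual values [100, 200, 400] and forecasts [110, 190, 400], the per-point percentage errors are 10%, 5%, and 0%, so the MAPE is 5%. By the same definition, a MAPE of 5.34% means the forecasts were off by 5.34% of the actual cost on average.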
By combining Apache Spark's large-scale data processing with Prophet's forecasting capabilities, the project demonstrates an efficient and reliable approach to predictive analytics in the energy domain. This approach can help organizations optimize energy management, reduce operational costs, and make data-driven decisions.
Features
