MLFGCN: short-term residential load forecasting via graph attention temporal convolution network

1 Introduction

With the development of society, human demand for electricity is constantly increasing, among which residential electricity consumption is increasing rapidly. According to the World Energy Outlook 2023 (IEA, 2023), residential electricity consumption accounts for 23% of the world’s total annual electricity consumption, with a growth rate faster than any other energy consumption, and is expected to exceed 45% by 2050. The growing residential load is becoming increasingly important to maintain a balance between electricity supply and demand (Afzalan and Jazizadeh, 2019). In the electricity market, residential load forecasting is crucial for decision-makers to carry out activities such as electricity planning, pricing, power quality assessment and customer behavior analysis (Heydari et al., 2020; Rafati et al., 2020).

With the introduction of the energy Internet concept, smart solutions such as smart cities and smart grids are being promoted continuously. The deployment of new energy equipment, various flexible loads and new energy vehicle charging piles has made it increasingly difficult to maintain a balance between supply and demand in the power grid. Accurate residential load forecasting is an effective way to address this problem. Residential load forecasting explores the changing patterns of residential electricity demand and forecasts the load values for a certain period in the future, which is crucial for the stable operation of the power system. It has been researched for decades and involves various aspects. Among them, short-term residential load forecasting is the key to analyzing user-side demand and provides an important guarantee for daily power generation planning and the safe operation of the power grid. However, compared with grid-level load, residential electricity consumption has higher uncertainty. As shown in Figure 1, the grid-level load curve is relatively smooth and strongly regular, which makes forecasting easier. In contrast, at the user level, the load curve of a single house is highly volatile because of differences in users’ lifestyle habits. This uncertainty and randomness make accurate short-term residential load forecasting more challenging (Yang et al., 2022; Yamasaki et al., 2024; Tan et al., 2023), which is the focus of this study.

Figure 1. The difference between grid-level and user-level load.

Short-term residential load forecasting has been studied for decades as a category of time series forecasting. However, existing time series forecasting methods use only temporal features for prediction and cannot fully exploit the valuable information in the data. Recent studies have found that there is a certain correlation between different residential load series, which can be utilized to improve the accuracy of load forecasting. With the development of graph neural networks (GNNs), spatiotemporal load forecasting methods based on GNNs have attracted much attention. Although prediction accuracy has been significantly improved by introducing GNN-based models, some shortcomings remain. Firstly, existing prediction algorithms mainly focus on improving the model structure to extract the spatiotemporal features of load data more effectively, while ignoring the construction of input features. In load forecasting, well-constructed input features enable models to capture potential multi-level dependencies in load data more effectively. Secondly, there is a lack of effective graph construction that includes comprehensive, multi-perspective information when learning spatial features.

To address the aforementioned issues, we propose a novel multi-level feature fusion model based on graph attention temporal convolutional network (MLFGCN) for short-term residential load forecasting. The proposed MLFGCN reconstructs the input load data to better capture the periodic, temporal and spatial dependencies. Additionally, we design two types of adjacency matrices to construct a multi-level information graph, which enables the forecasting model to capture features of load data more comprehensively. The main contributions of this paper are as follows:

• We design a feature reconstruction mechanism for the input load series that considers the temporal correlations and periodic characteristics of the load data. The mechanism yields a high-quality feature matrix, which effectively improves the learning ability of the model. For the input data, two types of adjacency matrices are designed to learn the potential multi-level dependencies in load series. In contrast to the traditional Euclidean-based adjacency matrix, we introduce the fast dynamic time warping (fastDTW) algorithm to generate the similarity adjacency matrices of individual houses and multiple houses, respectively.

• A novel multi-level feature fusion model based on a graph attention temporal convolutional network is proposed for short-term residential load forecasting. A TCN with a gating mechanism is introduced to learn potential long-term dependencies in the original load data. Two graph attention convolutional modules are then designed to capture potential multi-level dependencies. Finally, the outputs of each module are fused through an information fusion layer to obtain highly accurate forecasting results.

• We conduct validation experiments on two real-world datasets, which demonstrate that our proposed model consistently outperforms the baselines.

The remainder of the paper is organized as follows. Section 2 provides a discussion of related work. In Section 3, we present the framework of our proposed method in detail. The experimental setup and analysis are described in Section 4. Finally, Section 5 provides the conclusion.

2 Related work

Residential load forecasting is more challenging due to its high randomness and volatility, which is very different from grid-level load forecasting (Zheng et al., 2019). In recent years, many residential load forecasting methods have been proposed, which can be divided from different perspectives (as shown in Figure 2). From the perspective of forecasting time scale, residential load forecasting can be divided into ultra short-term forecasting, short-term forecasting, medium-term forecasting and long-term forecasting. From the perspective of modeling method, residential load forecasting can be divided into statistical models and artificial intelligence (AI) forecasting models. From the perspective of spatiotemporal correlations, residential load forecasting methods can be divided into time series forecasting based methods and spatiotemporal forecasting based methods. In this study, we mainly focus on short-term residential load forecasting based on spatiotemporal forecasting methods.

Figure 2. Classification of residential load forecasting methods.

2.1 Time series forecasting

Short-term residential load forecasting has been studied as a time series forecasting problem for many years. Traditional forecasting methods include statistical methods such as exponential smoothing, auto-regressive moving average (ARMA) (Moon et al., 2021), auto-regressive integrated moving average (ARIMA) (Mahia et al., 2019), and the gray model. These models are relatively simple, but their accuracy on nonlinear prediction tasks is limited. In recent years, machine learning methods have shown their superiority in capturing temporal correlations as well as strong generalization ability. Introducing machine learning into the load forecasting field can greatly improve forecasting accuracy (Singh and Mohapatra, 2021; Xia et al., 2023). Shi et al. (2017) proposed a novel pooling-based deep recurrent neural network (PDRNN) for household load forecasting, which addresses the over-fitting problem by increasing the diversity and volume of data. This work made the first attempt to explore the feasibility of deep learning for individual load forecasting and achieved good prediction results. The experimental results showed that the proposed method outperforms SVR by 13.1%, ARIMA by 19.5% and a classical deep RNN by 6.5% in terms of RMSE. Chen et al. (2022) proposed a new multi-cycle self-augmented neural network (MultiCycleNet) for household short-term load forecasting. MultiCycleNet learns a user’s electricity consumption mode by considering the circular correlation in the load profiles to obtain more accurate forecasting results. The work is the first to use relevant load series, together with contextual information from historical data, for feature learning of household electricity consumption patterns. Experiments on two publicly available datasets show that the proposed framework outperforms the baselines by 11.14, 9.02, 19.83 and 10.46% in terms of MAE, MAPE, MSE, and RMSE, respectively.
In recent studies, Transformer-based time series forecasting models have also been introduced into short-term load forecasting research. Ran et al. (2023) proposed a hybrid model incorporating decomposition techniques and a Transformer for short-term load forecasting. The proposed model uses mode decomposition techniques to decompose the load data into multiple subseries. The sample entropy of each subseries is then calculated, and the subseries are recombined based on the principle of combining similar values. The recombined subseries are input into the Transformer model to obtain the final prediction results. Although methods based on time series forecasting have greatly improved the accuracy of short-term residential load forecasting, they mainly focus on temporal correlation (e.g., historical load and weather information) and do not fully consider the spatial correlation of load series.

2.2 Spatiotemporal load forecasting

Recently, some researchers found that the load distributions of different houses also have a high spatial correlation, so the concept of spatial dependence was introduced into short-term load forecasting (Yin and Xie, 2021; Liu and Chen, 2021; Jalali et al., 2021). In Tascikaraoglu and Sanandaji (2016), the potential spatial correlation between the electricity load of the target house and that of surrounding houses was mined and used to improve the accuracy of load forecasting. Sajjad et al. (2020) proposed a hybrid residential load forecasting model combining a convolutional neural network (CNN) and gated recurrent units (GRU). In the proposed CNN-GRU model, the CNN is introduced to extract the spatial features of the input load data, and its output is fed into the GRU to obtain the final forecasting results. Although CNN is an effective model for extracting spatial features, it cannot handle non-Euclidean structured data. Users in similar geographical locations may have similar electricity consumption patterns due to similar external environments and holiday effects. Furthermore, users who are geographically far apart but have similar living habits may also have similar electricity consumption patterns. Therefore, methods based on non-Euclidean distance are more suitable for learning the spatial dependencies in load sequences. Recently, GNNs have attracted much attention due to their powerful capabilities in modeling and feature extraction of non-Euclidean structured data. Spatiotemporal forecasting models based on GNNs have been successfully applied in load forecasting (Wang et al., 2022; Wang et al., 2022; Feng et al., 2022). Lin et al. (2021) proposed a spatial–temporal short-term load forecasting model based on a graph convolutional network (GCN). The proposed model adopted the self-adaptive Graph WaveNet framework, which builds on WaveNet, originally designed for audio generation (Oord et al., 2016).
In this model, spatial correlations in the load series are captured by a GCN with a self-adaptive adjacency matrix, while temporal correlations are learned by a TCN. This work is the first attempt to introduce GCN to capture spatial–temporal correlations in electric load. In Cheung et al. (2021), the spatial–temporal GCN (STGCN) method was adopted to capture the spatial and temporal correlations in load data for more accurate forecasting results. Experimental results on a dataset collected in Iowa showed that the proposed model exhibited significantly better performance in real load prediction than other baselines. Wei et al. (2023) proposed a novel spatial–temporal embedding GNN (STEGNN) for short-term load forecasting. The proposed model first constructs directed static graphs and directed dynamic graphs. Then, an exponential moving average and GCN are combined to capture the spatial and temporal correlations and obtain accurate load forecasting results. Table 1 summarizes and compares the relevant forecasting methods.

Table 1. Methods comparison between this study and related works.

3 Methodology

Based on the analysis of residential load data, this study proposes the MLFGCN model for short-term residential load forecasting. MLFGCN learns potential dependencies from historical load data to obtain high-accuracy future load values without any additional information.

3.1 Problem formulation

We can represent the residential network as a graph $G=(V,E,A)$, where $V=$ […] $\{h_1, h_2, \ldots, h_N\}$, the attention coefficients between two neighbor nodes $v_i$ and $v_j$ can be expressed as Equation 5:

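$$e_{ij} = \sigma\left(W h_i, W h_j\right), \quad j \in N_i \tag{5}$$

where $W$ is the weight matrix and $N_i$ is the set of neighbor nodes of node $v_i$. To make the attention coefficients easier to calculate and compare, we introduce the softmax function to normalize them, as written in Equation 6:

$$\alpha_{ij} = \mathrm{softmax}\left(e_{ij}\right) = \frac{\exp\left(e_{ij}\right)}{\sum_{k \in N_i} \exp\left(e_{ik}\right)} \tag{6}$$

Then, the features are weighted and summed using the attention coefficients:

$$h_i = \sigma\left(\sum_{j \in N_i} \alpha_{ij} W^{k} h_j\right) \tag{7}$$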
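The attention steps in Equations 5 to 7 can be sketched for a single head as follows. This is a minimal NumPy illustration, not the authors' implementation: the scoring function in Equation 5 is not spelled out in the text, so the sketch assumes the standard GAT form (a LeakyReLU applied to a learned vector `a` dotted with the concatenation of the two transformed node features) and assumes tanh for the outer activation. The names `attention_layer`, `leaky_relu`, `a`, and the toy shapes are all hypothetical.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def attention_layer(H, A, W, a):
    """Single-head graph attention (sketch).

    H: (N, F) node features; A: (N, N) adjacency, nonzero = neighbor
    (every node needs a self-loop so that its neighborhood is non-empty);
    W: (F, F_out) weight matrix; a: (2 * F_out,) scoring vector (assumed).
    """
    Z = H @ W                                   # W h_i for every node
    f_out = Z.shape[1]
    # Eq. 5 (assumed GAT form): e_ij = LeakyReLU(a^T [W h_i || W h_j])
    e = leaky_relu((Z @ a[:f_out])[:, None] + (Z @ a[f_out:])[None, :])
    # Eq. 6: softmax restricted to the neighbors j in N_i
    e = np.where(A > 0, e, -np.inf)
    e = e - e.max(axis=1, keepdims=True)        # numerical stability
    alpha = np.exp(e)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)
    # Eq. 7: attention-weighted sum of transformed neighbor features
    return np.tanh(alpha @ Z), alpha

# Toy example: 4 houses on a chain graph with self-loops
rng = np.random.default_rng(0)
H = rng.normal(size=(4, 3))
A = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 1]])
W = rng.normal(size=(3, 2))
a = rng.normal(size=4)
H_new, alpha = attention_layer(H, A, W, a)
```

Each row of `alpha` sums to one over that node's neighbors, and non-neighbors receive exactly zero weight, which is the property the softmax normalization is meant to provide.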

In order to stabilize the learning process of self attention, we use multi-head attention to obtain rich representations. Specifically, K independent attention mechanisms execute Equation 7 and then concatenate their features together to achieve the final results.

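$$\hat{h}_i = \Big\Vert_{k=1}^{K} \sigma\left(\sum_{j \in N_i} \alpha_{ij} W^{k} h_j\right) \tag{8}$$

In Equation 8, $\Vert$ represents concatenation. The output of GAT can be written as Equation 9:

$$Z^{l} = \tilde{A} Z^{l-1} W^{l} = W^{l} \hat{h} \tag{9}$$

where $Z^{0}=X$, $\tilde{A}=A_s$ for self-similarity feature learning and $\tilde{A}=A_c$ for cross-similarity feature learning. Here, we use MaxPooling to manipulate the connections of each hidden state. The output $F_s$ of the self-similarity feature learning module and the output $F_c$ of the cross-similarity feature learning module can be written as Equations 10 and 11, respectively:

$$F_s = \mathrm{softmax}\left(A_s\,\mathrm{ReLU}\left(A_s \hat{X} W_{s1}\right) W_{s2}\right) \tag{10}$$

$$F_c = \mathrm{softmax}\left(A_c\,\mathrm{ReLU}\left(A_c X W_{c1}\right) W_{c2}\right) \tag{11}$$

3.5 Temporal convolution module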

The TConv module is designed based on a gated TCN to capture long-term temporal dependencies in the load series. As shown in Figure 10, we design a gating mechanism to filter out weak connections and obtain optimized features. Compared with RNN-based neural networks, TCN reduces parameter complexity by using dilated causal convolutions. The receptive field of the TCN grows exponentially with the number of layers, which allows a large receptive field with only a few convolution operations. Let $X$ be the input; the output $F_a$ of the gated TCN can be expressed as Equation 12:

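$$F_a = \tanh\left(\mathrm{TCN}_a(X)\right) \odot \sigma\left(\mathrm{TCN}_b(X)\right) \tag{12}$$

where $\tanh$ and $\sigma$ are two different activation functions, $\mathrm{TCN}_a(\cdot)$ and $\mathrm{TCN}_b(\cdot)$ are two TCNs, and $\odot$ represents the element-wise product.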
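The gating in Equation 12 can be illustrated with a small NumPy sketch for a single load series. It assumes that each TCN is reduced to one causal dilated 1-D convolution and that σ is the logistic sigmoid; the function names, kernel values, and dilation are hypothetical and chosen only for illustration.

```python
import numpy as np

def causal_dilated_conv(x, kernel, dilation):
    """Output at step t uses only x[t], x[t - d], x[t - 2d], ... (no future samples)."""
    x = np.asarray(x, dtype=float)
    T, K = len(x), len(kernel)
    pad = (K - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])     # left padding keeps the filter causal
    out = np.zeros(T)
    for k in range(K):
        # kernel[k] multiplies the sample k * dilation steps in the past
        start = pad - k * dilation
        out += kernel[k] * xp[start:start + T]
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_tcn(x, kernel_a, kernel_b, dilation=1):
    """F_a = tanh(TCN_a(x)) * sigmoid(TCN_b(x)), with one conv layer per branch."""
    filt = np.tanh(causal_dilated_conv(x, kernel_a, dilation))
    gate = sigmoid(causal_dilated_conv(x, kernel_b, dilation))
    return filt * gate

x = np.arange(6, dtype=float)
summed = causal_dilated_conv(x, np.array([1.0, 1.0]), dilation=2)   # x[t] + x[t - 2]
gated = gated_tcn(x, np.array([0.5, -0.2]), np.array([0.3, 0.1]), dilation=2)
```

When layers with kernel size K are stacked with dilations 1, 2, 4, ..., 2^(L-1), the receptive field is 1 + (K - 1)(2^L - 1) time steps, which is the exponential growth with depth described above.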

Figure 10. The structure of the TConv module.

3.6 Information fusion

After the above calculation process, high-dimensional features from GAConv module and TConv module are obtained. Then, we effectively fuse these valuable features to improve the accuracy of load forecasting. We adopt addition for information aggregation to generate the final predictions. The specific calculation process can be written as Equation 13:

$$Y = \alpha F_a + \beta F_c + \gamma F_s \tag{13}$$

where $\alpha$, $\beta$ and $\gamma$ are learnable parameters.

Finally, we summarize the proposed MLFGCN as shown in Algorithm 2.

ALGORITHM 2 MLFGCN for short-term load forecasting

Input: The observed load data $X = (X_1, X_2, \cdots, X_N) \in \mathbb{R}^{N \times T}$

1. Generate the reconstructed input load data $\hat{X}$ from $X$;

2. Generate the self-similarity adjacency matrix $A_s$ and the cross-similarity adjacency matrix $A_c$ for the load graph $G$ through Algorithm 1;

3. Get the periodic feature $F_s$ by the GAConv module using the self-similarity adjacency matrix $A_s$, $F_s = \mathrm{softmax}\left(A_s\,\mathrm{ReLU}\left(A_s \hat{X} W_{s1}\right) W_{s2}\right)$;

4. Get the interdependent feature $F_c$ by the GAConv module using the cross-similarity adjacency matrix $A_c$, $F_c = \mathrm{softmax}\left(A_c\,\mathrm{ReLU}\left(A_c X W_{c1}\right) W_{c2}\right)$;

5. Get the temporal feature $F_a$ by the TConv module, $F_a = \tanh\left(\mathrm{TCN}_a(X)\right) \odot \sigma\left(\mathrm{TCN}_b(X)\right)$;

6. Get the output $Y$ by integrating $F_a$, $F_c$ and $F_s$, $Y = \alpha F_a + \beta F_c + \gamma F_s$;

7. Return the output;

8. Calculate the loss of MLFGCN
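Steps 3, 4 and 6 of Algorithm 2 can be sketched in NumPy as below. This is only an illustration under stated assumptions: the softmax is taken row-wise (per house), the adjacency matrices, the reconstructed input and the temporal feature are random or trivial stand-ins rather than the outputs of Algorithm 1 and the TConv module, and the fusion weights are fixed numbers instead of learned parameters. All function and variable names are hypothetical.

```python
import numpy as np

def row_softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)        # numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def gaconv_branch(A, X, W1, W2):
    """Two-layer graph convolution softmax(A ReLU(A X W1) W2), as in Eq. 10 and 11."""
    hidden = np.maximum(A @ X @ W1, 0.0)        # ReLU
    return row_softmax(A @ hidden @ W2)

def fuse(F_a, F_c, F_s, alpha, beta, gamma):
    """Weighted addition of the three feature maps, as in Eq. 13."""
    return alpha * F_a + beta * F_c + gamma * F_s

rng = np.random.default_rng(0)
N, T, hidden_dim, horizon = 5, 24, 8, 4         # houses, history length, sizes (illustrative)
X = rng.normal(size=(N, T))
X_hat = X                                       # stand-in for the reconstructed input
A_s = np.eye(N)                                 # stand-in self-similarity adjacency
A_c = np.full((N, N), 1.0 / N)                  # stand-in cross-similarity adjacency

F_s = gaconv_branch(A_s, X_hat, rng.normal(size=(T, hidden_dim)),
                    rng.normal(size=(hidden_dim, horizon)))
F_c = gaconv_branch(A_c, X, rng.normal(size=(T, hidden_dim)),
                    rng.normal(size=(hidden_dim, horizon)))
F_a = rng.normal(size=(N, horizon))             # stand-in for the TConv output
Y = fuse(F_a, F_c, F_s, alpha=1.0, beta=0.5, gamma=0.5)
```

The sketch mainly shows how the shapes line up: each branch maps an N × T input to an N × horizon feature map, so the three outputs can be added element-wise.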

3.7 Loss function of MLFGCN

The electric load data contain noise and outliers, which have a negative impact on the prediction results. To address this issue, we select the Huber loss as the loss function. The Huber loss is widely used in regression problems and combines the advantages of mean square error and mean absolute error. It is more robust when dealing with outliers and can effectively reduce their influence on the model. It can be written as Equation 14:

$$L\left(\hat{Y}, Y\right) = \begin{cases} \dfrac{1}{2}\left(\hat{Y}-Y\right)^{2}, & \left|\hat{Y}-Y\right| \le \delta \\[4pt] \delta\left|\hat{Y}-Y\right| - \dfrac{1}{2}\delta^{2}, & \left|\hat{Y}-Y\right| > \delta \end{cases} \tag{14}$$

where $\delta$ is a hyperparameter that controls the sensitivity of the loss, and $Y$ and $\hat{Y}$ are the real load values and the predictions, respectively.
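Equation 14 can be checked numerically with a short NumPy sketch. The element-wise loss follows the equation directly; averaging over all samples is an assumption of this sketch, and `huber_loss` is a hypothetical helper name.

```python
import numpy as np

def huber_loss(y_pred, y_true, delta=1.0):
    """Mean Huber loss: quadratic for small errors, linear for large ones."""
    err = np.abs(np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float))
    quadratic = 0.5 * err ** 2                  # branch for |error| <= delta
    linear = delta * err - 0.5 * delta ** 2     # branch for |error| > delta
    return float(np.mean(np.where(err <= delta, quadratic, linear)))

small = huber_loss([0.5], [0.0], delta=1.0)     # 0.5 * 0.5**2 = 0.125
large = huber_loss([3.0], [0.0], delta=1.0)     # 1.0 * 3.0 - 0.5 = 2.5
```

With δ = 1, an error of 3 contributes 2.5 instead of the 4.5 it would contribute under the halved squared error, which is how the loss limits the influence of outliers.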

4 Experiment and result analysis

4.1 Datasets

In this section, we validate the superiority of the proposed MLFGCN model on several real-world cases and analyze the experimental results.

Case 1: This experimental dataset is from OpenEI (National Renewable Energy Laboratory, 2014), which includes loads for all major types of residential and commercial buildings across all climate regions in the United States. The dataset is collected at 1-h resolution. We demonstrate the effectiveness of the algorithm by randomly selecting 15 houses in Los Angeles (LA).

Case 2: This experimental dataset is from a real power grid in the United States provided by Iowa State University (Bu et al., 2019). The power grid contains 240 nodes from three feeders, including 17 nodes in Feeder_A, 60 nodes in Feeder_B, and 163 nodes in Feeder_C. The data of each node are measurements from the users’ smart meters, collected at 1-h resolution.

Table 2 summarizes the characteristics of these datasets. We first preprocess the sample data and use z-score normalization to normalize the load data.

$$X_z = \frac{X - \mathrm{mean}(X)}{\mathrm{std}(X)} \tag{15}$$

In Equation 15, $\mathrm{mean}(X)$ and $\mathrm{std}(X)$ are the mean value and the standard deviation of the historical load series, respectively.
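A minimal NumPy sketch of Equation 15 follows. It assumes that the mean and standard deviation are computed once from the historical series and then reused, and it adds an inverse transform so that predictions can be mapped back to the original load units; the helper names and the sample values are hypothetical.

```python
import numpy as np

def fit_zscore(history):
    """Return the mean and standard deviation of the historical load series."""
    history = np.asarray(history, dtype=float)
    return history.mean(), history.std()

def normalize(x, mean, std):
    return (np.asarray(x, dtype=float) - mean) / std

def denormalize(z, mean, std):
    return np.asarray(z, dtype=float) * std + mean

load = np.array([1.2, 0.8, 2.5, 3.1, 0.9, 1.5])   # illustrative hourly readings
mean, std = fit_zscore(load)
z = normalize(load, mean, std)
restored = denormalize(z, mean, std)
```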

Table 2. Summary of the experimental datasets.

4.2 Evaluation metrics

The mean absolute error (MAE), mean absolute percentage error (MAPE) and root mean square error (RMSE) are used to evaluate the accuracy of the proposed model. For them, the lower the value, the better the forecasting performance. MAE, MAPE and RMSE are defined as:

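$$\mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n}\left|y_t - \hat{y}_t\right| \tag{16}$$

$$\mathrm{MAPE} = \frac{1}{n}\sum_{t=1}^{n}\left|\frac{y_t - \hat{y}_t}{y_t}\right| \times 100\% \tag{17}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^{2}} \tag{18}$$

In Equations 16, 17 and 18, $y_t$ and $\hat{y}_t$ refer to the real load values and the predicted load values of the model at time step $t$, respectively, and $n$ is the number of samples.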
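The three metrics can be computed with a few lines of NumPy, shown here on made-up numbers. The function names are hypothetical; note that MAPE is undefined whenever a real load value is zero, which this sketch does not handle.

```python
import numpy as np

def mae(y, y_hat):
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.mean(np.abs(y - y_hat)))

def mape(y, y_hat):
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.mean(np.abs((y - y_hat) / y)) * 100.0)   # in percent

def rmse(y, y_hat):
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

y_true = [100.0, 200.0, 400.0]     # illustrative values, not from the datasets
y_pred = [110.0, 180.0, 400.0]
scores = (mae(y_true, y_pred), mape(y_true, y_pred), rmse(y_true, y_pred))
```

For these values the absolute errors are 10, 20 and 0, so the MAE is 10, the MAPE is about 6.67% and the RMSE is about 12.91; RMSE exceeds MAE because squaring weights the larger error more heavily.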

4.3 Baselines and experimental settings

In this paper, five load forecasting models are selected as baselines to validate the performance of the proposed MLFGCN model. The baselines cover mainstream load forecasting methods: SVR represents statistical methods, LSTM is the most commonly used time series forecasting method, CNN-GRU is a spatiotemporal load forecasting method based on Euclidean distance, and STGCN Multi-hop and Ada-GWN are spatiotemporal load forecasting methods based on non-Euclidean distance.

• SVR: Support vector regression (SVR) is a regression method based on support vector machine (SVM), commonly used for time series prediction.

• LSTM: Long short-term memory network (LSTM), which performs well in long time series forecasting.

• CNN-GRU: A hybrid model combining CNN and GRU for short-term residential load forecasting.

• STGCN Multi-hop: Spatial–temporal graph convolutional networks (STGCN) with the input graph nodes more than one hop away as neighbors, which is a spatiotemporal model to predict the load consumption values for each customer (Cheung et al., 2021).

• Ada-GWN: Spatial–temporal residential short-term load forecasting network based on Graph WaveNet framework (Lin et al., 2021).

We divide each experimental dataset into a training set, a validation set and a test set in a ratio of 6:2:2. To make a fair comparison with the baseline models, all forecasting models are implemented with the PyTorch framework and run on servers with the same configuration. We set the search length of fastDTW to 24. Huber loss is selected as the loss function and the Adam optimizer is used for optimization. The learning rate is set to 0.001, the number of epochs is 200, and the batch size is 32. The parameter settings are the same for all models. We use three TGA blocks for load forecasting, each of which contains an independent TConv block and two GAConv blocks. Each experimental dataset was evaluated more than 10 times to ensure the reliability of the results.
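The 6:2:2 division can be sketched as below. The text does not state how the split is drawn, so this sketch assumes a chronological split along the time axis, a common choice for forecasting because it keeps the test period strictly after the training period; `chronological_split` and the array shapes are hypothetical.

```python
import numpy as np

def chronological_split(X, ratios=(0.6, 0.2, 0.2)):
    """Split an (N, T) load array along the time axis into train/validation/test."""
    T = X.shape[-1]
    n_train = int(round(T * ratios[0]))
    n_val = int(round(T * ratios[1]))
    train = X[..., :n_train]
    val = X[..., n_train:n_train + n_val]
    test = X[..., n_train + n_val:]             # everything that remains
    return train, val, test

X = np.arange(15 * 100, dtype=float).reshape(15, 100)   # 15 houses, 100 hourly steps
train, val, test = chronological_split(X)
```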

4.4 Experimental results and analysis

The experiments are divided into three parts, and the experimental results are discussed in three aspects: performance analysis of the proposed MLFGCN model, impact analysis of the number of houses and ablation experiments. The experimental results show that the proposed MLFGCN model has better prediction performance compared with baseline models.

4.4.1 Performance analysis of MLFGCN

We first evaluate the performance of MLFGCN on case 1. The experimental results are shown in Table 3.

Table 3. Performance comparison of load forecasting models on the LA dataset.

Figure 11 visualizes the results for the three metrics MAPE, RMSE, and MAE. Compared with the traditional SVR model, the MAE, MAPE and RMSE values of the MLFGCN model decrease by 70.93, 27.74, and 72.68%, respectively. Although SVR is widely used in time series prediction tasks, it still has limitations when dealing with complex nonlinear relationships. MLFGCN also has higher forecasting accuracy than models dedicated to temporal prediction such as LSTM, because learning temporal features alone cannot capture the valuable information comprehensively. CNN-GRU, STGCN Multi-hop and Ada-GWN all consider the spatial–temporal features in load data, but there is still a large gap between them. STGCN Multi-hop and Ada-GWN achieve better prediction results than CNN-GRU because spatial modeling based on non-Euclidean distance is more suitable for power load data. Even so, compared with Ada-GWN, the MAE, MAPE and RMSE of the MLFGCN model decrease by 12, 7.65, and 14%, respectively. In summary, the MLFGCN model proposed in this paper can effectively utilize historical load data to accurately predict future load values and is superior to the baseline models.

Figure 11. Performance comparison of load forecasting on the LA dataset.

4.4.2 Impact analysis of the number of houses

To analyze the impact of the number of houses on model performance, a real-world dataset from Iowa, USA, was selected for this study. The dataset contains load data of 240 units from three feeders with 17, 60, and 163 houses, respectively. Three baseline models, CNN-GRU, STGCN Multi-hop, and Ada-GWN, are selected as the comparison models. The results are shown in Table 4.

Table 4. Comparative experimental results on the dataset of case 2.

Figures 12–15 visualize the experimental results on the datasets of the three feeders, Feeder_A, Feeder_B, and Feeder_C, and on Feeder_Sum, which contains all load data for the three subregions. The CNN-GRU model performs well on Feeder_A, with an MAE value only 5.8% higher than that of MLFGCN. However, on Feeder_B, Feeder_C, and Feeder_Sum, where the number of houses is larger, the gap between MLFGCN and the baselines widens as the number of houses increases. Similar to MLFGCN, the prediction accuracy of Ada-GWN also improves continuously as the number of houses increases. CNN-GRU is therefore more suitable for cases with only a few houses, where it has about the same predictive accuracy as MLFGCN. The values of MAE, MAPE, and RMSE of STGCN Multi-hop are stable around 1.7, 27.5, and 3.6 for different numbers of houses, which indicates that STGCN Multi-hop is minimally affected by the number of houses.

Figure 12. Comparative experimental results on Feeder_A.

Figure 13. Comparative experimental results on Feeder_B.

Figure 14. Comparative experimental results on Feeder_C.

Figure 15. Comparative experimental results on Feeder_Sum.

For the MLFGCN model proposed in this paper, the predictive performance advantage is not significant when the number of houses is small. As the number of houses increases, the performance advantages of MLFGCN gradually become apparent. In particular, on the Feeder_Sum dataset, the MAE, MAPE, and RMSE values of the MLFGCN model decrease by 26.52, 10.20, and 13.66% compared with CNN-GRU, and by 17.90, 5.25, and 9.20% compared with STGCN Multi-hop. As the number of houses increases, the MLFGCN model can learn richer features by comparing and analyzing load series with similar patterns, which improves the generalization ability and prediction accuracy of the forecasting model.

4.4.3 Ablation experiments

This section analyzes the necessity of input feature construction and the effectiveness of each part of the proposed model. The experimental results show that each part of MLFGCN contributes to the prediction results.

Comparison experiments were first conducted on the Feeder_Sum dataset to validate the input feature construction, and experimental results are shown in Table 5. The results show that the MAE, MAPE, and RMSE values of the model with input feature reconstruction decreased by 13.04, 8.82, and 21.65%, respectively, which indicates that modeling the raw input data can improve the forecasting accuracy.

Table 5. Impact of input feature construction on predictive performance of MLFGCN.

Then, we verify the effect of the adjacency matrix construction, the TConv module and the GAConv module on the forecasting performance. We design three variants named MLFGCNI, MLFGCNII, and MLFGCNIII, whose specific configurations are shown in Table 6. MLFGCNI replaces the adjacency matrix construction of MLFGCN with an adaptive adjacency matrix. MLFGCNII and MLFGCNIII are variants of MLFGCN with the TConv module or the GAConv module removed, respectively, while the rest remains unchanged. The ablation experiments were conducted on both the LA and Feeder_Sum datasets. The results are shown in Table 7.


FRONTIERS IN NEUROROBOTICS
