Recurrent Neural Networks come into the picture when a model needs context to produce an output from its input. When dealing with time series data, we know that future values are affected by past values.
An RNN has a “memory”: it is able to remember what it has calculated so far. To achieve this, the network contains a loop which allows information to persist over time. When we feed in the input \(X_t\) at time ‘t’ and an output \(Y_t\) is generated, the RNN also updates an internal state \(h_t\) which summarizes what the network has seen so far. This state is passed on to the next time step ‘t + 1’. Such networks are called recurrent because information is passed internally from one time step ‘t’ to the next ‘t + 1’.
If we look at the unrolled version above: first, the network takes x(0) from the data and outputs y(0) while generating the state h(0). At the next time step, h(0) and x(1) are fed in to generate y(1) and the new state h(1). In this way, the network keeps carrying context forward as it trains.
By using a simple recurrence relation \(h_t = f_w(h_{t-1}, x_t)\), the RNN is able to process sequential data. At each time step t, it generates the internal state \(h_t\) as a function of the previous internal state and the current input. The function ‘f’ is parameterised by the weights w and applies a non-linear activation such as sigmoid, tanh or ReLU.
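To make the recurrence concrete, here is a minimal sketch in R of a single-unit RNN cell stepping through a short toy sequence; the inputs and the weights w_xh, w_hh, w_hy are arbitrary illustrative numbers, not values learned by any model fitted later in this post.

# Toy forward pass of h_t = f_w(h_{t-1}, x_t) with a tanh activation
# (illustrative only: scalar weights chosen arbitrarily, nothing is learned here)
x_toy = c(0.2, 0.5, 0.1, 0.9)        # a short input sequence
w_xh = 0.7; w_hh = 0.4; w_hy = 1.2   # input-to-hidden, hidden-to-hidden, hidden-to-output weights
h = 0                                # initial internal state h_0
y_toy = numeric(length(x_toy))
for (t in seq_along(x_toy)) {
  h = tanh(w_hh * h + w_xh * x_toy[t])  # state update mixes the previous state with the current input
  y_toy[t] = w_hy * h                   # the output at time t therefore depends on the whole history
}
round(y_toy, 3)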
Along with the output \(Y_t\), the network also incurs an error/loss \(L_t\) at every time step. We can sum these individual losses to get a total loss \(L\), which should be minimized for a good fit. To do this, the RNN backpropagates the error at each individual time step and then across all time steps, all the way back from the current position to the beginning of the sequence. This process is called backpropagation through time (BPTT) because the errors flow back in time to the beginning of the data sequence.
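In symbols, with the weights \(w\) shared across every time step, the total loss and its gradient are
\[
L = \sum_{t} L_t, \qquad
\frac{\partial L}{\partial w} = \sum_{t} \frac{\partial L_t}{\partial w}
= \sum_{t} \sum_{k \le t} \frac{\partial L_t}{\partial h_t}\,\frac{\partial h_t}{\partial h_k}\,\frac{\partial h_k}{\partial w}.
\]
The factor \(\partial h_t / \partial h_k\) chains back through every intermediate state between step k and step t, which is exactly the flow back in time that gives BPTT its name.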
We need to manipulate the data into a format the neural network can work with. First, we use the quantmod package to create 12 time-lagged attributes (12 because the data is monthly).
After selecting the necessary lags, we take the log to reduce the magnitude of the series and then normalize the data to bring it into the range 0 to 1.
data = read.csv("data.csv")[,3]
date = read.csv("data.csv")[,1]
require(quantmod)
x = as.zoo(data)
# Selecting Lags
x1  = Lag(x, k=1)
x2  = Lag(x, k=2)
x3  = Lag(x, k=3)
x4  = Lag(x, k=4)
x5  = Lag(x, k=5)
x6  = Lag(x, k=6)
x7  = Lag(x, k=7)
x8  = Lag(x, k=8)
x9  = Lag(x, k=9)
x10 = Lag(x, k=10)
x11 = Lag(x, k=11)
x12 = Lag(x, k=12)
x   = cbind(x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x)
# Log-transformation
x = log(x)
# Missing Value Removal
x = x[-(1:12),]
x = data.matrix(x)
# Normalization
min.xng = min(x)  # 12.57856
max.xng = max(x)  # 14.10671
x = (x - min(x))/(max(x) - min(x))
unscale = function(x, xmax, xmin){ x*(xmax - xmin) + xmin }
#
x1  = as.matrix(x[,1])
x2  = as.matrix(x[,2])
x3  = as.matrix(x[,3])
x4  = as.matrix(x[,4])
x5  = as.matrix(x[,5])
x6  = as.matrix(x[,6])
x7  = as.matrix(x[,7])
x8  = as.matrix(x[,8])
x9  = as.matrix(x[,9])
x10 = as.matrix(x[,10])
x11 = as.matrix(x[,11])
x12 = as.matrix(x[,12])
y   = as.matrix(x[,13])
#
n.train = 224
#Preparing Train dataset
ytrain    = as.matrix(y[1:n.train])
x1_train  = as.matrix(t(x1 [1:n.train,]))
x2_train  = as.matrix(t(x2 [1:n.train,]))
x3_train  = as.matrix(t(x3 [1:n.train,]))
x4_train  = as.matrix(t(x4 [1:n.train,]))
x5_train  = as.matrix(t(x5 [1:n.train,]))
x6_train  = as.matrix(t(x6 [1:n.train,]))
x7_train  = as.matrix(t(x7 [1:n.train,]))
x8_train  = as.matrix(t(x8 [1:n.train,]))
x9_train  = as.matrix(t(x9 [1:n.train,]))
x10_train = as.matrix(t(x10[1:n.train,]))
x11_train = as.matrix(t(x11[1:n.train,]))
x12_train = as.matrix(t(x12[1:n.train,]))
#
x_train = array(c(x1_train,x2_train,x3_train,x4_train,x5_train,x6_train,
                  x7_train,x8_train,x9_train,x10_train,x11_train,x12_train),
                dim = c(dim(x1_train), 12))
#Preparing Test dataset
x1test  = as.matrix(t(x1 [(n.train+1):nrow(x1), ]))
x2test  = as.matrix(t(x2 [(n.train+1):nrow(x2), ]))
x3test  = as.matrix(t(x3 [(n.train+1):nrow(x3), ]))
x4test  = as.matrix(t(x4 [(n.train+1):nrow(x4), ]))
x5test  = as.matrix(t(x5 [(n.train+1):nrow(x5), ]))
x6test  = as.matrix(t(x6 [(n.train+1):nrow(x6), ]))
x7test  = as.matrix(t(x7 [(n.train+1):nrow(x7), ]))
x8test  = as.matrix(t(x8 [(n.train+1):nrow(x8), ]))
x9test  = as.matrix(t(x9 [(n.train+1):nrow(x9), ]))
x10test = as.matrix(t(x10[(n.train+1):nrow(x10),]))
x11test = as.matrix(t(x11[(n.train+1):nrow(x11),]))
x12test = as.matrix(t(x12[(n.train+1):nrow(x12),]))
ytest   = as.matrix(y[(n.train+1):nrow(x12)])
xtest = array(c(x1test,x2test,x3test,x4test,x5test,x6test,
                x7test,x8test,x9test,x10test,x11test,x12test),
              dim = c(dim(x1test), 12))
require(rnn)
The model is estimated using the trainr function from the rnn package. It requires the attribute data to be passed as a three-dimensional array: the first dimension is the number of samples, the second is time, and the third holds the variables. This layout is reflected in the x_train and xtest objects.
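As a quick sanity check of that layout (the expected dimensions below assume data.csv holds 248 monthly observations, so 236 usable rows remain after dropping the 12 lag-induced missing values):

# Check the 3-D array layout expected by trainr
# (expected values assume 248 monthly rows in data.csv)
dim(x_train)   # 1 224 12 -> 1 sample, 224 time steps, 12 lagged variables
dim(xtest)     # 1  12 12 -> 1 sample, 12 hold-out time steps, 12 lagged variables
dim(t(ytrain)) # 1 224    -> targets arranged to match the first two dimensions of X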
We use set.seed(2018) throughout to allow reproducibility.
The RNN network is specified by network_type = “rnn”.
We will build 3 models with different values of hidden_dim, which specifies the number of hidden layers and the number of neurons in each (for example, hidden_dim = c(9,3) means two hidden layers with 9 and 3 neurons). All models share the same learning rate of 0.05 and the logistic sigmoid activation function.
#
set.seed(2018)
model1 = trainr(Y = t(ytrain), X = x_train, learningrate = 0.05,
                hidden_dim = 15, numepochs = 500,
                network_type = "rnn", sigmoid = "logistic")
#
set.seed(2018)
model2 = trainr(Y = t(ytrain), X = x_train, learningrate = 0.05,
                hidden_dim = c(9,3), numepochs = 300,
                network_type = "rnn", sigmoid = "logistic")
#
set.seed(2018)
model3 = trainr(Y = t(ytrain), X = x_train, learningrate = 0.05,
                hidden_dim = c(5,2), numepochs = 550,
                network_type = "rnn", sigmoid = "logistic")
As the models train, trainr reports an error value at each epoch. Here we plot these errors against the epoch index.
error1 = t(model1$error)
error2 = t(model2$error)
error3 = t(model3$error)
#
par(mfrow=c(3,1))
plot(error1, type="l", main="Iterative Errors")
plot(error2, type="l", main="Iterative Errors")
plot(error3, type="l",main="Iterative Errors")
We use the _predictr_ function to make predictions. Because we log-transformed and normalized the data, the predicted values are not in the original units but in scaled log units.
To convert them back to the original units, we first unscale them using the user-defined unscale function and then take the exponential. We do the same with the actual test data values as well.
# Prediction
pred1test = t(predictr(model1, xtest))
pred1.act = exp(unscale(pred1test, max.xng, min.xng))
pred1.act = ts(matrix(pred1.act), end = c(2021,8), frequency = 12)
pred2test = t(predictr(model2, xtest))
pred2.act = exp(unscale(pred2test, max.xng, min.xng))
pred2.act = ts(matrix(pred2.act), end = c(2021,8), frequency = 12)
pred3test = t(predictr(model3, xtest))
pred3.act = exp(unscale(pred3test, max.xng, min.xng))
pred3.act = ts(matrix(pred3.act), end = c(2021,8), frequency = 12)
# Actual Data
y.act = exp(unscale(ytest, max.xng, min.xng))
y.act = ts(matrix(y.act), end = c(2021,8), frequency = 12)
# Final Output
result = data.frame(date   = date[237:248],        # Prediction Dates
                    actual = round(y.act, 2),      # Test data
                    model1 = round(pred1.act, 2),  # Model.1
                    model2 = round(pred2.act, 2),  # Model.2
                    model3 = round(pred3.act, 2))  # Model.3
We can say that all three models have more or less captured the underlying dynamics of the data.
library(Metrics)
round(c( mape(result$actual, result$model1),
rmse(result$actual, result$model1),
mape(result$actual, result$model2),
rmse(result$actual, result$model2),
mape(result$actual, result$model3),
rmse(result$actual, result$model3) ), 5)
## [1] 0.08477 88893.05230 0.07977 96615.96953 0.08900 99173.33448
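Plotting the actual hold-out series against the three prediction series gives a quick visual check of how well each model tracks the data; a minimal sketch using the ts objects created above:

# Overlay the actual test series and the three model predictions
ts.plot(y.act, pred1.act, pred2.act, pred3.act,
        col = c("black", "red", "blue", "darkgreen"),
        lty = c(1, 2, 2, 2), main = "Actual vs Predicted (Test Period)")
legend("topleft", c("Actual", "Model 1", "Model 2", "Model 3"),
       col = c("black", "red", "blue", "darkgreen"),
       lty = c(1, 2, 2, 2), bty = "n")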
U.S. Consumption of Electricity Generated by Natural Gas (Thousand Mcf)

| Date | Actual | model_1 | model_2 | model_3 |
|---|---|---|---|---|
| Sep 1, 2020 | 1006071.2 | 1104944.6 | 943044.0 | 1077797.8 |
| Oct 1, 2020 | 924056.2 | 929203.7 | 925937.0 | 947461.9 |
| Nov 1, 2020 | 737935.2 | 882783.3 | 869885.1 | 885618.1 |
| Dec 1, 2020 | 839912.6 | 865456.6 | 872521.1 | 838168.7 |
| Jan 1, 2021 | 833783.3 | 891370.7 | 871471.2 | 883557.2 |
| Feb 1, 2021 | 759358.2 | 873895.0 | 850812.1 | 841373.4 |
| Mar 1, 2021 | 715165.1 | 803378.6 | 767081.0 | 767787.8 |
| Apr 1, 2021 | 724125.8 | 729239.0 | 722215.3 | 759326.4 |
| May 1, 2021 | 787027.2 | 844170.5 | 816924.2 | 868377.9 |
| Jun 1, 2021 | 1051774.8 | 996437.3 | 949022.0 | 963258.8 |
| Jul 1, 2021 | 1199673.3 | 1077251.3 | 1029848.8 | 1035439.0 |
| Aug 1, 2021 | 1223328.0 | 1089798.1 | 1031520.2 | 1033907.2 |

Source: U.S. Energy Information Administration. All values are in Thousand Mcf.