This package can be used to specify and estimate Dynamic Factor Models (DFM) efficiently and to produce consistent forecasts. Recent versions of the package also include news analysis. News is defined as the discrepancy between newly released figures and their forecasts, and analyzing it helps to interpret forecast revisions. As mentioned by Banbura and Modugno (2010), it enables us to produce statements like “the forecast was revised up by … because of higher than expected release of …”.
This R package uses the efficient libraries of JDemetra+ v3. Its design is inspired by the GUI add-in developed for JDemetra+ V2, and it provides about the same functionality (except for the real-time simulation), but in the flexible R environment.
This package relies on the specific Java libraries of JDemetra+ v3 and on the package rjd3toolkit of rjdverse. Before installing it, make sure that a Java version >= 17.0 is available on your computer. If you need a portable version of Java to meet this requirement, you can follow the instructions in the installation manual.
In addition to a Java version >= 17.0, you must have a recent version of the R packages rJava (>= 1.0.6) and RProtobuf (>= 0.4.17), which you can download from CRAN.
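As a quick check before installing, the version requirements above can be verified from R. The sketch below uses only base R; has_min_version() is a hypothetical helper introduced here for illustration and is not part of rjd3nowcasting.

```r
# Hypothetical helper (not part of rjd3nowcasting): returns TRUE when `pkg`
# is installed with a version of at least `min_version`.
has_min_version <- function(pkg, min_version) {
  installed <- nzchar(system.file(package = pkg))
  installed && utils::packageVersion(pkg) >= min_version
}

# Minimum versions stated in the requirements
has_min_version("rJava", "1.0.6")
has_min_version("RProtobuf", "0.4.17")

# The Java version found on the PATH (if any) can be printed with:
# system2("java", "-version")
```

A FALSE result means the package is either missing or too old, and should be installed or updated from CRAN first.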
The package rjd3nowcasting depends on the package rjd3toolkit that you must install from GitHub beforehand.
# To get the current stable version (from the latest release):
# install.packages("remotes")
remotes::install_github("rjdverse/rjd3toolkit@*release")
remotes::install_github("rjdverse/rjd3nowcasting@*release", build_vignettes = TRUE)
# or to get the current development version from GitHub:
remotes::install_github("rjdverse/rjd3nowcasting")
Note that depending on the R packages that are already installed on your computer, you might also be asked to install or re-install some other packages from CRAN.
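Once the package is loaded, there are four steps to follow: (1) prepare the data, (2) create or update the model, (3) estimate the model and (4) get the results.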
Detailed information concerning each step follows below the example.
# Quick start example
library(rjd3nowcasting)
## 1. Data
set.seed(100)
data0 <- stats::ts(
  data = matrix(rnorm(500), 100, 5),
  frequency = 12,
  start = c(2010, 1)
)
# Missing values: ragged edge for series 1 and 2, quarterly pattern for series 5
data0[100, 1] <- data0[99:100, 2] <- data0[(1:100)[-seq(3, 100, 3)], 5] <- NA
# Updated dataset: one extra month plus previously missing figures
data1 <- stats::ts(
  data = rbind(data0, c(NA, NA, 1, 1, NA)),
  frequency = 12,
  start = c(2010, 1)
)
data1[100, 1] <- data1[99, 2] <- 1
## 2. Create or update the model
### Create model from scratch
dfm0 <- create_model(
  nfactors = 2,
  nlags = 2,
  factors_type = c("M", "M", "YoY", "M", "Q"),
  factors_loading = matrix(data = TRUE, 5, 2),
  var_init = "Unconditional"
)
### Update model
# ! Recall: because of possible local minima and the lack of full
# identification, it is always better to start from a previously
# estimated model when one is available.
est0 <- estimate_em(dfm0, data0) # see step 3
dfm1 <- est0$dfm # R object (list) that can be saved from one update to the next
# or, equivalently,
dfm1 <- create_model(
  nfactors = 2,
  nlags = 2,
  factors_type = c("M", "M", "YoY", "M", "Q"),
  factors_loading = matrix(data = TRUE, 5, 2),
  var_init = "Unconditional",
  var_coefficients = est0$dfm$var_coefficients,
  var_errors_variance = est0$dfm$var_errors_variance,
  measurement_coefficients = est0$dfm$measurement_coefficients,
  measurement_errors_variance = est0$dfm$measurement_errors_variance
)
## 3. Estimate the model
est1 <- estimate_ml(dfm1, data1)
# or est1 <- estimate_em(dfm1, data1)
# or est1 <- estimate_pca(dfm1, data1)
## 4. Get results
rslt1 <- get_results(est1)
print(rslt1)
fcst1 <- get_forecasts(est1, n_fcst = 2)
print(fcst1)
plot(fcst1)
news1 <- get_news(est0, data1, target_series = "Series 1", n_fcst = 2)
print(news1)
plot(news1)
Data preparation is external to the package. Recall that DFMs require all input data to be stationary. Once the data have been prepared accordingly and imported into R, create a time-series object from them with the well-known stats::ts() function, as in the example.
In case of dynamic work, the columns of the dataset must remain the same, and in the same order, from one update to the next. Only additional rows can be added, reflecting the new data coming in.
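The constraint on successive datasets can be checked before estimation. The following is a minimal base R sketch; check_update() and the column names are hypothetical and only illustrate the rule that columns stay fixed while rows are appended.

```r
# Hypothetical helper (not part of rjd3nowcasting): stops with an error if the
# new dataset does not keep the same columns, order, frequency and start date.
check_update <- function(old, new) {
  stopifnot(
    identical(colnames(old), colnames(new)),
    frequency(old) == frequency(new),
    all(start(old) == start(new)),
    nrow(new) >= nrow(old)
  )
  invisible(TRUE)
}

set.seed(1)
old <- ts(
  matrix(rnorm(24), 12, 2, dimnames = list(NULL, c("ip", "survey"))),
  frequency = 12, start = c(2020, 1)
)
# New release: one extra month, with the second series not yet published
new <- ts(rbind(old, c(0.3, NA)), frequency = 12, start = c(2020, 1))

check_update(old, new)
```

Swapping two columns in the new dataset, for example, makes the check fail instead of silently producing a misaligned model.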
The function create_model() enables you to build a new model.
The state-space representation of a Dynamic Factor Model can be written as follows:
$$
\begin{aligned}
y_t &= Z f_t + \epsilon_t, \quad \epsilon_t \sim N(0, R_t) \\
f_t &= A_1 f_{t-1} + \dots + A_p f_{t-p} + \eta_t, \quad \eta_t \sim N(0, Q_t)
\end{aligned}
$$
where the measurement equation links the observations to the underlying factors. Those factors, as shown in the second equation, follow a VAR process of order p. The number of factors to consider and the order p of the VAR process are defined in the first two arguments of the function create_model().
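To make the two equations concrete, the sketch below simulates data from a one-factor model with p = 1 in base R. All parameter values (A1, Z, the error variances) are arbitrary choices made for illustration, and constant variances are assumed; nothing here is package output.

```r
set.seed(42)
n_obs <- 120
n_series <- 4

A1 <- 0.7                    # VAR(1) coefficient of the single factor
Z <- c(1, 0.8, -0.5, 0.3)    # loadings of the four series on the factor

# Transition equation: f_t = A1 * f_{t-1} + eta_t, with Var(eta_t) = 1.
# The first value is drawn from the unconditional distribution of the factor.
f <- numeric(n_obs)
f[1] <- rnorm(1, sd = sqrt(1 / (1 - A1^2)))
for (t in 2:n_obs) {
  f[t] <- A1 * f[t - 1] + rnorm(1)
}

# Measurement equation: y_t = Z f_t + eps_t, with idiosyncratic sd = 0.5
eps <- matrix(rnorm(n_obs * n_series, sd = 0.5), n_obs, n_series)
y <- ts(outer(f, Z) + eps, frequency = 12, start = c(2010, 1))

dim(y)
```

The resulting object y is a multivariate monthly time series of the kind the estimation functions expect as input.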
The third argument factors_type defines the link between the series and the factors (Z matrix). This link can be more or less sophisticated depending on the variables. Three options are currently available:
A variable expressed in terms of monthly growth rates can be linked to a factor representing the underlying monthly growth rate of the economy by defining the factor type as “M” for this variable (default).
A monthly or quarterly variable that is correlated with the underlying quarterly growth rate of the economy can be linked to a weighted average of the factors representing the underlying monthly growth rate of the economy. Such a weighted average is meant to represent quarterly growth rates, and it can be implemented by defining the factor type as “Q” for this variable.
A variable can also be linked to the cumulative sum of the last 12 monthly factors. If the model is designed in such a way that the monthly factors represent monthly growth rates, the resulting cumulative sum boils down to the year-on-year growth rate. Thus, variables expressed in terms of year-on-year growth rates or surveys that are correlated with the year-on-year growth rates of the reference series should be linked to the factors in this way. The factor type should be defined as “YoY” in this case.
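The reasoning behind the “YoY” link can be verified numerically. The base R sketch below uses a made-up series in logarithms and shows that the sum of the last 12 monthly growth rates equals the year-on-year growth rate; it illustrates the arithmetic only and does not call the package.

```r
set.seed(1)
# Made-up monthly series in logs (48 months)
log_level <- cumsum(rnorm(48, mean = 0.002, sd = 0.01))

# Monthly growth rates (log differences)
monthly <- diff(log_level)

# Cumulative sum of the last 12 monthly growth rates (one-sided moving sum);
# the first 11 values are NA because fewer than 12 months are available.
yoy_from_monthly <- as.numeric(stats::filter(monthly, rep(1, 12), sides = 1))[-(1:11)]

# Year-on-year growth rates computed directly
yoy_direct <- diff(log_level, lag = 12)

all.equal(yoy_from_monthly, yoy_direct)
```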
The fourth and last compulsory argument, factors_loading, specifies the factor loadings and can incorporate zero restrictions. Users must indicate there which factors load on which variables.
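A loading matrix with restrictions can be built in base R as sketched below. The quick start example passes a matrix filled with TRUE (every factor loads on every variable); the assumption made here, which should be checked against the package documentation, is that FALSE encodes a zero restriction.

```r
# 5 variables (rows) and 2 factors (columns); start from the unrestricted case
loading <- matrix(TRUE, nrow = 5, ncol = 2)

# Assumed convention: FALSE means that the factor does not load on the variable.
# Here the second factor is excluded from variables 4 and 5.
loading[4:5, 2] <- FALSE

loading
```

A matrix like this would then be passed as the factors_loading argument of create_model().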
The argument var_init tells whether the initial values of the unobserved factors should be defined from the unconditional distribution (recommended) or set equal to zero.
The last four arguments var_coefficients, var_errors_variance, measurement_coefficients and measurement_errors_variance can be used to create a model based on a previous estimate of the model (see section Update an existing model). The default value of those four arguments is NULL, meaning that the model will be created from scratch.
In case of dynamic work, a similar model has usually been estimated before on an older version of the data. In that case, it is recommended not to create a new model from scratch but to start from the previously estimated model. For that, the model must be recoverable from the previous run. One option is to save the required information from one run to the next using the base function saveRDS() (see section 3 to know what exactly should be saved). Starting from a previously estimated model, when available, gives faster convergence during the estimation step and avoids running into another local minimum, which could result in parameter estimates that are very different from the previous ones (especially since the model is not fully identifiable).
To generate a new model from a previously estimated one, there are two possibilities:
Set the new R object directly from the previous one, or
Use the function create_model() while filling the arguments var_coefficients, var_errors_variance, measurement_coefficients and measurement_errors_variance with their previously estimated values.
The function create_model() returns an R object of class ‘JD3_DfmModel’. This is just a list of six elements that fully characterize the model. The list includes the estimated coefficients of the VAR equation and the variance-covariance matrix of its error terms, the estimated coefficients of the measurement equation and the idiosyncratic variance of its error terms, the type of initialisation and the link to consider between the series and the factors (i.e. the argument factors_type). This is an R list of matrices and vectors that can easily be saved from one run to the next using, for example, the function saveRDS().
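Saving and restoring such a list takes two calls to base R. In the sketch below, dfm_stub is a stand-in list with illustrative content rather than a real ‘JD3_DfmModel’ object, and the file name is arbitrary.

```r
# Stand-in for a model object: a plain list of matrices and vectors.
# Names and values are illustrative only.
dfm_stub <- list(
  var_coefficients = matrix(c(0.5, 0.1, 0.0, 0.3), 2, 2),
  factors_type = c("M", "M", "YoY", "M", "Q")
)

# Save at the end of one run ...
path <- file.path(tempdir(), "dfm_previous.rds")
saveRDS(dfm_stub, path)

# ... and restore at the start of the next one
dfm_restored <- readRDS(path)
identical(dfm_stub, dfm_restored)
```

In practice the temporary directory would be replaced by a persistent location, since tempdir() is cleared when the R session ends.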
Parameters can be estimated using different algorithms. One of the three available functions should be picked for the purpose of estimation:
estimate_pca() estimates the model parameters using only Principal Component Analysis (PCA). Although this is fast, this approach is not recommended, especially if some series are quarterly series or series associated with year-on-year growth rates (see section 2.1).
estimate_em() estimates the model parameters using the EM algorithm (with initial values given by PCA by default). The function includes a few optional arguments which can be used to tune the estimation process.
estimate_ml() estimates the model parameters by Maximum Likelihood (by default, with initial values given by the EM algorithm, whose own initial values are given by PCA). The function includes several optional arguments which can be used to tune the estimation process. The function estimate_ml() is recommended, although it can be argued that the function estimate_em(), which is somewhat faster, also constitutes a good solution.
The three functions have two compulsory arguments which are necessary to estimate parameters: the model, i.e. an object of class ‘JD3_DfmModel’ typically generated by the create_model() function, and the dataset, which must be an mts object. All three functions return the same kind of R object, an object of class ‘JD3_DfmEstimates’ that can be used as input for the results functions (see section 4). Note that the returned object is just an R list containing various elements.
In addition to the selected algorithm, estimation speed depends on the size of the model. Models with one or two factors are estimated quickly (in a few seconds), even when the number of variables is large. However, the estimation of more complex models may take minutes to converge.
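Dynamic factor models require a prior standardization of the data. This is an essential step which can lead to confusion in certain situations. The usual mechanism is quite simple and is divided into three stages: the input data are standardized, the model is estimated on the standardized data, and the output is converted back to the scale of the raw data.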
This means that both the likelihood of the model and the estimates of the parameters refer to the transformed data. However, final results such as the forecasts and the forecast error variances of the transformed series are converted back to the scale of the raw data.
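The standardization mechanism can be sketched in base R as follows. The data and the forecast values on the standardized scale are made up, and the use of the sample standard deviation is an assumption; the package may compute its moments differently.

```r
set.seed(123)
raw <- matrix(rnorm(60, mean = 5, sd = 2), 20, 3)
raw[20, 1] <- NA  # missing value at the end of the first series

# Standardize each column, ignoring missing values
mu <- colMeans(raw, na.rm = TRUE)
sigma <- apply(raw, 2, sd, na.rm = TRUE)
z <- sweep(sweep(raw, 2, mu, "-"), 2, sigma, "/")

# A model would be estimated on z. Suppose it returned these (made-up)
# one-step-ahead forecasts and forecast error standard deviations:
z_fcst <- c(0.2, -0.1, 0.0)
z_fcst_sd <- c(0.9, 1.0, 0.8)

# Convert the output back to the scale of the raw data
raw_fcst <- z_fcst * sigma + mu
raw_fcst_sd <- z_fcst_sd * sigma
```

Note that the mean is added back to the forecasts but not to their standard deviations, which are only rescaled.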
By default, the data are standardized. If, for some reason, your dataset already contains standardized data, the standardization step can be skipped by setting standardized = TRUE in the estimation function.
Particular attention must be paid to the standardization step when working dynamically. For instance, if you do not wish to re-estimate the model (see section 3.3), you must also provide the initial mean and standard deviation of each variable, as calculated at the time of the last estimation of the model. The argument input_standardization in each estimation function serves that purpose. Note that for news analysis (see section 4.3), the mean and standard deviation considered for the standardization step must be the same for the old and the new datasets. In practice, they are calculated based on the old dataset.
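The idea of reusing the old moments can be illustrated in base R. The sketch below standardizes a new dataset with the mean and standard deviation of the old one, so that the common observations get identical standardized values. The data are made up, and the exact format expected by input_standardization is not shown here and should be taken from the package documentation.

```r
set.seed(7)
old <- matrix(rnorm(40, mean = 2), 20, 2)
new <- rbind(old, c(2.5, NA))  # one additional month, second series missing

# Moments computed once, on the old dataset
old_mean <- colMeans(old, na.rm = TRUE)
old_sd <- apply(old, 2, sd, na.rm = TRUE)

# Both datasets are standardized with the same (old) moments
old_z <- scale(old, center = old_mean, scale = old_sd)
new_z <- scale(new, center = old_mean, scale = old_sd)

# The first 20 rows coincide, so the two datasets remain comparable
all.equal(new_z[1:20, ], old_z, check.attributes = FALSE)
```

Had the moments been recomputed on the new dataset, every standardized value would shift slightly and the old and new forecasts would no longer be directly comparable.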
The three estimation functions include a boolean argument re_estimate that indicates whether the model should be re-estimated (default) or not.
Note that for news analysis (see section 4.3), the model is kept unchanged between the previous and the current period to track the impact of news. Hence, to retrieve the same forecasts as those returned by the get_news() function, set re_estimate = FALSE and pass the previous standardization input in the argument input_standardization (see section 3.2).
In case of dynamic work, some R objects should be passed from one run to the next (see section 2.2). To do that, the user is invited to use the functions saveRDS() and readRDS() from base R.
What to save depends on whether the user intends to perform news analysis.
If the intention is not to perform news analysis and just to re-estimate the model each time and update the forecasts, only the estimated model should be saved from one run to the next. This is an object of class ‘JD3_DfmModel’, generated as part of the output of the function estimate_pca(), estimate_em() or estimate_ml(), where the default/previous estimates of the parameters are replaced by the new ones. The updated model is the element referred to as ‘dfm’ in the list returned by the estimation functions.
If the intention is to perform news analysis, the entire object/list returned by the function estimate_pca(), estimate_em() or estimate_ml(), i.e. an R object of class ‘JD3_DfmEstimates’, should be saved. Optionally, a matrix with the standardization input used at the time of the initial estimate (i.e. the mean and standard deviation used to standardize data) can be saved as well. At the time of the initial estimate, the formatted matrix containing this information can be found in the preprocessing section of the output of the function get_results() (see section 4.1). This can be used, for instance, to make the forecasts of the function get_forecasts() consistent with those of the function get_news().
Results are split into three parts.
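The function get_results() can be used to obtain results related to the estimated model, such as the preprocessing (standardization input) and the parameter estimates.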
The function get_results() has a single argument, which is an object of class ‘JD3_DfmEstimates’ typically generated by the function estimate_pca(), estimate_em() or estimate_ml(). It returns an object of class ‘JD3_DfmResults’, which is a list of the aforementioned output. A generic print() function can be applied to its output and returns (by default) nicely formatted results related to the parameter estimates.
The function get_forecasts() can be used to obtain forecasts of the variables, as well as the standard deviation of the forecast errors. You have access to both the forecasts of the transformed series (see section 3.2) and the raw series. As part of the output list, there is also extra output referred to as ‘forecasts_only’. This is just an extract of the forecasts of the raw series which contains only the forecasts, i.e. without the rest of the series. Note that for quarterly series (factor type “Q”), the forecast at the last month of the quarter should be the one considered. For instance, if the variable under consideration is made of quarterly growth rates, each forecast figure corresponds to the growth rate of the last three months compared with the three previous months (e.g. in August, it is the estimate of the growth rate between June-July-August and March-April-May).
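Selecting the relevant figures for a quarterly variable amounts to keeping the months that close a quarter. The base R sketch below does this on a made-up monthly series; the values are illustrative and are not output of get_forecasts().

```r
# Made-up monthly figures for a variable linked to a "Q" factor
fcst_monthly <- ts(
  c(0.3, 0.4, 0.5, 0.2, 0.1, 0.6, 0.4, 0.3, 0.2, 0.5, 0.4, 0.7),
  frequency = 12, start = c(2024, 1)
)

# cycle() gives the month number (1 to 12); quarters end in months 3, 6, 9, 12
is_quarter_end <- as.integer(cycle(fcst_monthly)) %% 3L == 0L

# Keep only the quarter-end figures and store them as a quarterly series
fcst_quarterly <- ts(fcst_monthly[is_quarter_end], frequency = 4, start = c(2024, 1))
fcst_quarterly
```

The quarterly start date is set by hand here because the monthly series begins in January; a series starting mid-quarter would need its first quarter computed accordingly.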
The function get_forecasts() has two arguments. One is an object of class ‘JD3_DfmEstimates’ typically generated by the function estimate_pca(), estimate_em() or estimate_ml(). The other is the number of forecasting periods to consider, starting from the most up-to-date variable.
Two generic functions can be applied to the object returned by the function get_forecasts(). A print() function will return (by default) the forecasts only. A plot() function can be used to visualize the series and the forecasts, as well as an 80% prediction interval around the forecasts.
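For reference, an 80% prediction interval can be derived from a forecast and its error standard deviation as sketched below in base R. The sketch assumes normally distributed forecast errors and uses made-up numbers; how the package itself builds the interval shown in the plot is not documented here.

```r
# Made-up forecasts and forecast error standard deviations for two periods
fcst <- c(0.40, 0.35)
fcst_sd <- c(0.50, 0.65)

# An 80% interval leaves 10% in each tail, hence the 90% normal quantile
z80 <- qnorm(0.90)

lower <- fcst - z80 * fcst_sd
upper <- fcst + z80 * fcst_sd
cbind(lower, fcst, upper)
```

The interval widens with the forecast horizon whenever the forecast error standard deviation increases, as in this example.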
There are two kinds of differences between two consecutive updates of a dataset: (1) newly released figures and (2) revisions of previously published figures.
The purpose of news analysis is to monitor the impact of (1) on the forecasts. Those impacts can be scrutinized in detail by using the get_news() function. This function displays the impact of the difference between the newly released figures and their forecast based on the revised figures (i.e. the old data + (2)).
The function get_news() has four arguments:
An object of class ‘JD3_DfmEstimates’ typically generated by the function estimate_pca(), estimate_em() or estimate_ml(). As the purpose of news analysis is to monitor the impact of newly released figures on forecasts, the model is kept unchanged between the previous and the new release. Hence, the previously estimated model should be the one specified here. Note that the pre-standardization of the data (see section 3.2) is also calculated based on the previous release.
The new dataset, which must be an mts object.
The target series, i.e. the variable of interest (argument target_series).
The number of forecasting periods to consider (argument n_fcst).
The list of output returned by the function get_news() contains the weights of the news, their impact and the forecasts for both the transformed (see section 3.2) and the raw series. The weight given to each piece of news represents its relevance for the variable of interest. The impacts are the weights of the news times their size. They give the impact of each piece of news on the forecast revisions of the variable of interest. Therefore, they allow users to understand how the revisions can be decomposed in terms of the news components for the various series. The generic plot() function can be used directly on an object of class ‘JD3_DfmNews’ (i.e. an object generated by the function get_news()) to quickly visualize all impacts with a nicely formatted bar chart. This is similar to what was included in the GUI add-in of JDemetra+ V2.x.
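The relationship between news, weights and impacts can be reproduced by hand. The base R sketch below uses made-up numbers (they are not output of get_news()) to show that each impact is a weight times the size of the news, and that the impacts add up to the part of the forecast revision explained by the news.

```r
# Made-up example with two newly released figures
news_tbl <- data.frame(
  series = c("Series 3", "Series 4"),
  expected = c(0.20, -0.10),  # forecasts of the new figures before release
  released = c(1.00, 1.00),   # figures actually released
  weight = c(0.15, 0.05)      # weight of each news for the target series
)

# News = released figure minus its forecast
news_tbl$news <- news_tbl$released - news_tbl$expected

# Impact = weight times size of the news
news_tbl$impact <- news_tbl$weight * news_tbl$news

# Forecast revision of the target series attributable to the news
total_revision <- sum(news_tbl$impact)
news_tbl
```

Reading the table row by row gives statements of the kind quoted in the introduction: a release higher than expected, combined with a positive weight, revises the forecast upwards.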
Finally, the forecasts returned by the function get_news() include:
In addition to the plot() function, there are two more generic functions that can be applied to an object of class ‘JD3_DfmNews’. The function summary() will give you a summary of the weights and impacts of each piece of news on the variable of interest for each forecasting period. The print() function returns the same table as the summary() function together with the information related to the forecasts.
Banbura, Marta and Modugno, Michele (2010) “Maximum Likelihood Estimation of Factor Models on Data Sets with Arbitrary Pattern of Missing Data”, ECB Working Paper Series No 1189.
De Antonio Liedo, David (2014) “Nowcasting Belgium”, Working Paper Research No 256.