Searches through the model space specified in the specials to identify the
best ARIMA model, with the lowest AIC, AICc or BIC value. It is implemented
using stats::arima() and allows ARIMA models to be used in the fable framework.
ARIMA(
  formula,
  ic = c("aicc", "aic", "bic"),
  selection_metric = function(x) x[[ic]],
  stepwise = TRUE,
  greedy = TRUE,
  approximation = NULL,
  order_constraint = p + q + P + Q <= 6 & (constant + d + D <= 2),
  unitroot_spec = unitroot_options(),
  trace = FALSE,
  ...
)
formula | Model specification (see "Specials" section). |
ic | The information criterion used in selecting the model. |
selection_metric | A function used to compute a metric from an Arima object, which is minimised to select the best model. |
stepwise | Should a stepwise search be used? (Stepwise can be much faster.) |
greedy | Should the stepwise search move to the next best option immediately? |
approximation | Should CSS (conditional sum of squares) be used during model selection? The default (NULL) will use the approximation if there are more than 150 observations or if the seasonal period is greater than 12. |
order_constraint | A logical predicate on the orders of p, d, q, P, D, Q and constant to consider in the search. See "Specials" for the meaning of these terms. |
unitroot_spec | A specification of unit root tests to use in the selection of d and D. See unitroot_options() for more details. |
trace | If TRUE, the selection_metric of estimated models in the selection procedure will be printed to the console. |
... | Further arguments for stats::arima(). |
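The default order_constraint can be read as an ordinary R predicate over candidate orders. The sketch below uses base R only and is an illustration of how such a predicate filters a grid; it is not how ARIMA() searches internally (in particular, d and D are chosen by unit root tests rather than enumerated). The names grid, keep and allowed are hypothetical.

```r
# Candidate orders matching the default pdq() and PDQ() ranges, plus the constant.
# This grid is an assumption made for illustration only.
grid <- expand.grid(
  p = 0:5, d = 0:2, q = 0:5,
  P = 0:2, D = 0:1, Q = 0:2,
  constant = c(FALSE, TRUE)
)

# Evaluate the default order_constraint predicate on every combination
keep <- with(grid, p + q + P + Q <= 6 & (constant + d + D <= 2))
allowed <- grid[keep, ]

nrow(grid)     # size of the unconstrained grid
nrow(allowed)  # combinations admitted by the predicate
```

A stricter predicate, such as p + q <= 3, could be checked the same way before passing it to order_constraint.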
A model specification.
The fable ARIMA() function uses an alternative parameterisation of
constants to stats::arima() and forecast::Arima(). While the
parameterisations are equivalent, the coefficients for the constant/mean
will differ.
In fable, if there are no exogenous regressors, the parameterisation used
is:
$$(1-\phi_1B - \cdots - \phi_p B^p)(1-B)^d y_t = c + (1 + \theta_1 B + \cdots + \theta_q B^q)\varepsilon_t$$
In stats and forecast, an ARIMA model is parameterised as:
$$(1-\phi_1B - \cdots - \phi_p B^p)(y_t' - \mu) = (1 + \theta_1 B + \cdots + \theta_q B^q)\varepsilon_t$$
where \(y_t' = (1-B)^d y_t\), \(\mu\) is the mean of \(y_t'\), and \(c = \mu(1-\phi_1 - \cdots - \phi_p )\).
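The relation between the mean and the constant can be checked numerically. The sketch below is a minimal base R illustration, assuming an AR(1) model fitted with stats::arima() to the built-in lh series; it converts the reported mean into the constant c using the formula above. The variable names phi, mu and constant are hypothetical.

```r
# Fit an AR(1) model with a mean term using stats::arima()
fit <- stats::arima(lh, order = c(1, 0, 0))

phi <- unname(coef(fit)["ar1"])
mu  <- unname(coef(fit)["intercept"])  # stats::arima() labels the mean "intercept"

# Constant under the parameterisation with c on the right-hand side:
# c = mu * (1 - phi_1) for an AR(1) model
constant <- mu * (1 - phi)

c(mean = mu, constant = constant)
```

For higher-order models the same conversion applies with the sum of all AR coefficients in place of phi.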
If there are exogenous regressors, fable uses the same parameterisation as
used in stats and forecast. That is, it fits a regression with ARIMA(p,d,q)
errors:
$$y_t = c + \beta' x_t + z_t$$
where \(\beta\) is a vector of regression coefficients, \(x_t\) is a vector of exogenous regressors at time \(t\), and \(z_t\) is an ARIMA(p,d,q) error process:
$$(1-\phi_1B - \cdots - \phi_p B^p)(1-B)^d z_t = (1 + \theta_1 B + \cdots + \theta_q B^q)\varepsilon_t$$
For details of the estimation algorithm, see the arima function in the stats package.
The specials define the space over which ARIMA will search for the model that best fits the data. If the RHS of formula is left blank, the default search space is given by pdq() + PDQ(): that is, a model with candidate seasonal and nonseasonal terms, but no exogenous regressors. Note that a seasonal model requires at least 2 full seasons of data; if this is not available, ARIMA will revert to a nonseasonal model with a warning.
To specify a model fully (avoid automatic selection), the intercept and pdq()/PDQ() values must be specified. For example, formula = response ~ 1 + pdq(1, 1, 1) + PDQ(1, 0, 0).
The pdq special is used to specify non-seasonal components of the model.
pdq(p = 0:5, d = 0:2, q = 0:5,
p_init = 2, q_init = 2, fixed = list())
p | The order of the non-seasonal auto-regressive (AR) terms. If multiple values are provided, the one which minimises ic will be chosen. |
d | The order of integration for non-seasonal differencing. If multiple values are provided, one of the values will be selected via repeated KPSS tests. |
q | The order of the non-seasonal moving average (MA) terms. If multiple values are provided, the one which minimises ic will be chosen. |
p_init | If stepwise = TRUE, p_init provides the initial value for p for the stepwise search procedure. |
q_init | If stepwise = TRUE, q_init provides the initial value for q for the stepwise search procedure. |
fixed | A named list of fixed parameters for coefficients. The names identify the coefficient, beginning with either ar or ma, followed by the lag order. For example, fixed = list(ar1 = 0.3, ma2 = 0). |
The PDQ special is used to specify seasonal components of the model. To force a non-seasonal fit, specify PDQ(0, 0, 0) in the RHS of the model formula. Note that simply omitting PDQ from the formula will not result in a non-seasonal fit.
PDQ(P = 0:2, D = 0:1, Q = 0:2, period = NULL,
P_init = 1, Q_init = 1, fixed = list())
P | The order of the seasonal auto-regressive (SAR) terms. If multiple values are provided, the one which minimises ic will be chosen. |
D | The order of integration for seasonal differencing. If multiple values are provided, one of the values will be selected via repeated heuristic tests (based on strength of seasonality from an STL decomposition). |
Q | The order of the seasonal moving average (SMA) terms. If multiple values are provided, the one which minimises ic will be chosen. |
period | The periodic nature of the seasonality. This can be either a number indicating the number of observations in each seasonal period, or text to indicate the duration of the seasonal window (for example, annual seasonality would be "1 year"). |
P_init | If stepwise = TRUE, P_init provides the initial value for P for the stepwise search procedure. |
Q_init | If stepwise = TRUE, Q_init provides the initial value for Q for the stepwise search procedure. |
fixed | A named list of fixed parameters for coefficients. The names identify the coefficient, beginning with either sar or sma, followed by the lag order. For example, fixed = list(sar1 = 0.1). |
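As a sketch of how the pdq and PDQ specials combine in a formula, the example below fits two models to the built-in USAccDeaths series: one that searches non-seasonal orders only by including PDQ(0, 0, 0), and one that is fully specified using the formula pattern given earlier. It assumes the fable and tsibble packages are installed; the model names nonseasonal and specified and the objects deaths and fits are hypothetical.

```r
library(fable)
library(tsibble)

# Convert the built-in monthly ts object to a tsibble (column names: index, value)
deaths <- as_tsibble(USAccDeaths)

fits <- model(
  deaths,
  # Automatic search over non-seasonal orders; PDQ(0, 0, 0) rules out seasonal terms
  nonseasonal = ARIMA(log(value) ~ pdq() + PDQ(0, 0, 0)),
  # Constant, pdq() and PDQ() orders all given, so no automatic selection is needed
  specified = ARIMA(log(value) ~ 1 + pdq(1, 1, 1) + PDQ(1, 0, 0))
)

fits
```

Printing the mable shows which orders were chosen for the searched model; report() can then be applied to a single model column for coefficient details.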
Exogenous regressors can be included in an ARIMA model without explicitly using the xreg() special. Common exogenous regressor specials as specified in common_xregs can also be used. These regressors are handled using stats::model.frame(), and so interactions and other functionality behave similarly to stats::lm().
The inclusion of a constant in the model follows similar rules to stats::lm(), where including 1 will add a constant and 0 or -1 will remove the constant. If left out, the inclusion of a constant will be determined by minimising ic.
xreg(..., fixed = list())
... | Bare expressions for the exogenous regressors (such as log(x)). |
fixed | A named list of fixed parameters for coefficients. The names identify the coefficient, and should match the name of the regressor. For example, fixed = list(constant = 20). |
# Manual ARIMA specification
library(fable)
library(tsibble)
USAccDeaths %>%
  as_tsibble() %>%
  model(arima = ARIMA(log(value) ~ 0 + pdq(0, 1, 1) + PDQ(0, 1, 1))) %>%
  report()
#> Series: value
#> Model: ARIMA(0,1,1)(0,1,1)[12]
#> Transformation: log(value)
#>
#> Coefficients:
#>           ma1     sma1
#>       -0.4713  -0.5926
#> s.e.   0.1230   0.1933
#>
#> sigma^2 estimated as 0.001379: log likelihood=109.31
#> AIC=-212.63 AICc=-212.19 BIC=-206.39
# Automatic ARIMA specification
library(tsibble)
#>
#> Attaching package: ‘tsibble’
#> The following objects are masked from ‘package:base’:
#>
#> intersect, setdiff, union
library(dplyr)
#>
#> Attaching package: ‘dplyr’
#> The following objects are masked from ‘package:stats’:
#>
#> filter, lag
#> The following objects are masked from ‘package:base’:
#>
#> intersect, setdiff, setequal, union
tsibbledata::global_economy %>%
  filter(Country == "Australia") %>%
  model(ARIMA(log(GDP) ~ Population))
#> Warning: NaNs produced
#> # A mable: 1 x 2
#> # Key: Country [1]
#> Country `ARIMA(log(GDP) ~ Population)`
#> <fct> <model>
#> 1 Australia <LM w/ ARIMA(2,0,0) errors>