Estimate an ARIMA model

Searches the model space defined by the specials for the ARIMA model with the lowest AIC, AICc or BIC value. Estimation is done with stats::arima(), which allows ARIMA models to be used in the fable framework.

ARIMA(
  formula,
  ic = c("aicc", "aic", "bic"),
  selection_metric = function(x) x[[ic]],
  stepwise = TRUE,
  greedy = TRUE,
  approximation = NULL,
  order_constraint = p + q + P + Q <= 6 & (constant + d + D <= 2),
  unitroot_spec = unitroot_options(),
  trace = FALSE,
  ...
)

Arguments

formula: Model specification (see "Specials" section).
ic: The information criterion used in selecting the model.
selection_metric: A function used to compute a metric from an Arima object which is minimised to select the best model.
stepwise: Should a stepwise search be used? (A stepwise search can be much faster than an exhaustive one.)
greedy: Should the stepwise search move to the next best option immediately?
approximation: Should CSS (conditional sum of squares) be used during model selection? The default (NULL) will use the approximation if there are more than 150 observations or if the seasonal period is greater than 12.
order_constraint: A logical predicate on the orders of p, d, q, P, D, Q and constant to consider in the search. See "Specials" for the meaning of these terms.
unitroot_spec: A specification of unit root tests to use in the selection of d and D. See unitroot_options() for more details.
trace: If TRUE, the selection_metric of each model estimated during the selection procedure is printed to the console.
...: Further arguments for stats::arima()
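
As a rough illustration of how these arguments combine, the sketch below runs an exhaustive (non-stepwise) search ranked by BIC, with exact rather than approximate likelihoods and a tighter order constraint than the default. It assumes the fable and tsibble packages are installed and uses the built-in USAccDeaths series. The candidate orders and the constraint are arbitrary choices for demonstration, not recommendations.

```r
library(fable)
library(tsibble)

# Monthly accidental deaths in the USA, converted from ts to tsibble
deaths <- as_tsibble(USAccDeaths)

fit <- model(
  deaths,
  arima = ARIMA(
    # Candidate orders are restricted in the formula to keep the search small
    log(value) ~ pdq(0:2, 1, 0:2) + PDQ(0:1, 1, 0:1),
    ic = "bic",             # rank candidate models by BIC
    stepwise = FALSE,       # fit every candidate instead of a stepwise walk
    approximation = FALSE,  # do not use the CSS approximation during selection
    # Illustrative constraint: at most 3 AR/MA terms in total
    order_constraint = p + q + P + Q <= 3 & (constant + d + D <= 2)
  )
)
report(fit)
```

Setting trace = TRUE in the same call would additionally print the selection metric of each candidate as it is fitted.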

Value

A model specification.

Parameterisation

The fable ARIMA() function uses an alternative parameterisation of constants to stats::arima() and forecast::Arima(). While the parameterisations are equivalent, the coefficients for the constant/mean will differ.

In fable, if there are no exogenous regressors, the parameterisation used is:

$$(1-\phi_1B - \cdots - \phi_p B^p)(1-B)^d y_t = c + (1 + \theta_1 B + \cdots + \theta_q B^q)\varepsilon_t$$

In stats and forecast, an ARIMA model is parameterised as:

$$(1-\phi_1B - \cdots - \phi_p B^p)(y_t' - \mu) = (1 + \theta_1 B + \cdots + \theta_q B^q)\varepsilon_t$$

where $y_t' = (1-B)^d y_t$ is the differenced series, $\mu$ is the mean of $y_t'$, and $c = \mu(1-\phi_1 - \cdots - \phi_p)$.
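
The relationship between the two parameterisations can be checked numerically. The following minimal sketch uses only base R: it fits an AR(1) model with stats::arima() to simulated data (hypothetical, generated here purely for illustration) and converts the reported mean into the constant that the fable parameterisation would use.

```r
# Simulate an AR(1) series with mean 10 (hypothetical data for illustration)
set.seed(1)
y <- 10 + arima.sim(model = list(ar = 0.6), n = 500)

# stats::arima() reports the mean mu under the coefficient name "intercept"
fit <- stats::arima(y, order = c(1, 0, 0))
mu  <- coef(fit)[["intercept"]]
phi <- coef(fit)[["ar1"]]

# Constant in the fable parameterisation: c = mu * (1 - phi_1)
c_fable <- mu * (1 - phi)
c(mu = mu, phi = phi, constant = c_fable)
```

With more AR terms, the same conversion subtracts every AR coefficient, as in the formula for $c$ above.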

If there are exogenous regressors, fable uses the same parameterisation as used in stats and forecast. That is, it fits a regression with ARIMA(p,d,q) errors:

$$y_t = c + \beta' x_t + z_t$$

where $\beta$ is a vector of regression coefficients, $x_t$ is a vector of exogenous regressors at time $t$, and $z_t$ is an ARIMA(p,d,q) error process:

$$(1-\phi_1B - \cdots - \phi_p B^p)(1-B)^d z_t = (1 + \theta_1 B + \cdots + \theta_q B^q)\varepsilon_t$$

For details of the estimation algorithm, see the arima function in the stats package.

Specials

The specials define the space over which ARIMA will search for the model that best fits the data. If the RHS of formula is left blank, the default search space is given by pdq() + PDQ(): that is, a model with candidate seasonal and nonseasonal terms, but no exogenous regressors. Note that a seasonal model requires at least 2 full seasons of data; if this is not available, ARIMA will revert to a nonseasonal model with a warning.

To specify a model fully (and so avoid automatic selection), the intercept and the pdq()/PDQ() values must all be specified. For example, formula = response ~ 1 + pdq(1, 1, 1) + PDQ(1, 0, 0).
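
The contrast between a fully specified model and an automatic search can be sketched as follows, assuming the fable and tsibble packages are installed. The model names manual and auto are arbitrary labels chosen for this example.

```r
library(fable)
library(tsibble)

deaths <- as_tsibble(USAccDeaths)

fits <- model(
  deaths,
  # Fully specified: constant, non-seasonal and seasonal orders are all given
  manual = ARIMA(log(value) ~ 1 + pdq(1, 1, 1) + PDQ(1, 0, 0)),
  # Blank RHS: the search space defaults to pdq() + PDQ()
  auto = ARIMA(log(value))
)
fits
```

Printing the resulting mable shows the orders that were fixed by hand in one column and the orders chosen by the search in the other.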

pdq

The pdq special is used to specify non-seasonal components of the model.


pdq(p = 0:5, d = 0:2, q = 0:5,
    p_init = 2, q_init = 2, fixed = list())

`p`	The order of the non-seasonal auto-regressive (AR) terms. If multiple values are provided, the one which minimises `ic` will be chosen.
`d`	The order of integration for non-seasonal differencing. If multiple values are provided, one of the values will be selected via repeated KPSS tests.
`q`	The order of the non-seasonal moving average (MA) terms. If multiple values are provided, the one which minimises `ic` will be chosen.
`p_init`	If `stepwise = TRUE`, `p_init` provides the initial value for `p` for the stepwise search procedure.
`q_init`	If `stepwise = TRUE`, `q_init` provides the initial value for `q` for the stepwise search procedure.
`fixed`	A named list of fixed parameters for coefficients. The names identify the coefficient, beginning with either `ar` or `ma`, followed by the lag order. For example, `fixed = list(ar1 = 0.3, ma2 = 0)`.
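
A short sketch of the pdq special in use, assuming the fable and tsibble packages are installed and a fable version whose pdq() accepts the fixed argument documented above. The first model narrows the non-seasonal search space; the second holds the MA(1) coefficient at an arbitrary illustrative value instead of estimating it.

```r
library(fable)
library(tsibble)

deaths <- as_tsibble(USAccDeaths)

fits <- model(
  deaths,
  # Search p and q over 0:2 with one non-seasonal difference
  restricted = ARIMA(log(value) ~ pdq(p = 0:2, d = 1, q = 0:2) + PDQ(0, 1, 1)),
  # Fix ma1 at -0.5 (illustrative value); sma1 is still estimated
  fixed_ma = ARIMA(
    log(value) ~ 0 + pdq(0, 1, 1, fixed = list(ma1 = -0.5)) + PDQ(0, 1, 1)
  )
)
fits
```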

PDQ

The PDQ special is used to specify seasonal components of the model. To force a non-seasonal fit, specify PDQ(0, 0, 0) in the RHS of the model formula. Note that simply omitting PDQ from the formula will not result in a non-seasonal fit.


PDQ(P = 0:2, D = 0:1, Q = 0:2, period = NULL,
    P_init = 1, Q_init = 1, fixed = list())

`P`	The order of the seasonal auto-regressive (SAR) terms. If multiple values are provided, the one which minimises `ic` will be chosen.
`D`	The order of integration for seasonal differencing. If multiple values are provided, one of the values will be selected via repeated heuristic tests (based on strength of seasonality from an STL decomposition).
`Q`	The order of the seasonal moving average (SMA) terms. If multiple values are provided, the one which minimises `ic` will be chosen.
`period`	The periodic nature of the seasonality. This can be either a number indicating the number of observations in each seasonal period, or text to indicate the duration of the seasonal window (for example, annual seasonality would be "1 year").
`P_init`	If `stepwise = TRUE`, `P_init` provides the initial value for `P` for the stepwise search procedure.
`Q_init`	If `stepwise = TRUE`, `Q_init` provides the initial value for `Q` for the stepwise search procedure.
`fixed`	A named list of fixed parameters for coefficients. The names identify the coefficient, beginning with either `sar` or `sma`, followed by the lag order. For example, `fixed = list(sar1 = 0.1)`.
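
The following sketch shows the PDQ special forcing a non-seasonal fit and stating the seasonal period in both accepted forms. It assumes the fable and tsibble packages are installed; the model names are arbitrary labels.

```r
library(fable)
library(tsibble)

deaths <- as_tsibble(USAccDeaths)  # monthly data

fits <- model(
  deaths,
  # PDQ(0, 0, 0) is required to rule out seasonal terms
  nonseasonal = ARIMA(log(value) ~ pdq() + PDQ(0, 0, 0)),
  # Period given as a number of observations per season
  period_num = ARIMA(log(value) ~ pdq() + PDQ(period = 12)),
  # Period given as text describing the seasonal window
  period_txt = ARIMA(log(value) ~ pdq() + PDQ(period = "1 year"))
)
fits
```

For monthly data the two period specifications are intended to describe the same annual seasonality.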

xreg

Exogenous regressors can be included in an ARIMA model without explicitly using the xreg() special. Common exogenous regressor specials as specified in common_xregs can also be used. These regressors are handled using stats::model.frame(), so interactions and other formula functionality behave as they do in stats::lm().

The inclusion of a constant in the model follows rules similar to those of stats::lm(): including 1 adds a constant, and 0 or -1 removes it. If left out, the inclusion of a constant will be determined by minimising ic.


xreg(..., fixed = list())

`...`	Bare expressions for the exogenous regressors (such as `log(x)`).
`fixed`	A named list of fixed parameters for coefficients. The names identify the coefficient, and should match the name of the regressor. For example, `fixed = list(constant = 20)`.
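
As a minimal sketch of both ways to supply regressors, the example below fits one model with a regressor written directly in the formula and another using the xreg() special with a transformed regressor. It assumes the fable, dplyr and tsibbledata packages are installed; the choice of Population as a regressor is only for illustration.

```r
library(fable)

# Annual economic indicators for Australia
aus <- dplyr::filter(tsibbledata::global_economy, Country == "Australia")

fits <- model(
  aus,
  # Regressor written directly on the RHS of the formula
  implicit = ARIMA(log(GDP) ~ Population),
  # Same idea through xreg(), here with a transformed regressor
  explicit = ARIMA(log(GDP) ~ xreg(log(Population)))
)
fits
```

Because no 1 or 0 appears in either formula, whether a constant is included is left to the selection procedure.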

Examples

# Manual ARIMA specification
library(fable)
USAccDeaths %>%
  as_tsibble() %>%
  model(arima = ARIMA(log(value) ~ 0 + pdq(0, 1, 1) + PDQ(0, 1, 1))) %>%
  report()
#> Series: value 
#> Model: ARIMA(0,1,1)(0,1,1)[12] 
#> Transformation: log(value) 
#> 
#> Coefficients:
#>           ma1     sma1
#>       -0.4713  -0.5926
#> s.e.   0.1230   0.1933
#> 
#> sigma^2 estimated as 0.001379:  log likelihood=109.31
#> AIC=-212.63   AICc=-212.19   BIC=-206.39

# Automatic ARIMA specification
library(tsibble)
#> 
#> Attaching package: ‘tsibble’
#> The following objects are masked from ‘package:base’:
#> 
#>     intersect, setdiff, union
library(dplyr)
#> 
#> Attaching package: ‘dplyr’
#> The following objects are masked from ‘package:stats’:
#> 
#>     filter, lag
#> The following objects are masked from ‘package:base’:
#> 
#>     intersect, setdiff, setequal, union
tsibbledata::global_economy %>%
  filter(Country == "Australia") %>%
  model(ARIMA(log(GDP) ~ Population))
#> Warning: NaNs produced
#> # A mable: 1 x 2
#> # Key:     Country [1]
#>   Country   `ARIMA(log(GDP) ~ Population)`
#>   <fct>                            <model>
#> 1 Australia    <LM w/ ARIMA(2,0,0) errors>