Searches through the model space specified in the specials to identify the best ARIMA model, that is, the one with the lowest AIC, AICc or BIC value. It is implemented using stats::arima() and allows ARIMA models to be used in the fable framework.

```r
ARIMA(
  formula,
  ic = c("aicc", "aic", "bic"),
  selection_metric = function(x) x[[ic]],
  stepwise = TRUE,
  greedy = TRUE,
  approximation = NULL,
  order_constraint = p + q + P + Q <= 6 & (constant + d + D <= 2),
  unitroot_spec = unitroot_options(),
  trace = FALSE,
  ...
)
```
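To get a feel for how large the default search space is, the base-R sketch below enumerates every combination of orders allowed by the default pdq() and PDQ() ranges (listed under "Specials") and keeps those satisfying the default order_constraint. It only illustrates the predicate: encoding constant as 0/1 is an assumption for this illustration, and in practice d and D are usually settled by unit root tests before the search, while stepwise = TRUE visits far fewer models than this grid.

```r
# Candidate orders taken from the default pdq() and PDQ() ranges;
# `constant` is encoded as 0/1 (an assumption for this illustration).
grid <- expand.grid(
  p = 0:5, d = 0:2, q = 0:5,
  P = 0:2, D = 0:1, Q = 0:2,
  constant = 0:1
)

# The default order_constraint from the ARIMA() usage, evaluated row-wise
allowed <- with(grid, p + q + P + Q <= 6 & (constant + d + D <= 2))
space <- grid[allowed, ]

nrow(grid)   # 3888 combinations before the constraint
nrow(space)  # 1112 combinations satisfy it
```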

## Arguments

- formula: Model specification (see "Specials" section).
- ic: The information criterion used in selecting the model.
- selection_metric: A function used to compute a metric from an Arima object, which is minimised to select the best model.
- stepwise: Should stepwise selection be used? (Stepwise selection can be much faster.)
- greedy: Should the stepwise search move to the next best option immediately?
- approximation: Should CSS (conditional sum of squares) be used during model selection? The default (NULL) uses the approximation if there are more than 150 observations or if the seasonal period is greater than 12.
- order_constraint: A logical predicate on the orders of p, d, q, P, D, Q and constant to consider in the search. See "Specials" for the meaning of these terms.
- unitroot_spec: A specification of unit root tests to use in the selection of d and D. See unitroot_options() for more details.
- trace: If TRUE, the selection_metric of each model estimated during the selection procedure is printed to the console.
- ...: Further arguments for stats::arima().
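The role of ic and stepwise = FALSE can be illustrated with an exhaustive search in base R: fit every candidate order and keep the one with the smallest AICc. This is a minimal sketch, not fable's implementation; the helpers aicc() and search_arima() are hypothetical, the small (p, q) grid is an arbitrary choice, and fits that fail are scored as Inf so they are never selected.

```r
# AICc for a stats::arima() fit: AIC plus a small-sample correction.
# k counts the freely estimated coefficients plus one for the innovation variance.
aicc <- function(fit) {
  k <- sum(fit$mask) + 1
  n <- fit$nobs
  fit$aic + 2 * k * (k + 1) / (n - k - 1)
}

# Exhaustive search over a small (p, q) grid, roughly what stepwise = FALSE does.
search_arima <- function(y, p_max = 2, q_max = 2, d = 0) {
  grid <- expand.grid(p = 0:p_max, q = 0:q_max)
  grid$aicc <- mapply(function(p, q) {
    fit <- tryCatch(arima(y, order = c(p, d, q), method = "ML"),
                    error = function(e) NULL)
    if (is.null(fit)) Inf else aicc(fit)
  }, grid$p, grid$q)
  grid[which.min(grid$aicc), ]   # row with the lowest AICc
}

# lh is a built-in monthly hormone series from the datasets package
best <- search_arima(lh)
best
```

A stepwise search would instead start from an initial order (see p_init and q_init below) and only evaluate neighbouring models, which is why it can be much faster on large grids.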

## Value

A model specification.

## Parameterisation

The fable ARIMA() function uses an alternate parameterisation of constants to stats::arima() and forecast::Arima(). While the parameterisations are equivalent, the coefficients for the constant/mean will differ.

In fable, the parameterisation used is:

$$(1-\phi_1B - \cdots - \phi_p B^p)(1-B)^d y_t = c + (1 + \theta_1 B + \cdots + \theta_q B^q)\varepsilon_t$$

In stats and forecast, an ARIMA model is parameterised as:

$$(1-\phi_1B - \cdots - \phi_p B^p)(y_t' - \mu) = (1 + \theta_1 B + \cdots + \theta_q B^q)\varepsilon_t$$

where $y_t' = (1-B)^d y_t$ is the differenced series, $\mu$ is the mean of $y_t'$, and $c = \mu(1-\phi_1 - \cdots - \phi_p)$.
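The conversion between the two parameterisations can be checked numerically with stats::arima(), which reports the mean $\mu$ as "intercept". The sketch below uses simulated data and d = 0, because stats::arima() does not estimate a mean for differenced models; the fable-style constant is then recovered as $c = \mu(1 - \phi_1 - \phi_2)$.

```r
set.seed(42)
# Simulated stationary AR(2) series with mean 10 (illustrative data only)
y <- 10 + arima.sim(model = list(ar = c(0.5, -0.2)), n = 200)

# stats::arima() estimates the mean mu and labels it "intercept"
fit <- arima(y, order = c(2, 0, 0), include.mean = TRUE)
phi <- coef(fit)[c("ar1", "ar2")]
mu <- coef(fit)[["intercept"]]

# Convert to fable's constant: c = mu * (1 - phi_1 - ... - phi_p)
c_fable <- mu * (1 - sum(phi))
c(mu = mu, c = c_fable)
```

With the true coefficients the constant would be $10 \times (1 - 0.5 + 0.2) = 7$; the estimated values will be close to, but not exactly, 10 and 7.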

## Specials

The specials define the space over which ARIMA will search for the model that best fits the data. If the RHS of formula is left blank, the default search space is given by pdq() + PDQ(): that is, a model with candidate seasonal and nonseasonal terms, but no exogenous regressors. Note that a seasonal model requires at least 2 full seasons of data; if this is not available, ARIMA will revert to a nonseasonal model with a warning.

To specify a model fully (avoid automatic selection), the intercept and pdq()/PDQ() values must be specified. For example, formula = response ~ 1 + pdq(1, 1, 1) + PDQ(1, 0, 0).

### pdq

The pdq special is used to specify non-seasonal components of the model.

```r
pdq(p = 0:5, d = 0:2, q = 0:5,
    p_init = 2, q_init = 2, fixed = list())
```

- p: The order of the non-seasonal auto-regressive (AR) terms. If multiple values are provided, the one which minimises ic will be chosen.
- d: The order of integration for non-seasonal differencing. If multiple values are provided, one of the values will be selected via repeated KPSS tests.
- q: The order of the non-seasonal moving average (MA) terms. If multiple values are provided, the one which minimises ic will be chosen.
- p_init: If stepwise = TRUE, p_init provides the initial value for p for the stepwise search procedure.
- q_init: If stepwise = TRUE, q_init provides the initial value for q for the stepwise search procedure.
- fixed: A named list of fixed parameters for coefficients. The names identify the coefficient, beginning with either ar or ma, followed by the lag order. For example, fixed = list(ar1 = 0.3, ma2 = 0).
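The idea behind selecting d by repeated KPSS tests can be sketched as: test for stationarity and, while the test rejects, difference once more. The base-R version below hand-rolls the KPSS level-stationarity statistic with a Bartlett-weighted long-run variance and compares it with the 5% critical value 0.463. It is an illustration only; fable delegates to the tests configured through unitroot_options(), so its lag choice and p-value handling may differ, and kpss_stat() and choose_d() are hypothetical helpers.

```r
# KPSS statistic for level stationarity (null hypothesis: stationary)
kpss_stat <- function(x) {
  x <- as.numeric(x)
  n <- length(x)
  e <- x - mean(x)                  # residuals from a level-only model
  s <- cumsum(e)                    # partial sums of the residuals
  l <- trunc(4 * (n / 100)^0.25)    # "short" lag truncation
  lrv <- sum(e^2) / n               # lag-0 autocovariance
  for (k in seq_len(l)) {
    w <- 1 - k / (l + 1)            # Bartlett weight
    lrv <- lrv + 2 * w * sum(e[(k + 1):n] * e[1:(n - k)]) / n
  }
  sum(s^2) / (n^2 * lrv)
}

# Difference until the KPSS test no longer rejects at 5%, up to d_max times
choose_d <- function(x, d_max = 2, crit = 0.463) {
  d <- 0
  while (d < d_max && kpss_stat(x) > crit) {
    x <- diff(x)
    d <- d + 1
  }
  d
}

set.seed(123)
choose_d(cumsum(rnorm(300)))  # a random walk, which typically needs d = 1
choose_d(rnorm(300))          # white noise, which typically needs d = 0
```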

### PDQ

The PDQ special is used to specify seasonal components of the model. To force a non-seasonal fit, specify PDQ(0, 0, 0) in the RHS of the model formula. Note that simply omitting PDQ from the formula will not result in a non-seasonal fit.

```r
PDQ(P = 0:2, D = 0:1, Q = 0:2, period = NULL,
    P_init = 1, Q_init = 1, fixed = list())
```

- P: The order of the seasonal auto-regressive (SAR) terms. If multiple values are provided, the one which minimises ic will be chosen.
- D: The order of integration for seasonal differencing. If multiple values are provided, one of the values will be selected via repeated heuristic tests (based on strength of seasonality from an STL decomposition).
- Q: The order of the seasonal moving average (SMA) terms. If multiple values are provided, the one which minimises ic will be chosen.
- period: The periodic nature of the seasonality. This can be either a number indicating the number of observations in each seasonal period, or text to indicate the duration of the seasonal window (for example, annual seasonality would be "1 year").
- P_init: If stepwise = TRUE, P_init provides the initial value for P for the stepwise search procedure.
- Q_init: If stepwise = TRUE, Q_init provides the initial value for Q for the stepwise search procedure.
- fixed: A named list of fixed parameters for coefficients. The names identify the coefficient, beginning with either sar or sma, followed by the lag order. For example, fixed = list(sar1 = 0.1).
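A heuristic of the kind used to select D can be sketched with base R's stl(): measure the strength of seasonality as $F_S = \max(0, 1 - \mathrm{Var}(R_t)/\mathrm{Var}(S_t + R_t))$, where $S_t$ and $R_t$ are the seasonal and remainder components, and take a seasonal difference while it exceeds a threshold. The threshold 0.64 is an assumption for this illustration (to our knowledge it matches the default "seas" test in forecast::nsdiffs()), and seasonal_strength() and choose_D() are hypothetical helpers. Note that stl() needs at least two full seasonal periods, in line with the two-season requirement described under "Specials".

```r
# Strength of seasonality from an STL decomposition:
# F_S = max(0, 1 - Var(remainder) / Var(seasonal + remainder))
seasonal_strength <- function(x) {
  comp <- stl(x, s.window = "periodic")$time.series
  s <- comp[, "seasonal"]
  r <- comp[, "remainder"]
  max(0, 1 - var(r) / var(s + r))
}

# Take seasonal differences while seasonality is "strong", up to D_max times.
# The 0.64 threshold is an assumption for this illustration.
choose_D <- function(x, D_max = 1, threshold = 0.64) {
  D <- 0
  while (D < D_max && seasonal_strength(x) > threshold) {
    x <- diff(x, lag = frequency(x))   # seasonal difference, keeps the ts frequency
    D <- D + 1
  }
  D
}

choose_D(USAccDeaths)                      # strongly seasonal monthly series
set.seed(1)
choose_D(ts(rnorm(120), frequency = 12))   # monthly white noise, no seasonality
```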

### xreg

Exogenous regressors can be included in an ARIMA model without explicitly using the xreg() special. Common exogenous regressor specials as specified in common_xregs can also be used. These regressors are handled using stats::model.frame(), and so interactions and other functionality behave similarly to stats::lm().

The inclusion of a constant in the model follows rules similar to those of stats::lm(), where including 1 will add a constant and 0 or -1 will remove the constant. If left out, the inclusion of a constant will be determined by minimising ic.

```r
xreg(..., fixed = list())
```

- ...: Bare expressions for the exogenous regressors (such as log(x)).
- fixed: A named list of fixed parameters for coefficients. The names identify the coefficient, and should match the name of the regressor. For example, fixed = list(constant = 20).
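For intuition about regression with ARIMA errors and about fixing a coefficient, here is a base-R analogue using the xreg and fixed arguments of stats::arima(). It is not fable's API: in stats::arima(), fixed is a numeric vector ordered as ARMA coefficients, then the intercept, then the regressors, with NA marking parameters to estimate. With pure MA errors and d = 0 the intercept coincides with fable's constant, so holding it at 20 mirrors fixed = list(constant = 20). The data are simulated for illustration.

```r
set.seed(1)
n <- 200
x <- rnorm(n)                                   # exogenous regressor
e <- arima.sim(model = list(ma = 0.5), n = n)   # MA(1) errors
y <- 20 + 3 * x + e

# Regression with MA(1) errors; coefficients are ma1, intercept, x
fit_free <- arima(y, order = c(0, 0, 1), xreg = cbind(x = x))

# Same model with the intercept held at 20 (NA = estimate freely)
fit_fixed <- arima(y, order = c(0, 0, 1), xreg = cbind(x = x),
                   fixed = c(NA, 20, NA))

coef(fit_free)
coef(fit_fixed)
```

The estimated slope on x should be close to the true value of 3 in both fits, while the fixed fit reports an intercept of exactly 20.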

## Examples

```r
library(fable)
library(tsibble)

# Manual ARIMA specification
USAccDeaths %>%
  as_tsibble() %>%
  model(arima = ARIMA(log(value) ~ 0 + pdq(0, 1, 1) + PDQ(0, 1, 1))) %>%
  report()
#> Series: value
#> Model: ARIMA(0,1,1)(0,1,1)
#> Transformation: log(value)
#>
#> Coefficients:
#>           ma1     sma1
#>       -0.4713  -0.5926
#> s.e.   0.1230   0.1933
#>
#> sigma^2 estimated as 0.001379:  log likelihood=109.31
#> AIC=-212.63   AICc=-212.19   BIC=-206.39
```

```r
# Automatic ARIMA specification
library(dplyr)

tsibbledata::global_economy %>%
  filter(Country == "Australia") %>%
  model(ARIMA(log(GDP) ~ Population))
#> Warning: NaNs produced
#> # A mable: 1 x 2
#> # Key:     Country
#>   Country   ARIMA(log(GDP) ~ Population)
#>   <fct>                            <model>
#> 1 Australia    <LM w/ ARIMA(2,0,0) errors>
```