Determining ARIMA Model Orders (p and q) for Hong Kong Airport Traffic Data

A Comprehensive Guide to Identifying ARIMA Parameters Using ACF and PACF

Key Takeaways

Understanding ACF and PACF: Crucial tools for identifying the autoregressive (p) and moving average (q) components in ARIMA models.
Step-by-Step Analysis: Systematic approach involving data transformation, differencing, and plot interpretation to ensure stationarity and model adequacy.
Model Validation: Importance of using statistical tests and information criteria like AIC and BIC to validate and select the optimal ARIMA model.

Introduction to ARIMA Models

The Autoregressive Integrated Moving Average (ARIMA) model is a cornerstone in time series forecasting, particularly renowned for its flexibility in capturing different types of temporal dependencies. The model is characterized by three primary parameters: p (autoregressive order), d (degree of differencing), and q (moving average order). Determining the appropriate values for p and q is essential for building an effective model that accurately captures the underlying patterns in the data.

Data Preparation and Initial Analysis

Loading and Structuring the Data

The accompanying R code prepares the Hong Kong airport traffic data for analysis. The data, spanning January 1998 to December 2016, is converted into a time series object with a monthly frequency (12 observations per year). This structure matters because ARIMA models require observations in a clear temporal order at a consistent frequency.

Visual Inspection for Trend and Seasonality

Plotting the time series is the first step in understanding the data's behavior. The plot reveals an overall upward trend and noticeable seasonal fluctuations, that is, patterns that recur at fixed intervals, here a yearly cycle in the monthly airport traffic figures. Recognizing these patterns guides the subsequent transformations needed to achieve stationarity, a prerequisite for ARIMA modeling.

Ensuring Stationarity Through Transformation

Log Transformation

To stabilize the variance and mitigate heteroscedasticity (i.e., non-constant variance), a logarithmic transformation is applied to the time series. This transformation is particularly effective in converting exponential growth trends into linear trends, facilitating easier modeling and interpretation.

Differencing for Mean Stationarity

First-order differencing is employed to remove linear trends, so that the series has a constant mean over time. The Augmented Dickey-Fuller (ADF) test is then run on the differenced series to check stationarity statistically. A p-value below the chosen significance level (typically 0.05) allows us to reject the null hypothesis of a unit root, indicating that the series is stationary in its mean.

Seasonal Differencing

Given the evident seasonal patterns, seasonal differencing with a lag of 12 (the number of months in the yearly cycle) is performed. This step removes the recurring seasonal effects, further stabilizing the series and preparing it for ARIMA modeling. The ADF test is applied again to verify stationarity after seasonal differencing.
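The three transformations above can be sketched in a few lines. The snippet below is a minimal illustration in plain Python rather than the R code the text refers to (which is not shown); the `traffic` values are a made-up series with exponential growth and a yearly cycle, not the Hong Kong data, and `difference` is a helper name chosen for this example.

```python
import math

def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t where both terms exist."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

# Hypothetical monthly counts: 4 years of exponential growth with a yearly cycle.
traffic = [
    1000.0 * math.exp(0.01 * t) * (1.0 + 0.2 * math.sin(2.0 * math.pi * t / 12.0))
    for t in range(48)
]

log_traffic = [math.log(x) for x in traffic]  # stabilise the variance
d_log = difference(log_traffic, lag=1)        # remove the trend (d = 1)
d_log_12 = difference(d_log, lag=12)          # remove the yearly pattern (D = 1)

# Each differencing step shortens the series by its lag.
print(len(traffic), len(d_log), len(d_log_12))  # 48 47 35
```

Because this toy series is a pure trend plus a perfectly periodic pattern, the doubly differenced values are zero up to rounding. Real data leaves a stationary remainder, and that remainder is what the ACF and PACF are computed on.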

Identifying ARIMA Orders (p and q) Using ACF and PACF

Understanding ACF and PACF

The Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) are graphical tools that help in identifying the appropriate orders of the ARIMA model. While ACF measures the correlation between the time series and its lagged values, PACF measures the correlation between the series and its lagged values after removing the effects of intermediate lags.
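To make the definition concrete, here is a minimal sketch of the sample ACF together with the approximate 95% bounds that plotting tools usually draw as dashed lines. It uses the common estimator that divides every lag by the lag-0 sum of squares. The function names and the alternating test series are illustrative assumptions, not part of the original analysis.

```python
import math

def sample_acf(series, max_lag):
    """Sample autocorrelations r_0..r_max_lag (each lag scaled by the lag-0 sum of squares)."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(max_lag + 1)
    ]

def white_noise_bound(n, z=1.96):
    """Approximate 95% bound for the autocorrelations of white noise of length n."""
    return z / math.sqrt(n)

# Hypothetical series that flips sign every step: strong negative lag-1 correlation.
series = [1.0, -1.0] * 50
acf_values = sample_acf(series, max_lag=3)
bound = white_noise_bound(len(series))

for lag, r in enumerate(acf_values):
    flag = "significant" if lag > 0 and abs(r) > bound else ""
    print(lag, round(r, 3), flag)
```

A spike is usually called significant when it falls outside these bounds. With many lags plotted, roughly one in twenty will cross the bounds by chance, so isolated spikes at high lags deserve some skepticism.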

Interpreting the ACF Plot

The ACF plot of the seasonally differenced series (dhkltraf_12) shows a significant spike at lag 1, followed by a gradual decay. A gradual decay is the signature of an autoregressive component: for a pure Moving Average (MA(q)) process the ACF cuts off sharply after lag q instead of tailing off. The decay therefore cannot fix q by itself. The prominent lag-1 spike does make a low-order moving average term worth testing, so q=1 is carried forward as a tentative candidate to be compared against q=0 during model validation.

Interpreting the PACF Plot

Conversely, the PACF plot displays a sharp cutoff after lag 2, with significant spikes at lags 1 and 2, followed by non-significant values. This pattern aligns with an Autoregressive (AR(p)) process, where the PACF cuts off sharply. Therefore, the observed cutoff suggests an autoregressive component with order p=2.
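The cutoff behaviour can be checked numerically. The sketch below computes partial autocorrelations from autocorrelations with the Durbin-Levinson recursion and applies it to the theoretical ACF of an AR(2) process. The coefficients 0.5 and 0.3 are arbitrary illustrative values, not estimates from the airport data, and `pacf_from_acf` is a name chosen for this example.

```python
def pacf_from_acf(rho, max_lag):
    """Durbin-Levinson recursion.

    rho[k] is the autocorrelation at lag k (rho[0] == 1).
    Returns the partial autocorrelations at lags 1..max_lag.
    """
    pacf = []
    prev = []  # coefficients phi_{k-1,1} .. phi_{k-1,k-1} from the previous step
    for k in range(1, max_lag + 1):
        num = rho[k] - sum(prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the lower-order coefficients, then append the new one.
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf

# Theoretical ACF of an AR(2) process with illustrative coefficients.
phi1, phi2 = 0.5, 0.3
rho = [1.0, phi1 / (1.0 - phi2)]
for k in range(2, 7):
    rho.append(phi1 * rho[k - 1] + phi2 * rho[k - 2])

pacf = pacf_from_acf(rho, max_lag=6)
print([round(v, 4) for v in pacf])
```

The ACF values in `rho` decay gradually, while the PACF is non-zero at lags 1 and 2 (the lag-2 value equals `phi2`) and zero afterwards, which is the pattern read off the plot above.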

Synthesizing ACF and PACF Insights

Combining insights from both plots gives the candidate orders. The PACF cutoff after lag 2 indicates p=2, and the lag-1 ACF spike motivates a tentative q=1, which points to ARIMA(2,1,1) for the non-seasonal part, with ARIMA(2,1,0) as the natural simpler alternative. With the seasonal differencing applied earlier, the full model is ARIMA(2,1,1)(P,1,Q)₁₂, where P and Q are the seasonal autoregressive and moving average orders, respectively.
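Written out with the backshift operator B (defined by B y_t = y_{t-1}), the candidate model for the log-transformed series takes the form below. This is a sketch under two stated assumptions: the moving average terms carry a plus sign, which is one common convention, and Φ_P and Θ_Q stand for the yet-to-be-chosen seasonal polynomials of degree P and Q in B^12. ε_t denotes white noise.

```latex
\[
(1 - \phi_1 B - \phi_2 B^2)\,\Phi_P(B^{12})\,(1 - B)(1 - B^{12})\,\log y_t
  = (1 + \theta_1 B)\,\Theta_Q(B^{12})\,\varepsilon_t
\]
```

The factors (1 - B) and (1 - B^12) are the first-order and seasonal differences applied during data preparation.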

Step-by-Step Approach to Determining p and q

1. Ensuring Stationarity

Before delving into identifying p and q, it's imperative to ensure that the time series is stationary. This involves transforming the data through log transformation and differencing (both first-order and seasonal). The ADF tests confirm stationarity post each transformation, setting the stage for accurate ARIMA modeling.

2. Analyzing the ACF Plot

The ACF plot for the seasonally differenced series shows a prominent spike at lag 1 followed by a slow decay. A slowly decaying ACF points to an autoregressive component rather than a pure moving average one; had the ACF cut off sharply after lag q, that would have indicated a pure MA(q) process. The lag-1 spike nevertheless makes an MA(1) term a reasonable candidate to test.
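The difference between an ACF that cuts off and one that tails off is easiest to see from the theoretical autocorrelations of the two simplest processes. The sketch below uses the standard results for MA(1), written as y_t = e_t + theta * e_{t-1}, and for AR(1). The coefficient values are arbitrary and chosen only for illustration.

```python
def ma1_acf(theta, max_lag):
    """Theoretical ACF of y_t = e_t + theta * e_{t-1}: zero beyond lag 1 (max_lag >= 1)."""
    rho1 = theta / (1.0 + theta * theta)
    return [1.0, rho1] + [0.0] * (max_lag - 1)

def ar1_acf(phi, max_lag):
    """Theoretical ACF of y_t = phi * y_{t-1} + e_t: geometric decay phi**k."""
    return [phi ** k for k in range(max_lag + 1)]

ma_values = ma1_acf(theta=0.5, max_lag=5)
ar_values = ar1_acf(phi=0.5, max_lag=5)

print("MA(1):", [round(v, 4) for v in ma_values])  # single spike, then exact zeros
print("AR(1):", [round(v, 4) for v in ar_values])  # halves at every lag
```

In sample plots the MA(1) zeros show up as values inside the significance bounds, while the AR(1) decay shows up as several consecutive lags that shrink gradually.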

3. Analyzing the PACF Plot

The PACF plot shows significant spikes at the first two lags, beyond which the values fall inside the significance bounds. This sharp cutoff is characteristic of an autoregressive process of order p=2. If the PACF had instead tapered off gradually, it would have suggested a moving average component.

4. Combining Insights for ARIMA Parameters

A slowly decaying ACF combined with a PACF that cuts off after lag 2 is the textbook signature of an AR(2) process, so ARIMA(2,1,0) is the baseline specification; adding the tentative MA(1) term suggested by the lag-1 ACF spike gives the candidate ARIMA(2,1,1). Here, 2 is the AR order, the first 1 is the degree of differencing, and the second 1 is the MA order. Incorporating seasonality, the model extends to ARIMA(2,1,1)(P,1,Q)₁₂, where P and Q are determined from the ACF and PACF at the seasonal lags (12, 24, 36, and so on).

Model Validation and Selection

Fitting the ARIMA Model

With the candidate orders identified, the next step is to fit the ARIMA model to the data using statistical software such as R. Functions like auto.arima() (from the forecast package) can automate the search by selecting the best-fitting model according to information criteria, but manual specification gives more control and lets the ACF and PACF insights guide the choice.

Evaluating Model Fit

Post-fitting, it's crucial to assess the model's adequacy. This involves examining the residuals to check that they resemble white noise (for example with a residual ACF plot or a Ljung-Box test), which indicates that the model has captured the serial dependence in the data. Additionally, the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) are used to compare candidate models, with lower values indicating a better trade-off between fit and complexity. These criteria are comparable only across models fitted to the same series with the same differencing.
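Both checks reduce to short formulas, sketched below in plain Python. The log-likelihoods, parameter counts, and sample size are invented placeholders, not results from the airport data. The Ljung-Box statistic is shown only as the Q value, which would then be compared against a chi-square critical value.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 log L."""
    return 2.0 * n_params - 2.0 * log_likelihood

def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: k log(n) - 2 log L."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

def ljung_box_q(residuals, max_lag):
    """Ljung-Box statistic Q = n (n + 2) * sum_k r_k^2 / (n - k) over lags 1..max_lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [x - mean for x in residuals]
    c0 = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, max_lag + 1):
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        total += r_k * r_k / (n - k)
    return n * (n + 2) * total

# Hypothetical fits: (log-likelihood, number of estimated parameters).
candidates = {"ARIMA(2,1,0)": (250.0, 3), "ARIMA(2,1,1)": (250.6, 4)}
n_obs = 215  # placeholder sample size after differencing

for name, (loglik, k) in candidates.items():
    print(name, round(aic(loglik, k), 2), round(bic(loglik, k, n_obs), 2))

best = min(candidates, key=lambda name: aic(*candidates[name]))
print("lowest AIC:", best)
```

In this made-up comparison the extra MA term raises the log-likelihood too little to pay for the added parameter, so the simpler model has the lower AIC. With real fits the ranking can go either way, which is why both candidates are worth estimating.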

Iterative Refinement

Modeling is often an iterative process. If the initial model does not satisfy diagnostic checks, adjustments to the orders p and q, or even the inclusion of exogenous variables, may be necessary. Continuous refinement ensures that the final model provides robust and reliable forecasts.

Conclusion

Identifying the appropriate orders p and q in an ARIMA model is a methodical process that hinges on a careful reading of the ACF and PACF plots. For the Hong Kong airport traffic data, the PACF cutoff after lag 2 and the lag-1 ACF spike suggested ARIMA(2,1,1) as the candidate for the non-seasonal components, with ARIMA(2,1,0) as a simpler alternative, and seasonal differencing accounts for the recurring yearly pattern. Model validation through residual analysis and information criteria then decides which candidate is adequate and reliable for forecasting.
