The Engle-Granger Two-Step Method: A Practitioner's Guide to Cointegration Testing
Cointegration is a cornerstone of modern time-series analysis and of quantitative trading in particular. It describes a statistical property of two or more non-stationary time series: some linear combination of them is stationary. For traders, this is a powerful idea. Even if individual asset prices wander unpredictably (exhibit unit root behavior), there may be a long-term, economically meaningful relationship that binds them together. When this relationship is temporarily broken, a trading opportunity may arise.
The Engle-Granger two-step method, developed by Robert Engle and Clive Granger, is a foundational technique for testing for cointegration. While more advanced methods like the Johansen test exist, the Engle-Granger approach is intuitive and widely used for its simplicity, especially when analyzing pairs of assets. This article provides a detailed, practical guide to implementing the Engle-Granger test for pairs trading, aimed at traders with a solid understanding of statistical concepts.
Understanding the Core Idea: Mean Reversion
At its heart, pairs trading is a strategy that bets on mean reversion. The core assumption is that the spread between two cointegrated assets will revert to its long-term mean. When the spread widens beyond its typical range, the strategy shorts the outperforming asset and buys the underperforming one. When the spread narrows back toward its mean, the positions are closed, ideally for a profit. The Engle-Granger test provides a formal framework for identifying such pairs.
The Two-Step Procedure
The Engle-Granger test consists of two main steps:
- Estimating the long-run relationship: This involves running a simple Ordinary Least Squares (OLS) regression of one asset's price on the other's. The residuals of this regression represent the spread between the two assets.
- Testing the residuals for stationarity: This step involves testing the residuals from the first step for a unit root. If the residuals are found to be stationary, it implies that the two assets are cointegrated.
Let's break down each step in detail.
Step 1: Estimating the Long-Run Relationship
Suppose we have two stocks, A and B, whose prices are non-stationary (I(1)). We want to test if they are cointegrated. The first step is to estimate the following regression:
Price_A = β * Price_B + c + ε
Where:
- Price_A and Price_B are the prices of the two stocks.
- β is the cointegration coefficient, which represents the long-term relationship between the two prices.
- c is a constant.
- ε is the residual, which represents the deviation from the long-run equilibrium.
The residuals are calculated as:
ε = Price_A - (β * Price_B + c)
These residuals represent the spread between the two assets. If the two assets are cointegrated, this spread should be stationary.
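The regression above can be sketched with plain NumPy least squares. This is a minimal illustration, not a definitive implementation: the helper name estimate_long_run and the synthetic prices (a random walk for B, with A built as 1.5 * B + 10 plus stationary noise) are assumptions made for the example.

```python
import numpy as np

def estimate_long_run(price_a, price_b):
    """OLS of price_a on price_b with an intercept; returns (beta, c, residuals)."""
    price_a = np.asarray(price_a, dtype=float)
    price_b = np.asarray(price_b, dtype=float)
    # Design matrix: a column of ones for the constant c, then Price_B.
    X = np.column_stack([np.ones_like(price_b), price_b])
    (c, beta), *_ = np.linalg.lstsq(X, price_a, rcond=None)
    # Residuals are the deviation from the fitted long-run relationship.
    residuals = price_a - (beta * price_b + c)
    return beta, c, residuals

# Synthetic data (assumption): B is a random walk, A tracks it with stationary noise.
rng = np.random.default_rng(0)
price_b = 50 + np.cumsum(rng.normal(0, 1, 500))
price_a = 1.5 * price_b + 10 + rng.normal(0, 1, 500)

beta, c, residuals = estimate_long_run(price_a, price_b)
print(f"beta={beta:.3f}, c={c:.3f}")
```

Because the regression includes a constant, the residuals average to zero by construction, which is why the spread is usually described as fluctuating around zero.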
Step 2: Testing the Residuals for Stationarity
The second step is to test the residuals (ε) for stationarity. This is typically done using the Augmented Dickey-Fuller (ADF) test. The ADF test is a statistical test for a unit root in a time series sample. The null hypothesis of the ADF test is that the time series has a unit root (i.e., it is non-stationary). The alternative hypothesis is that the time series is stationary.
The ADF test involves estimating the following regression:
Δε_t = α * ε_{t-1} + Σ(γ_i * Δε_{t-i}) + u_t
Where:
- Δε_t is the first difference of the residuals at time t.
- ε_{t-1} is the lagged residual.
- Δε_{t-i} are the lagged first differences of the residuals.
- α and γ_i are coefficients.
- u_t is the error term.
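The null hypothesis is that α = 0 (a unit root), tested against the one-sided alternative α < 0. If we can reject the null hypothesis, it means that the residuals are stationary, and therefore the two assets are cointegrated. Because ε is itself estimated in Step 1, the test statistic must be compared against Engle-Granger (MacKinnon) critical values, which are more negative than the standard ADF ones.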
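To make the ADF regression concrete, the sketch below estimates α and its t-statistic by ordinary least squares. It assumes a fixed, user-supplied number of lagged differences. The function name adf_regression is hypothetical, and library routines normally choose the lag length with an information criterion instead.

```python
import numpy as np

def adf_regression(resid, n_lags=1):
    """Fit d_e[t] = alpha * e[t-1] + sum_i gamma_i * d_e[t-i] + u[t] by OLS.

    Returns (alpha, t_stat). The t-statistic is NOT standard normal under the
    null; it must be compared with unit-root critical values.
    """
    e = np.asarray(resid, dtype=float)
    de = np.diff(e)                          # de[k] = e[k+1] - e[k]
    y = de[n_lags:]                          # dependent variable: d_e[t]
    cols = [e[n_lags:-1]]                    # lagged level: e[t-1]
    for i in range(1, n_lags + 1):
        cols.append(de[n_lags - i:len(de) - i])   # lagged differences: d_e[t-i]
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ coef
    sigma2 = (u @ u) / (len(y) - X.shape[1])      # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)         # OLS coefficient covariance
    alpha = coef[0]
    return alpha, alpha / np.sqrt(cov[0, 0])

# Synthetic check (assumption): a stationary AR(1) series versus a random walk.
rng = np.random.default_rng(1)
shocks = rng.normal(0, 1, 500)
stationary = np.zeros(500)
for t in range(1, 500):
    stationary[t] = 0.5 * stationary[t - 1] + shocks[t]
random_walk = np.cumsum(rng.normal(0, 1, 500))

alpha_stat, t_stat = adf_regression(stationary)
alpha_rw, t_rw = adf_regression(random_walk)
print(f"stationary: alpha={alpha_stat:.3f}, t={t_stat:.2f}")
print(f"random walk: alpha={alpha_rw:.3f}, t={t_rw:.2f}")
```

For the stationary series α should come out clearly negative, while for the random walk it should sit close to zero.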
A Practical Example
Let's consider two hypothetical stocks, STOCK_A and STOCK_B. We have 252 daily closing prices for each stock. First, we would plot the prices to visually inspect their behavior. Let's assume they both appear to be non-stationary.
Step 1: OLS Regression
We run an OLS regression of STOCK_A on STOCK_B:
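import numpy as np
import statsmodels.api as sm
# Assuming stock_a_prices and stock_b_prices are 1-D numpy arrays of equal length
# add_constant prepends a column of ones, so params[0] is the constant and params[1] is beta
stock_b_prices_with_const = sm.add_constant(stock_b_prices)
model = sm.OLS(stock_a_prices, stock_b_prices_with_const)
results = model.fit()
beta = results.params[1]
const = results.params[0]
# The residuals are the spread: stock_a_prices - (beta * stock_b_prices + const)
residuals = results.resid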
Let's say the regression results give us a beta of 1.5 and a const of 10. The residuals are then calculated as:
residuals = stock_a_prices - (1.5 * stock_b_prices + 10)
Step 2: ADF Test on Residuals
Now we perform the ADF test on the residuals.
from statsmodels.tsa.stattools import adfuller
adf_result = adfuller(residuals)
print(f'ADF Statistic: {adf_result[0]}')
print(f'p-value: {adf_result[1]}')
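If the p-value is less than a chosen significance level (e.g., 0.05), we reject the null hypothesis of a unit root. This suggests the residuals are stationary and that STOCK_A and STOCK_B are cointegrated. One caveat: adfuller computes its p-value from the standard Dickey-Fuller distribution, which ignores the fact that the residuals come from an estimated regression. That p-value is therefore too optimistic, so borderline results should be confirmed against Engle-Granger critical values, for example with the coint function in statsmodels.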
Trading the Cointegrated Pair
Once we have identified a cointegrated pair, we can use the residuals (the spread) to generate trading signals. A common approach is to normalize the spread by calculating its z-score:
z_score = (spread - mean(spread)) / std(spread)
We can then set entry and exit thresholds based on the z-score. For example:
- Entry signal (short the spread): When the z-score crosses above a certain threshold (e.g., 2.0), we short STOCK_A and buy beta units of STOCK_B.
- Entry signal (long the spread): When the z-score crosses below a certain threshold (e.g., -2.0), we buy STOCK_A and short beta units of STOCK_B.
- Exit signal: When the z-score reverts to its mean (crosses 0), we close the positions.
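The entry and exit rules above can be sketched as a small state machine over the z-score. This is an illustration under stated assumptions: the function name zscore_positions is hypothetical, and the mean and standard deviation are taken over the whole sample exactly as in the formula above. That introduces look-ahead if used in a backtest, where a rolling or expanding window would be the safer choice.

```python
import numpy as np

def zscore_positions(spread, entry=2.0, exit_level=0.0):
    """Return (z, positions): +1 = long the spread, -1 = short the spread, 0 = flat."""
    spread = np.asarray(spread, dtype=float)
    z = (spread - spread.mean()) / spread.std()
    positions = np.zeros(len(z), dtype=int)
    position = 0
    for t, z_t in enumerate(z):
        if position == 0:
            if z_t > entry:
                position = -1          # short STOCK_A, buy beta units of STOCK_B
            elif z_t < -entry:
                position = 1           # buy STOCK_A, short beta units of STOCK_B
        elif position == -1 and z_t <= exit_level:
            position = 0               # spread reverted from above: close
        elif position == 1 and z_t >= exit_level:
            position = 0               # spread reverted from below: close
        positions[t] = position
    return z, positions

# Hand-built spread (assumption) with one excursion above and one below the mean.
spread = np.array([0, 0, 0, 0, 5, 2, -1, 0, 0, -5, -2, 1, 0, 0, 0, 0], dtype=float)
z, positions = zscore_positions(spread)
print(positions.tolist())
# -> [0, 0, 0, 0, -1, -1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0]
```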
Limitations of the Engle-Granger Test
While the Engle-Granger test is a useful tool, it has some limitations:
- Single cointegrating vector: The test can only identify a single cointegrating relationship. If there are more than two assets, there could be multiple cointegrating vectors. In such cases, the Johansen test is more appropriate.
- Choice of dependent variable: The results of the test can be sensitive to the choice of the dependent variable in the OLS regression. It's a good practice to run the test both ways (i.e., regressing A on B and B on A).
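- Small sample properties: The test has low power in small samples, so it can fail to detect cointegration that is actually present, and the OLS estimate of β can be noticeably biased.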
- Structural breaks: The cointegrating relationship may not be stable over time. Structural breaks in the data can lead to misleading results.
Conclusion
The Engle-Granger two-step method is an effective and intuitive technique for identifying cointegrated pairs of assets. By understanding and applying this test, traders can develop sophisticated mean-reversion strategies. However, it's important to be aware of the test's limitations and to use it as part of a comprehensive trading framework that includes risk management and further statistical analysis.
