The Engle-Granger Two-Step Method: A Practitioner's Guide to Cointegration Testing

The concept of cointegration is a cornerstone of modern time-series analysis, particularly in quantitative trading. It describes a statistical property of two or more non-stationary time series: some linear combination of them is stationary. For traders, this is a powerful idea. It suggests that even if individual asset prices wander unpredictably (exhibit unit root behavior), there might be a long-term, economically meaningful relationship that binds them together. When this relationship is temporarily broken, a trading opportunity may arise.

The Engle-Granger two-step method, developed by Robert Engle and Clive Granger, is a foundational technique for testing for cointegration. While more advanced methods like the Johansen test exist, the Engle-Granger approach is intuitive and widely used for its simplicity, especially when analyzing pairs of assets. This article provides a detailed, practical guide to implementing the Engle-Granger test for pairs trading, aimed at traders with a solid understanding of statistical concepts.

Understanding the Core Idea: Mean Reversion

At its heart, pairs trading is a strategy that bets on mean reversion. The core assumption is that the spread between two cointegrated assets will revert to its long-term mean. When the spread widens, the strategy involves shorting the outperforming asset and buying the underperforming one. When the spread narrows, the positions are closed for a profit. The Engle-Granger test provides a formal framework for identifying such pairs.
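To make the idea concrete, here is a minimal simulation sketch. Every number in it (the seed, the AR(1) coefficient of 0.9, the coefficient 1.5 and constant 10) is an illustrative assumption, not market data. It builds one random-walk price, ties a second price to it through a mean-reverting spread, and shows that the right linear combination of the two non-stationary prices recovers that spread.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000

# Price_B follows a random walk, so it is non-stationary (I(1)).
price_b = 50.0 + np.cumsum(rng.normal(0.0, 1.0, n))

# The spread is a mean-reverting AR(1) process around zero (assumed phi = 0.9).
spread = np.zeros(n)
for t in range(1, n):
    spread[t] = 0.9 * spread[t - 1] + rng.normal(0.0, 1.0)

# Price_A is tied to Price_B through an assumed long-run relationship.
true_beta, true_const = 1.5, 10.0
price_a = true_beta * price_b + true_const + spread

# The linear combination of the two prices recovers the stationary spread.
recovered = price_a - true_beta * price_b - true_const

print("std of Price_A:", price_a.std())
print("std of spread :", recovered.std())
```

In real data the coefficient and constant are unknown, which is why the first step of the procedure estimates them.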

The Two-Step Procedure

The Engle-Granger test consists of two main steps:

Estimating the long-run relationship: This involves running a simple Ordinary Least Squares (OLS) regression of one asset's price on the other's. The residuals of this regression represent the spread between the two assets.
Testing the residuals for stationarity: This step involves testing the residuals from the first step for a unit root. If the residuals are found to be stationary, it implies that the two assets are cointegrated.

Let's break down each step in detail.

Step 1: Estimating the Long-Run Relationship

Suppose we have two stocks, A and B, whose prices are non-stationary (I(1)). We want to test if they are cointegrated. The first step is to estimate the following regression:

Price_A = β * Price_B + c + ε

Where:

Price_A and Price_B are the prices of the two stocks.
β is the cointegration coefficient, which represents the long-term relationship between the two prices.
c is a constant.
ε is the residual, which represents the deviation from the long-run equilibrium.

The residuals are calculated as:

ε = Price_A - (β * Price_B + c)

These residuals represent the spread between the two assets. If the two assets are cointegrated, this spread should be stationary.
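As a rough sketch of this step using only NumPy, the helper below (estimate_long_run is a hypothetical name, not a library function) fits the regression by least squares and returns the coefficient, the constant, and the residuals. The toy prices are constructed to satisfy the relationship exactly, so the estimates should match the assumed values of 1.5 and 10.

```python
import numpy as np

def estimate_long_run(price_a, price_b):
    """Regress price_a on price_b with an intercept.

    Returns (beta, c, residuals) for Price_A = beta * Price_B + c + eps.
    """
    price_a = np.asarray(price_a, dtype=float)
    price_b = np.asarray(price_b, dtype=float)
    # Design matrix: one column for Price_B, one column of ones for c.
    design = np.column_stack([price_b, np.ones_like(price_b)])
    (beta, c), *_ = np.linalg.lstsq(design, price_a, rcond=None)
    residuals = price_a - (beta * price_b + c)
    return beta, c, residuals

# Toy data (assumed values) that obey the relationship with no noise.
price_b = np.array([20.0, 21.0, 19.5, 22.0, 23.5, 22.5])
price_a = 1.5 * price_b + 10.0

beta, c, residuals = estimate_long_run(price_a, price_b)
print(beta, c)
```

With real prices the residuals will not be zero; they form the spread series that the second step tests.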

Step 2: Testing the Residuals for Stationarity

The second step is to test the residuals (ε) for stationarity. This is typically done using the Augmented Dickey-Fuller (ADF) test. The ADF test is a statistical test for a unit root in a time series sample. The null hypothesis of the ADF test is that the time series has a unit root (i.e., it is non-stationary). The alternative hypothesis is that the time series is stationary.

The ADF test involves estimating the following regression:

Δε_t = α * ε_{t-1} + Σ(γ_i * Δε_{t-i}) + u_t

Where:

Δε_t is the first difference of the residuals at time t.
ε_{t-1} is the lagged residual.
Δε_{t-i} are the lagged first differences of the residuals.
α and γ_i are coefficients.
u_t is the error term.

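The null hypothesis is that α = 0 (the residuals have a unit root), tested against the one-sided alternative α < 0. If the t-statistic on α is negative enough to reject the null, the residuals are stationary, and therefore the two assets are cointegrated. Because the residuals come from an estimated regression rather than from directly observed data, the statistic should be compared with Engle-Granger (MacKinnon) critical values, which are more negative than the standard Dickey-Fuller ones.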
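The regression above can be written out by hand to see where the test statistic comes from. The following is a simplified sketch, not a replacement for a library implementation: adf_t_stat is a hypothetical helper, the lag count is fixed by the caller rather than chosen by an information criterion, and no p-value is computed.

```python
import numpy as np

def adf_t_stat(resid, lags=1):
    """t-statistic on alpha in
    d_e[t] = alpha * e[t-1] + sum_i gamma_i * d_e[t-i] + u[t]
    (no constant or trend, matching the equation in the text)."""
    e = np.asarray(resid, dtype=float)
    d = np.diff(e)                      # d[k] = e[k+1] - e[k]
    y = d[lags:]                        # dependent variable: d_e[t]
    cols = [e[lags:-1]]                 # lagged level: e[t-1]
    for i in range(1, lags + 1):
        cols.append(d[lags - i:len(d) - i])   # lagged difference: d_e[t-i]
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ coef
    dof = len(y) - X.shape[1]
    sigma2 = (u @ u) / dof              # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return coef[0] / np.sqrt(cov[0, 0])

# Illustrative series built from the same simulated noise (assumed values).
rng = np.random.default_rng(0)
noise = rng.normal(size=500)
stationary = np.zeros(500)
for t in range(1, 500):
    stationary[t] = 0.5 * stationary[t - 1] + noise[t]
random_walk = np.cumsum(noise)

print("stationary series :", adf_t_stat(stationary))
print("random walk       :", adf_t_stat(random_walk))
```

A strongly negative statistic points toward stationarity; a value near zero does not.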

A Practical Example

Let's consider two hypothetical stocks, STOCK_A and STOCK_B. We have 252 daily closing prices for each stock. First, we would plot the prices to visually inspect their behavior. Let's assume they both appear to be non-stationary.

Step 1: OLS Regression

We run an OLS regression of STOCK_A on STOCK_B:

python

import numpy as np
import statsmodels.api as sm

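# Assuming stock_a_prices and stock_b_prices are 1-D numpy arrays of equal length
# add_constant prepends a column of ones, so params[0] is the intercept
stock_b_prices_with_const = sm.add_constant(stock_b_prices)
model = sm.OLS(stock_a_prices, stock_b_prices_with_const)
results = model.fit()

const = results.params[0]   # intercept c
beta = results.params[1]    # cointegration coefficient on STOCK_B
residuals = results.resid   # estimated spread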

Let's say the regression results give us a beta of 1.5 and a const of 10. The residuals are then calculated as:

residuals = stock_a_prices - (1.5 * stock_b_prices + 10)

Step 2: ADF Test on Residuals

Now we perform the ADF test on the residuals.

python

from statsmodels.tsa.stattools import adfuller

adf_result = adfuller(residuals)

print(f'ADF Statistic: {adf_result[0]}')
print(f'p-value: {adf_result[1]}')

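If the p-value is less than a chosen significance level (e.g., 0.05), we reject the null hypothesis of a unit root, which suggests that the residuals are stationary and that STOCK_A and STOCK_B are cointegrated. One caveat applies: the p-value reported by adfuller assumes the series was observed directly, whereas these residuals come from an estimated regression, so it overstates the evidence for cointegration. For a two-variable regression with a constant, the 5% critical value is roughly -3.34 rather than the standard Dickey-Fuller value of about -2.86. The coint function in statsmodels.tsa.stattools runs the same two-step test and applies the appropriate MacKinnon critical values.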

Trading the Cointegrated Pair

Once we have identified a cointegrated pair, we can use the residuals (the spread) to generate trading signals. A common approach is to normalize the spread by calculating its z-score:

z_score = (spread - mean(spread)) / std(spread)

We can then set entry and exit thresholds based on the z-score. For example:

Entry signal (short the spread): When the z-score crosses above a certain threshold (e.g., 2.0), we short STOCK_A and buy beta units of STOCK_B.
Entry signal (long the spread): When the z-score crosses below a certain threshold (e.g., -2.0), we buy STOCK_A and short beta units of STOCK_B.
Exit signal: When the z-score reverts to its mean (crosses 0), we close the positions.
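The rules above can be sketched as a small state machine. The function names zscore and positions_from_z are hypothetical, and the logic is simplified: it enters whenever the z-score is beyond the threshold while flat, and it ignores transaction costs and position sizing. Note also that a z-score computed from the full-sample mean and standard deviation uses information that would not be available in live trading; a rolling window is a common alternative.

```python
import numpy as np

def zscore(spread):
    """Normalize the spread using its full-sample mean and std."""
    spread = np.asarray(spread, dtype=float)
    return (spread - spread.mean()) / spread.std()

def positions_from_z(z, entry=2.0, exit_level=0.0):
    """Return +1 (long the spread), -1 (short the spread), or 0 per period.

    Short the spread = short STOCK_A, buy beta units of STOCK_B.
    Long the spread  = buy STOCK_A, short beta units of STOCK_B.
    """
    positions = np.zeros(len(z), dtype=int)
    current = 0
    for t, value in enumerate(z):
        if current == 0:
            if value > entry:
                current = -1          # spread too wide: short it
            elif value < -entry:
                current = 1           # spread too narrow: go long
        elif current == -1 and value <= exit_level:
            current = 0               # reverted to the mean: close
        elif current == 1 and value >= exit_level:
            current = 0               # reverted to the mean: close
        positions[t] = current
    return positions

example_z = np.array([0.0, 2.5, 1.0, -0.1, -2.2, -1.0, 0.3])
print(positions_from_z(example_z))
```

In practice, z would come from zscore(residuals) using the residuals of the first-step regression.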

Limitations of the Engle-Granger Test

While the Engle-Granger test is a useful tool, it has some limitations:

Single cointegrating vector: The test can only identify a single cointegrating relationship. If there are more than two assets, there could be multiple cointegrating vectors. In such cases, the Johansen test is more appropriate.
Choice of dependent variable: The results of the test can be sensitive to the choice of the dependent variable in the OLS regression. It's a good practice to run the test both ways (i.e., regressing A on B and B on A).
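Small sample properties: The ADF test has low power in small samples, so with a short price history it can fail to detect a genuine cointegrating relationship, and the OLS estimate of β can be noticeably biased.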
Structural breaks: The cointegrating relationship may not be stable over time. Structural breaks in the data can lead to misleading results.

Conclusion

The Engle-Granger two-step method is an effective and intuitive technique for identifying cointegrated pairs of assets. By understanding and applying this test, traders can develop sophisticated mean-reversion strategies. However, it's important to be aware of the test's limitations and to use it as part of a comprehensive trading framework that includes risk management and further statistical analysis.

Category	Pairs Cointegration
Published	Feb 28, 2026