The Engle-Granger Two-Step Method: A Practitioner's Guide to Cointegration Testing

From TradingHabits, the trading encyclopedia · 7 min read · February 28, 2026

The concept of cointegration is a cornerstone of modern time-series analysis, particularly in quantitative trading. It describes a statistical property of two or more non-stationary time series: some linear combination of them is stationary. For traders, this is a powerful idea. It suggests that even if individual asset prices wander unpredictably (exhibit unit root behavior), there might be a long-term, economically meaningful relationship that binds them together. When this relationship is temporarily broken, a trading opportunity may arise.

The Engle-Granger two-step method, developed by Robert Engle and Clive Granger, is a foundational technique for testing for cointegration. While more advanced methods like the Johansen test exist, the Engle-Granger approach is intuitive and widely used for its simplicity, especially when analyzing pairs of assets. This article provides a detailed, practical guide to implementing the Engle-Granger test for pairs trading, aimed at traders with a solid understanding of statistical concepts.

Understanding the Core Idea: Mean Reversion

At its heart, pairs trading is a strategy that bets on mean reversion. The core assumption is that the spread between two cointegrated assets will revert to its long-term mean. When the spread widens, the strategy involves shorting the outperforming asset and buying the underperforming one. When the spread narrows, the positions are closed for a profit. The Engle-Granger test provides a formal framework for identifying such pairs.

The Two-Step Procedure

The Engle-Granger test consists of two main steps:

  1. Estimating the long-run relationship: This involves running a simple Ordinary Least Squares (OLS) regression of one asset's price on the other's. The residuals of this regression represent the spread between the two assets.
  2. Testing the residuals for stationarity: This step involves testing the residuals from the first step for a unit root. If the residuals are found to be stationary, it implies that the two assets are cointegrated.

Let's break down each step in detail.

Step 1: Estimating the Long-Run Relationship

Suppose we have two stocks, A and B, whose prices are non-stationary (I(1)). We want to test if they are cointegrated. The first step is to estimate the following regression:

Price_A = β * Price_B + c + ε

Where:

  • Price_A and Price_B are the prices of the two stocks.
  • β is the cointegration coefficient, which represents the long-term relationship between the two prices.
  • c is a constant.
  • ε is the residual, which represents the deviation from the long-run equilibrium.

The residuals are calculated as:

ε = Price_A - (β * Price_B + c)

These residuals represent the spread between the two assets. If the two assets are cointegrated, this spread should be stationary.

Step 2: Testing the Residuals for Stationarity

The second step is to test the residuals (ε) for stationarity. This is typically done using the Augmented Dickey-Fuller (ADF) test. The ADF test is a statistical test for a unit root in a time series sample. The null hypothesis of the ADF test is that the time series has a unit root (i.e., it is non-stationary). The alternative hypothesis is that the time series is stationary.

The ADF test involves estimating the following regression:

Δε_t = α * ε_{t-1} + Σ(γ_i * Δε_{t-i}) + u_t

Where:

  • Δε_t is the first difference of the residuals at time t.
  • ε_{t-1} is the lagged residual.
  • Δε_{t-i} are the lagged first differences of the residuals.
  • α and γ_i are coefficients.
  • u_t is the error term.

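The null hypothesis is that α = 0 (a unit root), tested against the one-sided alternative α < 0. If we can reject the null hypothesis, it means that the residuals are stationary, and therefore the two assets are cointegrated. Because ε comes from an estimated regression rather than being observed directly, the test statistic is compared against Engle-Granger critical values, which are more negative than the standard Dickey-Fuller ones.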
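
The regression above can be sketched directly with NumPy least squares. This is a minimal illustration, not a replacement for a library routine: the helper name adf_t_stat and the fixed lag count are assumptions made for this example, and it returns only the t-statistic on α, with no lag selection and no p-value.

```python
import numpy as np

def adf_t_stat(resid, n_lags=1):
    """t-statistic on alpha in: d_e[t] = alpha*e[t-1] + sum_i gamma_i*d_e[t-i] + u[t]."""
    e = np.asarray(resid, dtype=float)
    d = np.diff(e)                          # d[k] = e[k+1] - e[k]
    y = d[n_lags:]                          # dependent variable: first differences
    cols = [e[n_lags:-1]]                   # lagged level e[t-1]
    for i in range(1, n_lags + 1):
        cols.append(d[n_lags - i:len(d) - i])   # lagged differences d_e[t-i]
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ coef
    sigma2 = u @ u / (len(y) - X.shape[1])  # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)   # OLS coefficient covariance
    return coef[0] / np.sqrt(cov[0, 0])

# Illustrative inputs (simulated, not market data)
rng = np.random.default_rng(1)
noise = rng.normal(size=500)               # stationary series
walk = np.cumsum(rng.normal(size=500))     # unit-root series
print(adf_t_stat(noise), adf_t_stat(walk))
```

A stationary input should produce a strongly negative statistic, while a random walk typically gives a value much closer to zero.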

A Practical Example

Let's consider two hypothetical stocks, STOCK_A and STOCK_B. We have 252 daily closing prices for each stock. First, we would plot the prices to visually inspect their behavior. Let's assume they both appear to be non-stationary.

Step 1: OLS Regression

We run an OLS regression of STOCK_A on STOCK_B:

python
import numpy as np
import statsmodels.api as sm

# Assuming stock_a_prices and stock_b_prices are equal-length numpy arrays.
# add_constant prepends a column of ones, so params[0] is c and params[1] is beta.
stock_b_prices_with_const = sm.add_constant(stock_b_prices)
model = sm.OLS(stock_a_prices, stock_b_prices_with_const)
results = model.fit()

const = results.params[0]
beta = results.params[1]
residuals = results.resid

Let's say the regression results give us a beta of 1.5 and a const of 10. The residuals are then calculated as:

residuals = stock_a_prices - (1.5 * stock_b_prices + 10)
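
To try Step 1 without market data, one option is to simulate a pair whose true coefficients are known and check that least squares recovers them. The sketch below is an assumption-laden illustration: the random-walk model for STOCK_B, the AR(1) spread and its parameters are invented for the example, and np.polyfit stands in for the OLS call.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 252

# STOCK_B: a random walk, non-stationary by construction
stock_b_prices = 100 + np.cumsum(rng.normal(0, 1, n))

# A stationary AR(1) spread around the long-run relationship
spread = np.zeros(n)
shocks = rng.normal(0, 0.5, n)
for t in range(1, n):
    spread[t] = 0.5 * spread[t - 1] + shocks[t]

# STOCK_A follows 1.5 * STOCK_B + 10 plus the stationary spread
stock_a_prices = 1.5 * stock_b_prices + 10 + spread

# Step 1 by least squares: polyfit returns [slope, intercept] for degree 1
beta, const = np.polyfit(stock_b_prices, stock_a_prices, 1)
residuals = stock_a_prices - (beta * stock_b_prices + const)
print(f"beta = {beta:.3f}, const = {const:.2f}")
```

The estimated beta should land close to 1.5. The constant is pinned down less precisely, because a small error in beta is multiplied by a price level near 100.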

Step 2: ADF Test on Residuals

Now we perform the ADF test on the residuals.

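python
from statsmodels.tsa.stattools import adfuller, coint

adf_result = adfuller(residuals)
print(f'ADF Statistic: {adf_result[0]}')

# adfuller's own p-value ignores that beta was estimated, so take the
# p-value from coint(), which uses MacKinnon's cointegration critical values
eg_stat, eg_pvalue, eg_crit = coint(stock_a_prices, stock_b_prices)
print(f'p-value: {eg_pvalue}')

If the Engle-Granger p-value is less than a chosen significance level (e.g., 0.05), we reject the null hypothesis of no cointegration. This means the residuals are judged stationary, and we can conclude that STOCK_A and STOCK_B are cointegrated. The p-value reported by adfuller itself should not be used for this decision: its critical values assume an observed series rather than residuals from an estimated regression, so they reject too easily.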

Trading the Cointegrated Pair

Once we have identified a cointegrated pair, we can use the residuals (the spread) to generate trading signals. A common approach is to normalize the spread by calculating its z-score:

z_score = (spread - mean(spread)) / std(spread)

We can then set entry and exit thresholds based on the z-score. For example:

  • Entry signal (short the spread): When the z-score crosses above a certain threshold (e.g., 2.0), we short STOCK_A and buy beta shares of STOCK_B for each share of STOCK_A shorted.
  • Entry signal (long the spread): When the z-score crosses below a certain threshold (e.g., -2.0), we buy STOCK_A and short beta shares of STOCK_B for each share of STOCK_A bought.
  • Exit signal: When the z-score reverts to its mean (crosses 0), we close the positions.
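
These rules can be sketched as a small state machine over the z-score. The function name zscore_positions and the position encoding (+1 long the spread, -1 short the spread, 0 flat) are assumptions for illustration. The z-score here uses the full-sample mean and standard deviation, which looks ahead in time; a live strategy would more plausibly use a rolling or expanding window.

```python
import numpy as np

def zscore_positions(spread, entry=2.0):
    """Return the z-score and a position per bar: +1 long spread, -1 short spread, 0 flat."""
    spread = np.asarray(spread, dtype=float)
    z = (spread - spread.mean()) / spread.std()
    positions = np.zeros(len(z), dtype=int)
    current = 0
    for t, value in enumerate(z):
        if current == 0:
            if value > entry:
                current = -1          # short STOCK_A, buy beta shares of STOCK_B
            elif value < -entry:
                current = 1           # buy STOCK_A, short beta shares of STOCK_B
        elif current == -1 and value <= 0:
            current = 0               # z-score back at the mean: close the short
        elif current == 1 and value >= 0:
            current = 0               # z-score back at the mean: close the long
        positions[t] = current
    return z, positions

# Toy spread chosen so that the z-score crosses +2 and then -2
spread_example = [0, 0, 3, 1, -1, 0, -3, -1, 1, 0]
z, positions = zscore_positions(spread_example)
print(positions.tolist())
```

The position is held until the z-score reaches zero, so a trade stays open while the spread is only partway back to its mean.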

Limitations of the Engle-Granger Test

While the Engle-Granger test is a useful tool, it has some limitations:

  • Single cointegrating vector: The test can only identify a single cointegrating relationship. If there are more than two assets, there could be multiple cointegrating vectors. In such cases, the Johansen test is more appropriate.
  • Choice of dependent variable: The results of the test can be sensitive to the choice of the dependent variable in the OLS regression. It's a good practice to run the test both ways (i.e., regressing A on B and B on A).
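  • Small sample properties: The test has low power in small samples, so it can fail to detect a genuine cointegrating relationship, and the OLS estimate of β can be noticeably biased when the sample is short.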
  • Structural breaks: The cointegrating relationship may not be stable over time. Structural breaks in the data can lead to misleading results.

Conclusion

The Engle-Granger two-step method is an effective and intuitive technique for identifying cointegrated pairs of assets. By understanding and applying this test, traders can develop sophisticated mean-reversion strategies. However, it's important to be aware of the test's limitations and to use it as part of a comprehensive trading framework that includes risk management and further statistical analysis.