Understanding Mean Reversion Half-Life
The half-life of mean reversion quantifies how quickly a financial series returns to its long-term average: it is the time an expected deviation from the mean takes to decay by 50%. A shorter half-life indicates stronger mean reversion. A longer half-life suggests weaker mean reversion or trending behavior.
Traders use the half-life to choose holding periods for mean reversion strategies, and it also informs position sizing and risk management. Without an estimate of the half-life, a mean reversion strategy has no principled basis for deciding how long to wait for a trade to revert.
Estimating Half-Life with the Ornstein-Uhlenbeck Process
The Ornstein-Uhlenbeck (OU) process is a standard continuous-time model of mean-reverting behavior. Its discrete-time form allows the parameters to be estimated from historical data. The OU process is defined by the stochastic differential equation:
$dX_t = \theta(\mu - X_t)dt + \sigma dW_t$
Here, $X_t$ is the value of the series at time $t$ (a price or a spread), $\mu$ is the long-term mean, $\theta > 0$ is the speed of reversion, $\sigma$ is the volatility, and $W_t$ is a standard Wiener process (Brownian motion) with increment $dW_t$.
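Simulating the process is a convenient way to build intuition and to produce a series whose half-life is known in advance. The sketch below assumes NumPy and uses the exact one-step transition of the OU process (a Gaussian AR(1) step) rather than an Euler approximation; the function name `simulate_ou` is hypothetical.

```python
import numpy as np


def simulate_ou(x0, mu, theta, sigma, dt, n_steps, seed=None):
    """Simulate an Ornstein-Uhlenbeck path using the exact one-step transition.

    X[t+1] = mu + (X[t] - mu) * exp(-theta*dt) + step_sd * Z,  Z ~ N(0, 1),
    with step_sd = sigma * sqrt((1 - exp(-2*theta*dt)) / (2*theta)).
    """
    if theta <= 0:
        raise ValueError("theta must be positive for a mean-reverting process")
    rng = np.random.default_rng(seed)
    decay = np.exp(-theta * dt)
    step_sd = sigma * np.sqrt((1.0 - np.exp(-2.0 * theta * dt)) / (2.0 * theta))
    shocks = rng.standard_normal(n_steps)
    x = np.empty(n_steps + 1)
    x[0] = x0
    for t in range(n_steps):
        # Pull the deviation toward mu, then add the Gaussian shock
        x[t + 1] = mu + (x[t] - mu) * decay + step_sd * shocks[t]
    return x


# Example: theta chosen so that the half-life is 10 steps
path = simulate_ou(x0=1.0, mu=0.0, theta=np.log(2) / 10, sigma=0.05,
                   dt=1.0, n_steps=500, seed=0)
```

Setting `sigma=0` removes the noise and leaves the deterministic decay $(X_0 - \mu)e^{-\theta t}$; with $\theta = \ln 2 / 10$ the deviation then halves every 10 steps.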
The discrete form of the OU process simplifies estimation:
$X_{t+1} - X_t = \alpha + \beta X_t + \epsilon_t$
This is a linear regression: $X_{t+1} - X_t$ is the dependent variable, $X_t$ is the independent variable, $\alpha$ and $\beta$ are the intercept and slope, and $\epsilon_t$ is the error term.
From the regression coefficients, we derive $\theta$. The relationship is:
$\beta = -(1 - e^{-\theta \Delta t})$
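This relationship follows from the exact solution of the OU equation over one step of length $\Delta t$ (a standard result, sketched here for reference). Subtracting $X_t$ from both sides reproduces the regression form, and $\Delta t = 1$ gives the discrete equation above. As a by-product, the intercept recovers the long-term mean via $\mu = -\alpha/\beta$:

```latex
X_{t+\Delta t} = \mu + (X_t - \mu)\, e^{-\theta \Delta t}
  + \sigma \int_t^{t+\Delta t} e^{-\theta (t + \Delta t - s)}\, dW_s

X_{t+\Delta t} - X_t = \alpha + \beta X_t + \epsilon_t, \qquad
\alpha = \mu\,(1 - e^{-\theta \Delta t}), \qquad
\beta = -(1 - e^{-\theta \Delta t})

\epsilon_t \sim \mathcal{N}\!\left(0,\; \frac{\sigma^2 \,(1 - e^{-2\theta \Delta t})}{2\theta}\right),
\qquad
\mu = -\frac{\alpha}{\beta}
```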
Solving for $\theta$:
$\theta = -\frac{1}{\Delta t} \ln(1 + \beta)$
The half-life ($T_{1/2}$) is then calculated as:
$T_{1/2} = \frac{\ln(2)}{\theta}$
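The $\ln 2$ comes from the expected path of the process. Conditional on a starting value, the expected deviation from the mean decays exponentially, and the half-life is the time at which it reaches half its initial size. Written in terms of the regression slope, this also shows the common shortcut $-\ln 2/\beta$, which is accurate only when $|\beta|$ is small:

```latex
\mathbb{E}[X_t \mid X_0] - \mu = (X_0 - \mu)\, e^{-\theta t}

e^{-\theta T_{1/2}} = \tfrac{1}{2}
\;\Longrightarrow\;
T_{1/2} = \frac{\ln 2}{\theta}
        = -\frac{\ln 2 \,\Delta t}{\ln(1 + \beta)}
        \approx -\frac{\ln 2 \,\Delta t}{\beta}
        \qquad (|\beta| \ll 1)
```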
Choose $\Delta t$ to match the sampling interval. With daily data, $\Delta t = 1$ and the half-life is measured in trading days.
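The conversion from slope to half-life can be wrapped in a small helper that also handles slopes for which no half-life exists. This is a sketch using only the Python standard library; `half_life_from_beta` is a hypothetical name, and returning an infinite half-life for $\beta \ge 0$ is one design choice among several (raising an error would be another).

```python
import math


def half_life_from_beta(beta, dt=1.0):
    """Convert the regression slope beta into (theta, half_life).

    The half-life is in the units of dt (trading days when dt = 1 per daily bar).
    beta >= 0 means no mean reversion was detected, so the half-life is infinite.
    """
    if beta <= -1.0:
        raise ValueError("beta <= -1 is outside the range implied by the OU model")
    theta = -math.log1p(beta) / dt  # log1p is accurate for small |beta|
    if theta <= 0.0:
        return theta, math.inf
    return theta, math.log(2) / theta


theta, hl = half_life_from_beta(-0.05)
print(f"theta = {theta:.4f}, exact half-life = {hl:.2f} steps")
print(f"shortcut -ln(2)/beta = {-math.log(2) / -0.05:.2f} steps")
```

For $\beta = -0.05$ the exact formula gives about 13.5 steps while the shortcut gives about 13.9, so the approximation drifts as $|\beta|$ grows.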
Practical Estimation Example: SPY
Let's estimate the half-life for the S&P 500 ETF (SPY) using daily closing prices. We use data from January 1, 2010, to December 31, 2023.
First, download historical SPY data from a provider such as Yahoo Finance or Quandl (now Nasdaq Data Link).
Do not convert the prices to returns. In this setup the Ornstein-Uhlenbeck process models the level of the series, so we fit it to the closing prices directly.
Assume we have a pandas DataFrame named df_spy with a 'Close' column.
import pandas as pd
import numpy as np
import statsmodels.api as sm
# Assume df_spy is loaded with 'Close' prices
# For demonstration, create a synthetic random walk with drift (not real SPY data):
dates = pd.date_range(start='2010-01-01', end='2023-12-31', freq='B')
np.random.seed(42)
prices = 100 + np.cumsum(np.random.randn(len(dates)) * 0.5 + 0.05)
df_spy = pd.DataFrame({'Close': prices}, index=dates)
# Dependent variable: one-step price change X_t - X_{t-1}
df_spy['Price_Diff'] = df_spy['Close'].diff()
# Independent variable: previous price X_{t-1}
df_spy['Price_Lagged'] = df_spy['Close'].shift(1)
# Drop the first row, which is NaN after diff() and shift()
df_spy.dropna(inplace=True)
# Define dependent and independent variables
Y = df_spy['Price_Diff']
X = df_spy['Price_Lagged']
# Add a constant column so the regression estimates the intercept (alpha)
X = sm.add_constant(X)
# Ordinary least squares fit of Y = alpha + beta * X + eps
model = sm.OLS(Y, X)
results = model.fit()
print(results.summary())
# Extract the slope coefficient beta
beta = results.params['Price_Lagged']
print(f"Estimated beta: {beta}")
delta_t = 1  # Daily data: one step = one trading day
# theta and the half-life are only defined when -1 < beta < 0
if -1 < beta < 0:
    theta = -np.log(1 + beta) / delta_t
    print(f"Estimated theta (speed of reversion): {theta}")
    half_life = np.log(2) / theta
    print(f"Estimated half-life: {half_life:.2f} trading days")
else:
    print("beta is not in (-1, 0): no mean reversion detected, half-life undefined")
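The regression above relies on pandas and statsmodels. As a dependency-light alternative (NumPy only; the function name `estimate_ou_half_life` is hypothetical), the same estimate can be wrapped in a reusable function and checked on synthetic data whose half-life is known in advance. Recovering the known value is a useful sanity check before trusting an estimate on market data.

```python
import numpy as np


def estimate_ou_half_life(x, dt=1.0):
    """Fit x[t] - x[t-1] = alpha + beta * x[t-1] + eps by least squares.

    Returns (alpha, beta, mu, half_life). The half-life is in units of dt;
    it is inf when beta >= 0 (no mean reversion) and nan when beta <= -1.
    """
    x = np.asarray(x, dtype=float)
    dx = np.diff(x)                      # dependent variable
    x_lag = x[:-1]                       # regressor
    design = np.column_stack([np.ones_like(x_lag), x_lag])
    (alpha, beta), *_ = np.linalg.lstsq(design, dx, rcond=None)
    if beta >= 0.0:
        return alpha, beta, float("nan"), float("inf")
    if beta <= -1.0:
        return alpha, beta, float("nan"), float("nan")
    theta = -np.log1p(beta) / dt
    mu = -alpha / beta                   # long-term mean implied by the fit
    return alpha, beta, mu, np.log(2) / theta


# Sanity check on a synthetic AR(1) series with a known 14-step half-life
rng = np.random.default_rng(1)
phi = 0.5 ** (1 / 14)                    # per-step decay for a 14-step half-life
x = np.empty(20_000)
x[0] = 0.0
for t in range(1, len(x)):
    x[t] = 0.3 + phi * (x[t - 1] - 0.3) + 0.01 * rng.standard_normal()
alpha, beta, mu, hl = estimate_ou_half_life(x)
print(f"estimated half-life: {hl:.1f} steps (true value 14), mean: {mu:.3f}")
```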
After running this code with actual SPY data, we might get results like:
- Estimated beta: -0.00015
- Estimated theta: 0.00015
- Estimated half-life: 4620.98 trading days
This half-life of over 4,600 trading days (roughly 18 years at 252 trading days per year) for SPY's raw price indicates very weak or no mean reversion, consistent with the long-term upward trend of the stock market. A simple price series like SPY does not strongly mean-revert over short to medium timeframes. This result highlights that raw prices of growing assets often trend rather than mean-revert.
Applying to a Mean-Reverting Series: SPY-QQQ Spread
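A more fitting application involves a pair of related assets. Consider the spread between SPY and QQQ (the Nasdaq-100 ETF). Spreads between related assets are more plausible candidates for mean reversion than raw prices, although this must be verified with a stationarity test rather than assumed.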
Let's use the log ratio of SPY and QQQ prices from January 1, 2010, to December 31, 2023.
# Continues from the previous block (uses np, pd, sm, dates, df_spy and delta_t)
# Assume df_qqq is loaded with 'Close' prices
# For demonstration, create a synthetic QQQ series. Two independent random walks
# are not expected to have a mean-reverting spread; real data must be tested.
prices_qqq = 90 + np.cumsum(np.random.randn(len(dates)) * 0.6 + 0.07)
df_qqq = pd.DataFrame({'Close': prices_qqq}, index=dates)
# Align dataframes by date
df_combined = pd.DataFrame({'SPY': df_spy['Close'], 'QQQ': df_qqq['Close']}).dropna()
# Log ratio spread: log(SPY) - log(QQQ)
df_combined['Spread'] = np.log(df_combined['SPY'] / df_combined['QQQ'])
# Dependent variable: one-step change in the spread
df_combined['Spread_Diff'] = df_combined['Spread'].diff()
# Independent variable: previous value of the spread
df_combined['Spread_Lagged'] = df_combined['Spread'].shift(1)
df_combined.dropna(inplace=True)
# Define dependent and independent variables for the spread
Y_spread = df_combined['Spread_Diff']
X_spread = sm.add_constant(df_combined['Spread_Lagged'])
# Perform linear regression for the spread
model_spread = sm.OLS(Y_spread, X_spread)
results_spread = model_spread.fit()
print(results_spread.summary())
# Extract beta coefficient for the spread
beta_spread = results_spread.params['Spread_Lagged']
print(f"Estimated beta for spread: {beta_spread}")
# theta and the half-life are only defined when -1 < beta < 0
if -1 < beta_spread < 0:
    theta_spread = -np.log(1 + beta_spread) / delta_t
    print(f"Estimated theta for spread: {theta_spread}")
    half_life_spread = np.log(2) / theta_spread
    print(f"Estimated half-life for spread: {half_life_spread:.2f} trading days")
else:
    print("beta is not in (-1, 0): spread shows no mean reversion, half-life undefined")
After running this with actual data, we might see results like:
- Estimated beta for spread: -0.015
- Estimated theta for spread: 0.0151
- Estimated half-life for spread: 45.90 trading days
A half-life of 45.90 trading days for the SPY-QQQ log ratio spread would indicate much stronger mean reversion than the raw SPY price: a deviation from the spread's mean would be expected to shrink by half in roughly 46 trading days.
Interpreting Half-Life for Strategy Design
The half-life is an important input for strategy design. With a half-life of 46 trading days, a mean reversion trade on the SPY-QQQ spread would typically target an exit within about that timeframe. Because the expected deviation decays exponentially, half of it is expected to disappear after one half-life and three-quarters after two; holding much longer adds little expected reversion while the position keeps accruing risk and costs.
Traders use this information to:
- Set profit targets: If a trade opens when the spread is 2 standard deviations from its mean and the half-life is 46 days, the trader might target an exit when the spread returns to 1 standard deviation, which is where the expected deviation stands after one half-life.
- Manage risk: If a trade has not reverted after 2-3 half-lives, the mean reversion assumption may be invalid or the market regime may have changed. This can trigger a stop-loss.
- Optimize rebalancing frequency: For portfolio-level mean reversion, assets with shorter half-lives might require more frequent rebalancing.
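A minimal sketch of how the half-life could feed the first two rules, assuming the spread has already been converted to a z-score (deviation from its mean divided by its standard deviation). The thresholds of 2 and 1 and the three-half-life time stop mirror the examples above; the function `half_life_positions` is hypothetical and ignores transaction costs, position sizing, and look-ahead bias in how the z-score is computed.

```python
import numpy as np


def half_life_positions(z, half_life, entry_z=2.0, exit_z=1.0, max_half_lives=3.0):
    """Hypothetical rule: short the spread when z > entry_z, long when z < -entry_z;
    close when |z| <= exit_z (profit target) or after max_half_lives * half_life
    bars (time stop). Returns one position in {-1, 0, +1} per bar.
    """
    z = np.asarray(z, dtype=float)
    max_hold = int(np.ceil(max_half_lives * half_life))
    positions = np.zeros(len(z), dtype=int)
    pos, held = 0, 0
    for t, zt in enumerate(z):
        if pos == 0:
            if zt > entry_z:
                pos, held = -1, 0        # spread rich: expect it to fall
            elif zt < -entry_z:
                pos, held = 1, 0         # spread cheap: expect it to rise
        else:
            held += 1
            if abs(zt) <= exit_z or held >= max_hold:
                pos = 0                  # target reached or time stop hit
        positions[t] = pos
    return positions


print(half_life_positions([0.0, 2.5, 2.0, 1.5, 0.8, 0.0], half_life=46))
```

The time stop is counted in bars, so it inherits the units of the half-life (trading days for daily data).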
The half-life calculation assumes the series is stationary, as a mean-reverting OU process is. The SPY-QQQ spread is more likely to be stationary than the individual SPY price. Always test for stationarity, for example with the Augmented Dickey-Fuller (ADF) test, before applying mean reversion models: a half-life estimated from a non-stationary series is meaningless.
This estimation provides a statistical basis for trade duration. Traders can adjust this based on market conditions, volatility, and specific strategy rules.
