The 3 AM Realization
At 3:14 AM on a Tuesday, you find it. The paper that promises everything: a novel alpha factor with a Sharpe ratio of 2.3, published in a peer-reviewed journal, backed by five years of out-of-sample testing. You read it twice. Then a third time. The mathematics are sound. The intuition is elegant. The results seem... achievable.
You spend the next six weeks trying to reproduce it. You request the same dataset. You implement the exact formulas from the paper. You run the backtest. The Sharpe ratio comes back as 0.4.
What went wrong?
This scenario plays out in quant teams around the world every week. Academic papers are written to communicate ideas, not to provide production-ready implementations. The gap between "readable math" and "runnable code" is filled with implicit knowledge, data selection decisions, transaction cost assumptions, and implementation choices that authors rarely document.
This article provides a systematic framework for bridging that gap. We will walk through the complete pipeline: how to read a quant paper critically, how to acquire the data you actually need versus the data the authors claim to use, how to structure your backtest to isolate the strategy's true signal from overfitting artifacts, and how to diagnose why your results diverge from the original.
Along the way, we will demonstrate each step with production-grade Python code using TickDB's API for data acquisition, because a reproducibility pipeline is only as reliable as its data infrastructure.
Module 1: Reading Papers as Engineers
Most quant researchers read papers to understand the strategy. Reproducing a paper requires reading it to identify implementation requirements. These are different tasks, and they demand different reading strategies.
1.1 The First Pass: Identify the Signal Architecture
On your first read, your goal is to understand the strategy's core logic. Ask these questions:
- What is the input data? (Price? Volume? Order book? Macroeconomic?)
- What is the transformation? (Ranking? Normalization? Signal construction?)
- What is the output? (Alpha score? Ranking? Directional forecast?)
Do not get bogged down in the mathematical proofs. Those are important for understanding why the strategy works, but they are not what prevents you from reproducing the results.
1.2 The Second Pass: Catalog Every Data Dependency
This is where most reproducibility efforts fail. Authors describe their data in broad strokes ("we use daily US equity data from 2000 to 2020") but omit critical specifics. Your second pass must extract every data dependency with precision.
Create a data dependency table. For each data series the paper uses, document:
| Data field | Explicit in paper? | Assumptions needed |
|---|---|---|
| Adjusted close vs. unadjusted close | Explicit in some papers | Often implied rather than stated; confirm which series the returns are built from |
| Split adjustments | Usually not mentioned | CRSP-adjusted vs. raw |
| Dividend adjustments | Varies widely | Price returns vs. total returns |
| Survivorship bias | Almost never discussed | Critical for equity long-short |
| Market capitalization | Sometimes used for weighting | Float-adjusted vs. total shares |
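The table is easier to keep honest if it lives next to the code as structured data. The following is a minimal sketch, not part of any library: the class name `DataDependency`, its fields, and the example entries are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DataDependency:
    """One row of the data dependency table (hypothetical structure)."""
    field: str             # What the strategy needs
    stated_in_paper: bool  # True only if the paper says it explicitly
    assumption: str        # What you will assume until proven otherwise


def unresolved(deps: List[DataDependency]) -> List[str]:
    """Return the fields the paper leaves unstated, i.e. the assumptions to test."""
    return [d.field for d in deps if not d.stated_in_paper]


# Illustrative entries only; fill these in from the paper you are reproducing
deps = [
    DataDependency("close_adjustment", False, "split- and dividend-adjusted close"),
    DataDependency("universe", True, "daily US equity data, 2000 to 2020"),
    DataDependency("delisted_stocks", False, "included, point-in-time"),
]

print(unresolved(deps))
```

Every field returned by `unresolved` is a candidate explanation when your results later diverge from the paper's.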
1.3 The Third Pass: Reverse-Engineer the Backtest Design
Academic papers optimize for readability, not for reproducible engineering. You need to reconstruct the backtest design from fragments. Key questions:
Universe definition: Which stocks? How many? What exclusion rules (financials, utilities, ADRs)?
Rebalancing frequency: Daily? Weekly? Monthly? And at what time of day?
Transaction cost model: Fixed commission? Percentage of notional? Spread cost? Many papers assume a flat placeholder such as 0.1% one-way, and once that figure is multiplied by the strategy's turnover it can turn a profitable strategy into a breakeven one.
Long/short construction: Equal weight? Value-weighted? Factor-neutral? The portfolio construction can account for as much of the reported return as the alpha signal itself.
Risk management: Are stops used? Position limits? Leverage constraints?
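To see why the cost assumption matters, it helps to turn it into an annual number. The sketch below is a back-of-the-envelope calculation under stated assumptions: costs are a flat fraction of traded notional, and turnover is the traded notional (buys plus sells) divided by capital at each rebalance. The function name and the example inputs are illustrative.

```python
def annual_cost_drag(
    one_way_cost: float,
    turnover_per_rebalance: float,
    rebalances_per_year: int,
) -> float:
    """
    Annual return lost to trading costs, as a fraction of capital.

    one_way_cost: cost per unit of traded notional (0.001 = 0.1%)
    turnover_per_rebalance: traded notional / capital at each rebalance
    rebalances_per_year: 12 for monthly, 52 for weekly, and so on
    """
    return one_way_cost * turnover_per_rebalance * rebalances_per_year


# A monthly strategy that trades 100% of capital at each rebalance, at 0.1% one-way
drag = annual_cost_drag(0.001, 1.0, 12)
print(f"Annual cost drag: {drag:.2%}")
```

Under these assumptions the strategy gives up about 1.2% per year to costs, and the same strategy rebalanced weekly would give up more than four times that.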
Module 2: Data Acquisition Infrastructure
With your data dependency table in hand, you can now build the data acquisition layer. This is where TickDB's API becomes essential: it provides 10+ years of cleaned, aligned US equity OHLCV data suitable for cross-cycle backtesting.
2.1 Setting Up the Data Pipeline
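Production-grade data acquisition requires retry logic with backoff, rate-limit handling, request timeouts, and environment-variable-based authentication. (Heartbeats matter for streaming connections; this client makes plain request/response calls, so it does not need one.) Here is the foundational client: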
import os
import time
import random
import logging
from typing import Optional
import requests
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
class TickDBClient:
    """
    Production-grade TickDB API client with:
    - Exponential backoff + jitter on reconnection
    - Rate-limit handling (code 3001 + Retry-After)
    - Environment-variable-based authentication
    - Timeout on all HTTP requests
    """

    def __init__(self, api_key: Optional[str] = None):
        self.api_key = api_key or os.environ.get("TICKDB_API_KEY")
        if not self.api_key:
            raise ValueError(
                "TickDB API key not found. Set TICKDB_API_KEY environment variable."
            )
        self.base_url = "https://api.tickdb.ai/v1"
        self.max_retries = 5
        self.base_delay = 1.0
        self.max_delay = 32.0
    def _request_with_retry(self, method: str, endpoint: str, **kwargs) -> dict:
        """
        Execute HTTP request with exponential backoff, jitter, and rate-limit handling.
        """
        timeout = kwargs.pop("timeout", (3.05, 10))
        retry_count = 0
        while retry_count <= self.max_retries:
            try:
                response = requests.request(
                    method=method,
                    url=f"{self.base_url}{endpoint}",
                    headers={
                        "X-API-Key": self.api_key,
                        "Content-Type": "application/json",
                    },
                    timeout=timeout,
                    **kwargs,
                )
                data = response.json() if response.content else {}
                # Handle rate limiting. Count it as an attempt so that a
                # persistent 3001 response cannot loop forever.
                if data.get("code") == 3001:
                    retry_count += 1
                    retry_after = int(response.headers.get("Retry-After", 5))
                    logger.warning(f"Rate limited. Retrying after {retry_after}s.")
                    time.sleep(retry_after)
                    continue
                # Handle authentication errors (not retried)
                if data.get("code") in (1001, 1002):
                    raise ValueError(
                        f"Authentication error ({data.get('code')}): "
                        "Invalid API key. Check the TICKDB_API_KEY env var."
                    )
                # Handle symbol not found (not retried)
                if data.get("code") == 2002:
                    raise KeyError(
                        "Symbol not found. Verify via /v1/symbols/available."
                    )
                if data.get("code") == 0:
                    return data.get("data", {})
                else:
                    raise RuntimeError(
                        f"API error {data.get('code')}: {data.get('message')}"
                    )
            except requests.exceptions.Timeout:
                retry_count += 1
                delay = min(self.base_delay * (2 ** retry_count), self.max_delay)
                jitter = random.uniform(0, delay * 0.1)
                logger.warning(
                    f"Request timeout. Retrying in {delay + jitter:.2f}s "
                    f"(attempt {retry_count}/{self.max_retries})."
                )
                time.sleep(delay + jitter)
            except requests.exceptions.RequestException as e:
                retry_count += 1
                delay = min(self.base_delay * (2 ** retry_count), self.max_delay)
                jitter = random.uniform(0, delay * 0.1)
                logger.warning(
                    f"Request failed: {e}. Retrying in {delay + jitter:.2f}s "
                    f"(attempt {retry_count}/{self.max_retries})."
                )
                time.sleep(delay + jitter)
        raise RuntimeError(
            f"Max retries ({self.max_retries}) exceeded for {endpoint}."
        )
    def get_kline(
        self,
        symbol: str,
        interval: str = "1d",
        limit: int = 500,
        start_time: Optional[int] = None,
        end_time: Optional[int] = None,
    ) -> list:
        """
        Fetch historical OHLCV (kline) data for backtesting.
        Args:
            symbol: Trading symbol (e.g., "AAPL.US", "NVDA.US")
            interval: Candle interval ("1d", "1h", "5m", etc.)
            limit: Number of candles to fetch (max 1000 per request)
            start_time: Unix timestamp (ms) for range start
            end_time: Unix timestamp (ms) for range end
        Returns:
            List of OHLCV candles, sorted oldest to newest
        """
        params = {"symbol": symbol, "interval": interval, "limit": limit}
        # Compare against None so that a timestamp of 0 is still sent
        if start_time is not None:
            params["start_time"] = start_time
        if end_time is not None:
            params["end_time"] = end_time
        return self._request_with_retry("GET", "/market/kline", params=params)

    def get_symbols(self, market: str = "US") -> list:
        """
        List available symbols for a given market.
        Args:
            market: Market code ("US", "HK", "CRYPTO", etc.)
        Returns:
            List of available trading symbols
        """
        return self._request_with_retry(
            "GET", "/symbols/available", params={"market": market}
        )
# ⚠️ Engineering warning:
# For high-frequency live trading systems, replace this synchronous client
# with an async implementation using aiohttp.
# The above is designed for backtesting and lower-frequency strategies.
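It is worth knowing how long the client can block before it gives up. The sketch below reproduces the delay formula used in the retry handlers above as a standalone function, so the schedule can be inspected without making any requests. The function name `backoff_delays` and the `seed` parameter are illustrative additions.

```python
import random
from typing import List, Optional


def backoff_delays(
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 32.0,
    jitter_frac: float = 0.1,
    seed: Optional[int] = None,
) -> List[float]:
    """
    Sleep durations for attempts 1..max_retries.

    Each delay doubles (base_delay * 2**attempt), is capped at max_delay,
    and gets up to jitter_frac of itself added as random jitter.
    """
    rng = random.Random(seed)
    delays = []
    for attempt in range(1, max_retries + 1):
        delay = min(base_delay * (2 ** attempt), max_delay)
        delays.append(delay + rng.uniform(0, delay * jitter_frac))
    return delays


# Schedule without jitter, to show the deterministic part
schedule = backoff_delays(jitter_frac=0.0)
print(schedule, "total:", sum(schedule))
```

With the defaults, the deterministic part of the schedule adds up to about a minute of sleeping per failed request. For a universe of several hundred symbols, that is a reason to fail fast on errors that retries cannot fix, such as authentication failures.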
2.2 Fetching the Backtest Universe
With the client in place, you can now construct your universe data. For a typical US equity strategy, you need to fetch a broad universe of stocks. The following function builds a daily OHLCV DataFrame for a list of symbols over a given date range:
import pandas as pd
from tqdm import tqdm

def fetch_universe_ohlcv(
    client: TickDBClient,
    symbols: list[str],
    start_date: str,
    end_date: str,
    interval: str = "1d",
) -> pd.DataFrame:
    """
    Fetch OHLCV data for a universe of symbols.
    Note: a single request returns at most 1000 candles (about four years of
    daily bars), so longer date ranges need paginated requests.
    Args:
        client: TickDBClient instance
        symbols: List of tickers (e.g., ["AAPL.US", "MSFT.US"])
        start_date: Start date string (YYYY-MM-DD)
        end_date: End date string (YYYY-MM-DD)
        interval: Candle interval
    Returns:
        DataFrame with columns: timestamp, symbol, open, high, low, close, volume
    """
    # Interpret dates as UTC midnight so results do not depend on the local timezone
    start_ts = int(pd.Timestamp(start_date, tz="UTC").timestamp() * 1000)
    end_ts = int(pd.Timestamp(end_date, tz="UTC").timestamp() * 1000)
    all_candles = []
    for symbol in tqdm(symbols, desc="Fetching OHLCV data"):
        try:
            candles = client.get_kline(
                symbol=symbol,
                interval=interval,
                limit=1000,
                start_time=start_ts,
                end_time=end_ts,
            )
            if not candles:
                continue
            df = pd.DataFrame(candles)
            df["symbol"] = symbol
            df["timestamp"] = pd.to_datetime(df["t"], unit="ms")
            df = df.rename(columns={
                "o": "open",
                "h": "high",
                "l": "low",
                "c": "close",
                "v": "volume",
            })
            all_candles.append(
                df[["timestamp", "symbol", "open", "high", "low", "close", "volume"]]
            )
        except Exception as e:
            # Skip symbols that fail so one bad ticker does not abort the run
            logger.warning(f"Failed to fetch {symbol}: {e}")
            continue
    if not all_candles:
        return pd.DataFrame()
    combined = pd.concat(all_candles, ignore_index=True)
    combined = combined.sort_values(["symbol", "timestamp"]).reset_index(drop=True)
    logger.info(
        f"Fetched {len(combined):,} candles for {combined['symbol'].nunique()} symbols "
        f"from {start_date} to {end_date}."
    )
    return combined
# Usage example:
# client = TickDBClient()
# universe_data = fetch_universe_ohlcv(
#     client=client,
#     symbols=["AAPL.US", "MSFT.US", "GOOGL.US", "AMZN.US", "META.US"],
#     start_date="2015-01-01",
#     end_date="2024-12-31",
# )
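The usage example asks for ten years of daily bars, which is roughly 2,500 candles per symbol, while each request is capped at 1,000. One way to cover the full range is to page through it. The sketch below is written against a generic `fetch_page` callable rather than the real API, and it assumes that pages come back oldest first, starting at the requested start time, with the timestamp under the key `"t"`. Those assumptions should be checked against the actual endpoint behavior before relying on this.

```python
from typing import Callable, Dict, List

DAY_MS = 86_400_000


def fetch_all_candles(
    fetch_page: Callable[[int, int, int], List[Dict]],
    start_ms: int,
    end_ms: int,
    page_size: int = 1000,
) -> List[Dict]:
    """Collect every candle in [start_ms, end_ms] by requesting page after page."""
    candles: List[Dict] = []
    cursor = start_ms
    while cursor <= end_ms:
        page = fetch_page(cursor, end_ms, page_size)
        if not page:
            break
        candles.extend(page)
        if len(page) < page_size:
            break  # A short page means the range is exhausted
        cursor = page[-1]["t"] + 1  # Resume just after the last candle received
    return candles


def fake_page(start_ms: int, end_ms: int, limit: int) -> List[Dict]:
    """Stand-in for the API: one candle per day, oldest first, at most `limit`."""
    first_day = -(-start_ms // DAY_MS)  # Ceiling division
    last_day = end_ms // DAY_MS
    return [{"t": d * DAY_MS} for d in range(first_day, last_day + 1)][:limit]


bars = fetch_all_candles(fake_page, 0, 2499 * DAY_MS)
print(len(bars))
```

If the assumptions hold, the real client could be plugged in with something like `lambda s, e, n: client.get_kline("AAPL.US", "1d", n, s, e)`.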
Module 3: Signal Construction and Factor Implementation
With clean data in hand, you can now implement the strategy itself. This is where precision matters most: a single normalization step, one missing adjustment, or a data alignment error will corrupt your entire backtest.
3.1 Reconstructing the Factor from the Paper
Academic papers typically describe factors using mathematical notation that maps directly to pandas operations. The challenge is translating that notation into code that handles edge cases gracefully.
Consider a paper that describes a momentum factor as follows: "We rank stocks by their 12-month return, skipping the most recent month, and go long the top decile and short the bottom decile, rebalancing monthly."
This description requires several implementation decisions that the paper leaves implicit:
import numpy as np

def compute_momentum_factor(
    prices: pd.DataFrame,
    lookback_months: int = 12,
    skip_months: int = 1,
    min_lookback_days: int = 200,
) -> pd.DataFrame:
    """
    Compute the classic momentum factor: 12-month return, skipping the last month.
    The paper specifies: rank stocks by their cumulative return from 12 months
    ago up to 1 month ago, so the most recent month is excluded.
    Implementation decisions (not specified in the paper):
    - We use trading days as a proxy for months (252 trading days/year)
    - We require at least min_lookback_days of returns in the formation window
    - Returns are computed as log returns for continuous compounding
    - NaN values are excluded from ranking (stocks without sufficient history)
    Args:
        prices: DataFrame with columns [timestamp, symbol, close]
        lookback_months: Number of months for momentum lookback (converted to trading days)
        skip_months: Number of months to skip at the end (momentum skip period)
        min_lookback_days: Minimum trading days required for a valid factor value
    Returns:
        DataFrame with columns [timestamp, symbol, momentum_factor]
    """
    df = prices.sort_values(["symbol", "timestamp"]).copy()
    # Convert month specifications to trading day approximations
    lookback_days = int(lookback_months * (252 / 12))
    skip_days = int(skip_months * (252 / 12))
    window = lookback_days - skip_days
    min_periods = min(min_lookback_days, window)
    # Compute log returns
    df["log_return"] = df.groupby("symbol")["close"].transform(
        lambda x: np.log(x / x.shift(1))
    )
    # Sum log returns over the formation window (lookback minus skip), then
    # shift by skip_days so that the most recent month is excluded
    df["momentum_raw"] = df.groupby("symbol")["log_return"].transform(
        lambda x: x.rolling(window=window, min_periods=min_periods)
        .sum()
        .shift(skip_days)
    )
    # Drop rows without enough history to produce a factor value
    df = df.dropna(subset=["momentum_raw"]).copy()
    # Cross-sectional percentile rank within each timestamp
    # 1.0 = best momentum (highest return)
    df["momentum_factor"] = df.groupby("timestamp")["momentum_raw"].rank(
        pct=True, ascending=True
    )
    return df[["timestamp", "symbol", "momentum_factor"]].reset_index(drop=True)
def construct_long_short_portfolios(
    factor_df: pd.DataFrame,
    top_pct: float = 0.1,
    bottom_pct: float = 0.1,
    factor_col: str = "factor_value",
) -> pd.DataFrame:
    """
    Construct long-short portfolios from a factor ranking.
    The paper specifies: go long the top decile, short the bottom decile.
    This implementation generalizes to any top/bottom percentile.
    Args:
        factor_df: DataFrame with columns [timestamp, symbol, <factor_col>]
        top_pct: Fraction of universe to go long (e.g., 0.1 = top decile)
        bottom_pct: Fraction of universe to go short (e.g., 0.1 = bottom decile)
        factor_col: Name of the factor column (pass "momentum_factor" for the
            output of compute_momentum_factor)
    Returns:
        DataFrame with columns [timestamp, symbol, <factor_col>, position]
        where position = 1 (long), -1 (short), or 0 (no position)
    """
    # Stocks without a factor value cannot be ranked, so they get no position
    df = factor_df.dropna(subset=[factor_col]).copy()
    by_date = df.groupby("timestamp")[factor_col]
    n = by_date.transform("count")
    top_n = np.ceil(n * top_pct)
    bottom_n = np.ceil(n * bottom_pct)
    # Rank within each date; method="first" breaks ties deterministically
    rank_desc = by_date.rank(method="first", ascending=False)  # 1 = highest factor
    rank_asc = by_date.rank(method="first", ascending=True)    # 1 = lowest factor
    # Skip dates where the long and short legs would overlap (tiny universes)
    enough = (top_n + bottom_n) <= n
    df["position"] = 0
    df.loc[enough & (rank_desc <= top_n), "position"] = 1
    df.loc[enough & (rank_asc <= bottom_n), "position"] = -1
    return df.reset_index(drop=True)
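The interaction between the rolling window and the skip period is easy to get wrong by a month. A quick sanity check is to confirm on synthetic data that the rolling sum of log returns, shifted by the skip period, equals the log price ratio between the two endpoints it is supposed to span. The sketch below uses a random price path and the same 231-day window and 21-day skip that the default arguments above imply; the seed and series length are arbitrary.

```python
import numpy as np
import pandas as pd

# Synthetic price path: 400 daily closes from a random walk in log space
rng = np.random.default_rng(0)
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 400))))
log_ret = np.log(close / close.shift(1))

window, skip = 231, 21  # 11 months of returns, skipping the most recent month

# Same construction as the factor code: rolling sum, then shift by the skip
momentum = log_ret.rolling(window).sum().shift(skip)

# Direct computation for the last date: price 21 days ago vs. price 252 days ago
t = len(close) - 1
direct = np.log(close.iloc[t - skip] / close.iloc[t - skip - window])

print(momentum.iloc[t], direct)
print("first date with a factor value:", momentum.first_valid_index())
```

The two numbers should agree to floating-point precision, and the first valid index shows that a stock needs 252 prior bars before it can enter the ranking when the full window is required.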
3.2 Handling the Data Adjustments the Paper Forgot
This is where your results will most likely diverge from the paper's reported performance. Three common issues:
Survivorship bias: The paper's universe likely consisted only of stocks that survived to the end of the sample period. Your backtest must include delisted stocks (or at minimum, account for survivorship by using a point-in-time universe).
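Adjustment methodology: Total returns (adjusted for splits and dividends, as in CRSP-adjusted series) differ from unadjusted price returns by roughly the dividend yield, on the order of 1–3% annually on average. Using the wrong adjustment will shift your factor returns by a meaningful amount.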
Data cleaning: Academic datasets often exclude penny stocks, stocks below a price threshold, or stocks with insufficient liquidity. If the paper does not specify these filters, you need to decide whether to apply them conservatively (fewer exclusions, closer to reality for a live strategy) or match the paper exactly (more exclusions, for accurate comparison).
def apply_universe_filters(
    prices: pd.DataFrame,
    min_price: float = 5.0,
    min_volume: float = 100_000,
    min_volume_days_pct: float = 0.8,
) -> pd.DataFrame:
    """
    Apply conservative liquidity and price filters.
    These filters are not specified in most academic papers.
    They represent common-sense risk management for live deployment.
    Args:
        prices: DataFrame with columns [timestamp, symbol, close, volume]
        min_price: Minimum average price over the lookback window
        min_volume: Minimum average daily volume
        min_volume_days_pct: Fraction of days that must meet the min_volume threshold
    Returns:
        Filtered DataFrame
    """
    # Rolling windows assume rows are in time order within each symbol
    df = prices.sort_values(["symbol", "timestamp"]).copy()
    # Compute trailing 30-day average price and volume per symbol
    df["avg_price_30d"] = df.groupby("symbol")["close"].transform(
        lambda x: x.rolling(window=30, min_periods=20).mean()
    )
    df["avg_volume_30d"] = df.groupby("symbol")["volume"].transform(
        lambda x: x.rolling(window=30, min_periods=20).mean()
    )
    # Compute fraction of days above volume threshold over the trailing 60 days.
    # The flag is built first and then averaged; shift(1) excludes the current day.
    above_threshold = (df["volume"] > min_volume).astype(float)
    df["volume_compliance"] = above_threshold.groupby(df["symbol"]).transform(
        lambda x: x.shift(1).rolling(window=60, min_periods=30).mean()
    )
    # Apply filters (rows with NaN in any metric fail the comparison and are dropped)
    mask = (
        (df["avg_price_30d"] >= min_price) &
        (df["avg_volume_30d"] >= min_volume) &
        (df["volume_compliance"] >= min_volume_days_pct)
    )
    filtered = df[mask].copy()
    logger.info(
        f"Universe filter: {df['symbol'].nunique()} symbols before → "
        f"{filtered['symbol'].nunique()} symbols after. "
        f"Rows removed: {len(df) - len(filtered):,} "
        f"({100 * (1 - len(filtered) / max(len(df), 1)):.1f}%)."
    )
    return filtered
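The adjustment issue described above is easiest to appreciate with a small calculation. The sketch below compares a price-only return with a total return for a made-up four-day price path that pays one dividend. The numbers are invented for illustration, and the function assumes the dividend is paid in cash on the date listed and is reinvested at that day's close.

```python
from typing import List, Tuple


def price_and_total_return(
    prices: List[float], dividends: List[float]
) -> Tuple[float, float]:
    """
    Return (price_return, total_return) over the whole path.

    dividends[i] is the cash paid per share on date i; the first entry is
    ignored because no return is measured into the first date.
    """
    price_return = prices[-1] / prices[0] - 1
    growth = 1.0
    for prev, cur, div in zip(prices[:-1], prices[1:], dividends[1:]):
        growth *= (cur + div) / prev  # Dividend counts toward that day's return
    return price_return, growth - 1


prices = [100.0, 102.0, 101.0, 105.0]
dividends = [0.0, 0.0, 1.0, 0.0]  # The drop from 102 to 101 is the ex-dividend day

price_ret, total_ret = price_and_total_return(prices, dividends)
print(f"price-only: {price_ret:.4f}, total: {total_ret:.4f}")
```

On an unadjusted series the ex-dividend drop looks like a loss. A momentum ranking built on price-only returns will therefore systematically penalize dividend payers relative to one built on total returns.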
Module 4: Backtesting Engine with Transaction Cost Modeling
A backtest without realistic transaction costs is not a strategy evaluation. It is a fantasy. Your backtest engine must model the costs that exist in live trading, not the idealized costs that make the paper look good.
4.1 The Transaction Cost Model
Transaction costs in equity markets consist of three components:
| Component | Typical magnitude | How to model |
|---|---|---|
| Commission | About $0.005 per share | Fixed per-share charge |
| Spread cost | 0.5–2 bps for liquid stocks | Half-spread × traded notional |
| Market impact | Non-linear, depends on order size | Assumed proportional for small orders |
For a liquid US equity strategy trading the top/bottom decile of a 500-stock universe, a reasonable cost model is:
class TransactionCostModel:
    """
    Three-component transaction cost model.
    Model components:
    1. Commission: $0.005 per share (e.g., Interactive Brokers tier)
    2. Half-spread: 0.5 bps for liquid stocks (configurable)
    3. Market impact: 0.25 bps (conservative for decile-portfolio turnover)
    With these defaults, total one-way cost = 0.75 bps + (50 / price) bps,
    e.g. about 1.25 bps for a $100 stock. The per-share commission makes
    low-priced stocks more expensive to trade in basis-point terms.
    """

    def __init__(
        self,
        commission_per_share: float = 0.005,
        half_spread_bps: float = 0.5,
        market_impact_bps: float = 0.25,
    ):
        self.commission_per_share = commission_per_share
        self.half_spread_bps = half_spread_bps
        self.market_impact_bps = market_impact_bps

    def one_way_cost_bps(self, price: float, shares: int) -> float:
        """
        Compute one-way transaction cost in basis points.
        Args:
            price: Execution price per share
            shares: Number of shares traded
        Returns:
            One-way cost in basis points (0.0 if nothing is traded)
        """
        notional = price * shares
        if notional <= 0:
            return 0.0  # Nothing traded, so avoid dividing by zero
        commission_cost = shares * self.commission_per_share
        spread_cost = notional * (self.half_spread_bps / 10_000)
        impact_cost = notional * (self.market_impact_bps / 10_000)
        total_cost = commission_cost + spread_cost + impact_cost
        cost_bps = (total_cost / notional) * 10_000
        return cost_bps

    def round_trip_cost_bps(self, price: float, shares: int) -> float:
        """Round-trip cost (entry + exit) in basis points."""
        return 2 * self.one_way_cost_bps(price, shares)
def run_backtest(
    prices: pd.DataFrame,
    positions: pd.DataFrame,
    tcm: TransactionCostModel,
    initial_capital: float = 10_000_000,
) -> pd.DataFrame:
    """
    Run a backtest with transaction costs and portfolio-level P&L.
    Positions are treated as targets set at the close of each timestamp. They
    are carried forward until the next target for that symbol and earn the
    following day's return, which avoids look-ahead bias.
    Args:
        prices: DataFrame with columns [timestamp, symbol, close, volume]
        positions: DataFrame with columns [timestamp, symbol, position]
                   (position = 1 for long, -1 for short, 0 for no position)
        tcm: TransactionCostModel instance
        initial_capital: Starting portfolio value
    Returns:
        DataFrame with daily portfolio returns and cumulative performance
    """
    gross_exposure = 0.9  # Fraction of capital deployed across all positions
    # Compute daily returns per symbol
    prices_sorted = prices.sort_values(["symbol", "timestamp"]).copy()
    prices_sorted["daily_return"] = prices_sorted.groupby("symbol")["close"].pct_change()
    # Merge target positions onto the price grid
    merged = pd.merge(
        prices_sorted[["timestamp", "symbol", "close", "daily_return"]],
        positions[["timestamp", "symbol", "position"]],
        on=["timestamp", "symbol"],
        how="left",
    ).sort_values(["symbol", "timestamp"])
    # Carry each target forward between rebalance dates; flat before the first target
    merged["position"] = merged.groupby("symbol")["position"].ffill().fillna(0)
    # Equal-weight the active positions so that absolute weights sum to gross_exposure
    merged["n_positions"] = merged.groupby("timestamp")["position"].transform(
        lambda x: (x != 0).sum()
    )
    merged["weight"] = (
        gross_exposure * merged["position"] / merged["n_positions"].replace(0, 1)
    )
    # The weight set at the close of day t earns the return of day t+1
    merged["held_weight"] = merged.groupby("symbol")["weight"].shift(1).fillna(0)
    merged["trade_weight"] = (merged["weight"] - merged["held_weight"]).abs()
    # Shares traded, needed to express the per-share commission in basis points
    merged["shares"] = (
        (initial_capital * merged["trade_weight"] / merged["close"])
        .fillna(0)
        .astype(int)
    )
    # One-way cost per trade; a round trip is charged as two separate trades
    merged["trade_cost_bps"] = merged.apply(
        lambda row: tcm.one_way_cost_bps(row["close"], row["shares"])
        if row["shares"] > 0 else 0.0,
        axis=1,
    )
    # Daily P&L: return from price movement + transaction cost drag
    merged["daily_pnl_pct"] = merged["held_weight"] * merged["daily_return"]
    merged["cost_drag_pct"] = -merged["trade_weight"] * merged["trade_cost_bps"] / 10_000
    # Aggregate to portfolio level
    portfolio = merged.groupby("timestamp").agg(
        gross_return_pct=("daily_pnl_pct", "sum"),
        cost_drag_pct=("cost_drag_pct", "sum"),
        n_positions=("n_positions", "first"),
    ).reset_index()
    portfolio["net_return_pct"] = portfolio["gross_return_pct"] + portfolio["cost_drag_pct"]
    portfolio["cumulative_return"] = (1 + portfolio["net_return_pct"]).cumprod() - 1
    portfolio["portfolio_value"] = initial_capital * (1 + portfolio["cumulative_return"])
    return portfolio
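A per-share commission does not translate into a fixed number of basis points: it depends on the share price. The sketch below restates the three-component cost formula as a standalone function so the price dependence can be checked by hand. The function name is illustrative, and the default values simply mirror the ones used in the model above.

```python
def one_way_cost_in_bps(
    price: float,
    commission_per_share: float = 0.005,
    half_spread_bps: float = 0.5,
    market_impact_bps: float = 0.25,
) -> float:
    """One-way cost in basis points of notional for a stock at `price`."""
    # A per-share dollar commission becomes (commission / price) of notional
    commission_bps = commission_per_share / price * 10_000
    return commission_bps + half_spread_bps + market_impact_bps


for price in (200.0, 100.0, 10.0):
    print(f"${price:>6.2f} stock: {one_way_cost_in_bps(price):.2f} bps one-way")
```

Under these assumptions a $10 stock costs several times as much to trade as a $200 stock in relative terms. This is one reason a minimum-price filter can change a backtest's net performance more than its gross performance.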
4.2 Walk-Forward Validation
Academic papers typically report in-sample results. A production-quality backtest requires out-of-sample validation. The walk-forward approach trains on a rolling window and tests on the subsequent period:
def walk_forward_backtest(
    prices: pd.DataFrame,
    factor_func: callable,
    train_months: int = 36,
    test_months: int = 12,
    rebalance_days: int = 21,
) -> pd.DataFrame:
    """
    Walk-forward backtest with rolling train/test windows.
    Args:
        prices: DataFrame with columns [timestamp, symbol, close, volume]
        factor_func: Function that takes a history DataFrame and returns a
            factor DataFrame with columns [timestamp, symbol, factor_value]
        train_months: Training window in months (trading days = months * 21)
        test_months: Testing window in months
        rebalance_days: Rebalance frequency within the test window
    Returns:
        DataFrame with out-of-sample performance metrics
    """
    train_days = train_months * 21
    test_days = test_months * 21
    step_days = test_days  # Non-overlapping test windows
    all_results = []
    tcm = TransactionCostModel()
    # Get sorted timestamps
    timestamps = sorted(prices["timestamp"].unique())
    start_idx = train_days
    while start_idx + test_days <= len(timestamps):
        test_end = start_idx + test_days
        test_data = prices[
            prices["timestamp"].isin(timestamps[start_idx:test_end])
        ].copy()
        position_frames = []
        for reb_idx in range(start_idx, test_end, rebalance_days):
            # Only data strictly before the rebalance date is visible to the factor
            history = prices[
                (prices["timestamp"] >= timestamps[reb_idx - train_days]) &
                (prices["timestamp"] < timestamps[reb_idx])
            ]
            factor_df = factor_func(history)
            if factor_df.empty:
                continue
            # Use the most recent factor values available at the rebalance date
            latest = factor_df[factor_df["timestamp"] == factor_df["timestamp"].max()]
            targets = construct_long_short_portfolios(latest)[["symbol", "position"]]
            # Hold these targets on every date until the next rebalance
            for ts in timestamps[reb_idx:min(reb_idx + rebalance_days, test_end)]:
                position_frames.append(targets.assign(timestamp=ts))
        if position_frames:
            test_positions = pd.concat(position_frames, ignore_index=True)
            result = run_backtest(test_data, test_positions, tcm)
            if len(result) > 0:
                result["train_end"] = timestamps[start_idx - 1]
                result["test_end"] = timestamps[test_end - 1]
                all_results.append(result)
        start_idx += step_days
    if not all_results:
        return pd.DataFrame()
    combined = pd.concat(all_results, ignore_index=True)
    return combined
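Before running a walk-forward test on real data, it can help to print the window boundaries and confirm that no test window overlaps its own training data. The sketch below generates the index triples for a rolling scheme with non-overlapping test windows; the function name and the ten-year example are illustrative.

```python
from typing import List, Tuple


def walk_forward_windows(
    n_days: int, train_days: int, test_days: int
) -> List[Tuple[int, int, int]]:
    """
    Return (train_start, test_start, test_end) index triples.

    Training covers [train_start, test_start) and testing covers
    [test_start, test_end); each step advances by one test window.
    """
    windows = []
    test_start = train_days
    while test_start + test_days <= n_days:
        windows.append((test_start - train_days, test_start, test_start + test_days))
        test_start += test_days
    return windows


# Ten years of daily bars, 36-month training window, 12-month test window
windows = walk_forward_windows(n_days=2520, train_days=36 * 21, test_days=12 * 21)
for train_start, test_start, test_end in windows:
    print(f"train [{train_start}, {test_start})  test [{test_start}, {test_end})")
```

Ten years of data with a three-year training window leaves only seven years of out-of-sample results, which is worth keeping in mind when comparing against a paper that reports a longer sample.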
Module 5: Results Comparison and Gap Analysis
Once you have your backtest results, you need to compare them to the paper's reported results and diagnose any discrepancies. This is both a debugging exercise and a learning exercise: the gaps often reveal hidden assumptions in the paper's design.
5.1 Standard Comparison Metrics
def compute_performance_metrics(returns: pd.Series) -> dict:
    """
    Compute a comprehensive set of performance metrics from daily returns.
    These metrics allow direct comparison with the paper's reported results.
    """
    n = len(returns)
    if n < 20:
        return {"error": "Insufficient data points"}
    # Assumes a zero risk-free rate; subtract the daily risk-free rate first
    # if the paper reports excess returns
    excess_returns = returns
    annualized_return = excess_returns.mean() * 252
    annualized_vol = excess_returns.std() * np.sqrt(252)
    sharpe = annualized_return / annualized_vol if annualized_vol > 0 else 0
    # Downside deviation (Sortino denominator): root-mean-square of the
    # negative returns, with non-negative days counted as zero
    downside = np.minimum(excess_returns, 0.0)
    downside_vol = np.sqrt((downside ** 2).mean()) * np.sqrt(252)
    sortino = annualized_return / downside_vol if downside_vol > 0 else 0
    # Max drawdown
    cumulative = (1 + excess_returns).cumprod()
    running_max = cumulative.expanding().max()
    drawdown = (cumulative - running_max) / running_max
    max_drawdown = drawdown.min()
    # Win rate
    wins = excess_returns[excess_returns > 0]
    losses = excess_returns[excess_returns < 0]
    win_rate = (excess_returns > 0).mean()
    # Profit factor
    gross_profits = wins.sum()
    gross_losses = abs(losses.sum())
    profit_factor = gross_profits / gross_losses if gross_losses > 0 else np.inf
    # Average win / average loss
    avg_win = wins.mean() if len(wins) > 0 else 0
    avg_loss = losses.mean() if len(losses) > 0 else 0
    return {
        "annualized_return": annualized_return,
        "annualized_volatility": annualized_vol,
        "sharpe_ratio": sharpe,
        "sortino_ratio": sortino,
        "max_drawdown": max_drawdown,
        "win_rate": win_rate,
        "profit_factor": profit_factor,
        "avg_win": avg_win,
        "avg_loss": avg_loss,
        "n_observations": n,
        "n_trading_days": n,
    }
def compare_with_paper(
    your_metrics: dict,
    paper_metrics: dict,
    tolerance_bps: dict = None,
) -> pd.DataFrame:
    """
    Compare your backtest results against the paper's reported metrics.
    Args:
        your_metrics: Dict of metrics from compute_performance_metrics()
        paper_metrics: Dict of reported metrics from the paper
        tolerance_bps: Allowed absolute difference per metric, in the metric's
            own units (despite the name, the values are not basis points)
    Returns:
        DataFrame showing comparison with pass/fail status
    """
    if tolerance_bps is None:
        # Default tolerance: 2% annualized return, 0.2 Sharpe, 5% max drawdown
        tolerance_bps = {
            "annualized_return": 0.02,  # 2% absolute difference
            "sharpe_ratio": 0.2,
            "max_drawdown": 0.05,
        }
    comparison = []
    for metric, your_value in your_metrics.items():
        if metric == "error":
            continue
        paper_value = paper_metrics.get(metric, None)
        if paper_value is None:
            continue  # The paper does not report this metric
        if isinstance(your_value, (int, float)) and isinstance(paper_value, (int, float)):
            diff = your_value - paper_value
            tolerance = tolerance_bps.get(metric, 0.1)
            passed = abs(diff) <= tolerance
            comparison.append({
                "metric": metric,
                "your_value": round(your_value, 4),
                "paper_value": round(paper_value, 4),
                "difference": round(diff, 4),
                "status": "✅ Pass" if passed else "❌ Gap",
            })
    return pd.DataFrame(comparison)
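One comparison pitfall is the sampling frequency behind the paper's Sharpe ratio. Many papers compute it from monthly returns, while the metrics above use daily returns, and the two need not agree. The sketch below is an illustration on simulated data, not a claim about any particular paper: it computes both conventions from the same return stream. The function name, the seed, and the return parameters are arbitrary assumptions.

```python
import numpy as np


def annualized_sharpe(returns, periods_per_year: int) -> float:
    """Mean over standard deviation, scaled by sqrt(periods_per_year)."""
    r = np.asarray(returns, dtype=float)
    vol = r.std(ddof=1)
    if vol == 0:
        return 0.0
    return float(r.mean() / vol * np.sqrt(periods_per_year))


# Ten years of simulated daily returns (arbitrary mean and volatility)
rng = np.random.default_rng(42)
daily = rng.normal(0.0004, 0.01, 2520)

# Compound each block of 21 trading days into one "monthly" return
monthly = (1 + daily).reshape(120, 21).prod(axis=1) - 1

sharpe_daily = annualized_sharpe(daily, 252)
sharpe_monthly = annualized_sharpe(monthly, 12)
print(f"daily convention: {sharpe_daily:.2f}, monthly convention: {sharpe_monthly:.2f}")
```

For independent returns the two figures are close, but autocorrelated returns (common in strategies holding illiquid names) can make them diverge. It is therefore worth computing whichever convention the paper states before calling a difference a gap.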
5.2 Diagnostic Checklist for Diverging Results
When your Sharpe ratio comes back as 0.4 instead of 2.3, systematically eliminate the following sources of divergence:
| Check | How to verify | Common fix |
|---|---|---|
| Data source mismatch | Compare your return distribution to the paper's | Use the same adjusted close data the paper references |
| Lookback window definition | Print the dates of the first and last factor signals | Check whether the paper counts calendar days or trading days |
| Survivorship bias | Compare your universe size to the paper's | Acquire delisted stock data or use a point-in-time survivorship-free dataset |
| Cost assumptions | Run the backtest with zero costs | If results still diverge, costs are not the cause |
| Long-short construction | Check whether your long and short legs are equally weighted | Many papers implicitly use a 50/50 long/short split; verify this |
| Exclusion filters | Count stocks excluded by your price/volume filters | Try running without filters to match the paper's implicit universe |
| Rebalancing timing | Check whether the paper rebalances at open, close, or some other time | Align your execution time with the paper's; this choice alone can shift annual returns by 2–4% |
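The checklist works best when each item is changed one at a time against a fixed baseline, so the contribution of each assumption can be read off directly. The sketch below shows one way to organize that. `ablate` is a hypothetical helper, and `toy_run` is a stand-in with invented numbers in place of a real backtest that returns a Sharpe ratio.

```python
from typing import Callable, Dict, List, Tuple


def ablate(
    run: Callable[..., float],
    baseline: Dict[str, object],
    variants: Dict[str, Dict[str, object]],
) -> List[Tuple[str, float, float]]:
    """
    Re-run `run` once per variant, overriding one part of the baseline config.

    Returns rows of (name, metric, change versus baseline).
    """
    base_value = run(**baseline)
    rows = [("baseline", base_value, 0.0)]
    for name, override in variants.items():
        value = run(**{**baseline, **override})
        rows.append((name, value, value - base_value))
    return rows


def toy_run(cost_bps: float, min_price: float) -> float:
    """Stand-in for a real backtest; the relationship here is made up."""
    return 1.0 - 0.05 * cost_bps + (0.2 if min_price == 0 else 0.0)


rows = ablate(
    toy_run,
    baseline={"cost_bps": 10, "min_price": 5},
    variants={
        "zero_cost": {"cost_bps": 0},
        "no_price_filter": {"min_price": 0},
    },
)
for name, value, delta in rows:
    print(f"{name:<16} sharpe={value:.2f}  change={delta:+.2f}")
```

In a real run, `toy_run` would be replaced by a function that executes the full pipeline with the given settings. The variant with the largest change is the first assumption to check against the paper.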
Module 6: Closing — The Honest Answer
There is a version of this story that ends with you building a profitable, live strategy. There is also a version that ends with you publishing a critical review of the paper's methodology. Both are valid outcomes. The difference lies not in your coding ability, but in your relationship with the data.
Reproducibility is not about matching a number. It is about understanding the system that produced the number. Every time you run a backtest that disagrees with the paper, you have an opportunity to learn something — about the strategy, about the data, about your own assumptions.
The pipeline we have built in this article is designed to make that learning systematic. Use TickDB's historical OHLCV data to establish a clean, reproducible data foundation. Use the factor construction framework to implement strategies with full transparency about your assumptions. Use the transaction cost model to ground your results in economic reality. Use the walk-forward validation to test whether the strategy generalizes beyond the sample period.
And when your Sharpe ratio comes back as 0.4 instead of 2.3, run the diagnostic checklist before you abandon the strategy. The gap is not always a failure of the strategy. Sometimes it is a failure of the paper's design. Sometimes it is an opportunity to build something better.
Next Steps
If you want to reproduce academic strategies with institutional-grade data, sign up at tickdb.ai for a free API key and access 10+ years of cleaned US equity OHLCV data for backtesting.
If you need to compare multiple data sources or validate your results across vendors, TickDB provides a unified API for cross-market data acquisition, covering US equities, HK equities, and crypto.
If you are working with a quant team, reach out to enterprise@tickdb.ai for institutional plans with higher rate limits, dedicated support, and extended historical coverage.
If you use AI coding assistants, search for and install the tickdb-market-data SKILL in your AI tool's marketplace to get TickDB API integration directly in your development environment.
This article does not constitute investment advice. Backtested results are based on historical simulation and do not guarantee future performance. Transaction costs, slippage, and market impact are approximated and may differ from actual live trading conditions. Always conduct out-of-sample validation and paper trading before deploying any strategy with real capital.