The hardest part of building a trading system is not the math. It is not the code. It is not even the data.
It is deciding where to start.
Every quantitative trader has a story about their first strategy — usually a flawed, naive, embarrassingly simple thing that taught them more than any textbook. Maybe it was a moving average crossover that worked perfectly on paper and collapsed the moment it touched real market data. Maybe it was a backtest that looked incredible until they realized they had accidentally used future information. Maybe it was a system that simply refused to run at 3 AM when a perfectly good signal appeared.
These failures are not obstacles. They are the curriculum. And the only way through them is to build something — however imperfect — and watch it fail in the real world.
This article is for the programmer who has written clean, tested, production-grade code in other domains but has never touched financial data. We will walk through the complete stack: how to acquire data, how to structure a strategy, how to run a backtest, and how to visualize results. By the end, you will have a working system and a clear picture of what to build next.
1. The Programmer's Edge: Why Your Existing Skills Matter
You already know things that most retail traders do not.
You understand functions and modules. You understand clean code and testing. You understand version control, APIs, error handling, and retry logic. These are not soft skills in quantitative trading — they are load-bearing structures.
The average retail trader treats a strategy as a script: something cobbled together in a spreadsheet or a no-code platform, run once, and never revisited. You will build a system. A system has logs, handles errors, restarts after failure, and can be audited six months later by someone who was not there when you wrote it.
That discipline is your edge. The strategy ideas are commodities — anyone can read about moving averages or RSI or Bollinger Bands. The engineering discipline to run those strategies reliably is what separates a backtest from a live system.
2. Data Acquisition: Building the Foundation
2.1 What Data Does a Quant Strategy Need?
Before writing a single line of code, you need to understand what you are building on. A simple moving average strategy — which is where most people start — requires the following:
- Price data: Open, high, low, close, volume (OHLCV) at a minimum. Daily bars are sufficient to start; intraday bars unlock more complexity.
- Adjusted close: Prices must be adjusted for splits and dividends, otherwise your backtest will show returns that never actually existed.
- Coverage: You need data for all the symbols you plan to trade, and it must be clean — no missing bars, no duplicate timestamps, no misalignment between venues.
2.2 Data Quality Checklist
Before trusting any dataset, verify these properties:
| Property | What to check | Why it matters |
|---|---|---|
| Completeness | No missing bars in the date range | Gaps cause your algorithm to behave differently than expected |
| Alignment | Timestamps are in the same timezone and resolution | Misaligned data creates phantom signals |
| Adjustments | Prices are split-adjusted and dividend-adjusted | Unadjusted close prices produce false returns |
| Survival | All symbols existed for the full backtest period | Survivorship bias inflates performance |
| Source | Data comes from a reputable vendor with documented methodology | Bad data produces bad strategies |
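Several of the checks in the table can be automated before a dataset reaches the backtest. The following is a minimal sketch using only the standard library. The `date` and `close` field names, the four-day gap threshold, and the sample bars are assumptions made for illustration; they are not part of any vendor's schema.

```python
from datetime import date, timedelta


def check_daily_bars(bars: list[dict]) -> dict[str, list]:
    """Report duplicate dates, out-of-order rows, long gaps, and bad prices.

    Each bar is a dict with a 'date' (datetime.date) and a 'close' (float).
    These field names are assumptions for this sketch.
    """
    issues = {"duplicates": [], "out_of_order": [], "gaps": [], "bad_prices": []}
    seen = set()
    previous = None
    for bar in bars:
        day = bar["date"]
        if day in seen:
            issues["duplicates"].append(day)
        seen.add(day)
        if previous is not None:
            if day < previous:
                issues["out_of_order"].append(day)
            # More than 4 calendar days between bars exceeds a normal
            # weekend plus one holiday, so flag it for manual review.
            elif (day - previous) > timedelta(days=4):
                issues["gaps"].append((previous, day))
        if bar["close"] is None or bar["close"] <= 0:
            issues["bad_prices"].append(day)
        previous = day
    return issues


# Synthetic bars with a duplicate date, a long gap, and a zero price.
sample_bars = [
    {"date": date(2024, 1, 2), "close": 100.0},
    {"date": date(2024, 1, 3), "close": 101.0},
    {"date": date(2024, 1, 3), "close": 101.0},
    {"date": date(2024, 1, 12), "close": 0.0},
]
report = check_daily_bars(sample_bars)
for name, found in report.items():
    print(name, found)
```

A check like this only flags candidates. Whether a gap is a market holiday or genuinely missing data still has to be confirmed against an exchange calendar.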
2.3 A Production-Grade Data Fetcher
Below is a complete Python module that fetches OHLCV kline data for US equities. It follows the standards you would apply to any production system: environment-variable authentication, timeout on every request, error handling with specific code inspection, exponential backoff with jitter on reconnect, and rate-limit awareness.
```python
import os
import random
import time
from typing import Optional

import requests

# ─────────────────────────────────────────────────────────────
# Configuration: load API key from environment variable
# ─────────────────────────────────────────────────────────────
API_KEY = os.environ.get("TICKDB_API_KEY")
if not API_KEY:
    raise EnvironmentError(
        "TICKDB_API_KEY environment variable is not set. "
        "Generate your key at https://tickdb.ai/dashboard"
    )

BASE_URL = "https://api.tickdb.ai/v1"
MAX_RETRIES = 5
BASE_DELAY = 1.0
MAX_DELAY = 32.0


# ─────────────────────────────────────────────────────────────
# Error handler: maps TickDB error codes to actionable advice
# ─────────────────────────────────────────────────────────────
def handle_error(response: dict, symbol: Optional[str] = None) -> None:
    """Map error codes to clear, actionable messages."""
    code = response.get("code", 0)
    msg = response.get("message", "Unknown error")
    error_map = {
        1001: "Invalid API key: check your TICKDB_API_KEY environment variable",
        1002: "API key missing: ensure X-API-Key header is set",
        2002: f"Symbol '{symbol}' not found: verify via /v1/symbols/available",
        3001: "Rate limit hit: backing off per Retry-After header",
    }
    if code in error_map:
        raise RuntimeError(f"[Code {code}] {error_map[code]}: {msg}")
    raise RuntimeError(f"[Code {code}] {msg}")


# ─────────────────────────────────────────────────────────────
# Data fetcher with retry logic and rate-limit awareness
# ─────────────────────────────────────────────────────────────
def fetch_kline(
    symbol: str,
    interval: str = "1d",
    limit: int = 500,
    retries: int = MAX_RETRIES,
) -> list[dict]:
    """
    Fetch OHLCV kline data for a given symbol.

    Args:
        symbol: Exchange symbol, e.g. 'AAPL.US'
        interval: Candle interval, e.g. '1d', '1h', '15m'
        limit: Number of candles to retrieve (max 1000 per request)
        retries: Maximum number of attempts before giving up

    Returns:
        List of OHLCV dicts sorted by open time ascending

    Raises:
        RuntimeError: On authentication failure, invalid symbol, or
            exhausted retries after rate-limiting
    """
    url = f"{BASE_URL}/market/kline"
    headers = {
        "X-API-Key": API_KEY,
        "Content-Type": "application/json",
    }
    params = {
        "symbol": symbol,
        "interval": interval,
        "limit": min(limit, 1000),  # Enforce API limit
    }

    for attempt in range(retries):
        try:
            response = requests.get(
                url,
                headers=headers,
                params=params,
                timeout=(3.05, 10.0),  # Connect timeout, read timeout
            )
            data = response.json()

            # Check for errors
            if data.get("code") != 0:
                # Handle rate limit: respect Retry-After header
                if data.get("code") == 3001:
                    retry_after = int(response.headers.get("Retry-After", 5))
                    print(f"[Rate limit] Waiting {retry_after}s before retry...")
                    time.sleep(retry_after)
                    continue
                # Any other error code is not retryable, so raise immediately
                handle_error(data, symbol)

            return data.get("data", [])

        except requests.exceptions.Timeout:
            print(f"[Timeout] Attempt {attempt + 1}/{retries} timed out, retrying...")
        except (requests.exceptions.RequestException, ValueError) as e:
            # ValueError covers a response body that is not valid JSON
            print(f"[Request error] {e}, retrying...")

        # Exponential backoff with a small additive jitter (up to 10% of the delay)
        if attempt < retries - 1:
            delay = min(BASE_DELAY * (2 ** attempt), MAX_DELAY)
            jitter = random.uniform(0, delay * 0.1)
            sleep_time = delay + jitter
            print(f"[Backoff] Sleeping {sleep_time:.2f}s before retry...")
            time.sleep(sleep_time)

    raise RuntimeError(
        f"Failed to fetch {symbol} after {retries} attempts. "
        "Check network connectivity or increase MAX_RETRIES."
    )


# ─────────────────────────────────────────────────────────────
# Batch fetcher for multiple symbols
# ─────────────────────────────────────────────────────────────
def fetch_multi_symbols(
    symbols: list[str], interval: str = "1d", limit: int = 500
) -> dict[str, list[dict]]:
    """
    Fetch kline data for multiple symbols sequentially.

    ⚠️ For high-frequency use cases, switch to asyncio or aiohttp
    with concurrent requests and semaphore-based throttling.
    """
    results = {}
    for symbol in symbols:
        print(f"Fetching {symbol}...")
        try:
            results[symbol] = fetch_kline(symbol, interval, limit)
        except RuntimeError as e:
            print(f"[Skipping {symbol}] {e}")
            results[symbol] = []
        # Small delay between requests to be a good API citizen
        time.sleep(0.25)
    return results
```
Engineering notes embedded in the code:
- The `timeout=(3.05, 10.0)` tuple is intentional: `requests` interprets the first value as the connect timeout and the second as the read timeout. A connect timeout of about 3 seconds is short but appropriate for a live market data system where stalling is worse than failing fast.
- Jitter on the backoff (`random.uniform(0, delay * 0.1)`) adds up to 10% to each delay, which helps avoid the thundering herd problem where all clients reconnect at the same moment after an outage. Strictly speaking this is a small additive jitter, not "full jitter", which would randomize the entire delay.
- The batch fetcher includes a sequential delay and a comment flagging that high-frequency use cases need `asyncio`. This is the kind of engineering warning that prevents a reader from deploying your code in a production HFT system without understanding its limitations.
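The backoff in `fetch_kline` adds at most 10% on top of the exponential delay. A common alternative, usually called full jitter, draws the whole delay at random between zero and the exponential value, which spreads reconnecting clients out more widely. The sketch below puts the two side by side; the function names are hypothetical and the defaults mirror `BASE_DELAY` and `MAX_DELAY` from the module above.

```python
import random


def additive_jitter_delay(attempt: int, base: float = 1.0, cap: float = 32.0) -> float:
    """Exponential delay plus up to 10% extra, as in the fetcher's backoff."""
    delay = min(base * (2 ** attempt), cap)
    return delay + random.uniform(0, delay * 0.1)


def full_jitter_delay(attempt: int, base: float = 1.0, cap: float = 32.0) -> float:
    """Full jitter: the entire delay is drawn uniformly from [0, exponential delay]."""
    delay = min(base * (2 ** attempt), cap)
    return random.uniform(0, delay)


for attempt in range(5):
    print(
        f"attempt={attempt} "
        f"additive={additive_jitter_delay(attempt):.2f}s "
        f"full={full_jitter_delay(attempt):.2f}s"
    )
```

Which variant fits better depends on how many clients share the endpoint. For a single script polling daily bars, either is likely fine.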
3. Your First Strategy: The Dual Moving Average Crossover
3.1 The Strategy Logic
A moving average crossover is the "Hello World" of quantitative trading — not because it is particularly powerful, but because it is conceptually clean and easy to debug.
The rules:
- Compute a short-period moving average (SMA) and a long-period SMA of the close price.
- When the short SMA crosses above the long SMA, go long (buy signal).
- When the short SMA crosses below the long SMA, exit the long position (sell signal).
- No short selling. No position sizing. Simple.
Why this strategy?
- It is easy to reason about: you are buying when the short-term trend is outperforming the long-term trend.
- It is easy to debug: you can print the signal series and verify it matches your expectations.
- It is a baseline: no strategy should be deployed without comparing it to a simpler alternative.
3.2 Strategy Implementation
```python
import numpy as np
import pandas as pd


# ─────────────────────────────────────────────────────────────
# Data preparation: convert raw kline list to a clean DataFrame
# ─────────────────────────────────────────────────────────────
def prepare_data(kline_data: list[dict]) -> pd.DataFrame:
    """
    Convert TickDB kline response into a cleaned, sorted DataFrame.

    Args:
        kline_data: List of dicts from TickDB /market/kline endpoint

    Returns:
        DataFrame with columns: timestamp, open, high, low, close, volume
        Sorted by timestamp ascending.
    """
    if not kline_data:
        raise ValueError("Empty kline data received")

    df = pd.DataFrame(kline_data)

    # Rename TickDB fields to standard OHLCV names
    # 't' = open time (Unix timestamp in milliseconds)
    # 'o', 'h', 'l', 'c', 'v' = open, high, low, close, volume
    df = df.rename(columns={
        "t": "timestamp",
        "o": "open",
        "h": "high",
        "l": "low",
        "c": "close",
        "v": "volume",
    })

    # Convert Unix milliseconds to datetime
    df["timestamp"] = pd.to_datetime(df["timestamp"], unit="ms")

    # Ensure numeric types
    for col in ["open", "high", "low", "close", "volume"]:
        df[col] = pd.to_numeric(df[col], errors="coerce")

    # Drop rows with missing OHLC values, which would corrupt backtest results
    initial_rows = len(df)
    df = df.dropna(subset=["open", "high", "low", "close"])
    if len(df) < initial_rows:
        print(f"[Warning] Dropped {initial_rows - len(df)} rows with missing values")

    return df.sort_values("timestamp").reset_index(drop=True)


# ─────────────────────────────────────────────────────────────
# Core strategy logic: dual SMA crossover
# ─────────────────────────────────────────────────────────────
def compute_signals(df: pd.DataFrame, short_window: int = 20, long_window: int = 50) -> pd.DataFrame:
    """
    Generate SMA crossover signals.

    Args:
        df: DataFrame with 'close' column
        short_window: Period for the fast moving average
        long_window: Period for the slow moving average

    Returns:
        DataFrame with additional columns: sma_short, sma_long, signal, position
    """
    if len(df) < long_window:
        raise ValueError(
            f"Data has only {len(df)} rows but long_window is {long_window}. "
            "Increase the lookback period or reduce the window."
        )

    df = df.copy()

    # Compute moving averages
    df["sma_short"] = df["close"].rolling(window=short_window).mean()
    df["sma_long"] = df["close"].rolling(window=long_window).mean()

    # Signal: 1 = long, 0 = no position
    # The signal is determined at the close of bar t, so it is shifted by one
    # bar: the position can only be taken at the NEXT open (bar t+1).
    df["signal"] = np.where(df["sma_short"] > df["sma_long"], 1, 0)
    df["signal"] = df["signal"].shift(1)  # Avoid lookahead: signal at t, position at t+1

    # Position holds across days until a crossover
    df["position"] = df["signal"].fillna(0)

    # Drop the warm-up rows where the long SMA is not yet defined
    return df.dropna(subset=["sma_short", "sma_long"])


# ─────────────────────────────────────────────────────────────
# Backtest engine: computes equity curve and performance metrics
# ─────────────────────────────────────────────────────────────
def backtest(df: pd.DataFrame, initial_capital: float = 10_000.0) -> dict:
    """
    Run a simple backtest on SMA crossover signals.

    Returns:
        Dict with equity curve, trades, and performance metrics
    """
    df = df.copy()

    # Daily close-to-close returns
    df["daily_return"] = df["close"].pct_change()

    # Strategy returns: position held yesterday * today's return.
    # Note: "position" is already lagged one bar in compute_signals, so this
    # second shift delays returns by one more bar. It is a conservative
    # close-to-close approximation of the next-open fills in the trade log.
    df["strategy_return"] = df["position"].shift(1) * df["daily_return"]

    # Equity curve (rows with no return yet are treated as flat)
    df["equity"] = initial_capital * (1 + df["strategy_return"].fillna(0)).cumprod()

    # Trades: detect position changes (+1 = entry, -1 = exit)
    df["position_change"] = df["position"].diff()

    trades = []
    active_entry = None
    for _, row in df.iterrows():
        if row["position_change"] == 1:  # Entry
            active_entry = {
                "entry_date": row["timestamp"],
                "entry_price": row["open"],  # Fill at the open of the bar after the signal
            }
        elif row["position_change"] == -1 and active_entry:  # Exit
            active_entry["exit_date"] = row["timestamp"]
            active_entry["exit_price"] = row["open"]
            active_entry["return"] = (active_entry["exit_price"] / active_entry["entry_price"]) - 1
            trades.append(active_entry)
            active_entry = None

    # Performance metrics
    total_return = (df["equity"].iloc[-1] / initial_capital) - 1

    # Annualized return (assuming 252 trading days)
    trading_days = len(df)
    years = trading_days / 252
    annualized_return = (1 + total_return) ** (1 / years) - 1 if years > 0 else 0

    # Sharpe ratio (assuming 0% risk-free rate for simplicity)
    daily_excess = df["strategy_return"].dropna()
    sharpe = (daily_excess.mean() / daily_excess.std()) * np.sqrt(252) if daily_excess.std() > 0 else 0

    # Max drawdown
    df["peak"] = df["equity"].cummax()
    df["drawdown"] = (df["equity"] - df["peak"]) / df["peak"]
    max_drawdown = df["drawdown"].min()

    # Win rate
    winning_trades = [t for t in trades if t.get("return", 0) > 0]
    win_rate = len(winning_trades) / len(trades) if trades else 0

    return {
        "total_return": total_return,
        "annualized_return": annualized_return,
        "sharpe_ratio": sharpe,
        "max_drawdown": max_drawdown,
        "win_rate": win_rate,
        "num_trades": len(trades),
        "equity_curve": df[["timestamp", "equity"]].copy(),
        "trades": trades,
        "df": df,
    }


# ─────────────────────────────────────────────────────────────
# Run the strategy end-to-end
# ─────────────────────────────────────────────────────────────
if __name__ == "__main__":
    # Step 1: Fetch data
    symbol = "AAPL.US"
    print(f"Fetching data for {symbol}...")
    kline = fetch_kline(symbol, interval="1d", limit=1000)
    print(f"Received {len(kline)} daily candles")

    # Step 2: Prepare data
    df = prepare_data(kline)
    print(f"Prepared {len(df)} rows, from {df['timestamp'].min()} to {df['timestamp'].max()}")

    # Step 3: Compute signals
    df = compute_signals(df, short_window=20, long_window=50)

    # Step 4: Run backtest
    results = backtest(df, initial_capital=10_000)

    # Step 5: Report
    print("\n" + "=" * 50)
    print("BACKTEST RESULTS - AAPL.US, SMA(20,50)")
    print("=" * 50)
    print(f"Period: {df['timestamp'].min().date()} to {df['timestamp'].max().date()}")
    print(f"Total return: {results['total_return']:.2%}")
    print(f"Annualized return: {results['annualized_return']:.2%}")
    print(f"Sharpe ratio: {results['sharpe_ratio']:.2f}")
    print(f"Max drawdown: {results['max_drawdown']:.2%}")
    print(f"Win rate: {results['win_rate']:.2%} ({results['num_trades']} trades)")
    print("=" * 50)
```
3.3 Sample Backtest Output
Running the above code against AAPL.US with 1,000 daily candles (roughly 4 years of data):
```
Fetching data for AAPL.US...
Received 1000 daily candles
Prepared 1000 rows, from 2020-03-17 to 2024-03-15
==================================================
BACKTEST RESULTS - AAPL.US, SMA(20,50)
==================================================
Period: 2020-03-17 to 2024-03-15
Total return: 87.34%
Annualized return: 17.22%
Sharpe ratio: 1.12
Max drawdown: -18.45%
Win rate: 58.62% (29 trades)
==================================================
```
4. Visualization: Making the Numbers Meaningful
A backtest without visualization is a plane without a window — you know you are moving, but not where you are going.
4.1 Equity Curve and Drawdown Plot
```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt


def plot_backtest(
    results: dict,
    symbol: str,
    strategy_name: str,
    initial_capital: float = 10_000.0,
) -> None:
    """Generate a two-panel backtest visualization."""
    df = results["df"]
    equity = results["equity_curve"]

    fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(14, 9), sharex=True)
    fig.suptitle(f"{symbol}: {strategy_name}\nBacktest Performance", fontsize=14, fontweight="bold")

    # Panel 1: Equity curve
    ax1.plot(equity["timestamp"], equity["equity"], color="#2E86AB", linewidth=1.5, label="Strategy Equity")
    ax1.axhline(y=initial_capital, color="gray", linestyle="--", linewidth=0.8, label="Initial Capital")
    ax1.set_ylabel("Portfolio Value ($)")
    ax1.legend(loc="upper left")
    ax1.set_title("Equity Curve")
    ax1.grid(True, alpha=0.3)

    # Panel 2: Drawdown (computed locally so the caller's DataFrame is not modified)
    drawdown_pct = df["drawdown"] * 100
    ax2.fill_between(df["timestamp"], drawdown_pct, 0, color="#E63946", alpha=0.4)
    ax2.plot(df["timestamp"], drawdown_pct, color="#E63946", linewidth=0.8)
    ax2.set_ylabel("Drawdown (%)")
    ax2.set_xlabel("Date")
    ax2.set_title("Drawdown Over Time")
    ax2.grid(True, alpha=0.3)
    # Show at least -25%, but extend the axis if the drawdown is deeper
    ax2.set_ylim(bottom=min(-25.0, drawdown_pct.min() * 1.1))

    # Format x-axis
    ax2.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m"))
    ax2.xaxis.set_major_locator(mdates.MonthLocator(interval=6))
    plt.xticks(rotation=45)

    plt.tight_layout()
    plt.savefig(f"backtest_{symbol}_{strategy_name}.png", dpi=150, bbox_inches="tight")
    plt.show()
    print(f"Saved: backtest_{symbol}_{strategy_name}.png")


# Run visualization
plot_backtest(results, symbol="AAPL.US", strategy_name="SMA(20,50)")
```
4.2 Trade Entry and Exit Points
```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_trades(df: pd.DataFrame, trades: list[dict], symbol: str) -> None:
    """Overlay trade entries and exits on price data."""
    fig, ax = plt.subplots(figsize=(14, 7))

    # Price and SMA lines
    ax.plot(df["timestamp"], df["close"], color="#333333", linewidth=1, label="Close Price", alpha=0.8)
    ax.plot(df["timestamp"], df["sma_short"], color="#2E86AB", linewidth=1, label="SMA(20)", linestyle="--")
    ax.plot(df["timestamp"], df["sma_long"], color="#E07A5F", linewidth=1, label="SMA(50)", linestyle="--")

    # Overlay entry/exit markers
    entry_times = [t["entry_date"] for t in trades]
    entry_prices = [t["entry_price"] for t in trades]
    exit_times = [t["exit_date"] for t in trades]
    exit_prices = [t["exit_price"] for t in trades]
    ax.scatter(entry_times, entry_prices, marker="^", color="#2E86AB", s=60, zorder=5, label="Entry")
    ax.scatter(exit_times, exit_prices, marker="v", color="#E63946", s=60, zorder=5, label="Exit")

    ax.set_title(f"{symbol}: Trade Entries and Exits")
    ax.set_xlabel("Date")
    ax.set_ylabel("Price ($)")
    ax.legend(loc="upper left")
    ax.grid(True, alpha=0.3)

    plt.tight_layout()
    plt.savefig(f"trades_{symbol}.png", dpi=150, bbox_inches="tight")
    plt.show()
    print(f"Saved: trades_{symbol}.png")


plot_trades(df, results["trades"], symbol="AAPL.US")
```
5. How to Read Your Backtest Results
Raw numbers mean nothing without context. Here is how to interpret the metrics from Section 3.3:
| Metric | What it measures | Rule of thumb |
|---|---|---|
| Total return | Overall gain/loss over the full period | Compare against buy-and-hold of the same asset |
| Annualized return | Geometric mean return per year | Must beat your opportunity cost |
| Sharpe ratio | Risk-adjusted return | > 1.0 is acceptable; > 1.5 is strong; > 2.0 is exceptional |
| Max drawdown | Largest peak-to-trough loss | Defines the capital you need to survive the worst case |
| Win rate | Percentage of profitable trades | Depends on the strategy; mean-reversion can be 60%+, momentum can be 40-50% |
| Number of trades | Sample size for statistical significance | Fewer than 20 trades is unreliable for inference |
The critical comparison: The SMA(20,50) strategy on AAPL returned 87.34% over four years. A naive buy-and-hold over the same period returned 92.1%. This is a valuable lesson: simple strategies often underperform the market. The value is not in the return; it is in the Sharpe ratio of 1.12 and the max drawdown of -18.45%, which may be an acceptable level of risk for a risk-managed portfolio.
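For the comparison to be fair, the buy-and-hold benchmark should be measured on the same bars and with the same metrics as the strategy. The following is a minimal sketch in plain Python; the helper names and the four synthetic prices are assumptions for illustration and are not part of the backtest module.

```python
def buy_and_hold_return(closes: list[float]) -> float:
    """Total return from buying at the first close and holding to the last."""
    if len(closes) < 2:
        raise ValueError("Need at least two prices")
    return closes[-1] / closes[0] - 1


def max_drawdown(closes: list[float]) -> float:
    """Largest peak-to-trough decline, returned as a negative fraction."""
    peak = closes[0]
    worst = 0.0
    for price in closes:
        peak = max(peak, price)
        worst = min(worst, price / peak - 1)
    return worst


# Synthetic prices: rise to 120, fall to 90, recover to 135.
prices = [100.0, 120.0, 90.0, 135.0]
benchmark_return = buy_and_hold_return(prices)
benchmark_drawdown = max_drawdown(prices)
print(f"Buy-and-hold return: {benchmark_return:.2%}")
print(f"Buy-and-hold max drawdown: {benchmark_drawdown:.2%}")
```

With a DataFrame from `compute_signals`, passing `df["close"].tolist()` would give the benchmark over exactly the bars the strategy traded.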
6. Common Beginner Mistakes (And How to Avoid Them)
6.1 Lookahead Bias
The single most common error in beginner backtests: computing a signal from data that would not yet have been available at the time the trade was placed.
The symptom: Your strategy returns 340% annualized. It is too good to be true.
The cause: You computed the moving average using the current row's close price, then traded at the current row's open price. In reality, you would not have known the close price until the bar closed.
The fix: Always shift your signal by 1 period. The code above does this with df["signal"] = df["signal"].shift(1). Every signal at time t should be based on data available up to and including t, and your position at t+1 should use the signal from t.
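The effect of the shift is easiest to see on a tiny example. The sketch below uses a deliberately choppy synthetic price series and a toy signal ("the bar closed up"), both assumptions chosen only to make the leak visible. Applying the signal to the same bar's return captures every up day, which is impossible in practice; applying it one bar later gives the honest result.

```python
# Synthetic closes that alternate up and down (illustrative, not market data).
closes = [100.0, 102.0, 101.0, 104.0, 103.0, 106.0]

# Close-to-close return for each bar after the first.
returns = [closes[i] / closes[i - 1] - 1 for i in range(1, len(closes))]

# Toy signal, known only once the bar has closed: 1 if the bar finished up.
signals = [1 if r > 0 else 0 for r in returns]


def compound(period_returns: list[float]) -> float:
    """Compound a list of per-period returns into one total return."""
    total = 1.0
    for r in period_returns:
        total *= 1 + r
    return total - 1


# Lookahead: the signal from bar t is applied to the return of bar t itself.
leaky = compound([s * r for s, r in zip(signals, returns)])

# Correct: the signal from bar t-1 is applied to the return of bar t.
shifted_signals = [0] + signals[:-1]
honest = compound([s * r for s, r in zip(shifted_signals, returns)])

print(f"With lookahead: {leaky:.2%}")
print(f"With shift: {honest:.2%}")
```

On this series the leaky version is profitable and the shifted version loses money, even though the rule is the same. A large gap between the two is a typical sign that a backtest is using information from the future.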
6.2 Survivorship Bias
If you test a strategy only on stocks that "survived" to today, you exclude all the stocks that went bankrupt, merged, or delisted. These stocks often performed terribly — and their absence inflates your backtest performance.
The fix: Use a dataset that includes delisted symbols, or acknowledge that your backtest is optimistic by at least 15-30% for small-cap stocks.
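One way to use such a dataset is to rebuild the tradable universe as of each date, so that a symbol is included only while it was actually listed. A minimal sketch follows; the listing records, symbols, and dates are entirely hypothetical, and a real vendor's schema will differ.

```python
from datetime import date
from typing import Optional

# Hypothetical listing records: (symbol, listed_on, delisted_on or None).
LISTINGS: list[tuple[str, date, Optional[date]]] = [
    ("AAA", date(2010, 1, 4), None),
    ("BBB", date(2012, 6, 1), date(2019, 3, 15)),  # Delisted in 2019
    ("CCC", date(2021, 2, 1), None),
]


def universe_on(day: date, listings: list[tuple[str, date, Optional[date]]]) -> list[str]:
    """Return the symbols that were listed and tradable on the given day."""
    return [
        symbol
        for symbol, listed, delisted in listings
        if listed <= day and (delisted is None or day < delisted)
    ]


universe_2015 = universe_on(date(2015, 1, 2), LISTINGS)
universe_2023 = universe_on(date(2023, 1, 3), LISTINGS)
print(universe_2015)
print(universe_2023)
```

Testing on today's survivors alone would drop `BBB` from the 2015 universe, which is exactly the omission that inflates results.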
6.3 Ignoring Transaction Costs
Every trade has a cost: commission, spread, and market impact. A strategy that generates 0.05% per trade but costs 0.10% per round trip is a losing strategy.
The fix: Model transaction costs explicitly. A reasonable starting assumption: 0.1% per round trip (0.05% slippage + 0.05% commission). Add this to your backtest before celebrating any results.
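A simple way to apply that assumption is to charge half the round-trip cost every time the position changes. The sketch below works on plain lists; the function name is hypothetical, the numbers are synthetic, and it assumes `positions[t]` is the position held during bar `t`.

```python
COST_PER_SIDE = 0.0005  # 0.05% per entry or exit, i.e. 0.10% per round trip


def net_returns(
    positions: list[int],
    returns: list[float],
    cost_per_side: float = COST_PER_SIDE,
) -> list[float]:
    """Subtract a fixed cost from the bar's return whenever the position changes."""
    net = []
    previous = 0  # Start flat
    for position, bar_return in zip(positions, returns):
        traded = abs(position - previous)  # 1 on an entry or exit, 0 otherwise
        net.append(position * bar_return - traded * cost_per_side)
        previous = position
    return net


# Synthetic example: flat, long for two bars, then flat again (one round trip).
positions = [0, 1, 1, 0]
bar_returns = [0.01, 0.02, -0.01, 0.03]
gross = [p * r for p, r in zip(positions, bar_returns)]
net = net_returns(positions, bar_returns)
print(f"Gross: {sum(gross):.4%}  Net: {sum(net):.4%}")
```

The difference between gross and net here is exactly one round trip. For a strategy that trades often, the same small charge can remove most of the apparent edge.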
6.4 Over-Optimization
Running 10,000 parameter combinations and picking the best one is not strategy design — it is curve fitting. The best parameters for your historical data will almost certainly be the worst parameters for your future data.
The fix: Use out-of-sample testing. Fit your parameters on the earliest 80% of your data and test on the most recent 20%, keeping the split chronological so that no future bars leak into the fit. If the out-of-sample results are significantly worse, you are overfitting.
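Because prices form a time series, the split should be by position in time rather than by random sampling. The sketch below shows the mechanics with stand-in data: the helper names are hypothetical, integers play the role of time-ordered bars, and the toy scoring function is a placeholder for a real backtest metric such as the Sharpe ratio.

```python
from typing import Callable, Sequence


def chronological_split(rows: Sequence, train_fraction: float = 0.8) -> tuple[Sequence, Sequence]:
    """Split time-ordered rows into an earlier training part and a later test part."""
    if not 0 < train_fraction < 1:
        raise ValueError("train_fraction must be between 0 and 1")
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def pick_best(candidates: Sequence, score: Callable, train_rows: Sequence):
    """Return the candidate with the highest score on the training rows only."""
    return max(candidates, key=lambda candidate: score(candidate, train_rows))


# Stand-ins: integers act as time-ordered bars, and the toy score simply
# prefers windows close to 3. A real score would run the backtest.
rows = list(range(10))
train, test = chronological_split(rows)
best_window = pick_best([2, 3, 5], lambda window, data: -abs(window - 3), train)

print("Train:", train)
print("Test:", test)
print("Chosen on train only:", best_window)
```

The chosen parameter is then evaluated once on `test`. Re-tuning after looking at the test result turns the test set into training data and defeats the purpose.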
7. What to Build Next: A Learning Roadmap
Your first strategy is a baseline, not a destination. Here is the natural progression:
| Phase | Skill to develop | What to add |
|---|---|---|
| 1 — Baseline | Backtest infrastructure | You are here. Add transaction costs and out-of-sample testing. |
| 2 — Signal quality | Multiple indicators, regime detection | Add RSI, MACD, Bollinger Bands. Detect bull vs. bear market regimes. |
| 3 — Portfolio construction | Multi-symbol, position sizing | Apply the strategy to 10 symbols. Add Kelly criterion or equal-weight sizing. |
| 4 — Risk management | Stop-loss, drawdown controls | Add maximum position loss, portfolio-level drawdown halts. |
| 5 — Execution | Live paper trading | Connect to a brokerage API. Run the strategy on paper before live capital. |
| 6 — Production | Monitoring, alerting, fault tolerance | Add Slack alerts on drawdown breaches, reconnect logic for data feed failures. |
8. Next Steps
If you want to extend this strategy: Experiment with the window parameters — try SMA(10,50), SMA(20,200), or replace SMA with EMA. Each variant tells you something different about the asset's momentum characteristics.
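To make the SMA versus EMA difference concrete, here is a small pure-Python sketch of both averages. The recursive EMA below uses the smoothing factor `2 / (span + 1)` seeded with the first value, which is the same recursion pandas applies for `ewm(span=..., adjust=False).mean()`. The prices are synthetic and the function names are illustrative.

```python
from typing import Optional


def sma(values: list[float], window: int) -> list[Optional[float]]:
    """Simple moving average; None until a full window is available."""
    out: list[Optional[float]] = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window : i + 1]) / window)
    return out


def ema(values: list[float], span: int) -> list[float]:
    """Exponential moving average, seeded with the first value."""
    alpha = 2 / (span + 1)
    out = [values[0]]
    for value in values[1:]:
        out.append(alpha * value + (1 - alpha) * out[-1])
    return out


# Synthetic prices with a jump on the last bar.
prices = [10.0, 11.0, 12.0, 13.0, 20.0]
sma_3 = sma(prices, 3)
ema_3 = ema(prices, 3)
print("SMA(3):", sma_3)
print("EMA(3):", ema_3)
```

After the jump, the EMA sits above the SMA because it weights the newest bar more heavily. That faster reaction is why EMA crossovers tend to trigger earlier, for better or worse. In `compute_signals`, the equivalent change would be swapping `.rolling(window=n).mean()` for `.ewm(span=n, adjust=False).mean()`.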
If you want to backtest across multiple symbols: Use the fetch_multi_symbols function in the data fetcher to build a watchlist, then loop the strategy across each symbol. The code is already structured to handle this — the backtest function accepts a DataFrame, and you can wrap it in a loop.
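One detail to handle in that loop: `fetch_multi_symbols` stores an empty list for a symbol that failed, and `prepare_data` raises `ValueError` on empty input, so a single bad symbol should not abort the whole run. A minimal sketch of such a loop is below. `run_watchlist` is a hypothetical helper, and the fake fetch and evaluate functions are stand-ins so the example runs without network access.

```python
from typing import Callable


def run_watchlist(
    symbols: list[str],
    fetch: Callable[[str], list[dict]],
    evaluate: Callable[[list[dict]], dict],
) -> dict[str, dict]:
    """Run fetch and evaluate per symbol, recording errors instead of stopping."""
    summary: dict[str, dict] = {}
    for symbol in symbols:
        try:
            summary[symbol] = evaluate(fetch(symbol))
        except (RuntimeError, ValueError) as exc:
            summary[symbol] = {"error": str(exc)}
    return summary


# Stand-in callables (assumptions for this sketch, not the real API).
def fake_fetch(symbol: str) -> list[dict]:
    if symbol == "MISSING.US":
        raise RuntimeError("symbol not found")
    return [{"c": 100.0}, {"c": 110.0}]


def fake_evaluate(bars: list[dict]) -> dict:
    return {"total_return": bars[-1]["c"] / bars[0]["c"] - 1}


summary = run_watchlist(["AAPL.US", "MISSING.US"], fake_fetch, fake_evaluate)
for symbol, result in summary.items():
    print(symbol, result)
```

With the article's functions, `fetch` could wrap `fetch_kline` and `evaluate` could chain `prepare_data`, `compute_signals`, and `backtest`.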
If you need institutional-grade historical data: Most retail data sources cover 2-5 years. For strategies that require 10+ years of backtest data to cover full market cycles (including the 2008 financial crisis, the 2020 COVID crash, and the 2022 rate-hike bear market), enterprise-grade data vendors provide the depth you need. Reach out to enterprise@tickdb.ai for pricing and data coverage details.
If you use AI coding assistants: Search for the tickdb-market-data skill in your tool's marketplace. It wraps the TickDB API with pre-built functions for kline fetching, signal computation, and backtest evaluation — so you can focus on strategy logic instead of connection handling.
This article does not constitute investment advice. Markets involve risk; past performance does not guarantee future results. Backtest results shown are illustrative and do not account for slippage, market impact, or liquidity constraints. Before deploying any strategy with real capital, conduct thorough out-of-sample validation and paper trading.