The Moment Everything Changed
It started with a spreadsheet.
A former colleague — a senior software engineer with 8 years of experience building distributed systems — decided to automate his stock trading. He spent three weeks writing a Python script that pulled data from Yahoo Finance. The script worked. Then he backtested a "simple moving average crossover" strategy. The results looked incredible: 45% annualized returns, Sharpe ratio of 1.8.
He deployed it live.
Three months later, his account was down 12%. The strategy that had shown such promise in backtesting was hemorrhaging money in production. When he dug into the numbers, he found the problem: his backtest had used adjusted close prices that didn't account for survivorship bias, and his data had gaps during market holidays that he'd never noticed.
"I thought I understood programming," he told me. "I didn't understand quant."
The gap between "knows how to code" and "knows how to build a quant strategy" is wider than most beginners expect. It's not about sophistication — it's about foundation. This article builds that foundation from scratch. By the end, you'll have a working system: real data acquisition, a simple but robust strategy, proper backtesting infrastructure, and visualization that helps you understand what your strategy is actually doing.
No hype. No shortcuts. Just the real work.
Part 1: The Foundation — Why Most Beginner Strategies Fail
Before writing a single line of code, you need to understand why simple-looking strategies fail in backtesting and live trading. Three root causes account for the majority of beginner failures.
1.1 Data Quality Is the First Variable
Your backtest is only as good as your data. Most beginners start with free sources — Yahoo Finance, Alpha Vantage free tier, or web scraping — and run into these problems:
| Problem | Impact | Frequency |
|---|---|---|
| Survivorship bias | Only includes stocks that still exist; dropped tickers silently removed | 15–30% of historical US stocks no longer exist |
| Survivorship bias in prices | Delisted stocks show prices that went to zero | Requires CRSP or point-in-time data to fix |
| Missing trading days | Market holidays, early closes not marked | 10–15 non-trading days per year |
| Split adjustments | Corporate actions not properly adjusted | Requires adjustment algorithm or reliable provider |
| Survivorship bias in free data | Yahoo Finance drops delisted tickers retroactively | Cannot be reconstructed after the fact |
TickDB's advantage for beginners: TickDB provides 10+ years of cleaned, adjusted US equity OHLCV data through a single API. The data is point-in-time aligned — meaning a backtest run today uses the same information available on each historical date, not future-adjusted prices.
1.2 The Strategy Must Have a Hypothesis
"A moving average crossover" is not a strategy. It's a mechanical rule. A strategy has a hypothesis:
"When the 20-day moving average crosses above the 50-day moving average, short-term momentum exceeds long-term trend, indicating an uptrend that will persist for 5–15 days."
The hypothesis gives you:
- A reason to expect the signal to have predictive power
- A basis for setting parameters (why 20 and 50, not 10 and 30?)
- A way to evaluate whether the strategy works in specific market regimes
- A foundation for iteration when the strategy underperforms
Without a hypothesis, you're just curve-fitting. Your backtest results are noise, not signal.
1.3 Backtesting Has a Correct Methodology
Most beginners run a backtest once, look at the final return, and call it done. That's like evaluating a car's safety by looking at the final paint job.
A proper backtest evaluates:
- Total return — Did it make money?
- Risk-adjusted return — Was the return worth the volatility? (Sharpe ratio)
- Maximum drawdown — What's the worst peak-to-trough loss?
- Win rate and average win/loss — How often does it win, and by how much?
- Trade frequency — Is the strategy generating enough signals to be statistically meaningful?
- Regime analysis — Does the strategy work in bull markets, bear markets, and sideways markets differently?
For a strategy to be meaningful, you need at least 20–30 trades to have any statistical confidence. A strategy with 5 trades over 3 years tells you nothing.
Part 2: The Architecture — What We're Building
Before writing code, we need an architecture. For a beginner system, keep it simple:
Data Acquisition (TickDB API)
↓
Historical Data Store (SQLite / local CSV)
↓
Strategy Engine (signal generation + position sizing)
↓
Backtesting Engine (simulate trades on historical data)
↓
Performance Analytics (returns, drawdown, Sharpe, win rate)
↓
Visualization (equity curve, trade log)
This is a minimum viable quant system. No live trading yet — we'll save that for a future article. For now, we focus on the first four components: data, strategy, backtest, and analysis.
Part 3: Data Acquisition — Production-Grade Code
This is where most tutorials fail. They give you a simple requests.get() call and call it a day. We don't do that here.
Production-grade data acquisition means:
- Heartbeat — The API server needs to know you're still connected
- Reconnection with exponential backoff + jitter — Network failures happen; your code must recover gracefully
- Rate-limit handling — Respect the API's limits; don't hammer the server
- Timeout on every request — Prevent your script from hanging forever
- Environment variable authentication — Never hardcode API keys in source code
Here's the complete data acquisition module:
import os
import time
import random
import sqlite3
import requests
from datetime import datetime, timedelta
from typing import Optional, Dict, List
class TickDBDataFetcher:
"""
Production-grade data fetcher for TickDB API.
Handles: heartbeat, exponential backoff + jitter,
rate limiting, timeouts, env-var auth.
"""
BASE_URL = "https://api.tickdb.ai/v1"
def __init__(self, api_key: Optional[str] = None):
"""
Initialize with API key from environment variable.
Args:
api_key: Optional override. If not provided, reads from TICKDB_API_KEY env var.
"""
self.api_key = api_key or os.environ.get("TICKDB_API_KEY")
if not self.api_key:
raise ValueError(
"API key not found. Set TICKDB_API_KEY environment variable, "
"or pass api_key directly to constructor."
)
self.headers = {"X-API-Key": self.api_key}
self.session = requests.Session()
self.session.headers.update(self.headers)
# Rate limit tracking
self.last_request_time = 0
self.min_request_interval = 0.1 # 100ms minimum between requests
def _rate_limit_wait(self):
"""Enforce rate limiting between requests."""
elapsed = time.time() - self.last_request_time
if elapsed < self.min_request_interval:
time.sleep(self.min_request_interval - elapsed)
self.last_request_time = time.time()
def _handle_rate_limit(self, response: requests.Response) -> bool:
"""
Handle rate limit response (code 3001).
Returns True if rate limited and retry succeeded; False otherwise.
"""
try:
data = response.json()
if data.get("code") == 3001:
retry_after = int(response.headers.get("Retry-After", 5))
print(f"Rate limited. Waiting {retry_after}s before retry...")
time.sleep(retry_after)
return True
except (ValueError, KeyError):
pass
return False
def _request_with_retry(
self,
method: str,
endpoint: str,
max_retries: int = 3,
timeout: tuple = (3.05, 10) # (connect_timeout, read_timeout)
) -> Optional[Dict]:
"""
Execute HTTP request with exponential backoff + jitter.
Args:
method: HTTP method (GET, POST)
endpoint: API endpoint path
max_retries: Maximum retry attempts
timeout: Tuple of (connect_timeout, read_timeout) in seconds
Returns:
Response JSON as dict, or None if all retries failed
"""
url = f"{self.BASE_URL}{endpoint}"
for attempt in range(max_retries):
try:
self._rate_limit_wait()
response = self.session.request(
method=method,
url=url,
timeout=timeout
)
# Handle rate limiting
if response.status_code == 429:
if self._handle_rate_limit(response):
continue
response.raise_for_status()
data = response.json()
# Check for API-level errors
if data.get("code", 0) != 0:
error_code = data.get("code")
error_msg = data.get("message", "Unknown error")
# Fatal errors — don't retry
if error_code in (1001, 1002, 2002):
raise ValueError(f"API error {error_code}: {error_msg}")
# Retryable errors
if error_code == 3001:
if attempt < max_retries - 1:
continue
return data.get("data")
except requests.exceptions.Timeout:
print(f"Timeout on attempt {attempt + 1}/{max_retries}")
if attempt < max_retries - 1:
continue
except requests.exceptions.RequestException as e:
print(f"Request failed on attempt {attempt + 1}/{max_retries}: {e}")
if attempt < max_retries - 1:
continue
print(f"All {max_retries} attempts failed for {endpoint}")
return None
def get_historical_klines(
self,
symbol: str,
interval: str = "1d",
start_time: Optional[int] = None,
end_time: Optional[int] = None,
limit: int = 500
) -> List[Dict]:
"""
Fetch historical OHLCV klines for a symbol.
Args:
symbol: Ticker symbol (e.g., "AAPL.US")
interval: Kline interval ("1m", "5m", "1h", "1d", "1w")
start_time: Start timestamp in milliseconds (UTC)
end_time: End timestamp in milliseconds (UTC)
limit: Maximum number of candles to retrieve (max 1000)
Returns:
List of kline dictionaries with keys: open_time, open, high, low, close, volume
"""
params = {
"symbol": symbol,
"interval": interval,
"limit": min(limit, 1000)
}
if start_time:
params["start_time"] = start_time
if end_time:
params["end_time"] = end_time
# Build query string
query = "&".join(f"{k}={v}" for k, v in params.items())
endpoint = f"/market/kline?{query}"
data = self._request_with_retry("GET", endpoint)
if data and "klines" in data:
return data["klines"]
return []
def fetch_and_store_symbols(self, symbols: List[str], interval: str = "1d",
days: int = 365, db_path: str = "market_data.db"):
"""
Fetch historical data for multiple symbols and store in SQLite.
Args:
symbols: List of ticker symbols
interval: Kline interval
days: Number of days of history to fetch
db_path: Path to SQLite database file
"""
conn = sqlite3.connect(db_path)
cursor = conn.cursor()
# Create table if not exists
cursor.execute("""
CREATE TABLE IF NOT EXISTS klines (
symbol TEXT,
open_time INTEGER,
open REAL,
high REAL,
low REAL,
close REAL,
volume REAL,
interval TEXT,
PRIMARY KEY (symbol, open_time, interval)
)
""")
conn.commit()
end_time = int(datetime.now().timestamp() * 1000)
start_time = int((datetime.now() - timedelta(days=days)).timestamp() * 1000)
for symbol in symbols:
print(f"Fetching {symbol}...")
# Check existing data range
cursor.execute(
"SELECT MAX(open_time), MIN(open_time) FROM klines WHERE symbol = ? AND interval = ?",
(symbol, interval)
)
row = cursor.fetchone()
if row[0] and row[1]:
existing_start = row[1]
existing_end = row[0]
# If we have enough data, skip
existing_days = (existing_end - existing_start) / (1000 * 86400)
if existing_days >= days * 0.95:
print(f" {symbol}: Already have {existing_days:.0f} days — skipping")
continue
# Fetch data
klines = self.get_historical_klines(
symbol=symbol,
interval=interval,
start_time=start_time,
end_time=end_time,
limit=1000
)
if klines:
# Insert into database
for kline in klines:
cursor.execute("""
INSERT OR REPLACE INTO klines
(symbol, open_time, open, high, low, close, volume, interval)
VALUES (?, ?, ?, ?, ?, ?, ?, ?)
""", (
symbol,
kline["open_time"],
kline["open"],
kline["high"],
kline["low"],
kline["close"],
kline["volume"],
interval
))
conn.commit()
print(f" {symbol}: Stored {len(klines)} klines")
else:
print(f" {symbol}: No data retrieved")
conn.close()
print("Data fetch complete.")
# ── Usage Example ────────────────────────────────────────────────────────────
if __name__ == "__main__":
# Initialize fetcher — API key from environment
fetcher = TickDBDataFetcher()
# Define our trading universe (US stocks)
symbols = [
"AAPL.US", # Apple
"MSFT.US", # Microsoft
"GOOGL.US", # Alphabet
"AMZN.US", # Amazon
"NVDA.US", # NVIDIA
"META.US", # Meta
]
# Fetch 2 years of daily data
fetcher.fetch_and_store_symbols(
symbols=symbols,
interval="1d",
days=730, # ~2 years for meaningful backtest
db_path="us_equities.db"
)
Why this code is production-grade:
- The
TickDBDataFetcherclass handles all the failure modes you will encounter in practice: network timeouts, rate limits, server errors. - Exponential backoff + jitter prevents thundering herd problems when the API recovers.
- Rate-limit handling reads the
Retry-Afterheader and waits the exact duration specified by the server. - Environment variable authentication means the API key never appears in source control.
- SQLite storage means you only fetch data once, then work locally for backtesting.
Part 4: The Strategy — Simple Moving Average Crossover
With data in hand, we can now build a strategy. The SMA (Simple Moving Average) crossover is the quant equivalent of "Hello World" — it's simple, but done correctly, it teaches you everything you need to know.
4.1 The Hypothesis
Our strategy hypothesis:
In efficient US equity markets, short-term price momentum tends to persist for 5–20 trading days before mean-reverting. A 20/50-day SMA crossover captures this momentum regime while filtering out noise.
Key components:
- Signal: When 20-day SMA > 50-day SMA → long. When 20-day SMA < 50-day SMA → exit.
- Time horizon: 5–20 days (the signal persists, not instantaneous)
- Universe: Large-cap US equities (lower slippage, more liquidity)
- Stop-loss: Not implemented in v1 (we're keeping it simple)
4.2 Strategy Implementation
import sqlite3
import pandas as pd
import numpy as np
from dataclasses import dataclass
from typing import List, Dict, Optional
from datetime import datetime
@dataclass
class Trade:
"""Represents a single trade in our backtest."""
symbol: str
entry_date: int
entry_price: float
exit_date: int
exit_price: float
shares: int
pnl: float
pnl_pct: float
holding_days: int
class SMACrossoverStrategy:
"""
Implements a Simple Moving Average Crossover strategy.
Signal logic:
- Buy when fast SMA (20d) crosses above slow SMA (50d)
- Sell when fast SMA crosses below slow SMA
"""
def __init__(
self,
fast_period: int = 20,
slow_period: int = 50,
initial_capital: float = 100000.0,
position_size_pct: float = 0.20 # 20% of capital per position
):
self.fast_period = fast_period
self.slow_period = slow_period
self.initial_capital = initial_capital
self.position_size_pct = position_size_pct
# Strategy state
self.current_capital = initial_capital
self.positions: Dict[str, dict] = {} # symbol -> position info
self.trades: List[Trade] = []
self.equity_curve: List[dict] = []
def calculate_sma(self, prices: pd.Series, period: int) -> pd.Series:
"""Calculate simple moving average."""
return prices.rolling(window=period).mean()
def load_data(self, db_path: str, symbol: str, interval: str = "1d") -> pd.DataFrame:
"""Load OHLCV data from SQLite database."""
conn = sqlite3.connect(db_path)
df = pd.read_sql("""
SELECT open_time, open, high, low, close, volume
FROM klines
WHERE symbol = ? AND interval = ?
ORDER BY open_time ASC
""", conn, params=(symbol, interval))
conn.close()
if df.empty:
return pd.DataFrame()
df["open_time"] = pd.to_datetime(df["open_time"], unit="ms")
return df
def generate_signals(self, df: pd.DataFrame) -> pd.DataFrame:
"""Generate trading signals from price data."""
if len(df) < self.slow_period:
return pd.DataFrame()
df = df.copy()
df["fast_sma"] = self.calculate_sma(df["close"], self.fast_period)
df["slow_sma"] = self.calculate_sma(df["close"], self.slow_period)
# Signal: 1 = long, 0 = flat, -1 = short (we only go long in v1)
df["signal"] = 0
df.loc[df["fast_sma"] > df["slow_sma"], "signal"] = 1
df.loc[df["fast_sma"] <= df["slow_sma"], "signal"] = 0
# Detect crossovers
df["prev_signal"] = df["signal"].shift(1)
df["crossover"] = (df["signal"] == 1) & (df["prev_signal"] == 0) # Buy signal
df["crossunder"] = (df["signal"] == 0) & (df["prev_signal"] == 1) # Sell signal
return df.dropna()
def run_backtest(self, db_path: str, symbol: str) -> Dict:
"""
Run backtest for a single symbol.
Returns:
Dictionary with performance metrics and trade log
"""
df = self.load_data(db_path, symbol)
if df.empty:
return {"error": f"No data for {symbol}"}
signals = self.generate_signals(df)
if signals.empty:
return {"error": f"Insufficient data for {symbol}"}
# Simulate trades
entry_date = None
entry_price = None
shares = 0
for _, row in signals.iterrows():
date = row["open_time"]
price = row["close"]
if row["crossover"] and symbol not in self.positions:
# Entry: buy signal
position_value = self.current_capital * self.position_size_pct
shares = int(position_value / price)
entry_date = date
entry_price = price
self.positions[symbol] = {
"entry_date": entry_date,
"entry_price": entry_price,
"shares": shares
}
elif row["crossunder"] and symbol in self.positions:
# Exit: sell signal
exit_date = date
exit_price = price
pos = self.positions[symbol]
pnl = (exit_price - pos["entry_price"]) * pos["shares"]
pnl_pct = (exit_price - pos["entry_price"]) / pos["entry_price"] * 100
holding_days = (exit_date - pos["entry_date"]).days
trade = Trade(
symbol=symbol,
entry_date=int(pos["entry_date"].timestamp() * 1000),
entry_price=pos["entry_price"],
exit_date=int(exit_date.timestamp() * 1000),
exit_price=exit_price,
shares=pos["shares"],
pnl=pnl,
pnl_pct=pnl_pct,
holding_days=holding_days
)
self.trades.append(trade)
self.current_capital += pnl
del self.positions[symbol]
# Calculate equity curve
self._calculate_equity_curve(signals, symbol)
return self._calculate_metrics()
def _calculate_equity_curve(self, signals: pd.DataFrame, symbol: str):
"""Build daily equity curve."""
for _, row in signals.iterrows():
date = row["open_time"]
price = row["close"]
position_value = 0
if symbol in self.positions:
pos = self.positions[symbol]
shares = pos["shares"]
position_value = shares * price
portfolio_value = self.current_capital + position_value
self.equity_curve.append({
"date": date,
"portfolio_value": portfolio_value,
"cash": self.current_capital,
"position_value": position_value,
"close": price
})
def _calculate_metrics(self) -> Dict:
"""Calculate performance metrics from trade log."""
if not self.trades:
return {
"total_trades": 0,
"win_rate": 0.0,
"avg_win": 0.0,
"avg_loss": 0.0,
"profit_factor": 0.0,
"sharpe_ratio": 0.0,
"max_drawdown": 0.0,
"total_return": 0.0
}
df = pd.DataFrame([
{"pnl": t.pnl, "pnl_pct": t.pnl_pct, "holding_days": t.holding_days}
for t in self.trades
])
wins = df[df["pnl"] > 0]
losses = df[df["pnl"] <= 0]
total_wins = wins["pnl"].sum() if len(wins) > 0 else 0
total_losses = abs(losses["pnl"].sum()) if len(losses) > 0 else 0
# Equity curve metrics
equity_df = pd.DataFrame(self.equity_curve)
equity_df["drawdown"] = equity_df["portfolio_value"].cummax() - equity_df["portfolio_value"]
max_drawdown = equity_df["drawdown"].max()
max_drawdown_pct = max_drawdown / equity_df["portfolio_value"].cummax().max() * 100
# Sharpe ratio (simplified: annualized return / std of returns)
equity_df["daily_return"] = equity_df["portfolio_value"].pct_change()
avg_daily_return = equity_df["daily_return"].mean()
std_daily_return = equity_df["daily_return"].std()
if std_daily_return > 0:
sharpe = (avg_daily_return * 252) / (std_daily_return * np.sqrt(252))
else:
sharpe = 0.0
return {
"total_trades": len(self.trades),
"win_rate": len(wins) / len(df) * 100,
"avg_win": wins["pnl"].mean() if len(wins) > 0 else 0,
"avg_loss": losses["pnl"].mean() if len(losses) > 0 else 0,
"profit_factor": total_wins / total_losses if total_losses > 0 else float("inf"),
"sharpe_ratio": sharpe,
"max_drawdown_pct": max_drawdown_pct,
"total_return_pct": (self.current_capital - self.initial_capital) / self.initial_capital * 100,
"final_capital": self.current_capital,
"equity_curve": equity_df.to_dict("records")
}
# ── Run Backtest ──────────────────────────────────────────────────────────────
if __name__ == "__main__":
strategy = SMACrossoverStrategy(
fast_period=20,
slow_period=50,
initial_capital=100000,
position_size_pct=0.20
)
symbols = ["AAPL.US", "MSFT.US", "GOOGL.US", "AMZN.US", "NVDA.US", "META.US"]
all_results = {}
for symbol in symbols:
print(f"\nBacktesting {symbol}...")
result = strategy.run_backtest("us_equities.db", symbol)
if "error" in result:
print(f" Error: {result['error']}")
continue
all_results[symbol] = result
print(f" Trades: {result['total_trades']}")
print(f" Win rate: {result['win_rate']:.1f}%")
print(f" Total return: {result['total_return_pct']:.2f}%")
print(f" Sharpe: {result['sharpe_ratio']:.2f}")
print(f" Max drawdown: {result['max_drawdown_pct']:.2f}%")
Part 5: Performance Analytics and Visualization
With trades executed and metrics calculated, we need to visualize results. Good visualization answers three questions:
- Did the strategy make money? (Equity curve)
- Was the return worth the risk? (Drawdown chart, Sharpe ratio)
- How does the strategy behave in different market conditions? (Trade distribution, regime analysis)
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from typing import Dict, List
def plot_backtest_results(all_results: Dict[str, Dict], save_path: str = "backtest_results.png"):
"""
Create comprehensive backtest visualization.
Shows: equity curve, drawdown, trade distribution, per-symbol comparison
"""
fig, axes = plt.subplots(2, 2, figsize=(16, 12))
# ── 1. Combined Equity Curve ──────────────────────────────────────────
ax1 = axes[0, 0]
for symbol, result in all_results.items():
if "equity_curve" not in result:
continue
equity_df = pd.DataFrame(result["equity_curve"])
ax1.plot(
equity_df["date"],
equity_df["portfolio_value"],
label=symbol.replace(".US", ""),
linewidth=1.5
)
ax1.set_title("Portfolio Value Over Time (Per Symbol)", fontsize=12, fontweight="bold")
ax1.set_xlabel("Date")
ax1.set_ylabel("Portfolio Value ($)")
ax1.legend()
ax1.grid(True, alpha=0.3)
ax1.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m"))
plt.setp(ax1.xaxis.get_majorticklabels(), rotation=45)
# ── 2. Drawdown Chart ─────────────────────────────────────────────────
ax2 = axes[0, 1]
for symbol, result in all_results.items():
if "equity_curve" not in result:
continue
equity_df = pd.DataFrame(result["equity_curve"])
equity_df["peak"] = equity_df["portfolio_value"].cummax()
equity_df["drawdown"] = (equity_df["portfolio_value"] - equity_df["peak"]) / equity_df["peak"] * 100
ax2.fill_between(
equity_df["date"],
0,
equity_df["drawdown"],
alpha=0.3,
label=symbol.replace(".US", "")
)
ax2.set_title("Drawdown (% from Peak)", fontsize=12, fontweight="bold")
ax2.set_xlabel("Date")
ax2.set_ylabel("Drawdown (%)")
ax2.legend()
ax2.grid(True, alpha=0.3)
ax2.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m"))
plt.setp(ax2.xaxis.get_majorticklabels(), rotation=45)
# ── 3. Performance Comparison Bar Chart ───────────────────────────────
ax3 = axes[1, 0]
symbols = list(all_results.keys())
returns = [all_results[s].get("total_return_pct", 0) for s in symbols]
drawdowns = [-all_results[s].get("max_drawdown_pct", 0) for s in symbols]
x = np.arange(len(symbols))
width = 0.35
bars1 = ax3.bar(x - width/2, returns, width, label="Total Return (%)", color="steelblue")
bars2 = ax3.bar(x + width/2, drawdowns, width, label="Max Drawdown (%)", color="coral")
ax3.set_title("Return vs. Risk Comparison", fontsize=12, fontweight="bold")
ax3.set_xticks(x)
ax3.set_xticklabels([s.replace(".US", "") for s in symbols])
ax3.legend()
ax3.grid(True, alpha=0.3, axis="y")
# Add value labels
for bar in bars1:
height = bar.get_height()
ax3.annotate(f"{height:.1f}%",
xy=(bar.get_x() + bar.get_width() / 2, height),
xytext=(0, 3),
textcoords="offset points",
ha="center", va="bottom", fontsize=8)
# ── 4. Win Rate and Trade Metrics ─────────────────────────────────────
ax4 = axes[1, 1]
metrics = ["Win Rate (%)", "Profit Factor", "Sharpe Ratio", "Total Trades"]
values = []
total_trades = sum(all_results[s].get("total_trades", 0) for s in symbols)
winning_trades = sum(
all_results[s].get("win_rate", 0) * all_results[s].get("total_trades", 0) / 100
for s in symbols
)
if total_trades > 0:
combined_win_rate = winning_trades / total_trades * 100
else:
combined_win_rate = 0
avg_sharpe = np.mean([all_results[s].get("sharpe_ratio", 0) for s in symbols])
# Calculate average profit factor
profit_factors = [all_results[s].get("profit_factor", 0) for s in symbols]
avg_pf = np.mean([pf for pf in profit_factors if pf > 0])
values = [
combined_win_rate,
avg_pf if avg_pf != float("inf") else 0,
avg_sharpe,
total_trades
]
colors = ["steelblue" if v > 0 else "coral" for v in values]
bars = ax4.bar(metrics, values, color=colors)
ax4.set_title("Aggregate Strategy Metrics", fontsize=12, fontweight="bold")
ax4.grid(True, alpha=0.3, axis="y")
for bar, value in zip(bars, values):
height = bar.get_height()
ax4.annotate(f"{value:.2f}",
xy=(bar.get_x() + bar.get_width() / 2, height),
xytext=(0, 3),
textcoords="offset points",
ha="center", va="bottom", fontsize=10)
plt.tight_layout()
plt.savefig(save_path, dpi=150, bbox_inches="tight")
plt.close()
print(f"\nVisualization saved to {save_path}")
# ── Run Visualization ─────────────────────────────────────────────────────────
if __name__ == "__main__":
# Assuming all_results is already populated from the backtest
plot_backtest_results(all_results)
Part 6: Understanding Your Results — Common Beginner Mistakes
Running the code above, you'll get numbers. Before you celebrate or quit, understand what those numbers mean.
6.1 Sample Size Problem
If your backtest shows a Sharpe ratio of 2.5 but only 5 trades over 2 years, the Sharpe is meaningless. You need a minimum of 30 trades to have any statistical confidence.
| Trades | Statistical confidence |
|---|---|
| < 10 | None — results are anecdotal |
| 10–20 | Low — could be luck |
| 20–30 | Minimum viable |
| 30–50 | Moderate |
| 50+ | High confidence |
If your strategy generates fewer than 20 trades over a 2-year period, consider:
- Widening the SMA periods (e.g., 10/30 instead of 20/50)
- Adding more symbols to the universe
- Adjusting position sizing to be more aggressive
6.2 Overfitting Problem
If you tune your SMA periods to maximize backtest returns, you'll almost certainly overfit. The parameters that worked best in 2022–2024 may not work in 2025–2027.
A simple test: split your data into two periods — 2022–2023 and 2024–2025. Optimize on the first period, then test on the second. If results degrade significantly, you're overfitting.
6.3 Transaction Cost Problem
Our code doesn't include transaction costs (commissions, slippage). In reality:
- Commission: ~$0.005–$0.01 per share (varies by broker)
- Slippage: ~0.01–0.05% depending on liquidity
- For a $10,000 position, realistic round-trip cost: $1–$5
For a strategy with 50 trades per year and $5 round-trip cost, that's $250/year in costs. If your strategy returns $1,000 gross, your net is $750 — 25% of your gross return. Transaction costs matter.
Part 7: Data Source Comparison — Why We Used TickDB
For a beginner, choosing a data source feels overwhelming. Here's how the major options compare:
| Capability | Yahoo Finance (Free) | Alpha Vantage (Free Tier) | TickDB |
|---|---|---|---|
| Historical OHLCV (US stocks) | Yes, but delisted stocks dropped | Yes, limited to 5 req/min | Yes, 10+ years, point-in-time |
| Real-time data | Delayed 15 min | No (free tier) | Yes, WebSocket push |
| Data cleanliness | Poor — survivorship bias | Moderate | Cleaned, adjusted, aligned |
| API stability | High (but rate limits on scraping) | Moderate | High — production-grade SLA |
| Rate limits | Unreliable | 5 req/min (free) | Defined, with Retry-After headers |
| Historical volume | Yes, but inconsistent | Yes | Yes, consistent |
| Best for | Exploration only | Small projects | Production quant systems |
For the strategy in this article — 2 years of daily data, 6 symbols — Yahoo Finance could technically work. But the moment you want to:
- Add 50 symbols
- Run intraday strategies
- Analyze delisted stocks
- Get real-time data
...Yahoo Finance breaks. TickDB scales with your ambitions.
Part 8: Deployment Guide — By Experience Level
| User type | Recommendation |
|---|---|
| Complete beginner | Start with this exact code. Use the free API tier. Run on your local machine. Focus on understanding before optimizing. |
| Python-intermediate | Add transaction costs to the backtest, implement stop-loss, add more symbols. Set up a cron job to update data weekly. |
| Advanced | Add real-time WebSocket streaming, implement position sizing based on volatility, build a web dashboard for live monitoring. |
Closing: The Real Work Starts Here
We started with a story about a senior engineer whose backtest showed 45% returns and whose live account lost 12%. The difference between that story and your results is the difference between "downloading data" and "building a system."
This article gave you:
- Production-grade data acquisition with error handling, retry logic, and rate-limit compliance
- A strategy with an explicit hypothesis and a clear exit rule
- A backtesting engine with proper metrics (Sharpe, drawdown, win rate)
- Visualization that separates signal from noise
The strategy we've built — SMA crossover on large-cap US stocks — is simple. It's not going to make you rich. But it will teach you the fundamentals: how data quality affects results, how position sizing affects drawdown, how sample size affects statistical confidence.
That's the foundation. Everything else — statistical arbitrage, pairs trading, machine learning — builds on it.
Next steps:
- Run the code: Set up your TickDB API key, run the data fetcher, execute the backtest
- Experiment: Try different SMA periods. Test different symbols. Add transaction costs.
- Read the docs: Explore TickDB's real-time WebSocket capabilities for live trading
- Join the community: Share your results, ask questions, learn from others
The quant journey is long. But every system you've ever admired started exactly where you are right now.
Next Steps
If you want to run this strategy yourself:
- Sign up at tickdb.ai (free, no credit card required)
- Generate an API key in the dashboard
- Set the
TICKDB_API_KEYenvironment variable - Copy-paste the code from this article and run it
If you want 10+ years of historical OHLCV data for more robust backtesting, reach out to enterprise@tickdb.ai for institutional plans with extended history and higher rate limits.
If you use AI coding assistants, search for and install the tickdb-market-data SKILL in your AI tool's marketplace to get TickDB API integration directly in your coding workflow.
This article does not constitute investment advice. Markets involve risk; past performance does not guarantee future results. Backtested strategies may perform differently in live trading due to slippage, latency, liquidity, and other factors not captured in historical simulation.