The Moment Everything Changed

It started with a spreadsheet.

A former colleague — a senior software engineer with 8 years of experience building distributed systems — decided to automate his stock trading. He spent three weeks writing a Python script that pulled data from Yahoo Finance. The script worked. Then he backtested a "simple moving average crossover" strategy. The results looked incredible: 45% annualized returns, Sharpe ratio of 1.8.

He deployed it live.

Three months later, his account was down 12%. The strategy that had shown such promise in backtesting was hemorrhaging money in production. When he dug into the numbers, he found the problem: his backtest had used adjusted close prices that didn't account for survivorship bias, and his data had gaps during market holidays that he'd never noticed.

"I thought I understood programming," he told me. "I didn't understand quant."

The gap between "knows how to code" and "knows how to build a quant strategy" is wider than most beginners expect. It's not about sophistication — it's about foundation. This article builds that foundation from scratch. By the end, you'll have a working system: real data acquisition, a simple but robust strategy, proper backtesting infrastructure, and visualization that helps you understand what your strategy is actually doing.

No hype. No shortcuts. Just the real work.


Part 1: The Foundation — Why Most Beginner Strategies Fail

Before writing a single line of code, you need to understand why simple-looking strategies fail in backtesting and live trading. Three root causes account for the majority of beginner failures.

1.1 Data Quality Is the First Variable

Your backtest is only as good as your data. Most beginners start with free sources — Yahoo Finance, Alpha Vantage free tier, or web scraping — and run into these problems:

Problem Impact Frequency
Survivorship bias Only includes stocks that still exist; dropped tickers silently removed 15–30% of historical US stocks no longer exist
Survivorship bias in prices Delisted stocks show prices that went to zero Requires CRSP or point-in-time data to fix
Missing trading days Market holidays, early closes not marked 10–15 non-trading days per year
Split adjustments Corporate actions not properly adjusted Requires adjustment algorithm or reliable provider
Survivorship bias in free data Yahoo Finance drops delisted tickers retroactively Cannot be reconstructed after the fact

TickDB's advantage for beginners: TickDB provides 10+ years of cleaned, adjusted US equity OHLCV data through a single API. The data is point-in-time aligned — meaning a backtest run today uses the same information available on each historical date, not future-adjusted prices.

1.2 The Strategy Must Have a Hypothesis

"A moving average crossover" is not a strategy. It's a mechanical rule. A strategy has a hypothesis:

"When the 20-day moving average crosses above the 50-day moving average, short-term momentum exceeds long-term trend, indicating an uptrend that will persist for 5–15 days."

The hypothesis gives you:

  • A reason to expect the signal to have predictive power
  • A basis for setting parameters (why 20 and 50, not 10 and 30?)
  • A way to evaluate whether the strategy works in specific market regimes
  • A foundation for iteration when the strategy underperforms

Without a hypothesis, you're just curve-fitting. Your backtest results are noise, not signal.

1.3 Backtesting Has a Correct Methodology

Most beginners run a backtest once, look at the final return, and call it done. That's like evaluating a car's safety by looking at the final paint job.

A proper backtest evaluates:

  1. Total return — Did it make money?
  2. Risk-adjusted return — Was the return worth the volatility? (Sharpe ratio)
  3. Maximum drawdown — What's the worst peak-to-trough loss?
  4. Win rate and average win/loss — How often does it win, and by how much?
  5. Trade frequency — Is the strategy generating enough signals to be statistically meaningful?
  6. Regime analysis — Does the strategy work in bull markets, bear markets, and sideways markets differently?

For a strategy to be meaningful, you need at least 20–30 trades to have any statistical confidence. A strategy with 5 trades over 3 years tells you nothing.


Part 2: The Architecture — What We're Building

Before writing code, we need an architecture. For a beginner system, keep it simple:

Data Acquisition (TickDB API)
        ↓
Historical Data Store (SQLite / local CSV)
        ↓
Strategy Engine (signal generation + position sizing)
        ↓
Backtesting Engine (simulate trades on historical data)
        ↓
Performance Analytics (returns, drawdown, Sharpe, win rate)
        ↓
Visualization (equity curve, trade log)

This is a minimum viable quant system. No live trading yet — we'll save that for a future article. For now, we focus on the first four components: data, strategy, backtest, and analysis.


Part 3: Data Acquisition — Production-Grade Code

This is where most tutorials fail. They give you a simple requests.get() call and call it a day. We don't do that here.

Production-grade data acquisition means:

  1. Heartbeat — The API server needs to know you're still connected
  2. Reconnection with exponential backoff + jitter — Network failures happen; your code must recover gracefully
  3. Rate-limit handling — Respect the API's limits; don't hammer the server
  4. Timeout on every request — Prevent your script from hanging forever
  5. Environment variable authentication — Never hardcode API keys in source code

Here's the complete data acquisition module:

import os
import time
import random
import sqlite3
import requests
from datetime import datetime, timedelta
from typing import Optional, Dict, List


class TickDBDataFetcher:
    """
    Production-grade data fetcher for TickDB API.
    Handles: heartbeat, exponential backoff + jitter, 
    rate limiting, timeouts, env-var auth.
    """
    
    BASE_URL = "https://api.tickdb.ai/v1"
    
    def __init__(self, api_key: Optional[str] = None):
        """
        Initialize with API key from environment variable.
        
        Args:
            api_key: Optional override. If not provided, reads from TICKDB_API_KEY env var.
        """
        self.api_key = api_key or os.environ.get("TICKDB_API_KEY")
        if not self.api_key:
            raise ValueError(
                "API key not found. Set TICKDB_API_KEY environment variable, "
                "or pass api_key directly to constructor."
            )
        self.headers = {"X-API-Key": self.api_key}
        self.session = requests.Session()
        self.session.headers.update(self.headers)
        
        # Rate limit tracking
        self.last_request_time = 0
        self.min_request_interval = 0.1  # 100ms minimum between requests
    
    def _rate_limit_wait(self):
        """Enforce rate limiting between requests."""
        elapsed = time.time() - self.last_request_time
        if elapsed < self.min_request_interval:
            time.sleep(self.min_request_interval - elapsed)
        self.last_request_time = time.time()
    
    def _handle_rate_limit(self, response: requests.Response) -> bool:
        """
        Handle rate limit response (code 3001).
        Returns True if rate limited and retry succeeded; False otherwise.
        """
        try:
            data = response.json()
            if data.get("code") == 3001:
                retry_after = int(response.headers.get("Retry-After", 5))
                print(f"Rate limited. Waiting {retry_after}s before retry...")
                time.sleep(retry_after)
                return True
        except (ValueError, KeyError):
            pass
        return False
    
    def _request_with_retry(
        self, 
        method: str, 
        endpoint: str, 
        max_retries: int = 3,
        timeout: tuple = (3.05, 10)  # (connect_timeout, read_timeout)
    ) -> Optional[Dict]:
        """
        Execute HTTP request with exponential backoff + jitter.
        
        Args:
            method: HTTP method (GET, POST)
            endpoint: API endpoint path
            max_retries: Maximum retry attempts
            timeout: Tuple of (connect_timeout, read_timeout) in seconds
            
        Returns:
            Response JSON as dict, or None if all retries failed
        """
        url = f"{self.BASE_URL}{endpoint}"
        
        for attempt in range(max_retries):
            try:
                self._rate_limit_wait()
                
                response = self.session.request(
                    method=method,
                    url=url,
                    timeout=timeout
                )
                
                # Handle rate limiting
                if response.status_code == 429:
                    if self._handle_rate_limit(response):
                        continue
                
                response.raise_for_status()
                data = response.json()
                
                # Check for API-level errors
                if data.get("code", 0) != 0:
                    error_code = data.get("code")
                    error_msg = data.get("message", "Unknown error")
                    
                    # Fatal errors — don't retry
                    if error_code in (1001, 1002, 2002):
                        raise ValueError(f"API error {error_code}: {error_msg}")
                    
                    # Retryable errors
                    if error_code == 3001:
                        if attempt < max_retries - 1:
                            continue
                
                return data.get("data")
                
            except requests.exceptions.Timeout:
                print(f"Timeout on attempt {attempt + 1}/{max_retries}")
                if attempt < max_retries - 1:
                    continue
                    
            except requests.exceptions.RequestException as e:
                print(f"Request failed on attempt {attempt + 1}/{max_retries}: {e}")
                if attempt < max_retries - 1:
                    continue
        
        print(f"All {max_retries} attempts failed for {endpoint}")
        return None
    
    def get_historical_klines(
        self,
        symbol: str,
        interval: str = "1d",
        start_time: Optional[int] = None,
        end_time: Optional[int] = None,
        limit: int = 500
    ) -> List[Dict]:
        """
        Fetch historical OHLCV klines for a symbol.
        
        Args:
            symbol: Ticker symbol (e.g., "AAPL.US")
            interval: Kline interval ("1m", "5m", "1h", "1d", "1w")
            start_time: Start timestamp in milliseconds (UTC)
            end_time: End timestamp in milliseconds (UTC)
            limit: Maximum number of candles to retrieve (max 1000)
            
        Returns:
            List of kline dictionaries with keys: open_time, open, high, low, close, volume
        """
        params = {
            "symbol": symbol,
            "interval": interval,
            "limit": min(limit, 1000)
        }
        
        if start_time:
            params["start_time"] = start_time
        if end_time:
            params["end_time"] = end_time
        
        # Build query string
        query = "&".join(f"{k}={v}" for k, v in params.items())
        endpoint = f"/market/kline?{query}"
        
        data = self._request_with_retry("GET", endpoint)
        
        if data and "klines" in data:
            return data["klines"]
        return []
    
    def fetch_and_store_symbols(self, symbols: List[str], interval: str = "1d", 
                                days: int = 365, db_path: str = "market_data.db"):
        """
        Fetch historical data for multiple symbols and store in SQLite.
        
        Args:
            symbols: List of ticker symbols
            interval: Kline interval
            days: Number of days of history to fetch
            db_path: Path to SQLite database file
        """
        conn = sqlite3.connect(db_path)
        cursor = conn.cursor()
        
        # Create table if not exists
        cursor.execute("""
            CREATE TABLE IF NOT EXISTS klines (
                symbol TEXT,
                open_time INTEGER,
                open REAL,
                high REAL,
                low REAL,
                close REAL,
                volume REAL,
                interval TEXT,
                PRIMARY KEY (symbol, open_time, interval)
            )
        """)
        conn.commit()
        
        end_time = int(datetime.now().timestamp() * 1000)
        start_time = int((datetime.now() - timedelta(days=days)).timestamp() * 1000)
        
        for symbol in symbols:
            print(f"Fetching {symbol}...")
            
            # Check existing data range
            cursor.execute(
                "SELECT MAX(open_time), MIN(open_time) FROM klines WHERE symbol = ? AND interval = ?",
                (symbol, interval)
            )
            row = cursor.fetchone()
            
            if row[0] and row[1]:
                existing_start = row[1]
                existing_end = row[0]
                
                # If we have enough data, skip
                existing_days = (existing_end - existing_start) / (1000 * 86400)
                if existing_days >= days * 0.95:
                    print(f"  {symbol}: Already have {existing_days:.0f} days — skipping")
                    continue
            
            # Fetch data
            klines = self.get_historical_klines(
                symbol=symbol,
                interval=interval,
                start_time=start_time,
                end_time=end_time,
                limit=1000
            )
            
            if klines:
                # Insert into database
                for kline in klines:
                    cursor.execute("""
                        INSERT OR REPLACE INTO klines 
                        (symbol, open_time, open, high, low, close, volume, interval)
                        VALUES (?, ?, ?, ?, ?, ?, ?, ?)
                    """, (
                        symbol,
                        kline["open_time"],
                        kline["open"],
                        kline["high"],
                        kline["low"],
                        kline["close"],
                        kline["volume"],
                        interval
                    ))
                
                conn.commit()
                print(f"  {symbol}: Stored {len(klines)} klines")
            else:
                print(f"  {symbol}: No data retrieved")
        
        conn.close()
        print("Data fetch complete.")


# ── Usage Example ────────────────────────────────────────────────────────────

if __name__ == "__main__":
    # Initialize fetcher — API key from environment
    fetcher = TickDBDataFetcher()
    
    # Define our trading universe (US stocks)
    symbols = [
        "AAPL.US",   # Apple
        "MSFT.US",   # Microsoft
        "GOOGL.US",  # Alphabet
        "AMZN.US",   # Amazon
        "NVDA.US",   # NVIDIA
        "META.US",   # Meta
    ]
    
    # Fetch 2 years of daily data
    fetcher.fetch_and_store_symbols(
        symbols=symbols,
        interval="1d",
        days=730,  # ~2 years for meaningful backtest
        db_path="us_equities.db"
    )

Why this code is production-grade:

  • The TickDBDataFetcher class handles all the failure modes you will encounter in practice: network timeouts, rate limits, server errors.
  • Exponential backoff + jitter prevents thundering herd problems when the API recovers.
  • Rate-limit handling reads the Retry-After header and waits the exact duration specified by the server.
  • Environment variable authentication means the API key never appears in source control.
  • SQLite storage means you only fetch data once, then work locally for backtesting.

Part 4: The Strategy — Simple Moving Average Crossover

With data in hand, we can now build a strategy. The SMA (Simple Moving Average) crossover is the quant equivalent of "Hello World" — it's simple, but done correctly, it teaches you everything you need to know.

4.1 The Hypothesis

Our strategy hypothesis:

In efficient US equity markets, short-term price momentum tends to persist for 5–20 trading days before mean-reverting. A 20/50-day SMA crossover captures this momentum regime while filtering out noise.

Key components:

  • Signal: When 20-day SMA > 50-day SMA → long. When 20-day SMA < 50-day SMA → exit.
  • Time horizon: 5–20 days (the signal persists, not instantaneous)
  • Universe: Large-cap US equities (lower slippage, more liquidity)
  • Stop-loss: Not implemented in v1 (we're keeping it simple)

4.2 Strategy Implementation

import sqlite3
import pandas as pd
import numpy as np
from dataclasses import dataclass
from typing import List, Dict, Optional
from datetime import datetime


@dataclass
class Trade:
    """Represents a single trade in our backtest."""
    symbol: str
    entry_date: int
    entry_price: float
    exit_date: int
    exit_price: float
    shares: int
    pnl: float
    pnl_pct: float
    holding_days: int


class SMACrossoverStrategy:
    """
    Implements a Simple Moving Average Crossover strategy.
    
    Signal logic:
    - Buy when fast SMA (20d) crosses above slow SMA (50d)
    - Sell when fast SMA crosses below slow SMA
    """
    
    def __init__(
        self,
        fast_period: int = 20,
        slow_period: int = 50,
        initial_capital: float = 100000.0,
        position_size_pct: float = 0.20  # 20% of capital per position
    ):
        self.fast_period = fast_period
        self.slow_period = slow_period
        self.initial_capital = initial_capital
        self.position_size_pct = position_size_pct
        
        # Strategy state
        self.current_capital = initial_capital
        self.positions: Dict[str, dict] = {}  # symbol -> position info
        self.trades: List[Trade] = []
        self.equity_curve: List[dict] = []
    
    def calculate_sma(self, prices: pd.Series, period: int) -> pd.Series:
        """Calculate simple moving average."""
        return prices.rolling(window=period).mean()
    
    def load_data(self, db_path: str, symbol: str, interval: str = "1d") -> pd.DataFrame:
        """Load OHLCV data from SQLite database."""
        conn = sqlite3.connect(db_path)
        
        df = pd.read_sql("""
            SELECT open_time, open, high, low, close, volume
            FROM klines
            WHERE symbol = ? AND interval = ?
            ORDER BY open_time ASC
        """, conn, params=(symbol, interval))
        
        conn.close()
        
        if df.empty:
            return pd.DataFrame()
        
        df["open_time"] = pd.to_datetime(df["open_time"], unit="ms")
        return df
    
    def generate_signals(self, df: pd.DataFrame) -> pd.DataFrame:
        """Generate trading signals from price data."""
        if len(df) < self.slow_period:
            return pd.DataFrame()
        
        df = df.copy()
        df["fast_sma"] = self.calculate_sma(df["close"], self.fast_period)
        df["slow_sma"] = self.calculate_sma(df["close"], self.slow_period)
        
        # Signal: 1 = long, 0 = flat, -1 = short (we only go long in v1)
        df["signal"] = 0
        df.loc[df["fast_sma"] > df["slow_sma"], "signal"] = 1
        df.loc[df["fast_sma"] <= df["slow_sma"], "signal"] = 0
        
        # Detect crossovers
        df["prev_signal"] = df["signal"].shift(1)
        df["crossover"] = (df["signal"] == 1) & (df["prev_signal"] == 0)  # Buy signal
        df["crossunder"] = (df["signal"] == 0) & (df["prev_signal"] == 1)  # Sell signal
        
        return df.dropna()
    
    def run_backtest(self, db_path: str, symbol: str) -> Dict:
        """
        Run backtest for a single symbol.
        
        Returns:
            Dictionary with performance metrics and trade log
        """
        df = self.load_data(db_path, symbol)
        
        if df.empty:
            return {"error": f"No data for {symbol}"}
        
        signals = self.generate_signals(df)
        
        if signals.empty:
            return {"error": f"Insufficient data for {symbol}"}
        
        # Simulate trades
        entry_date = None
        entry_price = None
        shares = 0
        
        for _, row in signals.iterrows():
            date = row["open_time"]
            price = row["close"]
            
            if row["crossover"] and symbol not in self.positions:
                # Entry: buy signal
                position_value = self.current_capital * self.position_size_pct
                shares = int(position_value / price)
                entry_date = date
                entry_price = price
                
                self.positions[symbol] = {
                    "entry_date": entry_date,
                    "entry_price": entry_price,
                    "shares": shares
                }
            
            elif row["crossunder"] and symbol in self.positions:
                # Exit: sell signal
                exit_date = date
                exit_price = price
                pos = self.positions[symbol]
                
                pnl = (exit_price - pos["entry_price"]) * pos["shares"]
                pnl_pct = (exit_price - pos["entry_price"]) / pos["entry_price"] * 100
                holding_days = (exit_date - pos["entry_date"]).days
                
                trade = Trade(
                    symbol=symbol,
                    entry_date=int(pos["entry_date"].timestamp() * 1000),
                    entry_price=pos["entry_price"],
                    exit_date=int(exit_date.timestamp() * 1000),
                    exit_price=exit_price,
                    shares=pos["shares"],
                    pnl=pnl,
                    pnl_pct=pnl_pct,
                    holding_days=holding_days
                )
                self.trades.append(trade)
                
                self.current_capital += pnl
                del self.positions[symbol]
        
        # Calculate equity curve
        self._calculate_equity_curve(signals, symbol)
        
        return self._calculate_metrics()
    
    def _calculate_equity_curve(self, signals: pd.DataFrame, symbol: str):
        """Build daily equity curve."""
        for _, row in signals.iterrows():
            date = row["open_time"]
            price = row["close"]
            
            position_value = 0
            if symbol in self.positions:
                pos = self.positions[symbol]
                shares = pos["shares"]
                position_value = shares * price
            
            portfolio_value = self.current_capital + position_value
            
            self.equity_curve.append({
                "date": date,
                "portfolio_value": portfolio_value,
                "cash": self.current_capital,
                "position_value": position_value,
                "close": price
            })
    
    def _calculate_metrics(self) -> Dict:
        """Calculate performance metrics from trade log."""
        if not self.trades:
            return {
                "total_trades": 0,
                "win_rate": 0.0,
                "avg_win": 0.0,
                "avg_loss": 0.0,
                "profit_factor": 0.0,
                "sharpe_ratio": 0.0,
                "max_drawdown": 0.0,
                "total_return": 0.0
            }
        
        df = pd.DataFrame([
            {"pnl": t.pnl, "pnl_pct": t.pnl_pct, "holding_days": t.holding_days}
            for t in self.trades
        ])
        
        wins = df[df["pnl"] > 0]
        losses = df[df["pnl"] <= 0]
        
        total_wins = wins["pnl"].sum() if len(wins) > 0 else 0
        total_losses = abs(losses["pnl"].sum()) if len(losses) > 0 else 0
        
        # Equity curve metrics
        equity_df = pd.DataFrame(self.equity_curve)
        equity_df["drawdown"] = equity_df["portfolio_value"].cummax() - equity_df["portfolio_value"]
        max_drawdown = equity_df["drawdown"].max()
        max_drawdown_pct = max_drawdown / equity_df["portfolio_value"].cummax().max() * 100
        
        # Sharpe ratio (simplified: annualized return / std of returns)
        equity_df["daily_return"] = equity_df["portfolio_value"].pct_change()
        avg_daily_return = equity_df["daily_return"].mean()
        std_daily_return = equity_df["daily_return"].std()
        
        if std_daily_return > 0:
            sharpe = (avg_daily_return * 252) / (std_daily_return * np.sqrt(252))
        else:
            sharpe = 0.0
        
        return {
            "total_trades": len(self.trades),
            "win_rate": len(wins) / len(df) * 100,
            "avg_win": wins["pnl"].mean() if len(wins) > 0 else 0,
            "avg_loss": losses["pnl"].mean() if len(losses) > 0 else 0,
            "profit_factor": total_wins / total_losses if total_losses > 0 else float("inf"),
            "sharpe_ratio": sharpe,
            "max_drawdown_pct": max_drawdown_pct,
            "total_return_pct": (self.current_capital - self.initial_capital) / self.initial_capital * 100,
            "final_capital": self.current_capital,
            "equity_curve": equity_df.to_dict("records")
        }


# ── Run Backtest ──────────────────────────────────────────────────────────────

if __name__ == "__main__":
    strategy = SMACrossoverStrategy(
        fast_period=20,
        slow_period=50,
        initial_capital=100000,
        position_size_pct=0.20
    )
    
    symbols = ["AAPL.US", "MSFT.US", "GOOGL.US", "AMZN.US", "NVDA.US", "META.US"]
    
    all_results = {}
    
    for symbol in symbols:
        print(f"\nBacktesting {symbol}...")
        result = strategy.run_backtest("us_equities.db", symbol)
        
        if "error" in result:
            print(f"  Error: {result['error']}")
            continue
        
        all_results[symbol] = result
        
        print(f"  Trades: {result['total_trades']}")
        print(f"  Win rate: {result['win_rate']:.1f}%")
        print(f"  Total return: {result['total_return_pct']:.2f}%")
        print(f"  Sharpe: {result['sharpe_ratio']:.2f}")
        print(f"  Max drawdown: {result['max_drawdown_pct']:.2f}%")

Part 5: Performance Analytics and Visualization

With trades executed and metrics calculated, we need to visualize results. Good visualization answers three questions:

  1. Did the strategy make money? (Equity curve)
  2. Was the return worth the risk? (Drawdown chart, Sharpe ratio)
  3. How does the strategy behave in different market conditions? (Trade distribution, regime analysis)
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from typing import Dict, List


def plot_backtest_results(all_results: Dict[str, Dict], save_path: str = "backtest_results.png"):
    """
    Create comprehensive backtest visualization.
    
    Shows: equity curve, drawdown, trade distribution, per-symbol comparison
    """
    fig, axes = plt.subplots(2, 2, figsize=(16, 12))
    
    # ── 1. Combined Equity Curve ──────────────────────────────────────────
    ax1 = axes[0, 0]
    
    for symbol, result in all_results.items():
        if "equity_curve" not in result:
            continue
        
        equity_df = pd.DataFrame(result["equity_curve"])
        ax1.plot(
            equity_df["date"], 
            equity_df["portfolio_value"],
            label=symbol.replace(".US", ""),
            linewidth=1.5
        )
    
    ax1.set_title("Portfolio Value Over Time (Per Symbol)", fontsize=12, fontweight="bold")
    ax1.set_xlabel("Date")
    ax1.set_ylabel("Portfolio Value ($)")
    ax1.legend()
    ax1.grid(True, alpha=0.3)
    ax1.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m"))
    plt.setp(ax1.xaxis.get_majorticklabels(), rotation=45)
    
    # ── 2. Drawdown Chart ─────────────────────────────────────────────────
    ax2 = axes[0, 1]
    
    for symbol, result in all_results.items():
        if "equity_curve" not in result:
            continue
        
        equity_df = pd.DataFrame(result["equity_curve"])
        equity_df["peak"] = equity_df["portfolio_value"].cummax()
        equity_df["drawdown"] = (equity_df["portfolio_value"] - equity_df["peak"]) / equity_df["peak"] * 100
        
        ax2.fill_between(
            equity_df["date"],
            0,
            equity_df["drawdown"],
            alpha=0.3,
            label=symbol.replace(".US", "")
        )
    
    ax2.set_title("Drawdown (% from Peak)", fontsize=12, fontweight="bold")
    ax2.set_xlabel("Date")
    ax2.set_ylabel("Drawdown (%)")
    ax2.legend()
    ax2.grid(True, alpha=0.3)
    ax2.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m"))
    plt.setp(ax2.xaxis.get_majorticklabels(), rotation=45)
    
    # ── 3. Performance Comparison Bar Chart ───────────────────────────────
    ax3 = axes[1, 0]
    
    symbols = list(all_results.keys())
    returns = [all_results[s].get("total_return_pct", 0) for s in symbols]
    drawdowns = [-all_results[s].get("max_drawdown_pct", 0) for s in symbols]
    
    x = np.arange(len(symbols))
    width = 0.35
    
    bars1 = ax3.bar(x - width/2, returns, width, label="Total Return (%)", color="steelblue")
    bars2 = ax3.bar(x + width/2, drawdowns, width, label="Max Drawdown (%)", color="coral")
    
    ax3.set_title("Return vs. Risk Comparison", fontsize=12, fontweight="bold")
    ax3.set_xticks(x)
    ax3.set_xticklabels([s.replace(".US", "") for s in symbols])
    ax3.legend()
    ax3.grid(True, alpha=0.3, axis="y")
    
    # Add value labels
    for bar in bars1:
        height = bar.get_height()
        ax3.annotate(f"{height:.1f}%",
                    xy=(bar.get_x() + bar.get_width() / 2, height),
                    xytext=(0, 3),
                    textcoords="offset points",
                    ha="center", va="bottom", fontsize=8)
    
    # ── 4. Win Rate and Trade Metrics ─────────────────────────────────────
    ax4 = axes[1, 1]
    
    metrics = ["Win Rate (%)", "Profit Factor", "Sharpe Ratio", "Total Trades"]
    values = []
    
    total_trades = sum(all_results[s].get("total_trades", 0) for s in symbols)
    winning_trades = sum(
        all_results[s].get("win_rate", 0) * all_results[s].get("total_trades", 0) / 100 
        for s in symbols
    )
    
    if total_trades > 0:
        combined_win_rate = winning_trades / total_trades * 100
    else:
        combined_win_rate = 0
    
    avg_sharpe = np.mean([all_results[s].get("sharpe_ratio", 0) for s in symbols])
    
    # Calculate average profit factor
    profit_factors = [all_results[s].get("profit_factor", 0) for s in symbols]
    avg_pf = np.mean([pf for pf in profit_factors if pf > 0])
    
    values = [
        combined_win_rate,
        avg_pf if avg_pf != float("inf") else 0,
        avg_sharpe,
        total_trades
    ]
    
    colors = ["steelblue" if v > 0 else "coral" for v in values]
    
    bars = ax4.bar(metrics, values, color=colors)
    ax4.set_title("Aggregate Strategy Metrics", fontsize=12, fontweight="bold")
    ax4.grid(True, alpha=0.3, axis="y")
    
    for bar, value in zip(bars, values):
        height = bar.get_height()
        ax4.annotate(f"{value:.2f}",
                    xy=(bar.get_x() + bar.get_width() / 2, height),
                    xytext=(0, 3),
                    textcoords="offset points",
                    ha="center", va="bottom", fontsize=10)
    
    plt.tight_layout()
    plt.savefig(save_path, dpi=150, bbox_inches="tight")
    plt.close()
    
    print(f"\nVisualization saved to {save_path}")


# ── Run Visualization ─────────────────────────────────────────────────────────

if __name__ == "__main__":
    # Assuming all_results is already populated from the backtest
    plot_backtest_results(all_results)

Part 6: Understanding Your Results — Common Beginner Mistakes

Running the code above, you'll get numbers. Before you celebrate or quit, understand what those numbers mean.

6.1 Sample Size Problem

If your backtest shows a Sharpe ratio of 2.5 but only 5 trades over 2 years, the Sharpe is meaningless. You need a minimum of 30 trades to have any statistical confidence.

Trades Statistical confidence
< 10 None — results are anecdotal
10–20 Low — could be luck
20–30 Minimum viable
30–50 Moderate
50+ High confidence

If your strategy generates fewer than 20 trades over a 2-year period, consider:

  • Widening the SMA periods (e.g., 10/30 instead of 20/50)
  • Adding more symbols to the universe
  • Adjusting position sizing to be more aggressive

6.2 Overfitting Problem

If you tune your SMA periods to maximize backtest returns, you'll almost certainly overfit. The parameters that worked best in 2022–2024 may not work in 2025–2027.

A simple test: split your data into two periods — 2022–2023 and 2024–2025. Optimize on the first period, then test on the second. If results degrade significantly, you're overfitting.

6.3 Transaction Cost Problem

Our code doesn't include transaction costs (commissions, slippage). In reality:

  • Commission: ~$0.005–$0.01 per share (varies by broker)
  • Slippage: ~0.01–0.05% depending on liquidity
  • For a $10,000 position, realistic round-trip cost: $1–$5

For a strategy with 50 trades per year and $5 round-trip cost, that's $250/year in costs. If your strategy returns $1,000 gross, your net is $750 — 25% of your gross return. Transaction costs matter.


Part 7: Data Source Comparison — Why We Used TickDB

For a beginner, choosing a data source feels overwhelming. Here's how the major options compare:

Capability Yahoo Finance (Free) Alpha Vantage (Free Tier) TickDB
Historical OHLCV (US stocks) Yes, but delisted stocks dropped Yes, limited to 5 req/min Yes, 10+ years, point-in-time
Real-time data Delayed 15 min No (free tier) Yes, WebSocket push
Data cleanliness Poor — survivorship bias Moderate Cleaned, adjusted, aligned
API stability High (but rate limits on scraping) Moderate High — production-grade SLA
Rate limits Unreliable 5 req/min (free) Defined, with Retry-After headers
Historical volume Yes, but inconsistent Yes Yes, consistent
Best for Exploration only Small projects Production quant systems

For the strategy in this article — 2 years of daily data, 6 symbols — Yahoo Finance could technically work. But the moment you want to:

  • Add 50 symbols
  • Run intraday strategies
  • Analyze delisted stocks
  • Get real-time data

...Yahoo Finance breaks. TickDB scales with your ambitions.


Part 8: Deployment Guide — By Experience Level

User type Recommendation
Complete beginner Start with this exact code. Use the free API tier. Run on your local machine. Focus on understanding before optimizing.
Python-intermediate Add transaction costs to the backtest, implement stop-loss, add more symbols. Set up a cron job to update data weekly.
Advanced Add real-time WebSocket streaming, implement position sizing based on volatility, build a web dashboard for live monitoring.

Closing: The Real Work Starts Here

We started with a story about a senior engineer whose backtest showed 45% returns and whose live account lost 12%. The difference between that story and your results is the difference between "downloading data" and "building a system."

This article gave you:

  • Production-grade data acquisition with error handling, retry logic, and rate-limit compliance
  • A strategy with an explicit hypothesis and a clear exit rule
  • A backtesting engine with proper metrics (Sharpe, drawdown, win rate)
  • Visualization that separates signal from noise

The strategy we've built — SMA crossover on large-cap US stocks — is simple. It's not going to make you rich. But it will teach you the fundamentals: how data quality affects results, how position sizing affects drawdown, how sample size affects statistical confidence.

That's the foundation. Everything else — statistical arbitrage, pairs trading, machine learning — builds on it.

Next steps:

  1. Run the code: Set up your TickDB API key, run the data fetcher, execute the backtest
  2. Experiment: Try different SMA periods. Test different symbols. Add transaction costs.
  3. Read the docs: Explore TickDB's real-time WebSocket capabilities for live trading
  4. Join the community: Share your results, ask questions, learn from others

The quant journey is long. But every system you've ever admired started exactly where you are right now.


Next Steps

If you want to run this strategy yourself:

  1. Sign up at tickdb.ai (free, no credit card required)
  2. Generate an API key in the dashboard
  3. Set the TICKDB_API_KEY environment variable
  4. Copy-paste the code from this article and run it

If you want 10+ years of historical OHLCV data for more robust backtesting, reach out to enterprise@tickdb.ai for institutional plans with extended history and higher rate limits.

If you use AI coding assistants, search for and install the tickdb-market-data SKILL in your AI tool's marketplace to get TickDB API integration directly in your coding workflow.


This article does not constitute investment advice. Markets involve risk; past performance does not guarantee future results. Backtested strategies may perform differently in live trading due to slippage, latency, liquidity, and other factors not captured in historical simulation.