The Moment Your Backtest Lies to You
You spend three weeks building a mean-reversion strategy. Every overnight backtest passes with flying colors — Sharpe of 1.8, max drawdown under 6%. You wire it to live data on a Monday morning. By Thursday, it's down 4%.
The strategy didn't break. The data broke your expectations.
The culprit hides in a place most quant developers never think to check: how their tick data gets aggregated into minute bars. Two seemingly identical streams — one from your broker, one from a third-party vendor — produce K-lines that diverge by as much as 3.2% on high-volatility days. This isn't a data quality bug. It's a fundamental disagreement about what a "minute" means in financial markets.
This article dissects the three architectural decisions that make minute K-line aggregation a source of systematic backtest bias: wall-clock versus trade-time alignment, SIP trade filtering rules, and venue-specific consolidation policies.
The Alignment Problem: Two Paradigms for One Minute
Every financial data vendor aggregates ticks into OHLCV candles. Not all of them use the same rule for where one minute ends and the next begins.
Wall-Clock Alignment (UTC-anchored)
The first paradigm treats time as absolute. A minute bar opens at 09:30:00.000 ET and closes at 09:30:59.999 ET (a fixed UTC-5 offset is wrong for the part of the year when daylight saving time is in effect). Every timestamp gets bucketed by its wall-clock reading, regardless of when the market opened or closed.
from datetime import datetime, timezone

def wall_clock_align(timestamp_ms: int) -> datetime:
    """Align tick to minute boundary using absolute wall clock."""
    utc_dt = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)
    # Floor to the nearest minute
    return utc_dt.replace(second=0, microsecond=0)
This approach is simple, deterministic, and timezone-aware. It works well for 24-hour markets like crypto and forex. For US equities, it creates a subtle but persistent bias at minute boundaries. Trade reports reach you with latency, often 100–400 ms, so an aggregator that buckets by receipt time rather than by the exchange timestamp pushes prints from the last few hundred milliseconds of one minute into the next bar. The 09:30 bar is the most exposed: the NYSE opening auction print arrives some time after 09:30:00.000, and on a lagging feed trades from other venues can reach you first and set the bar's open.
Trade-Time Alignment (Market-session anchored)
The second paradigm anchors to the market's open and close. A minute is measured relative to the session start: the first 60 seconds of trading = bar 1, the next 60 seconds = bar 2, and so on.
def trade_time_align(timestamp_ms: int, session_open_ms: int) -> int:
    """Align tick to minute boundary relative to session open."""
    elapsed_ms = timestamp_ms - session_open_ms
    minute_index = elapsed_ms // 60_000  # integer division
    return session_open_ms + (minute_index * 60_000)
On a normal trading day with an on-the-minute open, both paradigms produce identical bars. They diverge when the session departs from the regular schedule. On days with an early close, such as the day after Thanksgiving, when US equity markets close at 1:00 PM ET, wall-clock alignment keeps bucketing any prints after 1:00 PM ET onto the regular minute grid as if the session were still open, while trade-time alignment terminates the last session bar at the scheduled close.
The divergence compounds when you backtest across historical data that includes shortened sessions, trading halts, or index rebalancing events. A strategy that looks profitable on wall-clock-aligned data may simply be capturing session-edge artifacts that don't survive in live trading.
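To make the divergence concrete, consider a session segment that does not start on a minute boundary, such as a reopen after a trading halt. The following is a minimal sketch under stated assumptions: the reopen time is invented for illustration, the helper names are hypothetical, and re-anchoring trade-time bars at the reopen is one possible policy, not a documented vendor rule.

```python
from datetime import datetime, timezone

MINUTE_MS = 60_000

def wall_clock_bucket(ts_ms: int) -> int:
    """Floor a millisecond epoch timestamp to its wall-clock minute."""
    return ts_ms - (ts_ms % MINUTE_MS)

def trade_time_bucket(ts_ms: int, session_open_ms: int) -> int:
    """Floor a timestamp to a 60-second bucket measured from session open."""
    elapsed = ts_ms - session_open_ms
    return session_open_ms + (elapsed // MINUTE_MS) * MINUTE_MS

def to_ms(dt: datetime) -> int:
    """Convert a timezone-aware datetime to epoch milliseconds."""
    return int(dt.timestamp() * 1000)

# Hypothetical reopen after a halt at 15:17:23 UTC (not on a minute boundary)
reopen_ms = to_ms(datetime(2024, 3, 15, 15, 17, 23, tzinfo=timezone.utc))
# A trade 50 seconds after the reopen, at 15:18:13 UTC
tick_ms = reopen_ms + 50_000

wall_key = wall_clock_bucket(tick_ms)              # bar starting 15:18:00
trade_key = trade_time_bucket(tick_ms, reopen_ms)  # bar starting 15:17:23
```

The same trade lands in bars whose start times differ by 37 seconds, so the two schemes assign it a different open, high, low, and close context even though the raw tick is identical.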
SIP Filtering: The Reason Your Ticks Disappear
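The Securities Information Processors (SIPs) are the consolidated data feeds for US equities: the CTA plan covers Tape A and Tape B securities, and the UTP plan covers Tape C (Nasdaq-listed) securities. They receive trade prints from all US equity exchanges and consolidate them into a single stream that carries sale-condition codes along with cancel and correction messages. Every major US equity data vendor either consumes SIP data directly or derives from it. The rest of this article refers to the feeds collectively as "SIP."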
What many quant developers don't realize is that SIP applies a sophisticated filtering pipeline that can retroactively modify, cancel, or classify trades. This pipeline has three critical operations that affect your aggregation.
Correction and Cancellation
A trade can be canceled or corrected after execution, most commonly when an exchange rules it clearly erroneous or the counterparties agree to break it. This is unrelated to Regulation SHO: a failure to deliver at settlement does not remove a print from the tape. SIP publishes a cancel or correction message that references the original trade's sequence number, instructing downstream consumers to remove or amend the print in their records.
Without handling this, your aggregated K-lines include phantom prints from canceled trades:
import os
import requests

# Assumption: the API key is supplied via an environment variable
API_KEY = os.environ.get("API_KEY", "")

# Placeholder cancel/break flags. Their meaning is vendor-specific: in the
# CTA/UTP sale-condition tables, for example, 'T' marks an extended-hours
# (Form T) trade and '4' a derivatively priced trade, not a cancel.
BROKEN_TRADE_CODES = ("T", "4")


def fetch_trades_with_breaks(symbol: str, session: str = "NYSE",
                             date: str = "2024-03-15"):
    """
    Fetch trades and filter out broken prints.
    Note: not all vendors expose break flags; check before using.
    """
    response = requests.get(
        f"https://api.example.com/v1/trades/{symbol}",
        params={
            "date": date,
            "session": session,  # 'NYSE' or 'NASDAQ' or 'ALL'
        },
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=(3.05, 10),
    )
    response.raise_for_status()  # surface HTTP errors instead of parsing them
    trades = response.json().get("results", [])
    # Drop trades flagged as broken or canceled
    clean_trades = [
        t for t in trades
        if t.get("condition_code") not in BROKEN_TRADE_CODES
    ]
    return clean_trades

# ⚠️ Engineering warning: some vendors bundle broken trades into
# the standard trades endpoint and require a separate 'breaks' query.
# Verify your vendor's schema before aggregating.
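When cancels arrive as separate messages that reference the original trade's sequence number, as described above, a flag filter is not enough. The sketch below shows one way to reconcile them; the message shape (`type`, `seq`, `ref_seq`) is hypothetical and will differ by vendor.

```python
from typing import Dict, Iterable, List

def apply_cancels(messages: Iterable[Dict]) -> List[Dict]:
    """Return trades that were not later canceled, ordered by sequence number.

    Assumed (hypothetical) message shapes:
      {"type": "trade",  "seq": 101, "price": 10.0, "size": 100}
      {"type": "cancel", "ref_seq": 101}
    """
    trades: Dict[int, Dict] = {}
    canceled = set()
    for msg in messages:
        if msg["type"] == "trade":
            trades[msg["seq"]] = msg
        elif msg["type"] == "cancel":
            # Record the reference; the cancel may arrive before or after
            # the trade it refers to on a replayed feed
            canceled.add(msg["ref_seq"])
    return [t for seq, t in sorted(trades.items()) if seq not in canceled]

feed = [
    {"type": "trade", "seq": 1, "price": 10.00, "size": 100},
    {"type": "trade", "seq": 2, "price": 12.50, "size": 200},  # later canceled
    {"type": "cancel", "ref_seq": 2},
    {"type": "trade", "seq": 3, "price": 10.05, "size": 100},
]
clean = apply_cancels(feed)  # keeps seq 1 and seq 3
```

Because a cancel can land long after the minute has closed, bars built in real time need to be recomputed (or patched) once the cancel arrives. A backtest on end-of-day data never sees the pre-cancel bar.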
Adjusted versus Unadjusted Prices
Stock prices are affected by splits, dividends, and spin-offs. SIP publishes only raw (unadjusted) market prints; the adjusted series, which restates history to reflect corporate actions, is computed by data vendors. K-line aggregation software that uses unadjusted data will show artificial price jumps across the effective date of any corporate action unless it explicitly applies a cumulative adjustment factor.
def adjusted_close(raw_close: float, adjustment_factor: float) -> float:
    """Apply cumulative split/dividend adjustment factor."""
    return raw_close * adjustment_factor
# ⚠️ Warning: adjustment factors compound across multiple corporate
# actions. Always use the vendor-provided adjusted series rather than
# rolling your own adjustment — the math is more complex than a single multiplier.
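To show why the factor is cumulative, the following sketch computes a split-only back-adjustment factor. It is deliberately incomplete: dividends and spin-offs are ignored, the split schedule is illustrative, and the function name is hypothetical. It is meant for sanity-checking a vendor's adjusted series, not for replacing it.

```python
from datetime import date
from typing import List, Tuple

def cumulative_split_factor(bar_date: date,
                            splits: List[Tuple[date, float]]) -> float:
    """Factor that restates a raw price on bar_date in post-split terms.

    splits holds (effective_date, ratio) pairs, where ratio=10.0 means a
    10-for-1 split. Every split that takes effect after bar_date divides
    the historical price by its ratio.
    """
    factor = 1.0
    for effective, ratio in splits:
        if bar_date < effective:
            factor /= ratio
    return factor

# Illustrative schedule: a 4-for-1 split followed by a 10-for-1 split
splits = [(date(2021, 7, 20), 4.0), (date(2024, 6, 10), 10.0)]

before_both = cumulative_split_factor(date(2020, 1, 2), splits)   # 1/40
between = cumulative_split_factor(date(2022, 1, 3), splits)       # 1/10
after_both = cumulative_split_factor(date(2024, 7, 1), splits)    # 1.0
```

Volume moves the other way: if prices before a split are divided by the ratio, share counts for the same bars are multiplied by it, so price times volume stays comparable across the split date.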
Volume Attribution
SIP tags each trade with sale-condition codes, and those codes determine how a print counts toward a bar. Odd-lot trades (fewer than 100 shares) have been reported to SIP since December 2013, but they are flagged as odd lots and do not update the consolidated last sale or high/low. Vendors differ in how they treat them: some drop odd-lot prints from their bars entirely, while others include them in volume, in OHLC, or both. This creates a systematic difference: a K-line from a vendor that includes odd-lot data will show higher volume than one from a vendor that filters it out, particularly in retail-dominated names like meme stocks or popular ETFs.
| Data source | Odd-lot included | Adjustment policy | SIP condition handling |
|---|---|---|---|
| SIP consolidated | Yes (flagged; not eligible for last sale or high/low) | Unadjusted only | Full |
| Exchange direct feed | Yes (varies) | User responsibility | Partial |
| Vendor X | No | Adjusted only | Full |
| Vendor Y | Yes (add-on) | Both available | Full |
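The size of the odd-lot effect is easy to measure once the policy is an explicit switch. This is a minimal sketch, assuming a 100-share round lot and a hypothetical `size` field on each trade; the sample trades are invented.

```python
from typing import Dict, List

ROUND_LOT = 100  # assumption: 100-share round lot

def bar_volume(trades: List[Dict], include_odd_lots: bool) -> int:
    """Sum trade sizes for one bar under a given odd-lot policy."""
    total = 0
    for trade in trades:
        if trade["size"] < ROUND_LOT and not include_odd_lots:
            continue  # policy: drop odd-lot prints from volume
        total += trade["size"]
    return total

# Invented trades for a single minute bar
trades = [{"size": 300}, {"size": 40}, {"size": 100}, {"size": 60}]

with_odd = bar_volume(trades, include_odd_lots=True)      # 500
without_odd = bar_volume(trades, include_odd_lots=False)  # 400
gap_pct = 100 * (with_odd - without_odd) / with_odd       # 20.0
```

Running both policies over the same raw ticks and comparing each result with the vendor's bar volume is a quick way to infer which policy the vendor applies.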
The Aggregation Discrepancy: A Worked Example
Let's quantify the divergence. Suppose you aggregate tick data into 1-minute bars for NVDA on a high-volatility earnings day. You fetch the tick stream from two sources: your broker's direct feed and a third-party vendor. Both claim to provide "core, filtered US equity data."
Here's what you observe in the 09:30–09:31 bar (the opening auction bar):
| Metric | Your aggregator (raw ticks) | Third-party vendor | Divergence |
|---|---|---|---|
| Open | $142.50 | $142.55 | +$0.05 |
| High | $144.20 | $143.95 | −$0.25 |
| Low | $142.10 | $142.10 | 0 |
| Close | $143.80 | $143.82 | +$0.02 |
| Volume | 1,847,200 | 1,923,600 | +4.1% |
The vendor's bar has a higher open, a lower high, higher volume, and a different close. Why?
Wall-clock versus trade-time alignment. Your aggregator buckets by wall clock and takes the first tick it sees as the open. The opening auction print does not reach you at exactly 09:30:00.000 ET: it reports with some delay, and on a feed with 300 ms of latency trades from other venues can arrive first. A vendor that anchors the bar on the auction print therefore records a different open. The same latency pushes prints from the last few hundred milliseconds of the minute into the next bar if you key on receipt time, which shifts the close.
SIP condition filtering. The vendor strips trades whose condition codes mark them as "extended hours only," "odd-lot," or "Trade-as-determined" (TAND). These are valid prints, but they are not eligible to set the consolidated high or low. A raw-tick aggregator that ignores condition codes lets them set the bar's high, which is why your high is above the vendor's.
Volume inclusion. The vendor's volume count includes "PRINT" trades, the single-price cross trades executed at the open or close. These can add 50,000–200,000 shares to the first or last bar of a heavily traded name.
The combined effect: if you backtest on the vendor's bars but trade live on your own aggregator, the backtest bar has a lower high and higher volume than the bar your live system will observe. If your entry signal fires when the high exceeds the prior bar's high by 1.5%, the live system will fire on breakouts that never appeared in the backtest, because the backtest bar's high is systematically underestimated.
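A quick calculation shows how a $0.25 difference in the high can flip a signal. The sketch assumes the backtest runs on the vendor's bars and the live system runs on the raw-tick aggregator; the prior bar's high is a hypothetical value chosen for illustration, while the two highs come from the table above.

```python
BREAKOUT_PCT = 0.015  # entry rule from the text: high exceeds prior high by 1.5%

def breakout(prior_high: float, bar_high: float,
             pct: float = BREAKOUT_PCT) -> bool:
    """True when the bar's high clears the prior bar's high by pct."""
    return bar_high > prior_high * (1 + pct)

prior_high = 142.00     # hypothetical; not taken from the table
raw_tick_high = 144.20  # "Your aggregator" column
vendor_high = 143.95    # "Third-party vendor" column

# The breakout threshold is 142.00 * 1.015 = 144.13
live_signal = breakout(prior_high, raw_tick_high)    # True
backtest_signal = breakout(prior_high, vendor_high)  # False
```

The two systems disagree on whether to enter at all, which is a larger error than a few cents of slippage on a trade both systems agree on.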
Code: Building a Consistent Aggregation Pipeline
The solution is not to pick one vendor and hope. It's to build an aggregation pipeline that treats alignment rules as explicit configuration, handles SIP condition codes, and applies adjustment factors consistently.
import os
import time
import logging
import requests
from collections import defaultdict
from typing import Dict, List, Optional

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Configuration: these must match your backtesting assumptions
ALIGNMENT_MODE = os.environ.get("KLINE_ALIGN_MODE", "wall_clock")
# Options: "wall_clock", "trade_time"
ADJUSTED_PRICES = os.environ.get("KLINE_ADJUSTED", "true").lower() == "true"
# Condition codes to exclude from aggregation. Placeholder values: code
# meanings are vendor-specific, so map them from your vendor's documentation.
EXCLUDED_CONDITIONS = {"T", "4", "Q", "O"}
MAX_RATE_LIMIT_RETRIES = 5  # give up after this many consecutive 429s


class MinuteAggregator:
    def __init__(self, symbol: str, alignment_mode: str = ALIGNMENT_MODE,
                 session_open_ms: Optional[int] = None):
        self.symbol = symbol
        self.alignment_mode = alignment_mode
        self.bars = defaultdict(lambda: {"open": None, "high": float("-inf"),
                                         "low": float("inf"), "close": None,
                                         "volume": 0})
        self.session_open_ms = session_open_ms

    def align_timestamp(self, ts_ms: int) -> int:
        """Align timestamp to minute boundary based on configured mode."""
        if self.alignment_mode == "trade_time":
            if self.session_open_ms is None:
                # First tick sets session open; assumes chronological feed
                self.session_open_ms = ts_ms
            # Use the raw timestamp: bars are measured from session open,
            # not from wall-clock minute boundaries
            minute_index = (ts_ms - self.session_open_ms) // 60_000
            return self.session_open_ms + minute_index * 60_000
        # Wall clock: floor to the start of the UTC minute
        return ts_ms - (ts_ms % 60_000)

    def add_tick(self, tick: Dict):
        """Add a single trade tick to the aggregation pipeline."""
        # Skip excluded condition codes
        condition = tick.get("condition_code", "")
        if condition in EXCLUDED_CONDITIONS:
            logger.debug(f"Skipping tick with excluded condition: {condition}")
            return
        ts_ms = int(tick["timestamp"])
        price = float(tick["price"])
        volume = int(tick["size"])
        if ADJUSTED_PRICES and "adjustment_factor" in tick:
            price *= tick["adjustment_factor"]
        bar_key = self.align_timestamp(ts_ms)
        bar = self.bars[bar_key]
        if bar["open"] is None:
            bar["open"] = price
        bar["high"] = max(bar["high"], price)
        bar["low"] = min(bar["low"], price)
        bar["close"] = price
        bar["volume"] += volume

    def get_bars(self) -> List[Dict]:
        """Return minute bars sorted by bar start time."""
        return [
            {"timestamp": ts, "open": v["open"], "high": v["high"],
             "low": v["low"], "close": v["close"], "volume": v["volume"]}
            for ts, v in sorted(self.bars.items())
        ]


def fetch_and_aggregate(symbol: str, date: str, api_key: str) -> List[Dict]:
    """
    Complete pipeline: fetch ticks page by page, aggregate into minute bars.
    Handles rate limits with a bounded number of retries.
    """
    aggregator = MinuteAggregator(symbol)
    page_token = None
    rate_limit_hits = 0
    while True:
        params = {"symbol": symbol, "date": date}
        if page_token:
            params["cursor"] = page_token
        response = requests.get(
            "https://api.example.com/v1/trades",
            headers={"Authorization": f"Bearer {api_key}"},
            params=params,
            timeout=(3.05, 10),
        )
        # Rate-limit handling (assumes Retry-After is given in seconds)
        if response.status_code == 429:
            rate_limit_hits += 1
            if rate_limit_hits > MAX_RATE_LIMIT_RETRIES:
                response.raise_for_status()
            retry_after = int(response.headers.get("Retry-After", 5))
            logger.warning(f"Rate limited; sleeping {retry_after}s")
            time.sleep(retry_after)
            continue
        rate_limit_hits = 0
        response.raise_for_status()
        data = response.json()
        for tick in data.get("results", []):
            aggregator.add_tick(tick)
        page_token = data.get("next_cursor")
        if not page_token:
            break
        # Small delay between pagination requests to avoid burst limits
        time.sleep(0.05)
    return aggregator.get_bars()

# ⚠️ Engineering notes:
# 1. Verify your vendor exposes SIP condition codes. If not, your
#    aggregator cannot distinguish broken trades from valid prints.
# 2. Trade-time alignment requires a chronological feed. If your vendor
#    delivers out-of-order ticks (common with replay endpoints), sort
#    before aggregating.
# 3. Session open detection (first tick sets open) works for continuous
#    sessions. For pre-market / after-hours, pass session_open_ms explicitly.
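Before comparing bars across sources, it helps to confirm that each set of bars is internally consistent. The following is a small sketch of such a check; it assumes bars shaped like the dictionaries returned by the aggregator above (`timestamp`, `open`, `high`, `low`, `close`, `volume`), and the function name is hypothetical.

```python
from typing import Dict, List

def validate_bars(bars: List[Dict]) -> List[str]:
    """Return a list of human-readable problems found in a bar series."""
    problems = []
    prev_ts = None
    for bar in bars:
        ts = bar["timestamp"]
        # Open and close must both lie inside the [low, high] range
        if not (bar["low"] <= bar["open"] <= bar["high"]):
            problems.append(f"{ts}: open outside low/high range")
        if not (bar["low"] <= bar["close"] <= bar["high"]):
            problems.append(f"{ts}: close outside low/high range")
        if bar["volume"] <= 0:
            problems.append(f"{ts}: non-positive volume")
        # Bars must be strictly increasing in time
        if prev_ts is not None and ts <= prev_ts:
            problems.append(f"{ts}: timestamp not increasing")
        prev_ts = ts
    return problems

good = [
    {"timestamp": 0, "open": 10.0, "high": 11.0, "low": 9.5,
     "close": 10.5, "volume": 100},
    {"timestamp": 60_000, "open": 10.5, "high": 10.8, "low": 10.2,
     "close": 10.3, "volume": 250},
]
bad = [
    {"timestamp": 0, "open": 12.0, "high": 11.0, "low": 9.5,
     "close": 10.5, "volume": 0},
]
good_problems = validate_bars(good)  # no problems
bad_problems = validate_bars(bad)    # open out of range, zero volume
```

A bar that fails these checks usually points to a merge or adjustment bug rather than an alignment difference, so it is worth ruling out first.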
Comparing Major Data Vendors: Where They Disagree
The aggregation discrepancy isn't hypothetical. Here's how five major US equity data sources differ on alignment, SIP filtering, price adjustment, and odd-lot handling:
| Vendor | Alignment default | SIP filter handling | Adjustment default | Odd-lot data |
|---|---|---|---|---|
| TickDB (reference) | Trade-time (session-anchored) | Full — condition codes exposed | Adjusted series default | Available via trades endpoint (HK/crypto; US: excluded) |
| Polygon | Wall-clock | Full | Unadjusted default; adjusted available | Excluded |
| Alpaca | Wall-clock | Full | Adjusted default | Excluded |
| Databento | Configurable (defaults to trade-time for US equities) | Full | Unadjusted default | Available as add-on |
| Interactive Brokers | Wall-clock | Partial (some condition codes not exposed via API) | Unadjusted only | Excluded |
The practical consequence: a backtest run against Polygon's data will produce different entry/exit signals than the same strategy against TickDB's data — even if both sources are derived from the same SIP feed. The difference lies in aggregation defaults, not raw data quality.
The Backtest Bias Spectrum
Not all strategies are equally sensitive to aggregation rules. Here's a rough taxonomy:
Low sensitivity: Strategies with holding periods of 30+ minutes and position sizing that doesn't depend on bar-by-bar volatility. A timing shift of a bar or two doesn't change the fundamental thesis.
Medium sensitivity: Intraday mean-reversion strategies that enter on bar-close crosses. A shift of even one bar in when "bar close" occurs changes the signal timing enough to affect execution quality.
High sensitivity: Opening-range-breakout strategies and first-bar reversion plays. The first bar of the session carries disproportionate information content. Wall-clock versus trade-time alignment in the 09:30 bar alone can flip a valid signal into a false signal or vice versa.
Mitigation Checklist: Before You Trust Your Backtest
Run through this checklist before treating your backtest results as actionable:
- Identify your alignment mode. Check whether your data vendor defaults to wall-clock or trade-time. Document it in your strategy metadata.
- Verify SIP condition handling. Can you distinguish broken trades from valid prints? If not, your volume figures are unreliable.
- Confirm adjustment policy. Are you using adjusted or unadjusted prices? Mixed usage across datasets is a silent killer.
- Check odd-lot inclusion. Does your vendor include odd-lot prints? In names like SPY or QQQ, odd-lot volume can represent 8–12% of total volume.
- Test on shortened sessions. Run your strategy through historical days with early closes (the day after Thanksgiving, Christmas Eve, July 3). Good Friday is a full market holiday, not an early close. If your alignment is wall-clock, you'll see phantom bars at the wrong timestamps.
- Validate against a reference. If possible, cross-check your aggregated bars against the vendor's native kline endpoint, which should already apply their standard aggregation rules.
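The last checklist item can be automated with a field-by-field comparison. This is a sketch under assumptions: both sources are already loaded as lists of bar dictionaries keyed by the same `timestamp` convention, the tolerances are arbitrary starting points, and the function name is hypothetical. The sample values are the two bars from the worked example earlier in the article.

```python
from typing import Dict, List

def diff_bars(mine: List[Dict], reference: List[Dict],
              price_tol: float = 0.01,
              volume_tol_pct: float = 1.0) -> List[Dict]:
    """List the fields where my bars differ from the reference bars."""
    ref_by_ts = {bar["timestamp"]: bar for bar in reference}
    mismatches = []
    for bar in mine:
        ts = bar["timestamp"]
        ref = ref_by_ts.get(ts)
        if ref is None:
            mismatches.append({"timestamp": ts, "field": "missing"})
            continue
        for field in ("open", "high", "low", "close"):
            if abs(bar[field] - ref[field]) > price_tol:
                mismatches.append({"timestamp": ts, "field": field,
                                   "mine": bar[field], "ref": ref[field]})
        if ref["volume"]:
            vol_pct = 100 * abs(bar["volume"] - ref["volume"]) / ref["volume"]
            if vol_pct > volume_tol_pct:
                mismatches.append({"timestamp": ts, "field": "volume",
                                   "mine": bar["volume"], "ref": ref["volume"]})
    return mismatches

# The 09:30 bars from the worked example (timestamp 0 is a placeholder key)
mine = [{"timestamp": 0, "open": 142.50, "high": 144.20, "low": 142.10,
         "close": 143.80, "volume": 1_847_200}]
vendor = [{"timestamp": 0, "open": 142.55, "high": 143.95, "low": 142.10,
           "close": 143.82, "volume": 1_923_600}]

mismatches = diff_bars(mine, vendor)
fields = [m["field"] for m in mismatches]  # open, high, close, volume
```

Mismatches clustered in the first and last bars of the session point to alignment or auction handling, while mismatches spread evenly through the day point to condition-code or odd-lot filtering.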
The Path Forward: Explicit Is Better Than Implicit
The root cause of aggregation-driven backtest bias is ambiguity. When a strategy silently assumes that "a minute is a minute" — without specifying alignment mode, SIP handling, or adjustment policy — it inherits the assumptions of whatever data pipeline happened to be available at the time.
The fix is to make these assumptions explicit in your strategy code:
# Strategy metadata — include this at the top of every strategy
STRATEGY_METADATA = {
    "alignment_mode": "trade_time",    # "wall_clock" or "trade_time"
    "sip_filter": "full",              # "full", "partial", or "none"
    "price_adjustment": "adjusted",    # "adjusted" or "unadjusted"
    "odd_lot_handling": "excluded",    # "included" or "excluded"
    "session_type": "regular_equity",  # "regular_equity", "pre_market", "after_hours"
}
This metadata travels with your backtest results, making it possible to reproduce the exact conditions in live trading. When your backtest says "Sharpe 1.8," the metadata tells you exactly which assumptions produced that number.
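Metadata is only useful if something checks it. One possible guard, sketched below, compares the metadata recorded with a backtest against the configuration of the live pipeline and reports every key that differs. The function name and the live configuration are hypothetical.

```python
from typing import Dict, List

def metadata_mismatches(backtest: Dict[str, str],
                        live: Dict[str, str]) -> List[str]:
    """Return the sorted keys whose values differ between two configs.

    A key present on only one side counts as a mismatch.
    """
    keys = sorted(set(backtest) | set(live))
    return [key for key in keys if backtest.get(key) != live.get(key)]

backtest_meta = {
    "alignment_mode": "trade_time",
    "sip_filter": "full",
    "price_adjustment": "adjusted",
    "odd_lot_handling": "excluded",
    "session_type": "regular_equity",
}
# Hypothetical live pipeline that silently uses wall-clock alignment
live_meta = dict(backtest_meta, alignment_mode="wall_clock")

differences = metadata_mismatches(backtest_meta, live_meta)
```

Refusing to start the live strategy while `differences` is non-empty turns a silent assumption into a deployment-time error.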
Next Steps
If you're debugging a backtest discrepancy right now, the first step is to compare your aggregated K-lines against your vendor's native kline endpoint output. If they diverge, you know the problem is in your aggregation logic — not in the raw data.
If you want clean, aligned historical data for backtesting, verify that your vendor provides both the raw tick stream and the pre-aggregated OHLCV bars, with explicit documentation of their aggregation rules. TickDB's /v1/market/kline endpoint provides 10+ years of cleaned, aligned US equity OHLCV data with full SIP condition handling — suitable for cross-cycle strategy validation without alignment surprises.
If you're building a multi-vendor data pipeline, the most robust architecture is to consume raw tick data (if available) and apply your own aggregation rules consistently — rather than relying on each vendor's defaults. This adds complexity but eliminates the silent bias that comes from implicit alignment assumptions.
If you use AI coding assistants, search for and install the tickdb-market-data SKILL in your AI tool's marketplace for one-click access to historical K-line data with explicit alignment documentation.
This article does not constitute investment advice. Markets involve risk; past performance does not guarantee future results.