Price is the effect. The order book is the cause. But the timestamp? The timestamp is the silent arbiter of everything.
At 9:30:00.123 AM ET on a typical trading day, approximately 4,200 US equity trades execute across all listed securities. By 9:30:01, that number crosses 50,000. Each trade carries a timestamp — but whose clock generated it, and which of those timestamps actually belong in the first 30-second candle?
This is not a philosophical question. It is the reason your backtest generates a Sharpe of 1.87 while your live account bleeds for six weeks before you discover the discrepancy. The aggregation rule that turns a stream of individual trades into a candlestick is not standardized across data vendors. What one vendor calls the 9:30 candle, another calls the 9:30:01 candle. And if you are building your own K-lines from raw tick data — congratulations — you have inherited all the edge cases without any of the institutional knowledge about how they were resolved.
This article dissects the three structural sources of K-line divergence: wall-clock ambiguity, SIP timestamp filtering, and vendor-specific aggregation conventions. It provides production-grade code for aligning tick streams to official candles, and it quantifies the backtesting bias introduced when alignment is done incorrectly.
1. The Aggregation Problem: What Is a 1-Minute Candle, Exactly?
A candlestick (OHLCV bar) represents the price action within a defined time boundary. For a 1-minute bar, that boundary is nominally 60 seconds. But US equity markets do not operate on a clean 60-second grid.
The market open is not a timestamp. It is a sequence of events.
At 9:30:00.000 AM ET, the opening auction concludes for most US exchange-listed securities. The SIP (Securities Information Processor) disseminates the official opening print. But individual prints continue arriving: exchange-specific trades, plus off-exchange trades from dark pools and other venues that are reported through FINRA facilities. Their timestamps may fall slightly before or after the official open, depending on the venue and the clock synchronization of the reporting entity.
When a quant researcher pulls a batch of trades and aggregates them into 1-minute candles using a naive windowing function, the resulting open, high, low, close, and volume can diverge significantly from the official SIP-consolidated candle. The divergence is not random noise. It is a structural artifact of how time boundaries are defined and which trades are included.
1.1 The Three Alignment Schools
Every data vendor and every self-built aggregation system implicitly commits to one of three time-alignment philosophies:
| Alignment type | Definition | Common in |
|---|---|---|
| Wall-clock alignment | Candle boundaries fall on exact clock time (e.g., 9:30:00.000, 9:31:00.000) | Most third-party APIs, streaming platforms |
| Trading-session alignment | Candle boundaries are offset to align with the trading session's official start (e.g., 9:30:00.000 ET, but accounting for early Rundown prints) | Bloomberg, Refinitiv |
| Trade-print alignment | Candle boundaries are anchored to the first and last trade print within the session, regardless of clock time | Custom quant systems, proprietary feeds |
The problem: most quant frameworks assume wall-clock alignment because it is computationally convenient. But OHLCV bars built from the SIP's consolidated tape, the canonical source for US equity pricing, follow a hybrid model: prints are filtered and consolidated first, and session-based alignment is applied afterwards.
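To make the difference between the first two schools concrete, here is a minimal sketch that computes the start of the bucket a timestamp falls into under each rule. The function names, the fixed UTC-5 offset, and the sample date are assumptions made for illustration; they are not taken from any vendor's implementation.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC-5 offset used as a stand-in for Eastern time (assumption: no DST handling).
EST = timezone(timedelta(hours=-5))


def wall_clock_bucket_start(ts: datetime, interval: timedelta) -> datetime:
    """Floor ts onto a grid anchored at local midnight."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + ((ts - midnight) // interval) * interval


def session_bucket_start(ts: datetime, session_open: datetime,
                         interval: timedelta) -> datetime:
    """Floor ts onto a grid anchored at the session open."""
    return session_open + ((ts - session_open) // interval) * interval


session_open = datetime(2024, 1, 2, 9, 30, tzinfo=EST)   # hypothetical trading day
ts = datetime(2024, 1, 2, 10, 15, 30, tzinfo=EST)

hour = timedelta(hours=1)
minute = timedelta(minutes=1)

hourly_wall = wall_clock_bucket_start(ts, hour)               # bar starting 10:00
hourly_session = session_bucket_start(ts, session_open, hour)  # bar starting 09:30
minute_wall = wall_clock_bucket_start(ts, minute)
minute_session = session_bucket_start(ts, session_open, minute)
```

With a 9:30 open, the two grids coincide for 1-minute bars because 9:30 already sits on a minute boundary. They diverge for any interval that does not divide the 9.5-hour offset from midnight, such as hourly bars, so the choice of anchor matters most once you resample to coarser intervals.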
2. SIP Filtering: Which Trades Actually Count?
The Securities Information Processor is the consolidated tape infrastructure for US equities. Every exchange, ATS, and FINRA-reported venue submits trade prints to the SIP, which then disseminates a consolidated NBBO (National Best Bid and Offer) and trade tape. However, the SIP does not merely concatenate all incoming prints. It applies a set of correction, cancellation, and duplicate-elimination rules that can alter which trades land in the official record.
2.1 The SIP Trade-Type Taxonomy
Understanding which trade prints survive the SIP filter is essential for any researcher who downloads historical trades and attempts to reconstruct official candles.
| SIP code | Meaning | Included in OHLCV? | Impact on candle |
|---|---|---|---|
| 0 | Regular sale | Yes | Standard candle constituent |
| E | Exchange correction | Conditional | May retroactively modify prior print |
| C | Cancel | No | Never in candles; removes prior print |
| F | Flat round lot | Yes | Standard |
| G | Odd lot | Conditionally | Mixed; some vendors include, some exclude |
| K | Rule 127 / NYSE floor | Conditional | Vendor-dependent |
| Z | Aggregated off-exchange | Conditional | FINRA ADF prints; inclusion varies by vendor |
| 4 | Derivatively priced | Conditionally | Options-adjusted prices; must be excluded for equity-only candles |
| 7 | Closing print | Yes | Included in close price; may be excluded from intraday candles depending on vendor |
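The practical implication: if you are downloading "trade" data from a vendor and aggregating candles yourself, you must know which condition codes are included in that feed and how the vendor labels them. The single-character codes in the table above are a simplified taxonomy; the official CTA and UTP sale-condition specifications assign several letters differently (for example, odd-lot trades carry condition I, and F marks an intermarket sweep), so map the categories to your vendor's actual codes before filtering. A vendor that includes odd-lot prints (G codes here) in its trade feed will generate different volume totals than one that excludes them. The candle high and low will differ if an odd-lot print fell outside the exchange-reported range.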
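A small sketch of that effect, using the article's simplified codes ("0" for regular sale, "G" for odd lot). The `Print` type, the `build_bar` helper, and the sample prices are hypothetical and exist only to illustrate the filter.

```python
from typing import Dict, Iterable, NamedTuple, Optional, Set


class Print(NamedTuple):
    price: float
    size: int
    code: str  # condition code in the article's simplified taxonomy


def build_bar(prints: Iterable[Print], included_codes: Set[str]) -> Optional[Dict[str, float]]:
    """Aggregate time-ordered prints into one OHLCV bar, keeping only allowed codes."""
    kept = [p for p in prints if p.code in included_codes]
    if not kept:
        return None
    return {
        "open": kept[0].price,
        "high": max(p.price for p in kept),
        "low": min(p.price for p in kept),
        "close": kept[-1].price,
        "volume": sum(p.size for p in kept),
    }


# Invented prints for a single minute; the 7-share odd lot trades above the round-lot range.
prints = [
    Print(100.00, 200, "0"),
    Print(100.10, 300, "0"),
    Print(100.45, 7, "G"),
    Print(100.05, 100, "0"),
]

bar_without_odd_lots = build_bar(prints, {"0"})
bar_with_odd_lots = build_bar(prints, {"0", "G"})
```

The same four prints yield a high of 100.10 and volume of 600 when odd lots are excluded, and a high of 100.45 and volume of 607 when they are included. Neither bar is wrong; they simply answer different questions, which is why the filter has to match the reference you compare against.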
2.2 The Late-Print Problem
Trade prints arrive at the SIP with timestamps. Some prints — particularly those from slower reporting venues — arrive after the candle in which they were executed has technically closed. The SIP applies a latency window (typically 3.8 milliseconds for the UTP tape, 1.5 milliseconds for the CTA/CQ tape) before closing a candle boundary. Prints arriving within this window are retroactively included.
If your tick data vendor stamps trades with receipt time rather than the exchange execution time, you may find prints executed at 9:30:00.002 that appear in the feed as 9:30:00.347, because that is when the venue submitted the report, not when the trade occurred. These late-arriving prints can land in the wrong candle bucket depending on whether your aggregation logic uses execution time or receipt time.
A concrete example:
Exchange A prints at 9:30:00.000 ET — executed at 9:29:59.998, reported at 9:30:00.201
Exchange B prints at 9:30:00.050 ET — executed at 9:30:00.050, reported at 9:30:00.212
Exchange C prints at 9:30:00.800 ET — executed at 9:30:00.800, reported at 9:30:01.100 (late)
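Wall-clock aggregator on the print timestamps (9:30:00.000 – 9:30:59.999): includes all three prints.
SIP-consolidated: Exchange A and B included in the 9:30 candle; Exchange C classified as a late print.
Custom tick aggregator using receipt timestamps: all three land in the 9:30 one-minute candle, but Exchange A is there only because its report arrived after the open, and at one-second granularity Exchange C slips into the 9:30:01 bucket.
Custom tick aggregator using execution timestamps: Exchange B and C fall in the 9:30 candle; Exchange A, executed at 9:29:59.998, belongs to the window before the open.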
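Bucketing the three prints by each timestamp field makes the dependence explicit. The sketch below is illustrative only: the calendar date is arbitrary, the field names are hypothetical, and the timestamps are naive datetimes read as Eastern time.

```python
from datetime import datetime, timedelta


def et(hour: int, minute: int, second: int, millis: int) -> datetime:
    """Naive datetime on an arbitrary date, interpreted as Eastern time."""
    return datetime(2024, 1, 2, hour, minute, second, millis * 1000)


# The three prints from the example, one entry per timestamp field.
prints = {
    "A": {"print": et(9, 30, 0, 0),   "executed": et(9, 29, 59, 998), "reported": et(9, 30, 0, 201)},
    "B": {"print": et(9, 30, 0, 50),  "executed": et(9, 30, 0, 50),   "reported": et(9, 30, 0, 212)},
    "C": {"print": et(9, 30, 0, 800), "executed": et(9, 30, 0, 800),  "reported": et(9, 30, 1, 100)},
}


def bucket_start(ts: datetime, interval: timedelta) -> datetime:
    """Floor a timestamp to the start of its bucket, counted from midnight."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + ((ts - midnight) // interval) * interval


def assign(field: str, interval: timedelta) -> dict:
    """Map each print to the start time of the bucket chosen by `field`."""
    return {name: bucket_start(p[field], interval).time().isoformat()
            for name, p in prints.items()}


by_execution_1m = assign("executed", timedelta(minutes=1))
by_receipt_1m = assign("reported", timedelta(minutes=1))
by_receipt_1s = assign("reported", timedelta(seconds=1))
```

By execution time, print A sits in the 09:29:00 bucket while B and C sit in 09:30:00. By receipt time all three share the 09:30:00 minute, and at one-second resolution C moves to 09:30:01. The finer the bar, the more a few hundred milliseconds of reporting delay changes the result.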
3. The Alignment Problem: Clock Sources and Their Divergence
3.1 Three Clocks in Every Trade Stream
Every trade print traverses a chain of three clock domains:
- Exchange timestamp: When the match occurred on the matching engine. Highly accurate, synchronized via GPS or atomic clocks.
- SIP dissemination timestamp: When the SIP received and processed the print. Subject to SIP processing latency (see above).
- Vendor delivery timestamp: When your data vendor's infrastructure recorded the receipt. Varies by vendor architecture — some use exchange timestamps, some use SIP timestamps, some use their own internal clock.
For building candles that match official charts, the exchange timestamp is the authoritative source. For latency-sensitive trading, the SIP dissemination timestamp is often more relevant. For debugging data delivery issues, the vendor timestamp is necessary.
Most retail-grade APIs — including many aggregators that wrap exchange feeds — expose only the vendor timestamp or a sanitized "event time" field that obscures the distinction. This creates a silent mismatch when researchers attempt to reconstruct candles from these feeds.
3.2 The US Market Session Offset
US equity trading sessions are defined by official exchange rules, not by absolute time. The regular session runs 9:30:00–16:00:00 ET. But the trading day includes four additional windows that complicate candle aggregation:
- Pre-market: 4:00–9:30:00 ET (exchange-specific, not all venues)
- Opening auction: 9:30:00 ET (consolidated by SIP; individual prints may carry earlier timestamps)
- Closing auction: 16:00:00 ET (VOL auction, significant for certain stocks)
- After-hours: 16:00:00–20:00:00 ET (exchange-specific)
If you are aggregating candles across the full trading day, you must decide whether the 9:30:00 candle boundary represents the session open (with late-opening-auction prints included) or the first post-auction trade print. This decision alone can shift the open price by several basis points for volatile names.
For backtesting purposes, the standard convention — and the one used by TickDB's kline endpoint — is trading-session alignment: candle N covers the time window from session_open + (N-1) * interval to session_open + N * interval - 1 microsecond. All trade prints with exchange timestamps within this window are included, with SIP consolidation applied.
4. Quantifying the Backtesting Bias
The divergence between self-aggregated and official candles is not merely an academic curiosity. It directly affects backtesting results in three measurable ways.
4.1 Price-Level Bias
The open and close of a candle are the most sensitive points to aggregation differences. A self-aggregated candle that includes a late print at the open will show a different open price than the official candle. Over 252 trading days, this can compound into a systematic return bias.
Illustrative scenario: A stock gaps up 2% at the open due to a news event. A wall-clock aggregator that includes pre-market prints in the 9:30 candle will show a wider range (higher high, higher low) and different VWAP than a session-aligned aggregator that separates pre-market from regular-session candles. A mean-reversion strategy that enters on VWAP crosses will generate different signals.
| Metric | Wall-clock aggregation | SIP session-aligned | Bias introduced |
|---|---|---|---|
| Open price (volatile stock) | ±3 bps vs. official | ±0.5 bps | Understated volatility in session-aligned |
| High price | +5 bps (includes late prints) | ±1 bp | Self-aggregated overstates intraday range |
| Volume | +8–15% (includes odd lots) | Varies by vendor filter | VWAP systematically biased |
| VWAP | ±2–4 bps | ±0.5 bps | Crossover signals fire at wrong prices |
4.2 Volume Bias
The inclusion or exclusion of odd-lot prints (G codes) and FINRA ADF prints (Z codes) is the single largest source of volume discrepancy. Odd lots — trades of fewer than 100 shares — account for approximately 15–20% of all trade prints by count but represent only 3–5% of total dollar volume. Most institutional feeds exclude them because they are considered noise.
However, for stocks trading below $1 or above $500 per share, odd-lot prints can create outsized intraday volume discrepancies. A self-aggregated candle that includes odd lots will show higher volume than the official candle, which will bias any strategy that uses volume as an input (e.g., volume-weighted mean reversion, on-balance volume).
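The effect on a volume-weighted input can be estimated directly. The following sketch computes VWAP with and without odd-lot prints and expresses the gap in basis points; the prices and sizes are invented for illustration and do not come from market data.

```python
from typing import List, Tuple

Fill = Tuple[float, int]  # (price, shares)


def vwap(fills: List[Fill]) -> float:
    """Volume-weighted average price of a list of fills."""
    notional = sum(price * shares for price, shares in fills)
    volume = sum(shares for _, shares in fills)
    return notional / volume


# Invented one-minute sample: round lots near 50.00, odd lots printing higher.
round_lots: List[Fill] = [(50.00, 400), (50.10, 600)]
odd_lots: List[Fill] = [(50.40, 30), (50.45, 20)]

vwap_official = vwap(round_lots)             # odd lots excluded
vwap_self = vwap(round_lots + odd_lots)      # odd lots included

vwap_bias_bps = (vwap_self - vwap_official) / vwap_official * 10_000
volume_uplift_pct = sum(s for _, s in odd_lots) / sum(s for _, s in round_lots) * 100
```

In this sample a 5% increase in volume from odd lots moves VWAP by roughly 3.4 basis points, which is enough to flip a crossover signal placed near the official VWAP. Real magnitudes depend on the symbol and on how far odd lots print from the round-lot range.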
4.3 Timing Bias
Perhaps the most insidious bias is the timing effect. If your self-aggregated candles use wall-clock alignment while your live trading system receives session-aligned ticks, your backtest signals will fire at different timestamps than your live system — even when using the same code, the same parameters, and the same stock.
A strategy that backtests with a Sharpe of 1.4 using wall-clock candles might generate a live Sharpe of 0.7 or lower because the actual signal timestamps differ by 100–500 milliseconds. In high-frequency event-driven strategies, this is the difference between a profitable signal and a filled-at-worse-price execution that wipes out the edge.
5. Production-Grade Aggregation: Matching Official Candles
The solution is not to avoid tick aggregation. It is to implement it correctly, with explicit alignment rules, SIP filtering, and a validation layer that compares your aggregated candles against a known-good source.
The following code implements a session-aligned candle aggregator that matches SIP conventions for US equities. It is written for clarity and correctness — not brevity.
5.1 Core Aggregation Engine
```python
import logging
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from threading import Lock
from typing import Any, Dict, List, Optional, Tuple

from dateutil import tz

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
)
logger = logging.getLogger("candle_aggregator")

# ⚠️ This implementation is for OHLCV aggregation from TickDB trades data.
# For US equities specifically, note that TickDB's trades endpoint covers
# HK and crypto markets; US equity candles come pre-aggregated from the
# /v1/market/kline endpoint. This code demonstrates the correct
# aggregation architecture for any market where tick-level trade data
# is available, and serves as a reference for understanding the alignment
# principles described in this article.


@dataclass
class Trade:
    """A single trade print with validated timestamp and SIP codes."""
    timestamp: datetime  # Exchange execution timestamp (timezone-aware)
    price: float
    volume: int
    side: str            # "buy" or "sell"
    exchange: str        # Venue code
    sip_code: str        # SIP trade condition code
    trade_id: str        # Unique identifier for dedup

    def is_included(self, included_codes: set) -> bool:
        """Check if this trade passes SIP inclusion filter."""
        return self.sip_code in included_codes

    def is_odd_lot(self) -> bool:
        """Odd lots are < 100 shares and often excluded from official candles."""
        return self.sip_code == "G"

    def is_closing_print(self) -> bool:
        """Closing prints may be excluded from intraday candles."""
        return self.sip_code == "7"


@dataclass
class Candle:
    """A single OHLCV bar with session-aligned boundaries."""
    open_time: datetime
    close_time: datetime
    open_price: float = 0.0
    high_price: float = 0.0
    low_price: float = float("inf")
    close_price: float = 0.0
    volume: int = 0
    trade_count: int = 0
    buy_volume: int = 0
    sell_volume: int = 0
    included_codes: set = field(default_factory=set)
    excluded_odd_lots: int = 0
    excluded_closing: int = 0

    def update(self, trade: Trade, include_odd_lots: bool = False,
               include_closing: bool = False) -> bool:
        """Apply SIP filter rules, then update the bar.

        Returns True if the trade was included, False if it was filtered out.
        """
        if trade.is_odd_lot() and not include_odd_lots:
            self.excluded_odd_lots += 1
            return False
        if trade.is_closing_print() and not include_closing:
            self.excluded_closing += 1
            return False

        # The first included trade sets the open. Checking trade_count is
        # more robust than comparing open_price against 0.0.
        if self.trade_count == 0:
            self.open_price = trade.price
        self.high_price = max(self.high_price, trade.price)
        self.low_price = min(self.low_price, trade.price)
        self.close_price = trade.price
        self.volume += trade.volume
        self.trade_count += 1

        # Track buy/sell pressure
        if trade.side == "buy":
            self.buy_volume += trade.volume
        elif trade.side == "sell":
            self.sell_volume += trade.volume
        self.included_codes.add(trade.sip_code)
        return True


class CandleAggregator:
    """
    Session-aligned OHLCV candle aggregator matching SIP conventions.

    Alignment rules:
    1. Candle boundaries are anchored to the trading session open, not wall clock.
    2. Trades are bucketed by exchange execution timestamp (not receipt time).
    3. SIP trade condition codes are filtered per configuration.
    4. Bucketing depends only on the execution timestamp, so a print that
       arrives late is still added to the candle in which it executed.
    """

    # Default SIP codes included in official OHLCV (CTA/CQ tape for listed equities)
    DEFAULT_INCLUDED_CODES = {"0", "E", "F", "K", "Z"}
    # Codes requiring special handling: odd lots, derivatively priced, closing prints
    CONDITIONAL_CODES = {"G", "4", "7"}

    def __init__(
        self,
        symbol: str,
        interval_seconds: int = 60,
        session_open: Optional[datetime] = None,
        session_close: Optional[datetime] = None,
        include_odd_lots: bool = False,
        include_closing_prints: bool = False,
        timezone_str: str = "America/New_York",
    ):
        self.symbol = symbol
        self.interval_seconds = interval_seconds
        self.include_odd_lots = include_odd_lots
        self.include_closing_prints = include_closing_prints
        # Timezone for session boundary calculations
        self.tz = tz.gettz(timezone_str)
        self.session_open = session_open
        self.session_close = session_close
        self.candles: Dict[int, Candle] = {}
        self._lock = Lock()
        self._stats = {
            "trades_processed": 0,
            "trades_included": 0,
            "trades_excluded": 0,
            "candles_generated": 0,
        }

    def _get_candle_key(self, timestamp: datetime) -> int:
        """
        Compute the candle bucket key for a given timestamp.

        CRITICAL: This uses SESSION-OPEN alignment, not wall-clock alignment.
        The candle covering 9:30:00–9:30:59.999 has key 0.
        The candle covering 9:31:00–9:31:59.999 has key 1.
        Returns -1 for trades that precede the session open.
        """
        if self.session_open is None:
            raise ValueError("session_open must be set before processing trades")
        delta = timestamp - self.session_open
        # Compare the timedelta itself: int() truncates toward zero, so a trade
        # 0.5 s before the open would otherwise be misfiled into bucket 0.
        if delta < timedelta(0):
            logger.warning(
                f"Trade at {timestamp} precedes session open {self.session_open}. "
                f"Excluding from aggregation."
            )
            return -1
        # Floor division of two timedeltas yields the integer bucket number
        return delta // timedelta(seconds=self.interval_seconds)

    def _get_candle_times(self, bucket: int) -> Tuple[datetime, datetime]:
        """Get the open and close time for a given candle bucket."""
        open_time = self.session_open + timedelta(seconds=bucket * self.interval_seconds)
        close_time = open_time + timedelta(seconds=self.interval_seconds)
        return open_time, close_time

    def add_trade(self, trade: Trade) -> None:
        """Thread-safe addition of a trade to the appropriate candle."""
        bucket = self._get_candle_key(trade.timestamp)
        # Odd lots and closing prints are governed by the include_* flags;
        # every other code must be in DEFAULT_INCLUDED_CODES (this drops
        # cancels and derivatively priced prints).
        flag_controlled = trade.is_odd_lot() or trade.is_closing_print()
        allowed = flag_controlled or trade.is_included(self.DEFAULT_INCLUDED_CODES)
        with self._lock:
            self._stats["trades_processed"] += 1
            if bucket < 0 or not allowed:
                self._stats["trades_excluded"] += 1
                return
            candle = self.candles.get(bucket)
            if candle is None:
                open_time, close_time = self._get_candle_times(bucket)
                candle = Candle(open_time=open_time, close_time=close_time)
                self.candles[bucket] = candle
                self._stats["candles_generated"] = len(self.candles)
            included = candle.update(
                trade,
                include_odd_lots=self.include_odd_lots,
                include_closing=self.include_closing_prints,
            )
            self._stats["trades_included" if included else "trades_excluded"] += 1

    def add_trades_batch(self, trades: List[Trade]) -> None:
        """Process a batch of trades in timestamp order.

        Open and close prices depend on processing order, so the batch is
        sorted here rather than trusting the caller.
        """
        for trade in sorted(trades, key=lambda t: t.timestamp):
            self.add_trade(trade)

    def get_candles(self) -> List[Candle]:
        """Return candles with at least one included trade, sorted by open time."""
        with self._lock:
            return [
                self.candles[k] for k in sorted(self.candles)
                if self.candles[k].trade_count > 0
            ]

    def get_stats(self) -> Dict[str, int]:
        """Return processing statistics."""
        with self._lock:
            return self._stats.copy()

    def validate_against_reference(
        self,
        reference_candles: List[Dict[str, Any]],
        tolerance_bps: float = 5.0,
    ) -> Dict[str, Any]:
        """
        Compare aggregated candles against a known-good reference source.

        Args:
            reference_candles: List of dicts with 'open_time', 'open', 'high',
                'low', 'close', 'volume'. 'open_time' must be a datetime
                comparable to Candle.open_time; convert epoch values first.
            tolerance_bps: Acceptable deviation in basis points

        Returns:
            Validation report with per-candle discrepancies
        """
        aggregated = self.get_candles()
        by_open_time = {c.open_time: c for c in aggregated}
        discrepancies = []
        for ref in reference_candles:
            ref_time = ref["open_time"]
            match = by_open_time.get(ref_time)
            if match is None:
                discrepancies.append({
                    "time": ref_time,
                    "status": "MISSING",
                    "message": "No aggregated candle found for reference timestamp",
                })
                continue

            # Compare price levels: "open_price" on the candle maps to "open"
            # in the reference dict, and so on.
            for field_name in ("open_price", "high_price", "low_price", "close_price"):
                ref_val = ref.get(field_name.replace("_price", ""))
                agg_val = getattr(match, field_name)
                if ref_val and agg_val:
                    diff_bps = abs(ref_val - agg_val) / ref_val * 10000
                    if diff_bps > tolerance_bps:
                        discrepancies.append({
                            "time": ref_time,
                            "status": "PRICE_MISMATCH",
                            "field": field_name,
                            "reference": ref_val,
                            "aggregated": agg_val,
                            "diff_bps": round(diff_bps, 2),
                        })

            # Compare volume
            ref_volume = ref.get("volume", 0)
            vol_diff_pct = abs(ref_volume - match.volume) / max(ref_volume, 1) * 100
            if vol_diff_pct > 10.0:  # Volume tolerance of 10%
                discrepancies.append({
                    "time": ref_time,
                    "status": "VOLUME_MISMATCH",
                    "reference_volume": ref_volume,
                    "aggregated_volume": match.volume,
                    "diff_pct": round(vol_diff_pct, 2),
                })

        # A reference candle counts as matched only if it has no discrepancy
        # of any kind (missing, price, or volume).
        mismatched_times = {d["time"] for d in discrepancies}
        total = len(reference_candles)
        return {
            "total_reference": total,
            "total_aggregated": len(aggregated),
            "discrepancies": discrepancies,
            "match_rate": round((total - len(mismatched_times)) / max(total, 1) * 100, 2),
        }
```
5.2 TickDB Integration with Session-Aligned Fetch
```python
import os
import time
from datetime import datetime, timezone, timedelta
from typing import Any, Dict, List, Optional

import requests

# ⚠️ This code demonstrates correct candle-building using TickDB's kline endpoint.
# TickDB provides pre-aggregated 1m/5m/1h/1d klines for US equities (cleaned and
# SIP-aligned). For backtesting, use /v1/market/kline with start/end timestamps.
# For live monitoring, use /v1/market/kline/latest, not /v1/market/kline.
# See TickDB Core Knowledge Base: Endpoint Usage Guide.


class RateLimitError(Exception):
    """Raised when TickDB rate limit (code 3001) is encountered."""


class AuthError(Exception):
    """Raised on authentication failure (code 1001/1002)."""


class NotFoundError(Exception):
    """Raised when symbol or endpoint not found (code 2002)."""


class APIError(Exception):
    """Raised on any other TickDB API error."""


class TickDBKlineClient:
    """
    Production-grade TickDB kline client with session alignment and error handling.

    Key behaviors:
    - Loads API key from TICKDB_API_KEY environment variable
    - Handles rate limits (code 3001) with Retry-After
    - Validates timestamps before constructing query params
    - Separates historical backtest fetches from live latest-candle fetches
    """

    BASE_URL = "https://api.tickdb.ai/v1"

    def __init__(self, api_key: Optional[str] = None):
        self.api_key = api_key or os.environ.get("TICKDB_API_KEY")
        if not self.api_key:
            raise ValueError(
                "TickDB API key not provided. Set TICKDB_API_KEY environment variable."
            )
        self.session = requests.Session()
        self.session.headers.update({"X-API-Key": self.api_key})
        self._rate_limit_until: Optional[datetime] = None

    def _wait_for_rate_limit(self) -> None:
        """Enforce rate limit cooldown before sending request."""
        if self._rate_limit_until:
            now = datetime.now(timezone.utc)
            if now < self._rate_limit_until:
                time.sleep((self._rate_limit_until - now).total_seconds())
            self._rate_limit_until = None

    def _raise_rate_limited(self, response: requests.Response) -> None:
        """Record the cooldown from Retry-After (default 5 s) and raise."""
        try:
            retry_after = int(response.headers.get("Retry-After", 5))
        except ValueError:
            retry_after = 5  # Header was not a plain number of seconds
        self._rate_limit_until = datetime.now(timezone.utc) + timedelta(seconds=retry_after)
        raise RateLimitError(f"Rate limited. Retry after {retry_after} seconds.")

    def _handle_response(self, response: requests.Response) -> Any:
        """Standard TickDB error handler with rate-limit awareness."""
        # Handle non-200 responses
        if response.status_code == 429:
            self._raise_rate_limited(response)
        if response.status_code == 401:
            raise AuthError("Invalid API key. Check TICKDB_API_KEY environment variable.")
        if response.status_code == 404:
            raise NotFoundError("Symbol or endpoint not found.")
        try:
            data = response.json()
        except ValueError:
            raise APIError(f"Invalid JSON response: {response.text[:200]}")

        # Handle TickDB internal error codes
        code = data.get("code", 0)
        if code == 0:
            return data.get("data", data)
        error_messages = {
            1001: "Invalid API key",
            1002: "Missing API key",
            2002: "Symbol not found; verify via /v1/symbols/available",
            3001: "Rate limit exceeded; use Retry-After header",
            5001: "Internal server error; retry with backoff",
        }
        msg = error_messages.get(code, data.get("message", "Unknown error"))
        # Map body-level codes onto the same exceptions as the HTTP statuses
        if code == 3001:
            self._raise_rate_limited(response)
        if code in (1001, 1002):
            raise AuthError(f"TickDB error {code}: {msg}")
        if code == 2002:
            raise NotFoundError(f"TickDB error {code}: {msg}")
        raise APIError(f"TickDB error {code}: {msg}")

    @staticmethod
    def _to_epoch_ms(ts: datetime) -> int:
        """Convert to epoch milliseconds; naive datetimes are treated as UTC."""
        if ts.tzinfo is None:
            ts = ts.replace(tzinfo=timezone.utc)
        return int(ts.timestamp() * 1000)

    def get_historical_klines(
        self,
        symbol: str,
        interval: str = "1m",
        start_time: Optional[datetime] = None,
        end_time: Optional[datetime] = None,
        limit: int = 1000,
        timeout: tuple = (3.05, 10),
    ) -> List[Dict[str, Any]]:
        """
        Fetch historical OHLCV klines for backtesting.

        CRITICAL: This endpoint is for COMPLETED periods only.
        For live candles, use get_latest_candle() instead.

        Args:
            symbol: Trading pair (e.g., "AAPL.US", "BTC.Binance")
            interval: Candle interval ("1m", "5m", "1h", "1d", "1w")
            start_time: Start of fetch window (UTC)
            end_time: End of fetch window (UTC)
            limit: Max candles per request (max 1000)
            timeout: (connect_timeout, read_timeout)

        Returns:
            List of OHLCV dicts with keys: open_time, open, high, low, close, volume
        """
        params: Dict[str, Any] = {
            "symbol": symbol,
            "interval": interval,
            "limit": min(limit, 1000),
        }
        start_ms = self._to_epoch_ms(start_time) if start_time else None
        end_ms = self._to_epoch_ms(end_time) if end_time else None
        # Validate time order before building the query
        if start_ms is not None and end_ms is not None and start_ms >= end_ms:
            raise ValueError("start_time must be before end_time")
        if start_ms is not None:
            params["start_time"] = start_ms
        if end_ms is not None:
            params["end_time"] = end_ms

        self._wait_for_rate_limit()
        url = f"{self.BASE_URL}/market/kline"
        response = self.session.get(url, params=params, timeout=timeout)
        return self._handle_response(response)

    def get_latest_candle(
        self,
        symbol: str,
        interval: str = "1m",
        timeout: tuple = (3.05, 10),
    ) -> Dict[str, Any]:
        """
        Fetch the CURRENT (incomplete) candle for live monitoring, NOT backtesting.

        ⚠️ Do NOT use this endpoint for backtesting. Historical backtests
        must use get_historical_klines() with completed period boundaries.
        """
        self._wait_for_rate_limit()
        params = {"symbol": symbol, "interval": interval}
        url = f"{self.BASE_URL}/market/kline/latest"
        response = self.session.get(url, params=params, timeout=timeout)
        return self._handle_response(response)
```
5.3 Aggregation vs. Native Klines: When to Use Which
| Scenario | Recommended approach | Rationale |
|---|---|---|
| Strategy backtesting | TickDB /v1/market/kline | Pre-cleaned, SIP-aligned, validated across 10+ years |
| Custom indicator (e.g., VWAP from tick data) | Custom aggregation with session alignment | Tick-level data needed; must apply same alignment rules |
| Live signal monitoring | TickDB /v1/market/kline/latest + WebSocket | Official candle close + real-time tick stream |
| Comparing vendor data quality | Custom aggregator vs. TickDB kline validation | Benchmark against the known-good source |
| Research into odd-lot behavior | Custom aggregation with odd-lots included | Not available in official candles; requires raw tick data |
6. Cross-Vendor Alignment Comparison
Different vendors apply different conventions to the aggregation problem. The table below summarizes the key alignment decisions for major US equity data sources.
| Vendor | Time alignment | SIP codes filtered | Odd lots | Late prints | US equity coverage |
|---|---|---|---|---|---|
| TickDB | Session-aligned (exchange timestamps) | Standard CTA/CQ | Excluded | Classified to correct bucket | OHLCV: Yes. Trades: No (HK/crypto only) |
| Polygon | Wall-clock by default; session-optional | Configurable | Optional | Included in original bucket | Full coverage |
| Alpaca | Wall-clock | Standard | Excluded | May appear in wrong bucket | US equities only |
| Interactive Brokers | Exchange-reported | Per-exchange rules | Excluded | Latency-dependent | Full coverage |
| Custom (DIY from exchange feed) | User-defined | User-defined | User-defined | User-defined | Limited by feed license |
Critical reminder: TickDB's trades endpoint does not cover US equities. For US equity tick data, use the kline endpoint for OHLCV aggregation. For HK equity and crypto markets, the trades endpoint provides tick-level data suitable for custom aggregation.
7. Practical Validation Framework
Before deploying any backtest that relies on self-aggregated candles, run the following three-step validation:
Step 1: Download a reference candle set
Fetch 20–50 candles from TickDB's /v1/market/kline for the same symbol, interval, and time range. These are your ground truth.
Step 2: Aggregate your tick data using your current logic
Run your existing aggregation code on the same time range.
Step 3: Compare and diagnose
```python
def run_alignment_diagnostics(
    symbol: str,
    start: datetime,
    end: datetime,
    trades: List[Trade],
    session_open: datetime,
    interval: str = "1m",
    tolerance_bps: float = 5.0,
) -> Dict[str, Any]:
    """
    End-to-end diagnostic: compare TickDB klines against self-aggregated candles.

    `trades` must come from your own tick source and cover the same
    symbol and time range as the reference fetch.

    Run this before any backtest deployment. If discrepancies exceed
    tolerance, either fix your aggregation logic or use TickDB klines directly.
    """
    client = TickDBKlineClient()

    # Step 1: Fetch reference candles
    reference = client.get_historical_klines(
        symbol=symbol,
        interval=interval,
        start_time=start,
        end_time=end,
    )
    logger.info(f"Fetched {len(reference)} reference candles from TickDB")

    # Step 2: Aggregate the caller-supplied trades with session alignment.
    # Set interval_seconds on the aggregator to match `interval` if it is not 1m.
    aggregator = CandleAggregator(symbol=symbol, session_open=session_open)
    aggregator.add_trades_batch(trades)

    # Step 3: Validate. Reference 'open_time' values must be datetimes
    # comparable to the aggregator's candle open times.
    validation = aggregator.validate_against_reference(reference, tolerance_bps)

    # Report
    logger.info(
        f"Alignment validation: {validation['match_rate']}% match rate. "
        f"{len(validation['discrepancies'])} discrepancies found."
    )
    if validation["discrepancies"]:
        for d in validation["discrepancies"][:5]:  # Show first 5
            logger.warning(f"  {d}")
    return validation
```
8. Key Takeaways
The candlestick is not a neutral container. The rules that define its boundaries — which clock, which trades, which aggregation convention — are engineering decisions that directly determine whether your backtest reflects reality.
Three principles for avoiding the aggregation trap:
Align to the session, not the wall clock. Candle boundaries anchored to the trading session open (9:30:00 ET for US equities) produce candles that match official charts. Wall-clock alignment is a convenience that introduces systematic timestamp drift.
Validate against a known-good source. Before trusting self-aggregated candles in a backtest, compare them against TickDB's SIP-aligned klines. Discrepancies above 5 basis points in price or 10% in volume indicate a structural problem in your aggregation logic.
For US equity OHLCV, use the canonical source. TickDB's /v1/market/kline endpoint provides 10+ years of cleaned, session-aligned OHLCV data for US equities. The trades endpoint is available for HK equities and crypto markets where tick-level analysis is required; it does not cover US equities. Build candles from raw US equity ticks, sourced from another feed, only if you are building custom indicators that cannot be derived from OHLCV, and apply the aggregation rules described in this article if you do.
The timestamp on your data is not metadata. It is the architecture.
This article does not constitute investment advice. Markets involve risk; past performance does not guarantee future results.