Price is the effect. The order book is the cause. But the timestamp? The timestamp is the silent arbiter of everything.

At 9:30:00.123 AM ET on a typical trading day, approximately 4,200 US equity trades execute across all listed securities. By 9:30:01, that number crosses 50,000. Each trade carries a timestamp — but whose clock generated it, and which of those timestamps actually belong in the first 30-second candle?

This is not a philosophical question. It is the reason your backtest generates a Sharpe of 1.87 while your live account bleeds for six weeks before you discover the discrepancy. The aggregation rule that turns a stream of individual trades into a candlestick is not standardized across data vendors. What one vendor calls the 9:30 candle, another calls the 9:30:01 candle. And if you are building your own K-lines from raw tick data — congratulations — you have inherited all the edge cases without any of the institutional knowledge about how they were resolved.

This article dissects the three structural sources of K-line divergence: wall-clock ambiguity, SIP timestamp filtering, and vendor-specific aggregation conventions. It provides production-grade code for aligning tick streams to official candles, and it quantifies the backtesting bias introduced when alignment is done incorrectly.


1. The Aggregation Problem: What Is a 1-Minute Candle, Exactly?

A candlestick (OHLCV bar) represents the price action within a defined time boundary. For a 1-minute bar, that boundary is nominally 60 seconds. But US equity markets do not operate on a clean 60-second grid.

The market open is not a timestamp. It is a sequence of events.

At 9:30:00.000 AM ET, the opening auction concludes for most US exchange-listed securities. The SIP (Securities Information Processor) disseminates the official opening print. But individual prints (exchange-specific trades, plus off-exchange and dark-pool trades reported through FINRA) continue arriving with timestamps that may be slightly before or after the official open, depending on the venue and the clock synchronization of the reporting entity.

When a quant researcher pulls a batch of trades and aggregates them into 1-minute candles using a naive windowing function, the resulting open, high, low, close, and volume can diverge significantly from the official SIP-consolidated candle. The divergence is not random noise. It is a structural artifact of how time boundaries are defined and which trades are included.

1.1 The Three Alignment Schools

Every data vendor and every self-built aggregation system implicitly commits to one of three time-alignment philosophies:

| Alignment type | Definition | Common in |
| --- | --- | --- |
| Wall-clock alignment | Candle boundaries fall on exact clock time (e.g., 9:30:00.000, 9:31:00.000) | Most third-party APIs, streaming platforms |
| Trading-session alignment | Candle boundaries are offset to align with the trading session's official start (e.g., 9:30:00.000 ET, but accounting for early Rundown prints) | Bloomberg, Refinitiv |
| Trade-print alignment | Candle boundaries are anchored to the first and last trade print within the session, regardless of clock time | Custom quant systems, proprietary feeds |

The problem: most quant frameworks assume wall-clock alignment because it is computationally convenient. But official OHLCV data from the SIP — the canonical source for US equity pricing — uses a hybrid model that filters and consolidates prints before applying session-based alignment.
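The difference between the first two schools is easiest to see on an interval that does not divide evenly into the 9:30 open. The sketch below is illustrative only: the function names are hypothetical, and a fixed UTC-5 offset stands in for Eastern Time to keep the example free of timezone-database dependencies. For 1-minute bars the two grids coincide; for 1-hour bars they do not.

```python
from datetime import datetime, timedelta, timezone

# Simplifying assumption: fixed UTC-5 offset instead of a full tz database.
EST = timezone(timedelta(hours=-5))


def wall_clock_bar_start(ts: datetime, interval: timedelta) -> datetime:
    """Floor ts onto a grid anchored at local midnight (wall-clock alignment)."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + ((ts - midnight) // interval) * interval


def session_bar_start(ts: datetime, session_open: datetime,
                      interval: timedelta) -> datetime:
    """Floor ts onto a grid anchored at the session open (session alignment)."""
    return session_open + ((ts - session_open) // interval) * interval


session_open = datetime(2024, 3, 4, 9, 30, tzinfo=EST)
trade_time = datetime(2024, 3, 4, 10, 15, tzinfo=EST)
one_hour = timedelta(hours=1)

wall = wall_clock_bar_start(trade_time, one_hour)             # bar starting 10:00
sess = session_bar_start(trade_time, session_open, one_hour)  # bar starting 09:30

print("wall-clock bar:", wall.time(), "| session bar:", sess.time())
```

The same 10:15 trade belongs to the 10:00 bar under wall-clock alignment and to the 9:30 bar under session alignment, so the two conventions produce different OHLC values from identical ticks.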


2. SIP Filtering: Which Trades Actually Count?

The Securities Information Processor is the consolidated tape infrastructure for US equities. Every exchange, ATS, and FINRA-reported venue submits trade prints to the SIP, which then disseminates a consolidated NBBO (National Best Bid and Offer) and trade tape. However, the SIP does not merely concatenate all incoming prints. It applies a set of correction, cancellation, and duplicate-elimination rules that can alter which trades land in the official record.

2.1 The SIP Trade-Type Taxonomy

Understanding which trade prints survive the SIP filter is essential for any researcher who downloads historical trades and attempts to reconstruct official candles.

| SIP code | Meaning | Included in OHLCV? | Impact on candle |
| --- | --- | --- | --- |
| 0 | Regular sale | Yes | Standard candle constituent |
| E | Exchange correction | Conditional | May retroactively modify prior print |
| C | Cancel | No | Never in candles; removes prior print |
| F | Flat round lot | Yes | Standard |
| G | Odd lot | Conditional | Mixed: some vendors include, some exclude |
| K | Rule 127 / NYSE floor | Conditional | Vendor-dependent |
| Z | Aggregated off-exchange | Conditional | FINRA ADF prints; inclusion varies by vendor |
| 4 | Derivatively priced | Conditional | Options-adjusted prices; must be excluded for equity-only candles |
| 7 | Closing print | Yes | Included in close price; may be excluded from intraday candles depending on vendor |

The practical implication: If you are downloading "trade" data from a vendor and aggregating candles yourself, you must know which SIP codes are included in that feed. A vendor that includes odd-lot prints (G codes) in its trade feed will generate different volume totals than one that excludes them. The candle high and low will differ if an odd-lot print fell outside the exchange-reported range.
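A small sketch makes the effect concrete. The prints below are invented for illustration, the `Print` type and `summarize` helper are hypothetical, and the codes follow the table in this article rather than any particular vendor's feed specification.

```python
from typing import Iterable, NamedTuple, Set


class Print(NamedTuple):
    price: float
    size: int
    code: str  # trade condition code, as labeled in the table above


# Hypothetical prints inside one candle window.
prints = [
    Print(100.00, 200, "0"),
    Print(100.10, 300, "0"),
    Print(100.45, 7, "G"),   # odd lot printed outside the round-lot range
    Print(99.95, 100, "F"),
]


def summarize(items: Iterable[Print], included_codes: Set[str]) -> dict:
    """High, low, and volume over the prints whose code passes the filter."""
    kept = [p for p in items if p.code in included_codes]
    return {
        "high": max(p.price for p in kept),
        "low": min(p.price for p in kept),
        "volume": sum(p.size for p in kept),
    }


without_odd_lots = summarize(prints, {"0", "F"})
with_odd_lots = summarize(prints, {"0", "F", "G"})

print(without_odd_lots)
print(with_odd_lots)
```

A single 7-share odd lot changes both the volume and the candle high, which is why the included-code set has to be known before two candle sources can be compared.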

2.2 The Late-Print Problem

Trade prints arrive at the SIP with timestamps. Some prints — particularly those from slower reporting venues — arrive after the candle in which they were executed has technically closed. The SIP applies a latency window (typically 3.8 milliseconds for the UTP tape, 1.5 milliseconds for the CTA/CQ tape) before closing a candle boundary. Prints arriving within this window are retroactively included.

If your tick data vendor reports trades with raw exchange timestamps (without SIP consolidation), you may find prints with execution timestamps of 9:30:00.002 that your vendor stamps as 9:30:00.347 — because that is when the venue submitted the report, not when the trade occurred. These late-arriving prints can land in the wrong candle bucket depending on whether your aggregation logic uses execution time or receipt time.

A concrete example:

Exchange A prints at 9:30:00.000 ET — executed at 9:29:59.998, reported at 9:30:00.201
Exchange B prints at 9:30:00.050 ET — executed at 9:30:00.050, reported at 9:30:00.212
Exchange C prints at 9:30:00.800 ET — executed at 9:30:00.800, reported at 9:30:01.100 (late)

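Wall-clock aggregator on print timestamps (9:30:00.000 – 9:30:59.999): includes all three prints.
SIP-consolidated: Exchange A and B included in 9:30 candle; Exchange C classified as late print.
Custom tick aggregator using receipt timestamps: all three land in the 9:30 one-minute candle, including Exchange A, which executed before the open. At one-second resolution, Exchange C also slips into the 9:30:01 bar.
Custom tick aggregator using execution timestamps: Exchange B and C fall in the 9:30 candle; Exchange A (executed at 9:29:59.998) falls 2 ms before the boundary and belongs to the pre-open bucket.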
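Working the three prints through floor-division bucketing shows where each one lands under each timestamp choice. This is a minimal sketch: the date is arbitrary, timestamps are naive datetimes read as Eastern Time, and the `bucket` helper is hypothetical.

```python
from datetime import datetime, timedelta

SESSION_OPEN = datetime(2024, 3, 4, 9, 30, 0)  # naive, read as ET for brevity


def at(h: int, m: int, s: int, ms: int) -> datetime:
    """Build a timestamp on the example date from h:m:s and milliseconds."""
    return datetime(2024, 3, 4, h, m, s, ms * 1000)


def bucket(ts: datetime, interval: timedelta) -> int:
    """Bar index relative to the open; a negative value means before the open."""
    return (ts - SESSION_OPEN) // interval  # timedelta // timedelta floors


# venue -> (execution time, receipt time), taken from the example above
prints = {
    "A": (at(9, 29, 59, 998), at(9, 30, 0, 201)),
    "B": (at(9, 30, 0, 50), at(9, 30, 0, 212)),
    "C": (at(9, 30, 0, 800), at(9, 30, 1, 100)),
}

minute, second = timedelta(minutes=1), timedelta(seconds=1)

by_exec_1m = {v: bucket(e, minute) for v, (e, r) in prints.items()}
by_recv_1m = {v: bucket(r, minute) for v, (e, r) in prints.items()}
by_exec_1s = {v: bucket(e, second) for v, (e, r) in prints.items()}
by_recv_1s = {v: bucket(r, second) for v, (e, r) in prints.items()}

print("1m exec:", by_exec_1m, "| 1m receipt:", by_recv_1m)
print("1s exec:", by_exec_1s, "| 1s receipt:", by_recv_1s)
```

With one-minute bars the only disagreement is Exchange A, which receipt time pulls across the open. With one-second bars Exchange C disagrees as well, so the shorter the interval, the more prints change buckets.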

3. The Alignment Problem: Clock Sources and Their Divergence

3.1 Three Clocks in Every Trade Stream

Every trade print traverses a chain of three clock domains:

  1. Exchange timestamp: When the match occurred on the matching engine. Highly accurate, synchronized via GPS or atomic clocks.
  2. SIP dissemination timestamp: When the SIP received and processed the print. Subject to SIP processing latency (see above).
  3. Vendor delivery timestamp: When your data vendor's infrastructure recorded the receipt. Varies by vendor architecture — some use exchange timestamps, some use SIP timestamps, some use their own internal clock.

For building candles that match official charts, the exchange timestamp is the authoritative source. For latency-sensitive trading, the SIP dissemination timestamp is often more relevant. For debugging data delivery issues, the vendor timestamp is necessary.

Most retail-grade APIs — including many aggregators that wrap exchange feeds — expose only the vendor timestamp or a sanitized "event time" field that obscures the distinction. This creates a silent mismatch when researchers attempt to reconstruct candles from these feeds.

3.2 The US Market Session Offset

US equity trading sessions are defined by official exchange rules, not by absolute time. The regular session runs 9:30:00–16:00:00 ET. But the trading day includes four additional windows that complicate candle aggregation:

  • Pre-market: 4:00–9:30:00 ET (exchange-specific, not all venues)
  • Opening auction: 9:30:00 ET (consolidated by SIP; individual prints may carry earlier timestamps)
  • Closing auction: 16:00:00 ET (VOL auction, significant for certain stocks)
  • After-hours: 16:00:00–20:00:00 ET (exchange-specific)

If you are aggregating candles across the full trading day, you must decide whether the 9:30:00 candle boundary represents the session open (with late-opening-auction prints included) or the first post-auction trade print. This decision alone can shift the open price by several basis points for volatile names.

For backtesting purposes, the standard convention — and the one used by TickDB's kline endpoint — is trading-session alignment: candle N covers the time window from session_open + (N-1) * interval to session_open + N * interval - 1 microsecond. All trade prints with exchange timestamps within this window are included, with SIP consolidation applied.
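The window formula translates directly into code. The sketch below uses hypothetical helper names and a 1-indexed candle number N, as in the formula; it illustrates the arithmetic and is not TickDB's implementation.

```python
from datetime import datetime, timedelta
from typing import Tuple


def candle_window(session_open: datetime, n: int,
                  interval: timedelta) -> Tuple[datetime, datetime]:
    """First and last instant (inclusive) covered by 1-indexed candle n."""
    if n < 1:
        raise ValueError("candle numbers start at 1")
    start = session_open + (n - 1) * interval
    last = session_open + n * interval - timedelta(microseconds=1)
    return start, last


def candle_number(ts: datetime, session_open: datetime,
                  interval: timedelta) -> int:
    """Inverse mapping: the 1-indexed candle containing ts."""
    return (ts - session_open) // interval + 1


session_open = datetime(2024, 3, 4, 9, 30)  # naive, read as ET
one_minute = timedelta(minutes=1)

first_start, first_last = candle_window(session_open, 1, one_minute)
print(first_start.time(), "to", first_last.time())
print(candle_number(datetime(2024, 3, 4, 9, 31), session_open, one_minute))
```

A print stamped exactly 9:31:00.000000 belongs to candle 2, not candle 1, because each window is closed on the left and ends one microsecond before the next boundary.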


4. Quantifying the Backtesting Bias

The divergence between self-aggregated and official candles is not merely an academic curiosity. It directly affects backtesting results in three measurable ways.

4.1 Price-Level Bias

The open and close of a candle are the most sensitive points to aggregation differences. A self-aggregated candle that includes a late print at the open will show a different open price than the official candle. Over 252 trading days, this can compound into a systematic return bias.

Illustrative scenario: A stock gaps up 2% at the open due to a news event. A wall-clock aggregator that includes pre-market prints in the 9:30 candle will show a wider range (a higher high or a lower low) and a different VWAP than a session-aligned aggregator that separates pre-market from regular-session candles. A mean-reversion strategy that enters on VWAP crosses will generate different signals.

| Metric | Wall-clock aggregation | SIP session-aligned | Bias introduced |
| --- | --- | --- | --- |
| Open price (volatile stock) | ±3 bps vs. official | ±0.5 bps | Understated volatility in session-aligned |
| High price | +5 bps (includes late prints) | ±1 bp | Self-aggregated overstates intraday range |
| Volume | +8–15% (includes odd lots) | Varies by vendor filter | VWAP systematically biased |
| VWAP | ±2–4 bps | ±0.5 bps | Crossover signals fire at wrong prices |
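The VWAP effect can be reproduced with a few lines of arithmetic. The trades below are invented for illustration (a pre-market print near the prior close and two regular-session prints after a 2% gap); the resulting gap is far larger than the typical figures in the table because the example is deliberately extreme.

```python
from typing import List, Tuple

Trade = Tuple[float, int]  # (price, shares)


def vwap(trades: List[Trade]) -> float:
    """Volume-weighted average price: sum(price * size) / sum(size)."""
    notional = sum(price * size for price, size in trades)
    volume = sum(size for _, size in trades)
    return notional / volume


# Hypothetical prints around a gap-up open.
pre_market = [(100.00, 500)]
regular_session = [(102.00, 1000), (102.20, 500)]

session_vwap = vwap(regular_session)            # session-aligned candle
mixed_vwap = vwap(pre_market + regular_session)  # pre-market leaked into 9:30

diff_bps = (mixed_vwap - session_vwap) / session_vwap * 10_000
print(f"session VWAP {session_vwap:.4f}, mixed VWAP {mixed_vwap:.4f}, "
      f"difference {diff_bps:.1f} bps")
```

Any signal keyed to a VWAP cross would be evaluated against a different level in each case, even though the regular-session prints are identical.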

4.2 Volume Bias

The inclusion or exclusion of odd-lot prints (G codes) and FINRA ADF prints (Z codes) is the single largest source of volume discrepancy. Odd lots — trades of fewer than 100 shares — account for approximately 15–20% of all trade prints by count but represent only 3–5% of total dollar volume. Most institutional feeds exclude them because they are considered noise.

However, for stocks trading below $1 or above $500 per share, odd-lot prints can create outsized intraday volume discrepancies. A self-aggregated candle that includes odd lots will show higher volume than the official candle, which will bias any strategy that uses volume as an input (e.g., volume-weighted mean reversion, on-balance volume).

4.3 Timing Bias

Perhaps the most insidious bias is the timing effect. If your self-aggregated candles use wall-clock alignment while your live trading system receives session-aligned ticks, your backtest signals will fire at different timestamps than your live system — even when using the same code, the same parameters, and the same stock.

A strategy that backtests with a Sharpe of 1.4 using wall-clock candles might generate a live Sharpe of 0.7 or lower because the actual signal timestamps differ by 100–500 milliseconds. In high-frequency event-driven strategies, this is the difference between a profitable signal and a filled-at-worse-price execution that wipes out the edge.


5. Production-Grade Aggregation: Matching Official Candles

The solution is not to avoid tick aggregation. It is to implement it correctly, with explicit alignment rules, SIP filtering, and a validation layer that compares your aggregated candles against a known-good source.

The following code implements a session-aligned candle aggregator that matches SIP conventions for US equities. It is written for clarity and correctness — not brevity.

5.1 Core Aggregation Engine

import logging
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from threading import Lock
from typing import Any, Dict, List, Optional

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [%(levelname)s] %(name)s: %(message)s"
)
logger = logging.getLogger("candle_aggregator")

# ⚠️ This implementation is for OHLCV aggregation from TickDB trades data.
# For US equities specifically, note that TickDB's trades endpoint covers
# HK and crypto markets — US equity trades require kline aggregation from
# the /v1/market/kline endpoint. This code demonstrates the correct
# aggregation architecture for any market where tick-level trade data
# is available, and serves as a reference for understanding the alignment
# principles described in this article.

@dataclass
class Trade:
    """A single trade print with validated timestamp and SIP codes."""
    timestamp: datetime          # Exchange execution timestamp (timezone-aware)
    price: float
    volume: int
    side: str                    # "buy" or "sell"
    exchange: str                # Venue code
    sip_code: str                 # SIP trade condition code
    trade_id: str                # Unique identifier for dedup

    def is_included(self, included_codes: set) -> bool:
        """Check if this trade passes SIP inclusion filter."""
        return self.sip_code in included_codes

    def is_odd_lot(self) -> bool:
        """Odd lots are < 100 shares and often excluded from official candles."""
        return self.sip_code == "G"

    def is_closing_print(self) -> bool:
        """Closing prints may be excluded from intraday candles."""
        return self.sip_code == "7"


@dataclass
class Candle:
    """A single OHLCV bar with session-aligned boundaries."""
    open_time: datetime
    close_time: datetime
    open_price: float = 0.0
    high_price: float = 0.0
    low_price: float = float("inf")
    close_price: float = 0.0
    volume: int = 0
    trade_count: int = 0
    buy_volume: int = 0
    sell_volume: int = 0
    included_codes: set = field(default_factory=set)
    excluded_odd_lots: int = 0
    excluded_closing: int = 0

    def update(self, trade: Trade, include_odd_lots: bool = False,
               include_closing: bool = False) -> None:
        """Update candle with a single trade, applying SIP filter rules."""
        # Apply SIP code filtering
        if trade.is_odd_lot() and not include_odd_lots:
            self.excluded_odd_lots += 1
            return
        if trade.is_closing_print() and not include_closing:
            self.excluded_closing += 1
            return

        # Update OHLCV
        self.high_price = max(self.high_price, trade.price)
        self.low_price = min(self.low_price, trade.price)
        self.volume += trade.volume
        self.trade_count += 1

        if self.open_price == 0.0:
            self.open_price = trade.price
        self.close_price = trade.price

        # Track buy/sell pressure
        if trade.side == "buy":
            self.buy_volume += trade.volume
        elif trade.side == "sell":
            self.sell_volume += trade.volume

        self.included_codes.add(trade.sip_code)


class CandleAggregator:
    """
    Session-aligned OHLCV candle aggregator matching SIP conventions.

    Alignment rules:
    1. Candle boundaries are anchored to the trading session open, not wall clock.
    2. Trades are bucketed by exchange execution timestamp (not receipt time).
    3. SIP trade condition codes are filtered per configuration.
    4. Candles are never finalized: a late print is bucketed by its execution
       timestamp, so feed trades in timestamp order to keep open and close correct.
    """

    # Default SIP codes included in official OHLCV (CTA/CQ tape for listed equities)
    DEFAULT_INCLUDED_CODES = {"0", "E", "F", "K", "Z"}
    # Codes requiring special handling
    CONDITIONAL_CODES = {"G", "4", "7"}  # Odd lots, derivatively priced, closing prints

    def __init__(
        self,
        symbol: str,
        interval_seconds: int = 60,
        session_open: Optional[datetime] = None,
        session_close: Optional[datetime] = None,
        include_odd_lots: bool = False,
        include_closing_prints: bool = False,
        timezone_str: str = "America/New_York"
    ):
        self.symbol = symbol
        self.interval_seconds = interval_seconds
        self.include_odd_lots = include_odd_lots
        self.include_closing_prints = include_closing_prints

        # Load timezone for session boundary calculations
        from dateutil import tz
        self.tz = tz.gettz(timezone_str)
        self.session_open = session_open
        self.session_close = session_close

        self.candles: Dict[int, Candle] = {}
        self._lock = Lock()
        self._stats = {
            "trades_processed": 0,
            "trades_included": 0,
            "trades_excluded": 0,
            "candles_generated": 0
        }

    def _get_candle_key(self, timestamp: datetime) -> int:
        """
        Compute the candle bucket key for a given timestamp.

        CRITICAL: This uses SESSION-OPEN alignment, not wall-clock alignment.
        The candle covering 9:30:00–9:30:59.999 has key 0.
        The candle covering 9:31:00–9:31:59.999 has key 1.
        Returns -1 for timestamps before the session open.
        """
        if self.session_open is None:
            raise ValueError("session_open must be set before processing trades")

        delta = timestamp - self.session_open

        # Compare the timedelta itself. int(delta.total_seconds()) truncates
        # toward zero, so a print 2 ms before the open would land in bucket 0.
        if delta < timedelta(0):
            logger.warning(
                f"Trade at {timestamp} precedes session open {self.session_open}. "
                f"Excluding from aggregation."
            )
            return -1

        # Floor division of two timedeltas yields the integer bucket number
        return delta // timedelta(seconds=self.interval_seconds)

    def _get_candle_times(self, bucket: int) -> tuple[datetime, datetime]:
        """Get the open and close time for a given candle bucket."""
        open_time = self.session_open + timedelta(seconds=bucket * self.interval_seconds)
        close_time = open_time + timedelta(seconds=self.interval_seconds)
        return open_time, close_time

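    def add_trade(self, trade: Trade) -> None:
        """Thread-safe addition of a trade to the appropriate candle."""
        bucket = self._get_candle_key(trade.timestamp)
        # Odd lots ("G") and closing prints ("7") pass this gate so that
        # Candle.update() can apply the include flags and count exclusions.
        allowed_codes = self.DEFAULT_INCLUDED_CODES | {"G", "7"}

        with self._lock:
            # Stats are mutated under the lock so the counters stay consistent
            self._stats["trades_processed"] += 1

            if bucket < 0 or not trade.is_included(allowed_codes):
                self._stats["trades_excluded"] += 1
                return

            if bucket not in self.candles:
                open_time, close_time = self._get_candle_times(bucket)
                self.candles[bucket] = Candle(
                    open_time=open_time,
                    close_time=close_time
                )
                self._stats["candles_generated"] = len(self.candles)

            candle = self.candles[bucket]
            count_before = candle.trade_count
            candle.update(
                trade,
                include_odd_lots=self.include_odd_lots,
                include_closing=self.include_closing_prints
            )
            # update() returns None either way; detect inclusion from the count
            if candle.trade_count > count_before:
                self._stats["trades_included"] += 1
            else:
                self._stats["trades_excluded"] += 1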

    def add_trades_batch(self, trades: List[Trade]) -> None:
        """Process a batch of trades — trades MUST be pre-sorted by timestamp."""
        for trade in trades:
            self.add_trade(trade)

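    def get_candles(self) -> List[Candle]:
        """Return candles with at least one included trade, sorted by open time."""
        with self._lock:
            # Skip buckets whose only prints were filtered out (low would be inf)
            return [
                self.candles[k] for k in sorted(self.candles.keys())
                if self.candles[k].trade_count > 0
            ]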

    def get_stats(self) -> Dict[str, int]:
        """Return processing statistics."""
        return self._stats.copy()

    def validate_against_reference(
        self,
        reference_candles: List[Dict[str, Any]],
        tolerance_bps: float = 5.0
    ) -> Dict[str, Any]:
        """
        Compare aggregated candles against a known-good reference source.

        Args:
            reference_candles: List of dicts with 'open_time' (same type as
                Candle.open_time), 'open', 'high', 'low', 'close', 'volume'
            tolerance_bps: Acceptable deviation in basis points

        Returns:
            Validation report with per-candle discrepancies
        """
        aggregated = self.get_candles()
        discrepancies = []

        for ref in reference_candles:
            ref_time = ref["open_time"]
            # Find matching aggregated candle
            match = next(
                (c for c in aggregated if c.open_time == ref_time),
                None
            )
            if match is None:
                discrepancies.append({
                    "time": ref_time,
                    "status": "MISSING",
                    "message": "No aggregated candle found for reference timestamp"
                })
                continue

            # Compare price levels. Reference keys are 'open', 'high', 'low',
            # 'close'; Candle attributes carry a '_price' suffix.
            price_fields = ["open_price", "high_price", "low_price", "close_price"]
            for price_field in price_fields:
                ref_val = ref.get(price_field.replace("_price", ""))
                agg_val = getattr(match, price_field)
                if ref_val and agg_val:
                    diff_bps = abs(ref_val - agg_val) / ref_val * 10000
                    if diff_bps > tolerance_bps:
                        discrepancies.append({
                            "time": ref_time,
                            "status": "PRICE_MISMATCH",
                            "field": price_field,
                            "reference": ref_val,
                            "aggregated": agg_val,
                            "diff_bps": round(diff_bps, 2)
                        })

            # Compare volume
            vol_diff_pct = abs(ref.get("volume", 0) - match.volume) / max(ref.get("volume", 1), 1) * 100
            if vol_diff_pct > 10.0:  # Volume tolerance of 10%
                discrepancies.append({
                    "time": ref_time,
                    "status": "VOLUME_MISMATCH",
                    "reference_volume": ref.get("volume"),
                    "aggregated_volume": match.volume,
                    "diff_pct": round(vol_diff_pct, 2)
                })

        return {
            "total_reference": len(reference_candles),
            "total_aggregated": len(aggregated),
            "discrepancies": discrepancies,
            "match_rate": round(
                (len(reference_candles) - len([d for d in discrepancies if d["status"] == "MISSING"]))
                / max(len(reference_candles), 1) * 100, 2
            )
        }

5.2 TickDB Integration with Session-Aligned Fetch

import os
import requests
from datetime import datetime, timezone, timedelta
from typing import Optional, Dict, Any, List

# ⚠️ This code demonstrates correct candle-building using TickDB's kline endpoint.
# TickDB provides pre-aggregated 1m/5m/1h/1d klines for US equities (cleaned and
# SIP-aligned). For backtesting, use /v1/market/kline with start/end timestamps.
# For live monitoring, use /v1/market/kline/latest — not /v1/market/kline.
# See TickDB Core Knowledge Base: Endpoint Usage Guide.

class TickDBKlineClient:
    """
    Production-grade TickDB kline client with session alignment and error handling.

    Key behaviors:
    - Loads API key from TICKDB_API_KEY environment variable
    - Handles rate limits (code 3001) with Retry-After
    - Validates timestamps before constructing query params
    - Separates historical backtest fetches from live latest-candle fetches
    """

    BASE_URL = "https://api.tickdb.ai/v1"

    def __init__(self, api_key: Optional[str] = None):
        self.api_key = api_key or os.environ.get("TICKDB_API_KEY")
        if not self.api_key:
            raise ValueError(
                "TickDB API key not provided. Set TICKDB_API_KEY environment variable."
            )
        self.session = requests.Session()
        self.session.headers.update({"X-API-Key": self.api_key})
        self._rate_limit_until: Optional[datetime] = None

    def _wait_for_rate_limit(self) -> None:
        """Enforce rate limit cooldown before sending request."""
        if self._rate_limit_until:
            now = datetime.now(timezone.utc)
            if now < self._rate_limit_until:
                wait_seconds = (self._rate_limit_until - now).total_seconds()
                import time
                time.sleep(wait_seconds)
            self._rate_limit_until = None

    def _handle_response(self, response: requests.Response) -> Dict[str, Any]:
        """Standard TickDB error handler with rate-limit awareness."""
        # Handle non-200 responses
        if response.status_code == 429:
            retry_after = int(response.headers.get("Retry-After", 5))
            self._rate_limit_until = datetime.now(timezone.utc) + timedelta(seconds=retry_after)
            raise RateLimitError(
                f"Rate limited. Retry after {retry_after} seconds."
            )

        if response.status_code == 401:
            raise AuthError("Invalid API key. Check TICKDB_API_KEY environment variable.")

        if response.status_code == 404:
            raise NotFoundError("Symbol or endpoint not found.")

        try:
            data = response.json()
        except ValueError:
            raise APIError(f"Invalid JSON response: {response.text[:200]}")

        # Handle TickDB internal error codes
        code = data.get("code", 0)
        if code == 0:
            return data.get("data", data)

        error_messages = {
            1001: "Invalid API key",
            1002: "Missing API key",
            2002: "Symbol not found — verify via /v1/symbols/available",
            3001: "Rate limit exceeded — use Retry-After header",
            5001: "Internal server error — retry with backoff",
        }
        msg = error_messages.get(code, data.get("message", "Unknown error"))
        raise APIError(f"TickDB error {code}: {msg}")

    def get_historical_klines(
        self,
        symbol: str,
        interval: str = "1m",
        start_time: Optional[datetime] = None,
        end_time: Optional[datetime] = None,
        limit: int = 1000,
        timeout: tuple = (3.05, 10)
    ) -> List[Dict[str, Any]]:
        """
        Fetch historical OHLCV klines for backtesting.

        CRITICAL: This endpoint is for COMPLETED periods only.
        For live candles, use get_latest_candle() instead.

        Args:
            symbol: Trading pair (e.g., "AAPL.US", "BTC.Binance")
            interval: Candle interval ("1m", "5m", "1h", "1d", "1w")
            start_time: Start of fetch window (UTC)
            end_time: End of fetch window (UTC)
            limit: Max candles per request (max 1000)
            timeout: (connect_timeout, read_timeout)

        Returns:
            List of OHLCV dicts with keys: open_time, open, high, low, close, volume
        """
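        # Validate before building query params. Naive datetimes are rejected
        # because .timestamp() would interpret them in the local timezone.
        for name, value in (("start_time", start_time), ("end_time", end_time)):
            if value is not None and value.tzinfo is None:
                raise ValueError(f"{name} must be timezone-aware (UTC)")
        if start_time and end_time and start_time >= end_time:
            raise ValueError("start_time must be before end_time")

        params = {
            "symbol": symbol,
            "interval": interval,
            "limit": min(limit, 1000)
        }

        # Timestamps are sent as epoch milliseconds
        if start_time:
            params["start_time"] = int(start_time.timestamp() * 1000)
        if end_time:
            params["end_time"] = int(end_time.timestamp() * 1000)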

        self._wait_for_rate_limit()

        url = f"{self.BASE_URL}/market/kline"
        response = self.session.get(url, params=params, timeout=timeout)
        return self._handle_response(response)

    def get_latest_candle(
        self,
        symbol: str,
        interval: str = "1m",
        timeout: tuple = (3.05, 10)
    ) -> Dict[str, Any]:
        """
        Fetch the CURRENT (incomplete) candle — for live monitoring, NOT backtesting.

        ⚠️ Do NOT use this endpoint for backtesting. Historical backtests
        must use get_historical_klines() with completed period boundaries.
        """
        self._wait_for_rate_limit()

        params = {"symbol": symbol, "interval": interval}
        url = f"{self.BASE_URL}/market/kline/latest"
        response = self.session.get(url, params=params, timeout=timeout)
        return self._handle_response(response)


class RateLimitError(Exception):
    """Raised when TickDB rate limit (code 3001) is encountered."""
    pass

class AuthError(Exception):
    """Raised on authentication failure (code 1001/1002)."""
    pass

class NotFoundError(Exception):
    """Raised when symbol or endpoint not found (code 2002)."""
    pass

class APIError(Exception):
    """Raised on any other TickDB API error."""
    pass

5.3 Aggregation vs. Native Klines: When to Use Which

| Scenario | Recommended approach | Rationale |
| --- | --- | --- |
| Strategy backtesting | TickDB /v1/market/kline | Pre-cleaned, SIP-aligned, validated across 10+ years |
| Custom indicator (e.g., VWAP from tick data) | Custom aggregation with session alignment | Tick-level data needed; must apply same alignment rules |
| Live signal monitoring | TickDB /v1/market/kline/latest + WebSocket | Official candle close + real-time tick stream |
| Comparing vendor data quality | Custom aggregator vs. TickDB kline validation | Benchmark against the known-good source |
| Research into odd-lot behavior | Custom aggregation with odd-lots included | Not available in official candles; requires raw tick data |

6. Cross-Vendor Alignment Comparison

Different vendors apply different conventions to the aggregation problem. The table below summarizes the key alignment decisions for major US equity data sources.

| Vendor | Time alignment | SIP codes filtered | Odd lots | Late prints | US equity coverage |
| --- | --- | --- | --- | --- | --- |
| TickDB | Session-aligned (exchange timestamps) | Standard CTA/CQ | Excluded | Classified to correct bucket | OHLCV: Yes. Trades: No (HK/crypto only) |
| Polygon | Wall-clock by default; session-optional | Configurable | Optional | Included in original bucket | Full coverage |
| Alpaca | Wall-clock | Standard | Excluded | May appear in wrong bucket | US equities only |
| Interactive Brokers | Exchange-reported | Per-exchange rules | Excluded | Latency-dependent | Full coverage |
| Custom (DIY from exchange feed) | User-defined | User-defined | User-defined | User-defined | Limited by feed license |

Critical reminder: TickDB's trades endpoint does not cover US equities. For US equity tick data, use the kline endpoint for OHLCV aggregation. For HK equity and crypto markets, the trades endpoint provides tick-level data suitable for custom aggregation.


7. Practical Validation Framework

Before deploying any backtest that relies on self-aggregated candles, run the following three-step validation:

Step 1: Download a reference candle set

Fetch 20–50 candles from TickDB's /v1/market/kline for the same symbol, interval, and time range. These are your ground truth.

Step 2: Aggregate your tick data using your current logic

Run your existing aggregation code on the same time range.

Step 3: Compare and diagnose

# Kline interval strings mapped to aggregator bucket sizes
INTERVAL_SECONDS = {"1m": 60, "5m": 300, "1h": 3600}


def run_alignment_diagnostics(
    symbol: str,
    trades: List[Trade],
    session_open: datetime,
    start: datetime,
    end: datetime,
    interval: str = "1m",
    tolerance_bps: float = 5.0
) -> Dict[str, Any]:
    """
    End-to-end diagnostic: compare TickDB klines against self-aggregated candles.

    `trades` comes from your own tick source, sorted by execution timestamp.
    Reference 'open_time' values must be the same type as Candle.open_time;
    convert them first if the API returns epoch values.

    Run this before any backtest deployment. If discrepancies exceed
    tolerance, either fix your aggregation logic or use TickDB klines directly.
    """
    client = TickDBKlineClient()

    # Step 1: Fetch reference candles
    reference = client.get_historical_klines(
        symbol=symbol,
        interval=interval,
        start_time=start,
        end_time=end
    )
    logger.info(f"Fetched {len(reference)} reference candles from TickDB")

    # Step 2: Aggregate your own trades with session alignment
    aggregator = CandleAggregator(
        symbol=symbol,
        interval_seconds=INTERVAL_SECONDS[interval],
        session_open=session_open
    )
    aggregator.add_trades_batch(trades)

    # Step 3: Validate
    validation = aggregator.validate_against_reference(reference, tolerance_bps)

    # Report
    logger.info(
        f"Alignment validation: {validation['match_rate']}% match rate. "
        f"{len(validation['discrepancies'])} discrepancies found."
    )

    if validation['discrepancies']:
        for d in validation['discrepancies'][:5]:  # Show first 5
            logger.warning(f"  {d}")

    return validation

8. Key Takeaways

The candlestick is not a neutral container. The rules that define its boundaries — which clock, which trades, which aggregation convention — are engineering decisions that directly determine whether your backtest reflects reality.

Three principles for avoiding the aggregation trap:

  1. Align to the session, not the wall clock. Candle boundaries anchored to the trading session open (9:30:00 ET for US equities) produce candles that match official charts. Wall-clock alignment is a convenience that introduces systematic timestamp drift.

  2. Validate against a known-good source. Before trusting self-aggregated candles in a backtest, compare them against TickDB's SIP-aligned klines. Discrepancies above 5 basis points in price or 10% in volume indicate a structural problem in your aggregation logic.

  3. For US equity OHLCV, use the canonical source. TickDB's /v1/market/kline endpoint provides 10+ years of cleaned, session-aligned OHLCV data for US equities. The trades endpoint covers HK equities and crypto markets, where tick-level analysis is required; it does not cover US equities. Build US equity candles from another tick source only if you need custom indicators that cannot be derived from OHLCV, and apply the aggregation rules described in this article if you do.

The timestamp on your data is not metadata. It is the architecture.


This article does not constitute investment advice. Markets involve risk; past performance does not guarantee future results.