From Tick to Candle: How Price Data Gets Compressed into K-Lines | US Stocks

The first trade of the day was executed at $142.03. Forty-seven seconds later, an algorithmic counterparty absorbed 8,400 shares at the bid, driving the price to $141.87. By the end of that minute, 312 individual transactions had occurred across seven exchanges, ranging from $141.72 to $142.18.

None of that granularity survives in the final K-line.

That single candle — {O:142.03, H:142.18, L:141.72, C:141.87, V:156,200} — represents a lossy compression of three-dimensional market activity: price discovery, liquidity consumption, and information arrival. Understanding exactly how that compression works is not academic. It determines whether your backtests are measuring what you think they are measuring, whether your real-time signals fire at the moment you expect, and whether your strategy logic aligns with the data provider's aggregation rules.

This article dissects the tick-to-K-line pipeline from first principles, compares the three dominant aggregation methodologies, and provides production-grade Python code that handles the edge cases most tutorials ignore.

1. The Anatomy of a Tick

Before aggregation makes sense, the primitive unit must be precise. A tick is the smallest atomic record of a market event: typically a trade print, sometimes a quote update (this article aggregates trades). In its full form, a tick carries more information than the five fields that survive into a K-line.

A complete tick schema for US equities typically includes:

{
  "symbol": "AAPL.US",
  "timestamp": 1707849600123,
  "price": 142.05,
  "size": 100,
  "exchange": "NASDAQ",
  "condition": ["regular", "derivative"],
  "side": "sell"
}

The timestamp field is where most aggregation bugs originate, and its precision matters at the millisecond or microsecond level. The value 1707849600123 is in Unix epoch milliseconds and represents 2024-02-13 18:40:00.123 UTC (13:40:00.123 ET). The three-digit millisecond suffix is not decorative: for a tick near an interval boundary, it determines which K-line bucket the tick belongs to under a wall-clock aggregation scheme.
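
A minimal sketch of decoding such a timestamp, assuming (as in the schema above) Unix epoch milliseconds in UTC. It adds the integer milliseconds to the epoch as a `timedelta` instead of dividing by 1000, so no float rounding can shift a tick across a boundary. The last two values are one millisecond apart and land in different minutes.

```python
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def decode_epoch_ms(timestamp_ms: int) -> datetime:
    """Convert Unix epoch milliseconds to an aware UTC datetime using exact integer math."""
    return UNIX_EPOCH + timedelta(milliseconds=timestamp_ms)


for ts in (1707849600123, 1707849659999, 1707849660000):
    dt = decode_epoch_ms(ts)
    # The minute printed here is the 1-minute wall-clock bucket the tick falls into.
    print(ts, dt.isoformat(timespec="milliseconds"), "minute:", dt.strftime("%H:%M"))
```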

The side field, when present, indicates which party was the aggressor: a buyer lifting the ask or a seller hitting the bid. This matters for order flow analysis, but it disappears entirely in OHLC aggregation: whether a print was buyer- or seller-initiated has no effect on the minute's high or low.
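
To see what is lost, compare two hypothetical one-minute tick streams with identical prices and sizes but opposite aggressor sides. Their OHLCV tuples are identical; only a signed-volume (order-flow delta) measure, which OHLCV does not carry, tells them apart. The `"buy"`/`"sell"` labels follow the schema above and are an assumption for any real feed.

```python
# Prints with no side information contribute nothing to the delta.
SIGN = {"buy": 1, "sell": -1}


def ohlcv(ticks: list[dict]) -> tuple:
    """OHLCV for ticks already sorted by timestamp."""
    prices = [t["price"] for t in ticks]
    return (prices[0], max(prices), min(prices), prices[-1], sum(t["size"] for t in ticks))


def signed_volume(ticks: list[dict]) -> int:
    """Buyer-initiated size minus seller-initiated size."""
    return sum(SIGN.get(t.get("side"), 0) * t["size"] for t in ticks)


prices_sizes = [(142.03, 100), (142.18, 300), (141.72, 200), (141.87, 400)]
buyer_driven = [{"price": p, "size": s, "side": "buy"} for p, s in prices_sizes]
seller_driven = [{"price": p, "size": s, "side": "sell"} for p, s in prices_sizes]

print(ohlcv(buyer_driven) == ohlcv(seller_driven))                # True: same K-line
print(signed_volume(buyer_driven), signed_volume(seller_driven))  # 1000 -1000
```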

The condition field describes the trade's regulatory classification. A trade executed as part of an opening or closing auction carries different market microstructure implications than an intraday continuous market trade — yet both contribute to the same K-line if the aggregation window includes the auction period.

This is the first form of information loss: temporal, directional, and regulatory context collapse into a single OHLCV tuple.

2. OHLCV: What Survives and Why

The K-line schema has four price fields and one volume field. Each field has a precise definition and a precise computation rule.

Field	Full name	Definition	Aggregation rule
O	Open	Price of the first trade in the interval	First tick price by timestamp
H	High	Highest price at which a trade occurred	Max of all tick prices
L	Low	Lowest price at which a trade occurred	Min of all tick prices
C	Close	Price of the last trade in the interval	Last tick price by timestamp
V	Volume	Total quantity of shares (or contracts) traded	Sum of all tick sizes
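
The table translates directly into code. A minimal sketch, assuming ticks are dicts with `price`, `size`, and an epoch-millisecond `timestamp` as in the schema in section 1. Sorting by timestamp first makes open and close independent of arrival order; ties at the same millisecond keep arrival order because Python's sort is stable.

```python
from typing import Optional


def aggregate_ohlcv(ticks: list[dict]) -> Optional[dict]:
    """Apply the OHLCV rules from the table to one interval's ticks."""
    if not ticks:
        return None  # an interval with no trades has no defined OHLC
    ordered = sorted(ticks, key=lambda t: t["timestamp"])
    prices = [t["price"] for t in ordered]
    return {
        "open": prices[0],    # first tick by timestamp
        "high": max(prices),  # max of all tick prices
        "low": min(prices),   # min of all tick prices
        "close": prices[-1],  # last tick by timestamp
        "volume": sum(t["size"] for t in ordered),  # sum of all tick sizes
    }


# Arrival order differs from timestamp order: the 00.800 print arrived last.
ticks = [
    {"price": 142.03, "size": 100, "timestamp": 1707849600000},
    {"price": 141.87, "size": 500, "timestamp": 1707849605800},
    {"price": 142.05, "size": 200, "timestamp": 1707849600800},
]
print(aggregate_ohlcv(ticks))
```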

The asymmetry between "first by timestamp" and "last by timestamp" for open and close is a source of subtle bugs in real-time aggregation systems. Ticks often arrive out of order on consolidated market data feeds, and a late tick may belong to a K-line that has already been closed, possibly as that K-line's true open or close. If the aggregation logic is not timestamp-aware, the late tick either lands in the wrong K-line or corrupts the open or close of the right one.

For example, suppose a tick stamped 1707849659800 (18:40:59.800 UTC) arrives after the aggregator has already processed ticks stamped 18:41:00.xxx. Under a wall-clock scheme, that late tick still belongs to the 18:40 1-minute bucket. A naive implementation that processes ticks in arrival order would assign it to the 18:41 K-line instead; and even when a late tick lands in the correct bucket, treating "last to arrive" as "last by timestamp" overwrites the close with an older price.
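
The timestamp-aware alternative can be sketched incrementally. This assumes each in-progress candle remembers the timestamps of the ticks that set its open and close; `first_ts` and `last_ts` are illustrative field names, not a vendor schema. The late print from the example above is routed to 18:40 and becomes that candle's close because it is the newest print stamped in that minute.

```python
def bucket_of(timestamp_ms: int) -> int:
    """Start of the 1-minute bucket containing the tick."""
    return timestamp_ms - timestamp_ms % 60_000


def apply_tick(candles: dict[int, dict], tick: dict) -> None:
    """Merge a tick into its own bucket, whatever order ticks arrive in."""
    ts, price, size = tick["timestamp"], tick["price"], tick["size"]
    c = candles.get(bucket_of(ts))
    if c is None:
        candles[bucket_of(ts)] = {"open": price, "high": price, "low": price, "close": price,
                                  "volume": size, "first_ts": ts, "last_ts": ts}
        return
    c["high"] = max(c["high"], price)
    c["low"] = min(c["low"], price)
    c["volume"] += size
    if ts < c["first_ts"]:   # an earlier print arrived late: it is the true open
        c["open"], c["first_ts"] = price, ts
    if ts >= c["last_ts"]:   # only a newer (or same-ms) print may replace the close
        c["close"], c["last_ts"] = price, ts


candles: dict[int, dict] = {}
# Arrival order: 18:40:58.000, 18:41:00.500, then a late print stamped 18:40:59.800.
for tick in [
    {"price": 142.00, "size": 100, "timestamp": 1707849658000},
    {"price": 141.90, "size": 200, "timestamp": 1707849660500},
    {"price": 142.10, "size": 300, "timestamp": 1707849659800},
]:
    apply_tick(candles, tick)
```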

Volume aggregation is straightforward in concept but treacherous in practice. A-fill and B-fill trades carry different regulatory meaning in options markets. Block trades above a size threshold may be reported separately with a delayed timestamp. Some sources report volume in round lots of 100 shares rather than in shares, and some include odd-lot prints while others omit them. Mixing these without normalization produces inaccurate volume figures.

3. Three Aggregation Methodologies

The industry has not standardized on a single aggregation approach. The three dominant methodologies produce measurably different K-lines for the same tick stream. Choosing the wrong one — or mixing implementations — is a silent backtest killer.

3.1 Wall-Clock (Interval-Based) Aggregation

Wall-clock aggregation buckets ticks by absolute time boundaries. Every K-line starts and ends at a clock second (or minute or hour) aligned to a reference timezone, typically the exchange's local time or UTC.

Under a 1-minute wall-clock scheme for US equities:

K-line N: 16:00:00.000 – 16:00:59.999
K-line N+1: 16:01:00.000 – 16:01:59.999

A tick at 16:00:00.001 belongs to K-line N. A tick at 16:01:00.000 belongs to K-line N+1. The boundary is absolute regardless of trading activity.
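
The half-open rule, [start, start + interval), reduces to integer floor division on epoch milliseconds. A minimal sketch, assuming UTC-aligned boundaries; these coincide with Eastern-time boundaries for intervals that divide an hour, since the ET offset is a whole number of hours, while daily bars need the exchange calendar instead.

```python
MINUTE_MS = 60_000


def bucket_start(timestamp_ms: int, interval_ms: int) -> int:
    """Start of the half-open wall-clock bucket [start, start + interval_ms) holding the tick."""
    return (timestamp_ms // interval_ms) * interval_ms


t_last = 1707849659999  # 18:40:59.999 UTC
t_next = 1707849660000  # 18:41:00.000 UTC, exactly on the boundary

print(bucket_start(t_last, MINUTE_MS) == bucket_start(t_next, MINUTE_MS))          # False
print(bucket_start(t_last, 5 * MINUTE_MS) == bucket_start(t_next, 5 * MINUTE_MS))  # True
```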

Advantages: K-lines are deterministic and reproducible. Two systems consuming the same feed and applying the same late-print and filtering rules will produce identical wall-clock K-lines. Backtesting and live trading align naturally.

Disadvantages: If no trades occur during an interval, the K-line still exists with an undefined open/high/low/close. Some systems fill these gaps with the previous close (carry-forward) while others omit them entirely. The choice affects indicator calculations that assume a continuous time series.
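
Both gap policies are easy to state in code. A minimal sketch, assuming candles are dicts keyed by 1-minute bucket start (epoch ms). The carry-forward variant emits a flat candle at the previous close with zero volume, which is one common convention but not the only one.

```python
MINUTE_MS = 60_000


def fill_gaps(candles: dict[int, dict], start: int, end: int, carry_forward: bool) -> list[dict]:
    """Return candles for every minute in [start, end), filling or omitting empty minutes."""
    out, prev_close = [], None
    for minute in range(start, end, MINUTE_MS):
        c = candles.get(minute)
        if c is not None:
            out.append(c)
            prev_close = c["close"]
        elif carry_forward and prev_close is not None:
            # Flat synthetic candle: O=H=L=C=previous close, no volume.
            out.append({"open_time": minute, "open": prev_close, "high": prev_close,
                        "low": prev_close, "close": prev_close, "volume": 0})
        # Otherwise the minute is omitted (or it precedes the first trade).
    return out


t0 = 1707849600000
candles = {
    t0: {"open_time": t0, "open": 142.03, "high": 142.10, "low": 141.87,
         "close": 141.87, "volume": 1350},
    t0 + 2 * MINUTE_MS: {"open_time": t0 + 2 * MINUTE_MS, "open": 141.90, "high": 142.18,
                         "low": 141.72, "close": 141.85, "volume": 1500},
}
print(len(fill_gaps(candles, t0, t0 + 3 * MINUTE_MS, carry_forward=False)))  # 2
print(len(fill_gaps(candles, t0, t0 + 3 * MINUTE_MS, carry_forward=True)))   # 3
```

With the omit policy, a 20-period moving average spans more than 20 minutes of wall time whenever minutes are missing; with carry-forward, flat synthetic candles pull volatility estimates down. Either way the indicator changes.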

3.2 Tick-Driven (Completion-Based) Aggregation

Tick-driven aggregation starts a new K-line after a fixed number of ticks, not a fixed number of seconds. The interval length is measured in transactions, not wall time.

K-line 1: ticks 1–100 (e.g., a 100-tick bar)
K-line 2: ticks 101–200

Advantages: Each K-line represents a roughly equivalent amount of market information. In high-activity periods, candles form quickly. In low-activity periods, they form slowly. This produces a more uniform sampling of market microstructure states.

Disadvantages: Backtesting is not reproducible without the exact tick sequence. Live K-lines can diverge from historical K-lines even on the same instrument, because vendors may include or filter different prints, or start counting from a different point, and a single extra or missing tick shifts every subsequent bar boundary. Most standard OHLCV datasets do not support tick-driven aggregation natively.
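
A minimal sketch of tick-count bars, assuming the ticks are already in exchange-timestamp order. Dropping a single print from the front of the stream changes the first bar's open, which is the reproducibility problem described above in its simplest form.

```python
def tick_bars(ticks: list[dict], ticks_per_bar: int) -> list[dict]:
    """Group consecutive ticks into bars of exactly ticks_per_bar trades."""
    bars = []
    # Only complete bars are emitted; a trailing partial bar is still forming.
    for i in range(0, len(ticks) - ticks_per_bar + 1, ticks_per_bar):
        chunk = ticks[i:i + ticks_per_bar]
        prices = [t["price"] for t in chunk]
        bars.append({
            "open": prices[0], "high": max(prices), "low": min(prices), "close": prices[-1],
            "volume": sum(t["size"] for t in chunk),
            "start_ts": chunk[0]["timestamp"], "end_ts": chunk[-1]["timestamp"],
        })
    return bars


stream = [{"price": 100 + i % 3, "size": 10, "timestamp": 1_000 * i} for i in range(7)]
print(len(tick_bars(stream, 3)))            # 2 complete bars; the 7th tick is pending
print(tick_bars(stream[1:], 3)[0]["open"])  # 101 instead of 100: every boundary shifted
```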

3.3 Session-Based Aggregation

Session-based aggregation respects the market's opening and closing structure. K-lines are anchored to the trading session: pre-market, regular session, after-hours. The first tick of the regular session opens the first regular-session K-line; the last tick of the regular session closes the last regular-session K-line.

Advantages: K-lines align with economically meaningful market states. The pre-market open and regular-session open have different microstructure characteristics; session-based aggregation keeps them separate.

Disadvantages: K-line boundaries are not uniform. The pre-market session may produce 15 K-lines in an active day and 3 in a quiet day. Strategies that assume a fixed number of K-lines per trading day will fail. Historical data alignment across sessions requires careful handling of pre-market and after-hours data inclusion/exclusion.
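
Session assignment needs the exchange's local clock rather than UTC, because the UTC offset of US equity sessions changes with daylight saving time. A minimal sketch using the standard-library `zoneinfo` module (Python 3.9+, which needs time zone data on the system). The windows below are the commonly quoted 04:00, 09:30, 16:00 and 20:00 ET boundaries, an assumption to adjust for your venue; holidays and early closes are ignored.

```python
from datetime import datetime, time, timedelta, timezone
from typing import Optional
from zoneinfo import ZoneInfo

# Assumed US equity sessions in Eastern time, as half-open [start, end) windows.
SESSIONS = (
    ("pre", time(4, 0), time(9, 30)),
    ("regular", time(9, 30), time(16, 0)),
    ("after", time(16, 0), time(20, 0)),
)


def classify_session(eastern_clock: time) -> Optional[str]:
    """Name of the session containing an Eastern wall-clock time, or None when closed."""
    for name, start, end in SESSIONS:
        if start <= eastern_clock < end:
            return name
    return None


def session_of(timestamp_ms: int) -> Optional[str]:
    """Classify an epoch-millisecond tick; zoneinfo applies EST or EDT as appropriate."""
    utc = datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(milliseconds=timestamp_ms)
    return classify_session(utc.astimezone(ZoneInfo("America/New_York")).time())


def session_bucket_key(timestamp_ms: int, interval_ms: int) -> Optional[tuple[str, int]]:
    """Key bars by (session, wall-clock bucket) so no bar spans two sessions."""
    session = session_of(timestamp_ms)
    if session is None:
        return None
    return session, timestamp_ms - timestamp_ms % interval_ms
```

With hourly bars, prints at 09:15 ET and 09:45 ET share the 09:00 wall-clock bucket but receive different keys, so the pre-market and regular-session halves become separate K-lines.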

4. The Boundary Problem: Where Does a Tick Belong?

Boundary alignment is the most consequential decision in aggregation design. It determines which K-line a tick belongs to, and that determination propagates into every strategy that consumes the data.

Consider the US equity regular session: 09:30:00 ET to 16:00:00 ET. Under wall-clock aggregation:

A tick at exactly 09:30:00.000 ET belongs to the first 1-minute K-line of the session.
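A tick at exactly 16:00:00.000 ET falls, under the strict half-open rule, into the 16:00 bucket, which lies outside the regular session. Whether such a print (typically the closing auction) is folded into the last regular-session K-line (15:59) or treated as after-hours is a vendor-specific convention.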
A tick at 09:30:00.001 ET — one millisecond after open — also belongs to the first K-line.

The subtlety emerges with late prints. If a trade executed at 09:29:59.500 is reported at 09:30:00.200 due to exchange processing latency, which K-line does it belong to?

Most data vendors apply a timestamp-based rule: the tick's exchange timestamp determines the K-line, not its arrival time. This is correct from a historical accuracy standpoint, but it creates real-time operational challenges. When the wall clock reaches 09:30:00.000, a live aggregation system consuming a WebSocket feed cannot know whether every tick stamped before 09:30 (such as the 09:29:59.500 print) has arrived. A streaming system cannot wait indefinitely for finality; it must decide when to close each K-line.

Production systems resolve this tension in one of three ways:

Fixed observation window: The system waits a fixed buffer period (typically 1–5 seconds) after the interval boundary before closing the K-line. This sacrifices latency for accuracy.
Provisional close with revision: The system closes the K-line at the boundary and publishes it as provisional. If a late tick arrives within the revision window, the K-line is amended. Some systems publish amended K-lines; others do not, creating silent discrepancies.
Timestamp filter: The system only accepts ticks whose exchange timestamp falls within the current or previous interval. Late prints outside the filter window are discarded, accepting a small accuracy loss for clean real-time K-lines.
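
The timestamp filter is the simplest of the three policies to express. A minimal sketch, assuming 1-minute buckets and a clock value (`now_ms`) supplied by the caller; the function name and the one-previous-interval tolerance mirror the description above rather than any particular vendor's rule.

```python
MINUTE_MS = 60_000


def accept_tick(tick_ts: int, now_ms: int, interval_ms: int = MINUTE_MS) -> bool:
    """Timestamp filter: keep ticks stamped in the current or the previous interval."""
    current_start = now_ms - now_ms % interval_ms
    previous_start = current_start - interval_ms
    # Ticks stamped after the current interval (clock skew) are rejected too.
    return previous_start <= tick_ts < current_start + interval_ms


now = 1707849662000  # 18:41:02.000 UTC
print(accept_tick(1707849659500, now))  # True: previous interval (18:40:59.500)
print(accept_tick(1707849599000, now))  # False: two intervals back (18:39:59.000), discarded
```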

TickDB's approach combines elements of the fixed observation window and the timestamp filter, using a configurable buffer window that can be adjusted by market and session phase.

5. OHLCV Calculation in Practice

With the theory established, the implementation follows. The following Python code provides a production-grade 1-minute OHLCV aggregator with proper edge case handling.

import time
import logging
from datetime import datetime, timezone
from threading import Lock
from typing import Optional
from zoneinfo import ZoneInfo  # Python 3.9+; needs system tz data or the tzdata package

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

EASTERN = ZoneInfo("America/New_York")


class OHLCVAggregator:
    """
    Production-grade 1-minute OHLCV aggregator with wall-clock alignment.
    Handles out-of-order ticks, late prints, and session boundary detection.
    """

    def __init__(self, symbol: str, buffer_seconds: float = 2.0):
        self.symbol = symbol
        self.buffer_ms = int(buffer_seconds * 1000)
        self._candles: dict[int, dict] = {}  # minute boundary (ms) -> in-progress OHLCV
        self._lock = Lock()
        # Newest minute boundary whose candle has been (or can no longer be) emitted.
        self._closed_through: Optional[int] = None

    @staticmethod
    def _minute_boundary(timestamp_ms: int) -> int:
        """Align a millisecond timestamp to its 1-minute wall-clock boundary."""
        return (timestamp_ms // 60_000) * 60_000

    @staticmethod
    def _is_session_open(timestamp_ms: int) -> bool:
        """
        Return True if a tick falls within the US equity regular session,
        09:30:00.000 <= t < 16:00:00.000 America/New_York. Converting through
        zoneinfo keeps the check correct across EST/EDT changes. Weekends,
        holidays, and early closes are not handled here.
        """
        et = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc).astimezone(EASTERN)
        session_minutes = et.hour * 60 + et.minute
        return 9 * 60 + 30 <= session_minutes < 16 * 60

    def ingest_tick(self, tick: dict, now_ms: Optional[int] = None) -> list[dict]:
        """
        Bucket a tick by its exchange timestamp, then close every candle whose
        minute plus buffer window has elapsed as of now_ms (defaults to the
        wall clock; pass an explicit value when replaying historical data).

        Returns the candles closed by this call, oldest first (usually empty).
        """
        price = float(tick["price"])
        size = float(tick.get("size", 0))
        timestamp_ms = int(tick["timestamp"])
        if now_ms is None:
            now_ms = int(time.time() * 1000)

        with self._lock:
            if self._is_session_open(timestamp_ms):
                self._apply_tick(price, size, timestamp_ms)
            else:
                logger.debug("Tick at %d outside session, discarded", timestamp_ms)
            return self._close_expired(now_ms)

    def _apply_tick(self, price: float, size: float, timestamp_ms: int) -> None:
        """Merge one tick into its minute's candle. Caller must hold the lock."""
        minute_ts = self._minute_boundary(timestamp_ms)
        if self._closed_through is not None and minute_ts <= self._closed_through:
            # The minute is already closed; drop the print rather than
            # silently starting a second candle for the same minute.
            logger.warning("Late tick at %d for closed minute %d, dropped", timestamp_ms, minute_ts)
            return

        candle = self._candles.get(minute_ts)
        if candle is None:
            self._candles[minute_ts] = {
                "symbol": self.symbol,
                "open_time": minute_ts,           # bucket start
                "open": price,
                "high": price,
                "low": price,
                "close": price,
                "volume": size,
                "first_tick_time": timestamp_ms,  # timestamp of the tick that set the open
                "close_time": timestamp_ms,       # timestamp of the tick that set the close
                "tick_count": 1,
            }
            return

        candle["high"] = max(candle["high"], price)
        candle["low"] = min(candle["low"], price)
        candle["volume"] += size
        candle["tick_count"] += 1
        # Open and close follow exchange timestamps, not arrival order.
        if timestamp_ms < candle["first_tick_time"]:
            candle["open"] = price
            candle["first_tick_time"] = timestamp_ms
        if timestamp_ms >= candle["close_time"]:
            candle["close"] = price
            candle["close_time"] = timestamp_ms

    def _close_expired(self, now_ms: int) -> list[dict]:
        """Pop every candle whose minute plus buffer has elapsed. Caller must hold the lock."""
        # Newest minute M satisfying now_ms >= M + 60 s + buffer.
        expired_through = self._minute_boundary(now_ms - 60_000 - self.buffer_ms)
        if self._closed_through is None or expired_through > self._closed_through:
            self._closed_through = expired_through
        closed = [self._candles.pop(m) for m in sorted(self._candles) if m <= self._closed_through]
        for candle in closed:
            self._log_close(candle)
        return closed

    def flush(self) -> list[dict]:
        """Force-close all open candles, e.g. at the end of a session or a replay."""
        with self._lock:
            closed = [self._candles.pop(m) for m in sorted(self._candles)]
            if closed:
                last = closed[-1]["open_time"]
                if self._closed_through is None or last > self._closed_through:
                    self._closed_through = last
            for candle in closed:
                self._log_close(candle)
            return closed

    def _log_close(self, candle: dict) -> None:
        opened = datetime.fromtimestamp(candle["open_time"] / 1000, tz=timezone.utc)
        logger.info(
            "Candle closed: %s @ %s O:%s H:%s L:%s C:%s V:%s",
            candle["symbol"], opened, candle["open"], candle["high"],
            candle["low"], candle["close"], candle["volume"],
        )

    def get_candle(self, minute_ts: int) -> Optional[dict]:
        """Return a copy of the current (not yet closed) candle for a given minute."""
        with self._lock:
            candle = self._candles.get(minute_ts)
            return dict(candle) if candle is not None else None


# Example: Processing a simulated tick stream
def simulate_tick_stream() -> list[dict]:
    """
    Simulates ingestion of a tick stream. In production, replace this
    with WebSocket subscription to TickDB depth/trades endpoints.
    """
    aggregator = OHLCVAggregator(symbol="AAPL.US", buffer_seconds=2.0)

    # Simulated tick stream (price, size, timestamp_ms); times below are UTC (13:40 ET)
    # In production, fetch via: GET /v1/market/trades or WebSocket subscription
    simulated_ticks = [
        {"price": 142.03, "size": 100, "timestamp": 1707849600000},   # 18:40:00.000
        {"price": 142.05, "size": 200, "timestamp": 1707849600800},   # 18:40:00.800
        {"price": 141.98, "size": 150, "timestamp": 1707849601500},   # 18:40:01.500
        {"price": 142.10, "size": 300, "timestamp": 1707849602200},   # 18:40:02.200
        {"price": 141.87, "size": 500, "timestamp": 1707849605800},   # 18:40:05.800
        # Late print: belongs to the 18:40 bucket but must not overwrite its close
        {"price": 141.95, "size": 100, "timestamp": 1707849600200},   # 18:40:00.200 (arrives late)
        # Next minute
        {"price": 141.90, "size": 200, "timestamp": 1707849660000},   # 18:41:00.000
        {"price": 142.18, "size": 400, "timestamp": 1707849663500},   # 18:41:03.500
        {"price": 141.72, "size": 600, "timestamp": 1707849668000},   # 18:41:08.000
        {"price": 141.85, "size": 300, "timestamp": 1707849693000},   # 18:41:33.000
    ]

    logger.info("Processing %d simulated ticks for %s", len(simulated_ticks), aggregator.symbol)
    replay_clock_ms = 0
    closed_candles: list[dict] = []
    for tick in simulated_ticks:
        # Replay clock: a tick cannot arrive before the newest timestamp already seen.
        replay_clock_ms = max(replay_clock_ms, tick["timestamp"])
        closed_candles += aggregator.ingest_tick(tick, now_ms=replay_clock_ms)
    closed_candles += aggregator.flush()
    # In a real system, closed candles would be emitted to a strategy engine or database
    return closed_candles


if __name__ == "__main__":
    simulate_tick_stream()

Key implementation decisions explained

Timestamp-based bucket assignment: Every tick is bucketed by its exchange timestamp's minute boundary, not its arrival time. This ensures historical reproducibility and correct handling of late prints.

Thread-safe state management: The _lock protects shared state. In a production WebSocket handler receiving concurrent ticks from multiple exchanges, this prevents race conditions on the _candles dictionary.

Buffer-based candle closure: A candle is not closed immediately when the wall clock passes the minute boundary. The aggregator waits for buffer_seconds (configurable, default 2 seconds) to allow late prints to arrive. This is the minimum viable approach for real-time applications; higher accuracy requires a longer buffer at the cost of latency.

Session filtering: The _is_session_open check demonstrates how to restrict aggregation to the regular trading session. Removing this check would include pre-market and after-hours prints in the aggregation, which may or may not be the desired behavior for a given strategy.

6. Common Aggregation Pitfalls

6.1 The Off-by-One Boundary Bug

The most frequent aggregation error in live trading systems is a tick that lands exactly on an interval boundary being assigned to the wrong candle. If an implementation treats the upper boundary as inclusive and assigns a tick at 16:01:00.000 to K-line 16:00 instead of K-line 16:01, both candles are wrong: 16:00 gets a close that belongs to the next minute, and 16:01 loses its true open. In a fast market, the print at 16:01:00.000 may differ from the last print at 16:00:59.999 by a meaningful amount.

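The fix is atomic boundary detection: compute the bucket from the exchange timestamp with a single half-open rule (start <= t < start + interval) before updating any state, and route a late tick to the bucket it belongs to, rejecting it only once that bucket's buffer window has closed, rather than adding it to whichever candle is currently open.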
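
The difference between the two boundary rules comes down to one comparison. A minimal sketch contrasting a hypothetical buggy lookup that treats each minute as (start, start + 1 min] with the half-open [start, start + 1 min) rule used throughout this article:

```python
MINUTE_MS = 60_000


def owning_bucket_buggy(ts: int, bucket_starts: list[int]) -> int:
    """Bug: inclusive upper bound sends a tick exactly on a boundary to the earlier bucket."""
    for start in bucket_starts:
        if start < ts <= start + MINUTE_MS:
            return start
    raise ValueError("no bucket for timestamp")


def owning_bucket(ts: int) -> int:
    """Half-open rule, computed from the exchange timestamp before any state changes."""
    return ts - ts % MINUTE_MS


starts = [1707849600000, 1707849660000]  # 18:40 and 18:41 UTC
boundary_tick = 1707849660000            # 18:41:00.000
print(owning_bucket_buggy(boundary_tick, starts))  # 1707849600000: wrongly the 18:40 candle
print(owning_bucket(boundary_tick))                # 1707849660000: the 18:41 candle
```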

6.2 Volume Unit Mismatch

Some exchanges report volume in shares. Others report volume in round lots (100-share units). Futures contracts report volume in contracts. Crypto exchanges may report volume in the quote currency (USD) rather than the base currency (BTC).

Aggregating across venues without normalizing the volume unit produces nonsense. Before any cross-venue analysis, verify the volume unit for each data source and normalize to a consistent representation.
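
A minimal sketch of normalizing volume before aggregation, assuming a per-source unit table you maintain yourself. The source names and unit labels are hypothetical placeholders; each real source's documentation is the authority on its unit.

```python
# Hypothetical sources and the unit each reports volume in.
VOLUME_UNITS = {
    "venue_a": "shares",
    "venue_b": "round_lots",  # 100-share units
    "crypto_x": "quote_ccy",  # notional in quote currency, e.g. USD
}


def volume_in_base_units(source: str, volume: float, price: float) -> float:
    """Convert a reported volume to base units (shares or coins)."""
    unit = VOLUME_UNITS.get(source)
    if unit == "shares":
        return volume
    if unit == "round_lots":
        return volume * 100
    if unit == "quote_ccy":
        # Exact per trade; only approximate if applied to a whole bar with an average price.
        return volume / price
    raise ValueError(f"unknown volume unit for source {source!r}")


print(volume_in_base_units("venue_b", 12, 142.0))        # 1200
print(volume_in_base_units("crypto_x", 50_000, 25_000))  # 2.0
```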

6.3 Auction Inclusion

US equity exchanges conduct an opening auction (09:30 ET) and a closing auction (16:00 ET) with substantial volume. Under wall-clock aggregation with no session awareness, a tick at exactly 16:00:00.000 could be included in the final regular-session K-line or the first after-hours K-line depending on the implementation.

Many strategies explicitly exclude auction prints from their analysis because auction prices are set by a different mechanism (single-price clearing) than continuous trading (order book matching). Ensure your aggregation logic's session awareness matches your strategy's assumptions.
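
If a strategy excludes auction prints, they can be filtered before they reach the aggregator. A minimal sketch, assuming the feed tags each trade with a `condition` list as in the tick schema in section 1; the labels `opening_auction` and `closing_auction` are hypothetical stand-ins for whatever sale-condition codes your provider actually uses.

```python
# Hypothetical condition labels; map these to your provider's real sale-condition codes.
AUCTION_CONDITIONS = {"opening_auction", "closing_auction"}


def is_auction_print(tick: dict) -> bool:
    """True if any of the tick's condition labels marks an auction print."""
    return not AUCTION_CONDITIONS.isdisjoint(tick.get("condition", ()))


def continuous_only(ticks: list[dict]) -> list[dict]:
    """Drop auction prints so candles reflect continuous trading only."""
    return [t for t in ticks if not is_auction_print(t)]


ticks = [
    {"price": 141.80, "size": 300, "condition": ["regular"]},
    {"price": 141.95, "size": 250_000, "condition": ["closing_auction"]},
]
print(len(continuous_only(ticks)))  # 1
```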

6.4 Historical vs. Real-Time Schema Mismatch

Historical OHLCV data from different vendors may use different aggregation rules for the same raw tick stream. Vendor A may apply a 1-second buffer; Vendor B may apply a 5-second buffer. Vendor C may fold pre-market prints or the opening auction print into the first regular-session K-line; Vendor D may exclude them.

If you backtest with Vendor A's historical data and trade live with Vendor B's real-time feed, expect the live K-lines to differ from the backtested K-lines — not because the market changed, but because the aggregation pipeline changed.
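
Before trusting a backtest, it helps to measure how far two sources disagree on the same minutes. A minimal sketch, assuming both sources have already been loaded into dicts keyed by candle start time (epoch ms) with `open`/`high`/`low`/`close`/`volume` fields; the default tolerances are arbitrary placeholders.

```python
FIELDS = ("open", "high", "low", "close", "volume")


def diff_candles(a: dict[int, dict], b: dict[int, dict],
                 price_tol: float = 1e-9, volume_tol: float = 0.0) -> list[tuple]:
    """List (minute, field, value_a, value_b) for each mismatch, plus minutes missing on one side."""
    mismatches = []
    for minute in sorted(a.keys() | b.keys()):
        if minute not in a or minute not in b:
            mismatches.append((minute, "missing", minute in a, minute in b))
            continue
        for field in FIELDS:
            tol = volume_tol if field == "volume" else price_tol
            if abs(a[minute][field] - b[minute][field]) > tol:
                mismatches.append((minute, field, a[minute][field], b[minute][field]))
    return mismatches


t0 = 1707849600000
vendor_a = {t0: {"open": 142.03, "high": 142.10, "low": 141.87, "close": 141.87, "volume": 1350}}
# Same minute, but this source dropped a 100-share late print and emits a
# carry-forward candle for the following quiet minute.
vendor_b = {
    t0: {"open": 142.03, "high": 142.10, "low": 141.87, "close": 141.87, "volume": 1250},
    t0 + 60_000: {"open": 141.87, "high": 141.87, "low": 141.87, "close": 141.87, "volume": 0},
}
for row in diff_candles(vendor_a, vendor_b):
    print(row)
```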

7. Comparing Aggregation in TickDB vs. Alternatives

Understanding TickDB's aggregation behavior requires comparing it against the alternatives quant developers commonly encounter.

Dimension	TickDB (kline endpoint)	Generic WebSocket feed	Custom aggregation on raw trades
Aggregation method	Wall-clock, session-aware	Depends on provider	Fully configurable
Buffer behavior	Configurable per market	Provider-defined	Developer-defined
Historical alignment	Consistent with live	May differ from historical	Must be manually aligned
Pre/post-market handling	Separated by session	Varies by provider	Configurable
Volume normalization	Normalized to exchange-reported units	Varies	Configurable
Latency (live)	< 100 ms push via WebSocket	Provider-dependent	Dependent on tick ingestion speed

TickDB's kline endpoint provides pre-aggregated 1-minute (and higher interval) candles that use a consistent wall-clock methodology across both historical queries and real-time WebSocket streams. The kline/latest endpoint provides the current in-progress candle with updates as ticks arrive. This eliminates the most common class of backtest-live divergence: historical and real-time aggregation rule inconsistency.

For strategies that require custom aggregation rules — tick-volume bars, tick-count bars, or custom session definitions — consuming raw trade data via the trades endpoint and applying a custom aggregator as demonstrated above provides full control. The tradeoff is implementation complexity and the need to handle the edge cases (late prints, out-of-order ticks, session boundaries) that pre-aggregated endpoints abstract away.
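
As an example of such a custom rule, here is a minimal sketch of volume bars, which close once a fixed number of shares has traded. It assumes ticks in timestamp order and, as a simplifying choice, does not split a trade that crosses the threshold: the whole print goes into the bar it completes, so bars can overshoot the target.

```python
def volume_bars(ticks: list[dict], shares_per_bar: float) -> list[dict]:
    """Emit a bar each time cumulative traded size reaches shares_per_bar."""
    bars, current = [], None
    for t in ticks:
        if current is None:
            current = {"open": t["price"], "high": t["price"], "low": t["price"],
                       "close": t["price"], "volume": 0, "start_ts": t["timestamp"]}
        current["high"] = max(current["high"], t["price"])
        current["low"] = min(current["low"], t["price"])
        current["close"] = t["price"]
        current["volume"] += t["size"]
        current["end_ts"] = t["timestamp"]
        if current["volume"] >= shares_per_bar:
            bars.append(current)  # complete bar (may overshoot the target)
            current = None
    return bars  # a trailing partial bar is still forming and is not returned


trades = [(142.0, 300), (142.1, 500), (141.9, 400), (141.8, 900), (142.2, 100)]
stream = [{"price": p, "size": s, "timestamp": i} for i, (p, s) in enumerate(trades)]
print([bar["volume"] for bar in volume_bars(stream, 1000)])  # [1200, 1000]
```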

8. Choosing the Right Aggregation for Your Strategy

No single aggregation method is universally correct. The right choice depends on what your strategy is trying to capture.

Strategy type	Recommended aggregation	Rationale
Trend following (EMA crossover)	1–5 min wall-clock	Alignment with institutional order flow timing
Mean reversion (Bollinger Bands)	1–15 min wall-clock	Consistency with support/resistance levels
Statistical arbitrage	Tick-count bars	Uniform information content per bar
Auction-related (open/close patterns)	Session-based	Economically distinct market phases
Volume-weighted strategies	Volume bars	Normalizes for varying activity levels
News event response	Sub-second aggregation	Captures initial price discovery

For most systematic equity strategies, wall-clock 1-minute aggregation strikes the right balance between capturing intraday microstructure and maintaining reproducibility. If your strategy's signals are sensitive to bar boundaries — and most are — run a sensitivity analysis: test the same logic on 30-second, 1-minute, and 5-minute aggregations. A strategy that only works on 1-minute bars but fails on 30-second or 5-minute bars is likely overfitting to a specific aggregation artifact rather than capturing a genuine market inefficiency.
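
For the longer horizons, one inexpensive way to run that sensitivity check is to roll 1-minute candles up into coarser wall-clock candles, applying the same OHLCV rules as the table in section 2. A minimal sketch, assuming candle dicts with an `open_time` in epoch ms; the 30-second variant cannot be derived this way and has to be rebuilt from ticks.

```python
def resample(candles: list[dict], interval_ms: int) -> list[dict]:
    """Combine finer candles into wall-clock buckets of interval_ms."""
    out: dict[int, dict] = {}
    for c in sorted(candles, key=lambda candle: candle["open_time"]):
        start = c["open_time"] - c["open_time"] % interval_ms
        agg = out.get(start)
        if agg is None:
            out[start] = {"open_time": start, "open": c["open"], "high": c["high"],
                          "low": c["low"], "close": c["close"], "volume": c["volume"]}
        else:
            agg["high"] = max(agg["high"], c["high"])
            agg["low"] = min(agg["low"], c["low"])
            agg["close"] = c["close"]  # candles are visited in time order
            agg["volume"] += c["volume"]
    return [out[k] for k in sorted(out)]


one_minute = [
    {"open_time": 1707849660000, "open": 141.90, "high": 142.18, "low": 141.72,
     "close": 141.85, "volume": 1500},
    {"open_time": 1707849600000, "open": 142.03, "high": 142.10, "low": 141.87,
     "close": 141.87, "volume": 1350},
]
print(resample(one_minute, 5 * 60_000))
```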

9. Next Steps

Aggregation methodology is foundational — get it wrong and every strategy built on top of it carries the error forward. Get it right and you have a solid, reproducible data pipeline that aligns with your backtesting assumptions.

If you are building a backtesting pipeline, use TickDB's GET /v1/market/kline endpoint for historical data. Verify that the aggregation methodology matches your live-trading expectations before running extended simulations.

If you are building a real-time strategy engine, connect to TickDB's WebSocket stream for live candles. The stream provides the same wall-clock methodology as the historical endpoint, eliminating the backtest-live discrepancy that plagues most production quant systems.

If you need custom aggregation logic, consume raw trades via the trades endpoint and apply an aggregator similar to the OHLCVAggregator class above. Ensure your buffer window, session awareness, and late-print handling are explicitly documented in your system — these decisions have strategy-level implications that are easy to forget six months later.

If you use AI coding assistants, search for and install the tickdb-market-data SKILL in your AI tool's marketplace. It provides ready-to-use code templates for all TickDB endpoints, including the aggregation patterns described in this article.

This article does not constitute investment advice. Markets involve risk; past performance does not guarantee future results.