Every financial transaction begins as a discrete event: a buyer and a seller agree on a price at a specific moment. This atomic unit of market data, the tick, is the ground truth of any market. It records what traded, at what price, at what time, and for how many shares. It is precise, immutable, and voluminous. A single actively traded stock on the NYSE can generate thousands of ticks per second. By the end of a trading day, the raw tick stream for a liquid security might contain hundreds of megabytes of data.
But humans—and most trading algorithms—cannot consume that stream directly. The solution is aggregation: collapsing the continuous firehose of individual trades into discrete, interpretable units called candlesticks, or K-lines. This transformation from tick to candle is deceptively complex. It is not merely "collect ticks, then output OHLC." Boundary conditions, aggregation rules, and data source inconsistencies introduce subtle but consequential distortions that can silently corrupt a backtest, mislead a strategy, or cause a monitoring dashboard to display phantom signals.
Understanding how tick data becomes K-lines is not optional for serious market participants. It is foundational.
The Anatomy of a Tick
Before examining the aggregation process, we must establish what a tick actually contains. In its most complete form, a market data tick carries the following fields:
| Field | Description | Example |
|---|---|---|
| `timestamp` | Trade time in UTC, as Unix nanoseconds or milliseconds | 1709740823000000000 |
| `symbol` | Security identifier | AAPL.US |
| `price` | Execution price | 182.53 |
| `volume` | Number of shares traded | 100 |
| `side` | Taker side: buy or sell | buy |
| `conditions` | Exchange-specific trade modifiers | @ or @1 |
| `exchange` | Venue of execution | NASDAQ |
Not all data sources provide all fields. The absence of side information—a common limitation in low-cost market data feeds—fundamentally limits what analysis is possible downstream. TickDB's trades endpoint, for example, provides full side information for HK equities and crypto assets, making it suitable for order-flow analysis. For US equities, however, the trades endpoint is not available, meaning that applications requiring tick-level buy/sell classification must source this data elsewhere.
The Anatomy of a Candle
A candlestick (K-line) aggregates one or more ticks into four summary statistics:
- Open: The price of the first trade in the interval.
- High: The highest price observed during the interval.
- Low: The lowest price observed during the interval.
- Close: The price of the last trade in the interval.
A complete K-line also carries:
- `timestamp`: The start of the interval (e.g., `2024-03-06T09:30:00-05:00` for the 09:30 bar on a 1-minute chart).
- `volume`: Total shares traded during the interval.
- `turnover`: Total dollar value traded (price × volume, summed).
The interval itself is defined by two variables: its duration (1 second, 1 minute, 1 hour, 1 day) and its alignment rule (when the interval starts and ends). It is the alignment rule that introduces most of the complexity in tick-to-candle conversion.
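As a minimal sketch of the definitions above, the snippet below collapses the trades of a single interval into one OHLCV record. The `Trade` tuple and the `summarize` function are hypothetical names chosen for illustration, and the sample prices are made up.

```python
from typing import NamedTuple


class Trade(NamedTuple):
    ts_ms: int      # trade time, Unix milliseconds
    price: float
    volume: float


def summarize(trades: list[Trade], bar_start_ms: int) -> dict:
    """Collapse the trades of ONE interval into an OHLCV record."""
    if not trades:
        raise ValueError("cannot build a candle from zero trades")
    # Open and close depend on time order, so sort before reading them
    ordered = sorted(trades, key=lambda t: t.ts_ms)
    prices = [t.price for t in ordered]
    return {
        "timestamp": bar_start_ms,
        "open": prices[0],
        "high": max(prices),
        "low": min(prices),
        "close": prices[-1],
        "volume": sum(t.volume for t in ordered),
        "turnover": sum(t.price * t.volume for t in ordered),
    }


# Four illustrative trades inside the 09:30 ET bar of 2024-03-06
trades = [
    Trade(1_709_735_400_000, 182.50, 100),
    Trade(1_709_735_410_000, 182.75, 50),
    Trade(1_709_735_430_000, 182.25, 200),
    Trade(1_709_735_455_000, 182.50, 100),
]
candle = summarize(trades, bar_start_ms=1_709_735_400_000)
```

Note that the function receives trades that already belong to one interval; deciding which interval a trade belongs to is the alignment problem.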
Three Aggregation Rules: Time, Volume, and Tick Count
The market data industry has converged on three primary aggregation methodologies. Each produces materially different candles from the same underlying tick stream.
Rule 1: Time-Based Aggregation (The Standard)
Time-based aggregation divides the trading day into fixed-duration intervals anchored to the clock. For US equities, the standard convention aligns to the session open: a 1-minute bar starting at 09:30:00.000 ET contains all ticks with timestamps ≥ 09:30:00.000 and < 09:31:00.000.
This method is intuitive, universal, and directly comparable across securities. It is the format used by nearly all charting software, exchanges, and data vendors. TickDB's kline endpoint returns time-based OHLCV data by default.
The critical nuance is timezone and anchor. Not all systems use the same reference point:
| System | Anchor | Notes |
|---|---|---|
| NYSE / US equities | 09:30 ET session open | Aligned to market open |
| Crypto (UTC) | 00:00 UTC | Aligned to calendar day |
| Crypto (exchange-specific) | Exchange local time | Some venues anchor daily bars to a local offset such as UTC+8; check each venue's documentation |
| Futures | Exchange-defined session | May include pre-market / post-market |
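Using candles from two systems with different anchors will produce apparently inconsistent OHLC values at the boundaries. The effect appears whenever the offset between the anchors is not a whole multiple of the interval. A 1-hour candle from TickDB anchored to the 09:30 ET session open covers 09:30 to 10:30 ET, while a 1-hour candle from an exchange API anchored to 00:00 UTC covers 14:00 to 15:00 UTC (09:00 to 10:00 ET in winter). The two bars never summarize the same price action, even though both are "hourly".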
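To make the anchor difference concrete, the sketch below assigns the same trade to a 1-hour bar under two anchors. The helper names are hypothetical, and the fixed UTC-5 offset is a simplifying assumption that ignores daylight saving time; real code would use a proper timezone database.

```python
from datetime import datetime, timedelta, timezone

HOUR_MS = 3_600_000
EST = timezone(timedelta(hours=-5))  # assumption: fixed offset, no DST handling


def to_ms(dt: datetime) -> int:
    """Convert an aware datetime to Unix milliseconds."""
    return int(dt.timestamp() * 1000)


def utc_anchored_start(ts_ms: int, interval_ms: int) -> int:
    """Bar start when the grid is anchored to 00:00 UTC (epoch floor)."""
    return ts_ms - ts_ms % interval_ms


def session_anchored_start(ts_ms: int, interval_ms: int, session_open_ms: int) -> int:
    """Bar start when the grid is anchored to the session open."""
    bars_elapsed = (ts_ms - session_open_ms) // interval_ms
    return session_open_ms + bars_elapsed * interval_ms


session_open = to_ms(datetime(2024, 3, 6, 9, 30, tzinfo=EST))
trade = to_ms(datetime(2024, 3, 6, 10, 15, tzinfo=EST))

utc_start = utc_anchored_start(trade, HOUR_MS)                         # 10:00 ET bar
session_start = session_anchored_start(trade, HOUR_MS, session_open)  # 09:30 ET bar
```

The same 10:15 trade lands in a bar starting at 10:00 ET under the UTC anchor and in a bar starting at 09:30 ET under the session anchor, so the two candles overlap only partially.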
Rule 2: Volume-Based Aggregation
Volume-based aggregation closes a candle after a predefined number of shares have traded, regardless of elapsed time. A 5,000-share volume bar for Apple might complete in 2 milliseconds during a news event or take 45 minutes during a quiet afternoon.
Volume bars are popular in liquidity-detection strategies and in markets with highly variable tick density. They reveal effort versus result: a candle with 10,000 shares traded at a 0.1% price move signals different dynamics than 10,000 shares at a 2% move.
The challenge with volume bars is asynchronicity. At any given moment, the "current" volume bar is incomplete. A backtester that uses closed volume bars from one source and live volume bars from another will systematically misestimate liquidity. Production systems must track in-progress volume bars separately from closed bars and apply consistent logic across historical and live queries.
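One possible way to keep the in-progress bar separate from closed bars is sketched below. `VolumeBarBuilder` is a hypothetical class, and the sketch makes one simplifying assumption worth stating: a trade that overshoots the threshold is kept whole in the bar it completes rather than being split across two bars.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VolumeBar:
    open: float
    high: float
    low: float
    close: float
    volume: float = 0.0
    is_closed: bool = False


class VolumeBarBuilder:
    """Closes a bar once cumulative traded volume reaches `threshold`."""

    def __init__(self, threshold: float):
        self.threshold = threshold
        self.current: Optional[VolumeBar] = None  # the in-progress bar, if any

    def ingest(self, price: float, volume: float) -> Optional[VolumeBar]:
        """Add one trade; return the bar it completed, or None."""
        bar = self.current
        if bar is None:
            bar = self.current = VolumeBar(price, price, price, price)
        else:
            bar.high = max(bar.high, price)
            bar.low = min(bar.low, price)
            bar.close = price
        bar.volume += volume
        if bar.volume >= self.threshold:
            bar.is_closed = True
            self.current = None  # the next trade opens a fresh bar
            return bar
        return None


builder = VolumeBarBuilder(threshold=5_000)
closed = []
for price, volume in [(100.0, 2_000), (100.5, 2_500), (99.5, 1_000), (100.2, 300)]:
    finished = builder.ingest(price, volume)
    if finished is not None:
        closed.append(finished)
```

After these four trades, `closed` holds one finished bar and `builder.current` holds the partial bar opened by the last trade. Using the same builder for historical replay and live ingestion is what keeps the two consistent.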
Rule 3: Tick-Count-Based Aggregation (Tick Bars)
Tick bars close after a fixed number of individual trades regardless of price or volume. A 100-tick bar for a liquid stock captures approximately consistent information density, filtering out the microstructure noise that obscures price discovery in time bars during low-activity periods.
The analytical advantage of tick bars is stationarity. The number of information-bearing events per bar is roughly constant, which makes certain statistical properties (autocorrelation, variance) more stable than in time bars where activity varies by orders of magnitude across the trading day.
The disadvantage is operational complexity. You cannot know the close of a tick bar until the Nth tick arrives. This makes real-time dashboarding difficult and introduces look-ahead bias in backtesting if the bar-close logic is not strictly enforced.
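A minimal sketch of strict bar-close logic for tick bars follows. The generator name `tick_bars` is hypothetical. It emits a bar only once the Nth trade has arrived and never emits a partial trailing bar, which is one way to avoid leaking unfinished bars into a backtest.

```python
from typing import Iterable, Iterator


def tick_bars(prices: Iterable[float], ticks_per_bar: int) -> Iterator[dict]:
    """Yield an OHLC dict each time exactly `ticks_per_bar` trades accumulate."""
    if ticks_per_bar < 1:
        raise ValueError("ticks_per_bar must be at least 1")
    bucket: list[float] = []
    for price in prices:
        bucket.append(price)
        if len(bucket) == ticks_per_bar:
            yield {
                "open": bucket[0],
                "high": max(bucket),
                "low": min(bucket),
                "close": bucket[-1],
                "tick_count": len(bucket),
            }
            bucket = []
    # Trades left in `bucket` never filled a bar and are deliberately not emitted


prices = [10.0, 10.2, 10.1, 9.9, 10.3, 10.4, 10.0]
bars = list(tick_bars(prices, ticks_per_bar=3))
```

Seven trades with three ticks per bar produce two bars; the seventh trade stays unreported until two more trades arrive.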
The Boundary Problem: Where Does One Bar End and Another Begin?
Boundary alignment is where aggregation logic becomes genuinely tricky. There are three philosophical approaches to the start-of-bar question.
Approach A: Absolute Wall Clock Alignment
Bars begin and end at fixed wall-clock timestamps. 09:30:00.000 to 09:31:00.000. 14:00:00.000 to 15:00:00.000.
This is the simplest approach and matches the intuition of most traders. It is also the approach that most commonly breaks during irregular sessions.
Consider a stock that halts trading at 10:00 AM due to a news event. Every time-aligned bar that falls inside the halt will contain zero ticks: an empty candle. The first bar after trading resumes will contain all the accumulated order flow, which may have moved the price significantly. An algorithm that naively processes empty bars as "no price change" will underestimate realized volatility around halts.
Similarly, early closes (e.g., a 1:00 PM close for US equities before a holiday) create partial bars at the end of the session. A 2-hour chart will contain a 12:00 to 14:00 bar that holds only one hour of trading, not two. Interpreting its reduced volume as a "quiet 2-hour period" would be a mistake.
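One hedged way to handle empty intervals is to materialize them explicitly and flag them, so downstream code can decide whether to skip them rather than silently treating them as "no change". The function name `fill_gaps` and the `is_empty` flag are hypothetical choices for this sketch.

```python
def fill_gaps(bars: list[dict], interval_ms: int) -> list[dict]:
    """Insert flagged placeholder bars for intervals that had no trades.

    Assumes `bars` is sorted by timestamp and start-labeled.
    """
    out: list[dict] = []
    for bar in bars:
        if out:
            expected = out[-1]["timestamp"] + interval_ms
            while expected < bar["timestamp"]:
                prev_close = out[-1]["close"]
                out.append({
                    "timestamp": expected,
                    "open": prev_close,
                    "high": prev_close,
                    "low": prev_close,
                    "close": prev_close,
                    "volume": 0.0,
                    "is_empty": True,   # no trades: a halt or an illiquid minute
                })
                expected += interval_ms
        out.append({**bar, "is_empty": False})
    return out


bars = [
    {"timestamp": 0, "open": 100.0, "high": 100.5, "low": 99.8, "close": 100.0, "volume": 500.0},
    # two empty minutes (e.g., a halt), then trading resumes lower
    {"timestamp": 180_000, "open": 97.0, "high": 97.5, "low": 96.0, "close": 96.5, "volume": 4_000.0},
]
filled = fill_gaps(bars, interval_ms=60_000)
```

A volatility estimate can then exclude rows where `is_empty` is true, or treat the jump from 100.0 to 97.0 as a single move across the halt instead of three separate one-minute returns.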
Approach B: Session-Normalized Alignment
Bars are aligned relative to the session open. Bar 1 of the day always starts at session_open. Bar N ends at session_open + N × interval_duration.
This approach handles early closes cleanly by truncating the final bar. It also creates meaningful cross-security comparisons: bar 1 for Apple and bar 1 for Microsoft both represent the first interval after the open, regardless of their respective open times (which are identical for US equities, but differ for global markets).
The complexity emerges when calculating bars across sessions. A daily bar for a security trading in both Tokyo and London sessions cannot use a single daily timestamp without defining which session is being aggregated. The convention typically defaults to the primary session, but this is a business decision embedded in technical infrastructure.
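The session-relative rule can be sketched as a small function that returns the bar number and its boundaries, truncating the final bar at the session close. The name `session_bar` is hypothetical, and for readability the example expresses times as milliseconds since local midnight; epoch milliseconds work the same way.

```python
from typing import Optional

HOUR_MS = 3_600_000


def session_bar(
    ts_ms: int, session_open_ms: int, session_close_ms: int, interval_ms: int
) -> Optional[tuple[int, int, int]]:
    """Return (bar_number, bar_start_ms, bar_end_ms), or None outside the session.

    Bar numbers start at 1. The last bar is truncated at the session close.
    """
    if not session_open_ms <= ts_ms < session_close_ms:
        return None
    index = (ts_ms - session_open_ms) // interval_ms
    start = session_open_ms + index * interval_ms
    end = min(start + interval_ms, session_close_ms)
    return index + 1, start, end


# Early-close day: session runs 09:30 to 13:00, with 1-hour bars
open_ms = 9 * HOUR_MS + 30 * 60_000    # 09:30
close_ms = 13 * HOUR_MS                # 13:00
trade_ms = 12 * HOUR_MS + 45 * 60_000  # 12:45

result = session_bar(trade_ms, open_ms, close_ms, HOUR_MS)
```

Here the 12:45 trade falls in bar 4, which runs from 12:30 to 13:00 and is therefore only 30 minutes wide. Carrying the actual end time alongside the bar lets downstream code normalize volume by bar width.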
Approach C: Last-Close Alignment (Candle Starts on Close)
A more esoteric but analytically important approach defines a bar's timestamp as its close time, not its start time. The bar labeled "09:30" contains ticks from 09:29:00 up to, but not including, 09:30:00.
This is the convention used by some charting platforms (notably certain interpretations of Japanese candlestick charts) and by some futures data providers. It ensures that the bar's OHLC values are definitively known by the time the bar is "displayed"—there is no in-progress bar whose high or low might be exceeded before the interval closes.
The cost is cognitive friction: the bar's timestamp does not correspond to the period it summarizes. This introduces subtle bugs in time-series analysis when practitioners merge candles from sources with different conventions.
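Before merging candles from sources that label bars differently, it helps to normalize every source to one convention. The sketch below shifts close-labeled bars to start labels; `to_start_label` and the `"start"`/`"close"` convention strings are hypothetical names, and timestamps are shown as milliseconds since midnight for readability.

```python
def to_start_label(bars: list[dict], interval_ms: int, convention: str) -> list[dict]:
    """Return copies of `bars` whose timestamp is the interval START.

    `convention` states how the source labels its bars: "start" or "close".
    """
    if convention == "start":
        shift = 0
    elif convention == "close":
        shift = interval_ms
    else:
        raise ValueError(f"unknown labeling convention: {convention!r}")
    return [{**bar, "timestamp": bar["timestamp"] - shift} for bar in bars]


MINUTE_MS = 60_000
close_labeled = [{"timestamp": 34_200_000, "close": 182.5}]  # labeled 09:30
start_labeled = [{"timestamp": 34_140_000, "close": 182.5}]  # labeled 09:29

normalized = to_start_label(close_labeled, MINUTE_MS, "close")
```

After normalization the bar labeled "09:30" by the close-stamping source and the bar labeled "09:29" by the start-stamping source carry the same timestamp, so a join on timestamp pairs the right rows.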
Production-Grade Tick-to-Kline Code
The following implementation demonstrates a clean, resilient tick-to-Kline aggregation engine. It handles time-based aggregation with configurable alignment, manages the boundary conditions discussed above, and includes proper error handling for production deployment.
import json
import logging
import os
import time
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Optional
from zoneinfo import ZoneInfo

import requests

# ============================================================
# TickDB Market Data: Tick-to-Kline Aggregation Engine
# ⚠️ For production HFT workloads, migrate to asyncio / aiohttp
# ============================================================

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
)
logger = logging.getLogger("tick_kline_aggregator")


@dataclass
class Tick:
    """Canonical tick representation."""

    timestamp: int  # Unix nanoseconds
    symbol: str
    price: float
    volume: float
    side: Optional[str] = None  # "buy" or "sell"; None if unavailable
    conditions: Optional[str] = None


@dataclass
class Candle:
    """Canonical OHLCV candle representation."""

    timestamp: int  # Bar label in Unix ms (interval start; close for LAST_CLOSE)
    open_price: float
    high_price: float
    low_price: float
    close_price: float
    volume: float
    turnover: float
    tick_count: int
    is_closed: bool = False

    def to_dict(self) -> dict:
        return {
            "timestamp": self.timestamp,
            "open": self.open_price,
            "high": self.high_price,
            "low": self.low_price,
            "close": self.close_price,
            "volume": self.volume,
            "turnover": self.turnover,
            "tick_count": self.tick_count,
            "is_closed": self.is_closed,
        }


class KlineAggregator:
    """
    Time-based OHLCV aggregator with configurable interval and alignment.

    Supports three alignment modes:
    - WALL: bars aligned to absolute wall-clock time (default)
    - SESSION: bars aligned relative to session open
    - LAST_CLOSE: bar timestamp represents interval close
    """

    ALIGN_WALL = "wall"
    ALIGN_SESSION = "session"
    ALIGN_LAST_CLOSE = "last_close"

    def __init__(
        self,
        interval_seconds: int = 60,
        alignment: str = ALIGN_WALL,
        session_start_hour: int = 9,
        session_start_minute: int = 30,
        timezone_str: str = "America/New_York",
    ):
        if interval_seconds <= 0:
            raise ValueError("interval_seconds must be positive")
        self.interval_ms = interval_seconds * 1000
        self.alignment = alignment
        self.session_start_hour = session_start_hour
        self.session_start_minute = session_start_minute
        self.timezone_str = timezone_str
        self._tz = ZoneInfo(timezone_str)
        # Per-symbol state: symbol -> current open candle
        self._candles: dict[str, Candle] = {}

    def _get_bar_timestamp(self, tick_ts_ms: int) -> int:
        """Compute the bar label (Unix ms) for a given tick timestamp."""
        if self.alignment == self.ALIGN_WALL:
            # Floor to the interval grid measured from the Unix epoch (UTC)
            return (tick_ts_ms // self.interval_ms) * self.interval_ms
        if self.alignment == self.ALIGN_SESSION:
            # Anchor the grid to the session open on the tick's local date
            local = datetime.fromtimestamp(tick_ts_ms / 1000, tz=self._tz)
            session_open = local.replace(
                hour=self.session_start_hour,
                minute=self.session_start_minute,
                second=0,
                microsecond=0,
            )
            open_ms = round(session_open.timestamp() * 1000)
            bars_elapsed = (tick_ts_ms - open_ms) // self.interval_ms
            return open_ms + bars_elapsed * self.interval_ms
        if self.alignment == self.ALIGN_LAST_CLOSE:
            # Label the bar with its close: the next boundary after the tick
            raw_start = (tick_ts_ms // self.interval_ms) * self.interval_ms
            return raw_start + self.interval_ms
        raise ValueError(f"Unknown alignment mode: {self.alignment}")

    def ingest(self, tick: Tick) -> Optional[Candle]:
        """
        Process a single tick. Returns a closed Candle if a boundary
        was crossed, otherwise None.

        A bar is only emitted when the first tick of a later bar arrives,
        so call flush_all() at end of session to close the final bar.
        """
        tick_ts_ms = tick.timestamp // 1_000_000  # ns -> ms
        bar_ts = self._get_bar_timestamp(tick_ts_ms)
        symbol = tick.symbol
        current = self._candles.get(symbol)

        # Late tick for a bar that was already superseded: drop it rather
        # than reopening an older bar and corrupting the sequence
        if current is not None and bar_ts < current.timestamp:
            logger.warning(
                f"Dropping out-of-order tick for {symbol} at {tick_ts_ms}"
            )
            return None

        # New bar
        if current is None or current.timestamp != bar_ts:
            # Close and emit the previous bar if one exists
            closed = None
            if current is not None:
                current.is_closed = True
                closed = current
                logger.debug(
                    f"Closed bar for {symbol} at {current.timestamp}: "
                    f"O={current.open_price:.4f} H={current.high_price:.4f} "
                    f"L={current.low_price:.4f} C={current.close_price:.4f}"
                )
            self._candles[symbol] = Candle(
                timestamp=bar_ts,
                open_price=tick.price,
                high_price=tick.price,
                low_price=tick.price,
                close_price=tick.price,
                volume=tick.volume,
                turnover=tick.price * tick.volume,
                tick_count=1,
                is_closed=False,
            )
            return closed

        # Update existing bar
        current.high_price = max(current.high_price, tick.price)
        current.low_price = min(current.low_price, tick.price)
        current.close_price = tick.price
        current.volume += tick.volume
        current.turnover += tick.price * tick.volume
        current.tick_count += 1
        return None

    def get_current(self, symbol: str) -> Optional[Candle]:
        """Return the in-progress (unclosed) candle for a symbol."""
        return self._candles.get(symbol)

    def flush_all(self) -> list[Candle]:
        """Force-close all open candles (e.g., end of session)."""
        closed = []
        for symbol, candle in self._candles.items():
            candle.is_closed = True
            closed.append(candle)
            logger.info(
                f"Flushed bar for {symbol} at {candle.timestamp}: "
                f"O={candle.open_price:.4f} C={candle.close_price:.4f} "
                f"ticks={candle.tick_count}"
            )
        self._candles.clear()
        return closed


# ============================================================
# TickDB REST Client for Historical Kline Data
# ============================================================

class TickDBKlineClient:
    """
    Fetches historical OHLCV (kline) data from TickDB's REST API.

    The /kline endpoint returns time-aligned candles suitable for
    backtesting. Use /kline/latest for live dashboards.

    Auth: Header-based. Load key from TICKDB_API_KEY env var.
    """

    BASE_URL = "https://api.tickdb.ai/v1"

    def __init__(self, api_key: Optional[str] = None):
        self.api_key = api_key or os.environ.get("TICKDB_API_KEY")
        if not self.api_key:
            raise EnvironmentError(
                "TickDB API key not set. Set TICKDB_API_KEY env var."
            )
        self.session = requests.Session()
        self.session.headers.update({"X-API-Key": self.api_key})
        self._rate_limit_backoff = 1.0

    def _request(
        self,
        method: str,
        path: str,
        params: Optional[dict] = None,
        timeout: tuple[float, float] = (3.05, 10),
        max_retries: int = 5,
    ) -> Any:
        """Execute an HTTP request with rate-limit handling and timeout."""
        url = f"{self.BASE_URL}{path}"
        # Bounded retry loop (replaces unbounded recursion on rate limits)
        for _attempt in range(max_retries + 1):
            try:
                response = self.session.request(
                    method, url, params=params, timeout=timeout
                )
                data = response.json()
            except requests.exceptions.Timeout as e:
                raise TimeoutError(
                    f"Request to {url} timed out after {timeout}"
                ) from e
            except requests.exceptions.RequestException as e:
                raise RuntimeError(f"Request failed: {e}") from e

            code = data.get("code", 0)
            if code == 0:
                self._rate_limit_backoff = 1.0  # Reset on success
                return data.get("data", {})
            if code == 3001:
                # Rate limit exceeded: respect the Retry-After header if present
                retry_after = float(
                    response.headers.get("Retry-After", self._rate_limit_backoff)
                )
                logger.warning(
                    f"Rate limit hit (3001). Retrying after {retry_after}s"
                )
                time.sleep(retry_after)
                self._rate_limit_backoff = min(
                    self._rate_limit_backoff * 2, 60
                )  # Cap at 60s
                continue
            if code in (1001, 1002):
                raise ValueError(
                    "Invalid API key: check TICKDB_API_KEY env var"
                )
            if code == 2002:
                raise KeyError(
                    "Symbol not found: verify via /v1/symbols/available"
                )
            raise RuntimeError(
                f"TickDB API error {code}: {data.get('message', 'Unknown')}"
            )
        raise RuntimeError(
            f"Rate limit retries exhausted after {max_retries} retries"
        )

    def fetch_klines(
        self,
        symbol: str,
        interval: str = "1m",
        start_time: Optional[int] = None,
        end_time: Optional[int] = None,
        limit: int = 1000,
    ) -> list[dict]:
        """
        Fetch historical kline (OHLCV) data for a symbol.

        Parameters:
            symbol: Exchange symbol, e.g. "BTC.USDT" or "AAPL.US"
            interval: Candle interval: "1m", "5m", "15m", "1h", "4h", "1d"
            start_time: Unix milliseconds (optional)
            end_time: Unix milliseconds (optional)
            limit: Max candles per request (max 1000)

        Returns:
            List of kline dicts with keys: timestamp, open, high, low,
            close, volume, turnover, tick_count
        """
        params = {
            "symbol": symbol,
            "interval": interval,
            "limit": min(limit, 1000),
        }
        if start_time is not None:
            params["start_time"] = start_time
        if end_time is not None:
            params["end_time"] = end_time
        logger.info(
            f"Fetching klines: symbol={symbol} interval={interval} "
            f"limit={limit}"
        )
        return self._request("GET", "/market/kline", params=params)

    def fetch_latest(self, symbol: str, interval: str = "1m") -> dict:
        """
        Fetch the most recent closed kline. Suitable for live dashboards.

        Note: For real-time bars that update every tick, use the WebSocket
        depth/trades feeds instead. /kline/latest returns the last closed
        bar, not the in-progress bar.
        """
        return self._request(
            "GET",
            "/market/kline/latest",
            params={"symbol": symbol, "interval": interval},
        )


# ============================================================
# Example: Aggregate live ticks into custom candles
# ============================================================

def demo_live_aggregation():
    """
    Demonstrates ingesting TickDB live trades into a custom
    1-minute kline aggregator aligned to wall-clock time.

    In production, replace this demo loop with a WebSocket client
    that subscribes to the trades channel. See TickDB WebSocket docs
    for the subscription protocol.
    """
    client = TickDBKlineClient()
    aggregator = KlineAggregator(
        interval_seconds=60,
        alignment=KlineAggregator.ALIGN_WALL,
    )

    # Fetch the last 5 closed 1-minute bars from TickDB to seed the
    # aggregator state after a (re)connection
    try:
        historical = client.fetch_klines(
            symbol="BTC.USDT",
            interval="1m",
            limit=5,
        )
        logger.info(f"Loaded {len(historical)} historical bars")
        # Assumes bars arrive in ascending time order
        for bar in historical:
            # One synthetic tick per bar at the close price: the seeded bar
            # has O=H=L=C, so this is a placeholder, not a reconstruction.
            # For accurate tick-level aggregation, use the trades feed directly.
            fake_tick = Tick(
                timestamp=bar["timestamp"] * 1_000_000,  # ms to ns
                symbol="BTC.USDT",
                price=bar["close"],
                volume=bar["volume"] / max(bar.get("tick_count", 1), 1),
                side=None,
            )
            aggregator.ingest(fake_tick)
    except Exception as e:
        logger.warning(f"Could not load historical bars: {e}")

    logger.info("Aggregator initialized. Ready for live tick ingestion.")
    logger.info(
        "Current BTC bar: %s",
        aggregator.get_current("BTC.USDT"),
    )

    # In production: connect to TickDB WebSocket, iterate over messages,
    # call aggregator.ingest(tick) for each trade event, and handle
    # the returned closed Candle objects for storage or transmission.
    return aggregator


if __name__ == "__main__":
    agg = demo_live_aggregation()
    current_bar = agg.get_current("BTC.USDT")
    if current_bar is None:
        # Seeding failed, so there is no in-progress bar to print
        print("No in-progress bar for BTC.USDT")
    else:
        print(json.dumps(current_bar.to_dict(), indent=2))
The Distortion Spectrum: What Gets Lost in Compression
Aggregation is fundamentally a lossy transformation. The question is not whether information is destroyed, but which information and how much.
What Is Preserved
- Price range: The high and low of the interval survive, capturing the extreme points of price action.
- Directional sequence (partially): If the open is below the close, the candle is "green" (bullish). This encodes net price movement.
- Volume: Total activity is summed, revealing the intensity of trading.
- Time reference: The interval boundary creates a universally comparable time axis.
What Is Destroyed
- Intra-bar price path: The candle reveals nothing about the order of price events within the interval. A stock that opened at $100, crashed to $80, rallied to $110, and closed at $105 produces the same OHLC as one that opened at $100, rallied to $110, crashed to $80, and closed at $105. Neither candle distinguishes these entirely different market dynamics.
- Tick-level order flow: The ratio of buyer-initiated to seller-initiated trades within the bar is averaged away. A bar with 80% buy volume and a bar with 20% buy volume may have identical OHLC values.
- Microstructure events: Individual quotes, cancellations, and partial trades are invisible in the aggregated output.
This is why K-line-only analysis is insufficient for high-frequency strategies. The 09:30 1-minute candle might contain 500 ticks in its first second (around the opening auction) and 2 ticks in the remaining 59 seconds. Summing them into one bar obscures the burst entirely.
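The loss of the intra-bar path is easy to demonstrate. In the sketch below, two made-up price paths collapse to the same OHLC tuple, yet a path-dependent statistic (the largest peak-to-trough decline) differs between them. The helper names `ohlc` and `max_drawdown` are chosen for illustration.

```python
def ohlc(path: list[float]) -> tuple[float, float, float, float]:
    """Collapse an ordered price path into (open, high, low, close)."""
    return path[0], max(path), min(path), path[-1]


def max_drawdown(path: list[float]) -> float:
    """Largest decline from a running peak, in price units."""
    peak, worst = path[0], 0.0
    for price in path:
        peak = max(peak, price)
        worst = max(worst, peak - price)
    return worst


crash_first = [100.0, 80.0, 110.0, 105.0]   # sells off, then rallies
rally_first = [100.0, 110.0, 80.0, 105.0]   # rallies, then sells off

same_candle = ohlc(crash_first) == ohlc(rally_first)
dd_crash_first = max_drawdown(crash_first)
dd_rally_first = max_drawdown(rally_first)
```

Both paths yield the candle (100, 110, 80, 105), but the drawdown is 20 in the first case and 30 in the second. A stop-loss strategy evaluated on candles alone cannot tell which one occurred.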
TickDB's Position in the Pipeline
TickDB occupies a specific position in the tick-to-candle pipeline: it is primarily a candle delivery service, not a raw tick delivery service for US equities.
| Capability | Supported | Notes |
|---|---|---|
| Historical OHLCV (kline) | Yes — 10+ years for US equities | Time-aligned to US session open |
| Real-time kline | Yes (via `/kline/latest`) | Returns last closed bar only |
| Tick-level trades (US) | Not supported | US equity trades unavailable |
| Tick-level trades (HK, Crypto) | Supported | Side information preserved |
| Order book depth | US: L1 / HK: L1–L10 / Crypto: L1–L10 | Not available for forex or indices |
For use cases requiring tick-level US equity data (such as order-flow analysis, tick-bar aggregation, or buy/sell pressure calculation), the developer must source raw tick data from a dedicated low-latency feed (such as exchange direct feeds or specialized vendors like Databento). TickDB then serves as the complementary source for long-horizon historical candles, cross-asset OHLCV comparison, and real-time dashboarding via WebSocket depth data.
The recommended architecture for a quant strategy that requires both tick granularity and candle-based analysis:
- Historical backtesting: Use TickDB `/kline` endpoint to pull 10+ years of daily and intraday candles for regime analysis and signal generation.
- Live monitoring: Subscribe to TickDB WebSocket `depth` and `trades` channels for real-time order book imbalance and pressure metrics.
- Tick-level signal: Layer a proprietary tick aggregator on top of a raw tick feed for the specific US equities where sub-bar analysis is required.
Choosing the Right Aggregation for Your Strategy
The selection of aggregation method is not arbitrary. It should follow from the strategy's information requirements.
| Strategy type | Recommended aggregation | Rationale |
|---|---|---|
| Trend following (swing trading) | Daily / 4H time bars | Captures multi-day momentum; filters out noise at sub-daily frequencies |
| Mean reversion (intraday) | 5M / 15M time bars | Aligns with typical intraday trading ranges |
| Liquidity detection | Volume bars | Normalizes for varying activity; reveals effort vs. result |
| Order flow / VWAP capture | Tick bars | Equal information density; filters low-activity periods |
| News event response | Second or sub-second bars | Captures the spike; high data volume acceptable |
| Portfolio-level correlation | Daily bars | Long horizon; sub-day noise reduces cross-asset correlation signal |
A common failure mode is using a candle aggregation that was chosen for charting convenience rather than strategy logic. A VWAP algorithm that uses time bars instead of volume bars will produce systematically different fill expectations than what the strategy was backtested against. The aggregation choice must be consistent across backtesting and live execution.
Closing
The tick-to-candle transformation is not a solved problem with a single correct answer. It is a deliberate engineering decision that carries consequences for every strategy built on aggregated data.
The practical takeaways are three:
First, know your data source's alignment convention. A candle labeled "09:30" from one provider may not correspond to the same market moment as a candle labeled "09:30" from another. Version-control your data source alongside your strategy code.
Second, match your aggregation method to your strategy's information requirements. Time bars are the default because they are convenient, not because they are optimal. Volume bars and tick bars encode different information that may be precisely what your strategy needs.
Third, preserve granularity when it is cheap and discard it only when necessary. If you have access to tick-level data for a live deployment, store it. You can always aggregate ticks into candles, and fine candles into coarser ones. You cannot go the other way. The candle you generate today from raw ticks is a one-way door; the original tick stream, if not archived, is gone.
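The one-way nature of aggregation can be illustrated with a small roll-up: 1-minute bars combine cleanly into 5-minute bars, but nothing in a 5-minute bar lets you recover the minutes inside it. The `resample` function below is a hypothetical sketch that assumes start-labeled bars on an epoch-aligned grid.

```python
from itertools import groupby


def resample(bars: list[dict], target_ms: int) -> list[dict]:
    """Roll fine-grained, start-labeled bars up into bars of `target_ms`."""
    ordered = sorted(bars, key=lambda b: b["timestamp"])

    def bucket(bar: dict) -> int:
        # Start of the coarse bar that contains this fine bar
        return bar["timestamp"] - bar["timestamp"] % target_ms

    out = []
    for start, group in groupby(ordered, key=bucket):
        chunk = list(group)
        out.append({
            "timestamp": start,
            "open": chunk[0]["open"],                 # first bar's open
            "high": max(b["high"] for b in chunk),
            "low": min(b["low"] for b in chunk),
            "close": chunk[-1]["close"],              # last bar's close
            "volume": sum(b["volume"] for b in chunk),
        })
    return out


minute_bars = [
    {"timestamp": 0,       "open": 10.0, "high": 10.5, "low": 9.8,  "close": 10.2, "volume": 100},
    {"timestamp": 60_000,  "open": 10.2, "high": 10.9, "low": 10.1, "close": 10.8, "volume": 150},
    {"timestamp": 120_000, "open": 10.8, "high": 11.0, "low": 10.6, "close": 10.7, "volume": 120},
    {"timestamp": 180_000, "open": 10.7, "high": 10.8, "low": 10.3, "close": 10.4, "volume": 80},
    {"timestamp": 240_000, "open": 10.4, "high": 10.6, "low": 10.2, "close": 10.5, "volume": 50},
    {"timestamp": 300_000, "open": 10.5, "high": 10.7, "low": 10.4, "close": 10.6, "volume": 60},
]
five_minute_bars = resample(minute_bars, target_ms=300_000)
```

The six input bars become two output bars: one full 5-minute bar and one that so far contains a single minute. The inverse function cannot be written, which is the practical argument for archiving the finest granularity you can afford.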
Next Steps
If you are building a backtesting pipeline and need long-horizon OHLCV data: sign up for a free TickDB API key at tickdb.ai and pull 10+ years of historical candles for any covered symbol via the /v1/market/kline endpoint.
If you need real-time order book context for your candle-based strategy: subscribe to the TickDB WebSocket depth channel to monitor bid-ask spread dynamics and pressure ratios alongside your OHLCV data stream.
If you are working with HK equities or crypto and require tick-level trades with side information: explore TickDB's trades endpoint, which provides buyer/seller classification for downstream order-flow analysis.
If you use AI coding assistants: search for the tickdb-market-data SKILL in your AI tool's marketplace to embed market data queries directly in your development workflow.
This article does not constitute investment advice. Markets involve risk; past performance does not guarantee future results. Tick data aggregation rules vary by vendor and venue; verify alignment conventions against your specific data source documentation before deploying in live trading systems.