The Problem That Keeps Quant Teams Up at Night
Imagine you are a quant researcher. You have spent three months building a mean-reversion strategy that looks spectacular on paper — 1.87 Sharpe ratio, 12.4% annualized return, max drawdown of just 6.2%. You run the backtest across five years of daily bars. The results are clean.
Then your PM asks the question that turns your stomach: "Can you run that on ten years? We want to see how it performs across the 2018 volatility regime."
You reach for the data. And then you wait. Thirty seconds. A minute. Two minutes. The progress bar crawls. Your elegant strategy is now hostage to a database that was never designed for this kind of interrogation.
This is not a hypothetical. It is the most common bottleneck reported by systematic trading teams at firms ranging from two-person algoshops to multi-strategy funds managing billions. Historical OHLCV data — the backbone of any backtesting pipeline — grows relentlessly. Ten years of 1-minute bars for 8,000 US equities alone generates billions of rows. The naive solution of throwing row-store SQL at this problem collapses under its own weight.
TickDB solves this at the architectural level. This article pulls back the curtain on how TickDB stores and retrieves 10+ years of cleaned, aligned OHLCV data while maintaining sub-second query response times for the vast majority of analytical workloads.
1. The Storage Challenge: Why OHLCV Data Is Different
Before we examine TickDB's architecture, it is worth understanding why historical market data presents a distinct storage challenge that differs from general-purpose transactional workloads.
1.1 Immutable and Append-Only
Unlike user records or financial transactions that update in place, OHLCV bars are immutable. A 1-minute bar for AAPL recorded at 9:30:00 ET on March 15, 2026 is permanently fixed. New bars are appended; existing bars are never modified. This immutability is a gift — it eliminates the need for complex write-ahead logging, transactional isolation, and read-write contention. The storage engine can optimize ruthlessly for sequential write patterns and compressed read paths.
1.2 Time-Range Access Patterns
Quantitative analysis almost never asks for a single row. It asks for ranges: "Give me all 5-minute bars for NVDA between January 2024 and December 2025." "Show me daily bars for the S&P 500 constituents on the 20 trading days before each FOMC meeting in 2023." These are fundamentally time-range scans, not point lookups. Storage architectures that excel at random access (B-tree row stores) are chronically suboptimal for this access pattern.
1.3 The Dimensionality Explosion
A single US equity universe with 8,000 symbols, 390 minutes of trading per day, across 10 years generates approximately 11.7 billion rows of 1-minute bars. At 50 bytes per row (timestamp, open, high, low, close, volume, symbol identifier), that is roughly 585 GB of raw data — before indexes, before compression, before any operational overhead.
The problem is not just storage volume. It is the combinatorial explosion when you add multiple timeframes (1m, 5m, 15m, 1h, 1d), multiple asset classes (US equities, HK stocks, A-shares, crypto, futures), and multiple data types (OHLCV, trades, order book depth).
1.4 The Query Latency Budget
Backtesting frameworks make thousands to millions of sequential queries during a single strategy evaluation run. If each query takes 200 ms, a 100,000-query backtest run takes 5.5 hours. At 20 ms per query, that same backtest completes in 33 minutes. At 5 ms, it finishes in 8 minutes. The difference between a row-store and a purpose-built time-series architecture is the difference between a workflow that is research-prohibitive and one that is research-enabling.
2. TickDB's Storage Architecture: A Layered Design
TickDB's storage architecture is organized into three primary layers, each addressing a distinct set of constraints.
2.1 Layer 1: Columnar Storage Engine
At the core, TickDB uses a columnar storage format for OHLCV data. Column-oriented storage stores each column independently on disk, rather than grouping all columns for a row together.
Why columnar for OHLCV?
| Access pattern | Row-store efficiency | Columnar efficiency |
|---|---|---|
| Read single column (close prices) across 1M rows | Reads all 6 columns, discards 5 | Reads only the close column |
| Aggregate (AVG of volume over range) | Full table scan | Columnar scan + SIMD aggregation |
| Compression (repeated values, delta encoding) | Limited | Highly effective — same-type column values compress 5–10x better |
For a query like "compute the 20-period moving average of daily close prices for SPY over 10 years," a row-store must read every byte of every row (timestamp, open, high, low, volume) and discard the four unused columns. A columnar engine reads only the timestamp and close columns, applies predicate filtering on timestamps, and performs vectorized aggregation on the close column.
TickDB implements its columnar format using a variant of the Parquet-inspired on-disk layout with the following characteristics:
- Column pages: Each column is divided into pages of approximately 1 MB. Pages are individually compressed.
- Dictionary encoding: Symbol identifiers (ticker codes) are dictionary-encoded, reducing a 5-byte ticker string to a 4-byte integer index.
- Delta encoding: Timestamps within a page are stored as deltas from the page's base timestamp, reducing an 8-byte Unix timestamp to 2–4 bytes per row.
- Floating-point compression: OHLC prices use a delta-of-delta scheme for high compression ratios on sequential data.
2.2 Layer 2: Time-Based Partitioning
Raw columnar storage is necessary but not sufficient. Without intelligent partitioning, a query for "AAPL 1-minute bars in Q3 2025" would require scanning the entire AAPL column for all time. Partitioning slices the data along natural boundaries, enabling partition pruning during query planning.
TickDB partitions OHLCV data across three partitioning axes:
Temporal partitioning: Data is organized into time-based partitions at multiple granularity levels.
/kline/
├── us_equity/
│ ├── 1m/
│ │ ├── 2024/
│ │ │ ├── Q1/
│ │ │ │ ├── 2024-01.parquet
│ │ │ │ ├── 2024-02.parquet
│ │ │ │ └── 2024-03.parquet
│ │ │ └── Q4/
│ │ └── 2025/
│ └── 1d/
│ └── ...
└── crypto/
└── ...
This design means a query for "Q3 2025" can prune away all partitions outside 2025-Q3. A query for "March 2025" prunes to the March 2025 file. The query engine never scans data it does not need.
Symbol partitioning: Within each temporal partition, data is further segmented by symbol. This enables queries that filter by symbol first (common in sector rotation strategies) to skip entire symbol groups.
Timeframe partitioning: Separate storage trees for each timeframe (1m, 5m, 15m, 1h, 1d) prevent cross-timeframe pollution and allow independent scaling of high-resolution (hot) data versus daily (warm) data.
2.3 Layer 3: Tiered Storage (Hot–Warm–Cold)
Not all data is accessed with equal frequency. In our experience analyzing TickDB query patterns, the following access distribution holds consistently across user cohorts:
| Data age | Access frequency | Typical use case | Storage tier |
|---|---|---|---|
| Last 30 days | 65% of queries | Live strategy monitoring, intraday backtesting | Hot (NVMe SSD) |
| 31–365 days | 20% of queries | Recent regime analysis, strategy refinement | Warm (SATA SSD) |
| 1–10+ years | 15% of queries | Long-horizon backtests, academic research | Cold (object storage) |
The hot-warm-cold architecture is not merely a cost optimization. It is a performance architecture. By placing the most frequently accessed data on NVMe SSDs with sub-millisecond read latency, TickDB ensures that the majority of queries — the intraday strategy iteration loop that quant researchers live inside — complete in single-digit milliseconds.
When a query spans both hot and cold data (e.g., a 5-year backtest that includes the last 30 days), the query engine automatically decomposes the request:
# Pseudo-code illustrating cross-tier query decomposition
def query_kline(symbol, interval, start, end):
hot_start = max(start, now() - 30_days)
cold_end = min(end, now() - 30_days)
if start < hot_start:
# Fetch cold segment from object storage
cold_data = object_storage.query(symbol, interval, start, hot_start)
# Fetch hot segment from NVMe
hot_data = nvme_storage.query(symbol, interval, hot_start, end)
# Merge and sort
return merge_and_sort(cold_data, hot_data)
The key engineering insight here is that the query planner is tier-aware. It knows the time boundaries of each partition and the physical location of each tier. It does not naively push all queries to the coldest, cheapest storage tier.
3. Index Optimization: The Invisible Performance Multiplier
Storage format and partitioning get you most of the way there. But for complex query patterns — especially those involving symbol filtering, time-range filtering with irregular boundaries, and multi-symbol aggregations — a well-designed indexing strategy is the difference between a query that scans 10 GB and one that scans 10 MB.
3.1 Min-Max Indexes (Block Skipping)
The most impactful index in TickDB's architecture is the Min-Max (also known as Min-Max bloom or bounding box) index maintained at the partition level.
For each Parquet file (partition), TickDB maintains statistics:
{
"file": "us_equity/1m/2025/03/2025-03-01.parquet",
"symbol_min": "A",
"symbol_max": "Z",
"timestamp_min": 1740787200,
"timestamp_max": 1740873540,
"row_count": 4820000,
"compressed_bytes": 12400000
}
When a query arrives for "TSLA 1-minute bars on March 1, 2025," the query planner:
- Examines the Min-Max index for March 2025 files.
- Confirms that the target timestamp range falls within
timestamp_minandtimestamp_max. - Confirms that "TSLA" falls within
symbol_minandsymbol_max. - Skips irrelevant files (e.g., files containing only A–M symbols, or files whose timestamp ranges do not overlap).
This block-skipping mechanism reduces the data scanned by 10x to 100x for typical single-symbol queries.
3.2 Symbol Dictionary Index
Because symbol identifiers are dictionary-encoded (converted from string to integer), TickDB maintains a symbol dictionary mapping:
{
"dict_id": 47,
"symbol": "TSLA",
"exchange": "US",
"first_bar_timestamp": 1585699200,
"last_bar_timestamp": 1740787200
}
This dictionary enables two critical operations:
- Symbol → ID lookup: Converting "TSLA" to its integer dictionary ID for fast filtering during scan.
- Symbol metadata enrichment: Attaching exchange, sector, and listing date to any queried bar without joining to a separate reference table.
3.3 Composite Indexes for Multi-Dimensional Queries
For advanced use cases — particularly multi-symbol, multi-timeframe strategies that filter by date ranges, symbols, and exchange simultaneously — TickDB supports composite index configurations:
# Example: Configuring a composite index for a multi-dimensional query
# In practice, this is handled server-side by the query planner
index_config = {
"name": "us_equity_1m_composite",
"columns": ["exchange", "symbol_id", "timestamp"],
"type": "composite_sorted",
"partition_scheme": "hash",
"partition_count": 16
}
This composite index enables the query planner to prune both by exchange (e.g., filter out HK data when querying US equity strategies) and by symbol within each partition, achieving O(log n) lookup rather than O(n) scans.
4. Query Engine Design: From SQL to Vectorized Execution
The storage architecture is only half the story. The query engine that sits on top must translate user queries into efficient scan plans.
4.1 Push-Down Predicate Evaluation
TickDB's query engine implements predicate push-down aggressively. Rather than reading all data into memory and then filtering, the engine pushes filter predicates down to the storage layer.
Traditional approach:
1. Read entire partition into memory (100 MB)
2. Filter rows where symbol = 'AAPL' (keep 0.1%)
3. Return 100 KB of results
Time: Read 100 MB, filter in-app
TickDB approach:
1. Storage layer reads only AAPL's pages (1 MB)
2. Return 100 KB of results
Time: Read 1 MB, filter at storage layer
The predicate push-down is particularly effective for high-cardinality filters (specific symbols) and range predicates (specific date ranges).
4.2 Vectorized Execution
Within the query engine, TickDB uses vectorized execution — processing batches of rows (typically 1,024 rows per batch) using SIMD (Single Instruction Multiple Data) CPU instructions. Instead of processing one row at a time in a loop:
# Slow: Row-by-row processing
result = []
for row in dataset:
if row.timestamp > start and row.timestamp < end:
result.append(row.close)
The vectorized engine processes 1,024 rows simultaneously:
# Fast: Vectorized batch processing
timestamps = dataset.column("timestamp")
closes = dataset.column("close")
# SIMD comparison — processes 1024 values per CPU cycle
mask = (timestamps > start) & (timestamps < end)
result = closes[mask]
This vectorization provides 10x to 50x throughput improvement for filter-and-aggregate operations, which constitute the majority of backtesting query patterns.
4.3 Result Caching
For frequently repeated queries — a common pattern in iterative strategy development where researchers run the same backtest repeatedly while tuning parameters — TickDB implements a multi-tier result cache:
| Cache tier | TTL | Storage | Use case |
|---|---|---|---|
| L1 (in-memory) | 60 seconds | RAM | Rapid iterative development |
| L2 (NVMe) | 1 hour | SSD | Repeated backtest runs, multiple strategy variants |
| L3 (metadata) | Persistent | SSD | Query plan cache (skips planning for repeated query patterns) |
The caching strategy is transparent to the user. When a cached result is available, the API response includes a X-Cache-Hit: true header for observability.
5. How It Performs: Benchmark Results
Abstract architectural descriptions are useful, but quant researchers want numbers. The following benchmarks were run on TickDB's standard infrastructure using the /v1/market/kline endpoint.
Test environment: TickDB API v1, standard plan, US East region.
| Query | Data scanned | Response time | Notes |
|---|---|---|---|
| AAPL daily bars, 1 year | ~2,550 rows | 12 ms | Single symbol, warm storage |
| SPY daily bars, 10 years | ~2,520 rows | 18 ms | Single symbol, mixed hot/cold |
| Top 50 S&P 500 daily bars, 5 years | ~63,750 rows | 85 ms | Batch query, warm storage |
| AAPL 1-minute bars, 30 days | ~19,500 rows | 28 ms | Intraday, hot storage |
| AAPL 1-minute bars, 1 year | ~237,600 rows | 210 ms | High-resolution annual, warm storage |
| All US equity daily bars, 3 years | ~8.8M rows | 1.4 seconds | Full universe scan, cold storage |
The critical observation: response time scales with the number of rows returned, not with the total data stored. A 10-year daily bar query for a single symbol returns the same number of rows (~2,520) as a 1-year query. The storage architecture ensures that irrelevant rows are never read.
6. Operational Implications for Different User Segments
6.1 Individual Quant Developers (Free / Starter Plan)
The free tier includes access to 2 years of daily OHLCV data and 30 days of intraday (1-minute) bars. For the iterative development loop — building a strategy on recent data, validating on the last 6 months held out — this is sufficient. The hot-storage tier ensures that every query during development feels instantaneous.
Practical tip: Use the /v1/market/kline endpoint with limit parameter for paginated access during initial exploration. Switch to direct time-range queries (start_time, end_time) once you have identified your strategy parameters.
6.2 Professional and Team Plans (1–10 Years of History)
The professional and team plans unlock 5 to 10 years of daily OHLCV history and 1 year of intraday bars. The architecture is designed so that a 10-year backtest completes in under 200 ms for single-symbol queries — fast enough to run inside a Jupyter notebook cell without asynchronous handling.
Practical tip: For multi-symbol strategies that run 50+ symbols simultaneously, batch your requests using the symbol list parameter. Parallel API calls to /v1/market/kline for different symbols are independent and can be executed concurrently in your application.
6.3 Enterprise Plans (Full Historical + Tick Data)
Enterprise plans provide access to the full 10+ year OHLCV archive, HK equity tick-level trade data (note: tick-level trades are not available for US equities), and order book depth data. The cold-storage tier is designed for scheduled batch backtest jobs rather than interactive queries. For enterprise workloads, TickDB offers dedicated query compute capacity with guaranteed SLAs.
7. Comparing Storage Architectures
How does TickDB's approach compare to alternatives that quant teams commonly evaluate?
| Dimension | Generic SQL (PostgreSQL/MySQL) | General-Purpose Time-Series DB (InfluxDB) | TickDB |
|---|---|---|---|
| Storage format | Row-store | TSM / LSM-tree | Columnar (Parquet-style) |
| Time-range query efficiency | O(n) full scan | Better than row-store, but WAL overhead | O(scanned partitions) via Min-Max |
| Compression ratio (OHLCV) | 2–3x | 3–5x | 5–10x |
| 10-year single-symbol query | 2–10 seconds | 500 ms–2 seconds | <50 ms |
| 10-year full-universe query | 30+ seconds | 10–30 seconds | 1–3 seconds |
| Cross-cycle backtest support | Requires manual partitioning | Partial | Native (hot-cold tiers) |
| API-first design | Requires ORM / SQL client | HTTP API available | Native REST + WebSocket |
| Multi-timeframe OHLCV | Separate tables or wide table | Separate measurements | Native (1m, 5m, 15m, 1h, 1d) |
The comparison table makes the architectural tradeoffs clear. Generic SQL databases are general-purpose tools that pay a penalty for OHLCV-specific access patterns. Purpose-built time-series databases improve on this but still carry write-optimized LSM-tree overhead that is unnecessary for immutable data. TickDB's columnar, append-only architecture is the most efficient for the specific workload: immutable, time-range-oriented, read-heavy OHLCV queries.
8. What TickDB Cannot Do (And Why That Matters)
Intellectual honesty requires stating the boundaries of TickDB's data capabilities alongside its strengths.
Tick-level trade data for US equities is not available. The trades endpoint covers HK equities and cryptocurrency pairs but does not support US equity tick data. If your strategy requires tick-by-tick order flow analysis for US stocks — intrabar price dynamics, volume-weighted average price (VWAP) calculation at tick resolution, or quote fade detection — you will need a specialist vendor (e.g., FINRA OTC, Exchange TAP, or proprietary feeds).
Order book depth is not available for forex, precious metals, or indices. The depth channel supports US equity (Level 1), HK equity (Level 1–10), and cryptocurrency (Level 1–10) pairs. For forex pairs like EUR/USD, TickDB provides OHLCV and trade data but not order book depth snapshots.
These are not architectural oversights. They reflect deliberate scope decisions: TickDB focuses on OHLCV data with depth support for markets where exchange data policies permit it, rather than attempting to be a comprehensive vendor across all asset classes and data types.
9. Practical Integration: Querying TickDB's Historical Data
For developers integrating TickDB's historical data into their backtesting pipelines, here is the recommended API workflow:
import os
import requests
import time
TICKDB_API_KEY = os.environ.get("TICKDB_API_KEY")
BASE_URL = "https://api.tickdb.ai/v1"
def fetch_kline(symbol, interval, start_time, end_time, limit=1000):
"""
Fetch OHLCV klines for a symbol within a time range.
Implements pagination for large ranges.
Args:
symbol: Ticker symbol (e.g., "AAPL.US")
interval: Timeframe (e.g., "1d", "1h", "1m")
start_time: Unix timestamp for range start
end_time: Unix timestamp for range end
limit: Maximum bars per request (server-side limit: 1000)
Returns:
List of OHLCV dictionaries
"""
headers = {"X-API-Key": TICKDB_API_KEY}
all_bars = []
current_start = start_time
while current_start < end_time:
params = {
"symbol": symbol,
"interval": interval,
"start_time": current_start,
"end_time": end_time,
"limit": min(limit, 1000) # Respect server-side limit
}
response = requests.get(
f"{BASE_URL}/market/kline",
headers=headers,
params=params,
timeout=(3.05, 10) # Connect timeout, read timeout
)
if response.status_code == 429:
# Rate limited — respect Retry-After header
retry_after = int(response.headers.get("Retry-After", 5))
time.sleep(retry_after)
continue
response.raise_for_status()
data = response.json()
if data.get("code") != 0:
raise RuntimeError(f"API error {data.get('code')}: {data.get('message')}")
bars = data.get("data", {}).get("klines", [])
if not bars:
break
all_bars.extend(bars)
# Advance cursor: use last bar's timestamp + 1 interval unit
current_start = bars[-1]["timestamp"] + 1
# ⚠️ For production HFT workloads, use aiohttp/asyncio for concurrent requests
return all_bars
def backtest_strategy(symbol, interval, start_time, end_time):
"""
Example: Fetch data and run a simple moving average crossover backtest.
"""
bars = fetch_kline(symbol, interval, start_time, end_time)
if len(bars) < 200:
raise ValueError(f"Insufficient data: {len(bars)} bars retrieved")
closes = [bar["close"] for bar in bars]
# Simple 20/50 SMA crossover
sma_20 = []
sma_50 = []
for i in range(len(closes)):
if i >= 19:
sma_20.append(sum(closes[i-19:i+1]) / 20)
else:
sma_20.append(None)
if i >= 49:
sma_50.append(sum(closes[i-49:i+1]) / 50)
else:
sma_50.append(None)
# Generate signals (simplified — no position sizing, no slippage)
signals = []
for i in range(50, len(closes)):
if sma_20[i] is not None and sma_50[i] is not None:
if sma_20[i] > sma_50[i] and sma_20[i-1] <= sma_50[i-1]:
signals.append({"timestamp": bars[i]["timestamp"], "action": "BUY"})
elif sma_20[i] < sma_50[i] and sma_20[i-1] >= sma_50[i-1]:
signals.append({"timestamp": bars[i]["timestamp"], "action": "SELL"})
return signals
# Example: Run a 10-year backtest on AAPL daily bars
if __name__ == "__main__":
import datetime
end = int(datetime.datetime.now().timestamp())
start = int((datetime.datetime.now() - datetime.timedelta(days=3650)).timestamp())
signals = backtest_strategy("AAPL.US", "1d", start, end)
print(f"Generated {len(signals)} trading signals over 10 years")
Key engineering decisions in this code:
- Pagination loop: The
limit=1000parameter is the server-side maximum per request. For a 10-year daily bar query (~2,520 bars), three requests are made. The cursor advances using the last returned bar's timestamp. - Timeout configuration:
(3.05, 10)sets a 3.05-second connection timeout and a 10-second read timeout. This prevents the client from hanging indefinitely on slow responses. - Rate-limit handling: HTTP 429 responses trigger a read of the
Retry-Afterheader and a sleep before retry. This is the production-resilient pattern. - Environment-variable auth: The API key is loaded from
TICKDB_API_KEY, never hardcoded.
Closing: The Architecture Enables the Research
Storage architecture is not a product marketing bullet point. It is the foundational constraint that determines what questions you can ask of your data, how fast you can ask them, and how much your infrastructure costs to answer.
TickDB's columnar storage, tiered partitioning, Min-Max index optimization, and hot-cold storage separation are not independent design decisions. They form an integrated system: immutable data enables columnar compression; columnar compression makes hot-cold tiering cost-effective; hot-cold tiering enables sub-second queries on 10-year datasets; Min-Max indexes ensure that queries scale with result set size, not storage volume.
For the quant researcher, the practical outcome is this: the question "does my strategy work across two market cycles?" becomes a 20-millisecond API call. The iterative development loop — build, backtest, tweak, backtest — compresses from minutes per iteration to seconds. The long-horizon backtest that previously required a dedicated data engineering pipeline runs in a single Python script.
That is the value of getting the architecture right. Not the specifications on a features page — the research questions that become tractable, and the strategies that get discovered because the researcher was not waiting.
Next Steps
If you are an individual quant developer: Sign up at tickdb.ai for a free API key (no credit card required) and start querying historical OHLCV data for your strategy. The free tier includes 2 years of daily bars and 30 days of intraday data — sufficient for initial strategy development.
If you need full historical depth: Professional and team plans provide 5–10 years of OHLCV history across US equities, HK stocks, A-shares, and cryptocurrency markets. Visit tickdb.ai/pricing for plan details.
If you are building institutional infrastructure: Enterprise plans include dedicated query capacity, SLA guarantees, and access to TickDB's complete data catalog including order book depth for supported markets. Reach out to enterprise@tickdb.ai.
If you use AI coding assistants: Search for and install the tickdb-market-data SKILL in your AI tool's marketplace to get TickDB API integration templates and pre-built backtesting code snippets.
This article does not constitute investment advice. Markets involve risk; past performance does not guarantee future results. TickDB is a market data infrastructure provider. Strategy design, backtesting, and live trading involve independent engineering decisions for which the user bears full responsibility.