"Price is the effect. The data feed is the cause."
On a Tuesday afternoon in March, a quantitative developer we'll call Chen spent four hours debugging a mean-reversion strategy that worked perfectly in backtests but kept failing in live trading. The culprit was not the alpha engine. It was not the risk model. It was a 60-second polling interval on his free Tushare data feed. By the time his system detected a liquidity event, institutional traders had already traded on it. The gap between theory and practice was not algorithmic; it came down to a single parameter: how often his code called the API.
This is the invisible ceiling that every individual developer building on Chinese A-shares eventually hits. Professional Level-2 data costs ¥5,000–¥50,000 per month. Free alternatives exist — but the latency gap between a 60-second polling interval and a real-time WebSocket stream is not a marginal inconvenience. It is a structural disadvantage that compounds over thousands of trades.
This article dissects the real cost-latency tradeoff across three data sources — Tushare, AkShare, and TickDB — with production-grade code examples, quantified latency benchmarks, and a decision framework calibrated for individual developers operating on limited budgets.
The A-Share Data Landscape: Why the Gap Exists
Chinese A-share markets (primarily the Shanghai Stock Exchange and the Shenzhen Stock Exchange) operate under a two-tier quotation system that fundamentally shapes data availability and pricing.
Level-1 data includes best bid, best ask, last trade price, volume, and turnover — the standard market ticker. This tier is widely available through free and low-cost APIs. However, Level-1 refresh rates for retail-tier feeds are typically limited to 3–60 second intervals, which means you are seeing a sampled snapshot, not a live feed.
Level-2 data provides full order book depth, order queue positions, and tick-by-tick trade attribution. This is where the structural difference emerges. In the United States, consolidated top-of-book quotes and trades are distributed through the SEC-regulated SIP feeds at regulated prices, while depth-of-book data is sold by the exchanges as proprietary products. In China, exchange Level-2 feeds are commercially licensed, and the primary distributors, Tonghuashun (同花顺), Eastmoney (东方财富), and Wind, charge institutional rates that price out most individual developers.
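The difference between the two tiers is easiest to see as data structures. The sketch below is illustrative only: the class and field names (`Level1Quote`, `Level2Book`, `depth`) are hypothetical and do not correspond to any vendor's schema. It shows that a Level-1 record carries one price level per side, while a Level-2 book carries a ladder of levels that can be aggregated.

```python
from dataclasses import dataclass
from typing import List, Tuple

PriceLevel = Tuple[float, int]  # (price, size in shares)


@dataclass(frozen=True)
class Level1Quote:
    """Top-of-book snapshot: one bid, one ask, last trade, cumulative totals."""
    symbol: str
    bid: float
    ask: float
    last: float
    volume: int
    turnover: float

    @property
    def spread(self) -> float:
        return self.ask - self.bid


@dataclass(frozen=True)
class Level2Book:
    """Order book ladder: several price levels per side, best price first."""
    symbol: str
    bids: List[PriceLevel]
    asks: List[PriceLevel]

    def depth(self, side: str, levels: int) -> int:
        """Total resting size across the first `levels` price levels of one side."""
        if side not in ("bid", "ask"):
            raise ValueError("side must be 'bid' or 'ask'")
        ladder = self.bids if side == "bid" else self.asks
        return sum(size for _, size in ladder[:levels])


quote = Level1Quote("600519.SH", bid=10.25, ask=10.5, last=10.5,
                    volume=120_000, turnover=1_245_000.0)
book = Level2Book(
    "600519.SH",
    bids=[(10.25, 500), (10.24, 1200), (10.23, 800)],
    asks=[(10.5, 300), (10.51, 900)],
)
print(quote.spread)           # 0.25
print(book.depth("bid", 1))   # 500: all a Level-1 feed can see
print(book.depth("bid", 3))   # 2500: visible only with depth data
```

A strategy that reasons about resting liquidity needs the second structure; one that only needs a last price can live with the first.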
The result is a market where the data gap between retail and institutional participants is wider than in most developed markets. Understanding this structural reality is the prerequisite for building a pragmatic cost-latency strategy.
Data Source Profile: Tushare
Tushare Pro is the most widely used A-share data API among individual quant developers in China. It offers a tiered model: a free tier with significant limitations, and a Pro tier with extended functionality.
What Tushare Pro Covers
| Data Category | Free Tier | Pro Tier |
|---|---|---|
| Daily OHLCV (kline) | ✅ Full history | ✅ Full history |
| Intraday minute bars | ❌ Not available | ✅ 1-minute / 5-minute |
| Real-time quotes | ❌ Not available | ✅ 3–60 second delay |
| Level-2 order book | ❌ Not available | ✅ Paid add-on (¥200–¥2,000/month) |
| Fundamental data | ✅ Basic | ✅ Extended |
| News sentiment | ⚠️ Limited | ✅ Extended |
The Polling Constraint
The fundamental limitation of Tushare's free and low-cost tiers is architectural. The platform operates on a pull-based REST API model, meaning you must actively request data at intervals. There is no push notification when a price moves.
```python
import os
import time

import requests

TUSHARE_TOKEN = os.environ.get("TUSHARE_TOKEN")
BASE_URL = "http://api.tushare.pro"


def fetch_realtime_quote(token, ts_code):
    """
    Fetch a snapshot quote from Tushare Pro.
    Note: This is a REST pull; you control the polling frequency.
    """
    payload = {
        # api_name and field names are as given in this article; confirm the
        # exact interface name and your account's permissions in the Tushare docs.
        "api_name": "quotes",
        "token": token,
        "params": {"ts_code": ts_code},
        "fields": "ts_code,open,high,low,close,volume,amount",
    }
    response = requests.post(BASE_URL, json=payload, timeout=10)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.json()


def polling_monitor(token, ts_codes, interval=60):
    """
    Simple polling loop. Note the inherent latency:
    - Minimum practical interval: ~3 seconds per request (API rate limit)
    - Effective data freshness: last polling timestamp + request latency
    - For 60+ symbols: this approach becomes infeasible at sub-minute intervals
    """
    while True:
        for ts_code in ts_codes:
            try:
                data = fetch_realtime_quote(token, ts_code)
            except requests.RequestException as e:
                # One failed request should not stop the whole monitor
                print(f"[{time.strftime('%H:%M:%S')}] {ts_code}: request failed: {e}")
                continue
            # Process data here
            print(f"[{time.strftime('%H:%M:%S')}] {ts_code}: {data}")
        # Sleep once per full pass over the symbol list
        time.sleep(interval)
```
The code above is functional, but engineers who deploy it in production encounter three compounding problems:
- Rate limit ceilings: Tushare Pro free tier allows approximately 200 calls per minute. With one call per symbol, monitoring 20 symbols caps each symbol at roughly one refresh every 6 seconds, and any other API usage eats into the same budget.
- Data staleness: At a 60-second polling interval, you miss every significant intra-minute move. For event-driven strategies (earnings releases, limit-up/limit-down events), this is not a performance gap. It is a complete strategy failure.
- Connection overhead: Each REST call carries a 200–500 ms round-trip cost. Polled sequentially, 20 symbols take 4–10 seconds per pass before any processing.
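The first and third constraints combine into a hard floor on refresh rate, which a few lines of arithmetic make concrete. This is a rough sketch, not a model of Tushare's actual throttling: the helper name `min_refresh_interval` is hypothetical, and the 200 calls/minute and 200–500 ms inputs are the approximate figures quoted above.

```python
def min_refresh_interval(n_symbols: int,
                         calls_per_minute: float,
                         round_trip_s: float) -> float:
    """Lower bound, in seconds, on how often each symbol can be refreshed
    when quotes are fetched one symbol per call, one call at a time."""
    if n_symbols <= 0 or calls_per_minute <= 0 or round_trip_s < 0:
        raise ValueError("inputs must be positive (round_trip_s may be zero)")
    # Bound 1: the rate limit spreads a fixed call budget across all symbols.
    rate_limit_bound = n_symbols * 60.0 / calls_per_minute
    # Bound 2: sequential calls cannot finish a pass faster than the sum of round trips.
    sequential_bound = n_symbols * round_trip_s
    return max(rate_limit_bound, sequential_bound)


for rtt in (0.2, 0.5):
    floor = min_refresh_interval(n_symbols=20, calls_per_minute=200, round_trip_s=rtt)
    print(f"round trip {rtt:.1f}s -> each symbol refreshed at most every {floor:.1f}s")
```

With a 200 ms round trip the rate limit is the binding constraint; with 500 ms the sequential round trips are. Either way the floor sits in the seconds range, far from tick-level freshness.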
Data Source Profile: AkShare
AkShare is an open-source Python library maintained by the Chinese quant community. It aggregates data from dozens of public and semi-public sources, making it the most accessible free option for individual developers.
What AkShare Covers
| Data Category | Availability | Source | Freshness |
|---|---|---|---|
| Daily OHLCV | ✅ Full history | Exchange archives | Daily |
| Intraday minute bars | ✅ Limited history | Exchange FTP / web | 15–60 min delay |
| Real-time quotes | ✅ | Sina / Tencent / Eastmoney | 3–15 second delay |
| Level-2 order book | ⚠️ Partial | Sina L2 (unofficial) | Variable |
| Fund / bond data | ✅ | Multiple sources | Daily |
AkShare's value proposition is clear: it is free, open-source, and covers an enormous range of asset classes and data types. However, its architecture carries inherent limitations that stem from its data sources.
The Source Aggregation Problem
AkShare does not maintain its own data feed. Instead, it scrapes and parses publicly available sources: Sina Finance, Tencent Finance, Eastmoney, and exchange FTP servers. Every call therefore inherits the latency and availability of whichever upstream site it hits, as the two functions below illustrate:
```python
import akshare as ak


def get_realtime_quote_akshare(symbol):
    """
    AkShare real-time quote via the Eastmoney spot endpoint.
    Typical latency: 3–15 seconds depending on network and source load.
    `symbol` is the bare 6-digit code, e.g. "600519".
    """
    try:
        # stock_zh_a_spot_em() pulls the full A-share snapshot from Eastmoney
        # (the Sina-backed equivalent is ak.stock_zh_a_spot()).
        df = ak.stock_zh_a_spot_em()
        # Data is a snapshot: no push, no streaming
        # Latency: best case ~3 sec, typical ~10 sec
        # Keep only the requested symbol. "代码" (code) is the column name in
        # recent AkShare releases; it may change between versions.
        return df[df["代码"] == symbol]
    except Exception as e:
        print(f"Fetch failed: {e}")
        return None


def get_intraday_bars_akshare(symbol, adjust="qfq"):
    """
    AkShare intraday 5-minute bars.
    Note: bars are served from Eastmoney's web interface as delayed
    snapshots; this is not a real-time intraday feed.
    """
    try:
        # Minute bars come from stock_zh_a_hist_min_em();
        # stock_zh_a_hist() only serves daily/weekly/monthly periods.
        df = ak.stock_zh_a_hist_min_em(
            symbol=symbol,
            period="5",
            adjust=adjust
        )
        return df
    except Exception as e:
        print(f"Historical fetch failed: {e}")
        return None
```
Latency profile of AkShare real-time data:
- Sina Finance quotes (polled, not streamed): 3–15 seconds behind market
- Tencent Finance quotes: 5–15 seconds behind market
- Exchange FTP snapshots: 15–60 minutes behind market
For a momentum strategy that holds positions for hours or days, a 10-second delay is irrelevant. For a market microstructure strategy that trades off Level-2 order book imbalances — detecting when a large institutional order is accumulating on the bid — a 10-second delay means you are watching history, not the present.
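One way to check where a given feed sits on this spectrum is to compare each quote's own timestamp with the local clock at the moment it arrives. The sketch below assumes the feed exposes the quote time as an `HH:MM:SS` string in China Standard Time and that the local clock is reasonably well synchronized; both are assumptions, and field names and formats differ by source. The helper name `quote_staleness_seconds` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

CST = timezone(timedelta(hours=8))  # China Standard Time, no daylight saving


def quote_staleness_seconds(quote_time: str, received_at: datetime) -> float:
    """Seconds elapsed between the time stamped on a quote and its arrival.

    quote_time  -- 'HH:MM:SS' as reported by the feed (assumed to be CST)
    received_at -- timezone-aware local receive time
    """
    if received_at.tzinfo is None:
        raise ValueError("received_at must be timezone-aware")
    local = received_at.astimezone(CST)
    stamped = datetime.strptime(quote_time, "%H:%M:%S").time()
    # Assume the quote belongs to the same trading day as the receive time.
    quoted_at = datetime.combine(local.date(), stamped, tzinfo=CST)
    return (local - quoted_at).total_seconds()


received = datetime(2024, 3, 5, 10, 15, 42, tzinfo=CST)
print(quote_staleness_seconds("10:15:30", received))  # 12.0
```

Logging this value for every poll over a full session gives a staleness distribution for your own network path, which is more useful than any published figure.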
Data Source Profile: TickDB
TickDB operates on a fundamentally different architectural model: push-based WebSocket streams with millisecond-level timestamps.
TickDB supports multiple asset classes. For A-share equities, TickDB provides WebSocket access to real-time depth and trade data, with REST endpoints for historical OHLCV retrieval. The platform is designed for developers who need production-grade data infrastructure — heartbeat management, automatic reconnection, and structured data formats — rather than scraped web data.
TickDB Coverage for A-Shares
| Data Category | TickDB Support | Notes |
|---|---|---|
| Historical OHLCV | ✅ | 10+ years of cleaned daily / minute data |
| Real-time depth | ✅ | WebSocket push, multiple depth levels |
| Real-time trades | ✅ | Tick-level trade attribution |
| Level-2 order queue | ⚠️ | Not all exchanges; check /v1/symbols/available |
| WebSocket heartbeat | ✅ | Native ping/pong support |
| Reconnection | ✅ | Automatic with exponential backoff |
Production-Grade WebSocket Implementation
```python
import json
import os
import random
import threading
import time

import requests
import websocket  # provided by the websocket-client package

TICKDB_API_KEY = os.environ.get("TICKDB_API_KEY")
TICKDB_WS_URL = "wss://api.tickdb.ai/ws/v1/market"


class TickDBWebSocketClient:
    """
    Production-grade WebSocket client for TickDB real-time market data.
    Includes: heartbeat, exponential backoff reconnection, rate-limit handling.
    """

    def __init__(self, api_key):
        if not api_key:
            raise ValueError("Missing API key: set the TICKDB_API_KEY env var")
        self.api_key = api_key
        self.ws = None
        self.connected = False
        self.symbols = []
        self.channels = []
        self.reconnect_attempts = 0
        self.max_reconnect_attempts = 10
        self.base_delay = 1
        self.max_delay = 60
        self.gave_up = False      # True once reconnection attempts are exhausted
        self._closing = False     # True after close(), so on_close does not reconnect
        self._last_pong_time = None

    def connect(self, symbols, channels=None):
        """
        Establish WebSocket connection with API key in URL parameter.
        ⚠️ API key goes in the URL parameter for WebSocket auth, NOT in a header.
        """
        if channels is None:
            channels = ["trades", "depth"]
        # Store the subscription before the socket thread starts, so _on_open
        # always sees it.
        self.symbols = symbols
        self.channels = channels
        self._closing = False

        url = f"{TICKDB_WS_URL}?api_key={self.api_key}"
        self.ws = websocket.WebSocketApp(
            url,
            on_open=self._on_open,
            on_message=self._on_message,
            on_error=self._on_error,
            on_close=self._on_close
        )
        thread = threading.Thread(target=self.ws.run_forever, daemon=True)
        thread.start()

    def _on_open(self, ws):
        """Subscribe to symbols and channels on connection open."""
        self.connected = True
        self.reconnect_attempts = 0
        print(f"[{time.strftime('%H:%M:%S')}] WebSocket connected")
        subscribe_payload = {
            "cmd": "subscribe",
            "params": {
                "channels": self.channels,
                "symbols": self.symbols
            }
        }
        ws.send(json.dumps(subscribe_payload))
        print(f"Subscribed to {self.channels} for {self.symbols}")

    def _on_message(self, ws, message):
        """Handle incoming messages. Includes heartbeat response."""
        data = json.loads(message)

        # Handle heartbeat response
        if data.get("cmd") == "pong":
            self._last_pong_time = time.time()
            return

        # Handle rate-limit response (code 3001)
        if data.get("code") == 3001:
            retry_after = int(data.get("headers", {}).get("Retry-After", 5))
            print(f"Rate limited. Waiting {retry_after} seconds.")
            # Note: this sleep runs on the receive thread, so no messages
            # are processed while waiting.
            time.sleep(retry_after)
            return

        # Process market data
        if "data" in data:
            for item in data["data"]:
                self._process_tick(item)

    def _process_tick(self, tick):
        """
        Process a single market tick.
        For A-share depth data: includes bid/ask levels with size.
        For trade data: includes price, volume, direction.
        """
        channel = tick.get("channel")
        symbol = tick.get("symbol")
        ts = tick.get("ts")

        if channel == "depth":
            # Each level is assumed to be a (price, size) pair
            bids = tick.get("bids", [])
            asks = tick.get("asks", [])
            # Compute buy/sell pressure ratio
            bid_total = sum(size for _, size in bids)
            ask_total = sum(size for _, size in asks)
            pressure_ratio = bid_total / ask_total if ask_total > 0 else 0
            print(f"[{ts}] {symbol} | "
                  f"Bid total: {bid_total:,} | "
                  f"Ask total: {ask_total:,} | "
                  f"Pressure: {pressure_ratio:.2f}")
        elif channel == "trades":
            price = tick.get("price")
            volume = tick.get("volume", 0)  # default keeps the ',' format valid
            direction = tick.get("side", "unknown")
            print(f"[{ts}] {symbol} | "
                  f"Trade @ {price} | "
                  f"Size: {volume:,} | "
                  f"Direction: {direction}")

    def _on_error(self, ws, error):
        print(f"WebSocket error: {error}")

    def _on_close(self, ws, close_status_code, close_msg):
        """Handle disconnection with exponential backoff reconnection."""
        self.connected = False
        print(f"WebSocket closed: {close_status_code}: {close_msg}")
        if self._closing:
            return  # close() was called deliberately; do not reconnect
        self._schedule_reconnect()

    def _schedule_reconnect(self):
        """Exponential backoff with jitter to prevent thundering herd."""
        if self.reconnect_attempts >= self.max_reconnect_attempts:
            print("Max reconnection attempts reached. Giving up.")
            self.gave_up = True
            return
        delay = min(self.base_delay * (2 ** self.reconnect_attempts), self.max_delay)
        # Add jitter: ±10% randomization
        jitter = random.uniform(-delay * 0.1, delay * 0.1)
        reconnect_delay = delay + jitter
        print(f"Reconnecting in {reconnect_delay:.2f} seconds "
              f"(attempt {self.reconnect_attempts + 1}/{self.max_reconnect_attempts})")
        time.sleep(reconnect_delay)
        self.reconnect_attempts += 1
        self.connect(self.symbols, self.channels)

    def send_heartbeat(self):
        """
        Send a ping to keep connection alive.
        Recommended: every 30 seconds.
        ⚠️ For production HFT workloads, use aiohttp/asyncio instead of websocket-client.
        """
        if self.connected and self.ws:
            try:
                self.ws.send(json.dumps({"cmd": "ping"}))
            except Exception as e:
                print(f"Heartbeat failed: {e}")

    def close(self):
        self._closing = True
        if self.ws:
            self.ws.close()


def fetch_historical_kline(api_key, symbol, interval="1d", limit=100):
    """
    Fetch historical OHLCV data via REST.
    For backtesting and historical analysis.
    ⚠️ Use /kline/latest for live dashboards; use /kline for historical data.
    Returns None after a rate-limit wait so the caller can retry.
    """
    url = "https://api.tickdb.ai/v1/market/kline"
    headers = {"X-API-Key": api_key}
    response = requests.get(
        url,
        headers=headers,
        params={"symbol": symbol, "interval": interval, "limit": limit},
        timeout=(3.05, 10)  # (connect_timeout, read_timeout)
    )
    if response.status_code == 200:
        data = response.json()
        if data.get("code") == 0:
            return data.get("data", [])
        elif data.get("code") in (1001, 1002):
            raise ValueError("Invalid API key: check TICKDB_API_KEY env var")
        elif data.get("code") == 2002:
            raise KeyError(f"Symbol {symbol} not found: verify via /v1/symbols/available")
        elif data.get("code") == 3001:
            retry_after = int(response.headers.get("Retry-After", 5))
            time.sleep(retry_after)
            return None
        else:
            raise RuntimeError(f"Unexpected error: {data.get('message')}")
    raise RuntimeError(f"HTTP {response.status_code}: {response.text}")


if __name__ == "__main__":
    # Initialize client
    client = TickDBWebSocketClient(TICKDB_API_KEY)

    # Connect to A-share symbols (example: Kweichow Moutai, BYD)
    # Verify symbol availability via /v1/symbols/available first
    client.connect(
        symbols=["600519.SH", "002594.SZ"],
        channels=["trades", "depth"]
    )

    # Main loop: send heartbeat every 30 seconds.
    # Do not loop on client.connected: connect() returns before the socket is
    # open, and the flag is False during reconnects. send_heartbeat() is a
    # no-op while disconnected.
    try:
        while not client.gave_up:
            client.send_heartbeat()
            time.sleep(30)
    except KeyboardInterrupt:
        client.close()
        print("Client disconnected.")
```
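The reconnection policy is easier to reason about, and to test, when the delay schedule is pulled out into a pure function. The sketch below reuses the parameters from the client above (1-second base, 60-second cap, ±10% jitter); the function name `backoff_delay` is hypothetical, and the random source is injectable so tests can be deterministic.

```python
import random


def backoff_delay(attempt: int,
                  base: float = 1.0,
                  cap: float = 60.0,
                  jitter_frac: float = 0.1,
                  rng=random) -> float:
    """Delay in seconds before reconnect attempt number `attempt` (0-based).

    The nominal delay doubles each attempt until it reaches `cap`; a uniform
    jitter of +/- jitter_frac is applied so that many clients dropped at the
    same moment do not all reconnect at the same moment.
    """
    if attempt < 0:
        raise ValueError("attempt must be >= 0")
    nominal = min(base * (2 ** attempt), cap)
    return nominal + rng.uniform(-nominal * jitter_frac, nominal * jitter_frac)


# Nominal schedule (jitter removed) for the ten attempts the client allows
schedule = [backoff_delay(n, jitter_frac=0.0) for n in range(10)]
print(schedule)       # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0, 60.0, 60.0]
print(sum(schedule))  # 303.0
```

With these parameters the client keeps trying for roughly five minutes before giving up, which is worth knowing when deciding whether ten attempts is enough to ride out a lunch-break disconnect.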
Latency Benchmark: The Numbers That Matter
Direct latency comparison requires controlled testing. The figures below represent typical real-world measurements under normal market conditions.
| Data Source | Architecture | Measured Latency | Typical Availability |
|---|---|---|---|
| Tushare Free | REST polling (60s interval) | 60,000 ms (full cycle) | Always |
| Tushare Pro (basic) | REST polling (3s minimum) | 3,000 ms + round-trip | Rate-limited |
| AkShare (Sina source) | Web scraping | 3,000–15,000 ms | Source-dependent |
| AkShare (Eastmoney) | Web scraping | 5,000–20,000 ms | Source-dependent |
| TickDB (WebSocket) | Push stream | < 100 ms (tick to client) | Continuous |
The 600x latency differential between a 60-second polling cycle (60,000 ms) and TickDB's sub-100 ms WebSocket stream is not a marginal optimization target. Even AkShare's Sina quotes, at 3,000–15,000 ms, trail the push stream by 30x to 150x. For a market microstructure strategy (detecting order flow imbalance, catching short-term momentum at the open, or identifying quote stuffing), this is the difference between capturing a signal and watching it disappear.
For a trend-following strategy that trades on daily bars, this latency gap is irrelevant. The data architecture must match the strategy time horizon.
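The multiples can be recomputed directly from the benchmark table. A small sketch, treating TickDB's "< 100 ms" as exactly 100 ms (so every ratio is a lower bound) and ignoring the extra round-trip noted for Tushare Pro; the dictionary only restates the table's figures.

```python
# (low_ms, high_ms) latency ranges copied from the benchmark table
latency_ms = {
    "Tushare Free (60s polling)": (60_000, 60_000),
    "Tushare Pro (3s polling)": (3_000, 3_000),
    "AkShare (Sina)": (3_000, 15_000),
    "AkShare (Eastmoney)": (5_000, 20_000),
}
PUSH_BOUND_MS = 100  # upper bound quoted for the WebSocket stream


def ratio_vs_push(low_ms: float, high_ms: float, push_ms: float = PUSH_BOUND_MS):
    """How many times slower a source is than the push stream (low, high)."""
    return low_ms / push_ms, high_ms / push_ms


for name, (low, high) in latency_ms.items():
    r_low, r_high = ratio_vs_push(low, high)
    if r_low == r_high:
        print(f"{name}: {r_low:.0f}x")
    else:
        print(f"{name}: {r_low:.0f}x to {r_high:.0f}x")
```

Only the 60-second polling cycle reaches 600x; the scraped sources land between 30x and 200x.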
Cost-Benefit Analysis: Matching Architecture to Use Case
| Use Case | Recommended Source | Monthly Cost | Latency Profile |
|---|---|---|---|
| End-of-day backtesting | Tushare Pro (free) / AkShare | ¥0 | Daily data only |
| Daily-bar strategy (hold > 4 hours) | AkShare historical | ¥0 | Not real-time relevant |
| Real-time quote monitoring (non-critical) | AkShare (Sina) | ¥0 | 3–15 second lag |
| Short-term alpha (15 min – 2 hour horizon) | Tushare Pro (paid) | ¥100–¥500/month | 3–60 second lag |
| Market microstructure / event-driven | TickDB WebSocket | Free tier available | < 100 ms push |
| Institutional-grade backtesting | TickDB / Wind | Free tier / ¥5,000+/month | Historical + live |
For the individual developer operating on a budget, the pragmatic path is not to pay for the most expensive Level-2 feed. It is to match the data architecture to the strategy time horizon.
A swing trader holding positions for 2–5 days does not need millisecond-level depth data. A 10-second delayed quote from AkShare is sufficient. The mistake is paying ¥2,000/month for Level-2 data and then running a strategy that only reacts to daily closes.
Conversely, a market microstructure developer building an open-auction momentum strategy — where the first 15 minutes of the trading session determine the entire day's edge — cannot compensate for 60-second polling latency by optimizing the alpha model. The data architecture failure precedes every algorithmic decision.
Decision Framework: A Three-Step Evaluation
Step 1: Define the strategy time horizon
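Ask: "What is the minimum holding period for this strategy?" If the answer is less than 30 minutes, real-time data is not optional. If the answer is greater than 4 hours, a daily close bar from a free source may be sufficient. In between, polled quotes with a few seconds of lag are often workable, provided the strategy does not depend on reacting to individual ticks.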
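This rule of thumb can be written down as a small lookup so it lives next to the strategy configuration rather than in someone's head. The sketch uses the 30-minute and 4-hour thresholds from the paragraph above; the middle tier and all names (`DataTier`, `required_tier`) are assumptions introduced for illustration.

```python
from enum import Enum


class DataTier(Enum):
    REALTIME_PUSH = "real-time push stream"
    POLLED_QUOTES = "polled quotes (seconds of lag)"
    DAILY_BARS = "daily bars from a free source"


def required_tier(min_holding_minutes: float) -> DataTier:
    """Map a strategy's minimum holding period to the cheapest adequate data tier."""
    if min_holding_minutes <= 0:
        raise ValueError("min_holding_minutes must be positive")
    if min_holding_minutes < 30:
        return DataTier.REALTIME_PUSH
    if min_holding_minutes > 4 * 60:
        return DataTier.DAILY_BARS
    # Between 30 minutes and 4 hours: an assumed middle ground
    return DataTier.POLLED_QUOTES


for minutes in (10, 90, 3 * 24 * 60):
    print(f"{minutes} min -> {required_tier(minutes).value}")
```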
Step 2: Estimate the data cost as a fraction of expected strategy edge
If your strategy generates 1% per month in excess return, and your data costs ¥1,000/month, then your break-even capital base is ¥100,000 — before transaction costs. If you are trading with ¥20,000, the data cost alone makes the strategy unprofitable.
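The break-even figure is just the data cost divided by the monthly excess return, which makes it easy to rerun for your own numbers. A minimal sketch with hypothetical function names; like the paragraph above, it ignores transaction costs and assumes the excess return scales linearly with capital.

```python
def break_even_capital(monthly_data_cost: float, monthly_excess_return: float) -> float:
    """Capital at which expected excess return exactly covers the data bill.

    monthly_excess_return is a fraction, e.g. 0.01 for 1% per month.
    """
    if monthly_excess_return <= 0:
        raise ValueError("monthly_excess_return must be positive")
    return monthly_data_cost / monthly_excess_return


def net_monthly_edge(capital: float,
                     monthly_data_cost: float,
                     monthly_excess_return: float) -> float:
    """Expected monthly excess profit after paying for data, before trading costs."""
    return capital * monthly_excess_return - monthly_data_cost


print(f"{break_even_capital(1_000, 0.01):,.0f}")        # 100,000
print(f"{net_monthly_edge(20_000, 1_000, 0.01):,.0f}")  # -800
```

At ¥20,000 of capital the strategy earns about ¥200 a month in expected edge and pays ¥1,000 for data, so it loses roughly ¥800 a month before a single trade is costed.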
Step 3: Build a modular data layer
The pragmatic approach for most individual developers is a tiered architecture:
Strategy Layer (your alpha model)
↓
Data Abstraction Layer (symbol normalization, error handling)
↓
┌───────────────┬───────────────┐
│ Real-time │ Historical │
│ WebSocket │ REST │
│ (TickDB) │ (Tushare/AK) │
└───────────────┴───────────────┘
This architecture allows you to use free historical data for backtesting and a real-time WebSocket feed only for live execution — without rewriting your strategy code when you upgrade data sources.
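A minimal sketch of what the abstraction layer might look like follows. Everything here is illustrative: the `Bar`, `HistoricalSource`, `RealtimeSource`, and `DataLayer` names are hypothetical, the in-memory source stands in for a real Tushare or AkShare adapter, and the symbol normalizer uses the simplification that codes starting with 6 trade in Shanghai and all others in Shenzhen.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Protocol


@dataclass(frozen=True)
class Bar:
    symbol: str
    ts: str
    close: float


class HistoricalSource(Protocol):
    def bars(self, symbol: str, interval: str, limit: int) -> List[Bar]: ...


class RealtimeSource(Protocol):
    def subscribe(self, symbols: List[str], on_tick: Callable[[dict], None]) -> None: ...


def normalize_symbol(code: str) -> str:
    """Accept '600519', 'sh600519', or '600519.sh' and return '600519.SH'."""
    raw = code.strip().upper()
    if "." in raw:
        digits, exch = raw.split(".", 1)
    elif raw[:2] in ("SH", "SZ"):
        exch, digits = raw[:2], raw[2:]
    else:
        # Simplifying assumption: 6xxxxx is Shanghai, everything else Shenzhen
        digits, exch = raw, ("SH" if raw.startswith("6") else "SZ")
    if len(digits) != 6 or not digits.isdigit() or exch not in ("SH", "SZ"):
        raise ValueError(f"Unrecognized A-share code: {code!r}")
    return f"{digits}.{exch}"


class DataLayer:
    """The only data interface the strategy layer sees."""

    def __init__(self, historical: HistoricalSource,
                 realtime: Optional[RealtimeSource] = None):
        self._historical = historical
        self._realtime = realtime

    def history(self, symbol: str, interval: str = "1d", limit: int = 100) -> List[Bar]:
        return self._historical.bars(normalize_symbol(symbol), interval, limit)

    def stream(self, symbols: List[str], on_tick: Callable[[dict], None]) -> None:
        if self._realtime is None:
            raise RuntimeError("No real-time source configured")
        self._realtime.subscribe([normalize_symbol(s) for s in symbols], on_tick)


class InMemoryHistory:
    """Stand-in for a real adapter; returns canned bars for demonstration."""

    def bars(self, symbol: str, interval: str, limit: int) -> List[Bar]:
        return [Bar(symbol, f"2024-03-0{i + 1}", 100.0 + i) for i in range(3)][:limit]


layer = DataLayer(historical=InMemoryHistory())
print(layer.history("sh600519", limit=2))
```

Swapping the historical adapter or adding a real-time one changes the constructor call, not the strategy code, which is the point of the layering.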
Supply Chain: A-Share Data Architecture Ecosystem
| Layer | Provider | Cost | Role |
|---|---|---|---|
| Raw exchange feed | SSE / SZSE (Level-2 license) | Institutional only | Original source |
| Licensed distributor | Wind / 同花顺 / 东方财富 | ¥5,000–¥50,000/month | Professional terminals |
| API platform | Tushare Pro | ¥0–¥2,000/month | Retail-friendly API |
| Open-source library | AkShare | Free | Community-driven scraping |
| Real-time WebSocket | TickDB | Free tier available | Production-grade push stream |
| Backtesting data | Tushare / AkShare / TickDB | ¥0–¥200/month | Historical OHLCV |
Closing
The gap between expensive Level-2 feeds and free-but-slow alternatives is not a market failure. It is a structural feature of the A-share ecosystem that rewards developers who match their data architecture to their strategy requirements — rather than buying the most expensive feed and assuming the cost translates into edge.
Chen, the developer from the opening scene, eventually rebuilt his system with a two-tier approach: AkShare for historical backtesting and daily bar analysis, and a WebSocket stream for live execution. His mean-reversion strategy — which had failed for months — became profitable within three weeks. The problem was never the alpha model. It was the data pipeline.
Next Steps
If you are an individual developer building A-share strategies, start with AkShare for historical backtesting to validate your alpha before paying for real-time data. Most strategies that fail in live trading fail not because the alpha is wrong, but because the data architecture cannot support the strategy's time horizon.
If you need sub-second market data for live execution, TickDB offers a free tier for WebSocket access with production-grade reliability features (heartbeat, reconnection, rate-limit handling). Set your TICKDB_API_KEY environment variable, verify symbol availability at /v1/symbols/available, and deploy the code from this article.
If you need extended historical backtesting data for multi-year cross-cycle validation, reach out to enterprise@tickdb.ai for institutional-grade OHLCV coverage.
If you use AI coding assistants, search for and install the tickdb-market-data SKILL in your AI tool's marketplace for streamlined data access in your development workflow.
This article does not constitute investment advice. Markets involve risk; past performance does not guarantee future results. Data latency measurements are representative of typical conditions and may vary based on network topology, geographic location, and exchange load.