The moment you try to subscribe to 150 symbols simultaneously over a single WebSocket connection, the system starts lying to you.
Not maliciously — but consistently. Heartbeat messages get delayed because the connection's event loop is saturated with incoming order book updates. Reconnection logic fires at the worst possible moment (mid-market surge, naturally), and by the time the client recovers, you've missed the exact liquidity event you were watching. Your "real-time" data is actually running 800 milliseconds behind the market, and you do not know it.
This is the scalability cliff that every market data engineer eventually hits. And it is entirely avoidable — with the right architecture.
The challenge is not just about connections. It is about how you partition subscriptions, manage backpressure, distribute load across workers, and design your reconnect logic so that it survives a circuit breaker event without creating a thundering herd. This article walks through the complete architecture: from the naive single-connection approach that works for 10 symbols, through the connection pool design that handles 500+, to the dynamic scaling logic that adapts to market hours without human intervention.
The Single-Connection Problem at Scale
When engineers first implement WebSocket market data subscriptions, the natural approach is one connection, one feed. You connect, you subscribe to a list of symbols, and you process messages in a single event loop. For 10 or 20 symbols — especially if updates are infrequent (think daily bar updates or hourly summaries) — this works fine.
The problems start accumulating as the subscription count grows:
Event loop saturation. A single-threaded event loop processing 500 messages per second across 150 symbols will eventually queue more messages than it can drain. The queue depth grows, latency climbs, and heartbeat messages (which keep the connection alive) get processed late. Some servers interpret late heartbeats as dead connections and close them.
Message interleaving. When multiple symbols update simultaneously — which happens during a broad market move — the single connection receives a burst of interleaved messages. Without per-symbol buffering, your processing logic sees a partial update for symbol A, then switches to symbol B, then returns to A before A's update is complete. This produces phantom price signals in your algorithm.
Reconnection blast radius. If the single connection drops during a high-volatility event, all 150 subscriptions are simultaneously unsubscribed. Your reconnect logic then re-subscribes all 150 simultaneously. This creates a spike in server load, increases the probability that your reconnect request is rate-limited, and means you are completely blind for the entire duration of the reconnection window — which at scale can be 5–30 seconds.
No horizontal scalability. A single connection can only be consumed by one process. You cannot distribute the load across multiple workers without implementing your own message routing layer, which reintroduces all the complexity you were trying to avoid.
The solution is not a bigger connection. It is a connection pool with intelligent subscription distribution.
Connection Pool Architecture
A connection pool is a managed set of WebSocket connections that share the workload of maintaining subscriptions across a large symbol universe. The pool abstracts the complexity of distribution, reconnection, and load balancing behind a single interface.
The core design consists of three layers:
┌─────────────────────────────────────────────────────┐
│ Subscription Manager │
│ (maintains symbol-to-connection mapping) │
└─────────────────────────────────────────────────────┘
│
┌─────────────────┼─────────────────┐
▼ ▼ ▼
┌─────────┐ ┌─────────┐ ┌─────────┐
│ Conn 1 │ │ Conn 2 │ │ Conn N │
│ 50 subs │ │ 50 subs │ │ 50 subs │
└─────────┘ └─────────┘ └─────────┘
│ │ │
▼ ▼ ▼
┌─────────────────────────────────────────────┐
│ Shared Message Queue │
│ (per-symbol channels, backpressure) │
└─────────────────────────────────────────────┘
│
┌─────────────────┼─────────────────┐
▼ ▼ ▼
┌─────────┐ ┌─────────┐ ┌─────────┐
│ Worker 1│ │ Worker 2│ │ Worker M│
│ (process│ │ (process│ │ (process│
│ symbol)│ │ symbol)│ │ symbol)│
└─────────┘ └─────────┘ └─────────┘
Subscription Distribution Logic
The subscription manager maintains a mapping of symbols to connections. The simplest distribution strategy is round-robin: distribute symbols evenly across available connections. But naive round-robin ignores a critical factor — update frequency varies dramatically across symbols.
A US equity like AAPL might generate 2,000 messages per second during the open. A less liquid HK stock might generate 20. If you distribute by symbol count alone, your AAPL-heavy connection saturates while others idle.
A better approach is update-rate-aware distribution. The pool monitors message rates per connection over a sliding window (30 seconds is a reasonable default) and rebalances when the imbalance exceeds a threshold (e.g., when one connection receives more than 2x the average message rate).
import threading
import time
from collections import defaultdict
from typing import Dict, List
class SubscriptionManager:
def __init__(self, pool_size: int, rebalance_threshold: float = 2.0):
self.pool_size = pool_size
self.rebalance_threshold = rebalance_threshold
self.symbol_map: Dict[str, int] = {} # symbol -> connection_index
self.connection_rates: Dict[int, List[float]] = defaultdict(list)
self.lock = threading.Lock()
self.last_rebalance = time.time()
self.rebalance_interval = 30 # seconds
def assign_symbol(self, symbol: str) -> int:
"""Assign symbol to least-loaded connection based on recent rates."""
with self.lock:
if not self.connection_rates:
return 0
avg_rate = sum(
sum(rates) / max(len(rates), 1)
for rates in self.connection_rates.values()
) / max(len(self.connection_rates), 1)
# Find connection with lowest effective load
min_load = float('inf')
min_conn = 0
for conn_idx in range(self.pool_size):
rate = sum(self.connection_rates.get(conn_idx, [0])) / max(
len(self.connection_rates.get(conn_idx, [1])), 1
)
load = rate / max(avg_rate, 0.001)
if load < min_load:
min_load = load
min_conn = conn_idx
self.symbol_map[symbol] = min_conn
return min_conn
def record_message_rate(self, connection_index: int, message_count: int):
"""Record message rate for a connection over the sliding window."""
with self.lock:
self.connection_rates[connection_index].append(
float(message_count)
)
# Keep only last 30 seconds of data
cutoff = time.time() - 30
self.connection_rates[connection_index] = [
r for r in self.connection_rates[connection_index]
if r > cutoff
]
def should_rebalance(self) -> bool:
"""Check if rebalancing is needed based on threshold."""
if time.time() - self.last_rebalance < self.rebalance_interval:
return False
if not self.connection_rates:
return False
rates = [
sum(self.connection_rates.get(i, [0])) / max(
len(self.connection_rates.get(i, [1])), 1
)
for i in range(self.pool_size)
]
max_rate = max(rates)
min_rate = min(rates)
if min_rate > 0 and (max_rate / min_rate) > self.rebalance_threshold:
self.last_rebalance = time.time()
return True
return False
This approach is adaptive — it responds to market conditions without manual intervention. During the pre-market and post-market sessions (when liquidity is low and message rates drop), the pool naturally equilibrates. During the open (when message rates spike), the rebalancing logic catches imbalances before they cause saturation.
Dynamic Connection Pool Scaling
Static pool sizes work if your message rate is predictable. But market data is inherently bursty. The open creates a 10x spike in message rate that lasts 15 minutes, then subsides. An earnings announcement causes a 5-minute surge in specific symbols. The close creates another surge as algorithms adjust positions.
A production-grade pool needs dynamic scaling — the ability to add connections during high-load periods and retire them when load subsides.
Scale-Up Trigger Conditions
| Condition | Threshold | Action |
|---|---|---|
| Per-connection message rate | > 1,500 msg/sec sustained for 10 sec | Add 1 connection |
| Queue depth | > 5,000 unprocessed messages | Add 1–2 connections |
| Heartbeat latency | > 5 seconds (connection at risk) | Pre-emptively add connection, migrate symbols |
| Rate-limit proximity | >80% of rate limit budget consumed | Add connection to spread load |
Scale-Down Trigger Conditions
| Condition | Threshold | Action |
|---|---|---|
| Per-connection message rate | < 200 msg/sec sustained for 5 minutes | Remove 1 connection |
| Queue depth | < 50 messages for 5 consecutive minutes | Remove 1 connection |
| Time-of-day | Post-close (4:00 PM ET) | Scale down to minimal pool |
| Pre-market | Before 4:00 AM ET | Maintain minimal pool (depth channel not critical) |
Implementation: Scale Controller
import asyncio
import logging
from dataclasses import dataclass
from typing import Optional
logger = logging.getLogger(__name__)
@dataclass
class ScalingConfig:
min_connections: int = 2
max_connections: int = 20
scale_up_cooldown: int = 60 # seconds
scale_down_cooldown: int = 300 # seconds
class ScaleController:
def __init__(self, config: ScalingConfig):
self.config = config
self.current_size: int = config.min_connections
self.last_scale_up: float = 0
self.last_scale_down: float = 0
self.metrics_history: list = []
def evaluate(self, metrics: dict) -> Optional[str]:
"""
Evaluate current metrics and return scaling decision.
Returns: 'scale_up', 'scale_down', or None
"""
msg_rate = metrics.get('avg_message_rate', 0)
queue_depth = metrics.get('queue_depth', 0)
heartbeat_latency = metrics.get('heartbeat_latency', 0)
now = time.time()
# Scale-up conditions
if msg_rate > 1500 or queue_depth > 5000:
if self.current_size < self.config.max_connections:
if now - self.last_scale_up > self.config.scale_up_cooldown:
self.last_scale_up = now
return 'scale_up'
# Scale-down conditions
if msg_rate < 200 and queue_depth < 50:
if self.current_size > self.config.min_connections:
if now - self.last_scale_down > self.config.scale_down_cooldown:
self.last_scale_down = now
return 'scale_down'
return None
def execute_scale(self, decision: str, pool) -> None:
if decision == 'scale_up':
pool.add_connection()
self.current_size += 1
logger.info(f"Scaled up: now {self.current_size} connections")
elif decision == 'scale_down':
pool.remove_connection()
self.current_size -= 1
logger.info(f"Scaled down: now {self.current_size} connections")
The scale controller runs as an independent asyncio task, evaluating metrics every 5 seconds. This decoupled design ensures that scaling decisions do not block the message processing loop.
Message Routing and Backpressure Management
A connection pool is only as resilient as its message routing layer. When one worker process falls behind, the entire system must handle the backpressure gracefully without dropping messages or blocking other workers.
Per-Symbol Channel Architecture
Each symbol is assigned to an exclusive channel (an asyncio queue or a threading queue, depending on your concurrency model). The connection pool fan-out dispatches messages into these channels, and workers consume from their assigned channels independently.
This architecture provides three critical guarantees:
Order preservation per symbol: Messages for symbol A are always processed in the order received by that symbol's channel. Cross-symbol ordering is not required (and is not guaranteed by the market data source either).
Fault isolation: If worker processing symbol A crashes, symbols B through Z continue processing normally. The subscription manager detects the failed worker and reassigns symbol A.
Backpressure propagation: If a worker's queue depth exceeds a threshold, the connection pool reduces the dispatch rate for that worker. Other workers are unaffected.
import asyncio
from typing import Dict, Any
from dataclasses import dataclass
@dataclass
class ChannelConfig:
maxsize: int = 1000
high_water_mark: int = 800 # Start backpressure at this depth
low_water_mark: int = 200 # Resume normal dispatch below this
class SymbolRouter:
def __init__(self, config: ChannelConfig):
self.config = config
self.channels: Dict[str, asyncio.Queue] = {}
self.worker_backpressure: Dict[str, bool] = {}
def get_channel(self, symbol: str) -> asyncio.Queue:
if symbol not in self.channels:
self.channels[symbol] = asyncio.Queue(maxsize=self.config.maxsize)
return self.channels[symbol]
async def dispatch(self, symbol: str, message: dict) -> bool:
"""Dispatch message to symbol's channel, respecting backpressure."""
channel = self.get_channel(symbol)
# Check backpressure for this channel's worker
if self.worker_backpressure.get(symbol, False):
# Wait with timeout — drop if worker is too slow
try:
await asyncio.wait_for(
channel.put(message),
timeout=0.1
)
return True
except asyncio.TimeoutError:
logger.warning(f"Dropping message for {symbol} — worker saturated")
return False
else:
# Normal dispatch
try:
channel.put_nowait(message)
return True
except asyncio.QueueFull:
self.worker_backpressure[symbol] = True
return False
def report_backpressure(self, symbol: str, is_backpressured: bool):
"""Worker reports its own backpressure state."""
self.worker_backpressure[symbol] = is_backpressured
Heartbeat and Health Monitoring
Every connection in the pool must implement a heartbeat protocol. The heartbeat serves two purposes: it keeps the connection alive (preventing server-side timeouts) and it provides a latency measurement for connection health.
import asyncio
import random
class ConnectionHealthMonitor:
def __init__(self, ws: Any, connection_id: int):
self.ws = ws
self.connection_id = connection_id
self.last_ping_sent: float = 0
self.last_pong_received: float = 0
self.latency_history: list = []
async def heartbeat(self, interval: float = 15.0) -> None:
"""Send heartbeat ping and measure round-trip time."""
while True:
await asyncio.sleep(interval)
self.last_ping_sent = time.time()
try:
# TickDB uses JSON ping format
await self.ws.send(json.dumps({"cmd": "ping"}))
# Wait for pong with timeout
pong_received = await self._wait_for_pong(timeout=5.0)
if pong_received:
latency = time.time() - self.last_ping_sent
self.latency_history.append(latency)
self.last_pong_received = time.time()
# Keep last 100 measurements
self.latency_history = self.latency_history[-100:]
if self._is_unhealthy():
logger.warning(
f"Connection {self.connection_id} unhealthy: "
f"latency={self._avg_latency():.2f}s, "
f"missed_heartbeats={self._missed_heartbeats()}"
)
except Exception as e:
logger.error(f"Heartbeat failed for connection {self.connection_id}: {e}")
def _avg_latency(self) -> float:
if not self.latency_history:
return 0
return sum(self.latency_history) / len(self.latency_history)
def _missed_heartbeats(self) -> int:
if self.last_pong_received == 0:
return 0
missed = (time.time() - self.last_pong_received) / 15.0
return int(missed)
def _is_unhealthy(self) -> bool:
avg_lat = self._avg_latency()
missed = self._missed_heartbeats()
return avg_lat > 3.0 or missed > 2
The health monitor runs as a coroutine alongside the message dispatch loop. When a connection is flagged as unhealthy, the subscription manager begins migrating symbols to healthier connections before the connection drops.
Reconnection Logic with Exponential Backoff and Jitter
Reconnection is where most implementations fail. A naive reconnect (immediate retry) during a market event creates a thundering herd: thousands of clients retrying simultaneously, overwhelming the server, causing more failures, triggering more retries.
Production-grade reconnection requires two mechanisms: exponential backoff and jitter.
Exponential backoff increases the wait time between retries exponentially (1s, 2s, 4s, 8s, 16s...). This reduces server load during recovery windows.
Jitter adds randomness to the backoff schedule (e.g., wait 1s ± 500ms). Without jitter, all clients that failed at the same moment retry at the same moment (1s later), recreating the thundering herd. Jitter desynchronizes the retry schedule.
import random
import asyncio
class ReconnectionManager:
def __init__(
self,
base_delay: float = 1.0,
max_delay: float = 60.0,
jitter_factor: float = 0.3
):
self.base_delay = base_delay
self.max_delay = max_delay
self.jitter_factor = jitter_factor
def compute_delay(self, attempt: int) -> float:
"""
Compute delay with exponential backoff and jitter.
attempt 1: ~1s (range: 0.7s – 1.3s)
attempt 2: ~2s (range: 1.4s – 2.6s)
attempt 3: ~4s (range: 2.8s – 5.2s)
attempt 4: ~8s (range: 5.6s – 10.4s)
"""
exp_delay = self.base_delay * (2 ** (attempt - 1))
capped_delay = min(exp_delay, self.max_delay)
# Add jitter: ±jitter_factor of the delay
jitter_range = capped_delay * self.jitter_factor
jitter = random.uniform(-jitter_range, jitter_range)
return max(0.1, capped_delay + jitter)
async def reconnect(self, connect_func, max_attempts: int = 10):
"""
Attempt reconnection with backoff + jitter.
Returns the connected WebSocket or raises an exception.
"""
for attempt in range(1, max_attempts + 1):
delay = self.compute_delay(attempt)
logger.info(
f"Reconnection attempt {attempt}/{max_attempts} "
f"after {delay:.2f}s delay"
)
await asyncio.sleep(delay)
try:
ws = await connect_func()
logger.info(f"Reconnection successful on attempt {attempt}")
return ws
except Exception as e:
logger.warning(
f"Reconnection attempt {attempt} failed: {e}"
)
if attempt == max_attempts:
logger.error(
f"Max reconnection attempts ({max_attempts}) reached"
)
raise
The reconnect logic is called by each connection independently. Because each connection computes its jitter independently, the retry schedule is naturally desynchronized even if all connections failed simultaneously.
Production-Grade WebSocket Client with Full Resilience
Combining all the components — subscription distribution, health monitoring, reconnection with backoff and jitter, and backpressure management — produces a production-grade WebSocket client. The following implementation is directly deployable:
import asyncio
import json
import logging
import os
import random
import time
from typing import Optional, Callable, Dict, Any
import aiohttp
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s [%(levelname)s] %(name)s: %(message)s'
)
logger = logging.getLogger(__name__)
class TickDBWebSocketPool:
"""
Production-grade WebSocket pool for TickDB market data subscriptions.
Handles: heartbeat, exponential backoff + jitter, rate-limit handling,
connection health monitoring, and per-symbol message routing.
"""
def __init__(
self,
api_key: Optional[str] = None,
pool_size: int = 4,
max_subscriptions_per_connection: int = 50,
heartbeat_interval: float = 15.0,
reconnect_max_attempts: int = 10
):
self.api_key = api_key or os.environ.get("TICKDB_API_KEY")
if not self.api_key:
raise ValueError(
"TickDB API key required. Set TICKDB_API_KEY environment variable."
)
self.pool_size = pool_size
self.max_subscriptions_per_connection = max_subscriptions_per_connection
self.heartbeat_interval = heartbeat_interval
self.reconnect_max_attempts = reconnect_max_attempts
self.connections: Dict[int, aiohttp.ClientWebSocketResponse] = {}
self.subscriptions: Dict[str, int] = {} # symbol -> connection_index
self.message_queues: Dict[str, asyncio.Queue] = {}
self.health_monitors: Dict[int, Any] = {}
self.base_url = "wss://stream.tickdb.ai/v1/ws"
self._running = False
async def connect(self) -> None:
"""Establish WebSocket connections for the pool."""
self._running = True
for i in range(self.pool_size):
try:
ws = await self._create_connection(i)
self.connections[i] = ws
self.health_monitors[i] = ConnectionHealthMonitor(ws, i)
# Start heartbeat coroutine
asyncio.create_task(
self.health_monitors[i].heartbeat(self.heartbeat_interval)
)
logger.info(f"Connection {i} established")
except Exception as e:
logger.error(f"Failed to establish connection {i}: {e}")
raise
async def _create_connection(
self,
connection_index: int
) -> aiohttp.ClientWebSocketResponse:
"""Create a single WebSocket connection with authentication."""
# TickDB WebSocket auth uses URL parameter, not header
url = f"{self.base_url}?api_key={self.api_key}"
session = aiohttp.ClientSession()
ws = await session.ws_connect(
url,
timeout=aiohttp.ClientTimeout(total=None),
heartbeat=None # We implement our own heartbeat
)
return ws
async def subscribe(self, symbols: list) -> None:
"""Subscribe to a list of symbols, distributing across the pool."""
for symbol in symbols:
# Assign to least-loaded connection
conn_idx = self._assign_connection(symbol)
# Create message queue for this symbol
if symbol not in self.message_queues:
self.message_queues[symbol] = asyncio.Queue(maxsize=1000)
# Send subscription message
ws = self.connections[conn_idx]
subscribe_msg = {
"cmd": "subscribe",
"channel": "depth",
"symbol": symbol
}
await ws.send_json(subscribe_msg)
self.subscriptions[symbol] = conn_idx
logger.info(f"Subscribed {symbol} on connection {conn_idx}")
def _assign_connection(self, symbol: str) -> int:
"""Assign symbol to the least-loaded connection."""
# Count subscriptions per connection
conn_counts = {}
for _, conn_idx in self.subscriptions.items():
conn_counts[conn_idx] = conn_counts.get(conn_idx, 0) + 1
# Find least-loaded
min_count = float('inf')
min_conn = 0
for i in range(self.pool_size):
count = conn_counts.get(i, 0)
if count < min_count and count < self.max_subscriptions_per_connection:
min_count = count
min_conn = i
return min_conn
async def _handle_rate_limit(self, response: dict) -> None:
"""Handle rate limit (code 3001) with Retry-After header."""
retry_after = int(response.get("Retry-After", 5))
logger.warning(f"Rate limited — waiting {retry_after}s")
await asyncio.sleep(retry_after)
async def _reconnect_connection(self, connection_index: int) -> None:
"""Reconnect a single connection with exponential backoff + jitter."""
reconnect_mgr = ReconnectionManager()
async def connect_func():
return await self._create_connection(connection_index)
ws = await reconnect_mgr.reconnect(connect_func, self.reconnect_max_attempts)
self.connections[connection_index] = ws
# Resubscribe all symbols for this connection
symbols_on_conn = [
sym for sym, idx in self.subscriptions.items()
if idx == connection_index
]
for symbol in symbols_on_conn:
ws = self.connections[connection_index]
await ws.send_json({
"cmd": "subscribe",
"channel": "depth",
"symbol": symbol
})
logger.info(f"Reconnected connection {connection_index}, resubscribed {len(symbols_on_conn)} symbols")
async def start_message_loop(self) -> None:
"""Main message processing loop — dispatches to per-symbol queues."""
tasks = []
for conn_idx, ws in self.connections.items():
task = asyncio.create_task(self._connection_reader(conn_idx, ws))
tasks.append(task)
await asyncio.gather(*tasks)
async def _connection_reader(
self,
connection_index: int,
ws: aiohttp.ClientWebSocketResponse
) -> None:
"""Read messages from a single connection and dispatch to symbol queues."""
reconnect_mgr = ReconnectionManager()
while self._running:
try:
msg = await ws.receive()
if msg.type == aiohttp.WSMsgType.PONG:
# Heartbeat response — handled by health monitor
continue
if msg.type == aiohttp.WSMsgType.TEXT:
data = json.loads(msg.data)
# Handle rate limit
if data.get("code") == 3001:
await self._handle_rate_limit(data)
continue
# Extract symbol and dispatch
symbol = data.get("symbol")
if symbol and symbol in self.message_queues:
try:
self.message_queues[symbol].put_nowait(data)
except asyncio.QueueFull:
logger.warning(
f"Queue full for {symbol} — backpressure active"
)
elif msg.type == aiohttp.WSMsgType.ERROR:
logger.error(f"WebSocket error on connection {connection_index}")
break
except aiohttp.ClientError as e:
logger.error(f"Connection {connection_index} error: {e}")
await self._reconnect_connection(connection_index)
except Exception as e:
logger.exception(f"Unexpected error on connection {connection_index}: {e}")
break
async def get_message(self, symbol: str, timeout: float = 1.0) -> Optional[dict]:
"""Get the next message for a symbol from its queue."""
if symbol not in self.message_queues:
return None
try:
return await asyncio.wait_for(
self.message_queues[symbol].get(),
timeout=timeout
)
except asyncio.TimeoutError:
return None
async def close(self) -> None:
"""Gracefully close all connections."""
self._running = False
for ws in self.connections.values():
await ws.close()
logger.info("WebSocket pool closed")
Usage Example
import asyncio
async def main():
# Initialize pool with 4 connections for 150 symbols
pool = TickDBWebSocketPool(
pool_size=4,
max_subscriptions_per_connection=50
)
await pool.connect()
# Subscribe to 150 symbols — distributed across 4 connections
symbols = [f"NVDA.US", f"AAPL.US", f"TSLA.US"] # extend to 150
await pool.subscribe(symbols)
# Start message processing
asyncio.create_task(pool.start_message_loop())
# Consume messages
while True:
msg = await pool.get_message("NVDA.US")
if msg:
bid_l1 = msg.get("bid", [{}])[0].get("price")
ask_l1 = msg.get("ask", [{}])[0].get("price")
spread = (ask_l1 - bid_l1) / bid_l1 if bid_l1 else 0
print(f"NVDA spread: {spread:.4%}")
await asyncio.sleep(0.1)
if __name__ == "__main__":
asyncio.run(main())
⚠️ Engineering warning: The implementation above uses aiohttp and is suitable for up to ~500 subscriptions with moderate update frequency. For HFT workloads exceeding 1,000 subscriptions at tick-level frequency, consider migrating to uvloop for the event loop and a shared-memory message queue (e.g., pyzmq or shared_memory) to distribute processing across multiple processes.
Connection Count and Resource Trade-offs
The optimal connection pool size depends on three variables: the number of subscriptions, the average message rate per symbol, and the processing latency budget. The table below provides a starting configuration:
| Subscriptions | Pool Size | Memory (est.) | CPU Load | Use Case |
|---|---|---|---|---|
| 1–50 | 1 | ~50 MB | < 5% | Development / testing |
| 50–150 | 2–4 | ~150 MB | 10–20% | Individual quant, single strategy |
| 150–500 | 4–8 | ~400 MB | 20–40% | Team, multiple strategies |
| 500–2,000 | 8–20 | ~1–2 GB | 40–80% | Institutional, high-frequency strategies |
| 2,000+ | Custom | Varies | Varies | Consult TickDB enterprise support |
Memory scales approximately linearly with subscription count (each order book snapshot consumes ~2–5 KB in memory). CPU scales with message rate and processing complexity — simple spread calculation is cheap; full order book reconstruction with Level 2 aggregation is expensive.
The sweet spot for most individual quant developers is 2–4 connections handling 50–150 subscriptions. This configuration:
- Fits comfortably within the free tier's connection limits
- Provides fault isolation (one connection dropping does not blind you on all symbols)
- Allows rebalancing during market hours without manual intervention
- Consumes < 200 MB of memory on modern hardware
TickDB Integration: How Depth Channel Fits the Architecture
The architecture described above maps cleanly onto TickDB's depth channel capabilities. The channel provides Level 1 order book snapshots (best bid / best ask) at sub-second latency, which is sufficient for most spread monitoring and order flow analysis use cases.
For a 150-symbol portfolio monitored across a 4-connection pool:
- Each connection handles ~37–38 symbols
- The connection reader dispatches incoming depth snapshots to per-symbol queues
- The scale controller monitors queue depth and message rate to trigger scaling decisions
- The reconnection manager handles connection drops with exponential backoff + jitter
If your use case requires deeper order book analysis (Level 2 or beyond) for specific high-priority symbols, you can assign those symbols to dedicated connections while distributing the remaining universe across shared connections. This priority-tiered distribution ensures that your highest-value symbols never share a connection's bandwidth with lower-priority instruments.
Deployment Recommendations by Scale
| User Segment | Pool Configuration | Monitoring | Notes |
|---|---|---|---|
| Individual (free tier) | 2 connections, 100 symbols max | Log-based alerts | Start here; scale up when metrics show saturation |
| Individual (paid tier) | 4–6 connections, 300 symbols | Prometheus metrics | Add connection health dashboard |
| Team (3–5 users) | 8–12 connections, shared pool | Datadog / Grafana | Centralized pool prevents duplicate subscriptions |
| Institutional | Custom (20+ connections) | Full observability stack | Engage TickDB enterprise support for architecture review |
Closing
The architecture described here — connection pool, subscription distribution, health monitoring, exponential backoff with jitter, and dynamic scaling — transforms a brittle single-connection setup into a system that survives the open, the close, and everything in between.
The core insight is that scalability is not about getting a bigger pipe. It is about distributing the load intelligently, designing for failure (because connections will drop), and building in the feedback mechanisms that let the system heal itself without human intervention.
If you are running more than 50 symbols on a single WebSocket connection today, you are one market spike away from missing the signal that matters most.
Next Steps
If you are building a personal quant system, start with a 2-connection pool and scale to 4 as your symbol universe grows. The code above is production-ready for individual use.
If you want to test this architecture against real market data, sign up at tickdb.ai (free tier available, no credit card required) and start streaming depth data for your target symbols. Set up your pool, monitor the connection health metrics, and observe how the rebalancing logic responds to market open and close.
If you are scaling a team infrastructure, reach out to enterprise@tickdb.ai for dedicated connection pool consultation and custom rate-limit configurations.
If you use AI coding assistants, search for and install the tickdb-market-data SKILL in your AI tool's marketplace for direct TickDB API integration in your workflows.
This article does not constitute investment advice. Markets involve risk; past performance does not guarantee future results. The architectural patterns described are general engineering guidance and should be adapted to your specific use case, regulatory environment, and infrastructure constraints.