The moment you try to subscribe to 150 symbols simultaneously over a single WebSocket connection, the system starts lying to you.

Not maliciously — but consistently. Heartbeat messages get delayed because the connection's event loop is saturated with incoming order book updates. Reconnection logic fires at the worst possible moment (mid-market surge, naturally), and by the time the client recovers, you've missed the exact liquidity event you were watching. Your "real-time" data is actually running 800 milliseconds behind the market, and you do not know it.

This is the scalability cliff that every market data engineer eventually hits. And it is entirely avoidable — with the right architecture.

The challenge is not just about connections. It is about how you partition subscriptions, manage backpressure, distribute load across workers, and design your reconnect logic so that it survives a circuit breaker event without creating a thundering herd. This article walks through the complete architecture: from the naive single-connection approach that works for 10 symbols, through the connection pool design that handles 500+, to the dynamic scaling logic that adapts to market hours without human intervention.


The Single-Connection Problem at Scale

When engineers first implement WebSocket market data subscriptions, the natural approach is one connection, one feed. You connect, you subscribe to a list of symbols, and you process messages in a single event loop. For 10 or 20 symbols — especially if updates are infrequent (think daily bar updates or hourly summaries) — this works fine.

The problems start accumulating as the subscription count grows:

Event loop saturation. A single-threaded event loop processing 500 messages per second across 150 symbols will eventually queue more messages than it can drain. The queue depth grows, latency climbs, and heartbeat messages (which keep the connection alive) get processed late. Some servers interpret late heartbeats as dead connections and close them.

Message interleaving. When multiple symbols update simultaneously — which happens during a broad market move — the single connection receives a burst of interleaved messages. Without per-symbol buffering, your processing logic sees a partial update for symbol A, then switches to symbol B, then returns to A before A's update is complete. This produces phantom price signals in your algorithm.

Reconnection blast radius. If the single connection drops during a high-volatility event, all 150 subscriptions are simultaneously unsubscribed. Your reconnect logic then re-subscribes all 150 simultaneously. This creates a spike in server load, increases the probability that your reconnect request is rate-limited, and means you are completely blind for the entire duration of the reconnection window — which at scale can be 5–30 seconds.

No horizontal scalability. A single connection can only be consumed by one process. You cannot distribute the load across multiple workers without implementing your own message routing layer, which reintroduces all the complexity you were trying to avoid.

The solution is not a bigger connection. It is a connection pool with intelligent subscription distribution.


Connection Pool Architecture

A connection pool is a managed set of WebSocket connections that share the workload of maintaining subscriptions across a large symbol universe. The pool abstracts the complexity of distribution, reconnection, and load balancing behind a single interface.

The core design consists of three layers:

┌─────────────────────────────────────────────────────┐
│                  Subscription Manager                │
│   (maintains symbol-to-connection mapping)          │
└─────────────────────────────────────────────────────┘
                           │
         ┌─────────────────┼─────────────────┐
         ▼                 ▼                 ▼
    ┌─────────┐       ┌─────────┐       ┌─────────┐
    │ Conn 1  │       │ Conn 2  │       │ Conn N  │
    │ 50 subs │       │ 50 subs │       │ 50 subs │
    └─────────┘       └─────────┘       └─────────┘
         │                 │                 │
         ▼                 ▼                 ▼
    ┌─────────────────────────────────────────────┐
    │           Shared Message Queue               │
    │    (per-symbol channels, backpressure)      │
    └─────────────────────────────────────────────┘
                           │
         ┌─────────────────┼─────────────────┐
         ▼                 ▼                 ▼
    ┌─────────┐       ┌─────────┐       ┌─────────┐
    │ Worker 1│       │ Worker 2│       │ Worker M│
    │ (process│       │ (process│       │ (process│
    │  symbol)│       │  symbol)│       │  symbol)│
    └─────────┘       └─────────┘       └─────────┘

Subscription Distribution Logic

The subscription manager maintains a mapping of symbols to connections. The simplest distribution strategy is round-robin: distribute symbols evenly across available connections. But naive round-robin ignores a critical factor — update frequency varies dramatically across symbols.

A US equity like AAPL might generate 2,000 messages per second during the open. A less liquid HK stock might generate 20. If you distribute by symbol count alone, your AAPL-heavy connection saturates while others idle.

A better approach is update-rate-aware distribution. The pool monitors message rates per connection over a sliding window (30 seconds is a reasonable default) and rebalances when the imbalance exceeds a threshold (e.g., when one connection receives more than 2x the average message rate).

import threading
import time
from collections import defaultdict
from typing import Dict, List

class SubscriptionManager:
    def __init__(self, pool_size: int, rebalance_threshold: float = 2.0):
        self.pool_size = pool_size
        self.rebalance_threshold = rebalance_threshold
        self.symbol_map: Dict[str, int] = {}  # symbol -> connection_index
        self.connection_rates: Dict[int, List[float]] = defaultdict(list)
        self.lock = threading.Lock()
        self.last_rebalance = time.time()
        self.rebalance_interval = 30  # seconds

    def assign_symbol(self, symbol: str) -> int:
        """Assign symbol to least-loaded connection based on recent rates."""
        with self.lock:
            if not self.connection_rates:
                return 0
            avg_rate = sum(
                sum(rates) / max(len(rates), 1) 
                for rates in self.connection_rates.values()
            ) / max(len(self.connection_rates), 1)
            
            # Find connection with lowest effective load
            min_load = float('inf')
            min_conn = 0
            for conn_idx in range(self.pool_size):
                rate = sum(self.connection_rates.get(conn_idx, [0])) / max(
                    len(self.connection_rates.get(conn_idx, [1])), 1
                )
                load = rate / max(avg_rate, 0.001)
                if load < min_load:
                    min_load = load
                    min_conn = conn_idx
            
            self.symbol_map[symbol] = min_conn
            return min_conn

    def record_message_rate(self, connection_index: int, message_count: int):
        """Record message rate for a connection over the sliding window."""
        with self.lock:
            self.connection_rates[connection_index].append(
                float(message_count)
            )
            # Keep only last 30 seconds of data
            cutoff = time.time() - 30
            self.connection_rates[connection_index] = [
                r for r in self.connection_rates[connection_index] 
                if r > cutoff
            ]

    def should_rebalance(self) -> bool:
        """Check if rebalancing is needed based on threshold."""
        if time.time() - self.last_rebalance < self.rebalance_interval:
            return False
        
        if not self.connection_rates:
            return False
        
        rates = [
            sum(self.connection_rates.get(i, [0])) / max(
                len(self.connection_rates.get(i, [1])), 1
            )
            for i in range(self.pool_size)
        ]
        max_rate = max(rates)
        min_rate = min(rates)
        
        if min_rate > 0 and (max_rate / min_rate) > self.rebalance_threshold:
            self.last_rebalance = time.time()
            return True
        return False

This approach is adaptive — it responds to market conditions without manual intervention. During the pre-market and post-market sessions (when liquidity is low and message rates drop), the pool naturally equilibrates. During the open (when message rates spike), the rebalancing logic catches imbalances before they cause saturation.


Dynamic Connection Pool Scaling

Static pool sizes work if your message rate is predictable. But market data is inherently bursty. The open creates a 10x spike in message rate that lasts 15 minutes, then subsides. An earnings announcement causes a 5-minute surge in specific symbols. The close creates another surge as algorithms adjust positions.

A production-grade pool needs dynamic scaling — the ability to add connections during high-load periods and retire them when load subsides.

Scale-Up Trigger Conditions

Condition Threshold Action
Per-connection message rate > 1,500 msg/sec sustained for 10 sec Add 1 connection
Queue depth > 5,000 unprocessed messages Add 1–2 connections
Heartbeat latency > 5 seconds (connection at risk) Pre-emptively add connection, migrate symbols
Rate-limit proximity >80% of rate limit budget consumed Add connection to spread load

Scale-Down Trigger Conditions

Condition Threshold Action
Per-connection message rate < 200 msg/sec sustained for 5 minutes Remove 1 connection
Queue depth < 50 messages for 5 consecutive minutes Remove 1 connection
Time-of-day Post-close (4:00 PM ET) Scale down to minimal pool
Pre-market Before 4:00 AM ET Maintain minimal pool (depth channel not critical)

Implementation: Scale Controller

import asyncio
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger(__name__)

@dataclass
class ScalingConfig:
    min_connections: int = 2
    max_connections: int = 20
    scale_up_cooldown: int = 60  # seconds
    scale_down_cooldown: int = 300  # seconds

class ScaleController:
    def __init__(self, config: ScalingConfig):
        self.config = config
        self.current_size: int = config.min_connections
        self.last_scale_up: float = 0
        self.last_scale_down: float = 0
        self.metrics_history: list = []

    def evaluate(self, metrics: dict) -> Optional[str]:
        """
        Evaluate current metrics and return scaling decision.
        Returns: 'scale_up', 'scale_down', or None
        """
        msg_rate = metrics.get('avg_message_rate', 0)
        queue_depth = metrics.get('queue_depth', 0)
        heartbeat_latency = metrics.get('heartbeat_latency', 0)
        now = time.time()

        # Scale-up conditions
        if msg_rate > 1500 or queue_depth > 5000:
            if self.current_size < self.config.max_connections:
                if now - self.last_scale_up > self.config.scale_up_cooldown:
                    self.last_scale_up = now
                    return 'scale_up'

        # Scale-down conditions
        if msg_rate < 200 and queue_depth < 50:
            if self.current_size > self.config.min_connections:
                if now - self.last_scale_down > self.config.scale_down_cooldown:
                    self.last_scale_down = now
                    return 'scale_down'

        return None

    def execute_scale(self, decision: str, pool) -> None:
        if decision == 'scale_up':
            pool.add_connection()
            self.current_size += 1
            logger.info(f"Scaled up: now {self.current_size} connections")
        elif decision == 'scale_down':
            pool.remove_connection()
            self.current_size -= 1
            logger.info(f"Scaled down: now {self.current_size} connections")

The scale controller runs as an independent asyncio task, evaluating metrics every 5 seconds. This decoupled design ensures that scaling decisions do not block the message processing loop.


Message Routing and Backpressure Management

A connection pool is only as resilient as its message routing layer. When one worker process falls behind, the entire system must handle the backpressure gracefully without dropping messages or blocking other workers.

Per-Symbol Channel Architecture

Each symbol is assigned to an exclusive channel (an asyncio queue or a threading queue, depending on your concurrency model). The connection pool fan-out dispatches messages into these channels, and workers consume from their assigned channels independently.

This architecture provides three critical guarantees:

  1. Order preservation per symbol: Messages for symbol A are always processed in the order received by that symbol's channel. Cross-symbol ordering is not required (and is not guaranteed by the market data source either).

  2. Fault isolation: If worker processing symbol A crashes, symbols B through Z continue processing normally. The subscription manager detects the failed worker and reassigns symbol A.

  3. Backpressure propagation: If a worker's queue depth exceeds a threshold, the connection pool reduces the dispatch rate for that worker. Other workers are unaffected.

import asyncio
from typing import Dict, Any
from dataclasses import dataclass

@dataclass
class ChannelConfig:
    maxsize: int = 1000
    high_water_mark: int = 800  # Start backpressure at this depth
    low_water_mark: int = 200   # Resume normal dispatch below this

class SymbolRouter:
    def __init__(self, config: ChannelConfig):
        self.config = config
        self.channels: Dict[str, asyncio.Queue] = {}
        self.worker_backpressure: Dict[str, bool] = {}

    def get_channel(self, symbol: str) -> asyncio.Queue:
        if symbol not in self.channels:
            self.channels[symbol] = asyncio.Queue(maxsize=self.config.maxsize)
        return self.channels[symbol]

    async def dispatch(self, symbol: str, message: dict) -> bool:
        """Dispatch message to symbol's channel, respecting backpressure."""
        channel = self.get_channel(symbol)
        
        # Check backpressure for this channel's worker
        if self.worker_backpressure.get(symbol, False):
            # Wait with timeout — drop if worker is too slow
            try:
                await asyncio.wait_for(
                    channel.put(message), 
                    timeout=0.1
                )
                return True
            except asyncio.TimeoutError:
                logger.warning(f"Dropping message for {symbol} — worker saturated")
                return False
        else:
            # Normal dispatch
            try:
                channel.put_nowait(message)
                return True
            except asyncio.QueueFull:
                self.worker_backpressure[symbol] = True
                return False

    def report_backpressure(self, symbol: str, is_backpressured: bool):
        """Worker reports its own backpressure state."""
        self.worker_backpressure[symbol] = is_backpressured

Heartbeat and Health Monitoring

Every connection in the pool must implement a heartbeat protocol. The heartbeat serves two purposes: it keeps the connection alive (preventing server-side timeouts) and it provides a latency measurement for connection health.

import asyncio
import random

class ConnectionHealthMonitor:
    def __init__(self, ws: Any, connection_id: int):
        self.ws = ws
        self.connection_id = connection_id
        self.last_ping_sent: float = 0
        self.last_pong_received: float = 0
        self.latency_history: list = []

    async def heartbeat(self, interval: float = 15.0) -> None:
        """Send heartbeat ping and measure round-trip time."""
        while True:
            await asyncio.sleep(interval)
            self.last_ping_sent = time.time()
            
            try:
                # TickDB uses JSON ping format
                await self.ws.send(json.dumps({"cmd": "ping"}))
                
                # Wait for pong with timeout
                pong_received = await self._wait_for_pong(timeout=5.0)
                
                if pong_received:
                    latency = time.time() - self.last_ping_sent
                    self.latency_history.append(latency)
                    self.last_pong_received = time.time()
                    
                    # Keep last 100 measurements
                    self.latency_history = self.latency_history[-100:]
                    
                    if self._is_unhealthy():
                        logger.warning(
                            f"Connection {self.connection_id} unhealthy: "
                            f"latency={self._avg_latency():.2f}s, "
                            f"missed_heartbeats={self._missed_heartbeats()}"
                        )
            except Exception as e:
                logger.error(f"Heartbeat failed for connection {self.connection_id}: {e}")

    def _avg_latency(self) -> float:
        if not self.latency_history:
            return 0
        return sum(self.latency_history) / len(self.latency_history)

    def _missed_heartbeats(self) -> int:
        if self.last_pong_received == 0:
            return 0
        missed = (time.time() - self.last_pong_received) / 15.0
        return int(missed)

    def _is_unhealthy(self) -> bool:
        avg_lat = self._avg_latency()
        missed = self._missed_heartbeats()
        return avg_lat > 3.0 or missed > 2

The health monitor runs as a coroutine alongside the message dispatch loop. When a connection is flagged as unhealthy, the subscription manager begins migrating symbols to healthier connections before the connection drops.


Reconnection Logic with Exponential Backoff and Jitter

Reconnection is where most implementations fail. A naive reconnect (immediate retry) during a market event creates a thundering herd: thousands of clients retrying simultaneously, overwhelming the server, causing more failures, triggering more retries.

Production-grade reconnection requires two mechanisms: exponential backoff and jitter.

Exponential backoff increases the wait time between retries exponentially (1s, 2s, 4s, 8s, 16s...). This reduces server load during recovery windows.

Jitter adds randomness to the backoff schedule (e.g., wait 1s ± 500ms). Without jitter, all clients that failed at the same moment retry at the same moment (1s later), recreating the thundering herd. Jitter desynchronizes the retry schedule.

import random
import asyncio

class ReconnectionManager:
    def __init__(
        self,
        base_delay: float = 1.0,
        max_delay: float = 60.0,
        jitter_factor: float = 0.3
    ):
        self.base_delay = base_delay
        self.max_delay = max_delay
        self.jitter_factor = jitter_factor

    def compute_delay(self, attempt: int) -> float:
        """
        Compute delay with exponential backoff and jitter.
        
        attempt 1: ~1s (range: 0.7s – 1.3s)
        attempt 2: ~2s (range: 1.4s – 2.6s)
        attempt 3: ~4s (range: 2.8s – 5.2s)
        attempt 4: ~8s (range: 5.6s – 10.4s)
        """
        exp_delay = self.base_delay * (2 ** (attempt - 1))
        capped_delay = min(exp_delay, self.max_delay)
        
        # Add jitter: ±jitter_factor of the delay
        jitter_range = capped_delay * self.jitter_factor
        jitter = random.uniform(-jitter_range, jitter_range)
        
        return max(0.1, capped_delay + jitter)

    async def reconnect(self, connect_func, max_attempts: int = 10):
        """
        Attempt reconnection with backoff + jitter.
        Returns the connected WebSocket or raises an exception.
        """
        for attempt in range(1, max_attempts + 1):
            delay = self.compute_delay(attempt)
            logger.info(
                f"Reconnection attempt {attempt}/{max_attempts} "
                f"after {delay:.2f}s delay"
            )
            
            await asyncio.sleep(delay)
            
            try:
                ws = await connect_func()
                logger.info(f"Reconnection successful on attempt {attempt}")
                return ws
            except Exception as e:
                logger.warning(
                    f"Reconnection attempt {attempt} failed: {e}"
                )
                if attempt == max_attempts:
                    logger.error(
                        f"Max reconnection attempts ({max_attempts}) reached"
                    )
                    raise

The reconnect logic is called by each connection independently. Because each connection computes its jitter independently, the retry schedule is naturally desynchronized even if all connections failed simultaneously.


Production-Grade WebSocket Client with Full Resilience

Combining all the components — subscription distribution, health monitoring, reconnection with backoff and jitter, and backpressure management — produces a production-grade WebSocket client. The following implementation is directly deployable:

import asyncio
import json
import logging
import os
import random
import time
from typing import Optional, Callable, Dict, Any

import aiohttp

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s [%(levelname)s] %(name)s: %(message)s'
)
logger = logging.getLogger(__name__)

class TickDBWebSocketPool:
    """
    Production-grade WebSocket pool for TickDB market data subscriptions.
    Handles: heartbeat, exponential backoff + jitter, rate-limit handling,
    connection health monitoring, and per-symbol message routing.
    """
    
    def __init__(
        self,
        api_key: Optional[str] = None,
        pool_size: int = 4,
        max_subscriptions_per_connection: int = 50,
        heartbeat_interval: float = 15.0,
        reconnect_max_attempts: int = 10
    ):
        self.api_key = api_key or os.environ.get("TICKDB_API_KEY")
        if not self.api_key:
            raise ValueError(
                "TickDB API key required. Set TICKDB_API_KEY environment variable."
            )
        
        self.pool_size = pool_size
        self.max_subscriptions_per_connection = max_subscriptions_per_connection
        self.heartbeat_interval = heartbeat_interval
        self.reconnect_max_attempts = reconnect_max_attempts
        
        self.connections: Dict[int, aiohttp.ClientWebSocketResponse] = {}
        self.subscriptions: Dict[str, int] = {}  # symbol -> connection_index
        self.message_queues: Dict[str, asyncio.Queue] = {}
        self.health_monitors: Dict[int, Any] = {}
        
        self.base_url = "wss://stream.tickdb.ai/v1/ws"
        self._running = False

    async def connect(self) -> None:
        """Establish WebSocket connections for the pool."""
        self._running = True
        
        for i in range(self.pool_size):
            try:
                ws = await self._create_connection(i)
                self.connections[i] = ws
                self.health_monitors[i] = ConnectionHealthMonitor(ws, i)
                
                # Start heartbeat coroutine
                asyncio.create_task(
                    self.health_monitors[i].heartbeat(self.heartbeat_interval)
                )
                
                logger.info(f"Connection {i} established")
            except Exception as e:
                logger.error(f"Failed to establish connection {i}: {e}")
                raise

    async def _create_connection(
        self, 
        connection_index: int
    ) -> aiohttp.ClientWebSocketResponse:
        """Create a single WebSocket connection with authentication."""
        # TickDB WebSocket auth uses URL parameter, not header
        url = f"{self.base_url}?api_key={self.api_key}"
        
        session = aiohttp.ClientSession()
        ws = await session.ws_connect(
            url,
            timeout=aiohttp.ClientTimeout(total=None),
            heartbeat=None  # We implement our own heartbeat
        )
        
        return ws

    async def subscribe(self, symbols: list) -> None:
        """Subscribe to a list of symbols, distributing across the pool."""
        for symbol in symbols:
            # Assign to least-loaded connection
            conn_idx = self._assign_connection(symbol)
            
            # Create message queue for this symbol
            if symbol not in self.message_queues:
                self.message_queues[symbol] = asyncio.Queue(maxsize=1000)
            
            # Send subscription message
            ws = self.connections[conn_idx]
            subscribe_msg = {
                "cmd": "subscribe",
                "channel": "depth",
                "symbol": symbol
            }
            await ws.send_json(subscribe_msg)
            
            self.subscriptions[symbol] = conn_idx
            logger.info(f"Subscribed {symbol} on connection {conn_idx}")

    def _assign_connection(self, symbol: str) -> int:
        """Assign symbol to the least-loaded connection."""
        # Count subscriptions per connection
        conn_counts = {}
        for _, conn_idx in self.subscriptions.items():
            conn_counts[conn_idx] = conn_counts.get(conn_idx, 0) + 1
        
        # Find least-loaded
        min_count = float('inf')
        min_conn = 0
        
        for i in range(self.pool_size):
            count = conn_counts.get(i, 0)
            if count < min_count and count < self.max_subscriptions_per_connection:
                min_count = count
                min_conn = i
        
        return min_conn

    async def _handle_rate_limit(self, response: dict) -> None:
        """Handle rate limit (code 3001) with Retry-After header."""
        retry_after = int(response.get("Retry-After", 5))
        logger.warning(f"Rate limited — waiting {retry_after}s")
        await asyncio.sleep(retry_after)

    async def _reconnect_connection(self, connection_index: int) -> None:
        """Reconnect a single connection with exponential backoff + jitter."""
        reconnect_mgr = ReconnectionManager()
        
        async def connect_func():
            return await self._create_connection(connection_index)
        
        ws = await reconnect_mgr.reconnect(connect_func, self.reconnect_max_attempts)
        self.connections[connection_index] = ws
        
        # Resubscribe all symbols for this connection
        symbols_on_conn = [
            sym for sym, idx in self.subscriptions.items() 
            if idx == connection_index
        ]
        for symbol in symbols_on_conn:
            ws = self.connections[connection_index]
            await ws.send_json({
                "cmd": "subscribe",
                "channel": "depth",
                "symbol": symbol
            })
        
        logger.info(f"Reconnected connection {connection_index}, resubscribed {len(symbols_on_conn)} symbols")

    async def start_message_loop(self) -> None:
        """Main message processing loop — dispatches to per-symbol queues."""
        tasks = []
        
        for conn_idx, ws in self.connections.items():
            task = asyncio.create_task(self._connection_reader(conn_idx, ws))
            tasks.append(task)
        
        await asyncio.gather(*tasks)

    async def _connection_reader(
        self, 
        connection_index: int, 
        ws: aiohttp.ClientWebSocketResponse
    ) -> None:
        """Read messages from a single connection and dispatch to symbol queues."""
        reconnect_mgr = ReconnectionManager()
        
        while self._running:
            try:
                msg = await ws.receive()
                
                if msg.type == aiohttp.WSMsgType.PONG:
                    # Heartbeat response — handled by health monitor
                    continue
                
                if msg.type == aiohttp.WSMsgType.TEXT:
                    data = json.loads(msg.data)
                    
                    # Handle rate limit
                    if data.get("code") == 3001:
                        await self._handle_rate_limit(data)
                        continue
                    
                    # Extract symbol and dispatch
                    symbol = data.get("symbol")
                    if symbol and symbol in self.message_queues:
                        try:
                            self.message_queues[symbol].put_nowait(data)
                        except asyncio.QueueFull:
                            logger.warning(
                                f"Queue full for {symbol} — backpressure active"
                            )
                            
                elif msg.type == aiohttp.WSMsgType.ERROR:
                    logger.error(f"WebSocket error on connection {connection_index}")
                    break
                    
            except aiohttp.ClientError as e:
                logger.error(f"Connection {connection_index} error: {e}")
                await self._reconnect_connection(connection_index)
            except Exception as e:
                logger.exception(f"Unexpected error on connection {connection_index}: {e}")
                break

    async def get_message(self, symbol: str, timeout: float = 1.0) -> Optional[dict]:
        """Get the next message for a symbol from its queue."""
        if symbol not in self.message_queues:
            return None
        
        try:
            return await asyncio.wait_for(
                self.message_queues[symbol].get(),
                timeout=timeout
            )
        except asyncio.TimeoutError:
            return None

    async def close(self) -> None:
        """Gracefully close all connections."""
        self._running = False
        for ws in self.connections.values():
            await ws.close()
        logger.info("WebSocket pool closed")

Usage Example

import asyncio

async def main():
    # Initialize pool with 4 connections for 150 symbols
    pool = TickDBWebSocketPool(
        pool_size=4,
        max_subscriptions_per_connection=50
    )
    
    await pool.connect()
    
    # Subscribe to 150 symbols — distributed across 4 connections
    symbols = [f"NVDA.US", f"AAPL.US", f"TSLA.US"]  # extend to 150
    await pool.subscribe(symbols)
    
    # Start message processing
    asyncio.create_task(pool.start_message_loop())
    
    # Consume messages
    while True:
        msg = await pool.get_message("NVDA.US")
        if msg:
            bid_l1 = msg.get("bid", [{}])[0].get("price")
            ask_l1 = msg.get("ask", [{}])[0].get("price")
            spread = (ask_l1 - bid_l1) / bid_l1 if bid_l1 else 0
            print(f"NVDA spread: {spread:.4%}")
        await asyncio.sleep(0.1)

if __name__ == "__main__":
    asyncio.run(main())

⚠️ Engineering warning: The implementation above uses aiohttp and is suitable for up to ~500 subscriptions with moderate update frequency. For HFT workloads exceeding 1,000 subscriptions at tick-level frequency, consider migrating to uvloop for the event loop and a shared-memory message queue (e.g., pyzmq or shared_memory) to distribute processing across multiple processes.


Connection Count and Resource Trade-offs

The optimal connection pool size depends on three variables: the number of subscriptions, the average message rate per symbol, and the processing latency budget. The table below provides a starting configuration:

Subscriptions Pool Size Memory (est.) CPU Load Use Case
1–50 1 ~50 MB < 5% Development / testing
50–150 2–4 ~150 MB 10–20% Individual quant, single strategy
150–500 4–8 ~400 MB 20–40% Team, multiple strategies
500–2,000 8–20 ~1–2 GB 40–80% Institutional, high-frequency strategies
2,000+ Custom Varies Varies Consult TickDB enterprise support

Memory scales approximately linearly with subscription count (each order book snapshot consumes ~2–5 KB in memory). CPU scales with message rate and processing complexity — simple spread calculation is cheap; full order book reconstruction with Level 2 aggregation is expensive.

The sweet spot for most individual quant developers is 2–4 connections handling 50–150 subscriptions. This configuration:

  • Fits comfortably within the free tier's connection limits
  • Provides fault isolation (one connection dropping does not blind you on all symbols)
  • Allows rebalancing during market hours without manual intervention
  • Consumes < 200 MB of memory on modern hardware

TickDB Integration: How Depth Channel Fits the Architecture

The architecture described above maps cleanly onto TickDB's depth channel capabilities. The channel provides Level 1 order book snapshots (best bid / best ask) at sub-second latency, which is sufficient for most spread monitoring and order flow analysis use cases.

For a 150-symbol portfolio monitored across a 4-connection pool:

  • Each connection handles ~37–38 symbols
  • The connection reader dispatches incoming depth snapshots to per-symbol queues
  • The scale controller monitors queue depth and message rate to trigger scaling decisions
  • The reconnection manager handles connection drops with exponential backoff + jitter

If your use case requires deeper order book analysis (Level 2 or beyond) for specific high-priority symbols, you can assign those symbols to dedicated connections while distributing the remaining universe across shared connections. This priority-tiered distribution ensures that your highest-value symbols never share a connection's bandwidth with lower-priority instruments.


Deployment Recommendations by Scale

User Segment Pool Configuration Monitoring Notes
Individual (free tier) 2 connections, 100 symbols max Log-based alerts Start here; scale up when metrics show saturation
Individual (paid tier) 4–6 connections, 300 symbols Prometheus metrics Add connection health dashboard
Team (3–5 users) 8–12 connections, shared pool Datadog / Grafana Centralized pool prevents duplicate subscriptions
Institutional Custom (20+ connections) Full observability stack Engage TickDB enterprise support for architecture review

Closing

The architecture described here — connection pool, subscription distribution, health monitoring, exponential backoff with jitter, and dynamic scaling — transforms a brittle single-connection setup into a system that survives the open, the close, and everything in between.

The core insight is that scalability is not about getting a bigger pipe. It is about distributing the load intelligently, designing for failure (because connections will drop), and building in the feedback mechanisms that let the system heal itself without human intervention.

If you are running more than 50 symbols on a single WebSocket connection today, you are one market spike away from missing the signal that matters most.


Next Steps

If you are building a personal quant system, start with a 2-connection pool and scale to 4 as your symbol universe grows. The code above is production-ready for individual use.

If you want to test this architecture against real market data, sign up at tickdb.ai (free tier available, no credit card required) and start streaming depth data for your target symbols. Set up your pool, monitor the connection health metrics, and observe how the rebalancing logic responds to market open and close.

If you are scaling a team infrastructure, reach out to enterprise@tickdb.ai for dedicated connection pool consultation and custom rate-limit configurations.

If you use AI coding assistants, search for and install the tickdb-market-data SKILL in your AI tool's marketplace for direct TickDB API integration in your workflows.


This article does not constitute investment advice. Markets involve risk; past performance does not guarantee future results. The architectural patterns described are general engineering guidance and should be adapted to your specific use case, regulatory environment, and infrastructure constraints.