A production trading system at 2 AM. The order book for 150 symbols needs updating every 500 milliseconds. The single WebSocket connection that worked flawlessly during development is now dropping messages, timing out, and crashing the entire pipeline. The on-call engineer faces an uncomfortable choice: scale vertically until the server runs out of memory, or redesign the entire subscription architecture from scratch.

This scenario is not hypothetical. It is the inevitable endpoint of every real-time data architecture that starts simple and scales without a plan. The gap between "subscribe to one symbol" and "subscribe to 100 symbols with sub-second latency and zero message loss" is not a configuration change. It is an architectural decision.

This article dissects that decision. We will walk through the evolution from single-connection subscriptions to connection-pooled architectures, analyze the trade-offs at each stage, and provide production-grade code that handles the failure modes you will encounter at scale.

The Core Problem: Why 100 Subscriptions Break Single Connections

Understanding why single connections fail under load requires examining three distinct pressure points: message throughput, backpressure handling, and failure blast radius.

Message Throughput Bottlenecks

A single WebSocket connection operates on a single TCP stream. When you subscribe to 100 symbols and each emits 10 messages per second, the connection must handle 1,000 messages per second on one stream. At 50 bytes per message average, this is 50 KB/s — well within any reasonable network capacity. The bottleneck is not bandwidth. It is the event loop.

WebSocket message processing in most runtimes (Node.js event loop, Python asyncio, JavaScript in the browser) is sequential. Each message must be deserialized, parsed, and dispatched before the next message can be processed. When the event loop blocks on a computationally expensive message (say, a depth snapshot with 50 levels), subsequent messages queue up in the kernel receive buffer. If the buffer overflows, the kernel drops packets, and you lose data.

Backpressure and Flow Control

WebSocket includes a built-in flow control mechanism via the ping/pong frame exchange. A receiver can signal that it is overwhelmed by not sending a pong immediately. However, most client libraries implement this incorrectly or not at all. The result is a one-way valve: the server sends data faster than the client can process it, and the connection eventually stalls or resets.

Failure Blast Radius

This is the most insidious problem. When a single connection drops — due to a network hiccup, a server-side restart, or an unhandled exception in the message handler — you lose all 100 subscriptions simultaneously. The reconnection sequence must resubscribe to every symbol, re-establish the order book state, and catch up on any missed data. During this window, which can last 2–10 seconds depending on the server's subscription throttle limits, your system is blind.

For a trading system, 10 seconds of blind operation across 100 positions is not an edge case. It is a weekly occurrence.

Architecture Evolution: Four Stages

The progression from single connection to production-grade connection pooling follows four distinct architectural stages. Each stage solves the problems of the previous stage while introducing new trade-offs.

Stage 1: Single Connection (Development)

┌─────────────────────────────────────────────────────────┐
│                   Application Process                    │
│  ┌─────────────────────────────────────────────────┐    │
│  │            WebSocket Client                      │    │
│  │  ┌─────────────────────────────────────────┐    │    │
│  │  │  Subscription Manager                    │    │    │
│  │  │  - symbol list: [AAPL, TSLA, NVDA...]   │    │    │
│  │  │  - message queue (unbounded)             │    │    │
│  │  └─────────────────────────────────────────┘    │    │
│  └─────────────────────────────────────────────────┘    │
│                         │                               │
│                    single TCP                            │
└─────────────────────────┼───────────────────────────────┘
                          │
                    ┌─────▼─────┐
                    │  TickDB   │
                    │ WebSocket │
                    └───────────┘

Characteristics:

  • One WebSocket connection to the data provider
  • All symbol subscriptions multiplexed over the single stream
  • Single message queue for the entire application
  • Reconnection resubscribes all symbols

When it works: Up to approximately 20–30 symbols with message rates below 5/second per symbol. Development and testing environments.

When it breaks: Beyond 30 symbols or 5 messages/second/symbol, message queue growth becomes unbounded, and the event loop cannot drain messages faster than they arrive.

Stage 2: Connection Multiplexing

┌─────────────────────────────────────────────────────────┐
│                   Application Process                    │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  │
│  │ WS Client #1 │  │ WS Client #2 │  │ WS Client #3 │  │
│  │ symbols:     │  │ symbols:     │  │ symbols:     │  │
│  │ [AAPL-N]     │  │ [NVDA-TSLA]  │  │ [META-GOOGL] │  │
│  └──────────────┘  └──────────────┘  └──────────────┘  │
└─────────────────────────────────────────────────────────┘
              │                │                │
         TCP #1            TCP #2            TCP #3
              │                │                │
        ┌─────┼────────────────┼────────────────┼─────┐
        │     ▼                ▼                ▼     │
        │  ┌─────────────────────────────────────────┐ │
        │  │           Load Balancer                  │ │
        │  └─────────────────────────────────────────┘ │
        │                  │                            │
        │             ┌─────▼─────┐                     │
        │             │  TickDB   │                     │
        │             │ WebSocket │                     │
        │             └───────────┘                     │
        └───────────────────────────────────────────────┘

Characteristics:

  • Multiple WebSocket connections, each subscribing to a subset of symbols
  • Load distribution via round-robin or symbol-hash partitioning
  • Independent reconnection per connection
  • Connection count typically capped by server-side subscription limits

When it works: Up to approximately 100 symbols with message rates below 10/second per symbol. Small-to-medium production deployments.

When it breaks: Static partitioning creates hot connections (high-activity symbols grouped together). Reconnection of one connection still blinds that subset of symbols.

Stage 3: Connection Pooling with Dynamic Scaling

┌─────────────────────────────────────────────────────────┐
│                   Connection Pool Manager                │
│  ┌─────────────────────────────────────────────────┐    │
│  │  Pool State:                                     │    │
│  │  - connections: Map<id, ConnectionState>        │    │
│  │  - symbol_map: Map<symbol, connection_id>        │    │
│  │  - load_score: Map<connection_id, float>         │    │
│  │  - target_load: 0.7 (configurable)               │    │
│  └─────────────────────────────────────────────────┘    │
│                         │                               │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  │
│  │ PooledConn #1│  │ PooledConn #2│  │ PooledConn #3│  │
│  │ healthy      │  │ degraded     │  │ spawning     │  │
│  └──────────────┘  └──────────────┘  └──────────────┘  │
└─────────────────────────────────────────────────────────┘

Characteristics:

  • Pool of WebSocket connections managed as a pool of resources
  • Dynamic rebalancing: symbols migrate between connections based on load
  • Automatic connection spawning when load exceeds threshold
  • Automatic connection teardown during low-activity periods
  • Per-connection health monitoring with circuit breaker behavior

When it works: 100–1,000 symbols with variable message rates. Production systems with SLA requirements.

When it breaks: Rebalancing events create momentary subscription churn. Requires careful handling of partial state during symbol migration.

Stage 4: Hierarchical Pool with Fan-Out Aggregation

┌─────────────────────────────────────────────────────────┐
│                  Global Aggregation Layer               │
│  ┌─────────────────────────────────────────────────┐    │
│  │  Unified Message Bus                             │    │
│  │  - topic: symbol name                            │    │
│  │  - subscriber: downstream consumers              │    │
│  └─────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────┘
         │              │              │
    ┌────▼────┐    ┌────▼────┐    ┌────▼────┐
    │ Zone A  │    │ Zone B  │    │ Zone C  │
    │ Pool    │    │ Pool    │    │ Pool    │
    │ 10 conn │    │ 10 conn │    │ 10 conn │
    └─────────┘    └─────────┘    └─────────┘

Characteristics:

  • Multiple connection pools organized by geographic zone or symbol class
  • Central aggregation layer normalizes and fans out messages
  • Enables cross-zone redundancy and failover
  • Supports different subscription tiers per symbol class

When it works: 1,000+ symbols, multi-region deployments, enterprise SLAs requiring 99.99% uptime.

Implementation complexity: High. Requires investment in monitoring, orchestration, and operational tooling.

Production-Grade Implementation

The following implementation covers Stage 3: a connection pool with dynamic scaling. This is the practical sweet spot for most production systems managing 100–1,000 real-time subscriptions.

Core Data Structures

import asyncio
import time
import random
import logging
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set
from enum import Enum
import os

logger = logging.getLogger(__name__)


class ConnectionHealth(Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"
    RECONNECTING = "reconnecting"
    FAILED = "failed"


@dataclass
class ConnectionState:
    """Tracks the state of a single pooled WebSocket connection."""
    id: str
    websocket: Optional[object] = None
    health: ConnectionHealth = ConnectionHealth.FAILED
    subscribed_symbols: Set[str] = field(default_factory=set)
    load_score: float = 0.0
    last_message_ts: float = field(default_factory=time.time)
    reconnect_attempts: int = 0
    max_reconnect_attempts: int = 10
    
    # Load scoring parameters
    messages_per_second: float = 0.0
    message_buffer_depth: int = 0
    
    def __post_init__(self):
        self.message_timestamps: List[float] = []
    
    def update_load(self, message_count: int, buffer_depth: int):
        """Update load metrics using an exponentially weighted moving average."""
        alpha = 0.3  # Smoothing factor
        self.messages_per_second = (
            alpha * message_count + 
            (1 - alpha) * self.messages_per_second
        )
        self.message_buffer_depth = buffer_depth
        # Load score: combination of message rate and buffer backlog
        self.load_score = (
            min(self.messages_per_second / 100, 1.0) * 0.7 +
            min(buffer_depth / 1000, 1.0) * 0.3
        )
        self.last_message_ts = time.time()


@dataclass
class PoolConfig:
    """Configuration for the connection pool."""
    min_connections: int = 2
    max_connections: int = 10
    target_load: float = 0.7  # Scale up when average load exceeds this
    scale_up_threshold: float = 0.8
    scale_down_threshold: float = 0.3
    symbols_per_connection_target: int = 20
    health_check_interval: float = 10.0  # seconds
    reconnect_base_delay: float = 1.0  # seconds
    reconnect_max_delay: float = 60.0  # seconds


# ⚠️ Production note: For HFT workloads with sub-100ms latency requirements,
# consider replacing asyncio with trio or using a compiled language (Rust/Go).
# Python's GIL creates a hard ceiling on message processing throughput.

Connection Pool Manager

class WebSocketConnectionPool:
    """
    Manages a pool of WebSocket connections with dynamic scaling.
    
    Design principles:
    1. No single connection exceeds its target symbol load
    2. Rebalancing happens incrementally to minimize churn
    3. Health monitoring triggers reconnection before complete failure
    4. Graceful degradation: new subscriptions queued if pool is at capacity
    """
    
    def __init__(
        self,
        api_key: str,
        ws_url: str,
        config: Optional[PoolConfig] = None
    ):
        self.api_key = api_key or os.environ.get("TICKDB_API_KEY")
        if not self.api_key:
            raise ValueError(
                "TickDB API key required. Set TICKDB_API_KEY environment variable."
            )
        
        self.ws_url = ws_url
        self.config = config or PoolConfig()
        self.connections: Dict[str, ConnectionState] = {}
        self.symbol_to_connection: Dict[str, str] = {}
        self.subscription_queue: asyncio.Queue = asyncio.Queue()
        self._running = False
        self._lock = asyncio.Lock()
        
        # Rate limiting state
        self._rate_limit_remaining: int = 100
        self._rate_limit_reset: float = 0
    
    async def start(self):
        """Initialize the pool with minimum connections."""
        self._running = True
        
        # Spawn initial connections
        for i in range(self.config.min_connections):
            conn_id = f"conn-{i}"
            await self._spawn_connection(conn_id)
        
        # Start background tasks
        asyncio.create_task(self._health_monitor())
        asyncio.create_task(self._rebalancer())
        asyncio.create_task(self._subscription_processor())
        
        logger.info(
            f"Connection pool started with {self.config.min_connections} connections"
        )
    
    async def _spawn_connection(self, conn_id: str) -> ConnectionState:
        """Create and register a new WebSocket connection."""
        conn = ConnectionState(id=conn_id)
        self.connections[conn_id] = conn
        
        try:
            # WebSocket URL with API key as query parameter (not header)
            # This matches TickDB's WebSocket authentication spec
            ws_url = f"{self.ws_url}?api_key={self.api_key}"
            
            # Using aiohttp for production WebSocket handling
            import aiohttp
            async with aiohttp.ClientSession() as session:
                conn.websocket = await session.ws_connect(
                    ws_url,
                    heartbeat=15,  # ping/pong heartbeat interval
                    timeout=aiohttp.ClientTimeout(total=30)
                )
                conn.health = ConnectionHealth.HEALTHY
                conn.reconnect_attempts = 0
                
                # Start message consumer for this connection
                asyncio.create_task(
                    self._connection_consumer(conn_id, conn.websocket)
                )
                
                logger.info(f"Connection {conn_id} established")
        
        except Exception as e:
            logger.error(f"Failed to spawn connection {conn_id}: {e}")
            conn.health = ConnectionHealth.FAILED
            await self._schedule_reconnect(conn_id)
        
        return conn
    
    async def _connection_consumer(self, conn_id: str, websocket):
        """
        Dedicated consumer for a single connection's message stream.
        Runs in its own task to prevent cross-connection message blocking.
        """
        async for msg in websocket:
            async with self._lock:
                conn = self.connections.get(conn_id)
                if not conn or conn.health != ConnectionState.HEALTHY:
                    break
            
            if msg.type == aiohttp.WSMsgType.PONG:
                # Heartbeat acknowledged — connection is alive
                continue
            
            elif msg.type == aiohttp.WSMsgType.TEXT:
                await self._process_message(conn_id, msg.data)
            
            elif msg.type == aiohttp.WSMsgType.ERROR:
                logger.error(f"WebSocket error on {conn_id}: {msg.data}")
                async with self._lock:
                    self.connections[conn_id].health = ConnectionState.RECONNECTING
                await self._schedule_reconnect(conn_id)
                break
            
            elif msg.type == aiohttp.WSMsgType.CLOSE:
                logger.warning(f"Connection {conn_id} closed by server")
                async with self._lock:
                    self.connections[conn_id].health = ConnectionState.RECONNECTING
                await self._schedule_reconnect(conn_id)
                break
    
    async def _process_message(self, conn_id: str, raw_data: str):
        """Parse and dispatch a single message."""
        conn = self.connections.get(conn_id)
        if not conn:
            return
        
        try:
            import json
            data = json.loads(raw_data)
            
            # Update load metrics
            now = time.time()
            conn.message_timestamps.append(now)
            # Count messages in last second
            recent_count = sum(
                1 for ts in conn.message_timestamps
                if now - ts < 1.0
            )
            conn.update_load(recent_count, 0)  # Buffer depth tracked separately
            
            # Emit to symbol-specific handlers
            symbol = data.get("symbol")
            if symbol:
                await self._emit_to_symbol_subscribers(symbol, data)
        
        except json.JSONDecodeError as e:
            logger.warning(f"Invalid JSON on {conn_id}: {e}")
        except Exception as e:
            logger.error(f"Message processing error on {conn_id}: {e}")
    
    async def _emit_to_symbol_subscribers(self, symbol: str, data: dict):
        """Fan out message to all subscribers of this symbol."""
        # Implementation depends on your downstream consumer pattern
        # Could use asyncio.Event, pub/sub, or a message queue
        pass
    
    async def _schedule_reconnect(self, conn_id: str):
        """Schedule reconnection with exponential backoff and jitter."""
        conn = self.connections.get(conn_id)
        if not conn:
            return
        
        conn.reconnect_attempts += 1
        
        if conn.reconnect_attempts > conn.max_reconnect_attempts:
            logger.error(
                f"Connection {conn_id} exceeded max reconnection attempts. "
                f"Marking as failed."
            )
            conn.health = ConnectionState.FAILED
            return
        
        # Exponential backoff with jitter
        base_delay = self.config.reconnect_base_delay
        max_delay = self.config.reconnect_max_delay
        delay = min(base_delay * (2 ** conn.reconnect_attempts), max_delay)
        jitter = random.uniform(0, delay * 0.1)
        total_delay = delay + jitter
        
        logger.info(
            f"Scheduling reconnect for {conn_id} in {total_delay:.2f}s "
            f"(attempt {conn.reconnect_attempts}/{conn.max_reconnect_attempts})"
        )
        
        await asyncio.sleep(total_delay)
        
        # Re-spawn the connection
        async with self._lock:
            conn.health = ConnectionState.HEALTHY
        await self._spawn_connection(conn_id)
    
    async def subscribe(self, symbol: str):
        """Subscribe to a symbol, routing to the least-loaded connection."""
        # Queue-based to handle burst subscriptions without blocking
        await self.subscription_queue.put(symbol)
    
    async def _subscription_processor(self):
        """
        Background task that processes subscription requests from the queue.
        Implements batching and load-aware routing.
        """
        pending_symbols: List[str] = []
        
        while self._running:
            # Collect subscriptions for batch processing (every 100ms)
            try:
                symbol = await asyncio.wait_for(
                    self.subscription_queue.get(),
                    timeout=0.1
                )
                pending_symbols.append(symbol)
            except asyncio.TimeoutError:
                pass  # Process any pending symbols
            
            if pending_symbols:
                await self._route_subscriptions(pending_symbols)
                pending_symbols = []
    
    async def _route_subscriptions(self, symbols: List[str]):
        """Route symbols to appropriate connections based on current load."""
        for symbol in symbols:
            async with self._lock:
                # Check if already subscribed
                if symbol in self.symbol_to_connection:
                    continue
                
                # Find least-loaded healthy connection
                eligible = [
                    (cid, conn) for cid, conn in self.connections.items()
                    if conn.health == ConnectionState.HEALTHY
                ]
                
                if not eligible:
                    logger.warning(f"No healthy connections for {symbol}, queuing")
                    await self.subscription_queue.put(symbol)
                    continue
                
                # Select connection with lowest load
                target_conn_id, target_conn = min(
                    eligible,
                    key=lambda x: x[1].load_score
                )
                
                # Check if scaling is needed
                avg_load = sum(c.load_score for _, c in eligible) / len(eligible)
                if (
                    avg_load > self.config.scale_up_threshold and
                    len(self.connections) < self.config.max_connections
                ):
                    # Scale up: create new connection for this subscription
                    new_id = f"conn-{len(self.connections)}"
                    await self._spawn_connection(new_id)
                    target_conn_id = new_id
                    logger.info(f"Scaled up: new connection {new_id}")
                
                # Subscribe on target connection
                await self._send_subscription(target_conn_id, symbol)
                self.symbol_to_connection[symbol] = target_conn_id
                target_conn.subscribed_symbols.add(symbol)
                
                logger.debug(
                    f"Subscribed {symbol} to {target_conn_id} "
                    f"(load: {target_conn.load_score:.2f})"
                )
    
    async def _send_subscription(self, conn_id: str, symbol: str):
        """Send a subscription command to a specific connection."""
        conn = self.connections.get(conn_id)
        if not conn or not conn.websocket:
            raise RuntimeError(f"Connection {conn_id} not available")
        
        subscription_msg = {
            "cmd": "subscribe",
            "symbol": symbol,
            "channel": "depth"  # TickDB depth channel for order book
        }
        
        await conn.websocket.send_json(subscription_msg)
    
    async def _health_monitor(self):
        """
        Periodic health check: detect stale connections and trigger recovery.
        """
        while self._running:
            await asyncio.sleep(self.config.health_check_interval)
            
            async with self._lock:
                for conn_id, conn in self.connections.items():
                    now = time.time()
                    stale_seconds = now - conn.last_message_ts
                    
                    # Connection with no messages for >30 seconds is suspicious
                    if stale_seconds > 30 and conn.health == ConnectionState.HEALTHY:
                        logger.warning(
                            f"Connection {conn_id} stale ({stale_seconds:.0f}s "
                            f"since last message)"
                        )
                        conn.health = ConnectionState.DEGRADED
                    
                    # Health check: send ping and expect pong within 5 seconds
                    if conn.websocket and conn.health in (
                        ConnectionState.HEALTHY,
                        ConnectionState.DEGRADED
                    ):
                        try:
                            await asyncio.wait_for(
                                conn.websocket.ping(),
                                timeout=5.0
                            )
                        except asyncio.TimeoutError:
                            logger.warning(
                                f"Ping timeout on {conn_id}, scheduling reconnect"
                            )
                            conn.health = ConnectionState.RECONNECTING
                            asyncio.create_task(self._schedule_reconnect(conn_id))
    
    async def _rebalancer(self):
        """
        Periodically rebalances symbol assignments to even out load.
        Runs every 60 seconds to avoid excessive churn.
        """
        while self._running:
            await asyncio.sleep(60)
            
            async with self._lock:
                await self._execute_rebalance()
    
    async def _execute_rebalance(self):
        """Move symbols from high-load to low-load connections."""
        if len(self.connections) < 2:
            return
        
        # Calculate average load
        loads = [
            (cid, conn.load_score, len(conn.subscribed_symbols))
            for cid, conn in self.connections.items()
            if conn.health == ConnectionState.HEALTHY
        ]
        
        if not loads:
            return
        
        avg_load = sum(l[1] for l in loads) / len(loads)
        
        # Find overloaded and underloaded connections
        overloaded = [
            (cid, score, count) for cid, score, count in loads
            if score > avg_load * 1.3 and count > 1
        ]
        underloaded = [
            (cid, score, count) for cid, score, count in loads
            if score < avg_load * 0.7
        ]
        
        for hot_id, hot_score, hot_count in overloaded:
            for cold_id, _, cold_count in underloaded:
                if hot_count <= self.config.symbols_per_connection_target:
                    break
                
                # Migrate one symbol from hot to cold
                hot_conn = self.connections[hot_id]
                symbol_to_move = next(iter(hot_conn.subscribed_symbols))
                
                hot_conn.subscribed_symbols.remove(symbol_to_move)
                del self.symbol_to_connection[symbol_to_move]
                
                await self._send_subscription(cold_id, symbol_to_move)
                cold_conn = self.connections[cold_id]
                cold_conn.subscribed_symbols.add(symbol_to_move)
                self.symbol_to_connection[symbol_to_move] = cold_id
                
                logger.info(
                    f"Rebalanced {symbol_to_move} from {hot_id} to {cold_id}"
                )
                break  # One migration per rebalance cycle
    
    async def unsubscribe(self, symbol: str):
        """Unsubscribe from a symbol."""
        async with self._lock:
            conn_id = self.symbol_to_connection.get(symbol)
            if not conn_id:
                return
            
            conn = self.connections.get(conn_id)
            if conn and conn.websocket:
                await conn.websocket.send_json({
                    "cmd": "unsubscribe",
                    "symbol": symbol
                })
            
            conn.subscribed_symbols.discard(symbol)
            del self.symbol_to_connection[symbol]
    
    async def stop(self):
        """Graceful shutdown: close all connections and cancel tasks."""
        self._running = False
        
        async with self._lock:
            for conn_id, conn in self.connections.items():
                if conn.websocket:
                    await conn.websocket.close()
        
        logger.info("Connection pool stopped")

Usage Example

import asyncio
import os

async def main():
    # Initialize pool with TickDB WebSocket endpoint
    pool = WebSocketConnectionPool(
        api_key=os.environ.get("TICKDB_API_KEY"),
        ws_url="wss://api.tickdb.ai/v1/ws/market",
        config=PoolConfig(
            min_connections=3,
            max_connections=8,
            target_load=0.6,
            symbols_per_connection_target=25
        )
    )
    
    await pool.start()
    
    # Subscribe to a basket of tech stocks
    symbols = [
        "AAPL.US", "MSFT.US", "GOOGL.US", "NVDA.US", "TSLA.US",
        "META.US", "AMZN.US", "AMD.US", "INTC.US", "CRM.US",
        "ORCL.US", "ADBE.US", "NFLX.US", "PYPL.US", "COIN.US"
    ]
    
    for symbol in symbols:
        await pool.subscribe(symbol)
    
    # Keep running — messages processed in background
    await asyncio.Event().wait()

if __name__ == "__main__":
    asyncio.run(main())

Performance Trade-offs: Memory, CPU, and Connection Count

Designing a connection pool requires explicit decisions about where to accept trade-offs. There is no universally optimal configuration. The table below summarizes the trade-offs at each dimension.

Dimension Low connection count High connection count
Memory per connection Higher (larger message buffers) Lower (smaller per-connection buffers)
CPU overhead Lower (fewer event loop tasks) Higher (more context switching)
Failure blast radius Larger (more symbols per connection) Smaller (fewer symbols per connection)
Server-side limits Safer (fewer open connections) Risk of hitting provider limits
Rebalancing cost Lower frequency Higher frequency (more connections)
Recommended for Stable, predictable symbol sets Dynamic, bursty subscription patterns

Sizing Formula

A practical starting point for connection pool sizing:

optimal_connections = ceil(
    total_symbols / symbols_per_connection_target
) + failover_buffer

Where:

  • total_symbols: Maximum number of concurrent subscriptions
  • symbols_per_connection_target: Target symbols per connection (typically 20–50 depending on message rate)
  • failover_buffer: 1–2 connections for redundancy (handles one connection failure without rebalancing)

For 150 symbols with 25 symbols/connection target and 1 failover buffer:

optimal_connections = ceil(150 / 25) + 1 = 7 + 1 = 8 connections

Memory Per Connection

Each WebSocket connection maintains:

  • TCP receive buffer (typically 16–64 KB)
  • WebSocket frame buffer (varies by library)
  • Application-level message queue (unbounded in naive implementations)

A well-designed pool caps the per-connection queue depth and drops or backpressures when the limit is exceeded. The recommended maximum queue depth is 1,000 messages per connection. Beyond this, the memory growth becomes unbounded during processing spikes.

CPU Considerations

Python's asyncio event loop is single-threaded. Every await point is a context switch, but the GIL means only one task executes Python bytecode at a time. For message processing rates above 5,000 messages/second, Python's asyncio becomes a bottleneck.

Warning: If your use case requires sub-50ms processing latency at high message rates (50+ symbols, 20+ messages/second/symbol), consider:

  1. Offloading parsing to C extension: Use orjson for JSON parsing (10x faster than stdlib)
  2. Process-per-connection: Run each connection in its own process with shared memory for state
  3. Compiled language rewrite: Rust with tokio or Go with channels

TickDB Integration: Depth Channel at Scale

For order book depth data specifically, TickDB's WebSocket API supports the depth channel with up to 10 levels of order book depth per symbol. At scale, this creates a specific optimization opportunity: depth deltas.

The full depth snapshot (L1–L10) on every update is expensive. A 10-level depth update at 10 updates/second across 100 symbols generates 10,000 message parse operations per second. For production systems, consider:

  1. Subscribe at L1 only for high-frequency symbols (hot path)
  2. Subscribe at L5 or L10 for mid-frequency analysis
  3. Use the kline channel for lower-frequency strategies that do not need real-time depth
# Example: Tiered subscription strategy
async def setup_tiered_subscriptions(pool: WebSocketConnectionPool):
    """
    Tier 1: High-activity symbols — L1 depth, high-frequency updates
    Tier 2: Medium-activity symbols — L5 depth, medium-frequency
    Tier 3: Low-activity symbols — L1 depth, low-frequency
    """
    
    tier1_symbols = ["BTC.USDT", "ETH.USDT"]  # Crypto: high volume
    tier2_symbols = ["AAPL.US", "NVDA.US", "TSLA.US"]  # High-volume equities
    tier3_symbols = ["SPY.US", "QQQ.US"]  # ETFs: lower volume
    
    for symbol in tier1_symbols:
        await pool.subscribe(symbol, channel="depth", depth_level=1)
    
    for symbol in tier2_symbols:
        await pool.subscribe(symbol, channel="depth", depth_level=5)
    
    for symbol in tier3_symbols:
        await pool.subscribe(symbol, channel="depth", depth_level=1)

Conclusion

Scaling WebSocket subscriptions from one connection to a dynamic connection pool is not a feature addition. It is an architectural shift that touches every layer of your real-time data system: how messages are routed, how failures are contained, how load is measured, and how the system adapts to changing demand.

The implementation provided in this article covers the core patterns — dynamic scaling, health monitoring, rebalancing, and graceful reconnection — that form the foundation of a production-grade subscription architecture. Adapt it to your specific provider's API semantics, your language runtime's constraints, and your SLA requirements.

The 2 AM incident described at the opening of this article is preventable. It requires investing in the architecture before the crisis, not during it.

Next Steps

If you are building a real-time data pipeline and evaluating data providers, visit tickdb.ai to explore the WebSocket API, check connection limits, and review the depth channel documentation.

If you want to run this connection pool code against TickDB:

  1. Sign up at tickdb.ai (free, no credit card required)
  2. Generate an API key in the dashboard
  3. Set the TICKDB_API_KEY environment variable
  4. Copy the connection pool implementation above and customize the PoolConfig for your symbol count

If you are processing high-frequency order book data, consider the tickdb-market-data SKILL on ClawHub, which includes pre-built connection pooling templates optimized for TickDB's depth channel.

If you need institutional-grade data with 10+ years of historical OHLCV for backtesting your strategy before deploying the real-time pipeline, reach out to enterprise@tickdb.ai for volume pricing and SLA guarantees.


This article does not constitute investment advice. Real-time market data systems involve technical complexity; ensure thorough testing in a sandbox environment before production deployment.