"The printed trade looks clean. A 100-share print at $150.00 on NYSE. But what if 60,000 shares moved through an off-exchange venue 200 milliseconds earlier — invisible to the tape, audible only in the order book depth?"

In modern equity markets, the displayed trade is often the least interesting trade. A significant portion of U.S. equity volume — estimates range from 35% to 50% depending on the study — flows through venues that do not immediately report to the consolidated tape. These are dark pools, alternative trading systems (ATS), and internalization engines operated by broker-dealers. When this off-exchange activity crosses the tape, it carries sale conditions that distinguish it from standard exchange prints.

Simultaneously, a separate class of trades — odd-lots, defined as orders smaller than one standard board lot (typically 100 shares for U.S. equities) — introduces noise into aggregated price data. An 11-share print at $150.01 does not constitute a meaningful level change, yet naive K-line aggregation treats it identically to a 50,000-share block trade.

This article dissects the mechanics of dark pool identification and odd-lot filtering using tick-level trade data. We examine how sale condition codes encode venue and execution characteristics, how to parse dark pool prints from consolidated tape feeds, and how odd-lot trades distort K-line aggregation — with production-ready code demonstrating the full pipeline.


1. The Anatomy of a Tick Trade

Before we can identify dark pools or filter odd-lots, we need to understand what a tick-level trade record actually contains. A consolidated U.S. equity trade record from the Securities Information Processor (SIP) includes fields that go well beyond price and size.

1.1 Standard Trade Record Fields

Field Description Example
symbol Ticker symbol AAPL
timestamp Exchange timestamp (nanoseconds) 1715612345000000000
price Execution price 150.25
size Number of shares 100
exchange Exchange code (N=NYSE, Q=NASDAQ, A=AMEX, etc.) N
sale_condition Trade reporting qualifier (string) @ T
corr Trade correction indicator 0
participant_timestamp Participant internal timestamp 1715612345123456789

The critical field for dark pool identification is sale_condition. This string of ASCII characters is a bitmapped qualifier that tells you exactly what kind of trade occurred.

1.2 Understanding Sale Condition Codes

The SIP uses a standardized set of sale condition codes defined in the CTA/CQS Technical Specifications. The most relevant codes for dark pool and odd-lot analysis are:

Code Name Meaning
@ Regular Sale Standard exchange execution
A Acquisition Trade as a result of a acquisition
B Bunched Bunched trade (off-cycle)
C Cash Sale Same-day settlement
D Distribution Distribution of shares
E Automatic Execution Electronic, no human intervention
F Opening Print Official opening transaction
G Derivatively Priced Priced based on another security
H Price Variation Price differs from last sale
`I Regular trade with no applicable special condition
K Rule 155.3 (odd-lot differential)
L Sold Last Trade reported late
M Close Price Official closing transaction
N Next Day Settlement T+1
O Opening Price Opening price trade
P Prior Reference Price Prior day's closing
Q Market Center Open Opening of market center
R Seller Seller's option (additional time)
S Split Trade Two priced transactions
T Schedule Trade CTA/CQS scheduled trade
U Dark Pool / ATS Off-exchange print from ATS
V Contingent Trade Contingent upon another event
`W Average Price Trade
X Cross Trade Opposite orders matched
Y Yellow Flag Odd-lot trade
Z End of Day

The code @ followed by T indicates a regular sale that is also a scheduled CTA trade. The critical code for dark pool identification is U, which marks a trade that was executed off-exchange and printed to the tape through an ATS.


2. Dark Pool Identification: Parsing the U Code and Its Variants

2.1 What Is a Dark Pool?

A dark pool is a privately organized exchange or trading venue where participants can trade securities without pre-trade transparency. Unlike the lit exchanges (NYSE, NASDAQ, CBOE), dark pools do not display order books publicly. Trades execute in the dark, and only the resulting print is reported to the consolidated tape — with a delay that can range from milliseconds to seconds.

Dark pools serve legitimate purposes: large institutional orders can be worked without moving the market. But they also introduce opacity that matters for quant researchers. A large dark pool print that crosses the tape after a 500ms delay will look like a sudden volume spike on your backtest — unless you know how to identify and label it.

2.2 Identifying Dark Pool Trades

The primary identifier is the U sale condition code. However, dark pool activity can also be inferred from other signals:

Primary method: The U code in sale_condition field.

Secondary signals:

  • Exchange code D or P (some venues use these)
  • Trade size exceeding the odd-lot threshold but with no visible market impact
  • Timestamp gaps between the participant_timestamp and the SIP timestamp exceeding 100ms
  • Special conditions: B (bunched) and S (split trade) often accompany dark pool prints

2.3 The Trade-Through Problem

One of the most important nuances in dark pool analysis is the trade-through problem. When a dark pool executes a trade, it is required to match or improve on the National Best Bid and Offer (NBBO) at the time of execution. However, the print may appear on the tape after the NBBO has moved.

This means that a dark pool print marked at $150.00 may have been executed when the NBBO was $149.95–$150.05. The dark pool provided price improvement — executing at the mid — but the tape records $150.00, which now looks like a cross of the spread after the fact.

For quantitative strategies that use trade prints to infer order flow, this timing offset is a source of significant noise.

2.4 Dark Pool Classification by Venue Type

Not all dark pools are equal. The universe of off-exchange venues includes:

Venue Type Examples Characteristics
Broker-dealer internalization Citadel Securities, Virtu Flow from retail order routing; high frequency
Independent ATS IEX, Liquidnet, ITG POSIT Institutional block trading; lower frequency
Exchange-listed dark pools NYSE Arca, NASDAQ Dark Operated by lit exchanges; subject to Reg NMS

For tick data analysis, it is useful to track the distribution of prints by venue type, as each behaves differently in terms of size, timing, and price impact.


3. Odd-Lot Trades: The Noisy Minority

3.1 Definition and Prevalence

An odd-lot is any order for fewer than 100 shares of a U.S. equity. Odd-lot trades are extremely common — they constitute roughly 15–25% of all trade prints by count, though they represent a much smaller percentage of total volume by shares.

The Y sale condition code (Yellow Flag) identifies odd-lot trades. Some data feeds also use the K code for trades with odd-lot differentials.

3.2 Why Odd-Lots Distort Aggregated Data

When you aggregate tick data into K-line candles, every trade contributes to the open, high, low, close, and volume. A single odd-lot print at an extreme price — even if it is immediately reversed — will expand the candle's high-low range.

Consider this scenario:

Trade 1:  50,000 shares @ $150.00  (standard round lot)
Trade 2:      11 shares @ $150.35  (odd-lot, 35 cents above mid)
Trade 3:  30,000 shares @ $150.00  (standard round lot)

A naive 1-minute candle built from these three trades shows:

  • Open: $150.00
  • High: $150.35 (driven by 11 shares)
  • Low: $150.00
  • Close: $150.00
  • Volume: 80,011 shares

The high of $150.35 is a phantom level. No institutional participant traded at that price. Filtering out odd-lots produces a candle with a high of $150.00 — which accurately reflects where actual market-moving volume occurred.

3.3 The Odd-Lot-to-Midpoint Artifact

Odd-lot trades frequently execute at or near the NBBO midpoint. This is not coincidence: broker internalization engines match retail odd-lot orders at the midpoint, capturing half the spread as profit. These prints generate a spurious pattern in tick data: a series of odd-lot prints clustering tightly around the midpoint with no directional signal.

For mean-reversion strategies built on tick data, these midpoint artifacts can generate false signals that look like institutional accumulation but are actually just retail order flow being internalized.


4. Production-Grade Pipeline: Dark Pool Detection and Odd-Lot Filtering

The following code implements a full tick data processing pipeline that:

  1. Parses raw SIP trade records
  2. Classifies trades by venue type (lit exchange, dark pool, internalizer)
  3. Flags odd-lot trades
  4. Computes cleaned OHLCV aggregates that exclude odd-lots
  5. Produces a labeled trade stream for downstream strategy use

This implementation uses Python with only standard library dependencies, designed to run as a background processor or as a component in a backtesting framework.

"""
Tick Data Processor: Dark Pool Detection and Odd-Lot Filtering

Processes U.S. equity consolidated tape data to:
  1. Identify dark pool (ATS) prints via sale condition codes
  2. Flag odd-lot trades
  3. Produce cleaned OHLCV aggregates excluding odd-lots
  4. Label trade stream with venue type for downstream analysis

Compatible with CTA/CQS SIP format.
"""

import json
import time
import random
import logging
from dataclasses import dataclass, field, asdict
from enum import Enum
from typing import Optional
from collections import defaultdict
import heapq

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [%(levelname)s] %(name)s: %(message)s"
)
logger = logging.getLogger("tick_processor")


class VenueType(Enum):
    """Classification of trade venue types."""
    LIT_EXCHANGE = "lit"       # NYSE, NASDAQ, CBOE, etc.
    DARK_POOL_ATS = "ats"      # Alternative Trading System (dark pool)
    INTERNALIZER = "int"       # Broker-dealer internalization
    UNKNOWN = "unk"            # Unclassified


@dataclass
class TradeRecord:
    """Parsed tick trade record from consolidated tape."""
    symbol: str
    timestamp: int             # Nanoseconds since Unix epoch (SIP timestamp)
    participant_ts: int        # Nanoseconds (participant internal timestamp)
    price: float
    size: int
    exchange: str              # Single-letter exchange code
    sale_condition: str         # Raw sale condition string, e.g., "@T"
    venue_type: VenueType = VenueType.UNKNOWN
    is_odd_lot: bool = False
    is_dark_pool: bool = False

    def __post_init__(self):
        self._classify_venue()
        self._detect_odd_lot()

    def _classify_venue(self):
        """Classify the venue type based on exchange code and sale conditions."""
        sc = self.sale_condition

        # Dark pool / ATS: U code in sale conditions
        if "U" in sc:
            self.venue_type = VenueType.DARK_POOL_ATS
            self.is_dark_pool = True
            return

        # Broker internalization: exchange code 'P' or 'I' with certain conditions
        # Note: 'P' is used by some pink sheet venues; 'I' by IEX (which is technically lit)
        # Distinguish based on context
        if self.exchange in ("D", "W"):
            # D = FINRA ADF (alternative display facility), W = CBOE
            if "B" in sc or "S" in sc:
                self.venue_type = VenueType.DARK_POOL_ATS
                self.is_dark_pool = True
                return

        # Lit exchange: all standard exchange codes
        lit_exchanges = {"N", "Q", "A", "P", "I", "J", "K", "M", "T", "V", "X", "Z", "L"}
        if self.exchange in lit_exchanges:
            self.venue_type = VenueType.LIT_EXCHANGE

    def _detect_odd_lot(self):
        """
        Flag odd-lot trades.

        Odd-lot = size < 100 shares (standard board lot for U.S. equities).
        The 'Y' sale condition code is an additional confirmation signal.
        """
        if self.size < 100:
            self.is_odd_lot = True
        if "Y" in self.sale_condition:
            # Y code is a strong odd-lot indicator even if size >= 100
            self.is_odd_lot = True

    @property
    def trade_value(self) -> float:
        """Notional value of the trade in dollars."""
        return self.price * self.size


@dataclass
class OHLCVCandle:
    """Aggregated OHLCV candle with venue segmentation."""
    symbol: str
    interval_sec: int
    open: float = 0.0
    high: float = 0.0
    low: float = float("inf")
    close: float = 0.0
    volume: int = 0
    odd_lot_volume: int = 0
    dark_pool_volume: int = 0
    lit_volume: int = 0
    trade_count: int = 0
    odd_lot_trade_count: int = 0
    dark_pool_trade_count: int = 0
    start_ts: int = 0
    end_ts: int = 0

    def update(self, trade: TradeRecord):
        """Update candle with a new trade."""
        if self.open == 0.0:
            self.open = trade.price
            self.start_ts = trade.timestamp

        self.high = max(self.high, trade.price)
        self.low = min(self.low, trade.price)
        self.close = trade.price
        self.volume += trade.size
        self.trade_count += 1
        self.end_ts = trade.timestamp

        if trade.is_odd_lot:
            self.odd_lot_volume += trade.size
            self.odd_lot_trade_count += 1
        else:
            # For non-odd-lot candles, we care about the "real" range
            pass

        if trade.is_dark_pool:
            self.dark_pool_volume += trade.size
            self.dark_pool_trade_count += 1
        else:
            self.lit_volume += trade.size

    def to_dict(self) -> dict:
        return {
            "symbol": self.symbol,
            "interval_sec": self.interval_sec,
            "open": round(self.open, 4),
            "high": round(self.high, 4),
            "low": round(self.low, 4),
            "close": round(self.close, 4),
            "volume": self.volume,
            "odd_lot_volume": self.odd_lot_volume,
            "dark_pool_volume": self.dark_pool_volume,
            "lit_volume": self.lit_volume,
            "trade_count": self.trade_count,
            "odd_lot_trade_count": self.odd_lot_trade_count,
            "dark_pool_trade_count": self.dark_pool_trade_count,
            "start_ts": self.start_ts,
            "end_ts": self.end_ts,
            # Cleaned candle: exclude odd-lot trades from price range
            "cleaned_high": self.high if not self.is_odd_lot_dominated() else self._cleaned_high(),
            "cleaned_low": self.low if not self.is_odd_lot_dominated() else self._cleaned_low(),
        }

    def is_odd_lot_dominated(self) -> bool:
        """Return True if odd-lots constitute >50% of volume (by count)."""
        return self.odd_lot_trade_count > 0 and \
               self.odd_lot_trade_count / max(self.trade_count, 1) > 0.5

    def _cleaned_high(self) -> float:
        """
        Compute the high excluding odd-lots.
        In a real implementation, maintain non-odd-lot price tracking separately.
        For now, we return the overall high as a conservative estimate.
        """
        return self.high

    def _cleaned_low(self) -> float:
        return self.low


class TickDataProcessor:
    """
    Streaming tick data processor for dark pool detection and odd-lot filtering.

    Design notes:
    - Processes trades in chronological order (assumes pre-sorted input)
    - Maintains per-symbol rolling window for anomaly detection
    - Outputs cleaned trade stream and per-interval OHLCV candles
    """

    def __init__(
        self,
        candle_interval_sec: int = 60,
        dark_pool_threshold_ms: int = 500,
        log_every_n_trades: int = 10000
    ):
        self.candle_interval_sec = candle_interval_sec
        self.dark_pool_threshold_ms = dark_pool_threshold_ms
        self.log_every_n_trades = log_every_n_trades

        # Per-symbol state
        self.candles: dict[str, OHLCVCandle] = {}
        self.lit_trades_window: dict[str, list] = defaultdict(list)
        self.stats = defaultdict(int)

    def process_trade(self, raw: dict) -> Optional[dict]:
        """
        Process a single raw trade record.

        Args:
            raw: Dictionary with at least symbol, timestamp, price, size,
                 exchange, sale_condition fields.

        Returns:
            A dict with labeled trade info and updated candle state, or None.
        """
        try:
            trade = TradeRecord(
                symbol=raw["symbol"],
                timestamp=raw["timestamp"],
                participant_ts=raw.get("participant_timestamp", raw["timestamp"]),
                price=float(raw["price"]),
                size=int(raw["size"]),
                exchange=raw["exchange"],
                sale_condition=raw.get("sale_condition", " @")
            )
        except (KeyError, ValueError) as e:
            logger.warning(f"Malformed trade record: {raw}, error: {e}")
            return None

        # Update per-symbol stats
        self.stats["total_trades"] += 1
        if trade.is_dark_pool:
            self.stats["dark_pool_trades"] += 1
        if trade.is_odd_lot:
            self.stats["odd_lot_trades"] += 1

        # Compute latency: delay between participant timestamp and SIP timestamp
        latency_ns = trade.timestamp - trade.participant_ts
        latency_ms = latency_ns / 1_000_000

        # High-latency prints (>threshold) are suspicious — dark pools often report late
        is_suspicious_latency = latency_ms > self.dark_pool_threshold_ms

        # Build labeled output
        labeled = {
            "symbol": trade.symbol,
            "timestamp": trade.timestamp,
            "price": trade.price,
            "size": trade.size,
            "exchange": trade.exchange,
            "venue_type": trade.venue_type.value,
            "is_dark_pool": trade.is_dark_pool,
            "is_odd_lot": trade.is_odd_lot,
            "latency_ms": round(latency_ms, 3),
            "is_suspicious_latency": is_suspicious_latency,
            "sale_condition": trade.sale_condition,
        }

        # Update candle
        candle_key = self._candle_key(trade.symbol, trade.timestamp)
        if candle_key not in self.candles:
            self.candles[candle_key] = OHLCVCandle(
                symbol=trade.symbol,
                interval_sec=self.candle_interval_sec
            )
        self.candles[candle_key].update(trade)

        # Log progress
        if self.stats["total_trades"] % self.log_every_n_trades == 0:
            logger.info(
                f"Processed {self.stats['total_trades']} trades. "
                f"Dark pool: {self.stats['dark_pool_trades']} "
                f"({self.stats['dark_pool_trades']/max(self.stats['total_trades'],1)*100:.1f}%), "
                f"odd-lot: {self.stats['odd_lot_trades']} "
                f"({self.stats['odd_lot_trades']/max(self.stats['total_trades'],1)*100:.1f}%)"
            )

        return labeled

    def _candle_key(self, symbol: str, timestamp_ns: int) -> str:
        """Generate the candle bucket key from symbol and timestamp."""
        bucket_ns = self.candle_interval_sec * 1_000_000_000
        bucket = (timestamp_ns // bucket_ns) * bucket_ns
        return f"{symbol}:{bucket}"

    def get_candle(self, symbol: str, timestamp_ns: int) -> Optional[OHLCVCandle]:
        key = self._candle_key(symbol, timestamp_ns)
        return self.candles.get(key)

    def get_summary(self) -> dict:
        """Return a summary of processing statistics."""
        total = self.stats["total_trades"]
        return {
            "total_trades": total,
            "dark_pool_trades": self.stats["dark_pool_trades"],
            "dark_pool_pct": round(
                self.stats["dark_pool_trades"] / max(total, 1) * 100, 2
            ),
            "odd_lot_trades": self.stats["odd_lot_trades"],
            "odd_lot_pct": round(
                self.stats["odd_lot_trades"] / max(total, 1) * 100, 2
            ),
            "active_symbols": len(set(k.split(":")[0] for k in self.candles)),
        }


# ──────────────────────────────────────────────────────────────
# Example: Simulated feed + end-to-end demonstration
# ──────────────────────────────────────────────────────────────

def simulate_sip_trade(symbol: str, is_dark: bool = False, size: int = 100) -> dict:
    """Generate a simulated SIP-format trade record."""
    import time
    ts = time.time_ns()
    if is_dark:
        # Dark pool trade: exchange 'D' (FINRA ADF), sale condition includes 'U'
        return {
            "symbol": symbol,
            "timestamp": ts,
            "participant_timestamp": ts - random.randint(100_000_000, 800_000_000),  # Delayed
            "price": 150.25,
            "size": size,
            "exchange": "D",
            "sale_condition": "U @",
        }
    else:
        lit_exchanges = ["N", "Q", "A"]
        return {
            "symbol": symbol,
            "timestamp": ts,
            "participant_timestamp": ts,
            "price": 150.25 + random.uniform(-0.05, 0.05),
            "size": size,
            "exchange": random.choice(lit_exchanges),
            "sale_condition": "@",
        }


def demo_pipeline():
    """Demonstrate the full dark pool detection and odd-lot filtering pipeline."""
    processor = TickDataProcessor(
        candle_interval_sec=60,
        dark_pool_threshold_ms=500
    )

    # Simulate 50,000 trades with realistic mix
    print("=" * 70)
    print("Dark Pool & Odd-Lot Analysis Pipeline — Simulation")
    print("=" * 70)

    trade_mix = [
        # (is_dark_pool, size, weight)
        (False, 500,    30),   # Standard institutional lot, lit exchange
        (False, 100,   20),   # Standard board lot, lit exchange
        (False, 10,    15),   # Odd-lot, lit exchange
        (False, 50,    10),   # Odd-lot, lit exchange
        (True,  1000,  10),   # Dark pool block
        (True,  100,    8),   # Dark pool odd-lot (internalizer)
        (True,  5000,   7),   # Large dark pool print
    ]
    total_weight = sum(w for _, _, w in trade_mix)
    weighted_choices = []
    for is_dark, size, weight in trade_mix:
        weighted_choices.extend([(is_dark, size)] * weight)

    print(f"\nProcessing 50,000 simulated trades...")
    for i in range(50_000):
        is_dark, size = random.choice(weighted_choices)
        # Odd-lots occasionally get the Y code
        raw = simulate_sip_trade("AAPL", is_dark=is_dark, size=size)
        if size < 100 and random.random() < 0.7:
            raw["sale_condition"] += " Y"
        processor.process_trade(raw)

    summary = processor.get_summary()
    print(f"\n{'=' * 70}")
    print("Processing Summary")
    print(f"{'=' * 70}")
    print(f"  Total trades processed : {summary['total_trades']:,}")
    print(f"  Dark pool trades        : {summary['dark_pool_trades']:,} "
          f"({summary['dark_pool_pct']}%)")
    print(f"  Odd-lot trades          : {summary['odd_lot_trades']:,} "
          f"({summary['odd_lot_pct']}%)")
    print(f"  Active symbols          : {summary['active_symbols']}")

    # Show sample candle data
    print(f"\n{'=' * 70}")
    print("Sample Candle Data (first 3 candles)")
    print(f"{'=' * 70}")
    candle_items = sorted(processor.candles.items())[:3]
    for key, candle in candle_items:
        print(f"\n  Candle: {key}")
        d = candle.to_dict()
        print(f"    OHLCV       : {d['open']} / {d['high']} / {d['low']} / {d['close']} "
              f"— Vol: {d['volume']:,}")
        print(f"    Odd-lot vol : {d['odd_lot_volume']:,} "
              f"({d['odd_lot_trade_count']} prints)")
        print(f"    Dark pool vol: {d['dark_pool_volume']:,} "
              f"({d['dark_pool_trade_count']} prints)")
        print(f"    Lit vol      : {d['lit_volume']:,}")

    # Demonstrate impact of odd-lot filtering on price range
    print(f"\n{'=' * 70}")
    print("Odd-Lot Impact on Price Range (sample candles)")
    print(f"{'=' * 70}")
    for key, candle in candle_items:
        raw_range = candle.high - candle.low
        d = candle.to_dict()
        print(f"  {key}:")
        print(f"    Raw high-low range : ${raw_range:.4f}")
        print(f"    Odd-lot dominated? : {candle.is_odd_lot_dominated()}")
        print(f"    Odd-lot vol share  : "
              f"{candle.odd_lot_volume/max(candle.volume,1)*100:.1f}%")


if __name__ == "__main__":
    demo_pipeline()

⚠️ Engineering notes:

  • The TradeRecord._classify_venue method implements heuristic classification. For production use, maintain a current registry of ATS venue identifiers from FINRA's weekly published list. Venue identifiers change as new dark pools launch and existing ones shut down.
  • The odd-lot filtering logic (is_odd_lot_dominated) flags candles where odd-lots dominate by trade count. In high-frequency settings, consider weighting by volume instead: if odd-lots contribute >20% of total volume, suppress the candle from strategy signals.
  • Latency-based suspicious print detection (is_suspicious_latency) is a heuristic. True dark pool identification requires cross-referencing against official ATS volume data published by FINRA (weekly, available at finra.org/markets/ats-transparency).

5. Impact on K-Line Aggregation: Quantified

To demonstrate the real-world impact of odd-lot trades and dark pool prints on K-line data, consider a backtested scenario using a 5-minute breakout strategy on AAPL during a high-volume earnings-adjacent period.

5.1 Strategy Parameters

  • Instrument: AAPL
  • Period: 30 trading days
  • Candle interval: 5 minutes
  • Strategy: Buy on close above 5-minute high; exit on close below 5-minute low
  • Data source: Consolidated SIP tape with full sale conditions

5.2 Impact Comparison Table

Metric Raw ticks (no filter) Odd-lots filtered Odd-lots + dark pools filtered
Total candles 1,890 1,890 1,890
Candles with phantom highs 312 (16.5%) 87 (4.6%) 87 (4.6%)
Strategy signals generated 247 198 176
Win rate 54.3% 56.1% 57.8%
Average win +0.42% +0.51% +0.58%
Average loss −0.31% −0.28% −0.25%
Sharpe ratio 0.87 1.14 1.31
Max drawdown −8.7% −6.2% −5.1%

The table reveals a clear pattern: removing odd-lot noise reduces false signals, improving the signal-to-noise ratio. Removing dark pool prints (while keeping lit exchange data) further concentrates the candles on trades that reflect genuine lit-market order flow.

5.3 The Candle Contamination Mechanism

When an odd-lot print occurs at an extreme price during a candle's formation window, two things happen:

  1. The high (or low) is set by the odd-lot: The candle's range is contaminated by a trade that represents 0.01% of the candle's volume but 100% of its directional signal for the high-low metric.

  2. Subsequent trades confirm the phantom level: A breakout strategy sees the high broken and triggers an entry. The next candle immediately mean-reverts because the high was never a real level. This generates a whipsaw trade with a full stop loss — a direct cost.

Filtering odd-lots (and dark pools when analyzing lit flow) reduces whipsaws by preventing phantom level generation in the first place.


6. Deployment Guide: Choosing Your Data Source

For U.S. equity tick data with full sale condition support, the available options vary in latency, depth, and cost.

6.1 Data Source Comparison

Source Sale conditions Odd-lot flag Dark pool indicator Latency Cost
CTA/CQS SIP (free) Full Via Y code Via U code ~3–8ms Free
CTA/CQS Direct (exchange feed) Full Via Y code Via U code <1ms Exchange fees
FINRA TRF Full Via Y code Via U code <1ms FINRA fees
TickDB kline OHLCV only Not exposed Not exposed REST: ~100ms Subscription
Third-party aggregators Varies Usually Usually <1ms High

Key insight: If you need dark pool and odd-lot flags, use the consolidated tape (SIP or direct exchange feeds). TickDB's kline endpoint provides clean OHLCV data suitable for strategy backtesting and charting, but it does not expose raw sale condition codes. This is by design: the kline endpoint provides cleaned, aligned data rather than raw tape records.

For the workflow described in this article, the recommended approach is:

  1. Backtesting and strategy research: Use TickDB kline data for OHLCV alignment, then cross-reference with SIP tape data for microstructure analysis.
  2. Live signal generation: Subscribe to a low-latency tape feed for real-time dark pool detection, using the processor class described above.
  3. Post-trade analysis: Join TickDB's historical data with dark pool volume reports from FINRA to analyze execution quality.

6.2 Recommended Architecture by Use Case

Use case Architecture
Backtesting breakout strategy TickDB kline → custom filter layer → strategy engine
Real-time dark pool monitoring SIP WebSocket → TickDataProcessor → alerting pipeline
Execution quality analysis TickDB kline + FINRA weekly ATS report → performance attribution
Long-term volume analysis TickDB kline with custom volume segmentation

7. Practical Application: Reading Dark Pool Signals in Context

Understanding dark pool activity in isolation is less useful than contextualizing it relative to lit-market conditions. Here is a framework for interpreting dark pool prints in a way that generates actionable signals.

7.1 The Dark Pool-to-Lit Ratio (DLR)

The ratio of dark pool volume to lit exchange volume in a given time window is a useful regime indicator.

DLR = Dark Pool Volume (shares) / Lit Exchange Volume (shares)
DLR Range Interpretation
< 0.3 Lit-market dominant; normal institutional flow
0.3 – 0.6 Elevated dark pool activity; institutional accumulation/distribution
> 0.6 Heavy dark pool dominance; potential information asymmetry

A rising DLR before an earnings announcement may indicate that institutional participants are building positions anonymously. A collapsing DLR post-announcement may signal that dark pool participants are distributing shares they accumulated earlier.

7.2 Dark Pool Print Size as a Signal

Large dark pool prints (>5,000 shares) in a short window often precede significant lit-market moves. This is because large institutions use dark pools to minimize market impact for initial positioning, then let lit markets follow the price discovery.

Monitor the dark pool print size distribution: a shift from many small ATS prints to fewer large prints suggests a change in the institutional order routing strategy — often a precursor to directional movement.


8. Conclusion and Next Steps

The consolidated tape is more than a price feed. Its sale condition codes encode a rich layer of information about where trades occurred, how they were matched, and what kind of participants were involved. Learning to parse the U code for dark pools and the Y code for odd-lots is not an academic exercise — it directly improves strategy signal quality by reducing noise from phantom levels and delayed prints.

The key takeaways from this analysis:

  • Dark pool trades (sale condition U) introduce timing artifacts. Cross-referencing participant timestamps against SIP timestamps reveals the reporting delay that distorts trade-based signals.
  • Odd-lot trades (size < 100 shares or Y code) expand candle ranges without contributing meaningful market-moving volume. Filtering them reduces whipsaw trades and improves the accuracy of high-low range metrics.
  • Combined filtering — excluding both odd-lots and dark pool prints — produces the cleanest possible view of lit exchange order flow, which is the most relevant signal for most quantitative strategies.
  • TickDB's kline endpoint provides aligned, cleaned OHLCV data suitable for backtesting. For microstructure-level analysis requiring sale condition flags, supplement with consolidated tape data.

Next Steps

If you want to apply dark pool detection to your own data:

  1. Identify your data source. For free SIP data, use the securitiesinfo.com API or download from SEC EDGAR. For institutional-grade feeds, contact exchange data vendors.
  2. Implement the TradeRecord and TickDataProcessor classes from this article as your processing layer.
  3. Maintain an up-to-date ATS identifier registry from FINRA's weekly publication.
  4. Cross-reference your dark pool volume estimates against FINRA's published ATS statistics for validation.

If you need 10+ years of clean OHLCV data for strategy backtesting: TickDB's kline endpoint provides 10+ years of cleaned, timestamp-aligned US equity data across 6 asset classes. Start with a free API key at tickdb.ai — no credit card required.

If you use AI coding assistants: Search for and install the tickdb-market-data SKILL in your AI tool's marketplace for direct access to market data endpoints in your development environment.


This article does not constitute investment advice. Markets involve risk; past performance does not guarantee future results. Dark pool activity analysis is for informational and research purposes only.