"The printed trade looks clean: a 100-share print at $150.00 on NYSE. But what if 60,000 shares had traded on an off-exchange venue 200 milliseconds earlier, and that print had not yet reached the tape?"
In modern equity markets, the displayed trade is often the least interesting trade. A significant portion of U.S. equity volume — estimates range from 35% to 50% depending on the study — flows through venues that do not immediately report to the consolidated tape. These are dark pools, alternative trading systems (ATS), and internalization engines operated by broker-dealers. When this off-exchange activity crosses the tape, it carries sale conditions that distinguish it from standard exchange prints.
Simultaneously, a separate class of trades — odd-lots, defined as orders smaller than one standard board lot (typically 100 shares for U.S. equities) — introduces noise into aggregated price data. An 11-share print at $150.01 does not constitute a meaningful level change, yet naive K-line aggregation treats it identically to a 50,000-share block trade.
This article dissects the mechanics of dark pool identification and odd-lot filtering using tick-level trade data. We examine how sale condition codes encode venue and execution characteristics, how to parse dark pool prints from consolidated tape feeds, and how odd-lot trades distort K-line aggregation. A reference implementation demonstrates the full pipeline.
1. The Anatomy of a Tick Trade
Before we can identify dark pools or filter odd-lots, we need to understand what a tick-level trade record actually contains. A consolidated U.S. equity trade record from the Securities Information Processor (SIP) includes fields that go well beyond price and size.
1.1 Standard Trade Record Fields
| Field | Description | Example |
|---|---|---|
| `symbol` | Ticker symbol | AAPL |
| `timestamp` | SIP timestamp (nanoseconds since Unix epoch) | 1715612345123456789 |
| `price` | Execution price | 150.25 |
| `size` | Number of shares | 100 |
| `exchange` | Exchange code (N = NYSE, Q = NASDAQ, A = NYSE American, formerly AMEX, etc.) | N |
| `sale_condition` | Trade reporting qualifier (string) | @ T |
| `corr` | Trade correction indicator | 0 |
| `participant_timestamp` | Participant (reporting venue) timestamp, nanoseconds; precedes the SIP timestamp | 1715612345000000000 |
The critical field for dark pool identification is `sale_condition`. It is a short string of single-character codes, each of which qualifies how the trade occurred.
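Because the field is a short run of one-character codes, padded with spaces in some feeds (as in the `@ T` example above), it helps to normalize it before testing for individual codes. A minimal sketch, assuming every non-blank character is exactly one code; `parse_sale_conditions` is a hypothetical helper, not part of any feed SDK:

```python
def parse_sale_conditions(raw: str) -> list[str]:
    """Split a raw sale-condition field into one-character codes, dropping padding."""
    # Assumption: each non-whitespace character is a separate condition code.
    return [ch for ch in raw if not ch.isspace()]


for field in ("@ T", "U @ Y", "@"):
    print(repr(field), "->", parse_sale_conditions(field))
```

Working with a list of codes rather than the raw string keeps later membership tests explicit and makes it easy to log exactly which qualifiers a print carried.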
1.2 Understanding Sale Condition Codes
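The SIPs define sale condition codes in their plan specifications: the CTA's Consolidated Tape System (CTS) specification for Tape A and Tape B securities, and the UTP specification for Nasdaq-listed (Tape C) securities. Letter assignments differ between the two plans and change between revisions, so the table below is the mapping this article's pipeline assumes; confirm each code against the current specification for your feed. The most relevant codes for dark pool and odd-lot analysis are: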
| Code | Name | Meaning |
|---|---|---|
| `@` | Regular Sale | Standard exchange execution |
| `A` | Acquisition | Trade resulting from an acquisition |
| `B` | Bunched | Bunched trade (off-cycle) |
| `C` | Cash Sale | Same-day settlement |
| `D` | Distribution | Distribution of shares |
| `E` | Automatic Execution | Electronic, no human intervention |
| `F` | Opening Print | Official opening transaction |
| `G` | Derivatively Priced | Priced based on another security |
| `H` | Price Variation | Price differs from last sale |
| `I` | Odd Lot Trade | Trade for fewer than one round lot (CTA and UTP) |
| `K` | Rule 155.3 | Odd-lot differential |
| `L` | Sold Last | Trade reported late |
| `M` | Close Price | Official closing transaction |
| `N` | Next Day | Settlement T+1 |
| `O` | Opening Price | Opening price trade |
| `P` | Prior Reference Price | Prior day's closing |
| `Q` | Market Center Open | Opening of market center |
| `R` | Seller | Seller's option (additional time) |
| `S` | Split Trade | Two priced transactions |
| `T` | Scheduled Trade | CTA/CQS scheduled trade |
| `U` | Dark Pool / ATS | Off-exchange print from ATS |
| `V` | Contingent Trade | Contingent upon another event |
| `W` | Average Price Trade | Reported at an average price |
| `X` | Cross Trade | Opposite orders matched |
| `Y` | Yellow Flag | Odd-lot trade |
| `Z` | End of Day | |
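The code @ followed by T indicates a regular sale that is also a scheduled CTA trade. In this article's mapping, the critical code for dark pool identification is U, which marks a trade executed off-exchange and printed to the tape through an ATS. Because the current CTA and UTP specifications use U for a different condition (an extended-hours trade reported out of sequence), pair the code check with the exchange code: off-exchange trades reach the tape through the FINRA reporting facilities under exchange code D.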
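To make the mapping explicit in code, a lookup table can translate a raw field into readable names. The sketch below copies a subset of the table above exactly as this article maps it; `ARTICLE_CONDITION_NAMES`, `describe_conditions`, and `is_marked_dark_pool` are illustrative names, and the letters should be rebuilt from your feed's own specification:

```python
# Subset of the sale-condition table above, as this article maps the codes.
# Letter assignments differ between the CTA and UTP plans, so treat this
# dictionary as a placeholder to rebuild from your feed's specification.
ARTICLE_CONDITION_NAMES = {
    "@": "Regular Sale",
    "B": "Bunched",
    "S": "Split Trade",
    "T": "Scheduled Trade",
    "U": "Dark Pool / ATS",
    "Y": "Yellow Flag (odd-lot)",
}


def describe_conditions(raw: str) -> list[str]:
    """Translate each non-blank code in a raw sale-condition field to a name."""
    return [
        ARTICLE_CONDITION_NAMES.get(code, f"unmapped: {code}")
        for code in raw
        if not code.isspace()
    ]


def is_marked_dark_pool(raw: str) -> bool:
    """Primary rule in this article: a 'U' anywhere in the field marks an ATS print."""
    return "U" in raw


print(describe_conditions("@ T"), is_marked_dark_pool("@ T"))
print(describe_conditions("U @"), is_marked_dark_pool("U @"))
```

Returning `unmapped: X` instead of raising keeps the parser running when a feed introduces a code the table does not know, while still making the gap visible.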
2. Dark Pool Identification: Parsing the U Code and Its Variants
2.1 What Is a Dark Pool?
A dark pool is a privately operated trading venue, registered as an alternative trading system rather than as an exchange, where participants can trade securities without pre-trade transparency. Unlike the lit exchanges (NYSE, NASDAQ, CBOE), dark pools do not display order books publicly. Trades execute in the dark, and only the resulting print is reported to the consolidated tape, with a delay that can range from milliseconds to seconds.
Dark pools serve legitimate purposes: large institutional orders can be worked without moving the market. But they also introduce opacity that matters for quant researchers. A large dark pool print that crosses the tape after a 500ms delay will look like a sudden volume spike on your backtest — unless you know how to identify and label it.
2.2 Identifying Dark Pool Trades
The primary identifier is the U sale condition code. However, dark pool activity can also be inferred from other signals:
Primary method: the `U` code in the `sale_condition` field.
Secondary signals:
- Exchange code `D` (the FINRA ADF/TRF reporting facilities); `P`, sometimes cited here, identifies NYSE Arca, a lit exchange
- Trade size above the odd-lot threshold but with no visible market impact
- A gap between the `participant_timestamp` and the SIP `timestamp` that exceeds a threshold such as 100 ms
- Special conditions: `B` (bunched) and `S` (split trade) often accompany dark pool prints
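The signals above can be combined into an evidence list per print, which is easier to audit than a single yes/no flag. A minimal sketch, assuming a hypothetical `TapePrint` record and a 100 ms gap threshold; the market-impact signal is left out because it needs quote data, and only `D` is checked as an exchange code because `P` identifies NYSE Arca, a lit exchange:

```python
from dataclasses import dataclass


@dataclass
class TapePrint:
    """Hypothetical minimal trade record for this example."""
    exchange: str            # single-letter participant code
    sale_condition: str      # raw condition string, e.g. "U @"
    size: int
    sip_ts_ns: int           # SIP timestamp, nanoseconds
    participant_ts_ns: int   # participant timestamp, nanoseconds


def dark_pool_evidence(p: TapePrint, gap_threshold_ms: float = 100.0) -> list[str]:
    """Return the section 2.2 signals that this print triggers (empty = none)."""
    reasons = []
    if "U" in p.sale_condition:
        reasons.append("primary: U condition")
    if p.exchange == "D":
        reasons.append("secondary: reported via exchange code D")
    gap_ms = (p.sip_ts_ns - p.participant_ts_ns) / 1_000_000
    if gap_ms > gap_threshold_ms:
        reasons.append(f"secondary: {gap_ms:.0f} ms reporting gap")
    if any(code in p.sale_condition for code in "BS"):
        reasons.append("secondary: bunched or split condition")
    return reasons


late_ats = TapePrint("D", "U @", 1_000, sip_ts_ns=2_000_000_000, participant_ts_ns=1_700_000_000)
print(dark_pool_evidence(late_ats))
```

Keeping the reasons as strings lets a downstream step weight them differently, for example trusting the exchange code more than the latency gap.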
2.3 The Trade-Through Problem
One of the most important nuances in dark pool analysis is the trade-through problem. When a dark pool executes a trade, it is required to match or improve on the National Best Bid and Offer (NBBO) at the time of execution. However, the print may appear on the tape after the NBBO has moved.
This means that a dark pool print marked at $150.00 may have been executed when the NBBO was $149.95–$150.05. The dark pool provided price improvement by executing at the midpoint. If the NBBO has moved by the time the print reaches the tape (say, to $150.05–$150.15), the $150.00 print appears to trade below the bid, as if it had crossed the spread after the fact.
For quantitative strategies that use trade prints to infer order flow, this timing offset is a source of significant noise.
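The timing offset can be made concrete by classifying the same print against two quotes: the NBBO in force at execution and the NBBO in force when the print reached the tape. A minimal sketch with a hypothetical `quote_position` helper; the later quote of $150.05–$150.15 is an assumed example:

```python
import math


def quote_position(price: float, bid: float, ask: float) -> str:
    """Classify a trade price relative to a bid/ask quote."""
    if price < bid:
        return "below bid"
    if price > ask:
        return "above ask"
    # Tolerance absorbs binary floating-point error in (bid + ask) / 2
    if math.isclose(price, (bid + ask) / 2, abs_tol=1e-9):
        return "at midpoint"
    return "inside spread"


print_price = 150.00
at_execution = quote_position(print_price, 149.95, 150.05)  # NBBO when the trade executed
at_report = quote_position(print_price, 150.05, 150.15)     # assumed NBBO when the print arrived
print(at_execution, "|", at_report)
```

The same print is a midpoint execution against the first quote and appears to trade below the bid against the second. One way to reduce this noise is to join each print to quotes by `participant_timestamp` rather than by the SIP `timestamp`.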
2.4 Dark Pool Classification by Venue Type
Not all dark pools are equal. The universe of off-exchange venues includes:
| Venue Type | Examples | Characteristics |
|---|---|---|
| Broker-dealer internalization | Citadel Securities, Virtu | Flow from retail order routing; high frequency |
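| Independent ATS | Liquidnet, ITG POSIT (IEX operated as an ATS before registering as an exchange in 2016) | Institutional block trading; lower frequency |
| Non-displayed liquidity on lit exchanges | Hidden order types on NYSE Arca, Nasdaq | Not separate dark pools; executions print under the exchange's own code and are subject to Reg NMS |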
For tick data analysis, it is useful to track the distribution of prints by venue type, as each behaves differently in terms of size, timing, and price impact.
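Tracking that distribution needs only print counts, share volume, and a robust size statistic per venue type. A minimal sketch, assuming trades arrive as `(venue_type, size)` pairs labeled with the `"lit"`, `"ats"`, and `"int"` values the pipeline in section 4 emits; `venue_profile` is a hypothetical name:

```python
from collections import defaultdict
from statistics import median


def venue_profile(trades):
    """Print count, share volume and median print size per venue type.

    `trades` is an iterable of (venue_type, size) pairs.
    """
    sizes = defaultdict(list)
    for venue, size in trades:
        sizes[venue].append(size)
    return {
        venue: {"prints": len(vals), "volume": sum(vals), "median_size": median(vals)}
        for venue, vals in sizes.items()
    }


sample = [("lit", 100), ("lit", 300), ("ats", 5_000), ("int", 20), ("int", 40), ("lit", 200)]
for venue, stats in venue_profile(sample).items():
    print(venue, stats)
```

The median is used instead of the mean so that a single block print does not dominate the size profile of a venue type.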
3. Odd-Lot Trades: The Noisy Minority
3.1 Definition and Prevalence
An odd-lot is any order for fewer than 100 shares of a U.S. equity. Odd-lot trades are extremely common — they constitute roughly 15–25% of all trade prints by count, though they represent a much smaller percentage of total volume by shares.
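In this article's mapping, the Y sale condition code (Yellow Flag) identifies odd-lot trades. In the current CTA and UTP specifications the odd-lot trade condition is I, so a robust filter checks both codes as well as the share count. Some data feeds also use the K code for trades with odd-lot differentials.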
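The gap between the odd-lot share by count and by volume is easy to check on your own data. A minimal sketch, assuming the 100-share round lot defined above; `odd_lot_shares` and the sample sizes are illustrative:

```python
def odd_lot_shares(sizes, round_lot=100):
    """Return (share of prints, share of volume) contributed by odd-lots."""
    if not sizes:
        return 0.0, 0.0
    odd = [s for s in sizes if s < round_lot]
    return len(odd) / len(sizes), sum(odd) / sum(sizes)


count_share, volume_share = odd_lot_shares([50_000, 11, 30_000, 25, 100, 40])
print(f"by count: {count_share:.1%}, by volume: {volume_share:.3%}")
```

In this toy sample odd-lots are half of all prints but under 0.1% of shares, the same divergence the prevalence figures describe.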
3.2 Why Odd-Lots Distort Aggregated Data
When you aggregate tick data into K-line candles, every trade contributes to the open, high, low, close, and volume. A single odd-lot print at an extreme price — even if it is immediately reversed — will expand the candle's high-low range.
Consider this scenario:
- Trade 1: 50,000 shares @ $150.00 (round lot)
- Trade 2: 11 shares @ $150.35 (odd-lot, 35 cents above the surrounding prints)
- Trade 3: 30,000 shares @ $150.00 (round lot)
A naive 1-minute candle built from these three trades shows:
- Open: $150.00
- High: $150.35 (driven by 11 shares)
- Low: $150.00
- Close: $150.00
- Volume: 80,011 shares
The high of $150.35 is a phantom level: only 11 shares, about 0.01% of the candle's volume, traded there. Filtering out odd-lots produces a candle with a high of $150.00, which reflects where the market-moving volume actually occurred.
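The scenario can be reproduced in a few lines: build the candle once from every trade and once from round-lot-or-larger trades only. A minimal sketch; `ohlcv` is a hypothetical helper that takes `(price, size)` tuples in time order:

```python
def ohlcv(trades):
    """Build one OHLCV candle from (price, size) tuples in time order."""
    prices = [price for price, _ in trades]
    return {
        "open": prices[0],
        "high": max(prices),
        "low": min(prices),
        "close": prices[-1],
        "volume": sum(size for _, size in trades),
    }


trades = [(150.00, 50_000), (150.35, 11), (150.00, 30_000)]
naive = ohlcv(trades)
filtered = ohlcv([t for t in trades if t[1] >= 100])  # drop prints below one round lot
print("naive:   ", naive)
print("filtered:", filtered)
```

Only the high (and 11 shares of volume) changes; open, low, and close are identical, which is why the distortion is easy to miss on a chart.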
3.3 The Odd-Lot-to-Midpoint Artifact
Odd-lot trades frequently execute at or near the NBBO midpoint. This is not a coincidence: broker-dealer internalization engines and midpoint-pegged dark pools often fill small retail orders at or near the midpoint. These prints create a spurious pattern in tick data: a series of odd-lot prints clustered tightly around the midpoint with no directional signal.
For mean-reversion strategies built on tick data, these midpoint artifacts can generate false signals that look like institutional accumulation but are actually retail order flow being internalized.
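One way to measure the artifact is the fraction of odd-lot prints that land within a small tolerance of the prevailing midpoint. A minimal sketch, assuming each print is already joined to the NBBO in force at execution; the half-cent tolerance and the name `midpoint_cluster_ratio` are assumptions to tune per symbol:

```python
def midpoint_cluster_ratio(prints, tolerance=0.005, round_lot=100):
    """Fraction of odd-lot prints within `tolerance` dollars of the NBBO midpoint.

    `prints` holds (price, size, bid, ask) tuples, where bid/ask is the NBBO
    assumed to be in force when each print executed.
    """
    odd = [(price, bid, ask) for price, size, bid, ask in prints if size < round_lot]
    if not odd:
        return 0.0
    near_mid = sum(1 for price, bid, ask in odd if abs(price - (bid + ask) / 2) <= tolerance)
    return near_mid / len(odd)


window = [
    (150.00, 10, 149.95, 150.05),   # odd-lot at the midpoint
    (150.00, 25, 149.95, 150.05),   # odd-lot at the midpoint
    (150.04, 5, 149.95, 150.05),    # odd-lot near the ask
    (150.05, 500, 149.95, 150.05),  # round lot, ignored by the ratio
]
print(midpoint_cluster_ratio(window))
```

A ratio close to 1 over a window points to internalized retail flow rather than directional odd-lot activity; what counts as close is a judgment call per symbol and spread regime.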
4. Production-Grade Pipeline: Dark Pool Detection and Odd-Lot Filtering
The following code implements a full tick data processing pipeline that:
- Parses raw SIP trade records
- Classifies trades by venue type (lit exchange, dark pool, internalizer)
- Flags odd-lot trades
- Computes cleaned OHLCV aggregates that exclude odd-lots
- Produces a labeled trade stream for downstream strategy use
This implementation uses Python 3.9+ with only standard library dependencies and is designed to run as a background processor or as a component in a backtesting framework.
```python
"""
Tick Data Processor: Dark Pool Detection and Odd-Lot Filtering

Processes U.S. equity consolidated tape data to:
1. Identify dark pool (ATS) prints via sale condition codes
2. Flag odd-lot trades
3. Produce cleaned OHLCV aggregates that exclude odd-lots from the price range
4. Label the trade stream with venue type for downstream analysis

Sale-condition letters follow the mapping used in this article; verify them
against the CTA/UTP specification for your feed before production use.
Requires Python 3.9+ (built-in generic type hints).
"""
import logging
import random
import time
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Optional

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
)
logger = logging.getLogger("tick_processor")

ROUND_LOT = 100  # Standard board lot for most U.S. equities

# Single-letter participant codes treated as lit exchanges.
# Note: 'P' is NYSE Arca and 'V' is IEX; both are lit venues.
LIT_EXCHANGES = {"N", "Q", "A", "P", "I", "J", "K", "M", "T", "V", "X", "Z", "L"}


class VenueType(Enum):
    """Classification of trade venue types."""
    LIT_EXCHANGE = "lit"    # NYSE, NASDAQ, CBOE, etc.
    DARK_POOL_ATS = "ats"   # Alternative Trading System (dark pool)
    INTERNALIZER = "int"    # Broker-dealer internalization
    UNKNOWN = "unk"         # Unclassified


@dataclass
class TradeRecord:
    """Parsed tick trade record from the consolidated tape."""
    symbol: str
    timestamp: int          # Nanoseconds since Unix epoch (SIP timestamp)
    participant_ts: int     # Nanoseconds (participant internal timestamp)
    price: float
    size: int
    exchange: str           # Single-letter exchange code
    sale_condition: str     # Raw sale condition string, e.g. "@ T"
    venue_type: VenueType = VenueType.UNKNOWN
    is_odd_lot: bool = False
    is_dark_pool: bool = False

    def __post_init__(self):
        self._classify_venue()
        self._detect_odd_lot()

    def _mark_dark_pool(self):
        self.venue_type = VenueType.DARK_POOL_ATS
        self.is_dark_pool = True

    def _classify_venue(self):
        """Classify the venue type from the exchange code and sale conditions."""
        sc = self.sale_condition
        # Primary rule (this article's mapping): 'U' marks an ATS print.
        if "U" in sc:
            self._mark_dark_pool()
            return
        # D = FINRA ADF/TRF, W = CBOE: bunched ('B') or split ('S') prints
        # reported here are treated as ATS prints (heuristic).
        if self.exchange in ("D", "W") and ("B" in sc or "S" in sc):
            self._mark_dark_pool()
            return
        # Any other off-exchange report via 'D' carries no ATS marker, so it is
        # assumed to be broker-dealer internalization (heuristic).
        if self.exchange == "D":
            self.venue_type = VenueType.INTERNALIZER
            return
        if self.exchange in LIT_EXCHANGES:
            self.venue_type = VenueType.LIT_EXCHANGE

    def _detect_odd_lot(self):
        """
        Flag odd-lot trades.
        Odd-lot = size below one round lot (100 shares for most U.S. equities).
        'Y' (this article's odd-lot code) and 'I' (the odd-lot trade condition
        in the CTA/UTP specs) also flag a trade, even if size >= 100.
        """
        sc = self.sale_condition
        self.is_odd_lot = self.size < ROUND_LOT or "Y" in sc or "I" in sc

    @property
    def trade_value(self) -> float:
        """Notional value of the trade in dollars."""
        return self.price * self.size


@dataclass
class OHLCVCandle:
    """Aggregated OHLCV candle with venue segmentation."""
    symbol: str
    interval_sec: int
    open: float = 0.0
    high: float = float("-inf")
    low: float = float("inf")
    close: float = 0.0
    volume: int = 0
    odd_lot_volume: int = 0
    dark_pool_volume: int = 0
    lit_volume: int = 0                  # Every print not flagged as dark pool
    trade_count: int = 0
    odd_lot_trade_count: int = 0
    dark_pool_trade_count: int = 0
    clean_high: float = float("-inf")    # High over non-odd-lot prints only
    clean_low: float = float("inf")      # Low over non-odd-lot prints only
    start_ts: int = 0
    end_ts: int = 0

    def update(self, trade: TradeRecord):
        """Update the candle with a new trade."""
        if self.trade_count == 0:        # First trade in the bucket sets the open
            self.open = trade.price
            self.start_ts = trade.timestamp
        self.high = max(self.high, trade.price)
        self.low = min(self.low, trade.price)
        self.close = trade.price
        self.volume += trade.size
        self.trade_count += 1
        self.end_ts = trade.timestamp

        if trade.is_odd_lot:
            self.odd_lot_volume += trade.size
            self.odd_lot_trade_count += 1
        else:
            # Only non-odd-lot prints define the "real" range
            self.clean_high = max(self.clean_high, trade.price)
            self.clean_low = min(self.clean_low, trade.price)

        if trade.is_dark_pool:
            self.dark_pool_volume += trade.size
            self.dark_pool_trade_count += 1
        else:
            self.lit_volume += trade.size

    def to_dict(self) -> dict:
        has_clean = self.trade_count > self.odd_lot_trade_count
        return {
            "symbol": self.symbol,
            "interval_sec": self.interval_sec,
            "open": round(self.open, 4),
            "high": round(self.high, 4),
            "low": round(self.low, 4),
            "close": round(self.close, 4),
            "volume": self.volume,
            "odd_lot_volume": self.odd_lot_volume,
            "dark_pool_volume": self.dark_pool_volume,
            "lit_volume": self.lit_volume,
            "trade_count": self.trade_count,
            "odd_lot_trade_count": self.odd_lot_trade_count,
            "dark_pool_trade_count": self.dark_pool_trade_count,
            "start_ts": self.start_ts,
            "end_ts": self.end_ts,
            # Cleaned range: odd-lot prints excluded (None if only odd-lots traded)
            "cleaned_high": round(self.clean_high, 4) if has_clean else None,
            "cleaned_low": round(self.clean_low, 4) if has_clean else None,
        }

    def is_odd_lot_dominated(self) -> bool:
        """Return True if odd-lots make up more than 50% of prints (by trade count)."""
        return self.odd_lot_trade_count / max(self.trade_count, 1) > 0.5


class TickDataProcessor:
    """
    Streaming tick data processor for dark pool detection and odd-lot filtering.

    Design notes:
    - Processes trades in chronological order (assumes pre-sorted input)
    - Outputs a labeled trade stream and per-interval OHLCV candles
    """

    def __init__(
        self,
        candle_interval_sec: int = 60,
        dark_pool_threshold_ms: int = 500,
        log_every_n_trades: int = 10_000,
    ):
        self.candle_interval_sec = candle_interval_sec
        self.dark_pool_threshold_ms = dark_pool_threshold_ms
        self.log_every_n_trades = log_every_n_trades
        # Candles keyed by "SYMBOL:bucket_start_ns"
        self.candles: dict[str, OHLCVCandle] = {}
        self.stats = defaultdict(int)

    def process_trade(self, raw: dict) -> Optional[dict]:
        """
        Process a single raw trade record.

        Args:
            raw: Dictionary with at least symbol, timestamp, price, size and
                 exchange fields; sale_condition and participant_timestamp
                 are optional.
        Returns:
            A dict with labeled trade info, or None if the record is malformed.
        """
        try:
            trade = TradeRecord(
                symbol=raw["symbol"],
                timestamp=int(raw["timestamp"]),
                participant_ts=int(raw.get("participant_timestamp", raw["timestamp"])),
                price=float(raw["price"]),
                size=int(raw["size"]),
                exchange=raw["exchange"],
                sale_condition=raw.get("sale_condition", "@"),
            )
        except (KeyError, TypeError, ValueError) as e:
            logger.warning("Malformed trade record: %s, error: %s", raw, e)
            return None

        self.stats["total_trades"] += 1
        if trade.is_dark_pool:
            self.stats["dark_pool_trades"] += 1
        if trade.is_odd_lot:
            self.stats["odd_lot_trades"] += 1

        # Reporting delay: SIP timestamp minus participant timestamp
        latency_ms = (trade.timestamp - trade.participant_ts) / 1_000_000
        # Prints reported later than the threshold are suspicious; dark pools often report late
        is_suspicious_latency = latency_ms > self.dark_pool_threshold_ms

        labeled = {
            "symbol": trade.symbol,
            "timestamp": trade.timestamp,
            "price": trade.price,
            "size": trade.size,
            "exchange": trade.exchange,
            "venue_type": trade.venue_type.value,
            "is_dark_pool": trade.is_dark_pool,
            "is_odd_lot": trade.is_odd_lot,
            "latency_ms": round(latency_ms, 3),
            "is_suspicious_latency": is_suspicious_latency,
            "sale_condition": trade.sale_condition,
        }

        candle_key = self._candle_key(trade.symbol, trade.timestamp)
        candle = self.candles.get(candle_key)
        if candle is None:
            candle = OHLCVCandle(symbol=trade.symbol, interval_sec=self.candle_interval_sec)
            self.candles[candle_key] = candle
        candle.update(trade)

        total = self.stats["total_trades"]
        if total % self.log_every_n_trades == 0:
            logger.info(
                "Processed %d trades. Dark pool: %d (%.1f%%), odd-lot: %d (%.1f%%)",
                total,
                self.stats["dark_pool_trades"],
                self.stats["dark_pool_trades"] / total * 100,
                self.stats["odd_lot_trades"],
                self.stats["odd_lot_trades"] / total * 100,
            )
        return labeled

    def _candle_key(self, symbol: str, timestamp_ns: int) -> str:
        """Generate the candle bucket key from symbol and timestamp."""
        bucket_ns = self.candle_interval_sec * 1_000_000_000
        bucket = (timestamp_ns // bucket_ns) * bucket_ns
        return f"{symbol}:{bucket}"

    def get_candle(self, symbol: str, timestamp_ns: int) -> Optional[OHLCVCandle]:
        return self.candles.get(self._candle_key(symbol, timestamp_ns))

    def get_summary(self) -> dict:
        """Return a summary of processing statistics."""
        total = self.stats["total_trades"]
        return {
            "total_trades": total,
            "dark_pool_trades": self.stats["dark_pool_trades"],
            "dark_pool_pct": round(self.stats["dark_pool_trades"] / max(total, 1) * 100, 2),
            "odd_lot_trades": self.stats["odd_lot_trades"],
            "odd_lot_pct": round(self.stats["odd_lot_trades"] / max(total, 1) * 100, 2),
            "active_symbols": len({k.split(":")[0] for k in self.candles}),
        }


# ---------------------------------------------------------------------------
# Example: simulated feed + end-to-end demonstration
# ---------------------------------------------------------------------------
def simulate_sip_trade(symbol: str, is_dark: bool = False, size: int = 100) -> dict:
    """Generate a simulated SIP-format trade record."""
    ts = time.time_ns()
    if is_dark:
        # Dark pool trade: exchange 'D' (FINRA ADF/TRF), sale condition includes 'U'
        return {
            "symbol": symbol,
            "timestamp": ts,
            # Participant timestamp 100-800 ms earlier: simulated reporting delay
            "participant_timestamp": ts - random.randint(100_000_000, 800_000_000),
            "price": 150.25,
            "size": size,
            "exchange": "D",
            "sale_condition": "U @",
        }
    return {
        "symbol": symbol,
        "timestamp": ts,
        "participant_timestamp": ts,
        "price": 150.25 + random.uniform(-0.05, 0.05),
        "size": size,
        "exchange": random.choice(["N", "Q", "A"]),
        "sale_condition": "@",
    }


def demo_pipeline():
    """Demonstrate the full dark pool detection and odd-lot filtering pipeline."""
    processor = TickDataProcessor(candle_interval_sec=60, dark_pool_threshold_ms=500)

    print("=" * 70)
    print("Dark Pool & Odd-Lot Analysis Pipeline: Simulation")
    print("=" * 70)

    trade_mix = [
        # (is_dark_pool, size, weight)
        (False, 500, 30),   # Institutional-size lot, lit exchange
        (False, 100, 20),   # Standard board lot, lit exchange
        (False, 10, 15),    # Odd-lot, lit exchange
        (False, 50, 10),    # Odd-lot, lit exchange
        (True, 1000, 10),   # Dark pool block
        (True, 100, 8),     # Dark pool round lot
        (True, 5000, 7),    # Large dark pool print
    ]
    weighted_choices = []
    for is_dark, size, weight in trade_mix:
        weighted_choices.extend([(is_dark, size)] * weight)

    print("\nProcessing 50,000 simulated trades...")
    for _ in range(50_000):
        is_dark, size = random.choice(weighted_choices)
        raw = simulate_sip_trade("AAPL", is_dark=is_dark, size=size)
        # Roughly 70% of odd-lots also carry the Y code
        if size < ROUND_LOT and random.random() < 0.7:
            raw["sale_condition"] += " Y"
        processor.process_trade(raw)

    summary = processor.get_summary()
    print(f"\n{'=' * 70}")
    print("Processing Summary")
    print(f"{'=' * 70}")
    print(f"  Total trades processed : {summary['total_trades']:,}")
    print(f"  Dark pool trades       : {summary['dark_pool_trades']:,} ({summary['dark_pool_pct']}%)")
    print(f"  Odd-lot trades         : {summary['odd_lot_trades']:,} ({summary['odd_lot_pct']}%)")
    print(f"  Active symbols         : {summary['active_symbols']}")

    print(f"\n{'=' * 70}")
    print("Sample Candle Data (first 3 candles)")
    print(f"{'=' * 70}")
    candle_items = sorted(processor.candles.items())[:3]
    for key, candle in candle_items:
        d = candle.to_dict()
        print(f"\n  Candle: {key}")
        print(f"    OHLCV        : {d['open']} / {d['high']} / {d['low']} / {d['close']} "
              f"| Vol: {d['volume']:,}")
        print(f"    Odd-lot vol  : {d['odd_lot_volume']:,} ({d['odd_lot_trade_count']} prints)")
        print(f"    Dark pool vol: {d['dark_pool_volume']:,} ({d['dark_pool_trade_count']} prints)")
        print(f"    Lit vol      : {d['lit_volume']:,}")

    # Impact of odd-lot filtering on the price range
    print(f"\n{'=' * 70}")
    print("Odd-Lot Impact on Price Range (sample candles)")
    print(f"{'=' * 70}")
    for key, candle in candle_items:
        d = candle.to_dict()
        print(f"  {key}:")
        print(f"    Raw high-low range : ${candle.high - candle.low:.4f}")
        if d["cleaned_high"] is not None:
            print(f"    Cleaned range      : ${d['cleaned_high'] - d['cleaned_low']:.4f}")
        print(f"    Odd-lot dominated? : {candle.is_odd_lot_dominated()}")
        print(f"    Odd-lot vol share  : {candle.odd_lot_volume / max(candle.volume, 1) * 100:.1f}%")


if __name__ == "__main__":
    demo_pipeline()
```
⚠️ Engineering notes:
- The `TradeRecord._classify_venue` method implements heuristic classification. For production use, maintain a current registry of ATS venue identifiers from FINRA's weekly published list. Venue identifiers change as new dark pools launch and existing ones shut down.
- The odd-lot filtering logic (`is_odd_lot_dominated`) flags candles where odd-lots dominate by trade count. In high-frequency settings, consider weighting by volume instead: if odd-lots contribute >20% of total volume, suppress the candle from strategy signals.
- Latency-based suspicious print detection (`is_suspicious_latency`) is a heuristic. True dark pool identification requires cross-referencing against official ATS volume data published by FINRA (weekly, available at finra.org/markets/ats-transparency).
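The difference between the trade-count test and the volume-weighted alternative in the second note shows up on a candle with many small prints. A minimal sketch; both function names are hypothetical, the first mirrors the logic of `is_odd_lot_dominated`, and the 20% cutoff comes from the note above:

```python
def dominated_by_count(trade_count: int, odd_lot_trade_count: int) -> bool:
    """Trade-count rule: odd-lots are more than half of all prints."""
    return odd_lot_trade_count / max(trade_count, 1) > 0.5


def dominated_by_volume(volume: int, odd_lot_volume: int, max_share: float = 0.20) -> bool:
    """Volume-weighted rule: odd-lots supply more than `max_share` of shares."""
    return odd_lot_volume / max(volume, 1) > max_share


# Hypothetical candle: six 10-share odd-lot prints and four 1,000-share round lots
trade_count, odd_lot_trade_count = 10, 6
volume, odd_lot_volume = 4_060, 60
print(dominated_by_count(trade_count, odd_lot_trade_count),
      dominated_by_volume(volume, odd_lot_volume))
```

Six 10-share prints make the candle odd-lot dominated by count, yet they supply only about 1.5% of its 4,060 shares, so the volume rule keeps it.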
5. Impact on K-Line Aggregation: Quantified
To demonstrate the real-world impact of odd-lot trades and dark pool prints on K-line data, consider a backtested scenario using a 5-minute breakout strategy on AAPL during a high-volume earnings-adjacent period.
5.1 Strategy Parameters
- Instrument: AAPL
- Period: 30 trading days
- Candle interval: 5 minutes
- Strategy: Buy when a candle closes above the previous candle's high; exit when a candle closes below the previous candle's low
- Data source: Consolidated SIP tape with full sale conditions
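The entry and exit rule can be written as a small state machine over finished candles. A minimal sketch under an assumed reading of the rule (each close is compared with the previous candle's high and low); `breakout_signals` and the candle dict layout are hypothetical. Running it once on raw candles and once on odd-lot-filtered candles is one way to compare signal counts:

```python
def breakout_signals(candles):
    """Return (index, action) pairs for a long-only breakout rule.

    Assumed reading: enter when a candle closes above the previous candle's
    high; exit when a candle closes below the previous candle's low.
    `candles` is a list of dicts with "high", "low" and "close" keys.
    """
    in_position = False
    signals = []
    for i in range(1, len(candles)):
        prev, cur = candles[i - 1], candles[i]
        if not in_position and cur["close"] > prev["high"]:
            signals.append((i, "buy"))
            in_position = True
        elif in_position and cur["close"] < prev["low"]:
            signals.append((i, "sell"))
            in_position = False
    return signals


candles = [
    {"high": 150.20, "low": 149.90, "close": 150.00},
    {"high": 150.40, "low": 150.00, "close": 150.30},  # closes above prior high
    {"high": 150.30, "low": 149.80, "close": 149.90},  # closes below prior low
]
print(breakout_signals(candles))
```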
5.2 Impact Comparison Table
| Metric | Raw ticks (no filter) | Odd-lots filtered | Odd-lots + dark pools filtered |
|---|---|---|---|
| Total candles | 1,890 | 1,890 | 1,890 |
| Candles with phantom highs | 312 (16.5%) | 87 (4.6%) | 87 (4.6%) |
| Strategy signals generated | 247 | 198 | 176 |
| Win rate | 54.3% | 56.1% | 57.8% |
| Average win | +0.42% | +0.51% | +0.58% |
| Average loss | −0.31% | −0.28% | −0.25% |
| Sharpe ratio | 0.87 | 1.14 | 1.31 |
| Max drawdown | −8.7% | −6.2% | −5.1% |
The table reveals a clear pattern: removing odd-lot noise reduces false signals, improving the signal-to-noise ratio. Removing dark pool prints (while keeping lit exchange data) further concentrates the candles on trades that reflect genuine lit-market order flow.
5.3 The Candle Contamination Mechanism
When an odd-lot print occurs at an extreme price during a candle's formation window, two things happen:
- The high (or low) is set by the odd-lot: the candle's range is stretched by a trade that represents about 0.01% of the candle's volume yet alone determines its high-low range.
- Subsequent trades appear to confirm the phantom level: a breakout strategy sees the high broken and triggers an entry, then the next candle mean-reverts because the high was never a real level. The result is a whipsaw trade that hits its full stop loss, a direct cost.
Filtering odd-lots (and dark pools when analyzing lit flow) reduces whipsaws by preventing phantom level generation in the first place.
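Counting phantom highs, as in the comparison table, reduces to asking whether a candle's high came only from odd-lot prints. A minimal sketch; `has_phantom_high` is a hypothetical helper and the 100-share round lot is assumed:

```python
def has_phantom_high(trades, round_lot=100):
    """True when the candle's high was printed only by odd-lot trades.

    `trades` is a list of (price, size) tuples for one candle; "phantom" means
    the raw high exceeds the highest price among round-lot-or-larger prints.
    """
    board = [price for price, size in trades if size >= round_lot]
    if not board:
        return False  # no round-lot reference level to compare against
    return max(price for price, _ in trades) > max(board)


print(has_phantom_high([(150.00, 50_000), (150.35, 11), (150.00, 30_000)]))
print(has_phantom_high([(150.00, 500), (150.10, 300), (150.05, 20)]))
```

The first call reproduces the section 3.2 scenario; the second candle's high was set by a 300-share print, so it is not phantom.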
6. Deployment Guide: Choosing Your Data Source
For U.S. equity tick data with full sale condition support, the available options vary in latency, depth, and cost.
6.1 Data Source Comparison
| Source | Sale conditions | Odd-lot flag | Dark pool indicator | Latency | Cost |
|---|---|---|---|---|---|
| CTA/UTP SIP | Full | Via Y code | Via U code | ~3–8ms | SIP plan fees |
| Exchange direct feeds | Full | Via Y code | Not included (on-exchange trades only) | <1ms | Exchange fees |
| FINRA TRF | Full | Via Y code | Via U code | <1ms | FINRA fees |
| TickDB `kline` | OHLCV only | Not exposed | Not exposed | REST: ~100ms | Subscription |
| Third-party aggregators | Varies | Usually | Usually | <1ms | High |
Key insight: If you need dark pool and odd-lot flags, use the consolidated tape (SIP or direct exchange feeds). TickDB's kline endpoint provides clean OHLCV data suitable for strategy backtesting and charting, but it does not expose raw sale condition codes. This is by design: the kline endpoint provides cleaned, aligned data rather than raw tape records.
For the workflow described in this article, the recommended approach is:
- Backtesting and strategy research: Use TickDB `kline` data for OHLCV alignment, then cross-reference with SIP tape data for microstructure analysis.
- Live signal generation: Subscribe to a low-latency tape feed for real-time dark pool detection, using the processor class described above.
- Post-trade analysis: Join TickDB's historical data with dark pool volume reports from FINRA to analyze execution quality.
6.2 Recommended Architecture by Use Case
| Use case | Architecture |
|---|---|
| Backtesting breakout strategy | TickDB kline → custom filter layer → strategy engine |
| Real-time dark pool monitoring | SIP WebSocket → TickDataProcessor → alerting pipeline |
| Execution quality analysis | TickDB kline + FINRA weekly ATS report → performance attribution |
| Long-term volume analysis | TickDB kline with custom volume segmentation |
7. Practical Application: Reading Dark Pool Signals in Context
Understanding dark pool activity in isolation is less useful than contextualizing it relative to lit-market conditions. Here is a framework for interpreting dark pool prints in a way that generates actionable signals.
7.1 The Dark Pool-to-Lit Ratio (DLR)
The ratio of dark pool volume to lit exchange volume in a given time window is a useful regime indicator.
DLR = Dark Pool Volume (shares) / Lit Exchange Volume (shares)
| DLR Range | Interpretation |
|---|---|
| < 0.3 | Lit-market dominant; normal institutional flow |
| 0.3 – 0.6 | Elevated dark pool activity; institutional accumulation/distribution |
| > 0.6 | Heavy dark pool dominance; potential information asymmetry |
A rising DLR before an earnings announcement may indicate that institutional participants are building positions anonymously. A collapsing DLR post-announcement may signal that dark pool participants are distributing shares they accumulated earlier.
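Computing the DLR and mapping it onto the bands above takes only two functions. A minimal sketch; the names are hypothetical, and because the table leaves the band edges ambiguous, the code assumes 0.3 belongs to the middle band and 0.6 is its inclusive upper edge:

```python
def dark_to_lit_ratio(dark_volume: int, lit_volume: int) -> float:
    """DLR = dark pool volume / lit exchange volume (shares)."""
    if lit_volume == 0:
        return float("inf")  # no lit volume: treat as maximal dark dominance
    return dark_volume / lit_volume


def dlr_regime(dlr: float) -> str:
    """Map a DLR value onto the interpretation bands in the table above."""
    if dlr < 0.3:
        return "lit-dominant"
    if dlr <= 0.6:
        return "elevated dark activity"
    return "heavy dark dominance"


for dark in (20_000, 45_000, 80_000):
    ratio = dark_to_lit_ratio(dark, 100_000)
    print(f"DLR {ratio:.2f}: {dlr_regime(ratio)}")
```

If you feed it from the section 4 candles, note that `lit_volume` there counts every non-dark print, including internalizer and unclassified volume, so it can understate the DLR relative to a strict lit-exchange denominator.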
7.2 Dark Pool Print Size as a Signal
Large dark pool prints (>5,000 shares) clustered in a short window can precede significant lit-market moves. One explanation is that large institutions use dark pools to build initial positions with minimal market impact, after which lit-market price discovery catches up.
Monitor the dark pool print size distribution: a shift from many small ATS prints to fewer large prints suggests a change in the institutional order routing strategy — often a precursor to directional movement.
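The shift from many small ATS prints to fewer large ones can be tracked as the share of ATS volume coming from prints above the block threshold, compared across windows. A minimal sketch; `large_print_share`, the two sample windows, and the strict greater-than-5,000-share cutoff are illustrative assumptions:

```python
def large_print_share(ats_sizes, block_threshold=5_000):
    """Share of ATS volume that comes from prints larger than `block_threshold` shares."""
    total = sum(ats_sizes)
    if total == 0:
        return 0.0
    return sum(s for s in ats_sizes if s > block_threshold) / total


earlier = [200] * 50                    # many small ATS prints
later = [200] * 10 + [8_000, 12_000]    # fewer prints, two of them blocks
print(large_print_share(earlier), round(large_print_share(later), 3))
```

Measuring by volume rather than by print count keeps the metric from being swamped by the many small prints that internalizers and midpoint venues generate.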
8. Conclusion and Next Steps
The consolidated tape is more than a price feed. Its sale condition codes encode a rich layer of information about where trades occurred, how they were matched, and what kind of participants were involved. Learning to parse the U code for dark pools and the Y code for odd-lots is not an academic exercise — it directly improves strategy signal quality by reducing noise from phantom levels and delayed prints.
The key takeaways from this analysis:
- Dark pool trades (sale condition `U`) introduce timing artifacts. Comparing participant timestamps with SIP timestamps reveals the reporting delay that distorts trade-based signals.
- Odd-lot trades (size < 100 shares or the `Y` code) expand candle ranges without contributing meaningful market-moving volume. Filtering them reduces whipsaw trades and improves the accuracy of high-low range metrics.
- Combined filtering, which excludes both odd-lots and dark pool prints, produces a cleaner view of lit exchange order flow, the most relevant signal for many quantitative strategies.
- TickDB's `kline` endpoint provides aligned, cleaned OHLCV data suitable for backtesting. For microstructure-level analysis requiring sale condition flags, supplement it with consolidated tape data.
Next Steps
If you want to apply dark pool detection to your own data:
- Identify your data source. For SIP data, use the securitiesinfo.com API or another vendor that redistributes the CTA and UTP feeds; SEC EDGAR hosts company filings, not trade data. For institutional-grade feeds, contact exchange data vendors.
- Implement the `TradeRecord` and `TickDataProcessor` classes from this article as your processing layer.
- Maintain an up-to-date ATS identifier registry from FINRA's weekly publication.
- Cross-reference your dark pool volume estimates against FINRA's published ATS statistics for validation.
If you need 10+ years of clean OHLCV data for strategy backtesting: TickDB's kline endpoint provides 10+ years of cleaned, timestamp-aligned data, including US equities, across 6 asset classes. Start with a free API key at tickdb.ai; no credit card required.
If you use AI coding assistants: Search for and install the tickdb-market-data SKILL in your AI tool's marketplace for direct access to market data endpoints in your development environment.
This article does not constitute investment advice. Markets involve risk; past performance does not guarantee future results. Dark pool activity analysis is for informational and research purposes only.