The code worked beautifully on your laptop. Then your teammate deployed it to the server, and the backtest returned nothing but zeros.
The problem wasn't the algorithm. It was the infrastructure. For three weeks, both of you had been running separate scripts against separate copies of the same dataset. Your results were yours; his were his. Nobody had a single source of truth. When you finally merged the code, you discovered you'd been using inconsistent data from different vendors, different timezones, and—in one case—different symbol conventions entirely.
This is the invisible tax on quant teams that skip infrastructure planning. And it hits small teams hardest: you don't have a dedicated DevOps engineer, but you have the same production requirements as a large firm.
This article builds the foundation that a three-person quant team needs to operate like a production system. We'll cover shared database architecture, Git-based code collaboration, API key management, and permission control—using TickDB as the unified data layer throughout.
The Infrastructure Problem for Small Teams
A solo quant researcher works in a single environment. The data lives on their machine, the code in their repo, and the API key in their head. It is fragile by design—but it is at least coherent.
A three-person team introduces three categories of risk that solo work never surfaces:
Data fragmentation. Without a shared data layer, each team member builds their own local copy of historical data. Discrepancies accumulate silently. Backtests from different machines become incomparable.
Code divergence. Without enforced Git workflows, developers branch off, experiment in isolation, and merge months later into conflicts that are difficult to resolve cleanly. The "latest version" becomes a philosophical question rather than a technical fact.
Credential sprawl. Without centralized API key management, keys live in Slack messages, spreadsheets, or—worse—hardcoded in scripts that get committed to version control. Rotation becomes impossible. Exfiltration becomes trivially easy.
The solution is not a single DevOps transformation. It is a set of pragmatic patterns that a three-person team can implement in an afternoon and maintain indefinitely.
Module 1: Shared TickDB Data Layer
Why a Shared Data Source Eliminates the "Zero Backtest" Problem
The zero backtest problem is almost always a data problem. If your team pulls from different endpoints, uses different caching strategies, or—one of the most insidious issues—uses different symbol naming conventions, your backtests will disagree before a single line of strategy code executes.
The fix is architectural: every team member, every script, and every backtest must resolve the same symbol against the same endpoint with the same normalization rules.
TickDB provides a unified API across six asset classes. For a three-person team, the practical benefit is not just latency or coverage—it is data coherence. When everyone calls GET /v1/market/kline with the same symbol parameters, the data source is identical by definition.
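To make "the same normalization rules" concrete, here is a minimal sketch of a normalizer that every script could call before touching the API. Only the `TICKER.MARKET` target form (`AAPL.US`, `BTC.USDT`) comes from this article; the vendor-style inputs, the `DEFAULT_MARKET` fallback, and the function name are assumptions for illustration.

```python
import re

# Hypothetical default: a bare ticker is assumed to be a US equity.
DEFAULT_MARKET = "US"


def normalize_symbol(raw: str) -> str:
    """Map common vendor spellings onto the TICKER.MARKET form used in this article."""
    s = raw.strip().upper()
    if not s:
        raise ValueError("empty symbol")
    # Pair separators such as BTC/USDT, BTC-USDT, BTC_USDT become BTC.USDT.
    s = re.sub(r"[/\-_]", ".", s)
    # No market suffix at all: fall back to the assumed default market.
    if "." not in s:
        s = f"{s}.{DEFAULT_MARKET}"
    return s


print(normalize_symbol(" aapl "))    # AAPL.US
print(normalize_symbol("btc/usdt"))  # BTC.USDT
```

A rule this simple will mis-handle share-class tickers written with a hyphen, so treat it as a starting point. What matters is that the rule lives in one shared module rather than in each researcher's notebook.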
Production-Grade Shared Data Client
The following Python module provides a shared data client that your entire team can import. It enforces consistent symbol resolution, handles authentication centrally, and includes a local cache layer that prevents redundant API calls during development.
import os
import time
import json
import hashlib
import logging
from pathlib import Path
from datetime import datetime, timezone
from typing import Optional, List, Dict, Any
import requests

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s | %(levelname)s | %(name)s | %(message)s",
)
logger = logging.getLogger("tickdb_shared")


class TickDBClient:
    """
    Shared TickDB client for team use.
    Enforces consistent symbol resolution, centralized auth, and local caching.

    Usage:
        from tickdb_shared import TickDBClient
        client = TickDBClient()
        data = client.get_kline("AAPL.US", "1h", limit=500)
    """

    def __init__(
        self,
        api_key: Optional[str] = None,
        cache_dir: str = ".tickdb_cache",
        cache_ttl_seconds: int = 3600,
    ):
        self.api_key = api_key or os.environ.get("TICKDB_API_KEY")
        if not self.api_key:
            raise ValueError(
                "TickDB API key not found. Set TICKDB_API_KEY environment variable "
                "or pass api_key directly."
            )
        self.base_url = "https://api.tickdb.ai/v1"
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.cache_ttl = cache_ttl_seconds
        self.session = requests.Session()
        self.session.headers.update({"X-API-Key": self.api_key})
        # Rate limit state
        self._rate_limit_reset: Optional[float] = None

    def _cache_key(self, endpoint: str, params: Dict[str, Any]) -> str:
        """Generate a deterministic cache key from endpoint and params."""
        param_str = json.dumps(params, sort_keys=True)
        raw = f"{endpoint}:{param_str}"
        return hashlib.sha256(raw.encode()).hexdigest()

    def _read_cache(self, cache_key: str) -> Optional[Dict]:
        """Read from local cache if valid."""
        cache_file = self.cache_dir / f"{cache_key}.json"
        if not cache_file.exists():
            return None
        # Entries older than the TTL are deleted and treated as a miss.
        age = time.time() - cache_file.stat().st_mtime
        if age > self.cache_ttl:
            cache_file.unlink()
            return None
        try:
            with open(cache_file, "r") as f:
                return json.load(f)
        except (json.JSONDecodeError, IOError):
            return None

    def _write_cache(self, cache_key: str, data: Dict) -> None:
        """Write data to local cache."""
        cache_file = self.cache_dir / f"{cache_key}.json"
        try:
            with open(cache_file, "w") as f:
                json.dump(data, f)
        except IOError as e:
            logger.warning(f"Failed to write cache: {e}")

    def _handle_rate_limit(self, response: requests.Response) -> None:
        """Sleep for the period given by the Retry-After header (default 5s)."""
        try:
            retry_after = int(response.headers.get("Retry-After", 5))
        except ValueError:
            # Retry-After may also be an HTTP date; fall back to a fixed wait.
            retry_after = 5
        logger.warning(f"Rate limited. Waiting {retry_after}s before retry.")
        time.sleep(retry_after)

    def _request(
        self,
        method: str,
        endpoint: str,
        params: Optional[Dict[str, Any]] = None,
        max_retries: int = 3,
    ) -> Dict:
        """Make an authenticated request with retry logic."""
        url = f"{self.base_url}{endpoint}"
        for attempt in range(max_retries):
            try:
                response = self.session.request(
                    method,
                    url,
                    params=params,
                    timeout=(3.05, 27),  # (connect, read) timeouts in seconds
                )
                # Parse the body defensively: error responses may not be JSON.
                try:
                    result = response.json()
                except ValueError:
                    result = {}
                if not isinstance(result, dict):
                    result = {}
                if response.status_code == 429 or result.get("code") == 3001:
                    self._handle_rate_limit(response)
                    continue
                response.raise_for_status()
                if result.get("code") != 0:
                    raise RuntimeError(
                        f"TickDB API error {result.get('code')}: {result.get('message')}"
                    )
                return result
            except requests.RequestException as e:
                if attempt == max_retries - 1:
                    raise
                # Exponential backoff, capped at 30 seconds.
                wait = min(2 ** attempt + 0.1, 30)
                logger.warning(f"Request failed (attempt {attempt + 1}): {e}. Retrying in {wait}s.")
                time.sleep(wait)
        raise RuntimeError("Max retries exceeded")

    def get_kline(
        self,
        symbol: str,
        interval: str,
        limit: int = 500,
        start_time: Optional[int] = None,
        end_time: Optional[int] = None,
        use_cache: bool = True,
    ) -> List[Dict]:
        """
        Fetch OHLCV kline data with team-shared caching.

        Args:
            symbol: Normalized symbol (e.g., "AAPL.US", "BTC.USDT")
            interval: Candle interval (e.g., "1m", "1h", "1d")
            limit: Number of candles to fetch
            start_time: Unix timestamp (ms) for range start
            end_time: Unix timestamp (ms) for range end
            use_cache: Whether to use local cache (default: True for dev)
        """
        params = {
            "symbol": symbol,
            "interval": interval,
            "limit": limit,
        }
        # Compare against None so that a timestamp of 0 is still sent.
        if start_time is not None:
            params["start_time"] = start_time
        if end_time is not None:
            params["end_time"] = end_time
        cache_key = self._cache_key("/market/kline", params)
        if use_cache:
            cached = self._read_cache(cache_key)
            if cached is not None:
                logger.info(f"Cache hit for {symbol} {interval}")
                return cached.get("data", [])
        result = self._request("GET", "/market/kline", params=params)
        data = result.get("data", [])
        if use_cache:
            self._write_cache(cache_key, {"data": data, "fetched_at": time.time()})
        logger.info(f"Fetched {len(data)} candles for {symbol} {interval}")
        return data

    def get_available_symbols(self, market: Optional[str] = None) -> List[str]:
        """
        Fetch list of available symbols, optionally filtered by market.

        Args:
            market: Filter by market code (e.g., "US", "HK", "CRYPTO")
        """
        params = {}
        if market:
            params["market"] = market
        cache_key = self._cache_key("/symbols/available", params)
        cached = self._read_cache(cache_key)
        if cached is not None:
            return cached.get("data", [])
        result = self._request("GET", "/symbols/available", params=params)
        data = result.get("data", [])
        self._write_cache(cache_key, {"data": data, "fetched_at": time.time()})
        return data

    def validate_symbol(self, symbol: str) -> bool:
        """Validate that a symbol exists in TickDB's database."""
        symbols = self.get_available_symbols()
        return symbol in symbols


# Global singleton for team-wide use
_shared_client: Optional[TickDBClient] = None


def get_shared_client() -> TickDBClient:
    """Get or create the shared TickDB client singleton."""
    global _shared_client
    if _shared_client is None:
        _shared_client = TickDBClient()
    return _shared_client

Engineering notes:
- The TickDBClient enforces centralized authentication. API keys are never hardcoded in scripts; they live in the environment and are injected at runtime.
- The local cache prevents redundant API calls during development. In production, set use_cache=False or reduce cache_ttl_seconds to ensure fresh data.
- The singleton pattern (get_shared_client()) gives every script in a process one client instance, with the same cache directory and the same auth headers. Because every team member imports the same module, the configuration is identical across machines, which removes the "different data source" problem at the infrastructure level.
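The cache only helps if identical requests map to identical keys. The sketch below is a standalone copy of the `_cache_key` logic from the client, pulled out as a free function (`cache_key` is an illustrative name) to show that parameter order does not matter while parameter values do.

```python
import hashlib
import json
from typing import Any, Dict


def cache_key(endpoint: str, params: Dict[str, Any]) -> str:
    """Standalone copy of TickDBClient._cache_key, for illustration."""
    raw = f"{endpoint}:{json.dumps(params, sort_keys=True)}"
    return hashlib.sha256(raw.encode()).hexdigest()


a = cache_key("/market/kline", {"symbol": "AAPL.US", "interval": "1h", "limit": 500})
b = cache_key("/market/kline", {"limit": 500, "interval": "1h", "symbol": "AAPL.US"})
c = cache_key("/market/kline", {"symbol": "AAPL.US", "interval": "1h", "limit": 1000})

print(a == b)  # True: same parameters, different order
print(a == c)  # False: a different limit is a different request
```

One consequence worth knowing: a call whose `start_time` or `end_time` is derived from the current clock produces a new key every time, so it will never hit the cache. Rounding the range to a fixed boundary, such as the start of the day, keeps those requests cacheable.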
Module 2: Git Workflow for Quant Research
The Branching Strategy That Prevents Merge Nightmares
Most quant teams adopt Git reactively—usually after the first catastrophic overwrite. A proactive branching strategy for three-person teams does not need to mirror a Fortune 500 engineering org. It needs three things: feature isolation, backtest reproducibility, and a single source of truth.
We recommend a simplified trunk-based development model adapted for quant research:
main
├── backtest/
│   ├── backtest-2024-returns
│   └── backtest-2024-vol-regime
├── data-pipeline/
│   ├── feature-engineering
│   └── cache-layer-v2
└── strategy/
    ├── mean-reversion-v3
    └── momentum-ensemble
Branch naming convention: {type}/{description} where type is one of backtest, data-pipeline, strategy, or infrastructure.
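A naming convention holds up better when something checks it. Here is a minimal sketch of a validator that could run in a pre-push hook or a CI step. The four type prefixes come from the convention above; the rule for the description part (lowercase letters, digits, dots, hyphens) is an assumption.

```python
import re

ALLOWED_TYPES = ("backtest", "data-pipeline", "strategy", "infrastructure")

# Assumed description rule: starts with a lowercase letter or digit,
# then lowercase letters, digits, dots, or hyphens.
BRANCH_RE = re.compile("^(" + "|".join(ALLOWED_TYPES) + r")/[a-z0-9][a-z0-9.-]*$")


def is_valid_branch(name: str) -> bool:
    """Return True for main or for a {type}/{description} branch name."""
    return name == "main" or BRANCH_RE.match(name) is not None


print(is_valid_branch("strategy/mean-reversion-v3"))  # True
print(is_valid_branch("feature/quick-test"))          # False: unknown type
```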
Enforcing Backtest Reproducibility via Git LFS
Quant research is data-heavy. Your backtest results depend on the exact dataset you used—which may be 50 MB of historical klines or 200 MB of depth snapshots. These files belong in the repository, but they should not slow down everyday Git operations.
Git Large File Storage (LFS) handles this by storing large files on a separate LFS server while keeping lightweight pointers in the repository. For a three-person team, the free LFS allowance on GitHub or GitLab is often enough to start, but both storage and bandwidth are capped, so check the current quotas before committing depth snapshots at scale.
Setup:
# Set up Git LFS hooks (the git-lfs package must already be installed)
git lfs install
# Track large data files
git lfs track "*.csv"
git lfs track "*.parquet"
git lfs track "data/**/*.json"
# Add .gitattributes to version control
git add .gitattributes
Commit protocol for backtests:
# Every backtest commit must include a data manifest
git add results/backtest-2024-returns.csv
git commit -m "backtest: mean-reversion v3 on 2024 US equity data

Data manifest:
- Symbol set: SPX constituents (505 tickers)
- Period: 2024-01-01 to 2024-12-31
- Source: TickDB /v1/market/kline, 1h interval
- Cache: .tickdb_cache/v3.2"
git push origin backtest/mean-reversion-v3
The data manifest in the commit message is not optional. When your teammate tries to reproduce your backtest six months later, the manifest tells them exactly which data to pull and which cache version to use.
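A hand-written manifest says which data was intended; a content hash says which data was actually used. The sketch below generates the manifest text and appends a SHA-256 of the data file. The field labels follow the commit message above. The `Data file` and `SHA-256` lines, and both function names, are additions for illustration.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datasets do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_file: Path, symbol_set: str, period: str, source: str) -> str:
    """Return manifest text suitable for pasting into a commit message body."""
    lines = [
        "Data manifest:",
        f"- Symbol set: {symbol_set}",
        f"- Period: {period}",
        f"- Source: {source}",
        f"- Data file: {data_file.name}",
        f"- SHA-256: {file_sha256(data_file)}",
    ]
    return "\n".join(lines)
```

A teammate reproducing the backtest can hash their own copy of the file and compare. A mismatch shows the inputs differ before any strategy code runs.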
Code Review Checklist for Quant Commits
Before merging any branch to main, apply this checklist:
| Check | Purpose |
|---|---|
| Data source documented in commit message | Enables reproducibility |
| No API keys in committed code | Security |
| Cache invalidated or version-bumped | Prevents stale-data backtests |
| Backtest results committed alongside code | Creates a versioned results history |
| Docstring on all new functions | Knowledge transfer |
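The "No API keys in committed code" check can be partly automated. The sketch below is a heuristic scanner: it flags string literals of 16 or more characters assigned to a name containing `key`, `secret`, or `token`. It makes no assumption about TickDB's actual key format, so expect both false positives and misses. It complements review; it does not replace it.

```python
import re
from typing import List, Tuple

# Heuristic only: a name containing key/secret/token, then = or :,
# then a quoted literal of 16+ URL-safe characters.
SECRET_RE = re.compile(
    r"""(?i)\b[\w-]*(?:key|secret|token)[\w-]*\s*[:=]\s*["']([A-Za-z0-9_\-]{16,})["']"""
)


def find_suspect_lines(text: str) -> List[Tuple[int, str]]:
    """Return (line number, stripped line) for every line that looks like a hardcoded secret."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if SECRET_RE.search(line):
            hits.append((lineno, line.strip()))
    return hits


sample = (
    'api_key = "abcd1234efgh5678ijkl"\n'
    'key = os.environ.get("TICKDB_API_KEY")\n'
)
print(find_suspect_lines(sample))  # only line 1 is flagged
```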
Module 3: API Key Management and Permission Control
The Three-Layer Key Architecture
For a three-person team, API key management is not an enterprise luxury; it is a security baseline. Keys committed to repositories or pasted into chat are a common, well-documented route to API abuse, whatever the data provider.
We recommend a three-layer key architecture:
Layer 1: Development key (local only). Each team member has their own API key generated from the TickDB dashboard. This key is stored in the local environment and is never committed to any repository.
Layer 2: Staging / CI key (shared, read-only). A team-shared key with read-only permissions is used in automated backtests and CI pipelines. This key is stored in the CI/CD system's secret manager (GitHub Secrets, GitLab CI Variables).
Layer 3: Production key (restricted scope). If the team deploys live strategies, a separate key with minimal scope (e.g., only the symbols and endpoints required for the specific strategy) is generated for production use.
Environment Variable Management
The TickDBClient above reads the API key from the TICKDB_API_KEY environment variable. For team development, we recommend a .env.example file that documents the required variables without exposing values:
# .env.example — copy to .env and fill in your values
# NEVER commit .env to version control
TICKDB_API_KEY=your_dev_key_here
TICKDB_CACHE_DIR=.tickdb_cache
TICKDB_CACHE_TTL=3600
# Optional: team-specific settings
TEAM_DATA_DIR=/shared/data
BACKTEST_OUTPUT_DIR=/shared/results
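The client reads `TICKDB_API_KEY` from `os.environ`, so something has to load `.env` into the environment first. Many teams use the `python-dotenv` package for this. The sketch below is a standard-library-only alternative; `load_env_file` is a hypothetical helper name, and the parser handles only plain `KEY=VALUE` lines and `#` comments.

```python
import os
from pathlib import Path
from typing import Dict


def load_env_file(path: str = ".env", override: bool = False) -> Dict[str, str]:
    """Parse KEY=VALUE lines into os.environ and return the pairs that were found."""
    loaded: Dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return loaded
    for line in env_path.read_text().splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not a KEY=VALUE pair.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip().strip("'\"")
        loaded[key] = value
        # By default, variables already set in the shell take precedence.
        if override or key not in os.environ:
            os.environ[key] = value
    return loaded
```

Note that the `TickDBClient` shown earlier reads only `TICKDB_API_KEY` from the environment. To honor `TICKDB_CACHE_DIR` and `TICKDB_CACHE_TTL`, pass them to the constructor explicitly, for example `cache_ttl_seconds=int(os.environ["TICKDB_CACHE_TTL"])`.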
Add .env to .gitignore:
# .gitignore
.env
.tickdb_cache/
__pycache__/
*.pyc
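# results/ and *.parquet are deliberately not ignored: backtest outputs are
# committed with the code and tracked through Git LFS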
Permission Control: Principle of Least Privilege
TickDB supports key-scoped permissions. For a three-person team, the practical permission model is:
| Team role | Key type | Permissions |
|---|---|---|
| Researcher (research-only) | Development key | Read: kline, depth, symbols |
| Data engineer (pipeline builder) | Development key + CI key | Read: all; no write operations |
| Strategy lead (deployer) | Production key | Read: strategy-relevant symbols only |
Each key should be generated with the minimum set of permissions required. If a key is compromised, the blast radius is limited to the scope it was granted.
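The article does not show how TickDB expresses key scopes, so the sketch below models the permission table on the client side only. `KeyScope` and its fields are hypothetical. The point is the shape of a least-privilege check, which a team could use as a guard in its own code alongside whatever the provider enforces.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class KeyScope:
    """Hypothetical client-side description of what a key is allowed to read."""
    endpoints: FrozenSet[str]
    symbols: Optional[FrozenSet[str]] = None  # None means any symbol

    def allows(self, endpoint: str, symbol: str) -> bool:
        if endpoint not in self.endpoints:
            return False
        return self.symbols is None or symbol in self.symbols


# Researcher key: kline, depth, and symbols for any symbol.
RESEARCH = KeyScope(endpoints=frozenset({"kline", "depth", "symbols"}))

# Production key: only the endpoint and symbols one strategy needs.
PRODUCTION = KeyScope(
    endpoints=frozenset({"kline"}),
    symbols=frozenset({"AAPL.US", "MSFT.US"}),
)

print(RESEARCH.allows("kline", "TSLA.US"))    # True
print(PRODUCTION.allows("kline", "TSLA.US"))  # False: symbol out of scope
print(PRODUCTION.allows("depth", "AAPL.US"))  # False: endpoint out of scope
```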
Module 4: Shared Data Pipeline Architecture
The Shared Pipeline Problem
When three people each build their own data pipeline, you end up with three pipelines that produce three different versions of "the same" dataset. Reconciling them is a full-time job.
The solution is a single, team-owned data pipeline that writes to a shared location. Every team member reads from that location—not from their own local copy.
Architecture: Team Data Directory
/shared/
├── raw/
│   ├── us_equity/
│   │   └── kline/
│   │       ├── AAPL.US/
│   │       └── SPY.US/
│   ├── crypto/
│   │   └── depth/
│   └── hk_equity/
├── processed/
│   ├── features/
│   └── signals/
└── results/
    ├── backtests/
    └── reports/
Production-Grade Shared Pipeline Script
import json
import logging
import os
from datetime import datetime, timezone

from tickdb_shared import get_shared_client

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("team_pipeline")

# Shared configuration: committed to the repo, never contains secrets
TEAM_DATA_DIR = os.environ.get("TEAM_DATA_DIR", "/shared/data")
SYMBOLS = ["AAPL.US", "MSFT.US", "GOOGL.US", "AMZN.US", "NVDA.US"]
INTERVAL = "1h"


def fetch_and_store(symbol: str, interval: str, lookback_days: int = 90):
    """
    Fetch kline data from TickDB and store in the shared directory.
    This function is the team's single source of truth for historical data.
    Run via cron or CI pipeline on a schedule.
    """
    client = get_shared_client()
    # Validate symbol exists
    if not client.validate_symbol(symbol):
        logger.error(f"Symbol {symbol} not found in TickDB. Skipping.")
        return
    # Calculate time range in Unix milliseconds
    end_time_ms = int(datetime.now(timezone.utc).timestamp() * 1000)
    start_time_ms = end_time_ms - (lookback_days * 24 * 60 * 60 * 1000)
    logger.info(f"Fetching {symbol} {interval} from {datetime.fromtimestamp(start_time_ms / 1000, tz=timezone.utc)}")
    try:
        data = client.get_kline(
            symbol=symbol,
            interval=interval,
            limit=1000,  # TickDB limit per request
            start_time=start_time_ms,
            end_time=end_time_ms,
            use_cache=False,  # Always fetch fresh for the pipeline
        )
        # Store in shared directory
        symbol_dir = os.path.join(TEAM_DATA_DIR, "raw", "us_equity", "kline", symbol)
        os.makedirs(symbol_dir, exist_ok=True)
        fetched_at = datetime.now(timezone.utc)
        filename = f"{interval}_{fetched_at.strftime('%Y%m%d_%H%M%S')}.json"
        filepath = os.path.join(symbol_dir, filename)
        payload = {
            "metadata": {
                "symbol": symbol,
                "interval": interval,
                "fetched_at": fetched_at.isoformat(),
            },
            "data": data,
        }
        with open(filepath, "w") as f:
            json.dump(payload, f)
        logger.info(f"Stored {len(data)} candles at {filepath}")
    except Exception as e:
        logger.error(f"Pipeline failed for {symbol}: {e}")
        raise


def main():
    logger.info("Starting team data pipeline")
    for symbol in SYMBOLS:
        try:
            fetch_and_store(symbol, INTERVAL)
        except Exception as e:
            logger.error(f"Failed to process {symbol}: {e}")
            continue  # Continue with other symbols rather than failing entirely
    logger.info("Pipeline run complete")


if __name__ == "__main__":
    main()
Engineering notes:
- This script runs as a scheduled job (cron or CI/CD). Each team member references the data in /shared/, never their own local copy.
- The fetch_and_store function includes symbol validation before making API calls, preventing wasted requests on invalid symbols.
- Error handling is designed to continue with remaining symbols rather than aborting the entire pipeline. This is intentional: one failed symbol should not corrupt the data for others.
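On the reading side, each researcher needs a consistent way to pick up what the pipeline wrote. The sketch below loads the most recent snapshot for a symbol. It relies on the `{interval}_{YYYYMMDD_HHMMSS}.json` filename pattern and the `raw/<market>/kline/<symbol>/` layout used by the pipeline script; the function name and the `market_dir` parameter are assumptions.

```python
import json
from pathlib import Path
from typing import Any, Dict, Optional


def load_latest_snapshot(
    base_dir: str,
    symbol: str,
    interval: str,
    market_dir: str = "us_equity",
) -> Optional[Dict[str, Any]]:
    """Return the newest stored snapshot for a symbol, or None if there is none."""
    symbol_dir = Path(base_dir) / "raw" / market_dir / "kline" / symbol
    if not symbol_dir.is_dir():
        return None
    # Timestamps in the filename are zero-padded, so name order is time order.
    candidates = sorted(symbol_dir.glob(f"{interval}_*.json"))
    if not candidates:
        return None
    with open(candidates[-1], "r") as f:
        return json.load(f)
```

Calling `load_latest_snapshot("/shared/data", "AAPL.US", "1h")` on every machine returns the same file contents. That is what makes two researchers' backtests comparable.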
Module 5: Team Deployment Recommendations
| Team role | Recommended setup | Rationale |
|---|---|---|
| Researcher (solo) | Local dev environment + personal TickDB key | Full flexibility for rapid iteration |
| Researcher (collaborating) | Shared TickDBClient + team data dir + Git LFS | Data coherence without sacrificing speed |
| Data engineer | CI/CD pipeline + shared staging key + read-only permissions | Automation without write risk |
| Strategy lead (pre-production) | Shared data dir + backtest results in Git + review checklist | Reproducibility before deployment |
| Strategy lead (live deployment) | Production key + minimal permissions + separate execution environment | Blast radius containment |
Module 6: Closing — The Infrastructure Dividend
The zero backtest problem described at the opening of this article is not a code problem. It is an infrastructure problem. The fix is not a better algorithm—it is a shared data layer, a disciplined Git workflow, and a key management architecture that treats security as a baseline, not an afterthought.
These patterns cost an afternoon to implement. They save weeks of "whose backtest is right?" arguments, months of data reconciliation, and the catastrophic risk of an exposed API key draining a production account.
The dividend is not just operational. It is strategic: a team with coherent infrastructure can iterate faster, reproduce results confidently, and onboard new members in hours instead of days.
Next Steps
If you're a solo quant researcher transitioning to a team, start with the shared TickDBClient module above. Place it in a team-accessible repository and adopt the .env.example pattern for key management. This single change eliminates the data coherence problem for most small teams.
If you're already working in a team but without shared infrastructure, audit your current data sources first. Identify every script that pulls from TickDB and verify that they use the same symbol conventions and endpoint parameters. Discrepancies there are the root cause of most "identical code, different results" problems.
If you're building a production deployment, set up separate production keys with minimal permissions before writing a single line of strategy code. The permission model is the last line of defense against credential compromise.
If you use AI coding assistants, search for and install the tickdb-market-data SKILL in your AI tool's marketplace. It provides TickDB-specific context and code generation for market data workflows.
Sign up for a free TickDB API key at tickdb.ai — no credit card required.
This article does not constitute investment advice. Markets involve risk; past performance does not guarantee future results.