The Gap Between IC and PnL
By Part 7, we had a clean LightGBM model with Rank IC +0.0801, positive folds in 3 of 4 walk-forward windows, and a confirmed leakage-free feature set. That's genuinely meaningful. Most retail AI trading projects never reach this level of honest validation.
Then we ran the portfolio simulation with transaction costs. The result: –2 to –3 basis points after costs.
That gap — between statistical signal and economic edge — is not a failure. It is the actual problem to solve. And it is a much harder problem than building the model.
This final part maps the full system architecture needed to convert a validated signal into something you can actually run.
The Seven-Layer System Stack
Most AI trading projects have two layers: a model, and an order. That's why they fail. A real systematic architecture separates concerns cleanly so each layer can be improved, tested, and replaced independently.
Layer 1: Data. Raw ingestion of NASDAQ-100 OHLCV (20yr), hourly options chains (IV surface, Greeks), scored news sentiment, and macro indicators. Master calendar, survivorship bias handling, corporate action adjustments. One job: clean, timestamped, aligned data.
Layer 2: Features. The feature engineering pipeline from Part 3: trend, momentum, volatility, volume, options surface, sentiment, sector context. All computable strictly from data available at market close on day t. Saved as a versioned feature store.
Layer 3: Signal. The LightGBM model from Part 7. Takes features and outputs a cross-sectional rank score for each ticker. Score is normalized within the trading universe. Output: a daily signal vector of 103 values between 0 and 1.
Layer 4: Regime Filter. Before any trade, assess the current market regime. The signal IC is not constant across regimes: it is higher in trending markets and lower in choppy ones. Regime gating scales position sizing by expected signal reliability.
Layer 5: Portfolio Construction. Convert the signal vector into target weights. Long the top quintile, flat or short the bottom. Apply sector neutrality constraints. Apply max position limits (no single stock > 5% of portfolio).
Layer 6: Risk Management. Position-level stop losses. Portfolio-level drawdown limits that reduce gross exposure if the system is losing. Correlation monitoring. Rebalancing frequency: daily signal, weekly rebalance to limit turnover.
Layer 7: Execution. Market-on-close orders to minimize market impact. Transaction cost estimation before any order is sent. Slippage model calibrated to actual fills. This is where the +0.08 IC either survives or dies.
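Layer 3's normalization step can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the function name `rank_normalize`, the dict-based input, and the average-rank treatment of ties are all assumptions made for the example.

```python
def rank_normalize(raw_scores):
    """Map raw model scores to cross-sectional ranks in [0, 1].

    raw_scores: dict of ticker -> raw model output for a single day.
    Tied scores share the average of the ranks they span (an assumption).
    """
    n = len(raw_scores)
    if n == 0:
        return {}
    if n == 1:
        return {ticker: 0.5 for ticker in raw_scores}

    ordered = sorted(raw_scores.items(), key=lambda kv: kv[1])
    ranks = {}
    i = 0
    while i < n:
        j = i
        # Advance j to the end of the run of tied scores.
        while j + 1 < n and ordered[j + 1][1] == ordered[i][1]:
            j += 1
        avg_rank = (i + j) / 2.0
        for k in range(i, j + 1):
            # Lowest score maps to 0.0, highest to 1.0.
            ranks[ordered[k][0]] = avg_rank / (n - 1)
        i = j + 1
    return ranks


# Illustrative raw scores only; not real model output.
raw = {"AAPL": 0.012, "MSFT": -0.004, "NVDA": 0.031, "AMZN": 0.012}
signal = rank_normalize(raw)
```

Ranking discards the scale of the raw predictions, which is usually what you want for a quintile sort: only the ordering within the day's universe matters.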
Regime Detection: The Gate on Everything
The signal doesn't work equally well in all market environments.
| Regime | Signal IC | Notes |
|---|---|---|
| Trending (VIX < 15, sustained direction) | +0.09 to +0.12 | Best performance |
| Neutral / slow drift | +0.05 to +0.08 | Baseline |
| High volatility (VIX > 30) | +0.01 to +0.03 | Near-random |
| Macro shock (earnings, FOMC) | Negative | Avoid entirely |
A simple regime classifier uses three inputs:
```python
def detect_regime(vix, spy_200d_trend, realized_corr_20d):
    """Returns: 'trending', 'neutral', 'volatile', 'shock'"""
    # Checks run from most to least severe; the first match wins.
    if vix > 35:
        return 'shock'
    if vix > 25:
        return 'volatile'
    if spy_200d_trend > 0 and realized_corr_20d < 0.6:
        return 'trending'
    return 'neutral'


REGIME_MULTIPLIERS = {
    'trending': 1.0,  # full sizing
    'neutral': 0.7,   # reduced
    'volatile': 0.3,  # minimal
    'shock': 0.0,     # flat
}
```
This is not a learned model. It's a rule. That's intentional — the last thing you want is a regime model that itself requires validation and can itself fail.
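The article does not spell out how the two non-VIX inputs are computed, so the following is only one plausible reading. It assumes `spy_200d_trend` is the last close relative to its 200-day simple moving average, and that `realized_corr_20d` is the average pairwise correlation of daily returns over the last 20 days. The helper names (`trend_vs_sma`, `mean_pairwise_corr`) are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pct_returns(prices):
    """Simple daily returns from a list of closes."""
    return [b / a - 1.0 for a, b in zip(prices, prices[1:])]


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def trend_vs_sma(prices, window=200):
    """Assumed definition: last close divided by its trailing SMA, minus 1."""
    recent = prices[-window:]
    return prices[-1] / (sum(recent) / len(recent)) - 1.0


def mean_pairwise_corr(price_history, window=20):
    """Assumed definition: mean correlation of daily returns over all ticker pairs.

    price_history: dict of ticker -> list of closes, oldest first.
    """
    rets = {t: pct_returns(p)[-window:] for t, p in price_history.items()}
    pairs = list(combinations(rets, 2))
    if not pairs:
        return 0.0
    return sum(pearson(rets[a], rets[b]) for a, b in pairs) / len(pairs)
```

The resulting values would be passed as the second and third arguments of `detect_regime`. A high average correlation means stocks are moving together, which leaves a cross-sectional ranking signal little to separate.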
Signal to Position Sizing
1. Which tickers to trade. Top quintile = longs (21 tickers). Bottom quintile = shorts or avoids (21 tickers). Middle three quintiles = no position. This concentrates capital where the signal has conviction.
2. How much to allocate. Equal-weight within longs, equal-weight within shorts. Signal-weighted allocation adds marginal complexity for marginal gain, and the evidence in this project didn't support it.
3. Sector neutrality. If NVDA, AMD, and QCOM all rank in the top quintile, roughly 15% of the long book is semiconductors. If semiconductors broadly sell off, that's sector beta, not alpha. Cap at 2 positions per sector per side.
```python
def construct_portfolio(signal_scores, sector_map, max_positions=20, max_per_sector=2):
    """Equal-weight long/short book from a pandas Series of ticker -> score."""
    per_side = max_positions // 2
    ranked = signal_scores.sort_values(ascending=False)

    def pick(tickers, exclude=()):
        # Walk the ranking, skipping names whose sector is already at its cap.
        # Each side gets its own sector counter, so the cap is per sector per side.
        chosen, sector_counts = [], {}
        for ticker in tickers:
            if len(chosen) >= per_side:
                break
            sector = sector_map[ticker]
            if ticker in exclude or sector_counts.get(sector, 0) >= max_per_sector:
                continue
            chosen.append(ticker)
            sector_counts[sector] = sector_counts.get(sector, 0) + 1
        return chosen

    longs = pick(ranked.index)                              # best scores first
    shorts = pick(ranked.index[::-1], exclude=set(longs))   # worst scores first

    weights = {t: +1.0 / len(longs) for t in longs}
    weights.update({t: -1.0 / len(shorts) for t in shorts})
    return weights
```
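Before any weights reach the execution layer, it is cheap to verify them against the limits from Layer 5. The sketch below is an assumed helper, not part of the original system: `check_portfolio` and its parameters are hypothetical names. It treats each weight as a fraction of capital, which is one possible reading of the 5% position limit.

```python
from collections import Counter


def check_portfolio(weights, sector_map, max_weight=0.05, max_per_sector=2):
    """Return a list of human-readable constraint violations (empty if none).

    weights: dict of ticker -> signed weight (positive long, negative short).
    sector_map: dict of ticker -> sector name.
    """
    violations = []

    # Position limit: no single name above max_weight in absolute terms.
    for ticker, w in weights.items():
        if abs(w) > max_weight + 1e-12:
            violations.append(
                f"{ticker}: |weight| {abs(w):.3f} exceeds {max_weight:.3f}"
            )

    # Sector cap, counted separately for the long and the short side.
    for side, sign in (("long", 1), ("short", -1)):
        counts = Counter(
            sector_map[t] for t, w in weights.items() if w * sign > 0
        )
        for sector, n in counts.items():
            if n > max_per_sector:
                violations.append(
                    f"{side} side holds {n} names in {sector} (cap {max_per_sector})"
                )
    return violations


# Illustrative weights only.
example_weights = {"NVDA": 0.04, "AMD": 0.04, "QCOM": 0.04, "PEP": -0.06}
example_sectors = {
    "NVDA": "Semiconductors",
    "AMD": "Semiconductors",
    "QCOM": "Semiconductors",
    "PEP": "Consumer Staples",
}
problems = check_portfolio(example_weights, example_sectors)
```

Here `problems` contains two entries: the oversized PEP short and the three semiconductor longs. Running a check like this on every rebalance catches constraint drift that a backtest summary would hide.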
Turnover: The Hidden Alpha Killer
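The single most important economic lever between IC and PnL is turnover. Every rebalance pays bid-ask spread (~1–3 bps per trade for liquid NASDAQ names), market impact, and commissions. With daily rebalancing, a typical portfolio turns over 40–60% of its holdings per week. Cost drag is turnover times cost per unit traded, so at 2 bps round-trip that is roughly 0.8–1.2 bps of portfolio value per week from spread alone, before market impact and commissions, and it recurs every week. When the gross edge per period is itself only a few basis points, that is enough to eliminate the signal.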
The fix: turnover constraints.
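```python
def apply_turnover_constraint(current_weights, target_weights, max_turnover=0.15):
    """Limit rebalancing to max 15% of portfolio per period."""
    new_weights = dict(current_weights)
    changes = {t: target_weights.get(t, 0) - current_weights.get(t, 0)
               for t in set(current_weights) | set(target_weights)}
    # Largest desired changes get first claim on the budget; ties break by ticker
    # so the result does not depend on set iteration order.
    changes = sorted(changes.items(), key=lambda x: (-abs(x[1]), x[0]))
    total_traded = 0.0
    for ticker, delta in changes:
        remaining = max_turnover - total_traded
        if remaining <= 0:
            break
        # Trade the full change if it fits, otherwise only what the budget allows.
        # Stopping outright here would freeze the book whenever the largest
        # desired change is bigger than the cap.
        traded = max(-remaining, min(remaining, delta))
        new_weights[ticker] = current_weights.get(ticker, 0) + traded
        total_traded += abs(traded)
    return new_weights
```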
With a 15% weekly turnover cap and signal persistence of 3–5 days, the model holds positions long enough to overcome transaction costs on the winning trades.
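To make the cost side concrete, turnover and its drag can be measured directly from two sets of weights. This is a rough sketch under stated assumptions: turnover is counted two-sided (buys and sells both add to it), so the cost input is a one-way figure. The function names are hypothetical.

```python
def realized_turnover(old_weights, new_weights):
    """Two-sided turnover: the sum of absolute weight changes across all names."""
    tickers = set(old_weights) | set(new_weights)
    return sum(
        abs(new_weights.get(t, 0.0) - old_weights.get(t, 0.0)) for t in tickers
    )


def cost_drag_bps(turnover, one_way_cost_bps):
    """Cost in bps of portfolio value for one rebalance.

    turnover: two-sided turnover as a fraction of portfolio value.
    one_way_cost_bps: assumed cost per unit traded in one direction,
    i.e. half of a round-trip figure.
    """
    return turnover * one_way_cost_bps


# Illustrative weights only: swap the whole MSFT position for NVDA.
old = {"AAPL": 0.5, "MSFT": 0.5}
new = {"AAPL": 0.5, "NVDA": 0.5}
turnover = realized_turnover(old, new)                    # sell 0.5 + buy 0.5
drag = cost_drag_bps(turnover, one_way_cost_bps=1.0)
```

Whichever convention you pick, one-sided turnover with round-trip costs or two-sided turnover with one-way costs, keep it consistent between the backtest and live monitoring, or the two numbers will differ by a factor of two.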
Paper Trading Deployment
Before risking real capital, run the system in paper trading mode for a minimum of 3 months, preferably 6. The purpose is not to validate the signal (that was done in Part 6). The purpose is to validate the infrastructure: latency, data delivery, order routing, position tracking, and your own ability to follow the rules.
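```python
import pandas as pd


class PaperTradingSystem:
    def __init__(self, model, feature_pipeline, initial_capital=100_000):
        self.model = model
        self.pipeline = feature_pipeline
        self.capital = initial_capital
        self.positions = {}  # ticker -> current portfolio weight

    def run_daily(self, date):
        # Assumes the pipeline returns a DataFrame indexed by ticker.
        features = self.pipeline.compute(date)
        # LightGBM's predict() returns a plain NumPy array; re-attach the tickers
        # so construct_portfolio can sort and iterate by name.
        scores = pd.Series(self.model.predict(features), index=features.index)
        # Market-wide columns repeat on every row, so any row works.
        regime = detect_regime(features['vix'].iloc[-1],
                               features['spy_200d'].iloc[-1],
                               features['realized_corr'].iloc[-1])
        multiplier = REGIME_MULTIPLIERS[regime]
        # SECTOR_MAP (ticker -> sector) is defined elsewhere in the project.
        target = construct_portfolio(scores, SECTOR_MAP)
        target = {t: w * multiplier for t, w in target.items()}
        # The turnover cap applies per call, not per week.
        target = apply_turnover_constraint(self.positions, target)
        self.positions = target
```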
What to watch during paper trading:
- Does the daily signal distribution match backtesting?
- Are you following the rules, or rationalizing overrides?
- How stable are data feeds — any gaps or late arrivals?
- What is realized turnover vs. expected?
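For the first item in the list, one way to put a number on "does the distribution match" is a two-sample Kolmogorov-Smirnov statistic: the largest gap between the two empirical CDFs. This is an illustrative choice, not a method the series prescribes, and `ks_statistic` is a hypothetical helper written with the standard library only. It is most informative on raw model outputs, since rank-normalized scores are uniform by construction.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Largest absolute gap between the empirical CDFs of two samples.

    Returns a value in [0, 1]: 0 means identical empirical distributions,
    1 means the samples do not overlap at all.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")

    gap = 0.0
    # The gap can only change at observed values, so check each of them.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap
```

In practice you would compare a window of live raw scores against the scores the model produced over the backtest period, and investigate when the statistic drifts well above its typical level. What counts as "well above" is something to calibrate on your own data, not a fixed threshold.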
Where This Project Sits on the Sophistication Spectrum
| Capability | This project | Institutional quant |
|---|---|---|
| Signal quality | Rank IC +0.08, 3/4 folds positive | +0.05–0.12, stable across years |
| Universe | 103 NASDAQ-100 names | 500–5,000+ names |
| Data | 3 sources, 20yr history | 50+ sources, alternative data |
| Execution | Paper / market-on-close | Intraday, dark pools, smart routing |
| Risk infrastructure | Rules-based | Full portfolio optimization |
| Research cycle | Months per signal | Weeks per signal |
The signal we built is real. The infrastructure described here is minimal but functional. The gap between this and a production system at a quant fund is not primarily about the ML model — it's about execution infrastructure, alternative data depth, and risk management sophistication.
Series Summary
- Established the theoretical foundation: transformers vs. diffusion vs. TFT, and why architecture choice matters
- Built a clean 3-source data pipeline with no look-ahead bias and proper train/val splits
- Engineered 30 features covering trend, momentum, volume, volatility, options surface, and sentiment
- Trained a Standard Transformer, diagnosed its collapse into mean prediction, and understood why
- Implemented the Temporal Fusion Transformer with variable selection gating, LSTM encoding, and multi-horizon heads
- Validated signals rigorously: Rank IC, walk-forward, adversarial validation, economic viability analysis
- Showed LightGBM on the cleaned feature set produced competitive Rank IC with better interpretability
- Mapped the full architecture from signal to execution — and showed where the economics break down and how to fix them
The signal is +0.08 Rank IC. It is real. Making it profitable requires solving the execution problem. That is the next project.