Tags: technical-reference, ldtm, lstm, architecture, trading, timeseries, ml

LDTM v2 — Complete Architecture & Implementation Reference

LDTM v2 architecture and implementation: a per-ticker two-layer LSTM with three prediction heads that forecasts next-day, next-Monday, and one-month closes for the NASDAQ-100 universe.

October 15, 2025·18 min read

Model: Long-Duration Temporal Model (LDTM) v2
Date: 2026-04-21
System: NVIDIA GB10 DGX · PostgreSQL 15 · PyTorch 2.x (NGC 25.01)
Status: Production — 103 NDX-100 tickers trained and inferring daily


Table of Contents

  1. System Overview
  2. AI Model Design
  3. Mathematical Foundations
  4. Feature Engineering
  5. Dataset Design
  6. Training Architecture
  7. Inference Pipeline
  8. Orchestration & Parallelism
  9. Database Schema
  10. Prediction Outputs
  11. Dashboard & LLM Interface
  12. Container Architecture
  13. Cron Schedule
  14. Known Limitations & Caveats

1. System Overview

LDTM is a per-ticker LSTM-based price prediction system designed to forecast three forward price targets for every equity in the NASDAQ-100 universe:

| Horizon | Target | Notes |
| --- | --- | --- |
| Next trading day | Next-day close price | Primary signal; used for daily decision support |
| Next Monday | Following Monday close | Weekly directional signal |
| One month (~21 trading days) | Monthly close | Trend confirmation |

The system processes 20+ years of daily OHLCV data per ticker, augments it with six engineered features, and trains a separate LSTM model per ticker. All 103 models run independently — there is no shared weight matrix or transfer learning. This design choice is intentional: AAPL and SQQQ have fundamentally different volatility regimes and price dynamics; forcing them to share representations would average away edge cases for both.

Pipeline Overview

Interactive Brokers TWS (IB Gateway)
        ↓
  ingestion/ingest_daily.py
        ↓
  market_data_daily (PostgreSQL)
        ↓
  model/ldtm/dataset.py  ← OHLCVDataset
        ↓
  model/ldtm/trainer.py  ← LDTM LSTM + AdamW + AMP
        ↓
  /model_weights/ldtm/{ticker}_ldtm.pt
        ↓
  model/ldtm/predict.py  ← build_inference_window → forward pass
        ↓
  ldtm_run_log (PostgreSQL)
        ↓
  snapshot_writer.py → ldtm_daily_snapshots
        ↓
  dashboard/app.py (Streamlit) ← localhost:8501
        ↓
  llm/llm_query.py → Mistral-7B FP8 (Triton, localhost:8000/v1)

2. AI Model Design

2.1 Architecture

Model class: LDTMModel (PyTorch nn.Module)

Input tensor: (batch_size, window_size, input_size)
                = (32, 30, 11)

→ LSTM
    input_size  = 11  (OHLCV + 6 engineered features)
    hidden_size = 128
    num_layers  = 2
    dropout     = 0.2  (applied between layers, not after last)
    batch_first = True

→ Last hidden state: h_n[-1]  shape (batch_size, 128)

→ Three independent prediction heads (FC towers):
    head_next_day     : Linear(128→64) → ReLU → Linear(64→1)
    head_next_monday  : Linear(128→64) → ReLU → Linear(64→1)
    head_one_month    : Linear(128→64) → ReLU → Linear(64→1)

Output: dict {
    "next_day":    tensor (batch_size, 1)  — normalized price
    "next_monday": tensor (batch_size, 1)  — normalized price
    "one_month":   tensor (batch_size, 1)  — normalized price
}
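A minimal PyTorch sketch of the layout above may help make the tensor shapes concrete. The class name follows the listing, but the constructor arguments and the attribute names (`lstm`, `heads`, `HORIZONS`) are assumptions made for illustration, not the production code.

```python
import torch
from torch import nn


class LDTMModel(nn.Module):
    """Sketch of the layout above: 2-layer LSTM feeding three FC heads."""

    HORIZONS = ("next_day", "next_monday", "one_month")

    def __init__(self, input_size=11, hidden_size=128, num_layers=2, dropout=0.2):
        super().__init__()
        # nn.LSTM applies dropout between stacked layers only, not after the last one.
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers=num_layers,
                            dropout=dropout, batch_first=True)
        # One independent tower per horizon: Linear(128->64) -> ReLU -> Linear(64->1).
        self.heads = nn.ModuleDict({
            name: nn.Sequential(nn.Linear(hidden_size, 64), nn.ReLU(), nn.Linear(64, 1))
            for name in self.HORIZONS
        })

    def forward(self, x):
        # x: (batch_size, window_size, input_size)
        _, (h_n, _) = self.lstm(x)
        last_hidden = h_n[-1]  # final hidden state of the top layer: (batch_size, hidden_size)
        return {name: head(last_hidden) for name, head in self.heads.items()}


model = LDTMModel().eval()
with torch.no_grad():
    out = model(torch.randn(32, 30, 11))  # random input, shape check only
shapes = {name: tuple(t.shape) for name, t in out.items()}
```

Each entry of `shapes` comes out as `(32, 1)`, matching the output dict in the listing.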

2.2 Design Rationale

Why LSTM over Transformer?

  • LSTMs handle variable-length historical sequences natively with O(n) memory vs O(n²) for attention
  • Daily OHLCV sequences are auto-regressive; the LSTM hidden state provides an efficient rolling summary of historical regime
  • Transformers need an explicit positional encoding to represent bar order; LSTMs encode order implicitly through recurrence
  • Inference on a 30-day window takes ~8 ms per ticker on CPU (~2 ms on GPU, see section 7.3); transformers would typically bring a more complex deployment (for example Flash Attention kernels)

Why per-ticker models? Each ticker has a unique volatility profile, price range (KLAC ~$1800 vs CPRT ~$34), earnings cadence, and sector exposure. A shared model would learn the median behavior of NDX-100, suppressing the outlier momentum patterns that generate the most useful signals.

Why 2 LSTM layers?

  • Layer 1 captures short-term momentum (candle patterns, gap fills, intraday reversals expressed in daily closes)
  • Layer 2 captures medium-term regime (trending vs mean-reverting behavior within the 30-day window)
  • 3+ layers provide diminishing returns on daily data at this timescale; they also require longer training and are prone to gradient vanishing despite LSTM's gating

Why window_size = 30?

  • 30 trading days ≈ 6 weeks ≈ half of a quarterly earnings cycle
  • Captures: one full monthly options expiry cycle, short-term momentum decay, month-end institutional rebalancing patterns
  • Shorter windows (< 20): insufficient context for MA20 and RSI convergence
  • Longer windows (> 60): the earliest prices in the window may belong to a different macro regime; per-window normalization would span across a regime change

Why 3 independent prediction heads? Each horizon has a different optimal feature representation:

  • Next-day close is dominated by recent momentum (ret1, RSI14)
  • Next-Monday is influenced by weekly patterns (ret5, MA5)
  • One-month is driven by medium-term trend (MA10, MA20)

Sharing a single output head forces a compromise between these regimes. Independent FC towers allow each head to learn its own weighting of the LSTM hidden state.

2.3 Parameter Count

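| Component | Parameters |
| --- | --- |
| LSTM layer 1 | 4 × (11 × 128 + 128 × 128 + 128) = 71,680 |
| LSTM layer 2 | 4 × (128 × 128 + 128 × 128 + 128) = 131,584 |
| head_next_day | 128×64 + 64 + 64×1 + 1 = 8,321 |
| head_next_monday | 8,321 |
| head_one_month | 8,321 |
| Total | ~228,000 parameters |

The LSTM rows count one bias vector per gate, as in the gate equations of section 3.1. PyTorch's nn.LSTM stores two per layer (b_ih and b_hh), which adds 512 parameters per layer and brings the exact total to 229,251.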

This is intentionally small. With under 230K parameters and 30-day windows, the model is sized for the ~4,000–5,000 windows that 20 years of daily data provide (about 70% of which form the training split). Larger models would be more prone to overfitting at this sample size.
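The parameter arithmetic can be checked with a few lines of plain Python. This is a sketch under the layer sizes stated in section 2.1; the function names and the `bias_vectors` switch are illustrative assumptions. The switch covers both conventions: one bias per gate (the textbook form) and two (how `torch.nn.LSTM` stores them).

```python
def lstm_layer_params(input_size, hidden_size, bias_vectors=1):
    """4 gates, each with input weights, recurrent weights and bias vector(s)."""
    per_gate = (input_size * hidden_size
                + hidden_size * hidden_size
                + bias_vectors * hidden_size)
    return 4 * per_gate


def head_params(hidden_size=128, mid=64):
    """Linear(hidden->mid) plus Linear(mid->1), weights and biases."""
    return (hidden_size * mid + mid) + (mid * 1 + 1)


def total_params(bias_vectors=1):
    layer1 = lstm_layer_params(11, 128, bias_vectors)
    layer2 = lstm_layer_params(128, 128, bias_vectors)
    return layer1 + layer2 + 3 * head_params()


single_bias = total_params(bias_vectors=1)  # one bias per gate, as in the gate equations
dual_bias = total_params(bias_vectors=2)    # torch.nn.LSTM keeps b_ih and b_hh separately
```

Each head works out to 8,321 parameters once both Linear layers' weights and biases are included. The totals are 228,227 with single biases and 229,251 with PyTorch's dual biases.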


3. Mathematical Foundations

3.1 LSTM Gate Equations

At each timestep t, given input x_t ∈ ℝ^11 and previous hidden state h_{t-1} ∈ ℝ^128:

Forget gate — what to discard from cell state:

f_t = σ(W_f · [h_{t-1}, x_t] + b_f)

Input gate — what new information to store:

i_t = σ(W_i · [h_{t-1}, x_t] + b_i)
g_t = tanh(W_g · [h_{t-1}, x_t] + b_g)

Cell state update:

C_t = f_t ⊙ C_{t-1} + i_t ⊙ g_t

Output gate:

o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
h_t = o_t ⊙ tanh(C_t)

Where:

  • σ = sigmoid activation
  • tanh = hyperbolic tangent
  • ⊙ = element-wise (Hadamard) product
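  • W_{f,i,g,o} ∈ ℝ^{128×(128+11)}: weight matrices for layer 1 (layer 2 takes layer 1's 128-dimensional hidden state as input, so its matrices are 128×(128+128))
  • b_{f,i,g,o} ∈ ℝ^{128}: bias vectors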

The LSTM processes all 30 timesteps sequentially. Only the final hidden state h_{30} (from the last layer) is passed to the three prediction heads.

3.2 Per-Window Min-Max Normalization

Critical design decision. Rather than fitting a global scaler on training data (which fails catastrophically when inference-time prices are out of the training range — AAPL was $81 in 2019 training data, $266 today), LDTM v2 normalizes each 30-day window independently.

For each window W of shape (30, 11):

feat_min_j = min_{t=1..30} W[t, j]    for j = 0..10
feat_max_j = max_{t=1..30} W[t, j]    for j = 0..10

W_norm[t, j] = (W[t, j] - feat_min_j) / (feat_max_j - feat_min_j + ε)

Where ε = 1e-8 (prevents division by zero for constant-price windows).

Inverse transform (dollars from normalized prediction):

P_dollars = P_norm × (c_max - c_min) + c_min

Where c_min = feat_min[3] and c_max = feat_max[3] (column index 3 = close).

This means the model learns relative movements within each window, not absolute price levels. A normalized value of 0.5 means "mid-range of this 30-day window", regardless of whether that window is from 2005 (AAPL ~$5) or 2026 (AAPL ~$270).
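The normalization and its inverse can be sketched in NumPy as follows. The helper names `normalize_window` and `to_dollars` and the synthetic window are assumptions for illustration; the constants mirror ε = 1e-8 and close column index 3 from the text.

```python
import numpy as np

EPS = 1e-8
CLOSE_COL_IDX = 3


def normalize_window(window):
    """Min-max scale each feature column of one (T, F) window independently."""
    feat_min = window.min(axis=0)
    feat_max = window.max(axis=0)
    scaled = (window - feat_min) / (feat_max - feat_min + EPS)
    return scaled, feat_min[CLOSE_COL_IDX], feat_max[CLOSE_COL_IDX]


def to_dollars(p_norm, c_min, c_max):
    """Invert the close-column scaling for a normalized prediction."""
    return p_norm * (c_max - c_min) + c_min


rng = np.random.default_rng(0)
window = 100.0 + rng.random((30, 11)) * 20.0   # synthetic data, not market prices
scaled, c_min, c_max = normalize_window(window)
mid_range = to_dollars(0.5, c_min, c_max)      # "mid-range of this window" in dollars
breakout = to_dollars(1.2, c_min, c_max)       # a value above 1 maps above the window high
```

Because the scaler is rebuilt from each window, the same normalized value maps to different dollar amounts in different windows, which is why `c_min` and `c_max` have to travel with every prediction.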

3.3 Loss Function

Multi-head mean squared error:

L = MSE(ŷ_nd, y_nd) + MSE(ŷ_nm, y_nm) + MSE(ŷ_om, y_om)

Where all predictions and targets are in the per-window normalized space (nominally [0,1], although labels can fall outside that range; see section 3.4):

MSE(ŷ, y) = (1/n) Σ_{i=1}^{n} (ŷ_i - y_i)²

Why MSE and not MAE? MSE penalizes large errors quadratically, which is preferable for financial prediction — a 10% prediction error is much worse than ten 1% errors from a risk management perspective.
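The "one 10% error versus ten 1% errors" argument can be made concrete with a small comparison. The two error sets below are hypothetical and chosen so that MAE cannot tell them apart while MSE can.

```python
def mse(errors):
    return sum(e * e for e in errors) / len(errors)


def mae(errors):
    return sum(abs(e) for e in errors) / len(errors)


# Two hypothetical sets of ten prediction errors, in percentage points.
spread_out = [1.0] * 10            # ten 1% misses
concentrated = [10.0] + [0.0] * 9  # one 10% miss, nine perfect predictions

mae_spread, mae_conc = mae(spread_out), mae(concentrated)   # both 1.0
mse_spread, mse_conc = mse(spread_out), mse(concentrated)   # 1.0 vs 10.0
```

Under MAE the two sets look identical, while MSE scores the concentrated miss ten times worse, which is the behaviour the loss choice is after.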

3.4 Label Normalization

Training labels are also normalized per-window:

y_nd  = (close_{t+1}  - c_min) / (c_max - c_min + ε)
y_nm  = (close_{mon}  - c_min) / (c_max - c_min + ε)
y_om  = (close_{+21d} - c_min) / (c_max - c_min + ε)

Labels outside [0,1] are possible (and expected) — a prediction beyond the current 30-day range is a directional signal that the model expects a breakout.

3.5 Direction Accuracy Metric

For evaluation, the raw dollar prediction is compared against the actual next-day close:

direction_pred   = "UP"   if P̂_next_day > P_today  else "DOWN"
direction_actual = "UP"   if P_actual    > P_today  else "DOWN"
direction_correct = (direction_pred == direction_actual)

direction_accuracy = (Σ direction_correct) / n_evaluated × 100%

3.6 Percentage Error

pct_error = (P̂_pred - P_actual) / P_actual × 100

Reported as signed (positive = overestimate) and unsigned |pct_error| for accuracy assessment.
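Sections 3.5 and 3.6 together amount to a small evaluation routine. The sketch below assumes rows arrive as `(today_close, predicted_close, actual_close)` tuples; that layout, the function names, and the sample values are illustrative assumptions rather than the project's fillback code.

```python
def direction(price, today_close):
    """'UP' only on a strict increase; a flat close counts as 'DOWN'."""
    return "UP" if price > today_close else "DOWN"


def evaluate(rows):
    """rows: iterable of (today_close, predicted_close, actual_close)."""
    hits, pct_errors = 0, []
    for today, pred, actual in rows:
        hits += direction(pred, today) == direction(actual, today)
        pct_errors.append((pred - actual) / actual * 100.0)
    n = len(pct_errors)
    return {
        "direction_accuracy": 100.0 * hits / n,
        "mean_signed_pct_error": sum(pct_errors) / n,
        "mean_abs_pct_error": sum(abs(e) for e in pct_errors) / n,
    }


# Hypothetical values for illustration only.
rows = [
    (100.0, 102.0, 101.0),  # predicted UP, actual UP     -> correct
    (100.0, 101.0, 99.0),   # predicted UP, actual DOWN   -> wrong
    (50.0, 49.0, 49.5),     # predicted DOWN, actual DOWN -> correct
    (200.0, 198.0, 202.0),  # predicted DOWN, actual UP   -> wrong
]
metrics = evaluate(rows)
```

The signed mean can hide offsetting over- and under-estimates, which is why the unsigned mean is reported alongside it.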


4. Feature Engineering

LDTM v2 expands the raw OHLCV 5-feature set to 11 features by adding momentum and trend indicators.

4.1 Feature Table

| Index | Name | Formula | Purpose |
| --- | --- | --- | --- |
| 0 | open | raw | Price gap context |
| 1 | high | raw | Intraday range ceiling |
| 2 | low | raw | Intraday range floor |
| 3 | close | raw | Prediction anchor (CLOSE_COL_IDX=3) |
| 4 | volume | raw | Participation/liquidity |
| 5 | ret1 | ln(C_t / C_{t-1}) | 1-day momentum |
| 6 | ret5 | ln(C_t / C_{t-5}) | Weekly momentum |
| 7 | rsi14 | Wilder RSI(14) | Overbought/oversold |
| 8 | ma5 | SMA(C, 5) | Short-term trend |
| 9 | ma10 | SMA(C, 10) | Medium-term trend |
| 10 | ma20 | SMA(C, 20) | Swing trend / regime |

4.2 Log Returns

ret1_t = ln(C_t / C_{t-1})
ret5_t = ln(C_t / C_{t-5})

Log returns are used instead of simple returns (C_t/C_{t-1} - 1) for two reasons:

  1. Approximate normality: daily log returns are roughly symmetric and approximately normal (with fatter tails than a true Gaussian), making min-max normalization more stable
  2. Time-additivity: multi-period log returns sum: ln(C_t/C_{t-5}) = Σ_{k=1}^{5} ln(C_{t-k+1}/C_{t-k})
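The time-additivity property is easy to verify numerically. The closes below are made-up values; the check is that five 1-day log returns sum to the 5-day log return, while simple returns do not.

```python
import math

closes = [100.0, 101.5, 100.8, 102.3, 103.0, 104.2]  # hypothetical closes, oldest first

# 1-day log returns ln(C_t / C_{t-1}) for the last five days.
ret1 = [math.log(closes[i] / closes[i - 1]) for i in range(1, len(closes))]

# 5-day log return ln(C_t / C_{t-5}) computed directly.
ret5 = math.log(closes[-1] / closes[-6])

# Simple returns do not add this way: their sum differs from the 5-day simple return.
simple_sum = sum(closes[i] / closes[i - 1] - 1 for i in range(1, len(closes)))
simple_5d = closes[-1] / closes[-6] - 1
```

Here `sum(ret1)` and `ret5` agree to floating-point precision, whereas `simple_sum` and `simple_5d` differ by the compounding terms.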

4.3 Wilder RSI (14 periods)

Δ_t = C_t - C_{t-1}
gain_t = max(Δ_t, 0)
loss_t = max(-Δ_t, 0)

avg_gain_t = EWM(gain, α=1/14, min_periods=14)_t
avg_loss_t = EWM(loss, α=1/14, min_periods=14)_t

RS_t = avg_gain_t / (avg_loss_t + ε)
RSI_t = 100 - (100 / (1 + RS_t))

Wilder's smoothing uses com = period - 1 = 13 in pandas EWM notation:

avg_gain = gain.ewm(com=13, min_periods=14).mean()

RSI ∈ [0, 100]:

  • RSI > 70: overbought (potential reversal signal)
  • RSI < 30: oversold (potential bounce signal)
  • RSI 30–70: neutral momentum
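Put together, the RSI steps might look like the pandas sketch below. It uses the `ewm(com=13, min_periods=14)` form quoted above. The function name and the synthetic price series are assumptions. Note that pandas defaults to `adjust=True`, which weights the earliest observations slightly differently from Wilder's recursive smoothing (`adjust=False` gives the pure recursion); which setting the production code uses is not stated here.

```python
import pandas as pd

EPS = 1e-8


def wilder_rsi(close: pd.Series, period: int = 14) -> pd.Series:
    """RSI using the pandas EWM form quoted above (com = period - 1)."""
    delta = close.diff()
    gain = delta.clip(lower=0)      # max(delta, 0); the first row stays NaN
    loss = (-delta).clip(lower=0)   # max(-delta, 0)
    avg_gain = gain.ewm(com=period - 1, min_periods=period).mean()
    avg_loss = loss.ewm(com=period - 1, min_periods=period).mean()
    rs = avg_gain / (avg_loss + EPS)
    return 100.0 - 100.0 / (1.0 + rs)


# Synthetic, strictly rising closes: every move is a gain, so RSI should sit near 100.
rising = pd.Series([100.0 + i for i in range(40)])
rsi = wilder_rsi(rising)
```

The first 14 values are NaN (one row lost to the diff, then 14 observations required by `min_periods`), which is part of the warmup handled in section 4.5.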

4.4 Simple Moving Averages

MA_k(t) = (1/k) Σ_{i=0}^{k-1} C_{t-i}

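For k ∈ {5, 10, 20}. Having both MA5 and MA20 in the window lets the model pick up short-term crossovers, a faster analogue of the Golden Cross / Death Cross signal (which is conventionally defined on the 50- and 200-day averages). Because each column is min-max scaled independently, the crossover is only available implicitly, through the relative shapes of the two normalized series.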

4.5 Warmup Requirement

MA20 is the slowest feature to become defined: it needs 20 observations (the current bar plus 19 prior ones). For each ticker, the first valid row is:

first_valid = argmax_{t} [~any(isnan(features[t, :]))]

The first window's last row is at index first_valid + window_size - 1, so the first ~20 warmup rows are discarded. For tickers with 20+ years of data (~5,000 bars), this loses about 0.4% of samples, which is negligible.
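The `argmax` expression for the first valid row translates directly to NumPy. The feature matrix below is synthetic, with NaNs placed where the rolling indicators would be undefined; the row counts are assumptions for illustration.

```python
import numpy as np

n_rows, n_features, window_size = 300, 11, 30

# Synthetic feature matrix: the slowest indicator (a 20-bar average) is
# undefined for the first 19 rows, as a rolling mean over 20 bars would be.
features = np.ones((n_rows, n_features))
features[:19, 10] = np.nan   # column 10 = ma20
features[:14, 7] = np.nan    # column 7 = rsi14
features[:5, 6] = np.nan     # column 6 = ret5

row_is_valid = ~np.isnan(features).any(axis=1)
first_valid = int(np.argmax(row_is_valid))        # index of the first all-finite row
first_window_end = first_valid + window_size - 1  # last row of the first full window
lost_fraction = first_valid / n_rows
```

`np.argmax` on a boolean array returns the first `True`. It also returns 0 when no row is valid, so a production version would want an explicit check for that case.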


5. Dataset Design

5.1 OHLCVDataset

class OHLCVDataset(torch.utils.data.Dataset):
    # split: 'train' (70%) | 'val' (15%) | 'test' (15%)
    # window_size: 30
    # Returns: (x_tensor[30, 11], y_labels_dict)

Split boundaries (chronological, not random):

|←─────── train (70%) ───────→|←── val (15%) ──→|←── test (15%) ──→|
t=0                           t=0.70N           t=0.85N           t=N

Chronological split is mandatory — random splits would cause data leakage (future prices appearing in training windows).
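A chronological split reduces to slicing index ranges in time order. The sketch below uses integer percentages to avoid floating-point rounding at the boundaries; the function name and the 5,000-sample example are assumptions.

```python
def chronological_split(n_samples, train_pct=70, val_pct=15):
    """Return (train, val, test) index ranges in time order, with no shuffling."""
    train_end = n_samples * train_pct // 100
    val_end = n_samples * (train_pct + val_pct) // 100
    return range(0, train_end), range(train_end, val_end), range(val_end, n_samples)


train_idx, val_idx, test_idx = chronological_split(5000)
```

Every training index precedes every validation index, which in turn precedes every test index, so no window in an earlier split can contain prices from a later one. Labels that reach up to 21 bars forward near a boundary are a separate consideration that this sketch does not address.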

Per-window normalization in __getitem__:

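eps = 1e-8
window   = feat_vals[end - window_size:end]    # shape (30, 11); last row is bar end-1
feat_min = window.min(axis=0)                  # shape (11,)
feat_max = window.max(axis=0)                  # shape (11,)
c_range  = feat_max[3] - feat_min[3]           # close-price range of this window

x_scaled = (window - feat_min) / (feat_max - feat_min + eps)

# mon_idx / month_idx are placeholder names for the row of the following Monday's
# bar and the row 21 trading days after the window's last bar.
labels = {
    "next_day":    (close[end]       - feat_min[3]) / (c_range + eps),
    "next_monday": (close[mon_idx]   - feat_min[3]) / (c_range + eps),
    "one_month":   (close[month_idx] - feat_min[3]) / (c_range + eps),
    "close_min":   feat_min[3],                # kept for the inverse transform
    "close_max":   feat_max[3],
}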

5.2 Inference Window (build_inference_window)

For inference, there is no ground truth label. The function:

  1. Loads the last window_size + _WARMUP + 10 = 60 rows from market_data_daily
  2. Computes all 11 features
  3. Drops NaN warmup rows
  4. Takes the last 30 rows
  5. Normalizes and returns (tensor[1, 30, 11], c_min, c_max)

The extra 10 rows above _WARMUP = 20 provide a safety buffer for tickers with recent data gaps (weekends, holidays loaded as separate rows).


6. Training Architecture

6.1 Optimizer: AdamW

θ_{t+1} = θ_t - α_t × m̂_t / (√v̂_t + ε) - α_t × λ × θ_t

where:
  m_t = β_1 × m_{t-1} + (1 - β_1) × g_t        (1st moment)
  v_t = β_2 × v_{t-1} + (1 - β_2) × g_t²       (2nd moment)
  m̂_t = m_t / (1 - β_1^t)                       (bias-corrected)
  v̂_t = v_t / (1 - β_2^t)                       (bias-corrected)

Hyperparameters:

  • lr = 1e-3 (initial)
  • β_1 = 0.9, β_2 = 0.999 (Adam defaults)
  • ε = 1e-8
  • weight_decay λ = 1e-4 (decoupled decay applied directly to the weights, not an L2 penalty added to the gradient)

AdamW is preferred over Adam because decoupled weight decay (Loshchilov & Hutter 2019) keeps the decay term out of the adaptive gradient scaling, which tends to improve generalization.

6.2 Learning Rate Schedule: CosineAnnealingLR

α_t = α_min + (1/2)(α_0 - α_min)(1 + cos(π × t/T_max))
  • α_0 = 1e-3 (initial)
  • α_min = 1e-6 (floor)
  • T_max = epochs (100)

Cosine annealing allows aggressive early exploration (high LR in early epochs) before converging to a tight minimum (near-zero LR at epoch 100). This is well-suited to financial data where the loss surface is relatively flat and noisy.
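The schedule formula can be evaluated directly to see how the learning rate moves over the run. This is a closed-form sketch using the hyperparameters listed above; the function name is an assumption, and the code does not call `torch.optim.lr_scheduler.CosineAnnealingLR`.

```python
import math


def cosine_lr(epoch, lr_0=1e-3, lr_min=1e-6, t_max=100):
    """Closed-form cosine schedule from the formula above."""
    return lr_min + 0.5 * (lr_0 - lr_min) * (1.0 + math.cos(math.pi * epoch / t_max))


schedule = {epoch: cosine_lr(epoch) for epoch in (0, 25, 50, 75, 100)}
```

The rate starts at 1e-3, passes the midpoint between 1e-3 and 1e-6 at epoch 50, and reaches the 1e-6 floor at epoch 100. A run that early-stops around epoch 35 is still operating above 7e-4, so it never sees the low-LR tail of the schedule.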

6.3 Automatic Mixed Precision (AMP)

Training uses torch.amp.GradScaler and torch.amp.autocast:

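optimizer.zero_grad(set_to_none=True)   # clear gradients left over from the previous step

with torch.amp.autocast(device.type, enabled=use_amp):
    preds = model(x_batch)
    loss  = criterion(preds["next_day"], y["next_day"]) + ...

scaler.scale(loss).backward()   # backward pass on the scaled loss
scaler.step(optimizer)          # unscales gradients; skips the step if any are inf/NaN
scaler.update()                 # adjusts the scale factor for the next iteration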

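Inside the autocast region, eligible ops (such as matrix multiplies) run in reduced precision while the parameters themselves stay in FP32. On CUDA the autocast default is FP16; BF16 has to be requested explicitly with dtype=torch.bfloat16. On the GB10:

  • FP16 tensor operations run at 2× throughput vs FP32
  • Gradient scaling prevents underflow in FP16: the loss is multiplied by a scale factor before backward so that small gradients stay representable, and the gradients are unscaled again before the optimizer update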

6.4 Early Stopping

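class EarlyStopping:
    """Signal a stop after `patience` epochs without validation improvement."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best_loss = float("inf")
        self.best_epoch = None
        self.counter = 0

    def step(self, val_loss, epoch):
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.counter = 0
            return False                      # improved: continue
        self.counter += 1
        return self.counter >= self.patience  # True: stop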

Training stops when validation loss does not improve for 10 consecutive epochs. The best model weights (lowest val_loss) are restored before saving.

Observed behavior across 103 tickers:

  • Fastest convergence: FER (2 epochs), GEHC (2 epochs), EA (3 epochs) — limited data or trivially learnable patterns
  • Typical convergence: 5–35 epochs
  • Slowest: MSTR (59 epochs), ADP (65 epochs), DDOG (68 epochs) — high-volatility or complex regime

6.5 Training Loop Summary

for epoch in 1..100:
    model.train()
    for x_batch, y_batch in train_loader:
        forward pass (AMP)
        loss = MSE(nd) + MSE(nm) + MSE(om)
        backward pass (GradScaler)
        AdamW step
        GradScaler update
    
    val_loss = evaluate(val_loader)
    scheduler.step()
    
    if val_loss < best_loss:
        save best_state (CPU clone)
    
    if early_stopping.step(val_loss, epoch):
        break

model.load_state_dict(best_state)
save checkpoint: {epoch, model_state, config, val_loss}
log_run(DB)

7. Inference Pipeline

7.1 Checkpoint Format

PyTorch .pt file stored at /model_weights/ldtm/{TICKER}_ldtm.pt:

{
    "epoch":       int,           # best epoch number
    "model_state": OrderedDict,   # PyTorch state_dict
    "config":      dict,          # LDTMConfig as dict
    "val_loss":    float,         # best validation loss
}

7.2 Inference Steps

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# 1. Load checkpoint (on CPU first, then move the model to the target device)
checkpoint = torch.load(ckpt_path, map_location="cpu")
config = LDTMConfig(**checkpoint["config"])
model = LDTMModel(config)
model.load_state_dict(checkpoint["model_state"])
model.to(device)

# 2. Build inference window (last 30 trading days)
x, c_min, c_max = build_inference_window(ticker, db_url, config.window_size)

# 3. Forward pass (no gradient)
model.eval()
with torch.no_grad():
    preds = model(x.to(device))

# 4. Inverse transform to dollars
def to_dollars(norm_val):
    return round(norm_val * (c_max - c_min) + c_min, 2)

result = {
    "ticker":            ticker,
    "next_day_close":    to_dollars(preds["next_day"].item()),
    "next_monday_close": to_dollars(preds["next_monday"].item()),
    "one_month_close":   to_dollars(preds["one_month"].item()),
}

7.3 Inference Speed

| Step | Time | Notes |
| --- | --- | --- |
| Load checkpoint | ~50 ms | CPU load; model is tiny (~900 KB) |
| Build inference window | ~80 ms | DB query + feature computation |
| Forward pass (GPU) | ~2 ms | 30 timesteps, 128 hidden, ~228K params |
| Forward pass (CPU) | ~8 ms | Fallback if no GPU available |
| DB logging | ~10 ms | psycopg2 insert |
| Total per ticker | ~150 ms | |

8. Orchestration & Parallelism

8.1 orchestrate.py — GPU-Aware Job Dispatcher

orchestrate.py replaces naive bash background-job parallelism when --parallel > 1.

Algorithm:

  1. Query nvidia-smi for all GPUs and their VRAM
  2. Compute concurrent slots per GPU: slots = max(1, min(16, vram_mib // 600))
  3. Build a thread-safe queue.Queue with slots copies of each GPU index
  4. Spawn ThreadPoolExecutor(max_workers=total_slots)
  5. Each worker: pop GPU token → docker run --gpus "device=N" -e CUDA_VISIBLE_DEVICES=0 → push token back

Wave ordering (for --mode both):

  • Wave 1: All 103 training jobs (checkpoints must exist before inference)
  • Wave 2: All 103 inference jobs (parallelized separately)

GB10 configuration:

  • Single GPU (NVIDIA GB10), VRAM = N/A (unified memory, ~128GB)
  • Fallback: FALLBACK_SLOTS_PER_GPU = 4
  • Observed VRAM per LDTM container: ~310 MiB (CUDA context + model weights + batch)

8.2 Slot Math

GB10 total VRAM:         ~128 GB (unified LPDDR5X)
Triton/Mistral-7B FP8:  ~39.4 GB
Desktop (Xorg, GNOME):  ~600 MB
Available:              ~88 GB

Per LDTM container:      ~310 MiB
Theoretical max slots:   88,000 / 310 ≈ 283

Compute cap (practical): 16 slots
Reason: GPU-Util saturates at ~90%+ with 4 jobs competing with Triton.
        16 slots on training-only runs (no Triton competition) is safe.

9. Database Schema

9.1 ldtm_run_log — Append-only audit log

CREATE TABLE ldtm_run_log (
    id              BIGSERIAL   PRIMARY KEY,
    run_at          TIMESTAMPTZ DEFAULT NOW(),
    ticker          TEXT        NOT NULL,
    mode            TEXT        NOT NULL,      -- 'train' | 'infer'
    status          TEXT        NOT NULL,      -- 'success' | 'failed'
    duration_sec    FLOAT,
    epochs_run      INTEGER,                   -- training only
    best_val_loss   FLOAT,                     -- training only
    next_day_close      FLOAT,                 -- inference only
    next_monday_close   FLOAT,                 -- inference only
    one_month_close     FLOAT,                 -- inference only
    error_msg       TEXT
);

9.2 ldtm_daily_snapshots — Queryable daily predictions with actuals

CREATE TABLE ldtm_daily_snapshots (
    id                          BIGSERIAL   PRIMARY KEY,
    run_date                    DATE        NOT NULL,
    ticker                      TEXT        NOT NULL,
    generated_at                TIMESTAMPTZ DEFAULT NOW(),

    -- Predictions (written at generation time)
    next_day_close_pred         FLOAT NOT NULL,
    next_monday_close_pred      FLOAT NOT NULL,
    one_month_close_pred        FLOAT NOT NULL,
    run_date_close              FLOAT,         -- baseline for direction calculation

    -- Actuals (filled by snapshot_fillback.py)
    next_day_actual             FLOAT,
    next_day_actual_date        DATE,
    next_day_pct_error          FLOAT,         -- signed: (pred - actual) / actual * 100
    next_day_direction_pred     TEXT,          -- 'UP' | 'DOWN'
    next_day_direction_actual   TEXT,
    next_day_direction_correct  BOOLEAN,

    next_monday_actual          FLOAT,
    next_monday_actual_date     DATE,
    next_monday_pct_error       FLOAT,

    one_month_actual            FLOAT,
    one_month_actual_date       DATE,
    one_month_pct_error         FLOAT,

    source_run_log_id           BIGINT REFERENCES ldtm_run_log(id),
    UNIQUE (run_date, ticker)
);

9.3 ldtm_accuracy_30d — View

CREATE VIEW ldtm_accuracy_30d AS
SELECT
    ticker,
    COUNT(*) FILTER (WHERE next_day_direction_correct IS NOT NULL) AS evaluated_days,
    ROUND(AVG(ABS(next_day_pct_error))::numeric, 2)               AS avg_abs_pct_error,
    ROUND(AVG(ABS(next_monday_pct_error))::numeric, 2)            AS avg_abs_pct_error_weekly,
    ROUND(
        100.0 * COUNT(*) FILTER (WHERE next_day_direction_correct = true)
        / NULLIF(COUNT(*) FILTER (WHERE next_day_direction_correct IS NOT NULL), 0), 1
    )                                                             AS direction_accuracy_pct,
    MAX(run_date)                                                 AS latest_run_date
FROM ldtm_daily_snapshots
WHERE run_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY ticker
ORDER BY direction_accuracy_pct DESC NULLS LAST;

10. Prediction Outputs

10.1 JSON Output (per ticker, written to predictions/)

{
    "ticker":            "AAPL",
    "next_day_close":    268.55,
    "next_monday_close": 268.74,
    "one_month_close":   275.05
}

10.2 DB Output (ldtm_run_log)

| Column | Example Value |
| --- | --- |
| ticker | "AAPL" |
| mode | "infer" |
| status | "success" |
| duration_sec | 0.152 |
| next_day_close | 268.55 |
| next_monday_close | 268.74 |
| one_month_close | 275.05 |

10.3 Training Output (ldtm_run_log mode='train')

| Column | Example Value |
| --- | --- |
| ticker | "AAPL" |
| mode | "train" |
| epochs_run | 10 |
| best_val_loss | 0.577952 |
| duration_sec | 31.2 |

10.4 Derived Signal (computed at analysis time)

implied_1m_return = (one_month_close_pred / next_day_close_pred - 1) × 100

Example (AAPL):  (275.05 / 268.55 - 1) × 100 = +2.42%
Example (AMD):   (315.00 / 282.22 - 1) × 100 = +11.62%
Example (SQQQ):  (53.15  / 56.66  - 1) × 100 = -6.19%
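The derived signal is a one-line function. The sketch below applies it to the three example price pairs quoted above; the function name is an assumption.

```python
def implied_1m_return(one_month_close_pred, next_day_close_pred):
    """Percent change implied between the next-day and one-month predictions."""
    return (one_month_close_pred / next_day_close_pred - 1.0) * 100.0


aapl = implied_1m_return(275.05, 268.55)
amd = implied_1m_return(315.00, 282.22)
sqqq = implied_1m_return(53.15, 56.66)
```

Note that the baseline is the next-day prediction rather than the last known close, so the signal measures the slope between two model outputs and is unaffected by a constant bias shared by both heads.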

10.5 2026-04-22 Run Summary (103 tickers)

| Category | Count | Representative |
| --- | --- | --- |
| Bullish >5% (1-month) | 15 | AMD +11.6%, INTU +8.4%, ADBE +8.4% |
| Bullish +1–5% | 65 | MSFT +6.7%, NVDA +5.1%, AAPL +2.4% |
| Neutral ±1% | 13 | KLAC +0.2%, LIN -0.02%, BKNG -0.05% |
| Bearish -1 to -5% | 9 | ODFL -21%, DDOG -3.6%, KHC -1.8% |
| ETF signal | 2 | TQQQ +5.2%, SQQQ -6.2% |

11. Dashboard & LLM Interface

11.1 Streamlit Dashboard (dashboard/app.py)

Dual-mode design:

  • DATA_SOURCE=db: full features, live PostgreSQL, LLM tab enabled
  • DATA_SOURCE=blob: read-only Azure Blob JSON, LLM tab disabled

5 tabs:

  1. Ticker View — prediction history chart + accuracy metrics
  2. Today's Predictions — all 103 tickers ranked by implied 1-month return
  3. Accuracy Leaderboard — 30-day direction accuracy per ticker
  4. TQQQ Signal — Random Forest signal + equity curve
  5. LLM Query — Mistral-7B Q&A interface

11.2 LLM Context Assembly (llm/llm_query.py)

Three SQL queries assembled into a structured text prompt:

=== LDTM Context — 2026-04-22 ===

--- NVDA ---
Prediction date  : 2026-04-22
Last known close : $193.47
Next-day pred    : $198.00
Next-Monday pred : $199.67
1-month pred     : $208.06
Implied 1m return: +5.1%
30d accuracy     : direction=N/A  avg|%err|=N/A  n=0
Last retrain     : 2026-04-21  val_loss=0.7451  epochs=44
Recent headlines :
  [2026-04-20] NVIDIA unveils next-gen Blackwell Ultra GPUs
  ...

LLM parameters:

  • Model: engine-fp8 (Mistral-7B FP8 via Triton)
  • Temperature: 0.3 (low → factual responses)
  • Max tokens: 1024
  • Context window: 32K tokens (safe for 8-ticker queries ~3K tokens)

12. Container Architecture

| Container | Base Image | GPU | Purpose |
| --- | --- | --- | --- |
| model-ldtm | nvcr.io/nvidia/pytorch:25.01-py3 | Yes | Train + infer |
| trading-dashboard | python:3.11-slim | No | Streamlit UI |
| ldtm-llm-query | python:3.11-slim | No | LLM CLI |
| ldtm-snapshot-writer | model-ldtm image | No | DB snapshot upsert |
| ldtm-snapshot-fillback | model-ldtm image | No | Fill actuals |
| blob-export | trading-dashboard | No | Azure Blob export |
| trading-postgres | postgres:15 | No | Database |

All containers use --network host for zero-overhead DB access on the DGX host.


13. Cron Schedule

| Time | Command | Purpose |
| --- | --- | --- |
| 6:00 PM Mon-Fri | run_ingestion.sh | Fetch daily OHLCV from IB |
| 6:15 PM Mon-Fri | run_ldtm_infer.sh | Infer all 103 → snapshots |
| 6:30 PM Mon-Fri | run_news_ingestion.sh | Fetch news headlines + bodies |
| 7:00 PM Mon-Fri | run_blob_export.sh | Export JSON to Azure Blob |
| 2:00 AM Saturday | run_ldtm_canary_retrain.sh | Retrain NVDA, TQQQ, AAPL |
| 1:00 AM 1st Sunday | run_ldtm_monthly_retrain.sh | Full retrain all 103 tickers |

Why 6:15 PM for inference (not 6:05 PM)? The ingestion job processes 103 tickers via 3 parallel IB connections at ~7-8s/ticker. At 103 tickers, parallel completion is ~4-5 minutes. 15 minutes gives a 10-minute safety buffer for slow market days or IB connection retries.
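The timing argument is simple arithmetic and can be reproduced as a quick check. The sketch assumes tickers are split evenly across connections and that per-ticker time is independent of the split; the function name is hypothetical.

```python
import math


def ingestion_minutes(n_tickers=103, connections=3, sec_per_ticker=(7.0, 8.0)):
    """Wall-clock range if tickers are split evenly across parallel connections."""
    per_connection = math.ceil(n_tickers / connections)   # busiest connection's share
    low, high = (per_connection * s / 60.0 for s in sec_per_ticker)
    return low, high


low_min, high_min = ingestion_minutes()
buffer_min = 15 - high_min   # slack before the 6:15 PM inference job
```

With 35 tickers on the busiest connection, the estimate lands between 4 and 5 minutes, leaving a little over 10 minutes of slack before inference starts.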


14. Known Limitations & Caveats

Absolute Price Accuracy

A small number of tickers show predictions far outside their actual price range (NFLX ~$93 predicted vs ~$1000 actual). This is a data coverage issue, not a model bug. When market_data_daily has insufficient history for a ticker (< 30 training samples, or history begins after a stock split that wasn't detected), the per-window normalization produces reasonable relative predictions but in the wrong absolute range.

Mitigation: Query SELECT ticker, COUNT(*), MIN(date), MAX(date) FROM market_data_daily GROUP BY ticker to verify history depth. Any ticker with < 250 rows (1 year) should be considered unreliable for absolute price prediction.

Val Loss Does Not Indicate Accuracy

A low val_loss (e.g. GEHC = 0.308) does not necessarily mean the model makes accurate absolute predictions. Tickers with very limited history (GEHC IPO in 2023) converge quickly on training data but have seen almost no macro regime changes. Val_loss measures fit on the held-out 15% of historical data — not forward predictive accuracy.

No Exogenous Data

The model uses only price and volume data. It does not incorporate:

  • Earnings announcements
  • Fed meeting dates
  • Macro indicators (CPI, jobs reports)
  • Analyst estimate revisions

This is a design choice for simplicity. Adding exogenous features would require a fundamentally different architecture (multi-input LSTM or temporal fusion transformer).

Short-Horizon Prediction Challenge

Even with engineered features, predicting next-day close is an exceptionally hard task. The Efficient Market Hypothesis suggests that publicly available price information is already reflected in today's price. LDTM's value lies not in its absolute price predictions but in:

  1. The relative momentum signal (implied 1-month return direction)
  2. The TQQQ/SQQQ pair as a market direction indicator
  3. Sector clustering of bullish/bearish signals