All Posts

20 posts
trading

Part 1: Time-Series Foundations: Transformers, Diffusion, and Why They're Different

Before we build anything, we need a clear mental model of what these architectures actually do — not the marketing version, the mechanistic one.

Jun 2, 2026
trading

Part 2: Data Pipeline: NASDAQ-100, Options Chain, and Sentiment

How to align three different data sources into one consistent daily dataset — and why every alignment decision is also a leakage decision.

Jun 2, 2026
trading

Part 3: Feature Engineering: From OHLCV to a Rich Feature Matrix

What each feature actually measures, why it might matter for predicting NASDAQ-100 returns, and how to compute it without accidentally injecting future information.

Jun 2, 2026
trading

Part 4: The Standard Transformer: Architecture, Training, and Why It Collapsed

We built it. We ran it. It predicted the same thing for every stock. Here's exactly what happened and what it reveals about applying transformers to financial data.

Jun 2, 2026
trading

Part 5: Temporal Fusion Transformer: Gating, Memory, and Sector Learnability

The architecture that finally worked — and a precise explanation of why each component addresses a specific failure mode of the Standard Transformer on financial data.

Jun 2, 2026
trading

Part 6: Validation Discipline: Catching Leakage, Rank IC, and the Economic Viability Wall

The step most AI trading projects skip — and the reason most of them fail when they reach real capital. A rigorous validation framework for quantitative ML.

Jun 2, 2026
trading

Part 7: LightGBM as Signal Engine: Feature Sets, Clean Candidates, and OOS Testing

The model that finally survived adversarial validation — and why gradient boosting often beats deep learning on structured financial data.

Jun 2, 2026
trading

Part 8: System Architecture: Signal → Strategy → Execution

A statistically real signal is not yet a tradeable system. This is the architecture that converts +0.08 Rank IC into a deployable paper-trading framework — and why the layers between prediction and profit are harder than the model itself.

Jun 2, 2026
trading

Building a Quantitative Trading System — Series Index

A practitioner's journey: from raw market data through Temporal Fusion Transformers, LightGBM, signal validation, and live trading architecture. All concepts defined, all mistakes documented.

Jun 2, 2026
gpu

What Actually Happens When Your Python Calls the GPU? — Part 2 of a Series

May 1, 2026
gpu

What Actually Happens When Your Python Calls the GPU?

A hands-on journey from a single line of PyTorch code to the silicon on an NVIDIA GB10 Spark — tracing a matrix multiply through the full execution stack and profiling it with Nsight Systems.

May 1, 2026
ai-infra · TensorRT-LLM

Evaluating Inflight Batching and Scheduler Policies in TensorRT-LLM Triton Backend

Configuring inflight batching in the TensorRT-LLM Triton backend for the Llama 3 8B model, and tuning its parameters to optimize GPU utilization and balance latency against throughput under bursty loads.

Apr 20, 2026
gpu · AWQ

AWQ vs. FP8 Quantization: Balancing Accuracy and Throughput for Llama 3 70B

Comparing AWQ INT4 and FP8 quantization methods on the Llama 3 70B model across accuracy benchmarks (MMLU, HumanEval, GSM8K, MT-Bench) and inference throughput measurements on an H100 GPU.

Apr 20, 2026
ai-infra · Qwen 2.5 Coder

Running Qwen 2.5 Coder Locally with OpenCode: A Private AI Coding Assistant

A complete setup guide for running Qwen 2.5 Coder locally via Ollama and connecting it to OpenCode, creating a private, offline-capable AI coding assistant in your terminal.

ai-infra · Speculative Decoding

Speculative Decoding Latency Optimization on TensorRT-LLM with Llama 3 Models

Tuning speculative decoding using Llama 3 8B as the draft model and Llama 3 70B as the target to reduce decode latency.

gpu · CUDA

CUDA Graph Capture for Low-Latency LLM Decode on RTX 4090

Using CUDA graph capture in TensorRT-LLM to eliminate kernel launch overhead during the decode phase of Llama 3 8B on RTX 4090 — benchmarks, trade-offs, and lessons learned.

gpu · TensorRT-LLM

End-to-End TensorRT-LLM FP8 Inference Optimization on NVIDIA GB10 Blackwell

A phase-by-phase account of optimizing TensorRT-LLM FP8 inference on NVIDIA GB10 Blackwell, from the baseline to 3K+ tokens/sec.

ml-infra · AWS

Building Production ML Infrastructure on AWS with Kiro: Part 2 — Multi-Account Landing Zone

Migrating from a single-account setup to a proper multi-account AWS landing zone with Organizations, Control Tower, Tailscale hybrid networking, centralized security, and cost guardrails — all built spec-first with Kiro.

ml-infra · AWS

Building Production ML Infrastructure on AWS with Kiro: Part 1 — Network Foundation

How we used Kiro's spec-driven workflow to go from a rough idea to fully deployed AWS network infrastructure across two environments — VPCs, NAT, CI/CD, and audit logging — in a single session.

ai-infra · local LLMs

Automated Content Publishing Pipeline with Local LLMs

An automated content publishing pipeline that uses local LLMs to turn raw session notes into structured markdown, then publishes them to a blog GitHub repository if they meet publishability criteria.
