All Posts
20 posts
Part 1 — Time-Series Foundations: Transformers, Diffusion, and Why They're Different
Before we build anything, we need a clear mental model of what these architectures actually do — not the marketing version, the mechanistic one.
Part 2 — Data Pipeline: NASDAQ-100, Options Chain, and Sentiment
How to align three different data sources into one consistent daily dataset — and why every alignment decision is also a leakage decision.
Part 3 — Feature Engineering: From OHLCV to a Rich Feature Matrix
What each feature actually measures, why it might matter for predicting NASDAQ-100 returns, and how to compute it without accidentally injecting future information.
Part 4 — The Standard Transformer: Architecture, Training, and Why It Collapsed
We built it. We ran it. It predicted the same thing for every stock. Here's exactly what happened and what it reveals about applying transformers to financial data.
Part 5 — Temporal Fusion Transformer: Gating, Memory, and Sector Learnability
The architecture that finally worked — and a precise explanation of why each component addresses a specific failure mode of the Standard Transformer on financial data.
Part 6 — Validation Discipline: Catching Leakage, Rank IC, and the Economic Viability Wall
The step most AI trading projects skip — and the reason most of them fail when they reach real capital. A rigorous validation framework for quantitative ML.
Part 7 — LightGBM as Signal Engine: Feature Sets, Clean Candidates, and OOS Testing
The model that finally survived adversarial validation — and why gradient boosting often beats deep learning on structured financial data.
Part 8 — System Architecture: Signal → Strategy → Execution
A statistically real signal is not yet a tradeable system. This is the architecture that converts +0.08 Rank IC into a deployable paper-trading framework — and why the layers between prediction and profit are harder than the model itself.
Building a Quantitative Trading System — Series Index
A practitioner's journey: from raw market data through Temporal Fusion Transformers, LightGBM, signal validation, and live trading architecture. All concepts defined, all mistakes documented.
What Actually Happens When Your Python Calls the GPU? — Part 2 of a Series
What Actually Happens When Your Python Calls the GPU?
A hands-on journey from a single line of PyTorch code to the silicon on an NVIDIA GB10 Spark — tracing a matrix multiply through the full execution stack and profiling it with Nsight Systems.
Evaluating Inflight Batching and Scheduler Policies in TensorRT-LLM Triton Backend
Configuring inflight batching in the TensorRT-LLM Triton backend for the Llama 3 8B model, and tuning its parameters to improve GPU utilization while balancing latency against throughput under bursty loads.
AWQ vs. FP8 Quantization: Balancing Accuracy and Throughput for Llama 3 70B
Comparing AWQ INT4 and FP8 quantization methods on the Llama 3 70B model across accuracy benchmarks (MMLU, HumanEval, GSM8K, MT-Bench) and inference throughput measurements on an H100 GPU.
Running Qwen 2.5 Coder Locally with OpenCode: A Private AI Coding Assistant
A complete setup guide for running Qwen 2.5 Coder locally via Ollama and connecting it to OpenCode, creating a private, offline-capable AI coding assistant in your terminal.
Speculative Decoding Latency Optimization on TensorRT-LLM with Llama 3 Models
Tuning speculative decoding using Llama 3 8B as the draft model and Llama 3 70B as the target to reduce decode latency.
CUDA Graph Capture for Low-Latency LLM Decode on RTX 4090
Using CUDA graph capture in TensorRT-LLM to eliminate kernel launch overhead during the decode phase of Llama 3 8B on RTX 4090 — benchmarks, trade-offs, and lessons learned.
End-to-End TensorRT-LLM FP8 Inference Optimization on NVIDIA GB10 Blackwell
A phase-by-phase walkthrough of optimizing TensorRT-LLM FP8 inference on NVIDIA GB10 Blackwell, from the baseline to 3K+ tokens/sec.
Building Production ML Infrastructure on AWS with Kiro: Part 2 — Multi-Account Landing Zone
Migrating from a single-account setup to a proper multi-account AWS landing zone with Organizations, Control Tower, Tailscale hybrid networking, centralized security, and cost guardrails — all built spec-first with Kiro.
Building Production ML Infrastructure on AWS with Kiro: Part 1 — Network Foundation
How we used Kiro's spec-driven workflow to go from a rough idea to fully deployed AWS network infrastructure across two environments — VPCs, NAT, CI/CD, and audit logging — in a single session.
Automated Content Publishing Pipeline with Local LLMs
An automated content publishing pipeline built on local LLMs: it turns raw session notes into structured markdown and publishes them to the blog's GitHub repository when they meet publishability criteria.