All Posts
8 posts
Running Qwen 2.5 Coder Locally with OpenCode: A Private Offline AI Coding Assistant
A complete setup guide for running Qwen 2.5 Coder locally via Ollama and connecting it to OpenCode (an open-source AI coding CLI) to create a private, offline-capable AI pair programmer in your terminal.
Running Qwen 2.5 Coder Locally with OpenCode: A Private AI Coding Assistant
A complete setup guide for running Qwen 2.5 Coder locally via Ollama and connecting it to OpenCode, creating a private, offline-capable AI coding assistant in your terminal.
Speculative Decoding Latency Optimization on TensorRT-LLM with Llama 3 Models
Tuning speculative decoding using Llama 3 8B as the draft model and Llama 3 70B as the target to reduce decode latency.
CUDA Graph Capture for Low-Latency LLM Decode on RTX 4090
Using CUDA graph capture in TensorRT-LLM to eliminate kernel launch overhead during the decode phase of Llama 3 8B on RTX 4090 — benchmarks, trade-offs, and lessons learned.
End-to-End TensorRT-LLM FP8 Inference Optimization on NVIDIA GB10 Blackwell
Optimizing TensorRT-LLM FP8 inference end to end on NVIDIA GB10 Blackwell, phase by phase, from baseline to 3K+ tokens/sec.
Building Production ML Infrastructure on AWS with Kiro: Part 2 — Multi-Account Landing Zone
Migrating from a single-account setup to a proper multi-account AWS landing zone with Organizations, Control Tower, Tailscale hybrid networking, centralized security, and cost guardrails — all built spec-first with Kiro.
Building Production ML Infrastructure on AWS with Kiro: Part 1 — Network Foundation
How we used Kiro's spec-driven workflow to go from a rough idea to fully deployed AWS network infrastructure across two environments — VPCs, NAT, CI/CD, and audit logging — in a single session.
Automated Content Publishing Pipeline with Local LLMs
An automated content publishing pipeline that uses local LLMs to turn raw session notes into structured markdown and publish them to a blog's GitHub repository when they meet publishability criteria.