Explore by Topic
Time-Series Foundations: Transformers, Diffusion, and Why They're Different (Part 1)
Before we build anything, we need a clear mental model of what these architectures actually do: not the marketing version, the mechanistic one.
Building Production ML Infrastructure on AWS with Kiro: Part 1 - Network Foundation
How we used Kiro's spec-driven workflow to go from a rough idea to fully deployed AWS network infrastructure across two environments (VPCs, NAT, CI/CD, and audit logging) in a single session.
What Actually Happens When Your Python Calls the GPU? (Part 1)
A hands-on journey from a single line of PyTorch code to the silicon on an NVIDIA GB10 Spark, tracing a matrix multiply through the full execution stack and profiling it with Nsight Systems.
Speculative Decoding Latency Optimization on TensorRT-LLM with Llama 3 Models
Tuning speculative decoding using Llama 3 8B as the draft model and Llama 3 70B as the target to reduce decode latency.
Mac Studio Local LLM Benchmark Report
Mac Studio M4 Max overnight benchmark report: token throughput, latency, and memory bandwidth across 12+ LLMs at various quantization levels.