All Posts
66 posts
Time-Series Foundations: Transformers, Diffusion, and Why They're Different— Part 1
Before we build anything, we need a clear mental model of what these architectures actually do — not the marketing version, the mechanistic one.
Data Pipeline: NASDAQ-100, Options Chain, and Sentiment— Part 2
How to align three different data sources into one consistent daily dataset — and why every alignment decision is also a leakage decision.
Feature Engineering: From OHLCV to a Rich Feature Matrix— Part 3
What each feature actually measures, why it might matter for predicting NASDAQ-100 returns, and how to compute it without accidentally injecting future information.
The Standard Transformer: Architecture, Training, and Why It Collapsed— Part 4
We built it. We ran it. It predicted the same thing for every stock. Here's exactly what happened and what it reveals about applying transformers to financial data.
Temporal Fusion Transformer: Gating, Memory, and Sector Learnability— Part 5
The architecture that finally worked — and a precise explanation of why each component addresses a specific failure mode of the Standard Transformer on financial data.
Validation Discipline: Catching Leakage, Rank IC, and the Economic Viability Wall— Part 6
The step most AI trading projects skip — and the reason most of them fail when they reach real capital. A rigorous validation framework for quantitative ML.
LightGBM as Signal Engine: Feature Sets, Clean Candidates, and OOS Testing— Part 7
The model that finally survived adversarial validation — and why gradient boosting often beats deep learning on structured financial data.
System Architecture: Signal → Strategy → Execution— Part 8
A statistically real signal is not yet a tradeable system. This is the architecture that converts +0.08 Rank IC into a deployable paper-trading framework — and why the layers between prediction and profit are harder than the model itself.
Building a Quantitative Trading System — Series Index
A practitioner's journey: from raw market data through Temporal Fusion Transformers, LightGBM, signal validation, and live trading architecture. All concepts defined, all mistakes documented.
Building Production ML Infrastructure on AWS with Kiro: Part 1 — Network Foundation— Part 1
How we used Kiro's spec-driven workflow to go from a rough idea to fully deployed AWS network infrastructure across two environments — VPCs, NAT, CI/CD, and audit logging — in a single session.
Building Production ML Infrastructure on AWS with Kiro: Part 2 — Multi-Account Landing Zone— Part 2
Migrating from a single-account setup to a proper multi-account AWS landing zone with Organizations, Control Tower, Tailscale hybrid networking, centralized security, and cost guardrails — all built spec-first with Kiro.
What Actually Happens When Your Python Calls the GPU?— Part 1
A hands-on journey from a single line of PyTorch code to the silicon on an NVIDIA GB10 Spark — tracing a matrix multiply through the full execution stack and profiling it with Nsight Systems.
What Actually Happens When Your Python Calls the GPU? — Part 2— Part 2
Fixing the empty Nsight profile from Part 1 — running nsys inside the container, reading real CUDA API data, and finding the exact kernel name executing on the GB10 silicon.
3,000 Tokens Per Second on Blackwell: A TensorRT-LLM FP8 Optimization Journey
From FP16 baseline to 3,030 tok/s on the NVIDIA GB10 Blackwell — a step-by-step account of what actually moved the needle and what didn't.
Speculative Decoding Latency Optimization on TensorRT-LLM with Llama 3 Models
Tuning speculative decoding using Llama 3 8B as the draft model and Llama 3 70B as the target to reduce decode latency.
CUDA Graph Capture for Low-Latency LLM Decode on RTX 4090
Using CUDA graph capture in TensorRT-LLM to eliminate kernel launch overhead during the decode phase of Llama 3 8B on RTX 4090 — benchmarks, trade-offs, and lessons learned.
Evaluating Inflight Batching and Scheduler Policies in TensorRT-LLM Triton Backend
Configuring inflight batching in the TensorRT-LLM Triton backend for Llama 3 8B, and tuning scheduler parameters to optimize GPU utilization and balance latency against throughput under bursty loads.
AWQ vs. FP8 Quantization: Balancing Accuracy and Throughput for Llama 3 70B
Comparing AWQ INT4 and FP8 quantization methods on the Llama 3 70B model across accuracy benchmarks (MMLU, HumanEval, GSM8K, MT-Bench) and inference throughput measurements on an H100 GPU.
Automated Content Publishing Pipeline with Local LLMs
How I built a pipeline that turns raw session notes into structured markdown blog posts using local LLMs — from fswatch trigger to GitHub push, fully automated.
Running Qwen 2.5 Coder Locally with OpenCode: A Private AI Coding Assistant
A complete setup guide for running Qwen 2.5 Coder locally via Ollama and connecting it to OpenCode, creating a private, offline-capable AI coding assistant in your terminal.
anandus.ai
Technical writeup of the anandus.ai platform: RAG-based portfolio intelligence, Azure OpenAI integration, and multi-repo indexing.
Mac Studio Local LLM Benchmark Report
Mac Studio M4 Max overnight benchmark report: token throughput, latency, and memory bandwidth across 12+ LLM models at various quantization levels.
anandus.ai — Technical Deep Dive: AI Profile Portfolio with RAG & MCP
A collection of blog posts covering AI infrastructure, RAG pipelines, trading signal governance, and production ML systems.
Local Agentic Control Plane Runbook
Operational runbook for the local agentic control plane: starting services, monitoring agent health, debugging tool calls, and log inspection.
Top 10 Fundamental Growth Stocks for 2026
Top 10 fundamental growth stocks for 2026: screening methodology, financial ratios, and sector analysis using quantitative filters.
Semiconductors TFT Ranking Engine Strategy
SEMIS ranking engine strategy: semiconductor sector rotation using TFT confidence scores, momentum filters, and risk-adjusted position weights.
SEMICONDUCTORS TFT RANKING ENGINE — EXECUTION PLAYBOOK
TFT execution playbook: trade entry rules, position sizing, stop-loss logic, and SEMIS sector rotation signals from the NDX100 TFT model.
Building a Local Agentic Control Plane with Mac Mini, Mac Studio, Ollama, Qwen, Matrix, and iPhone Approval
How to build a local agentic control plane: architecture, MCP server integration, and self-hosted AI orchestration patterns.
Day 7 Onward — Market Regime Intelligence Roadmap
Day 7 regime intelligence roadmap: planned enhancements to market regime detection, cross-asset signals, and adaptive position sizing.
Financial Sentiment Models, Phases, GDELT, and Industry Usage
Financial sentiment model phases and industry notes: FinBERT fine-tuning, sector-specific adjustments, and real-time scoring architecture.
LDTM v2 — GPU Performance Analysis & Optimization Guide
LDTM v2 performance analysis: walk-forward returns, Sharpe ratios, maximum drawdown, and regime-conditional signal quality metrics.
I Built an AI That Watches 103 Stocks Simultaneously — Here's What It Found
Overview of the LDTM v2 trading system: LSTM-based temporal models, walk-forward validation, and live deployment architecture.
LDTM v2 — Operations Runbook
Operational runbook for LDTM v2: daily data refresh, model retraining schedule, signal generation pipeline, and Azure deployment steps.
Cross-Asset TFT Model: Mathematical Walkthrough and Data Flow
Math walkthrough of the cross-asset Temporal Fusion Transformer: variable selection networks, gated residuals, and multi-horizon attention.
Codex LD-TM LSTM Implementation, Architecture, Model Design, and Runbook
Step-by-step implementation runbook for the LDTM LSTM model: data prep, training configuration, Azure ML integration, and deployment checklist.
Codex Modular Signal Architecture Requirements
Requirements for the baseline ensemble model: LightGBM, gradient boosting, feature importance, stacking architecture, and benchmark targets.
From Daily Bars to Market Regime Intelligence on a DGX
Building a market regime intelligence layer: detecting bull, bear, and sideways markets using unsupervised classification on price features.
Trading Database — Access Runbook
Runbook for accessing the DGX trading PostgreSQL database: connection methods, read-only credentials, query examples, and maintenance procedures.
ElasticNet Dev-Test-Prod Promotion Design
Dev-test-prod promotion pipeline design for Azure ML workloads: environment isolation, artifact promotion, gate criteria, and rollback strategy.
Attack Vector Catalog — anandus.ai
Attack vectors for AI-powered trading and agentic systems: prompt injection patterns, model extraction risks, and defense strategies.
Requirements Document
Requirements for the AWS AI governance control plane: model registry, audit logging, policy enforcement, and responsible AI controls.
Design Document: NVIDIA AWS GPU Certification Study System
Design specification for the NVIDIA AWS GPU certification environment: P3 instance setup, deep learning AMI, benchmarking, and cost controls.
Agentic AI Security Requirements — anandus.ai
Security requirements for agentic AI systems: capability boundaries, tool use restrictions, memory isolation, and adversarial input handling.
Requirements Document
Requirements for the NVIDIA AWS GPU certification project: compute targets, software stack, test harness, and acceptance criteria.
Requirements Document
Requirements specification for the MM AWS multi-account landing zone: OU structure, SCPs, Control Tower guardrails, and account vending.
MikaMirAI Deployment Runbook
Deployment runbook for the MikaMirAI SPA: Azure Static Web Apps, GitHub Actions CI/CD pipeline, custom domain, and rollback procedures.
Design Document: MM AWS Network Infrastructure
AWS network infrastructure design: dual-VPC layout, subnet allocation, NAT strategy, Tailscale mesh routing, and peering topology.
Requirements Document
Formal requirements for the MM AWS network infrastructure: VPC CIDRs, subnet tiers, NAT instance vs gateway decision, and Tailscale connectivity.
Building ML Infrastructure on AWS with Kiro: Part 2 — Multi-Account Landing Zone
Deploying the MM AWS multi-account landing zone: Control Tower setup, OU hierarchy, SCP guardrails, and account factory automation.
LDTM v2 — Complete Architecture & Implementation Reference
LDTM v2 architecture and implementation: bidirectional LSTM with attention for multi-asset return prediction and risk-adjusted signal generation.
AI Trading Strategies for 2025: A Comprehensive Guide
AI-driven trading strategies for 2025: applying machine learning to equity signal generation, regime detection, and risk management.
NDX-100 v2: Regime-Aware Cross-Sectional Ranking System
NDX100 v2 architecture: Temporal Fusion Transformer ensemble for NASDAQ-100 signal generation, feature engineering pipeline, and serving design.
Building ML Infrastructure on AWS with Kiro: Part 1 — Network Foundation
Building the MM AWS network infrastructure: VPC design, subnet layout, NAT instance vs gateway trade-offs, and Tailscale mesh connectivity.
AutoTransformer — Kiro-Style Design Document
Kiro-style design for the AutoTransformer project: automated architecture search over temporal Transformer variants for financial time series.
Design Document: AI-Powered Profile Portfolio
Design specification for the AI-powered portfolio site: multi-repo RAG indexing, semantic search, visitor analytics, and Azure Functions backend.
Design Document: Multi-Repo GitHub Documentation Indexing
Multi-repo indexing design: chunking strategy, embedding pipeline, cross-repo retrieval scoring, and incremental update protocol.
Portfolio Stories — DGX Trading System
Portfolio stories from the DGX trading system: building production quant infrastructure, lessons learned, and system evolution.
Welcome to MikaMirAI - The Future of Trading Research
Welcome to MikaMirAI — the intersection of quantitative trading, AI infrastructure, GPU computing, and production ML systems.
Project: LDTM — Long-Term Deep Temporal Model
Project spotlight: LDTM — Long-duration Temporal Model for multi-asset return prediction and systematic trading signals.
What Actually Happens When Your Python Calls the GPU?
GPU zero-to-one: tracing a single PyTorch matrix multiply from Python through CUDA all the way to NVIDIA silicon, profiled with Nsight Systems.
Project: AutoTransformer — Bounded Self-Improving Architecture Search
Project spotlight: AutoTransformer — automated Transformer architecture search for financial time series forecasting.
Project: Sentiment Engine — Financial NLP Research Program
Project spotlight: the DGX Sentiment Engine — FinBERT-based news sentiment pipeline for real-time trading signal generation.
Anand Ranganathan — AI Architect Lead
Professional profile: quantitative developer building AI trading infrastructure, GPU computing systems, and production ML pipelines.