## Summary

Built an automated content publishing pipeline that uses local LLMs to turn raw session notes into structured markdown and publishes them to a blog's GitHub repository when they meet publishability criteria.
## What I Did

The core pipeline runs two local LLM models (Qwen 32B and Llama 3 8B) to extract structure from raw notes and format the content, moving each note through extraction, markdown writing, cleaning, metadata detection, and synchronization to a Git repo. A FastAPI service exposes HTTP endpoints for liveness checks and publishing, packaged in a Docker container with the necessary dependencies and volume mounts for sharing files with the host. An fswatch-based watcher monitors an inbox directory for new files, triggers pipeline processing, and handles the Git operations.
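The stage layout described above can be sketched as a chain of small functions passing a document object along; the names and placeholder bodies below are illustrative, not the actual module API (the real extract/clean stages would call the LLMs):

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    """Carries one note through the pipeline; fields are illustrative."""
    raw: str
    markdown: str = ""
    metadata: dict = field(default_factory=dict)

def extract(doc: Doc) -> Doc:
    # Stage 1: in the real pipeline an LLM call pulls structure out of
    # the raw notes; stripping whitespace stands in for that here.
    doc.markdown = doc.raw.strip()
    return doc

def clean(doc: Doc) -> Doc:
    # Stage 3: normalize formatting artifacts left by the model.
    doc.markdown = doc.markdown.replace("\r\n", "\n")
    return doc

def detect_metadata(doc: Doc) -> Doc:
    # Stage 4: derive front-matter fields such as a title.
    first_line = doc.markdown.splitlines()[0] if doc.markdown else ""
    doc.metadata["title"] = first_line.lstrip("# ").strip()
    return doc

STAGES = [extract, clean, detect_metadata]

def run_pipeline(raw: str) -> Doc:
    doc = Doc(raw=raw)
    for stage in STAGES:
        doc = stage(doc)
    return doc
```

Keeping each stage as a plain function over one shared document type is what makes the pipeline easy to reorder or extend, per the lessons below.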
## Key Technical Findings
- Core pipeline uses two local LLM models (Qwen 32B and Llama 3 8B) for extracting structure and formatting content.
- Pipeline stages include extraction, markdown writing, cleaning, metadata detection, and synchronization to a Git repo.
- FastAPI service provides HTTP endpoints for liveness checks and publishing.
- Docker containerizes the service with necessary dependencies and volume mounts for file sharing.
- fswatch monitors a directory for new files, triggers pipeline processing, and handles Git operations.
## Commands

```bash
docker build -t publish-service .

docker run -d --name publish-service --restart unless-stopped \
  -p 8088:8088 \
  -v "$HOME/Documents/blogs:/app/inbox" \
  -v "$(pwd)/drafts:/app/drafts" \
  -v "$(pwd)/published:/app/published" \
  -v "$(pwd)/processed:/app/processed" \
  -v "/Users/mikamirai/projects/auto-publsh-blog:/Users/mikamirai/projects/auto-publsh-blog" \
  publish-service

bash watcher/watch-blogs.sh

curl http://localhost:8088/health

curl -X POST http://localhost:8088/publish-file \
  -H "Content-Type: application/json" \
  -d '{"path": "inbox/day3.md"}'
```
## Lessons Learned
- Git operations must run where credentials are configured.
- Volume mounts facilitate sharing state between container and host in development.
- Debouncing file write events and using a state file prevent double-processing.
- Parsing the pipeline's JSON responses lets the watcher react to success or failure instead of firing Git operations blindly.
- Separating logic into distinct modules improves testability and extensibility.
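The debounce-plus-state-file idea from the lessons above can be sketched as follows; the state-file name and the two-second window are illustrative, and the real watcher implements this in shell around fswatch:

```python
import json
from pathlib import Path

STATE_FILE = Path("processed-state.json")  # illustrative name
DEBOUNCE_SECONDS = 2.0

def load_state() -> dict:
    # Persisted map of path -> last processed mtime and timestamp,
    # so restarts do not re-process already handled files.
    if STATE_FILE.exists():
        return json.loads(STATE_FILE.read_text())
    return {}

def save_state(state: dict) -> None:
    STATE_FILE.write_text(json.dumps(state))

def should_process(path: str, mtime: float, state: dict, now: float) -> bool:
    """Skip events that re-report an mtime we already handled, or that
    arrive inside the debounce window after the last accepted event."""
    entry = state.get(path)
    if entry is not None:
        if entry["mtime"] == mtime:
            return False  # same write already handled
        if now - entry["seen_at"] < DEBOUNCE_SECONDS:
            return False  # still inside the debounce window
    state[path] = {"mtime": mtime, "seen_at": now}
    return True
```

Editors often emit several write events per save, so without this guard a single saved note would trigger the pipeline multiple times.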