Blog — Build write-ups & deep dives

build 2026-05-10

I Built a Claude Code Agent That Reviews PRs (Here's the Prompt)

A working agent that posts inline review comments and blocks the PR on critical issues. The full system prompt, the GitHub Actions wiring, the failure modes I hit, and what I changed.

build 2026-05-10

Hosting Open-Source LLMs: vLLM on a $20/mo Box, Real Benchmarks

Concrete config to run a small open-weight model on a rented GPU starting around $20/month. Throughput, latency, cold-start, gotchas.

guide 2026-05-10

Building RAG That Doesn't Hallucinate: 5 Tactics That Move the Needle

Five concrete tactics that drove the hallucination rate down: chunk dedup, retrieval rerank, refusal prompts, citation forcing, eval-gated deploy. With code.

build 2026-05-10

Fine-Tuning Gemma 3 1B: My Actual Workflow + Costs

Dataset, training loop, eval. RTX 3090 vs A100 cost trade-off. Push to Hugging Face, deploy to vLLM. Numbers, not theory.

guide 2026-05-10

Why Agentic AI Keeps Failing in Production

I've shipped agents to paying users for 18 months. Here's the honest list of what breaks, why, and the patterns that finally held up.

guide 2026-05-10

Picking a Vector DB in 2026: A Decision Framework

A flowchart, not a feature comparison. Four questions that pick the right vector store every time.

build 2026-05-10

Voice Agents at Scale: ElevenLabs + Twilio + Claude

Production voice pipeline for a small-business phone agent: latency budget, barge-in, fallback, monitoring.

guide 2026-05-10

The 8 Prompt Patterns That Survived My Last 6 Months

Patterns that stuck while the rest churned: role anchoring, structured I/O, refusal scaffolds, few-shot rotation, output validators, retry-with-correction, lazy elaboration, agent privilege walls.

build 2026-05-10

Cost-Engineering an LLM App From $400/day to $60/day

Same product, same quality, 85% cheaper. The five changes that did it: caching, model cascade, context trimming, batching where possible, eval gates.

guide 2026-05-10

Local-First AI: When GPU Rentals Don't Make Sense

When buying hardware beats renting it. Utilization, fixed-cost lock-in, latency floor, privacy. The math for solo founders.
