I Built a Claude Code Agent That Reviews PRs (Here's the Prompt)
Working agent that posts inline review comments and blocks on critical issues. The full system prompt, the GitHub Actions wiring, the failure modes I hit and what I changed.
Long-form write-ups of real things we shipped. Prompts, code, costs, edge cases.
Working agent that posts inline review comments and blocks on critical issues. The full system prompt, the GitHub Actions wiring, the failure modes I hit and what I changed.
Concrete config to run a small open-weight model on rental GPU starting around $20/month. Throughput, latency, cold-start, gotchas.
Five concrete tactics that drove hallucination rate down: chunk dedup, retrieval rerank, refusal prompts, citation forcing, eval-gated deploy. With code.
Dataset, training loop, eval. RTX 3090 vs A100 cost trade. Push to HuggingFace, deploy to vLLM. Numbers, not theory.
I've shipped agents to paying users for 18 months. Here's the honest list of what breaks, why, and the patterns that finally held up.
A flowchart, not a feature comparison. Four questions that pick the right vector store every time.
Production voice pipeline for a small-business phone agent: latency budget, barge-in, fallback, monitoring.
Patterns that didn't churn out: role anchoring, structured I/O, refusal scaffolds, few-shot rotation, output validators, retry-with-correction, lazy elaboration, agent privilege walls.
Same product, same quality, 85% cheaper. The five changes that did it: caching, model cascade, context trimming, batch where possible, eval gates.
When buying matters more than renting. Utilization, fixed-cost lock-in, latency floor, privacy. The math for solo founders.