Blog
Behind the Myth: Can One API Call Rule Them All? (Part 1/2)
OpenRouter's Fusion plugin collapses a three-judge deliberation panel into a single model call. Part 1 puts our old and new checker agents head to head on speed, and asks whether one call is really faster than three.
Mission defines strategy, and strategy defines structure
How our five-phase pipeline revealed a bigger picture on its own, and what that teaches us about context engineering across agent boundaries.
Robustness through meaning - one triple at a time
Exploring how ontologies and LLMs cross-validate each other to make failure detection in agentic systems more robust, and why this approach differs from Palantir's operational ontology.
Can Your Prompts Optimize Themselves?
Exploring how DSPy's declarative approach to prompt engineering replaces hand-crafted templates with Bayesian-optimized programs, and what happens when you apply it to a real failure detection pipeline.
Are Your AI Agents Reliable?
Exploring how frameworks like τ²-bench and Pydantic Evals are shaping the science of evaluating AI agent reliability in production.