Weekly review: Spec-Driven Development Tools: Promise vs Reality

PLUS - Claude Haiku 4.5: Frontier Performance at Budget Pricing

Oct 20, 2025

Greetings from Altered Craft’s weekly AI review for developers. We deeply appreciate you joining us each week as we navigate AI’s rapid evolution together. This week brings a reality check on spec-driven development, hard-won lessons from developers managing multiple AI agents in production, and critical security patterns for protecting autonomous systems. From Microsoft’s edge AI training to OpenAI’s workforce transformation blueprint, we’re seeing the ecosystem mature beyond hype into practical implementation challenges and solutions.

TUTORIALS & CASE STUDIES

Microsoft’s Complete Edge AI Training Program

Estimated read time: 3 min

Microsoft’s comprehensive course teaches developers how to deploy Small Language Models (SLMs) on edge devices, reporting 85% speed improvements and 75% size reductions. The eight-module curriculum covers everything from SLM fundamentals (Phi, Qwen, and Gemma families) to production deployment, with 10 hands-on samples including multi-agent orchestration, RAG pipelines, and intelligent model routing. Master privacy-preserving AI that runs locally on mobile, IoT, and embedded systems.
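Here is a rough worked example of where a 75% size reduction can come from. This is an assumption, because the summary does not say which technique the course uses: quantizing weights from 16-bit to 4-bit precision cuts weight storage by exactly three quarters. The sketch assumes a 3.8B-parameter model, roughly Phi-3-mini's size, and ignores quantization metadata and runtime memory.

```python
def weight_storage_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes).

    Ignores quantization scales, zero points, and runtime memory.
    """
    return num_params * bits_per_weight / 8 / 1e9


def size_reduction(original_bits: int, quantized_bits: int) -> float:
    """Fractional size reduction when only the weight precision changes."""
    return 1 - quantized_bits / original_bits


params = 3.8e9  # assumed parameter count, roughly a Phi-3-mini-sized model
fp16_gb = weight_storage_gb(params, 16)
int4_gb = weight_storage_gb(params, 4)
print(f"FP16: {fp16_gb:.1f} GB, INT4: {int4_gb:.1f} GB, "
      f"reduction: {size_reduction(16, 4):.0%}")
# → FP16: 7.6 GB, INT4: 1.9 GB, reduction: 75%
```

Real quantized checkpoints come out somewhat larger than this estimate, because scales, zero points, and often the embedding layers are kept at higher precision.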

Spec-Driven Development Tools: Promise vs Reality

Estimated read time: 4 min

Figure: The three observed levels of SDD, shown for feature creation and for later evolution and maintenance. Spec-first: specs lead to code, and both AI and humans edit specs and code. The spec is deleted after the feature is created, and each later change gets a new spec. Spec-anchored: the spec is kept and edited as the feature evolves. Spec-as-source: like spec-anchored, but humans edit only the spec, never the code. Each level builds on the one before it.

Thoughtworks engineer Birgitta Böckeler examines three spec-driven development (SDD) tools: Kiro, spec-kit, and Tessl. She finds critical gaps between their marketing and their real-world utility. The spec-first principle proves valuable, but these tools often generate verbose Markdown that is harder to review than the code itself, and agents frequently ignore instructions despite elaborate workflows. I explored similar concerns in this post about spec-kit and context overload. She also draws on the history of model-driven development (MDD) and warns against combining MDD’s inflexibility with LLM non-determinism.

Securing AI Agents with OAuth 2.0 Patterns

Estimated read time: 3 min

Shifting to a critical implementation concern, LangChain’s practical guide explains how to implement authentication and authorization for AI agents using three essential OAuth 2.0 flows. Authorization Code serves user-facing services, On-Behalf-Of tokens limit agents to their user’s permissions, and Client Credentials covers autonomous operations. Because agents reach many services with shifting access needs, they require centralized RBAC and consolidated audit logging. Properly securing agents before they act on behalf of users is non-negotiable.
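The following minimal sketch shows the token requests behind two of these flows, assuming a generic OAuth 2.0 authorization server. Client Credentials uses the standard `client_credentials` grant from RFC 6749. For On-Behalf-Of, the sketch swaps in the standard Token Exchange grant from RFC 8693. Some providers implement On-Behalf-Of differently; Microsoft Entra ID, for example, uses a JWT-bearer variant. The client ID, secret, audience, and scopes are hypothetical placeholders, and `narrow_scopes` is an illustrative helper, not part of any library.

```python
from urllib.parse import urlencode

TOKEN_EXCHANGE_GRANT = "urn:ietf:params:oauth:grant-type:token-exchange"
ACCESS_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:access_token"


def narrow_scopes(requested: list[str], user_granted: list[str]) -> str:
    """Hypothetical helper: keep only scopes the user already holds,
    so a delegated agent never asks for more than its user has."""
    allowed = set(user_granted)
    return " ".join(s for s in requested if s in allowed)


def client_credentials_body(client_id: str, client_secret: str, scope: str) -> dict[str, str]:
    """Autonomous agent acting as itself, with no user in the loop (RFC 6749, 4.4)."""
    # Credentials sent as form fields for brevity; RFC 6749 prefers HTTP Basic auth.
    return {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": scope,
    }


def on_behalf_of_body(client_id: str, client_secret: str, user_token: str,
                      audience: str, scope: str) -> dict[str, str]:
    """Agent exchanges the user's access token for a downstream token (RFC 8693)."""
    return {
        "grant_type": TOKEN_EXCHANGE_GRANT,
        "client_id": client_id,
        "client_secret": client_secret,
        "subject_token": user_token,
        "subject_token_type": ACCESS_TOKEN_TYPE,
        "audience": audience,
        "scope": scope,
    }


# Both bodies are POSTed as application/x-www-form-urlencoded to the token endpoint.
scope = narrow_scopes(["calendar.read", "mail.send"], user_granted=["calendar.read"])
body = on_behalf_of_body("agent-123", "s3cret", "user-access-token",
                         "https://calendar.example.com", scope)
print(urlencode(body))
```

The design choice to narrow scopes before the exchange mirrors the guide's point that a delegated agent should never exceed its user's permissions. The authorization server still enforces the final decision.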

Building Production AI Agents That Actually Work

Estimated read time: 4 min

This McKinsey-backed framework reveals why successful AI agents depend on workflow understanding, not technical sophistication. Map four layers—surface procedures, operational reality, contextual intelligence, and cultural DNA—before writing prompts. Your evaluation framework IS your strategy: encoding top performer expertise creates an uncopiable moat. Build hyper-specific solutions for one team first to make their day 10x better, then extract patterns; premature generalization guarantees mediocrity.
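To make "your evaluation framework is your strategy" concrete, here is a minimal, hypothetical eval harness. Each case pairs an input with checks that encode what a top performer would insist on, and the agent's score is the fraction of cases in which every check passes. The refund scenario, the check names, and `stub_agent` are all invented for illustration. A real harness would call your model and use far richer rubrics.

```python
from dataclasses import dataclass, field
from typing import Callable

Check = Callable[[str], bool]


@dataclass
class EvalCase:
    prompt: str
    # check name -> predicate applied to the agent's reply
    checks: dict[str, Check] = field(default_factory=dict)


def run_evals(agent: Callable[[str], str], cases: list[EvalCase]) -> tuple[float, list[str]]:
    """Return the fraction of cases passing every check, plus the failed check names."""
    failures: list[str] = []
    passed = 0
    for i, case in enumerate(cases):
        reply = agent(case.prompt)
        failed = [name for name, check in case.checks.items() if not check(reply)]
        failures.extend(f"case {i}: {name}" for name in failed)
        if not failed:
            passed += 1
    return passed / len(cases), failures


# Hypothetical expertise from a top support rep: always cite the policy,
# and never promise a refund amount before verifying the order.
cases = [
    EvalCase(
        prompt="Customer wants a refund for a late order.",
        checks={
            "cites_policy": lambda r: "policy" in r.lower(),
            "no_premature_promise": lambda r: "$" not in r,
        },
    ),
]


def stub_agent(prompt: str) -> str:
    # Stand-in for a real model call.
    return "Per our refund policy, I'll verify the order before issuing anything."


rate, failures = run_evals(stub_agent, cases)
print(rate, failures)  # → 1.0 []
```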

Agentic Engineering: Managing 8 Parallel AI Coders

Estimated read time: 5 min

Taking these agent concepts to production scale, developer Peter Steinberger shares battle-tested workflows for running 3 to 8 agents in parallel on a 300k-LOC TypeScript project. His “blast radius” concept, an estimate of how many files a change will touch, guides whether to make many small changes or one large one while keeping commits atomic. Run agents in a grid of terminals, interrupt them mid-execution for status checks, and lean on GPT-5-codex for the best balance of intelligence and speed. Agents now write 100% of his code, and he keeps control through active monitoring and strategic intervention.
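One hypothetical way to put a number on "blast radius" before committing is to count how many files and lines a working-tree change touches. The sketch below parses the tab-separated output of `git diff --numstat`, where each line reads added, deleted, path, and binary files show `-` for both counts. The thresholds for "small" are arbitrary assumptions, not Steinberger's.

```python
from dataclasses import dataclass


@dataclass
class BlastRadius:
    files: int
    lines_added: int
    lines_deleted: int

    def is_small(self, max_files: int = 3, max_lines: int = 150) -> bool:
        # Arbitrary example thresholds; tune them to your own codebase.
        return self.files <= max_files and self.lines_added + self.lines_deleted <= max_lines


def parse_numstat(numstat: str) -> BlastRadius:
    """Summarize `git diff --numstat` output (one 'added<TAB>deleted<TAB>path' per line)."""
    files = added = deleted = 0
    for line in numstat.splitlines():
        if not line.strip():
            continue
        a, d, _path = line.split("\t", 2)
        files += 1
        # Binary files report '-' instead of line counts.
        added += int(a) if a != "-" else 0
        deleted += int(d) if d != "-" else 0
    return BlastRadius(files, added, deleted)


# In practice you might feed it:
#   subprocess.run(["git", "diff", "--numstat"], capture_output=True, text=True).stdout
sample = "12\t3\tsrc/api/auth.ts\n40\t0\tsrc/api/auth.test.ts\n-\t-\tassets/logo.png\n"
radius = parse_numstat(sample)
print(radius, radius.is_small())
```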

TOOLS

Claude Haiku 4.5: Frontier Performance at Budget Pricing

Estimated read time: 2 min

Described by Haiku 4.5: A whimsical illustration of a bird with a round tan body, pink beak, and orange legs riding a bicycle against a blue sky and green grass background.

Anthropic’s new model delivers coding performance matching Sonnet 4, which was released five months earlier, at one-third the cost ($1/$5 vs $3/$15 per million input/output tokens) and twice the speed. With a 200K-token context window, 64K output tokens (up from 8K on Haiku 3.5), and explicit context-awareness training to reduce agentic laziness, Haiku 4.5 democratizes frontier-level AI. Its February 2025 knowledge cutoff provides more current framework information than earlier Haiku models.
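The "one-third the cost" figure follows directly from the quoted prices, since both the input rate and the output rate are a third of Sonnet 4's. Here is a quick worked check, assuming a hypothetical workload of 10M input tokens and 2M output tokens. The dictionary keys are plain labels, not API model IDs.

```python
# Prices in USD per million tokens: (input, output), as quoted above.
PRICES = {
    "haiku-4.5": (1.0, 5.0),
    "sonnet-4": (3.0, 15.0),
}


def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Total cost for a token volume at the listed per-million-token rates."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


workload = (10_000_000, 2_000_000)  # hypothetical token volume: (input, output)
haiku = cost_usd("haiku-4.5", *workload)
sonnet = cost_usd("sonnet-4", *workload)
print(f"Haiku 4.5: ${haiku:.2f}, Sonnet 4: ${sonnet:.2f}, ratio: {haiku / sonnet:.3f}")
# → Haiku 4.5: $20.00, Sonnet 4: $60.00, ratio: 0.333
```

Because both rates scale by the same factor, the ratio stays at one-third regardless of the input/output mix.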

Anthropic Skills: Composable AI Capabilities

Estimated read time: 2 min

In more Claude news, the new Skills feature lets you create custom folders of instructions, scripts, and executable code that Claude loads contextually across the Claude apps, Claude Code, and the API. Skills are composable (they stack together automatically), portable (the same format works everywhere), and powerful (they can include deterministic code for tasks where programming beats token generation). Use the /v1/skills endpoint for programmatic management, or try the Anthropic-provided skills for Excel, PowerPoint, and PDF generation.
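At its core, a skill is a folder whose `SKILL.md` file starts with YAML frontmatter giving the skill's `name` and `description`, followed by instructions. Supporting scripts can sit alongside that file. The sketch below writes such a folder using only the Python standard library. The skill name, its instructions, and the bundled `check_totals.py` script are hypothetical. The lowercase-hyphen naming check is an assumption modeled on Anthropic's examples, not a full validation of their rules.

```python
import re
import tempfile
from pathlib import Path
from typing import Optional

SKILL_TEMPLATE = """\
---
name: {name}
description: {description}
---

# {title}

{instructions}
"""

# Assumed naming convention: lowercase words joined by hyphens.
NAME_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def write_skill(root: Path, name: str, description: str, instructions: str,
                scripts: Optional[dict[str, str]] = None) -> Path:
    """Create <root>/<name>/SKILL.md plus optional helper scripts; return the folder."""
    if not NAME_PATTERN.fullmatch(name):
        raise ValueError(f"invalid skill name: {name!r}")
    folder = root / name
    folder.mkdir(parents=True, exist_ok=True)
    title = name.replace("-", " ").title()
    (folder / "SKILL.md").write_text(SKILL_TEMPLATE.format(
        name=name, description=description, title=title, instructions=instructions))
    for filename, source in (scripts or {}).items():
        (folder / filename).write_text(source)
    return folder


with tempfile.TemporaryDirectory() as tmp:
    skill_dir = write_skill(
        Path(tmp), "invoice-review",
        description="Checks that invoice line items add up. Use when reviewing invoices.",
        instructions="Run `python check_totals.py <file>` and report any mismatch.",
        scripts={"check_totals.py": "print('hypothetical totals checker')\n"},
    )
    print(sorted(p.name for p in skill_dir.iterdir()))  # ['SKILL.md', 'check_totals.py']
```

Keeping the description specific about *when* to use the skill matters, because that text is what helps Claude decide whether to load the rest of the folder.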

FastAPI MCP: RESTful AI Agent Integration

Estimated read time: 2 min

This repository provides a Model Context Protocol (MCP) server implementation built on FastAPI, which lets developers expose existing FastAPI applications as MCP-compatible services for AI agents. Leverage FastAPI’s familiar patterns, automatic OpenAPI documentation, and async capabilities to build custom tools that AI coding assistants can call. Bridge your REST APIs with the emerging MCP ecosystem using one of Python’s most popular web frameworks.
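Conceptually, the bridge turns each OpenAPI operation that FastAPI already generates into an MCP tool definition, which consists of a `name`, a `description`, and a JSON Schema `inputSchema`. The sketch below applies that mapping to a plain dictionary shaped like FastAPI's `app.openapi()` output. It is a hedged illustration of the idea, not the repository's actual code. It handles only declared parameters and leaves out request bodies and `$ref` resolution.

```python
import json
from typing import Any

HTTP_METHODS = {"get", "post", "put", "patch", "delete"}


def openapi_to_mcp_tools(spec: dict[str, Any]) -> list[dict[str, Any]]:
    """Map each OpenAPI operation to an MCP-style tool definition."""
    tools: list[dict[str, Any]] = []
    for path, path_item in spec.get("paths", {}).items():
        for method, operation in path_item.items():
            if method not in HTTP_METHODS:
                continue  # skip path-level keys such as "parameters" or "summary"
            properties: dict[str, Any] = {}
            required: list[str] = []
            # Declared parameters only; request bodies are omitted in this sketch.
            for param in operation.get("parameters", []):
                properties[param["name"]] = param.get("schema", {})
                if param.get("required", False):
                    required.append(param["name"])
            tools.append({
                "name": operation.get("operationId", f"{method}_{path}"),
                "description": operation.get("summary", ""),
                "inputSchema": {
                    "type": "object",
                    "properties": properties,
                    "required": required,
                },
            })
    return tools


# A tiny spec shaped like FastAPI's app.openapi() output (hypothetical endpoint).
spec = {
    "paths": {
        "/users/{user_id}": {
            "get": {
                "operationId": "get_user",
                "summary": "Fetch a user by ID",
                "parameters": [
                    {"name": "user_id", "in": "path", "required": True,
                     "schema": {"type": "integer"}},
                ],
            }
        }
    }
}
print(json.dumps(openapi_to_mcp_tools(spec), indent=2))
```

Reusing the `operationId` as the tool name keeps tool names stable as long as the API's operation IDs are stable.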

Gemini CLI Extension Marketplace: 101 MCP Integrations

Estimated read time: 2 min

In the MCP ecosystem, the newly released Gemini CLI extensions marketplace offers 101 Model Context Protocol integrations, including GitHub’s official MCP server, Chrome DevTools for browser automation, and MCP Toolbox supporting 30+ databases. Popular extensions span Terraform for IaC, Grafana monitoring, Stripe payments, and Redis management. Personalize your AI-powered command line by connecting your favorite tools through standardized MCP interfaces for coding, infrastructure, and service integration.

OpenSpec: Lightweight Spec-Driven Development

Estimated read time: 2 min

Addressing the concerns raised in the earlier spec-driven development analysis, OpenSpec aligns humans and AI through structured specifications, with no API keys or complex setup required. Create change folders (proposals, tasks, and spec updates) that keep scope explicit and auditable in existing codebases. Unlike heavier SDD tools, OpenSpec is brownfield-friendly, using custom slash commands and context rules across Claude Code, Cursor, and Windsurf. Lock in intent before implementation without the workflow overhead.
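If a change folder's task list is written as plain Markdown checkboxes, its progress is easy to audit mechanically. The hypothetical helper below counts `- [ ]` and `- [x]` items in a `tasks.md`-style string. The checkbox syntax is standard GitHub-flavored Markdown, but the file contents here are invented, and the helper is not part of OpenSpec's own CLI.

```python
import re

# Matches "- [ ] task" or "* [x] task", with optional leading indentation.
CHECKBOX = re.compile(r"^\s*[-*]\s+\[([ xX])\]\s+(.*)$")


def task_progress(markdown: str) -> tuple[int, int, list[str]]:
    """Return (done, total, open_tasks) for Markdown checkbox items."""
    done, total, open_tasks = 0, 0, []
    for line in markdown.splitlines():
        match = CHECKBOX.match(line)
        if not match:
            continue  # headings, prose, and blank lines are ignored
        total += 1
        if match.group(1) in "xX":
            done += 1
        else:
            open_tasks.append(match.group(2))
    return done, total, open_tasks


tasks_md = """## 1. Implementation
- [x] 1.1 Add rate-limit middleware
- [x] 1.2 Update API spec delta
- [ ] 1.3 Write integration tests
"""
done, total, open_tasks = task_progress(tasks_md)
print(f"{done}/{total} done; open: {open_tasks}")
```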

Awesome Claude Code: Curated Productivity Resources

Estimated read time: 3 min

This extensive community-driven collection curates slash commands, Skills, CLI tools, and CLAUDE.md templates for Claude Code workflows. Warning: this is quite the rabbit hole, but it’s well worth it.

NEWS & EDITORIALS

Reconsidering AI with Steve Klabnik

Watch time: 50 min

Developer Steve Klabnik discusses critical perspectives on AI’s role in software development in this video conversation, balancing enthusiasm for AI tools with thoughtful criticism. Hearing from experienced developers helps you navigate the long-term implications of AI-assisted development and avoid uncritical adoption of emerging technologies.

After the AI Boom: Infrastructure That Lasts?

Estimated read time: 3 min

Moving from developer perspectives to infrastructure concerns, this analysis compares today’s AI build-out to the dotcom bubble and questions whether proprietary AI infrastructure will leave durable value the way TCP/IP and HTTP did. GPUs with 1-3 year lifespans and vendor-coupled data centers differ from multi-decade open standards. Best case: surplus capacity drives prices down, like the post-dotcom bandwidth glut. Worst case: silent cathedrals of obsolete, specialized silicon. Diversify now to avoid vendor lock-in.

OpenAI’s Workforce Transformation Blueprint

Estimated read time: 10 min (PDF)

OpenAI’s 12-page blueprint argues that AI is currently more enabler than replacer. It summarizes usage data from 1.5M ChatGPT conversations and introduces the GDPval evaluation, which shows GPT-5-level systems matching or exceeding professionals on about half of the economically valuable tasks it tests. The report highlights bottom-up workplace adoption and potential early-career displacement risks. It then prescribes actions: expanded AI education and certifications, small-business starter kits and incentives, worker-led training pathways, an OpenAI Jobs Platform for matching talent, and community AI Talent Hubs to coordinate cross-sector upskilling and data-driven placement.

Where Developers Actually Want AI Support

Estimated read time: 3 min

Returning to the developer experience, research studying 860 developers reveals psychological patterns determining AI tool acceptance across different work contexts. Developers welcome AI for toil and boilerplate but demand human oversight for high-stakes decisions and resist automation in mentoring. Even when AI can technically perform tasks, cognitive appraisals around value, identity, and accountability shape adoption. Deploy AI where developers identify pain points, not blanket automation.

Altered Craft
