Weekly AI Review: Claude Sonnet 4.5 Advances AI Coding and Agent Development

PLUS - Karpathy Explores Animals vs Ghosts in AI Development

Oct 06, 2025

Welcome back to AlteredCraft’s Delta Notes! Thank you for your continued support as we curate the latest developments in AI and coding. Big week for Anthropic! This edition spotlights Claude Sonnet 4.5’s impressive leap in autonomous coding capabilities, achieving 77.2% on SWE-bench Verified, while Karpathy offers fascinating insights on whether we’re building “animals or summoning ghosts” in AI development. Dive in to discover how these advances are reshaping software development workflows.

In addition to the free weekly Delta Notes, Altered Craft publishes long-form, deep-dive posts on relevant AI topics for software developers. Upgrade to a paid subscription to receive them. Early adopters get 20% off for the first year.


TUTORIALS & CASE STUDIES

AI-Driven Software Development Lifecycle

Estimated read time: 3 min

This Google workshop teaches developers a structured methodology for partnering with AI agents throughout the professional Software Development Lifecycle. Learn to generate complete Python backends, create unit tests with mocks, deploy Infrastructure as Code using Terraform, and build CI/CD pipelines—all through targeted AI prompts. Transform from manual coder to technical director orchestrating AI tools.

AI Coding Trap: Speed Without Understanding

Estimated read time: 10 min

Chris Loy explores how AI coding agents like Claude Code create a dangerous pattern where developers spend more time understanding AI-generated code than thinking through problems. He compares AI coding agents to lightning-fast junior engineers who need proper management through best practices like modular design, test-driven development, and documentation to avoid technical debt and deliver sustainable software.

Refresh for GitHub’s AI Agents for Beginners

Estimated read time: Hours; 12 lessons with video and exercises

This 12-lesson GitHub resource received multiple recent commits. The project provides an intro to AI agents and explores various agentic frameworks and design patterns.

Coding Agents Need Document Understanding for Enterprise Apps

Estimated read time: 10 min

LlamaIndex explores why coding agents like Claude Code struggle with enterprise applications that rely heavily on documents. The post presents three approaches to bridge this gap: MCP for document access, CLI tools for document operations, and teaching agents to build agentic document workflows. These solutions enable coding agents to understand PDFs, contracts, and reports, making them more effective for building business applications that process the 90% of enterprise data locked in documents.

Master Designing Agentic Loops for Coding Agents

Estimated read time: 8 min

Simon Willison explores the critical skill of designing agentic loops for coding agents like Claude Code and Codex CLI. He explains how to safely use “YOLO mode” for maximum productivity, select appropriate tools, manage credentials securely, and identify problems suited for agentic solutions. Key applications include debugging, performance optimization, and dependency upgrades.

Cursor’s Internal AI Onboarding Guide Goes Public

Estimated read time: 3 min

Cursor has released their internal onboarding guide for non-engineering hires, offering developers a hands-on pathway from zero to deployed project. This public-facing Cursor guide encourages creative experimentation with their AI coding assistant, with featured projects showcased in their Hall of Fame. Perfect for developers exploring AI-powered development workflows beyond traditional tools like GitHub Copilot.

Claude Agent SDK - Coding Autonomous Agents

Estimated read time: 10 min

Anthropic renamed Claude Code SDK to Claude Agent SDK, reflecting its broader capabilities beyond coding. The SDK enables developers to build autonomous agents by giving Claude computer access through terminal commands, file operations, and bash scripts. Key features include agentic search, subagents, and context compaction for building finance agents, personal assistants, customer support bots, and research tools using the gather-context → take-action → verify-work loop.
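To make the gather-context → take-action → verify-work loop concrete, here is a minimal Python sketch of that control flow. It does not use the Claude Agent SDK: `run_agent_loop`, `LoopResult`, and the three callables are hypothetical names invented for illustration, and a real agent would replace the stand-in functions with model calls and tool use.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class LoopResult:
    """Outcome of one run of the loop (hypothetical type, for illustration)."""
    succeeded: bool
    attempts: int
    history: list[str] = field(default_factory=list)


def run_agent_loop(
    gather_context: Callable[[list[str]], str],
    take_action: Callable[[str], str],
    verify_work: Callable[[str], bool],
    max_attempts: int = 3,
) -> LoopResult:
    """Repeat gather -> act -> verify until verification passes or attempts run out."""
    history: list[str] = []
    for attempt in range(1, max_attempts + 1):
        # 1. Gather context, including what earlier attempts produced.
        context = gather_context(history)
        # 2. Take an action based on that context.
        output = take_action(context)
        history.append(output)
        # 3. Verify the work; stop as soon as it passes.
        if verify_work(output):
            return LoopResult(True, attempt, history)
    return LoopResult(False, max_attempts, history)


# Stand-in functions (assumptions): a real agent would call a model and tools here.
def gather(history: list[str]) -> str:
    return f"attempt {len(history) + 1}"


def act(context: str) -> str:
    return context.upper()


def verify(output: str) -> bool:
    return output.endswith("2")


result = run_agent_loop(gather, act, verify)
print(result)
```

The attempt cap is a design choice of this sketch: it keeps a loop whose verification never passes from running forever.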

TOOLS

Cognition Rebuilds Devin for Claude Sonnet 4.5

Estimated read time: 8 min

Cognition rebuilt their AI coding agent Devin for Claude Sonnet 4.5, achieving 2x speed and 12% better performance. The new model exhibits context-aware behaviors like proactive note-taking, parallel execution, and self-verification. Key insights include managing “context anxiety” and leveraging the model’s improved judgment for subagent delegation and meta-reasoning.

Claude Sonnet 4.5 Advances AI Coding and Agent Development

Estimated read time: 15 min

Chart showing frontier model performance on SWE-bench Verified with Claude Sonnet 4.5 leading

Anthropic launches Claude Sonnet 4.5, achieving state-of-the-art performance on SWE-bench Verified (77.2%) and OSWorld (61.4%). The release includes the Claude Agent SDK, enabling developers to build complex AI agents using Anthropic’s infrastructure. Major upgrades include a VS Code extension, checkpoints in Claude Code, and enhanced computer use capabilities for autonomous coding tasks.

Claude Code (2.0) Gets VS Code Extension and Autonomous Features

Estimated read time: 4 min

Anthropic launches a native VS Code extension for Claude Code, bringing AI-powered development directly into IDEs. The update includes checkpoints for autonomous operation, subagents for parallel workflows, and hooks for automated testing. Powered by Sonnet 4.5, developers can now delegate complex refactoring and feature exploration tasks with confidence.

Google Launches Jules API for AI Coding Automation

Estimated read time: 5 min

Google introduces the Jules API, enabling developers to programmatically control their AI coding assistant. The API allows building custom integrations like automated bug fixes from Slack and backlog triage. With simple concepts like Source, Session, and Activity, developers can create asynchronous coding agents that handle complex development tasks. Early access includes comprehensive documentation and a Discord community for feedback.

NEWS & EDITORIALS

AI Village Reveals Model Performance Patterns

Estimated read time: 8 min

A 24-week multi-agent experiment reveals distinct performance patterns: Claude models dominate task execution and goal achievement, while GPT models excel at linguistic style. The findings align with real-world usage where Claude models are preferred for coding and agentic tasks, offering valuable insights for developers selecting models for RAG and multi-LLM agent systems.

AI Progress Follows Exponential Growth Despite Skepticism

Estimated read time: 8 min

New benchmarks from METR and OpenAI reveal that AI models are achieving exponential improvements in autonomous task completion, with the latest models handling 2+ hour programming tasks and approaching human expert performance across 44 occupations. Conservative projections suggest models will match human experts by 2026, offering developers unprecedented opportunities for AI integration.

Claude Sonnet 4.5’s Secret Sauce for Building Complex Apps

Estimated read time: 8 min

Carlos E. Perez analyzes leaked Claude Sonnet 4.5 system prompts revealing how the AI autonomously builds Slack-like applications over 30 hours. Key patterns include forcing code into durable artifacts, iterative update workflows, runtime constraints, and self-orchestration capabilities enabling 10,000+ lines of coherent code generation.

AI Agents Now Perform Real Economic Work

Estimated read time: 8 min

OpenAI’s new benchmark shows AI agents can complete expert-level tasks averaging 4-7 hours, nearly matching human performance. Claude successfully replicated complex economics research autonomously, demonstrating agents’ ability to handle sophisticated coding and analysis tasks. While agents excel at specific tasks, the key challenge for developers is thoughtfully integrating these capabilities without drowning in unnecessary AI-generated content.

Claude Code Revolutionizes AI Development with Filesystem Access

Estimated read time: 10 min

Developer Noah Hein reveals how Claude Code’s unique combination of filesystem access and Unix philosophy creates a powerful “agentic operating system” for AI development. Unlike browser-based tools, Claude Code enables persistent memory and state management, transforming how developers build AI applications. Hein demonstrates practical implementations including Claudesidian for note-taking automation and an email management system, showcasing how simple, composable tools outperform complex multi-agent architectures.

Karpathy Explores Animals vs Ghosts in AI Development

Estimated read time: 8 min

Andrej Karpathy analyzes Richard Sutton’s critique of LLMs, exploring whether current AI systems truly follow the “bitter lesson” of leveraging computation. He distinguishes between animals (pure reinforcement learning agents) and ghosts (human-data-trained LLMs), suggesting frontier AI research focuses on “summoning ghosts” rather than building animal-like intelligence. This philosophical divide has practical implications for how developers approach AI system design.

Altered Craft
