Weekly review: RAPTOR: Autonomous Security Research Framework, OpenCode, and much more

The week's curated set of relevant AI Tutorials, Tools, and News

Dec 08, 2025

Welcome back to Altered Craft’s weekly AI review for developers. Thanks for making this newsletter part of your week. This edition carries a strong Anthropic theme: they acquired Bun, published internal data on how their engineers use Claude, and researchers extracted Claude’s “soul document” from model weights. Beyond the Anthropic news, you’ll find practical guides on AI coding workflows and OpenRouter’s analysis of 100 trillion tokens showing programming now dominates LLM usage.

TUTORIALS & CASE STUDIES

Product Evals in Three Simple Steps

Estimated read time: 12 min

Eugene Yan presents a practical framework for building reliable LLM evaluation systems using binary labeling, single-dimension evaluators, and automated harnesses. Learn why “God Evaluators” fail and why investing four weeks in evaluation infrastructure enabled hundreds of experiments that manual review made impossible.

The opportunity: Teams struggling with LLM quality can move from gut-feel assessments to systematic iteration—the infrastructure investment pays back quickly when you can run experiments at scale.

The 75/25 Rule for AI-Assisted Development

Estimated read time: 5 min

The three phases of AI-assisted coding: Planning, Implementation Plan, and Implementation

Randal Olson, Ph.D., argues that planning should consume 75% of your AI coding workflow, with implementation just 25%. His three-phase approach shifts heavy thinking to high-reasoning models while fast, cheap models handle execution. When coding gets difficult, it signals planning needs strengthening.

Key point: This reframes AI coding struggles productively—difficulty during implementation becomes a signal to improve your planning, not a limitation of the tools.

Mastering Vibe Coding Without Sacrificing Code Quality

Estimated read time: 8 min

This guide teaches developers how to leverage AI code generation while maintaining production standards. Covers five critical gaps in AI-generated code—context awareness, integration, security, testing, and operability—plus a comprehensive review checklist for experienced developers.

What this enables: A concrete framework for reviewing AI-generated code systematically, helping you catch common issues before they reach production.

Cursor’s Official Guide for Non-Engineers

Estimated read time: 15 min

Cursor published an internal onboarding guide walking newcomers from installation through deployment. Originally created for GTM and non-engineering hires, it covers setup, project building, GitHub integration, and Vercel deployment in six steps.

Worth noting: This signals how far AI-assisted development has progressed. A resource you can share with non-technical teammates who want to build their own tools.

Writing Effective CLAUDE.md Files

Estimated read time: 6 min

Shifting from IDE tools to configuration, HumanLayer reveals that Claude Code can reliably follow only 150-200 instructions, with system prompts already consuming ~50. Their recommendation: keep CLAUDE.md under 300 lines, use progressive disclosure via separate files, and focus on universal instructions.

The takeaway: Less is genuinely more here. Bloated configuration files get ignored uniformly, so ruthless prioritization of your most important instructions improves results.

Google’s Context Engineering for Multi-Agent Systems

Estimated read time: 14 min

Agent Development Kit: Making it easy to build multi-agent applications

Google’s Agent Development Kit introduces context engineering as a discipline for scaling AI agents beyond simple context window expansion. The framework separates working context, sessions, memory, and artifacts—treating context as “a compiled view over a richer stateful system.”

Why now: As agent systems grow more complex, understanding how to architect context becomes essential—this provides vocabulary and patterns for production-grade multi-agent design.

TOOLS

OpenCode: Open Source AI Coding Agent

Estimated read time: 3 min

OpenCode is a free, terminal-based AI coding agent supporting 75+ LLM providers including local models. Run multiple agents in parallel, share sessions, and maintain strict privacy—no account required. With 35,000 GitHub stars, it’s emerging as a compelling open alternative.

What’s interesting: Claude Code like expereince, 75+ models.

Claude Code Agents Plugin

Estimated read time: 6 min

For Claude Code users specifically, a comprehensive plugin provides 85 specialized agents, 15 multi-agent orchestrators, 47 skills, and 44 tools across 63 plugins. Features hybrid Haiku/Sonnet orchestration and granular loading averaging just 3.4 components per plugin.

The context: Claude Code’s extensibility ecosystem is maturing rapidly. This marketplace lets you compose specialized capabilities for complex workflows like full-stack feature development.

UI UX Pro Max: Design Intelligence for Coding Assistants

Estimated read time: 4 min

This AI skill provides searchable design databases for Claude Code, Cursor, and Windsurf. Includes 57 UI styles, 95 color palettes by industry, 56 font pairings, and 98 UX guidelines. Supports React, Vue, SwiftUI, and Flutter with natural language queries.

Why this matters: Bridges the gap between “I know what I want it to look like” and actually getting stylistically coherent code—useful for developers who aren’t design-focused.

Mistral 3: Frontier Performance with Apache 2.0 Licensing

Estimated read time: 5 min

Mistral released a comprehensive model family including Mistral Large 3 (41B active parameters, 675B total), ranking #2 among open-source non-reasoning models on LMArena. Apache 2.0 licensing, image understanding, and 40+ language support.

The opportunity: Frontier-class performance under permissive licensing opens possibilities for production deployments where proprietary API dependencies are a concern.

RAPTOR: Autonomous Security Research Framework

Estimated read time: 7 min

Applying these tools to security workflows, RAPTOR combines Claude Code with security tools for autonomous vulnerability discovery, analysis, and remediation. Nine expert personas, integrated static analysis and fuzzing, plus conversational commands like /scan and /analyze for CI/CD integration.

What this enables: Security workflows that previously required deep expertise become more accessible—useful for teams without dedicated security researchers.

NEWS & EDITORIALS

OpenRouter’s 100 Trillion Token Analysis of Real AI Usage

Estimated read time: 18 min

OpenRouter’s analysis reveals programming now drives 50%+ of LLM queries, up from 11%, with prompts expanding 4x to 6K average tokens. Open-source models hold ~30% market share. Key finding: early adopters show exceptional retention via the “glass slipper” effect.

Why this matters: Real usage data from 100 trillion tokens reveals where the industry is actually headind, programming-centric workflows are now the dominant use case, not a niche.

OpenAI Declares ‘Code Red’ as Google Closes Gap

Estimated read time: 3 min

Sam Altman issued an internal “code red,” delaying ads, shopping agents, and Pulse assistant to focus on ChatGPT fundamentals—speed, reliability, and personalization. Google’s Gemini 3 surpassed competitors on benchmarks, completing a full-circle moment from Google’s post-ChatGPT code red.

The context: Intensifying competition at the frontier means faster improvements across all major platforms. The pace of capability advancement isn’t slowing down.

Bun Runtime Acquired by Anthropic

Estimated read time: 3 min

Meanwhile at Anthropic, the company acquired Bun, betting on the fast JavaScript runtime as infrastructure powering Claude Code, Claude Agent SDK, and future AI coding products. Signals commitment to building comprehensive AI development tools rather than relying on external infrastructure.

Worth watching: Deeper Bun integration across Claude’s coding tools could meaningfully improve execution speed and developer experience for JavaScript/TypeScript workflows.

How AI Is Transforming Work at Anthropic

Estimated read time: 10 min

Anthropic surveyed 132 engineers and researchers, finding Claude usage jumped from 28% to 59% of work with a 50% productivity boost. Engineers report becoming “full-stack,” handling tasks outside their expertise—though mentorship patterns are shifting as Claude becomes “the first stop for questions.”

Key point: First-party data on how AI-native teams are evolving their workflows. Useful for understanding where team dynamics may shift as AI adoption deepens.

Claude 4.5 Opus Soul Document Extracted

Estimated read time: 52 min

A researcher extracted what appears to be Claude’s internal “soul document” from model weights using consensus-based completions. The ~13,000-word document covers Anthropic’s guidelines on safety, helpfulness, honesty, and Claude’s identity—confirmed as used in supervised learning.

What’s interesting: A rare window into how AI character training shapes behavior at a foundational level. Helps demystify why Claude responds the way it does.

Altered Craft

Discussion about this post

Ready for more?