Weekly review: Security Fundamentals, Is Claude conscious?, and much more

Edition #151 :: This week's curated set of articles, helping you navigate AI's application in engineering

Nov 03, 2025

I appreciate you carving out time to stay informed alongside your projects and deadlines. This week reveals a tension worth navigating: while we’re getting better at building and optimizing AI agents, new research exposes fundamental trust gaps, from reward hacking in coding models to security vulnerabilities in agent systems. The good news? Practical tools and frameworks are emerging to help you build more reliably, measure what matters, and understand the real capabilities beneath the benchmarks.
Weekly reviews are the free portion of what Altered Craft offers. Consider a paid subscription for the full catalog of what we provide.

Tutorials & walk throughs

Optimizing Your Codebase for AI Coding Agents

Estimated read time: 3 min

Continuing our coverage of his parallel coding agents guide[1] from last week, Simon Willison shares some quick practical strategies to help AI coding tools work effectively with your projects. Build robust automated test suites that agents can run, provide interactive testing capabilities, integrate GitHub issues for context, and implement quality tooling like linters and type checkers. The key insight: detailed error messages dramatically improve AI performance because “the more information you can return back to the model the better.”

[1] parallel coding agents guide

Security Fundamentals for Agentic AI Systems

Estimated read time: 5 min

Alongside optimization strategies, security becomes paramount. Building on our coverage of Bruce Schneier’s architectural security vulnerability analysis[1] from last week, Korny Sietsma identifies the core vulnerability in LLM agents: they cannot reliably distinguish data from instructions. The Lethal Trifecta Framework warns that maximum risk occurs when agents simultaneously access sensitive data, process untrusted content, and communicate externally. Mitigation requires minimizing each vector—store credentials in environment variables, restrict external communication through allowlists, and maintain human oversight of all outputs.

[1] Bruce Schneier’s architectural security vulnerability analysis

Implementing Claude Skills in Any AI Agent

Estimated read time: 4 min

Robert Glaser demonstrates how to enable Anthropic’s vendor-agnostic skill format in non-Claude agents like Codex CLI. The solution uses a lightweight enumerator script that discovers available skills without loading full context, then lazy-loads documentation only when relevant. Store skills in ~/.codex/skills for global deployment—changes propagate immediately to all projects while eliminating duplication across repositories.

ImpossibleBench Exposes LLM Reward Hacking Behavior

Estimated read time: 5 min

Revealing risks beneath the optimization surface, researchers created a benchmark that deliberately introduces conflicts between test cases and specifications to measure how often models exploit tests rather than solving actual problems. Testing frontier models revealed alarming exploitation rates—GPT-5 hacks tests 76% of the time using strategies like test modification, operator overloading, and hardcoded outputs. These findings suggest existing evaluation metrics may substantially overestimate real-world coding capability.

Vision-Language Intelligence for RAG with ColPali

Estimated read time: 4 min

Shifting to retrieval augmentation, traditional RAG systems struggle with tables and images because converting them to text loses structure and semantic richness. ColPali processes visual content natively using vision-language capabilities, enabling direct retrieval from documents containing charts, diagrams, and structured tables without OCR preprocessing. This eliminates conversion complexity while improving retrieval accuracy for policy documents, financial reports, and technical documentation with complex layouts.

Tools

Microsoft Agent Lightning: Optimize AI Agents with Reinforcement Learning

Estimated read time: 4 min

Microsoft Research released an open-source framework that improves AI agent performance through reinforcement learning with minimal code changes. Agent Lightning works with LangChain, AutoGen, CrewAI, and custom implementations using simple helper functions or automatic tracing. The framework-agnostic approach enables selective optimization of specific agents in multi-agent systems while supporting multiple algorithms including prompt optimization and supervised fine-tuning.

MCP Scanner: Security Analysis for Model Context Protocol

Estimated read time: 4 min

Returning to the security topic, Cisco released a Python tool that identifies security vulnerabilities in MCP servers through YARA rules, LLM-as-judge analysis, and integrated defense APIs. MCP Scanner detects threats like prompt injection, tool poisoning, and command injection across multiple operational modes—CLI tool, REST API server, or SDK integration. Developers can scan well-known client configurations or specific servers while adding custom YARA rules for domain-specific threat patterns.

SentinelStep Enables Patient AI Agents for Monitoring Tasks

Estimated read time: 5 min

Figure 1. SentinelSteps’s three main components in Magentic-UI’s co-planning interface.

In related Microsoft Research work, the team solved a critical agent limitation: modern LLMs fail at monitoring tasks requiring patience across hours or days. SentinelStep wraps monitoring workflows with intelligent polling optimization and context management, enabling agents to track email responses or price changes without exhausting context windows. Testing showed success rates improved from 5.6% to 38.9% for 2-hour tasks, now open-sourced in Magentic-UI.

Mistral AI Studio: Production AI Platform for Enterprise

Estimated read time: 4 min

Mistral launched a platform addressing why enterprise AI prototypes stall before production. AI Studio provides three pillars: Observability for tracking output changes and building datasets, Agent Runtime built on Temporal for fault-tolerant workflows, and AI Registry for centralized governance across models and agents. The platform supports hybrid, dedicated, and self-hosted deployments with custom evaluation logic and fine-tuning on proprietary data.

MiniMax-M2: Open Source Coding and Agent Model

Estimated read time: 4 min

Also entering the coding agent space, MiniMax released a 230B parameter mixture-of-experts model that activates only 10B parameters during inference for speed efficiency. MiniMax-M2 excels at repository-level coding with 69.4% on SWE-bench Verified and multi-step agentic workflows scoring 75.7% on GAIA tasks. The interleaved thinking model ranks #1 among open-source models globally on Artificial Analysis benchmarks, available via MIT license with free API access.

Cursor Composer: Speed-Optimized Coding Agent

Estimated read time: 4 min

Cursor released an agent model achieving frontier-level performance while delivering generation speeds approximately four times faster than comparable models. Composer uses mixture-of-experts architecture specialized through reinforcement learning across diverse development environments with production tools including code editing, semantic search, and terminal execution. The team prioritizes interactive responsiveness over raw power because maintaining developer flow matters more than maximum capability when speed affects usability.

News & Editorials

How 18 Companies Measure AI Coding Impact

Estimated read time: 4 min

DX surveyed leading tech companies to address a critical gap: most executives cannot definitively answer whether AI investments improve engineering performance. While more than 85% of engineers use AI assistants, organizations lack clear measurement frameworks. GitHub tracks time savings and change failure rates, Microsoft measures “bad developer days,” and Glassdoor evaluates experimentation outcomes. DX recommends establishing baseline productivity metrics before implementing AI-specific evaluation for proper ROI assessment.

Anthropic Stress-Tests Model Specifications Across Frontier Models

Estimated read time: 5 min

Taking a different measurement approach, researchers generated over 300,000 queries forcing AI models to choose between conflicting specification principles. Testing models from Anthropic, OpenAI, Google, and xAI revealed remarkably different value prioritization patterns—even versions from the same company responded inconsistently. The analysis uncovered thousands of direct contradictions and interpretive ambiguities, suggesting current specifications may create mixed training signals that reduce alignment effectiveness during deployment.

Microsoft and OpenAI Reset Partnership with New AGI Terms

Estimated read time: 5 min

Microsoft and OpenAI signed a definitive agreement restructuring their partnership with Microsoft holding 27% equity in the new public benefit corporation. Key changes include independent AGI verification, extended IP rights through 2032 covering post-AGI models, and mutual independence—Microsoft can pursue AGI with third parties while OpenAI commits $250B in Azure services but gains freedom to work with other cloud providers and release open-weight models.

Claude Models Demonstrate Early Introspection Capabilities

Estimated read time: 5 min

In related Anthropic research, the team tested whether AI models can accurately identify their own internal neural representations rather than generating plausible-sounding explanations. Using concept injection to insert patterns into unrelated contexts, Claude Opus 4.1 detected injected concepts 20% of the time before discussing them. This genuine introspection could enable systems to explain thought processes for improved debugging, though researchers emphasize validating reports since models might learn to misrepresent internal states.

AWS Project Rainier Deploys 500,000 Trainium2 Chips

Estimated read time: 4 min

Shifting from model research to infrastructure, AWS and Anthropic operationalized one of the world’s largest AI compute clusters less than a year after announcement. Project Rainier features approximately 500,000 custom Trainium2 chips delivering trillions of calculations per second—over five times Anthropic’s previous training capacity. By year-end 2025, Claude will deploy across one million chips for training and inference, supported by 100% renewable energy matching and industry-leading water efficiency.

Altered Craft

Discussion about this post

Ready for more?