AI Radar

AI Radar — 07 May 2026

10 items · 5 verified · 5 secondary · 0 rumor · 16 sources · 41% exploration

Anthropic’s Code with Claude week: managed agents gain self-improvement capabilities; OpenAI refreshes the default ChatGPT model; Sierra closes a $950M enterprise AI round; and regulators on two continents weigh in on Mythos.

Run: 04–07 May 2026 · 28 items reviewed → 10 published · 5 verified · 5 secondary · 0 rumor · 41% exploration · Run timestamp: 2026-05-07


TL;DR


Items

Claude Managed Agents adds Dreaming, Outcomes, and multi-agent orchestration

Source: https://claude.com/blog/new-in-claude-managed-agents · Anthropic · 2026-05-06
Verification: T2 verified · announcement · workflow-automation

Anthropic published three additions to Claude Managed Agents on 6 May 2026 at its Code with Claude developer conference. Outcomes (public beta) lets developers write a success rubric; a separate grader model evaluates the agent’s output in its own context window and routes failed outputs back for another attempt — internal tests show up to 10 percentage points of task-success improvement, with per-format gains of 8.4 pp on .docx and 10.1 pp on .pptx (vendor-measured, no independent replication found). Multi-agent orchestration (public beta) enables a lead agent to spawn specialist subagents with distinct models, prompts, and tools running in parallel on a shared filesystem, with full trace visibility in Claude Console. Dreaming (research preview) schedules overnight reviews of past agent sessions to surface recurring mistakes and shared workflow patterns, updating memory automatically or queuing changes for developer approval.
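The Outcomes loop described above can be pictured roughly as follows. This is a minimal sketch of the generate, grade, retry pattern only, not Anthropic's API: `run_with_rubric`, `Verdict`, the `agent` and `grader` callables, and `max_attempts` are all hypothetical names, and the toy functions stand in for real model calls.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    passed: bool
    feedback: str


def run_with_rubric(
    task: str,
    rubric: str,
    agent: Callable[[str, Optional[str]], str],
    grader: Callable[[str, str], Verdict],
    max_attempts: int = 3,
) -> tuple[str, int]:
    """Return (output, attempts_used); the last output is returned even if it failed."""
    feedback: Optional[str] = None
    output = ""
    for attempt in range(1, max_attempts + 1):
        output = agent(task, feedback)    # the agent sees prior feedback, if any
        verdict = grader(rubric, output)  # the grader sees only rubric and output
        if verdict.passed:
            return output, attempt
        feedback = verdict.feedback       # route the failure back for another try
    return output, max_attempts


# Hypothetical stand-ins for model calls, used only to exercise the loop.
def toy_agent(task: str, feedback: Optional[str]) -> str:
    return f"{task} (with summary)" if feedback else task


def toy_grader(rubric: str, output: str) -> Verdict:
    ok = "summary" in output
    return Verdict(ok, "" if ok else "Missing summary section.")


result, attempts = run_with_rubric(
    "Draft report", "Must include a summary", toy_agent, toy_grader
)
print(result, attempts)
```

In this toy run the first draft fails the rubric and the second passes, so the loop stops after two attempts. Keeping the grader's inputs limited to the rubric and the output mirrors the "own context window" detail in the announcement, under the assumption that the grader should not see the agent's working history.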

Why it matters for automation/productivity: Outcomes removes manual prompt-iteration cycles from agentic pipelines; orchestration enables parallelising specialist tasks that previously required sequential tool calls; Dreaming provides a mechanism for deployed agents to improve on a team’s specific workload over time without retraining.
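To make the orchestration pattern concrete, here is a rough sketch of a lead process fanning work out to specialists in parallel and collecting their results from a shared directory. It uses only Python's standard library; the `Specialist` type, the model labels, and the file layout are assumptions for illustration and do not reflect the Managed Agents API.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Specialist:
    name: str
    model: str   # hypothetical model label
    prompt: str


def run_specialist(spec: Specialist, workspace: Path) -> Path:
    """Stand-in for a subagent run: writes its result into the shared workspace."""
    out = workspace / f"{spec.name}.txt"
    out.write_text(f"[{spec.model}] {spec.prompt}: done")
    return out


def orchestrate(specs: list[Specialist]) -> dict[str, str]:
    """Run all specialists concurrently, then read their outputs back."""
    with tempfile.TemporaryDirectory() as tmp:
        workspace = Path(tmp)
        with ThreadPoolExecutor(max_workers=len(specs)) as pool:
            paths = list(pool.map(lambda s: run_specialist(s, workspace), specs))
        return {p.stem: p.read_text() for p in paths}


team = [
    Specialist("research", "small-model", "collect sources"),
    Specialist("writer", "large-model", "draft summary"),
]
results = orchestrate(team)
print(results)
```

Each specialist writes to its own file, so the parallel runs never contend for the same path; a real shared filesystem would need a similar convention or explicit locking.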

Key claims:

Cross-references:


TinyFish opens web Search and Fetch to all AI agents at no cost

Source: https://www.tinyfish.ai/blog/search-and-fetch-are-now-free-for-every-agent-everywhere · TinyFish · 2026-05-04
Verification: T2 verified · announcement · mcp-ecosystem

TinyFish removed the paywall from its web Search and Fetch APIs on 4 May 2026. The free tier allows 5 search queries per minute and 25 URL fetches per minute across REST API, MCP server, Python and TypeScript SDKs, and CLI, with no credit card required. The underlying infrastructure is a custom Chromium fleet with JavaScript rendering, parallel execution, and bot-detection handling built in, making it suitable for pages that resist simpler HTTP scrapers. The MCP server and SDKs make the endpoints directly reachable from Claude, Cursor, Claude Code, Codex, and any MCP-compatible agent without additional integration work.
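An agent calling the free tier would need to stay inside the stated per-minute budgets. The following is a minimal client-side sketch of a sliding-window limiter set to those two limits; it does not call TinyFish at all, and the class name, method names, and injectable `clock` parameter are assumptions made for illustration. How the service itself counts requests is not documented in the source.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allows at most `limit` calls in any rolling `window` seconds."""

    def __init__(self, limit: int, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self._stamps: Deque[float] = deque()

    def wait_time(self) -> float:
        """Seconds until the next call is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have left the rolling window.
        while self._stamps and now - self._stamps[0] >= self.window:
            self._stamps.popleft()
        if len(self._stamps) < self.limit:
            return 0.0
        return self.window - (now - self._stamps[0])

    def try_acquire(self) -> bool:
        """Record a call if the budget allows it; return whether it was allowed."""
        if self.wait_time() > 0.0:
            return False
        self._stamps.append(self.clock())
        return True


# Limits taken from the announcement: 5 searches and 25 fetches per minute.
search_limiter = SlidingWindowLimiter(limit=5)
fetch_limiter = SlidingWindowLimiter(limit=25)

allowed = [search_limiter.try_acquire() for _ in range(6)]
print(allowed)
```

The sixth search attempt inside the same minute is refused, and a caller can sleep for `wait_time()` seconds before retrying rather than sending a request that would be rejected.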

Why it matters for automation/productivity: Agent workflows that previously required a paid search or web-scraping subscription can now access search and fetch without spend, removing a cost barrier for both prototyping and low-volume production deployments.

Key claims:


OpenAI replaces ChatGPT’s default model with GPT-5.5 Instant

Source: https://techcrunch.com/2026/05/05/openai-releases-gpt-5-5-instant-a-new-default-model-for-chatgpt/ · TechCrunch citing OpenAI · 2026-05-05
Verification: T3 secondary · announcement · model-release
Tier note: Primary URL openai.com/index/gpt-5-5-instant/ returned HTTP 403 during this run. TechCrunch article is based on OpenAI-provided data. Benchmark scores are vendor-reported.

OpenAI swapped GPT-5.3 Instant for GPT-5.5 Instant as the default ChatGPT model on 5 May 2026. In vendor evaluations, the new model scores 81.2 on AIME 2025 (up from 65.4 for its predecessor) and 76 on MMMU-Pro multimodal reasoning (up from 69.2); a reduction in hallucinations in law, medicine, and finance is claimed, but no test methodology was publicly disclosed. The rollout began with Plus and Pro users on web and includes memory and context-management features; Free, Go, Business, and Enterprise users follow in the coming weeks. In the API the model is available as chat-latest; GPT-5.3 remains as a paid API option for three months.
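For readers who want the size of the reported jumps, the arithmetic on the two vendor-reported score pairs can be sketched as below. The `deltas` helper is a hypothetical name, and the figures are simply the ones quoted above, not independently measured values.

```python
# Vendor-reported scores from the paragraph above: (predecessor, new model).
scores = {
    "AIME 2025": (65.4, 81.2),
    "MMMU-Pro": (69.2, 76.0),
}


def deltas(before: float, after: float) -> tuple[float, float]:
    """Return (absolute gain in points, relative gain in percent)."""
    gain = after - before
    return round(gain, 1), round(100 * gain / before, 1)


for name, (old, new) in scores.items():
    points, percent = deltas(old, new)
    print(f"{name}: +{points} points ({percent}% relative)")
```

This works out to +15.8 points (about 24.2% relative) on AIME 2025 and +6.8 points (about 9.8% relative) on MMMU-Pro, which is why the two gains should not be read as equally large.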

Why it matters for automation/productivity: Teams using chat-latest in production pipelines receive the upgrade automatically; the AIME and MMMU-Pro score increases suggest improved performance on structured reasoning tasks, though independent replication was not available at the time of this run.

Key claims:

Caveats: Benchmark scores are from OpenAI’s own evaluation; no third-party replication published at launch. Hallucination-reduction claim appeared in press materials without a disclosed test methodology.


Anthropic launches ten financial services agent templates with Claude Opus 4.7

Source: https://www.anthropic.com/news/finance-agents · Anthropic · 2026-05-05
Verification: T2 verified · announcement · productivity-ai / ai-for-business

Anthropic released ten ready-to-run agent templates for financial services on 5 May 2026, covering tasks that include pitchbook construction, KYC file screening, earnings review, financial model building, general ledger reconciliation, month-end closing, statement auditing, and valuation review. All templates run on Claude Opus 4.7, which scored 64.37% on the Finance Agent benchmark run by Vals AI, a third-party benchmark operator. Eight new data connectors became available at launch: Dun & Bradstreet, Fiscal AI, Financial Modeling Prep, Guidepoint, IBISWorld, SS&C IntraLinks, Third Bridge, and Verisk. Microsoft 365 add-ins for Excel, PowerPoint, and Word reached general availability; a Claude for Outlook add-in entered beta. Moody’s released an MCP server that makes credit ratings and company financial data accessible directly from Claude.

Why it matters for automation/productivity: The templates lower integration cost for finance teams automating research-heavy or compliance-adjacent workflows; the M365 add-ins allow Excel and PowerPoint users to invoke Claude without leaving familiar tooling; the Moody’s MCP connection adds a verified data source to agentic financial workflows.

Key claims:


Anthropic secures SpaceX Colossus compute, doubles Claude Code rate limits

Source: https://www.anthropic.com/news/higher-limits-spacex · Anthropic · 2026-05-06
Verification: T2 verified · announcement · model-release / ai-for-business

Anthropic agreed on 6 May 2026 to use the full capacity of SpaceX’s Colossus 1 data center in Memphis, Tennessee. The deal provides access to over 220,000 NVIDIA GPUs and over 300 megawatts of capacity, available within the month. Immediate effects for subscribers: Claude Code’s five-hour rate limits are doubled for Pro, Max, Team, and seat-based Enterprise plans; peak-hour throttling is removed for Pro and Max accounts; and API rate limits for Claude Opus models are increased (no specific percentage disclosed in the announcement). Anthropic also indicated interest in jointly developing orbital computing capacity with SpaceX as a future extension.

Why it matters for automation/productivity: Teams hitting Claude Code session limits during long agentic runs gain doubled throughput without a plan upgrade; Opus API users see less queuing during high-demand periods, which reduces production latency for agentic applications.

Key claims:

Cross-references:


Unity launches Unity AI into open beta with an MCP Server for external coding agents

Source: https://80.lv/articles/unity-launches-in-editor-ai-tools-suite-in-beta · 80.lv · 2026-05-04
Verification: T3 secondary · announcement · dev-tools / mcp-ecosystem
Tier note: Primary unity.com blog post was not accessible. 80.lv coverage and Unity community forum discussion corroborate the same launch. Date cited by 80.lv is May 4; Unity forum activity begins approximately May 2.

Unity released Unity AI into open beta for Unity 6 and above around 4 May 2026. The suite includes an AI Assistant aware of the project’s live scene hierarchy and component state; a Generators tool for asset creation from text prompts; an AI Gateway that routes requests to third-party frontier models from within the editor; and an MCP Server that exposes the Unity scene graph to external coding agents including Claude Code, Cursor, Windsurf, and Antigravity. Pricing is $10 per month for 1,000 AI credits after a 14-day, 1,000-credit free trial.

Why it matters for automation/productivity: The MCP Server enables an agent running in Claude Code or Cursor to inspect and modify a Unity scene without being embedded inside the Unity Editor, expanding the reach of external AI coding tools into game-development workflows.

Key claims:

Caveats: Primary unity.com announcement page was not accessible; date uncertainty of approximately 2 days (May 2 or May 4). Integration depth of the MCP Server beyond scene-graph reads has not been independently assessed.

Cross-references:


Saperly launches as dedicated phone carrier for AI agents

Source: https://saperly.com/ · Saperly · 2026-05-05
Verification: T3 secondary · announcement · mcp-ecosystem
Tier note: Primary source is the vendor product page. No independent tech press coverage was found as of this run. Discovery via practitioner diffusion on X (multiple accounts sharing within 24h).

Saperly launched around 5 May 2026 as a carrier infrastructure layer for AI agents that need a persistent, compliant phone identity. An agent provisioned through Saperly receives a stable phone number, voice and SMS routing, and a consistent caller ID across outbound calls, so telecom API complexity does not have to be managed inside agent orchestration code. The company ships an MCP server as its primary integration interface, making phone capabilities available to any MCP-compatible agent without additional telecom SDK work.

Why it matters for automation/productivity: Agents that need to make or receive calls — scheduling, verification, customer-service automation — can now acquire a stable, compliant phone identity through an MCP integration rather than building carrier-layer handling into the agent’s own codebase.

Caveats: Product is newly launched; reliability, pricing, and SLA details are not publicly documented. No enterprise adoption evidence exists at this stage. Independent evaluation of the MCP server is not yet available.


Sierra closes $950M round at $15B valuation

Source: https://siliconangle.com/2026/05/04/ai-agent-startup-sierra-valued-15b-new-950m-funding-round/ · SiliconAngle · 2026-05-04
Verification: T3 secondary · funding · ai-for-business / agent-framework

Sierra, the enterprise AI agent platform founded in 2023 by Bret Taylor and Clay Bavor, raised $950 million at a $15 billion valuation on 4 May 2026. The round was led by Alphabet’s GV and Tiger Global, with Benchmark, Sequoia, and Greenoaks participating. Sierra reports $150 million in annual recurring revenue and claims adoption by “nearly half the Fortune 50” (both figures vendor-stated, no independent confirmation). The platform includes an Agent SDK, Agent Studio for code-free agent development, and pre-packaged connectors to third-party data sources; it runs across more than 15 open-source and proprietary models.

Why it matters for automation/productivity: The valuation and ARR figures, if accurate, reflect enterprise willingness to spend on managed agent infrastructure; independent reviews note high setup costs ($50k–$200k) and 3–6 month deployment timelines as counterweights to the adoption claims.

Key claims:

Caveats: ARR and penetration figures are vendor-reported without audited verification. Independent review aggregators document typical customer costs around $150k per year, 3–6 month deployments, and opaque outcome-based billing that can make ROI modeling difficult before signing.

Cross-references:


Anthropic, Blackstone, and Goldman Sachs form a new enterprise AI services company

Source: https://www.anthropic.com/news/enterprise-ai-services-company · Anthropic · 2026-05-04
Verification: T2 verified · announcement · ai-for-business

Anthropic announced the formation of a new AI services company on 4 May 2026, co-founded with Blackstone, Hellman & Friedman, and Goldman Sachs. General Atlantic, Leonard Green, Apollo Global Management, GIC, and Sequoia Capital are also backing the company as investors. The firm targets mid-sized organizations (community banks, regional manufacturers, regional health systems) that need Claude deployment expertise but lack the internal AI engineering capacity to implement it. Anthropic’s Applied AI engineers will work alongside the new company’s delivery teams to identify high-impact use cases and build Claude-powered solutions. No launch date, pricing, or geographic scope was disclosed.

Why it matters for automation/productivity: For mid-market enterprises without dedicated AI teams, a vendor-backed implementation partner reduces the upfront engineering barrier to deploying Claude; the lack of pricing transparency at launch makes cost comparison with general consulting partnerships impossible at this stage.


Mythos AI draws scrutiny from EU officials and the White House

Source: https://www.resultsense.com/news/2026-05-05-white-house-pre-release-ai-vetting/ · ResultSense · 2026-05-05; https://www.aisi.gov.uk/blog/our-evaluation-of-claude-mythos-previews-cyber-capabilities · UK AISI (capabilities assessment)
Verification: T3 secondary · policy / investigation · policy-regulation
Tier note: Bloomberg EU-angle article (May 5) is behind a paywall; ResultSense and a White House spokesperson non-denial constitute the accessible US record. EU Economy Commissioner statements sourced from secondary coverage of the Bloomberg article.

Anthropic’s Mythos model, still in restricted testing and not publicly released, drew regulatory attention from two directions in the May 4–7 window. The European Commission and EU cybersecurity agency ENISA were scheduled to examine Mythos risks in the European Parliament on 6 May; Anthropic, on short notice, did not attend. EU Economy Commissioner Valdis Dombrovskis confirmed ongoing talks with Anthropic about giving European banks access to Mythos for cybersecurity resilience testing, citing concern that US entities gain earlier visibility into AI-discovered vulnerabilities. Separately, the White House is considering an executive order establishing a pre-release AI model review working group, with Mythos’s reported cyber capabilities cited as a catalyst; a White House spokesperson did not deny the substance but described specific executive order discussions as “speculation.” The UK AI Security Institute (AISI) published an evaluation finding that Mythos Preview can autonomously execute multi-stage attacks on vulnerable networks and discover zero-day vulnerabilities in open-source codebases within hours, tasks that expert penetration testers said would take days.

Why it matters for automation/productivity: If a pre-release review process is established, organizations planning frontier-model integrations may face additional lead time before new models reach API availability. The EU’s interest in testing banks with Mythos suggests a near-term policy use case for offensive AI capabilities in regulated industries.

Key claims:

Caveats: Mythos has not been publicly released; no API, no pricing, no release date announced. AISI evaluation is of a restricted preview version; production capabilities may differ. The White House executive order remains deliberative, not signed.


Dropped

Items considered but not published, with reason:

Title considered | Source | Reason
Claude Security public beta | anthropic.com/news (Apr 30) | Already covered in 2026-05-04 radar
Microsoft Agent 365 GA | microsoft.com/security/blog (May 1) | Already covered in 2026-05-04 radar
Mistral Medium 3.5 + Work Mode | mistral.ai/news (Apr 29) | Already covered in 2026-05-03 and 2026-05-04 radars
Manus Cloud Computer | manus.im/blog (Apr 30) | Already covered in 2026-05-04 radar
xAI Grok 4.3 + Voice Cloning API | x.ai/news (Apr 30–May 2) | Already covered in 2026-05-04 radar
Claude Code v2.1.126 gateway support | github.com/anthropics/claude-code/releases (May 1) | Already covered in 2026-05-04 radar
Lukilabs Craft Agents OSS | github.com/lukilabs/craft-agents-oss (May 2) | Already covered in 2026-05-04 radar
Code Review for Claude Code | claude.com/blog/code-review (Mar 9) | Outside 72h window
CI auto-fix / Preview-review-merge | claude.com/blog/preview-review-and-merge (Feb 20) | Outside 72h window
Claude Code Routines | claude.com/blog/introducing-routines-in-claude-code (Apr 14) | Outside 72h window; showcased at conference but not newly launched
Gmail Gemini AI features | blog.google/gmail-is-entering-the-gemini-era (Jan 8) | Outside 72h window
GLM-5.1 open-weight model by Z.ai | huggingface.co/zai-org/GLM-5.1 (Apr 7) | Outside 72h window
ARIS autonomous research paper | huggingface.co/papers/2605.03042 (May 6) | No shipping product; low BD actionability
OpenSeeker-v2 search agent paper | huggingface.co/papers/2605.04036 (May 6) | No shipping product; low BD actionability
Anthropic Orbit proactive assistant | testingcatalog.com (unconfirmed leak) | Unannounced feature; T5, no official source
DeepSeek V4 models | Various aggregators | No primary source with confirmed May 4–7 date found
L Suite Lloyd legal AI tool | finance.yahoo.com (May 6) | T4 only at time of research; no primary source accessible
White House National AI Policy Framework | whitehouse.gov (Dec 2025) | Outside 72h window

Limitations


Search log (compact)

Query | Yield | Type
Anthropic Claude announcement May 2026 | 10 results, 8 high-rel | registry
OpenAI announcement release May 2026 | 10 results, 6 high-rel | registry
Google DeepMind Gemini release May 2026 | 10 results, 4 high-rel | registry
anthropic.com/news/finance-agents (fetch) | primary confirmed May 5 | registry
anthropic.com/news/enterprise-ai-services-company (fetch) | primary confirmed May 4 | registry
OpenAI GPT-5.5 release May 5 2026 site:openai.com | 8 results, 5 high-rel | registry
openai.com/index/gpt-5-5-instant/ (fetch) | HTTP 403, inaccessible | registry
techcrunch.com GPT-5.5 Instant (fetch) | T3 secondary confirmed May 5 | registry
Code with Claude conference announcements May 6 2026 | 10 results, 7 high-rel | registry
simonwillison.net/2026/May/6/code-w-claude-2026/ (fetch) | T3 liveblog confirmed May 6 | registry
claude.com/code-with-claude (fetch) | conference details confirmed | registry
Claude Managed Agents Outcomes Dreaming Orchestration May 6 2026 | 8 results, 7 high-rel | registry
claude.com/blog/new-in-claude-managed-agents (fetch) | primary confirmed May 6 | registry
Anthropic SpaceX Colossus computing deal May 2026 | 10 results, 8 high-rel | registry
anthropic.com/news/higher-limits-spacex (fetch) | primary confirmed May 6 | registry
github.com/anthropics/claude-code/releases (fetch) | v2.1.128–v2.1.132 May 4–6 | registry
AI agent framework launch release May 2026 | 10 results, 5 high-rel | exploratory
new AI startup launch funding May 2026 NOT Anthropic NOT OpenAI | 10 results, 4 high-rel | exploratory
siliconangle.com Sierra funding (fetch) | T3 confirmed May 4 | exploratory
huggingface.co/papers (fetch) | top papers May 6 identified | exploratory
AI Indonesia startup peluncuran Mei 2026 | 10 results, 0 in-window | exploratory/cross-lang
MCP Model Context Protocol new release May 2026 | 10 results, 3 high-rel | exploratory
Anthropic Orbit proactive assistant Claude Cowork May 2026 | 10 results, 2 high-rel | exploratory/adversarial
AI policy regulation executive order May 2026 | 10 results, 3 high-rel | exploratory
Anthropic Mythos cybersecurity EU testing May 2026 | 10 results, 8 high-rel | exploratory
resultsense.com White House AI vetting (fetch) | T3 confirmed May 5 | exploratory
aisi.gov.uk Mythos evaluation (search) | T1 confirmed | exploratory
site:x.com AnthropicAI Code with Claude SpaceX May 2026 | 10 results, 6 high-rel | exploratory
new AI tool launch May 4-7 2026 NOT Anthropic NOT OpenAI | 10 results, 3 high-rel | exploratory
Saperly phone carrier AI agents MCP May 2026 | 8 results, 3 high-rel | exploratory
saperly.com (fetch) | vendor product page May 5 | exploratory
TinyFish web search fetch free AI agents May 2026 | 8 results, 5 high-rel | exploratory
tinyfish.ai/blog/search-fetch-free (fetch) | primary confirmed May 4 | exploratory
Unity AI open beta May 2026 site:unity.com | 8 results, 4 high-rel | exploratory
claude.com/blog (fetch) | 2 posts in May 4–7 window | registry
site:claude.com CI auto-fix code review remote agents May 2026 | 8 results, 5 high-rel | registry
claude.com/blog/code-review (fetch) | March 9, outside window | adversarial
claude.com/blog/preview-review-and-merge (fetch) | February 20, outside window | adversarial
Sierra AI agent startup review criticism May 2026 | 10 results, 4 high-rel | adversarial
AI benchmark criticism independent evaluation May 2026 | 10 results, 3 high-rel | adversarial
Mistral xAI Grok new model release May 2026 | 10 results, 3 high-rel | registry

Total searches: 41, of which 17 exploratory or adversarial (41%).
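The exploration percentage follows directly from the two counts in the line above. As a small worked check, assuming the figure is rounded to the nearest whole percent (the rounding rule is not stated in the source, and `share_percent` is a hypothetical helper name):

```python
def share_percent(part: int, total: int) -> int:
    """Share of `part` in `total`, rounded to the nearest whole percent."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * part / total)


total_searches = 41
exploratory_or_adversarial = 17
exploration = share_percent(exploratory_or_adversarial, total_searches)
print(f"{exploration}% exploration")  # 41% exploration
```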


Suggested next runs