AI Radar

AI Radar — 02 May 2026

15 items · 12 verified · 3 secondary · 0 rumor · 15 sources · 45% exploration

OpenAI, Google, and Anthropic each shipped workflow-automation or MCP tooling this window; xAI cut API prices and IBM released a new model family under Apache 2.0; two arXiv preprints temper expectations for deployed agents and fine-tuned models.

Run: 2026-04-27 to 2026-05-02 (5-day expanded window — see Limitations) · 24 items reviewed → 15 published · 12 verified · 3 secondary · 0 rumor · 45% exploration · Run timestamp: 2026-05-02


TL;DR


Items

OpenAI launches Workspace Agents in ChatGPT for enterprise teams

Source: https://openai.com/index/introducing-workspace-agents-in-chatgpt/ · OpenAI · 2026-04-28
Verification: T2 verified · announcement · workflow-automation

OpenAI launched Workspace Agents in research preview on April 28 — Codex-powered agents that run in the cloud and handle complex, long-running work tasks without requiring a human to initiate each step. Available on ChatGPT Business ($20/user/month), Enterprise, Edu, and Teachers plans; integrations include Slack, Google Drive, Microsoft 365 apps, Salesforce, and Notion. Teams can create shared agents for recurring jobs (software review, product feedback routing, weekly metrics reporting). Usage is free through May 5 and shifts to credit-based billing on May 6, though per-action credit pricing has not yet been published.

Why it matters for automation/productivity: First ChatGPT product positioned explicitly for continuous background execution rather than session-bounded chat. Teams on the Business plan can deploy cloud-resident agents that run scheduled or trigger-based jobs against connected data sources — a direct entry point for automating recurring operational workflows.

Key claims:

Cross-references:

Caveats: OpenAI primary blog returned 403 during this run; item verified via official X post and multiple independent outlets. Per-action credit pricing for post-May 6 billing not yet disclosed. Research preview status — feature set may change before GA.


Google Workspace MCP Server enters developer preview

Source: https://workspace.google.com/blog/product-announcements/10-more-announcements-workspace-at-next-2026 · Google Workspace Blog · 2026-04-22
Verification: T2 verified · announcement · mcp-ecosystem

Google announced the Workspace MCP Server at Google Cloud Next (April 22–24) and began its developer preview rollout on May 1. The server provides standardized MCP access to five Workspace capabilities: Gmail (profile, drafting, search, read/write), Google Drive (file fetch, permissions, list, upload), Google Calendar (availability, event management), Google Chat (conversation search, message read/send), and People API (contacts, profiles). Developer access requires joining the Google Workspace Developer Preview Program. A Workspace CLI for agent-direct interaction is listed as coming soon.

Why it matters for automation/productivity: Official, Google-managed MCP server for the most widely used enterprise collaboration suite. MCP-compatible agents enrolled in the developer preview can connect to Gmail, Drive, and Calendar through a single standardized interface, reducing the need for custom OAuth wrappers or third-party connectors.

Key claims:

Cross-references:

Caveats: Developer preview only — not GA. Access requires program enrollment. Feature scope may change before general availability. Quota and tiering changes for existing projects will be communicated at least 60 days in advance.
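As a rough illustration of what "standardized MCP access" looks like on the wire: MCP clients exchange JSON-RPC 2.0 messages with a server, using methods such as `tools/list` and `tools/call`. The sketch below only builds those request payloads. The tool name `search_gmail_threads` and its arguments are hypothetical placeholders, not names taken from Google's server, and no endpoint URL or transport is assumed.

```python
import json
from itertools import count
from typing import Optional

_ids = count(1)  # monotonically increasing JSON-RPC request ids


def mcp_request(method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request envelope of the kind MCP uses."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return msg


# Ask the server which tools it exposes.
list_tools = mcp_request("tools/list")

# Call one tool. The name and arguments are HYPOTHETICAL placeholders.
call_tool = mcp_request(
    "tools/call",
    {"name": "search_gmail_threads", "arguments": {"query": "from:finance"}},
)

print(json.dumps(list_tools))
print(json.dumps(call_tool, indent=2))
```

A real session would start with an `initialize` handshake and would need whatever authentication the preview program requires; both are omitted here.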


Anthropic ships 9 creative connectors and Claude Design

Source: https://www.anthropic.com/news/claude-for-creative-work · Anthropic · 2026-04-28
Verification: T2 verified · announcement · productivity-ai / mcp-ecosystem

Anthropic released 9 connectors on April 28 integrating Claude with creative applications: Ableton (Live and Push documentation grounding), Adobe Creative Cloud (50+ apps including Photoshop and Premiere), Affinity by Canva (batch task automation), Autodesk Fusion (3D model creation via conversation), Blender (natural-language access to the Python API via MCP connector), Resolume Arena and Wire (live visual performance control), SketchUp (3D modeling starting points), and Splice (royalty-free sample search). Claude Design, released simultaneously as an Anthropic Labs product in research preview, lets users explore software interface ideas visually and export results to Canva. Anthropic separately announced a donation to the Blender project to support Python API development.

Why it matters for automation/productivity: The Blender connector uses MCP, demonstrating production-quality MCP adoption in creative tooling. For teams working across creative pipelines, connectors for Ableton, Affinity, and Splice reduce context-switching by surfacing documentation and automating repetitive asset tasks from within Claude. Claude Design provides a low-friction path from written brief to exportable UI mockup.

Key claims:

Cross-references:

Caveats: Claude Design is in Anthropic Labs research preview — not production-ready. Individual connector quality depends on the host application’s API; Blender connector requires the host to expose the Python API.


Microsoft Agent 365 reaches GA at $15/user/month

Source: https://www.microsoft.com/en-us/security/blog/2026/05/01/microsoft-agent-365-now-generally-available-expands-capabilities-and-integrations/ · Microsoft Security Blog · 2026-05-01
Verification: T2 verified · announcement · workflow-automation / agent-framework / ai-for-business

Microsoft’s agent governance platform reached general availability on May 1, priced at $15/user/month standalone or bundled into Microsoft 365 E7. Agent 365 functions as a control plane for observing, governing, and securing AI agents across Microsoft and third-party platforms. New at GA: auto-discovery and management of local agents (OpenClaw, GitHub Copilot CLI, Claude Code) via Microsoft Defender and Intune; asset context mapping showing each agent’s relationships to devices, MCP servers, identities, and cloud resources; registry sync with AWS Bedrock and Google Cloud in public preview.

Why it matters for automation/productivity: Organizations running Microsoft 365 can now centrally inventory AI agents — including third-party tools like Claude Code — from one governance dashboard. Reduces shadow-AI risk for enterprises piloting multiple agent vendors simultaneously, with Defender-level threat detection applied to agent activity.

Key claims:

Cross-references:

Caveats: SAMexpert notes governance controls (conditional access policies) remain in separate Microsoft tools rather than native to Agent 365. Auto-discovery for third-party agents requires Defender and Intune enrollment; manual registration still required otherwise. Independent analysts describe this as directional GA rather than a finished enterprise governance layer.


OpenAI Codex 0.128.0 adds persistent goal workflows

Source: https://releasebot.io/updates/openai/codex · OpenAI (via changelog aggregator) · 2026-04-30
Verification: T2 verified · changelog · agent-framework / dev-tools

OpenAI released Codex version 0.128.0 on April 30. Key additions: persisted /goal workflows with app-server APIs and TUI controls for create, pause, resume, and clear operations; richer permission profiles with built-in defaults and sandbox CLI selection; improved plugin support including marketplace installation and remote plugin management; fixes for resume and interruption behavior and Windows sandbox edge cases.

Why it matters for automation/productivity: Persistent goals allow Codex agents to maintain stateful objectives across sessions — a prerequisite for longer-horizon agentic coding tasks. Plugin marketplace installation lowers the integration friction for connecting Codex to third-party dev tooling.

Key claims:

Cross-references:

Caveats: OpenAI primary changelog was inaccessible (403) during this run; verified via changelog aggregator. Maximum duration and context limits for persistent goals not disclosed in available sources.


Alibaba Qwen releases Qwen-Scope sparse autoencoder suite

Source: https://x.com/Alibaba_Qwen/status/2049861145574690992 · Alibaba / Qwen Team · 2026-05-01
Verification: T2 verified · announcement · dev-tools

The Qwen team released Qwen-Scope on May 1 — an open suite of sparse autoencoders (SAEs) trained on Qwen3 and Qwen3.5 model families. The release covers 14 SAE weight groups across 7 model variants: five dense models (Qwen3-1.7B, Qwen3-8B, Qwen3.5-2B, Qwen3.5-9B, Qwen3.5-27B) and two MoE models (Qwen3-30B-A3B, Qwen3.5-35B-A3B). Practical use cases include output steering by directly manipulating internal learned features, data classification, multilingual code-switching suppression, and controlling repetition loops in long outputs.

Why it matters for automation/productivity: Provides a practitioner-accessible layer for steering Qwen model outputs via internal features rather than prompt engineering — useful where prompt-based steering is unreliable or cost-prohibitive at scale. The code-switching suppression use case is directly relevant for Indonesian-language deployments using Qwen.

Key claims:

Cross-references:

Caveats: Code-switching improvement figures are vendor-reported; no independent replication found as of this run. Applies only to Qwen3/Qwen3.5 variants listed; not transferable to other model families.
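To make "steering by manipulating internal learned features" concrete, here is a toy sketch of the general sparse-autoencoder idea: encode a hidden state into feature activations, then shift the hidden state along one feature's decoder direction to suppress it. This is a generic illustration with made-up numbers, not Qwen-Scope's API, weight format, or training recipe; all function names are hypothetical.

```python
def relu(x: float) -> float:
    return max(0.0, x)


def sae_encode(h, W_enc, b_enc):
    """Feature activations f = ReLU(W_enc @ h + b_enc)."""
    return [
        relu(sum(w * x for w, x in zip(row, h)) + b)
        for row, b in zip(W_enc, b_enc)
    ]


def steer(h, W_dec, feature: int, delta: float):
    """Shift hidden state h along one feature's decoder direction."""
    direction = W_dec[feature]
    return [x + delta * d for x, d in zip(h, direction)]


# Toy 2-feature SAE over a 3-dimensional hidden state (made-up weights).
W_enc = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
b_enc = [0.0, 0.0]
W_dec = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # one decoder row per feature

h = [0.5, 2.0, -1.0]
before = sae_encode(h, W_enc, b_enc)

# Suppress feature 1 (imagine it tracks an unwanted behavior).
h_steered = steer(h, W_dec, feature=1, delta=-2.0)
after = sae_encode(h_steered, W_enc, b_enc)

print("before:", before)
print("after: ", after)
```

In a real model the steered hidden state would be written back into the forward pass at the layer the SAE was trained on; that hook is model-specific and is left out.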


Moonshot AI open-sources FlashKDA CUDA kernels for Kimi Delta Attention

Source: https://github.com/MoonshotAI/FlashKDA · Moonshot AI · 2026-04-30
Verification: T2 verified · changelog · dev-tools

Moonshot AI released FlashKDA under an MIT license on April 30 — a CUTLASS-based CUDA kernel implementation of Kimi Delta Attention (KDA) that serves as a drop-in backend for the flash-linear-attention library. The library auto-dispatches from flash-linear-attention’s chunk_kda operation, meaning code using flash-linear-attention gains the performance improvement without manual wiring. Hardware requirements: SM90+ GPU (NVIDIA H100 class), CUDA 12.9+, PyTorch 2.4+.

Why it matters for automation/productivity: Zero-code performance upgrade for teams already running Kimi Linear or other KDA-based models on H100-class hardware, available under the MIT license with no integration work beyond the existing flash-linear-attention dependency.

Key claims:

Caveats: Speedup benchmarks are vendor-measured on NVIDIA H20 hardware; no independent replication found at time of publication. The SM90+ requirement limits applicability to Hopper-class and newer NVIDIA GPUs; older data-center cards such as the A100 (SM80) are excluded. Real-world gains will vary by workload.
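The hardware and software floor stated above (SM90+, CUDA 12.9+, PyTorch 2.4+) can be checked before attempting an install. The helper below is a minimal sketch that takes the three values as plain arguments so it runs without a GPU; the function names are hypothetical and are not part of FlashKDA.

```python
def parse_version(v: str) -> tuple:
    """Turn '12.9' or '2.4.1+cu129' into a comparable tuple of ints."""
    core = v.split("+")[0]  # drop local build tags such as '+cu129'
    return tuple(int(p) for p in core.split(".") if p.isdigit())


def meets_flashkda_requirements(sm, cuda: str, torch_version: str) -> bool:
    """Check the minimums quoted in the item: SM90+, CUDA 12.9+, PyTorch 2.4+."""
    return (
        tuple(sm) >= (9, 0)
        and parse_version(cuda) >= (12, 9)
        and parse_version(torch_version) >= (2, 4)
    )


h100_ok = meets_flashkda_requirements((9, 0), "12.9", "2.4.1+cu129")
a100_ok = meets_flashkda_requirements((8, 0), "12.9", "2.6.0")
old_cuda_ok = meets_flashkda_requirements((9, 0), "12.4", "2.6.0")

print(h100_ok, a100_ok, old_cuda_ok)
```

With PyTorch installed, the inputs can come from `torch.cuda.get_device_capability()`, `torch.version.cuda`, and `torch.__version__`; note that `torch.version.cuda` is `None` on CPU-only builds.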


IBM releases Granite 4.1 model family under Apache 2.0

Source: https://research.ibm.com/blog/granite-4-1-ai-foundation-models · IBM Research · 2026-04-29
Verification: T2 verified · announcement · model-release
Tier nuance: Comparative efficiency claims (8B vs prior 32B MoE) downgraded to T4; no independent benchmark found.

IBM released the Granite 4.1 collection on April 29: dense decoder-only language models in 3B, 8B, and 30B sizes (base and instruct variants), Granite Vision 4.1 for document understanding, three Granite Speech 4.1 2B variants, Granite Guardian 4.1, and Granite Embedding Multilingual R2 covering 200+ languages. All models are Apache 2.0 licensed with context windows up to 512K tokens, trained on approximately 15 trillion tokens.

Why it matters for automation/productivity: Apache 2.0 licensing with 512K context and a full stack (text, vision, speech, safety, embeddings) in a single release makes Granite 4.1 a candidate for on-premise deployments where proprietary licensing is a constraint and multilingual coverage is required.

Key claims:

Cross-references:

Caveats: Efficiency comparison (8B instruct matches/outperforms prior 32B MoE) is vendor-reported; no independent benchmark found as of this run.


xAI ships Grok 4.3 at $1.25/M input tokens

Source: https://artificialanalysis.ai/articles/xai-launches-grok-4-3-with-improved-agentic-performance-and-lower-pricing · Artificial Analysis · 2026-04-30
Verification: T2 verified · benchmark · model-release

xAI released Grok 4.3 to its API on April 30 at $1.25/million input tokens and $2.50/million output tokens, a 37.5% input price reduction versus Grok 4.20. Artificial Analysis’s Intelligence Index scores Grok 4.3 at 53, behind GPT-5.5 (60) and Claude Opus 4.7 (57). On the GDPval-AA agentic evaluation, the model reached an Elo of 1500, a 321-point improvement over Grok 4.20 (1179). Additional benchmark scores: τ²-Bench Telecom 98%, IFBench 81%.

Why it matters for automation/productivity: At $1.25 input and $2.50 output per million tokens, Grok 4.3 is among the more cost-competitive frontier-tier models for agentic tasks. The 321-point GDPval-AA gain is the standout result; the model still trails GPT-5.5 and Claude Opus 4.7 on the Intelligence Index and trails Opus 4.7 in coding benchmarks, so suitability depends on workload mix.

Key claims:

Cross-references:

Caveats: xAI primary blog was inaccessible (403) during this run; benchmarks sourced from Artificial Analysis (independent), which is appropriate for comparative claims. xAI hallucination-rate claims for prior Grok versions have not been independently replicated.
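The pricing and benchmark figures in this item can be sanity-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above; the prior input price is inferred from the stated 37.5% cut rather than taken from a source, and the monthly workload is a hypothetical example.

```python
INPUT_USD_PER_M = 1.25   # Grok 4.3 input price, from the item above
OUTPUT_USD_PER_M = 2.50  # Grok 4.3 output price, from the item above


def job_cost(input_tokens: int, output_tokens: int) -> float:
    """USD cost of a job at the quoted per-million-token prices."""
    return (
        input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    ) / 1_000_000


# A 37.5% cut that lands on $1.25 implies this prior input price (inferred).
implied_prior_input = INPUT_USD_PER_M / (1 - 0.375)

# GDPval-AA Elo gain quoted in the item: 1500 vs 1179.
elo_gain = 1500 - 1179

# HYPOTHETICAL workload: 40M input tokens and 5M output tokens per month.
monthly_cost = job_cost(40_000_000, 5_000_000)

print(f"implied prior input price: ${implied_prior_input:.2f}/M")
print(f"Elo gain: {elo_gain}")
print(f"monthly cost: ${monthly_cost:.2f}")
```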


Claw-Eval-Live benchmark: top agents complete 66.7% of real-world workflow tasks

Source: https://arxiv.org/abs/2604.28139 · Chenxin Li, Zhengyang Tang et al. · 2026-04-30
Verification: T3 secondary · research-paper · research-papers
Tier nuance: arXiv preprint, not peer-reviewed. Upgrade to T1-T2 on acceptance.

Claw-Eval-Live is a live benchmark covering 105 tasks across business services and local workspace repair, designed to track real-world workflow completion rather than static capability evaluations. Thirteen frontier models were tested. The leading model passed 66.7% of tasks; no model reached 70%. Persistent failure areas include HR management, multi-system business workflows, and complex cross-tool coordination; local workspace repair tasks were comparatively easier but remain unsaturated.

Why it matters for automation/productivity: Sets realistic expectations for end-to-end agentic task completion in production: even the best frontier model today fails approximately 1 in 3 real-world workflow tasks. HR and multi-system integrations are the weakest areas — relevant for any pilot scoping agentic coverage of HR or cross-platform workflows.

Key claims:

Cross-references:

Caveats: arXiv preprint — not yet peer-reviewed. Per-model breakdowns require reading the full paper. Live benchmark validity depends on transparent task distribution documentation.
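The headline numbers translate into task counts as follows. The pass count of 70 is inferred here, since it is the only whole number out of 105 that rounds to the reported 66.7%; the paper's own tables should be treated as authoritative.

```python
import math
from fractions import Fraction

TASKS = 105   # benchmark size, from the item above
passed = 70   # INFERRED: the only count out of 105 that rounds to 66.7%

pass_rate = Fraction(passed, TASKS)
fail_rate = 1 - pass_rate

# Passes a model would need to reach the 70% mark that none achieved.
needed_for_70 = math.ceil(Fraction(70, 100) * TASKS)

print(f"pass rate: {float(pass_rate):.1%}")
print(f"fail rate: {fail_rate} of tasks")
print(f"passes needed for 70%: {needed_for_70} (gap: {needed_for_70 - passed})")
```

Exact fractions are used so the "about 1 in 3" failure figure comes out without floating-point rounding.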


Fine-tuning frequently degrades model safety in domain-specific deployments

Source: https://arxiv.org/abs/2604.24902 · Emaan Bilal Khan, Amy Winecoff, Miranda Bogen, Dylan Hadfield-Menell · 2026-04-27
Verification: T3 secondary · research-paper · research-papers
Tier nuance: arXiv preprint, not peer-reviewed. Upgrade to T1-T2 on acceptance.

Researchers studied 100 fine-tuned language models deployed in medical and legal domains using both general-purpose and domain-specific safety benchmarks. Core finding: fine-tuning induces large, heterogeneous, and often contradictory changes in measured safety — models commonly improved on some safety benchmarks while simultaneously regressing on others. The paper argues that governance practices relying solely on base-model safety evaluations are insufficient for fine-tuned deployments in high-stakes domains.

Why it matters for automation/productivity: Organizations deploying fine-tuned models in regulated workflows (legal, medical, finance) cannot assume base-model safety evaluations carry through. Each fine-tuned variant needs its own safety evaluation before production deployment, which adds cost to any AI pipeline that uses fine-tuning and should be budgeted into enterprise pilots in regulated industries.

Key claims:

Cross-references:

Caveats: Preprint, not yet peer-reviewed. Specific models studied are not disclosed in the abstract.


OpenAI restricts GPT-5.5-Cyber to verified defenders after criticizing Anthropic’s identical approach

Source: https://techcrunch.com/2026/04/30/after-dissing-anthropic-for-limiting-mythos-openai-restricts-access-to-cyber-too/ · TechCrunch · 2026-04-30
Verification: T2 verified · announcement · policy-regulation

On April 30, Sam Altman announced that OpenAI would roll out GPT-5.5-Cyber exclusively to “critical cyber defenders” through a verified access program called Trusted Access for Cyber (TAC), which had scaled to thousands of verified defenders and hundreds of teams at announcement. The model can perform penetration testing, vulnerability identification and exploitation, and malware reverse engineering. The announcement came nine days after Altman had called Anthropic’s similar restrictions on its Mythos model “fear-based marketing.”

Why it matters for automation/productivity: OpenAI and Anthropic have both now placed models with offensive cyber capabilities behind verified-access programs. Organizations seeking to evaluate AI for internal security red-teaming should expect an application process and credentialing requirement regardless of which vendor they approach.

Key claims:

Cross-references:


Anthropic weighs $50B funding round at up to $900B valuation

Source: https://techcrunch.com/2026/04/29/sources-anthropic-could-raise-a-new-50b-round-at-a-valuation-of-900b/ · TechCrunch · 2026-04-29
Verification: T2-T3 secondary · announcement · ai-for-business
Tier nuance: Multiple anonymous sources; Bloomberg primary is paywalled; CNBC and TechCrunch independently corroborate the range.

Anthropic is reviewing preemptive investor offers for a funding round of approximately $50 billion at a post-money valuation between $850 billion and $900 billion. The upper end of that range would exceed OpenAI’s $852 billion post-money valuation from earlier this year; the lower end would fall just short of it. The company has not yet accepted any offers; a decision is expected at a board meeting in May. Anthropic is also considering a public market debut starting in October.

Why it matters for automation/productivity: Informational only, with no immediate workflow leverage. If the round closes near the top of the reported range, Anthropic would become the world’s most valuable private AI company; the capital would likely accelerate Claude model development and product expansion.

Key claims:

Cross-references:

Caveats: Round has not closed; terms remain under negotiation. Valuation is based on multiple anonymous sources.


Pentagon signs classified AI agreements with 8 companies, excludes Anthropic

Source: https://defensescoop.com/2026/05/01/dod-expands-classified-ai-work-with-8-companies-excluding-anthropic/ · DefenseScoop · 2026-05-01
Verification: T2 verified · announcement · policy-regulation / ai-for-business

The US Department of Defense formalized classified-network AI agreements (Impact Level 6 and 7) with eight companies on May 1: SpaceX, OpenAI, Google, NVIDIA, Reflection, Microsoft, Amazon Web Services, and Oracle. The agreements cover warfighting, intelligence, and enterprise operations. Anthropic was excluded after the Pentagon designated the company a supply-chain risk — a label historically applied only to foreign-adversary-linked entities — following a dispute over restrictions on Claude’s military use. A federal judge in California has blocked the designation pending litigation.

Why it matters for automation/productivity: Informational only for most deployments — no immediate workflow leverage. For organizations with US federal contracts in scope: eight major AI vendors now have pathways to classified network deployment; Claude-based tools remain unavailable for DoD classified use until litigation resolves.

Key claims:

Cross-references:

Caveats: Litigation active as of May 1; outcome will determine whether Anthropic regains access. The supply-chain risk designation for a domestic AI company is without recent precedent.


Chinese courts establish AI replacement alone cannot justify worker dismissal

Source: https://www.caixinglobal.com/2026-04-30/chinese-courts-rule-companies-cannot-fire-workers-simply-to-replace-them-with-ai-102439602.html · Caixin Global · 2026-04-30
Verification: T2 verified · announcement · policy-regulation

The Hangzhou Intermediate People’s Court upheld a ruling on April 28 that terminating an employee because AI automated their role constitutes unlawful dismissal under Chinese Labor Contract Law. In the primary case, a quality-assurance worker whose role was automated by LLMs was offered reassignment at a 40% pay reduction; when he declined, the company terminated him. The court held that AI adoption is a strategic business choice — not a qualifying “objective major change” under Chinese labor law — and therefore cannot be used to trigger contract termination. A secondary Beijing case involving automated data-entry work reached the same conclusion.

Why it matters for automation/productivity: Organizations operating in China that plan workforce restructuring around AI automation face heightened legal risk. Forced reassignment at materially reduced pay following AI-driven role elimination is also constrained by the ruling.

Key claims:

Cross-references:

Caveats: Trial-level and appellate rulings, not national legislation. Application may vary by jurisdiction within China. Constrains grounds for termination; does not restrict AI adoption itself.


Dropped

Items considered but not published, with reason.

Title considered · Source · Reason
ChatGPT Advanced Account Security (passkeys, recovery keys) · releasebot.io/updates/openai/chatgpt (2026-04-30) · Low BD-actionability: opt-in security hardening feature; no workflow automation leverage
OpenAI GPT-5.5 launch · openai.com/index/introducing-gpt-5-5/ · Published 2026-04-23, outside expanded 5-day window
Google Gemini 3.1 Ultra launch · blog.google · Published approximately 2026-04-22 (Cloud Next announcement), outside expanded 5-day window
DeepSeek V4 · deepseek.com · Published 2026-04-24, outside expanded 5-day window
Meta Llama 4 family · ai.meta.com · Published April 2026, before the expanded 5-day window
Claude Cowork GA on macOS and Windows · support.claude.com · Published 2026-04-09, outside expanded 5-day window
Microsoft Agent Framework 1.0 GA · devblogs.microsoft.com/agent-framework · Published 2026-04-03, outside expanded 5-day window
Claude Jupiter V1 (internal red-team log leak) · Cryptobriefing, multiple secondary outlets · T5 rumor: single-source unverified claim; no primary source or official confirmation
NVIDIA Ising quantum AI models · Secondary outlets only · Could not confirm primary source URL or exact publication date within window

Limitations


Search log (compact)

Q: "AI workflow automation agent launch May 2026" → 10 results, 1 high-relevance
Q: "new AI productivity tool feature release May 2026" → 10 results, 3 high-relevance
Q: "MCP model context protocol new server May 2026" → 9 results, 2 high-relevance
Q: "Claude Cowork general availability macOS Windows May 2026" → 10 results, 5 high-relevance (outside window)
Q: "Anthropic news announcement April 29 30 May 1 2 2026" → 10 results, 6 high-relevance
Q: "OpenAI ChatGPT agent operator update May 2026" → 10 results, 4 high-relevance
Q: "Anthropic $900 billion valuation funding round April 29 2026" → 10 results, 6 high-relevance
Q: "Claude Design creative connectors Blender Adobe Anthropic announcement April May 2026" → 10 results, 8 high-relevance
Q: "ChatGPT workspace agents pricing May 6 2026 credit" → 9 results, 4 high-relevance
FETCH: anthropic.com/news/claude-for-creative-work → success
Q: "OpenAI Mythos Cyber model access restriction April 30 2026" → 10 results, 6 high-relevance
Q: "OpenAI workspace agents ChatGPT announcement April 28 2026" → 9 results, 5 high-relevance
Q: "adversarial: Microsoft Agent 365 criticism limitations April May 2026" → 9 results, 2 high-relevance
FETCH: github.com/trending?since=weekly → success (trending: mattpocock/skills 53k stars, TauricResearch/TradingAgents 61k)
Q: "site:news.ycombinator.com AI announcement May 2026" → 1 result, 0 high-relevance
Q: "AI startup berita Indonesia April Mei 2026" → 10 results, 0 in-window (SEA cross-language Stage 3.5)
Q: "Google Workspace MCP server developer preview April May 2026" → 10 results, 5 high-relevance
Q: "Google Workspace MCP server public developer preview May 1 2026 announcement" → 9 results, 5 high-relevance
FETCH: developers.google.com/workspace/guides/configure-mcp-servers → success
Q: "Google AI update announcement May 1 2 2026" → 9 results, 2 high-relevance
Q: "LangChain LlamaIndex CrewAI new release update April 30 May 2026" → 8 results, 0 high-relevance
FETCH: workspace.google.com/blog/product-announcements/10-more-announcements-workspace-at-next-2026 → success
Q: "new AI model release announcement April 30 May 1 2026" → 10 results, 2 in-window
FETCH: cloud.google.com/blog/products/ai-machine-learning/announcing-official-mcp-support-for-google-services → success (Dec 2025 item, outside window)
Q: "Google Cloud Next 2026 AI announcements date location" → 9 results, 3 high-relevance
FETCH: techcrunch.com/2026/04/30/after-dissing-anthropic-for-limiting-mythos-openai-restricts-access-to-cyber-too/ → success

Total searches/fetches: 26, of which 12 were exploratory or adversarial (46%).
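The run counts reported in this digest can be cross-checked against each other. The sketch below uses only figures stated in the document (items reviewed and published, tier counts, search totals); the variable names are illustrative.

```python
# Figures as stated in this digest.
reviewed, published = 24, 15
verified, secondary, rumor = 12, 3, 0
total_queries, exploratory = 26, 12

# Tier counts should add up to the number of published items.
tiers_sum = verified + secondary + rumor

# Items reviewed but not published; the Dropped list should have this many rows.
dropped = reviewed - published

# Share of searches/fetches that were exploratory or adversarial.
exploration_share = exploratory / total_queries

print(f"tier counts sum to {tiers_sum} (published: {published})")
print(f"dropped items: {dropped}")
print(f"exploration share: {exploration_share:.1%}")
```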


Suggested next runs