AI Radar — 02 May 2026
Run parameters: Timeframe: last 24h (effective window: 2026-04-29 to 2026-05-02 — see Limitations); Categories: all; Depth: standard; Audience: public-digest; Run timestamp: 2026-05-02 14:00 UTC; Items reviewed: 22; Items published: 11.
Items
Item 1: Pentagon signs classified AI agreements with 8 companies, excludes Anthropic
Primary source: https://defensescoop.com/2026/05/01/dod-expands-classified-ai-work-with-8-companies-excluding-anthropic/ Publication date: 2026-05-01 Author/org: DefenseScoop Tier grade: T2 Verification status: verified Methodology: announcement Conflict of interest: none observed Category: policy-regulation / ai-for-business Recency-adjusted weight: 1.0
What happened: The US Department of Defense formalized classified-network AI agreements (Impact Levels 6 and 7) with eight companies on May 1, 2026: SpaceX, OpenAI, Google, NVIDIA, Reflection, Microsoft, Amazon Web Services, and Oracle. The agreements cover warfighting, intelligence, and enterprise operations. Anthropic was excluded after the Pentagon designated the company a supply-chain risk — a label the agency has historically applied only to foreign-adversary-linked entities — following a dispute in early 2026 over restrictions on Claude’s military use. The dispute is in active litigation; a federal judge in California blocked the government’s supply-chain designation last month. DoD CTO Emil Michael confirmed the agency prioritized building relationships with multiple providers across both open-source and proprietary systems.
Numbers / claims:
- 8 companies by name → DefenseScoop primary source, 2026-05-01
- Supply-chain risk designation → DefenseScoop; confirmed by CNN, UPI, Breaking Defense
Cross-references:
- https://www.cnn.com/2026/05/01/tech/pentagon-ai-anthropic (T2-T3)
- https://www.upi.com/Top_News/US/2026/05/01/Pentagon-reaches-deals-with-eight-AI-companies/9501777660159/ (T2-T3)
- https://breakingdefense.com/2026/05/pentagon-clears-7-tech-firms-to-deploy-their-ai-on-its-classified-networks/ (T3)
What this means for users: Eight major AI vendors now have formal pathways to deploy on US military classified networks. Anthropic remains excluded pending litigation, meaning Claude-based tools are unavailable for DoD classified use in the near term.
Caveats: The supply-chain risk designation for a domestic AI company is without recent precedent; the litigation outcome will determine whether Anthropic regains access. Litigation is active as of May 1, 2026.
Item 2: Microsoft Agent 365 reaches general availability at $15/user/month
Primary source: https://www.microsoft.com/en-us/security/blog/2026/05/01/microsoft-agent-365-now-generally-available-expands-capabilities-and-integrations/ Publication date: 2026-05-01 Author/org: Microsoft Security Blog Tier grade: T2 (descriptive claims about own product) Verification status: verified Methodology: announcement Conflict of interest: vendor self-announcement Category: agent-framework / dev-tools / ai-for-business Recency-adjusted weight: 1.0
What happened: Microsoft’s agent governance platform reached general availability on May 1, priced at $15/user/month standalone or bundled into Microsoft 365 E7. Agent 365 functions as a control plane for observing, governing, and securing AI agents across Microsoft and third-party platforms. New capabilities at GA: discovery and management of local agents (OpenClaw, GitHub Copilot CLI, Claude Code) via Microsoft Defender and Intune; asset context mapping showing agent relationships to devices, MCP servers, identities, and cloud resources; Agent 365 registry sync with AWS Bedrock and Google Cloud in public preview; and a managed Windows 365 for Agents environment.
Numbers / claims:
- Price: $15/user/month → Microsoft primary source
- GA date May 1, 2026 → Microsoft primary source; confirmed by SAMexpert, M365 Admin community, Thurrott
Cross-references:
- https://samexpert.com/agent-365-microsoft-ai-licensing/ (T2 — independent licensing analysis; notes governance gaps)
- https://www.avepoint.com/shifthappens/blog/microsoft-agent-365-agentic-era-governance (T3)
- https://www.thurrott.com/a-i/335594/microsoft-agent-365-platform-goes-out-of-preview-and-adds-support-for-local-ai-agents (T3)
What this means for users: Organizations running Microsoft 365 can now centrally track AI agents — including third-party ones — from a single dashboard. Auto-discovery for third-party agents is not yet available; registration remains manual.
Caveats: SAMexpert notes unresolved licensing questions around per-user vs. shared-resource agent consumption. Governance controls including conditional access policies remain in separate Microsoft tools rather than native to Agent 365. Independent analysts describe the product as directional GA, not a finished enterprise governance layer.
Item 3: Chinese courts establish that AI replacement alone cannot justify worker dismissal
Primary source: https://www.caixinglobal.com/2026-04-30/chinese-courts-rule-companies-cannot-fire-workers-simply-to-replace-them-with-ai-102439602.html Publication date: 2026-04-30 (court ruling date: 2026-04-28) Author/org: Caixin Global / Hangzhou Intermediate People’s Court Tier grade: T2 Verification status: verified Methodology: announcement Conflict of interest: none observed Category: policy-regulation Recency-adjusted weight: 1.0
What happened: The Hangzhou Intermediate People’s Court upheld a ruling on April 28 that terminating an employee because AI automated their work constitutes unlawful dismissal under Chinese labor law. In the primary case, a quality-assurance worker (surnamed Zhou) whose role was automated by LLMs was offered reassignment at a 40% salary reduction; the company then terminated him when he declined. The court held that AI adoption is a company’s strategic business choice — not a qualifying ‘objective major change’ under Chinese Labor Contract Law — and therefore cannot be used to trigger contract termination. A secondary Beijing case involving a manual data-entry worker automated by AI reached the same conclusion. The rulings establish a precedent that the economic cost of automation cannot be shifted unilaterally to workers.
Numbers / claims:
- Zhou’s salary reduction: 25,000 to 15,000 yuan (40%) → Caixin primary source
- Ruling date: April 28, 2026 → Caixin; confirmed by scio.gov.cn, NPR, Dexerto
Cross-references:
- https://www.npr.org/2026/05/01/nx-s1-5807131/tech-worker-china-ai (T2-T3)
- http://english.scio.gov.cn/chinavoices/2026-04/30/content_118471189.html (T2 — Chinese government outlet)
What this means for users: Chinese labor courts have signaled that AI automation of a role does not automatically constitute grounds for dismissal or forced reassignment with reduced pay. Organizations operating in China that plan workforce restructuring around AI adoption face heightened legal risk.
Caveats: Trial-level and appellate rulings, not national legislation. Application may vary by jurisdiction within China. The rulings constrain the grounds for termination, not AI adoption itself.
Item 4: IBM releases Granite 4.1 model family (3B/8B/30B) under Apache 2.0
Primary source: https://research.ibm.com/blog/granite-4-1-ai-foundation-models Publication date: 2026-04-29 Author/org: IBM Research Tier grade: T2 (descriptive claims); T4 (comparative claims — see below) Verification status: verified Methodology: announcement Conflict of interest: vendor self-announcement Category: model-release Recency-adjusted weight: 1.0
What happened: IBM released the Granite 4.1 collection on April 29, comprising dense decoder-only language models in 3B, 8B, and 30B sizes (base and instruct variants), plus Granite Vision 4.1 for document understanding, three Granite Speech 4.1 2B variants, Granite Guardian 4.1, and Granite Embedding Multilingual R2 covering 200+ languages. All models are Apache 2.0 licensed. Context windows extend to 512K tokens.
Numbers / claims:
- Context window: up to 512K tokens → IBM Research primary
- Training: approximately 15 trillion tokens → IBM Research primary
- Speech WER: 5.33% on OpenASR Leaderboard → IBM Research primary (vendor benchmark, no independent context provided)
- 8B instruct matches or outperforms prior Granite 4.0 32B MoE → IBM Research primary (vendor-claimed; no independent benchmark found)
Cross-references: none found as of this run.
What this means for users: Granite 4.1 is an Apache 2.0 enterprise-grade family covering text, vision, speech, and safety in a single release. The open license and 512K context window make it a candidate for on-premise deployments where proprietary licensing is a constraint.
Caveats: The efficiency comparison (8B vs prior 32B MoE) is vendor-reported with no independent benchmark found as of this run.
Item 5: xAI releases Grok 4.3 with 37.5% lower input pricing and improved agentic scores
Primary source: https://artificialanalysis.ai/articles/xai-launches-grok-4-3-with-improved-agentic-performance-and-lower-pricing Publication date: 2026-04-30 Author/org: Artificial Analysis (independent evaluation; xAI blog returned 403 during this run) Tier grade: T2 (Artificial Analysis is an independent benchmarking organization) Verification status: verified Methodology: benchmark Conflict of interest: none observed for Artificial Analysis Category: model-release Recency-adjusted weight: 1.0
What happened: xAI released Grok 4.3 to its API on April 30 at $1.25/million input tokens and $2.50/million output tokens — a 37.5% input price reduction vs Grok 4.20. Artificial Analysis’s Intelligence Index scores Grok 4.3 at 53, behind GPT-5.5 (60) and Claude Opus 4.7 (57). On GDPval-AA agentic evaluation, the model reached ELO 1500, a 321-point improvement over Grok 4.20 (1179). Additional scores: τ²-Bench Telecom 98%, IFBench 81%. The release is positioned as a price-and-speed option for agentic applications rather than a frontier model by benchmark rank.
Numbers / claims:
- Input price $1.25/M tokens → Artificial Analysis primary
- Intelligence Index 53 → Artificial Analysis (independent benchmark)
- GDPval-AA ELO 1500, delta +321 → Artificial Analysis
- τ²-Bench Telecom 98%, IFBench 81% → Artificial Analysis
Cross-references:
- https://www.roborhythms.com/grok-4-3-release-april-2026/ (T3)
- https://venturebeat.com/technology/xai-launches-grok-4-3-at-an-aggressively-low-price-and-a-new-fast-powerful-voice-cloning-suite (T2-T3)
What this means for users: Grok 4.3 is among the most cost-competitive options near the frontier tier for agentic tasks at current pricing. The GDPval-AA improvement is notable for long-sequence simulation, though the model trails Opus 4.7 in coding benchmarks. Suited for teams prioritizing inference cost over raw benchmark rank.
Caveats: The xAI primary blog was inaccessible during this run (403). Benchmarks are from Artificial Analysis (independent), which is appropriate for comparative claims. xAI’s own hallucination-rate claims for prior Grok versions have not been independently replicated; apply similar caution to version 4.3.
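The pricing claims above can be sanity-checked with simple arithmetic: the stated 37.5% reduction implies a Grok 4.20 input price of $2.00/M tokens. Workload volumes below are assumed for illustration only.

```python
# Grok 4.3 API pricing per the Artificial Analysis report
input_per_m = 1.25   # USD per million input tokens
output_per_m = 2.50  # USD per million output tokens

# A 37.5% input reduction implies the prior (Grok 4.20) input price:
prior_input_per_m = input_per_m / (1 - 0.375)  # = 2.00

# Hypothetical monthly agentic workload (assumed volumes)
input_m, output_m = 400, 80  # millions of tokens
cost_grok_43 = input_m * input_per_m + output_m * output_per_m
input_savings_vs_420 = input_m * (prior_input_per_m - input_per_m)
```

Output-token savings are not computed because the prior output price is not stated in the available sources.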
Item 6: Alibaba Qwen releases Qwen-Scope — open sparse autoencoder suite for Qwen model interpretability
Primary source: https://x.com/Alibaba_Qwen/status/2049861145574690992 Publication date: 2026-05-01 Author/org: Alibaba / Qwen Team (official X account) Tier grade: T2 (official vendor announcement account) Verification status: verified Methodology: announcement Conflict of interest: vendor self-announcement Category: dev-tools Recency-adjusted weight: 1.0
What happened: The Qwen team released Qwen-Scope on May 1, an open suite of sparse autoencoders (SAEs) trained on the Qwen3 and Qwen3.5 model families. The release includes 14 SAE weight groups across 7 model variants — five dense models (Qwen3-1.7B, Qwen3-8B, Qwen3.5-2B, Qwen3.5-9B, Qwen3.5-27B) and two MoE models (Qwen3-30B-A3B, Qwen3.5-35B-A3B). Practical applications include output steering by directly manipulating internal learned features without prompt engineering; data classification; suppressing code-switching in multilingual fine-tuning; and controlling repetition loops in long outputs.
Numbers / claims:
- 14 SAE weight groups across 7 model variants → Qwen team X post
- Code-switching ratio reduction exceeding 50% via SASFT method → Qwen team / MarkTechPost (vendor-claimed, methodology disclosed in accompanying research)
Cross-references: none found as of this run.
What this means for users: Qwen-Scope offers a practitioner-accessible interpretability layer for the Qwen model family. Steering model outputs by manipulating internal features — rather than through prompting — may be useful where prompt-based steering is unreliable or cost-prohibitive.
Caveats: Code-switching improvement figures are vendor-reported; no independent replication found as of this run. Applies only to Qwen3/Qwen3.5 variants listed; not transferable to other model families.
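For readers unfamiliar with SAE-based steering, a minimal numerical sketch of the mechanism follows. Weights and dimensions are toy values; this is not the Qwen-Scope API, which the announcement does not document in detail. Real SAEs use thousands of features and are loaded from released checkpoints.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_sae = 8, 32  # toy dimensions

# Hypothetical SAE weights (in practice, loaded from a Qwen-Scope checkpoint)
W_enc = rng.standard_normal((d_model, d_sae))
W_dec = rng.standard_normal((d_sae, d_model))

def sae_encode(x):
    # ReLU encoder produces sparse feature activations
    return np.maximum(x @ W_enc, 0.0)

def sae_decode(f):
    # Linear decoder reconstructs the model activation
    return f @ W_dec

def steer(x, feature_idx, strength):
    """Boost one learned feature, then reconstruct the activation."""
    f = sae_encode(x)
    f[feature_idx] += strength
    return sae_decode(f)

x = rng.standard_normal(d_model)
x_steered = steer(x, feature_idx=3, strength=5.0)

# Because the decoder is linear, the steered activation differs from the
# plain reconstruction by exactly strength * the feature's decoder direction
delta = x_steered - sae_decode(sae_encode(x))
```

The steered activation would then replace the original at the hooked layer during generation; that wiring is model-specific and omitted here.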
Item 7: ChatGPT rolls out Advanced Account Security with passkeys and recovery keys
Primary source: https://releasebot.io/updates/openai/chatgpt Publication date: 2026-04-30 Author/org: OpenAI (openai.com/news returned 403 during this run; release sourced via aggregator) Tier grade: T2 (vendor changelog) Verification status: verified Methodology: changelog Conflict of interest: vendor self-announcement Category: dev-tools Recency-adjusted weight: 1.0
What happened: OpenAI rolled out Advanced Account Security as an opt-in setting for personal ChatGPT accounts on April 30. The feature adds phishing-resistant sign-in options (passkeys, hardware security keys), recovery keys, login notifications, and stricter session controls. OpenAI described the feature as targeted at high-risk users including journalists and public officials.
Numbers / claims:
- Feature availability: personal accounts, opt-in → OpenAI release notes via aggregator
Cross-references:
- https://releasebot.io/updates/openai (T3 aggregator)
What this means for users: Users in high-risk professions or with elevated accounts can now add hardware-key and passkey authentication. Adoption requires opting in; the default security posture is unchanged.
Caveats: OpenAI primary news page was inaccessible during this run (403). Feature scope across enterprise and Teams plans could not be confirmed from available sources.
Item 8: OpenAI Codex 0.128.0 adds persistent goal workflows and expanded plugin controls
Primary source: https://releasebot.io/updates/openai/codex Publication date: 2026-04-30 Author/org: OpenAI (via release notes aggregator) Tier grade: T2 (vendor changelog) Verification status: verified Methodology: changelog Conflict of interest: vendor self-announcement Category: agent-framework / dev-tools Recency-adjusted weight: 1.0
What happened: OpenAI released Codex version 0.128.0 on April 30. Key changes: persisted /goal workflows with app-server APIs and TUI controls (create, pause, resume, clear); richer permission profiles with built-in defaults and sandbox CLI selection; improved plugin support including marketplace installation and remote management; fixes for resume and interruption behavior and Windows sandbox edge cases.
Numbers / claims:
- Version 0.128.0 → OpenAI release notes via aggregator
Cross-references:
- https://releasebot.io/updates/openai (T3 aggregator)
What this means for users: Persistent goal workflows allow Codex agents to maintain stateful objectives across sessions — a step toward longer-horizon agentic coding tasks. Plugin marketplace installation reduces integration friction for third-party tooling.
Caveats: OpenAI primary changelog was inaccessible during this run. Capability depth of persistent goals (maximum duration, context limits) could not be confirmed from aggregator data alone.
Item 9: Research paper finds that fine-tuning frequently degrades model safety properties
Primary source: https://arxiv.org/abs/2604.24902 Publication date: 2026-04-27 (arXiv); featured on Hugging Face Daily Papers: 2026-05-01 Author/org: Emaan Bilal Khan, Amy Winecoff, Miranda Bogen, Dylan Hadfield-Menell Tier grade: T3 (arXiv preprint — not peer-reviewed) Verification status: secondary Methodology: research-paper Conflict of interest: none observed Category: research-papers Recency-adjusted weight: 1.0
What happened: Researchers studied 100 fine-tuned language models deployed in medical and legal domains, using both general-purpose and domain-specific safety benchmarks. The core finding: fine-tuning induces large, heterogeneous, and often contradictory changes in measured safety. Models frequently improved on some safety benchmarks while degrading on others simultaneously. The paper argues that governance practices relying solely on base-model safety evaluations are insufficient for fine-tuned deployments in high-stakes domains.
Numbers / claims:
- 100 models studied → arXiv primary source
- Domains: medical and legal → arXiv primary
- Finding: models improve on some instruments while degrading on others → paraphrased from arXiv abstract
Cross-references:
- https://huggingface.co/papers (featured 2026-05-01) (T3)
What this means for users: Organizations deploying fine-tuned models in regulated or high-stakes settings (healthcare, legal, finance) should not assume the base model’s safety evaluation profile carries through. The paper recommends independent re-evaluation of fine-tuned models before deployment.
Caveats: arXiv preprint — not yet peer-reviewed. Findings are preliminary. The specific models studied are not disclosed in the abstract; full methodology requires reading the paper.
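The paper's core pattern (improvement on some safety instruments alongside simultaneous degradation on others) can be illustrated with a small check over per-benchmark scores. The numbers below are illustrative, not taken from the paper.

```python
# Hypothetical per-benchmark safety scores for one fine-tuned model
# (higher is safer; values are invented for illustration)
base = {"toxicity": 0.92, "jailbreak": 0.88, "med_advice": 0.75}
tuned = {"toxicity": 0.95, "jailbreak": 0.71, "med_advice": 0.83}

deltas = {k: round(tuned[k] - base[k], 2) for k in base}
improved = [k for k, d in deltas.items() if d > 0]
degraded = [k for k, d in deltas.items() if d < 0]

# Mixed movement is the pattern the paper reports as common: a single
# aggregate safety score would hide the jailbreak regression here
mixed = bool(improved) and bool(degraded)
```

This is why the paper's recommendation is per-instrument re-evaluation of fine-tuned models, not reliance on the base model's aggregate profile.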
Item 10: Claw-Eval-Live benchmark — leading AI agents complete only 66.7% of real-world workflow tasks
Primary source: https://arxiv.org/abs/2604.28139 Publication date: 2026-04-30 (arXiv); featured on Hugging Face Daily Papers: 2026-05-01 Author/org: Chenxin Li, Zhengyang Tang et al. Tier grade: T3 (arXiv preprint — not peer-reviewed) Verification status: secondary Methodology: benchmark Conflict of interest: none observed Category: research-papers Recency-adjusted weight: 1.0
What happened: Claw-Eval-Live is a live agent benchmark covering 105 tasks across business services and local workspace repair, designed to track real-world workflow completion rather than static capability evals. Thirteen frontier models were tested. The leading model passed 66.7% of tasks; no model reached 70%. Persistent failure areas include HR management, multi-system business workflows, and complex cross-tool coordination. Local workspace repair tasks were comparatively easier but remained unsaturated.
Numbers / claims:
- Top score 66.7% → arXiv primary
- 13 frontier models tested → arXiv primary
- 105 tasks across business services and workspace repair → arXiv primary
- No model passes 70% → arXiv primary
Cross-references:
- https://huggingface.co/papers (featured 2026-05-01) (T3)
What this means for users: Even the best-performing frontier models today fail approximately 1 in 3 real-world end-to-end workflow tasks. HR and multi-system workflows remain the weakest areas. The live benchmark design will update as workflows evolve, limiting saturation risk.
Caveats: arXiv preprint — not yet peer-reviewed. Model names are not identified in this summary; check the paper for per-model breakdowns. Live benchmarks carry validity concerns if task distribution is not transparently documented.
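The headline figures above reduce to simple arithmetic over the 105-task suite:

```python
tasks = 105
top_pass_rate = 0.667  # leading model, per the paper

passed = round(tasks * top_pass_rate)  # ~70 tasks
failed = tasks - passed                # ~35 tasks
fail_ratio = failed / tasks            # roughly 1 in 3 workflows fail end to end
```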
Item 11: Moonshot AI open-sources FlashKDA CUDA kernels for Kimi Delta Attention
Primary source: https://github.com/MoonshotAI/FlashKDA Publication date: 2026-04-30 Author/org: Moonshot AI Tier grade: T2 (vendor open-source GitHub release) Verification status: verified Methodology: changelog Conflict of interest: vendor release of own kernel Category: dev-tools Recency-adjusted weight: 1.0
What happened: Moonshot AI released FlashKDA under an MIT license on April 30 — a CUTLASS-based CUDA kernel implementation of Kimi Delta Attention (KDA) that serves as a drop-in backend for the flash-linear-attention library. The kernels are auto-dispatched via flash-linear-attention’s chunk_kda operation, so codebases already using flash-linear-attention gain the speedup without manual wiring. Hardware requirements: SM90+ GPU (NVIDIA H100 class), CUDA 12.9+, PyTorch 2.4+.
Numbers / claims:
- Fixed-length prefill speedup: 1.72x over flash-linear-attention baseline on H20 GPUs → vendor benchmark (H20 hardware, vendor-measured)
- Variable-length peak speedup: 2.22x → vendor benchmark (H20 hardware)
- K = V = 128 dimension requirement → GitHub primary
Cross-references: none found as of this run.
What this means for users: Teams running Kimi Linear or other KDA-based models on H100-class hardware can adopt FlashKDA as a low-effort performance upgrade via the MIT license. Hardware requirement (SM90+) limits applicability to the current NVIDIA GPU generation.
Caveats: Speedup benchmarks are vendor-measured on NVIDIA H20 hardware. Real-world gains will vary by workload and GPU variant. No independent replication found at time of publication.
Conflicts Surfaced
GPT-5.5 benchmark signals (context — outside 24h window): GPT-5.5 (released April 23, outside this window) shows conflicting signals in secondary reporting: the-decoder.com reports elevated hallucination rates relative to competing models; Tom’s Guide testing placed GPT-5.5 behind Claude Opus 4.7 in 7 of 7 test categories; OpenAI’s own announcement emphasized frontier benchmark ranks. The conflict is noted for context; GPT-5.5 was excluded due to publication date.
Grok 4.3 agentic vs. frontier positioning: Artificial Analysis Intelligence Index places Grok 4.3 at 53, behind GPT-5.5 (60) and Claude Opus 4.7 (57). xAI’s GDPval-AA agentic improvement claim (+321 ELO) is consistent across secondary outlets but could not be confirmed against the primary xAI blog (403 during this run). The comparative agentic claim is therefore treated at T3 weight specifically; the pricing and Intelligence Index figures remain T2 from Artificial Analysis.
Limitations
- Sources unreachable: xAI blog x.ai/news (403), OpenAI News openai.com/news (403), grok.com/release-notes (403). OpenAI and Grok items relied on a third-party aggregator (releasebot.io); items sourced this way are treated as T2 for descriptive claims, while the xAI agentic-improvement claim specifically is held at T3.
- Login-walled coverage gap: This run did not access X timelines, Instagram, LinkedIn, or Discord. Items on these channels may have been missed if not also indexed via public search.
- mcp-ecosystem: No new MCP specification releases or reference implementation releases were found in the coverage window. The most recent MCP roadmap update found was from March 2026. This category had zero items this period.
- SEA/Indonesia coverage gap: One explicit exploratory search for Indonesian and SEA-region AI news found no in-window items. Sources remain US/EU-heavy.
- Expanded effective window: A strict 24h window ending 2026-05-02 yielded only three confirmed items (Pentagon, Agent 365, Qwen-Scope). The window was therefore expanded back to April 29 to include items still under active coverage as of May 1. All April 29-30 items were verified as published within those dates and had not appeared in any prior bulletin.
- arXiv items need upgrade: Items 9 and 10 are preprints at T3. Both would be upgraded to T1-T2 upon peer-review acceptance. Check ICLR/NeurIPS proceedings in coming months.
- Vendor benchmark claims not independently verified: IBM Granite 4.1’s efficiency comparison (8B vs prior 32B MoE) and Moonshot AI FlashKDA speedup figures (1.72x-2.22x) are vendor-measured with no independent replication found as of this run.
Dropped
| Title considered | Source | Reason for drop |
|---|---|---|
| GPT-5.5 launch | openai.com/index/introducing-gpt-5-5/ | Published 2026-04-23 — outside 24h window |
| NVIDIA Nemotron 3 Nano Omni release | blogs.nvidia.com | Published 2026-04-28 — outside 24h window |
| Moonshot AI Kimi K2.6 release | marktechpost.com | Published 2026-04-20 — outside 24h window |
| Anthropic Google/Broadcom compute partnership | anthropic.com/news/google-broadcom-partnership-compute | Published 2026-04-06 — outside 24h window |
| Microsoft Agent Framework 1.0 GA | devblogs.microsoft.com/agent-framework | Published 2026-04-03 — outside 24h window |
| Intel OpenVINO 2026.1 release | github.com/openvinotoolkit/openvino | Published 2026-04-08 — outside 24h window |
| Google Gemini 3.1 Pro launch | blog.google | Published March 2026 — outside 24h window |
| AMI Labs $1.03B seed round (Yann LeCun) | techcrunch.com | Published 2026-03-09 — outside 24h window |
| DeepSeek V4 API | deepseek.com | Could not confirm primary source URL or exact publication date within window |
| Dataiku Kiji Privacy Proxy | solutionsreview.com | Source returned 403; primary source and date unverifiable |
| US DoL AI Apprenticeship Innovation Portal | dol.gov | Could not confirm exact launch date within window |
Search Log
Q: "AI model release announcement May 2026" → 10 results, 3 high-relevance
Q: "OpenAI Anthropic Google AI announcement May 2 2026" → 10 results, 4 high-relevance
Q: "MCP model context protocol update May 2026" → 9 results, 1 high-relevance
Q: "AI agent framework release launch May 1 2 2026" → 9 results, 2 high-relevance
Q: "new LLM model released May 1 2026" → 10 results, 2 high-relevance
Q: "Microsoft Agent 365 launch May 1 2026" → 9 results, 5 high-relevance
Q: "AMI Labs Yann LeCun funding launch 2026" → 9 results, 0 in-window
Q: "AI startup launch funding announcement May 1 2 2026" → 10 results, 1 high-relevance
Q: "AI developer tools release update May 2026" → 10 results, 2 high-relevance
Q: "Grok 4.3 xAI release April 30 2026" → 10 results, 5 high-relevance
Q: "AI policy regulation announcement May 1 2026" → 10 results, 1 high-relevance
Q: "NVIDIA Nemotron 3 Nano Omni model release date 2026" → 9 results, 0 in-window
Q: "Pentagon AI deals 8 companies May 2026 Anthropic excluded" → 10 results, 6 high-relevance
Q: "China court AI employee replacement ruling May 2026" → 10 results, 5 high-relevance
Q: "Qwen-Scope sparse autoencoders release May 2026" → 9 results, 4 high-relevance
Q: "OpenVINO 2026.1 release date May 2026" → 10 results, 0 in-window
Q: "IBM Granite 4.1 release May 2026 AI" → 9 results, 4 high-relevance
Q: "GitHub trending AI repository May 2026" → 10 results, 1 high-relevance
Q: "OpenAI API update May 1 2026" → 9 results, 3 high-relevance
Q: "AI news announcement release May 1 2026" → 10 results, 4 high-relevance
Q: "adversarial AI safety criticism controversy May 1 2026" → 10 results, 1 high-relevance
Q: "Moonshot AI FlashKDA release May 2026" → 9 results, 4 high-relevance
Q: "Grok 4.3 criticism limitations hallucination April 2026" → 10 results, 2 high-relevance
Q: "GPT-5.5 benchmark debunked controversy criticism 2026" → 9 results, 2 high-relevance
Q: "AI Indonesia OR startup AI Asia announcement May 2026" → 10 results, 0 in-window
Total searches: 25 (of which approximately 9 were exploratory or adversarial = 36%)
Suggested Next Runs
- Anthropic Pentagon litigation: Active in a California federal court. Outcome affects all eight covered companies’ competitive position for DoD contracts.
- GPT-5.5 independent benchmarks: Contradictory signals on hallucination rates and coding performance warrant a focused follow-up once independent replications are published.
- Safety Drift paper (arXiv 2604.24902) and Claw-Eval-Live (arXiv 2604.28139): Both preprints at T3. Check for peer-review status and conference submissions in coming months to upgrade verification tier.
- mcp-ecosystem quiet period: No new MCP releases found this window. Check modelcontextprotocol.io and GitHub releases next run; stateless HTTP transport variant was under review as of the March 2026 roadmap.