AI Radar — 09 May 2026
OpenAI adds voice reasoning and live translation to its Realtime API; Google ships Gemini 3.1 Flash-Lite to general availability; Coder launches self-hosted agent infrastructure for enterprise teams; Snyk embeds Claude for AI-native security; Harvey open-sources an agent-native legal benchmark.
Run: 06–09 May 2026 · 28 items reviewed → 5 published · 3 verified · 2 secondary · 0 rumor · 42% exploration · Run timestamp: 2026-05-09
TL;DR
- OpenAI GPT-Realtime-2 — three new voice API models bring GPT-5-class reasoning, 70-language live translation, and streaming transcription to developer voice agents. (→ OpenAI GPT-Realtime-2)
- Coder Agents beta — self-hosted, model-agnostic agent platform lets enterprise teams run AI coding workflows on their own infrastructure; free beta through September. (→ Coder Agents)
- Gemini 3.1 Flash-Lite GA — Google’s fastest Gemini 3 model reaches general availability, priced at $0.25/1M input tokens with sub-second tool-call latency in production deployments. (→ Gemini 3.1 Flash-Lite)
- Snyk + Claude — Snyk embeds Anthropic’s Claude into its AI Security Platform for automated vulnerability discovery across code, containers, and AI-generated artifacts. (→ Snyk embeds Claude)
- Harvey Legal Agent Benchmark — open-source benchmark with 1,200+ agent tasks across 24 legal practice areas, evaluated against 75,000+ expert rubric criteria; no leaderboard yet. (→ Harvey LAB)
Items
OpenAI ships three voice API models with GPT-5-class reasoning, live translation, and streaming transcription
Source: https://openai.com/index/advancing-voice-intelligence-with-new-models-in-the-api/ · OpenAI · 2026-05-07
Verification: T2 secondary · announcement · dev-tools / agent-framework
Tier nuance: Primary URL returned HTTP 403 during this run. TechCrunch (T2) and The Next Web (T2) independently confirmed the announcement date and full pricing.
OpenAI released three new models in its Realtime API on 7 May 2026. GPT-Realtime-2 brings GPT-5-class reasoning to live voice interactions, with a 128K context window (up from 32K), parallel tool calls, and adjustable reasoning-effort levels from low through xhigh. GPT-Realtime-Translate supports real-time speech translation across 70+ input languages into 13 output languages while keeping pace with the speaker. GPT-Realtime-Whisper provides streaming speech-to-text transcription as the speaker talks, priced at $0.017/minute.
Why it matters for automation/productivity: Customer service, IVR, and real-time interpretation workflows can now access a voice model that reasons mid-conversation, calls tools in parallel, and recovers from task failures — replacing the need to chain separate transcription, reasoning, and response models.
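A minimal sketch of how such a session might be configured. The model name comes from the announcement; the WebSocket endpoint and session field names (`reasoning_effort`, `parallel_tool_calls`) are assumptions carried over from OpenAI's existing Realtime API conventions and should be checked against current documentation before use:

```python
# Hypothetical session configuration for a voice agent on the new models.
# Endpoint and field names are assumed from the existing Realtime API;
# only the model name and effort levels come from the announcement.
REALTIME_URL = "wss://api.openai.com/v1/realtime?model=gpt-realtime-2"

EFFORT_LEVELS = ("low", "medium", "high", "xhigh")  # low is the default

def build_session_update(instructions: str,
                         reasoning_effort: str = "low",
                         parallel_tool_calls: bool = True) -> dict:
    """Construct a session.update event payload for a live voice agent."""
    if reasoning_effort not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort level: {reasoning_effort}")
    return {
        "type": "session.update",
        "session": {
            "instructions": instructions,
            "reasoning_effort": reasoning_effort,        # assumed field name
            "parallel_tool_calls": parallel_tool_calls,  # assumed field name
        },
    }

payload = build_session_update("Triage support calls; escalate billing issues.")
```

Note that the announcement's benchmark numbers were obtained at xhigh effort, so a production session left at the default low setting should not be expected to match them.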
Key claims:
- GPT-Realtime-2: $32/1M audio input tokens, $64/1M audio output tokens → vendor-claimed (TechCrunch T2 corroborating)
- GPT-Realtime-Translate: $0.034/minute → vendor-claimed
- GPT-Realtime-Whisper: $0.017/minute → vendor-claimed
- 128K context window, up from 32K → secondary confirmation (multiple outlets)
- 15.2% improvement on Big Bench Audio benchmark → vendor-claimed, run at xhigh reasoning effort
- Zillow: 69% → 95% call-success rate improvement → vendor-reported case study, not independently verified
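Taking the vendor-claimed per-minute prices above at face value, a quick back-of-envelope estimate of monthly audio spend (prices unverified; check the current pricing page before budgeting):

```python
# Monthly cost estimate from the vendor-claimed per-minute prices above.
TRANSLATE_PER_MIN = 0.034  # GPT-Realtime-Translate, $/minute (vendor-claimed)
WHISPER_PER_MIN = 0.017    # GPT-Realtime-Whisper, $/minute (vendor-claimed)

def monthly_audio_cost(minutes_per_day: float, days: int = 30,
                       rate_per_min: float = TRANSLATE_PER_MIN) -> float:
    """Estimated monthly spend in dollars for a given daily audio volume."""
    return round(minutes_per_day * days * rate_per_min, 2)

# e.g. a support line handling 500 translated minutes/day:
# 500 * 30 * 0.034 = $510/month
cost = monthly_audio_cost(500)
```

GPT-Realtime-2 itself is token-priced ($32/1M audio input, $64/1M audio output), so its cost depends on the tokens-per-minute rate of the audio codec, which the coverage here does not specify.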
Cross-references:
- https://techcrunch.com/2026/05/07/openai-launches-new-voice-intelligence-features-in-its-api/ (T2, corroborating)
- https://thenextweb.com/news/openai-gpt-realtime-2-voice-models (T2, corroborating)
- https://9to5mac.com/2026/05/07/openai-has-new-voice-models-that-reason-translate-and-transcribe-as-you-speak/ (T3, corroborating)
Caveats: Realtime API is not HIPAA-eligible under the OpenAI Business Associate Agreement as of this run; Azure OpenAI text endpoints are eligible but the audio modality is not. Default reasoning effort is low; the Big Bench Audio benchmark was run at xhigh — production performance at default settings will differ. The Zillow call-success figure is a vendor-reported case study. Session drift occurs in long conversations; context trimming is required.
Coder launches self-hosted, model-agnostic enterprise agent platform in beta
Source: https://coder.com/blog/introducing-coder-agents · Coder · 2026-05-06
Verification: T2 verified · announcement · dev-tools / agent-framework
On 6 May 2026, Coder released Coder Agents, a beta platform for running AI coding workflows on self-hosted infrastructure. The platform supports any major AI model provider (Anthropic, OpenAI, Google, and AWS Bedrock, plus OpenAI-compatible self-hosted models) without sending source code or prompts to external services. It exposes a conversational interface and API, centralized model and prompt-policy management, MCP-based tool integrations, sub-agent orchestration, and network-isolated workspace provisioning. Full premium features are available at no cost through September 2026; post-beta pricing has not been announced.
Why it matters for automation/productivity: Organizations with strict code-governance or data-residency requirements can run coding agents entirely within their network perimeter. The MCP and sub-agent architecture allows existing internal tooling to connect to the agent workflow without custom integration code.
Key claims:
- Model-agnostic: Anthropic, OpenAI, Google, AWS Bedrock, OpenAI-compatible self-hosted → coder.com primary (T2)
- Free beta through September 2026 → coder.com primary (T2)
- 70% of companies deploy agents on infrastructure not designed for them → Coder-cited internal research (methodology not published — T4)
Cross-references:
- https://www.globenewswire.com/news-release/2026/05/06/3288916/0/en/Coder-Sets-a-New-Standard-for-AI-Coding-with-Self-Hosted-AI-Model-Agnostic-Coder-Agents.html (T2, corroborating)
- https://sdtimes.com/ai/may-8-2026-ai-updates-from-the-past-week-coder-agents-launch-snyk-claude-partnership-opsera-cursor-partnership-and-more/ (T3, corroborating)
Caveats: Beta — API may have breaking changes before GA. The 70% statistic on agent infrastructure mismatch is from Coder’s own unpublished research. Pricing after September 2026 not announced.
Google ships Gemini 3.1 Flash-Lite to general availability
Source: https://cloud.google.com/blog/products/ai-machine-learning/gemini-3-1-flash-lite-is-now-generally-available · Google Cloud · 2026-05-08
Verification: T2 verified · announcement · model-release
Google moved Gemini 3.1 Flash-Lite to general availability on 8 May 2026 via the Gemini API and Vertex AI, two months after its March 2026 preview. The model targets high-volume, latency-sensitive tasks: tool calling, content moderation, and orchestration steps in agentic pipelines. According to its preview announcement, it is priced at $0.25/1M input tokens and $1.50/1M output tokens, with 2.5× faster time-to-first-token (TTFT) and 45% higher output speed compared to Gemini 2.5 Flash. Gladly, a customer service platform, reported approximately 60% lower costs versus thinking models and a p95 classifier-call latency under one second in production.
Why it matters for automation/productivity: At $0.25/1M input tokens, Flash-Lite is among the lower-cost options for high-volume steps in agentic pipelines — classification, routing, and lightweight orchestration where a full reasoning model would be cost-prohibitive.
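For a concrete sense of scale, a small calculator using the preview pricing cited above (vendor-claimed; verify against the current Google Cloud pricing page, since the GA post did not republish pricing):

```python
# Cost per 1,000 calls at the March 2026 preview pricing cited above
# ($0.25/1M input, $1.50/1M output tokens -- vendor-claimed, unverified at GA).
INPUT_PER_M = 0.25
OUTPUT_PER_M = 1.50

def cost_per_1k_calls(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of 1,000 calls with the given per-call token counts."""
    per_call = (input_tokens * INPUT_PER_M
                + output_tokens * OUTPUT_PER_M) / 1_000_000
    return round(per_call * 1_000, 4)

# A typical routing step: 800 input tokens in, a 20-token label out.
routing_cost = cost_per_1k_calls(800, 20)
```

At roughly a quarter-dollar per thousand routing calls, the classification and orchestration steps of a pipeline can run at a small fraction of what a full reasoning model would cost per call.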
Key claims:
- $0.25/1M input, $1.50/1M output → vendor-claimed (from March 2026 preview blog; GA pricing not confirmed in Cloud blog — verify before production use)
- 2.5× faster TTFT vs Gemini 2.5 Flash → vendor-claimed
- 45% faster output generation vs Gemini 2.5 Flash → vendor-claimed
- Gladly: ~60% lower cost vs thinking models, p95 classifier latency <1s → vendor-reported case study
Cross-references:
- https://www.testingcatalog.com/google-launches-gemini-3-1-flash-lite-in-general-availability/ (T3, corroborating — confirmed May 8 GA date)
- https://artificialanalysis.ai/models/gemini-3-1-flash-lite-preview (T2, independent benchmark — FACTS score 40.6%, TTFT 5.46s)
Caveats: Pricing cited from the March 2026 preview announcement; the GA blog post did not republish pricing — verify against current Google Cloud pricing page before committing to production. Independent FACTS benchmark score: 40.6% for Flash-Lite vs 50.4% for Gemini 3.0 Flash Dynamic — the model trades factual grounding for speed. Independent TTFT measured at 5.46 seconds (Artificial Analysis), above the median for its price tier at 2.02 seconds. Gladly case-study figures are vendor-reported.
Snyk embeds Claude into AI Security Platform for vulnerability discovery across code and AI artifacts
Source: https://www.helpnetsecurity.com/2026/05/08/snyk-ai-security-platform/ · Help Net Security (reporting on Snyk press release) · 2026-05-08
Verification: T2 secondary · announcement · dev-tools / ai-for-business
Tier nuance: Snyk press release issued via Globenewswire; Help Net Security and Yahoo Finance both published May 8 coverage. Direct snyk.io primary not retrieved in this run.
Snyk announced on 8 May 2026 that it has embedded Anthropic’s Claude models across its AI Security Platform to power automated vulnerability discovery, prioritization, and developer-ready fix generation. Coverage extends across code, open-source dependencies, containers, and AI-generated artifacts. Evo by Snyk uses Claude to continuously discover AI assets — models, agents, MCP servers, datasets, and third-party tools — across an organization’s environment, enabling AI governance workflows alongside traditional SAST and SCA scanning. The integration is available to current joint Snyk-Anthropic customers; broader rollout continues through 2026.
Why it matters for automation/productivity: Development teams using Snyk alongside Claude Code or other AI coding assistants can now surface and remediate security issues within the same AI-assisted development loop. The MCP server discovery capability within Evo is particularly relevant for teams building or adopting MCP-connected tooling, where AI-generated artifact security is a known gap.
Key claims:
- Coverage: code, dependencies, containers, AI-generated artifacts → Snyk press release (T2 secondary)
- Evo discovers models, agents, MCP servers, datasets, third-party tools → Snyk press release (T2 secondary)
- Available to joint customers today; broader rollout through 2026 → press release (T2 secondary)
Cross-references:
- https://finance.yahoo.com/sectors/technology/articles/snyk-embeds-anthropics-claude-advance-174900036.html (T3, corroborating)
- https://sdtimes.com/ai/may-8-2026-ai-updates-from-the-past-week-coder-agents-launch-snyk-claude-partnership-opsera-cursor-partnership-and-more/ (T3, corroborating)
Caveats: No independent benchmark comparing Snyk+Claude detection rates against other AI-powered security tools was available at launch. The current-customers-only launch means new customers may face enterprise sales cycles before access. Direct snyk.io primary not fetched — verification relies on press release distribution and trade press corroboration.
Harvey open-sources Legal Agent Benchmark with 1,200+ agent tasks across 24 practice areas
Source: https://www.harvey.ai/blog/introducing-harveys-legal-agent-benchmark · Harvey · 2026-05-06
Verification: T2 verified · announcement · research-papers / ai-for-business
COI: Harvey is both the benchmark creator and a commercial legal AI vendor; potential bias in task selection and rubric design.
Harvey released the Legal Agent Benchmark (LAB) on 6 May 2026, in partnership with Artificial Analysis. LAB is an open-source benchmark on GitHub containing 1,200+ agent tasks spanning 24 legal practice areas — transactional, advisory, regulatory, and litigation — with 75,000+ expert-written rubric criteria. Each task mirrors law firm associate work: a short instruction averaging 50 words, a file environment mixing relevant and peripheral documents, and a required legal deliverable. Evaluation uses all-pass grading, where partial task completion scores zero. The initial release does not include a leaderboard; baseline results and submission standards are forthcoming.
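The all-pass grading rule is easy to state precisely. A hypothetical sketch (the boolean criterion representation is an assumption; LAB's actual evaluation harness may differ):

```python
# All-pass grading: a task earns credit only if every rubric criterion
# passes; a single failed criterion zeroes the whole task.
def grade_task(criterion_results: list[bool]) -> int:
    """Return 1 if all criteria pass, else 0. No partial credit."""
    return int(all(criterion_results)) if criterion_results else 0

def benchmark_score(tasks: list[list[bool]]) -> float:
    """Fraction of tasks fully passed across the benchmark."""
    return sum(grade_task(t) for t in tasks) / len(tasks)

# Three tasks: fully passed, one criterion missed, fully passed.
score = benchmark_score([[True, True], [True, False, True], [True]])
```

This scoring regime is deliberately punishing: with an average of dozens of rubric criteria per task, near-miss deliverables score the same as complete failures, which tends to compress headline scores relative to partial-credit benchmarks.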
Why it matters for automation/productivity: Legal teams evaluating AI agents for document drafting, contract review, or regulatory analysis now have a domain-specific benchmark for comparing models on tasks that reflect actual law firm conditions rather than generic reasoning tests. For AI vendors pursuing legal sector sales, LAB provides a credible evaluation surface beyond general benchmarks.
Key claims:
- 1,200+ agent tasks → harvey.ai primary (T2)
- 24 legal practice areas → harvey.ai primary (T2)
- 75,000+ expert rubric criteria → harvey.ai primary (T2)
- Open-source on GitHub: https://github.com/harveyai/harvey-labs → confirmed
Cross-references:
- https://www.artificiallawyer.com/2026/05/06/harvey-launches-legal-agent-bench/ (T2, corroborating)
- https://x.com/ArtificialAnlys/status/2052145762650431840 (T3, discovery — Artificial Analysis partnership announcement)
Caveats: No leaderboard published at launch; baseline results and submission standards described as forthcoming. Harvey is a commercial legal AI vendor, representing a conflict of interest in benchmark design and rubric selection. Benchmark methodology was not independently peer-reviewed at launch.
Dropped
Items considered but not published, with reason:
| Title considered | Source | Reason |
|---|---|---|
| Meta Muse Spark LLM launch | ai.meta.com · 2026-04-08 | Outside window — April 8 |
| Anthropic Project Glasswing / Claude Mythos | anthropic.com · 2026-04-07 | Outside window — April 7 |
| Microsoft Agent Framework 1.0 GA | devblogs.microsoft.com · 2026-04-03 | Outside window — April 3 |
| Microsoft AI Diffusion 2026 Survey | blogs.microsoft.com · 2026-05-07 | Already covered in 2026-05-08 radar |
| AWS MCP Server GA | aws.amazon.com · 2026-05-06 | Already covered in 2026-05-08 radar |
| NIST CAISI classified AI evaluation | nextgov.com · 2026-05-05 | Already covered in 2026-05-08 radar |
| OpenAI ChatGPT Ads Manager self-serve | openai.com · 2026-05-05 | Already covered in 2026-05-08 radar |
| Claude Managed Agents (Dreaming, Outcomes, Orchestration) | anthropic.com · 2026-05-06 | Already covered in 2026-05-07 radar |
| Anthropic / SpaceX Colossus compute deal | anthropic.com · 2026-05-06 | Already covered in 2026-05-07 radar |
| Anthropic Claude usage limit expansion | anthropic.com · 2026-05-06 | Same blog as SpaceX deal; covered in 2026-05-07 radar |
| MCP 2026 Roadmap | blog.modelcontextprotocol.io · 2026-03-09 | Outside window — March 9 |
| OpenAI GPT-5.5 Instant new ChatGPT default | openai.com · 2026-05-05 | Already covered in 2026-05-07 radar |
| xAI Grok 4.3 API GA | x.ai · API rollout 2026-04-30 | API rollout complete April 30 (before window); May 6 user email was notification of already-shipped change |
| Google Gemini 3.1 Ultra | date unconfirmed | No primary source with confirmed May 6–9 publication date found; excluded pending confirmation |
| Jama Connect MCP Server | globenewswire.com · 2026-05-04 | Outside window — May 4 |
| Opsera-Cursor DevSecOps Agents partnership | sdtimes.com · 2026-05-08 | Low standalone BD signal; no primary from opsera.io; adjacent to Coder Agents item |
| Precisely Data Integrity Suite MCP APIs | sdtimes.com · 2026-05-08 | Enterprise data tool; low BD signal for general audience; no primary from precisely.com |
| Hugging Face WAVe 1B multimodal model | huggingface.co · 2026-05-05 | Outside window — May 5 |
| Microsoft Azure App Service + MAF integration blog | techcommunity.microsoft.com · 2026-05-08 | Original MAF shipped April 3; blog is secondary integration guidance, not a new product |
| Scout AI $100M Series A raise | crunchbase.com · May 2026 | Funding news; no primary from scoutai.com; low actionability |
| Rhoda AI FutureVision launch | crescendo.ai roundup · undated | In-window date unconfirmed; no primary source found |
| Harvey $11B valuation funding round | harvey.ai · 2026-05-05 | Outside window — May 5; covered separately from LAB; funding news, low actionability |
| Snyk Studio (Claude Code security integration) | snyk.io · 2026-02-23 | Outside window — February 2026; distinct from May 8 partnership announcement |
Limitations
- Sources unreachable: openai.com/index/advancing-voice-intelligence-with-new-models-in-the-api/ returned HTTP 403 during this run. The OpenAI voice models item relies on secondary confirmation from TechCrunch and The Next Web (both T2), with verification status downgraded to secondary accordingly. The snyk.io primary blog post for the May 8 announcement was not retrieved; item relies on Help Net Security trade press coverage and Globenewswire press release distribution.
- Gemini 3.1 Flash-Lite GA pricing gap: The Google Cloud blog post announcing GA on May 8 did not republish pricing details. Figures cited ($0.25/$1.50 per million tokens) come from the March 2026 preview announcement on blog.google and may not reflect GA pricing. Readers should verify against the current Google Cloud pricing page before production use.
- No primary for Gemini 3.1 Ultra: Search results referenced a Gemini 3.1 Ultra launch, but no primary source with a confirmed May 6–9 publication date was found. Excluded pending confirmation.
- Harvey LAB conflict of interest: Harvey is a commercial legal AI vendor and the sole author of LAB’s task definitions, rubric criteria, and evaluation methodology. The benchmark was not independently peer-reviewed at launch. Claims about task representativeness and rubric quality should be treated as vendor-asserted until independent replication.
- Login-walled coverage: X timelines, Instagram, LinkedIn private feeds, and Discord were not accessed directly. Public X posts indexed by search engines were captured. Social discovery searches yielded no high-signal in-window items beyond what was found via primary sources.
- Window overlap with May 8 bulletin: The 72h window (May 6–9) overlaps substantially with the May 8 bulletin’s window (May 5–8). Eight items from the overlap period were already covered in prior bulletins and appear in the Dropped section rather than being republished.
- Geographic bias: An Indonesian/SEA-language search (AI Indonesia startup model Mei 2026) returned no in-window local AI product launches from Indonesian or SEA-based vendors. Coverage this run is US/EU-dominated. This structural gap persists across runs.
- Items requiring upgrade: The OpenAI voice models item (T2 secondary, verification 403) would upgrade to T2 verified once the primary openai.com post is accessible. No rumor-status items this run.
- No MCP-ecosystem or policy-regulation items: Searches against MCP aggregators and policy sources yielded no in-window items not already covered in prior bulletins.
Search log (compact)
| Query | Yield | Type |
|---|---|---|
| Anthropic Claude announcement May 2026 | 10 results, 5 high-rel | registry |
| OpenAI GPT release announcement May 2026 | 10 results, 6 high-rel | registry |
| Google DeepMind Gemini AI announcement May 2026 | 10 results, 4 high-rel | registry |
| OpenAI GPT-Realtime voice models API release May 7 2026 | 10 results, 8 high-rel | registry |
| MCP Model Context Protocol new server release May 2026 | 10 results, 3 high-rel | registry |
| fetch: openai.com advancing-voice-intelligence | HTTP 403 | registry |
| fetch: techcommunity.microsoft.com MAF 1.0 | partial content only | registry |
| AI agent framework launch May 8 9 2026 | 10 results, 3 high-rel | exploratory |
| AI dev tools coding assistant update Cursor Claude Code May 2026 | 10 results, 4 high-rel | exploratory |
| fetch: sdtimes.com May 8 AI updates | confirmed Coder Agents, Snyk | exploratory |
| fetch: blog.modelcontextprotocol.io 2026-mcp-roadmap | March 9 — outside window | registry |
| new AI model release May 8 9 2026 | 10 results, 3 high-rel | exploratory |
| Meta Muse Spark LLM launch date May 2026 | confirmed April 8 — outside window | exploratory |
| Anthropic Project Glasswing Claude Mythos launch date May 2026 | confirmed April 7 — outside window | registry |
| fetch: llm-stats.com/llm-updates | Grok 4.3 API rollout confirmed April 30 | exploratory |
| fetch: thenextweb.com openai-gpt-realtime-2-voice-models | T2 confirmed May 8 date | registry |
| Gemini 3.1 Flash-Lite release date Google May 2026 | confirmed GA May 7–8 | registry |
| fetch: cloud.google.com/blog gemini-3-1-flash-lite GA | T2 confirmed May 8 | registry |
| fetch: blog.google/innovation-and-ai gemini-3-1-flash-lite | March 3 preview — pricing confirmed | registry |
| Coder Agents beta launch enterprise self-hosted May 8 2026 | confirmed May 6 via globenewswire | exploratory |
| AI announcement OR AI launch May 9 2026 | 10 results, 2 high-rel | exploratory |
| fetch: anthropic.com/news | SpaceX / rate limits May 6 (covered) | registry |
| AI Indonesia startup model rilis Mei 2026 | 10 results, 0 in-window local launches | exploratory — cross-language |
| OpenAI realtime voice GPT limitations criticism May 2026 | HIPAA gap, default reasoning confirmed | adversarial |
| Gemini 3.1 Flash-Lite independent benchmark review criticism May 2026 | FACTS 40.6%, TTFT 5.46s | adversarial |
| Coder Agents self-hosted enterprise limitation May 2026 | beta API instability confirmed | adversarial |
| fetch: techcrunch.com OpenAI voice May 7 2026 | T2 confirmed May 7 date and pricing | registry |
| Snyk Claude Anthropic integration partnership launch date May 2026 | confirmed May 8 | exploratory |
| Harvey Legal Agent Benchmark Artificial Analysis launch date 2026 | confirmed May 6 | exploratory |
| fetch: harvey.ai/blog introducing-harveys-legal-agent-benchmark | T2 confirmed May 6 | exploratory |
| fetch: snyk.io/articles anthropic-launches-claude-code-security | February 2026 — wrong article | exploratory |
| xAI Grok 4.3 release date exact May 2026 | API rollout April 30; email May 6 | registry |
| AI productivity tools launch release May 8 9 2026 | 10 results, 2 high-rel | exploratory |
| fetch: crescendo.ai/news latest-ai-news-and-updates | May 4 cutoff — no May 6–9 items | exploratory |
| GitHub trending AI repositories week May 2026 | discovery only — no primary items | exploratory |
| AI startup funding launch product May 8 9 2026 | 10 results, 2 high-rel | exploratory |
| Product Hunt AI launch week May 8 2026 | 10 results, 1 high-rel | exploratory |
| xAI Grok 4.3 API pricing capabilities agentic May 2026 | rollout April 30 confirmed | registry |
Total searches: 38, of which 16 exploratory or adversarial (42%).
Suggested next runs
- OpenAI Realtime API HIPAA eligibility — The audio modality’s exclusion from the OpenAI BAA blocks health-sector voice agent deployments. Track if this changes; it is a meaningful gate for regulated industry use cases.
- Harvey LAB leaderboard — Baseline results and submission standards are forthcoming. The first leaderboard will show which foundation models and agent frameworks perform best on legal tasks — worth tracking for organizations evaluating legal AI.
- Gemini 3.1 Flash-Lite GA pricing confirmation — The Cloud blog did not republish pricing at GA. Verify against current Vertex AI pricing page before production commitments; preview pricing may not carry through.