AI Radar — 09 May 2026
OpenAI adds voice reasoning and live translation to its Realtime API; Google ships Gemini 3.1 Flash-Lite to general availability; Coder launches self-hosted agent infrastructure for enterprise teams; Snyk embeds Claude for AI-native security; Harvey open-sources an agent-native legal benchmark.
Run: 06–09 May 2026 · 28 items reviewed → 5 published · 3 verified · 2 secondary · 0 rumor · 42% exploration · Run timestamp: 2026-05-09
TL;DR
- OpenAI GPT-Realtime-2 — three new voice API models bring GPT-5-class reasoning, 70-language live translation, and streaming transcription to developer voice agents. (→ OpenAI GPT-Realtime-2)
- Coder Agents beta — self-hosted, model-agnostic agent platform lets enterprise teams run AI coding workflows on their own infrastructure; free beta through September. (→ Coder Agents)
- Gemini 3.1 Flash-Lite GA — Google’s fastest Gemini 3 model reaches general availability, priced at $0.25/1M input tokens with sub-second tool-call latency in production deployments. (→ Gemini 3.1 Flash-Lite)
- Snyk + Claude — Snyk embeds Anthropic’s Claude into its AI Security Platform for automated vulnerability discovery across code, containers, and AI-generated artifacts. (→ Snyk embeds Claude)
- Harvey Legal Agent Benchmark — open-source benchmark with 1,200+ agent tasks across 24 legal practice areas, evaluated against 75,000+ expert rubric criteria; no leaderboard yet. (→ Harvey LAB)
Items
OpenAI ships three voice API models with GPT-5-class reasoning, live translation, and streaming transcription
Source: https://openai.com/index/advancing-voice-intelligence-with-new-models-in-the-api/ · OpenAI · 2026-05-07
Verification: T2 secondary · announcement · dev-tools / agent-framework
Tier nuance: Primary URL returned HTTP 403 during this run. TechCrunch (T2) and The Next Web (T2) independently confirmed the announcement date and full pricing.
OpenAI released three new models in its Realtime API on 7 May 2026. GPT-Realtime-2 brings GPT-5-class reasoning to live voice interactions, with a 128K context window (up from 32K), parallel tool calls, and adjustable reasoning-effort levels from low through xhigh. GPT-Realtime-Translate supports real-time speech translation across 70+ input languages into 13 output languages while keeping pace with the speaker. GPT-Realtime-Whisper provides streaming speech-to-text transcription as the speaker talks, priced at $0.017/minute.
Why it matters for automation/productivity: Customer service, IVR, and real-time interpretation workflows can now access a voice model that reasons mid-conversation, calls tools in parallel, and recovers from task failures — replacing the need to chain separate transcription, reasoning, and response models.
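A minimal sketch of how such a session might be configured. The model name comes from the announcement; the WebSocket endpoint and session field names (`reasoning_effort`, `parallel_tool_calls`) are assumptions carried over from OpenAI's existing Realtime API conventions and should be checked against current documentation before use:

```python
# Hypothetical session configuration for a voice agent on the new models.
# Endpoint and field names are assumed from the existing Realtime API;
# only the model name and effort levels come from the announcement.
REALTIME_URL = "wss://api.openai.com/v1/realtime?model=gpt-realtime-2"

EFFORT_LEVELS = ("low", "medium", "high", "xhigh")  # low is the default

def build_session_update(instructions: str,
                         reasoning_effort: str = "low",
                         parallel_tool_calls: bool = True) -> dict:
    """Construct a session.update event payload for a live voice agent."""
    if reasoning_effort not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort level: {reasoning_effort}")
    return {
        "type": "session.update",
        "session": {
            "instructions": instructions,
            "reasoning_effort": reasoning_effort,        # assumed field name
            "parallel_tool_calls": parallel_tool_calls,  # assumed field name
        },
    }

payload = build_session_update("Triage support calls; escalate billing issues.")
```

Note that the announcement's benchmark numbers were obtained at xhigh effort, so a production session left at the default low setting should not be expected to match them.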
Key claims:
- GPT-Realtime-2: $32/1M audio input tokens, $64/1M audio output tokens → vendor-claimed (TechCrunch T2 corroborating)
- GPT-Realtime-Translate: $0.034/minute → vendor-claimed
- GPT-Realtime-Whisper: $0.017/minute → vendor-claimed
- 128K context window, up from 32K → secondary confirmation (multiple outlets)
- 15.2% improvement on Big Bench Audio benchmark → vendor-claimed, run at xhigh reasoning effort
- Zillow: 69% → 95% call-success rate improvement → vendor-reported case study, not independently verified
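Taking the vendor-claimed per-minute prices above at face value, a quick back-of-envelope estimate of monthly audio spend (prices unverified; check the current pricing page before budgeting):

```python
# Monthly cost estimate from the vendor-claimed per-minute prices above.
TRANSLATE_PER_MIN = 0.034  # GPT-Realtime-Translate, $/minute (vendor-claimed)
WHISPER_PER_MIN = 0.017    # GPT-Realtime-Whisper, $/minute (vendor-claimed)

def monthly_audio_cost(minutes_per_day: float, days: int = 30,
                       rate_per_min: float = TRANSLATE_PER_MIN) -> float:
    """Estimated monthly spend in dollars for a given daily audio volume."""
    return round(minutes_per_day * days * rate_per_min, 2)

# e.g. a support line handling 500 translated minutes/day:
# 500 * 30 * 0.034 = $510/month
cost = monthly_audio_cost(500)
```

GPT-Realtime-2 itself is token-priced ($32/1M audio input, $64/1M audio output), so its cost depends on the tokens-per-minute rate of the audio codec, which the coverage here does not specify.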
Cross-references:
- https://techcrunch.com/2026/05/07/openai-launches-new-voice-intelligence-features-in-its-api/ (T2, corroborating)
- https://thenextweb.com/news/openai-gpt-realtime-2-voice-models (T2, corroborating)
- https://9to5mac.com/2026/05/07/openai-has-new-voice-models-that-reason-translate-and-transcribe-as-you-speak/ (T3, corroborating)
Caveats: Realtime API is not HIPAA-eligible under the OpenAI Business Associate Agreement as of this run; Azure OpenAI text endpoints are eligible but the audio modality is not. Default reasoning effort is low; the Big Bench Audio benchmark was run at xhigh — production performance at default settings will differ. The Zillow call-success figure is a vendor-reported case study. Session drift occurs in long conversations; context trimming is required.
Coder launches self-hosted, model-agnostic enterprise agent platform in beta
Source: https://coder.com/blog/introducing-coder-agents · Coder · 2026-05-06
Verification: T2 verified · announcement · dev-tools / agent-framework
On 6 May 2026, Coder released Coder Agents, a beta platform for running AI coding workflows on self-hosted infrastructure. The platform supports any major AI model provider (Anthropic, OpenAI, Google, and AWS Bedrock, plus OpenAI-compatible self-hosted models) without sending source code or prompts to external services. It exposes a conversational interface and API, centralized model and prompt-policy management, MCP-based tool integrations, sub-agent orchestration, and network-isolated workspace provisioning. Full premium features are available at no cost through September 2026; post-beta pricing has not been announced.
Why it matters for automation/productivity: Organizations with strict code-governance or data-residency requirements can run coding agents entirely within their network perimeter. The MCP and sub-agent architecture allows existing internal tooling to connect to the agent workflow without custom integration code.
Key claims:
- Model-agnostic: Anthropic, OpenAI, Google, AWS Bedrock, OpenAI-compatible self-hosted → coder.com primary (T2)
- Free beta through September 2026 → coder.com primary (T2)
- 70% of companies deploy agents on infrastructure not designed for them → Coder-cited internal research (methodology not published — T4)
Cross-references:
- https://www.globenewswire.com/news-release/2026/05/06/3288916/0/en/Coder-Sets-a-New-Standard-for-AI-Coding-with-Self-Hosted-AI-Model-Agnostic-Coder-Agents.html (T2, corroborating)
- https://sdtimes.com/ai/may-8-2026-ai-updates-from-the-past-week-coder-agents-launch-snyk-claude-partnership-opsera-cursor-partnership-and-more/ (T3, corroborating)
Caveats: Beta — API may have breaking changes before GA. The 70% statistic on agent infrastructure mismatch is from Coder’s own unpublished research. Pricing after September 2026 not announced.
Google ships Gemini 3.1 Flash-Lite to general availability
Source: https://cloud.google.com/blog/products/ai-machine-learning/gemini-3-1-flash-lite-is-now-generally-available · Google Cloud · 2026-05-08
Verification: T2 verified · announcement · model-release
Google moved Gemini 3.1 Flash-Lite to general availability on 8 May 2026 via the Gemini API and Vertex AI, two months after its March 2026 preview. The model targets high-volume, latency-sensitive tasks: tool calling, content moderation, and orchestration steps in agentic pipelines. According to its preview announcement, it is priced at $0.25/1M input tokens and $1.50/1M output tokens, with 2.5× faster time-to-first-token (TTFT) and 45% higher output speed compared to Gemini 2.5 Flash. Gladly, a customer service platform, reported approximately 60% lower costs versus thinking models and a p95 classifier-call latency under one second in production.
Why it matters for automation/productivity: At $0.25/1M input tokens, Flash-Lite is among the lower-cost options for high-volume steps in agentic pipelines — classification, routing, and lightweight orchestration where a full reasoning model would be cost-prohibitive.
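For a concrete sense of scale, a small calculator using the preview pricing cited above (vendor-claimed; verify against the current Google Cloud pricing page, since the GA post did not republish pricing):

```python
# Cost per 1,000 calls at the March 2026 preview pricing cited above
# ($0.25/1M input, $1.50/1M output tokens -- vendor-claimed, unverified at GA).
INPUT_PER_M = 0.25
OUTPUT_PER_M = 1.50

def cost_per_1k_calls(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of 1,000 calls with the given per-call token counts."""
    per_call = (input_tokens * INPUT_PER_M
                + output_tokens * OUTPUT_PER_M) / 1_000_000
    return round(per_call * 1_000, 4)

# A typical routing step: 800 input tokens in, a 20-token label out.
routing_cost = cost_per_1k_calls(800, 20)
```

At roughly a quarter-dollar per thousand routing calls, the classification and orchestration steps of a pipeline can run at a small fraction of what a full reasoning model would cost per call.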
Key claims:
- $0.25/1M input, $1.50/1M output → vendor-claimed (from March 2026 preview blog; GA pricing not confirmed in Cloud blog — verify before production use)
- 2.5× faster TTFT vs Gemini 2.5 Flash → vendor-claimed
- 45% faster output generation vs Gemini 2.5 Flash → vendor-claimed
- Gladly: ~60% lower cost vs thinking models, p95 classifier latency <1s → vendor-reported case study
Cross-references:
- https://www.testingcatalog.com/google-launches-gemini-3-1-flash-lite-in-general-availability/ (T3, corroborating — confirmed May 8 GA date)
- https://artificialanalysis.ai/models/gemini-3-1-flash-lite-preview (T2, independent benchmark — FACTS score 40.6%, TTFT 5.46s)
Caveats: Pricing cited from the March 2026 preview announcement; the GA blog post did not republish pricing — verify against current Google Cloud pricing page before committing to production. Independent FACTS benchmark score: 40.6% for Flash-Lite vs 50.4% for Gemini 3.0 Flash Dynamic — the model trades factual grounding for speed. Independent TTFT measured at 5.46 seconds (Artificial Analysis), above the median for its price tier at 2.02 seconds. Gladly case-study figures are vendor-reported.
Snyk embeds Claude into AI Security Platform for vulnerability discovery across code and AI artifacts
Source: https://www.helpnetsecurity.com/2026/05/08/snyk-ai-security-platform/ · Help Net Security (reporting on Snyk press release) · 2026-05-08
Verification: T2 secondary · announcement · dev-tools / ai-for-business
Tier nuance: Snyk press release issued via Globenewswire; Help Net Security and Yahoo Finance both published May 8 coverage. Direct snyk.io primary not retrieved in this run.
Snyk announced on 8 May 2026 that it has embedded Anthropic’s Claude models across its AI Security Platform to power automated vulnerability discovery, prioritization, and developer-ready fix generation. Coverage extends across code, open-source dependencies, containers, and AI-generated artifacts. Evo by Snyk uses Claude to continuously discover AI assets — models, agents, MCP servers, datasets, and third-party tools — across an organization’s environment, enabling AI governance workflows alongside traditional SAST and SCA scanning. The integration is available to current joint Snyk-Anthropic customers; broader rollout continues through 2026.
Why it matters for automation/productivity: Development teams using Snyk alongside Claude Code or other AI coding assistants can now surface and remediate security issues within the same AI-assisted development loop. The MCP server discovery capability within Evo is particularly relevant for teams building or adopting MCP-connected tooling, where AI-generated artifact security is a known gap.
Key claims:
- Coverage: code, dependencies, containers, AI-generated artifacts → Snyk press release (T2 secondary)
- Evo discovers models, agents, MCP servers, datasets, third-party tools → Snyk press release (T2 secondary)
- Available to joint customers today; broader rollout through 2026 → press release (T2 secondary)
Cross-references:
- https://finance.yahoo.com/sectors/technology/articles/snyk-embeds-anthropics-claude-advance-174900036.html (T3, corroborating)
- https://sdtimes.com/ai/may-8-2026-ai-updates-from-the-past-week-coder-agents-launch-snyk-claude-partnership-opsera-cursor-partnership-and-more/ (T3, corroborating)
Caveats: No independent benchmark comparing Snyk+Claude detection rates against other AI-powered security tools was available at launch. The current-customers-only launch means new customers may face enterprise sales cycles before access. Direct snyk.io primary not fetched — verification relies on press release distribution and trade press corroboration.
Harvey open-sources Legal Agent Benchmark with 1,200+ agent tasks across 24 practice areas
Source: https://www.harvey.ai/blog/introducing-harveys-legal-agent-benchmark · Harvey · 2026-05-06
Verification: T2 verified · announcement · research-papers / ai-for-business
COI: Harvey is both the benchmark creator and a commercial legal AI vendor; potential bias in task selection and rubric design.
Harvey released the Legal Agent Benchmark (LAB) on 6 May 2026, in partnership with Artificial Analysis. LAB is an open-source benchmark on GitHub containing 1,200+ agent tasks spanning 24 legal practice areas — transactional, advisory, regulatory, and litigation — with 75,000+ expert-written rubric criteria. Each task mirrors law firm associate work: a short instruction averaging 50 words, a file environment mixing relevant and peripheral documents, and a required legal deliverable. Evaluation uses all-pass grading, where partial task completion scores zero. The initial release does not include a leaderboard; baseline results and submission standards are forthcoming.
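The all-pass grading rule is easy to state precisely. A hypothetical sketch (the boolean criterion representation is an assumption; LAB's actual evaluation harness may differ):

```python
# All-pass grading: a task earns credit only if every rubric criterion
# passes; a single failed criterion zeroes the whole task.
def grade_task(criterion_results: list[bool]) -> int:
    """Return 1 if all criteria pass, else 0. No partial credit."""
    return int(all(criterion_results)) if criterion_results else 0

def benchmark_score(tasks: list[list[bool]]) -> float:
    """Fraction of tasks fully passed across the benchmark."""
    return sum(grade_task(t) for t in tasks) / len(tasks)

# Three tasks: fully passed, one criterion missed, fully passed.
score = benchmark_score([[True, True], [True, False, True], [True]])
```

This scoring regime is deliberately punishing: with an average of dozens of rubric criteria per task, near-miss deliverables score the same as complete failures, which tends to compress headline scores relative to partial-credit benchmarks.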
Why it matters for automation/productivity: Legal teams evaluating AI agents for document drafting, contract review, or regulatory analysis now have a domain-specific benchmark for comparing models on tasks that reflect actual law firm conditions rather than generic reasoning tests. For AI vendors pursuing legal sector sales, LAB provides a credible evaluation surface beyond general benchmarks.
Key claims:
- 1,200+ agent tasks → harvey.ai primary (T2)
- 24 legal practice areas → harvey.ai primary (T2)
- 75,000+ expert rubric criteria → harvey.ai primary (T2)
- Open-source on GitHub: https://github.com/harveyai/harvey-labs → confirmed
Cross-references:
- https://www.artificiallawyer.com/2026/05/06/harvey-launches-legal-agent-bench/ (T2, corroborating)
- https://x.com/ArtificialAnlys/status/2052145762650431840 (T3, discovery — Artificial Analysis partnership announcement)
Caveats: No leaderboard published at launch; baseline results and submission standards described as forthcoming. Harvey is a commercial legal AI vendor, representing a conflict of interest in benchmark design and rubric selection. Benchmark methodology was not independently peer-reviewed at launch.
Dropped
Items considered but not published, with reason:
| Title considered | Source | Reason |
|---|---|---|
| Meta Muse Spark LLM launch | ai.meta.com · 2026-04-08 | Outside window — April 8 |
| Anthropic Project Glasswing / Claude Mythos | anthropic.com · 2026-04-07 | Outside window — April 7 |
| Microsoft Agent Framework 1.0 GA | devblogs.microsoft.com · 2026-04-03 | Outside window — April 3 |
| Microsoft AI Diffusion 2026 Survey | blogs.microsoft.com · 2026-05-07 | Already covered in 2026-05-08 radar |
| AWS MCP Server GA | aws.amazon.com · 2026-05-06 | Already covered in 2026-05-08 radar |
| NIST CAISI classified AI evaluation | nextgov.com · 2026-05-05 | Already covered in 2026-05-08 radar |
| OpenAI ChatGPT Ads Manager self-serve | openai.com · 2026-05-05 | Already covered in 2026-05-08 radar |
| Claude Managed Agents (Dreaming, Outcomes, Orchestration) | anthropic.com · 2026-05-06 | Already covered in 2026-05-07 radar |
| Anthropic / SpaceX Colossus compute deal | anthropic.com · 2026-05-06 | Already covered in 2026-05-07 radar |
| Anthropic Claude usage limit expansion | anthropic.com · 2026-05-06 | Same blog as SpaceX deal; covered in 2026-05-07 radar |
| MCP 2026 Roadmap | blog.modelcontextprotocol.io · 2026-03-09 | Outside window — March 9 |
| OpenAI GPT-5.5 Instant new ChatGPT default | openai.com · 2026-05-05 | Already covered in 2026-05-07 radar |
| xAI Grok 4.3 API GA | x.ai · API rollout 2026-04-30 | API rollout complete April 30 (before window); May 6 user email was notification of already-shipped change |
| Google Gemini 3.1 Ultra | date unconfirmed | No primary source with confirmed May 6–9 publication date found; excluded pending confirmation |
| Jama Connect MCP Server | globenewswire.com · 2026-05-04 | Outside window — May 4 |
| Opsera-Cursor DevSecOps Agents partnership | sdtimes.com · 2026-05-08 | Low standalone BD signal; no primary from opsera.io; adjacent to Coder Agents item |
| Precisely Data Integrity Suite MCP APIs | sdtimes.com · 2026-05-08 | Enterprise data tool; low BD signal for general audience; no primary from precisely.com |
| Hugging Face WAVe 1B multimodal model | huggingface.co · 2026-05-05 | Outside window — May 5 |
| Microsoft Azure App Service + MAF integration blog | techcommunity.microsoft.com · 2026-05-08 | Original MAF shipped April 3; blog is secondary integration guidance, not a new product |
| Scout AI $100M Series A raise | crunchbase.com · May 2026 | Funding news; no primary from scoutai.com; low actionability |
| Rhoda AI FutureVision launch | crescendo.ai roundup · undated | In-window date unconfirmed; no primary source found |
| Harvey $11B valuation funding round | harvey.ai · 2026-05-05 | Outside window — May 5; covered separately from LAB; funding news, low actionability |
| Snyk Studio (Claude Code security integration) | snyk.io · 2026-02-23 | Outside window — February 2026; distinct from May 8 partnership announcement |
Limitations
- Sources unreachable: openai.com/index/advancing-voice-intelligence-with-new-models-in-the-api/ returned HTTP 403 during this run. The OpenAI voice models item relies on secondary confirmation from TechCrunch and The Next Web (both T2), with verification status downgraded to secondary accordingly. The snyk.io primary blog post for the May 8 announcement was not retrieved; item relies on Help Net Security trade press coverage and Globenewswire press release distribution.
- Gemini 3.1 Flash-Lite GA pricing gap: The Google Cloud blog post announcing GA on May 8 did not republish pricing details. Figures cited ($0.25/$1.50 per million tokens) come from the March 2026 preview announcement on blog.google and may not reflect GA pricing. Readers should verify against the current Google Cloud pricing page before production use.
- No primary for Gemini 3.1 Ultra: Search results referenced a Gemini 3.1 Ultra launch, but no primary source with a confirmed May 6–9 publication date was found. Excluded pending confirmation.
- Harvey LAB conflict of interest: Harvey is a commercial legal AI vendor and the sole author of LAB’s task definitions, rubric criteria, and evaluation methodology. The benchmark was not independently peer-reviewed at launch. Claims about task representativeness and rubric quality should be treated as vendor-asserted until independent replication.
- Login-walled coverage: X timelines, Instagram, LinkedIn private feeds, and Discord were not accessed directly. Public X posts indexed by search engines were captured. Social discovery searches yielded no high-signal in-window items beyond what was found via primary sources.
- Window overlap with May 8 bulletin: The 72h window (May 6–9) overlaps substantially with the May 8 bulletin’s window (May 5–8). Eight items from the overlap period were already covered in prior bulletins and appear in the Dropped section rather than being republished.
- Geographic bias: An Indonesian/SEA-language search (AI Indonesia startup model Mei 2026) returned no in-window local AI product launches from Indonesian or SEA-based vendors. Coverage this run is US/EU-dominated. This structural gap persists across runs.
- Items requiring upgrade: The OpenAI voice models item (T2 secondary, verification 403) would upgrade to T2 verified once the primary openai.com post is accessible. No rumor-status items this run.
- No MCP-ecosystem or policy-regulation items: Searches against MCP aggregators and policy sources yielded no in-window items not already covered in prior bulletins.
Search log (compact)
| Query | Yield | Type |
|---|---|---|
| Anthropic Claude announcement May 2026 | 10 results, 5 high-rel | registry |
| OpenAI GPT release announcement May 2026 | 10 results, 6 high-rel | registry |
| Google DeepMind Gemini AI announcement May 2026 | 10 results, 4 high-rel | registry |
| OpenAI GPT-Realtime voice models API release May 7 2026 | 10 results, 8 high-rel | registry |
| MCP Model Context Protocol new server release May 2026 | 10 results, 3 high-rel | registry |
| fetch: openai.com advancing-voice-intelligence | HTTP 403 | registry |
| fetch: techcommunity.microsoft.com MAF 1.0 | partial content only | registry |
| AI agent framework launch May 8 9 2026 | 10 results, 3 high-rel | exploratory |
| AI dev tools coding assistant update Cursor Claude Code May 2026 | 10 results, 4 high-rel | exploratory |
| fetch: sdtimes.com May 8 AI updates | confirmed Coder Agents, Snyk | exploratory |
| fetch: blog.modelcontextprotocol.io 2026-mcp-roadmap | March 9 — outside window | registry |
| new AI model release May 8 9 2026 | 10 results, 3 high-rel | exploratory |
| Meta Muse Spark LLM launch date May 2026 | confirmed April 8 — outside window | exploratory |
| Anthropic Project Glasswing Claude Mythos launch date May 2026 | confirmed April 7 — outside window | registry |
| fetch: llm-stats.com/llm-updates | Grok 4.3 API rollout confirmed April 30 | exploratory |
| fetch: thenextweb.com openai-gpt-realtime-2-voice-models | T2 confirmed May 8 date | registry |
| Gemini 3.1 Flash-Lite release date Google May 2026 | confirmed GA May 7–8 | registry |
| fetch: cloud.google.com/blog gemini-3-1-flash-lite GA | T2 confirmed May 8 | registry |
| fetch: blog.google/innovation-and-ai gemini-3-1-flash-lite | March 3 preview — pricing confirmed | registry |
| Coder Agents beta launch enterprise self-hosted May 8 2026 | confirmed May 6 via globenewswire | exploratory |
| AI announcement OR AI launch May 9 2026 | 10 results, 2 high-rel | exploratory |
| fetch: anthropic.com/news | SpaceX / rate limits May 6 (covered) | registry |
| AI Indonesia startup model rilis Mei 2026 | 10 results, 0 in-window local launches | exploratory — cross-language |
| OpenAI realtime voice GPT limitations criticism May 2026 | HIPAA gap, default reasoning confirmed | adversarial |
| Gemini 3.1 Flash-Lite independent benchmark review criticism May 2026 | FACTS 40.6%, TTFT 5.46s | adversarial |
| Coder Agents self-hosted enterprise limitation May 2026 | beta API instability confirmed | adversarial |
| fetch: techcrunch.com OpenAI voice May 7 2026 | T2 confirmed May 7 date and pricing | registry |
| Snyk Claude Anthropic integration partnership launch date May 2026 | confirmed May 8 | exploratory |
| Harvey Legal Agent Benchmark Artificial Analysis launch date 2026 | confirmed May 6 | exploratory |
| fetch: harvey.ai/blog introducing-harveys-legal-agent-benchmark | T2 confirmed May 6 | exploratory |
| fetch: snyk.io/articles anthropic-launches-claude-code-security | February 2026 — wrong article | exploratory |
| xAI Grok 4.3 release date exact May 2026 | API rollout April 30; email May 6 | registry |
| AI productivity tools launch release May 8 9 2026 | 10 results, 2 high-rel | exploratory |
| fetch: crescendo.ai/news latest-ai-news-and-updates | May 4 cutoff — no May 6–9 items | exploratory |
| GitHub trending AI repositories week May 2026 | discovery only — no primary items | exploratory |
| AI startup funding launch product May 8 9 2026 | 10 results, 2 high-rel | exploratory |
| Product Hunt AI launch week May 8 2026 | 10 results, 1 high-rel | exploratory |
| xAI Grok 4.3 API pricing capabilities agentic May 2026 | rollout April 30 confirmed | registry |
Total searches: 38, of which 16 exploratory or adversarial (42%).
Suggested next runs
- OpenAI Realtime API HIPAA eligibility — The audio modality’s exclusion from the OpenAI BAA blocks health-sector voice agent deployments. Track if this changes; it is a meaningful gate for regulated industry use cases.
- Harvey LAB leaderboard — Baseline results and submission standards are forthcoming. The first leaderboard will show which foundation models and agent frameworks perform best on legal tasks — worth tracking for organizations evaluating legal AI.
- Gemini 3.1 Flash-Lite GA pricing confirmation — The Cloud blog did not republish pricing at GA. Verify against current Vertex AI pricing page before production commitments; preview pricing may not carry through.