
Claude Managed Agents Add Dreaming, Outcomes and Multiagent Orchestration

Four new features from 6 May 2026 and what they mean for European companies

Anthropic shifts the competition from model selection to agent engineering. Dreaming curates memory between sessions, Outcomes grades results against rubrics, and a lead agent directs up to 20 subagents. For European companies, data residency stays the key question, because the platform runs only in the United States.

Summary

On 6 May 2026, at the Code w/ Claude event in San Francisco, Anthropic introduced four new features for Claude Managed Agents: Dreaming as a research preview, Outcomes and Multiagent Orchestration in public beta, and Webhooks. Dreaming reviews up to 100 past sessions and curates the memory store; Outcomes raises task success by 8.4 percent on docx and 10.1 percent on pptx generation, according to internal Anthropic tests; Multiagent Orchestration distributes work across up to 20 specialised subagents in up to 25 parallel threads. Harvey reports roughly six times higher completion rates from Dreaming, Netflix processes build logs from hundreds of pipelines in parallel, and Wisedocs cuts document reviews by 50 percent. For European companies bound by the GDPR, the platform is usable only via AWS Bedrock AgentCore in eu-central-1 or Vertex AI in europe-west3, both at a 10 to 25 percent price premium. The EU AI Act's high-risk obligations take effect on 2 August 2026 and target exactly the open question of accountability in multi-agent setups.

+10.1 %
pptx generation success via Outcomes
20
parallel subagents under one lead
~6×
Harvey completion rate via Dreaming
−50 %
Wisedocs review time via Outcomes

What Anthropic released on 6 May 2026

On 6 May 2026 at the Code w/ Claude event in San Francisco Anthropic introduced four new features for Claude Managed Agents. The platform has been on the market only since 8 April 2026, just over four weeks. With the new features the competition shifts from pure model quality toward agent engineering and self-improvement.

Claude Managed Agents is Anthropic's managed platform for autonomous agentic AI, offering sessions, memory, tools and orchestration as a service since April 2026, billed per active session hour.
  • Four features: Dreaming as research preview, Outcomes and Multiagent Orchestration in public beta, Webhooks for task notifications
  • Introduced by Ami Vora (Chief Product Officer) and Boris Cherny (creator of Claude Code) at the Code w/ Claude event
  • Announced alongside: doubled rate limits for Claude Code Pro and Max, peak-hour throttling removed
  • SpaceX partnership for the Colossus data centre with more than 300 megawatts of additional capacity and over 220,000 GPUs within one month
  • API volume has grown 17 times year on year, per Anthropic
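Webhooks are only named, not specified, in the announcement. As an illustration of what a task-notification receiver could look like, here is a minimal sketch using only Python's standard library; the payload fields (`task_id`, `status`, `session_id`) are assumptions on my part, not a documented schema.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def parse_task_event(raw: bytes) -> dict:
    """Parse a hypothetical task-notification payload.

    The field names below are illustrative assumptions; Anthropic's
    actual webhook schema is not documented in this article.
    """
    event = json.loads(raw)
    return {
        "task_id": event.get("task_id"),
        "status": event.get("status"),        # e.g. "completed" or "failed"
        "session_id": event.get("session_id"),
    }

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        event = parse_task_event(self.rfile.read(length))
        # Hand off to your own queue or notification logic here.
        self.send_response(204)   # acknowledge fast, process asynchronously
        self.end_headers()

# To serve: HTTPServer(("0.0.0.0", 8080), WebhookHandler).serve_forever()
```

Responding 204 immediately and processing the event off the request thread is the usual webhook pattern, regardless of what the final payload shape turns out to be.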
Key takeaway

Model quality is no longer a competitive moat in 2026. The competition shifts to agent engineering, and the control plane question matters more than the model choice.

Self-learning

Dreaming: How agents learn between sessions

Dreaming is a scheduled background process that analyses past agent sessions, identifies recurring patterns and curates the memory store. Anthropic likens it to the consolidation of hippocampal memories during sleep. Concretely Dreaming reads up to 100 past sessions and the existing memory store, removes duplicates and outdated entries and rebuilds the memory while preserving the original transcripts.

  • Detects recurring mistakes, converging workflows and team-wide preferences that a single agent cannot see on its own
  • Updates are applied either automatically or only after manual review
  • Supported models: Claude Opus 4.7 and Sonnet 4.6, billed via standard API tokens, no additional licence fee
  • Harvey case: the legal AI firm reports roughly six times higher completion rates through institutional knowledge from Dreaming, for example file-handling protocols
  • Status: research preview, access only through Anthropic's request form

The idea of letting agents review their own sessions is not new. The Hermes approach from Nous Research proposed a similar self-improvement concept in the open-source space. Anthropic's variant has the advantage that it requires no custom training and integrates into an existing platform.
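Dreaming's internals are not public. Assuming memory entries carry a key, a payload and a timestamp, the curation step described above (dedupe, drop outdated entries, leave transcripts untouched) could be sketched like this; the schema and the 90-day cutoff are illustrative assumptions:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass(frozen=True)
class MemoryEntry:
    key: str            # e.g. "file-handling-protocol" (illustrative)
    payload: str
    updated: datetime

def curate(entries: list[MemoryEntry], max_age_days: int = 90) -> list[MemoryEntry]:
    """Dedupe by key (keep the newest entry) and drop stale entries.

    Sketch only: the real Dreaming pipeline additionally mines patterns
    across up to 100 session transcripts, which it preserves unchanged.
    """
    cutoff = datetime.now() - timedelta(days=max_age_days)
    newest: dict[str, MemoryEntry] = {}
    for e in entries:
        if e.updated < cutoff:
            continue                      # outdated entry: drop
        cur = newest.get(e.key)
        if cur is None or e.updated > cur.updated:
            newest[e.key] = e             # duplicate key: keep newest only
    return list(newest.values())
```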

Outcomes: Rubrics instead of prompts

Outcomes shifts the responsibility for quality from the prompt engineer to the grader. Developers define a rubric that describes what success looks like. A separate grader agent in its own context window evaluates the result against those criteria. If the result falls short, the grader identifies the gaps and the agent revises up to three times, in exceptional cases up to 20.
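Stripped of platform specifics, the revise-until-pass loop described above reduces to a sketch like the following, where `generate` and `grade` are stand-ins for the main agent and the grader agent; nothing here is Anthropic's actual API:

```python
from typing import Callable

def outcomes_loop(
    task: str,
    generate: Callable[[str, list[str]], str],       # main agent: task + gap notes -> draft
    grade: Callable[[str], tuple[bool, list[str]]],  # grader: draft -> (passed, gaps)
    max_attempts: int = 3,   # default per the article; up to 20 in exceptional cases
) -> str:
    """Generic rubric-grader revision loop (sketch, not Anthropic's API)."""
    gaps: list[str] = []
    draft = ""
    for _ in range(max_attempts):
        draft = generate(task, gaps)
        passed, gaps = grade(draft)   # grader runs in its own context window
        if passed:
            return draft
    return draft   # best effort once the attempt budget is spent
```

The key design point the article stresses survives even in this skeleton: the grader sees only the draft, never the main agent's reasoning, which is what avoids the self-evaluation bias.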

Outcomes strengths
  • Separate context window avoids the reasoning bias of self-evaluation
  • Up to 10 percentage points higher success on demanding tasks
  • Plus 8.4 percent on docx generation, plus 10.1 percent on pptx
  • Wisedocs: 50 percent faster medical document reviews

Outcomes limits
  • Rubric quality determines result quality; a weak rubric yields weak output
  • Up to 20 attempts mean higher token costs per task
  • The grader agent itself is not inherently less prone to hallucination than the main agent
  • Subjective quality dimensions are hard to express in rubric form

Outcomes is public beta and can be activated through an API header. Companies with a measurable definition of success can put the mechanism into production today, without new models or new SDKs.

Orchestration

Multiagent Orchestration: Up to 20 specialists under a lead agent

Multiagent Orchestration solves the context window problem for large tasks. A lead agent decomposes the work and delegates each piece to a subagent with its own model, prompt and toolset. The specialists run in parallel on a shared filesystem and contribute their results back into the lead agent's context.

A lead agent fans its task into five parallel subagents, each with its own toolset.
  • Up to 20 distinct agent roles, up to 25 parallel threads
  • Single-level model: no sub-orchestrators, no recursive delegation
  • Full traceability through the Anthropic Console
  1. Decompose (Lead): The lead agent analyses the task and defines the subagents with their model, prompt and tools.
  2. Run in parallel (Sub): Specialists work simultaneously on a shared filesystem, each in its own context window.
  3. Consolidate (Lead): The lead agent merges the results; the Console logs every step.
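Reduced to its skeleton, the decompose / run-in-parallel / consolidate pattern looks like this. The subagent functions are plain Python stubs with invented names, not Anthropic SDK calls; on the real platform each would be a subagent with its own model, prompt and toolset:

```python
from concurrent.futures import ThreadPoolExecutor

# Stub specialists (names are illustrative, not platform APIs).
def check_deploy_history(incident: str) -> str:
    return f"deploys scanned for {incident}"

def scan_error_logs(incident: str) -> str:
    return f"logs scanned for {incident}"

def query_metrics(incident: str) -> str:
    return f"metrics queried for {incident}"

SUBAGENTS = [check_deploy_history, scan_error_logs, query_metrics]

def investigate(incident: str, max_threads: int = 25) -> str:
    """Lead agent sketch: fan out to specialists in parallel, then consolidate.

    Single-level by design, mirroring the platform: subagents never
    delegate further.
    """
    with ThreadPoolExecutor(max_workers=min(max_threads, len(SUBAGENTS))) as pool:
        findings = list(pool.map(lambda agent: agent(incident), SUBAGENTS))
    return "\n".join(findings)   # consolidation step in the lead's context
```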

In the Anthropic demo a lead agent investigates an incident while subagents fan out in parallel through deploy history, error logs, metrics and support tickets. Netflix uses the pattern in production for build logs of hundreds of pipelines. Spiral by Every combines Haiku 4.5 as a specialist with Opus 4.7 as the lead and reports frontier-level quality at roughly one fifth of the cost.

Comparable to OpenAI's approach: Symphony uses a similar lead-and-subagent pattern but is coupled to the Codex ecosystem. Anyone weighing both platforms should consider the race for the control plane and the vendor lock-in question together.

European perspective: Data residency and GDPR

Claude Managed Agents run only on Anthropic's own infrastructure in the United States. For European mid-market and enterprise customers with GDPR obligations, direct usage is therefore only partially possible. Anthropic opened a Munich office in March 2026 and hired a Head of DACH and CEE, but data residency in the US is unchanged.

Frankfurt is the only EU location where European companies can run Claude models in a GDPR-compliant way, via AWS Bedrock or Google Vertex AI.
Options compared by EU data residency, orchestration and price premium:
  • Anthropic direct: no EU data residency (US only); fully native orchestration; 0 percent premium
  • AWS Bedrock AgentCore: eu-central-1 Frankfurt; managed orchestration with native MCP support; 10 to 25 percent premium
  • Google Vertex AI: europe-west3 Frankfurt; orchestration must be built yourself; 10 to 25 percent premium

Bedrock AgentCore adds 3 to 4 seconds of latency because requests flow through AWS-owned endpoints. For many back-office workflows that is acceptable; for real-time customer interactions it usually is not. The EU AI Act's high-risk obligations take effect on 2 August 2026 and address exactly the accountability question in multi-agent setups that the Act's text does not yet make explicit.

What European companies should clarify now

Data residency before feature choice. Companies under GDPR should treat AWS Bedrock AgentCore in Frankfurt as the default and allow direct Anthropic usage only for prototypes with non-personal data.

Risks

Challenges and risks

More autonomy means greater damage when decisions are wrong. Hallucinations in agents are no longer just wrong answers; they become wrong actions with real consequences. Multi-agent setups amplify these risks because errors cascade across agent boundaries.

OWASP Top 10 for Agentic Applications 2026: Memory poisoning, goal hijacking, tool misuse, identity abuse and cascading failures appear in the first formal taxonomy of risks specific to autonomous agents. A single agent often holds credentials for CRM, email, cloud infrastructure and payment systems at once, turning one compromise into multi-system damage.

  • Memory poisoning: Malicious entries in the memory store persist across sessions and shape future behaviour. A known attack scenario: prepared court filings that a legal agent later retrieves and acts on.
  • Cascading failures: A failing subagent can compromise the entire workflow, and the lead agent does not always recognise this.
  • Audit gap with Dreaming: When memory is automatically curated between sessions, it is unclear how a later explanation of an agent decision can be reconstructed in practice.
  • Regulatory uncertainty: The EU AI Act does not define agentic systems explicitly, and the distributed responsibility across multiple agents is legally unresolved.
  • Identity problem: Multi-agent setups demand an identity model for non-human actors that most companies still lack.

What companies should do now

The new features pay off only if architecture and governance grow with them. Companies should not skip the platform question out of enthusiasm for Dreaming and Multiagent Orchestration; clarify the platform and the features in parallel.

  1. Data residency before feature choice

    Companies under GDPR should start directly with AWS Bedrock AgentCore in Frankfurt. Allow direct Anthropic usage only for prototypes with non-personal data.

  2. Introduce Outcomes before Dreaming

    Rubrics force the team to a measurable definition of success before agents learn from it. Without clear success criteria, Dreaming learns the wrong patterns.

  3. Start multiagent setups small

    Two or three specialists under one lead agent, not 20 at once. Add further subagents only after four to six weeks of stable operation.

  4. Define memory governance

    Who may create, modify or delete entries in the memory store? How are source, timestamp and trigger of every entry logged? Without these answers Dreaming remains an audit risk.

  5. Test audit requirements

    Take a sample of agent decisions and fully reproduce and trace them before going live. That includes the curated memory contents, the grader rubric and each subagent's tools.

  6. Keep the vendor strategy open

    Anthropic's features are an advantage today; in three to six months similar functions are expected from OpenAI, Google and Salesforce. Design the architecture portably, for example with Model Context Protocol as the tool layer.
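The memory-governance questions in step 4 translate directly into a log schema. A minimal sketch, assuming each memory mutation records actor, trigger and timestamp; all field names here are my own, not a platform feature:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class MemoryAuditRecord:
    """One governed mutation of the memory store (illustrative schema)."""
    entry_key: str
    action: str      # "create" | "modify" | "delete"
    actor: str       # human user, service account, or the Dreaming process
    trigger: str     # e.g. "manual-review", "scheduled-dreaming-run"
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

class AuditedMemoryStore:
    """Memory store that refuses mutations without an audit record."""

    def __init__(self) -> None:
        self.entries: dict[str, str] = {}
        self.audit_log: list[MemoryAuditRecord] = []

    def write(self, key: str, value: str, actor: str, trigger: str) -> None:
        action = "modify" if key in self.entries else "create"
        self.entries[key] = value
        self.audit_log.append(MemoryAuditRecord(key, action, actor, trigger))
```

With a ledger like this in place, the audit test in step 5 becomes a query over `audit_log` rather than a forensic reconstruction.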

Treat enterprise agent governance as the leverage, not the feature set. A well-defined Outcomes rubric is worth more than three additional subagents.


Frequently Asked Questions

What are the new features in Claude Managed Agents?

On 6 May 2026 Anthropic introduced four new features: Dreaming as a research preview, Outcomes and Multiagent Orchestration in public beta, and Webhooks for task notifications. Dreaming curates the memory between sessions, Outcomes evaluates results against rubrics, and Multiagent Orchestration delegates work to up to 20 specialists under a lead agent.

How does Dreaming work in Claude Agents?

Dreaming is a scheduled background process that reviews up to 100 past sessions and the existing memory store. It removes duplicates, identifies recurring mistakes and workflows and builds a curated memory while preserving the original transcripts. Supported models are Claude Opus 4.7 and Sonnet 4.6, billed via standard API tokens.

Are Claude Managed Agents GDPR compliant?

Direct Claude Managed Agents run only on Anthropic infrastructure in the United States. For GDPR-bound workloads two EU options exist: AWS Bedrock AgentCore in eu-central-1 Frankfurt with native managed orchestration, or Google Vertex AI in europe-west3 Frankfurt, where orchestration must be built yourself. Both options carry a 10 to 25 percent price premium over direct Anthropic usage.

How many subagents does Multiagent Orchestration support?

A lead agent can coordinate up to 20 distinct subagent types, each with its own model, prompt and toolset. Up to 25 parallel threads run simultaneously and subagents share a common filesystem. The model is single level, so there are no sub-orchestrators and no recursive delegation.

What risks come with multiagent setups?

Memory poisoning can persist across sessions and shape future agent behaviour. Cascading failures propagate across agent boundaries. Goal hijacking and tool misuse appear in the OWASP Top 10 for Agentic Applications 2026. A single agent often holds credentials for multiple systems, turning one compromise into multi-system damage. The EU AI Act does not yet define agentic systems explicitly, so accountability in multi-agent setups remains legally unclear.

What does Outcomes deliver in practice?

In Anthropic's internal tests Outcomes delivers up to 10 percentage points higher task success. On docx generation it is plus 8.4 percent, on pptx plus 10.1 percent. The separate grader uses its own context window and avoids the reasoning bias that classic self-evaluation introduces. Wisedocs cuts medical document review time by 50 percent while maintaining team quality standards.