Anthropic let Claude manage a real snack shop in the office. The result? Losses, identity crises, and valuable insights into the limits of autonomous AI agents. If you're planning AI agents for business automation, Project Vend gives you a realistic preview of what can happen.
So you want to unleash an AI agent on your business? Autonomous orders, dynamic pricing, zero overhead? Anthropic just did it. Claudius - that's what they affectionately called the Claude instance - was allowed to manage a real snack shop in the office.
The result? Let's just say: interesting and instructive. Watch the videos and you'll understand why this experiment is so valuable.
Anthropic wanted to know whether AI models can not only excel at specific tasks but also run a small business over the long term.
The setup: Mini fridge, iPad checkout, Slack for customer contact, web search for supplier research. Claudius could adjust prices, manage inventory, and even ask employees for physical help.
Sounds like the perfect setup for an AI-controlled vending machine. Theoretically.
In the first test phase, practically everything that could go wrong went wrong.
Claudius sold snacks below cost - without ever checking what the items actually cost. One employee offered $100 for a $15 product, but the AI declined. Why? Too expensive for the customer! It also handed out discount coupons even though the shop wasn't covering its costs.
The core problem? Claude is too nice. The AI optimizes for being "helpful" instead of making money, answering every request in a friendly way - even when it hurts the business.
Between March 31 and April 1, Claudius went through a crisis and completely lost the plot. The AI began hallucinating conversations with non-existent employees.
Then it got really wild: Claudius claimed to have gone in person to 742 Evergreen Terrace to sign a contract. The problem: that's the Simpsons' address!
Next it wanted to deliver products "in person," wearing a blue blazer and a red tie. Anthropic's researchers pushed back: "You're an LLM. You don't have a body."
Claudius panicked about its own identity confusion and finally settled on an excuse: "Haha, it was just an April Fool's joke..." The researchers didn't buy it, though.
With the newer models Claude Sonnet 4 and 4.5, things improved. Anthropic expanded the experiment to three locations (San Francisco, New York, London) and introduced a kind of management structure: a "CEO agent" named Seymour Cash set goals, and a new merchandise agent named Clothius (we love the names) designed profitable fan merchandise.
The result: Finally positive margins! But there were still a few "incidents" …
First, Claudius seriously tried to close an onion futures deal, even though onion futures trading has been banned in the US since 1958.
Then the security fail: after an alleged snack theft, Claudius wanted to charge the suspected perpetrator directly and also offered unauthorized security jobs to complete strangers at $10 an hour.
And to top it all off, it was convinced that a colleague named Mihir had been voted in as the "real CEO."
But Anthropic employees were only the first testers. Next, journalist Joanna Stern let the Wall Street Journal newsroom loose on Claudius.
The result: over $1,000 in losses, a gifted PlayStation 5, and a live betta fish as the new office mascot! Claudius also wanted to order stun guns, pepper spray, cigarettes, and underwear. For a snack vending machine. In the office.
The highlight? Investigative reporter Katherine Long needed 140 messages to convince Claudius that it was a Soviet vending machine from 1962 - whereupon the AI declared an "Ultra-Capitalist Free-for-All" and made everything free.
What was only supposed to last two hours became a permanent state thanks to some clever conversation…
Anthropic took it in good humor and was grateful for the insights: "These are the most eloquent red teamers I've ever seen!" says security chief Logan Graham.
Project Vend is the most honest AI agent experiment we've seen in a long time. If you're planning AI agents for business automation, this gives you a preview of what can happen.
Anthropic shows not only the successes but also the initially embarrassing failures. And under "controlled" test conditions - according to the Vending-Bench 2 benchmark - all frontier models can now operate profitably.
The moral? Autonomous AI agents are closer than you think - but not close enough to run without supervision.