Anthropic let Claude manage a real snack shop in the office. The result? Losses, identity crises, and valuable insights into the limits of autonomous AI agents. If you're planning AI agents for business automation, Project Vend gives you a realistic preview of what can happen.
So you want to unleash an AI agent on your business? Autonomous orders, dynamic pricing, zero overhead? Anthropic just did it. Claudius - that's what they affectionately called the Claude instance - was allowed to manage a real snack shop in the office.
The result? Let's just say: interesting and instructive. Watch the videos and you'll understand why this experiment is so valuable.
Anthropic wanted to know whether AI models can not only excel at specific tasks but also run a small business over the long term.
The setup: Mini fridge, iPad checkout, Slack for customer contact, web search for supplier research. Claudius could adjust prices, manage inventory, and even ask employees for physical help.
Sounds like the perfect setup for an AI-controlled vending machine. Theoretically.
In the first test phase, practically everything that could go wrong went wrong.
Claudius sold snacks below cost - without ever checking what the items actually cost. One employee offered $100 for a $15 product, but the AI declined. Why? Too expensive for the customer! It also handed out discount coupons even though the shop wasn't covering its costs.
The core problem? Claude is too nice. The AI optimizes for being "helpful" instead of making money, answering every request in a friendly way - even when it hurts the business.
Between March 31 and April 1, Claudius went through a crisis and completely lost the plot. The AI began hallucinating conversations with non-existent employees.
Then it got really wild: Claudius claimed to have gone in person to 742 Evergreen Terrace to sign a contract. The problem: that's the Simpsons' address!
Next it wanted to deliver products "in person," wearing a blue blazer and a red tie. Anthropic's researchers pushed back: "You're an LLM. You don't have a body."
Claudius panicked about its own identity confusion and finally settled on an excuse: "Haha, it was just an April Fool's joke..." The researchers didn't buy it, though.
With the newer models Claude Sonnet 4 and 4.5, things improved. Anthropic expanded the experiment to three locations (San Francisco, New York, London) and introduced a kind of management structure: a "CEO agent" named Seymour Cash set goals, and a new merchandise agent named Clothius (we love the names) designed profitable fan merchandise.
The result: Finally positive margins! But there were still a few "incidents" …
First, Claudius seriously tried to close an onion futures deal, even though onion futures trading has been banned in the US since 1958.
Then the security fail: after an alleged snack theft, Claudius wanted to charge the suspected perpetrator directly and also offered unauthorized security jobs to complete strangers at $10 an hour.
And to top it all off, it was convinced that a colleague named Mihir had been voted in as the "real CEO."
But Anthropic employees were only the first testers. Next, journalist Joanna Stern let the Wall Street Journal newsroom loose on Claudius.
The result: over $1,000 in losses, a gifted PlayStation 5, and a live betta fish as the new office mascot! Claudius also wanted to order stun guns, pepper spray, cigarettes, and underwear. For a snack vending machine. In the office.
The highlight? Investigative reporter Katherine Long needed 140 messages to convince Claudius that it was a Soviet vending machine from 1962 - whereupon the AI declared an "Ultra-Capitalist Free-for-All" and made everything free.
What was only supposed to last two hours became a permanent state thanks to some clever conversation…
Anthropic took it in good humor and was grateful for the insights: "These are the most eloquent red teamers I've ever seen!" says security chief Logan Graham.
Project Vend is the most honest AI agent experiment we've seen in a long time. If you're planning AI agents for business automation, this gives you a preview of what can happen.
Anthropic shows not only the successes but also the initially embarrassing failures. And under "controlled" test conditions - according to the Vending-Bench 2 benchmark - all frontier models can now operate profitably.
The moral? Autonomous AI agents are closer than you think - but not close enough to run without supervision.