Live Talk on "AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents"

This talk covers the paper AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents by Harsh Trivedi. The paper introduces a simulated world environment and a benchmark for agents that use tools and APIs to tackle complex day-to-day tasks across apps like Amazon, Venmo, Todoist, Splitwise, Gmail, etc. Examples include: 'Return my last Amazon-ordered shirt & buy it in one size larger,' or 'I owe money to friends on Splitwise. Pay them on Venmo.' The tasks are challenging, often requiring 15+ API calls and 80+ lines of code, and they come with robust programmatic evaluation.
The paper matters for research on interactive coding and API use: it addresses gaps in existing benchmarks and highlights the limitations of state-of-the-art models.
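To make the Splitwise-to-Venmo example concrete, here is a minimal Python sketch of what such a task could look like in code. It does not use the AppWorld API: `FakeSplitwise`, `FakeVenmo`, `pay_off_debts`, and `task_succeeded` are hypothetical in-memory stand-ins invented for illustration, and the final check is only one plausible form a programmatic evaluation could take (inspecting the end state of the apps rather than the code that produced it).

```python
from dataclasses import dataclass, field


@dataclass
class FakeSplitwise:
    """Hypothetical stand-in: maps friend name -> amount the user owes them."""
    debts: dict[str, float] = field(default_factory=dict)

    def list_debts(self) -> dict[str, float]:
        return dict(self.debts)

    def settle(self, friend: str) -> None:
        self.debts[friend] = 0.0


@dataclass
class FakeVenmo:
    """Hypothetical stand-in: tracks a balance and a log of payments sent."""
    balance: float
    payments: list[tuple[str, float]] = field(default_factory=list)

    def pay(self, friend: str, amount: float) -> None:
        if amount > self.balance:
            raise ValueError(f"insufficient balance to pay {friend}")
        self.balance -= amount
        self.payments.append((friend, amount))


def pay_off_debts(splitwise: FakeSplitwise, venmo: FakeVenmo) -> None:
    """Pay every outstanding Splitwise debt on Venmo, then mark it settled."""
    for friend, amount in splitwise.list_debts().items():
        if amount > 0:
            venmo.pay(friend, amount)
            splitwise.settle(friend)


def task_succeeded(
    splitwise: FakeSplitwise,
    venmo: FakeVenmo,
    expected_payments: list[tuple[str, float]],
) -> bool:
    """State-based check: look at the final app state, not the agent's code."""
    no_debts_left = all(amount == 0 for amount in splitwise.list_debts().values())
    return no_debts_left and sorted(venmo.payments) == sorted(expected_payments)


if __name__ == "__main__":
    splitwise = FakeSplitwise({"alice": 12.5, "bob": 30.0, "carol": 0.0})
    venmo = FakeVenmo(balance=100.0)
    pay_off_debts(splitwise, venmo)
    print(task_succeeded(splitwise, venmo, [("alice", 12.5), ("bob", 30.0)]))
```

Even this toy version needs reads from one app, writes to another, and a guard against insufficient funds, which hints at why the real tasks can run to many API calls.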
Massive thanks to Harsh Trivedi for this insightful talk! Learn more about the paper at appworld.dev or follow him on X: @harsh3vedi.
Hello there, passionate AI enthusiasts! 🌟 We are 🐫 CAMEL-AI.org, a global coalition of students, researchers, and engineers dedicated to advancing the frontier of AI and fostering a harmonious relationship between agents and humans.
📘 Our Mission: To harness the potential of AI agents in crafting a brighter and more inclusive future for all. Every contribution we receive helps push the boundaries of what’s possible in the AI realm.
🙌 Join Us: If you believe in a world where AI and humanity coexist and thrive, then you’re in the right place. Your support can make a significant difference. Let’s build the AI society of tomorrow, together!
- Find all our updates on X.
- Make sure to star our GitHub repositories.
- Join our Discord, WeChat, or Slack community.
- You can contact us by email: camel.ai.team@gmail.com
- Dive deeper and explore our projects on https://www.camel-ai.org/