
Live Talk on "AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents"

December 19, 2024|2 min read

This talk covers the paper "AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents" by Harsh Trivedi. The paper introduces a simulated world environment and benchmark for agents that use tools and APIs to tackle complex day-to-day tasks across apps such as Amazon, Venmo, Todoist, Splitwise, and Gmail. Example tasks include "Return my last Amazon-ordered shirt & buy it in one size larger" and "I owe money to friends on Splitwise. Pay them on Venmo." The tasks are challenging, often requiring 15+ API calls and 80+ lines of code, and they come with robust programmatic evaluation.
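To make the Splitwise-to-Venmo example more concrete, here is a minimal Python sketch of what such a task could look like when expressed as code. Everything in it is an assumption made for illustration: `MockSplitwise`, `MockVenmo`, `settle_debts`, and `evaluate` are hypothetical in-memory stand-ins, not AppWorld's actual APIs or evaluation harness. The `evaluate` function shows one way a programmatic check can work, by inspecting the final state of the apps rather than the exact sequence of calls.

```python
from dataclasses import dataclass, field


@dataclass
class MockSplitwise:
    """Hypothetical bill-splitting app (illustration only, not AppWorld's API)."""
    # Maps friend name -> amount the user currently owes that friend.
    debts: dict[str, float] = field(default_factory=dict)

    def list_debts(self) -> dict[str, float]:
        # Return a copy so callers cannot mutate app state by accident.
        return dict(self.debts)

    def mark_settled(self, friend: str) -> None:
        self.debts[friend] = 0.0


@dataclass
class MockVenmo:
    """Hypothetical payments app (illustration only, not AppWorld's API)."""
    balance: float
    sent: list[tuple[str, float]] = field(default_factory=list)

    def send_payment(self, to: str, amount: float) -> None:
        if amount > self.balance:
            raise ValueError("insufficient balance")
        self.balance -= amount
        self.sent.append((to, amount))


def settle_debts(splitwise: MockSplitwise, venmo: MockVenmo) -> None:
    """Agent-style solution: read debts from one app, pay them in the other."""
    for friend, amount in splitwise.list_debts().items():
        if amount > 0:
            venmo.send_payment(friend, amount)
            splitwise.mark_settled(friend)


def evaluate(splitwise: MockSplitwise, venmo: MockVenmo,
             expected_payments: dict[str, float]) -> bool:
    """State-based check: no debts remain and exactly the expected payments were sent."""
    no_debts_left = all(amount == 0 for amount in splitwise.debts.values())
    payments_match = sorted(venmo.sent) == sorted(expected_payments.items())
    return no_debts_left and payments_match


# Usage: set up a small world, run the solution, then check the final state.
splitwise = MockSplitwise({"alice": 12.5, "bob": 30.0, "carol": 0.0})
venmo = MockVenmo(balance=100.0)
settle_debts(splitwise, venmo)
print(evaluate(splitwise, venmo, {"alice": 12.5, "bob": 30.0}))  # True
```

Even this toy version needs several calls across two apps plus a check for unintended side effects (no payment to "carol"), which hints at why the real tasks can grow to 15+ API calls.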

The paper matters for research on interactive coding and API use: it addresses gaps in existing benchmarks and highlights the limitations of state-of-the-art models.

Massive thanks to Harsh Trivedi for this insightful talk! Learn more about the paper at appworld.dev or follow him on X: @harsh3vedi.

Hello there, passionate AI enthusiasts! 🌟 We are 🐫 CAMEL-AI.org, a global coalition of students, researchers, and engineers dedicated to advancing the frontier of AI and fostering a harmonious relationship between agents and humans.

📘 Our Mission: To harness the potential of AI agents in crafting a brighter and more inclusive future for all. Every contribution we receive helps push the boundaries of what’s possible in the AI realm.

🙌 Join Us: If you believe in a world where AI and humanity coexist and thrive, then you’re in the right place. Your support can make a significant difference. Let’s build the AI society of tomorrow, together!

Find all our updates on X.
Make sure to star our GitHub repositories.
Join our Discord, WeChat, or Slack community.
You can contact us by email: camel.ai.team@gmail.com
Dive deeper and explore our projects on https://www.camel-ai.org/

