Tutorial
How to Enable Real-Time Agentic Replies on WhatsApp Using OWL
A step-by-step guide to building context-aware AI agents on WhatsApp using OWL and the MCP servers for seamless tool integration and real-time responses.
Imagine having a personal AI assistant on WhatsApp that can answer questions, automate tasks, or fetch information for you in real time. In this blog, we’ll walk through how to build a WhatsApp automation agent using OWL (Optimized Workforce Learning), an open-source multi-agent framework from CAMEL-AI, and a custom WhatsApp MCP server.
We’ll cover what the Model Context Protocol (MCP) is, how OWL’s CAMEL agents work together, and how to connect everything so your AI agent can send and examine WhatsApp messages autonomously.
In the sections below, we’ll break down each step to get the system running, from installing the WhatsApp MCP integration to launching the OWL agent. By the end, you’ll have a friendly AI role-playing agent in your WhatsApp and a deeper understanding of how MCP servers, MCP clients, and MCP hosts work together to connect AI to real-world use cases.
Before diving into the demo, let’s briefly introduce the Model Context Protocol (MCP) and some key terms. MCP is a protocol that standardizes how LLMs connect to external tools and services (think of it as a common language that AI agents use to talk to other apps). It’s designed to be simple and universal, so you can plug in various “capabilities” (messaging apps, databases, web browsers, and so on) without writing custom integration code for each one.
In MCP’s architecture, we commonly talk about MCP hosts, MCP clients, and MCP servers:
To put it simply, if our AI agent were a person making a phone call to get information, the MCP host is the person themselves, the MCP client is the phone they use (one phone line per service), and the MCP server is the friend on the other end who has the information or can perform an action. This clear separation makes it easy to add new “friends” (tools) for the AI to call upon.
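The phone-call analogy above can be sketched in a few lines of plain Python. This is only a toy model of the three roles, not the real MCP wire protocol or any SDK: the class names ToyHost, ToyClient, and ToyServer and the fake_send_message tool are all made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ToyServer:
    """Stands in for an MCP server: it owns named tools and runs them on request."""
    name: str
    tools: Dict[str, Callable[..., str]] = field(default_factory=dict)

    def call(self, tool: str, **kwargs) -> str:
        if tool not in self.tools:
            raise KeyError(f"{self.name} has no tool named {tool!r}")
        return self.tools[tool](**kwargs)


@dataclass
class ToyClient:
    """Stands in for an MCP client: one dedicated line to exactly one server."""
    server: ToyServer

    def request(self, tool: str, **kwargs) -> str:
        return self.server.call(tool, **kwargs)


class ToyHost:
    """Stands in for the MCP host: the AI app that keeps one client per service."""

    def __init__(self) -> None:
        self.clients: Dict[str, ToyClient] = {}

    def add_server(self, server: ToyServer) -> None:
        # Adding a new "friend" only means registering one more client.
        self.clients[server.name] = ToyClient(server)

    def use(self, service: str, tool: str, **kwargs) -> str:
        return self.clients[service].request(tool, **kwargs)


def fake_send_message(recipient: str, text: str) -> str:
    # Hypothetical tool; a real server would talk to WhatsApp here.
    return f"sent to {recipient}: {text}"


host = ToyHost()
host.add_server(ToyServer("whatsapp", {"send_message": fake_send_message}))
print(host.use("whatsapp", "send_message", recipient="Alice", text="hello"))
```

The point of the separation is visible in add_server: the host never needs to know how a tool is implemented, only which client to pick up.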
OWL (Optimized Workforce Learning) is CAMEL-AI’s powerful framework for building multi-agent systems. Unlike a single AI agent, OWL enables multiple AI agents to collaborate, each with defined roles or specialties. This approach is inspired by the CAMEL framework, where typically a “user” agent and an “assistant” agent converse to solve problems or complete tasks in a more robust way than a single agent alone. In practical terms, OWL provides a structured environment where agents can share information, make real-time decisions, and use external tools through MCP integrations.
Key features of OWL that make our WhatsApp automation possible include:
In summary, OWL will be the brain running our AI agents (the assistant), and MCP will be the nervous system that connects that brain to the outside world (WhatsApp, in this case). Next, let’s focus on our specific use case and what we’re about to set up.
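To make the “user” and “assistant” split more concrete, here is a rough sketch of a two-role loop in plain Python. It does not use OWL or CAMEL and involves no LLM: both agents are scripted, and the TASK_DONE sentinel, the function names, and the two fake tools are assumptions chosen for illustration only.

```python
from typing import Callable, Dict, List, Tuple

TASK_DONE = "TASK_DONE"  # hypothetical sentinel the user agent emits when finished


def scripted_user(steps: List[str]) -> Callable[[str], str]:
    """Return a 'user' agent that issues one instruction per turn, then stops."""
    remaining = list(steps)

    def next_instruction(last_reply: str) -> str:
        return remaining.pop(0) if remaining else TASK_DONE

    return next_instruction


def tool_assistant(tools: Dict[str, Callable[[], str]]) -> Callable[[str], str]:
    """Return an 'assistant' agent that answers by running the matching tool."""

    def respond(instruction: str) -> str:
        tool = tools.get(instruction)
        return tool() if tool else f"no tool for {instruction!r}"

    return respond


def run_dialogue(
    user: Callable[[str], str],
    assistant: Callable[[str], str],
    max_rounds: int = 10,
) -> List[Tuple[str, str]]:
    """Alternate user instructions and assistant replies until the user is done."""
    history: List[Tuple[str, str]] = []
    reply = ""
    for _ in range(max_rounds):
        instruction = user(reply)
        if instruction == TASK_DONE:
            break
        reply = assistant(instruction)
        history.append((instruction, reply))
    return history


# Fake tools standing in for what an MCP server would expose.
tools = {
    "read_unread": lambda: "1 unread: 'lunch at noon?'",
    "send_reply": lambda: "reply sent",
}
history = run_dialogue(scripted_user(["read_unread", "send_reply"]), tool_assistant(tools))
for instruction, reply in history:
    print(f"user -> {instruction} | assistant -> {reply}")
```

In the real framework both roles are backed by language models and the tools come from MCP servers, but the control flow has the same shape: one side plans, the other acts with tools.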
Before we begin, make sure you have the following prerequisites in place on your system:
With those ready, let’s proceed with the installation and setup:
1. Clone the necessary code repositories. First, grab the OWL framework code and the WhatsApp MCP server code from GitHub. Open a terminal and run:
# Clone the OWL framework (includes CAMEL agents and community use-cases)
git clone https://github.com/camel-ai/owl.git
# Clone the WhatsApp MCP server integration code
git clone https://github.com/lharries/whatsapp-mcp.git
This will download the OWL project (which contains the multi-agent framework and our demo script) and the community WhatsApp MCP server project (which contains the bridge and server needed for WhatsApp integration). The OWL repo also includes the WhatsApp MCP example as a community use-case; you can find additional documentation in the community_usecase/Whatsapp-MCP folder of the OWL repo. (The README there is essentially a technical manual for what we’re doing here.)
2. Build and run the WhatsApp Bridge (Go). Next, we’ll set up the WhatsApp bridge which connects to the WhatsApp network. In a terminal, navigate into the WhatsApp MCP Server project’s bridge directory and run the Go program:
cd whatsapp-mcp/whatsapp-bridge
# (Optional) Download the Go module dependencies ahead of time
go mod download
# Run the bridge (this will compile and execute the Go code)
go run main.go
After a moment, the bridge service will start and prompt you with a QR code in the terminal (or console output).
Open WhatsApp on your phone, go to the linked devices section, and scan this QR code (just as you would for WhatsApp Web). This will authorize the bridge to connect to your WhatsApp account. Once scanned, the terminal should show a message that the connection is successful (e.g. “WhatsApp connection established” or similar).
Leave this bridge program running – it needs to stay open to maintain the WhatsApp connection. Tip: you might want to run it in a separate terminal or in the background since it will continuously output logs of WhatsApp events.
3. Configure the MCP server integration for WhatsApp. Now we’ll create a config file to let OWL know about the WhatsApp MCP server. This config file will instruct the OWL MCP host how to launch and connect to the WhatsApp server. It’s a simple JSON file that lists our Model Context Protocol servers (in this case, just one server named "whatsapp").
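Create a file named mcp_servers_config.json in the same folder as the demo script (community_usecase/Whatsapp-MCP/ in the OWL repo), since that is the path the script in step 4 loads, with the following content: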
{
  "mcpServers": {
    "whatsapp": {
      "command": "<PATH_TO_UVICORN_EXECUTABLE>",
      "args": [
        "<PATH_TO_WHATSAPP_MCP_SERVER_MAIN.py>",
        "--connect_serial_host",
        "--only_one"
      ]
    }
  }
}
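Before launching anything, it can help to sanity-check the config file. The snippet below is a small sketch, not part of OWL or the WhatsApp MCP project: check_mcp_config is a hypothetical helper that only verifies the shape shown above (an mcpServers object whose entries have a command string and an args list) and flags angle-bracket placeholders that have not been replaced with real paths.

```python
import json
from typing import List


def check_mcp_config(text: str) -> List[str]:
    """Return a list of human-readable problems found in an MCP config string."""
    config = json.loads(text)
    servers = config.get("mcpServers")
    if not isinstance(servers, dict) or not servers:
        return ["top-level 'mcpServers' object is missing or empty"]

    problems: List[str] = []
    for name, entry in servers.items():
        if not isinstance(entry, dict):
            problems.append(f"{name}: entry must be a JSON object")
            continue
        command = entry.get("command")
        args = entry.get("args", [])
        if not isinstance(command, str) or not command:
            problems.append(f"{name}: 'command' must be a non-empty string")
        if not isinstance(args, list) or not all(isinstance(a, str) for a in args):
            problems.append(f"{name}: 'args' must be a list of strings")
            continue
        # Flag values that still look like <PLACEHOLDER> text from the tutorial.
        for value in [command, *args]:
            if isinstance(value, str) and value.startswith("<") and value.endswith(">"):
                problems.append(f"{name}: placeholder {value} still needs a real path")
    return problems


sample = """
{
  "mcpServers": {
    "whatsapp": {
      "command": "<PATH_TO_UVICORN_EXECUTABLE>",
      "args": ["<PATH_TO_WHATSAPP_MCP_SERVER_MAIN.py>", "--connect_serial_host", "--only_one"]
    }
  }
}
"""
for problem in check_mcp_config(sample):
    print(problem)
```

Run against the unedited template, this reports the two placeholders; once both paths are filled in, the list comes back empty.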
4. Launch the OWL WhatsApp agent demo. We’re ready to fire up our AI agent! Open a new terminal (keeping the Go bridge running in its own terminal) and run the OWL demo script for WhatsApp:
# From the directory where you cloned the repositories, enter the OWL repo
cd owl
# Run the WhatsApp MCP use-case app
python community_usecase/Whatsapp-MCP/app.py
# Demo script run by the command above (community_usecase/Whatsapp-MCP/app.py)
import asyncio
import sys
from pathlib import Path
from typing import List

from dotenv import load_dotenv

from camel.logger import set_log_level
from camel.models import ModelFactory
from camel.toolkits import FunctionTool, MCPToolkit, SearchToolkit
from camel.types import ModelPlatformType, ModelType

from owl.utils.enhanced_role_playing import OwlRolePlaying, arun_society

# Load environment variables (such as the OpenAI API key) from a .env file
load_dotenv()
set_log_level(level="DEBUG")


async def construct_society(
    question: str,
    tools: List[FunctionTool],
) -> OwlRolePlaying:
    r"""Build a multi-agent OwlRolePlaying instance.

    Args:
        question (str): The question to ask.
        tools (List[FunctionTool]): The MCP tools to use.
    """
    # Both roles use the same model with deterministic sampling
    models = {
        "user": ModelFactory.create(
            model_platform=ModelPlatformType.OPENAI,
            model_type=ModelType.GPT_4O,
            model_config_dict={"temperature": 0},
        ),
        "assistant": ModelFactory.create(
            model_platform=ModelPlatformType.OPENAI,
            model_type=ModelType.GPT_4O,
            model_config_dict={"temperature": 0},
        ),
    }

    user_agent_kwargs = {"model": models["user"]}
    # Only the assistant agent is given tools
    assistant_agent_kwargs = {
        "model": models["assistant"],
        "tools": tools,
    }

    task_kwargs = {
        "task_prompt": question,
        "with_task_specify": False,
    }

    society = OwlRolePlaying(
        **task_kwargs,
        user_role_name="user",
        user_agent_kwargs=user_agent_kwargs,
        assistant_role_name="assistant",
        assistant_agent_kwargs=assistant_agent_kwargs,
    )
    return society


async def main():
    # The config file is expected next to this script
    config_path = Path(__file__).parent / "mcp_servers_config.json"
    mcp_toolkit = MCPToolkit(config_path=str(config_path))

    try:
        # Connect to all MCP servers listed in the config
        print("Attempting to connect to MCP servers...")
        await mcp_toolkit.connect()

        # Default task
        default_task = (
            "Read the unread messages from {contact name} on whatsapp and reply to his query"
        )

        # Override default task if command line argument is provided
        task = sys.argv[1] if len(sys.argv) > 1 else default_task

        # Combine the MCP tools with a DuckDuckGo search tool
        tools = [*mcp_toolkit.get_tools(), SearchToolkit().search_duckduckgo]
        society = await construct_society(task, tools)
        answer, chat_history, token_count = await arun_society(society)
        print(f"\033[94mAnswer: {answer}\033[0m")

    except Exception as e:
        print(f"An error occurred during connection: {e}")

    finally:
        # Make sure to disconnect safely after all operations are completed.
        try:
            await mcp_toolkit.disconnect()
        except Exception:
            print("Disconnect failed")


if __name__ == "__main__":
    asyncio.run(main())
When this script runs, it initializes the OWL environment, loads the MCP config from mcp_servers_config.json in the script’s own directory (the path is hard-coded in main(), so save or rename the config file from step 3 accordingly), and launches the WhatsApp MCP server using that config. You should see Uvicorn starting up the FastAPI server (the Python MCP server) in the console. OWL then connects to the WhatsApp MCP server as an MCP client.
After initialization, the script will likely indicate that the agent is up and running and waiting for messages.
Now it’s showtime: open your WhatsApp and send a message to the same WhatsApp account that you linked in the bridge (for example, if you linked your own number, just message yourself from another phone, or have a friend message you). When a message comes in, you should see the OWL agent’s logs showing that it received a message via the MCP server.
The AI assistant will process the message and, if the task prompt asks for it (as the default task in this use case does), formulate a reply. Within a few seconds, you should see a reply appear in WhatsApp, sent from your AI agent! 🎉
The exact behavior will depend on how the agent is configured in the OWL script – by default, it may use a general-purpose AI assistant persona (similar to ChatGPT) to respond helpfully to any input. You now have a two-way connection: messages from WhatsApp go into OWL’s AI brain, and messages from the AI brain come out through WhatsApp. Your WhatsApp has effectively become a conversational interface for a powerful AI agent.
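Since the demo script takes its task from the first command-line argument and otherwise falls back to a default prompt containing a literal {contact name} placeholder, one easy customization is to build the task string yourself. The helper below is a sketch under that assumption: build_task, build_command, and the {contact} field name are hypothetical, and the wording of the task is copied from the script’s default.

```python
import shlex

# Same wording as the script's default task, with a fillable field name.
TASK_TEMPLATE = (
    "Read the unread messages from {contact} on whatsapp and reply to his query"
)


def build_task(contact: str) -> str:
    """Fill the contact name into the task prompt."""
    contact = contact.strip()
    if not contact:
        raise ValueError("contact name must not be empty")
    return TASK_TEMPLATE.format(contact=contact)


def build_command(contact: str, script: str = "community_usecase/Whatsapp-MCP/app.py") -> str:
    """Return a shell-safe command line that passes the task as one argument."""
    return shlex.join(["python", script, build_task(contact)])


print(build_command("Alice"))
```

Quoting matters here because the script reads only sys.argv[1]; an unquoted prompt would be split by the shell and only its first word would reach the agent.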
Setting up the WhatsApp integration involves multiple pieces, so here are a few common tips in case you run into issues:
Congratulations! You’ve built a WhatsApp automation flow that combines the power of CAMEL-AI’s OWL multi-agent framework with the flexibility of the Model Context Protocol. This demo showcased how an MCP host (OWL) can leverage an MCP server (WhatsApp integration) via an MCP client to seamlessly extend an AI’s reach into a popular messaging platform.
The real beauty of this setup is how general it is. With minimal changes, the same pattern can connect AI agents to other applications. The OWL framework allows your CAMEL agents to use all these tools in concert, enabling complex workflows and task automation that go well beyond simple chat responses.
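As a rough illustration of how little changes when another tool is added, the sketch below extends the config structure from step 3 with a second entry. The add_server helper and the “calendar” server, including its placeholder command and arguments, are purely hypothetical; substitute whatever MCP server you actually want to connect.

```python
import copy
import json
from typing import Any, Dict, List


def add_server(
    config: Dict[str, Any], name: str, command: str, args: List[str]
) -> Dict[str, Any]:
    """Return a copy of the config with one more entry under 'mcpServers'."""
    if name in config.get("mcpServers", {}):
        raise ValueError(f"server {name!r} is already configured")
    updated = copy.deepcopy(config)  # leave the caller's config untouched
    updated.setdefault("mcpServers", {})[name] = {
        "command": command,
        "args": list(args),
    }
    return updated


base = {
    "mcpServers": {
        "whatsapp": {
            "command": "<PATH_TO_UVICORN_EXECUTABLE>",
            "args": [
                "<PATH_TO_WHATSAPP_MCP_SERVER_MAIN.py>",
                "--connect_serial_host",
                "--only_one",
            ],
        }
    }
}

# "calendar" and its placeholders are made up for this example.
extended = add_server(
    base, "calendar", "<PATH_TO_CALENDAR_SERVER_COMMAND>", ["<CALENDAR_SERVER_ARGS>"]
)
print(json.dumps(extended, indent=2))
```

Because the demo script gathers tools from every server in the config, a new entry like this would be the main change needed on the OWL side, assuming the new server starts correctly.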
Feel free to explore and modify the solution:
We hope this tutorial empowers you to create your own AI integrations. The combination of OWL and MCP is opening up a new world of agent-based automation.
Today it’s WhatsApp; tomorrow, who knows? Maybe your OWL agent will be managing IoT devices, trading stocks, or booking flights, all through the same elegant protocol.
Happy hacking, and enjoy conversing with your new WhatsApp AI friend! 🚀
Hello there, passionate AI enthusiasts! 🌟 We are 🐫 CAMEL-AI.org, a global coalition of students, researchers, and engineers dedicated to advancing the frontier of AI and fostering a harmonious relationship between agents and humans.