Open WebUI (126k★) is the most popular self-hosted chat interface for AI. With Hermes Agent’s built-in API server, you can use Open WebUI as a polished web frontend for your agent — complete with conversation management, user accounts, and a modern chat interface.
B["hermes-agent<br/>gateway API server<br/>port 8642"]
A -->|POST /v1/chat/completions| B
B -->|SSE streaming response| A
Open WebUI connects to Hermes Agent’s API server just like it would connect to OpenAI. Your agent handles the requests with its full toolset — terminal, file operations, web search, memory, skills — and returns the final response.
Open WebUI talks to Hermes server-to-server, so you do not need API_SERVER_CORS_ORIGINS for this integration.
hermes config set auto-routes the flag to config.yaml and the secret to ~/.hermes/.env. If the gateway is already running, restart it so the change takes effect:
If /health fails, the gateway didn’t pick up API_SERVER_ENABLED=true — restart it. If /v1/models returns 401, your Authorization header doesn’t match API_SERVER_KEY.
ENABLE_OLLAMA_API=false suppresses the default Ollama backend, which would otherwise show up empty and clutter the model picker. Omit it if you actually have Ollama running alongside.
First launch takes 15–30 seconds: Open WebUI downloads sentence-transformer embedding models (~150MB) the first time it starts. Wait for docker logs open-webui to settle before opening the UI.
Go to http://localhost:3000. Create your admin account (the first user becomes admin). You should see your agent in the model dropdown (named after your profile, or hermes-agent for the default profile). Start chatting!
API Key: your key or any non-empty value (e.g., not-needed)
Click the checkmark to verify the connection
Save
Your agent model should now appear in the model dropdown (named after your profile, or hermes-agent for the default profile).
Warning
Environment variables only take effect on Open WebUI’s first launch. After that, connection settings are stored in its internal database. To change them later, use the Admin UI or delete the Docker volume and start fresh.
This is the default and requires no extra configuration. Open WebUI sends standard OpenAI-format requests and Hermes Agent responds accordingly. Each request includes the full conversation history.
Go to Admin Settings → Connections → OpenAI → Manage
Edit your hermes-agent connection
Change API Type from “Chat Completions” to “Responses (Experimental)”
Save
With the Responses API, Open WebUI sends requests in the Responses format (input array + instructions), and Hermes Agent can preserve full tool call history across turns via previous_response_id. When stream: true, Hermes also streams spec-native function_call and function_call_output items, which enables custom structured tool-call UI in clients that render Responses events.
Note
Open WebUI currently manages conversation history client-side even in Responses mode — it sends the full message history in each request rather than using previous_response_id. The main advantage of Responses mode today is the structured event stream: text deltas, function_call, and function_call_output items arrive as OpenAI Responses SSE events instead of Chat Completions chunks.
Open WebUI sends a POST /v1/chat/completions request with your message and conversation history
Hermes Agent creates an AIAgent instance with its full toolset
The agent processes your request — it may call tools (terminal, file operations, web search, etc.)
As tools execute, inline progress messages stream to the UI so you can see what the agent is doing (e.g. `💻 ls -la`, `🔍 Python 3.12 release`)
The agent’s final text response streams back to Open WebUI
Open WebUI displays the response in its chat interface
Your agent has access to all the same tools and capabilities as when using the CLI or Telegram — the only difference is the frontend.
Tip: Tool Progress
With streaming enabled (the default), you’ll see brief inline indicators as tools run — the tool emoji and its key argument. These appear in the response stream before the agent’s final answer, giving you visibility into what’s happening behind the scenes.
Check the URL has /v1 suffix: http://host.docker.internal:8642/v1 (not just :8642)
Verify the gateway is running: curl http://localhost:8642/health should return {"status": "ok"}
Check model listing: curl -H "Authorization: Bearer your-secret-key" http://localhost:8642/v1/models should return a list with hermes-agent
Docker networking: From inside Docker, localhost means the container, not your host. Use host.docker.internal or --network=host.
Empty Ollama backend shadowing the picker: If you omitted ENABLE_OLLAMA_API=false, Open WebUI shows an empty Ollama section above your Hermes models. Restart the container with -e ENABLE_OLLAMA_API=false or disable Ollama in Admin Settings → Connections.
Hermes Agent may be executing multiple tool calls (reading files, running commands, searching the web) before producing its final response. This is normal for complex queries. The response appears all at once when the agent finishes.
To run separate Hermes instances per user — each with their own config, memory, and skills — use profiles. Each profile runs its own API server on a different port and automatically advertises the profile name as the model in Open WebUI.
In Admin Settings → Connections → OpenAI API → Manage, add one connection per profile:
Connection
URL
API Key
Alice
http://host.docker.internal:8643/v1
alice-secret
Bob
http://host.docker.internal:8644/v1
bob-secret
The model dropdown will show alice and bob as distinct models. You can assign models to Open WebUI users via the admin panel, giving each user their own isolated Hermes agent.
Tip: Custom Model Names
The model name defaults to the profile name. To override it, set API_SERVER_MODEL_NAME in the profile’s .env: