Connect Hermes Agent to Nbility: Use a Smoother OpenAI-Compatible Model API Entry

This is Part 2 of the AI Agent Getting Started series. In the previous article, we got Hermes Agent running. This time we do something that directly affects the experience: connect the model API properly so the Agent can call models, use tools, write files, search information, and run commands reliably.

Cover: niku helping a user configure the Hermes Agent API panel

Summary

Many people hit a very practical problem the first time they try an AI Agent:

Which model should I enter?
Where should I put the API Key?
Should the Base URL include /v1?
Why does token usage feel fine in normal chat but climb quickly with an Agent?
After configuration, how do I verify that it works for real tasks instead of only looking correct?

This article uses Hermes Agent as the example and records a complete model API connection flow. To keep the steps reusable, I explain it through an OpenAI-compatible interface. If your service provides an endpoint compatible with /v1/chat/completions, the same idea usually applies.

If you do not want to manage multiple model providers yourself, you can use my own token site directly:

Nbility: https://nbility.dev

It works well as a unified model entry for AI Agents: configure one API Key and one Base URL, then call models by name from Hermes. This article does not push top-ups for their own sake. Nbility appears where an API is actually needed.

Why Connect the Model API First?

Installing Hermes Agent only gives you the frame of the vehicle. To really run it, you need three things:

Model: understands the task, plans steps, and calls tools.
Tool permissions: lets the Agent read and write files, execute commands, and access the web.
Stable token supply: supports multi-turn context and tool calls.

A normal chatbot usually has one user message and one model reply. An Agent is different. It may go through a chain like this:

Token consumption chain in one AI Agent task

Hermes Agent calling models and tools through an OpenAI-compatible API

For example, if you ask Hermes:

帮我检查这个项目为什么启动失败，修好并跑一遍测试。

It may:

read the project directory
open configuration files
search for error keywords
run install or test commands
modify code based on output
run verification again
summarize the result

Every step sends context, tool results, error logs, and code snippets back to the model. In other words, Agent token usage is not determined by “number of chat messages”. It is determined by task complexity, tool-call count, and context length.

So in this second article we connect the model entry first. This becomes the foundation for later deployments of OpenClaw, Dify, LobeChat, NextChat, and similar apps.

Prerequisites

Prepare:

a computer or server where Hermes Agent is already installed
a usable model API Key
an OpenAI-compatible Base URL
at least one model name you plan to call

If you use Nbility, the dashboard gives you:

API Key: sk-xxxxxxxxxxxxxxxxxxxx
Base URL: https://api.nbility.dev/v1
Model: the model name you plan to use

Note: Do not expose real API Keys in screenshots, tutorials, or group chats. All keys in this article use [REDACTED] or sk-xxxx.

Step 1: Confirm the Hermes Configuration File Location

Hermes keeps its main configuration under the user directory:

~/.hermes/
├── config.yaml     # model, tools, terminal, compression, and other settings
├── .env            # sensitive values such as API Keys
├── skills/         # skills
├── sessions/       # session history
└── logs/           # logs

Check the config paths:

hermes config path
hermes config env-path

You will usually see something like:

/root/.hermes/config.yaml
/root/.hermes/.env

If you installed it as a normal user, the paths may be:

/home/your-username/.hermes/config.yaml
/home/your-username/.hermes/.env

Step 2: Put the API Key into `.env`

Sensitive values should go into .env, not directly into articles, screenshots, or Git repositories.

If your Hermes version supports hermes config set for writing environment variables, use:

hermes config set NBILITY_API_KEY "sk-你的真实key"

You can also edit manually:

nano ~/.hermes/.env

Add:

NBILITY_API_KEY=sk-你的真实key

After saving, confirm the variable exists without printing the real value:

grep '^NBILITY_API_KEY=' ~/.hermes/.env | sed 's/=.*/=[REDACTED]/'

Expected output:

NBILITY_API_KEY=[REDACTED]

Step 3: Configure an OpenAI-Compatible Model Entry in Hermes

Hermes supports many providers. For an OpenAI-compatible interface such as Nbility, the most general option is to configure a custom_providers entry.

Open the config file:

hermes config edit

Add a configuration like this:

model:
  provider: custom:nbility
  default: gpt-4.1-mini

custom_providers:
  - name: nbility
    base_url: https://api.nbility.dev/v1
    api_key: ${NBILITY_API_KEY}
    api_mode: chat_completions

What this means:

provider: custom:nbility: tells Hermes to use a custom provider named nbility.
default: gpt-4.1-mini: the default model name. Replace it with a model you can actually use.
base_url: OpenAI-compatible endpoint, usually including /v1.
api_key: read from .env; do not hard-code the real key.
api_mode: chat_completions: use the OpenAI Chat Completions-style interface.

If you want to specify a model temporarily, pass it when starting:

hermes chat -m gpt-4.1-mini

Or run a single-question test:

hermes chat -q "用一句话介绍 Hermes Agent" -m gpt-4.1-mini

Step 4: Run a Configuration Check

After editing the config, let Hermes check itself:

hermes config check

If the config is valid, inspect status:

hermes status --all

Pay attention to three things:

whether the current provider is custom:nbility
whether the default model is the one you entered
whether Hermes reports a missing API Key or a misspelled config field

Hermes to Nbility configuration check flow

If you are using Hermes through a Telegram, Discord, or Weixin gateway, restart the gateway after changing .env or config.yaml:

hermes gateway restart

If you only use Hermes in the command line, opening a new terminal session is usually enough.

Step 5: Verify the Agent with a Small Task

Do not ask the Agent to modify a large project immediately. For the first verification, use a small task:

mkdir -p /tmp/hermes-api-test
cd /tmp/hermes-api-test
hermes chat -q "创建一个 hello.py，打印 hello nbility，然后运行它验证输出"

A normal Agent execution should look roughly like this:

understand the task
write hello.py
call the terminal to run Python
read the output
return the verification result

If this works, at least three things are connected:

Hermes can reach the model API
the model can plan tool calls
the local terminal tool can be used by the Agent

This is more reliable than only asking “who are you”. The value of an Agent is not small talk; it is task execution.

Common Troubleshooting

1. 401 Unauthorized

Usually the API Key is wrong, or the environment variable was not loaded.

Check:

grep '^NBILITY_API_KEY=' ~/.hermes/.env | sed 's/=.*/=[REDACTED]/'

Then restart the Hermes session or gateway.

2. 404 Not Found

The common cause is an incorrect Base URL.

Confirm that it is:

https://api.nbility.dev/v1

Do not omit /v1, and do not append another /chat/completions. Hermes will build the concrete endpoint path itself.

3. Model not found

The model name is unavailable, or your account does not have permission to use it.

Switch to a model name that your dashboard clearly marks as available, then test again:

model:
  default: 你的可用模型名

4. The Agent Replies Normally But Does Not Use Tools

This may not be an API issue. It may be a tool-permission or runtime-environment issue.

Check tools:

hermes tools list

For command-line development workflows, terminal, file, and web tools are usually the minimum useful set.

5. It Gets Slower and More Expensive After Several Turns

This is typical for Agent-style apps and is not necessarily an error.

Common causes:

context keeps growing
tool output is too large
the Agent reads many files or logs
the model performs multi-step planning and verification

Ways to reduce cost:

give the task a clearer scope
ask the Agent to list a plan before executing
only show key snippets from large logs
use cheaper models for simple tasks
switch to stronger models only for complex tasks

This is why I suggest using Nbility as a unified entry: different models can be selected by task type without reconfiguring a key and Base URL for every app.

Recommended Model Strategy

If you are new to AI Agents, split model use by task level:

Agent task layers and model selection

Lightweight Tasks

Suitable for:

copy edits
log summaries
simple scripts
small Q&A

Use a fast, low-cost model.

Medium Tasks

Suitable for:

writing tutorials
analyzing project structure
changing small features
debugging from error output

Use a more stable general-purpose model.

Heavy Tasks

Suitable for:

large refactors
multi-file bug fixes
complex architecture design
long-context code understanding

Use a stronger model, and ask the Agent to produce a plan first to avoid large trial-and-error loops.

A Realistic View: Agents Spend Tokens, But Save Human Time

I do not like advertising AI Agents as “unlimited free productivity”. Users will be disappointed the first time they see a real bill.

A more honest statement is:

Agents consume more tokens because they are not just answering questions. They read files, run commands, inspect results, and revise plans. The tokens you spend buy fewer context switches, less copy-paste, and less manual debugging.

So when introducing a token site, it is better to anchor it in real scenarios:

deploy a long-running Agent on a server
connect a useful assistant to Telegram / QQ / Weixin
use an Agent to maintain a small project
use an Agent to generate articles, images, and summaries
use an Agent for scheduled jobs and monitoring

These scenarios naturally need stable API tokens. There is no need to invent fake demand just to sell tokens.

Summary

In this article, we connected Hermes Agent to a model API:

found the Hermes config directory
put the API Key into .env
configured Nbility through custom_providers
checked the config with hermes config check
verified the Agent with a small real task
explained why Agents consume more tokens than ordinary chat

If you already have Hermes Agent, you can follow this article and replace the model entry with your own OpenAI-compatible service.

If you do not yet have a stable token source, try:

https://nbility.dev

My suggestion: start with a small top-up and run a lightweight task. Once your Agent workflow feels right, decide whether to use it long term.

Next I plan to write:

“Connect Hermes Agent to Telegram: Control Your Server from Your Phone”

This scenario feels more like a real productivity tool: when you are away from your desk, one message to a bot can ask it to inspect a server, edit files, run scripts, or generate a daily report.

Image Prompts

Cover Prompt

A polished tech blog cover illustration for an article about connecting Hermes Agent to an OpenAI-compatible API provider. Include niku, Nbility mascot: cute anime catgirl with long fluffy black hair with warm brown highlights, black cat ears with pink inner ears and white fur, fluffy black cat tail with orange bow, oversized black hoodie with orange drawstrings and orange lightning logo, black choker with golden bell. Use the support version: wearing black over-ear headset with orange paw-print icon, microphone near mouth, one hand pointing at a floating API configuration panel. Scene: dark server room, glowing terminal, API key field masked as [REDACTED], Base URL field showing https://api.nbility.dev/v1, model selector, token stream particles, Hermes Agent logo-like abstract wing icon. Black and orange brand palette, clean composition, leave empty space at top for Chinese title text, no real secrets, no messy small text, high quality anime tech illustration, 16:9 landscape.

Body Image Prompt

A clean anime-tech illustration showing an AI Agent workflow: user sends task, Hermes Agent plans, calls tools, reads files, sends requests to OpenAI-compatible API, receives model response, writes result back. Include a small cute black catgirl mascot niku as a guide character, black hoodie with orange lightning logo, headset, pointing at the flow. Dark UI, orange highlights, no real API keys, no readable tiny text except [REDACTED], high-quality blog illustration.

Connect Hermes Agent to Nbility: Use a Smoother OpenAI-Compatible Model API Entry

Summary

Why Connect the Model API First?

Prerequisites

Step 1: Confirm the Hermes Configuration File Location

Step 2: Put the API Key into `.env`

Step 3: Configure an OpenAI-Compatible Model Entry in Hermes

Step 4: Run a Configuration Check

Step 5: Verify the Agent with a Small Task

Common Troubleshooting

1. 401 Unauthorized

2. 404 Not Found

3. Model not found

4. The Agent Replies Normally But Does Not Use Tools

5. It Gets Slower and More Expensive After Several Turns

Recommended Model Strategy

Lightweight Tasks

Medium Tasks

Heavy Tasks

A Realistic View: Agents Spend Tokens, But Save Human Time

Summary

Next Article

Image Prompts

Cover Prompt

Body Image Prompt

Related posts

Why Do AI Agents Use More Tokens Than Normal Chat? A Beginner Cost Guide

OpenClaw Deployment Guide: Run a 24/7 AI Agent on Your Server

Connect Hermes Agent to Telegram: Turn Your Server-Side AI Agent into a Mobile Remote Assistant

Run your Agent workflow through Nbility