# Model Selection & Troubleshooting
The model you assign to a Kin has a massive impact on how well it performs — especially for autonomous tasks. This guide helps you choose the right model and debug common problems.
## Recommended models

### For autonomous / agentic Kins

These Kins run crons, process webhooks, and work without human oversight. They must reliably call tools.
| Model | Provider | Verdict | Notes |
|---|---|---|---|
| Claude Sonnet 4 | Anthropic | ✅ Best choice | Excellent tool use, follows complex instructions |
| Claude Sonnet 3.5 | Anthropic | ✅ Excellent | Battle-tested, great cost/performance ratio |
| Claude Haiku 3.5 | Anthropic | ✅ Good for simple tasks | Fast and cheap, but less reliable on complex multi-step workflows |
| GPT-4o | OpenAI | ⚠️ Usable with caveats | Sometimes falls into “text mode” — needs stronger prompting |
| GPT-4o-mini | OpenAI | ⚠️ Limited | Struggles with complex tool sequences |
| Gemini 2.5 Pro | Google | ✅ Good | Strong tool use, very large context window |
| Gemini 2.5 Flash | Google | ⚠️ Usable | Fast but sometimes skips tool calls on complex tasks |
| DeepSeek V3 | DeepSeek | ⚠️ Usable | Can work but less consistent on multi-step tool use |
| Llama 3.x (70B+) | Groq/Together/Ollama | ⚠️ Limited | Open models struggle with reliable tool calling |
| Mistral Large | Mistral | ⚠️ Usable | Decent tool use but less consistent than Claude |
### For conversational Kins

These Kins primarily chat with users and occasionally use tools. Most capable models work fine.
| Model | Provider | Verdict |
|---|---|---|
| Claude Sonnet 4 | Anthropic | ✅ Excellent |
| Claude Haiku 3.5 | Anthropic | ✅ Great for fast responses |
| GPT-4o | OpenAI | ✅ Excellent |
| GPT-4o-mini | OpenAI | ✅ Good and cheap |
| Gemini 2.5 Pro | Google | ✅ Excellent |
| Gemini 2.5 Flash | Google | ✅ Fast and capable |
| Llama 3.x (70B+) | Groq/Together/Ollama | ✅ Good for self-hosted |
## The “text mode” problem

The most common issue with autonomous Kins is the model falling into text mode: it describes what it would do instead of actually calling tools.

### What it looks like

Instead of calling `web_search("latest AI news")`, the model outputs:

> I’ll search the web for the latest AI news and compile a summary. Let me start by looking at major tech publications for recent developments in artificial intelligence…

No tool calls appear. The model writes a plausible-sounding response entirely from its training data, without accessing any real information.
### Why it happens

- Model capability: some models aren’t trained for reliable function calling
- Prompt ambiguity: if the prompt reads like a conversation, the model converses instead of acting
- Missing instruction: the model doesn’t know it should USE tools rather than DESCRIBE tool usage
- Context confusion: very long contexts can cause the model to “forget” it has tools available
### How to fix it

#### 1. Use a recommended model

Claude Sonnet models are specifically trained for tool use. If you’re experiencing text mode with another model, switch to Claude Sonnet first; this fixes the problem in most cases.
#### 2. Add explicit execution instructions

In your Kin’s system prompt, include:

```text
You ALWAYS use tools to accomplish tasks. You NEVER describe what you
would do — you DO it by calling the appropriate tools.

When you need information, call web_search or browse_url.
When you need to save something, call memorize or write_file.
When you need to process data, call the relevant tools step by step.

WRONG: "I'll search for the latest news about AI..."
RIGHT: [calls web_search("latest AI news", freshness="pd")]
```

#### 3. Use the EXEC pattern in task descriptions

For sub-Kin tasks (crons, webhooks), structure the task description as explicit commands:
```text
## Steps — EXECUTE each one using tools

EXEC: web_search("artificial intelligence news", freshness="pd")
EXEC: browse_url on the top 3 results
EXEC: memorize the key findings
EXEC: update_task_status("completed", summary)

Do NOT describe these steps. CALL the tools.
```

This pattern tells the model unambiguously that it should execute tool calls, not write about them.
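Since the task description is just text, the pattern is easy to generate. The helper below is hypothetical (KinBot does not ship such a function); it only shows how a list of steps maps onto the EXEC layout:

```python
def exec_task_description(title, steps):
    """Format a list of steps in the EXEC pattern.

    Hypothetical helper for illustration: KinBot accepts the task
    description as plain text, so this simply builds that text.
    """
    lines = [f"## {title} — EXECUTE each one using tools", ""]
    lines += [f"EXEC: {step}" for step in steps]
    lines += ["", "Do NOT describe these steps. CALL the tools."]
    return "\n".join(lines)

desc = exec_task_description("Steps", [
    'web_search("artificial intelligence news", freshness="pd")',
    "browse_url on the top 3 results",
    "memorize the key findings",
    'update_task_status("completed", summary)',
])
print(desc)
```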
#### 4. Check tool call indicators

In the KinBot UI, each message shows whether tool calls were made. Look for the tool call indicators (collapsible sections showing the tool name and parameters). If a response has no tool calls, the Kin operated in text mode.
## Provider setup tips

### Anthropic (recommended)

- Get an API key from console.anthropic.com
- In KinBot, go to Settings > Providers > Add Provider
- Select Anthropic, paste your API key
- The connection test will verify models are accessible
Anthropic also supports OAuth via Claude Max — no API key needed if you have a Claude Max subscription.
### OpenAI

- Get an API key from platform.openai.com
- Add as a provider in KinBot
- For autonomous Kins, use `gpt-4o` (not `gpt-4o-mini`)
### Ollama (self-hosted)

- Install Ollama and pull a model: `ollama pull llama3.3:70b`
- In KinBot, add Ollama as a provider with base URL `http://localhost:11434`
- From Docker, use `http://host.docker.internal:11434`
### OpenRouter (access to many models)

- Get an API key from openrouter.ai
- Add as a provider in KinBot
- You can access Claude, GPT-4o, Gemini, and many other models through a single provider
OpenRouter is convenient if you want to test different models without setting up multiple providers.
## Verifying tool use is working

After setting up a Kin, verify it’s actually using tools:

### Quick test

Send your Kin a message that requires a tool call:

> What’s the current weather in Paris? Use web search to find out.
A working Kin will call web_search and return real, current data. A text-mode Kin will make up a plausible weather report.
### Cron test

1. Create a simple cron job: “Search the web for ‘KinBot’ and summarize what you find”
2. Trigger it manually
3. Check the task result: does it contain actual search results or fabricated content?
4. Look at the task detail for tool call indicators
### What to check in the UI

- Tool call sections: Each message shows collapsible tool call blocks. No blocks = no tools were called
- Task status: Autonomous tasks should end with `completed` and a meaningful result
- Cron journal: Check `get_cron_journal` for execution history; failed runs often indicate tool issues
## Cost considerations

Autonomous Kins consume more tokens than conversational ones because:
- Cron jobs run on schedule regardless of whether there’s work to do
- Webhook tasks process each event individually
- Sub-tasks each require their own LLM call(s)
- Tool results are included in the context, adding to input token count
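A back-of-the-envelope estimate makes the scheduling cost concrete. The per-million-token prices below are placeholders for illustration, not current provider pricing:

```python
def estimate_monthly_cost(runs_per_day, input_tokens, output_tokens,
                          price_in_per_mtok, price_out_per_mtok):
    """Rough monthly cost of a cron: runs/day x 30 days x token prices.

    Prices are per million tokens. All numbers here are illustrative
    placeholders, not actual provider pricing.
    """
    per_run = (input_tokens * price_in_per_mtok
               + output_tokens * price_out_per_mtok) / 1_000_000
    return per_run * runs_per_day * 30

# Hypothetical hourly cron: 24 runs/day, 8k input / 1k output tokens
# per run, at $3 / $15 per Mtok (placeholder prices)
cost = estimate_monthly_cost(24, 8_000, 1_000, 3.0, 15.0)
print(f"${cost:.2f}/month")  # $28.08/month
```

Note that a cron fires on schedule whether or not there is work to do, so the run count (not the task's usefulness) drives the bill.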
### Cost optimization tips

| Tip | Impact |
|---|---|
| Use Haiku for simple, single-step crons | 5–10x cheaper than Sonnet |
| Add webhook payload filters | Avoid processing irrelevant events |
| Set concurrency limits on webhook tasks | Prevent burst cost spikes |
| Use concise task descriptions | Fewer input tokens per run |
| Store results in memory instead of long outputs | Keeps future context smaller |
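The payload-filter tip amounts to a cheap predicate that runs before any LLM call. The sketch below shows the idea with an assumed event shape (`"type"` / `"action"` keys); KinBot's actual filter configuration may look different:

```python
def is_relevant(event):
    """Hypothetical webhook filter: only pass events worth an LLM call.

    Events dropped here cost zero tokens. The event shape ("type" and
    "action" keys) is assumed for illustration.
    """
    return event.get("type") == "issues" and event.get("action") == "opened"

events = [
    {"type": "issues", "action": "opened"},   # worth processing
    {"type": "issues", "action": "labeled"},  # noise
    {"type": "push"},                         # noise
]
worth_processing = [e for e in events if is_relevant(e)]
print(len(worth_processing))  # 1
```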
## Quick reference: model selection flowchart

1. Is the Kin autonomous? (crons, webhooks, sub-tasks)
   - Yes → Claude Sonnet 4 or Claude Sonnet 3.5
   - No → continue
2. Does the Kin use tools frequently?
   - Yes → Claude Sonnet 3.5 or GPT-4o
   - No → continue
3. Is cost the primary concern?
   - Yes → Claude Haiku 3.5 or GPT-4o-mini
   - No → Claude Sonnet 3.5 (best all-rounder)
4. Must it be self-hosted?
   - Yes → Llama 3.3 70B+ via Ollama (conversational) or Gemini 2.5 Flash via API (agentic)
   - No → Use a cloud provider
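The flowchart can be read as a short decision function. This is an illustrative sketch, not KinBot code; it treats the self-hosted constraint as the overriding check and returns the first recommended model at each branch:

```python
def pick_model(autonomous, frequent_tools, cost_sensitive, self_hosted=False):
    """The selection flowchart above as a function (illustrative only;
    model names are the ones recommended in this guide)."""
    if self_hosted:
        # Self-hosted requirement overrides everything else; per the
        # tables above there is no strong open model for agentic work.
        return "Llama 3.3 70B+ via Ollama"
    if autonomous:
        return "Claude Sonnet 4"      # or Claude Sonnet 3.5
    if frequent_tools:
        return "Claude Sonnet 3.5"    # or GPT-4o
    if cost_sensitive:
        return "Claude Haiku 3.5"     # or GPT-4o-mini
    return "Claude Sonnet 3.5"        # best all-rounder

print(pick_model(autonomous=True, frequent_tools=True, cost_sensitive=False))
```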