
Model Selection & Troubleshooting

The model you assign to a Kin has a massive impact on how well it performs — especially for autonomous tasks. This guide helps you choose the right model and debug common problems.

Models for autonomous Kins

Autonomous Kins run crons, process webhooks, and work without human oversight. They must reliably call tools.

| Model | Provider | Verdict | Notes |
| --- | --- | --- | --- |
| Claude Sonnet 4 | Anthropic | ✅ Best choice | Excellent tool use, follows complex instructions |
| Claude Sonnet 3.5 | Anthropic | ✅ Excellent | Battle-tested, great cost/performance ratio |
| Claude Haiku 3.5 | Anthropic | ✅ Good for simple tasks | Fast and cheap, but less reliable on complex multi-step workflows |
| GPT-4o | OpenAI | ⚠️ Usable with caveats | Sometimes falls into “text mode” — needs stronger prompting |
| GPT-4o-mini | OpenAI | ⚠️ Limited | Struggles with complex tool sequences |
| Gemini 2.5 Pro | Google | ✅ Good | Strong tool use, very large context window |
| Gemini 2.5 Flash | Google | ⚠️ Usable | Fast but sometimes skips tool calls on complex tasks |
| DeepSeek V3 | DeepSeek | ⚠️ Usable | Can work but less consistent on multi-step tool use |
| Llama 3.x (70B+) | Groq/Together/Ollama | ⚠️ Limited | Open models struggle with reliable tool calling |
| Mistral Large | Mistral | ⚠️ Usable | Decent tool use but less consistent than Claude |

Models for conversational Kins

Conversational Kins primarily chat with users and occasionally use tools. Most capable models work fine.

| Model | Provider | Verdict |
| --- | --- | --- |
| Claude Sonnet 4 | Anthropic | ✅ Excellent |
| Claude Haiku 3.5 | Anthropic | ✅ Great for fast responses |
| GPT-4o | OpenAI | ✅ Excellent |
| GPT-4o-mini | OpenAI | ✅ Good and cheap |
| Gemini 2.5 Pro | Google | ✅ Excellent |
| Gemini 2.5 Flash | Google | ✅ Fast and capable |
| Llama 3.x (70B+) | Groq/Together/Ollama | ✅ Good for self-hosted |

The most common issue with autonomous Kins is the model falling into text mode — where it describes what it would do instead of actually calling tools.

Instead of calling web_search("latest AI news"), the model outputs:

I’ll search the web for the latest AI news and compile a summary. Let me start by looking at major tech publications for recent developments in artificial intelligence…

No tool calls appear. The model writes a plausible-sounding response entirely from its training data, without accessing any real information.
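Programmatically, text mode is easy to detect: a healthy agent turn contains tool-call blocks, while a text-mode turn contains only prose. A minimal sketch, assuming responses shaped like Anthropic-style content blocks (a list of dicts with a `type` field; other providers structure this differently):

```python
def used_tools(content_blocks: list[dict]) -> bool:
    """Return True if any content block is a tool call.

    Assumes Anthropic-style blocks, where a tool call has
    type "tool_use" and plain prose has type "text".
    """
    return any(block.get("type") == "tool_use" for block in content_blocks)

# A text-mode response: prose only, no tool calls.
text_mode = [
    {"type": "text", "text": "I'll search the web for the latest AI news..."},
]

# A working response: the model actually invoked web_search.
tool_mode = [
    {"type": "tool_use", "name": "web_search",
     "input": {"query": "latest AI news", "freshness": "pd"}},
]
```

A check like this can run over stored task transcripts to flag runs that produced no tool calls at all.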

Why it happens:

  1. Model capability — Some models aren’t trained for reliable function calling
  2. Prompt ambiguity — If the prompt sounds like a conversation, the model converses instead of acting
  3. Missing instruction — The model doesn’t know it should USE tools rather than DESCRIBE tool usage
  4. Context confusion — Very long contexts can cause the model to “forget” it has tools available

1. Switch to a Claude Sonnet model

Claude Sonnet models are specifically trained for tool use. If you’re experiencing text mode with another model, switch to Claude Sonnet first — this fixes the problem in most cases.

2. Strengthen the system prompt

In your Kin’s system prompt, include:

You ALWAYS use tools to accomplish tasks. You NEVER describe what you would do —
you DO it by calling the appropriate tools.

When you need information, call web_search or browse_url.
When you need to save something, call memorize or write_file.
When you need to process data, call the relevant tools step by step.

WRONG: "I'll search for the latest news about AI..."
RIGHT: [calls web_search("latest AI news", freshness="pd")]
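If you call a model directly rather than through the KinBot UI, the same instruction belongs in the request's system field, alongside the tool definitions. A sketch of a request body in the Anthropic Messages style (the model id and tool schema here are illustrative, not verified values):

```python
TOOL_USE_SYSTEM_PROMPT = (
    "You ALWAYS use tools to accomplish tasks. You NEVER describe what "
    "you would do -- you DO it by calling the appropriate tools."
)

# Illustrative request body; the tool schema follows the JSON Schema
# convention used by most function-calling APIs.
request = {
    "model": "claude-sonnet-4",  # placeholder id; check your provider's model list
    "max_tokens": 1024,
    "system": TOOL_USE_SYSTEM_PROMPT,
    "tools": [
        {
            "name": "web_search",
            "description": "Search the web and return result snippets.",
            "input_schema": {
                "type": "object",
                "properties": {
                    "query": {"type": "string"},
                    "freshness": {"type": "string",
                                  "description": "e.g. 'pd' for past day"},
                },
                "required": ["query"],
            },
        }
    ],
    "messages": [{"role": "user", "content": "Summarize today's AI news."}],
}
```

The key point is that the tool-use instruction lives in the system field, where it applies to every turn, rather than in a single user message.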

3. Use the EXEC pattern in task descriptions


For sub-Kin tasks (crons, webhooks), structure the task description as explicit commands:

## Steps — EXECUTE each one using tools
EXEC: web_search("artificial intelligence news", freshness="pd")
EXEC: browse_url on the top 3 results
EXEC: memorize the key findings
EXEC: update_task_status("completed", summary)
Do NOT describe these steps. CALL the tools.

This pattern tells the model unambiguously that it should execute tool calls, not write about them.
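The pattern is mechanical enough to generate. A small helper (hypothetical; not part of KinBot) that renders a list of steps as an EXEC-style task description:

```python
def build_exec_task(steps: list[str]) -> str:
    """Render steps as explicit EXEC commands so the model treats them
    as tool calls to perform, not prose to produce."""
    lines = ["## Steps - EXECUTE each one using tools"]
    lines += [f"EXEC: {step}" for step in steps]
    lines.append("Do NOT describe these steps. CALL the tools.")
    return "\n".join(lines)

task = build_exec_task([
    'web_search("artificial intelligence news", freshness="pd")',
    "browse_url on the top 3 results",
    "memorize the key findings",
])
```

Generating the description from a step list keeps the imperative framing consistent across all your crons and webhooks.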

In the KinBot UI, each message shows whether tool calls were made. Look for the tool call indicators (collapsible sections showing the tool name and parameters). If a response has no tool calls, the Kin operated in text mode.

Anthropic

  1. Get an API key from console.anthropic.com
  2. In KinBot, go to Settings > Providers > Add Provider
  3. Select Anthropic, paste your API key
  4. The connection test will verify models are accessible

Anthropic also supports OAuth via Claude Max — no API key needed if you have a Claude Max subscription.

OpenAI

  1. Get an API key from platform.openai.com
  2. Add as a provider in KinBot
  3. For autonomous Kins, use gpt-4o (not gpt-4o-mini)

Ollama (local)

  1. Install Ollama and pull a model: ollama pull llama3.3:70b
  2. In KinBot, add Ollama as a provider with base URL http://localhost:11434
  3. From Docker, use http://host.docker.internal:11434
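The right base URL depends on where KinBot runs relative to Ollama. A small sketch of that choice (assumes Ollama's default port, 11434):

```python
def ollama_base_url(kinbot_in_docker: bool) -> str:
    """Pick the Ollama base URL to configure as the provider endpoint.

    Inside a container, "localhost" refers to the container itself, so
    Docker's host alias is needed to reach Ollama running on the host.
    """
    host = "host.docker.internal" if kinbot_in_docker else "localhost"
    return f"http://{host}:11434"
```

On Linux, `host.docker.internal` may additionally require the `--add-host=host.docker.internal:host-gateway` flag when starting the container.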

OpenRouter

  1. Get an API key from openrouter.ai
  2. Add as a provider in KinBot
  3. You can access Claude, GPT-4o, Gemini, and many other models through a single provider

OpenRouter is convenient if you want to test different models without setting up multiple providers.
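Because OpenRouter exposes an OpenAI-compatible chat completions API, switching models is just a string change in the request body. A sketch (the model ids shown are illustrative; check openrouter.ai for the current list):

```python
# One endpoint, many models: comparing providers is a string change.
OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"

def chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat completion body for OpenRouter."""
    return {
        "model": model,  # e.g. "anthropic/claude-3.5-sonnet" (illustrative id)
        "messages": [{"role": "user", "content": prompt}],
    }

# The same prompt routed to two different providers.
candidates = [
    chat_request("anthropic/claude-3.5-sonnet", "ping"),
    chat_request("openai/gpt-4o", "ping"),
]
```

This makes A/B-testing a Kin's behavior across models straightforward: run the same task description against each candidate and compare tool-call rates.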

After setting up a Kin, verify it’s actually using tools:

Send your Kin a message that requires a tool call:

What’s the current weather in Paris? Use web search to find out.

A working Kin will call web_search and return real, current data. A text-mode Kin will make up a plausible weather report.

  1. Create a simple cron job: “Search the web for ‘KinBot’ and summarize what you find”
  2. Trigger it manually
  3. Check the task result — does it contain actual search results or fabricated content?
  4. Look at the task detail for tool call indicators

What to look for:

  • Tool call sections: Each message shows collapsible tool call blocks. No blocks = no tools were called
  • Task status: Autonomous tasks should end with completed and a meaningful result
  • Cron journal: Check get_cron_journal for execution history — failed runs often indicate tool issues

Autonomous Kins consume more tokens than conversational ones because:

  • Cron jobs run on schedule regardless of whether there’s work to do
  • Webhook tasks process each event individually
  • Sub-tasks each require their own LLM call(s)
  • Tool results are included in the context, adding to input token count

| Tip | Impact |
| --- | --- |
| Use Haiku for simple, single-step crons | 5–10x cheaper than Sonnet |
| Add webhook payload filters | Avoid processing irrelevant events |
| Set concurrency limits on webhook tasks | Prevent burst cost spikes |
| Use concise task descriptions | Fewer input tokens per run |
| Store results in memory instead of long outputs | Keeps future context smaller |
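These tips compound, and the savings are easy to estimate. A back-of-envelope sketch (the per-million-token prices are illustrative placeholders, not real rates; check your provider's pricing page):

```python
def monthly_cost_usd(runs_per_day: int, input_tokens: int, output_tokens: int,
                     price_in_per_mtok: float, price_out_per_mtok: float) -> float:
    """Estimate monthly spend for a scheduled (cron) Kin."""
    per_run = (input_tokens * price_in_per_mtok +
               output_tokens * price_out_per_mtok) / 1_000_000
    return per_run * runs_per_day * 30

# An hourly cron using ~6k input / 1k output tokens per run.
# ILLUSTRATIVE prices per million tokens; not real rates.
sonnet = monthly_cost_usd(24, 6_000, 1_000, 3.00, 15.00)
haiku = monthly_cost_usd(24, 6_000, 1_000, 0.80, 4.00)
```

Running the numbers before deploying a cron makes the "use Haiku for simple crons" tip concrete: at these example rates the same schedule costs several times less.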

Quick reference: model selection flowchart

  1. Is the Kin autonomous? (crons, webhooks, sub-tasks)

    • Yes → Claude Sonnet 4 or Claude Sonnet 3.5
    • No → continue
  2. Does the Kin use tools frequently?

    • Yes → Claude Sonnet 3.5 or GPT-4o
    • No → continue
  3. Is cost the primary concern?

    • Yes → Claude Haiku 3.5 or GPT-4o-mini
    • No → Claude Sonnet 3.5 (best all-rounder)
  4. Must it be self-hosted?

    • Yes → Llama 3.3 70B+ via Ollama (conversational) or Gemini 2.5 Flash via API (agentic)
    • No → Use a cloud provider