Running local LLMs is easy right up until you want them to do something useful.

That was the core lesson from our OpenClaw journey. On paper, the setup sounded straightforward: run a local model through Ollama, connect it to OpenClaw, use Telegram as the chat surface, and let the agent manage recurring tasks like weather reports or daily facts. In practice, the real work started when we moved from “the model answers questions” to “the model must call tools correctly, create scheduled jobs, and send results back into the chat.”

This post summarizes that journey: the false starts, the misleading symptoms, the actual root causes, and what finally worked. It is written from the perspective of a local, self-hosted setup using OpenClaw, Ollama, and an NVIDIA RTX 3080.

The original goal

The initial use case was modest:

  • chat with an OpenClaw agent through Telegram
  • ask it to create recurring jobs
  • have those jobs deliver useful output back into the same chat
  • keep everything local where possible

A typical request looked like this:

“Give me a weather report for Munich every workday at 8:45 in this chat.”

At first glance, the agent seemed to understand the request perfectly. It responded with something that looked like a valid tool call for the cron system. That felt promising.

It turned out to be deceptive.

The first trap: a tool call that only looked real

Our first symptom was subtle but important: instead of actually creating a cron job, the agent printed markup that looked like a tool invocation into the Telegram chat.

It looked roughly like this:

<function=cron>
<parameter=action>
add
</parameter>
<parameter=job>
{"name":"Weather - Workday","schedule":{"kind":"cron","expr":"45 8 * * 1-5","tz":"Europe/Berlin"},"payload":{"kind":"systemEvent","text":"Weather report:"},"sessionTarget":"main","enabled":true}
</parameter>
</function>

At first, this can fool you into thinking the agent is “almost working.” It is not. It means the model is producing a textual imitation of a tool call, but the runtime is not executing it as an actual structured tool invocation.
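
One cheap way to catch this failure mode early is to scan assistant text for tool-call markup that should never appear as plain content. This is a hedged sketch of our own, not part of OpenClaw; the regex and helper name are illustrative:

```python
import re

# Hypothetical guard, not an OpenClaw API: a real tool call arrives as
# structured data (a tool_calls array), never as text, so tool-call
# markup inside plain assistant content is always a red flag.
FAKE_TOOL_MARKUP = re.compile(r"<function=|<tool_call>")

def looks_like_fake_tool_call(content: str) -> bool:
    """Return True if assistant text merely imitates a tool invocation."""
    return bool(FAKE_TOOL_MARKUP.search(content))

print(looks_like_fake_tool_call("<function=cron> ... </function>"))  # True
print(looks_like_fake_tool_call("Here are your cron jobs: ..."))     # False
```

A check like this makes the failure visible in logs instead of only in the Telegram chat.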

That distinction became the central debugging theme.

Early hypothesis: maybe the model was wrong

The first instinct was to blame the model.

We switched from one model to another, moving from a lighter general-purpose model to a stronger local model family. The idea was simple: maybe one model understood tools poorly and another would generate the correct JSON or native function-call structure.

That turned out to be only partially relevant.

Changing models did not immediately fix the visible behavior in Telegram. The agent still produced XML-like pseudo tool calls in the chat. That pushed us toward a common but flawed line of thinking:

“How do we get the model to emit JSON instead of XML?”

In hindsight, that was the wrong question.

The real question: can the model already do native tool calling?

To isolate the problem, we bypassed OpenClaw entirely and tested Ollama directly.

We sent a native /api/chat request with a tool definition and asked the model to list cron jobs.

A simplified version of that test looked like this:

curl -s http://127.0.0.1:11434/api/chat \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "qwen3:14b",
    "stream": false,
    "messages": [
      { "role": "user", "content": "List my cron jobs." }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "cron",
          "description": "Manage cron jobs",
          "parameters": {
            "type": "object",
            "properties": {
              "action": {
                "type": "string",
                "enum": ["list"]
              }
            },
            "required": ["action"]
          }
        }
      }
    ]
  }'

This was the turning point.

The model returned a real tool_calls structure.
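
For reference, a successful response carries the call as structured data rather than text. The exact fields can vary by Ollama version, but the shape looks roughly like this:

```json
{
  "model": "qwen3:14b",
  "message": {
    "role": "assistant",
    "content": "",
    "tool_calls": [
      {
        "function": {
          "name": "cron",
          "arguments": { "action": "list" }
        }
      }
    ]
  },
  "done": true
}
```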

That one test proved two important things:

  1. Ollama was fine.
  2. The model was fine.

So the issue was no longer “How do we teach the model to call tools?”
The issue became “Why does OpenClaw’s Telegram path not preserve that correct tool-calling behavior?”

That was a much better question.

The second trap: fixing the wrong layer

Once we knew the model and Ollama could do proper tool calls, several earlier assumptions fell apart.

We no longer needed to force JSON formatting in prompts.
We no longer needed to blame the specific model family.
We no longer needed to assume that XML output meant lack of tool support.

Instead, the problem clearly sat in one of these layers:

  • OpenClaw’s prompt assembly
  • OpenClaw’s Telegram agent handling
  • session contamination from previous bad assistant outputs
  • runtime support mismatches for specific tools

This was a huge narrowing of scope.

What the configuration told us

The next step was to inspect the OpenClaw configuration rather than guess.

The important parts were:

  • Ollama was configured as a native provider
  • the base URL pointed to the native Ollama API, not an OpenAI-style compatibility endpoint
  • cron was enabled
  • cron was included in the tool allowlist
  • Telegram was enabled and working as a channel

That meant the foundational setup was broadly correct.

This was encouraging because it meant we were not dealing with a fundamentally broken installation. We were dealing with a more annoying class of problem: a setup that was mostly correct but failed at a critical juncture.

The false lead around prompt files

At one point, the workspace prompt files looked suspicious. If an agent keeps outputting fake tool markup, it is natural to suspect that some custom prompt file explicitly tells it to do so.

We checked the workspace guidance file and found references to examples such as:

  • <function=...>
  • <tool_call>
  • raw JSON pretending to be a tool invocation

At first, that looked damning.

But reading the file carefully showed the opposite: it explicitly told the agent not to output fake tool markup and to use real structured tool calls whenever possible.

That was actually good news. It ruled out one more class of self-inflicted problems.

The lesson here was simple: never stop at grep output. Read the surrounding lines.

The third trap: bad session history can poison future behavior

Even after ruling out the prompt files, one thing still stood out: the session history already contained multiple assistant messages with fake XML-like tool calls.

That matters more than many people realize.

If a model has already produced a pattern several times in the same conversation, it may continue reproducing that pattern even after you fix the underlying configuration. That is especially true for formatting behavior.

So we tested again in a fresh session.

That turned out to be the right move.

In the fresh session, the agent finally stopped printing raw fake tool markup and started returning normal, useful results.

This was one of the biggest practical lessons of the whole exercise:

When debugging tool use, do not keep reusing a contaminated session.

A bad session can keep a bug alive long after the root cause is gone.

The cron problem was actually two separate problems

Even once tool calling started working properly, we found that our original recurring weather job still had design issues.

The first version used a main session target and a systemEvent payload. That seems plausible at first, but for a recurring user-facing chat message, it was the wrong shape. What we really wanted was:

  • an isolated run
  • an actual agent turn
  • explicit delivery back into Telegram

The correct pattern looked more like this:

node /app/openclaw.mjs cron add \
  --name "Weather - Workday" \
  --description "Workday weather report via Telegram" \
  --cron "45 8 * * 1-5" \
  --tz "Europe/Berlin" \
  --session isolated \
  --message "Create a short weather report for Munich for today in German. Include temperature range, chance of rain, wind, and a brief recommendation on whether a jacket or umbrella makes sense. Answer in 4 to 6 sentences. No small talk." \
  --announce \
  --channel telegram \
  --to "<chat-target>"
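
As a side note, the schedule string "45 8 * * 1-5" reads minute, hour, day of month, month, day of week, so it fires at 08:45 on Monday through Friday. A standalone illustration of what that expression matches (our own sketch, not OpenClaw code, and it ignores the timezone handling that the real scheduler does):

```python
from datetime import datetime

# Illustration only: what the cron expression "45 8 * * 1-5" matches.
# Fields: minute=45, hour=8, any day-of-month, any month, weekday Mon-Fri.
def matches_workday_0845(dt: datetime) -> bool:
    return dt.minute == 45 and dt.hour == 8 and dt.isoweekday() <= 5

print(matches_workday_0845(datetime(2026, 1, 5, 8, 45)))   # Monday -> True
print(matches_workday_0845(datetime(2026, 1, 10, 8, 45)))  # Saturday -> False
```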

This worked far better than the early pseudo job definition.

So the cron journey had two distinct threads:

  1. getting the agent to call tools correctly at all
  2. defining the scheduled job in the right way once tools worked

Those are related, but not identical.

A practical workaround: use the CLI when the agent path is flaky

One of the best decisions we made was to stop insisting that every operation had to be created by the chat agent itself.

Once it became clear that the agent path was sometimes unreliable, the CLI became an excellent control path. It let us test the underlying runtime independently of the model behavior.

For example:

node /app/openclaw.mjs cron list --json

and:

node /app/openclaw.mjs cron add ...

This helped us separate three things cleanly:

  • model capability
  • OpenClaw runtime capability
  • Telegram agent behavior

That separation prevented a lot of wasted time.

The OpenClaw CLI inside Docker was another small rabbit hole

Even getting the CLI invocation right took some trial and error.

A command like this failed:

docker exec openclaw openclaw cron list

because the expected executable was not on the path in that container image.

A relative Node path also failed at first because it resolved against the wrong working directory.

What finally worked was invoking the correct entry point explicitly:

docker exec openclaw node /app/openclaw.mjs cron list

This was one of those small but important lessons that show up constantly in self-hosted systems: a technically correct command can still fail if you assume the wrong container layout.

Manual run testing was less reliable than expected

After successfully creating a cron job, we tried to trigger it manually.

The scheduler accepted the run:

{
  "ok": true,
  "enqueued": true,
  "runId": "manual:..."
}

But the run log remained empty immediately afterward.

That looked like yet another failure, but the interpretation was nuanced.

At that point, there were two possibilities:

  • the run was merely queued and had not yet materialized in the log
  • the manual-run path was weaker or less reliable than the natural scheduled execution path

Either way, it taught us not to over-trust “Run now” style tests for judging whether a recurring job is valid. In agent systems, asynchronous execution paths often behave differently from natural scheduled execution.
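
If you do rely on manual triggers, it helps to poll for the run instead of trusting the enqueue response. A hedged sketch, where fetch_runs stands in for whatever run-listing call your runtime offers; the helper names are ours, not OpenClaw's:

```python
import time

def wait_for_run(fetch_runs, run_id, timeout=30.0, interval=2.0):
    """Poll until run_id shows up in the run log, or give up after timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # fetch_runs() is a placeholder for e.g. a CLI or API run listing.
        if any(r.get("runId") == run_id for r in fetch_runs()):
            return True
        time.sleep(interval)
    return False

# Example with a fake run log that fills in on the first poll:
log = []
def fake_fetch():
    log.append({"runId": "manual:abc"})
    return log

print(wait_for_run(fake_fetch, "manual:abc", timeout=5.0, interval=0.1))  # True
```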

Choosing a larger model on an RTX 3080

Once the tooling behavior stabilized, we revisited model choice.

For coding-focused usage, a coder-tuned model made sense. The obvious upgrade path was from a 14B general model to a 30B coder model.

This raised the practical hardware question:

Is a 30B coder model a good fit for an RTX 3080?

The honest answer: not cleanly, but sometimes acceptably, depending on your tolerance for latency.

A 30B model is simply much larger than what a typical RTX 3080 can hold comfortably in VRAM. That does not necessarily mean it will not run. It means you should expect trade-offs:

  • more offloading
  • higher latency
  • less comfortable interactive performance
  • but potentially stronger coding behavior
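
The arithmetic behind that list is rough but instructive: the weights alone for a 30B model at roughly 4-bit quantization already exceed the 10 GB of VRAM on an RTX 3080, before counting KV cache or runtime overhead. A back-of-envelope sketch, not a precise sizing:

```python
# Back-of-envelope estimate; real usage depends on the quantization format,
# context length (KV cache), and runtime overhead.
params_billion = 30
bytes_per_param = 0.5            # ~4-bit quantization
weights_gb = params_billion * bytes_per_param
rtx_3080_vram_gb = 10

print(weights_gb)                      # 15.0 GB of weights alone
print(weights_gb > rtx_3080_vram_gb)   # True -> some layers spill to CPU/RAM
```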

In our case, we still chose the larger coder model because coding quality mattered more than raw responsiveness.

This was a deliberate trade, not an accident.

That is worth emphasizing because local LLM work is full of fake absolutes. “This GPU can’t run that model” is often shorthand for “It won’t be fast or elegant.” Those are different claims.

The web_search warning: scary-looking, not fatal

Another recurring warning appeared around web_search in the tool allowlist.

This initially looked like a deeper problem, but it turned out to be more of a configuration hygiene issue than a core defect.

The runtime warned that web_search was allowed but not currently available in the active provider/runtime combination. In practice, that meant one of two things:

  • either remove it from the allowlist
  • or configure a proper web-search provider

Since we wanted web search, the right fix was to configure it properly rather than ignore the warning forever.

The key lesson here was that not every warning is a sign of systemic failure. Some are simply reminders that the declared tool surface and the configured providers are out of sync.
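
That sync check is easy to script. A minimal sketch (the function name is ours) that reports tools declared in the allowlist but missing from the runtime's available set:

```python
def allowlist_drift(allowed, available):
    """Tools allowed in config but not actually provided at runtime."""
    return sorted(set(allowed) - set(available))

print(allowlist_drift(
    ["cron", "web_fetch", "web_search"],
    ["cron", "web_fetch"],
))  # ['web_search']
```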

What actually solved the agent tool problem

Looking back, the solution was not one magic change. It was a sequence of clarifications:

What did not solve it directly

  • blindly switching models
  • trying to prompt the model to emit JSON
  • assuming every XML-like output meant the model lacked tool support
  • continuing to test in an already contaminated session

What did move things forward

  • testing Ollama directly with native tool definitions
  • confirming that the model already supported real tool_calls
  • validating the OpenClaw config instead of guessing
  • using the CLI to prove that cron itself worked
  • moving to a fresh session after bad tool-output history
  • separating runtime problems from chat-surface problems

In other words, the breakthrough came from isolating layers.

That is probably the most important engineering lesson of the entire experience.

Generic debugging steps we would recommend now

If we had to do it again from scratch, we would follow this order from day one.

1. Verify native model tool support directly

Do not start with Telegram. Do not start with the full agent loop.
Start with a direct Ollama API request and a tiny tool schema.

2. Verify the runtime independently

Use the OpenClaw CLI for job creation, listing, and inspection before asking the agent to do the same.

3. Inspect config, not just symptoms

Check:

  • provider mode
  • base URL
  • tool allowlist
  • channel config
  • cron enablement

4. Read prompt files in context

Never rely on grep output alone. Read the surrounding lines and verify whether a prompt file is prescribing or forbidding the behavior you see.

5. Use fresh sessions for behavioral retests

A contaminated chat history can preserve formatting bugs even after the underlying issue is fixed.

6. Separate “tool call worked” from “job design is correct”

Even a valid tool call can create the wrong type of scheduled job if the payload, target, or delivery mode is poorly chosen.

A sanitized example config pattern

A minimal conceptual pattern for this kind of setup looks like this:

{
  "models": {
    "providers": {
      "ollama": {
        "baseUrl": "http://127.0.0.1:11434",
        "api": "ollama"
      }
    }
  },
  "agents": {
    "defaults": {
      "model": "ollama/qwen3-coder:30b",
      "userTimezone": "Europe/Berlin"
    }
  },
  "channels": {
    "telegram": {
      "enabled": true
    }
  },
  "cron": {
    "enabled": true
  },
  "tools": {
    "allow": [
      "cron",
      "group:web",
      "web_fetch",
      "web_search"
    ]
  }
}

The exact values will vary, but the important thing is the shape:

  • native Ollama provider
  • explicit default model
  • Telegram channel enabled
  • cron enabled
  • tools explicitly allowed

The most satisfying moment

The real success moment was not a benchmark and not a log line.

It was the first time the Telegram agent answered normally with a clean human-readable list of cron jobs instead of dumping fake XML-like tool markup into the chat.

That was the moment the system stopped feeling like a demo and started feeling like an agent.

Summary

Running OpenClaw locally with Ollama and an RTX 3080 taught us that local agent engineering is less about raw model intelligence and more about disciplined system isolation.

The biggest mistake would have been to keep blaming the model. The model was capable of native tool calling all along. The real work was figuring out where that capability was being lost, distorted, or imitated by the surrounding stack.

The key takeaways were:

  • test native tool calling directly against Ollama first
  • use the OpenClaw CLI to validate runtime behavior independently
  • do not mistake fake markup for successful tool execution
  • do not keep debugging in a contaminated chat session
  • choose cron job shapes carefully for real delivery behavior
  • accept that a 30B coder model on an RTX 3080 is a trade-off between quality and latency, not a simple yes-or-no decision

Most importantly, local agent systems become much easier to reason about once you stop treating them as one black box.

They are not one system. They are a chain of systems.

And debugging starts to work the moment you treat them that way.


What It Really Took to Get a Local OpenClaw Agent Working Reliably with an RTX 3080

Johannes Rest


.NET Architect and Developer

