
Ollama Now Supports Claude Code

Use Claude Code with local models thanks to Ollama's new Anthropic-compatible API

By Angelo Lima

Big news for Claude Code users: Ollama now supports the Anthropic Messages API, which allows you to use Claude Code with local open-source models. No more exclusive dependency on Anthropic’s cloud!
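Under the hood, Claude Code simply talks to an Anthropic-style Messages endpoint, and Ollama now answers those requests on its local port. As a rough sketch (the /v1/messages route and the header handling are assumptions on my part; check the Ollama release notes for the authoritative details), a direct call looks like this:

# Hypothetical request against the local Anthropic-compatible endpoint
curl http://localhost:11434/v1/messages \
  -H "content-type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -H "x-api-key: ollama" \
  -d '{
    "model": "qwen3-coder",
    "max_tokens": 256,
    "messages": [
      {"role": "user", "content": "Write a function that reverses a string in Go"}
    ]
  }'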

Why This Integration Is a Game Changer

Until now, Claude Code required a connection to Anthropic’s servers. With this Ollama integration, you can now:

Benefit          Description
Privacy          Your code stays on your machine
Costs            No API fees, just your electricity
Independence     No single vendor lock-in
Offline          Work without an internet connection
Customization    Choose the model that fits your needs

Requirements

1. Ollama v0.14.0+

The integration requires Ollama version 0.14.0 or higher. Check your version:

ollama --version

If needed, update Ollama from ollama.com.
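If you script your setup, a small guard can verify the minimum version before going further (a minimal sketch; the exact output of ollama --version may differ slightly between platforms):

# Warn if the installed Ollama is older than 0.14.0
version="$(ollama --version | grep -oE '[0-9]+\.[0-9]+' | head -n1)"
major="${version%%.*}"
minor="${version#*.}"
if [ "$major" -eq 0 ] && [ "$minor" -lt 14 ]; then
  echo "Ollama $version is older than 0.14.0, update from ollama.com" >&2
fi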

2. Model with Large Context

Claude Code requires a large context window to work properly. The official recommendation is 64k tokens minimum.

Configure context in Ollama:

# Create a Modelfile with extended context
cat > Modelfile << 'EOF'
FROM qwen3-coder
PARAMETER num_ctx 65536
EOF

ollama create qwen3-coder-64k -f Modelfile
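To confirm that the new variant really carries the larger window, inspect it with ollama show, which lists the parameters set in the Modelfile:

# Check that num_ctx 65536 appears under the model's parameters
ollama show qwen3-coder-64k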

3. Claude Code Installed

If not already done:

# macOS/Linux
curl -fsSL https://claude.ai/install.sh | bash

# Windows
irm https://claude.ai/install.ps1 | iex
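Once installed, a quick check confirms the CLI is on your PATH:

# Verify the Claude Code CLI is installed
claude --version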

Configuration

Method 1: Automatic Configuration

Ollama provides a simplified command:

ollama launch claude

For interactive configuration mode:

ollama launch claude --config

This method automatically configures the necessary environment variables.

Method 2: Manual Configuration

Set the three required environment variables:

export ANTHROPIC_AUTH_TOKEN=ollama
export ANTHROPIC_API_KEY=""
export ANTHROPIC_BASE_URL=http://localhost:11434

Then launch Claude Code with your chosen model:

claude --model qwen3-coder-64k

Method 3: Single Line

For a one-time launch without modifying your environment:

ANTHROPIC_AUTH_TOKEN=ollama \
ANTHROPIC_BASE_URL=http://localhost:11434 \
ANTHROPIC_API_KEY="" \
claude --model qwen3-coder

Persistent Configuration

Add these lines to your ~/.bashrc or ~/.zshrc:

# Claude Code with Ollama
export ANTHROPIC_AUTH_TOKEN=ollama
export ANTHROPIC_API_KEY=""
export ANTHROPIC_BASE_URL=http://localhost:11434
alias claude-local='claude --model qwen3-coder-64k'

Then reload:

source ~/.bashrc  # or source ~/.zshrc

Recommended Models

For Development

Model          Size    Strengths
qwen3-coder    ~14B    Code-specialized, excellent quality/size ratio
glm-4.7        ~9B     Good balance, multilingual
codestral      ~22B    Performs well on complex code

For Powerful Machines

Model                 Size    Strengths
gpt-oss:20b           20B     Capable generalist
gpt-oss:120b          120B    Close to proprietary models
deepseek-coder:33b    33B     Excellent on code

Download a Model

# Download the model
ollama pull qwen3-coder

# Check available models
ollama list

Example Session

# 1. Start Ollama (if not running)
ollama serve &

# 2. Launch Claude Code
ANTHROPIC_AUTH_TOKEN=ollama \
ANTHROPIC_BASE_URL=http://localhost:11434 \
ANTHROPIC_API_KEY="" \
claude --model qwen3-coder

# 3. Use normally
> Analyze the file @src/api/users.ts and suggest improvements

Limitations to Keep in Mind

Performance

Local models are generally weaker than Claude Sonnet or Opus on complex tasks. Expect:

  • Sometimes less accurate responses
  • Longer thinking time on modest hardware
  • Less advanced reasoning capability

Resource Consumption

Model Size    Minimum RAM    Recommended GPU
7-14B         16 GB          8 GB VRAM
20-33B        32 GB          16 GB VRAM
70B+          64 GB+         24 GB+ VRAM
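To see how a model actually fits on your hardware, load it and check where Ollama placed it; ollama ps reports each loaded model's memory footprint and whether it runs on the GPU or falls back to the CPU:

# Show loaded models, their size, and GPU/CPU placement
ollama ps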

Features

Some advanced features may not work perfectly:

  • Vision (image analysis)
  • Complex tool use
  • Subagents

Ideal Use Cases

When to Use Ollama

  • Sensitive proprietary code: Code never leaves your machine
  • Offline development: Work on planes or in areas without internet
  • Rapid prototyping: No API cost concerns
  • Learning: Experiment without limits

When to Stay on Anthropic

  • Complex tasks: Major refactoring, architecture
  • In-depth code reviews: Security analysis
  • Production: When quality is critical

Switching Between Local and Cloud

Create aliases to easily switch:

# In ~/.bashrc or ~/.zshrc

# Ollama mode (local)
alias claude-local='ANTHROPIC_AUTH_TOKEN=ollama \
  ANTHROPIC_BASE_URL=http://localhost:11434 \
  ANTHROPIC_API_KEY="" \
  claude --model qwen3-coder-64k'

# Anthropic mode (cloud) - requires ANTHROPIC_API_KEY configured
alias claude-cloud='claude'

Usage:

claude-local   # For sensitive or offline work
claude-cloud   # For complex tasks
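If you prefer a single entry point over two aliases, a small shell function can switch on an argument (claude-run is just an illustrative name; adjust the model to the one you created earlier):

# Hypothetical wrapper: "claude-run local ..." targets Ollama, anything else goes to the cloud
claude-run() {
  if [ "$1" = "local" ]; then
    shift
    ANTHROPIC_AUTH_TOKEN=ollama \
    ANTHROPIC_BASE_URL=http://localhost:11434 \
    ANTHROPIC_API_KEY="" \
    claude --model qwen3-coder-64k "$@"
  else
    claude "$@"
  fi
}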

Troubleshooting

“Connection Refused” Error

Ollama is not running. Start it:

ollama serve
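If the error persists after starting the server, check that something actually answers on the default port; the /api/version route is a lightweight way to ping a local Ollama:

# A JSON response here means the server is reachable on port 11434
curl -s http://localhost:11434/api/version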

“Context Too Long” Error

The model doesn’t have enough context. Create an extended version:

cat > Modelfile << 'EOF'
FROM your-model
PARAMETER num_ctx 65536
EOF

ollama create your-model-64k -f Modelfile

Slow Responses

  • Check that GPU is being used: nvidia-smi or ollama ps
  • Use a smaller model
  • Close VRAM-hungry applications

Insufficient Quality

Try a larger model or switch back to Claude Cloud for that specific task.

Conclusion

The Ollama integration opens new possibilities for Claude Code:

  • Privacy for sensitive code
  • Savings on API costs
  • Flexibility in model choice
  • Offline work possible

For most daily tasks, a good local model like qwen3-coder does the job very well. Keep access to Anthropic’s cloud for cases where you need maximum power.


To go further with Claude Code, check out my other articles on AI and development.
