Big news for Claude Code users: Ollama now supports the Anthropic Messages API, which allows you to use Claude Code with local open-source models. No more exclusive dependency on Anthropic’s cloud!
Why This Integration Is a Game Changer
Until now, Claude Code required a connection to Anthropic’s servers. With this Ollama integration, you can now:
| Benefit | Description |
|---|---|
| Privacy | Your code stays on your machine |
| Costs | No API fees, just your electricity |
| Independence | No single vendor lock-in |
| Offline | Work without internet connection |
| Customization | Choose the model that fits your needs |
Requirements
1. Ollama v0.14.0+
The integration requires Ollama version 0.14.0 or higher. Check your version:
ollama --version
If needed, update Ollama from ollama.com.
2. Model with Large Context
Claude Code requires a large context window to work properly. The official recommendation is 64k tokens minimum.
Configure context in Ollama:
# Create a Modelfile with extended context
cat > Modelfile << 'EOF'
FROM qwen3-coder
PARAMETER num_ctx 65536
EOF
ollama create qwen3-coder-64k -f Modelfile
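To confirm the extended context was applied, you can inspect the new model; the output of ollama show should list num_ctx 65536 among the parameters:
# Displays the model's details, including the parameters baked into the Modelfile
ollama show qwen3-coder-64k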
3. Claude Code Installed
If it isn't installed yet:
# macOS/Linux
curl -fsSL https://claude.ai/install.sh | bash
# Windows
irm https://claude.ai/install.ps1 | iex
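A quick way to confirm the install worked:
# Prints the installed Claude Code version
claude --version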
Configuration
Method 1: Quick Launch (Recommended)
Ollama provides a simplified command:
ollama launch claude
For interactive configuration mode:
ollama launch claude --config
This method automatically configures the necessary environment variables.
Method 2: Manual Configuration
Set the three required environment variables:
export ANTHROPIC_AUTH_TOKEN=ollama
export ANTHROPIC_API_KEY=""
export ANTHROPIC_BASE_URL=http://localhost:11434
Then launch Claude Code with your chosen model:
claude --model qwen3-coder-64k
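Before launching, you can sanity-check that the local endpoint answers a Messages-style request. This is a minimal sketch that assumes Ollama serves the Anthropic-compatible API at /v1/messages under the base URL above (the path Claude Code itself calls); adjust the model name to one you have pulled:
# Minimal Anthropic Messages request against the local server
# (assumes the /v1/messages path; the x-api-key value is ignored locally)
curl http://localhost:11434/v1/messages \
  -H "content-type: application/json" \
  -H "x-api-key: ollama" \
  -d '{"model": "qwen3-coder-64k", "max_tokens": 64, "messages": [{"role": "user", "content": "Hello"}]}'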
Method 3: Single Line
For a one-time launch without modifying your environment:
ANTHROPIC_AUTH_TOKEN=ollama \
ANTHROPIC_BASE_URL=http://localhost:11434 \
ANTHROPIC_API_KEY="" \
claude --model qwen3-coder
Persistent Configuration
Add these lines to your ~/.bashrc or ~/.zshrc:
# Claude Code with Ollama
export ANTHROPIC_AUTH_TOKEN=ollama
export ANTHROPIC_API_KEY=""
export ANTHROPIC_BASE_URL=http://localhost:11434
alias claude-local='claude --model qwen3-coder-64k'
Then reload:
source ~/.bashrc # or source ~/.zshrc
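A quick check that the new shell session picked the variables up:
# Should print http://localhost:11434
echo $ANTHROPIC_BASE_URL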
Recommended Models
For Development
| Model | Size | Strengths |
|---|---|---|
| qwen3-coder | ~14B | Code-specialized, excellent quality/size ratio |
| glm-4.7 | ~9B | Good balance, multilingual |
| codestral | ~22B | Performs well on complex code |
For Powerful Machines
| Model | Size | Strengths |
|---|---|---|
| gpt-oss:20b | 20B | Capable generalist |
| gpt-oss:120b | 120B | Close to proprietary models |
| deepseek-coder:33b | 33B | Excellent on code |
Download a Model
# Download the model
ollama pull qwen3-coder
# Check available models
ollama list
Example Session
# 1. Start Ollama (if not running)
ollama serve &
# 2. Launch Claude Code
ANTHROPIC_AUTH_TOKEN=ollama \
ANTHROPIC_BASE_URL=http://localhost:11434 \
ANTHROPIC_API_KEY="" \
claude --model qwen3-coder
# 3. Use normally
> Analyze the file @src/api/users.ts and suggest improvements
Limitations to Know
Performance
Local models generally lag behind Claude Sonnet or Opus on complex tasks. Expect:
- Sometimes less accurate responses
- Longer thinking time on modest hardware
- Less advanced reasoning capability
Resource Consumption
| Model Size | Minimum RAM | Recommended GPU |
|---|---|---|
| 7-14B | 16 GB | 8 GB VRAM |
| 20-33B | 32 GB | 16 GB VRAM |
| 70B+ | 64 GB+ | 24 GB+ VRAM |
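As a rough rule of thumb with Ollama's default 4-bit quantization, the weights alone take about half a gigabyte per billion parameters (roughly 7-8 GB for a 14B model), and a 64k context adds several more gigabytes of cache. To see what a loaded model actually consumes and whether it sits on the GPU:
# Lists loaded models with their memory footprint and CPU/GPU split
ollama ps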
Features
Some advanced features may not work perfectly:
- Vision (image analysis)
- Complex tool use
- Subagents
Ideal Use Cases
When to Use Ollama
- Sensitive proprietary code: Code never leaves your machine
- Offline development: Work on planes or in areas without internet
- Rapid prototyping: No API cost concerns
- Learning: Experiment without limits
When to Stay on Anthropic
- Complex tasks: Major refactoring, architecture
- In-depth code reviews: Security analysis
- Production: When quality is critical
Switching Between Local and Cloud
Create aliases to easily switch:
# In ~/.bashrc or ~/.zshrc
# Ollama mode (local)
alias claude-local='ANTHROPIC_AUTH_TOKEN=ollama \
ANTHROPIC_BASE_URL=http://localhost:11434 \
ANTHROPIC_API_KEY="" \
claude --model qwen3-coder-64k'
# Anthropic mode (cloud) - requires ANTHROPIC_API_KEY configured
alias claude-cloud='claude'
Usage:
claude-local # For sensitive or offline work
claude-cloud # For complex tasks
Troubleshooting
“Connection Refused” Error
Ollama isn't running. Start it:
ollama serve
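To confirm the server is now reachable before relaunching Claude Code:
# Returns the running Ollama version if the server is up
curl http://localhost:11434/api/version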
“Context Too Long” Error
The model doesn’t have enough context. Create an extended version:
cat > Modelfile << 'EOF'
FROM your-model
PARAMETER num_ctx 65536
EOF
ollama create your-model-64k -f Modelfile
Slow Responses
- Check that GPU is being used: nvidia-smi or ollama ps
- Use a smaller model
- Close VRAM-hungry applications
Insufficient Quality
Try a larger model, or switch back to Anthropic's cloud for that specific task.
Conclusion
The Ollama integration opens new possibilities for Claude Code:
- Privacy for sensitive code
- Savings on API costs
- Flexibility in model choice
- Offline work possible
For most daily tasks, a good local model like qwen3-coder does the job very well. Keep access to Anthropic’s cloud for cases where you need maximum power.
To go further with Claude Code, check out my other articles on AI and development.