OpenAI gpt-oss on Ollama: When Proprietary Models Go Open Source

OpenAI has just announced its first open-weight models, available on Ollama: a collaboration that marks a turning point in the local LLM ecosystem.

By Angelo Lima

OpenAI has just announced its first open-weight models, available on Ollama. The collaboration marks a turning point in the local LLM ecosystem, but it also fits into a dynamic already initiated by Meta and Mistral.

A Move Toward Openness in an Established Ecosystem

On August 5, 2025, OpenAI finally joined an already well-established openness movement by releasing gpt-oss, its first open-weight models since GPT-2 in 2019. The collaboration with Ollama certainly marks a change of approach for a company historically oriented toward paid APIs, but it fits into a deeper trend.

Meta already dominates the open-source ecosystem with Llama 3.1 (405 billion parameters), which surpasses GPT-4o on several benchmarks¹, while Mistral Large 2 matches or exceeds proprietary systems with only 123 billion parameters².

Two models are offered: a 20B optimized for local performance and a 120B intended for production use cases. This diversification responds to different needs according to hardware constraints and performance requirements.

In this competitive context, OpenAI’s strategy raises an important technical question: how can it catch up with Meta and Mistral while making its models accessible on standard hardware? The answer lies in memory optimization, specifically MXFP4 quantization, which allows gpt-oss-20b to run with only 16 GB of memory³.

Technical Architecture: MXFP4 Quantization

Understanding Mixture-of-Experts (MoE)

Before addressing quantization, we need to understand the Mixture-of-Experts architecture. Imagine an AI model as a team of specialists: instead of having one “brain” that processes everything, the MoE has several specialized experts (sub-networks) that share the work.

When you ask a programming question, certain experts activate. For a literature question, other experts take over. This approach allows for a very powerful model without all parameters being active simultaneously.

The catch? These experts account for roughly 90% of the model’s parameters, which makes them the bulk of its memory footprint.
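
To make the routing idea concrete, here is a minimal sketch of top-k expert selection, the mechanism at the heart of an MoE layer. It is an illustrative toy: the expert count, top-k value, and layer sizes are arbitrary placeholders, not gpt-oss’s actual configuration.

import numpy as np

rng = np.random.default_rng(0)

# Toy MoE layer: 8 experts, each a small feed-forward "specialist".
NUM_EXPERTS, DIM, TOP_K = 8, 16, 2
experts = [rng.normal(size=(DIM, DIM)) for _ in range(NUM_EXPERTS)]
router = rng.normal(size=(DIM, NUM_EXPERTS))  # learned weights in a real model

def moe_forward(x):
    # The router scores every expert, but only the top-k actually run.
    scores = x @ router
    top = np.argsort(scores)[-TOP_K:]
    weights = np.exp(scores[top]) / np.exp(scores[top]).sum()  # softmax over the chosen experts
    # Only TOP_K of NUM_EXPERTS experts compute; the others stay idle.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.normal(size=DIM)
print(moe_forward(token).shape)  # (16,): same output shape, a fraction of the compute

The last comment is the point: idle experts cost no compute, but they still have to sit in memory, which is exactly where quantization comes in.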

Quantization Explained

Quantization consists of reducing the precision of the numbers stored in the model. By default, model parameters often use 16 bits (or more) per weight. It’s like going from a millimeter-graduated ruler to a centimeter-graduated one: you lose precision but gain enormously in space.

OpenAI’s MXFP4 format pushes this logic further, compressing expert weights down to 4.25 bits per parameter. Concretely, this divides the memory needed to store these parameters by a factor of 3 to 4.
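
The odd figure of 4.25 bits comes from block scaling: in the MX formats, each weight takes 4 bits and every block of 32 weights shares one extra 8-bit scale, i.e. 4 + 8/32 = 4.25 bits per parameter. Here is a toy round-trip illustrating the principle (using integer levels for simplicity, where MXFP4 actually uses 4-bit floats):

import numpy as np

def quantize_block(w, levels=16):
    # One shared scale plus a 4-bit integer index per weight.
    scale = np.abs(w).max() / (levels // 2 - 1)
    q = np.clip(np.round(w / scale), -levels // 2, levels // 2 - 1).astype(np.int8)
    return q, scale

def dequantize_block(q, scale):
    return q * scale

rng = np.random.default_rng(0)
block = rng.normal(scale=0.02, size=32)   # MXFP4 also works on 32-weight blocks
q, scale = quantize_block(block)
print(f"max error: {np.abs(block - dequantize_block(q, scale)).max():.5f}")
# Storage: 32 weights x 4 bits + one 8-bit scale = 136 bits, i.e. 4.25 bits/weight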

Practical Impact

This optimization transforms hardware constraints. Where a non-quantized 20B model would require 40-50 GB of memory, the MXFP4 version fits in 16 GB. It’s the difference between a dedicated server and a standard gaming PC.
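
A back-of-envelope check of those figures, assuming (as described above) that roughly 90% of the parameters are expert weights stored in MXFP4 while the remainder stays in 16-bit precision:

PARAMS = 20e9                  # ~20B parameters
EXPERT_SHARE = 0.90            # expert weights, quantized to MXFP4
MXFP4_BITS, BF16_BITS = 4.25, 16

expert_gb = PARAMS * EXPERT_SHARE * MXFP4_BITS / 8 / 1e9     # ~9.6 GB
rest_gb = PARAMS * (1 - EXPERT_SHARE) * BF16_BITS / 8 / 1e9  # ~4.0 GB
full_gb = PARAMS * BF16_BITS / 8 / 1e9                       # ~40 GB unquantized

print(f"quantized: ~{expert_gb + rest_gb:.1f} GB vs ~{full_gb:.0f} GB in 16-bit")

Activations and the KV cache come on top of the weights, which is why 16 GB is a floor rather than a comfortable margin.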

Ollama supports this format natively, avoiding the additional conversions that often degrade performance. This direct integration keeps the models faithful to OpenAI’s reference implementations.

Two Models, Two Uses

The gpt-oss-20b targets constrained environments and applications requiring low latency. With its 16 GB RAM requirement, it becomes accessible on consumer hardware.

The gpt-oss-120b aims for production deployments where reasoning quality takes precedence over hardware constraints. It requires an 80 GB GPU but offers advanced reasoning capabilities.

But beyond this technical prowess, it’s in the advanced features that gpt-oss reveals its true assets for modern AI application development.

Built-in Advanced Features

Transparent Chain-of-Thought

Unlike classic APIs, gpt-oss exposes its complete reasoning process. This transparency facilitates debugging and makes it possible to see how the model reaches its conclusions: a significant advantage for critical applications where traceability matters.

The reasoning effort is configurable (low, medium, high), allowing adaptation of the speed/quality compromise according to usage context.
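
A minimal sketch of how to exercise this through Ollama’s local REST API. The /api/chat endpoint and payload follow Ollama’s standard convention; setting the effort via a "Reasoning: high" system instruction follows the convention documented for gpt-oss, so verify the exact phrasing against the model card:

import json
import urllib.request

# Ollama listens on localhost:11434 by default.
payload = {
    "model": "gpt-oss:20b",
    "messages": [
        # gpt-oss reads its reasoning-effort setting from the system prompt.
        {"role": "system", "content": "Reasoning: high"},
        {"role": "user", "content": "Is 1007 prime? Explain briefly."},
    ],
    "stream": False,
}

req = urllib.request.Request(
    "http://localhost:11434/api/chat",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    message = json.loads(resp.read())["message"]

# Depending on the Ollama version, the chain-of-thought may appear in a
# separate "thinking" field alongside the final "content".
print(message.get("thinking", "(no separate thinking field)"))
print(message["content"])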

Native Agentic Capabilities

The models directly integrate:

  • Function calling for interaction with external APIs
  • Optional web search integrated into Ollama
  • Python execution for calculations and analyses
  • Native structured outputs (JSON and XML)

This approach avoids the multiplication of external tools and simplifies the architecture of agent-based applications.
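
As an illustration of function calling, here is a hedged sketch against Ollama’s tool-calling interface: the schema is the OpenAI-style format Ollama accepts, and get_weather is a made-up example function, not something shipped with the models.

import json
import urllib.request

# A made-up tool the model can choose to call.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

payload = {
    "model": "gpt-oss:20b",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": tools,
    "stream": False,
}

req = urllib.request.Request(
    "http://localhost:11434/api/chat",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    message = json.loads(resp.read())["message"]

# If the model decides a tool is needed, it returns structured call(s)
# instead of prose; your code executes them and feeds the results back.
for call in message.get("tool_calls", []):
    fn = call["function"]
    print(fn["name"], fn["arguments"])  # e.g. get_weather {'city': 'Paris'}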

These technical innovations would be nothing without OpenAI’s strategic choices aimed at facilitating massive enterprise adoption.

Implications for Local Development

Apache 2.0 License: A Permissive Approach

The choice of the Apache 2.0 license eliminates copyleft restrictions. This contractual freedom facilitates enterprise adoption and commercial deployments without particular legal constraints.

This decision contrasts with some more restrictive approaches in the sector and could influence other actors’ licensing strategies.

NVIDIA Partnership

The specific optimization for GeForce RTX and RTX PRO GPUs aims to democratize access to high performance. This technical collaboration ensures optimal exploitation of hardware capabilities available on the consumer market.

After this technical and strategic analysis, it’s time to move on to practical work: how to deploy and effectively use these models in real projects?

Practical Implementation

Installation and First Tests

The integration follows Ollama’s standard process:

# Download and install the latest version of Ollama,
# then launch one of the models:

ollama run gpt-oss:20b    # 20B model (runs in ~16 GB of memory)
ollama run gpt-oss:120b   # 120B model (requires an ~80 GB GPU)

No additional configuration, no API keys to manage. The user experience remains consistent with the existing Ollama ecosystem.
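
Once a model is pulled, the local server can also be sanity-checked programmatically; a small sketch against Ollama’s standard /api/tags listing endpoint, assuming the default port:

import json
import urllib.request

# /api/tags lists the models installed on the local Ollama server.
with urllib.request.urlopen("http://localhost:11434/api/tags") as resp:
    models = json.loads(resp.read())["models"]

for m in models:
    print(m["name"], m.get("size", "?"), "bytes")
# Expect gpt-oss:20b and/or gpt-oss:120b to appear after `ollama run`.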

Technical Considerations

The 20B model represents a good compromise for developers wanting to test OpenAI capabilities locally without significant hardware investment. The 120B is aimed more at teams with dedicated GPU infrastructure.

The transparency of the chain-of-thought brings significant added value compared to classic APIs, particularly for debugging and validation of complex reasoning.

This practical implementation reveals OpenAI’s broader ambition: to establish itself in a rapidly changing ecosystem where the boundaries between local and cloud are being redefined.

Ecosystem Perspective: Towards Intelligent Hybridization

This OpenAI-Ollama collaboration illustrates a broader sector-wide evolution toward cloud/local hybridization. The accuracy gap between open-source and proprietary models is now negligible, and the former are cheaper, faster, and more customizable⁴.

This trend finds its most accomplished incarnation in Ollama’s “Minions” approach⁷, which rethinks hybrid model usage. The concept is ingenious: local models (the “minions”) collaborate intelligently with more powerful cloud models, cutting costs by a factor of 30.4 while preserving 87% of cloud-level performance. This distributed architecture turns consumer devices into collaborative assistants.
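
A purely conceptual sketch of that division of labor (the real Minions protocol⁷ is considerably more elaborate, and ask_cloud_model below is a hypothetical placeholder for any paid frontier-model API):

import json
import urllib.request

def ask_local(prompt):
    # Cheap pass: the local "minion" digests the bulky context via Ollama.
    payload = {"model": "gpt-oss:20b", "prompt": prompt, "stream": False}
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

def ask_cloud_model(prompt):
    # Hypothetical placeholder: swap in any paid frontier-model API here.
    return f"[cloud answer based on: {prompt[:60]}...]"

def hybrid_answer(question, long_document):
    # The local model reads the long document at zero API cost...
    notes = ask_local(f"Extract what matters for: {question}\n\n{long_document}")
    # ...and only the distilled notes travel to the expensive cloud model.
    return ask_cloud_model(f"Answer '{question}' using these notes:\n{notes}")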

In an ecosystem where Llama 3.3 70B offers comparable performance to the 405B model at a fraction of the computational cost⁵, OpenAI’s gpt-oss models arrive at the ideal moment to enrich these hybrid architectures.

This dynamic of opening proprietary models, initiated by Meta and accelerated by Mistral, definitively transforms the AI landscape towards intelligent collaboration between local and cloud.

Conclusion

gpt-oss marks OpenAI’s catch-up in an already mature open source ecosystem. For developers using Ollama with Open WebUI, it’s an opportunity to compare OpenAI architectures with established references like Llama and Mistral.

This evolution fits squarely into the state of AI development in 2025, where local accessibility is becoming a decisive criterion for adoption.

With a permissive Apache 2.0 license and native integration into platforms like Ollama⁶, this democratization accelerates a dynamic already set in motion months ago by Meta and Mistral.

Sources

  1. Silicon Valley shaken as open-source AI models Llama 3.1 and Mistral Large 2 match industry leaders - VentureBeat
  2. Mistral’s Large 2 is its answer to Meta and OpenAI’s latest models - TechCrunch
  3. Introducing gpt-oss - OpenAI
  4. Best Open Source LLMs of 2025 - Klu
  5. Top 10 open source LLMs for 2025 - Instaclustr
  6. OpenAI Returns to Open Source Roots, Releases 120B and 20B AI Models - Beebom
  7. Minions: AI Collaboration Between Local and Cloud LLMs - Ollama
Tags: AI Development