Google DeepMind Releases Gemma 4 Open Models Under Apache 2.0 License With Four Size Variants

Google Releases Gemma 4 Open Models Under Apache 2.0

Developers working with Google's open AI models now have access to commercial deployment rights they did not previously hold. Google released Gemma 4 on April 2, 2026, abandoning its restrictive custom license in favour of the Apache 2.0 standard, the same permissive terms used by Qwen, Mistral, and much of the broader open-source community.

Four Model Sizes Targeting Distinct Hardware Tiers

The release includes four distinct variants: E2B (2.3B effective parameters), E4B (4.5B effective), a 26B Mixture-of-Experts (with 4B active parameters), and a 31B dense model. All four are built from the same research and technology as Gemini 3.

The E2B and E4B are compact models engineered specifically for smartphones, Raspberry Pi boards, and IoT devices. Despite their size, both carry a 128,000-token context window and can natively process images, video, and audio, all while running fully offline. The larger 26B and 31B models feature a 256,000-token context window, making them suited for local code assistants, a use case Google highlighted in its launch announcement.

The MoE architecture in the 26B model activates only 3.8 billion of its total parameters during inference, delivering reasoning quality competitive with much larger dense models while generating tokens at speeds closer to a 4B model. The unquantized 31B model fits on a single 80GB NVIDIA H100 GPU.

Benchmark Performance Against Larger Models

The flagship 31B instruction-tuned model ranks third on Arena AI's text leaderboard at 1452 Elo, while the 26B MoE variant ranks sixth at 1441 Elo. Google confirmed in its April 2 product blog that the 31B model outperforms models twenty times its size on that leaderboard.

Compared to Gemma 3, the performance improvements are substantial: the AIME 2026 math benchmark jumps from 20.8% to 89.2%, LiveCodeBench coding from 29.1% to 80.0%, and GPQA science from 42.4% to 84.3%. The multilingual MMMU benchmark shows 85.2% for the 31B model versus 67.6% for Gemma 3 27B, while multimodal MMMU Pro reaches 76.9% compared to 49.7%.

Multimodal Capabilities Across All Variants

All four Gemma 4 models include native multimodal support for video and images at various resolutions, with OCR and chart understanding listed as key use cases. The E2B and E4B models also include native audio input for speech recognition.

All models support function calling, structured JSON output, and native system instructions for building agent workflows. Google states the models were trained on over 140 languages natively.

In collaboration with the Google Pixel team, Qualcomm Technologies, and MediaTek, the edge models run completely offline with near-zero latency across devices, including phones, Raspberry Pi, and NVIDIA Jetson Orin Nano. Android developers can prototype agentic flows in the AICore Developer Preview today for forward-compatibility with Gemini Nano 4.

A Licensing Shift With Commercial Implications

Gemma 4's most significant change may be the switch to Apache 2.0. Previously, Google's Gemma license had prohibited use in certain scenarios and reserved the right to terminate a user's access. The move to Apache 2.0 means enterprises can deploy the models without fear of Google changing those terms.

This marks a pivotal moment: Gemma 4 models are the first in the Gemmaverse to be released under the OSI-approved Apache 2.0 license. By applying the industry-standard Apache 2.0 license terms, Google states it is providing clarity about developers' rights and responsibilities so they can build without navigating prescriptive terms of service.

Hugging Face co-founder and CEO Clément Delangue responded to the release in a statement provided by Google. "The release of Gemma 4 under an Apache 2.0 license is a huge milestone," he said.

Since the launch of the first Gemma generation, developers have downloaded the model family over 400 million times, producing more than 100,000 community variants.

Availability and Deployment Options

Gemma 4 is available now on Hugging Face, Kaggle, and Ollama. The 31B and 26B MoE models are accessible through Google AI Studio, while the edge models can be explored in Google AI Edge Gallery.

Day-one support is confirmed for Hugging Face (Transformers, TRL, Transformers.js), LiteRT-LM, vLLM, and llama.cpp, MLX, Ollama, NVIDIA NIM, NeMo, LM Studio, Unsloth, SGLang, Keras, and others.

For marketers and digital teams running AI tools in-house, for content generation, SEO analysis, or data workflows, the shift to Apache 2.0 removes the legal friction that previously made Gemma difficult to deploy commercially. The E2B and E4B edge models, capable of running fully offline on consumer hardware, also lower the cost of entry for teams that need on-device inference without cloud API dependency. These are practical implications based on the model's confirmed specifications and licensing terms, not guaranteed outcomes.

Google confirmed in the Android Developers Blog that code written today for Gemma 4 will automatically work on Gemini Nano 4-enabled devices set to be available later this year.

It's a competitive market. Contact us to learn how you can stand out from the crowd.

The comments are closed.

Ready To Rule The First Page of Google?

Contact us for an exclusive 20-minute assessment & strategy discussion. Fill out the form, and we will get back to you right away!

What Our Clients Have To Say

L
Luciano Zeppieri
S
Sharon Tierney
S
Sheena Owen
A
Andrea Bodi - Lab Works
D
Dr. Philip Solomon MD
Newsletter
Subscribe to Our Newsletter
Newsletter
Subscribe to Our Newsletter