Written By Madhav Malhotra
Published: Jun 27, 2025, 02:49 PM (IST)
Google has been bullish when it comes to artificial intelligence. From rolling out new Gemini models with frequent updates to unveiling video generation tools like Veo 3 and Flow, Google is clearly trying to dominate the entire AI space. Now, it looks like the company is taking things a step further by making AI more accessible than ever. The tech giant has just launched Gemma 3n, its most efficient open-source AI model yet.
The best part about this new model is that it can run offline and supports text, image, audio, and video inputs. This could be a game-changer for real-time AI on smartphones, especially for users with budget devices or those concerned about privacy.
Gemma 3n is Google’s newest on-device AI model, designed to bring powerful multimodal capabilities to devices like smartphones and tablets without needing an internet connection. It uses a new architecture called MatFormer (Matryoshka Transformer), which nests smaller, fully operational sub-models within larger ones.
This design works in developers’ favour: they can directly download and use either the main E4B model for the highest capability, or the standalone E2B sub-model, which Google has already extracted and which delivers up to 2x faster inference.
For extra control, Google also offers a spectrum of custom-sized models between E2B and E4B. This approach is great for those who are working on older machines but still want the best performance possible.
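The nesting idea is easiest to see with a toy sketch. The code below is a hypothetical illustration of the Matryoshka principle only, not Gemma 3n’s real architecture: a larger weight matrix whose leading block is itself a smaller, fully working layer that can be sliced out and used on its own.

```python
# Toy illustration of the Matryoshka idea behind MatFormer:
# a large layer whose leading slice is itself a working smaller layer.
# Shapes and numbers are made up purely for illustration.

def matvec(weights, x):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

# "E4B-like" large layer: a 4x4 weight matrix.
big = [
    [1.0, 0.5, 0.0, 0.0],
    [0.5, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.5],
    [0.0, 0.0, 0.5, 1.0],
]

# "E2B-like" sub-model: the top-left 2x2 block, extracted with no retraining.
small = [row[:2] for row in big[:2]]

print(matvec(big, [1.0, 2.0, 3.0, 4.0]))  # full-capacity path -> [2.0, 2.5, 5.0, 5.5]
print(matvec(small, [1.0, 2.0]))          # cheaper nested path -> [2.0, 2.5]
```

Because the smaller layer is a genuine slice of the larger one, both paths work out of the box, which is what lets Google ship E2B as a standalone download and offer custom sizes in between.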
Another area where Google gets it right is memory efficiency. Even with 5–8 billion raw parameters, the models behave like much smaller ones in terms of memory usage. This is possible thanks to innovations like Per-Layer Embeddings (PLE), which allow a large share of the parameters to be loaded and computed on the CPU, freeing up accelerator memory. Gemma 3n also brings KV Cache Sharing, which optimises how the model handles the initial input processing (prefill) stage. According to Google, this delivers a notable 2x improvement in prefill performance compared to Gemma 3 4B.
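The general mechanism behind KV caching can be sketched in a few lines. This is a conceptual illustration of why reusing cached key/value states speeds up prefill, with made-up names throughout; it is not Google’s actual implementation.

```python
# Conceptual sketch of KV caching: key/value states computed for a prompt
# prefix are stored and reused instead of being recomputed from scratch.
# Names and values here are illustrative only.

compute_calls = 0

def compute_kv(token):
    """Stand-in for the expensive per-token key/value computation."""
    global compute_calls
    compute_calls += 1
    return (hash(token) & 0xFF, hash(token) >> 8)  # fake (key, value) pair

def prefill(tokens, cache):
    """Process a prompt, reusing cached key/value states where possible."""
    states = []
    for i, tok in enumerate(tokens):
        if (i, tok) not in cache:              # only compute on a cache miss
            cache[(i, tok)] = compute_kv(tok)
        states.append(cache[(i, tok)])
    return states

cache = {}
prefill(["describe", "this", "image"], cache)             # 3 computations
prefill(["describe", "this", "image", "briefly"], cache)  # only 1 more
print(compute_calls)  # 4 computations instead of 7
```

The second request shares a three-token prefix with the first, so only the new token needs fresh computation; that prefix reuse is the kind of saving the 2x prefill figure refers to.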
Other additions include smarter speech recognition. Google uses an audio encoder based on its Universal Speech Model, which supports speech-to-text, language translation, and more. It is optimised for translating between English and European languages like Spanish, French, and Italian.
The model also leverages MobileNet-V5, a new lightweight vision encoder designed to handle up to 60fps video on phones like the Pixel. It outperforms older vision models in both speed and accuracy.
Gemma 3n sets a new bar for on-device AI, with text support for 140 languages and multimodal understanding of 35 languages, alongside strong math, coding, and reasoning skills. In benchmarks, it impresses with an LMArena score of over 1300, making it the first model under 10 billion parameters to hit that mark.
The biggest advantage is that it can run without internet access, making it ideal for remote work situations where developers might face connectivity issues. And because it is lightweight, requiring as little as 2GB of RAM to run, it enables real-time AI experiences on mobile hardware.
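To see why a 2GB footprint is notable, a rough back-of-the-envelope sketch helps. All precisions and figures below are illustrative assumptions, not Google’s published internals.

```python
# Back-of-envelope weight memory: parameter count times bytes per parameter.
# Illustrative assumptions only; real on-device footprints also depend on
# activations, caches, and runtime overhead.

def weight_footprint_gb(params_billions, bytes_per_param):
    # 1 billion parameters at N bytes each is roughly N GB of weights.
    return params_billions * bytes_per_param

# Naively, 5 billion raw parameters at 16-bit (2-byte) precision:
naive_gb = weight_footprint_gb(5, 2)    # ~10 GB of weights alone

# A 2B-parameter-equivalent footprint at 8-bit (1-byte) precision:
compact_gb = weight_footprint_gb(2, 1)  # ~2 GB

print(naive_gb, compact_gb)  # 10 2
```

The gap between those two figures is the kind of gap that techniques like PLE offloading and nested sub-models help close, which is how a 5B-parameter model ends up fitting in budget-phone memory.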