Google

Gemma 4 31B

TL;DR

While users value Gemma 4 for its local privacy and impressive vision capabilities, many express frustration regarding its heavy hardware requirements and occasional logical failures or repetitive hallucinations.

YouTube · Reddit · Hacker News
530 comments analyzed · Apr 30, 2026

Popularity metrics

Last 30 days · 25 items analyzed
  • Related views
    3.3M
  • Related endorsements
    114.0k
  • Related comments
    4.9k
  • Buzz
    Steady

Comment distribution

  • Overall
    41% positive / 59% negative
  • YouTube
    44% positive / 56% negative
  • Reddit
    48% positive / 52% negative

Comment summary

Privacy Benefits and Competitive Positioning

68 comments

Many users appreciate the model as a private, local alternative to frontier APIs, noting its superior tool-calling abilities compared to previous Gemma versions.

Hardware Requirements and Inference Performance

64 comments

Users debate VRAM constraints and tokens-per-second throughput, asking how the 31B model and its various quantizations perform on consumer GPUs.
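As a rough guide to the VRAM debate, weight memory for a dense model scales with parameter count times bits per weight. The sketch below uses approximate bits-per-weight figures for common GGUF quantization levels and a flat overhead allowance; both numbers are assumptions for illustration, not measurements from the discussion.

```python
# Rough VRAM estimate for a dense model at a given quantization.
# Assumptions (not from the source): weights dominate memory, and
# KV cache plus runtime overhead is approximated as a flat 1.5 GiB.

def vram_gib(params_b: float, bits_per_weight: float, overhead_gib: float = 1.5) -> float:
    """Approximate VRAM in GiB for `params_b` billion parameters."""
    weight_bytes = params_b * 1e9 * bits_per_weight / 8
    return weight_bytes / 2**30 + overhead_gib

# A 31B model at common quantization levels (bits/weight are approximate):
for name, bpw in [("Q4_K_M", 4.8), ("Q5_K_M", 5.7), ("Q8_0", 8.5)]:
    print(f"{name}: ~{vram_gib(31, bpw):.0f} GiB")
```

Under these assumptions a 31B model lands around 19 GiB at Q4_K_M, which is consistent with the comments about it being out of reach for 16 GB consumer cards without offloading.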

Logical Reliability and Hallucination Concerns

53 comments

Discussion highlights issues with model consistency, including endless output loops and hallucinations during coding tasks or complex logical reasoning tests.

Vision and OCR Capabilities

14 comments

The community shows high enthusiasm for vision performance, specifically praising the model's accuracy in OCR tasks and its ability to handle bounding boxes.

Top comments

  • Amazing! shrinking size while high intelligence, Apache 2, works on Phones. This is the 2026 news we need. Keep going Google.
    YouTube · @AmanBansil134
  • I installed Gemma 4 (gemma4:e2b 7.2GB) running locally derived a Constraint-Dynamical Hamiltonian for a Clifford algebra project I am working on. And you have to trust me what it provided was amazing... So I have a thinking model running locally using RAG to read files and can do quite advanced math... that is the absolute bomb.. :)
    YouTube · @JimNichols69
  • Gemma 4 welcome! 🎉 And thanks to everyone behind Gemma 4's development. We all appreciate the incredible work you all do.
    YouTube · @TravisLee3380
  • Not surprised. Gemma is just a mini Gemini, it's good with that stuff. Where GLM 5.1 shines is coding.
    Reddit · atape_179
  • I don't know how you ran it, if you're running it locally using llama.cpp, use the b8660 llama.cpp build (more recent versions have a regression, another tokenization issue) and use --temp 0.3 --top-p 0.9 --min-p 0.1 --top-k 20 I am sure the 26B will do much better. Also, Claude might favor better formatting etc., a boolean test is not good. Try the below prompt for the judge: I am benchmarking many AIs in many tasks. You are a judge. Go through them question by question, not LLM by LLM. Go through each question and, for every question, give all AIs a score out of 10, and be sure to be fair with them. Later, rank them all by their total score. MAKE SURE to evaluate them correctly, not based on vibe alone (check for misinformation, hallucinations, if they are useful or not, and not on formatting). PROMPT= AI 1: ... AI 2: ....
    Reddit · Sadman78247
  • LLM as judge = no thanks. It also depends how you're running Gemma 4 for the test. The new custom parser for gemma 4 in llama.cpp b8665 has fixed it for me. Before, it failed the test of just being given the image below. Now it solves it.
    Reddit · ambient_temp_xeno45
  • Super excited about the direction things are going. Next generation will be frontier quality for most daily uses and fit on a single solid GPU like the Intel B70. A couple more turbo quant type advances and we're there on SOTA phones, prob two generations. Genuinely concerned about the economy if the AI takeoff is entirely agents running on edge devices and the major labs' trillions in capital goes stale, but very glad we're leaning towards the good path where AI won't be controlled by the few.
    Reddit · LeucisticBear28
  • Gemma 4 is the first actual leap AI did in a "long" time. It makes it smaller but also use less computing power. I am running it on my PC and while it takes up 20gb its equivalent to a 400gb model… insane and on Apache 2.0 so you can make and sell any product you make with it.
    YouTube · @Mister_Morrigan25
  • Even since Gemma 2 it's been useful for being good at interacting instead of being a 'yes man' (girl). Agreeableness is a flaw and I don't like it in Qwen. (I'm absolutely right)
    Reddit · ambient_temp_xeno20
  • qwen3 coder next losing to the 4b at actual game logic is the most demoralizing benchmark result i've seen this week, playwright mcp doing the heavy lifting probably explains a lot of the variance here.
    Reddit · AngeloKappos13
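The sampler settings recommended in the comments above can be passed to llama.cpp's CLI roughly as follows. The GGUF filename and prompt are placeholders, and the build-pinning advice (b8660 vs. later builds) comes from a single commenter's regression report, not from official guidance.

```shell
# Run a local Gemma 4 GGUF with the sampler settings suggested above.
# Model filename is a placeholder; adjust to your local quantization.
./llama-cli \
  -m gemma-4-31b-Q4_K_M.gguf \
  --temp 0.3 \
  --top-p 0.9 \
  --min-p 0.1 \
  --top-k 20 \
  -p "Your prompt here"
```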

Source breakdown

Graph based on sampled comments per item (n≤30)