About this video
Your M5 MacBook Pro is basically an expensive paperweight for advanced local AI. We pushed the 32GB limit to the edge comparing Qwen 3.5 and Gemma 4 across real-world tasks. Here is what we discovered.

Key Takeaways:
- Gemma 4 crushes Qwen 3.5 in speed with 248 tokens per second on Apple Silicon.
- Local models still suffer from 'spatial hallucinations' when planning travel and GPS routes.
- Tool use with 'Open Claw' remains a hardware-limited struggle for models under 9B parameters.
- Creative outputs like MIDI music are possible, though stylistic accuracy varies wildly.
- 32GB of RAM is the absolute minimum for balancing decent context windows and model weights.
The M5 MacBook Pro Reality Check: Qwen 3.5 vs Gemma 4
Most 'Pro' users are vastly overestimating what their hardware can do in the local AI space.
The promise of local LLMs is enticing: privacy, no subscription fees, and total control. But as I found out during a week of intensive testing on a base-model M5 MacBook Pro specced up to 32GB, the gap between promise and performance is still wide.
The Hardware Constraints
With 32GB of unified memory, you would expect the M5 to breeze through most tasks. However, once you factor in context windows (we used 32,000 tokens for these tests) and the requirements of tools like Open Claw, the memory disappears rapidly. Gemma 4 alone was eating up 27GB of memory just to function at a reasonable level.
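To see why a 32k context eats memory so fast, it helps to do the back-of-the-envelope maths: model weights plus the KV cache both scale into the tens of gigabytes. A rough sketch, using purely illustrative layer counts and head sizes (not the actual Gemma 4 architecture, which isn't specified here):

```python
def model_memory_gb(params_b: float, bits_per_weight: int) -> float:
    """Approximate weight memory in GB for a quantised model."""
    return params_b * 1e9 * bits_per_weight / 8 / 1024**3

def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                context_len: int, bytes_per_elem: int = 2) -> float:
    """Approximate KV-cache size: two tensors (K and V) per layer,
    stored at 16-bit precision by default."""
    return (2 * n_layers * n_kv_heads * head_dim
            * context_len * bytes_per_elem) / 1024**3

# Hypothetical 27B-parameter model at 4-bit with a 32k context.
weights = model_memory_gb(27, 4)            # roughly 12.6 GB
cache = kv_cache_gb(46, 16, 128, 32_000)    # roughly 11.2 GB
print(f"weights ≈ {weights:.1f} GB, KV cache ≈ {cache:.1f} GB")
```

Even before runtime overhead, these two terms alone land in the low twenties of gigabytes, which is consistent with a model consuming most of a 32GB machine.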
Speed Benchmarks: Gemma Takes the Lead
One of the most immediate differences was inference speed.
- Gemma 4: 248 tokens per second
- Qwen 3.5: 153 tokens per second
For simple tasks like scriptwriting and content planning, Gemma feels significantly more responsive. This is likely due to the turbo quant optimisations that favour the Gemma architecture on Apple Silicon.
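For reproducing throughput numbers like these, the measurement itself is simple: count generated tokens and divide by wall-clock time. A minimal harness, assuming `generate` is any callable that returns the list of output tokens (the actual inference backend is not specified here):

```python
import time

def tokens_per_second(generate, prompt: str) -> float:
    """Time a single generation call and return throughput.

    `generate` is a placeholder for whatever inference call you use;
    it must return a sequence of generated tokens.
    """
    start = time.perf_counter()
    tokens = generate(prompt)
    elapsed = time.perf_counter() - start
    return len(tokens) / elapsed
```

In practice you would run several prompts and discard the first (cold) run, since model load and cache warm-up skew the initial measurement.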
Real World Failures: The Madeira Incident
Where local models still fall short is in complex spatial reasoning and tool integration. I tasked Qwen 3.5 with planning a hiking trip in Madeira. While it successfully generated GPX files, the coordinates were wildly inaccurate, often placing waypoints in the ocean rather than on the trails.
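Failures like ocean-bound waypoints are easy to catch mechanically before you trust a generated route. A sanity-check sketch using a rough bounding box for Madeira (the coordinates below are approximate and only for illustration; they are not from the original test):

```python
import xml.etree.ElementTree as ET

# Approximate bounding box for Madeira island (illustrative values).
LAT_RANGE = (32.6, 32.9)
LON_RANGE = (-17.3, -16.6)

def suspect_waypoints(gpx_text: str) -> list[tuple[str, float, float]]:
    """Return (name, lat, lon) for waypoints outside the bounding box."""
    ns = {"gpx": "http://www.topografix.com/GPX/1/1"}
    root = ET.fromstring(gpx_text)
    bad = []
    for wpt in root.findall("gpx:wpt", ns):
        lat, lon = float(wpt.get("lat")), float(wpt.get("lon"))
        name_el = wpt.find("gpx:name", ns)
        name = name_el.text if name_el is not None else "?"
        in_box = (LAT_RANGE[0] <= lat <= LAT_RANGE[1]
                  and LON_RANGE[0] <= lon <= LON_RANGE[1])
        if not in_box:
            bad.append((name, lat, lon))
    return bad
```

A bounding box won't catch a waypoint that is on the island but off the trail, but it flags the grossest spatial hallucinations (like points in the Atlantic) for free.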
The Future of Local Orchestration
Trying to use Open Claw for multi-step planning across Notion and Google Calendar proved to be the breaking point. To get a stable result, I had to downgrade to 2B parameter models, which simply lack the intelligence to handle complex API interactions.
Final Thoughts
If you are choosing between these two today for a local machine with limited RAM:
- Choose Gemma 4 for speed and general reliability.
- Choose Qwen 3.5 if you need specific multi-lingual support, but be wary of its tool-use limitations.
We are getting closer to a truly autonomous local AI, but for now, the hardware remains the biggest hurdle.