About this video
Thanks to Clico for sponsoring this video: https://samuelgregory.co.uk/clico

Apple's latest Silicon is a marketing gimmick for anyone serious about local AI development. In this video, I take the new M5 MacBook Pro and push it to the absolute limit with Llama CPP, Turbo Quant, and heavy-duty coding tasks to see if it can actually handle the heat.

Key Takeaways:
- RAM is the ultimate bottleneck: 32GB is simply not enough for high-context AI coding.
- Pro vs Mini: the performance gap is almost non-existent for AI inference tasks.
- Quantization is mandatory: you will be forced to use 3-bit or 4-bit models to get any usable context.
- Casual vs Pro: the M5 is perfect for emails and basic tasks, but fails miserably at complex AI development.
Your Expensive New MacBook is a Marketing Gimmick for Local AI
Apple has spent years convincing us that their 'Pro' machines are the pinnacle of performance. However, after putting the M5 MacBook Pro through a rigorous testing cycle focused on local AI development, the reality is far less impressive. If you are planning to buy this machine for heavy-duty coding or complex modelling, you need to read this first.
The RAM Bottleneck
The most significant hurdle for any AI enthusiast is memory. The M5 MacBook Pro tops out at 32GB of RAM. While this sounds substantial for standard video editing, it is a massive constraint for Large Language Models (LLMs). When you factor in the context size needed for coding, that 32GB disappears almost instantly.
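To make that disappearing-RAM claim concrete, here is a rough sketch of how much memory the context alone (the transformer's KV cache) can consume as it fills up. The architecture numbers are illustrative assumptions for a mid-sized model, not figures from my testing.

```python
# Rough KV-cache memory estimate for a transformer at a given context
# length. The layer/head numbers below are illustrative assumptions
# for a hypothetical 9B-class model, not measured values.

def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: int = 2) -> int:
    """Bytes for keys + values across all layers (fp16 = 2 bytes/elem)."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem

# Hypothetical 9B-class model: 36 layers, 8 KV heads, 128-dim heads
gib = kv_cache_bytes(36, 8, 128, context_len=32_768) / 2**30
print(f"KV cache at 32k context: ~{gib:.1f} GiB")  # ~4.5 GiB
```

The cache grows linearly with context length, so a long coding session eats gigabytes on top of the model weights themselves.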
I found that to get any usable performance for coding tasks, I had to use heavily optimised four-bit or even three-bit quantized models. This compromises the intelligence of the model just to make it fit within the hardware limitations.
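The arithmetic behind that forced choice is simple: weight footprint scales with bits per parameter. A quick back-of-envelope sketch (parameter counts are illustrative, not specific to any model I tested):

```python
# Back-of-envelope weight footprint at different quantization levels,
# showing why 3- or 4-bit models are forced on a 32GB machine once you
# also need room for the OS, your tools, and the context cache.

def weight_gib(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight size in GiB, ignoring quantization overhead."""
    return params_billions * 1e9 * bits_per_weight / 8 / 2**30

for bits in (16, 8, 4, 3):
    print(f"9B model at {bits:>2}-bit: ~{weight_gib(9, bits):.1f} GiB")
```

At 16-bit a 9B model already wants roughly 17 GiB for weights alone; at 4-bit it drops to about 4 GiB, which is what finally leaves headroom for a useful context window.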
Performance Parity: Pro vs Mini
One of the most surprising findings was the performance comparison. The M5 MacBook Pro offers negligible improvements over the expected performance of a Mac Mini. If you are buying the Pro for the 'extra juice' for AI, you are essentially paying a premium for a screen and a keyboard that do nothing to speed up your local inference.
What Is It Good For?
It is not all bad news. For casual AI usage, the M5 is more than capable. If your daily workflow involves:
- Summarising long articles or PDFs
- Basic email formatting and spell checking
- Running small, two-billion parameter models for creative writing
- Simple knowledge retrieval
Then this machine will feel like a dream. It is fast, efficient, and handles these lightweight tasks with ease. But let us be clear: this is not a machine for 'AI Pros'.
The Verdict
If you are a tinkerer who enjoys playing with Llama CPP and Turbo Quant, you will find plenty to love here. But for anyone looking to build the next big AI-powered application locally, the M5 MacBook Pro is a compromise, not a solution. Keep your expectations in check, or start looking at a dedicated server setup.
Transcript
I have been conducting extensive testing and coding on the new M5 MacBook Pro, comparing it directly to the M5 MacBook Air. I wanted to understand the true capabilities of this machine regarding Artificial Intelligence. I have run various benchmarks, including Qwen versus Gemma, and heavy-duty coding tasks. I also tested Claude Code to see if this machine qualifies as an 'AI monster'.
Before we dive in, a quick word on our sponsor: Clico. It is a Chrome extension that makes browser-based typing much easier. You can use command-key shortcuts to summarise pages, memorise information for later, and reference different models like GPT 5.4. It is currently free, so do check it out.
Regarding the hardware, the performance of the MacBook Pro is often negligible when compared to the Mac Mini. If you are considering the M5 Mini, this video applies to you as well. My testing involved Llama CPP and the new Turbo Quant from Google, which allows for context size compression. This is vital because models and their context must fit within the VRAM. With a maximum of 32GB of RAM on the M5 MacBook Pro, the pressure on the GPU increases as the context fills up.
I also utilised MLX for faster performance on Mac hardware. My coding tasks involved analysing Vercel errors and explaining codebases. To get sufficient context, I often had to download models quantized to three or four bits. While nine-billion parameter models were faster, the actual coding results were poor. Two-billion parameter models, even at 2.4 bits, are clearly what this machine is intended to run.
For casual tasks like planning a trip or writing stories, it performs well, though GPS coordinates and musical notes were often inaccurate. Overall, the M5 MacBook is for casual AI users. Professional coding is not feasible on this machine due to the 32GB RAM limit. It is good for knowledge gathering and email formatting, but for anything advanced, you might need a local server setup. If you are a tinkerer, you will enjoy the learning process with Llama CPP and Turbo Quant, but do not expect a portable AI powerhouse.