About this video
Apple has effectively turned your current iPhone into a legacy device with the announcement of AFM3. This deep dive explores the technical architecture of Apple's newest foundation models, the 12GB RAM requirement that is causing an uproar, and how the partnership with Google and Nvidia actually works.
Key Takeaways:
- The new 20B-parameter 'Advanced' model requires an iPhone 17 Pro, an M3-or-later Mac, or an M4-or-later iPad to run locally.
- Apple is using a 'Mixture of Experts' architecture to swap model weights between flash memory and RAM.
- A new 'System Orchestrator' manages your personal context and on-screen awareness.
- Cloud processing is handled via Apple's Private Cloud Compute in collaboration with Google and Nvidia.
- Performance benchmarks are currently based on human preference rather than objective data.
The Silicon Ceiling: Decoding Apple’s 2026 AI Strategy
Apple has officially decided that your current hardware is not good enough for their vision of the future. During the WWDC 2026 conference, the veil was lifted on the Apple Foundation Models (AFM), revealing a sophisticated but demanding ecosystem that draws a sharp line between those with the latest silicon and those left behind.
The Architecture of Intelligence
At the heart of the new iOS and macOS experience sits the System Orchestrator. This layer is responsible for managing personal context, world knowledge, and on-screen awareness. However, the real story lies in the models themselves. Apple has introduced a tiered system:
- AFM3 Core: A 3-billion-parameter dense model designed for speed. This handles the basics: spellchecking and quick email drafts.
- AFM3 Core Advanced: A 20-billion-parameter sparse model that uses instruction-following pruning. It is limited to the iPhone 17 Pro, Macs with M3 or later, and iPads with M4 or later due to its massive 12GB RAM requirement.
- AFM3 Cloud & Cloud Pro: The heavy lifters, developed in partnership with Google and Nvidia, designed to handle the tasks your pocket-sized computer simply cannot manage.
The RAM Gating Scandal
The most controversial takeaway from the technical reveal is the memory requirement. While the AFM3 Core Advanced model is technically 20 billion parameters, it only activates 1 to 4 billion parameters at a time using a mixture of experts (MoE) approach. Despite this efficiency, Apple is still mandating at least 12GB of RAM. This effectively renders the iPhone 15 Pro, once the pinnacle of Apple Intelligence, a second-class citizen in the new AI economy.
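The gap between "20 billion parameters" and "12GB of RAM" is easier to see with back-of-the-envelope arithmetic. This sketch assumes weights are stored at a fixed bit-width; the bit-widths are illustrative assumptions (the source doesn't say how AFM3 Core Advanced is quantized), and the figures cover weights only, not the KV cache, activations, or the rest of the OS.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights in GB (10^9 bytes).

    params * bits / 8 gives bytes; dividing by 10^9 cancels the 'billion'.
    """
    return params_billion * bits_per_weight / 8


# Bit-widths are illustrative assumptions, not Apple's published figures.
scenarios = [
    ("full 20B model, 16-bit", 20, 16),
    ("full 20B model, 4-bit", 20, 4),
    ("4B active params, 4-bit", 4, 4),
    ("1B active params, 4-bit", 1, 4),
]

for label, params_b, bits in scenarios:
    print(f"{label:<26} {weight_memory_gb(params_b, bits):5.1f} GB")
```

Even at an aggressive 4 bits per weight, holding all 20 billion parameters would take about 10GB, most of a 12GB budget, whereas 1 to 4 billion active parameters at the same precision need roughly 0.5 to 2GB.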
Privacy or Performance?
Apple continues to beat the drum of privacy with their Private Cloud Compute. However, the revelation that they are licensing data from third parties raises uncomfortable questions. While they respect robots.txt files for web crawling, the ethical origins of their 'purchased data' remain shrouded in mystery.
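For publishers wondering how the robots.txt mechanism works in practice, Python's standard-library `urllib.robotparser` evaluates a robots.txt file the way a compliant crawler would. This sketch assumes the `Applebot-Extended` token, which Apple documents as the way to opt content out of foundation-model training; how Apple's own pipeline consults it is not described in the source, and `example.com` is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# A publisher's robots.txt that opts out of AI-training use via the
# Applebot-Extended token while still allowing ordinary crawling.
ROBOTS_TXT = """\
User-agent: Applebot-Extended
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse in-memory lines instead of fetching

url = "https://example.com/articles/1"
results = {agent: parser.can_fetch(agent, url)
           for agent in ("Applebot-Extended", "Applebot", "SomeOtherBot")}

for agent, allowed in results.items():
    print(f"{agent:<18} allowed={allowed}")
```

Only the `Applebot-Extended` check is refused; every other agent falls through to the `*` group and is allowed.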
The 2026 update shows a company struggling to balance the massive context requirements of modern LLMs with the physical limitations of mobile hardware. Whether the expressive new Siri and on-device photo reframing are enough to justify a hardware upgrade remains to be seen. For now, the 'personal' in personal context comes with a very specific price tag.
Transcript
Apple just had their 2026 WWDC conference, where they unveiled iOS 27, macOS 27, and the rest of the OS 27 releases, and part of that was a huge update to Apple Intelligence, including a new Siri AI. Now, you can check out this video where I go over all of the releases and features and what I think of them. In this video, however, we're going to dive a bit deeper into the nerdy stuff: the Apple Foundation Models, the series of models they've released, both private cloud and on-device, and their partnership with Google and Nvidia to get the cloud models up and running. So, if you like your local AI nerdy stuff, strap on in. Oh, and hit the subscribe button.
So, I think it'd be cool to start off with this overview, which is their picture of how everything fits together. At the center of it all are you and your devices, and you interact with Apple Intelligence, Siri, whatever, through image, voice, and text. That goes to what they call the Apple Foundation Models. These then interact with the System Orchestrator, which combines personal context, world knowledge (effectively just web search), actions, and on-screen awareness. In turn, these feed into system-wide experiences through apps and Siri AI.
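To make the orchestrator's role concrete, here is a toy sketch of one thing such a layer has to do: gather the context sources named in the overview (personal context, world knowledge, on-screen awareness, actions) and merge them into a single model input. Every name here is hypothetical; Apple hasn't published the orchestrator's interface, and a real system presumably also does retrieval, ranking, and privacy filtering that this ignores.

```python
from dataclasses import dataclass, field


@dataclass
class OrchestratorInputs:
    """Hypothetical bundle of the context sources named in Apple's overview."""
    request: str
    personal_context: list[str] = field(default_factory=list)
    world_knowledge: list[str] = field(default_factory=list)  # e.g. web search snippets
    on_screen: str = ""                                       # text of what's on screen
    actions: list[str] = field(default_factory=list)          # actions the model may call


def assemble_prompt(inputs: OrchestratorInputs) -> str:
    """Merge the non-empty context sources into a single model input."""
    sections = []
    if inputs.personal_context:
        sections.append("Personal context:\n" + "\n".join(inputs.personal_context))
    if inputs.world_knowledge:
        sections.append("World knowledge:\n" + "\n".join(inputs.world_knowledge))
    if inputs.on_screen:
        sections.append("On screen:\n" + inputs.on_screen)
    if inputs.actions:
        sections.append("Available actions: " + ", ".join(inputs.actions))
    sections.append("User request: " + inputs.request)
    return "\n\n".join(sections)


prompt = assemble_prompt(OrchestratorInputs(
    request="Remind me to leave for the dentist",
    personal_context=["Calendar: dentist, Tuesday 10:00"],
    actions=["create_reminder"],
))
print(prompt)
```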
So they released five foundation models in collaboration with Google. As I say, I don't know exactly how involved Google was in four of the five; in one of them, though, it definitely played a big role. First up, you've got AFM3 Core (Apple Foundation Model 3 Core). This is the one that's going to run on your local hardware, and it's a 3-billion-parameter dense model. It will be responsible for quick things like drafting an email and spellchecking, the quick-fire stuff that needs to happen on your device with very, very low latency. It's not very intelligent; it's built for speed. And it's supported on all current Apple Intelligence devices: the iPhone 15 Pro and up, all M-series Macs, and all M-series iPads.
Then, from an on-device perspective, we step up to AFM3 Core Advanced. This is a 20-billion-parameter model with a sparse architecture that activates just 1 to 4 billion parameters at a time. We're going to go deeper into the activation method they use and all the rest of it, but this one is really only supported on the iPhone 17 Pro. On Macs, it's M3 and up; on iPads, it's M4 and up. Basically, anything with 12GB of RAM or more is where this is going to sit. So my iPhone 15 Pro and I won't be able to run this advanced Apple model.
But in case you didn't know, Apple are approaching this in a hybrid way: they can hand off some of the more advanced requests to their cloud models, which run on their system called Private Cloud Compute. AFM3 Cloud is a more advanced version of Core, squeezing out a lot more performance and a lot more context. This will be the one that's most responsible for a lot of your personal context and world knowledge as well. Then they have their dedicated cloud image model, which, as the name suggests, handles image generation and editing. You might have seen the photo reframing feature: it uses the on-device model to put a basic blur in the parts of the image that haven't been generated yet, just to enable that basic reframing, and then it sends the image off to the cloud, where this image model fills in the blanks.
Then there's AFM3 Cloud Pro. This is the one they've clearly described as a relationship with Nvidia and Google, seemingly running on Google Cloud as well. However, they do emphasize that it still maintains all of the user's privacy. It will be interesting to see where the limits of the on-device Advanced model are, and at what point the more advanced devices hand off to the cloud model. For my iPhone 15 Pro, I can imagine it handing off to the cloud model fairly frequently; again, on-device I'm just getting basic spellchecking. 20 billion parameters is fairly good, but the model probably doesn't have a lot of personal context. It just has a lot of onboard knowledge, if that makes sense.
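The hand-off question can be framed as a small routing decision. This sketch encodes the 12GB RAM gate described in the video and sends a request to the cloud when it needs more context than the local model can hold. The token budgets are placeholder assumptions, not Apple figures, and a real router would presumably weigh task difficulty too, not just context length.

```python
MIN_RAM_GB_FOR_ADVANCED = 12  # RAM gate for AFM3 Core Advanced, per the video

# Placeholder context budgets in tokens; hypothetical, not published by Apple.
LOCAL_CONTEXT_BUDGET = {
    "AFM3 Core": 4_000,
    "AFM3 Core Advanced": 16_000,
}


def best_local_model(ram_gb: float) -> str:
    """Most capable on-device tier this device's RAM allows."""
    if ram_gb >= MIN_RAM_GB_FOR_ADVANCED:
        return "AFM3 Core Advanced"
    return "AFM3 Core"


def route(ram_gb: float, needed_tokens: int) -> str:
    """Stay on-device if the request fits the local budget, else hand off."""
    local = best_local_model(ram_gb)
    if needed_tokens <= LOCAL_CONTEXT_BUDGET[local]:
        return local
    return "AFM3 Cloud"  # handled via Private Cloud Compute


for ram, tokens in [(8, 1_000), (8, 10_000), (12, 10_000), (12, 50_000)]:
    print(f"{ram}GB RAM, {tokens:>6} tokens -> {route(ram, tokens)}")
```

Under these made-up budgets, a device below the 12GB gate hands off a 10,000-token request that a 12GB device keeps local, which matches the speaker's expectation that older phones will lean on the cloud more often.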
Now, there's an interesting bit here about maximizing on-device capabilities: "Traditional large language models, whether dense or sparsely activated, require all weights to reside in active memory (DRAM), creating a massive footprint that limits scalability on consumer hardware." This is interesting because, yes, you need to reserve space for context, so that you've got enough as the conversation builds and the KV cache fills up. You're reserving that memory, but you also want the device to perform and work as expected when you're switching between apps. So it does need a lot of headroom.
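The KV cache is why context eats memory: for every token in the conversation, each layer stores one key vector and one value vector per KV head. This sketch computes that size; the layer count, head count, and head dimension are hypothetical values for a mid-sized model, not AFM3's published configuration.

```python
def kv_cache_bytes(seq_len: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_value: int = 2) -> int:
    """Bytes needed to cache keys and values for `seq_len` tokens.

    The leading 2 accounts for storing both a key and a value vector.
    bytes_per_value=2 corresponds to 16-bit (fp16/bf16) storage.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


# Hypothetical model shape, chosen for round numbers.
LAYERS, KV_HEADS, HEAD_DIM = 32, 8, 128

per_token = kv_cache_bytes(1, LAYERS, KV_HEADS, HEAD_DIM)
print(f"per token: {per_token // 1024} KiB")
for tokens in (8_192, 32_768):
    gib = kv_cache_bytes(tokens, LAYERS, KV_HEADS, HEAD_DIM) / 2**30
    print(f"{tokens:>6} tokens: {gib:.1f} GiB")
```

With this shape, each token costs 128 KiB, so 8,192 tokens take 1 GiB and 32,768 tokens take 4 GiB, memory that has to come out of the same budget as the active weights and every other app.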
"To break this barrier, AFM3 Core Advanced introduces a novel sparsely activated architecture built on instruction-following pruning, a technique developed by Apple." I looked into this, and it's basically a mixture of experts combined with track parallelism. What that tells me is it can look at several things at once, or preemptively run different experts simultaneously, to increase speed. Instead of forcing the entire model into DRAM, the full model is stored in flash memory (NAND), and they've got a little diagram of it here. Essentially, this is mixture of experts using a predetermined number of active parameters tailored to each specific use case.
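The flash-plus-DRAM idea can be illustrated with a toy mixture-of-experts layer. All expert weights live in a slow store standing in for NAND flash; a router picks the top-k experts for an input, and only those are paged into a small LRU cache standing in for DRAM. This is a generic sketch of expert offloading, not Apple's instruction-following pruning, which the source describes only at a high level; each "expert" is just a scalar multiplier so the bookkeeping stays visible.

```python
from collections import OrderedDict


class FlashStore:
    """Stand-in for NAND flash: holds every expert's weights, counts reads."""

    def __init__(self, experts: list[float]):
        self.experts = experts
        self.reads = 0

    def load(self, expert_id: int) -> float:
        self.reads += 1
        return self.experts[expert_id]


class DramCache:
    """Stand-in for DRAM: keeps at most `capacity` experts, evicting the least recently used."""

    def __init__(self, flash: FlashStore, capacity: int):
        self.flash = flash
        self.capacity = capacity
        self.resident = OrderedDict()  # expert_id -> weights, oldest first

    def get(self, expert_id: int) -> float:
        if expert_id in self.resident:
            self.resident.move_to_end(expert_id)  # hit: mark as most recently used
            return self.resident[expert_id]
        weights = self.flash.load(expert_id)      # miss: page in from flash
        self.resident[expert_id] = weights
        if len(self.resident) > self.capacity:
            self.resident.popitem(last=False)     # evict least recently used
        return weights


def moe_forward(x: float, router_scores: list[float], cache: DramCache, k: int = 2) -> float:
    """Route x to the top-k experts and combine their outputs by normalized score."""
    chosen = sorted(range(len(router_scores)),
                    key=lambda i: router_scores[i], reverse=True)[:k]
    total = sum(router_scores[i] for i in chosen)
    return sum(router_scores[i] / total * cache.get(i) * x for i in chosen)


flash = FlashStore([1.0, 2.0, 3.0, 4.0])
cache = DramCache(flash, capacity=2)

y1 = moe_forward(1.0, [0.1, 0.2, 0.3, 0.4], cache)  # loads experts 3 and 2
y2 = moe_forward(1.0, [0.1, 0.2, 0.3, 0.4], cache)  # cache hits, no new reads
y3 = moe_forward(1.0, [0.9, 0.8, 0.1, 0.0], cache)  # evicts 3 and 2, loads 0 and 1
print(y1, y2, y3, flash.reads, list(cache.resident))
```

The second call costs no flash reads because both experts are already resident; the third shows the price of a routing shift. That read traffic is the trade-off that lets a large sparse model run in far less DRAM than its full weight footprint.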
They also mention in this article that they've improved the imaging, which is true. However, I wouldn't necessarily say it's frontier level. It's good enough for basic photo editing while preserving the original image. In the presentation, they said they want to stay true to photographers, basically guaranteeing that the original image itself isn't touched; everything outside of it is what these models generate.
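The "don't touch the original" rule amounts to a mask: pixels covered by the original photo are kept, and only the new area of a reframed canvas is generated. This sketch assumes the photo is placed unscaled at an offset on a larger canvas; that Apple's pipeline works exactly this way is an assumption, since the source only says the original is left untouched.

```python
def outpaint_mask(canvas_w: int, canvas_h: int,
                  img_w: int, img_h: int,
                  off_x: int, off_y: int) -> list[list[bool]]:
    """Per-pixel mask for a reframed canvas.

    True  = pixel lies outside the original photo and must be generated.
    False = pixel comes from the original photo and is kept as-is.
    """
    return [
        [not (off_x <= x < off_x + img_w and off_y <= y < off_y + img_h)
         for x in range(canvas_w)]
        for y in range(canvas_h)
    ]


# A 2x2 photo placed one pixel in from the left of a 4x3 canvas.
mask = outpaint_mask(4, 3, 2, 2, off_x=1, off_y=0)
for row in mask:
    print("".join("G" if generated else "." for generated in row))
```

Here "G" marks the 8 pixels a generative model would fill, and "." marks the 4 original pixels that pass through unchanged.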
They also go into detail here about the training data: "To train our foundation models, we use a mixture of data that includes publicly available information, data licensed or purchased from third parties, open source data obtained through dedicated studies, and synthetic data." Now, a few of these raise eyebrows, like data purchased from third parties. That doesn't necessarily mean the third party obtained it ethically. It would be nice to have a little more research into who these companies are and how they obtained that data. They also say they respect web publishers: basically, if your site has a robots.txt file that explicitly blocks bots and AI crawlers from crawling it, they will respect that. How many websites actually have this in their robots.txt, though, I don't know.
I won't go through it too much here, but it's interesting that they use a lot of human graders who compared the previous model with this new one. The AFM3 Core model improved upon its predecessor, earning a preference on 45.6% of prompts compared to 23.3% for the 2025 baseline. Now, these are preferences: people's opinions, not necessarily objective. However, there's no really good way to benchmark this, so it's interesting to see that they're using humans to do it.
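The two percentages don't add to 100, which is worth unpacking. This sketch assumes the remaining 31.1% of prompts were rated as ties or "no preference" (the source doesn't say) and computes the new model's win rate among prompts where graders picked a side.

```python
def preference_summary(new_pct: float, baseline_pct: float) -> dict[str, float]:
    """Summarize a side-by-side human preference result given in percentages."""
    decisive = new_pct + baseline_pct
    return {
        # Assumption: whatever isn't a win for either side is a tie / no preference.
        "tie_pct": round(100.0 - decisive, 1),
        "win_rate_excluding_ties": new_pct / decisive,
        "net_preference_pts": round(new_pct - baseline_pct, 1),
    }


summary = preference_summary(45.6, 23.3)
for key, value in summary.items():
    print(f"{key}: {value:.3f}")
```

Read this way, graders who expressed a preference chose the new model about two times in three, though without the number of prompts or graders there is no way to put a confidence interval on it.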
They also show a bunch of text-to-speech samples here: "...to be honest, I only have one more chapter to go. Melvin repli..." That's the current one. "One more chapter to go. Melvin replied, I'm in to me." "I see a row of shops to the left, including a shoe shop with..." "...left, including a shoe shop with bright..." I think there's more expression in the AFM3 Core Advanced version of the voice, which, by the way, is the only model that allows you to tweak the voice, whereas you're stuck with the generic voice in the current text-to-speech.
So yeah, it feels a bit more fluid and a bit more expressive, but how much that matters, I'm not too sure. It's a really interesting perspective that Apple are bringing to the whole AI thing. Obviously, we're so used to frontier models that are trained on stolen data and run on insane hardware. We have certain expectations at this point, and I wonder if Apple can convince enough people that, yes, maybe it's not as good, but your data is private. It also matters whether the AI is there when and where you need it, and whether it's relevant to the task at hand. They already messed up the first installment of Apple Intelligence. If this one falls short, I don't know how they're going to come back from it.
I've got macOS 27 on my Mac, and I'm waiting for Siri and Apple Intelligence to be enabled on it. We're already able to use some of this stuff in the beta releases, and it's working fairly well. My only lasting thought at this point, again, comes down to personal context and the limits of what we're trying to squeeze out of local hardware. 20 billion parameters is okay. I've done a lot of tests on larger models, and they're kind of satisfactory: 31- and 32-billion-parameter models are okay, but still not frontier performance. With 20 billion parameters, it'll be interesting to see how it performs. And with 12GB of RAM as the minimum headroom for these models, there isn't room for a tremendous amount of context.
I've already seen people trying to search file systems across macOS using Siri, and really it's just indexing an initial chunk of information from the files. It's not doing any sort of deep reasoning over your files. Long story short, there just isn't much context to play with for personal context to be anything truly meaningful without the latest hardware or handing it off to slower cloud models. To me, this is probably what they struggled with in the first installment: coming to the realization that context needs a lot of hardware. It needs a lot of RAM to build up as you use your phone and more and more things get pushed into that context. It remains to be seen how effective it's going to be. So, that'll be everything. Like and subscribe if you haven't already, and don't forget to check out my video on all the features they released yesterday. Thank you to my Patreon subscribers who support me directly. Until next time, keep on vibing.