About this video
Thanks Wavlink for sponsoring this video! https://amzn.to/4tLIam6

Most people are ruining their Mac performance by running local AI the wrong way. In this video, I show you how to offload the heavy lifting to a dedicated host machine using the Wavlink 12-in-1 Thunderbolt 5 Dock. This setup allows you to run massive 35B models on a lightweight MacBook Neo without any lag.

Key takeaways:
- Why RAM limitations make local AI frustrating on a single machine.
- Setting up OMLX as a lightweight server for MLX models.
- Using the Wavlink Thunderbolt 5 Dock to create a high-speed local network hub.
- How to access your home AI server from anywhere in the world using Tailscale.
- The benefits of a wired 2.5Gb Ethernet connection for consistent AI performance.
Your current Mac setup is probably holding back your AI potential.
Running large language models locally is a dream for privacy and speed, but the reality is often a sluggish machine and a spinning beachball of death. If you are trying to code whilst your RAM is being devoured by a 35 billion parameter model, you are doing it wrong.
The Bottleneck Problem
The more you push a Mac with limited RAM, the slower the entire experience becomes. Even an M5 MacBook Pro with 32GB of RAM starts to feel the pinch when local models are active. The solution isn't necessarily a more expensive laptop, but a more intelligent distribution of resources.
The Remote Local Solution
By using a host machine, such as an M1 Max MacBook Pro with 64GB of RAM, as a dedicated AI server, you can offload the heavy lifting. The Wavlink 12-in-1 Thunderbolt 5 Dock acts as the central nervous system for this setup, providing the 2.5Gb Ethernet and high-speed connectivity required to serve models across your local network.
Why OMLX Matters
Whilst LM Studio is popular, it can often be resource-heavy. OMLX offers a more lightweight approach to running MLX models on Apple Silicon. It allows you to serve models over a local IP address, which you can then hook into your IDE or interface of choice, such as Kilocode, on a completely different machine.
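If you want to sanity-check the served endpoint outside of an IDE, a minimal sketch like the one below works from any machine on the same network. It assumes OMLX exposes an OpenAI-compatible API on port 8000 (as shown later in the video); the IP address, API key, and model name are placeholders, so swap in your own values.

```python
# Minimal sketch: query a model served by OMLX from another machine on the LAN.
# The IP, port, API key, and model name below are placeholders for illustration.
from openai import OpenAI

client = OpenAI(
    base_url="http://192.168.0.50:8000/v1",  # host machine's local IP + serving port
    api_key="1234",                           # whatever key you configured on the server
)

response = client.chat.completions.create(
    model="qwen3-35b-8bit",                   # hypothetical model identifier
    messages=[{"role": "user", "content": "What does this codebase do?"}],
)
print(response.choices[0].message.content)
```

This is the same OpenAI-compatible endpoint that Kilocode points at later on; the sketch just removes the IDE from the picture so you can confirm the network path works first.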
Freedom Through Tailscale
The real magic happens when you introduce Tailscale. This creates a secure tunnel to your home network, allowing you to sit in a coffee shop with a lightweight MacBook Neo whilst utilising the full power of your desktop Mac back home. You get the mobility of a thin laptop with the raw power of a workstation, all without the heat or the battery drain.
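Nothing about the client side changes when you leave the house; only the address does. As a rough sketch, assuming the host's Tailscale address is the hypothetical 100.101.102.103, the earlier client simply points at that IP instead of the LAN one.

```python
# Same client as before; the only difference when working remotely is the base URL,
# which now points at the host's Tailscale IP (hypothetical address shown).
from openai import OpenAI

remote_client = OpenAI(
    base_url="http://100.101.102.103:8000/v1",  # Tailscale IP of the home machine
    api_key="1234",
)
# Use remote_client exactly as you would on the local network.
```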
It is time to stop treating your primary laptop like a pack mule. Offload the compute, embrace the dock, and start running AI the right way.
Transcript
Over the last few weeks, we've been exploring what the bare minimum hardware is that you need to run local AI on your Macs. And one thing that proved obvious from very early on is that the more you push a Mac with limited RAM, the slower that Mac feels. So, that got me thinking: what would be a really cheap and cool setup for running local AI without bogging down your resources? And that's where this little puppy came into play. This is the MacBook Neo, and I thought there must be a way we can get local AI running on this with some sort of remote setup. And I was able to achieve it with the Wavlink 12-in-1 Thunderbolt 5 Dock, who are sponsoring this video.

Now, what I have is an M1 Max MacBook Pro with 64 GB of RAM. Not the fastest, but certainly enough room for that context. This could just as well be a Mac Studio or a Mac mini. We've explored the M5 MacBook Pro, which is a great candidate for this: it's fairly fast and capable of running AI, but the limited RAM, capped at 32 GB, really starts to slow down your machine. That is connected with Thunderbolt 5 into the host port of the Wavlink 12-in-1 Thunderbolt Dock, which is capable of 140 W charging at 120 Gbps and can even power an 8K display at 144 Hz. Then I've got this front USB-C port, which is capable of 30 W charging, connected to my OpenClaw laptop going off there; that's capable of 80 Gbps. At the front here, you've got two USB-A ports capable of 10 Gbps, SD card and microSD card slots, and an audio in/out jack. Round the back here, we've got my camera, which is not only charging but also delivering 1080p, so that means it's a USB-C 3.2 capable port there. Here, I've just got an NVMe storage drive for all my backup files. I've got a USB-A here that's connected to my light up there, another USB-A connected to the microphone here, and then we've got the 2.5Gb Ethernet port, which I cannot recommend enough specifically for this setup, and luckily we have it. And obviously, I can connect my iPad if I want and charge that up.

So, all of this is ready to go. And all of this starts with an application called OMLX, which allows us to run MLX models on our MacBook Pro and serve them over a local server. The reason why we use OMLX over something like LM Studio is because, quite frankly, I've been reading a lot of reports that LM Studio is in fact slowing down some of these models. Plus, OMLX just feels a little bit more lightweight, reducing the resources needed whilst we run these local models. So, with that, you want to choose your model, and since I have enough RAM to support it, I'm going to choose the Qwen 3.6 35 billion mixture of experts model quantized to 8-bit. Now, all we do is copy that URL, and back in OMLX, we can go to models. You can see I've already downloaded the 4-bit version, but if we go to download it here, we can actually just paste in the URL for the Hugging Face repository, and that's going to begin downloading. If we refresh this, you'll start to see it appear, and the megabytes will slowly build up. With that downloaded, if we go to model settings, I'm going to go into the configuration here. It's a large model, but I've got 64 GB of RAM, so I reckon I can get away with a context length of 110,000, but obviously this is the balancing act. You can watch all my other videos that take this into account; you obviously can't get as much out of an M5 MacBook.
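For anyone trying to pick a context length on their own hardware, a rough back-of-envelope check helps. The sketch below is not tied to any specific model: the layer, head, and dimension numbers are made-up placeholders, so substitute the real figures for whatever model you download.

```python
# Back-of-envelope memory estimate for a quantised model plus its KV cache.
# All architecture numbers below are hypothetical placeholders, not real Qwen specs.

def estimate_memory_gb(params_b, bits_per_weight, n_layers, n_kv_heads,
                       head_dim, context_len, kv_bytes=2):
    """Rough estimate: weights + key/value cache, ignoring runtime overheads.
    kv_bytes=2 assumes a 16-bit KV cache."""
    weights_gb = params_b * 1e9 * (bits_per_weight / 8) / 1e9
    # KV cache: 2 tensors (K and V) per layer, per KV head, per token.
    kv_gb = 2 * n_layers * n_kv_heads * head_dim * context_len * kv_bytes / 1e9
    return weights_gb + kv_gb

# Example: a 35B model at 8-bit with a 110,000-token context window.
print(round(estimate_memory_gb(35, 8, 48, 8, 128, 110_000), 1), "GB")
```

The point is simply that the context window competes with the weights for the same unified memory, which is why the 64 GB host can afford a far larger context than a 32 GB machine.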
We save that, and we can load that model into memory. So, with that loaded, if we go through to the dashboard and scroll down here, we've got some API endpoints, giving us a variety of ways to access this model. If you're wanting to access it on the same machine, you'll obviously use localhost, but if we come here, the 192.168.0.50 (yours might be different) is the URL we can now use to access it on any machine within your local network. And this should match up: if we go to System Preferences and then Network, and, depending on how you're connected, click on Wi-Fi and go to Details, this should match the IP address shown here. Now, what I'd strongly suggest, if you have access to it, is Ethernet, and the Wavlink 12-in-1 Thunderbolt Dock supports Ethernet. So, if we plug this into the back here, with that connected, you can go in and similarly you'll get an IP address, which you can use inside Kilocode.

And so, on the MacBook Neo, as we've done many, many times before using Kilocode, if we go to settings and then providers, we come down to custom provider. We can put the provider ID as anything we really want; we're going to put MacBook 16, and I'm just going to put MacBook 16 as the human-readable name too. Now, we're going to put in that URL that we took, 192.168.0.50:8000/v1, and our API key is something that you set; I've just set mine as 1234 for the time being. We should then get a reading of those local models across our local network. We can add those two models there, scroll down, and submit. Now, when we go down to the model selector here and scroll right down, we've got access to those remote models over the network.

And so, with that, I can say, 'What does this codebase do?', hit enter, and then over on this machine, if we click through to the logs, we should start to see some action going on here. And I can already start to hear the fans kicking off on that machine, running completely locally whilst I work on the MacBook Neo. The other great thing about this whole setup is that I have Tailscale set up, and this makes it super easy to connect your main machine to the outside world with a secure connection. So, if I copy my Tailscale IP address, I could be sat in a coffee shop somewhere running local AI on the MacBook Neo.

Now, obviously this setup isn't going to be ideal for every single person. However, this is kind of ideal for me in my situation. I have a light MacBook running OpenClaw that I do like to just sit in bed with; there's something about sitting down and coding on a laptop that I really like. Now, if there's one thing that I would really like on the Wavlink that I think would help this massively, it's 10 gig Ethernet, or at least 5 gig. We're limited to 2.5 gig, and there is a restriction on Thunderbolt that doesn't quite give us that full 2.5 anyway. However, we do gain the consistency of a wired connection. What I'd personally also like to see is the host USB-C moved to the back rather than the front, because I like my more permanent connections to just sit and do nothing on the back, whereas the more regular in-and-out ports, things like USB sticks and SD cards, make sense to have on the front. So, the host port: move it to the back. I think it looks really sick in this setup, and the design of it is really cool. And speaking of cool, it actually has a built-in fan, so it's got active cooling and does remain quite cool.
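If Kilocode doesn't pick up the models when you submit the provider, it can help to check the endpoint directly from the MacBook Neo first. A quick sketch, again assuming an OpenAI-compatible /v1/models route and the placeholder address and key from earlier:

```python
# Quick connectivity check from the client machine: list whatever models the
# host is currently serving. Address and API key are the placeholders from earlier.
import requests

resp = requests.get(
    "http://192.168.0.50:8000/v1/models",
    headers={"Authorization": "Bearer 1234"},
    timeout=5,
)
resp.raise_for_status()
for model in resp.json().get("data", []):
    print(model.get("id"))
```

If this prints the model IDs, the server and network path are fine and any remaining issue sits in the IDE configuration; the same check works over Tailscale by swapping in the Tailscale IP.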
But given I create content, with my cameras, my lights, and my microphone, having all of this set up through a 12-in-1 system is really, really handy. So, that'll do it for this one. If there's something I've missed, let me know down in the comments. I'm genuinely curious as well how you would use the Wavlink, because everyone's system is so different. Like and subscribe if you haven't already. Thank you to Wavlink for sponsoring this video. Till next time, keep on vibing.