About this video
You are throwing your money away on frontier models that are barely faster than a snail compared to the new Minimax M3. This model is currently smashing benchmarks and rivaling Opus 4.7 while remaining a fraction of the cost. In this video, we explore how to initialise the M3 model in your local workflows and take a look at the incredible speed increases provided by its sparse attention architecture. Key takeaways: - Sparse attention leads to 4x faster performance than standard flash attention. - Native multimodal capabilities allow for direct image inference without external models. - A massive 1 million token context window for large-scale data processing. - Incredible cost-to-performance ratio with multi-agent support. - Easy integration with OpenClaw, Hermes, and Pi.
MiniMax provider config
"minimax": {
"baseUrl": "https://api.minimax.io/anthropic",
"apiKey": "${MINIMAX_API_KEY}",
"api": "anthropic-messages",
"models": [
{
"id": "MiniMax-M3",
"name": "MiniMax M3",
"reasoning": true,
"input": ["text", "image"],
"cost": { "input": 0.6, "output": 2.4, "cacheRead": 0.12, "cacheWrite": 0 },
"contextWindow": 1000000,
"maxTokens": 131072
},
{
"id": "MiniMax-M2.7",
"name": "MiniMax M2.7",
"reasoning": true,
"input": ["text"],
"cost": { "input": 0.3, "output": 1.2, "cacheRead": 0.06, "cacheWrite": 0.375 },
"contextWindow": 204800,
"maxTokens": 131072
},
{
"id": "MiniMax-M2.7-highspeed",
"name": "MiniMax M2.7 Highspeed",
"reasoning": true,
"input": ["text"],
"cost": { "input": 0.6, "output": 2.4, "cacheRead": 0.06, "cacheWrite": 0.375 },
"contextWindow": 204800,
"maxTokens": 131072
}
]
}The Minimax M3 Revolution: Frontier Power at a Fraction of the Cost
You are throwing your money away on frontier models that are barely faster than a snail compared to the new Minimax M3.
The AI landscape is shifting rapidly, and Minimax has just thrown a massive spanner in the works with their latest release: M3. This model is not just a minor upgrade; it is a significant leap forward in speed, context, and multimodal capability.
The Speed Demon: Sparse Attention
What makes M3 so fast? The answer lies in its sparse attention architecture. By partitioning the KV into blocks more precisely, M3 achieves high effective context coverage while being four times faster than standard open-source flash attention. This leads to a model that can run for 12 hours continuously without significant degradation.
Native Multimodality
Gone are the days of needing a separate model like Quen 3.5 for image inference. M3 is now fully multimodal. Whether you are uploading a selfie or a complex technical diagram, the model can infer the context and integrate it into your workflow seamlessly.
Benchmarks and Performance
In the SweetBench Pro Terminal Bench, M3 is coming incredibly close to models like Opus 4.7. It is competing with the absolute best in the industry while maintaining a pricing structure that allows for massive concurrent agent usage.
How to Get Started
Setting up M3 is straightforward if you are already using tools like OpenClaw or Hermes.
- Update your models.json or OpenClaw.json files.
- Set your API keys with the new SKCP prefix.
- Switch your primary model to minimax/minimax-m3.
Whether you are building complex coding agents or just need a faster, more affordable daily driver, Minimax M3 is currently the model to beat.
Transcript▾
Flashbang warning: I know what is with it and the English guys and the bald heads. Anyway, Miniax have done it again with their latest release M3, which is why once again I'm very happy to announce that Miniax are supporting this video and they've solved one of the most annoying little problems I had with the previous models in this new model which we'll get into.
You've probably seen already this is absolutely smashing the benchmarks coming very very close to Opus 4.7 in SweetBench Pro Terminal Bench very close to all of the Frontier models and all of this for a fraction of the cost of these Frontier models. I'm particularly looking forward to the result on the deep suite benchmark. They also have this new sparse attention which leads to context scaling basically adding a prefiltering stage compared with approaches like dynamic sparse attention and mixture of block attention. Mixture of sparse attention can partition the KV into blocks more precisely achieving high effective context coverage. Basically for you and me four times faster than open source flash attention and flash mixture of block attention.
So, all that to say, this is a significant improvement in speed over previous models. They've also bumped up the context to 1 million, which is great. And they've even claimed they've ran this model for up to 12 hours continuously. And I said they did all this at a fraction of the cost. So, looking at the pricing model here, we start at £20 per month. But if you need loads of concurrent agents, the max plan allows four to five concurrent agents or indeed the ultra plan gives us six seven concurrent agents because we all know these multi-agent workflows are getting more and more crucial.
There's a nice little breakdown of everything here. Plus gets you about 34,000 calls. Max gets you about 102,000 calls and the ultra 250,000 calls. But with the link below in my description, you get 12% off. I've been using M2.7 in my open claw instance for the last few months now. Let's get it updated to M3. Now, honestly, if you've already got OpenClaw installed, which I assume most of you have, I think the safest way is to go to your settings here, go to advanced, and then open up your OpenClaw.json file.
The first thing you want to do is set your ENV Miniax API key, and then you'll set that up here. You'll get that from inside of your portal and your subscription key, which should start with the SKCP prefix. And then under models providers add in a new miniax which has changed from miniax portal in previous installations you'll set the base URL at api.mminiax.io/anthropic the API key as miniax API key which will take it from up here API is anthropic messages and then you can set your models to these below. I'll leave a link to this down below so you can just copy and paste it.
In our last installation we used Quen 3.5 as our image model. You can now finally remove that as M3 has gone multimodal which is great. We can upload images and it'll infer those images and use them in our context for whatever it is that we're doing. You'll set your primary as miniax/ miniax m3. It's not minimax portal anymore. And then I've set my fallback as any miniax m2.7 highspeed there. Just to be sure, just run open claw gateway restart. And in your chat here should be able to select Miniax M3.
What I really like is again we can upload a picture and say what is this? Because we have multimodal. Now that's a selfie of you Sam taken with what looks like a Sony camera. All of the important stuff that you need to know when it comes to an image. Now, I really like Miniaax on OpenClaw just because there there's a lot of parallels between Opus and Anthropic models, but I also run Hermes. And the process is just as simple. All we need to do is type Hermes model. Scroll down here to Miniax. We're in the UK, so we'll go to Oorth, but obviously if you're in China, you'll go to China. It should open up this for me to log in. Now I can select M3. And now we're using Miniax M3.
If we exit here, run Hermes, you can see it's running Miniax M3 as my default model. I also have videos setting this up on claw code as well, which I'll link to above. And just for a booccy bonus because I want to get into more Pi content. You might be tempted to go to Pi here and type login where you'll see go to API. You'll see Miniaax there. Unfortunately thing is at the moment this is not picking up M3. So what I suggest you do go to your root folder pi Asian models.json. Again I'll leave links to this down below. It's the same as openclaw but you're going to paste this in.
Now I've taken the API key, I've exported it in my Zush RC file there as I think it can get a bit fragmented here. So then with that updated, if we reload our Zush file, load up PI and go models, then you'll see we've now got M3. Hello. And what I've noticed is that M3 is a significant improvement over speed than M2.7. The high-speed version is still a lot faster. However, M3 is significantly faster than M2.7. So, that sparse attention architecture is coming in clutch.
So, let's try something here. I'm planning a trip to Madeira in Portugal. Give me a list of top 10 hikes including GPX format files so that I can review and look at the route. Whilst it's doing that, let's go into here and create a new session: create a single page.html file running a classic snake game because of course we have multiple concurrent agents right now so we might as well use them. Cool. So it's apparently this is my claud workspace. Okay, here we go. Now, I did this with local models before and I just couldn't get the food to work. That was a one shot. And you can see that I'm growing with every sweetie or whatever it is I'm eating. I am back in the 90s baby when I had hair.
And here we go. This is a telegram on the old mackin. I can attest they all check out from my this is an old trip so I've actually done this trip already so all GPX are ready let me send them over to you now I think I just saw them here so if we can download that already looking spot on there we zoom in maybe we'll even see the starting point. It looks pretty much what we did. Dragging it into a different one. Looks like it's a little bit off there. So, I'm not sure where it's getting the GPX data from. However, the information in the text that to me looks pretty accurate.
Honestly, I think Miniax is the perfect model to run on Open Claw. I'm a massive fan of his parallels between Anthropic. It scores really well against even Opus 4.7 at this point, even though 4.8 has been released. All that with a for a fraction of the cost is actually incredible. So, once again, I have a link down in my description which gives you 12% off of that coding plan. I'll also leave links to my website where I have additional information about the M3 model over on there. Like, subscribe if you haven't already.