Xiaomi MiMo-V2.5-Pro: 1M context, low API prices, and why it is not a normal Mac model
API pricing, Token Plan and setup for Xiaomi MiMo-V2.5-Pro, plus the key answer for Mac users: it is a cloud model, not a normal local Apple Silicon download.
Xiaomi MiMo-V2.5-Pro is not the kind of model you install in Ollama on a MacBook and casually use offline. It is a sparse Mixture-of-Experts model with 1.02 trillion total parameters, 42 billion active parameters and a context window of up to 1 million tokens. For Mac users, Xiaomi’s API or a Token Plan for coding tools is the realistic entry point.
The other headline is cost. Xiaomi currently lists $0.435 per million cache-miss input tokens and $0.87 per million output tokens for MiMo-V2.5-Pro in overseas pay-as-you-go pricing. The frequently repeated “up to 99% lower” claim comes from Xiaomi’s comparison with its own earlier pricing. It is not an independent price-performance ranking.
Figure: Published model sizes and capabilities. Sources: MiMo-V2.5-Pro model card and MiMo-V2.5 model card.
The short answer
MiMo-V2.5-Pro is worth testing for low-cost cloud coding agents, large repositories and long contexts. It is not a realistic local model for ordinary Apple Silicon Macs.
The MIT license is useful for research, server deployments and the wider ecosystem. It does not change the hardware math. At 1.02T parameters and one byte per parameter, FP8 weights alone come to about 1 TB before runtime overhead, KV cache and parallelism.
MiMo-V2.5 vs. MiMo-V2.5-Pro
| Property | MiMo-V2.5 | MiMo-V2.5-Pro |
|---|---|---|
| Primary use | Multimodal tasks | Text, coding and agents |
| Inputs | Text, image, video and audio | Text |
| Architecture | Sparse MoE | Sparse MoE |
| Total parameters | 310B | 1.02T |
| Active parameters per token | 15B | 42B |
| Context window | up to 1M tokens | up to 1M tokens |
| Model ID | mimo-v2.5 | mimo-v2.5-pro |
| Weights | MIT license, according to Xiaomi | MIT license, according to Xiaomi |
| Realistic Mac route | API | API |
MiMo-V2.5 is the multimodal member of the family, built for text, images, video and audio. MiMo-V2.5-Pro is the text-focused model Xiaomi positions for agents, complex software tasks and long tool-use workflows.
Active parameter count helps estimate compute per generated token. It is not the same as the RAM or storage requirement of a local deployment. The full set of weights must still be available, which is why 15B active parameters for MiMo-V2.5 and 42B active parameters for Pro do not make either model a practical Mac mini or MacBook deployment.
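The gap between active and total parameters is easier to see with rough arithmetic. The sketch below multiplies total parameters by bytes per parameter to get a weights-only footprint. The function name `weights_gb` is hypothetical, and the BF16 and 4-bit rows are illustrative assumptions rather than formats Xiaomi is known to ship. The result ignores KV cache, activations and runtime overhead, so real requirements are higher.

```shell
# Rough weights-only footprint in decimal GB.
# Billions of parameters x bytes per parameter = gigabytes.
weights_gb() {
  # $1 = total parameters in billions, $2 = bytes per parameter
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.0f\n", p * b }'
}

echo "MiMo-V2.5-Pro  FP8   (1 byte):    $(weights_gb 1020 1) GB"
echo "MiMo-V2.5-Pro  BF16  (2 bytes):   $(weights_gb 1020 2) GB"   # assumed format
echo "MiMo-V2.5-Pro  4-bit (0.5 byte):  $(weights_gb 1020 0.5) GB" # assumed format
echo "MiMo-V2.5      FP8   (1 byte):    $(weights_gb 310 1) GB"
echo "MiMo-V2.5      4-bit (0.5 byte):  $(weights_gb 310 0.5) GB"  # assumed format
```

Even under the most aggressive assumption here, the footprint is driven by total parameters, not by the 15B or 42B that are active per token.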
What do the official benchmarks say?
Xiaomi publishes benchmark results in the MiMo-V2.5-Pro model card. The following values are from its base-model evaluation. They are not independent API tests, nor are they a guarantee for a particular coding-agent harness.
| Benchmark | Setting | MiMo-V2.5-Pro Base | MiMo-V2.5 Base |
|---|---|---|---|
| MMLU | 5-shot | 89.4 | 86.3 |
| MMLU-Pro | 5-shot | 68.5 | 65.8 |
| HumanEval+ | 1-shot | 75.6 | 71.3 |
| MBPP+ | 3-shot | 74.1 | 70.9 |
| LiveCodeBench v6 | 1-shot | 39.6 | 35.5 |
| SWE-Bench, Agentless | 3-shot | 35.7 | 30.8 |
Source: Xiaomi’s MiMo-V2.5-Pro model card. Results are vendor-published base-model figures. Benchmark settings, prompts, agent harnesses and tools vary.
These results are useful signals, but they do not answer the practical question: will the model work well with your repository, tools, security requirements and budget?
Before making MiMo part of a real workflow, test at least:
- a small bug fix with a passing test,
- a multi-file refactor,
- an ambiguous task with incomplete requirements,
- repeated runs for cost, latency and failure rate.
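For the repeated-runs item, a small summary script is often enough. The sketch below assumes you log one line per run in a made-up format, `<ok|fail> <latency_seconds> <cost_usd>`. Both the log format and the function name `summarize_runs` are hypothetical, not something Xiaomi or any agent tool provides.

```shell
# summarize_runs [LOGFILE]
# Reads one run per line, "<ok|fail> <latency_seconds> <cost_usd>",
# from LOGFILE or standard input, and prints aggregate figures.
summarize_runs() {
  awk '
    NF < 3 { next }                      # ignore blank or incomplete lines
    { runs++; latency += $2; cost += $3 }
    $1 != "ok" { failures++ }
    END {
      if (runs == 0) { print "no runs recorded"; exit }
      printf "runs=%d failure_rate=%.1f%% mean_latency=%.2fs total_cost_usd=%.4f\n",
             runs, 100 * failures / runs, latency / runs, cost
    }
  ' "$@"
}

# Example with four recorded runs piped in on standard input.
printf 'ok 12.5 0.0040\nok 10.5 0.0030\nfail 30.0 0.0100\nok 11.0 0.0030\n' | summarize_runs
```

Comparing these figures across the same task set is more informative than a single successful run.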
A 1M-token context is useful, not magical
A million-token context window can help with large codebases, logs and document collections. It does not mean that sending everything is the best workflow.
Large prompts have real costs:
- A one-off large input will normally be billed at the cache-miss price.
- Important evidence can become harder for a model to retrieve in an enormous prompt.
- More input increases the chance of exposing sensitive files to a cloud provider.
A better pattern is to search locally, select the relevant files, state a precise task and send only the context the model needs.
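One way to sketch that pattern with standard tools: search locally for a term, then assemble only the matching files into a context bundle under a byte budget. The function name `select_context` and the budget are hypothetical. The budget counts file bytes only, not tokens or the header lines, and file names containing newlines are not handled.

```shell
# select_context PATTERN DIR MAX_BYTES
# Prints every file under DIR that contains the fixed string PATTERN,
# each preceded by a header line. A file that would push the running
# total over MAX_BYTES is skipped and reported on standard error.
select_context() {
  sc_pattern=$1
  sc_dir=$2
  sc_max=$3
  sc_total=0
  grep -rlF -- "$sc_pattern" "$sc_dir" | sort | while IFS= read -r sc_file; do
    sc_size=$(wc -c < "$sc_file" | tr -d ' ')
    if [ $((sc_total + sc_size)) -gt "$sc_max" ]; then
      echo "skipped (over budget): $sc_file" >&2
      continue
    fi
    sc_total=$((sc_total + sc_size))
    printf '=== %s ===\n' "$sc_file"
    cat "$sc_file"
  done
}

# Usage (paths are placeholders):
#   select_context "parseInvoice" ./src 200000 > context.txt
```

Review the resulting file before sending it anywhere, since a text match says nothing about whether a file is safe to share.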
Current API pricing
The table below uses Xiaomi’s overseas pay-as-you-go prices per million tokens. Xiaomi’s pricing page was updated on June 17, 2026.
| Model | Input: cache hit | Input: cache miss | Output |
|---|---|---|---|
| mimo-v2.5-pro | $0.0036 | $0.435 | $0.87 |
| mimo-v2.5 | $0.0028 | $0.14 | $0.28 |
Source: Xiaomi MiMo API Pricing, overseas pricing. Prices can change.
Cache hits are especially cheap because a reused prompt prefix can be served from Xiaomi’s prompt cache. For a one-off analysis of a large repository, budget using the cache-miss price.
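To budget a job, the listed prices can be plugged into a small estimator. This is a sketch using the MiMo-V2.5-Pro figures from the pricing table in this article. The function name `estimate_cost` and the token counts in the examples are made up, and how much of a prompt actually counts as a cache hit depends on Xiaomi's caching behavior, which is assumed here rather than verified.

```shell
# MiMo-V2.5-Pro prices in USD per million tokens, as listed in this article.
# Prices can change, so check the pricing page before relying on them.
PRO_CACHE_HIT=0.0036
PRO_CACHE_MISS=0.435
PRO_OUTPUT=0.87

# estimate_cost CACHED_INPUT_TOKENS UNCACHED_INPUT_TOKENS OUTPUT_TOKENS
# Prints the estimated cost of one request in USD.
estimate_cost() {
  awk -v hit="$1" -v miss="$2" -v out="$3" \
      -v ph="$PRO_CACHE_HIT" -v pm="$PRO_CACHE_MISS" -v po="$PRO_OUTPUT" \
      'BEGIN { printf "%.4f\n", (hit * ph + miss * pm + out * po) / 1000000 }'
}

# One-off analysis: 800k uncached input tokens, 4k output tokens.
echo "one-off:   \$$(estimate_cost 0 800000 4000)"
# Follow-up that reuses the same 800k-token prefix, assumed to be served from cache.
echo "follow-up: \$$(estimate_cost 800000 2000 4000)"
```

The one-off request is dominated by the cache-miss input price, which is why that is the number to budget with for a first pass over a large repository.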
Web search is billed separately. Xiaomi lists its Overseas Internet Connectivity Service at $5 per 1,000 calls. Any web-enabled agent needs an explicit search-call cap and cost logging.
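A cap and a cost log can be very small. The sketch below assumes the listed $5 per 1,000 calls. The cap of 200 calls and both function names are hypothetical, and the loop stands in for wherever your agent would actually issue a search.

```shell
SEARCH_PRICE_PER_1000=5   # USD per 1,000 calls, as listed in this article
SEARCH_CALL_CAP=200       # hypothetical per-run cap; choose your own

# search_cost CALLS -> USD spent on web search for that many calls
search_cost() {
  awk -v n="$1" -v p="$SEARCH_PRICE_PER_1000" 'BEGIN { printf "%.3f\n", n * p / 1000 }'
}

# search_allowed CALLS_SO_FAR -> succeeds while one more call fits under the cap
search_allowed() {
  [ "$1" -lt "$SEARCH_CALL_CAP" ]
}

calls=0
# The "-lt 3" bound only keeps this demonstration short.
while search_allowed "$calls" && [ "$calls" -lt 3 ]; do
  calls=$((calls + 1))
  echo "search call $calls, cumulative search cost: \$$(search_cost "$calls")"
done
```

At this price the cap of 200 calls bounds search spend at $1 per run, which is easy to reason about when reviewing logs.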
Migration note for legacy MiMo model names
Xiaomi says legacy V2 models are being routed to V2.5 and will be deprecated on June 30, 2026. New integrations should use mimo-v2.5 or mimo-v2.5-pro, not build around the old mimo-v2-* names.
Pay-as-you-go vs. Token Plan
| Topic | Pay-as-you-go | Token Plan |
|---|---|---|
| Key format | sk-xxxxx | tp-xxxxx |
| Billing | Actual token usage | Package with a shared credit quota |
| Intended use | API integrations and custom applications | Supported coding and agent tools |
| Keys interchangeable? | No | No |
| Base URL | Xiaomi API endpoint | The endpoint shown in the Token Plan account |
Token Plan makes sense when you want to use MiMo inside supported developer tools. Xiaomi lists Claude Code, Cline, OpenCode and other coding tools. Usage is shared across the tools connected to the same plan.
There is an important limitation: Xiaomi’s documentation restricts Token Plan usage to programming tools. Automated scripts, custom application backends and other clearly non-coding API usage do not belong on this plan. For a custom application, pay-as-you-go is the cleaner option.
Always use the Base URL displayed in your Token Plan account. Xiaomi provides different OpenAI-compatible and Anthropic-compatible endpoints depending on the cluster.
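Because the two key types are not interchangeable, a wrapper script can check the prefix before sending anything. This is a minimal sketch: the function names and the `MIMO_TOKEN_PLAN_BASE_URL` variable are hypothetical, and the pay-as-you-go URL is simply the host and `/v1` path used in this article's curl example. The Token Plan URL is deliberately not hard-coded, since it must come from your account.

```shell
# mimo_key_kind KEY -> prints "pay-as-you-go", "token-plan" or "unknown"
mimo_key_kind() {
  case "$1" in
    sk-*) echo "pay-as-you-go" ;;
    tp-*) echo "token-plan" ;;
    *)    echo "unknown" ;;
  esac
}

# mimo_base_url KEY -> prints the base URL to use, or fails with a message
mimo_base_url() {
  case "$(mimo_key_kind "$1")" in
    pay-as-you-go)
      echo "https://api.xiaomimimo.com/v1" ;;
    token-plan)
      if [ -z "${MIMO_TOKEN_PLAN_BASE_URL:-}" ]; then
        echo "set MIMO_TOKEN_PLAN_BASE_URL to the URL shown in your Token Plan account" >&2
        return 1
      fi
      echo "$MIMO_TOKEN_PLAN_BASE_URL" ;;
    *)
      echo "unrecognized key prefix" >&2
      return 1 ;;
  esac
}
```

Failing early on an unknown prefix or a missing Token Plan URL avoids sending a key to an endpoint that will reject it.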
Test the MiMo API from a Mac
For a first pay-as-you-go test, curl is enough. Store the key in an environment variable first:
```shell
export MIMO_API_KEY="sk-your-key"
```
Then make a small, inspectable request:
```shell
curl --request POST "https://api.xiaomimimo.com/v1/chat/completions" \
  --header "api-key: $MIMO_API_KEY" \
  --header "Content-Type: application/json" \
  --data '{
    "model": "mimo-v2.5-pro",
    "messages": [
      {
        "role": "user",
        "content": "Review this refactoring plan. Identify missing tests, risky assumptions and rollback steps."
      }
    ],
    "max_completion_tokens": 800,
    "temperature": 1.0,
    "top_p": 0.95,
    "stream": false
  }'
```
Xiaomi documents both the api-key header and Authorization: Bearer .... Never place an API key in Astro frontend code, public repositories or browser-side JavaScript. A website integration needs a server-side endpoint or serverless function.
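For scripts, the Bearer form can be wrapped so that a missing key fails before any request is sent. This is a sketch, not Xiaomi's reference client: `mimo_auth_header`, `mimo_chat` and `request.json` are hypothetical names, and the endpoint is the one used in this article's curl example. The request body is read from a file so it can be inspected and versioned separately.

```shell
# mimo_auth_header -> prints the Authorization header, or fails if the key is missing
mimo_auth_header() {
  if [ -z "${MIMO_API_KEY:-}" ]; then
    echo "MIMO_API_KEY is not set" >&2
    return 1
  fi
  printf 'Authorization: Bearer %s' "$MIMO_API_KEY"
}

# mimo_chat BODY_FILE -> sends the JSON body in BODY_FILE and prints the raw response
mimo_chat() {
  mc_header=$(mimo_auth_header) || return 1
  curl --silent --show-error --fail \
    --request POST "https://api.xiaomimimo.com/v1/chat/completions" \
    --header "$mc_header" \
    --header "Content-Type: application/json" \
    --data @"$1"
}

# Usage (request.json is a placeholder holding the same JSON body as above):
#   mimo_chat request.json
```

With `--fail`, curl returns a non-zero exit status on HTTP errors, which makes the function easier to use in scripts that need to stop on a rejected request.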
What should stay local, and what can go to MiMo?
Keep it local
- private notes, research data and client documents
- personal data
- offline workflows
- short repeatable tasks
- tasks that fit a local 7B, 14B or 32B model
MiMo API can fit well
- long, non-confidential code and document context
- tool-calling agent experiments
- large refactoring plans
- multimodal cloud work with MiMo-V2.5
- a second opinion alongside a local model
Before production use, check Xiaomi’s current privacy documentation, contractual terms, data region and retention practice for your exact situation. Cloud processing is not a neutral storage location.
Recommendation
MiMo-V2.5-Pro is an interesting cloud option for long-context and coding-agent work at clearly stated, currently low API prices. Open weights are a meaningful positive, but they do not justify promising local Mac use where it is not practical.
The sensible AI-on-Mac split is simple:
Use local models for private and everyday work. Use MiMo selectively for non-confidential long contexts, coding agents and API experiments.
Sources and status
Status: June 21, 2026.
Frequently Asked Questions
Can Xiaomi MiMo-V2.5-Pro run locally on a normal Mac?
No, not as a practical everyday setup. MiMo-V2.5-Pro has 1.02T total parameters and Xiaomi's official deployment examples target distributed server infrastructure with SGLang or vLLM.
How much does MiMo-V2.5-Pro cost through the API?
For overseas pay-as-you-go, Xiaomi currently lists $0.0036 per million input tokens on a cache hit, $0.435 on a cache miss and $0.87 per million output tokens.
What is the difference between pay-as-you-go and Token Plan?
Pay-as-you-go bills API usage by token through sk- keys. Token Plan uses separate tp- keys and Xiaomi positions it for supported coding and agent tools, rather than arbitrary scripts or custom product backends.
Are MiMo-V2.5 and MiMo-V2.5-Pro open source?
Xiaomi publishes the weights under the MIT license on Hugging Face. Available weights do not make either model practical to run locally on typical consumer Mac hardware.