Can Xiaomi MiMo-V2.5-Pro run locally on a normal Mac?

No, not as a practical everyday setup. MiMo-V2.5-Pro has 1.02T total parameters and Xiaomi's official deployment examples target distributed server infrastructure with SGLang or vLLM.

How much does MiMo-V2.5-Pro cost through the API?

For overseas pay-as-you-go, Xiaomi currently lists $0.0036 per million input tokens on a cache hit, $0.435 on a cache miss and $0.87 per million output tokens.

What is the difference between pay-as-you-go and Token Plan?

Pay-as-you-go bills API usage by token through `sk-` keys. Token Plan uses separate `tp-` keys, and Xiaomi positions it for supported coding and agent tools rather than arbitrary scripts or custom product backends.

Are MiMo-V2.5 and MiMo-V2.5-Pro open source?

Xiaomi publishes the weights under the MIT license on Hugging Face. Available weights do not make either model practical to run locally on typical consumer Mac hardware.

Xiaomi MiMo-V2.5-Pro API: Pricing, API Key & Mac Reality

Xiaomi MiMo-V2.5-Pro is not the kind of model you install in Ollama on a MacBook and casually use offline. It is a sparse Mixture-of-Experts model with 1.02 trillion total parameters, 42 billion active parameters and a context window of up to 1 million tokens. For Mac users, Xiaomi’s API or a Token Plan for coding tools is the realistic entry point.

The other headline is cost. Xiaomi currently lists $0.435 per million cache-miss input tokens and $0.87 per million output tokens for MiMo-V2.5-Pro in overseas pay-as-you-go pricing. The frequently repeated “up to 99% lower” claim comes from Xiaomi’s comparison with its own earlier pricing. It is not an independent price-performance ranking.

Xiaomi MiMo-V2.5 and MiMo-V2.5-Pro compared

Published model sizes and capabilities. Sources: MiMo-V2.5-Pro model card and MiMo-V2.5 model card.

The short answer

MiMo-V2.5-Pro is worth testing for low-cost cloud coding agents, large repositories and long contexts. It is not a realistic local model for ordinary Apple Silicon Macs.

The MIT license is useful for research, server deployments and the wider ecosystem. It does not change the hardware math. At 1.02T parameters and one byte per parameter, FP8 weights alone come to roughly 1 TB, before runtime overhead, KV cache and parallelism are counted.

MiMo-V2.5 vs. MiMo-V2.5-Pro

Property	MiMo-V2.5	MiMo-V2.5-Pro
Primary use	Multimodal tasks	Text, coding and agents
Inputs	Text, image, video and audio	Text
Architecture	Sparse MoE	Sparse MoE
Total parameters	310B	1.02T
Active parameters per token	15B	42B
Context window	up to 1M tokens	up to 1M tokens
Model ID	`mimo-v2.5`	`mimo-v2.5-pro`
Weights	MIT license, according to Xiaomi	MIT license, according to Xiaomi
Realistic Mac route	API	API

MiMo-V2.5 is the multimodal member of the family, built for text, images, video and audio. MiMo-V2.5-Pro is the text-focused model Xiaomi positions for agents, complex software tasks and long tool-use workflows.

Active parameter count helps estimate compute per generated token. It is not the same as the RAM or storage requirement of a local deployment. The full set of weights must still be available, which is why 15B active parameters for MiMo-V2.5 and 42B active parameters for Pro do not make either model a practical Mac mini or MacBook deployment.
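The gap between active and total parameters is easy to check with a little arithmetic. The sketch below uses the parameter counts from the comparison table and assumes one byte per parameter for FP8 and two for BF16. It counts weights only and ignores KV cache and runtime overhead; the function and dictionary names are illustrative, not part of any Xiaomi tooling.

```python
# Parameter counts taken from the comparison table in this article.
MODELS = {
    "mimo-v2.5": {"total": 310_000_000_000, "active": 15_000_000_000},
    "mimo-v2.5-pro": {"total": 1_020_000_000_000, "active": 42_000_000_000},
}

# Assumed storage cost per parameter, weights only.
BYTES_PER_PARAM = {"fp8": 1, "bf16": 2}


def weights_gb(params: int, precision: str = "fp8") -> float:
    """Return the size of `params` weights in decimal gigabytes."""
    return params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    for name, counts in MODELS.items():
        stored = weights_gb(counts["total"])
        active = weights_gb(counts["active"])
        print(f"{name}: {stored:.0f} GB of FP8 weights to hold, "
              f"{active:.0f} GB active per token")
```

At FP8 this gives about 1,020 GB for Pro and 310 GB for MiMo-V2.5, against 42 GB and 15 GB of active weights. The smaller figures describe compute per token, not what has to be held in memory.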

What do the official benchmarks say?

Xiaomi publishes benchmark results in the MiMo-V2.5-Pro model card. The following values are from its base-model evaluation. They are not independent API tests, nor are they a guarantee for a particular coding-agent harness.

Benchmark	Setting	MiMo-V2.5-Pro Base	MiMo-V2.5 Base
MMLU	5-shot	89.4	86.3
MMLU-Pro	5-shot	68.5	65.8
HumanEval+	1-shot	75.6	71.3
MBPP+	3-shot	74.1	70.9
LiveCodeBench v6	1-shot	39.6	35.5
SWE-Bench, Agentless	3-shot	35.7	30.8

Selected official MiMo-V2.5-Pro base-model benchmark results

Source: Xiaomi’s MiMo-V2.5-Pro model card. Results are vendor-published base-model figures. Benchmark settings, prompts, agent harnesses and tools vary.

These results are useful signals, but they do not answer the practical question: will the model work well with your repository, tools, security requirements and budget?

Before making MiMo part of a real workflow, test at least:

a small bug fix with a passing test,
a multi-file refactor,
an ambiguous task with incomplete requirements,
repeated runs for cost, latency and failure rate.
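For the repeated runs in the checklist above, it helps to record each attempt in the same shape and summarize afterwards. This is a minimal sketch: the `RunResult` fields and the `summarize` helper are hypothetical names, and it assumes you measure latency and compute cost yourself for every run.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class RunResult:
    passed: bool      # did the task's tests pass after the model's change?
    latency_s: float  # wall-clock seconds for the whole run
    cost_usd: float   # cost you computed from the reported token usage


def summarize(runs: list[RunResult]) -> dict[str, float]:
    """Aggregate failure rate, median latency and total cost over all runs."""
    if not runs:
        raise ValueError("need at least one run")
    failures = sum(1 for run in runs if not run.passed)
    return {
        "runs": len(runs),
        "failure_rate": failures / len(runs),
        "median_latency_s": median(run.latency_s for run in runs),
        "total_cost_usd": sum(run.cost_usd for run in runs),
    }
```

Median latency is used here instead of the mean so that a single stalled run does not dominate the summary.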

A 1M-token context is useful, not magical

A million-token context window can help with large codebases, logs and document collections. It does not mean that sending everything is the best workflow.

Large prompts have real costs:

A one-off large input will normally be billed at the cache-miss price.
Important evidence can become harder for a model to retrieve in an enormous prompt.
More input increases the chance of exposing sensitive files to a cloud provider.

A better pattern is to search locally, select the relevant files, state a precise task and send only the context the model needs.
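The "search locally, then send only what is needed" pattern can be sketched in a few lines. The example below ranks files by keyword hits and keeps adding them until a token budget is reached. The four-characters-per-token estimate is a rough assumption, not MiMo's tokenizer, and both function names are illustrative.

```python
def estimate_tokens(text: str) -> int:
    """Rough size estimate: about four characters per token (an assumption)."""
    return max(1, len(text) // 4)


def select_context(
    files: dict[str, str], keywords: list[str], token_budget: int
) -> tuple[list[str], int]:
    """Pick the most relevant files that fit into `token_budget`.

    `files` maps a path to its text. Relevance is a plain keyword count,
    which is crude but keeps the selection local and inspectable.
    """
    scored = []
    for path, text in files.items():
        lowered = text.lower()
        score = sum(lowered.count(word.lower()) for word in keywords)
        if score > 0:
            scored.append((score, path))
    # Highest score first; ties are broken by path for stable output.
    scored.sort(key=lambda item: (-item[0], item[1]))

    chosen, used = [], 0
    for _score, path in scored:
        cost = estimate_tokens(files[path])
        if used + cost <= token_budget:
            chosen.append(path)
            used += cost
    return chosen, used
```

A file that scores highly but does not fit is skipped, not truncated, so the model never sees half a file. Anything sensitive should be filtered out of `files` before this step.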

Current API pricing

The table below uses Xiaomi’s overseas pay-as-you-go prices per million tokens. Xiaomi’s pricing page was updated on June 17, 2026.

Model	Input: cache hit	Input: cache miss	Output
`mimo-v2.5-pro`	$0.0036	$0.435	$0.87
`mimo-v2.5`	$0.0028	$0.14	$0.28

Xiaomi MiMo-V2.5 API pricing per million tokens

Source: Xiaomi MiMo API Pricing, overseas pricing. Prices can change.

Cache hits are especially cheap because a reused prompt prefix can be served from Xiaomi’s prompt cache. For a one-off analysis of a large repository, budget using the cache-miss price.
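To budget a request before sending it, the prices in the table can be turned into a small estimator. This sketch assumes that cached input tokens are billed at the cache-hit price and all remaining input tokens at the cache-miss price; how Xiaomi actually splits a given prompt is not something the code can know, so treat the result as an estimate. The names are illustrative.

```python
# Overseas pay-as-you-go prices in USD per million tokens, from the table above.
PRICES = {
    "mimo-v2.5-pro": {"cache_hit": 0.0036, "cache_miss": 0.435, "output": 0.87},
    "mimo-v2.5": {"cache_hit": 0.0028, "cache_miss": 0.14, "output": 0.28},
}


def estimate_cost(
    model: str,
    input_tokens: int,
    output_tokens: int,
    cached_input_tokens: int = 0,
) -> float:
    """Estimate the USD cost of one request."""
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")
    price = PRICES[model]
    uncached = input_tokens - cached_input_tokens
    total = (
        cached_input_tokens * price["cache_hit"]
        + uncached * price["cache_miss"]
        + output_tokens * price["output"]
    )
    return total / 1_000_000
```

For example, a one-off 800,000-token prompt with 4,000 output tokens on `mimo-v2.5-pro` comes to about $0.35 at the cache-miss price. The same request with the whole prompt served from cache would be well under one cent.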

Web search is billed separately. Xiaomi lists its Overseas Internet Connectivity Service at $5 per 1,000 calls. Any web-enabled agent needs an explicit search-call cap and cost logging.
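A search-call cap with cost logging does not need much code. The sketch below is one possible shape, assuming your agent calls `charge()` immediately before every web search; the class and logger names are hypothetical and the price constant comes from the paragraph above.

```python
import logging

USD_PER_1000_SEARCH_CALLS = 5.0  # listed price per 1,000 calls

log = logging.getLogger("search_budget")


class SearchBudgetExceeded(RuntimeError):
    """Raised when an agent tries to search beyond its configured cap."""


class SearchBudget:
    def __init__(self, max_calls: int) -> None:
        self.max_calls = max_calls
        self.calls = 0

    @property
    def cost_usd(self) -> float:
        """Search cost accumulated so far."""
        return self.calls * USD_PER_1000_SEARCH_CALLS / 1000

    def charge(self) -> None:
        """Count one search call, or refuse it if the cap is reached."""
        if self.calls >= self.max_calls:
            raise SearchBudgetExceeded(f"search cap of {self.max_calls} calls reached")
        self.calls += 1
        log.info("search call %d/%d, cost so far $%.3f",
                 self.calls, self.max_calls, self.cost_usd)
```

Raising an exception instead of silently skipping the search makes a runaway agent loop visible in logs and test runs.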

Migration note for legacy MiMo model names

Xiaomi says legacy V2 models are being routed to V2.5 and will be deprecated on June 30, 2026. New integrations should use `mimo-v2.5` or `mimo-v2.5-pro`, not build around the old `mimo-v2-*` names.

Pay-as-you-go vs. Token Plan

Topic	Pay-as-you-go	Token Plan
Key format	`sk-xxxxx`	`tp-xxxxx`
Billing	Actual token usage	Package with a shared credit quota
Intended use	API integrations and custom applications	Supported coding and agent tools
Keys interchangeable?	No	No
Base URL	Xiaomi API endpoint	The endpoint shown in the Token Plan account

Token Plan makes sense when you want to use MiMo inside supported developer tools. Xiaomi lists Claude Code, Cline, OpenCode and other coding tools. Usage is shared across the tools connected to the same plan.

There is an important limitation: Xiaomi’s documentation restricts Token Plan usage to programming tools. Automated scripts, custom application backends and other clearly non-coding API usage do not belong on this plan. For a custom application, pay-as-you-go is the cleaner option.

Always use the Base URL displayed in your Token Plan account. Xiaomi provides different OpenAI-compatible and Anthropic-compatible endpoints depending on the cluster.

Test the MiMo API from a Mac

For a first pay-as-you-go test, curl is enough. Store the key in an environment variable first:

export MIMO_API_KEY="sk-your-key"

Then make a small, inspectable request:

curl --request POST "https://api.xiaomimimo.com/v1/chat/completions" \
  --header "api-key: $MIMO_API_KEY" \
  --header "Content-Type: application/json" \
  --data '{
    "model": "mimo-v2.5-pro",
    "messages": [
      {
        "role": "user",
        "content": "Review this refactoring plan. Identify missing tests, risky assumptions and rollback steps."
      }
    ],
    "max_completion_tokens": 800,
    "temperature": 1.0,
    "top_p": 0.95,
    "stream": false
  }'

Xiaomi documents both the `api-key` header and `Authorization: Bearer ...`. Never place an API key in Astro frontend code, public repositories or browser-side JavaScript. A website integration needs a server-side endpoint or serverless function.
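The same request can be made from server-side Python with only the standard library. This is a minimal sketch that mirrors the curl call above: the URL, model ID, `api-key` header and `max_completion_tokens` field are taken from that example, while the function names are illustrative. It returns the parsed JSON as-is and makes no assumption about the response fields.

```python
import json
import urllib.request

API_URL = "https://api.xiaomimimo.com/v1/chat/completions"


def build_request(
    api_key: str, prompt: str, max_completion_tokens: int = 800
) -> urllib.request.Request:
    """Build the chat completion request without sending it."""
    payload = {
        "model": "mimo-v2.5-pro",
        "messages": [{"role": "user", "content": prompt}],
        "max_completion_tokens": max_completion_tokens,
        "stream": False,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and return the decoded JSON body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage on a server, with the key read from the environment:
#   import os
#   reply = send(build_request(os.environ["MIMO_API_KEY"], "Review this plan."))
```

Keeping `build_request` separate from `send` lets you inspect or log the outgoing payload, with the key redacted, before any tokens are billed.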

What should stay local, and what can go to MiMo?

Decision guide: local Mac model or Xiaomi MiMo API

Keep it local

private notes, research data and client documents
personal data
offline workflows
short repeatable tasks
tasks that fit a local 7B, 14B or 32B model

MiMo API can fit well

long, non-confidential code and document context
tool-calling agent experiments
large refactoring plans
multimodal cloud work with MiMo-V2.5
a second opinion alongside a local model

Before production use, check Xiaomi’s current privacy documentation, contractual terms, data region and retention practice for your exact situation. Anything you send to a cloud API is processed on someone else’s infrastructure, so treat it as disclosure, not as neutral storage.

Recommendation

MiMo-V2.5-Pro is an interesting cloud option for long-context and coding-agent work at clearly stated, currently low API prices. Open weights are a meaningful positive, but they do not justify promising local Mac use where it is not practical.

The sensible AI-on-Mac split is simple:

Use local models for private and everyday work. Use MiMo selectively for non-confidential long contexts, coding agents and API experiments.

Sources and status

Status: June 21, 2026.
