
Xiaomi MiMo-V2.5-Pro: 1M context, low API prices, and why it is not a normal Mac model

Xiaomi MiMo-V2.5-Pro API pricing, Token Plan, setup and the key Mac answer: it is a cloud model, not a normal local Apple Silicon download.

Technical research and editorial review. Original measurements are explicitly identified in the article.

Published: May 27, 2026. Updated: June 22, 2026.


Xiaomi MiMo-V2.5-Pro is not the kind of model you install in Ollama on a MacBook and casually use offline. It is a sparse Mixture-of-Experts model with 1.02 trillion total parameters, 42 billion active parameters and a context window of up to 1 million tokens. For Mac users, Xiaomi’s API or a Token Plan for coding tools is the realistic entry point.

The other headline is cost. Xiaomi currently lists $0.435 per million cache-miss input tokens and $0.87 per million output tokens for MiMo-V2.5-Pro in overseas pay-as-you-go pricing. The frequently repeated “up to 99% lower” claim comes from Xiaomi’s comparison with its own earlier pricing. It is not an independent price-performance ranking.

Xiaomi MiMo-V2.5 and MiMo-V2.5-Pro compared

Published model sizes and capabilities. Sources: MiMo-V2.5-Pro model card and MiMo-V2.5 model card.

The short answer

MiMo-V2.5-Pro is worth testing for low-cost cloud coding agents, large repositories and long contexts. It is not a realistic local model for ordinary Apple Silicon Macs.

The MIT license is useful for research, server deployments and the wider ecosystem. It does not change the hardware math. At 1.02T parameters and roughly one byte per parameter, FP8 weights alone come to about 1 TB, before runtime overhead, the KV cache and parallelism overhead are added.
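The terabyte figure is plain arithmetic: weight storage is roughly total parameters times bytes per parameter. The shell sketch below assumes 2 bytes per parameter for BF16, 1 for FP8 and 0.5 for a 4-bit quantization, and deliberately ignores the KV cache, activations and runtime overhead, so real requirements are higher. weights_gb is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Rough weight-only memory estimate: total parameters x bytes per parameter.
# Assumption: ignores KV cache, activations, runtime and parallelism overhead.

# weights_gb PARAMS_IN_BILLIONS BYTES_PER_PARAM -> prints decimal gigabytes
# (1 billion parameters at 1 byte each = 1 GB)
weights_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.0f\n", p * b }'
}

echo "MiMo-V2.5-Pro, BF16:  $(weights_gb 1020 2) GB"
echo "MiMo-V2.5-Pro, FP8:   $(weights_gb 1020 1) GB"
echo "MiMo-V2.5-Pro, 4-bit: $(weights_gb 1020 0.5) GB"
echo "MiMo-V2.5,     FP8:   $(weights_gb 310 1) GB"
```

This prints 2040, 1020, 510 and 310 GB. Even the aggressive 4-bit case for Pro is about half a terabyte of weights before anything else is loaded.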

MiMo-V2.5 vs. MiMo-V2.5-Pro

Property | MiMo-V2.5 | MiMo-V2.5-Pro
--- | --- | ---
Primary use | Multimodal tasks | Text, coding and agents
Inputs | Text, image, video and audio | Text
Architecture | Sparse MoE | Sparse MoE
Total parameters | 310B | 1.02T
Active parameters per token | 15B | 42B
Context window | Up to 1M tokens | Up to 1M tokens
Model ID | mimo-v2.5 | mimo-v2.5-pro
Weights | MIT license, according to Xiaomi | MIT license, according to Xiaomi
Realistic Mac route | API | API

MiMo-V2.5 is the multimodal member of the family, built for text, images, video and audio. MiMo-V2.5-Pro is the text-focused model Xiaomi positions for agents, complex software tasks and long tool-use workflows.

Active parameter count helps estimate compute per generated token. It is not the same as the RAM or storage requirement of a local deployment. The full set of weights must still be available, which is why 15B active parameters for MiMo-V2.5 and 42B active parameters for Pro do not make either model a practical Mac mini or MacBook deployment.
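A widely used rule of thumb makes the compute side concrete: a forward pass costs about 2 FLOPs per active parameter per generated token, ignoring attention over long contexts. The sketch applies that approximation (an assumption, not a Xiaomi figure) and sets it next to the share of weights that is actually active, to show that active and total parameters answer different questions. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Rule-of-thumb estimate (assumption): ~2 FLOPs per active parameter per token.
# Attention cost over long contexts is ignored, so long prompts cost more.

# gflops_per_token ACTIVE_PARAMS_IN_BILLIONS -> prints GFLOPs per generated token
gflops_per_token() {
  awk -v a="$1" 'BEGIN { printf "%.0f\n", 2 * a }'
}

# active_share ACTIVE_B TOTAL_B -> percentage of all weights used for one token
active_share() {
  awk -v a="$1" -v t="$2" 'BEGIN { printf "%.1f\n", 100 * a / t }'
}

echo "MiMo-V2.5-Pro: ~$(gflops_per_token 42) GFLOPs/token, $(active_share 42 1020)% of weights active"
echo "MiMo-V2.5:     ~$(gflops_per_token 15) GFLOPs/token, $(active_share 15 310)% of weights active"
```

Only about 4 to 5 percent of the weights take part in each token, which keeps compute per token moderate, yet all of them still have to be held in memory.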

What do the official benchmarks say?

Xiaomi publishes benchmark results in the MiMo-V2.5-Pro model card. The following values are from its base-model evaluation. They are not independent API tests, and they do not guarantee results in a particular coding-agent harness.

Benchmark | Setting | MiMo-V2.5-Pro Base | MiMo-V2.5 Base
--- | --- | --- | ---
MMLU | 5-shot | 89.4 | 86.3
MMLU-Pro | 5-shot | 68.5 | 65.8
HumanEval+ | 1-shot | 75.6 | 71.3
MBPP+ | 3-shot | 74.1 | 70.9
LiveCodeBench v6 | 1-shot | 39.6 | 35.5
SWE-Bench, Agentless | 3-shot | 35.7 | 30.8

Selected official MiMo-V2.5-Pro and MiMo-V2.5 base-model benchmark results

Source: Xiaomi’s MiMo-V2.5-Pro model card. Results are vendor-published base-model figures. Benchmark settings, prompts, agent harnesses and tools vary.

These results are useful signals, but they do not answer the practical question: will the model work well with your repository, tools, security requirements and budget?

Before making MiMo part of a real workflow, test at least:

  1. a small bug fix with a passing test,
  2. a multi-file refactor,
  3. an ambiguous task with incomplete requirements,
  4. repeated runs for cost, latency and failure rate.
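The fourth test is the easiest to automate. A minimal shell sketch, assuming the pay-as-you-go endpoint and api-key header from this article's curl example and a request body you save yourself as a JSON file: it repeats the same request, records HTTP status and latency per run, and summarizes failure rate and mean latency. run_trials and summarize_log are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: repeat one request N times, log HTTP status and latency, then summarize.
# Assumptions: endpoint and api-key header as in Xiaomi's pay-as-you-go docs;
# PAYLOAD_FILE is a JSON request body you prepare yourself.

MIMO_URL="https://api.xiaomimimo.com/v1/chat/completions"

# run_trials N PAYLOAD_FILE LOG_FILE -> appends "status seconds" per run
# (curl reports status 000 on timeouts and connection errors)
run_trials() {
  local n="$1" payload="$2" log="$3" i=1
  while [ "$i" -le "$n" ]; do
    curl --silent --output /dev/null --max-time 120 \
      --write-out '%{http_code} %{time_total}\n' \
      --request POST "$MIMO_URL" \
      --header "api-key: $MIMO_API_KEY" \
      --header "Content-Type: application/json" \
      --data @"$payload" >>"$log"
    i=$((i + 1))
  done
}

# summarize_log LOG_FILE -> run count, failures (non-200), failure rate and
# mean latency of the successful runs only
summarize_log() {
  awk '
    { runs++; if ($1 != 200) fail++; else { ok++; total += $2 } }
    END {
      mean = ok ? total / ok : 0
      printf "runs=%d failures=%d failure_rate=%.1f%% mean_latency=%.2fs\n",
             runs, fail, runs ? 100 * fail / runs : 0, mean
    }' "$1"
}
```

For example, run `run_trials 10 request.json trials.log`, then `summarize_log trials.log`. Cost needs the token counts from each response, so a fuller version would keep the response bodies instead of discarding them; which usage fields Xiaomi returns is something to verify in its API reference.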

A 1M-token context is useful, not magical

A million-token context window can help with large codebases, logs and document collections. It does not mean that sending everything is the best workflow.

Large prompts have real costs:

  • A one-off large input will normally be billed at the cache-miss price.
  • Important evidence can become harder for a model to retrieve in an enormous prompt.
  • More input increases the chance of exposing sensitive files to a cloud provider.

A better pattern is to search locally, select the relevant files, state a precise task and send only the context the model needs.
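A minimal sketch of that pattern, assuming the grep, head and wc tools that ship with macOS or Linux: build_context (a hypothetical helper) lists files that mention a search term, skips a few obviously sensitive paths, and prints the first few with headers. The skip list is illustrative rather than a complete secret filter, and the 4-characters-per-token figure in approx_tokens is only a rough heuristic, not MiMo's tokenizer.

```shell
#!/usr/bin/env bash
# Sketch: collect only the files that mention a search term, skipping obviously
# sensitive paths, and print them with headers for a focused prompt.
# Assumption: the skip patterns are examples, not a complete secret filter.

# build_context PATTERN DIR [MAX_FILES]
build_context() {
  local pattern="$1" dir="$2" max="${3:-5}"
  grep -rlI -- "$pattern" "$dir" \
    | grep -vE '(^|/)(\.env|[^/]*\.pem|id_rsa|id_ed25519)$|/\.git/' \
    | head -n "$max" \
    | while IFS= read -r file; do
        printf '===== %s =====\n' "$file"
        cat "$file"
        printf '\n'
      done
}

# approx_tokens: reads stdin, crude ~4 characters per token heuristic
approx_tokens() {
  wc -c | awk '{ printf "%d\n", $1 / 4 }'
}
```

Usage might look like `build_context parseConfig ./src 5 > context.txt` followed by `approx_tokens < context.txt`, where parseConfig stands in for whatever symbol or error message you are investigating. Filtering before head means skipped files do not use up the file limit.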

Current API pricing

The table below uses Xiaomi’s overseas pay-as-you-go prices per million tokens. Xiaomi’s pricing page was updated on June 17, 2026.

Model | Input: cache hit | Input: cache miss | Output
--- | --- | --- | ---
mimo-v2.5-pro | $0.0036 | $0.435 | $0.87
mimo-v2.5 | $0.0028 | $0.14 | $0.28

Xiaomi MiMo-V2.5 API pricing per million tokens

Source: Xiaomi MiMo API Pricing, overseas pricing. Prices can change.

Cache hits are especially cheap because a reused prompt prefix can be served from Xiaomi’s prompt cache. For a one-off analysis of a large repository, budget using the cache-miss price.
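The gap between cache hits and misses is large enough to be worth calculating before a long-context session. This sketch hardcodes the mimo-v2.5-pro list prices from the table above and, for the second call, assumes the entire 500k-token prefix is served from the cache; real requests usually mix hit and miss tokens. request_cost is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: estimate one mimo-v2.5-pro request's cost from token counts.
# Prices: Xiaomi overseas pay-as-you-go list prices per 1M tokens (June 2026).
# Check the pricing page before relying on them.
PRICE_HIT=0.0036   # input, cache hit
PRICE_MISS=0.435   # input, cache miss
PRICE_OUT=0.87     # output

# request_cost MISS_TOKENS HIT_TOKENS OUTPUT_TOKENS -> prints USD
request_cost() {
  awk -v m="$1" -v h="$2" -v o="$3" \
      -v pm="$PRICE_MISS" -v ph="$PRICE_HIT" -v po="$PRICE_OUT" \
      'BEGIN { printf "%.4f\n", (m * pm + h * ph + o * po) / 1000000 }'
}

# First pass over a 500k-token repository snapshot: every input token is a miss.
echo "cold: \$$(request_cost 500000 0 20000)"
# Follow-up reusing the same 500k-token prefix, assumed fully cached.
echo "warm: \$$(request_cost 0 500000 20000)"
```

This prints $0.2349 for the cold call and $0.0192 for the warm one. Most of the warm cost is the 20,000 output tokens, not the cached context.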

Web search is billed separately. Xiaomi lists its Overseas Internet Connectivity Service at $5 per 1,000 calls. Any web-enabled agent needs an explicit search-call cap and cost logging.
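A cap can be as simple as a counter that the agent must check before each search. The sketch assumes the listed $5 per 1,000 calls, a hypothetical counter file and a default cap of 50 calls per run; allow_search and search_cost are illustrative names to wire into your own agent loop.

```shell
#!/usr/bin/env bash
# Sketch: enforce a per-run cap on web-search calls and log their cost.
# Assumptions: $5 per 1,000 calls (Xiaomi's listed overseas price) and a
# hypothetical counter file; delete the file at the start of each run.
SEARCH_PRICE_PER_CALL=0.005
SEARCH_CAP="${SEARCH_CAP:-50}"
SEARCH_COUNT_FILE="${SEARCH_COUNT_FILE:-./search_calls.count}"

# allow_search -> returns 0 and increments the counter if under the cap, else 1
allow_search() {
  local count
  count=$(cat "$SEARCH_COUNT_FILE" 2>/dev/null || echo 0)
  if [ "$count" -ge "$SEARCH_CAP" ]; then
    echo "search cap of $SEARCH_CAP reached" >&2
    return 1
  fi
  echo $((count + 1)) >"$SEARCH_COUNT_FILE"
}

# search_cost CALLS -> prints USD for that many search calls
search_cost() {
  awk -v n="$1" -v p="$SEARCH_PRICE_PER_CALL" 'BEGIN { printf "%.2f\n", n * p }'
}
```

At the end of a run, `search_cost "$(cat search_calls.count)"` turns the counter into a dollar figure for the log.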

Migration note for legacy MiMo model names

Xiaomi says legacy V2 models are being routed to V2.5 and will be deprecated on June 30, 2026. New integrations should use mimo-v2.5 or mimo-v2.5-pro, not build around the old mimo-v2-* names.
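Before the deprecation date, it is worth checking a codebase for leftover legacy names. The sketch only reports matches; it does not guess which V2.5 model each old name should become, because that depends on how each call is used. find_legacy_models is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: list remaining references to legacy mimo-v2-* model names.
# The pattern requires "mimo-v2-", so current IDs like "mimo-v2.5-pro" are not flagged.
# It only reports matches and never rewrites files.

# find_legacy_models DIR -> prints file:line:match for each legacy name
find_legacy_models() {
  grep -rnoE --exclude-dir=.git 'mimo-v2-[A-Za-z0-9._-]+' "$1"
}
```

Running `find_legacy_models .` from the project root gives a checklist of call sites to move to mimo-v2.5 or mimo-v2.5-pro.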

Pay-as-you-go vs. Token Plan

Topic | Pay-as-you-go | Token Plan
--- | --- | ---
Key formats | sk-xxxxx | tp-xxxxx
Billing | Actual token usage | Package with a shared credit quota
Intended use | API integrations and custom applications | Supported coding and agent tools
Keys interchangeable? | No | No
Base URL | Xiaomi API endpoint | The endpoint shown in the Token Plan account

Token Plan makes sense when you want to use MiMo inside supported developer tools. Xiaomi lists Claude Code, Cline, OpenCode and other coding tools. Usage is shared across the tools connected to the same plan.

There is an important limitation: Xiaomi’s documentation restricts Token Plan usage to programming tools. Automated scripts, custom application backends and other clearly non-coding API usage do not belong on this plan. For a custom application, pay-as-you-go is the cleaner option.

Always use the Base URL displayed in your Token Plan account. Xiaomi provides different OpenAI-compatible and Anthropic-compatible endpoints depending on the cluster.
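Because sk- and tp- keys are not interchangeable, a small pre-flight check can catch the most common mix-up before a coding tool starts failing. This is a sketch: MIMO_TP_KEY and MIMO_TP_BASE_URL are hypothetical variable names for your own shell setup, not names any tool reads automatically, and the Base URL itself must be copied from your Token Plan account.

```shell
#!/usr/bin/env bash
# Sketch: verify a Token Plan setup before launching a coding tool.
# MIMO_TP_KEY and MIMO_TP_BASE_URL are hypothetical variable names.

# key_kind KEY -> prints "pay-as-you-go", "token-plan" or "unknown" by prefix
key_kind() {
  case "$1" in
    sk-*) echo "pay-as-you-go" ;;
    tp-*) echo "token-plan" ;;
    *) echo "unknown" ;;
  esac
}

# check_token_plan_env -> returns 0 only if a tp- key and a Base URL are both set
check_token_plan_env() {
  if [ "$(key_kind "${MIMO_TP_KEY:-}")" != "token-plan" ]; then
    echo "MIMO_TP_KEY must be a tp- key; sk- keys are not interchangeable" >&2
    return 1
  fi
  if [ -z "${MIMO_TP_BASE_URL:-}" ]; then
    echo "set MIMO_TP_BASE_URL to the Base URL shown in your Token Plan account" >&2
    return 1
  fi
}
```

A line such as `check_token_plan_env && exec your-coding-tool` in a wrapper script stops early with a readable message instead of an authentication error deep inside the tool.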

Test the MiMo API from a Mac

For a first pay-as-you-go test, curl is enough. Store the key in an environment variable first:

export MIMO_API_KEY="sk-your-key"

Then make a small, inspectable request:

curl --request POST "https://api.xiaomimimo.com/v1/chat/completions" \
  --header "api-key: $MIMO_API_KEY" \
  --header "Content-Type: application/json" \
  --data '{
    "model": "mimo-v2.5-pro",
    "messages": [
      {
        "role": "user",
        "content": "Review this refactoring plan. Identify missing tests, risky assumptions and rollback steps."
      }
    ],
    "max_completion_tokens": 800,
    "temperature": 1.0,
    "top_p": 0.95,
    "stream": false
  }'

Xiaomi documents both the api-key header and Authorization: Bearer .... Never place an API key in Astro frontend code, public repositories or browser-side JavaScript. A website integration needs a server-side endpoint or serverless function.
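The sketch below wraps the same request in a reusable shell function using the Authorization: Bearer header, meant for a terminal or a server-side script, never for browser code. It assumes jq is installed to build the JSON body, so a prompt containing quotes does not break the request, and it assumes an OpenAI-compatible response with the text at choices[0].message.content. mimo_ask is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: send one prompt with the Bearer header and print only the reply text.
# Assumptions: jq is installed; the response is OpenAI-compatible
# (choices[0].message.content). Refuses to run without MIMO_API_KEY.

mimo_ask() {
  : "${MIMO_API_KEY:?set MIMO_API_KEY first}"
  # jq builds the request body so quotes and newlines in the prompt stay valid JSON
  jq -n --arg prompt "$1" '{
      model: "mimo-v2.5-pro",
      messages: [{role: "user", content: $prompt}],
      max_completion_tokens: 800
    }' \
  | curl --silent --show-error --fail \
      --request POST "https://api.xiaomimimo.com/v1/chat/completions" \
      --header "Authorization: Bearer $MIMO_API_KEY" \
      --header "Content-Type: application/json" \
      --data @- \
  | jq -r '.choices[0].message.content'
}
```

Usage: `mimo_ask "Review this refactoring plan. Identify missing tests, risky assumptions and rollback steps."` With --fail, an HTTP error status makes curl exit non-zero instead of passing an error body to jq.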

What should stay local, and what can go to MiMo?

Decision guide: local Mac model or Xiaomi MiMo API

Keep it local

  • private notes, research data and client documents
  • personal data
  • offline workflows
  • short repeatable tasks
  • tasks that fit a local 7B, 14B or 32B model

MiMo API can fit well

  • long, non-confidential code and document context
  • tool-calling agent experiments
  • large refactoring plans
  • multimodal cloud work with MiMo-V2.5
  • a second opinion alongside a local model

Before production use, check Xiaomi’s current privacy documentation, contractual terms, data region and retention practice for your exact situation. Cloud processing is not a neutral storage location.

Recommendation

MiMo-V2.5-Pro is an interesting cloud option for long-context and coding-agent work at clearly stated, currently low API prices. Open weights are a meaningful positive, but they do not justify promising local Mac use where it is not practical.

The sensible AI-on-Mac split is simple:

Use local models for private and everyday work. Use MiMo selectively for non-confidential long contexts, coding agents and API experiments.

Sources and status

Status: June 21, 2026.

Frequently Asked Questions

Can Xiaomi MiMo-V2.5-Pro run locally on a normal Mac?

No, not as a practical everyday setup. MiMo-V2.5-Pro has 1.02T total parameters and Xiaomi's official deployment examples target distributed server infrastructure with SGLang or vLLM.

How much does MiMo-V2.5-Pro cost through the API?

For overseas pay-as-you-go, Xiaomi currently lists $0.0036 per million input tokens on a cache hit, $0.435 on a cache miss and $0.87 per million output tokens.

What is the difference between pay-as-you-go and Token Plan?

Pay-as-you-go bills API usage by token through sk- keys. Token Plan uses separate tp- keys and Xiaomi positions it for supported coding and agent tools, rather than arbitrary scripts or custom product backends.

Are MiMo-V2.5 and MiMo-V2.5-Pro open source?

Xiaomi publishes the weights under the MIT license on Hugging Face. Available weights do not make either model practical to run locally on typical consumer Mac hardware.