Qwen3.6 on Mac: 27B, 35B-A3B, Vision and Ollama
Run Qwen3.6 locally on Apple Silicon: 27B vs 35B-A3B, Ollama and MLX tags, vision, benchmarks and realistic RAM limits.
Qwen3.6 is useful for Mac users only if you keep the variants separate. There is a dense Qwen3.6-27B, a MoE model Qwen3.6-35B-A3B, Ollama tags with vision, text-only MLX tags and different memory footprints. I have been testing qwen3.6:27b on my Mac Mini M4 with 32 GB RAM, and it is currently my favorite all-rounder for coding and vision tasks.
This guide clears things up: which Qwen3.6 variant makes sense on Apple Silicon, which Ollama command is correct, when you get vision, when you only get text, and which benchmark scores actually belong to which model.
Original diagram based on the official Qwen model cards and the Ollama model page. Sources: Ollama Qwen3.6, Qwen3.6-27B Model Card, Qwen3.6-35B-A3B Model Card. Checked May 27, 2026.
Qwen3.6 — Facts as of May 2026
| Criteria | Qwen3.6-27B | Qwen3.6-35B-A3B |
|---|---|---|
| Release | April 22, 2026 | April 14, 2026 |
| Architecture | Dense | Mixture of Experts |
| Parameters | 27B | 35B total / 3B active |
| Active per token | all 27B | 3B active |
| Vision | yes, with vision-capable tag/runtime | yes |
| Context | 262,144 native, up to ~1,010,000 extended | 262,144 native, up to ~1,010,000 extended |
| License | Apache 2.0 / Open Weights | Apache 2.0 / Open Weights |
| Strength | Dense profile, vision, local Ollama use | MoE, agentic coding, vision |
| Mac suitability | realistic as a quantized Ollama/MLX tag on 24-32 GB | better with 32 GB+, depending on quantization and context |
Qwen3.6-27B vs Qwen3.6-35B-A3B — Not the Same
Qwen3.6-27B is a dense model: all 27B parameters activate per token. That makes it simpler to quantize, predictable in local use and a good entry point on Apple Silicon.
Qwen3.6-35B-A3B is a Mixture-of-Experts model: 35B total parameters, but only 3B activate per token. It is more efficient at inference than a dense model of the same total size, but it requires more care around tag selection, runtime, context length and vision support.
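To put the dense versus MoE difference into numbers, here is a small arithmetic sketch that uses only the parameter counts from the table above. The function name is illustrative, and the result says nothing about memory: all 35B parameters still have to be loaded.

```python
def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Share of the model's parameters that take part in each token's forward pass."""
    return active_params_b / total_params_b


dense = active_fraction(27, 27)  # Qwen3.6-27B: every parameter is active
moe = active_fraction(35, 3)     # Qwen3.6-35B-A3B: 3B of 35B are active

print(f"27B dense: {dense:.0%} of weights active per token")
print(f"35B-A3B:   {moe:.1%} of weights active per token")
```

The MoE model touches under a tenth of its weights per token, which is where the inference efficiency comes from, while its memory footprint follows the 35B total.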
Which Ollama Tag Should I Use on Mac?
Per the Ollama library (as of May 2026):
| Goal | Recommended tag | Why |
|---|---|---|
| Vision + local use | qwen3.6:27b | 17 GB, Text + Image, lower barrier to entry |
| Larger vision/MoE variant | qwen3.6:35b | 24 GB, Text + Image, better on 32 GB+ |
| MLX/Coding without vision | qwen3.6:27b-mlx | 20 GB, MLX, Text-only |
| MoE/MLX without vision | qwen3.6:35b-mlx | 22 GB, MLX, Text-only |
| Maximum quality / BF16 | BF16 tags | only for very large Macs/workstations |
Important: The MLX tags are listed as text-only on Ollama. If you need vision, do not automatically use an MLX tag. Check in the Ollama library whether the specific tag supports text, image, or text only.
Also important: ollama run qwen3.6 uses the current default/latest tag. If you specifically want the lighter local starting point, name the tag explicitly: qwen3.6:27b.
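The tag table boils down to two questions: do you need vision, and do you want the MoE variant? A minimal sketch of that lookup follows. The tag names are copied from the table; the helper name and the two flags are illustrative assumptions, not an Ollama feature.

```python
# (needs_vision, want_moe) -> tag, copied from the table above
TAGS = {
    (True, False): "qwen3.6:27b",
    (True, True): "qwen3.6:35b",
    (False, False): "qwen3.6:27b-mlx",
    (False, True): "qwen3.6:35b-mlx",
}


def pick_tag(needs_vision: bool, want_moe: bool = False) -> str:
    """Return the explicit tag to pull instead of relying on the default tag."""
    return TAGS[(needs_vision, want_moe)]


print(pick_tag(needs_vision=True))
print(pick_tag(needs_vision=False, want_moe=True))
```

Passing the result to ollama pull keeps the choice explicit even if the default tag changes later.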
Setup with Ollama
Install Ollama
brew install ollama
Download a model
# Vision-capable 27B entry point (Text + Image)
ollama pull qwen3.6:27b
# Larger 35B-A3B/MoE variant (Text + Image)
ollama pull qwen3.6:35b
# MLX tag — Text-only, NOT vision-capable
ollama pull qwen3.6:27b-mlx
# MoE/MLX tag — Text-only
ollama pull qwen3.6:35b-mlx
Start a model
# Vision variant
ollama run qwen3.6:27b
# MLX variant
ollama run qwen3.6:27b-mlx
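Before starting a model from a script, it can help to confirm that the tag has actually been pulled. The sketch below relies on Ollama's GET /api/tags endpoint, which lists local models; the helper names and the default host are assumptions, and the sample dictionary only imitates the shape of a real response.

```python
import json
import urllib.request


def installed_models(tags_response: dict) -> set[str]:
    """Collect model names from the JSON body returned by GET /api/tags."""
    return {m["name"] for m in tags_response.get("models", [])}


def fetch_tags(host: str = "http://localhost:11434") -> dict:
    """Ask a running Ollama server which models are available locally."""
    with urllib.request.urlopen(f"{host}/api/tags", timeout=5) as resp:
        return json.load(resp)


# Offline demonstration with a hand-written sample response
sample = {"models": [{"name": "qwen3.6:27b"}, {"name": "qwen3.6:27b-mlx"}]}
print("qwen3.6:27b" in installed_models(sample))
print("qwen3.6:35b" in installed_models(sample))
```

Against a live server, installed_models(fetch_tags()) returns the real set, provided Ollama is running on the default port.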
Vision: Only with the Right Tag
Qwen3.6 supports vision — but not every Ollama tag is vision-capable. qwen3.6:27b and qwen3.6:35b are listed as Text + Image. The MLX tags are listed as Text-only. I accidentally tried to use the MLX tag for image analysis and spent 20 minutes wondering why it was not working — so save yourself the trouble and check the tag before you start.
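One way to check a tag before you start is to ask the local Ollama server what the tag can do. This sketch assumes a recent Ollama version whose POST /api/show response includes a capabilities list; older versions may not return that field. The helper names are illustrative.

```python
import json
import urllib.request


def supports_vision(show_response: dict) -> bool:
    """True if the /api/show response lists 'vision' among the model's capabilities."""
    return "vision" in show_response.get("capabilities", [])


def fetch_show(model: str, host: str = "http://localhost:11434") -> dict:
    """POST the model name to /api/show on a running Ollama server."""
    req = urllib.request.Request(
        f"{host}/api/show",
        data=json.dumps({"model": model}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.load(resp)


# Offline demonstration with hand-written sample responses
print(supports_vision({"capabilities": ["completion", "vision"]}))
print(supports_vision({"capabilities": ["completion"]}))
```

With the server running, supports_vision(fetch_show("qwen3.6:27b-mlx")) answers the question in seconds. A missing capabilities field counts as "no vision" here, which is the cautious default.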
Image with a vision-capable variant:
from openai import OpenAI

# Ollama exposes an OpenAI-compatible endpoint; the API key is a required placeholder
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

response = client.chat.completions.create(
    model="qwen3.6:27b",  # or qwen3.6:35b; the MLX tags are text-only
    messages=[{
        "role": "user",
        "content": [
            # Placeholder URL; if your Ollama version does not fetch remote images,
            # pass a base64 data URL here instead
            {"type": "image_url", "image_url": {"url": "https://example.com/image.png"}},
            {"type": "text", "text": "What does this image show?"}
        ]
    }]
)
print(response.choices[0].message.content)
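For local files, or if the server does not fetch remote image URLs, the image can be embedded as a base64 data URL in the same image_url field. The sketch below uses only the Python standard library. The helper names are illustrative, and whether your Ollama version accepts remote URLs at all should be treated as an open question.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_url(data: bytes, mime: str) -> str:
    """Encode raw image bytes as a data URL usable in an image_url field."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def image_to_data_url(path: str) -> str:
    """Read a local image and encode it, guessing the MIME type from the file name."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not a recognizable image file: {path}")
    return to_data_url(Path(path).read_bytes(), mime)


def image_message(data_url: str, question: str) -> dict:
    """Build a user message in the same shape as the example above."""
    return {
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": data_url}},
            {"type": "text", "text": question},
        ],
    }
```

Passing messages=[image_message(image_to_data_url("screenshot.png"), "What does this image show?")] to the client call above sends a local file; screenshot.png is a placeholder name.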
Thinking Used Correctly — Without the Thinking-Trace Hype
Qwen3.6 is trained for thinking workflows. But you should not blindly force long thinking traces:
- For normal responses, a brief explanation is enough.
- For agents, preserve_thinking is more important: the model uses previous thinking/work contexts more consistently.
- enable_thinking: false disables thinking on supported runtimes.
In Ollama without special API parameters: simply ask for a brief explanation and the result, rather than asking the model to output its full reasoning chain.
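If you do want an explicit switch, Ollama's native /api/chat endpoint accepts a think field in recent versions. This is a different knob from the enable_thinking and preserve_thinking parameters mentioned above, and whether a given qwen3.6 tag honors it is an assumption to verify on your setup. The helper names are illustrative.

```python
import json
import urllib.request


def chat_payload(model: str, prompt: str, think: bool = False) -> dict:
    """Build a non-streaming /api/chat request body with thinking on or off."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "think": think,  # assumption: only thinking-capable tags honor this
        "stream": False,
    }


def send_chat(payload: dict, host: str = "http://localhost:11434") -> str:
    """POST the payload to a running Ollama server and return the reply text."""
    req = urllib.request.Request(
        f"{host}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.load(resp)["message"]["content"]


payload = chat_payload("qwen3.6:27b", "Summarize MoE routing in two sentences.")
print(json.dumps(payload, indent=2))
```

Calling send_chat(payload) requires a running server with the model pulled. Building the payload separately makes it easy to compare the same prompt with think set to True and False.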
Benchmarks: Which Numbers Belong to Which Model?
Many values use special agent scaffolds, long context windows or tool setups — not directly transferable to your Mac.
Qwen3.6-35B-A3B, per Qwen (April 2026)
| Benchmark | Score | Interpretation |
|---|---|---|
| AIME 2026 | 92.7 % | full AIME 2026 I & II, per Qwen |
| MMLU-Pro | 85.2 % | knowledge/reasoning benchmark |
| LiveCodeBench v6 | 80.4 % | coding |
| SWE-bench Verified | 73.4 % | agentic coding, internal scaffold |
| Terminal-Bench 2.0 | 51.5 % | 5 runs, 256K context, special harness |
| MMMU | 81.7 % | vision/multimodal |
| MathVista mini | 86.4 % | visual mathematical reasoning |
Qwen3.6-27B
The 27B scores come from the separate Qwen3.6-27B publication (April 22, 2026) and must not be mixed with the 35B-A3B table.
Note: Many Qwen benchmarks use special agent scaffolds, long context windows, multiple runs and tool setups. They are useful for orientation, but they do not guarantee the same performance on a Mac with a quantized Ollama tag.
RAM / Unified Memory Recommendations
The Ollama model size is only a rough indicator. Actual memory use depends on quantization, context length, KV cache, runtime and other apps.
| Unified memory | Recommendation |
|---|---|
| 16 GB | Smaller Qwen3 models or heavily quantized 27B experiments; Qwen3.6 is not comfortable here |
| 24 GB | qwen3.6:27b or 27b-mlx realistic, but context and parallel apps limit it |
| 32 GB | Good sweet spot for 27B and cautious 35B-A3B use |
| 48 GB+ | Much more comfortable for 35B-A3B, vision and longer context |
| 64 GB+ | BF16 and high context more realistic, still runtime-dependent |
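The table can be roughly reproduced with a back-of-the-envelope budget: unified memory minus model size minus a reserve for macOS and other apps. The model sizes come from the Ollama tag table; the 6 GB reserve is purely an assumption, and the result ignores KV cache growth with context length.

```python
MODEL_SIZE_GB = {"qwen3.6:27b": 17, "qwen3.6:35b": 24}  # from the Ollama tag table
RESERVED_GB = 6  # assumption: macOS plus a few open apps


def headroom_gb(unified_gb: float, model_gb: float, reserved_gb: float = RESERVED_GB) -> float:
    """Memory left for KV cache and context after the weights and the system reserve."""
    return unified_gb - model_gb - reserved_gb


for unified in (16, 24, 32, 48, 64):
    for tag, size in MODEL_SIZE_GB.items():
        left = headroom_gb(unified, size)
        verdict = "does not fit" if left < 0 else f"{left:.0f} GB left for context"
        print(f"{unified:>2} GB Mac, {tag}: {verdict}")
```

Under these assumptions, 24 GB leaves almost nothing beside the 27B tag and 32 GB leaves little beside the 35B tag, in line with the "realistic but limited" and "cautious" wording in the table.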
Speed on Mac depends heavily on Mac model, RAM, context, quantization, Ollama/MLX version, prompt length and vision.
Qwen3.6 vs Alternatives on Mac
| Model | Local on Mac? | Vision? | Strength | Note |
|---|---|---|---|---|
| Qwen3.6-27B | yes, realistically when quantized | yes with right tag | Coding, vision, dense 27B all-rounder | needs 24-32 GB to be comfortable |
| Qwen3.6-35B-A3B | yes, but prefer 32 GB+ | yes | MoE, agentic coding, long contexts | check tags/quantization carefully |
| Gemma 3 27B | yes | yes | Vision + solid local quality | older, different benchmarks |
| Qwen3 14B/32B | yes | mostly text-only depending on tag | lighter/faster | less agent focus than Qwen3.6 |
| Cloud models | no (not local) | varies by provider | maximum quality | privacy/cost/API |
FAQ
Is Qwen3.6-27B the same as Qwen3.6-35B-A3B?
No. 27B is a dense model, 35B-A3B is a MoE model with 35B total and 3B active per token.
Which Qwen3.6 tag should I install on my Mac?
For vision: qwen3.6:27b as entry point. For MLX/text without vision: qwen3.6:27b-mlx. For more RAM/experimentation: qwen3.6:35b or qwen3.6:35b-mlx.
Does qwen3.6:27b-mlx have vision?
Per Ollama, this tag is Text-only. For images, use a Text+Image tag like qwen3.6:27b or qwen3.6:35b.
Is 24 GB unified memory enough?
For the 27B tag it can work with limited context, but it is not comfortable for many apps, vision and large context. 32 GB is more relaxed, 48 GB is better for 35B-A3B.
Are the Qwen benchmarks directly transferable to my Mac?
No. Many scores come from special agent/server setups. On Mac, quantization, context length, runtime and memory strongly affect real performance.
Bottom Line
Qwen3.6 is useful for Mac users, but only if the variants are kept separate. Qwen3.6-27B is the more practical local starting point, especially with quantized Ollama or MLX tags. Qwen3.6-35B-A3B is relevant as a MoE model, but it requires more care around tags, runtime, vision support and memory.
The key rule: do not blindly install the tag with the most tempting name. Check whether you need vision, how much unified memory is free, whether the tag is text-only and whether the benchmark you cite actually belongs to that model variant. On my Mac Mini M4, qwen3.6:27b with the Text+Image tag hits the right balance between capability and memory usage.
Sources and Disclaimer
Checked on May 27, 2026. Qwen3.6 evolves quickly; Ollama tags, MLX quantizations and benchmark tables may change. Benchmark values come predominantly from Qwen’s own publications and should not be transferred unchecked to quantized Mac setups.
Frequently Asked Questions
Which Qwen3.6 variant is the right one for my Mac?
For 16 GB Macs: no Qwen3.6 variant runs comfortably; pick Qwen3 8B or Gemma 3 4B instead. For 24 GB: the dense 27B with Q4 quantization works with short context. For 32-48 GB: the dense 27B is comfortable, or try 35B-A3B (MoE), which activates fewer parameters per token and is therefore faster per token. For 64 GB+: 35B-A3B with the full 256K context. On an M4 Pro with 64 GB, 35B-A3B is the sweet spot for 2026.
What does A3B mean in Qwen3.6-35B-A3B?
35B-A3B is a mixture-of-experts model with 35 billion total parameters and 3 billion active parameters per token. All expert weights still need to remain reachable in memory. Sparse activation reduces compute, but routing, shared layers and memory access do not make it equivalent to a dense 3B model.
Does Qwen3.6 support vision?
Yes, Ollama offers Qwen3.6 tags with text and image input. MLX tags may be text-only. Actual vision quality depends on the model variant, runtime and task; a blanket equivalence with Qwen2-VL is not established.
How big are the Qwen3.6 Ollama packages?
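Per the Ollama tag table above, qwen3.6:27b is about 17 GB and qwen3.6:35b about 24 GB; the text-only MLX tags are listed at 20 GB and 22 GB. The 35B-A3B packages are larger because all experts must be loaded. BF16 needs roughly four times the weight storage of Q4; 35B BF16 alone is around 70 GB before runtime and context overhead.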
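The BF16 figure follows from simple arithmetic: parameters times bits per weight, divided by 8 to get bytes. The sketch below works in decimal gigabytes. The 4.5 bits per weight used for Q4 is an assumption meant to account for scaling metadata, not a value published for these tags.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal GB: parameters x bits / 8."""
    return params_billion * bits_per_weight / 8


Q4_BITS = 4.5  # assumption: 4-bit weights plus per-block scaling metadata

for name, params in (("27B", 27), ("35B-A3B", 35)):
    q4 = weight_gb(params, Q4_BITS)
    bf16 = weight_gb(params, 16)
    print(f"{name}: Q4 ~{q4:.1f} GB, BF16 ~{bf16:.0f} GB")
```

Real packages can come out larger than the Q4 estimate, for example when some layers stay at higher precision or extra components such as a vision encoder are bundled, so treat the result as a lower bound.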
Qwen3.6 vs Qwen3.5 — what is new?
Qwen3.6 brings better coding and agent benchmarks per Alibaba, a larger family (27B Dense, 35B MoE), and longer context windows. If you use Qwen3 14B or 30B-A3B in production, you should test Qwen3.6-35B-A3B as a direct successor. For new setups, the jump from 14B to 27B is usually more worthwhile than from 27B to 35B-A3B, because RAM scales linearly while quality does not.