
Qwen3.6 on Mac: 27B, 35B-A3B, Vision and Ollama

Run Qwen3.6 locally on Apple Silicon: 27B vs 35B-A3B, Ollama and MLX tags, vision, benchmarks and realistic RAM limits.

Technical research and editorial review. Original measurements are explicitly identified in the article.

Published: May 9, 2026 Updated: May 29, 2026


Qwen3.6 is useful for Mac users only if you keep the variants separate. There is a dense Qwen3.6-27B and an MoE model, Qwen3.6-35B-A3B; both come as Ollama tags with vision and as text-only MLX tags, each with a different memory footprint. I have been testing qwen3.6:27b on my Mac Mini M4 with 32 GB RAM, and it is currently my favorite all-rounder for coding and vision tasks.

This guide clears things up: which Qwen3.6 variant makes sense on Apple Silicon, which Ollama command is correct, when you get vision, when you only get text, and which benchmark scores actually belong to which model.

Figure: Qwen3.6 on Mac: choose by Ollama tag, vision, coding and unified memory

Original diagram based on the official Qwen model cards and the Ollama model page. Sources: Ollama Qwen3.6, Qwen3.6-27B Model Card, Qwen3.6-35B-A3B Model Card. Checked May 27, 2026.


Qwen3.6 — Facts as of May 2026

| Criteria | Qwen3.6-27B | Qwen3.6-35B-A3B |
| --- | --- | --- |
| Release | April 22, 2026 | April 14, 2026 |
| Architecture | Dense | Mixture of Experts |
| Parameters | 27B | 35B total / 3B active |
| Active per token | all 27B | 3B active |
| Vision | yes, with vision-capable tag/runtime | yes |
| Context | 262,144 native, up to ~1,010,000 extended | 262,144 native, up to ~1,010,000 extended |
| License | Apache 2.0 / Open Weights | Apache 2.0 / Open Weights |
| Strength | Dense profile, vision, local Ollama use | MoE, agentic coding, vision |
| Mac suitability | realistic as a quantized Ollama/MLX tag on 24-32 GB | better with 32 GB+, depending on quantization and context |

Qwen3.6-27B vs Qwen3.6-35B-A3B — Not the Same

Qwen3.6-27B is a dense model: all 27B parameters activate per token. It is simpler to quantize, stable in local use and a good entry point on Apple Silicon.

Qwen3.6-35B-A3B is a Mixture-of-Experts model: 35B total parameters, but only 3B activate per token. It is more efficient at inference than a dense model of the same total size, but it requires more care around tag selection, runtime, context length and vision support.
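The difference between the two architectures comes down to simple arithmetic. The sketch below uses only the parameter counts quoted above; the function name `active_fraction` is made up for illustration. It shows how much of each model does work per token, which is a statement about compute, not about how much memory the weights occupy.

```python
def active_fraction(total_b: float, active_b: float) -> float:
    """Share of the model's parameters that are used for each token."""
    return active_b / total_b


# (total parameters, active parameters per token), both in billions
models = {
    "Qwen3.6-27B": (27.0, 27.0),      # dense: everything is active
    "Qwen3.6-35B-A3B": (35.0, 3.0),   # MoE: only the routed experts are active
}

for name, (total_b, active_b) in models.items():
    share = active_fraction(total_b, active_b)
    print(f"{name}: {active_b:g}B of {total_b:g}B active per token ({share:.1%})")
```

The MoE model touches less than a tenth of its weights per token, yet all 35B parameters still have to be loaded, which is why its tag is larger than the dense 27B tag.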


Which Ollama Tag Should I Use on Mac?

Per Ollama Library (as of May 2026):

| Goal | Recommended tag | Why |
| --- | --- | --- |
| Vision + local use | qwen3.6:27b | 17 GB, Text + Image, lower barrier to entry |
| Larger vision/MoE variant | qwen3.6:35b | 24 GB, Text + Image, better on 32 GB+ |
| MLX/Coding without vision | qwen3.6:27b-mlx | 20 GB, MLX, Text-only |
| MoE/MLX without vision | qwen3.6:35b-mlx | 22 GB, MLX, Text-only |
| Maximum quality / BF16 | BF16 tags | only for very large Macs/workstations |

Important: The MLX tags are listed as text-only on Ollama. If you need vision, do not automatically use an MLX tag. Check in the Ollama library whether the specific tag supports text, image, or text only.

Also important: ollama run qwen3.6 uses the current default/latest tag. If you specifically want the lighter local starting point, name the tag explicitly: qwen3.6:27b.
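The selection logic from the tag table can be written down as a small filter. This is a minimal sketch, not an official tool: the sizes and modalities are copied from the table above, while the function name `fitting_tags` and the 4 GB default headroom for macOS and other apps are assumptions you should adjust.

```python
# (tag, listed size in GB, accepts image input) as given in the tag table
TAGS = [
    ("qwen3.6:27b", 17, True),
    ("qwen3.6:35b", 24, True),
    ("qwen3.6:27b-mlx", 20, False),
    ("qwen3.6:35b-mlx", 22, False),
]


def fitting_tags(needs_vision: bool, unified_memory_gb: int,
                 headroom_gb: int = 4) -> list[str]:
    """Return the tags that fit the memory budget, smallest first."""
    budget = unified_memory_gb - headroom_gb  # hypothetical reserve for the OS
    fits = [
        (size, tag)
        for tag, size, vision in TAGS
        if size <= budget and (vision or not needs_vision)
    ]
    return [tag for _, tag in sorted(fits)]


print(fitting_tags(needs_vision=True, unified_memory_gb=24))
print(fitting_tags(needs_vision=False, unified_memory_gb=32))
```

The filter only compares listed download sizes against a budget. It ignores context length and KV cache, so treat an empty or short result as a warning sign, not as a precise limit.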


Setup with Ollama

Install Ollama

brew install ollama

Download a model

# Vision-capable 27B entry point (Text + Image)
ollama pull qwen3.6:27b

# Larger 35B-A3B/MoE variant (Text + Image)
ollama pull qwen3.6:35b

# MLX tag — Text-only, NOT vision-capable
ollama pull qwen3.6:27b-mlx

# MoE/MLX tag — Text-only
ollama pull qwen3.6:35b-mlx

Start a model

# Vision variant
ollama run qwen3.6:27b

# MLX variant
ollama run qwen3.6:27b-mlx
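Before scripting against a tag, it can help to confirm that it was actually pulled. The sketch below assumes Ollama is running on its default address and uses its `/api/tags` endpoint, which lists local models; the helper names `local_tags` and `fetch_local_tags` are illustrative, and the sample response is hand-written, not real output.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local address


def local_tags(tags_response: dict) -> set[str]:
    """Extract the model names from a parsed /api/tags response."""
    return {model["name"] for model in tags_response.get("models", [])}


def fetch_local_tags(base_url: str = OLLAMA_URL) -> set[str]:
    """Ask a running Ollama server which models are already pulled."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return local_tags(json.load(resp))


# Hand-written sample in the shape of an /api/tags response (not real output)
sample = {"models": [{"name": "qwen3.6:27b"}, {"name": "qwen3.6:27b-mlx"}]}
wanted = "qwen3.6:35b"
if wanted not in local_tags(sample):
    print(f"{wanted} is missing; run: ollama pull {wanted}")
```

With a server running, replace `local_tags(sample)` with `fetch_local_tags()` to check the real state of your machine.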

Vision: Only with the Right Tag

Qwen3.6 supports vision — but not every Ollama tag is vision-capable. qwen3.6:27b and qwen3.6:35b are listed as Text + Image. The MLX tags are listed as Text-only. I accidentally tried to use the MLX tag for image analysis and spent 20 minutes wondering why it was not working — so save yourself the trouble and check the tag before you start.

Image with a vision-capable variant:

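import base64
from openai import OpenAI

# Ollama exposes an OpenAI-compatible API; the client requires an api_key, but Ollama ignores it.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

# Send the image inline as a base64 data URL. Ollama's OpenAI-compatible endpoint
# documents base64 image content; remote image URLs may not be fetched.
with open("image.png", "rb") as f:  # placeholder path, use your own image
    image_b64 = base64.b64encode(f.read()).decode("ascii")

response = client.chat.completions.create(
    model="qwen3.6:27b",  # or qwen3.6:35b; the MLX tags are text-only
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            {"type": "text", "text": "What does this image show?"}
        ]
    }]
)
print(response.choices[0].message.content)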
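If you would rather not install the OpenAI client, Ollama's native `/api/chat` endpoint accepts images as a list of base64 strings on the message. The following sketch only builds the request body; the function name `build_vision_request` and the `-mlx` suffix check are assumptions based on the tag names in this article, not an Ollama feature.

```python
import base64


def build_vision_request(model: str, image_bytes: bytes, question: str) -> dict:
    """Build a JSON body for POST /api/chat with one inline image."""
    if model.endswith("-mlx"):
        # The MLX tags are listed as text-only, so fail before sending anything.
        raise ValueError(f"{model} is listed as text-only; use a Text + Image tag")
    return {
        "model": model,
        "stream": False,  # return one complete response instead of chunks
        "messages": [{
            "role": "user",
            "content": question,
            # Ollama expects raw base64 strings here, without a data: prefix
            "images": [base64.b64encode(image_bytes).decode("ascii")],
        }],
    }


body = build_vision_request("qwen3.6:27b", b"placeholder image bytes",
                            "What does this image show?")
print(sorted(body["messages"][0].keys()))
```

Serialize the body with `json.dumps` and POST it to `http://localhost:11434/api/chat`. Failing early on a text-only tag avoids the confusing case where the request goes through but the image is never used.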

Thinking Used Correctly — Without the Thinking-Trace Hype

Qwen3.6 is trained for thinking workflows. But you should not blindly force long thinking traces:

  • For normal responses, a brief explanation is enough.
  • For agents, preserve_thinking matters more: with it, the model uses its previous thinking and work context more consistently across steps.
  • enable_thinking: false disables thinking on supported runtimes.

In Ollama without special API parameters: simply ask for a brief explanation and the result, rather than asking the model to output its full reasoning chain.
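The prompt-only approach can be sketched as a request builder that asks for a short explanation in a system message. The optional `think` field is an assumption: recent Ollama versions document it for thinking models on `/api/chat`, but whether a given Qwen3.6 tag honors it depends on your Ollama version. The name `build_chat_request` and the wording of the system prompt are illustrative.

```python
from typing import Optional

BRIEF_STYLE = (
    "Give a brief explanation followed by the result. "
    "Do not output your full reasoning chain."
)


def build_chat_request(model: str, question: str,
                       think: Optional[bool] = None) -> dict:
    """Build a JSON body for POST /api/chat that asks for a short answer."""
    body = {
        "model": model,
        "stream": False,
        "messages": [
            {"role": "system", "content": BRIEF_STYLE},
            {"role": "user", "content": question},
        ],
    }
    if think is not None:
        # Assumption: the installed Ollama version accepts a top-level "think" flag.
        body["think"] = think
    return body


request = build_chat_request("qwen3.6:27b", "Why is my regex so slow?")
print(request["messages"][0]["content"])
```

Leaving `think` unset keeps the request valid on runtimes that do not know the field; passing `think=False` is the explicit variant to try where it is supported.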


Benchmarks: Which Numbers Belong to Which Model?

Many values use special agent scaffolds, long context windows or tool setups — not directly transferable to your Mac.

Qwen3.6-35B-A3B, per Qwen (April 2026)

| Benchmark | Score | Interpretation |
| --- | --- | --- |
| AIME 2026 | 92.7 % | full AIME 2026 I & II, per Qwen |
| MMLU-Pro | 85.2 % | knowledge/reasoning benchmark |
| LiveCodeBench v6 | 80.4 % | coding |
| SWE-bench Verified | 73.4 % | agentic coding, internal scaffold |
| Terminal-Bench 2.0 | 51.5 % | 5 runs, 256K context, special harness |
| MMMU | 81.7 % | vision/multimodal |
| MathVista mini | 86.4 % | visual mathematical reasoning |

Qwen3.6-27B

The 27B scores come from the separate Qwen3.6-27B publication (April 22, 2026) and must not be mixed with the 35B-A3B table.

Note: Many Qwen benchmarks use special agent scaffolds, long context windows, multiple runs and tool setups. They are useful for orientation, but they do not guarantee the same performance on a Mac with a quantized Ollama tag.


RAM / Unified Memory Recommendations

The Ollama model size is only a rough indicator. Actual memory use depends on quantization, context length, KV cache, runtime and other apps.

| Unified memory | Recommendation |
| --- | --- |
| 16 GB | Smaller Qwen3 models or heavily quantized 27B experiments; Qwen3.6 does not run comfortably |
| 24 GB | qwen3.6:27b or 27b-mlx realistic, but context and parallel apps limit it |
| 32 GB | Good sweet spot for 27B and cautious 35B-A3B use |
| 48 GB+ | Much more comfortable for 35B-A3B, vision and longer context |
| 64 GB+ | BF16 and high context more realistic, still runtime-dependent |

Speed on Mac depends heavily on Mac model, RAM, context, quantization, Ollama/MLX version, prompt length and vision.
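To see why context length matters as much as the download size, here is a rough budget sketch. It uses the standard full-attention KV cache formula (keys and values per layer, per KV head, per token). The layer count, head count and head dimension in the example are hypothetical placeholders, not Qwen3.6's real architecture, and models with hybrid or linear attention layers can need less than this formula suggests.

```python
def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                context_tokens: int, bytes_per_value: int = 2) -> float:
    """Approximate KV cache size in GB for a full-attention transformer."""
    # Factor 2: one tensor for keys and one for values
    total_bytes = (2 * n_layers * n_kv_heads * head_dim
                   * context_tokens * bytes_per_value)
    return total_bytes / 1e9


def fits(model_gb: float, kv_gb: float, unified_memory_gb: float,
         reserved_gb: float = 4.0) -> bool:
    """Check weights + KV cache + a reserve for macOS and other apps."""
    return model_gb + kv_gb + reserved_gb <= unified_memory_gb


# Hypothetical architecture values, for illustration only
for context in (8_192, 32_768):
    kv = kv_cache_gb(n_layers=48, n_kv_heads=8, head_dim=128, context_tokens=context)
    print(f"context {context}: KV cache ~{kv:.2f} GB, "
          f"17 GB tag fits in 24 GB: {fits(17, kv, 24)}")
```

Under these assumptions, the same 17 GB tag that fits a 24 GB Mac at a short context stops fitting once the context grows, which matches the table's caveat for the 24 GB row.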


Qwen3.6 vs Alternatives on Mac

| Model | Local on Mac? | Vision? | Strength | Note |
| --- | --- | --- | --- | --- |
| Qwen3.6-27B | yes, realistic when quantized | yes, with the right tag | Coding, vision, dense 27B all-rounder | needs 24-32 GB to be comfortable |
| Qwen3.6-35B-A3B | yes, but prefer 32 GB+ | yes | MoE, agentic coding, long contexts | check tags/quantization carefully |
| Gemma 3 27B | yes | yes | Vision + solid local quality | older, different benchmarks |
| Qwen3 14B/32B | yes | mostly text-only, depending on tag | lighter/faster | less agent focus than Qwen3.6 |
| Cloud models | no (not local) | varies by provider | maximum quality | privacy, cost and API dependence |

FAQ

Is Qwen3.6-27B the same as Qwen3.6-35B-A3B? No. 27B is a dense model, 35B-A3B is a MoE model with 35B total and 3B active per token.

Which Qwen3.6 tag should I install on my Mac? For vision: qwen3.6:27b as entry point. For MLX/text without vision: qwen3.6:27b-mlx. For more RAM/experimentation: qwen3.6:35b or qwen3.6:35b-mlx.

Does qwen3.6:27b-mlx have vision? Per Ollama, this tag is Text-only. For images, use a Text+Image tag like qwen3.6:27b or qwen3.6:35b.

Is 24 GB unified memory enough? For the 27B tag it can work with limited context, but it gets tight once you add other apps, vision input or a large context. 32 GB is more relaxed, and 48 GB is better for 35B-A3B.

Are the Qwen benchmarks directly transferable to my Mac? No. Many scores come from special agent/server setups. On Mac, quantization, context length, runtime and memory strongly affect real performance.


Bottom Line

Qwen3.6 is useful for Mac users, but only if the variants are kept separate. Qwen3.6-27B is the more practical local starting point, especially with quantized Ollama or MLX tags. Qwen3.6-35B-A3B is relevant as a MoE model, but it requires more care around tags, runtime, vision support and memory.

The key rule: do not blindly install the tag with the most tempting name. Check whether you need vision, how much unified memory is free, whether the tag is text-only and whether the benchmark you cite actually belongs to that model variant. On my Mac Mini M4, qwen3.6:27b with the Text+Image tag hits the right balance between capability and memory usage.


Sources and Disclaimer

Checked on May 27, 2026. Qwen3.6 evolves quickly; Ollama tags, MLX quantizations and benchmark tables may change. Benchmark values come predominantly from Qwen’s own publications and should not be transferred unchecked to quantized Mac setups.

Frequently Asked Questions

Which Qwen3.6 variant is the right one for my Mac?

A rough guide by unified memory:

  • 16 GB: no Qwen3.6 variant runs comfortably; pick Qwen3 8B or Gemma 3 4B instead.
  • 24 GB: 27B-Dense with Q4 quantization works with short context.
  • 32-48 GB: 27B-Dense is comfortable; 35B-A3B (MoE) is also an option and generates faster per token because it activates fewer parameters.
  • 64 GB+: 35B-A3B with full 256K context. On an M4 Pro with 64 GB, 35B-A3B is the sweet spot for 2026.

What does A3B mean in Qwen3.6-35B-A3B?

35B-A3B is a mixture-of-experts model with 35 billion total parameters and 3 billion active parameters per token. All expert weights still need to remain reachable in memory. Sparse activation reduces compute, but routing, shared layers and memory access do not make it equivalent to a dense 3B model.

Does Qwen3.6 support vision?

Yes, Ollama offers Qwen3.6 tags with text and image input. MLX tags may be text-only. Actual image quality depends on the model variant, runtime and task; a blanket equivalence with Qwen2-VL is not established.

How big are the Qwen3.6 Ollama packages?

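Per the Ollama library, qwen3.6:27b is listed at about 17 GB and qwen3.6:35b at about 24 GB; the text-only MLX tags are about 20 GB (27b-mlx) and 22 GB (35b-mlx). The 35B-A3B tags are larger because all experts must be loaded. BF16 needs roughly four times the weight storage of a 4-bit quantization; 35B in BF16 alone is around 70 GB before runtime and context overhead.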
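The BF16 estimate above follows from parameter count times bits per weight. A minimal sketch of that arithmetic is shown here; the function name `weight_gb` is illustrative, and real quantized files deviate from the pure 4-bit figure because quantization formats store scales and keep some tensors at higher precision.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Raw weight storage in GB: parameters x bits per weight / 8 bits per byte."""
    return params_billion * bits_per_weight / 8


for name, params in (("27B", 27.0), ("35B-A3B", 35.0)):
    bf16 = weight_gb(params, 16)  # BF16 stores 16 bits per weight
    q4 = weight_gb(params, 4)     # idealized 4-bit quantization, no overhead
    print(f"{name}: BF16 ~{bf16:.1f} GB, pure 4-bit ~{q4:.1f} GB, ratio {bf16 / q4:.0f}x")
```

The idealized 4-bit numbers come out below the listed tag sizes, which is expected: they are a floor for the weights alone, not a prediction of the download size.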

Qwen3.6 vs Qwen3.5 — what is new?

Qwen3.6 brings better coding and agent benchmarks per Alibaba, a larger family (27B Dense, 35B MoE), and longer context windows. If you use Qwen3 14B or 30B-A3B in production, you should test Qwen3.6-35B-A3B as a direct successor. For new setups, the jump from 14B to 27B is usually more worthwhile than the jump from 27B to 35B-A3B, because memory requirements keep growing with parameter count while quality gains do not grow at the same rate.