Local Models
Local language, vision and audio models tested on Apple Silicon, including Qwen3, Gemma 3, Llama and Mistral, with benchmarks and RAM requirements for M1–M4 Macs.
- Find the right model
- Set up each model
- Compare benchmarks
- Know the RAM needs
What counts as a local model?
Runs on your Mac
The model weights are downloaded and inference runs locally through Ollama, LM Studio, MLX, llama.cpp or a similar runtime.
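As a rough illustration of what "inference runs locally" looks like in practice, the sketch below sends a prompt to Ollama's HTTP API on its default local port. It assumes Ollama is running and that the model tag passed in has already been pulled. The helper names `build_generate_request` and `generate` are made up for this example.

```python
import json
import urllib.request

# Ollama's default local endpoint; nothing leaves the machine for this call.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str, url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming generate request for a locally running Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str) -> str:
    """Send the prompt and return the generated text (requires a running server)."""
    with urllib.request.urlopen(build_generate_request(model, prompt), timeout=120) as resp:
        return json.loads(resp.read())["response"]
```

With a model pulled, a call such as `generate("qwen3:8b", "Summarize unified memory in one sentence.")` would return the reply as a string. The tag is only an example; use whichever tag is installed.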
Open weights, not always open source
Many local models are open-weight, but their license may still restrict commercial use, redistribution or fine-tuning.
Memory matters
Model size is not total memory use. Context length, KV cache, quantization, vision input and other apps also affect unified memory.
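A back-of-the-envelope estimate shows why the download size alone is misleading. The sketch below adds quantized weights and a KV cache using the usual formula for a standard transformer (keys plus values, per layer, per KV head, per token). The model shape and the 4.5 bits per weight are illustrative assumptions, not measurements of a specific model, and runtime overhead, activations and vision encoders are ignored.

```python
def weights_bytes(n_params: int, bits_per_weight: float) -> int:
    """Approximate size of the quantized weights in bytes."""
    return int(n_params * bits_per_weight / 8)


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_tokens: int, bytes_per_value: int = 2) -> int:
    """KV cache size: 2 tensors (K and V) per layer, per KV head, per token."""
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_value


# Hypothetical 8B-parameter model at roughly 4.5 bits per weight (assumed),
# with 32 layers, 8 KV heads, head dimension 128 and a 16-bit KV cache.
weights = weights_bytes(8_000_000_000, 4.5)
kv_8k = kv_cache_bytes(32, 8, 128, context_tokens=8_192)
kv_16k = kv_cache_bytes(32, 8, 128, context_tokens=16_384)

print(f"weights:       {weights / 1e9:.2f} GB")
print(f"KV cache @8K:  {kv_8k / 1e9:.2f} GB")
print(f"KV cache @16K: {kv_16k / 1e9:.2f} GB")
print(f"total @8K:     {(weights + kv_8k) / 1e9:.2f} GB")
```

Under these assumptions the KV cache doubles when the context doubles while the weights stay fixed, which is why long contexts can push an otherwise comfortable model past the available unified memory.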
Privacy depends on configuration
Local inference can keep prompts on your Mac, but downloads, plugins, cloud features, exposed local servers and backups can still create data paths.
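One of these data paths, an exposed local server, can be checked with a few lines. The sketch below classifies a bind address as loopback-only or potentially reachable from the network. The function name `is_loopback_only` is hypothetical. It expects only the host part of the address (no port), for example the host from Ollama's `OLLAMA_HOST` setting, and it treats unknown hostnames as exposed to stay on the safe side.

```python
import ipaddress


def is_loopback_only(bind_host: str) -> bool:
    """Return True if a server bound to this host is reachable only from the Mac itself."""
    host = bind_host.strip().lower()
    if host == "localhost":
        return True
    try:
        # 127.0.0.0/8 and ::1 are loopback; 0.0.0.0 and LAN addresses are not.
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal: assume it may resolve to a network-facing interface.
        return False


for host in ("127.0.0.1", "::1", "0.0.0.0", "192.168.1.20"):
    status = "loopback only" if is_loopback_only(host) else "may be reachable from the network"
    print(f"{host}: {status}")
```

This only inspects the configured address. It does not detect port forwarding, tunnels or reverse proxies, which would need to be checked separately.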
Start here
Local model checklist
- Is the model actually downloadable?
- Does it have Ollama, GGUF, MLX or LM Studio support?
- Is it text-only, vision-capable, audio-capable, or multimodal?
- What license applies: open source, open weights, research-only or commercial?
- How much unified memory is realistic after context and KV cache?
- Does it need cloud features, API calls or online tools?
- Can you run it offline after download?
- Does it fit your task better than a smaller model?
- Gemma 4 12B on Mac: Does Google's New Model Really Work with 16 GB?
  Gemma 4 12B can run locally on Macs with 16 GB of unified memory or more and brings 256K context plus image and audio understanding. What actually works on Mac.
- Qwen3-ASR + Qwen3-TTS vs. Grok Voice: Local or Cloud?
  Qwen3-ASR, Qwen3-TTS and Grok Voice compared: ASR, TTS, voice agents, privacy and pricing.
- Laguna XS.2 on Mac: Coding Model, Benchmarks and RAM Limits
  Laguna XS.2 from Poolside scores 69.9% on SWE-bench Verified. What runs locally on Mac, which Ollama tags matter and where Qwen3.6 leads.
- Local Vision LLMs on Mac: Which Models Are Actually Worth It?
  Gemma 3, Qwen2.5-VL, Llama 3.2 Vision, and Moondream compared on Apple Silicon: OCR, screenshots, documents, benchmarks, RAM, and solid prompts.
- Small LLMs on Mac: Which Ones Are Worth It?
  Small local LLMs for Apple Silicon: Qwen3, Qwen3.5, Ollama, memory needs and practical settings.
- Best Local LLMs for Mac (2026): 16 GB, 24 GB, 32 GB & 64 GB Picks
  The best local LLMs for Mac in 2026, split by unified memory: practical Qwen3.6, Gemma 4 and Llama 4 choices for 16 GB to 64 GB+ Macs.
How local model recommendations are made
Local model recommendations on AI on Mac should treat model size, quantization, runtime, context length, Apple Silicon generation and unified memory as separate factors, because the same model can behave very differently when any one of them changes. A model that works on a 48 GB Mac Studio may be unrealistic on an 8 GB MacBook Air. The articles in this category should also distinguish between open source, open weights, cloud-only APIs and hybrid tools.
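One way to keep these factors apart is to record them as separate fields and only then ask whether a setup fits. The sketch below is illustrative: the `LocalSetup` record, the model name, the memory estimate and the 70% usable-memory fraction are all assumptions made for this example, not measured values or a rule used by any runtime.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LocalSetup:
    model: str                # model name (hypothetical here)
    quantization: str         # e.g. a 4-bit quantization label
    runtime: str              # Ollama, LM Studio, MLX, llama.cpp, ...
    context_tokens: int       # context length the setup is configured for
    chip: str                 # Apple Silicon generation
    unified_memory_gb: float  # total unified memory of the Mac
    estimated_need_gb: float  # weights + KV cache + overhead (estimated)


def fits(setup: LocalSetup, usable_fraction: float = 0.7) -> bool:
    """Check the estimate against the share of memory assumed free for the model.

    usable_fraction is an assumed headroom for macOS and other apps.
    """
    return setup.estimated_need_gb <= setup.unified_memory_gb * usable_fraction


studio = LocalSetup("example-32b", "4-bit", "llama.cpp", 16_384,
                    "M4 Max", unified_memory_gb=48, estimated_need_gb=30)
air = replace(studio, chip="M1", unified_memory_gb=8)

print(fits(studio))  # the 48 GB Mac Studio has room under these assumptions
print(fits(air))     # the 8 GB MacBook Air does not
```

Changing a single field, such as the context length or the quantization, changes `estimated_need_gb` and therefore the answer, which is the reason to report each factor separately.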