
Setting Up Ollama on Mac Mini M4 — Your Local AI Assistant

Local AI on Apple Silicon Macs: practical, honest, and privacy-first.

Want to run large language models directly on your Mac Mini M4? Then Ollama is exactly the right tool. The M4 chip gives you enough power to run a range of AI models locally, without any cloud connection. Here's how to get started in just a few steps.

TL;DR

  • Ollama installs in under 2 minutes via drag & drop
  • 16 GB RAM is sufficient for smaller models (Llama 3.2 1B/3B), 24 GB for more demanding ones
  • After installation, download a model with ollama pull <name>
  • Chat directly in the terminal with ollama run <name>
  • Enable API server for external tools: ollama serve

Who Is This For?

Ollama on the Mac Mini M4 makes sense for you if you:

  • Put privacy first (no data leaves your machine)
  • Care about cost efficiency (no ongoing API costs after the hardware purchase)
  • Want or need to work offline
  • Develop software and want to integrate local LLMs into your own tools

Less useful if you need maximum performance and cloud costs don't bother you: the M4 Mac Mini holds up well for its size, but a large cloud GPU cluster will always be faster.

Check Prerequisites

Before you start, make sure:

  • macOS Sonoma (14.x) or newer is installed
  • RAM: 16 GB is sufficient for smaller models (Llama 3.2 1B/3B, Phi-3), 24 GB enables larger 7B-8B variants (Llama 3.1 8B, Mistral 7B), 32 GB for the most demanding models
  • Storage: A model requires 2–20 GB of space depending on size
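
If you want to verify all three points from the terminal, the standard macOS commands below do the job (a quick check, no Ollama needed yet):

# macOS version (should report 14.x or newer)
sw_vers -productVersion

# installed RAM, converted from bytes to GB
echo $(($(sysctl -n hw.memsize) / 1024 / 1024 / 1024)) GB

# free space on the system volume
df -h /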

Install Ollama

Option 1: Official Download

  1. Go to ollama.com/download
  2. Download the macOS package (~180 MB)
  3. Open the .dmg file
  4. Drag the Ollama icon to your Applications folder

On first launch, no window appears. Ollama runs automatically in the background — you can recognize it by the icon in the menu bar.
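
To confirm that the command-line tool is available, check the version right away:

ollama --version

If this prints a version string, you are good to go.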

Option 2: Homebrew

brew install ollama
brew services start ollama

Advantage: Stay up to date with brew upgrade.
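
A minimal update routine, assuming you run Ollama as a Homebrew service:

brew upgrade ollama
brew services restart ollama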

Download Your First Model

Open the terminal and download a model:

ollama pull llama3.2

The first download takes a few minutes depending on your internet connection. Ollama stores all models locally under ~/.ollama/models/.
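
You can check at any time how much disk space your local models occupy:

du -sh ~/.ollama/models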

Model            RAM Requirement   Use Case
llama3.2:1b      ~1 GB             Quick tests, resource-efficient
llama3.2:3b      ~2 GB             Good all-round entry point
phi3:latest      ~2 GB             Compact, good quality
mistral:latest   ~4 GB             Well-balanced
codellama:7b     ~4 GB             Programming tasks

You can find more models at ollama.com/library.
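
Any entry from the table (or the library) can be pulled by its exact tag, for example the smallest Llama 3.2 variant:

ollama pull llama3.2:1b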

Start and Use a Model

ollama run llama3.2

From now on, you can chat directly in the terminal. End the session with /bye or Ctrl+D.
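
ollama run also works non-interactively: pass the prompt as an argument and the model answers once and exits, which is handy for quick scripts:

ollama run llama3.2 "Summarize the benefits of local LLMs in one sentence."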

Switch Between Models

ollama run mistral
ollama run codellama:7b

Every model you have downloaded with ollama pull is available immediately.
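
To keep an overview, list your local models and remove any you no longer need, for example:

ollama list
ollama rm codellama:7b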

Enable API Server

Ollama comes with a built-in API server, which lets you integrate local LLMs into other tools. If the menu bar app or the Homebrew service is already running, the server is already active; otherwise start it manually:

ollama serve

Either way, the server listens on http://localhost:11434 and accepts HTTP requests:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Explain Apple Silicon in two sentences.",
  "stream": false
}'
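
Besides /api/generate, Ollama also exposes a chat-style endpoint that takes a message history instead of a single prompt:

curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [{"role": "user", "content": "What makes the M4 chip special?"}],
  "stream": false
}'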

Important: The server is only reachable locally by default. Do not expose it to the network without protection.

Use Ollama with a WebUI

If you prefer a graphical interface, you can install a WebUI alongside Ollama. The best-known option is Open WebUI (formerly Ollama WebUI), a modern chat interface that feels like a local ChatGPT.

# Start Open WebUI (Docker); inside the container, "localhost" points at the
# container itself, so use host.docker.internal to reach Ollama on your Mac
docker run -d -p 3000:8080 \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  --name webui ghcr.io/open-webui/open-webui:main

Then open http://localhost:3000 in your browser.
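
If the page does not load, the container status and logs usually show what went wrong:

docker ps --filter name=webui
docker logs webui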

Common Errors and Fixes

”command not found: ollama”

The ollama binary is not in your PATH. If you installed the app, launch Ollama once from Applications so it can set up its command-line tool, then open a new terminal window; with Homebrew, check that brew install ollama completed. If the command exists but you get a connection error instead, the server is not running: start it via the app or with brew services start ollama.

Model won’t start — not enough RAM

Your Mac has insufficient free memory. Close other apps or switch to a smaller model.

Download aborts

Check your internet connection. Alternatively: continue the download with ollama pull <model> — Ollama resumes interrupted downloads.

Slow responses

The larger the model, the slower it responds on the M4. Check with ollama ps which model is loaded and how much memory it is using.
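
A typical sequence: see what is loaded, then unload a model you no longer need (ollama stop requires a reasonably recent Ollama version):

ollama ps
ollama stop llama3.2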

Tradeoffs — Honestly Considered

What’s good:

  • Complete privacy — no data leaves your machine
  • No ongoing costs after purchase
  • Simple installation and operation

What’s less good:

  • M4 Mac Mini is slower than an H100 cluster in the cloud
  • Models need to be downloaded and stored locally
  • Not every model runs optimally on Apple Silicon: Ollama builds on llama.cpp with Metal acceleration rather than Apple's native MLX framework
  • Updates and new models need to be fetched manually

Conclusion

Ollama on the Mac Mini M4 is the fastest way to try out local AI models. Installation takes no more than 5 minutes, and you’re productive immediately. For developers, privacy enthusiasts, and anyone who doesn’t want cloud dependency, the Mac Mini M4 + Ollama combination is a pragmatic entry point.

If you want to dive deeper, check out LM Studio as an alternative — there you get native MLX support and an even more comfortable interface.