Whisper on Mac: Local Transcription Without Cloud
Whisper locally on Apple Silicon: mlx-whisper, WhisperKit, privacy and speaker diarization.
Whisper can transcribe audio files, interviews, meetings, or voice memos locally. On Apple Silicon, mlx-whisper is the easiest entry point.
Quick start
Start with mlx-whisper and the small or medium model. Use large-v3 when quality matters more than speed. For speaker diarization, add pyannote.audio as a separate step.
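To make the quick start concrete, here is a minimal sketch of a transcription call. It assumes mlx-whisper is installed (pip install mlx-whisper) and that ffmpeg is available for audio decoding. The helper name transcribe_file and the Hugging Face repo names in MODEL_REPOS are my assumptions for illustration; check the mlx-community page for the exact repo you want.

```python
from pathlib import Path

# Assumed Hugging Face repo names for MLX-converted Whisper weights.
# Verify them on the mlx-community page before relying on them.
MODEL_REPOS = {
    "small": "mlx-community/whisper-small-mlx",
    "medium": "mlx-community/whisper-medium-mlx",
    "large-v3": "mlx-community/whisper-large-v3-mlx",
}


def transcribe_file(audio_path, model="medium", language=None):
    """Transcribe one audio file locally and return the recognized text.

    `transcribe_file` is a hypothetical helper, not part of mlx-whisper.
    """
    if model not in MODEL_REPOS:
        raise ValueError(f"unknown model {model!r}, expected one of {sorted(MODEL_REPOS)}")

    path = Path(audio_path)
    if not path.is_file():
        raise FileNotFoundError(path)

    # Imported lazily so the helper can be loaded on machines without MLX.
    import mlx_whisper

    result = mlx_whisper.transcribe(
        str(path),
        path_or_hf_repo=MODEL_REPOS[model],
        language=language,  # None lets Whisper detect the language
    )
    return result["text"].strip()


if __name__ == "__main__":
    import sys

    # Usage: python transcribe.py interview.m4a
    print(transcribe_file(sys.argv[1], model="medium"))
```

The first run downloads the model weights, so that one step does need a network connection. After that, transcription itself runs on the local machine.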
Model choice
tiny (~75 MB): only for very short clips or pre-filtering.
base (~140 MB): fast, but inaccurate.
small (~460 MB): good for real-time streaming.
medium (~1.5 GB): my go-to for most tasks.
large-v3 (~3 GB): maximum accuracy, especially for German.
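The list above can be turned into a small selection helper. The sketch below picks the largest model that fits a given memory budget. The sizes are the approximate figures from the list; the headroom factor of 2 is my assumption to leave room for activations and audio buffers, not a measured value.

```python
# Approximate model sizes in MB, taken from the list above, smallest first.
MODEL_SIZES_MB = [
    ("tiny", 75),
    ("base", 140),
    ("small", 460),
    ("medium", 1500),
    ("large-v3", 3000),
]


def pick_model(budget_mb, headroom=2.0):
    """Return the largest model whose size times `headroom` fits the budget.

    `headroom` is an assumed safety factor, not a measured requirement.
    Returns None when even the tiny model does not fit.
    """
    best = None
    for name, size_mb in MODEL_SIZES_MB:
        if size_mb * headroom <= budget_mb:
            best = name  # list is sorted, so later matches are larger models
    return best


if __name__ == "__main__":
    for budget in (100, 1000, 4000, 8000):
        print(budget, "MB ->", pick_model(budget))
```

A budget-based pick is only a starting point. If accuracy matters more than speed, choosing large-v3 by hand is still the better call.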
What I tested
I ran mlx-whisper on my Mac Mini M4 with different models. Here’s what I noticed:
medium is the sweet spot. For interviews and meetings, it delivers usable transcripts. Speed is acceptable.
large-v3 is noticeably better, especially for German with dialect. But it takes more time and memory.
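If you want to put a number on "acceptable speed" for your own hardware, one common measure is the real-time factor: processing time divided by audio length. The helper names below are hypothetical, and the sketch only measures wall-clock time around whatever transcription function you pass in.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds) using a monotonic clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def real_time_factor(processing_seconds, audio_seconds):
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


if __name__ == "__main__":
    # Stand-in workload; replace with your own transcription call.
    _, elapsed = timed(sum, range(1_000_000))
    print(f"elapsed: {elapsed:.4f} s")
    # Example: 60 s of processing for a 120 s recording.
    print(real_time_factor(60, 120))
```

Running the same recording through medium and large-v3 and comparing the two factors gives a like-for-like view of the speed trade-off.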
Important: Privacy
Whisper itself doesn’t send data to the cloud, and tools like MacWhisper or WhisperKit also run offline. But some commercial apps forward audio to cloud APIs. Check network activity with Little Snitch.
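If you don't have Little Snitch, a rough alternative is to list the open internet connections of a running process with lsof, which ships with macOS. The sketch below is a swapped-in technique, not what I used for this article: it takes a single snapshot, so it can miss short-lived connections that a firewall monitor would catch. The function names are hypothetical.

```python
import subprocess


def lsof_command(pid):
    """Build an lsof call listing internet sockets (-i) of one process (-p).

    -a ANDs the two filters; -n and -P skip host and port name lookups.
    """
    return ["lsof", "-nP", "-a", "-i", "-p", str(pid)]


def remote_endpoints(lsof_output):
    """Extract remote 'host:port' endpoints from lsof text output."""
    endpoints = set()
    for line in lsof_output.splitlines()[1:]:  # first line is the header
        for field in line.split():
            if "->" in field:  # connected sockets look like local->remote
                endpoints.add(field.split("->", 1)[1])
    return sorted(endpoints)


def check_process(pid):
    """Return the remote endpoints a process is connected to right now."""
    completed = subprocess.run(
        lsof_command(pid), capture_output=True, text=True, check=False
    )
    return remote_endpoints(completed.stdout)


if __name__ == "__main__":
    import sys

    # Usage: python netcheck.py <pid of the transcription app>
    print(check_process(int(sys.argv[1])) or "no remote connections in this snapshot")
```

An empty result during transcription is a good sign, but not proof. For anything sensitive, a persistent monitor remains the more reliable check.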
Speaker diarization
Standard Whisper doesn’t support diarization. For speaker labels, add pyannote.audio. It runs locally, but adds another model pass (~1-2 GB).
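Combining the two outputs comes down to matching timestamps. The sketch below assumes you already have Whisper segments (dicts with start, end and text, as found in the "segments" list of the transcription result) and speaker turns as (start, end, speaker) tuples, which you could build from pyannote's itertracks(yield_label=True). The function assign_speakers is hypothetical, and picking the speaker with the largest time overlap is one simple strategy among several.

```python
def assign_speakers(segments, turns):
    """Label each transcript segment with the speaker who overlaps it most.

    segments: list of dicts with "start", "end", "text" (seconds).
    turns:    list of (start, end, speaker) tuples from a diarization pass.
    Segments with no overlapping turn get speaker None.
    """
    labeled = []
    for seg in segments:
        best_speaker, best_overlap = None, 0.0
        for turn_start, turn_end, speaker in turns:
            # Length of the intersection of the two time intervals.
            overlap = min(seg["end"], turn_end) - max(seg["start"], turn_start)
            if overlap > best_overlap:
                best_speaker, best_overlap = speaker, overlap
        labeled.append({**seg, "speaker": best_speaker})
    return labeled


if __name__ == "__main__":
    segments = [
        {"start": 0.0, "end": 4.0, "text": "Welcome to the interview."},
        {"start": 4.0, "end": 9.0, "text": "Thanks for having me."},
    ]
    turns = [(0.0, 4.5, "SPEAKER_00"), (4.5, 10.0, "SPEAKER_01")]
    for seg in assign_speakers(segments, turns):
        print(f'{seg["speaker"]}: {seg["text"]}')
```

Segment-level matching can mislabel a segment in which two people speak. Word-level timestamps would give finer alignment at the cost of more bookkeeping.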
My verdict
In my testing, Whisper is the best local speech recognition option for the Mac. Start with mlx-whisper and the medium model, and upgrade to large-v3 if you need more accuracy.
Tested June 2026 on Mac Mini M4 with 32 GB.
Transparency
Sources and review basis
These primary and reference sources form the basis of the technical assessment. Vendor claims and external benchmarks are identified as such in the article.