Running Large Language Models Locally on macOS with oMLX

2026-05-02

#macOS #AI

1 Introduction

Compared with ollama, LM Studio and similar tools, oMLX is optimized specifically for macOS, and local models respond noticeably faster with it.

As the official website puts it:

oMLX: macOS-native MLX server with smart caching. Claude Code, OpenClaw, and Cursor respond in 5 seconds, not 90.

2 Hardware Configurations

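Total memory: 32 GB; memory available to the GPU: 23.0 GB (about 9 GB reserved for the system kernel and background processes)
Total memory: 64 GB; memory available to the GPU: 50.4 GB (about 13.6 GB reserved for the system kernel and background processes)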

3 Installation

Download: https://omlx.ai
Requirements: Apple Silicon / macOS 15+
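Before downloading, you can check both requirements from a terminal. The Python sketch below is one way to do it, using only the standard `platform` module; the helper names `parse_major` and `meets_requirements` are hypothetical. One assumption to keep in mind: an x86_64 Python running under Rosetta reports `x86_64` even on an Apple Silicon Mac, so use a native arm64 Python.

```python
import platform


def parse_major(version: str) -> int:
    """Return the major macOS version, e.g. "15.1.1" -> 15; "" (non-macOS) -> 0."""
    head = version.split(".")[0]
    return int(head) if head.isdigit() else 0


def meets_requirements(system: str, machine: str, mac_version: str, min_major: int = 15) -> bool:
    """Apple Silicon Macs report system "Darwin" and machine "arm64"."""
    return system == "Darwin" and machine == "arm64" and parse_major(mac_version) >= min_major


if __name__ == "__main__":
    # platform.mac_ver()[0] is the macOS version string, or "" on other systems
    ok = meets_requirements(platform.system(), platform.machine(), platform.mac_ver()[0])
    print("OK" if ok else "Requirement not met: Apple Silicon + macOS 15+")
```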

4 Resource Estimation

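General rule: the model's memory footprint (weights plus the KV cache used during inference) must not exceed the memory available to the GPU.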

Recommended: MLX-format models on Hugging Face optimized for Apple Silicon, such as the mlx-community series, for example:

DeepSeek-R1-0528-Qwen3-8B-MLX-8bit
DeepSeek-R1-Distill-Llama-70B-4bit
GLM-4.7-Flash-MLX-8bit
Huihui-Qwen3.6-35B-A3B-Claude-4.7-Opus-abliterated-mlx-8bit
Llama-3.3-70B-Instruct-4bit
MLX-Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-8bit
Phi-4-reasoning-plus-MLX-4bit
Qwen2.5-72B-Instruct-4bit
Qwen2.5-Coder-32B-Instruct-MLX-8bit
Qwen3-Coder-30B-A3B-Instruct-MLX-4bit
Qwen3-Coder-Next-MLX-4bit
Qwen3.5-27B-Claude-4.6-Opus-Distilled-MLX-6bit
Qwen3.6-35B-A3B-4bit
gemma-4-31b-it-4bit
gpt-oss-20b-MXFP4-Q8
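To apply the general rule to the models above, a common back-of-the-envelope estimate is weight size ≈ parameter count × bits per weight / 8. The Python sketch below does that and compares the result against the two available-memory figures from the hardware section. The 20% `overhead` margin for KV cache, activations and quantization scales is a hypothetical assumption, not an oMLX figure; parameter counts and bit widths are read from the model names. For MoE models such as the "35B-A3B" ones, all experts are loaded, so the total parameter count (35B), not the active count (3B), determines memory.

```python
# Memory available to the GPU, in GB, from the hardware configurations above
AVAILABLE_GB = {"32 GB Mac": 23.0, "64 GB Mac": 50.4}


def weight_gb(params_billion: float, bits: float) -> float:
    """Approximate weight size: params x bits / 8 bytes (1e9 params x 1 byte = 1 GB)."""
    return params_billion * bits / 8


def fits(params_billion: float, bits: float, available_gb: float, overhead: float = 1.2) -> bool:
    """Hypothetical heuristic: weights plus a flat margin must fit in available memory."""
    return weight_gb(params_billion, bits) * overhead <= available_gb


# (model name, total parameters in billions, bits per weight), taken from the names
MODELS = [
    ("DeepSeek-R1-0528-Qwen3-8B-MLX-8bit", 8, 8),
    ("Qwen3-Coder-30B-A3B-Instruct-MLX-4bit", 30, 4),
    ("Qwen3.5-27B-Claude-4.6-Opus-Distilled-MLX-6bit", 27, 6),
    ("Qwen3.6-35B-A3B-4bit", 35, 4),
    ("Llama-3.3-70B-Instruct-4bit", 70, 4),
    ("Qwen2.5-72B-Instruct-4bit", 72, 4),
]

if __name__ == "__main__":
    for name, params, bits in MODELS:
        needed = weight_gb(params, bits) * 1.2
        verdict = ", ".join(
            f"{mac}: {'yes' if fits(params, bits, gb) else 'no'}" for mac, gb in AVAILABLE_GB.items()
        )
        print(f"{name}: ~{needed:.1f} GB -> {verdict}")
```

By this heuristic, the 70B and 72B 4-bit models need roughly 42 to 43 GB and only fit the 64 GB machine, while the 27B 6-bit model (about 24 GB) is just over the 23.0 GB limit of the 32 GB machine.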