Benchmarking local LLMs on macOS with 32 GB and 64 GB of memory
1 Overview
oMLX - LLM inference, optimized for your Mac https://github.com/jundot/omlx
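The benchmarks were run on two macOS machines, one with 32 GB and one with 64 GB of memory:
- Total memory: 32 GB; available VRAM: 23.0 GB (about 9 GB reserved for the kernel and background processes)
- Total memory: 64 GB; available VRAM: 50.4 GB (about 13.6 GB reserved for the kernel and background processes)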
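
The reserved figures follow from subtracting available VRAM from total memory. Below is a minimal Python sketch of that arithmetic, plus a rough check of whether a model's peak memory fits on each machine. The helper names are hypothetical, and comparing peak memory directly against available VRAM is a simplifying assumption that ignores any extra headroom the system may need. The 43.95 GB figure is the Peak Mem reported for Qwen3-Coder-Next-MLX-4bit at pp4096/tg128 in the detailed 64 GB results.

```python
# Total memory and available VRAM (GB) for the two test machines.
DEVICES = {
    "32 GB": {"total": 32.0, "available": 23.0},
    "64 GB": {"total": 64.0, "available": 50.4},
}


def reserved_gb(total: float, available: float) -> float:
    """Memory held back for the kernel and background processes."""
    return total - available


def fits(peak_mem_gb: float, available_gb: float) -> bool:
    """Rough check (assumption): a model fits if its peak memory is within available VRAM."""
    return peak_mem_gb <= available_gb


# Peak Mem of Qwen3-Coder-Next-MLX-4bit at pp4096/tg128, from the 64 GB results.
PEAK_MEM_GB = 43.95

for name, mem in DEVICES.items():
    reserved = reserved_gb(mem["total"], mem["available"])
    verdict = fits(PEAK_MEM_GB, mem["available"])
    print(f"{name}: reserved {reserved:.1f} GB, Qwen3-Coder-Next fits: {verdict}")
```

This is consistent with Qwen3-Coder-Next-MLX-4bit appearing only in the 64 GB results.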
2 Conclusions
Ideally, the single-request E2E(s) time should stay within 5 s; at that level responses feel smooth.
Results with 32 GB total memory (single request, pp1024/tg128):

| Model | E2E(s) |
|---|---|
| Qwen3.6-35B-A3B-4bit | 3.240s |
| Qwen3.6-27B-4bit | 15.251s |
| Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | 3.375s |
| gemma-4-26B-A4B-it-MLX-4bit | 3.745s |

Results with 64 GB total memory (single request, pp1024/tg128):

| Model | E2E(s) |
|---|---|
| Qwen3.6-35B-A3B-4bit | 2.837s |
| Qwen3.6-35B-A3B-6bit | 3.367s |
| Qwen3.6-35B-A3B-nvfp4 | 2.990s |
| Qwen3.6-35B-A3B-8bit | 3.615s |
| DeepSeek-R1-Distill-Llama-70B-4bit | 2.990s |
| GLM-4.7-Flash-MLX-8bit | 4.673s |
| Huihui-Qwen3.6-35B-A3B-Claude-4.7-Opus-abliterated-mlx-8bit | 3.627s |
| Llama-3.3-70B-Instruct-4bit | 42.217s |
| MLX-Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-8bit | 3.625s |
| Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | 2.944s |
| Qwen3-Coder-Next-MLX-4bit | 3.747s |
| Qwen3.5-27B-Claude-4.6-Opus-Distilled-MLX-6bit | 20.165s |
| gemma-4-31b-it-4bit | 20.325s |
| gpt-oss-20b-MXFP4-Q8 | 3.485s |
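
To see which models meet the 5 s target stated above, the 64 GB table can be filtered programmatically. The following Python sketch copies the E2E(s) values from that table; the function and variable names are hypothetical and chosen only for illustration.

```python
# E2E(s) for a single pp1024/tg128 request on the 64 GB machine, copied from the table.
E2E_64GB = {
    "Qwen3.6-35B-A3B-4bit": 2.837,
    "Qwen3.6-35B-A3B-6bit": 3.367,
    "Qwen3.6-35B-A3B-nvfp4": 2.990,
    "Qwen3.6-35B-A3B-8bit": 3.615,
    "DeepSeek-R1-Distill-Llama-70B-4bit": 2.990,
    "GLM-4.7-Flash-MLX-8bit": 4.673,
    "Huihui-Qwen3.6-35B-A3B-Claude-4.7-Opus-abliterated-mlx-8bit": 3.627,
    "Llama-3.3-70B-Instruct-4bit": 42.217,
    "MLX-Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-8bit": 3.625,
    "Qwen3-Coder-30B-A3B-Instruct-MLX-4bit": 2.944,
    "Qwen3-Coder-Next-MLX-4bit": 3.747,
    "Qwen3.5-27B-Claude-4.6-Opus-Distilled-MLX-6bit": 20.165,
    "gemma-4-31b-it-4bit": 20.325,
    "gpt-oss-20b-MXFP4-Q8": 3.485,
}

THRESHOLD_S = 5.0  # target single-request E2E latency


def split_by_threshold(results: dict[str, float], threshold_s: float):
    """Split models into those under the latency target and those at or over it."""
    fast = {m: t for m, t in results.items() if t < threshold_s}
    slow = {m: t for m, t in results.items() if t >= threshold_s}
    return fast, slow


fast, slow = split_by_threshold(E2E_64GB, THRESHOLD_S)
print(f"{len(fast)} of {len(E2E_64GB)} models finish within {THRESHOLD_S:.0f} s")
for model, seconds in sorted(slow.items(), key=lambda kv: kv[1]):
    print(f"over target: {model} ({seconds:.3f} s)")
```
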

3 Detailed results with 32 GB total memory
Benchmark Model: Qwen3.6-35B-A3B-4bit

Single Request Results

| Test | TTFT(ms) | TPOT(ms) | pp TPS | tg TPS | E2E(s) | Throughput | Peak Mem |
|---|---|---|---|---|---|---|---|
| pp1024/tg128 | 1731.1 | 11.88 | 591.5 tok/s | 84.8 tok/s | 3.240 | 355.5 tok/s | 19.24 GB |
| pp4096/tg128 | 5490.4 | 12.78 | 746.0 tok/s | 78.9 tok/s | 7.114 | 593.8 tok/s | 20.04 GB |

Continuous Batching (pp1024 / tg128)

| Batch | tg TPS | Speedup | pp TPS | pp TPS/req | TTFT(ms) | E2E(s) |
|---|---|---|---|---|---|---|
| 1x | 84.8 tok/s | 1.00x | 591.5 tok/s | 591.5 tok/s | 1731.1 | 3.240 |
| 2x | 121.2 tok/s | 1.43x | 620.8 tok/s | 310.4 tok/s | 3162.9 | 5.411 |
| 4x | 149.8 tok/s | 1.77x | 620.7 tok/s | 155.2 tok/s | 6202.0 | 10.018 |

Benchmark Model: Qwen3.6-27B-4bit

Single Request Results

| Test | TTFT(ms) | TPOT(ms) | pp TPS | tg TPS | E2E(s) | Throughput | Peak Mem |
|---|---|---|---|---|---|---|---|
| pp1024/tg128 | 9144.2 | 48.09 | 112.0 tok/s | 21.0 tok/s | 15.251 | 75.5 tok/s | 15.85 GB |
| pp4096/tg128 | 35801.6 | 50.76 | 114.4 tok/s | 19.9 tok/s | 42.248 | 100.0 tok/s | 17.27 GB |

Continuous Batching (pp1024 / tg128)

| Batch | tg TPS | Speedup | pp TPS | pp TPS/req | TTFT(ms) | E2E(s) |
|---|---|---|---|---|---|---|
| 1x | 21.0 tok/s | 1.00x | 112.0 tok/s | 112.0 tok/s | 9144.2 | 15.251 |
| 2x | 28.3 tok/s | 1.35x | 111.4 tok/s | 55.7 tok/s | 18230.3 | 27.413 |
| 4x | 30.1 tok/s | 1.43x | 111.2 tok/s | 27.8 tok/s | 36281.8 | 53.836 |

Benchmark Model: Qwen3-Coder-30B-A3B-Instruct-MLX-4bit

Single Request Results

| Test | TTFT(ms) | TPOT(ms) | pp TPS | tg TPS | E2E(s) | Throughput | Peak Mem |
|---|---|---|---|---|---|---|---|
| pp1024/tg128 | 1736.9 | 12.90 | 589.6 tok/s | 78.2 tok/s | 3.375 | 341.4 tok/s | 16.59 GB |
| pp4096/tg128 | 6513.8 | 15.99 | 628.8 tok/s | 63.0 tok/s | 8.545 | 494.3 tok/s | 17.20 GB |

Continuous Batching (pp1024 / tg128)

| Batch | tg TPS | Speedup | pp TPS | pp TPS/req | TTFT(ms) | E2E(s) |
|---|---|---|---|---|---|---|
| 1x | 78.2 tok/s | 1.00x | 589.6 tok/s | 589.6 tok/s | 1736.9 | 3.375 |
| 2x | 103.1 tok/s | 1.32x | 598.2 tok/s | 299.1 tok/s | 3347.2 | 5.908 |
| 4x | 123.8 tok/s | 1.58x | 697.9 tok/s | 174.5 tok/s | 4452.0 | 10.004 |

Benchmark Model: gemma-4-26B-A4B-it-MLX-4bit

Single Request Results

| Test | TTFT(ms) | TPOT(ms) | pp TPS | tg TPS | E2E(s) | Throughput | Peak Mem |
|---|---|---|---|---|---|---|---|
| pp1024/tg128 | 1884.6 | 14.65 | 543.4 tok/s | 68.8 tok/s | 3.745 | 307.6 tok/s | 14.27 GB |
| pp4096/tg128 | 7043.6 | 15.51 | 581.5 tok/s | 65.0 tok/s | 9.014 | 468.6 tok/s | 14.95 GB |

Continuous Batching (pp1024 / tg128)

| Batch | tg TPS | Speedup | pp TPS | pp TPS/req | TTFT(ms) | E2E(s) |
|---|---|---|---|---|---|---|
| 1x | 68.8 tok/s | 1.00x | 543.4 tok/s | 543.4 tok/s | 1884.6 | 3.745 |
| 2x | 95.3 tok/s | 1.39x | 547.1 tok/s | 273.6 tok/s | 3591.9 | 6.430 |
| 4x | 112.6 tok/s | 1.64x | 536.6 tok/s | 134.2 tok/s | 7019.8 | 12.182 |

4 Detailed results with 64 GB total memory
Benchmark Model: Qwen3.6-35B-A3B-4bit

Single Request Results

| Test | TTFT(ms) | TPOT(ms) | pp TPS | tg TPS | E2E(s) | Throughput | Peak Mem |
|---|---|---|---|---|---|---|---|
| pp1024/tg128 | 1345.8 | 11.74 | 760.9 tok/s | 85.9 tok/s | 2.837 | 406.1 tok/s | 19.27 GB |
| pp4096/tg128 | 4554.1 | 12.27 | 899.4 tok/s | 82.2 tok/s | 6.112 | 691.1 tok/s | 20.04 GB |

Continuous Batching (pp1024 / tg128)

| Batch | tg TPS | Speedup | pp TPS | pp TPS/req | TTFT(ms) | E2E(s) |
|---|---|---|---|---|---|---|
| 1x | 85.9 tok/s | 1.00x | 760.9 tok/s | 760.9 tok/s | 1345.8 | 2.837 |
| 2x | 129.2 tok/s | 1.50x | 763.7 tok/s | 381.9 tok/s | 2567.0 | 4.664 |
| 4x | 160.9 tok/s | 1.87x | 749.5 tok/s | 187.4 tok/s | 5067.9 | 8.648 |

Benchmark Model: Qwen3.6-35B-A3B-6bit

Single Request Results

| Test | TTFT(ms) | TPOT(ms) | pp TPS | tg TPS | E2E(s) | Throughput | Peak Mem |
|---|---|---|---|---|---|---|---|
| pp1024/tg128 | 1451.9 | 15.08 | 705.3 tok/s | 66.8 tok/s | 3.367 | 342.1 tok/s | 27.30 GB |
| pp4096/tg128 | 4730.4 | 15.63 | 865.9 tok/s | 64.5 tok/s | 6.716 | 629.0 tok/s | 28.11 GB |

Continuous Batching (pp1024 / tg128)

| Batch | tg TPS | Speedup | pp TPS | pp TPS/req | TTFT(ms) | E2E(s) |
|---|---|---|---|---|---|---|
| 1x | 66.8 tok/s | 1.00x | 705.3 tok/s | 705.3 tok/s | 1451.9 | 3.367 |
| 2x | 96.3 tok/s | 1.44x | 728.2 tok/s | 364.1 tok/s | 2685.8 | 5.470 |
| 4x | 112.7 tok/s | 1.69x | 709.9 tok/s | 177.5 tok/s | 5364.2 | 10.313 |

Benchmark Model: Qwen3.6-35B-A3B-nvfp4

Single Request Results

| Test | TTFT(ms) | TPOT(ms) | pp TPS | tg TPS | E2E(s) | Throughput | Peak Mem |
|---|---|---|---|---|---|---|---|
| pp1024/tg128 | 1431.4 | 12.27 | 715.4 tok/s | 82.1 tok/s | 2.990 | 385.2 tok/s | 19.27 GB |
| pp4096/tg128 | 4890.5 | 12.81 | 837.5 tok/s | 78.7 tok/s | 6.517 | 648.2 tok/s | 20.04 GB |

Continuous Batching (pp1024 / tg128)

| Batch | tg TPS | Speedup | pp TPS | pp TPS/req | TTFT(ms) | E2E(s) |
|---|---|---|---|---|---|---|
| 1x | 82.1 tok/s | 1.00x | 715.4 tok/s | 715.4 tok/s | 1431.4 | 2.990 |
| 2x | 129.7 tok/s | 1.58x | 712.5 tok/s | 356.3 tok/s | 2752.1 | 4.848 |
| 4x | 165.4 tok/s | 2.01x | 705.9 tok/s | 176.5 tok/s | 5441.3 | 8.898 |

Benchmark Model: Qwen3.6-35B-A3B-8bit

Single Request Results

| Test | TTFT(ms) | TPOT(ms) | pp TPS | tg TPS | E2E(s) | Throughput | Peak Mem |
|---|---|---|---|---|---|---|---|
| pp1024/tg128 | 1393.0 | 17.50 | 735.1 tok/s | 57.6 tok/s | 3.615 | 318.6 tok/s | 35.40 GB |
| pp4096/tg128 | 4621.6 | 18.03 | 886.3 tok/s | 55.9 tok/s | 6.911 | 611.2 tok/s | 36.17 GB |

Continuous Batching (pp1024 / tg128)

| Batch | tg TPS | Speedup | pp TPS | pp TPS/req | TTFT(ms) | E2E(s) |
|---|---|---|---|---|---|---|
| 1x | 57.6 tok/s | 1.00x | 735.1 tok/s | 735.1 tok/s | 1393.0 | 3.615 |
| 2x | 89.5 tok/s | 1.55x | 739.9 tok/s | 369.9 tok/s | 2645.3 | 5.628 |
| 4x | 116.6 tok/s | 2.02x | 712.3 tok/s | 178.1 tok/s | 5355.1 | 10.141 |

Benchmark Model: DeepSeek-R1-Distill-Llama-70B-4bit

Single Request Results

| Test | TTFT(ms) | TPOT(ms) | pp TPS | tg TPS | E2E(s) | Throughput | Peak Mem |
|---|---|---|---|---|---|---|---|
| pp1024/tg128 | 1431.4 | 12.27 | 715.4 tok/s | 82.1 tok/s | 2.990 | 385.2 tok/s | 19.27 GB |
| pp4096/tg128 | 4890.5 | 12.81 | 837.5 tok/s | 78.7 tok/s | 6.517 | 648.2 tok/s | 20.04 GB |

Continuous Batching (pp1024 / tg128)

| Batch | tg TPS | Speedup | pp TPS | pp TPS/req | TTFT(ms) | E2E(s) |
|---|---|---|---|---|---|---|
| 1x | 82.1 tok/s | 1.00x | 715.4 tok/s | 715.4 tok/s | 1431.4 | 2.990 |
| 2x | 129.7 tok/s | 1.58x | 712.5 tok/s | 356.3 tok/s | 2752.1 | 4.848 |
| 4x | 165.4 tok/s | 2.01x | 705.9 tok/s | 176.5 tok/s | 5441.3 | 8.898 |

Benchmark Model: GLM-4.7-Flash-MLX-8bit

Single Request Results

| Test | TTFT(ms) | TPOT(ms) | pp TPS | tg TPS | E2E(s) | Throughput | Peak Mem |
|---|---|---|---|---|---|---|---|
| pp1024/tg128 | 1708.2 | 23.35 | 599.5 tok/s | 43.2 tok/s | 4.673 | 246.5 tok/s | 30.26 GB |
| pp4096/tg128 | 7375.0 | 26.24 | 555.4 tok/s | 38.4 tok/s | 10.707 | 394.5 tok/s | 31.31 GB |

Continuous Batching (pp1024 / tg128)

| Batch | tg TPS | Speedup | pp TPS | pp TPS/req | TTFT(ms) | E2E(s) |
|---|---|---|---|---|---|---|
| 1x | 43.2 tok/s | 1.00x | 599.5 tok/s | 599.5 tok/s | 1708.2 | 4.673 |
| 2x | 61.4 tok/s | 1.42x | 616.7 tok/s | 308.4 tok/s | 3236.3 | 7.493 |
| 4x | 75.1 tok/s | 1.74x | 783.0 tok/s | 195.8 tok/s | 3827.0 | 12.047 |

Benchmark Model: Huihui-Qwen3.6-35B-A3B-Claude-4.7-Opus-abliterated-mlx-8bit

Single Request Results

| Test | TTFT(ms) | TPOT(ms) | pp TPS | tg TPS | E2E(s) | Throughput | Peak Mem |
|---|---|---|---|---|---|---|---|
| pp1024/tg128 | 1349.4 | 17.93 | 758.8 tok/s | 56.2 tok/s | 3.627 | 317.6 tok/s | 35.40 GB |
| pp4096/tg128 | 4593.4 | 18.43 | 891.7 tok/s | 54.7 tok/s | 6.934 | 609.2 tok/s | 36.17 GB |

Continuous Batching (pp1024 / tg128)

| Batch | tg TPS | Speedup | pp TPS | pp TPS/req | TTFT(ms) | E2E(s) |
|---|---|---|---|---|---|---|
| 1x | 56.2 tok/s | 1.00x | 758.8 tok/s | 758.8 tok/s | 1349.4 | 3.627 |
| 2x | 90.8 tok/s | 1.62x | 752.4 tok/s | 376.2 tok/s | 2605.6 | 5.542 |
| 4x | 116.3 tok/s | 2.07x | 745.9 tok/s | 186.5 tok/s | 5110.9 | 9.893 |

Benchmark Model: Llama-3.3-70B-Instruct-4bit

Single Request Results

| Test | TTFT(ms) | TPOT(ms) | pp TPS | tg TPS | E2E(s) | Throughput | Peak Mem |
|---|---|---|---|---|---|---|---|
| pp1024/tg128 | 20966.4 | 167.33 | 48.9 tok/s | 6.0 tok/s | 42.217 | 27.3 tok/s | 37.78 GB |
| pp4096/tg128 | 97078.0 | 174.81 | 42.2 tok/s | 5.8 tok/s | 119.279 | 35.4 tok/s | 39.01 GB |

Continuous Batching (pp1024 / tg128)

| Batch | tg TPS | Speedup | pp TPS | pp TPS/req | TTFT(ms) | E2E(s) |
|---|---|---|---|---|---|---|
| 1x | 6.0 tok/s | 1.00x | 48.9 tok/s | 48.9 tok/s | 20966.4 | 42.217 |
| 2x | 10.8 tok/s | 1.80x | 45.1 tok/s | 22.6 tok/s | 45211.1 | 68.986 |
| 4x | 11.9 tok/s | 1.98x | 89.3 tok/s | 22.3 tok/s | 34228.8 | 88.796 |

Benchmark Model: MLX-Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-8bit

Single Request Results

| Test | TTFT(ms) | TPOT(ms) | pp TPS | tg TPS | E2E(s) | Throughput | Peak Mem |
|---|---|---|---|---|---|---|---|
| pp1024/tg128 | 1349.6 | 17.91 | 758.7 tok/s | 56.3 tok/s | 3.625 | 317.8 tok/s | 35.40 GB |
| pp4096/tg128 | 4605.5 | 18.52 | 889.4 tok/s | 54.4 tok/s | 6.958 | 607.1 tok/s | 36.17 GB |

Continuous Batching (pp1024 / tg128)

| Batch | tg TPS | Speedup | pp TPS | pp TPS/req | TTFT(ms) | E2E(s) |
|---|---|---|---|---|---|---|
| 1x | 56.3 tok/s | 1.00x | 758.7 tok/s | 758.7 tok/s | 1349.6 | 3.625 |
| 2x | 84.9 tok/s | 1.51x | 751.6 tok/s | 375.8 tok/s | 2610.4 | 5.741 |
| 4x | 112.2 tok/s | 1.99x | 745.8 tok/s | 186.4 tok/s | 5120.0 | 10.056 |

Benchmark Model: Qwen3-Coder-30B-A3B-Instruct-MLX-4bit

Single Request Results

| Test | TTFT(ms) | TPOT(ms) | pp TPS | tg TPS | E2E(s) | Throughput | Peak Mem |
|---|---|---|---|---|---|---|---|
| pp1024/tg128 | 1296.9 | 12.97 | 789.6 tok/s | 77.7 tok/s | 2.944 | 391.4 tok/s | 16.59 GB |
| pp4096/tg128 | 5177.1 | 15.70 | 791.2 tok/s | 64.2 tok/s | 7.171 | 589.1 tok/s | 17.20 GB |

Continuous Batching (pp1024 / tg128)

| Batch | tg TPS | Speedup | pp TPS | pp TPS/req | TTFT(ms) | E2E(s) |
|---|---|---|---|---|---|---|
| 1x | 77.7 tok/s | 1.00x | 789.6 tok/s | 789.6 tok/s | 1296.9 | 2.944 |
| 2x | 104.9 tok/s | 1.35x | 796.6 tok/s | 398.3 tok/s | 2497.2 | 5.011 |
| 4x | 128.4 tok/s | 1.65x | 920.1 tok/s | 230.0 tok/s | 3328.3 | 8.440 |

Benchmark Model: Qwen3-Coder-Next-MLX-4bit

Single Request Results

| Test | TTFT(ms) | TPOT(ms) | pp TPS | tg TPS | E2E(s) | Throughput | Peak Mem |
|---|---|---|---|---|---|---|---|
| pp1024/tg128 | 1660.1 | 16.43 | 616.8 tok/s | 61.3 tok/s | 3.747 | 307.4 tok/s | 43.07 GB |
| pp4096/tg128 | 5969.2 | 17.42 | 686.2 tok/s | 57.9 tok/s | 8.181 | 516.3 tok/s | 43.95 GB |

Continuous Batching (pp1024 / tg128)

| Batch | tg TPS | Speedup | pp TPS | pp TPS/req | TTFT(ms) | E2E(s) |
|---|---|---|---|---|---|---|
| 1x | 61.3 tok/s | 1.00x | 616.8 tok/s | 616.8 tok/s | 1660.1 | 3.747 |
| 2x | 91.7 tok/s | 1.50x | 612.7 tok/s | 306.4 tok/s | 3273.3 | 6.134 |
| 4x | 120.7 tok/s | 1.97x | 601.4 tok/s | 150.3 tok/s | 6588.0 | 11.052 |

Benchmark Model: Qwen3.5-27B-Claude-4.6-Opus-Distilled-MLX-6bit

Single Request Results

| Test | TTFT(ms) | TPOT(ms) | pp TPS | tg TPS | E2E(s) | Throughput | Peak Mem |
|---|---|---|---|---|---|---|---|
| pp1024/tg128 | 8569.4 | 91.30 | 119.5 tok/s | 11.0 tok/s | 20.165 | 57.1 tok/s | 22.10 GB |
| pp4096/tg128 | 38513.1 | 94.14 | 106.4 tok/s | 10.7 tok/s | 50.469 | 83.7 tok/s | 23.54 GB |

Continuous Batching (pp1024 / tg128)

| Batch | tg TPS | Speedup | pp TPS | pp TPS/req | TTFT(ms) | E2E(s) |
|---|---|---|---|---|---|---|
| 1x | 11.0 tok/s | 1.00x | 119.5 tok/s | 119.5 tok/s | 8569.4 | 20.165 |
| 2x | 16.1 tok/s | 1.46x | 107.5 tok/s | 53.8 tok/s | 18880.1 | 34.982 |
| 4x | 18.0 tok/s | 1.64x | 109.0 tok/s | 27.3 tok/s | 36916.4 | 65.946 |

Benchmark Model: gemma-4-31b-it-4bit

Single Request Results

| Test | TTFT(ms) | TPOT(ms) | pp TPS | tg TPS | E2E(s) | Throughput | Peak Mem |
|---|---|---|---|---|---|---|---|
| pp1024/tg128 | 10034.5 | 81.03 | 102.0 tok/s | 12.4 tok/s | 20.325 | 56.7 tok/s | 17.58 GB |
| pp4096/tg128 | 42774.2 | 82.78 | 95.8 tok/s | 12.2 tok/s | 53.287 | 79.3 tok/s | 20.18 GB |

Continuous Batching (pp1024 / tg128)

| Batch | tg TPS | Speedup | pp TPS | pp TPS/req | TTFT(ms) | E2E(s) |
|---|---|---|---|---|---|---|
| 1x | 12.4 tok/s | 1.00x | 102.0 tok/s | 102.0 tok/s | 10034.5 | 20.325 |
| 2x | 20.5 tok/s | 1.65x | 94.9 tok/s | 47.5 tok/s | 21397.3 | 34.070 |
| 4x | 22.8 tok/s | 1.84x | 95.0 tok/s | 23.8 tok/s | 42386.0 | 65.527 |

Benchmark Model: gpt-oss-20b-MXFP4-Q8

Single Request Results

| Test | TTFT(ms) | TPOT(ms) | pp TPS | tg TPS | E2E(s) | Throughput | Peak Mem |
|---|---|---|---|---|---|---|---|
| pp1024/tg128 | 1627.4 | 14.63 | 629.2 tok/s | 68.9 tok/s | 3.485 | 330.6 tok/s | 11.77 GB |
| pp4096/tg128 | 5367.5 | 15.53 | 763.1 tok/s | 64.9 tok/s | 7.340 | 575.5 tok/s | 11.78 GB |

Continuous Batching (pp1024 / tg128)

| Batch | tg TPS | Speedup | pp TPS | pp TPS/req | TTFT(ms) | E2E(s) |
|---|---|---|---|---|---|---|
| 1x | 68.9 tok/s | 1.00x | 629.2 tok/s | 629.2 tok/s | 1627.4 | 3.485 |
| 2x | 97.6 tok/s | 1.42x | 694.6 tok/s | 347.3 tok/s | 2842.7 | 5.573 |
| 4x | 116.2 tok/s | 1.69x | 892.0 tok/s | 223.0 tok/s | 3666.6 | 8.972 |

5 Metric reference
TTFT (Time to First Token)
Latency until the model starts responding. Measures prompt processing (prefill) speed. Lower is better.
TPOT (Time Per Output Token)
Average time between each generated token. Measures decode speed. Lower is better.
pp TPS (Prompt Processing TPS)
Input/prompt tokens processed per second during prefill. Higher is better.
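
The reported pp TPS values are consistent with dividing the prompt length by TTFT. Here is a small Python sketch of that relationship, checked against the Qwen3.6-35B-A3B-4bit rows from the 32 GB results. The formula is inferred from the numbers rather than taken from the oMLX source, and the function name is hypothetical.

```python
def pp_tps(prompt_tokens: int, ttft_ms: float) -> float:
    """Prompt tokens per second, assuming prefill accounts for all of TTFT."""
    return prompt_tokens / (ttft_ms / 1000.0)


# (prompt tokens, TTFT in ms, reported pp TPS) for Qwen3.6-35B-A3B-4bit on 32 GB.
rows = [(1024, 1731.1, 591.5), (4096, 5490.4, 746.0)]

for tokens, ttft_ms, reported in rows:
    computed = pp_tps(tokens, ttft_ms)
    print(f"pp{tokens}: computed {computed:.1f} tok/s, reported {reported} tok/s")
```
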
tg TPS (Token Generation TPS)
Output tokens generated per second. Inverse of TPOT. Higher is better.
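
Since TPOT is given in milliseconds, the inverse relationship amounts to 1000 / TPOT. The Python sketch below applies it to two rows from the 32 GB results. The plain reciprocal lands within about 1% of the reported tg TPS; the remaining gap presumably comes from rounding or from how the tool counts generated tokens, which is an assumption on my part. The function name is hypothetical.

```python
def tg_tps_from_tpot(tpot_ms: float) -> float:
    """Approximate generated tokens per second as the reciprocal of TPOT (ms)."""
    return 1000.0 / tpot_ms


# (model, TPOT in ms, reported tg TPS) at pp1024/tg128 on 32 GB.
rows = [
    ("Qwen3.6-35B-A3B-4bit", 11.88, 84.8),
    ("Qwen3.6-27B-4bit", 48.09, 21.0),
]

for model, tpot_ms, reported in rows:
    computed = tg_tps_from_tpot(tpot_ms)
    print(f"{model}: computed {computed:.1f} tok/s, reported {reported} tok/s")
```
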
E2E Latency (End-to-End), shown as E2E(s)
Total time from request submission to complete response. Includes prefill + generation.
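
As a worked example of "prefill + generation", the reported E2E values can be reproduced from TTFT and TPOT. The Python sketch below assumes the first output token arrives at TTFT and each of the remaining tokens takes one TPOT; that reading is inferred from the numbers, not confirmed from the oMLX source, and the function name is hypothetical.

```python
def e2e_seconds(ttft_ms: float, tpot_ms: float, output_tokens: int) -> float:
    """Estimate end-to-end latency: TTFT for the first token, TPOT for each later one."""
    return (ttft_ms + tpot_ms * (output_tokens - 1)) / 1000.0


# (model, TTFT ms, TPOT ms, output tokens, reported E2E s) at pp1024/tg128 on 32 GB.
rows = [
    ("Qwen3.6-35B-A3B-4bit", 1731.1, 11.88, 128, 3.240),
    ("Qwen3.6-27B-4bit", 9144.2, 48.09, 128, 15.251),
]

for model, ttft_ms, tpot_ms, n_out, reported in rows:
    estimate = e2e_seconds(ttft_ms, tpot_ms, n_out)
    print(f"{model}: estimated {estimate:.3f} s, reported {reported:.3f} s")
```

For the slower model, most of the end-to-end time is prefill: 9.1 s of the 15.3 s total is TTFT.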
Total Throughput, shown as Throughput
Overall tokens (input + output) per second. Measures total system utilization.
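
The Throughput column matches (input tokens + output tokens) divided by E2E. Below is a short Python sketch that checks this against the Qwen3.6-35B-A3B-4bit rows from the 32 GB results; the function name is hypothetical.

```python
def total_throughput(prompt_tokens: int, output_tokens: int, e2e_s: float) -> float:
    """Input plus output tokens processed per second over the whole request."""
    return (prompt_tokens + output_tokens) / e2e_s


# (prompt tokens, output tokens, E2E in s, reported Throughput in tok/s)
rows = [(1024, 128, 3.240, 355.5), (4096, 128, 7.114, 593.8)]

for pp, tg, e2e_s, reported in rows:
    computed = total_throughput(pp, tg, e2e_s)
    print(f"pp{pp}/tg{tg}: computed {computed:.1f} tok/s, reported {reported} tok/s")
```

Longer prompts raise this figure because prefill processes tokens far faster than decoding does.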
Batch Size
Number of concurrent requests processed together. Higher batch sizes improve total throughput but increase per-request latency.
Speedup
Token generation throughput multiplier compared to single-request baseline (1x). Higher is better.
pp TPS/req (Per-Request Prompt Processing TPS)
Prompt processing throughput divided by batch size. Shows per-request prefill speed under batching load.
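
The batching metrics can be recomputed from the raw columns. The Python sketch below uses the continuous-batching rows for Qwen3.6-35B-A3B-4bit on the 32 GB machine to derive Speedup and pp TPS/req, and to show the latency cost of larger batches. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BatchRow:
    batch: int      # number of concurrent requests
    tg_tps: float   # aggregate token generation throughput (tok/s)
    pp_tps: float   # aggregate prompt processing throughput (tok/s)
    e2e_s: float    # end-to-end latency (s)


# Continuous batching, pp1024 / tg128, Qwen3.6-35B-A3B-4bit on 32 GB.
rows = [
    BatchRow(1, 84.8, 591.5, 3.240),
    BatchRow(2, 121.2, 620.8, 5.411),
    BatchRow(4, 149.8, 620.7, 10.018),
]
baseline = rows[0]


def speedup(row: BatchRow, base: BatchRow) -> float:
    """Generation throughput relative to the single-request baseline."""
    return row.tg_tps / base.tg_tps


def pp_tps_per_request(row: BatchRow) -> float:
    """Prefill throughput each request sees under batching."""
    return row.pp_tps / row.batch


for row in rows:
    print(f"{row.batch}x: speedup {speedup(row, baseline):.2f}x, "
          f"pp TPS/req {pp_tps_per_request(row):.1f} tok/s, "
          f"E2E {row.e2e_s / baseline.e2e_s:.2f}x baseline")
```

At 4x, generation throughput rises by about 1.77x while E2E latency roughly triples, which illustrates the throughput versus latency trade-off described under Batch Size.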