Tags: technical-reference, benchmark, mac-studio, apple-silicon, inference, llm, performance

Mac Studio Local LLM Benchmark Report

Mac Studio M4 Max overnight benchmark report: token throughput, latency, and memory bandwidth across 14 LLMs at various quantization levels.

March 8, 2026·10 min read

Run ID: macstudio_overnight_all_models_20260524_022731
Generated: 2026-05-24 04:04 UTC

1. Executive Summary

  • codellama:7b-instruct: Best for coding, Best for summarization
  • command-r-plus:latest: Most stable under concurrency, Avoid: high latency
  • deepseek-coder:latest: Best for agentic tool calls, Highest throughput
  • llama3.1:70b: Avoid: high latency
  • qwen3.6:latest: Avoid: high latency

2. System Information

  • hostname: MacBook-Pro.local
  • os: macOS 26.3.1
  • python: 3.11.14
  • cpu_cores_physical: 16
  • ram_total_gb: 51.5
  • ram_available_gb: 14.2

3. Models Tested

  • codellama:7b-instruct
  • command-r-plus:latest
  • deepseek-coder:33b
  • deepseek-coder:6.7b
  • deepseek-coder:latest
  • llama3.1:70b
  • llama3:8b
  • mistral:latest
  • qwen2.5-coder:1.5b
  • qwen2.5-coder:14b
  • qwen2.5-coder:32b
  • qwen2.5-coder:latest
  • qwen3-coder:30b
  • qwen3.6:latest

4. Benchmark Methodology

  • TTFT: measured from request send to the first non-empty streamed token chunk
  • Total latency: measured from request send to stream completion
  • Tokens/sec: derived from Ollama's eval_duration field (completion tokens only)
  • Concurrency: parallel requests via ThreadPoolExecutor, with stats reported per concurrency level
  • Quality scores: heuristic (keyword presence, structure), not HumanEval
  • Temperature: 0 for all runs, for reproducibility
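
The two timing rules above (TTFT and total latency) can be sketched as a small helper. This is a minimal illustration, not the harness's actual code: `time_stream` and its `clock` parameter are hypothetical names, and the sketch assumes the stream is a lazy iterator that sends the request when it is first consumed, so the start timestamp approximates "request send".

```python
import time
from typing import Callable, Iterable, Optional


def time_stream(chunks: Iterable[str],
                clock: Callable[[], float] = time.perf_counter) -> dict:
    """Consume streamed text chunks and record TTFT and total latency in ms.

    Assumption: `chunks` is lazy, so the request goes out when iteration starts.
    """
    start = clock()
    ttft_ms: Optional[float] = None
    parts = []
    for chunk in chunks:
        # TTFT is taken at the first non-empty chunk; empty chunks are skipped.
        if ttft_ms is None and chunk:
            ttft_ms = (clock() - start) * 1000.0
        parts.append(chunk)
    # Total latency is taken once the stream is exhausted.
    total_ms = (clock() - start) * 1000.0
    return {"ttft_ms": ttft_ms, "total_ms": total_ms, "text": "".join(parts)}
```

Injecting the clock keeps the helper testable without a live model. If a stream never yields a non-empty chunk, `ttft_ms` stays `None` rather than being reported as zero.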

5. Results by Workload

coding

Model p50 lat p95 lat TTFT p50 tok/s errors score
qwen2.5-coder:1.5b 1500ms 1569ms 154ms 214.7 0/9 0.33
deepseek-coder:latest 1933ms 1997ms 78ms 286.6 0/9 0.10
deepseek-coder:6.7b 2825ms 2918ms 150ms 118.9 0/9 0.47
codellama:7b-instruct 2946ms 3226ms 201ms 118.5 0/9 0.60
qwen2.5-coder:latest 3076ms 5485ms 236ms 104.2 0/9 0.27
mistral:latest 3695ms 5198ms 122ms 102.9 0/9 0.03
llama3:8b 4894ms 5126ms 292ms 109.8 0/9 0.47
qwen3-coder:30b 6071ms 6342ms 271ms 88.1 0/9 0.50
qwen2.5-coder:14b 9619ms 9757ms 326ms 55.8 0/9 0.03
deepseek-coder:33b 10975ms 12151ms 407ms 31.5 0/9 0.17
qwen3.6:latest 12811ms 12961ms 12811ms 41.7 0/9 0.00
qwen2.5-coder:32b 18310ms 18930ms 454ms 28.1 0/9 0.17
llama3.1:70b 68423ms 88840ms 4226ms 6.1 0/9 0.20
command-r-plus:latest 85374ms 122501ms 7881ms 4.0 0/9 0.47
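
The score column above comes from a heuristic that the methodology describes only as "keyword presence, structure". The report does not publish the scoring code, so the following is a hypothetical sketch of what such a check might look like: `heuristic_score`, its keyword list, and its structure markers are all assumptions made for illustration.

```python
from typing import Sequence


def heuristic_score(text: str,
                    keywords: Sequence[str],
                    structure_markers: Sequence[str] = ("def ", "return")) -> float:
    """Return the fraction of checks passed, in the range 0.0 to 1.0.

    Keyword checks are case-insensitive; structure checks look for literal
    markers (here, hypothetical signs that the answer contains a function).
    """
    lowered = text.lower()
    checks = [kw.lower() in lowered for kw in keywords]
    checks += [marker in text for marker in structure_markers]
    return sum(checks) / len(checks) if checks else 0.0
```

A scorer of this kind yields coarse values, because each check contributes a fixed fraction. It measures surface features only and says nothing about whether generated code runs.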

concurrency

Model p50 lat p95 lat TTFT p50 tok/s errors
qwen2.5-coder:1.5b 302ms 515ms 270ms 171.0 0/15
qwen3-coder:30b 420ms 899ms 379ms 72.4 0/15
deepseek-coder:latest 463ms 2755ms 249ms 126.9 0/15
llama3:8b 557ms 2776ms 309ms 72.1 0/15
deepseek-coder:6.7b 657ms 1121ms 375ms 44.5 0/15
qwen2.5-coder:latest 689ms 1432ms 311ms 68.7 0/15
codellama:7b-instruct 709ms 1230ms 212ms 45.5 0/15
mistral:latest 875ms 2546ms 384ms 48.3 0/15
qwen2.5-coder:14b 1112ms 2044ms 592ms 43.8 0/15
qwen2.5-coder:32b 1824ms 3799ms 1405ms 17.4 0/15
deepseek-coder:33b 2641ms 14676ms 1216ms 13.3 0/15
llama3.1:70b 9516ms 27844ms 7863ms 5.4 0/15
command-r-plus:latest 17526ms 28799ms 12535ms 2.7 0/15
qwen3.6:latest 18860ms 61051ms 18860ms 42.0 0/15

latency

Model p50 lat p95 lat TTFT p50 tok/s errors
qwen2.5-coder:1.5b 267ms 1022ms 214ms 260.2 0/9
deepseek-coder:6.7b 351ms 3598ms 190ms 122.7 0/9
qwen2.5-coder:latest 408ms 2699ms 252ms 124.5 0/9
mistral:latest 414ms 2929ms 208ms 115.7 0/9
llama3:8b 416ms 2474ms 289ms 149.0 0/9
codellama:7b-instruct 434ms 3909ms 216ms 117.4 0/9
qwen3-coder:30b 449ms 12903ms 289ms 115.7 0/9
qwen2.5-coder:14b 499ms 5297ms 309ms 72.4 0/9
deepseek-coder:latest 574ms 1788ms 108ms 274.7 0/9
qwen2.5-coder:32b 655ms 10385ms 407ms 37.7 0/9
deepseek-coder:33b 790ms 10779ms 380ms 35.6 0/9
llama3.1:70b 3934ms 38290ms 2150ms 8.5 0/9
command-r-plus:latest 4088ms 42628ms 2447ms 5.6 0/9
qwen3.6:latest 11760ms 13542ms 11412ms 41.9 0/9

reasoning

Model p50 lat p95 lat TTFT p50 tok/s errors score
qwen2.5-coder:1.5b 177ms 702ms 158ms 253.9 0/9 0.67
llama3:8b 277ms 976ms 226ms 163.6 0/9 0.83
qwen2.5-coder:latest 278ms 1085ms 220ms 123.7 0/9 0.83
qwen3-coder:30b 280ms 1314ms 237ms 110.1 0/9 0.83
codellama:7b-instruct 453ms 906ms 205ms 118.5 0/9 0.83
qwen2.5-coder:14b 514ms 2254ms 304ms 65.8 0/9 0.83
qwen2.5-coder:32b 596ms 2717ms 514ms 33.6 0/9 0.83
deepseek-coder:6.7b 1003ms 2348ms 208ms 119.2 0/9 0.33
mistral:latest 1209ms 1661ms 166ms 98.9 0/9 0.50
deepseek-coder:latest 1595ms 2006ms 77ms 288.1 0/9 0.33
deepseek-coder:33b 2507ms 3495ms 463ms 31.8 0/9 0.50
llama3.1:70b 5072ms 19427ms 4770ms 9.3 0/9 0.83
command-r-plus:latest 9145ms 27777ms 8023ms 4.7 0/9 0.83
qwen3.6:latest 12809ms 12917ms 12809ms 41.8 0/9 0.33

summarization

Model p50 lat p95 lat TTFT p50 tok/s errors score
qwen2.5-coder:1.5b 799ms 854ms 147ms 214.6 0/3 1.00
llama3:8b 959ms 1065ms 227ms 110.6 0/3 1.00
deepseek-coder:latest 1109ms 1171ms 55ms 285.4 0/3 1.00
codellama:7b-instruct 1427ms 1537ms 86ms 116.9 0/3 1.00
qwen2.5-coder:latest 1470ms 1579ms 241ms 104.4 0/3 1.00
qwen3-coder:30b 1573ms 1756ms 151ms 88.8 0/3 1.00
deepseek-coder:6.7b 1574ms 1755ms 92ms 117.6 0/3 1.00
mistral:latest 1759ms 2040ms 92ms 102.6 0/3 1.00
qwen2.5-coder:14b 2228ms 2509ms 208ms 56.2 0/3 1.00
qwen2.5-coder:32b 4198ms 4794ms 237ms 28.3 0/3 1.00
deepseek-coder:33b 5714ms 6241ms 274ms 31.4 0/3 1.00
qwen3.6:latest 12873ms 12882ms 12873ms 41.7 0/3 0.00
llama3.1:70b 18274ms 24014ms 286ms 6.1 0/3 1.00
command-r-plus:latest 27778ms 37445ms 685ms 4.1 0/3 1.00

throughput

Model p50 lat p95 lat TTFT p50 tok/s errors
deepseek-coder:latest 3858ms 3888ms 76ms 282.6 0/3
qwen2.5-coder:1.5b 3897ms 3914ms 134ms 213.4 0/3
deepseek-coder:6.7b 5448ms 5509ms 101ms 117.3 0/3
llama3:8b 6497ms 6597ms 207ms 109.6 0/3
mistral:latest 6859ms 7003ms 57ms 102.2 0/3
qwen2.5-coder:latest 7574ms 7628ms 198ms 103.7 0/3
codellama:7b-instruct 9028ms 9096ms 56ms 115.5 0/3
qwen3-coder:30b 12192ms 12397ms 169ms 87.0 0/3
qwen2.5-coder:14b 13655ms 13765ms 244ms 55.7 0/3
deepseek-coder:33b 20843ms 21047ms 173ms 31.2 0/3
qwen3.6:latest 25231ms 25277ms 25231ms 41.8 0/3
qwen2.5-coder:32b 35291ms 35457ms 168ms 28.0 0/3
llama3.1:70b 150084ms 150527ms 403ms 6.0 0/3
command-r-plus:latest 226478ms 239940ms 507ms 3.9 0/3

tool_calling

Model p50 lat p95 lat TTFT p50 tok/s errors score
qwen2.5-coder:1.5b 286ms 1557ms 165ms 224.0 0/9 0.50
qwen2.5-coder:latest 662ms 2311ms 214ms 106.0 0/9 0.50
mistral:latest 1138ms 2616ms 129ms 99.7 0/9 0.17
codellama:7b-instruct 1154ms 2211ms 121ms 119.9 0/9 0.00
deepseek-coder:33b 1316ms 3528ms 254ms 32.1 0/9 0.50
qwen2.5-coder:14b 1359ms 5805ms 286ms 56.9 0/9 0.50
deepseek-coder:6.7b 1373ms 3250ms 124ms 119.3 0/9 0.00
qwen3-coder:30b 1549ms 2621ms 241ms 89.1 0/9 0.00
qwen2.5-coder:32b 1561ms 6213ms 470ms 28.7 0/9 0.50
llama3:8b 1755ms 2587ms 211ms 110.1 0/9 0.00
deepseek-coder:latest 1900ms 1962ms 71ms 287.4 0/9 0.00
qwen3.6:latest 12796ms 12937ms 12796ms 41.8 0/9 0.00
llama3.1:70b 15779ms 45620ms 4779ms 6.2 0/9 0.33
command-r-plus:latest 16524ms 59496ms 7940ms 4.1 0/9 0.33

6. Latency Ranking (p50, all workloads)

Rank Model Workload p50 (ms) p95 (ms)
1 qwen2.5-coder:1.5b reasoning 177 702
2 qwen2.5-coder:1.5b latency 267 1022
3 llama3:8b reasoning 277 976
4 qwen2.5-coder:latest reasoning 278 1085
5 qwen3-coder:30b reasoning 280 1314
6 qwen2.5-coder:1.5b tool_calling 286 1557
7 qwen2.5-coder:1.5b concurrency 302 515
8 deepseek-coder:6.7b latency 351 3598
9 qwen2.5-coder:latest latency 408 2699
10 mistral:latest latency 414 2929
11 llama3:8b latency 416 2474
12 qwen3-coder:30b concurrency 420 899
13 codellama:7b-instruct latency 434 3909
14 qwen3-coder:30b latency 449 12903
15 codellama:7b-instruct reasoning 453 906
16 deepseek-coder:latest concurrency 463 2755
17 qwen2.5-coder:14b latency 499 5297
18 qwen2.5-coder:14b reasoning 514 2254
19 llama3:8b concurrency 557 2776
20 deepseek-coder:latest latency 574 1788
21 qwen2.5-coder:32b reasoning 596 2717
22 qwen2.5-coder:32b latency 655 10385
23 deepseek-coder:6.7b concurrency 657 1121
24 qwen2.5-coder:latest tool_calling 662 2311
25 qwen2.5-coder:latest concurrency 689 1432
26 codellama:7b-instruct concurrency 709 1230
27 deepseek-coder:33b latency 790 10779
28 qwen2.5-coder:1.5b summarization 799 854
29 mistral:latest concurrency 875 2546
30 llama3:8b summarization 959 1065
31 deepseek-coder:6.7b reasoning 1003 2348
32 deepseek-coder:latest summarization 1109 1171
33 qwen2.5-coder:14b concurrency 1112 2044
34 mistral:latest tool_calling 1138 2616
35 codellama:7b-instruct tool_calling 1154 2211
36 mistral:latest reasoning 1209 1661
37 deepseek-coder:33b tool_calling 1316 3528
38 qwen2.5-coder:14b tool_calling 1359 5805
39 deepseek-coder:6.7b tool_calling 1373 3250
40 codellama:7b-instruct summarization 1427 1537
41 qwen2.5-coder:latest summarization 1470 1579
42 qwen2.5-coder:1.5b coding 1500 1569
43 qwen3-coder:30b tool_calling 1549 2621
44 qwen2.5-coder:32b tool_calling 1561 6213
45 qwen3-coder:30b summarization 1573 1756
46 deepseek-coder:6.7b summarization 1574 1755
47 deepseek-coder:latest reasoning 1595 2006
48 llama3:8b tool_calling 1755 2587
49 mistral:latest summarization 1759 2040
50 qwen2.5-coder:32b concurrency 1824 3799
51 deepseek-coder:latest tool_calling 1900 1962
52 deepseek-coder:latest coding 1933 1997
53 qwen2.5-coder:14b summarization 2228 2509
54 deepseek-coder:33b reasoning 2507 3495
55 deepseek-coder:33b concurrency 2641 14676
56 deepseek-coder:6.7b coding 2825 2918
57 codellama:7b-instruct coding 2946 3226
58 qwen2.5-coder:latest coding 3076 5485
59 mistral:latest coding 3695 5198
60 deepseek-coder:latest throughput 3858 3888
61 qwen2.5-coder:1.5b throughput 3897 3914
62 llama3.1:70b latency 3934 38290
63 command-r-plus:latest latency 4088 42628
64 qwen2.5-coder:32b summarization 4198 4794
65 llama3:8b coding 4894 5126
66 llama3.1:70b reasoning 5072 19427
67 deepseek-coder:6.7b throughput 5448 5509
68 deepseek-coder:33b summarization 5714 6241
69 qwen3-coder:30b coding 6071 6342
70 llama3:8b throughput 6497 6597
71 mistral:latest throughput 6859 7003
72 qwen2.5-coder:latest throughput 7574 7628
73 codellama:7b-instruct throughput 9028 9096
74 command-r-plus:latest reasoning 9145 27777
75 llama3.1:70b concurrency 9516 27844
76 qwen2.5-coder:14b coding 9619 9757
77 deepseek-coder:33b coding 10975 12151
78 qwen3.6:latest latency 11760 13542
79 qwen3-coder:30b throughput 12192 12397
80 qwen3.6:latest tool_calling 12796 12937
81 qwen3.6:latest reasoning 12809 12917
82 qwen3.6:latest coding 12811 12961
83 qwen3.6:latest summarization 12873 12882
84 qwen2.5-coder:14b throughput 13655 13765
85 llama3.1:70b tool_calling 15779 45620
86 command-r-plus:latest tool_calling 16524 59496
87 command-r-plus:latest concurrency 17526 28799
88 llama3.1:70b summarization 18274 24014
89 qwen2.5-coder:32b coding 18310 18930
90 qwen3.6:latest concurrency 18860 61051
91 deepseek-coder:33b throughput 20843 21047
92 qwen3.6:latest throughput 25231 25277
93 command-r-plus:latest summarization 27778 37445
94 qwen2.5-coder:32b throughput 35291 35457
95 llama3.1:70b coding 68423 88840
96 command-r-plus:latest coding 85374 122501
97 llama3.1:70b throughput 150084 150527
98 command-r-plus:latest throughput 226478 239940
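
The p50 and p95 columns are percentiles over per-request latencies, and the ranking sorts every (model, workload) pair by p50. The report does not say which percentile method it uses. The sketch below assumes linear interpolation between sorted samples, which is consistent with the two-sample rows in Section 8 (for example, codellama:7b-instruct at level 2 reports p50 271ms, p95 311ms, p99 315ms). The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Row = Tuple[str, str, float, float]  # (model, workload, p50_ms, p95_ms)


def percentile(values: Sequence[float], q: float) -> float:
    """Return the q-th percentile (0 to 100) using linear interpolation."""
    if not values:
        raise ValueError("percentile() needs at least one value")
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0   # fractional index into the sorted data
    lo = int(pos)                     # lower neighbour
    hi = min(lo + 1, len(xs) - 1)     # upper neighbour, clamped at the end
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def rank_by_p50(rows: Sequence[Row]) -> List[Row]:
    """Sort (model, workload, p50, p95) rows by ascending p50 latency."""
    return sorted(rows, key=lambda row: row[2])


# Three rows copied from the tables in Section 5.
rows = [
    ("llama3:8b", "reasoning", 277, 976),
    ("qwen2.5-coder:1.5b", "reasoning", 177, 702),
    ("qwen2.5-coder:1.5b", "latency", 267, 1022),
]
ranked = rank_by_p50(rows)
```

With only 3 to 15 samples per cell, p95 is effectively the slowest or second-slowest request, so a large gap between p50 and p95 may reflect a single outlier such as a cold model load.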

7. Throughput Ranking

Rank Model tok/s
1 deepseek-coder:latest 282.6
2 qwen2.5-coder:1.5b 213.4
3 deepseek-coder:6.7b 117.3
4 codellama:7b-instruct 115.5
5 llama3:8b 109.6
6 qwen2.5-coder:latest 103.7
7 mistral:latest 102.2
8 qwen3-coder:30b 87.0
9 qwen2.5-coder:14b 55.7
10 qwen3.6:latest 41.8
11 deepseek-coder:33b 31.2
12 qwen2.5-coder:32b 28.0
13 llama3.1:70b 6.0
14 command-r-plus:latest 3.9
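
For reference, Ollama reports `eval_count` (generated tokens) and `eval_duration` (nanoseconds) in the final object of a response, and tokens per second is the ratio of the two. The sketch below shows that arithmetic. The helper name and the sample values are hypothetical and are not taken from this run.

```python
def tokens_per_second(final_chunk: dict) -> float:
    """Compute completion tokens/sec from an Ollama final response object.

    eval_duration is in nanoseconds, so it is scaled to seconds first.
    Returns 0.0 when the duration is missing or zero.
    """
    eval_count = final_chunk.get("eval_count", 0)
    eval_duration_ns = final_chunk.get("eval_duration", 0)
    if eval_duration_ns <= 0:
        return 0.0
    return eval_count / (eval_duration_ns / 1e9)


# Hypothetical final chunk: 565 tokens generated in 2 seconds.
sample = {"eval_count": 565, "eval_duration": 2_000_000_000}
rate = tokens_per_second(sample)
```

Because `eval_duration` covers generation only, this figure excludes prompt processing and model load time, which is why a model can show high tok/s alongside a high TTFT.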

8. Concurrency Stability

codellama:7b-instruct

Level p50 p95 p99 errors
1 261ms 261ms 261ms 0/1
2 271ms 311ms 315ms 0/2
4 348ms 501ms 515ms 0/4
8 873ms 1230ms 1230ms 0/8

command-r-plus:latest

Level p50 p95 p99 errors
1 2073ms 2073ms 2073ms 0/1
2 10468ms 10986ms 11032ms 0/2
4 14393ms 17951ms 18273ms 0/4
8 21472ms 28801ms 28802ms 0/8

deepseek-coder:33b

Level p50 p95 p99 errors
1 424ms 424ms 424ms 0/1
2 1177ms 1240ms 1246ms 0/2
4 1890ms 6624ms 7247ms 0/4
8 4025ms 14678ms 14679ms 0/8

deepseek-coder:6.7b

Level p50 p95 p99 errors
1 123ms 123ms 123ms 0/1
2 325ms 341ms 343ms 0/2
4 423ms 517ms 523ms 0/4
8 1011ms 1122ms 1123ms 0/8

deepseek-coder:latest

Level p50 p95 p99 errors
1 94ms 94ms 94ms 0/1
2 341ms 451ms 461ms 0/2
4 435ms 1216ms 1301ms 0/4
8 1551ms 2755ms 2756ms 0/8

llama3.1:70b

Level p50 p95 p99 errors
1 1472ms 1472ms 1472ms 0/1
2 2345ms 4028ms 4178ms 0/2
4 8948ms 18620ms 19889ms 0/4
8 12205ms 27846ms 27847ms 0/8

llama3:8b

Level p50 p95 p99 errors
1 216ms 216ms 216ms 0/1
2 303ms 308ms 309ms 0/2
4 483ms 1380ms 1506ms 0/4
8 1030ms 2783ms 2788ms 0/8

mistral:latest

Level p50 p95 p99 errors
1 194ms 194ms 194ms 0/1
2 524ms 840ms 868ms 0/2
4 785ms 1315ms 1376ms 0/4
8 1140ms 2548ms 2550ms 0/8

qwen2.5-coder:1.5b

Level p50 p95 p99 errors
1 252ms 252ms 252ms 0/1
2 183ms 211ms 214ms 0/2
4 302ms 345ms 351ms 0/4
8 452ms 515ms 516ms 0/8

qwen2.5-coder:14b

Level p50 p95 p99 errors
1 249ms 249ms 249ms 0/1
2 320ms 452ms 463ms 0/2
4 709ms 1030ms 1065ms 0/4
8 1427ms 2045ms 2046ms 0/8

qwen2.5-coder:32b

Level p50 p95 p99 errors
1 495ms 495ms 495ms 0/1
2 446ms 651ms 669ms 0/2
4 1681ms 2374ms 2436ms 0/4
8 2659ms 3802ms 3804ms 0/8

qwen2.5-coder:latest

Level p50 p95 p99 errors
1 258ms 258ms 258ms 0/1
2 419ms 512ms 520ms 0/2
4 429ms 622ms 636ms 0/4
8 1061ms 1434ms 1435ms 0/8

qwen3-coder:30b

Level p50 p95 p99 errors
1 354ms 354ms 354ms 0/1
2 341ms 356ms 357ms 0/2
4 451ms 587ms 601ms 0/4
8 586ms 899ms 899ms 0/8

qwen3.6:latest

Level p50 p95 p99 errors
1 3169ms 3169ms 3169ms 0/1
2 14124ms 15487ms 15608ms 0/2
4 20893ms 29851ms 30228ms 0/4
8 31755ms 65461ms 68990ms 0/8
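
The per-level tables above follow the pattern described in the methodology: at each concurrency level, that many requests are issued in parallel through a ThreadPoolExecutor. Levels 1, 2, 4, and 8 sum to the 15 requests shown in the concurrency workload's error column. The following is a minimal sketch of that loop, with hypothetical function names and a stand-in `fake_request` in place of a real model call.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence


def timed_call(request_fn: Callable[[], object]) -> float:
    """Run one request and return its wall-clock latency in milliseconds."""
    start = time.perf_counter()
    request_fn()
    return (time.perf_counter() - start) * 1000.0


def run_levels(request_fn: Callable[[], object],
               levels: Sequence[int] = (1, 2, 4, 8)) -> Dict[int, List[float]]:
    """Issue `level` parallel requests per level and collect latencies."""
    results: Dict[int, List[float]] = {}
    for level in levels:
        # One worker per request so all requests at a level start together.
        with ThreadPoolExecutor(max_workers=level) as pool:
            futures = [pool.submit(timed_call, request_fn) for _ in range(level)]
            results[level] = [f.result() for f in futures]
    return results


def fake_request() -> None:
    """Stand-in for a model call (assumption: a real call would hit the server)."""
    time.sleep(0.01)
```

Threads suit this workload because each request spends its time waiting on network I/O. Note that level 1 yields a single sample, which is why p50, p95, and p99 are identical in every level-1 row.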

9. Tool Calling / JSON Reliability

Model JSON score TTFT p50 errors
deepseek-coder:33b 0.50 254ms 0/9
qwen2.5-coder:1.5b 0.50 165ms 0/9
qwen2.5-coder:14b 0.50 286ms 0/9
qwen2.5-coder:32b 0.50 470ms 0/9
qwen2.5-coder:latest 0.50 214ms 0/9
command-r-plus:latest 0.33 7940ms 0/9
llama3.1:70b 0.33 4779ms 0/9
mistral:latest 0.17 129ms 0/9
codellama:7b-instruct 0.00 121ms 0/9
deepseek-coder:6.7b 0.00 124ms 0/9
deepseek-coder:latest 0.00 71ms 0/9
llama3:8b 0.00 211ms 0/9
qwen3-coder:30b 0.00 241ms 0/9
qwen3.6:latest 0.00 12796ms 0/9
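
The report does not define how the JSON score is computed. As one plausible reading, the sketch below scores a batch of responses by the fraction that parse as a JSON object containing a set of required keys. The key names `name` and `arguments` and both function names are assumptions for illustration, not the harness's actual schema.

```python
import json
from typing import Iterable, Sequence


def is_valid_tool_call(text: str,
                       required_keys: Sequence[str] = ("name", "arguments")) -> bool:
    """True if `text` parses as a JSON object that has every required key."""
    try:
        obj = json.loads(text)
    except (json.JSONDecodeError, TypeError):
        return False
    return isinstance(obj, dict) and all(key in obj for key in required_keys)


def json_score(responses: Iterable[str]) -> float:
    """Return the fraction of responses that are valid tool calls."""
    items = list(responses)
    if not items:
        return 0.0
    return sum(is_valid_tool_call(r) for r in items) / len(items)
```

Under a strict check like this, a model that wraps otherwise correct JSON in prose or a code fence scores zero, which may explain why some fast models with zero request errors still score 0.00 here.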

10. Recommendations

  • codellama:7b-instruct: Best for coding, Best for summarization
  • command-r-plus:latest: Most stable under concurrency
  • deepseek-coder:latest: Best for agentic tool calls, Highest throughput

11. Issues / Failures

No failures.

12. Next Steps for DGX Benchmarking

See docs/DGX_PHASE_2_PLAN.md for the full DGX phase 2 plan.

  • Deploy vLLM on DGX with a test model
  • Configure endpoint in configs/endpoints.yaml
  • Run: macbench run --provider openai_compatible --all-models
  • Add DCGM GPU metrics hook in metrics/system.py