Mac Studio Local LLM Benchmark Report
Run ID: macstudio_overnight_all_models_20260524_022731
Generated: 2026-05-24 04:04 UTC
1. Executive Summary
- codellama:7b-instruct: Best for coding, Best for summarization
- command-r-plus:latest: Most stable under concurrency, Avoid: high latency
- deepseek-coder:latest: Best for agentic tool calls, Highest throughput
- llama3.1:70b: Avoid: high latency
- qwen3.6:latest: Avoid: high latency
2. System Information
- hostname: MacBook-Pro.local
- os: macOS 26.3.1
- python: 3.11.14
- cpu_cores_physical: 16
- ram_total_gb: 51.5
- ram_available_gb: 14.2
3. Models Tested
- codellama:7b-instruct
- command-r-plus:latest
- deepseek-coder:33b
- deepseek-coder:6.7b
- deepseek-coder:latest
- llama3.1:70b
- llama3:8b
- mistral:latest
- qwen2.5-coder:1.5b
- qwen2.5-coder:14b
- qwen2.5-coder:32b
- qwen2.5-coder:latest
- qwen3-coder:30b
- qwen3.6:latest
4. Benchmark Methodology
- TTFT measured from request send to first non-empty streaming token chunk
- Total latency measured from request send to stream completion
- Tokens/sec from Ollama eval_duration field (completion tokens only)
- Concurrency: parallel requests via ThreadPoolExecutor, stats per level
- Quality scores: heuristic (keyword presence, structure), not HumanEval
- All requests use temperature=0 for reproducibility
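The timing rules above can be sketched roughly as follows. This is an illustration under stated assumptions, not the benchmark's own code: `measure_stream`, `tokens_per_second`, and `fake_stream` are hypothetical names, the stream is simulated with a generator rather than a real HTTP request, and the throughput helper assumes the `eval_count` (completion tokens) and `eval_duration` (nanoseconds) fields that Ollama reports in its final response object.

```python
import time
from typing import Iterable, Optional, Tuple


def measure_stream(chunks: Iterable[str]) -> Tuple[Optional[float], float]:
    """Return (ttft_ms, total_ms) for one streamed response.

    The clock starts just before iteration begins (standing in for
    "request send"). TTFT stops at the first non-empty chunk; total
    latency stops when the stream is exhausted.
    """
    start = time.perf_counter()
    ttft_ms = None
    for chunk in chunks:
        if ttft_ms is None and chunk:  # ignore empty chunks
            ttft_ms = (time.perf_counter() - start) * 1000.0
    total_ms = (time.perf_counter() - start) * 1000.0
    return ttft_ms, total_ms


def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Completion tokens per second from Ollama-style counters (assumed fields)."""
    if eval_duration_ns <= 0:
        return 0.0
    return eval_count / (eval_duration_ns / 1e9)


def fake_stream():
    """Simulated stream: one empty chunk, then two text chunks."""
    time.sleep(0.02)
    yield ""
    time.sleep(0.02)
    yield "Hello"
    time.sleep(0.02)
    yield " world"


if __name__ == "__main__":
    ttft, total = measure_stream(fake_stream())
    print(f"TTFT {ttft:.0f} ms, total {total:.0f} ms")
    print(f"{tokens_per_second(120, 2_000_000_000):.1f} tok/s")
```

Because the throughput figure is derived from generation time only, it excludes prompt processing and network overhead, so it will not in general equal total tokens divided by total latency.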
5. Results by Workload
coding
| Model | p50 lat | p95 lat | TTFT p50 | tok/s | errors | score |
|---|---|---|---|---|---|---|
| qwen2.5-coder:1.5b | 1500ms | 1569ms | 154ms | 214.7 | 0/9 | 0.33 |
| deepseek-coder:latest | 1933ms | 1997ms | 78ms | 286.6 | 0/9 | 0.10 |
| deepseek-coder:6.7b | 2825ms | 2918ms | 150ms | 118.9 | 0/9 | 0.47 |
| codellama:7b-instruct | 2946ms | 3226ms | 201ms | 118.5 | 0/9 | 0.60 |
| qwen2.5-coder:latest | 3076ms | 5485ms | 236ms | 104.2 | 0/9 | 0.27 |
| mistral:latest | 3695ms | 5198ms | 122ms | 102.9 | 0/9 | 0.03 |
| llama3:8b | 4894ms | 5126ms | 292ms | 109.8 | 0/9 | 0.47 |
| qwen3-coder:30b | 6071ms | 6342ms | 271ms | 88.1 | 0/9 | 0.50 |
| qwen2.5-coder:14b | 9619ms | 9757ms | 326ms | 55.8 | 0/9 | 0.03 |
| deepseek-coder:33b | 10975ms | 12151ms | 407ms | 31.5 | 0/9 | 0.17 |
| qwen3.6:latest | 12811ms | 12961ms | 12811ms | 41.7 | 0/9 | 0.00 |
| qwen2.5-coder:32b | 18310ms | 18930ms | 454ms | 28.1 | 0/9 | 0.17 |
| llama3.1:70b | 68423ms | 88840ms | 4226ms | 6.1 | 0/9 | 0.20 |
| command-r-plus:latest | 85374ms | 122501ms | 7881ms | 4.0 | 0/9 | 0.47 |
concurrency
| Model | p50 lat | p95 lat | TTFT p50 | tok/s | errors | score |
|---|---|---|---|---|---|---|
| qwen2.5-coder:1.5b | 302ms | 515ms | 270ms | 171.0 | 0/15 | n/a |
| qwen3-coder:30b | 420ms | 899ms | 379ms | 72.4 | 0/15 | n/a |
| deepseek-coder:latest | 463ms | 2755ms | 249ms | 126.9 | 0/15 | n/a |
| llama3:8b | 557ms | 2776ms | 309ms | 72.1 | 0/15 | n/a |
| deepseek-coder:6.7b | 657ms | 1121ms | 375ms | 44.5 | 0/15 | n/a |
| qwen2.5-coder:latest | 689ms | 1432ms | 311ms | 68.7 | 0/15 | n/a |
| codellama:7b-instruct | 709ms | 1230ms | 212ms | 45.5 | 0/15 | n/a |
| mistral:latest | 875ms | 2546ms | 384ms | 48.3 | 0/15 | n/a |
| qwen2.5-coder:14b | 1112ms | 2044ms | 592ms | 43.8 | 0/15 | n/a |
| qwen2.5-coder:32b | 1824ms | 3799ms | 1405ms | 17.4 | 0/15 | n/a |
| deepseek-coder:33b | 2641ms | 14676ms | 1216ms | 13.3 | 0/15 | n/a |
| llama3.1:70b | 9516ms | 27844ms | 7863ms | 5.4 | 0/15 | n/a |
| command-r-plus:latest | 17526ms | 28799ms | 12535ms | 2.7 | 0/15 | n/a |
| qwen3.6:latest | 18860ms | 61051ms | 18860ms | 42.0 | 0/15 | n/a |
latency
| Model | p50 lat | p95 lat | TTFT p50 | tok/s | errors | score |
|---|---|---|---|---|---|---|
| qwen2.5-coder:1.5b | 267ms | 1022ms | 214ms | 260.2 | 0/9 | n/a |
| deepseek-coder:6.7b | 351ms | 3598ms | 190ms | 122.7 | 0/9 | n/a |
| qwen2.5-coder:latest | 408ms | 2699ms | 252ms | 124.5 | 0/9 | n/a |
| mistral:latest | 414ms | 2929ms | 208ms | 115.7 | 0/9 | n/a |
| llama3:8b | 416ms | 2474ms | 289ms | 149.0 | 0/9 | n/a |
| codellama:7b-instruct | 434ms | 3909ms | 216ms | 117.4 | 0/9 | n/a |
| qwen3-coder:30b | 449ms | 12903ms | 289ms | 115.7 | 0/9 | n/a |
| qwen2.5-coder:14b | 499ms | 5297ms | 309ms | 72.4 | 0/9 | n/a |
| deepseek-coder:latest | 574ms | 1788ms | 108ms | 274.7 | 0/9 | n/a |
| qwen2.5-coder:32b | 655ms | 10385ms | 407ms | 37.7 | 0/9 | n/a |
| deepseek-coder:33b | 790ms | 10779ms | 380ms | 35.6 | 0/9 | n/a |
| llama3.1:70b | 3934ms | 38290ms | 2150ms | 8.5 | 0/9 | n/a |
| command-r-plus:latest | 4088ms | 42628ms | 2447ms | 5.6 | 0/9 | n/a |
| qwen3.6:latest | 11760ms | 13542ms | 11412ms | 41.9 | 0/9 | n/a |
reasoning
| Model | p50 lat | p95 lat | TTFT p50 | tok/s | errors | score |
|---|---|---|---|---|---|---|
| qwen2.5-coder:1.5b | 177ms | 702ms | 158ms | 253.9 | 0/9 | 0.67 |
| llama3:8b | 277ms | 976ms | 226ms | 163.6 | 0/9 | 0.83 |
| qwen2.5-coder:latest | 278ms | 1085ms | 220ms | 123.7 | 0/9 | 0.83 |
| qwen3-coder:30b | 280ms | 1314ms | 237ms | 110.1 | 0/9 | 0.83 |
| codellama:7b-instruct | 453ms | 906ms | 205ms | 118.5 | 0/9 | 0.83 |
| qwen2.5-coder:14b | 514ms | 2254ms | 304ms | 65.8 | 0/9 | 0.83 |
| qwen2.5-coder:32b | 596ms | 2717ms | 514ms | 33.6 | 0/9 | 0.83 |
| deepseek-coder:6.7b | 1003ms | 2348ms | 208ms | 119.2 | 0/9 | 0.33 |
| mistral:latest | 1209ms | 1661ms | 166ms | 98.9 | 0/9 | 0.50 |
| deepseek-coder:latest | 1595ms | 2006ms | 77ms | 288.1 | 0/9 | 0.33 |
| deepseek-coder:33b | 2507ms | 3495ms | 463ms | 31.8 | 0/9 | 0.50 |
| llama3.1:70b | 5072ms | 19427ms | 4770ms | 9.3 | 0/9 | 0.83 |
| command-r-plus:latest | 9145ms | 27777ms | 8023ms | 4.7 | 0/9 | 0.83 |
| qwen3.6:latest | 12809ms | 12917ms | 12809ms | 41.8 | 0/9 | 0.33 |
summarization
| Model | p50 lat | p95 lat | TTFT p50 | tok/s | errors | score |
|---|---|---|---|---|---|---|
| qwen2.5-coder:1.5b | 799ms | 854ms | 147ms | 214.6 | 0/3 | 1.00 |
| llama3:8b | 959ms | 1065ms | 227ms | 110.6 | 0/3 | 1.00 |
| deepseek-coder:latest | 1109ms | 1171ms | 55ms | 285.4 | 0/3 | 1.00 |
| codellama:7b-instruct | 1427ms | 1537ms | 86ms | 116.9 | 0/3 | 1.00 |
| qwen2.5-coder:latest | 1470ms | 1579ms | 241ms | 104.4 | 0/3 | 1.00 |
| qwen3-coder:30b | 1573ms | 1756ms | 151ms | 88.8 | 0/3 | 1.00 |
| deepseek-coder:6.7b | 1574ms | 1755ms | 92ms | 117.6 | 0/3 | 1.00 |
| mistral:latest | 1759ms | 2040ms | 92ms | 102.6 | 0/3 | 1.00 |
| qwen2.5-coder:14b | 2228ms | 2509ms | 208ms | 56.2 | 0/3 | 1.00 |
| qwen2.5-coder:32b | 4198ms | 4794ms | 237ms | 28.3 | 0/3 | 1.00 |
| deepseek-coder:33b | 5714ms | 6241ms | 274ms | 31.4 | 0/3 | 1.00 |
| qwen3.6:latest | 12873ms | 12882ms | 12873ms | 41.7 | 0/3 | 0.00 |
| llama3.1:70b | 18274ms | 24014ms | 286ms | 6.1 | 0/3 | 1.00 |
| command-r-plus:latest | 27778ms | 37445ms | 685ms | 4.1 | 0/3 | 1.00 |
throughput
| Model | p50 lat | p95 lat | TTFT p50 | tok/s | errors | score |
|---|---|---|---|---|---|---|
| deepseek-coder:latest | 3858ms | 3888ms | 76ms | 282.6 | 0/3 | n/a |
| qwen2.5-coder:1.5b | 3897ms | 3914ms | 134ms | 213.4 | 0/3 | n/a |
| deepseek-coder:6.7b | 5448ms | 5509ms | 101ms | 117.3 | 0/3 | n/a |
| llama3:8b | 6497ms | 6597ms | 207ms | 109.6 | 0/3 | n/a |
| mistral:latest | 6859ms | 7003ms | 57ms | 102.2 | 0/3 | n/a |
| qwen2.5-coder:latest | 7574ms | 7628ms | 198ms | 103.7 | 0/3 | n/a |
| codellama:7b-instruct | 9028ms | 9096ms | 56ms | 115.5 | 0/3 | n/a |
| qwen3-coder:30b | 12192ms | 12397ms | 169ms | 87.0 | 0/3 | n/a |
| qwen2.5-coder:14b | 13655ms | 13765ms | 244ms | 55.7 | 0/3 | n/a |
| deepseek-coder:33b | 20843ms | 21047ms | 173ms | 31.2 | 0/3 | n/a |
| qwen3.6:latest | 25231ms | 25277ms | 25231ms | 41.8 | 0/3 | n/a |
| qwen2.5-coder:32b | 35291ms | 35457ms | 168ms | 28.0 | 0/3 | n/a |
| llama3.1:70b | 150084ms | 150527ms | 403ms | 6.0 | 0/3 | n/a |
| command-r-plus:latest | 226478ms | 239940ms | 507ms | 3.9 | 0/3 | n/a |
tool_calling
| Model | p50 lat | p95 lat | TTFT p50 | tok/s | errors | score |
|---|---|---|---|---|---|---|
| qwen2.5-coder:1.5b | 286ms | 1557ms | 165ms | 224.0 | 0/9 | 0.50 |
| qwen2.5-coder:latest | 662ms | 2311ms | 214ms | 106.0 | 0/9 | 0.50 |
| mistral:latest | 1138ms | 2616ms | 129ms | 99.7 | 0/9 | 0.17 |
| codellama:7b-instruct | 1154ms | 2211ms | 121ms | 119.9 | 0/9 | 0.00 |
| deepseek-coder:33b | 1316ms | 3528ms | 254ms | 32.1 | 0/9 | 0.50 |
| qwen2.5-coder:14b | 1359ms | 5805ms | 286ms | 56.9 | 0/9 | 0.50 |
| deepseek-coder:6.7b | 1373ms | 3250ms | 124ms | 119.3 | 0/9 | 0.00 |
| qwen3-coder:30b | 1549ms | 2621ms | 241ms | 89.1 | 0/9 | 0.00 |
| qwen2.5-coder:32b | 1561ms | 6213ms | 470ms | 28.7 | 0/9 | 0.50 |
| llama3:8b | 1755ms | 2587ms | 211ms | 110.1 | 0/9 | 0.00 |
| deepseek-coder:latest | 1900ms | 1962ms | 71ms | 287.4 | 0/9 | 0.00 |
| qwen3.6:latest | 12796ms | 12937ms | 12796ms | 41.8 | 0/9 | 0.00 |
| llama3.1:70b | 15779ms | 45620ms | 4779ms | 6.2 | 0/9 | 0.33 |
| command-r-plus:latest | 16524ms | 59496ms | 7940ms | 4.1 | 0/9 | 0.33 |
6. Latency Ranking (p50, all workloads)
| Rank | Model | Workload | p50 (ms) | p95 (ms) |
|---|---|---|---|---|
| 1 | qwen2.5-coder:1.5b | reasoning | 177 | 702 |
| 2 | qwen2.5-coder:1.5b | latency | 267 | 1022 |
| 3 | llama3:8b | reasoning | 277 | 976 |
| 4 | qwen2.5-coder:latest | reasoning | 278 | 1085 |
| 5 | qwen3-coder:30b | reasoning | 280 | 1314 |
| 6 | qwen2.5-coder:1.5b | tool_calling | 286 | 1557 |
| 7 | qwen2.5-coder:1.5b | concurrency | 302 | 515 |
| 8 | deepseek-coder:6.7b | latency | 351 | 3598 |
| 9 | qwen2.5-coder:latest | latency | 408 | 2699 |
| 10 | mistral:latest | latency | 414 | 2929 |
| 11 | llama3:8b | latency | 416 | 2474 |
| 12 | qwen3-coder:30b | concurrency | 420 | 899 |
| 13 | codellama:7b-instruct | latency | 434 | 3909 |
| 14 | qwen3-coder:30b | latency | 449 | 12903 |
| 15 | codellama:7b-instruct | reasoning | 453 | 906 |
| 16 | deepseek-coder:latest | concurrency | 463 | 2755 |
| 17 | qwen2.5-coder:14b | latency | 499 | 5297 |
| 18 | qwen2.5-coder:14b | reasoning | 514 | 2254 |
| 19 | llama3:8b | concurrency | 557 | 2776 |
| 20 | deepseek-coder:latest | latency | 574 | 1788 |
| 21 | qwen2.5-coder:32b | reasoning | 596 | 2717 |
| 22 | qwen2.5-coder:32b | latency | 655 | 10385 |
| 23 | deepseek-coder:6.7b | concurrency | 657 | 1121 |
| 24 | qwen2.5-coder:latest | tool_calling | 662 | 2311 |
| 25 | qwen2.5-coder:latest | concurrency | 689 | 1432 |
| 26 | codellama:7b-instruct | concurrency | 709 | 1230 |
| 27 | deepseek-coder:33b | latency | 790 | 10779 |
| 28 | qwen2.5-coder:1.5b | summarization | 799 | 854 |
| 29 | mistral:latest | concurrency | 875 | 2546 |
| 30 | llama3:8b | summarization | 959 | 1065 |
| 31 | deepseek-coder:6.7b | reasoning | 1003 | 2348 |
| 32 | deepseek-coder:latest | summarization | 1109 | 1171 |
| 33 | qwen2.5-coder:14b | concurrency | 1112 | 2044 |
| 34 | mistral:latest | tool_calling | 1138 | 2616 |
| 35 | codellama:7b-instruct | tool_calling | 1154 | 2211 |
| 36 | mistral:latest | reasoning | 1209 | 1661 |
| 37 | deepseek-coder:33b | tool_calling | 1316 | 3528 |
| 38 | qwen2.5-coder:14b | tool_calling | 1359 | 5805 |
| 39 | deepseek-coder:6.7b | tool_calling | 1373 | 3250 |
| 40 | codellama:7b-instruct | summarization | 1427 | 1537 |
| 41 | qwen2.5-coder:latest | summarization | 1470 | 1579 |
| 42 | qwen2.5-coder:1.5b | coding | 1500 | 1569 |
| 43 | qwen3-coder:30b | tool_calling | 1549 | 2621 |
| 44 | qwen2.5-coder:32b | tool_calling | 1561 | 6213 |
| 45 | qwen3-coder:30b | summarization | 1573 | 1756 |
| 46 | deepseek-coder:6.7b | summarization | 1574 | 1755 |
| 47 | deepseek-coder:latest | reasoning | 1595 | 2006 |
| 48 | llama3:8b | tool_calling | 1755 | 2587 |
| 49 | mistral:latest | summarization | 1759 | 2040 |
| 50 | qwen2.5-coder:32b | concurrency | 1824 | 3799 |
| 51 | deepseek-coder:latest | tool_calling | 1900 | 1962 |
| 52 | deepseek-coder:latest | coding | 1933 | 1997 |
| 53 | qwen2.5-coder:14b | summarization | 2228 | 2509 |
| 54 | deepseek-coder:33b | reasoning | 2507 | 3495 |
| 55 | deepseek-coder:33b | concurrency | 2641 | 14676 |
| 56 | deepseek-coder:6.7b | coding | 2825 | 2918 |
| 57 | codellama:7b-instruct | coding | 2946 | 3226 |
| 58 | qwen2.5-coder:latest | coding | 3076 | 5485 |
| 59 | mistral:latest | coding | 3695 | 5198 |
| 60 | deepseek-coder:latest | throughput | 3858 | 3888 |
| 61 | qwen2.5-coder:1.5b | throughput | 3897 | 3914 |
| 62 | llama3.1:70b | latency | 3934 | 38290 |
| 63 | command-r-plus:latest | latency | 4088 | 42628 |
| 64 | qwen2.5-coder:32b | summarization | 4198 | 4794 |
| 65 | llama3:8b | coding | 4894 | 5126 |
| 66 | llama3.1:70b | reasoning | 5072 | 19427 |
| 67 | deepseek-coder:6.7b | throughput | 5448 | 5509 |
| 68 | deepseek-coder:33b | summarization | 5714 | 6241 |
| 69 | qwen3-coder:30b | coding | 6071 | 6342 |
| 70 | llama3:8b | throughput | 6497 | 6597 |
| 71 | mistral:latest | throughput | 6859 | 7003 |
| 72 | qwen2.5-coder:latest | throughput | 7574 | 7628 |
| 73 | codellama:7b-instruct | throughput | 9028 | 9096 |
| 74 | command-r-plus:latest | reasoning | 9145 | 27777 |
| 75 | llama3.1:70b | concurrency | 9516 | 27844 |
| 76 | qwen2.5-coder:14b | coding | 9619 | 9757 |
| 77 | deepseek-coder:33b | coding | 10975 | 12151 |
| 78 | qwen3.6:latest | latency | 11760 | 13542 |
| 79 | qwen3-coder:30b | throughput | 12192 | 12397 |
| 80 | qwen3.6:latest | tool_calling | 12796 | 12937 |
| 81 | qwen3.6:latest | reasoning | 12809 | 12917 |
| 82 | qwen3.6:latest | coding | 12811 | 12961 |
| 83 | qwen3.6:latest | summarization | 12873 | 12882 |
| 84 | qwen2.5-coder:14b | throughput | 13655 | 13765 |
| 85 | llama3.1:70b | tool_calling | 15779 | 45620 |
| 86 | command-r-plus:latest | tool_calling | 16524 | 59496 |
| 87 | command-r-plus:latest | concurrency | 17526 | 28799 |
| 88 | llama3.1:70b | summarization | 18274 | 24014 |
| 89 | qwen2.5-coder:32b | coding | 18310 | 18930 |
| 90 | qwen3.6:latest | concurrency | 18860 | 61051 |
| 91 | deepseek-coder:33b | throughput | 20843 | 21047 |
| 92 | qwen3.6:latest | throughput | 25231 | 25277 |
| 93 | command-r-plus:latest | summarization | 27778 | 37445 |
| 94 | qwen2.5-coder:32b | throughput | 35291 | 35457 |
| 95 | llama3.1:70b | coding | 68423 | 88840 |
| 96 | command-r-plus:latest | coding | 85374 | 122501 |
| 97 | llama3.1:70b | throughput | 150084 | 150527 |
| 98 | command-r-plus:latest | throughput | 226478 | 239940 |
7. Throughput Ranking
| Rank | Model | tok/s |
|---|---|---|
| 1 | deepseek-coder:latest | 282.6 |
| 2 | qwen2.5-coder:1.5b | 213.4 |
| 3 | deepseek-coder:6.7b | 117.3 |
| 4 | codellama:7b-instruct | 115.5 |
| 5 | llama3:8b | 109.6 |
| 6 | qwen2.5-coder:latest | 103.7 |
| 7 | mistral:latest | 102.2 |
| 8 | qwen3-coder:30b | 87.0 |
| 9 | qwen2.5-coder:14b | 55.7 |
| 10 | qwen3.6:latest | 41.8 |
| 11 | deepseek-coder:33b | 31.2 |
| 12 | qwen2.5-coder:32b | 28.0 |
| 13 | llama3.1:70b | 6.0 |
| 14 | command-r-plus:latest | 3.9 |
8. Concurrency Stability
codellama:7b-instruct
| Level | p50 | p95 | p99 | errors |
|---|---|---|---|---|
| 1 | 261ms | 261ms | 261ms | 0/1 |
| 2 | 271ms | 311ms | 315ms | 0/2 |
| 4 | 348ms | 501ms | 515ms | 0/4 |
| 8 | 873ms | 1230ms | 1230ms | 0/8 |
command-r-plus:latest
| Level | p50 | p95 | p99 | errors |
|---|---|---|---|---|
| 1 | 2073ms | 2073ms | 2073ms | 0/1 |
| 2 | 10468ms | 10986ms | 11032ms | 0/2 |
| 4 | 14393ms | 17951ms | 18273ms | 0/4 |
| 8 | 21472ms | 28801ms | 28802ms | 0/8 |
deepseek-coder:33b
| Level | p50 | p95 | p99 | errors |
|---|---|---|---|---|
| 1 | 424ms | 424ms | 424ms | 0/1 |
| 2 | 1177ms | 1240ms | 1246ms | 0/2 |
| 4 | 1890ms | 6624ms | 7247ms | 0/4 |
| 8 | 4025ms | 14678ms | 14679ms | 0/8 |
deepseek-coder:6.7b
| Level | p50 | p95 | p99 | errors |
|---|---|---|---|---|
| 1 | 123ms | 123ms | 123ms | 0/1 |
| 2 | 325ms | 341ms | 343ms | 0/2 |
| 4 | 423ms | 517ms | 523ms | 0/4 |
| 8 | 1011ms | 1122ms | 1123ms | 0/8 |
deepseek-coder:latest
| Level | p50 | p95 | p99 | errors |
|---|---|---|---|---|
| 1 | 94ms | 94ms | 94ms | 0/1 |
| 2 | 341ms | 451ms | 461ms | 0/2 |
| 4 | 435ms | 1216ms | 1301ms | 0/4 |
| 8 | 1551ms | 2755ms | 2756ms | 0/8 |
llama3.1:70b
| Level | p50 | p95 | p99 | errors |
|---|---|---|---|---|
| 1 | 1472ms | 1472ms | 1472ms | 0/1 |
| 2 | 2345ms | 4028ms | 4178ms | 0/2 |
| 4 | 8948ms | 18620ms | 19889ms | 0/4 |
| 8 | 12205ms | 27846ms | 27847ms | 0/8 |
llama3:8b
| Level | p50 | p95 | p99 | errors |
|---|---|---|---|---|
| 1 | 216ms | 216ms | 216ms | 0/1 |
| 2 | 303ms | 308ms | 309ms | 0/2 |
| 4 | 483ms | 1380ms | 1506ms | 0/4 |
| 8 | 1030ms | 2783ms | 2788ms | 0/8 |
mistral:latest
| Level | p50 | p95 | p99 | errors |
|---|---|---|---|---|
| 1 | 194ms | 194ms | 194ms | 0/1 |
| 2 | 524ms | 840ms | 868ms | 0/2 |
| 4 | 785ms | 1315ms | 1376ms | 0/4 |
| 8 | 1140ms | 2548ms | 2550ms | 0/8 |
qwen2.5-coder:1.5b
| Level | p50 | p95 | p99 | errors |
|---|---|---|---|---|
| 1 | 252ms | 252ms | 252ms | 0/1 |
| 2 | 183ms | 211ms | 214ms | 0/2 |
| 4 | 302ms | 345ms | 351ms | 0/4 |
| 8 | 452ms | 515ms | 516ms | 0/8 |
qwen2.5-coder:14b
| Level | p50 | p95 | p99 | errors |
|---|---|---|---|---|
| 1 | 249ms | 249ms | 249ms | 0/1 |
| 2 | 320ms | 452ms | 463ms | 0/2 |
| 4 | 709ms | 1030ms | 1065ms | 0/4 |
| 8 | 1427ms | 2045ms | 2046ms | 0/8 |
qwen2.5-coder:32b
| Level | p50 | p95 | p99 | errors |
|---|---|---|---|---|
| 1 | 495ms | 495ms | 495ms | 0/1 |
| 2 | 446ms | 651ms | 669ms | 0/2 |
| 4 | 1681ms | 2374ms | 2436ms | 0/4 |
| 8 | 2659ms | 3802ms | 3804ms | 0/8 |
qwen2.5-coder:latest
| Level | p50 | p95 | p99 | errors |
|---|---|---|---|---|
| 1 | 258ms | 258ms | 258ms | 0/1 |
| 2 | 419ms | 512ms | 520ms | 0/2 |
| 4 | 429ms | 622ms | 636ms | 0/4 |
| 8 | 1061ms | 1434ms | 1435ms | 0/8 |
qwen3-coder:30b
| Level | p50 | p95 | p99 | errors |
|---|---|---|---|---|
| 1 | 354ms | 354ms | 354ms | 0/1 |
| 2 | 341ms | 356ms | 357ms | 0/2 |
| 4 | 451ms | 587ms | 601ms | 0/4 |
| 8 | 586ms | 899ms | 899ms | 0/8 |
qwen3.6:latest
| Level | p50 | p95 | p99 | errors |
|---|---|---|---|---|
| 1 | 3169ms | 3169ms | 3169ms | 0/1 |
| 2 | 14124ms | 15487ms | 15608ms | 0/2 |
| 4 | 20893ms | 29851ms | 30228ms | 0/4 |
| 8 | 31755ms | 65461ms | 68990ms | 0/8 |
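The per-level statistics in this section could be produced along the following lines. This is a sketch under assumptions rather than the benchmark's implementation: `run_level`, `timed_call`, `percentile`, and `fake_request` are hypothetical names, the request is simulated with a short sleep, and the use of linear interpolation between ranks for percentiles is assumed. Only the use of ThreadPoolExecutor and per-level stats comes from the methodology.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def percentile(values: List[float], p: float) -> float:
    """Percentile using linear interpolation between closest ranks."""
    if not values:
        raise ValueError("percentile() needs at least one value")
    ordered = sorted(values)
    k = (len(ordered) - 1) * p / 100.0
    lo, hi = math.floor(k), math.ceil(k)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (k - lo)


def timed_call(request: Callable[[], None]) -> float:
    """Run one request and return its latency in milliseconds."""
    start = time.perf_counter()
    request()
    return (time.perf_counter() - start) * 1000.0


def run_level(request: Callable[[], None], level: int) -> Dict[str, object]:
    """Fire `level` requests in parallel and summarize their latencies."""
    latencies: List[float] = []
    errors = 0
    with ThreadPoolExecutor(max_workers=level) as pool:
        futures = [pool.submit(timed_call, request) for _ in range(level)]
        for future in futures:
            try:
                latencies.append(future.result())
            except Exception:
                errors += 1  # a failed request counts as an error, not a latency
    result: Dict[str, object] = {"level": level, "errors": f"{errors}/{level}"}
    if latencies:
        for p in (50, 95, 99):
            result[f"p{p}"] = percentile(latencies, p)
    return result


def fake_request() -> None:
    """Stand-in for a model call (hypothetical)."""
    time.sleep(0.01)


if __name__ == "__main__":
    for level in (1, 2, 4, 8):
        print(run_level(fake_request, level))
```

With a single sample every percentile collapses to that one value, which is why p50, p95 and p99 coincide in the level-1 rows above.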
9. Tool Calling (JSON score)
| Model | JSON score | TTFT p50 | errors |
|---|---|---|---|
| deepseek-coder:33b | 0.50 | 254ms | 0/9 |
| qwen2.5-coder:1.5b | 0.50 | 165ms | 0/9 |
| qwen2.5-coder:14b | 0.50 | 286ms | 0/9 |
| qwen2.5-coder:32b | 0.50 | 470ms | 0/9 |
| qwen2.5-coder:latest | 0.50 | 214ms | 0/9 |
| command-r-plus:latest | 0.33 | 7940ms | 0/9 |
| llama3.1:70b | 0.33 | 4779ms | 0/9 |
| mistral:latest | 0.17 | 129ms | 0/9 |
| codellama:7b-instruct | 0.00 | 121ms | 0/9 |
| deepseek-coder:6.7b | 0.00 | 124ms | 0/9 |
| deepseek-coder:latest | 0.00 | 71ms | 0/9 |
| llama3:8b | 0.00 | 211ms | 0/9 |
| qwen3-coder:30b | 0.00 | 241ms | 0/9 |
| qwen3.6:latest | 0.00 | 12796ms | 0/9 |
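The report does not spell out how the JSON score is computed. One plausible heuristic, shown purely as an illustration, gives partial credit when the output parses as a JSON object and further credit for each expected key it contains. The function name `json_score`, the 0.5/0.5 weighting, and the `tool` / `arguments` key names are all assumptions, not the benchmark's scoring rule.

```python
import json
from typing import Iterable


def json_score(output: str, required_keys: Iterable[str] = ("tool", "arguments")) -> float:
    """Hypothetical heuristic: 0.5 for parsing as a JSON object,
    plus up to 0.5 for the fraction of required keys present."""
    try:
        parsed = json.loads(output.strip())
    except json.JSONDecodeError:
        return 0.0  # not valid JSON at all
    if not isinstance(parsed, dict):
        return 0.0  # valid JSON, but not an object
    keys = list(required_keys)
    if not keys:
        return 1.0
    present = sum(1 for key in keys if key in parsed)
    return 0.5 + 0.5 * present / len(keys)


if __name__ == "__main__":
    print(json_score('{"tool": "search", "arguments": {"q": "ollama"}}'))
    print(json_score('{"tool": "search"}'))
    print(json_score("Sure! Here is the call you asked for."))
```

A scorer of this shape penalizes models that wrap the JSON in prose or code fences, so a low score can reflect formatting habits as much as tool-selection ability.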
10. Recommended Default Models
- codellama:7b-instruct: Best for coding, Best for summarization
- command-r-plus:latest: Most stable under concurrency
- deepseek-coder:latest: Best for agentic tool calls, Highest throughput
11. Issues / Failures
No failures.
12. Next Steps for DGX Benchmarking
See docs/DGX_PHASE_2_PLAN.md for the full DGX phase 2 plan.
- Deploy vLLM on DGX with a test model
- Configure endpoint in configs/endpoints.yaml
- Run: macbench run --provider openai_compatible --all-models
- Add DCGM GPU metrics hook in metrics/system.py