| Both previous revisions · Previous revision · Next revision | Previous revision |
| informatique:ai_lm:model_bench [26/05/2026 09:32] – cyrille | informatique:ai_lm:model_bench [26/05/2026 10:15] (Current version) – cyrille |
|---|
| ====== Model Benchmark ====== | ====== Model bench ====== |
| |
| ===== With Opencode ===== | With OpenCode and the prompt "hello". |
| |
| the prompt "hello". | gemma-4-26B-A4B-it-Q4_K_M |
| |
| ''~/Code/bronx/AI_Coding/llama.cpp-86/build/bin/llama-server --host 0.0.0.0 --port 8012 -m ~/Data/AI_Models/gemma-4-26B-A4B-it-Q4_K_M.gguf --jinja --no-mmap -c 0'' | ''~/Code/bronx/AI_Coding/llama.cpp-86/build/bin/llama-server --host 0.0.0.0 --port 8012 -m ~/Data/AI_Models/gemma-4-26B-A4B-it-Q4_K_M.gguf --jinja -c 0'' |
| |
| <code> | <code> |
| 0.15.581.681 I srv init: init: chat template, thinking = 1 | 0.20.076.426 I srv init: init: chat template, thinking = 1 |
| 0.15.581.967 I srv llama_server: model loaded | 0.20.076.461 I srv llama_server: model loaded |
| 0.15.581.973 I srv llama_server: server is listening on http://0.0.0.0:8012 | 0.20.076.464 I srv llama_server: server is listening on http://0.0.0.0:8012 |
| 0.15.581.981 I srv update_slots: all slots are idle | 0.20.076.470 I srv update_slots: all slots are idle |
| 0.27.297.594 I srv params_from_: Chat format: peg-gemma4 | 0.35.420.649 I srv params_from_: Chat format: peg-gemma4 |
| 0.27.300.952 I slot get_availabl: id 3 | task -1 | selected slot by LRU, t_last = -1 | 0.35.423.645 I slot get_availabl: id 3 | task -1 | selected slot by LRU, t_last = -1 |
| 0.27.300.957 I srv get_availabl: updating prompt cache | 0.35.423.649 I srv get_availabl: updating prompt cache |
| 0.27.300.966 I srv load: - looking for better prompt, base f_keep = -1.000, sim = 0.000 | 0.35.423.655 I srv load: - looking for better prompt, base f_keep = -1.000, sim = 0.000 |
| 0.27.300.971 I srv update: - cache state: 0 prompts, 0.000 MiB (limits: 8192.000 MiB, 262144 tokens, 8589934592 est) | 0.35.423.660 I srv update: - cache state: 0 prompts, 0.000 MiB (limits: 8192.000 MiB, 262144 tokens, 8589934592 est) |
| 0.27.300.972 I srv get_availabl: prompt cache update took 0.01 ms | 0.35.423.660 I srv get_availabl: prompt cache update took 0.01 ms |
| 0.27.301.028 I slot launch_slot_: id 3 | task 0 | processing task, is_child = 0 | 0.35.423.720 I slot launch_slot_: id 3 | task 0 | processing task, is_child = 0 |
| 0.27.406.740 I srv params_from_: Chat format: peg-gemma4 | 0.35.535.675 I srv params_from_: Chat format: peg-gemma4 |
| 0.31.470.682 I slot get_availabl: id 2 | task -1 | selected slot by LRU, t_last = -1 | 0.39.629.088 I slot get_availabl: id 2 | task -1 | selected slot by LRU, t_last = -1 |
| 0.31.470.685 I srv get_availabl: updating prompt cache | 0.39.629.091 I srv get_availabl: updating prompt cache |
| 0.31.470.689 I srv load: - looking for better prompt, base f_keep = -1.000, sim = 0.000 | 0.39.629.094 I srv load: - looking for better prompt, base f_keep = -1.000, sim = 0.000 |
| 0.31.470.692 I srv update: - cache state: 0 prompts, 0.000 MiB (limits: 8192.000 MiB, 262144 tokens, 8589934592 est) | 0.39.629.104 I srv update: - cache state: 0 prompts, 0.000 MiB (limits: 8192.000 MiB, 262144 tokens, 8589934592 est) |
| 0.31.470.692 I srv get_availabl: prompt cache update took 0.01 ms | 0.39.629.105 I srv get_availabl: prompt cache update took 0.01 ms |
| 0.31.480.325 I slot launch_slot_: id 2 | task 2 | processing task, is_child = 0 | 0.39.629.271 I slot launch_slot_: id 2 | task 2 | processing task, is_child = 0 |
| 0.48.111.290 I slot print_timing: id 2 | task 2 | prompt processing, n_tokens = 2048, progress = 0.27, t = 16.63 s / 123.14 tokens per second | 0.56.428.045 I slot print_timing: id 2 | task 2 | prompt processing, n_tokens = 2048, progress = 0.27, t = 16.80 s / 121.91 tokens per second |
| 1.04.164.694 I slot print_timing: id 2 | task 2 | prompt processing, n_tokens = 4096, progress = 0.54, t = 32.68 s / 125.32 tokens per second | 1.12.652.905 I slot print_timing: id 2 | task 2 | prompt processing, n_tokens = 4096, progress = 0.54, t = 33.02 s / 124.03 tokens per second |
| 1.20.627.773 I slot print_timing: id 2 | task 2 | prompt processing, n_tokens = 6144, progress = 0.81, t = 49.15 s / 125.01 tokens per second | 1.29.289.605 I slot print_timing: id 2 | task 2 | prompt processing, n_tokens = 6144, progress = 0.81, t = 49.66 s / 123.72 tokens per second |
| 1.20.627.950 I slot print_timing: id 3 | task 0 | prompt processing, n_tokens = 346, progress = 0.40, t = 53.33 s / 6.49 tokens per second | 1.29.289.781 I slot print_timing: id 3 | task 0 | prompt processing, n_tokens = 346, progress = 0.40, t = 53.87 s / 6.42 tokens per second |
| 1.31.552.764 I slot print_timing: id 2 | task 2 | prompt processing, n_tokens = 7034, progress = 0.93, t = 60.07 s / 117.09 tokens per second | 1.40.334.265 I slot print_timing: id 2 | task 2 | prompt processing, n_tokens = 7034, progress = 0.93, t = 60.70 s / 115.87 tokens per second |
| 1.31.552.931 I slot print_timing: id 3 | task 0 | prompt processing, n_tokens = 546, progress = 0.63, t = 64.25 s / 8.50 tokens per second | 1.40.334.442 I slot print_timing: id 3 | task 0 | prompt processing, n_tokens = 546, progress = 0.63, t = 64.91 s / 8.41 tokens per second |
| 1.31.639.619 I slot create_check: id 3 | task 0 | created context checkpoint 1 of 32 (pos_min = 0, pos_max = 545, n_tokens = 546, size = 106.647 MiB) | 1.40.412.062 I slot create_check: id 3 | task 0 | created context checkpoint 1 of 32 (pos_min = 0, pos_max = 545, n_tokens = 546, size = 106.647 MiB) |
| 1.35.810.635 I slot print_timing: id 2 | task 2 | prompt processing, n_tokens = 7234, progress = 0.96, t = 64.33 s / 112.45 tokens per second | 1.44.624.183 I slot print_timing: id 2 | task 2 | prompt processing, n_tokens = 7234, progress = 0.96, t = 64.99 s / 111.30 tokens per second |
| 1.36.313.785 I slot create_check: id 2 | task 2 | created context checkpoint 1 of 32 (pos_min = 3484, pos_max = 7233, n_tokens = 7234, size = 732.465 MiB) | 1.45.125.971 I slot create_check: id 2 | task 2 | created context checkpoint 1 of 32 (pos_min = 3484, pos_max = 7233, n_tokens = 7234, size = 732.465 MiB) |
| 1.36.313.791 I slot print_timing: id 3 | task 0 | prompt processing, n_tokens = 858, progress = 1.00, t = 69.01 s / 12.43 tokens per second | 1.45.125.976 I slot print_timing: id 3 | task 0 | prompt processing, n_tokens = 858, progress = 1.00, t = 69.70 s / 12.31 tokens per second |
| 1.36.432.948 I slot create_check: id 3 | task 0 | created context checkpoint 2 of 32 (pos_min = 0, pos_max = 857, n_tokens = 858, size = 167.589 MiB) | 1.45.244.936 I slot create_check: id 3 | task 0 | created context checkpoint 2 of 32 (pos_min = 0, pos_max = 857, n_tokens = 858, size = 167.589 MiB) |
| 1.40.181.720 I slot print_timing: id 2 | task 2 | prompt processing, n_tokens = 7546, progress = 1.00, t = 68.70 s / 109.84 tokens per second | 1.49.037.281 I slot print_timing: id 2 | task 2 | prompt processing, n_tokens = 7546, progress = 1.00, t = 69.41 s / 108.72 tokens per second |
| 1.40.690.306 I slot create_check: id 2 | task 2 | created context checkpoint 2 of 32 (pos_min = 3800, pos_max = 7545, n_tokens = 7546, size = 731.684 MiB) | 1.49.539.191 I slot create_check: id 2 | task 2 | created context checkpoint 2 of 32 (pos_min = 3800, pos_max = 7545, n_tokens = 7546, size = 731.684 MiB) |
| 1.40.889.823 I reasoning-budget: activated, budget=2147483647 tokens | 1.49.733.653 I reasoning-budget: activated, budget=2147483647 tokens |
| 1.49.385.767 I reasoning-budget: deactivated (natural end) | 1.57.457.755 I reasoning-budget: deactivated (natural end) |
| 1.49.386.539 I slot print_timing: id 3 | task 0 | n_decoded = 100, tg = 10.86 t/s | 1.57.643.455 I slot print_timing: id 3 | task 0 | n_decoded = 100, tg = 11.62 t/s |
| 1.49.476.807 I slot print_timing: id 2 | task 2 | n_decoded = 100, tg = 11.56 t/s | 1.57.734.954 I slot print_timing: id 2 | task 2 | n_decoded = 100, tg = 12.39 t/s |
| 1.50.323.353 I slot print_timing: id 2 | task 2 | prompt eval time = 69344.18 ms / 7550 tokens ( 9.18 ms per token, 108.88 tokens per second) | 1.58.436.181 I slot print_timing: id 2 | task 2 | prompt eval time = 70036.31 ms / 7550 tokens ( 9.28 ms per token, 107.80 tokens per second) |
| 1.50.323.360 I slot print_timing: id 2 | task 2 | eval time = 9498.81 ms / 109 tokens ( 87.15 ms per token, 11.48 tokens per second) | 1.58.436.188 I slot print_timing: id 2 | task 2 | eval time = 8770.57 ms / 107 tokens ( 81.97 ms per token, 12.20 tokens per second) |
| 1.50.323.361 I slot print_timing: id 2 | task 2 | total time = 78842.98 ms / 7659 tokens | 1.58.436.189 I slot print_timing: id 2 | task 2 | total time = 78806.87 ms / 7657 tokens |
| 1.50.323.367 I slot print_timing: id 2 | task 2 | graphs reused = 107 | 1.58.436.194 I slot print_timing: id 2 | task 2 | graphs reused = 105 |
| 1.50.324.599 I slot release: id 2 | task 2 | stop processing: n_tokens = 7658, truncated = 0 | 1.58.436.991 I slot release: id 2 | task 2 | stop processing: n_tokens = 7656, truncated = 0 |
| 1.52.425.033 I slot print_timing: id 3 | task 0 | n_decoded = 152, tg = 12.41 t/s | 2.00.674.198 I slot print_timing: id 3 | task 0 | n_decoded = 153, tg = 13.15 t/s |
| 1.55.456.041 I slot print_timing: id 3 | task 0 | n_decoded = 216, tg = 14.14 t/s | 2.03.697.960 I slot print_timing: id 3 | task 0 | n_decoded = 217, tg = 14.80 t/s |
| 1.58.479.230 I slot print_timing: id 3 | task 0 | n_decoded = 279, tg = 15.25 t/s | 2.06.726.056 I slot print_timing: id 3 | task 0 | n_decoded = 281, tg = 15.89 t/s |
| 2.01.502.693 I slot print_timing: id 3 | task 0 | n_decoded = 344, tg = 16.13 t/s | 2.09.759.680 I slot print_timing: id 3 | task 0 | n_decoded = 345, tg = 16.65 t/s |
| 2.04.511.987 I slot print_timing: id 3 | task 0 | n_decoded = 404, tg = 16.60 t/s | 2.12.763.818 I slot print_timing: id 3 | task 0 | n_decoded = 408, tg = 17.20 t/s |
| 2.07.519.350 I slot print_timing: id 3 | task 0 | n_decoded = 464, tg = 16.97 t/s | 2.15.807.460 I slot print_timing: id 3 | task 0 | n_decoded = 474, tg = 17.71 t/s |
| 2.10.526.448 I slot print_timing: id 3 | task 0 | n_decoded = 521, tg = 17.17 t/s | 2.18.833.658 I slot print_timing: id 3 | task 0 | n_decoded = 538, tg = 18.06 t/s |
| 2.12.137.202 I slot print_timing: id 3 | task 0 | prompt eval time = 72880.63 ms / 862 tokens ( 84.55 ms per token, 11.83 tokens per second) | 2.21.846.198 I slot print_timing: id 3 | task 0 | n_decoded = 602, tg = 18.35 t/s |
| 2.12.137.205 I slot print_timing: id 3 | task 0 | eval time = 31955.50 ms / 555 tokens ( 57.58 ms per token, 17.37 tokens per second) | 2.24.862.006 I slot print_timing: id 3 | task 0 | n_decoded = 667, tg = 18.62 t/s |
| 2.12.137.206 I slot print_timing: id 3 | task 0 | total time = 104836.13 ms / 1417 tokens | 2.27.863.732 I slot print_timing: id 3 | task 0 | n_decoded = 731, tg = 18.83 t/s |
| 2.12.137.207 I slot print_timing: id 3 | task 0 | graphs reused = 549 | 2.30.873.932 I slot print_timing: id 3 | task 0 | n_decoded = 797, tg = 19.05 t/s |
| 2.12.137.341 I slot release: id 3 | task 0 | stop processing: n_tokens = 1416, truncated = 0 | 2.33.923.339 I slot print_timing: id 3 | task 0 | n_decoded = 862, tg = 19.20 t/s |
| 2.12.137.354 I srv update_slots: all slots are idle | 2.36.953.349 I slot print_timing: id 3 | task 0 | n_decoded = 926, tg = 19.33 t/s |
| | 2.39.978.864 I slot print_timing: id 3 | task 0 | n_decoded = 991, tg = 19.45 t/s |
| | 2.42.989.048 I slot print_timing: id 3 | task 0 | n_decoded = 1053, tg = 19.52 t/s |
| | 2.43.175.514 I slot print_timing: id 3 | task 0 | prompt eval time = 73613.51 ms / 862 tokens ( 85.40 ms per token, 11.71 tokens per second) |
| | 2.43.175.519 I slot print_timing: id 3 | task 0 | eval time = 54138.25 ms / 1057 tokens ( 51.22 ms per token, 19.52 tokens per second) |
| | 2.43.175.520 I slot print_timing: id 3 | task 0 | total time = 127751.76 ms / 1919 tokens |
| | 2.43.175.521 I slot print_timing: id 3 | task 0 | graphs reused = 1049 |
| | 2.43.175.623 I slot release: id 3 | task 0 | stop processing: n_tokens = 1918, truncated = 0 |
| | 2.43.175.638 I srv update_slots: all slots are idle |
| </code> | </code> |
| |
| ''~/Code/bronx/AI_Coding/llama.cpp-86/build/bin/llama-server --host 0.0.0.0 --port 8012 -m ~/Data/AI_Models/gpt-oss-20b-UD-Q4_K_XL.gguf --jinja --no-mmap -c 0'' | gpt-oss-20b-UD-Q4_K_XL |
| | |
| | ''~/Code/bronx/AI_Coding/llama.cpp-86/build/bin/llama-server --host 0.0.0.0 --port 8012 -m ~/Data/AI_Models/gpt-oss-20b-UD-Q4_K_XL.gguf --jinja -c 0'' |
| |
| <code> | <code> |
| | 0.14.412.387 I srv init: init: chat template, thinking = 1 |
| | 0.14.412.686 I srv llama_server: model loaded |
| | 0.14.412.689 I srv llama_server: server is listening on http://0.0.0.0:8012 |
| | 0.14.412.697 I srv update_slots: all slots are idle |
| | 0.53.838.855 I srv params_from_: Chat format: peg-native |
| | 0.53.859.921 I slot get_availabl: id 3 | task -1 | selected slot by LRU, t_last = -1 |
| | 0.53.859.923 I srv get_availabl: updating prompt cache |
| | 0.53.859.929 I srv load: - looking for better prompt, base f_keep = -1.000, sim = 0.000 |
| | 0.53.859.934 I srv update: - cache state: 0 prompts, 0.000 MiB (limits: 8192.000 MiB, 131072 tokens, 8589934592 est) |
| | 0.53.859.934 I srv get_availabl: prompt cache update took 0.01 ms |
| | 0.53.860.696 I slot launch_slot_: id 3 | task 0 | processing task, is_child = 0 |
| | 0.53.962.888 I srv params_from_: Chat format: peg-native |
| | 0.55.442.476 I slot get_availabl: id 2 | task -1 | selected slot by LRU, t_last = -1 |
| | 0.55.442.478 I srv get_availabl: updating prompt cache |
| | 0.55.442.482 I srv load: - looking for better prompt, base f_keep = -1.000, sim = 0.000 |
| | 0.55.442.484 I srv update: - cache state: 0 prompts, 0.000 MiB (limits: 8192.000 MiB, 131072 tokens, 8589934592 est) |
| | 0.55.442.485 I srv get_availabl: prompt cache update took 0.01 ms |
| | 0.55.443.928 I slot launch_slot_: id 2 | task 2 | processing task, is_child = 0 |
| | 1.00.811.354 I slot print_timing: id 2 | task 2 | prompt processing, n_tokens = 2048, progress = 0.30, t = 5.37 s / 381.56 tokens per second |
| | 1.05.921.432 I slot print_timing: id 2 | task 2 | prompt processing, n_tokens = 4096, progress = 0.61, t = 10.48 s / 390.93 tokens per second |
| | 1.11.031.597 I slot print_timing: id 2 | task 2 | prompt processing, n_tokens = 6144, progress = 0.91, t = 15.59 s / 394.16 tokens per second |
| | 1.11.031.682 I slot print_timing: id 3 | task 0 | prompt processing, n_tokens = 371, progress = 0.42, t = 17.17 s / 21.61 tokens per second |
| | 1.12.311.375 I slot print_timing: id 2 | task 2 | prompt processing, n_tokens = 6248, progress = 0.92, t = 16.87 s / 370.42 tokens per second |
| | 1.12.311.469 I slot print_timing: id 3 | task 0 | prompt processing, n_tokens = 582, progress = 0.66, t = 18.45 s / 31.54 tokens per second |
| | 1.12.317.858 I slot create_check: id 3 | task 0 | created context checkpoint 1 of 32 (pos_min = 244, pos_max = 581, n_tokens = 582, size = 7.926 MiB) |
| | 1.13.691.441 I slot print_timing: id 2 | task 2 | prompt processing, n_tokens = 6459, progress = 0.95, t = 18.25 s / 353.97 tokens per second |
| | 1.13.698.938 I slot create_check: id 2 | task 2 | created context checkpoint 1 of 32 (pos_min = 6074, pos_max = 6458, n_tokens = 6459, size = 9.028 MiB) |
| | 1.13.698.943 I slot print_timing: id 3 | task 0 | prompt processing, n_tokens = 883, progress = 1.00, t = 19.84 s / 44.51 tokens per second |
| | 1.13.711.424 I slot create_check: id 3 | task 0 | created context checkpoint 2 of 32 (pos_min = 244, pos_max = 882, n_tokens = 883, size = 14.984 MiB) |
| | 1.14.989.097 I slot print_timing: id 2 | task 2 | prompt processing, n_tokens = 6760, progress = 1.00, t = 19.55 s / 345.87 tokens per second |
| | 1.14.999.395 I slot create_check: id 2 | task 2 | created context checkpoint 2 of 32 (pos_min = 6248, pos_max = 6759, n_tokens = 6760, size = 12.006 MiB) |
| | 1.17.275.512 I slot print_timing: id 2 | task 2 | prompt eval time = 19650.35 ms / 6764 tokens ( 2.91 ms per token, 344.22 tokens per second) |
| | 1.17.275.514 I slot print_timing: id 2 | task 2 | eval time = 2181.20 ms / 49 tokens ( 44.51 ms per token, 22.46 tokens per second) |
| | 1.17.275.515 I slot print_timing: id 2 | task 2 | total time = 21831.56 ms / 6813 tokens |
| | 1.17.275.518 I slot print_timing: id 2 | task 2 | graphs reused = 47 |
| | 1.17.322.530 I slot release: id 2 | task 2 | stop processing: n_tokens = 6812, truncated = 0 |
| | 1.18.577.818 I slot print_timing: id 3 | task 0 | n_decoded = 100, tg = 27.86 t/s |
| | 1.18.833.520 I slot print_timing: id 3 | task 0 | prompt eval time = 21128.34 ms / 887 tokens ( 23.82 ms per token, 41.98 tokens per second) |
| | 1.18.833.522 I slot print_timing: id 3 | task 0 | eval time = 3844.46 ms / 111 tokens ( 34.63 ms per token, 28.87 tokens per second) |
| | 1.18.833.522 I slot print_timing: id 3 | task 0 | total time = 24972.80 ms / 998 tokens |
| | 1.18.833.523 I slot print_timing: id 3 | task 0 | graphs reused = 107 |
| | 1.18.844.486 I slot release: id 3 | task 0 | stop processing: n_tokens = 997, truncated = 0 |
| | 1.18.844.505 I srv update_slots: all slots are idle |
| </code> | </code> |
| |
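
To compare runs like the ones logged above, the per-task summary lines (`prompt eval time = … ms / … tokens` and `eval time = … ms / … tokens`) can be extracted and turned back into tokens per second. The following is only a minimal sketch, assuming the log lines keep the exact layout shown in the logs above; `parse_timings` and `tokens_per_second` are hypothetical helper names introduced for illustration and are not part of llama.cpp. The sample text is copied from the gemma-4-26B-A4B-it-Q4_K_M run, task 2, current revision.

```python
import re

# Matches the per-task summary lines printed by llama-server, e.g.
#   "... | task 2 | prompt eval time = 70036.31 ms / 7550 tokens ( ..."
#   "... | task 2 | eval time = 8770.57 ms / 107 tokens ( ..."
# "total time" and "prompt processing" lines are deliberately not matched.
SUMMARY_RE = re.compile(
    r"task\s+(?P<task>-?\d+)\s*\|\s*(?P<kind>prompt eval|eval) time\s*=\s*"
    r"(?P<ms>[\d.]+)\s*ms\s*/\s*(?P<tokens>\d+)\s*tokens"
)


def parse_timings(log_text):
    """Return {task_id: {"prompt eval": (ms, tokens), "eval": (ms, tokens)}}."""
    timings = {}
    for match in SUMMARY_RE.finditer(log_text):
        task = int(match["task"])
        entry = (float(match["ms"]), int(match["tokens"]))
        timings.setdefault(task, {})[match["kind"]] = entry
    return timings


def tokens_per_second(ms, tokens):
    """Convert a (milliseconds, token count) pair into tokens per second."""
    return tokens / (ms / 1000.0)


# Lines copied from the log above (gemma run, task 2).
SAMPLE = """\
1.58.436.181 I slot print_timing: id 2 | task 2 | prompt eval time = 70036.31 ms / 7550 tokens ( 9.28 ms per token, 107.80 tokens per second)
1.58.436.188 I slot print_timing: id 2 | task 2 | eval time = 8770.57 ms / 107 tokens ( 81.97 ms per token, 12.20 tokens per second)
1.58.436.189 I slot print_timing: id 2 | task 2 | total time = 78806.87 ms / 7657 tokens
"""

timings = parse_timings(SAMPLE)
for task_id, kinds in sorted(timings.items()):
    for kind, (ms, tokens) in kinds.items():
        rate = tokens_per_second(ms, tokens)
        print(f"task {task_id} {kind}: {rate:.2f} tokens/s")
```

Feeding it a whole log file (for example `parse_timings(open("server.log").read())`, where `server.log` is a hypothetical capture of the server output) gives one entry per task, which makes it easier to put the prompt-processing and generation rates of two models side by side.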