Model Benchmark
With Opencode
The prompt “hello”.
0.15.581.681 I srv init: init: chat template, thinking = 1
0.15.581.967 I srv llama_server: model loaded
0.15.581.973 I srv llama_server: server is listening on http://0.0.0.0:8012
0.15.581.981 I srv update_slots: all slots are idle
0.27.297.594 I srv params_from_: Chat format: peg-gemma4
0.27.300.952 I slot get_availabl: id 3 | task -1 | selected slot by LRU, t_last = -1
0.27.300.957 I srv get_availabl: updating prompt cache
0.27.300.966 I srv load: - looking for better prompt, base f_keep = -1.000, sim = 0.000
0.27.300.971 I srv update: - cache state: 0 prompts, 0.000 MiB (limits: 8192.000 MiB, 262144 tokens, 8589934592 est)
0.27.300.972 I srv get_availabl: prompt cache update took 0.01 ms
0.27.301.028 I slot launch_slot_: id 3 | task 0 | processing task, is_child = 0
0.27.406.740 I srv params_from_: Chat format: peg-gemma4
0.31.470.682 I slot get_availabl: id 2 | task -1 | selected slot by LRU, t_last = -1
0.31.470.685 I srv get_availabl: updating prompt cache
0.31.470.689 I srv load: - looking for better prompt, base f_keep = -1.000, sim = 0.000
0.31.470.692 I srv update: - cache state: 0 prompts, 0.000 MiB (limits: 8192.000 MiB, 262144 tokens, 8589934592 est)
0.31.470.692 I srv get_availabl: prompt cache update took 0.01 ms
0.31.480.325 I slot launch_slot_: id 2 | task 2 | processing task, is_child = 0
0.48.111.290 I slot print_timing: id 2 | task 2 | prompt processing, n_tokens = 2048, progress = 0.27, t = 16.63 s / 123.14 tokens per second
1.04.164.694 I slot print_timing: id 2 | task 2 | prompt processing, n_tokens = 4096, progress = 0.54, t = 32.68 s / 125.32 tokens per second
1.20.627.773 I slot print_timing: id 2 | task 2 | prompt processing, n_tokens = 6144, progress = 0.81, t = 49.15 s / 125.01 tokens per second
1.20.627.950 I slot print_timing: id 3 | task 0 | prompt processing, n_tokens = 346, progress = 0.40, t = 53.33 s / 6.49 tokens per second
1.31.552.764 I slot print_timing: id 2 | task 2 | prompt processing, n_tokens = 7034, progress = 0.93, t = 60.07 s / 117.09 tokens per second
1.31.552.931 I slot print_timing: id 3 | task 0 | prompt processing, n_tokens = 546, progress = 0.63, t = 64.25 s / 8.50 tokens per second
1.31.639.619 I slot create_check: id 3 | task 0 | created context checkpoint 1 of 32 (pos_min = 0, pos_max = 545, n_tokens = 546, size = 106.647 MiB)
1.35.810.635 I slot print_timing: id 2 | task 2 | prompt processing, n_tokens = 7234, progress = 0.96, t = 64.33 s / 112.45 tokens per second
1.36.313.785 I slot create_check: id 2 | task 2 | created context checkpoint 1 of 32 (pos_min = 3484, pos_max = 7233, n_tokens = 7234, size = 732.465 MiB)
1.36.313.791 I slot print_timing: id 3 | task 0 | prompt processing, n_tokens = 858, progress = 1.00, t = 69.01 s / 12.43 tokens per second
1.36.432.948 I slot create_check: id 3 | task 0 | created context checkpoint 2 of 32 (pos_min = 0, pos_max = 857, n_tokens = 858, size = 167.589 MiB)
1.40.181.720 I slot print_timing: id 2 | task 2 | prompt processing, n_tokens = 7546, progress = 1.00, t = 68.70 s / 109.84 tokens per second
1.40.690.306 I slot create_check: id 2 | task 2 | created context checkpoint 2 of 32 (pos_min = 3800, pos_max = 7545, n_tokens = 7546, size = 731.684 MiB)
1.40.889.823 I reasoning-budget: activated, budget=2147483647 tokens
1.49.385.767 I reasoning-budget: deactivated (natural end)
1.49.386.539 I slot print_timing: id 3 | task 0 | n_decoded = 100, tg = 10.86 t/s
1.49.476.807 I slot print_timing: id 2 | task 2 | n_decoded = 100, tg = 11.56 t/s
1.50.323.353 I slot print_timing: id 2 | task 2 | prompt eval time = 69344.18 ms / 7550 tokens ( 9.18 ms per token, 108.88 tokens per second)
1.50.323.360 I slot print_timing: id 2 | task 2 | eval time = 9498.81 ms / 109 tokens ( 87.15 ms per token, 11.48 tokens per second)
1.50.323.361 I slot print_timing: id 2 | task 2 | total time = 78842.98 ms / 7659 tokens
1.50.323.367 I slot print_timing: id 2 | task 2 | graphs reused = 107
1.50.324.599 I slot release: id 2 | task 2 | stop processing: n_tokens = 7658, truncated = 0
1.52.425.033 I slot print_timing: id 3 | task 0 | n_decoded = 152, tg = 12.41 t/s
1.55.456.041 I slot print_timing: id 3 | task 0 | n_decoded = 216, tg = 14.14 t/s
1.58.479.230 I slot print_timing: id 3 | task 0 | n_decoded = 279, tg = 15.25 t/s
2.01.502.693 I slot print_timing: id 3 | task 0 | n_decoded = 344, tg = 16.13 t/s
2.04.511.987 I slot print_timing: id 3 | task 0 | n_decoded = 404, tg = 16.60 t/s
2.07.519.350 I slot print_timing: id 3 | task 0 | n_decoded = 464, tg = 16.97 t/s
2.10.526.448 I slot print_timing: id 3 | task 0 | n_decoded = 521, tg = 17.17 t/s
2.12.137.202 I slot print_timing: id 3 | task 0 | prompt eval time = 72880.63 ms / 862 tokens ( 84.55 ms per token, 11.83 tokens per second)
2.12.137.205 I slot print_timing: id 3 | task 0 | eval time = 31955.50 ms / 555 tokens ( 57.58 ms per token, 17.37 tokens per second)
2.12.137.206 I slot print_timing: id 3 | task 0 | total time = 104836.13 ms / 1417 tokens
2.12.137.207 I slot print_timing: id 3 | task 0 | graphs reused = 549
2.12.137.341 I slot release: id 3 | task 0 | stop processing: n_tokens = 1416, truncated = 0
2.12.137.354 I srv update_slots: all slots are idle
~/Code/bronx/AI_Coding/llama.cpp-86/build/bin/llama-server --host 0.0.0.0 --port 8012 -m ~/Data/AI_Models/gpt-oss-20b-UD-Q4_K_XL.gguf --jinja --no-mmap -c 0
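To compare runs, the per-task summary lines of the log can be reduced to a few throughput figures. The following is a minimal sketch, not part of the original benchmark: the `summarize` helper and its regular expression are hypothetical and assume the `prompt eval time` / `eval time` line layout seen in the log above (timestamps omitted). It recomputes tokens per second from the reported milliseconds and token counts.

```python
import re

# Summary lines copied from the server log above, timestamps omitted.
LOG = """\
slot print_timing: id 2 | task 2 | prompt eval time = 69344.18 ms / 7550 tokens ( 9.18 ms per token, 108.88 tokens per second)
slot print_timing: id 2 | task 2 | eval time = 9498.81 ms / 109 tokens ( 87.15 ms per token, 11.48 tokens per second)
slot print_timing: id 3 | task 0 | prompt eval time = 72880.63 ms / 862 tokens ( 84.55 ms per token, 11.83 tokens per second)
slot print_timing: id 3 | task 0 | eval time = 31955.50 ms / 555 tokens ( 57.58 ms per token, 17.37 tokens per second)
"""

# Assumed line layout (hypothetical pattern, inferred only from the lines above).
PATTERN = re.compile(
    r"id\s+(?P<slot>\d+) \| task (?P<task>\d+) \| "
    r"(?P<phase>prompt eval|eval) time =\s*(?P<ms>[\d.]+) ms /\s*(?P<tokens>\d+) tokens"
)


def summarize(log):
    """Return (slot, phase, tokens, tokens_per_second) for each summary line."""
    rows = []
    for m in PATTERN.finditer(log):
        ms = float(m["ms"])
        tokens = int(m["tokens"])
        # Throughput = tokens divided by elapsed seconds.
        rows.append((int(m["slot"]), m["phase"], tokens, round(tokens / (ms / 1000.0), 2)))
    return rows


if __name__ == "__main__":
    for slot, phase, tokens, tps in summarize(LOG):
        print(f"slot {slot} {phase:<11} {tokens:>5} tokens  {tps:>7.2f} t/s")
```

The recomputed values agree with the rates printed by the server. In this run, slot 3's much lower prompt throughput possibly reflects that both requests were being processed at the same time, although the log alone does not confirm that.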
informatique/ai_lm/model_bench.1779780587.txt.gz · Last modified: by cyrille
