Differences

Below are the differences between two revisions of the page.

--- informatique:ai_coding:gpu_bench [03/12/2025 07:48] – [GPU Bench] cyrille
+++ informatique:ai_coding:gpu_bench [03/12/2025 07:56] (current version) – [EuroLLM-9B-Instruct-Q4_0] cyrille
@@ Line 29: / Line 29: @@
 | //size: 4.94 GiB//               | tg256 |     ... |     55.96 |       71.15 |
 |                                  | tg512 |     ... |     53.87 |       69.45 |
-|                                  | b128  |     ... |           |  CUDA error |
+|                                  | b128  |     ... |   1433.95 |  CUDA error |
-|                                  | b256  |     ... |           |       ... |
+|                                  | b256  |     ... |   1535.06 |         ... |
-|                                  | b512  |     ... |           |       ... |
+|                                  | b512  |     ... |   1559.88 |         ... |
-| Qwen3-14B-UD-Q5_K_XL             | tg128 |     ... |        |       37.66 |
+| Qwen3-14B-UD-Q5_K_XL             | tg128 |     ... |     30.00 |       37.66 |
-| //size: 9.82 GiB//               | tg256 |     ... |        |       38.17 |
+| //size: 9.82 GiB//               | tg256 |     ... |     29.97 |       38.17 |
-|                                  | tg512 |     ... |        |       37.30 |
+|                                  | tg512 |     ... |     29.25 |       37.30 |
-|                                  | b128  |     ... |        |  CUDA error |
+|                                  | b128  |     ... |    903.97 |  CUDA error |
-|                                  | b256  |     ... |        |         ... |
+|                                  | b256  |     ... |    951.71 |         ... |
-|                                  | b512  |     ... |        |         ... |
+|                                  | b512  |     ... |    963.76 |         ... |
 ===== Intel® Core™ i7-1360P 13th Gen =====
@@ Line 75: / Line 75: @@
 </code>
-=== Qwen2.5-coder-7b-instruct-q8_0 ===
-<code>
-./build/bin/llama-bench -m ~/Data/AI_Models/Qwen2.5-coder-7b-instruct-q8_0.gguf -p 0 -n 128,256,512
-ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
-ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
-ggml_cuda_init: found 1 CUDA devices:
-  Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes
-| model                          |       size |     params | backend    | ngl |            test |                  t/s |
-| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
-| qwen2 7B Q8_0                  |   7.54 GiB |     7.62 B | CUDA       |  99 |           tg128 |         41.42 ± 0.00 |
-| qwen2 7B Q8_0                  |   7.54 GiB |     7.62 B | CUDA       |  99 |           tg256 |         41.38 ± 0.05 |
-| qwen2 7B Q8_0                  |   7.54 GiB |     7.62 B | CUDA       |  99 |           tg512 |         40.70 ± 0.01 |
-</code>
-=== EuroLLM-9B-Instruct-Q4_0 ===
-<code>
-./build/bin/llama-bench -m ~/Data/AI_Models/EuroLLM-9B-Instruct-Q4_0.gguf -p 0 -n 128,256,512
-ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
-ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
-ggml_cuda_init: found 1 CUDA devices:
-  Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes
-| model                          |       size |     params | backend    | ngl |            test |                  t/s |
-| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
-| llama ?B Q4_0                  |   4.94 GiB |     9.15 B | CUDA       |  99 |           tg128 |         56.06 ± 0.01 |
-| llama ?B Q4_0                  |   4.94 GiB |     9.15 B | CUDA       |  99 |           tg256 |         55.96 ± 0.02 |
-| llama ?B Q4_0                  |   4.94 GiB |     9.15 B | CUDA       |  99 |           tg512 |         53.87 ± 0.03 |
-</code>
@@ Line 125: / Line 96: @@
 </code>
-=== Qwen2.5-coder-7b-instruct-q8_0 ===
-<code>
-$ ~/Code/bronx/AI_Coding/llama.cpp/build/bin/llama-bench -m ~/Data/AI_Models/Qwen2.5-coder-7b-instruct-q8_0.gguf -p 0 -n 128,256,512
-ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
-ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
-ggml_cuda_init: found 1 CUDA devices:
-  Device 0: NVIDIA GeForce RTX 5060 Ti, compute capability 12.0, VMM: yes
-| model                          |       size |     params | backend    | ngl |            test |                  t/s |
-| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
-| qwen2 7B Q8_0                  |   7.54 GiB |     7.62 B | CUDA       |  99 |           tg128 |         50.33 ± 0.01 |
-| qwen2 7B Q8_0                  |   7.54 GiB |     7.62 B | CUDA       |  99 |           tg256 |         50.33 ± 0.01 |
-| qwen2 7B Q8_0                  |   7.54 GiB |     7.62 B | CUDA       |  99 |           tg512 |         49.62 ± 0.02 |
-build: 3f3a4fb9c (7130)
-</code>
-=== EuroLLM-9B-Instruct-Q4_0 ===
-<code>
-$ ./llama.cpp/build/bin/llama-bench -m ~/Data/AI_Models/EuroLLM-9B-Instruct-Q4_0.gguf -p 0 -n 128,256,512
-ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
-ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
-ggml_cuda_init: found 1 CUDA devices:
-  Device 0: NVIDIA GeForce RTX 5060 Ti, compute capability 12.0, VMM: yes
-| model                          |       size |     params | backend    | ngl |            test |                  t/s |
-| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
-| llama ?B Q4_0                  |   4.94 GiB |     9.15 B | CUDA       |  99 |           tg128 |         71.41 ± 0.05 |
-| llama ?B Q4_0                  |   4.94 GiB |     9.15 B | CUDA       |  99 |           tg256 |         71.15 ± 0.60 |
-| llama ?B Q4_0                  |   4.94 GiB |     9.15 B | CUDA       |  99 |           tg512 |         69.45 ± 0.08 |
-build: 3f3a4fb9c (7130)
-</code>
 ===== Translation =====