Différences

Ci-dessous, les différences entre deux révisions de la page.

--- informatique:ai_coding [30/11/2025 14:42] – [Autres usages] cyrille
+++ informatique:ai_coding [05/12/2025 17:40] (Version actuelle) – [Avec GPU] cyrille
@@ Ligne 189: / Ligne 189: @@
       * [[https://huggingface.co/utter-project/EuroLLM-9B|huggingface/utter-project/EuroLLM-9B]]
         * https://huggingface.co/bartowski/EuroLLM-9B-Instruct-GGUF
-  * [[https://github.com/bofenghuang/vigogne/blob/main/docs/model.md|Vigogne]] modèles réentrainer en français
+  * [[https://github.com/bofenghuang/vigogne/blob/main/docs/model.md|Vigogne]] modèles réentrainer en français (//2023//)
     * [[https://github.com/bofenghuang/vigogne/blob/main/blogs/2023-08-17-vigogne-chat-v2_0.md|Voilà Voilà: Unleashing Vigogne Chat V2.0]]
   * [[https://www.channelnews.fr/avec-son-moteur-ia-ultra-leger-et-ultra-puissant-lighton-rend-la-deep-research-accessible-et-souveraine-148246|LightOn dévoile Reason-ModernColBERT]]
@@ Ligne 203: / Ligne 203: @@
 <code bash>
 ./bin/llama-server -m devstralQ5_K_M.gguf --port 8012 --jinja --ctx-size 20000
+~/Code/bronx/AI_Coding/llama.cpp/build/bin/llama-server --port 8012 --chatml -m ~/Data/AI_Models/Qwen2.5-coder-7b-instruct-q8_0.gguf --ctx-size 48000
 </code>
-Models:
+Quid des chat formats ? Est-ce lié au modèle ?
+  * ''--jinja''
+  * ''--chatml''
+Modèles:
   * Les models au format GGUF, en fichier ou url sur [[https://huggingface.co/|Hugging Face]], [[https://modelscope.cn/|ModelScope]]
   * [[https://github.com/ggml-org/llama.cpp#obtaining-and-quantizing-models|Obtaining and quantizing models]]
@@ Ligne 217: / Ligne 223: @@
 Il faut le compiler avec CUDA. Avec une version >= 11.7 pour [[https://github.com/ggml-org/llama.cpp/issues/11112|compatibilité syntaxe]].
+  * [[https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md#cuda|Build llama.cpp with CUDA]]
 J'ai [[https://linuxcapable.com/how-to-install-cuda-on-debian-linux/|installé CUDA]] le [[https://developer.nvidia.com/blog/updating-the-cuda-linux-gpg-repository-key|dépot Nvidia]] Cuda et cuda toolkit 13
 <code>
-$ cat /etc/apt/sources.list.d/nvidia-cuda.list
+$ sudo cat /etc/apt/sources.list.d/cuda-ubuntu2404-x86_64.list
-deb [signed-by=/usr/share/keyrings/cuda-archive-keyring.gpg] https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/ /
+deb [signed-by=/usr/share/keyrings/cuda-archive-keyring.gpg]
+ https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/ /
 </code>
@@ Ligne 230: / Ligne 239: @@
 </code>
-puis une très longue compilation avec :
+Ensuite une très très longue compilation :
+DCMAKE_CUDA_ARCHITECTURES: ''86'' pour RTX 3060 et ''120'' pour RTX 5060.
 <code>
@@ Ligne 236: / Ligne 247: @@
 # RTX 3060 : 86
 # RTX 5060 : 120
-cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES="86;120"
+$ cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES="86;120" \
-cmake --build build --config Release
+ -DCMAKE_CUDA_COMPILER=/usr/local/cuda-12.9/bin/nvcc -DCMAKE_INSTALL_RPATH="/usr/local/cuda-12.9/lib64;\$ORIGIN" -DCMAKE_BUILD_WITH_INSTALL_RPATH=ON
+-- ccache found, compilation results will be cached. Disable with GGML_CCACHE=OFF.
+-- CMAKE_SYSTEM_PROCESSOR: x86_64
+-- GGML_SYSTEM_ARCH: x86
+-- Including CPU backend
+-- x86 detected
+-- Adding CPU backend variant ggml-cpu: -march=native
+-- CUDA Toolkit found
+-- Using CUDA architectures: 86;120
+-- CUDA host compiler is GNU 13.3.0
+-- Including CUDA backend
+-- ggml version: 0.9.4
+-- ggml commit:  6016d0bd4
+-- Configuring done (0.5s)
+-- Generating done (0.2s)
+-- Build files have been written to: /home/cyrille/Code/bronx/AI_Coding/llama.cpp/build
+$ cmake --build build --config Release
+...
+real	44m35,149s
+user	42m38,100s
+sys	1m51,594s
 </code>