Differences

Below are the differences between two revisions of the page.

--- informatique:ai_lm:ai_vision [14/02/2026 15:44] – [llama.cpp] cyrille
+++ informatique:ai_lm:ai_vision [15/02/2026 12:28] (current version) – cyrille
@@ Line 1: / Line 1: @@
 ====== AI Vision ======
-There is YOLO, along with plenty of other tools dedicated to object detection in images.
+There is YOLO, along with plenty of other tools dedicated to object detection in images. Here I am testing multimodal models instead, with no task-specific training.
+===== Experiment =====
+The prompt asks whether the supplied image contains solar panels, with their bounding box (bbox), and, if the answer is "yes", asks the model to compute the geographic coordinates of the object it found. Asking for both makes it possible to eliminate false positives.
+For example, the model finds a solar panel in this image but fails to produce geographic coordinates, so the detection can be dropped from the positives.
+{{:informatique:ai_lm:ai_vision:champ-avec-rayures_18-131487-91478.jpeg?direct&140|striped field}}
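The two-step filter described in this section can be sketched as a small shell function. This is only a minimal sketch under assumptions: the page does not show the reply format, so the three fields (a yes/no flag, a bbox string, and a ''lat,lon'' string) and the name ''keep_detection'' are hypothetical, and the fields are assumed to have been extracted from the model reply beforehand.

```bash
# keep_detection FOUND BBOX GEO
# Hypothetical helper (not part of llama.cpp): keeps a detection only when
# the model answered "yes", returned a bounding box, AND also managed to
# produce geographic coordinates of the assumed form "lat,lon".
keep_detection() {
  # $1 = found flag, $2 = bbox string, $3 = "lat,lon" string
  [ "$1" = "yes" ] || return 1   # the model did not report a panel
  [ -n "$2" ] || return 1        # no bounding box was returned
  # Missing or malformed coordinates mark a likely false positive.
  printf '%s\n' "$3" | grep -Eq '^-?[0-9]+(\.[0-9]+)?,-?[0-9]+(\.[0-9]+)?$'
}

# Usage with illustrative values:
#   keep_detection yes "10,20,110,220" "48.8566,2.3522" && echo "kept"
#   keep_detection yes "10,20,110,220" "" || echo "dropped"
```

The coordinate check is deliberately strict so that a reply with a bbox but no usable coordinates, like the striped field here, is dropped.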
 ===== llama.cpp =====
@@ Line 9: / Line 17: @@
 Requires a multimodal model and a matching ''mmproj'' file.
-  * gemma-3-4b
+With **llama-mtmd-cli** and **gemma-3-4b-it**:
     * [[https://huggingface.co/ggml-org/gemma-3-4b-it-GGUF/resolve/main/gemma-3-4b-it-Q4_K_M.gguf|gemma-3-4b-it-Q4_K_M.gguf]]
     * [[https://huggingface.co/ggml-org/gemma-3-4b-it-GGUF/resolve/main/mmproj-model-f16.gguf|mmproj-model-f16.gguf]]
-With **llama-mtmd-cli**:
+<code bash>
-<code>
+# gemma-3-4b-it-UD-Q8_K_XL
 $ time ~/llama.cpp/build/bin/llama-mtmd-cli --log-timestamps \
- -m ~/Data/AI_ModelsVision/gemma-3-4b-it-UD-Q8_K_XL.gguf \
+ -m ~/Data/gemma-3-4b-it-UD-Q8_K_XL.gguf \
- --mmproj ~/Data/AI_ModelsVision/mmproj-model-f16.gguf \
+ --mmproj ~/Data/mmproj-model-f16.gguf \
  --image ~/Data/screenshot_20260214-141126.png -p 'Vois tu des panneaux solaires sur cette image ?'
-main: loading model: /home/cyrille/Data/AI_ModelsVision/gemma-3-4b-it-UD-Q8_K_XL.gguf
+main: loading model: ~/Data/gemma-3-4b-it-UD-Q8_K_XL.gguf
 WARN: This is an experimental CLI for testing multimodal capability.
       For normal use cases, please use the standard llama-cli
@@ Line 52: / Line 61: @@
 user	0m4,880s
 sys	0m3,269s
+# gemma-3-4b-it-Q4_K_M.gguf
+Oui, je vois des panneaux solaires sur l'image. Ils sont installés sur le toit du bâtiment principal, qui est une grande structure rectangulaire.
+llama_perf_context_print:        load time =    1614.35 ms
+llama_perf_context_print: prompt eval time =     856.85 ms /   278 tokens (    3.08 ms per token,   324.45 tokens per second)
+llama_perf_context_print:        eval time =     359.10 ms /    33 runs   (   10.88 ms per token,    91.90 tokens per second)
+llama_perf_context_print:       total time =    2049.84 ms /   311 tokens
+llama_perf_context_print:    graphs reused =         32
+real	0m6,531s
+user	0m3,426s
+sys	0m3,041s
+</code>
+With **llama-mtmd-cli** and **SmolVLM2-2.2B-Instruct**:
+<code bash>
+# SmolVLM2-2.2B-Instruct-Q4_0
+time ~/llama.cpp/build/bin/llama-mtmd-cli -m ~/Data/SmolVLM2-2.2B-Instruct-Q4_0.gguf --mmproj ~/Data/mmproj-SmolVLM2-2.2B-Instruct-f16.gguf --image ~/Data/screenshot_20260214-141126.png -p 'Vois tu des panneaux solaires sur cette image ?' --log-timestamps
+build: 7971 (5fa1c190d) with GNU 13.3.0 for Linux x86_64
+common_init_result: fitting params to device memory, for bugs during this step try to reproduce them with -fit off, or provide --verbose logs if the bug only occurs with -fit on
+gguf_init_from_file_impl: invalid magic characters: 'Entr', expected 'GGUF'
+llama_model_load: error loading model: llama_model_loader: failed to load model from ~/Data/SmolVLM2-2.2B-Instruct-Q4_0.gguf
+llama_model_load_from_file_impl: failed to load model
+llama_params_fit: encountered an error while trying to fit params to free device memory: failed to load model
+llama_params_fit: fitting params to free memory took 0.10 seconds
+Erreur de segmentation (core dumped)
+</code>
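The loader error above reports the first four bytes it read (''Entr'') instead of the expected ''GGUF'' magic, which suggests that the file on disk is not a model at all. One plausible cause is that the download saved a text error response (for example a "not found" message) under the ''.gguf'' name. That is an assumption, since the page does not show the download step. Checking the magic before launching ''llama-mtmd-cli'' avoids the failed load and the segmentation fault that follows. The function name ''is_gguf'' is hypothetical.

```bash
# is_gguf FILE
# Hypothetical helper: succeeds only if FILE starts with the 4-byte
# ASCII magic "GGUF" that the loader error message says it expects.
is_gguf() {
  # head -c 4 reads the first four bytes; tr strips NUL bytes so the
  # command substitution stays clean on arbitrary binary input.
  [ "$(head -c 4 -- "$1" 2>/dev/null | tr -d '\0')" = "GGUF" ]
}

# Usage, with the path taken from the command above:
#   is_gguf ~/Data/SmolVLM2-2.2B-Instruct-Q4_0.gguf || echo "not a GGUF file"
```

The same check applies to the other failing loads on this page, which stop at the same ''invalid magic characters'' message.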
+With **llama-mtmd-cli** and **Qwen2-VL-2B-Instruct**:
+<code bash>
+# Qwen2-VL-2B-Instruct-Q4_0
+time ~/llama.cpp/build/bin/llama-mtmd-cli -m ~/Data/Qwen2-VL-2B-Instruct-Q4_0.gguf --mmproj ~/Data/mmproj-Qwen2-VL-2B-Instruct-f16.gguf --image ~/Data/screenshot_20260214-141126.png -p 'Vois tu des panneaux solaires sur cette image ?' --log-timestamps -ngl 99
+ggml_cuda_init: found 1 CUDA devices:
+  Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes
+build: 7971 (5fa1c190d) with GNU 13.3.0 for Linux x86_64
+common_init_result: fitting params to device memory, for bugs during this step try to reproduce them with -fit off, or provide --verbose logs if the bug only occurs with -fit on
+gguf_init_from_file_impl: invalid magic characters: 'Entr', expected 'GGUF'
+llama_model_load: error loading model: llama_model_loader: failed to load model from ~/Data/Qwen2-VL-2B-Instruct-Q4_0.gguf
+llama_model_load_from_file_impl: failed to load model
+llama_params_fit: encountered an error while trying to fit params to free device memory: failed to load model
+llama_params_fit: fitting params to free memory took 0.09 seconds
+Erreur de segmentation (core dumped)
+</code>
+With **llama-mtmd-cli** and **MobileVLM-3B**:
+<code bash>
+$ time ~/llama.cpp/build/bin/llama-mtmd-cli -m ~/Data/MobileVLM-3B-q3_K_S.gguf --mmproj ~/Data/MobileVLM-3B-mmproj-f16.gguf --image ~/Data/screenshot_20260214-141126.png -p 'Vois tu des panneaux solaires sur cette image ?' --log-timestamps -ngl 99
+ggml_cuda_init: found 1 CUDA devices:
+  Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes
+build: 7971 (5fa1c190d) with GNU 13.3.0 for Linux x86_64
+common_init_result: fitting params to device memory, for bugs during this step try to reproduce them with -fit off, or provide --verbose logs if the bug only occurs with -fit on
+gguf_init_from_file_impl: invalid magic characters: 'Repo', expected 'GGUF'
+llama_model_load: error loading model: llama_model_loader: failed to load model from ~/Data/MobileVLM-3B-q3_K_S.gguf
+llama_model_load_from_file_impl: failed to load model
+llama_params_fit: encountered an error while trying to fit params to free device memory: failed to load model
+llama_params_fit: fitting params to free memory took 0.09 seconds
+Erreur de segmentation (core dumped)
 </code>