AI Vision

There is YOLO and plenty of other tools dedicated to object detection in images. Here I'm testing with multimodal models instead, without any task-specific training.

Experiment

The prompt asks whether there are solar panels in the supplied image, along with their bounding box, and, if "yes", to compute the geographic coordinates of the detected object. These two instructions help eliminate false positives.
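As a point of comparison, if the geographic extent of the screenshot were known, the bbox center could be mapped to coordinates without asking the model at all. A minimal sketch, assuming an axis-aligned extent and entirely made-up corner values:

```python
def pixel_to_geo(bbox, img_w, img_h, north, south, west, east):
    """Map the bbox center from pixel space to lat/lon, assuming the
    screenshot covers a known, axis-aligned geographic extent."""
    cx = (bbox[0] + bbox[2]) / 2 / img_w   # horizontal position, 0..1
    cy = (bbox[1] + bbox[3]) / 2 / img_h   # vertical position, 0..1
    lat = north + cy * (south - north)     # y grows downward in the image
    lon = west + cx * (east - west)
    return lat, lon

# Centered bbox on a 1000x1000 screenshot spanning 1 degree each way.
print(pixel_to_geo([400, 400, 600, 600], 1000, 1000, 49.0, 48.0, 2.0, 3.0))
# → (48.5, 2.5)
```

In the experiment above the model itself is asked for the coordinates; this is only an illustration of the geometry involved.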

For example, the model finds a solar panel in this image but fails to produce the geographic coordinates, so it can be dropped from the positives.

field with stripes
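The filtering logic can be sketched as follows; the JSON response format and field names are assumptions, since the exact answer schema isn't shown here:

```python
import json

def keep_detection(answer: str) -> bool:
    """Keep a detection only if the model returned BOTH a bounding box and
    geographic coordinates; a missing field is treated as a false positive."""
    try:
        data = json.loads(answer)
    except json.JSONDecodeError:
        return False
    return bool(data.get("bbox")) and bool(data.get("coords"))

# The rooftop detection passes; the striped field (bbox but no coords) is dropped.
print(keep_detection('{"bbox": [120, 80, 260, 190], "coords": [48.85, 2.35]}'))  # True
print(keep_detection('{"bbox": [40, 40, 90, 90], "coords": null}'))              # False
```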

llama.cpp

Requires a multimodal model and a matching mmproj file.

With llama-mtmd-cli and gemma-3-4b-it:

# gemma-3-4b-it-UD-Q8_K_XL
$ time ~/llama.cpp/build/bin/llama-mtmd-cli --log-timestamps \
 -m ~/Data/gemma-3-4b-it-UD-Q8_K_XL.gguf \
 --mmproj ~/Data/mmproj-model-f16.gguf \
 --image ~/Data/screenshot_20260214-141126.png -p 'Vois tu des panneaux solaires sur cette image ?'
 
main: loading model: ~/Data/gemma-3-4b-it-UD-Q8_K_XL.gguf
WARN: This is an experimental CLI for testing multimodal capability.
      For normal use cases, please use the standard llama-cli
encoding image slice...
image slice encoded in 789 ms
decoding image batch 1/1, n_tokens_batch = 256
sched_reserve: reserving ...
sched_reserve:      CUDA0 compute buffer size =   517.12 MiB
sched_reserve:  CUDA_Host compute buffer size =   269.02 MiB
sched_reserve: graph nodes  = 1369
sched_reserve: graph splits = 2
sched_reserve: reserve took 109.44 ms, sched copies = 1
image decoded (batch 1/1) in 201 ms
sched_reserve: reserving ...
sched_reserve:      CUDA0 compute buffer size =   517.12 MiB
sched_reserve:  CUDA_Host compute buffer size =   269.02 MiB
sched_reserve: graph nodes  = 1369
sched_reserve: graph splits = 2
sched_reserve: reserve took 188.38 ms, sched copies = 1
 
Oui, je vois des panneaux solaires sur l'image. Ils sont disposés sur le toit du bâtiment principal au centre de l'image.
 
 
llama_perf_context_print:        load time =    2846.21 ms
llama_perf_context_print: prompt eval time =     852.69 ms /   278 tokens (    3.07 ms per token,   326.03 tokens per second)
llama_perf_context_print:        eval time =     542.06 ms /    30 runs   (   18.07 ms per token,    55.34 tokens per second)
llama_perf_context_print:       total time =    2344.07 ms /   308 tokens
llama_perf_context_print:    graphs reused =         29
 
real	0m8,165s
user	0m4,880s
sys	0m3,269s
 
# gemma-3-4b-it-Q4_K_M.gguf
 
Oui, je vois des panneaux solaires sur l'image. Ils sont installés sur le toit du bâtiment principal, qui est une grande structure rectangulaire.
 
 
llama_perf_context_print:        load time =    1614.35 ms
llama_perf_context_print: prompt eval time =     856.85 ms /   278 tokens (    3.08 ms per token,   324.45 tokens per second)
llama_perf_context_print:        eval time =     359.10 ms /    33 runs   (   10.88 ms per token,    91.90 tokens per second)
llama_perf_context_print:       total time =    2049.84 ms /   311 tokens
llama_perf_context_print:    graphs reused =         32
 
real	0m6,531s
user	0m3,426s
sys	0m3,041s
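The throughput figures come straight from the llama_perf lines (Q8_K_XL generates at ~55 tok/s, Q4_K_M at ~92 tok/s here). As a sanity check, the tokens-per-second value can be recomputed from the milliseconds and run count in the same line; a small sketch parsing the log format shown above:

```python
import re

LINE = ("llama_perf_context_print:        eval time =     542.06 ms /"
        "    30 runs   (   18.07 ms per token,    55.34 tokens per second)")

def eval_tps(line: str) -> float:
    """Recompute generation speed from a llama_perf eval line: runs / seconds."""
    ms, runs = re.search(r"=\s*([\d.]+) ms /\s*(\d+) runs", line).groups()
    return float(runs) / (float(ms) / 1000.0)

print(round(eval_tps(LINE), 2))  # 55.34, matching the log's own figure
```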

With llama-mtmd-cli and SmolVLM2-2.2B-Instruct:

# SmolVLM2-2.2B-Instruct-Q4_0
$ time ~/llama.cpp/build/bin/llama-mtmd-cli --log-timestamps \
 -m ~/Data/SmolVLM2-2.2B-Instruct-Q4_0.gguf \
 --mmproj ~/Data/mmproj-SmolVLM2-2.2B-Instruct-f16.gguf \
 --image ~/Data/screenshot_20260214-141126.png -p 'Vois tu des panneaux solaires sur cette image ?'
 
build: 7971 (5fa1c190d) with GNU 13.3.0 for Linux x86_64
common_init_result: fitting params to device memory, for bugs during this step try to reproduce them with -fit off, or provide --verbose logs if the bug only occurs with -fit on
gguf_init_from_file_impl: invalid magic characters: 'Entr', expected 'GGUF'
llama_model_load: error loading model: llama_model_loader: failed to load model from ~/Data/SmolVLM2-2.2B-Instruct-Q4_0.gguf
llama_model_load_from_file_impl: failed to load model
llama_params_fit: encountered an error while trying to fit params to free device memory: failed to load model
llama_params_fit: fitting params to free memory took 0.10 seconds
Erreur de segmentation (core dumped)

With llama-mtmd-cli and Qwen2-VL-2B-Instruct:

# Qwen2-VL-2B-Instruct-Q4_0
$ time ~/llama.cpp/build/bin/llama-mtmd-cli --log-timestamps -ngl 99 \
 -m ~/Data/Qwen2-VL-2B-Instruct-Q4_0.gguf \
 --mmproj ~/Data/mmproj-Qwen2-VL-2B-Instruct-f16.gguf \
 --image ~/Data/screenshot_20260214-141126.png -p 'Vois tu des panneaux solaires sur cette image ?'
 
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes
build: 7971 (5fa1c190d) with GNU 13.3.0 for Linux x86_64
common_init_result: fitting params to device memory, for bugs during this step try to reproduce them with -fit off, or provide --verbose logs if the bug only occurs with -fit on
gguf_init_from_file_impl: invalid magic characters: 'Entr', expected 'GGUF'
llama_model_load: error loading model: llama_model_loader: failed to load model from ~/Data/Qwen2-VL-2B-Instruct-Q4_0.gguf
llama_model_load_from_file_impl: failed to load model
llama_params_fit: encountered an error while trying to fit params to free device memory: failed to load model
llama_params_fit: fitting params to free memory took 0.09 seconds
Erreur de segmentation (core dumped)

With llama-mtmd-cli and MobileVLM-3B:

$ time ~/llama.cpp/build/bin/llama-mtmd-cli --log-timestamps -ngl 99 \
 -m ~/Data/MobileVLM-3B-q3_K_S.gguf \
 --mmproj ~/Data/MobileVLM-3B-mmproj-f16.gguf \
 --image ~/Data/screenshot_20260214-141126.png -p 'Vois tu des panneaux solaires sur cette image ?'
 
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes
build: 7971 (5fa1c190d) with GNU 13.3.0 for Linux x86_64
common_init_result: fitting params to device memory, for bugs during this step try to reproduce them with -fit off, or provide --verbose logs if the bug only occurs with -fit on
gguf_init_from_file_impl: invalid magic characters: 'Repo', expected 'GGUF'
llama_model_load: error loading model: llama_model_loader: failed to load model from ~/Data/MobileVLM-3B-q3_K_S.gguf
llama_model_load_from_file_impl: failed to load model
llama_params_fit: encountered an error while trying to fit params to free device memory: failed to load model
llama_params_fit: fitting params to free memory took 0.09 seconds
Erreur de segmentation (core dumped)
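In all three failing runs, gguf_init_from_file_impl prints the first four bytes it actually read ('Entr', 'Repo'), which strongly suggests the downloads saved a text or HTML error page under a .gguf filename rather than the model weights. A quick sanity check before launching llama-mtmd-cli; the file content below is hypothetical, since only the leading bytes are known from the logs:

```python
import os, tempfile

GGUF_MAGIC = b"GGUF"

def is_gguf(path: str) -> bool:
    """A valid GGUF file starts with the 4-byte magic b'GGUF'."""
    with open(path, "rb") as f:
        return f.read(4) == GGUF_MAGIC

# Simulate a bad download: a text error page saved under a .gguf name.
with tempfile.NamedTemporaryFile(delete=False, suffix=".gguf") as tmp:
    tmp.write(b"Entry not found")  # hypothetical body; only 'Entr' appears in the log
bad = is_gguf(tmp.name)
os.remove(tmp.name)
print(bad)  # False
```

Re-downloading the actual GGUF files (and checking the magic first) should get past the load error; the segfault that follows looks like a side effect of the failed load rather than a separate bug.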