AI Vision
There is YOLO, along with plenty of other tools dedicated to detection in images.
llama.cpp
Requires a multimodal model and a matching mmproj (multimodal projector) file.
- gemma-3-4b
With llama-mtmd-cli and gemma-3-4b-it:
# gemma-3-4b-it-UD-Q8_K_XL
$ time ~/llama.cpp/build/bin/llama-mtmd-cli --log-timestamps \
    -m ~/Data/AI_ModelsVision/gemma-3-4b-it-UD-Q8_K_XL.gguf \
    --mmproj ~/Data/AI_ModelsVision/mmproj-model-f16.gguf \
    --image ~/Data/screenshot_20260214-141126.png \
    -p 'Vois tu des panneaux solaires sur cette image ?'
main: loading model: /home/cyrille/Data/AI_ModelsVision/gemma-3-4b-it-UD-Q8_K_XL.gguf
WARN: This is an experimental CLI for testing multimodal capability. For normal use cases, please use the standard llama-cli
encoding image slice...
image slice encoded in 789 ms
decoding image batch 1/1, n_tokens_batch = 256
sched_reserve: reserving ...
sched_reserve: CUDA0 compute buffer size = 517.12 MiB
sched_reserve: CUDA_Host compute buffer size = 269.02 MiB
sched_reserve: graph nodes = 1369
sched_reserve: graph splits = 2
sched_reserve: reserve took 109.44 ms, sched copies = 1
image decoded (batch 1/1) in 201 ms
sched_reserve: reserving ...
sched_reserve: CUDA0 compute buffer size = 517.12 MiB
sched_reserve: CUDA_Host compute buffer size = 269.02 MiB
sched_reserve: graph nodes = 1369
sched_reserve: graph splits = 2
sched_reserve: reserve took 188.38 ms, sched copies = 1

Oui, je vois des panneaux solaires sur l'image. Ils sont disposés sur le toit du bâtiment principal au centre de l'image.

llama_perf_context_print: load time = 2846.21 ms
llama_perf_context_print: prompt eval time = 852.69 ms / 278 tokens ( 3.07 ms per token, 326.03 tokens per second)
llama_perf_context_print: eval time = 542.06 ms / 30 runs ( 18.07 ms per token, 55.34 tokens per second)
llama_perf_context_print: total time = 2344.07 ms / 308 tokens
llama_perf_context_print: graphs reused = 29

real    0m8,165s
user    0m4,880s
sys     0m3,269s

# gemma-3-4b-it-Q4_K_M.gguf
Oui, je vois des panneaux solaires sur l'image. Ils sont installés sur le toit du bâtiment principal, qui est une grande structure rectangulaire.

llama_perf_context_print: load time = 1614.35 ms
llama_perf_context_print: prompt eval time = 856.85 ms / 278 tokens ( 3.08 ms per token, 324.45 tokens per second)
llama_perf_context_print: eval time = 359.10 ms / 33 runs ( 10.88 ms per token, 91.90 tokens per second)
llama_perf_context_print: total time = 2049.84 ms / 311 tokens
llama_perf_context_print: graphs reused = 32

real    0m6,531s
user    0m3,426s
sys     0m3,041s
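To compare quantizations (here Q8_K_XL vs Q4_K_M), the throughput figures can be scraped out of the perf lines. A minimal sketch; the regex simply assumes the `llama_perf_context_print` format shown in the logs above:

```python
import re

# Perf lines as printed by llama_perf_context_print (copied from the Q8_K_XL run)
log = (
    "llama_perf_context_print: prompt eval time = 852.69 ms / 278 tokens "
    "( 3.07 ms per token, 326.03 tokens per second)\n"
    "llama_perf_context_print: eval time = 542.06 ms / 30 runs "
    "( 18.07 ms per token, 55.34 tokens per second)\n"
)

def tokens_per_second(text: str) -> list[float]:
    """Return every 'N tokens per second' figure found in a llama.cpp log."""
    return [float(m) for m in re.findall(r"([\d.]+) tokens per second", text)]

print(tokens_per_second(log))  # [326.03, 55.34]
```

The first figure is prompt (prefill) throughput, the second is generation throughput, which is what you feel interactively.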
With llama-mtmd-cli and SmolVLM2-2.2B-Instruct:
# time ~/llama.cpp/build/bin/llama-mtmd-cli \
    -m ~/Data/SmolVLM2-2.2B-Instruct-Q4_0.gguf \
    --mmproj ~/Data/mmproj-SmolVLM2-2.2B-Instruct-f16.gguf \
    --image ~/Data/screenshot_20260214-141126.png \
    -p 'Vois tu des panneaux solaires sur cette image ?' --log-timestamps
build: 7971 (5fa1c190d) with GNU 13.3.0 for Linux x86_64
common_init_result: fitting params to device memory, for bugs during this step try to reproduce them with -fit off, or provide --verbose logs if the bug only occurs with -fit on
gguf_init_from_file_impl: invalid magic characters: 'Entr', expected 'GGUF'
llama_model_load: error loading model: llama_model_loader: failed to load model from /home/cyrille/Data/AI_ModelsVision/SmolVLM2-2.2B-Instruct-Q4_0.gguf
llama_model_load_from_file_impl: failed to load model
llama_params_fit: encountered an error while trying to fit params to free device memory: failed to load model
llama_params_fit: fitting params to free memory took 0.10 seconds
Erreur de segmentation (core dumped)
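The `invalid magic characters: 'Entr', expected 'GGUF'` line means this file is not a GGUF at all: a valid GGUF file starts with the 4-byte magic `GGUF`, and starting with `Entr` suggests the download saved something else (for example an HTML error page) under the .gguf name. A quick sanity check before loading, as a minimal sketch (the demo files are throwaway examples, not real models):

```python
import os
import tempfile

def is_gguf(path: str) -> bool:
    """A valid GGUF file starts with the 4-byte magic b'GGUF'."""
    with open(path, "rb") as f:
        return f.read(4) == b"GGUF"

# Demo with two throwaway files: a GGUF-style header vs. a saved HTML page
good = tempfile.NamedTemporaryFile(suffix=".gguf", delete=False)
good.write(b"GGUF" + b"\x00" * 12)  # magic + padding, enough for the check
good.close()

bad = tempfile.NamedTemporaryFile(suffix=".gguf", delete=False)
bad.write(b"<html><body>Entry not found</body></html>")  # masquerading as a model
bad.close()

print(is_gguf(good.name), is_gguf(bad.name))  # True False
os.unlink(good.name)
os.unlink(bad.name)
```

The shell equivalent is `head -c 4 model.gguf`, which should print `GGUF`; anything else means the file needs to be re-downloaded.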
informatique/ai_lm/ai_vision.1771080715.txt.gz · Last modified: by cyrille
