Differences

Below are the differences between two revisions of the page.

--- informatique:egpu [08/12/2025 15:25] – [PNY OC 16GB Geforce RTX 5060 Ti] cyrille
+++ informatique:egpu [15/01/2026 14:12] (current version) – [eGpu] cyrille
@@ Line 14: / Line 14: @@
     * purchased
     * ✅ RTX 3060 ok
-    * ✗ RTX 5060 more or less OK ([[/informatique/ai_coding/gpu_bench|crashes depending on the model]])
+    * ❌ RTX 5060 more or less OK ([[informatique:ai_lm:gpu_bench|crashes depending on the model]])
-  * [[https://fr.aliexpress.com/item/1005007990218564.html?|WKG-L19C70]] Wikingoo
-    * The seller says it will work better than the L17 with the RTX 5060 ...
   * [[http://www.cyidpcie.cn/page/HL7.html|TB3-HL7]]
     * purchased
@@ Line 22: / Line 20: @@
     * ✅ RTX 3060 ok
     * ❌ RTX 5060 failed
-  * [[https://fr.aliexpress.com/item/1005008424134383.html|ADT UT4G-BK7]] TB3/TB4 to PCIe x16, PCIe 4.0 x4 GPU dock
+  * [[https://fr.aliexpress.com/item/1005007990218564.html?|WKG-L19C70]] Wikingoo
+    * The seller says it will work better than the L17 with the RTX 5060 ...
+    * But buyers report disconnections: https://community.frame.work/t/egpu-disconnects-fw-13-amd/73265
+  * ADT UT4G
+    * USB4/TB3/TB4 to PCIe x16 adapter for eGPU
+    * https://www.adtlink.cn/en/product/UT4G.html
+      * $128 https://www.adt.link/product/UT4G-Shop.html
+    * [[https://fr.aliexpress.com/item/1005008424134383.html|ADT UT4G-BK7]] TB3/TB4 to PCIe x16, PCIe 4.0 x4 GPU dock
+  * AOOSTAR
+    * AG02 OCuLink/USB4, with PSU
+      * $219 https://aoostar.com/products/aoostar-ag01-egpu-dock-with-oculink-port-built-in-huntkey-400w-power-supply-supports-tgx-interface-hot-swap
+    * AOOSTAR EG02 TB5 + OCuLink
+      * $219 https://aoostar.com/collections/egpu-series/products/aoosatr-eg02-tb5-oculink
+  * Minisforum DEG2 OCuLink Thunderbolt 5 eGPU Dock
+    * Thunderbolt 5 port: up to 80 Gbps; OCuLink (PCIe 4.0 ×4): up to 64 Gbps; built-in M.2 2280 SSD slot; compatible with ATX / SFX PSU
+    * $259 https://www.minisforum.com/fr/products/deg2 (click "add to cart" to see the price)
+  * EXP-GDC TH5P4
+    * ???
 In the end, only small models with small context windows can be run ...
@@ Line 62: / Line 77: @@
 ggml_cuda_init: failed to initialize CUDA: CUDA driver version is insufficient for CUDA runtime version
 </code>
+==== nvidia-uvm ====
+<code>
+$ modinfo nvidia-uvm
+filename:       /lib/modules/6.14.0-37-generic/updates/dkms/nvidia-uvm.ko.zst
+version:        580.126.09
+supported:      external
+license:        Dual MIT/GPL
+srcversion:     B7E9DECF7BD1D315EBCCCF0
+depends:        nvidia
+name:           nvidia_uvm
+retpoline:      Y
+vermagic:       6.14.0-37-generic SMP preempt mod_unload modversions
+sig_id:         PKCS#7
+signer:         NS5x-NS7xAU Secure Boot Module Signature key
+sig_key:        3B:82:8F:E4:B9:99:2E:1F:E5:76:9C:33:AC:26:A9:F0:0A:1A:E3:46
+sig_hashalgo:   sha512
+signature:      66:E9:9A:75:7C:2D:5B:1C:56:B9:CD:CE:E4:64:3B:5F:66:BB:F3:B2:
+F:E8:34:44:62:FD:02:32:A3:27:A8:EA:20:BB:BA:87:6F:F7:F8:6E:
+		F5:27:67:07:97:55:39:39:B2:7E:DE:01:F1:E5:64:AF:3A:29:98:90:
+D:A3:7A:0C:D9:D2:60:A8:15:C1:55:6E:F1:53:FE:85:D2:07:54:12:
+		B0:A4:D5:76:96:D4:A9:5F:85:B4:75:18:B4:38:A2:8B:15:3D:8C:8B:
+		F3:0A:AA:1E:F6:81:F1:27:CC:1E:22:EC:E6:72:BC:DC:3A:FD:39:2F:
+		F4:BF:DE:47:38:7E:1D:FE:04:D1:29:24:AD:CB:46:44:7F:4F:62:67:
+:FA:96:10:58:47:02:C8:65:05:67:7A:53:A6:70:76:A1:10:39:56:
+B:B3:5F:98:E2:D3:F1:FC:7E:85:02:E0:37:04:E4:91:E6:7D:92:25:
+		FE:3E:CD:0F:E1:26:B8:78:FA:C6:DB:AD:AA:CB:A9:22:2E:E7:20:DA:
+:46:FC:14:EB:54:54:B4:AF:1D:66:72:9B:C2:99:18:1B:57:77:14:
+		FD:65:14:B0:96:A5:0A:78:A4:AA:E2:F3:49:96:85:53:A3:28:50:C9:
+		E4:74:89:65:C7:24:19:BC:AF:4C:15:5E:55:8C:53:CC
+parm:           uvm_conf_computing_channel_iv_rotation_limit:ulong
+parm:           uvm_ats_mode:Set to 0 to disable ATS (Address Translation Services). Any other value is ignored. Has no effect unless the platform supports ATS. (int)
+parm:           uvm_perf_prefetch_enable:uint
+parm:           uvm_perf_prefetch_threshold:uint
+parm:           uvm_perf_prefetch_min_faults:uint
+parm:           uvm_perf_thrashing_enable:uint
+parm:           uvm_perf_thrashing_threshold:uint
+parm:           uvm_perf_thrashing_pin_threshold:uint
+parm:           uvm_perf_thrashing_lapse_usec:uint
+parm:           uvm_perf_thrashing_nap:uint
+parm:           uvm_perf_thrashing_epoch:uint
+parm:           uvm_perf_thrashing_pin:uint
+parm:           uvm_perf_thrashing_max_resets:uint
+parm:           uvm_perf_map_remote_on_native_atomics_fault:uint
+parm:           uvm_disable_hmm:Force-disable HMM functionality in the UVM driver. Default: false (HMM is enabled if possible). However, even with uvm_disable_hmm=false, HMM will not be enabled if is not supported in this driver build configuration, or if ATS settings conflict with HMM. (bool)
+parm:           uvm_perf_migrate_cpu_preunmap_enable:int
+parm:           uvm_perf_migrate_cpu_preunmap_block_order:uint
+parm:           uvm_global_oversubscription:Enable (1) or disable (0) global oversubscription support. (int)
+parm:           uvm_perf_pma_batch_nonpinned_order:uint
+parm:           uvm_cpu_chunk_allocation_sizes:OR'ed value of all CPU chunk allocation sizes. (uint)
+parm:           uvm_leak_checker:Enable uvm memory leak checking. 0 = disabled, 1 = count total bytes allocated and freed, 2 = per-allocation origin tracking. (int)
+parm:           uvm_force_prefetch_fault_support:uint
+parm:           uvm_debug_enable_push_desc:Enable push description tracking (uint)
+parm:           uvm_debug_enable_push_acquire_info:Enable push acquire information tracking (uint)
+parm:           uvm_page_table_location:Set the location for UVM-allocated page tables. Choices are: vid, sys. (charp)
+parm:           uvm_perf_access_counter_migration_enable:Whether access counters will trigger migrations.Valid values: <= -1 (default policy), 0 (off), >= 1 (on) (int)
+parm:           uvm_perf_access_counter_batch_count:uint
+parm:           uvm_perf_access_counter_threshold:Number of remote accesses on a region required to trigger a notification.Valid values: [1, 65535] (uint)
+parm:           uvm_perf_reenable_prefetch_faults_lapse_msec:uint
+parm:           uvm_perf_fault_batch_count:uint
+parm:           uvm_perf_fault_replay_policy:uint
+parm:           uvm_perf_fault_replay_update_put_ratio:uint
+parm:           uvm_perf_fault_max_batches_per_service:uint
+parm:           uvm_perf_fault_max_throttle_per_service:uint
+parm:           uvm_perf_fault_coalesce:uint
+parm:           uvm_fault_force_sysmem:Force (1) using sysmem storage for pages that faulted. Default: 0. (int)
+parm:           uvm_perf_map_remote_on_eviction:int
+parm:           uvm_block_cpu_to_cpu_copy_with_ce:Use GPU CEs for CPU-to-CPU migrations. (int)
+parm:           uvm_exp_gpu_cache_peermem:Force caching for mappings to peer memory. This is an experimental parameter that may cause correctness issues if used. (uint)
+parm:           uvm_exp_gpu_cache_sysmem:Force caching for mappings to system memory. This is an experimental parameter that may cause correctness issues if used. (uint)
+parm:           uvm_downgrade_force_membar_sys:Force all TLB invalidation downgrades to use MEMBAR_SYS (uint)
+parm:           uvm_channel_num_gpfifo_entries:uint
+parm:           uvm_channel_gpfifo_loc:charp
+parm:           uvm_channel_gpput_loc:charp
+parm:           uvm_channel_pushbuffer_loc:charp
+parm:           uvm_enable_va_space_mm:Set to 0 to disable UVM from using mmu_notifiers to create an association between a UVM VA space and a process. This will also disable pageable memory access via either ATS or HMM. (int)
+parm:           uvm_enable_debug_procfs:Enable debug procfs entries in /proc/driver/nvidia-uvm (int)
+parm:           uvm_peer_copy:Choose the addressing mode for peer copying, options: phys [default] or virt. Valid for Ampere+ GPUs. (charp)
+parm:           uvm_debug_prints:Enable uvm debug prints. (int)
+parm:           uvm_enable_builtin_tests:Enable the UVM built-in tests. (This is a security risk) (int)
+parm:           uvm_release_asserts:Enable uvm asserts included in release builds. (int)
+parm:           uvm_release_asserts_dump_stack:dump_stack() on failed UVM release asserts. (int)
+parm:           uvm_release_asserts_set_global_error:Set UVM global fatal error on failed release asserts. (int)
+$ systool -m nvidia_uvm -v
+Module = "nvidia_uvm"
+  Attributes:
+    coresize            = "2154496"
+    initsize            = "0"
+    initstate           = "live"
+    refcnt              = "4"
+    srcversion          = "B7E9DECF7BD1D315EBCCCF0"
+    taint               = "OE"
+    uevent              = <store method only>
+    version             = "580.126.09"
+  Parameters:
+    uvm_ats_mode        = "1"
+    uvm_block_cpu_to_cpu_copy_with_ce= "0"
+    uvm_channel_gpfifo_loc= "auto"
+    uvm_channel_gpput_loc= "auto"
+    uvm_channel_num_gpfifo_entries= "1024"
+    uvm_channel_pushbuffer_loc= "auto"
+    uvm_conf_computing_channel_iv_rotation_limit= "2147483648"
+    uvm_cpu_chunk_allocation_sizes= "2166784"
+    uvm_debug_enable_push_acquire_info= "0"
+    uvm_debug_enable_push_desc= "0"
+    uvm_debug_prints    = "0"
+    uvm_disable_hmm     = "Y"
+    uvm_downgrade_force_membar_sys= "1"
+    uvm_enable_builtin_tests= "0"
+    uvm_enable_debug_procfs= "0"
+    uvm_enable_va_space_mm= "1"
+    uvm_exp_gpu_cache_peermem= "0"
+    uvm_exp_gpu_cache_sysmem= "0"
+    uvm_fault_force_sysmem= "0"
+    uvm_force_prefetch_fault_support= "0"
+    uvm_global_oversubscription= "1"
+    uvm_leak_checker    = "0"
+    uvm_page_table_location= "(null)"
+    uvm_peer_copy       = "phys"
+    uvm_perf_access_counter_batch_count= "256"
+    uvm_perf_access_counter_migration_enable= "-1"
+    uvm_perf_access_counter_threshold= "256"
+    uvm_perf_fault_batch_count= "256"
+    uvm_perf_fault_coalesce= "1"
+    uvm_perf_fault_max_batches_per_service= "20"
+    uvm_perf_fault_max_throttle_per_service= "5"
+    uvm_perf_fault_replay_policy= "2"
+    uvm_perf_fault_replay_update_put_ratio= "50"
+    uvm_perf_map_remote_on_eviction= "1"
+    uvm_perf_map_remote_on_native_atomics_fault= "0"
+    uvm_perf_migrate_cpu_preunmap_block_order= "2"
+    uvm_perf_migrate_cpu_preunmap_enable= "1"
+    uvm_perf_pma_batch_nonpinned_order= "6"
+    uvm_perf_prefetch_enable= "1"
+    uvm_perf_prefetch_min_faults= "1"
+    uvm_perf_prefetch_threshold= "51"
+    uvm_perf_reenable_prefetch_faults_lapse_msec= "1000"
+    uvm_perf_thrashing_enable= "1"
+    uvm_perf_thrashing_epoch= "2000"
+    uvm_perf_thrashing_lapse_usec= "500"
+    uvm_perf_thrashing_max_resets= "4"
+    uvm_perf_thrashing_nap= "1"
+    uvm_perf_thrashing_pin= "300"
+    uvm_perf_thrashing_pin_threshold= "10"
+    uvm_perf_thrashing_threshold= "3"
+    uvm_release_asserts = "1"
+    uvm_release_asserts_dump_stack= "0"
+    uvm_release_asserts_set_global_error= "0"
+</code>
+The RTX 5060 Ti crash occurs later when ''options nvidia_uvm uvm_disable_hmm=1'' is set.
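To confirm that the option is actually in effect after a reboot, the live value can be read back from sysfs. This is a minimal sketch, assuming the standard ''/sys/module/<module>/parameters/<name>'' layout. The helper name ''uvm_param'' and the ''SYSFS_ROOT'' override are hypothetical, introduced only so the function can be tried against a fake directory tree, and the file name ''/etc/modprobe.d/nvidia-uvm.conf'' in the comment is likewise an assumption.

```shell
#!/bin/sh
# Assumed setup (not from the original page): the option line lives in a
# modprobe.d file such as /etc/modprobe.d/nvidia-uvm.conf:
#   options nvidia_uvm uvm_disable_hmm=1

# uvm_param NAME: print the live value of an nvidia_uvm module parameter.
# SYSFS_ROOT is a hypothetical override; it defaults to the real /sys.
uvm_param() {
    param_file="${SYSFS_ROOT:-/sys}/module/nvidia_uvm/parameters/$1"
    if [ -r "$param_file" ]; then
        cat "$param_file"
    else
        echo "parameter $1 not readable (module not loaded?)" >&2
        return 1
    fi
}

# Example: uvm_param uvm_disable_hmm
```

Boolean parameters are reported as ''Y'' or ''N'', which matches the ''uvm_disable_hmm = "Y"'' line in the ''systool'' output above.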
 ==== RTX series ====
@@ Line 99: / Line 269: @@
 Ticket opened with Nvidia: [[https://github.com/NVIDIA/open-gpu-kernel-modules/issues/974#issuecomment-3627087138|kgspBootstrap_GH100: GSP-FMC reported an error while attempting to boot GSP]]
+I bought a certified Thunderbolt cable (€50) to replace the one supplied with the eGPU. **It seems to work better, but it still crashes easily:** ''kernel: NVRM: nvAssertOkFailedNoLog: Assertion failed: GPU lost from the bus [NV_ERR_GPU_IS_LOST] (0x0000000F) returned from pRmApi->Control() ... NVRM: nvGpuOpsReportFatalError: uvm encountered global fatal error 0x60, requiring os reboot to recover ...''
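When tracking these disconnects, it can help to pull only the relevant NVRM lines out of the kernel log. This is a minimal sketch, assuming the messages match the strings quoted above. The function name ''gpu_lost_lines'' is hypothetical, and the ''journalctl'' invocation in the usage comment assumes a systemd-based system.

```shell
#!/bin/sh
# gpu_lost_lines: read kernel log text on stdin and keep only the lines that
# report the GPU dropping off the bus or the resulting UVM fatal error.
# The patterns are copied from the messages quoted above.
gpu_lost_lines() {
    grep -E 'NV_ERR_GPU_IS_LOST|GPU lost from the bus|uvm encountered global fatal error'
}

# Usage (assumes systemd's journalctl is available):
#   journalctl -k -b | gpu_lost_lines
```

The function exits with a non-zero status when nothing matches, so it can also serve as a quick "did the GPU drop during this boot?" check in a script.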
 === nvidia-kkms-565 ===
@@ Line 190: / Line 360: @@
 === With the Wikingoo WKGL17-C50 bridge ===
-With some models there is a "[[/informatique/ai_coding/gpu_bench|CUDA Error]]", and the logs show:
+With some models there is a "[[informatique:ai_lm:gpu_bench|CUDA Error]]", and the logs show:
 <code>