informatique:ai_lm:ai_nlp
===== Glossary =====
  * Cross Encoder (a.k.a. reranker): calculates a similarity score given a pair of texts. Generally provides superior performance compared to a Sentence Transformer (a.k.a. bi-encoder) model.
  * Sparse Encoder: a sparse vector representation is a list of ''
  * RAG (Retrieval-Augmented Generation):
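The difference between dense (bi-encoder) and sparse text representations can be sketched in plain Python. The vectors below are toy values, not real model outputs; in practice they would come from a trained Sentence Transformer or sparse encoder:

```python
import math

def cosine(a, b):
    # Cosine similarity between two dense vectors (bi-encoder style:
    # each text is embedded once, then compared with a cheap operation).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def sparse_dot(a, b):
    # A sparse representation stores only non-zero dimensions,
    # e.g. {token_id: weight}; similarity is a dot product over shared keys.
    return sum(w * b[k] for k, w in a.items() if k in b)

# Toy dense embeddings.
query_vec = [0.9, 0.1, 0.0]
doc_vec = [0.8, 0.2, 0.1]
print(cosine(query_vec, doc_vec))

# Toy sparse vectors keyed by vocabulary index.
query_sparse = {3: 1.2, 17: 0.4}
doc_sparse = {3: 0.9, 42: 0.7}
print(sparse_dot(query_sparse, doc_sparse))
```

A cross encoder, by contrast, takes the raw text pair as one input and produces the score directly, so nothing can be precomputed per document.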
===== Embedding models =====
used to compute embeddings using Sentence Transformer models ([[https://
===== Vector databases =====

{{ :
  * FAISS (Facebook AI Similarity Search), optimized for similarity search
  * [[https://
  * [[https://

More advanced SaaS solutions:
  * [[https://
  * [[https://
  * Weaviate is an open-source vector database that stores both objects and vectors, combining vector search with structured filtering, with the fault tolerance and scalability of a cloud-native database.
  * https://
  * https://

==== ChromaDB ====

  * [[https://
  * [[https://
  * [[https://
  * [[https://

API clients:
  * PHP: https://

==== Wikidata ====

Using two different methods:
  * one to extract the labels, aliases and statements (claims)
  * one to extract the P31/P279 (instance of / subclass of) graph
makes it possible to

=== Wikidata Dumps ===

Wikidata dumps are available (prefer a mirror, to be kind to the servers).

JSON dump, streamable (GZ):
  * https://
  * 151 GB, more than ''

Raw RDF N-Triples dump, streamable (GZ):
  * https://
  * 246 GB

Raw RDF N-Triples dump, streamable (GZ) AND cleaned of ''
  * https://
  * 69.6 GB 👌 for ''

Reading:
  * PDF [[https://

Query services:
  * Original: https://
  * The graph was split in two some time ago; scholarly articles must be queried on https://
  * QLever demo: https://
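Querying one of these SPARQL endpoints can be sketched with the standard library. The endpoint URL and query below are illustrative (the Wikidata Query Service endpoint with a "house cat" query); the network call itself is left commented out so the sketch works offline:

```python
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"
query = "SELECT ?item WHERE { ?item wdt:P31 wd:Q146 . } LIMIT 3"

def build_request(endpoint, sparql):
    # Encode the SPARQL query as a GET parameter and ask for JSON results.
    # WDQS policy asks clients to send a descriptive User-Agent.
    params = urllib.parse.urlencode({"query": sparql, "format": "json"})
    return urllib.request.Request(
        endpoint + "?" + params,
        headers={"User-Agent": "wiki-notes-example/0.1"},
    )

req = build_request(ENDPOINT, query)
print(req.full_url)
# To actually execute it (needs network access):
# with urllib.request.urlopen(req) as resp:
#     print(resp.read()[:200])
```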
informatique/ai_lm/ai_nlp.1768306762.txt.gz · Last modified: 18/01/2026 by cyrille
