{"id":2360,"date":"2026-06-06T18:53:13","date_gmt":"2026-06-06T16:53:13","guid":{"rendered":"https:\/\/askem.eu\/?p=2360"},"modified":"2026-06-06T18:53:17","modified_gmt":"2026-06-06T16:53:17","slug":"llama-cpp-et-la-quantification-gguf","status":"publish","type":"post","link":"https:\/\/askem.eu\/en\/2026\/06\/06\/llama-cpp-et-la-quantification-gguf\/","title":{"rendered":"llama.cpp et la quantification GGUF"},"content":{"rendered":"<h2 class=\"wp-block-heading\">llama.cpp and GGUF quantization: running capable LLMs on modest hardware<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Ollama and LM Studio both rely, directly or indirectly, on the same inference engine: <strong>llama.cpp<\/strong>. Understanding this engine and its weight format, <strong>GGUF<\/strong>, means opening the black box and taking back control of the precision\/cost trade-off. It is also the key to hosting a good-quality model on a server without a dedicated GPU, or on a consumer graphics card.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">The problem: a full-precision LLM does not fit on your machine<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">An 8-billion-parameter model stored in 16-bit precision (FP16, 2 bytes per weight) weighs about 16&nbsp;GB, and a 70-billion-parameter model about 140&nbsp;GB. That is too much to load on most self-hosted servers. <strong>Quantization<\/strong> addresses this problem: it reduces the number of bits used to store each weight of the network, at the cost of a controlled loss of precision. 
Going from 16 bits to 4 bits per weight, for example, cuts the memory footprint by a factor of four.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">GGUF: the weight format that simplified everything<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">GGUF is the single-file format introduced by the llama.cpp project. A <code>.gguf<\/code> file contains everything needed to load and run the model: the quantized weights, the tokenizer vocabulary, the prompt template (chat template) and the metadata. Its practical advantages are:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>A single file<\/strong> to download and move around, with no complex directory tree.<\/li>\n\n\n\n<li><strong>Memory-mapped loading<\/strong> (mmap): the model is not copied into RAM in full at startup, which speeds up launch.<\/li>\n\n\n\n<li><strong>Mixed CPU\/GPU execution<\/strong>: some layers can be offloaded to the GPU while the rest stay on the CPU, which is useful when VRAM is limited.<\/li>\n\n\n\n<li><strong>Broad portability<\/strong>: the same file works on Linux, macOS (Metal), Windows, ARM and x86.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Choosing the right quantization level<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">This is the most important decision. The so-called \u201cK-quants\u201d (suffix <code>_K<\/code>) are today&rsquo;s reference: they give more bits to the tensors that are most sensitive to quantization and fewer to those that tolerate it well. 
Here is a practical guide:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th class=\"has-text-align-left\" data-align=\"left\">Quantization<\/th><th class=\"has-text-align-left\" data-align=\"left\">Bits \/ weight<\/th><th class=\"has-text-align-left\" data-align=\"left\">Recommended use<\/th><\/tr><\/thead><tbody><tr><td><code>Q8_0<\/code><\/td><td>~8.5 bits<\/td><td>Quality nearly identical to FP16, if memory allows<\/td><\/tr><tr><td><code>Q6_K<\/code><\/td><td>~6.5 bits<\/td><td>Excellent trade-off, negligible loss<\/td><\/tr><tr><td><code>Q5_K_M<\/code><\/td><td>~5.5 bits<\/td><td>Very good quality\/size balance<\/td><\/tr><tr><td><code>Q4_K_M<\/code><\/td><td>~4.5 bits<\/td><td>The recommended default: best quality\/memory ratio<\/td><\/tr><tr><td><code>Q3_K_M \/ Q2_K<\/code><\/td><td>~2.5 to 4 bits<\/td><td>Last resort when memory is very tight; quality is degraded<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Rule of thumb: drop to a lower precision only when the model does not fit in the available memory. In practice, <code>Q4_K_M<\/code> is a reasonable starting point for most uses.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">The importance matrix (imatrix), to go further<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Recent quantizations can be calibrated with an <strong>importance matrix<\/strong> (imatrix): a representative text sample is run through the model to measure which weights matter most, and the quantizer then preserves those weights more accurately. 
At equal file size, a model quantized with an imatrix retains its capabilities noticeably better, especially at low precisions (2 to 3 bits). Many models are already published on Hugging Face in imatrix-calibrated <code>IQ<\/code> variants.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Starting an OpenAI-compatible server in a few minutes<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">llama.cpp ships an HTTP server, <code>llama-server<\/code>, that exposes an OpenAI-compatible API. It therefore plugs directly into LiteLLM, Open WebUI, LangGraph or any existing OpenAI client:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Fetch and build with GPU acceleration (CUDA)\ngit clone https:\/\/github.com\/ggml-org\/llama.cpp\ncd llama.cpp\ncmake -B build -DGGML_CUDA=ON\ncmake --build build --config Release\n\n# Start the server, offloading 35 layers to the GPU\n# (--host 0.0.0.0 listens on all network interfaces; use 127.0.0.1 for local-only access)\n.\/build\/bin\/llama-server \\\n  -m modele-Q4_K_M.gguf \\\n  --n-gpu-layers 35 \\\n  --ctx-size 8192 \\\n  --host 0.0.0.0 --port 8080<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">The <code>--n-gpu-layers<\/code> parameter is the main lever: it sets how many layers of the network are computed on the GPU, while the rest run on the CPU. Raise it until the available VRAM is nearly full, without exceeding it, to get the best possible throughput.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Where llama.cpp fits in an open-source stack<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">llama.cpp is the low-level foundation for inference. On a workstation or a small general-purpose server, it is typically used through <strong>Ollama<\/strong>, which simplifies managing it. 
For a high-throughput service with batching and many concurrent users, <strong>vLLM<\/strong> or <strong>SGLang<\/strong>, both designed for GPUs, are a better fit. llama.cpp keeps a decisive advantage in two cases: running on CPU or heterogeneous hardware, and deployment on machines with very limited resources (Raspberry Pi, mini PCs, servers without a GPU).<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Key takeaways<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">GGUF quantization is not a compromise you have to put up with; it is a setting you control. By choosing the right precision for your memory budget, you can run a genuinely useful model where full precision would be out of reach. In 2026, the llama.cpp and GGUF pairing remains the most robust and portable route to self-hosted, sovereign LLM inference.<\/p>","protected":false},"excerpt":{"rendered":"<p>llama.cpp and GGUF quantization: running capable LLMs on modest hardware Ollama and LM Studio both rely, directly or indirectly, on the same inference engine: llama.cpp. Understanding this engine and its weight format, GGUF, means opening the black box and taking back control of the precision\/cost trade-off. 
It is also [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":2361,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"ocean_post_layout":"","ocean_both_sidebars_style":"","ocean_both_sidebars_content_width":0,"ocean_both_sidebars_sidebars_width":0,"ocean_sidebar":"","ocean_second_sidebar":"","ocean_disable_margins":"enable","ocean_add_body_class":"","ocean_shortcode_before_top_bar":"","ocean_shortcode_after_top_bar":"","ocean_shortcode_before_header":"","ocean_shortcode_after_header":"","ocean_has_shortcode":"","ocean_shortcode_after_title":"","ocean_shortcode_before_footer_widgets":"","ocean_shortcode_after_footer_widgets":"","ocean_shortcode_before_footer_bottom":"","ocean_shortcode_after_footer_bottom":"","ocean_display_top_bar":"default","ocean_display_header":"default","ocean_header_style":"","ocean_center_header_left_menu":"","ocean_custom_header_template":"","ocean_custom_logo":0,"ocean_custom_retina_logo":0,"ocean_custom_logo_max_width":0,"ocean_custom_logo_tablet_max_width":0,"ocean_custom_logo_mobile_max_width":0,"ocean_custom_logo_max_height":0,"ocean_custom_logo_tablet_max_height":0,"ocean_custom_logo_mobile_max_height":0,"ocean_header_custom_menu":"","ocean_menu_typo_font_family":"","ocean_menu_typo_font_subset":"","ocean_menu_typo_font_size":0,"ocean_menu_typo_font_size_tablet":0,"ocean_menu_typo_font_size_mobile":0,"ocean_menu_typo_font_size_unit":"px","ocean_menu_typo_font_weight":"","ocean_menu_typo_font_weight_tablet":"","ocean_menu_typo_font_weight_mobile":"","ocean_menu_typo_transform":"","ocean_menu_typo_transform_tablet":"","ocean_menu_typo_transform_mobile":"","ocean_menu_typo_line_height":0,"ocean_menu_typo_line_height_tablet":0,"ocean_menu_typo_line_height_mobile":0,"ocean_menu_typo_line_height_unit":"","ocean_menu_typo_spacing":0,"ocean_menu_typo_spacing_tablet":0,"ocean_menu_typo_spacing_mobile":0,"ocean_menu_typo_spacing_unit":"","ocean_menu_link_color":"","ocean_me
nu_link_color_hover":"","ocean_menu_link_color_active":"","ocean_menu_link_background":"","ocean_menu_link_hover_background":"","ocean_menu_link_active_background":"","ocean_menu_social_links_bg":"","ocean_menu_social_hover_links_bg":"","ocean_menu_social_links_color":"","ocean_menu_social_hover_links_color":"","ocean_disable_title":"default","ocean_disable_heading":"default","ocean_post_title":"","ocean_post_subheading":"","ocean_post_title_style":"","ocean_post_title_background_color":"","ocean_post_title_background":0,"ocean_post_title_bg_image_position":"","ocean_post_title_bg_image_attachment":"","ocean_post_title_bg_image_repeat":"","ocean_post_title_bg_image_size":"","ocean_post_title_height":0,"ocean_post_title_bg_overlay":0.5,"ocean_post_title_bg_overlay_color":"","ocean_disable_breadcrumbs":"default","ocean_breadcrumbs_color":"","ocean_breadcrumbs_separator_color":"","ocean_breadcrumbs_links_color":"","ocean_breadcrumbs_links_hover_color":"","ocean_display_footer_widgets":"default","ocean_display_footer_bottom":"default","ocean_custom_footer_template":"","osh_disable_topbar_sticky":"default","osh_disable_header_sticky":"default","osh_sticky_header_style":"default","osh_sticky_header_effect":"","osh_custom_sticky_logo":0,"osh_custom_retina_sticky_logo":0,"osh_custom_sticky_logo_height":0,"osh_background_color":"","osh_links_color":"","osh_links_hover_color":"","osh_links_active_color":"","osh_links_bg_color":"","osh_links_hover_bg_color":"","osh_links_active_bg_color":"","osh_menu_social_links_color":"","osh_menu_social_hover_links_color":"","ocean_post_oembed":"","ocean_post_self_hosted_media":"","ocean_post_video_embed":"","ocean_link_format":"","ocean_link_format_target":"self","ocean_quote_format":"","ocean_quote_format_link":"post","ocean_gallery_link_images":"on","ocean_gallery_id":[],"footnotes":""},"categories":[16],"tags":[],"class_list":["post-2360","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai","
entry","has-media"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.8 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>llama.cpp et la quantification GGUF - askem<\/title>\n<meta name=\"description\" content=\"ASKEM BUREAU D&#039;\u00c9TUDES ET DE FORMATION NUM\u00c9RIQUE. Nous vous assistons dans la transformation num\u00e9rique de vos outils, services et organisations tout en pla\u00e7ant l\u2019humain au c\u0153ur de notre d\u00e9marche d\u2019accompagnement.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/askem.eu\/en\/2026\/06\/06\/llama-cpp-et-la-quantification-gguf\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"llama.cpp et la quantification GGUF - askem\" \/>\n<meta property=\"og:description\" content=\"ASKEM BUREAU D&#039;\u00c9TUDES ET DE FORMATION NUM\u00c9RIQUE. 
Nous vous assistons dans la transformation num\u00e9rique de vos outils, services et organisations tout en pla\u00e7ant l\u2019humain au c\u0153ur de notre d\u00e9marche d\u2019accompagnement.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/askem.eu\/en\/2026\/06\/06\/llama-cpp-et-la-quantification-gguf\/\" \/>\n<meta property=\"og:site_name\" content=\"askem\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/fb.me\/askem.eu\" \/>\n<meta property=\"article:published_time\" content=\"2026-06-06T16:53:13+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2026-06-06T16:53:17+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/mlpi0fxo3sth.i.optimole.com\/cb:3obA.c61\/w:auto\/h:auto\/q:mauto\/f:best\/https:\/\/askem.eu\/wp-content\/uploads\/2026\/06\/sujet-askem-2026-05-31.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1600\" \/>\n\t<meta property=\"og:image:height\" content=\"900\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"askemadmin\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"askemadmin\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/askem.eu\\\/2026\\\/06\\\/06\\\/llama-cpp-et-la-quantification-gguf\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/askem.eu\\\/2026\\\/06\\\/06\\\/llama-cpp-et-la-quantification-gguf\\\/\"},\"author\":{\"name\":\"askemadmin\",\"@id\":\"https:\\\/\\\/askem.eu\\\/#\\\/schema\\\/person\\\/8bbee74ab9a977d56bf4826662e9d2e9\"},\"headline\":\"llama.cpp et la quantification GGUF\",\"datePublished\":\"2026-06-06T16:53:13+00:00\",\"dateModified\":\"2026-06-06T16:53:17+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/askem.eu\\\/2026\\\/06\\\/06\\\/llama-cpp-et-la-quantification-gguf\\\/\"},\"wordCount\":851,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\\\/\\\/askem.eu\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/askem.eu\\\/2026\\\/06\\\/06\\\/llama-cpp-et-la-quantification-gguf\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\/\\/askem.eu\\/wp-content\\/uploads\\/2026\\/06\\/sujet-askem-2026-05-31.png\",\"articleSection\":[\"AI\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/askem.eu\\\/2026\\\/06\\\/06\\\/llama-cpp-et-la-quantification-gguf\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/askem.eu\\\/2026\\\/06\\\/06\\\/llama-cpp-et-la-quantification-gguf\\\/\",\"url\":\"https:\\\/\\\/askem.eu\\\/2026\\\/06\\\/06\\\/llama-cpp-et-la-quantification-gguf\\\/\",\"name\":\"llama.cpp et la quantification GGUF - 
askem\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/askem.eu\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/askem.eu\\\/2026\\\/06\\\/06\\\/llama-cpp-et-la-quantification-gguf\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/askem.eu\\\/2026\\\/06\\\/06\\\/llama-cpp-et-la-quantification-gguf\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\/\\/askem.eu\\/wp-content\\/uploads\\/2026\\/06\\/sujet-askem-2026-05-31.png\",\"datePublished\":\"2026-06-06T16:53:13+00:00\",\"dateModified\":\"2026-06-06T16:53:17+00:00\",\"description\":\"ASKEM BUREAU D'\u00c9TUDES ET DE FORMATION NUM\u00c9RIQUE. Nous vous assistons dans la transformation num\u00e9rique de vos outils, services et organisations tout en pla\u00e7ant l\u2019humain au c\u0153ur de notre d\u00e9marche d\u2019accompagnement.\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/askem.eu\\\/2026\\\/06\\\/06\\\/llama-cpp-et-la-quantification-gguf\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/askem.eu\\\/2026\\\/06\\\/06\\\/llama-cpp-et-la-quantification-gguf\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/askem.eu\\\/2026\\\/06\\\/06\\\/llama-cpp-et-la-quantification-gguf\\\/#primaryimage\",\"url\":\"https:\\/\\/askem.eu\\/wp-content\\/uploads\\/2026\\/06\\/sujet-askem-2026-05-31.png\",\"contentUrl\":\"https:\\/\\/askem.eu\\/wp-content\\/uploads\\/2026\\/06\\/sujet-askem-2026-05-31.png\",\"width\":1600,\"height\":900},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/askem.eu\\\/2026\\\/06\\\/06\\\/llama-cpp-et-la-quantification-gguf\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Accueil\",\"item\":\"https:\\\/\\\/askem.eu\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"llama.cpp et la quantification 
GGUF\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/askem.eu\\\/#website\",\"url\":\"https:\\\/\\\/askem.eu\\\/\",\"name\":\"askem\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\\\/\\\/askem.eu\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/askem.eu\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/askem.eu\\\/#organization\",\"name\":\"Askem\",\"url\":\"https:\\\/\\\/askem.eu\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/askem.eu\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\/\\/mlpi0fxo3sth.i.optimole.com\\/cb:3obA.c61\\/w:760\\/h:480\\/q:mauto\\/f:best\\/https:\\/\\/askem.eu\\/wp-content\\/uploads\\/2020\\/10\\/logoGalaxieAskem3.png\",\"contentUrl\":\"https:\\/\\/mlpi0fxo3sth.i.optimole.com\\/cb:3obA.c61\\/w:760\\/h:480\\/q:mauto\\/f:best\\/https:\\/\\/askem.eu\\/wp-content\\/uploads\\/2020\\/10\\/logoGalaxieAskem3.png\",\"width\":760,\"height\":480,\"caption\":\"Askem\"},\"image\":{\"@id\":\"https:\\\/\\\/askem.eu\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/fb.me\\\/askem.eu\",\"https:\\\/\\\/linkedin.com\\\/company\\\/askem-eu\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/askem.eu\\\/#\\\/schema\\\/person\\\/8bbee74ab9a977d56bf4826662e9d2e9\",\"name\":\"askemadmin\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/a202f744ee3a4b6fdbe2ceb57fd84c72559337791a276662270d8d2fb7842e3f?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/a202f744ee3a4b6fdbe2ceb57fd84c72559337791a276662270d8d2fb7842e3f?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/a202f744ee3a4b6fdbe2ceb57fd84c72559337791a2
76662270d8d2fb7842e3f?s=96&d=mm&r=g\",\"caption\":\"askemadmin\"},\"sameAs\":[\"https:\\\/\\\/askem.eu\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"llama.cpp et la quantification GGUF - askem","description":"ASKEM BUREAU D'\u00c9TUDES ET DE FORMATION NUM\u00c9RIQUE. Nous vous assistons dans la transformation num\u00e9rique de vos outils, services et organisations tout en pla\u00e7ant l\u2019humain au c\u0153ur de notre d\u00e9marche d\u2019accompagnement.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/askem.eu\/en\/2026\/06\/06\/llama-cpp-et-la-quantification-gguf\/","og_locale":"en_US","og_type":"article","og_title":"llama.cpp et la quantification GGUF - askem","og_description":"ASKEM BUREAU D'\u00c9TUDES ET DE FORMATION NUM\u00c9RIQUE. Nous vous assistons dans la transformation num\u00e9rique de vos outils, services et organisations tout en pla\u00e7ant l\u2019humain au c\u0153ur de notre d\u00e9marche d\u2019accompagnement.","og_url":"https:\/\/askem.eu\/en\/2026\/06\/06\/llama-cpp-et-la-quantification-gguf\/","og_site_name":"askem","article_publisher":"https:\/\/fb.me\/askem.eu","article_published_time":"2026-06-06T16:53:13+00:00","article_modified_time":"2026-06-06T16:53:17+00:00","og_image":[{"width":1600,"height":900,"url":"https:\/\/mlpi0fxo3sth.i.optimole.com\/cb:3obA.c61\/w:auto\/h:auto\/q:mauto\/f:best\/https:\/\/askem.eu\/wp-content\/uploads\/2026\/06\/sujet-askem-2026-05-31.png","type":"image\/png"}],"author":"askemadmin","twitter_card":"summary_large_image","twitter_misc":{"Written by":"askemadmin","Est. 
reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/askem.eu\/2026\/06\/06\/llama-cpp-et-la-quantification-gguf\/#article","isPartOf":{"@id":"https:\/\/askem.eu\/2026\/06\/06\/llama-cpp-et-la-quantification-gguf\/"},"author":{"name":"askemadmin","@id":"https:\/\/askem.eu\/#\/schema\/person\/8bbee74ab9a977d56bf4826662e9d2e9"},"headline":"llama.cpp et la quantification GGUF","datePublished":"2026-06-06T16:53:13+00:00","dateModified":"2026-06-06T16:53:17+00:00","mainEntityOfPage":{"@id":"https:\/\/askem.eu\/2026\/06\/06\/llama-cpp-et-la-quantification-gguf\/"},"wordCount":851,"commentCount":0,"publisher":{"@id":"https:\/\/askem.eu\/#organization"},"image":{"@id":"https:\/\/askem.eu\/2026\/06\/06\/llama-cpp-et-la-quantification-gguf\/#primaryimage"},"thumbnailUrl":"https:\/\/mlpi0fxo3sth.i.optimole.com\/cb:3obA.c61\/w:auto\/h:auto\/q:mauto\/f:best\/https:\/\/askem.eu\/wp-content\/uploads\/2026\/06\/sujet-askem-2026-05-31.png","articleSection":["AI"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/askem.eu\/2026\/06\/06\/llama-cpp-et-la-quantification-gguf\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/askem.eu\/2026\/06\/06\/llama-cpp-et-la-quantification-gguf\/","url":"https:\/\/askem.eu\/2026\/06\/06\/llama-cpp-et-la-quantification-gguf\/","name":"llama.cpp et la quantification GGUF - askem","isPartOf":{"@id":"https:\/\/askem.eu\/#website"},"primaryImageOfPage":{"@id":"https:\/\/askem.eu\/2026\/06\/06\/llama-cpp-et-la-quantification-gguf\/#primaryimage"},"image":{"@id":"https:\/\/askem.eu\/2026\/06\/06\/llama-cpp-et-la-quantification-gguf\/#primaryimage"},"thumbnailUrl":"https:\/\/mlpi0fxo3sth.i.optimole.com\/cb:3obA.c61\/w:auto\/h:auto\/q:mauto\/f:best\/https:\/\/askem.eu\/wp-content\/uploads\/2026\/06\/sujet-askem-2026-05-31.png","datePublished":"2026-06-06T16:53:13+00:00","dateModified":"2026-06-06T16:53:17+00:00","description":"ASKEM BUREAU 
D'\u00c9TUDES ET DE FORMATION NUM\u00c9RIQUE. Nous vous assistons dans la transformation num\u00e9rique de vos outils, services et organisations tout en pla\u00e7ant l\u2019humain au c\u0153ur de notre d\u00e9marche d\u2019accompagnement.","breadcrumb":{"@id":"https:\/\/askem.eu\/2026\/06\/06\/llama-cpp-et-la-quantification-gguf\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/askem.eu\/2026\/06\/06\/llama-cpp-et-la-quantification-gguf\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/askem.eu\/2026\/06\/06\/llama-cpp-et-la-quantification-gguf\/#primaryimage","url":"https:\/\/mlpi0fxo3sth.i.optimole.com\/cb:3obA.c61\/w:auto\/h:auto\/q:mauto\/f:best\/https:\/\/askem.eu\/wp-content\/uploads\/2026\/06\/sujet-askem-2026-05-31.png","contentUrl":"https:\/\/mlpi0fxo3sth.i.optimole.com\/cb:3obA.c61\/w:auto\/h:auto\/q:mauto\/f:best\/https:\/\/askem.eu\/wp-content\/uploads\/2026\/06\/sujet-askem-2026-05-31.png","width":1600,"height":900},{"@type":"BreadcrumbList","@id":"https:\/\/askem.eu\/2026\/06\/06\/llama-cpp-et-la-quantification-gguf\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Accueil","item":"https:\/\/askem.eu\/"},{"@type":"ListItem","position":2,"name":"llama.cpp et la quantification 
GGUF"}]},{"@type":"WebSite","@id":"https:\/\/askem.eu\/#website","url":"https:\/\/askem.eu\/","name":"askem","description":"","publisher":{"@id":"https:\/\/askem.eu\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/askem.eu\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/askem.eu\/#organization","name":"Askem","url":"https:\/\/askem.eu\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/askem.eu\/#\/schema\/logo\/image\/","url":"https:\/\/mlpi0fxo3sth.i.optimole.com\/cb:3obA.c61\/w:760\/h:480\/q:mauto\/f:best\/https:\/\/askem.eu\/wp-content\/uploads\/2020\/10\/logoGalaxieAskem3.png","contentUrl":"https:\/\/mlpi0fxo3sth.i.optimole.com\/cb:3obA.c61\/w:760\/h:480\/q:mauto\/f:best\/https:\/\/askem.eu\/wp-content\/uploads\/2020\/10\/logoGalaxieAskem3.png","width":760,"height":480,"caption":"Askem"},"image":{"@id":"https:\/\/askem.eu\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/fb.me\/askem.eu","https:\/\/linkedin.com\/company\/askem-eu"]},{"@type":"Person","@id":"https:\/\/askem.eu\/#\/schema\/person\/8bbee74ab9a977d56bf4826662e9d2e9","name":"askemadmin","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/a202f744ee3a4b6fdbe2ceb57fd84c72559337791a276662270d8d2fb7842e3f?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/a202f744ee3a4b6fdbe2ceb57fd84c72559337791a276662270d8d2fb7842e3f?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/a202f744ee3a4b6fdbe2ceb57fd84c72559337791a276662270d8d2fb7842e3f?s=96&d=mm&r=g","caption":"askemadmin"},"sameAs":["https:\/\/askem.eu"]}]}},"_links":{"self":[{"href":"https:\/\/askem.eu\/en\/wp-json\/wp\/v2\/posts\/2360","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/askem.eu\/en\/wp-json\/wp\/v2\/posts"}],"about":[
{"href":"https:\/\/askem.eu\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/askem.eu\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/askem.eu\/en\/wp-json\/wp\/v2\/comments?post=2360"}],"version-history":[{"count":1,"href":"https:\/\/askem.eu\/en\/wp-json\/wp\/v2\/posts\/2360\/revisions"}],"predecessor-version":[{"id":2362,"href":"https:\/\/askem.eu\/en\/wp-json\/wp\/v2\/posts\/2360\/revisions\/2362"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/askem.eu\/en\/wp-json\/wp\/v2\/media\/2361"}],"wp:attachment":[{"href":"https:\/\/askem.eu\/en\/wp-json\/wp\/v2\/media?parent=2360"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/askem.eu\/en\/wp-json\/wp\/v2\/categories?post=2360"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/askem.eu\/en\/wp-json\/wp\/v2\/tags?post=2360"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}