{"id":2216,"date":"2026-04-10T08:23:17","date_gmt":"2026-04-10T06:23:17","guid":{"rendered":"https:\/\/askem.eu\/?p=2216"},"modified":"2026-04-10T08:23:23","modified_gmt":"2026-04-10T06:23:23","slug":"docling-convertir-pdf-docx-et-images-en-donnees-structurees-pour-ses-pipelines-rag","status":"publish","type":"post","link":"https:\/\/askem.eu\/en\/2026\/04\/10\/docling-convertir-pdf-docx-et-images-en-donnees-structurees-pour-ses-pipelines-rag\/","title":{"rendered":"Docling: converting PDF, DOCX and images into structured data for your RAG pipelines"},"content":{"rendered":"<h2 class=\"wp-block-heading\">Docling: converting PDF, DOCX and images into structured data for your RAG pipelines<\/h2>\n\n\n\n<p>You have deployed a <a href=\"https:\/\/askem.eu\/en\/2026\/03\/14\/construire-un-pipeline-rag-pour-exploiter-les-donnees-ouvertes-avec-un-llm\/\" type=\"post\" id=\"2085\">RAG<\/a> pipeline with <a href=\"https:\/\/askem.eu\/en\/2026\/04\/01\/qdrant-base-vectorielle-open-source-pour-le-rag-et-la-recherche-semantique\/\" type=\"post\" id=\"2159\">Qdrant<\/a> and <a href=\"https:\/\/askem.eu\/en\/2026\/03\/29\/ollama-executer-des-llm-en-local\/\" type=\"post\" id=\"2141\">Ollama<\/a>. Your plain-text documents are indexed well and the answers are relevant. Then a 200-page PDF report arrives, with complex tables, multiple columns, nested headers and images containing text. The pipeline breaks down: naive chunking cuts tables into incoherent fragments, the page layout is lost, and the answers become unreliable. 
This is exactly the problem addressed by <strong><a href=\"https:\/\/github.com\/docling-project\/docling\">Docling<\/a><\/strong>, an open source library developed by IBM Research (MIT license) that turns complex documents into structured data usable by LLMs.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The problem: document content extraction is deceptively simple<\/h2>\n\n\n\n<p>Classic text extraction tools such as PyPDF, pdfminer and python-docx extract raw text. They lose the document&rsquo;s semantic structure: the heading hierarchy, the relationships between table cells, the distinction between caption and body text, the reading order in a multi-column layout. For a RAG pipeline, that structure is essential. A chunk that mixes two columns of a table, or that separates a heading from its paragraph, yields poor-quality embeddings and wrong answers.<\/p>\n\n\n\n<p>Docling approaches the problem differently. Instead of merely extracting text, it <em>understands<\/em> the document&rsquo;s layout using specialized deep learning models, then rebuilds a structured representation that is faithful to the original document.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">What Docling actually does<\/h2>\n\n\n\n<p>Docling takes PDF, DOCX, PPTX, XLSX and HTML files and images as input, and outputs a structured format (Markdown, JSON or DoclingDocument) that preserves the semantics of the content. 
Its main capabilities are:<\/p>\n\n\n\n<p><strong>Deep learning layout analysis<\/strong>: An object detection model (based on RT-DETR) identifies the regions of the document: headings, paragraphs, tables, figures, captions, page headers, page footers. Each region is classified and ordered according to the actual reading flow.<\/p>\n\n\n\n<p><strong>Table extraction<\/strong>: A dedicated model (TableFormer) reconstructs the structure of tables: merged cells, multi-level headers, nested columns. The result is a structured table, not linearized text.<\/p>\n\n\n\n<p><strong>Built-in OCR<\/strong>: For scanned PDFs or images, Docling integrates EasyOCR or Tesseract to extract text from the regions identified by the layout model. OCR is targeted at the relevant regions rather than applied blindly to the whole page.<\/p>\n\n\n\n<p><strong>Multi-format export<\/strong>: The analyzed document can be exported to Markdown (ideal for RAG chunking), to structured JSON (for programmatic processing), or manipulated through the DoclingDocument Python API.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Deploying Docling with Docker<\/h2>\n\n\n\n<p>Docling installs via pip and is easy to containerize. 
Here is a minimal Docker deployment that exposes a conversion API:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Dockerfile\nFROM python:3.11-slim\n\n# The server also needs FastAPI, uvicorn and python-multipart\n# (FastAPI requires python-multipart to accept file uploads)\nRUN pip install --no-cache-dir docling fastapi uvicorn python-multipart\n\nWORKDIR \/app\nCOPY server.py .\n\nEXPOSE 8000\nCMD &#91;\"python\", \"server.py\"]<\/code><\/pre>\n\n\n\n<p>And the minimal server script with FastAPI:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># server.py\nimport os\nimport tempfile\n\nimport uvicorn\nfrom fastapi import FastAPI, UploadFile\nfrom docling.document_converter import DocumentConverter\n\napp = FastAPI()\n# Build the converter once so the models are loaded a single time\nconverter = DocumentConverter()\n\n@app.post(\"\/convert\")\nasync def convert(file: UploadFile):\n    # Keep the original extension so Docling can detect the input format\n    suffix = os.path.splitext(file.filename or \"\")&#91;1]\n    with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:\n        tmp.write(await file.read())\n        tmp_path = tmp.name\n    try:\n        result = converter.convert(tmp_path)\n        return {\n            \"markdown\": result.document.export_to_markdown(),\n            \"metadata\": {\n                \"pages\": len(result.document.pages),\n                # Tables are stored in their own collection, not in .texts\n                \"tables\": len(result.document.tables),\n            }\n        }\n    finally:\n        # Always remove the temporary file, even if conversion fails\n        os.unlink(tmp_path)\n\nif __name__ == \"__main__\":\n    # Listen on all interfaces so the port published by Docker is reachable\n    uvicorn.run(app, host=\"0.0.0.0\", port=8000)<\/code><\/pre>\n\n\n\n<p>With Docker Compose, alongside the existing stack:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># docker-compose.yml (excerpt)\nservices:\n  docling:\n    build: .\/docling\n    ports:\n      - \"8000:8000\"\n    volumes:\n      - .\/documents:\/documents\n    deploy:\n      resources:\n        limits:\n          memory: 4G<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Integrating into an existing RAG pipeline<\/h2>\n\n\n\n<p>The main value of Docling for a stack like the one already presented on this site (Qdrant + Ollama + n8n + LangGraph) is to serve as the first document-processing step. The flow becomes:<\/p>\n\n\n\n<p>1. 
A PDF\/DOCX document arrives (user upload, Nextcloud webhook, folder watched by n8n).<br>2. n8n sends the file to the Docling API for conversion to structured Markdown.<br>3. The Markdown is split into chunks that respect section and table boundaries.<br>4. The chunks are embedded with a local embedding model and stored in Qdrant.<br>5. A LangGraph agent or Open WebUI queries Qdrant with the structured context.<\/p>\n\n\n\n<p>The gain is direct: tables stay coherent within chunks, section headings serve as metadata for filtering, and figures\/captions do not pollute the text embeddings.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Docling vs. the alternatives<\/h2>\n\n\n\n<p><strong>Unstructured.io<\/strong> is the best-known alternative. The open source version offers similar features, but the full version (with the best models) is commercial. Docling is entirely MIT-licensed, with no paid version or locked features.<\/p>\n\n\n\n<p><strong>LlamaParse<\/strong> (LlamaIndex) offers a cloud document parsing service. It performs well but depends on an external API, which is incompatible with a 100% self-hosted approach.<\/p>\n\n\n\n<p><strong>Apache Tika<\/strong> extracts raw text from many formats, but without any understanding of the page layout. 
Tables are linearized and the hierarchical structure is lost.<\/p>\n\n\n\n<p>Docling stands out for its combination of deep learning models for layout, its unrestricted MIT license, and its native integration with the Python\/LangChain\/LlamaIndex ecosystem.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Command-line usage<\/h2>\n\n\n\n<p>For one-off use or batch processing, Docling provides a CLI:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Convert a PDF to Markdown (--output is the destination directory)\ndocling rapport-annuel.pdf --to md --output .\/out\n\n# Convert an entire folder to structured JSON\ndocling .\/documents\/ --to json --output .\/structured\/\n\n# Enable OCR for scanned PDFs\ndocling scan.pdf --ocr --to md --output .\/out<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Things to watch out for<\/h2>\n\n\n\n<p><strong>Resources<\/strong>: The layout and table models consume memory. Plan for 2 to 4 GB of RAM for routine processing. A GPU is not required but significantly speeds up the processing of large volumes.<\/p>\n\n\n\n<p><strong>OCR quality<\/strong>: On poor-quality scanned documents (faxes, photocopies), the layout + OCR chain remains limited. Image preprocessing (deskewing, binarization) can improve the results.<\/p>\n\n\n\n<p><strong>Supported formats<\/strong>: PDF and DOCX are the best-supported formats. PPTX and XLSX support works but is less mature. 
For HTML files, direct parsing is often simpler.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">In summary<\/h2>\n\n\n\n<p>Docling fills a missing link in the self-hosted AI stack: the intelligent conversion of complex documents into structured data usable by a RAG pipeline. By combining deep learning models for layout understanding, table extraction and targeted OCR, all under the MIT license, it fits naturally alongside Ollama, Qdrant, n8n and LangGraph to build a complete document chain, from raw PDF to contextualized answer.<\/p>","protected":false},"excerpt":{"rendered":"<p>Docling: converting PDF, DOCX and images into structured data for your RAG pipelines You have deployed a RAG pipeline with Qdrant and Ollama. Your plain-text documents are indexed well and the answers are relevant. 
Then a 200-page PDF report arrives, with complex tables, multiple columns, nested headers and images [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":2217,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"ocean_post_layout":"","ocean_both_sidebars_style":"","ocean_both_sidebars_content_width":0,"ocean_both_sidebars_sidebars_width":0,"ocean_sidebar":"","ocean_second_sidebar":"","ocean_disable_margins":"enable","ocean_add_body_class":"","ocean_shortcode_before_top_bar":"","ocean_shortcode_after_top_bar":"","ocean_shortcode_before_header":"","ocean_shortcode_after_header":"","ocean_has_shortcode":"","ocean_shortcode_after_title":"","ocean_shortcode_before_footer_widgets":"","ocean_shortcode_after_footer_widgets":"","ocean_shortcode_before_footer_bottom":"","ocean_shortcode_after_footer_bottom":"","ocean_display_top_bar":"default","ocean_display_header":"default","ocean_header_style":"","ocean_center_header_left_menu":"","ocean_custom_header_template":"","ocean_custom_logo":0,"ocean_custom_retina_logo":0,"ocean_custom_logo_max_width":0,"ocean_custom_logo_tablet_max_width":0,"ocean_custom_logo_mobile_max_width":0,"ocean_custom_logo_max_height":0,"ocean_custom_logo_tablet_max_height":0,"ocean_custom_logo_mobile_max_height":0,"ocean_header_custom_menu":"","ocean_menu_typo_font_family":"","ocean_menu_typo_font_subset":"","ocean_menu_typo_font_size":0,"ocean_menu_typo_font_size_tablet":0,"ocean_menu_typo_font_size_mobile":0,"ocean_menu_typo_font_size_unit":"px","ocean_menu_typo_font_weight":"","ocean_menu_typo_font_weight_tablet":"","ocean_menu_typo_font_weight_mobile":"","ocean_menu_typo_transform":"","ocean_menu_typo_transform_tablet":"","ocean_menu_typo_transform_mobile":"","ocean_menu_typo_line_height":0,"ocean_menu_typo_line_height_tablet":0,"ocean_menu_typo_line_height_mobile":0,"ocean_menu_typo_line_height_unit":"","ocean_menu_typo_spacing":0,"ocean_menu_typo_s
pacing_tablet":0,"ocean_menu_typo_spacing_mobile":0,"ocean_menu_typo_spacing_unit":"","ocean_menu_link_color":"","ocean_menu_link_color_hover":"","ocean_menu_link_color_active":"","ocean_menu_link_background":"","ocean_menu_link_hover_background":"","ocean_menu_link_active_background":"","ocean_menu_social_links_bg":"","ocean_menu_social_hover_links_bg":"","ocean_menu_social_links_color":"","ocean_menu_social_hover_links_color":"","ocean_disable_title":"default","ocean_disable_heading":"default","ocean_post_title":"","ocean_post_subheading":"","ocean_post_title_style":"","ocean_post_title_background_color":"","ocean_post_title_background":0,"ocean_post_title_bg_image_position":"","ocean_post_title_bg_image_attachment":"","ocean_post_title_bg_image_repeat":"","ocean_post_title_bg_image_size":"","ocean_post_title_height":0,"ocean_post_title_bg_overlay":0.5,"ocean_post_title_bg_overlay_color":"","ocean_disable_breadcrumbs":"default","ocean_breadcrumbs_color":"","ocean_breadcrumbs_separator_color":"","ocean_breadcrumbs_links_color":"","ocean_breadcrumbs_links_hover_color":"","ocean_display_footer_widgets":"default","ocean_display_footer_bottom":"default","ocean_custom_footer_template":"","osh_disable_topbar_sticky":"default","osh_disable_header_sticky":"default","osh_sticky_header_style":"default","osh_sticky_header_effect":"","osh_custom_sticky_logo":0,"osh_custom_retina_sticky_logo":0,"osh_custom_sticky_logo_height":0,"osh_background_color":"","osh_links_color":"","osh_links_hover_color":"","osh_links_active_color":"","osh_links_bg_color":"","osh_links_hover_bg_color":"","osh_links_active_bg_color":"","osh_menu_social_links_color":"","osh_menu_social_hover_links_color":"","ocean_post_oembed":"","ocean_post_self_hosted_media":"","ocean_post_video_embed":"","ocean_link_format":"","ocean_link_format_target":"self","ocean_quote_format":"","ocean_quote_format_link":"post","ocean_gallery_link_images":"on","ocean_gallery_id":[],"footnotes":""},"categories":[16],"tags":[],"cl
ass_list":["post-2216","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai","entry","has-media"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.3 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Docling : convertir PDF, DOCX et images en donn\u00e9es structur\u00e9es pour ses pipelines RAG - askem<\/title>\n<meta name=\"description\" content=\"ASKEM BUREAU D&#039;\u00c9TUDES ET DE FORMATION NUM\u00c9RIQUE. Nous vous assistons dans la transformation num\u00e9rique de vos outils, services et organisations tout en pla\u00e7ant l\u2019humain au c\u0153ur de notre d\u00e9marche d\u2019accompagnement.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/askem.eu\/en\/2026\/04\/10\/docling-convertir-pdf-docx-et-images-en-donnees-structurees-pour-ses-pipelines-rag\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Docling : convertir PDF, DOCX et images en donn\u00e9es structur\u00e9es pour ses pipelines RAG - askem\" \/>\n<meta property=\"og:description\" content=\"ASKEM BUREAU D&#039;\u00c9TUDES ET DE FORMATION NUM\u00c9RIQUE. 
Nous vous assistons dans la transformation num\u00e9rique de vos outils, services et organisations tout en pla\u00e7ant l\u2019humain au c\u0153ur de notre d\u00e9marche d\u2019accompagnement.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/askem.eu\/en\/2026\/04\/10\/docling-convertir-pdf-docx-et-images-en-donnees-structurees-pour-ses-pipelines-rag\/\" \/>\n<meta property=\"og:site_name\" content=\"askem\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/fb.me\/askem.eu\" \/>\n<meta property=\"article:published_time\" content=\"2026-04-10T06:23:17+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2026-04-10T06:23:23+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/mlpi0fxo3sth.i.optimole.com\/cb:3obA.c61\/w:auto\/h:auto\/q:mauto\/f:best\/https:\/\/askem.eu\/wp-content\/uploads\/2026\/04\/sujet-askem-2026-04-10.png\" \/>\n\t<meta property=\"og:image:width\" content=\"800\" \/>\n\t<meta property=\"og:image:height\" content=\"600\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"askemadmin\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"askemadmin\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"5 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/askem.eu\\\/2026\\\/04\\\/10\\\/docling-convertir-pdf-docx-et-images-en-donnees-structurees-pour-ses-pipelines-rag\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/askem.eu\\\/2026\\\/04\\\/10\\\/docling-convertir-pdf-docx-et-images-en-donnees-structurees-pour-ses-pipelines-rag\\\/\"},\"author\":{\"name\":\"askemadmin\",\"@id\":\"https:\\\/\\\/askem.eu\\\/#\\\/schema\\\/person\\\/8bbee74ab9a977d56bf4826662e9d2e9\"},\"headline\":\"Docling : convertir PDF, DOCX et images en donn\u00e9es structur\u00e9es pour ses pipelines RAG\",\"datePublished\":\"2026-04-10T06:23:17+00:00\",\"dateModified\":\"2026-04-10T06:23:23+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/askem.eu\\\/2026\\\/04\\\/10\\\/docling-convertir-pdf-docx-et-images-en-donnees-structurees-pour-ses-pipelines-rag\\\/\"},\"wordCount\":1050,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\\\/\\\/askem.eu\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/askem.eu\\\/2026\\\/04\\\/10\\\/docling-convertir-pdf-docx-et-images-en-donnees-structurees-pour-ses-pipelines-rag\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\/\\/askem.eu\\/wp-content\\/uploads\\/2026\\/04\\/sujet-askem-2026-04-10.png\",\"articleSection\":[\"AI\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/askem.eu\\\/2026\\\/04\\\/10\\\/docling-convertir-pdf-docx-et-images-en-donnees-structurees-pour-ses-pipelines-rag\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/askem.eu\\\/2026\\\/04\\\/10\\\/docling-convertir-pdf-docx-et-images-en-donnees-structurees-pour-ses-pipelines-rag\\\/\",\"url\":\"https:\\\/\\\/askem.eu\\\/2026\\\/04\\\/10\\\/docling-convertir-pdf-docx-et-images-en-donnees-structurees-pour
-ses-pipelines-rag\\\/\",\"name\":\"Docling : convertir PDF, DOCX et images en donn\u00e9es structur\u00e9es pour ses pipelines RAG - askem\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/askem.eu\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/askem.eu\\\/2026\\\/04\\\/10\\\/docling-convertir-pdf-docx-et-images-en-donnees-structurees-pour-ses-pipelines-rag\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/askem.eu\\\/2026\\\/04\\\/10\\\/docling-convertir-pdf-docx-et-images-en-donnees-structurees-pour-ses-pipelines-rag\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\/\\/askem.eu\\/wp-content\\/uploads\\/2026\\/04\\/sujet-askem-2026-04-10.png\",\"datePublished\":\"2026-04-10T06:23:17+00:00\",\"dateModified\":\"2026-04-10T06:23:23+00:00\",\"description\":\"ASKEM BUREAU D'\u00c9TUDES ET DE FORMATION NUM\u00c9RIQUE. Nous vous assistons dans la transformation num\u00e9rique de vos outils, services et organisations tout en pla\u00e7ant l\u2019humain au c\u0153ur de notre d\u00e9marche 
d\u2019accompagnement.\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/askem.eu\\\/2026\\\/04\\\/10\\\/docling-convertir-pdf-docx-et-images-en-donnees-structurees-pour-ses-pipelines-rag\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/askem.eu\\\/2026\\\/04\\\/10\\\/docling-convertir-pdf-docx-et-images-en-donnees-structurees-pour-ses-pipelines-rag\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/askem.eu\\\/2026\\\/04\\\/10\\\/docling-convertir-pdf-docx-et-images-en-donnees-structurees-pour-ses-pipelines-rag\\\/#primaryimage\",\"url\":\"https:\\/\\/askem.eu\\/wp-content\\/uploads\\/2026\\/04\\/sujet-askem-2026-04-10.png\",\"contentUrl\":\"https:\\/\\/askem.eu\\/wp-content\\/uploads\\/2026\\/04\\/sujet-askem-2026-04-10.png\",\"width\":800,\"height\":600},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/askem.eu\\\/2026\\\/04\\\/10\\\/docling-convertir-pdf-docx-et-images-en-donnees-structurees-pour-ses-pipelines-rag\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Accueil\",\"item\":\"https:\\\/\\\/askem.eu\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Docling : convertir PDF, DOCX et images en donn\u00e9es structur\u00e9es pour ses pipelines 
RAG\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/askem.eu\\\/#website\",\"url\":\"https:\\\/\\\/askem.eu\\\/\",\"name\":\"askem\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\\\/\\\/askem.eu\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/askem.eu\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/askem.eu\\\/#organization\",\"name\":\"Askem\",\"url\":\"https:\\\/\\\/askem.eu\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/askem.eu\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\/\\/mlpi0fxo3sth.i.optimole.com\\/cb:3obA.c61\\/w:760\\/h:480\\/q:mauto\\/f:best\\/https:\\/\\/askem.eu\\/wp-content\\/uploads\\/2020\\/10\\/logoGalaxieAskem3.png\",\"contentUrl\":\"https:\\/\\/mlpi0fxo3sth.i.optimole.com\\/cb:3obA.c61\\/w:760\\/h:480\\/q:mauto\\/f:best\\/https:\\/\\/askem.eu\\/wp-content\\/uploads\\/2020\\/10\\/logoGalaxieAskem3.png\",\"width\":760,\"height\":480,\"caption\":\"Askem\"},\"image\":{\"@id\":\"https:\\\/\\\/askem.eu\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/fb.me\\\/askem.eu\",\"https:\\\/\\\/linkedin.com\\\/company\\\/askem-eu\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/askem.eu\\\/#\\\/schema\\\/person\\\/8bbee74ab9a977d56bf4826662e9d2e9\",\"name\":\"askemadmin\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/a202f744ee3a4b6fdbe2ceb57fd84c72559337791a276662270d8d2fb7842e3f?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/a202f744ee3a4b6fdbe2ceb57fd84c72559337791a276662270d8d2fb7842e3f?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/a202f744ee3a4b6fdbe2ceb57fd84c72559337791a27
6662270d8d2fb7842e3f?s=96&d=mm&r=g\",\"caption\":\"askemadmin\"},\"sameAs\":[\"https:\\\/\\\/askem.eu\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Docling : convertir PDF, DOCX et images en donn\u00e9es structur\u00e9es pour ses pipelines RAG - askem","description":"ASKEM BUREAU D'\u00c9TUDES ET DE FORMATION NUM\u00c9RIQUE. Nous vous assistons dans la transformation num\u00e9rique de vos outils, services et organisations tout en pla\u00e7ant l\u2019humain au c\u0153ur de notre d\u00e9marche d\u2019accompagnement.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/askem.eu\/en\/2026\/04\/10\/docling-convertir-pdf-docx-et-images-en-donnees-structurees-pour-ses-pipelines-rag\/","og_locale":"en_US","og_type":"article","og_title":"Docling : convertir PDF, DOCX et images en donn\u00e9es structur\u00e9es pour ses pipelines RAG - askem","og_description":"ASKEM BUREAU D'\u00c9TUDES ET DE FORMATION NUM\u00c9RIQUE. Nous vous assistons dans la transformation num\u00e9rique de vos outils, services et organisations tout en pla\u00e7ant l\u2019humain au c\u0153ur de notre d\u00e9marche d\u2019accompagnement.","og_url":"https:\/\/askem.eu\/en\/2026\/04\/10\/docling-convertir-pdf-docx-et-images-en-donnees-structurees-pour-ses-pipelines-rag\/","og_site_name":"askem","article_publisher":"https:\/\/fb.me\/askem.eu","article_published_time":"2026-04-10T06:23:17+00:00","article_modified_time":"2026-04-10T06:23:23+00:00","og_image":[{"width":800,"height":600,"url":"https:\/\/mlpi0fxo3sth.i.optimole.com\/cb:3obA.c61\/w:auto\/h:auto\/q:mauto\/f:best\/https:\/\/askem.eu\/wp-content\/uploads\/2026\/04\/sujet-askem-2026-04-10.png","type":"image\/png"}],"author":"askemadmin","twitter_card":"summary_large_image","twitter_misc":{"Written by":"askemadmin","Est. 
reading time":"5 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/askem.eu\/2026\/04\/10\/docling-convertir-pdf-docx-et-images-en-donnees-structurees-pour-ses-pipelines-rag\/#article","isPartOf":{"@id":"https:\/\/askem.eu\/2026\/04\/10\/docling-convertir-pdf-docx-et-images-en-donnees-structurees-pour-ses-pipelines-rag\/"},"author":{"name":"askemadmin","@id":"https:\/\/askem.eu\/#\/schema\/person\/8bbee74ab9a977d56bf4826662e9d2e9"},"headline":"Docling : convertir PDF, DOCX et images en donn\u00e9es structur\u00e9es pour ses pipelines RAG","datePublished":"2026-04-10T06:23:17+00:00","dateModified":"2026-04-10T06:23:23+00:00","mainEntityOfPage":{"@id":"https:\/\/askem.eu\/2026\/04\/10\/docling-convertir-pdf-docx-et-images-en-donnees-structurees-pour-ses-pipelines-rag\/"},"wordCount":1050,"commentCount":0,"publisher":{"@id":"https:\/\/askem.eu\/#organization"},"image":{"@id":"https:\/\/askem.eu\/2026\/04\/10\/docling-convertir-pdf-docx-et-images-en-donnees-structurees-pour-ses-pipelines-rag\/#primaryimage"},"thumbnailUrl":"https:\/\/mlpi0fxo3sth.i.optimole.com\/cb:3obA.c61\/w:auto\/h:auto\/q:mauto\/f:best\/https:\/\/askem.eu\/wp-content\/uploads\/2026\/04\/sujet-askem-2026-04-10.png","articleSection":["AI"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/askem.eu\/2026\/04\/10\/docling-convertir-pdf-docx-et-images-en-donnees-structurees-pour-ses-pipelines-rag\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/askem.eu\/2026\/04\/10\/docling-convertir-pdf-docx-et-images-en-donnees-structurees-pour-ses-pipelines-rag\/","url":"https:\/\/askem.eu\/2026\/04\/10\/docling-convertir-pdf-docx-et-images-en-donnees-structurees-pour-ses-pipelines-rag\/","name":"Docling : convertir PDF, DOCX et images en donn\u00e9es structur\u00e9es pour ses pipelines RAG - 
askem","isPartOf":{"@id":"https:\/\/askem.eu\/#website"},"primaryImageOfPage":{"@id":"https:\/\/askem.eu\/2026\/04\/10\/docling-convertir-pdf-docx-et-images-en-donnees-structurees-pour-ses-pipelines-rag\/#primaryimage"},"image":{"@id":"https:\/\/askem.eu\/2026\/04\/10\/docling-convertir-pdf-docx-et-images-en-donnees-structurees-pour-ses-pipelines-rag\/#primaryimage"},"thumbnailUrl":"https:\/\/mlpi0fxo3sth.i.optimole.com\/cb:3obA.c61\/w:auto\/h:auto\/q:mauto\/f:best\/https:\/\/askem.eu\/wp-content\/uploads\/2026\/04\/sujet-askem-2026-04-10.png","datePublished":"2026-04-10T06:23:17+00:00","dateModified":"2026-04-10T06:23:23+00:00","description":"ASKEM BUREAU D'\u00c9TUDES ET DE FORMATION NUM\u00c9RIQUE. Nous vous assistons dans la transformation num\u00e9rique de vos outils, services et organisations tout en pla\u00e7ant l\u2019humain au c\u0153ur de notre d\u00e9marche d\u2019accompagnement.","breadcrumb":{"@id":"https:\/\/askem.eu\/2026\/04\/10\/docling-convertir-pdf-docx-et-images-en-donnees-structurees-pour-ses-pipelines-rag\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/askem.eu\/2026\/04\/10\/docling-convertir-pdf-docx-et-images-en-donnees-structurees-pour-ses-pipelines-rag\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/askem.eu\/2026\/04\/10\/docling-convertir-pdf-docx-et-images-en-donnees-structurees-pour-ses-pipelines-rag\/#primaryimage","url":"https:\/\/mlpi0fxo3sth.i.optimole.com\/cb:3obA.c61\/w:auto\/h:auto\/q:mauto\/f:best\/https:\/\/askem.eu\/wp-content\/uploads\/2026\/04\/sujet-askem-2026-04-10.png","contentUrl":"https:\/\/mlpi0fxo3sth.i.optimole.com\/cb:3obA.c61\/w:auto\/h:auto\/q:mauto\/f:best\/https:\/\/askem.eu\/wp-content\/uploads\/2026\/04\/sujet-askem-2026-04-10.png","width":800,"height":600},{"@type":"BreadcrumbList","@id":"https:\/\/askem.eu\/2026\/04\/10\/docling-convertir-pdf-docx-et-images-en-donnees-structurees-pour-ses-pipelines-rag\/#breadcrumb","itemListElement":[{"@
type":"ListItem","position":1,"name":"Accueil","item":"https:\/\/askem.eu\/"},{"@type":"ListItem","position":2,"name":"Docling : convertir PDF, DOCX et images en donn\u00e9es structur\u00e9es pour ses pipelines RAG"}]},{"@type":"WebSite","@id":"https:\/\/askem.eu\/#website","url":"https:\/\/askem.eu\/","name":"askem","description":"","publisher":{"@id":"https:\/\/askem.eu\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/askem.eu\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/askem.eu\/#organization","name":"Askem","url":"https:\/\/askem.eu\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/askem.eu\/#\/schema\/logo\/image\/","url":"https:\/\/mlpi0fxo3sth.i.optimole.com\/cb:3obA.c61\/w:760\/h:480\/q:mauto\/f:best\/https:\/\/askem.eu\/wp-content\/uploads\/2020\/10\/logoGalaxieAskem3.png","contentUrl":"https:\/\/mlpi0fxo3sth.i.optimole.com\/cb:3obA.c61\/w:760\/h:480\/q:mauto\/f:best\/https:\/\/askem.eu\/wp-content\/uploads\/2020\/10\/logoGalaxieAskem3.png","width":760,"height":480,"caption":"Askem"},"image":{"@id":"https:\/\/askem.eu\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/fb.me\/askem.eu","https:\/\/linkedin.com\/company\/askem-eu"]},{"@type":"Person","@id":"https:\/\/askem.eu\/#\/schema\/person\/8bbee74ab9a977d56bf4826662e9d2e9","name":"askemadmin","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/a202f744ee3a4b6fdbe2ceb57fd84c72559337791a276662270d8d2fb7842e3f?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/a202f744ee3a4b6fdbe2ceb57fd84c72559337791a276662270d8d2fb7842e3f?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/a202f744ee3a4b6fdbe2ceb57fd84c72559337791a276662270d8d2fb7842e3f?s=96&d=mm&r=g","caption":"askemadmin"},"sameAs":["https:\/\/as
kem.eu"]}]}},"_links":{"self":[{"href":"https:\/\/askem.eu\/en\/wp-json\/wp\/v2\/posts\/2216","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/askem.eu\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/askem.eu\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/askem.eu\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/askem.eu\/en\/wp-json\/wp\/v2\/comments?post=2216"}],"version-history":[{"count":1,"href":"https:\/\/askem.eu\/en\/wp-json\/wp\/v2\/posts\/2216\/revisions"}],"predecessor-version":[{"id":2218,"href":"https:\/\/askem.eu\/en\/wp-json\/wp\/v2\/posts\/2216\/revisions\/2218"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/askem.eu\/en\/wp-json\/wp\/v2\/media\/2217"}],"wp:attachment":[{"href":"https:\/\/askem.eu\/en\/wp-json\/wp\/v2\/media?parent=2216"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/askem.eu\/en\/wp-json\/wp\/v2\/categories?post=2216"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/askem.eu\/en\/wp-json\/wp\/v2\/tags?post=2216"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}