doc-to-markdown

Converts DOCX, PDF, and PPTX files to Markdown with automatic post-processing. The post-processing fixes pandoc grid tables, simple tables, image paths, CJK bold spacing, attribute noise, and code blocks. Benchmarked at 7.6/10, the highest score in a comparison against Docling, MarkItDown, raw Pandoc, and Mammoth. Triggers on "convert document", "docx to markdown", "parse word", "doc to markdown", "解析word" ("parse Word"), and "转换文档" ("convert document").
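As an illustration of what "attribute noise" cleanup can look like, here is a minimal sketch. It assumes the noise in question is the brace-delimited attribute block that pandoc can emit after images, links, and bracketed spans (for example `{width="5.0in"}` or `{.underline}`). The function name `strip_attribute_noise` is hypothetical and is not taken from the bundled `convert.py`, whose actual implementation may differ.

```python
import re

# Assumption: "attribute noise" means pandoc-style attribute blocks such as
# {width="5.0in" height="3.0in"} or {.underline}. This helper is hypothetical.
ATTRS = r'\{[^{}\n]*\}'

def strip_attribute_noise(markdown: str) -> str:
    """Drop attribute blocks that trail images, links, and bracketed spans."""
    # Images and links: keep the ](target) part, drop the trailing {...}.
    markdown = re.sub(r'(\]\([^)\n]*\))' + ATTRS, r'\1', markdown)
    # Bracketed spans: [text]{.class} becomes plain text.
    markdown = re.sub(r'\[([^\[\]\n]*)\]' + ATTRS, r'\1', markdown)
    return markdown

if __name__ == "__main__":
    sample = '![](media/image1.png){width="5.0in" height="3.0in"}'
    print(strip_attribute_noise(sample))
```

The sketch only touches braces that directly follow a `](...)` target or a `[...]` span, so ordinary braces elsewhere in the text are left alone.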

Details

Path: daymade-docs/doc-to-markdown
Bundled scripts: 6
Dependencies: 1

Bundled scripts

  • daymade-docs/doc-to-markdown/scripts/validate_output.py
  • daymade-docs/doc-to-markdown/scripts/convert.py
  • daymade-docs/doc-to-markdown/scripts/test_convert.py
  • daymade-docs/doc-to-markdown/scripts/convert_path.py
  • daymade-docs/doc-to-markdown/scripts/extract_pdf_images.py
  • daymade-docs/doc-to-markdown/scripts/merge_outputs.py

FAQ