doc-to-markdown
Skill by daymade
Converts DOCX/PDF/PPTX to high-quality Markdown with automatic post-processing. Fixes pandoc grid tables, simple tables, image paths, CJK bold spacing, attribute noise, and code blocks. Scored best in class (7.6/10) in a benchmark against Docling, MarkItDown, raw Pandoc, and Mammoth. Trigger on "convert document", "docx to markdown", "parse word", "doc to markdown", "解析word" (parse Word), "转换文档" (convert document).
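As an illustration of what "attribute noise" cleanup can involve, here is a minimal sketch of one post-processing step. It assumes the noise in question includes the `{width="…" height="…"}` blocks that pandoc appends to images when converting DOCX. The helper name `strip_image_attributes` and the regular expression are hypothetical and are not taken from the skill's `convert.py`.

```python
import re

# Hypothetical helper for illustration only; not the skill's actual code.
# Matches a Markdown image followed directly by a pandoc attribute block,
# e.g. ![alt](media/image1.png){width="6.0in" height="3.5in"}
_IMAGE_WITH_ATTRS = re.compile(r'(!\[[^\]]*\]\([^)]*\))\{[^{}]*\}')


def strip_image_attributes(markdown: str) -> str:
    """Drop the {...} attribute block that trails an image, keeping the image."""
    return _IMAGE_WITH_ATTRS.sub(r'\1', markdown)


sample = '![Figure 1](media/image1.png){width="6.0in" height="3.5in"} caption'
print(strip_image_attributes(sample))
```

The pattern only touches attribute blocks that immediately follow an image, so braces elsewhere in the text are left alone.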
Details
- Path: daymade-docs/doc-to-markdown
- Bundled scripts: 6
- Dependencies: 1
Bundled scripts
- daymade-docs/doc-to-markdown/scripts/validate_output.py
- daymade-docs/doc-to-markdown/scripts/convert.py
- daymade-docs/doc-to-markdown/scripts/test_convert.py
- daymade-docs/doc-to-markdown/scripts/convert_path.py
- daymade-docs/doc-to-markdown/scripts/extract_pdf_images.py
- daymade-docs/doc-to-markdown/scripts/merge_outputs.py