12/18/2023
Learn how to install and use MarkItDown for converting various file formats to Markdown
Getting Started with MarkItDown
MarkItDown is a utility that converts files such as PDFs, Office documents, images, and HTML pages to Markdown. This guide covers installation and basic usage from the command line, the Python API, and Docker.
Installation
You can install MarkItDown using pip:
pip install markitdown
Alternatively, install from source in editable mode by running the following from the root of a local checkout of the repository:
pip install -e .
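After either install method, it can be useful to confirm that the package is visible to the interpreter you intend to use. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical, and the distribution name `markitdown` is taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# "markitdown" is the distribution name used in the pip command above.
found = installed_version("markitdown")
print(f"markitdown {found}" if found else "markitdown is not installed")
```

If this reports that the package is not installed, the pip command was probably run against a different Python environment than the one running the check.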
Basic Usage
Command Line Interface
Convert a PDF file to Markdown:
markitdown path-to-file.pdf > document.md
You can also pipe a file's contents to markitdown on standard input; the Markdown is written to standard output:
cat path-to-file.pdf | markitdown
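If you want to drive the command line tool from a script, both invocation forms above can be wrapped with `subprocess`. This is a sketch under stated assumptions, not part of MarkItDown: the function name `run_markitdown` and its parameters are hypothetical, and it assumes the tool writes UTF-8 to standard output.

```python
import subprocess
from pathlib import Path
from typing import Sequence


def run_markitdown(
    source: Path,
    command: Sequence[str] = ("markitdown",),
    use_stdin: bool = False,
) -> str:
    """Run the CLI on `source` and return what it writes to standard output.

    With use_stdin=False the path is passed as an argument (first form above);
    with use_stdin=True the file's bytes are piped in (second form above).
    """
    if use_stdin:
        completed = subprocess.run(
            list(command),
            input=source.read_bytes(),
            capture_output=True,
            check=True,  # raise CalledProcessError on a non-zero exit status
        )
    else:
        completed = subprocess.run(
            [*command, str(source)],
            capture_output=True,
            check=True,
        )
    # Assumption: the tool emits UTF-8 text.
    return completed.stdout.decode("utf-8")
```

For example, `run_markitdown(Path("path-to-file.pdf"))` mirrors the first command, and passing `use_stdin=True` mirrors the piped form. The `command` parameter exists only so the executable can be swapped out, for instance in tests.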
Python API
Here's a simple example using the Python API:
from markitdown import MarkItDown
md = MarkItDown()
result = md.convert("test.xlsx")
print(result.text_content)
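The same two calls extend naturally to converting several files in one run. The sketch below is one possible arrangement, not an official helper: `convert_all` and `markitdown_convert` are hypothetical names, and because this guide does not say which exceptions `convert` raises, the loop catches `Exception` broadly and skips the failing file.

```python
from pathlib import Path
from typing import Callable, Iterable


def convert_all(
    paths: Iterable[Path],
    convert: Callable[[str], str],
    out_dir: Path,
) -> list[Path]:
    """Convert each input file and write `<stem>.md` into out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for path in paths:
        try:
            text = convert(str(path))
        except Exception as exc:  # assumption: failure modes are not documented here
            print(f"skipped {path}: {exc}")
            continue
        target = out_dir / f"{path.stem}.md"
        target.write_text(text, encoding="utf-8")
        written.append(target)
    return written


def markitdown_convert(path: str) -> str:
    """Adapter around the API shown above, imported lazily."""
    from markitdown import MarkItDown

    return MarkItDown().convert(path).text_content
```

A call such as `convert_all(sorted(Path("docs").glob("*.xlsx")), markitdown_convert, Path("markdown_out"))` would then write one Markdown file per spreadsheet; the directory names here are placeholders. Note that two inputs sharing a stem (for example `a.pdf` and `a.xlsx`) would overwrite each other in this sketch.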
Docker Support
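Build the image from the directory that contains the Dockerfile, then run MarkItDown in a container, passing the input file on standard input: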
docker build -t markitdown:latest .
docker run --rm -i markitdown:latest < ~/your-file.pdf > output.md
Supported File Formats
MarkItDown supports a wide range of file formats:
- PDF documents
- Microsoft Office files (PowerPoint, Word, Excel)
- Images (with EXIF metadata and OCR)
- Audio files (with EXIF metadata and speech transcription)
- HTML pages
- Text-based formats (CSV, JSON, XML)
- ZIP archives (processes contents recursively)
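When preparing a mixed folder of files, it can help to sort inputs by the categories listed above before converting them. The mapping below is purely illustrative: the specific extensions are assumptions chosen for the example, not a list of what MarkItDown accepts, and image and audio extensions are left out because this guide does not name them.

```python
from pathlib import Path
from typing import Optional

# Illustrative only: these suffixes are assumptions, not MarkItDown's
# authoritative list of supported extensions.
CATEGORY_BY_SUFFIX = {
    ".pdf": "PDF documents",
    ".pptx": "Microsoft Office files",
    ".docx": "Microsoft Office files",
    ".xlsx": "Microsoft Office files",
    ".html": "HTML pages",
    ".csv": "Text-based formats",
    ".json": "Text-based formats",
    ".xml": "Text-based formats",
    ".zip": "ZIP archives",
}


def category_for(path: Path) -> Optional[str]:
    """Return the category for a file, or None if its suffix is not mapped."""
    return CATEGORY_BY_SUFFIX.get(path.suffix.lower())


def group_by_category(paths: list[Path]) -> dict[str, list[Path]]:
    """Group paths by category, dropping files with unmapped suffixes."""
    groups: dict[str, list[Path]] = {}
    for path in paths:
        category = category_for(path)
        if category is not None:
            groups.setdefault(category, []).append(path)
    return groups
```

Files whose suffix is not in the mapping are simply ignored by `group_by_category`, so an unmapped file does not mean MarkItDown cannot convert it.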
For more detailed information about specific features and advanced usage, check out our other guides.