12/18/2023

Learn how to install and use MarkItDown for converting various file formats to Markdown

Getting Started with MarkItDown

MarkItDown is a utility that converts files such as PDFs, Office documents, and HTML pages to Markdown. This guide covers installation and basic usage.

Installation

You can install MarkItDown using pip:

pip install markitdown

Alternatively, install from source by running the following from the root of a local copy of the repository:

pip install -e .

Basic Usage

Command Line Interface

Convert a PDF file to Markdown and redirect the output to a file:

markitdown path-to-file.pdf > document.md

You can also pipe content to markitdown on standard input; the Markdown is written to standard output:

cat path-to-file.pdf | markitdown

Python API

The following example converts an Excel workbook with the Python API and prints the resulting Markdown:

from markitdown import MarkItDown

md = MarkItDown()
result = md.convert("test.xlsx")
print(result.text_content)
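
To save the Markdown instead of printing it, the same two calls can be wrapped in a small helper. This is a minimal sketch: `convert_to_file` and its `converter` parameter are hypothetical names introduced here, not part of MarkItDown. Only `MarkItDown()`, `convert()`, and `text_content` come from the example above.

```python
from pathlib import Path


def convert_to_file(src, dest, converter=None):
    """Convert `src` to Markdown and write the result to `dest`.

    Hypothetical helper for illustration; not part of MarkItDown.
    """
    if converter is None:
        # Imported lazily so the helper can also be used with a stand-in
        # converter that exposes the same convert() method.
        from markitdown import MarkItDown
        converter = MarkItDown()

    result = converter.convert(str(src))
    dest = Path(dest)
    # Write the converted Markdown as UTF-8 text.
    dest.write_text(result.text_content, encoding="utf-8")
    return dest
```

Calling `convert_to_file("test.xlsx", "test.md")` would then write the converted workbook to `test.md`.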

Docker Support

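Build the image from the directory containing the Dockerfile, then run MarkItDown in a container, passing the input file on standard input: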

docker build -t markitdown:latest .
docker run --rm -i markitdown:latest < ~/your-file.pdf > output.md
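
The same container invocation can be driven from a Python script. The sketch below assumes the `markitdown:latest` image from the build step already exists locally; `docker_convert` and its `run` parameter are hypothetical names used only for illustration.

```python
import subprocess


def docker_convert(src, dest, image="markitdown:latest", run=subprocess.run):
    """Pipe `src` into the container's stdin and write its stdout to `dest`.

    Hypothetical helper mirroring:
        docker run --rm -i markitdown:latest < src > dest
    """
    cmd = ["docker", "run", "--rm", "-i", image]
    with open(src, "rb") as fin, open(dest, "wb") as fout:
        # check=True raises CalledProcessError on a non-zero exit status.
        return run(cmd, stdin=fin, stdout=fout, check=True)
```

Because `dest` is opened before the container starts, a failed run can leave an empty or partial output file behind, so callers may want to delete it when an exception is raised.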

Supported File Formats

MarkItDown supports the following file formats:

  • PDF documents
  • Microsoft Office files (PowerPoint, Word, Excel)
  • Images (with EXIF metadata and OCR)
  • Audio files (with EXIF metadata and speech transcription)
  • HTML pages
  • Text-based formats (CSV, JSON, XML)
  • ZIP archives (processes contents recursively)
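
When converting a folder of mixed files, it can help to filter by format family first. The sketch below maps a few file extensions to the families in the list above. The extension table is an illustrative assumption, not taken from MarkItDown, and both function names are hypothetical.

```python
from pathlib import Path

# Assumed extensions for each family in the list above. This table is
# illustrative only and is not read from MarkItDown itself.
ASSUMED_EXTENSIONS = {
    ".pdf": "PDF",
    ".pptx": "Microsoft Office",
    ".docx": "Microsoft Office",
    ".xlsx": "Microsoft Office",
    ".jpg": "Image",
    ".png": "Image",
    ".wav": "Audio",
    ".mp3": "Audio",
    ".html": "HTML",
    ".csv": "Text-based",
    ".json": "Text-based",
    ".xml": "Text-based",
    ".zip": "ZIP archive",
}


def format_family(path):
    """Return the assumed format family for `path`, or None if unknown."""
    return ASSUMED_EXTENSIONS.get(Path(path).suffix.lower())


def convertible_files(paths):
    """Keep only the paths whose extension appears in the assumed table."""
    return [p for p in paths if format_family(p) is not None]
```

The lookup is case-insensitive on the extension, so `Report.PDF` and `report.pdf` fall into the same family.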

For more detailed information about specific features and advanced usage, check out our other guides.