Using AI Features in MarkItDown

MarkItDown can use a large language model (LLM) to add AI-powered output such as image descriptions and speech transcription. This guide explains how to set up and use these features.

Setting Up AI Integration

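The AI features need an OpenAI API key, which the OpenAI client reads from the OPENAI_API_KEY environment variable. Create a client and pass it to MarkItDown together with a model name: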

from markitdown import MarkItDown
from openai import OpenAI

client = OpenAI()  # Reads OPENAI_API_KEY from your environment
# Image description needs a vision-capable model such as gpt-4o
md = MarkItDown(llm_client=client, llm_model="gpt-4o")
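
Because the client reads the key from the environment, it can help to fail early with a clear message when the variable is missing. The following is a minimal sketch; `require_api_key` is a hypothetical helper introduced here for illustration and is not part of MarkItDown or the OpenAI SDK.

```python
import os


def require_api_key(env=None, name="OPENAI_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; not part of MarkItDown or openai.
    `env` defaults to os.environ and can be replaced with a dict in tests.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before creating the client")
    return key
```

Calling `require_api_key()` just before `client = OpenAI()` turns a missing key into an immediate, readable error instead of a failure on the first API request.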

Image Description

When processing images, MarkItDown can generate detailed descriptions using AI:

result = md.convert("example.jpg")
print(result.text_content)  # Includes AI-generated image description

Depending on the model you configure, the generated description can cover:

Detailed visual description
Object identification
Text recognition (OCR)
Scene understanding
Contextual information
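
When several images need descriptions, the single `convert` call shown above can be wrapped in a small loop. This is only a sketch: `describe_images` and `IMAGE_SUFFIXES` are hypothetical names, the extension set is an assumption, and the helper relies only on the `convert(...)` method and `text_content` attribute used in this guide.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to the formats you actually use.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def describe_images(converter, paths):
    """Convert each image path and return {path: description text}.

    `converter` is any object with a `convert(path)` method whose result has
    a `text_content` attribute, such as the `md` instance created earlier.
    Paths without an image extension are skipped. A failed conversion is
    recorded as None so one bad file does not stop the batch.
    """
    descriptions = {}
    for path in paths:
        if Path(path).suffix.lower() not in IMAGE_SUFFIXES:
            continue
        try:
            descriptions[str(path)] = converter.convert(str(path)).text_content
        except Exception:  # broad on purpose in this sketch; narrow it in real code
            descriptions[str(path)] = None
    return descriptions
```

For example, `describe_images(md, ["example.jpg", "chart.png"])` would return a dictionary keyed by file path, with `None` for any file that could not be converted.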

Speech Transcription

For audio files, MarkItDown can transcribe speech to text:

result = md.convert("audio_file.mp3")
print(result.text_content)  # Includes transcribed text

Depending on the transcription backend in use, the output may include:

Multi-language support
Speaker diarization
Timestamp markers
Punctuation and formatting
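
Transcripts are often worth keeping on disk so the audio does not have to be processed again. A minimal sketch, assuming only the `convert(...)` call and `text_content` attribute shown above; `save_transcript` and the `.md` output naming are hypothetical choices made for this example.

```python
from pathlib import Path


def save_transcript(converter, audio_path, out_dir):
    """Convert an audio file and write the resulting text into out_dir.

    `converter` is any object with a `convert(path)` method returning a
    result with a `text_content` attribute. The output file reuses the audio
    file's base name with a .md suffix. Returns the path that was written.
    """
    text = converter.convert(str(audio_path)).text_content
    out_path = Path(out_dir) / (Path(audio_path).stem + ".md")
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(text, encoding="utf-8")
    return out_path
```

With the `md` instance from the setup section, `save_transcript(md, "audio_file.mp3", "transcripts")` would write `transcripts/audio_file.md`.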

Best Practices

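Choose a model that fits the task; image description requires a vision-capable model
Account for API rate limits and per-request costs, especially when converting files in batches
Check file sizes before converting, since large files take longer and cost more to process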
Cache results when processing the same files multiple times
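
The caching and file-size advice above can be sketched as a thin wrapper around the converter. Everything here is an assumption for illustration: `CachedConverter`, `convert_text`, and the 20 MB default are hypothetical, and the cache lives in memory only. Keying the cache on a hash of the file contents means a renamed or copied file still hits the cache.

```python
import hashlib
from pathlib import Path

# Assumed default limit; pick a value that matches your own API constraints.
MAX_BYTES = 20 * 1024 * 1024


class CachedConverter:
    """Wrap a converter with a size check and an in-memory result cache."""

    def __init__(self, converter, max_bytes=MAX_BYTES):
        self._converter = converter
        self._max_bytes = max_bytes
        self._cache = {}  # SHA-256 of file contents -> converted text

    def convert_text(self, path):
        path = Path(path)
        # Check the size first so oversized files are never read or uploaded.
        size = path.stat().st_size
        if size > self._max_bytes:
            raise ValueError(f"{path} is {size} bytes, over the {self._max_bytes} byte limit")
        key = hashlib.sha256(path.read_bytes()).hexdigest()
        if key not in self._cache:
            # Only call the underlying converter on a cache miss.
            self._cache[key] = self._converter.convert(str(path)).text_content
        return self._cache[key]
```

A usage sketch: `cached = CachedConverter(md)` followed by repeated `cached.convert_text("example.jpg")` calls would contact the API only once per distinct file. A persistent cache (for example, files on disk keyed by the same hash) would be needed to keep results across runs.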

For more information about API configuration and advanced settings, refer to our documentation.