12/18/2023
Learn how to use AI-powered features like image description and speech transcription in MarkItDown
Using AI Features in MarkItDown
MarkItDown integrates with large language models (LLMs) to provide features such as image description and speech transcription. This guide explains how to set up and use them.
Setting Up AI Integration
To use AI features, you'll need an OpenAI API key. Set it in the OPENAI_API_KEY environment variable, then pass an OpenAI client and a model name to MarkItDown:
from markitdown import MarkItDown
from openai import OpenAI
client = OpenAI()  # Reads OPENAI_API_KEY from your environment
# Image description needs a model that accepts image input, such as gpt-4o
md = MarkItDown(llm_client=client, llm_model="gpt-4o")
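A missing key is easier to diagnose if it is checked before the client is created. The following is a minimal sketch of such a check, assuming the key lives in OPENAI_API_KEY as described above; `require_api_key` is a hypothetical helper written for this guide, not part of MarkItDown or the OpenAI SDK.

```python
import os
from typing import Mapping, Optional


def require_api_key(env: Optional[Mapping[str, str]] = None,
                    name: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; not part of MarkItDown or openai.
    """
    # Fall back to the real process environment when no mapping is passed.
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before creating the client")
    return key
```

Calling `require_api_key()` just before `client = OpenAI()` turns a missing key into one explicit error message.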
Image Description
When an LLM client is configured, MarkItDown can send an image to the model and include the generated description in the converted output:
result = md.convert("example.jpg")
print(result.text_content) # Includes AI-generated image description
Depending on the model you configure, the description may cover:
- Detailed visual description
- Object identification
- Text recognition (OCR)
- Scene understanding
- Contextual information
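To describe several images in one pass, the single call above can be wrapped in a loop. This is a rough sketch rather than a MarkItDown feature: `describe_images` and `IMAGE_SUFFIXES` are hypothetical names, and the function only assumes that `convert` behaves like `md.convert`, taking a path and returning an object with a `text_content` attribute.

```python
from pathlib import Path
from typing import Any, Callable, Dict, Iterable

# Assumed set of image extensions; adjust to the formats you actually use.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def describe_images(paths: Iterable[str],
                    convert: Callable[[str], Any]) -> Dict[str, str]:
    """Convert each image path and collect its text content by path."""
    descriptions: Dict[str, str] = {}
    for path in paths:
        # Skip anything that does not look like an image file.
        if Path(path).suffix.lower() not in IMAGE_SUFFIXES:
            continue
        result = convert(path)
        descriptions[path] = result.text_content
    return descriptions
```

With the `md` object from the setup section, this would be called as `describe_images(["example.jpg"], md.convert)`. Each image triggers its own model request, so costs scale with the number of files.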
Speech Transcription
For audio files, MarkItDown can transcribe speech to text:
result = md.convert("audio_file.mp3")
print(result.text_content) # Includes transcribed text
Depending on the transcription backend, features may include:
- Multi-language support
- Speaker diarization
- Timestamp markers
- Punctuation and formatting
Best Practices
- Choose a model that fits your use case (image description requires one that accepts image input)
- Consider rate limits and API costs
- Handle large files appropriately
- Cache results when processing the same files multiple times
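The caching advice can be sketched as a small in-memory wrapper keyed by file contents, so an unchanged file is converted only once per process. `CachedConverter` and `file_digest` are hypothetical names for illustration; the wrapper assumes only that `convert` behaves like `md.convert` and returns an object with a `text_content` attribute.

```python
import hashlib
from typing import Any, Callable, Dict


def file_digest(path: str) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


class CachedConverter:
    """Hypothetical wrapper that avoids re-converting unchanged files."""

    def __init__(self, convert: Callable[[str], Any]) -> None:
        self._convert = convert
        self._cache: Dict[str, str] = {}

    def text_for(self, path: str) -> str:
        # Key on content, not on the path, so renamed copies also hit the cache.
        key = file_digest(path)
        if key not in self._cache:
            self._cache[key] = self._convert(path).text_content
        return self._cache[key]
```

Usage would look like `cached = CachedConverter(md.convert)` followed by `cached.text_for("example.jpg")`. Hashing the contents costs one read of the file, which is usually far cheaper than a model request; persisting the cache to disk is left out of this sketch.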
For more information about API configuration and advanced settings, refer to our documentation.