Metadata Enhancement Pipeline

The Dublin Core Metadata Enhancer includes an automated pipeline for generating WCAG 2.2-compliant alternative text for images in Dublin Core metadata records.

System Overview

```mermaid
graph TB
    subgraph "Input Sources"
        A[Dublin Core JSON<br/>Local File or URL]
        B[OpenAI API Key]
    end
    subgraph "Enhancement Pipeline"
        C[MetadataEnhancer Class]
        D[CLI Interface]
        E[Image Processing]
        F[AI Analysis GPT-5]
    end
    subgraph "Output"
        G[Enhanced JSON with Alt Text]
        H[Error Logs]
    end
    A --> C
    B --> C
    C --> D
    C --> E
    E --> F
    F --> G
    C --> H
    style A fill:#e1f5fe
    style G fill:#e8f5e8
    style F fill:#fff3e0
```

Overview
This pipeline uses the multimodal capabilities of OpenAI's GPT-5 model to analyze images within their metadata context and generate appropriate alternative text descriptions in German.
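As a rough illustration of the kind of call involved, the sketch below sends a base64-encoded image together with a short German instruction through the openai Python SDK's Chat Completions interface. The model name, prompt wording, and helper function are illustrative assumptions, not the project's actual code.

```python
# Illustrative sketch only; model name, prompt wording, and helper are assumptions.
import base64

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def describe_image(image_bytes: bytes, context: str, model: str = "gpt-5") -> str:
    """Ask the model for German alt text for one image (hypothetical helper)."""
    data_url = "data:image/jpeg;base64," + base64.b64encode(image_bytes).decode()
    response = client.chat.completions.create(
        model=model,  # "gpt-5" as referenced in this document
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": f"Erzeuge einen Alternativtext. Kontext: {context}"},
                    {"type": "image_url", "image_url": {"url": data_url}},
                ],
            }
        ],
    )
    return response.choices[0].message.content
```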
Pipeline Architecture

```mermaid
graph TD
    A[Load Dublin Core Metadata<br/>Local File or URL] --> B[Extract object_thumb URLs]
    B --> C[Download Images from Omeka]
    C --> D[Build Context Prompts]
    D --> E[OpenAI GPT-5 Analysis]
    E --> F[Generate Alt Text]
    F --> G[Validate WCAG 2.2 Compliance]
    G --> H[Save Enhanced Metadata]
    E --> L{Image Type Classification}
    L -->|Informative| M[Generate 1-2 sentences<br/>Max 120 chars]
    L -->|Complex/Maps/Diagrams| N[Generate description<br/>Max 200 chars + longdesc]
    L -->|Text Images| O[OCR-based alt text]
    M --> F
    N --> F
    O --> F
```
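The first two pipeline steps, extracting the object_thumb URLs and downloading the thumbnails from Omeka, could look roughly like the sketch below. The field names object_thumb and objectid follow this document; the top-level "objects" key and the error handling are assumptions.

```python
# Sketch of the "extract object_thumb URLs" and "download images" steps.
# Field names follow the document; the top-level "objects" key is an assumption.
import requests


def extract_thumbnail_urls(metadata: dict) -> dict[str, str]:
    """Map objectid -> object_thumb URL for every object that has a thumbnail."""
    urls = {}
    for obj in metadata.get("objects", []):
        thumb = obj.get("object_thumb")
        if thumb:
            urls[obj["objectid"]] = thumb
    return urls


def download_image(url: str, timeout: int = 30) -> bytes:
    """Fetch one thumbnail from Omeka; the caller decides how to handle failures."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.content
```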
Components
Core Module (src/metadata_enhancer.py)
The main MetadataEnhancer class provides:
- Metadata Loading: Fetch Dublin Core metadata from JSON URLs
- Image Processing: Download and prepare images for AI analysis
- Prompt Generation: Create contextual prompts using metadata
- AI Integration: Generate alt text using OpenAI GPT-5 API
- Output Generation: Save enhanced metadata as JSON
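The actual class lives in src/metadata_enhancer.py; the outline below is only a hedged sketch of how those five responsibilities might map onto methods. Names and signatures are illustrative, not the module's real API.

```python
# Hypothetical outline of the five responsibilities listed above.
# Method names and signatures are illustrative, not the module's real API.
import json
from pathlib import Path

import requests


class MetadataEnhancerSketch:
    """Sketch only; the real class is MetadataEnhancer in src/metadata_enhancer.py."""

    def load_metadata(self, source: str) -> dict:
        # Metadata loading: Dublin Core JSON from a URL or a local file
        if source.startswith(("http://", "https://")):
            return requests.get(source, timeout=30).json()
        return json.loads(Path(source).read_text(encoding="utf-8"))

    def prepare_image(self, url: str) -> bytes:
        # Image processing: download the thumbnail for AI analysis
        response = requests.get(url, timeout=30)
        response.raise_for_status()
        return response.content

    def build_prompt(self, obj: dict) -> str:
        # Prompt generation: fold metadata fields into a contextual prompt
        return f"Titel: {obj.get('title', '')}\nBeschreibung: {obj.get('description', '')}"

    def generate_alt_text(self, prompt: str, image: bytes) -> dict:
        # AI integration: see the GPT-5 call sketch in the Overview section
        raise NotImplementedError

    def save(self, objects: list[dict], path: str) -> None:
        # Output generation: write enhanced metadata as JSON
        Path(path).write_text(json.dumps(objects, ensure_ascii=False, indent=2), encoding="utf-8")
```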
CLI Interface (enhance_metadata.py)
Command-line tool for running the enhancement pipeline:
```bash
python enhance_metadata.py [options]
```

CLI Workflow
```mermaid
graph TD
    A[Start CLI] --> B[Parse Arguments]
    B --> C{API Key Set?}
    C -->|No| D[Error: Missing API Key]
    C -->|Yes| E[Initialize MetadataEnhancer]
    E --> F[Load Metadata from URL]
    F --> G{Valid Metadata?}
    G -->|No| H[Error: Invalid Metadata]
    G -->|Yes| I[Process Each Object]
    I --> J[Extract Image Information]
    J --> K[Generate Alt Text]
    K --> L[Collect Enhanced Objects]
    L --> M{More Objects?}
    M -->|Yes| I
    M -->|No| N[Save to Output File]
    N --> O[Success: Enhancement Complete]
    D --> P[Exit with Error]
    H --> P
    O --> Q[Exit Successfully]
    style A fill:#e3f2fd
    style O fill:#e8f5e8
    style D fill:#ffebee
    style H fill:#ffebee
```
Options:

- --metadata-url: Source path to local JSON file or URL for metadata JSON
- --output: Output file for enhanced metadata
- --api-key: OpenAI API key (or use environment variable)
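One way to wire up these options with argparse might look like the following sketch; defaults, help texts, and the exact parser setup are assumptions, not the script's actual code.

```python
# Sketch of the CLI argument handling described above. Option names follow the
# document; defaults and the error message are assumptions.
import argparse
import os
import sys


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Enhance Dublin Core metadata with alt text")
    parser.add_argument("--metadata-url", required=True, help="Local JSON file or URL")
    parser.add_argument("--output", default="enhanced_metadata.json", help="Output file")
    parser.add_argument("--api-key", default=os.environ.get("OPENAI_API_KEY"), help="OpenAI API key")
    return parser.parse_args()


if __name__ == "__main__":
    args = parse_args()
    if not args.api_key:
        # Mirrors the "Error: Missing API Key" branch in the CLI workflow above
        sys.exit("Error: Missing API key (set OPENAI_API_KEY or pass --api-key)")
```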
Configuration
Set your OpenAI API key:
```bash
export OPENAI_API_KEY="your-openai-api-key-here"
```

Or copy example.env to .env and configure:

```bash
cp example.env .env
# Edit .env with your API key
```

AI Prompt Design
The pipeline uses a carefully designed German prompt that follows a systematic decision process:
Image Classification Decision Tree
```mermaid
graph TD
    A[Image Analysis Start] --> B{Contains readable text?}
    B -->|Yes| C[Text Image Type]
    C --> D[OCR-based alt text<br/>Transcribe visible text]
    B -->|No| E{Complex visual content?}
    E -->|Yes - Maps/Diagrams| F[Complex Content Type]
    F --> G[Generate descriptive alt text<br/>Max 200 chars + optional longdesc]
    E -->|No - Simple image| H[Informative Image Type]
    H --> I[Generate concise description<br/>1-2 sentences, max 120 chars]
    D --> J[Apply WCAG 2.2 Guidelines]
    G --> J
    I --> J
    J --> K[German language output<br/>No 'Image of...' prefixes<br/>Contextual and descriptive]
```
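The exact prompt text lives in the source; the snippet below is only an approximation of how a German instruction could encode this decision tree and the length limits. The wording is an assumption, not the project's real prompt.

```python
# Approximate German instruction encoding the decision tree above.
# The real prompt in the project may differ; this is illustrative only.
CLASSIFICATION_PROMPT = """\
Du bist ein Experte für barrierefreie Bildbeschreibungen nach WCAG 2.2.
Entscheide zuerst, um welchen Bildtyp es sich handelt:
1. Textbild: Transkribiere den sichtbaren Text als Alternativtext.
2. Komplexer Inhalt (Karte, Diagramm): Beschreibung mit max. 200 Zeichen,
   optional eine ausführliche Langbeschreibung (longdesc).
3. Informatives Bild: 1-2 Sätze, max. 120 Zeichen.
Beginne nie mit "Bild von" oder "Foto von".
Antworte als JSON mit den Feldern "alt_text" und optional "longdesc".
"""
```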
Prompt Context Integration
```mermaid
graph LR
    A[Dublin Core Metadata] --> B[Extract Context]
    B --> C[Title]
    B --> D[Description]
    B --> E[Subject Terms]
    B --> F[Historical Era]
    B --> G[Creator Info]
    B --> H[Date Information]
    C --> I[Build Contextual Prompt]
    D --> I
    E --> I
    F --> I
    G --> I
    H --> I
    I --> J[Send to OpenAI GPT-5]
    J --> K[Generate Contextual Alt Text]
```
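A context-building step along the lines of the diagram could be sketched as follows. The Dublin Core field names used here (title, description, subject, era, creator, date) are typical but may differ from the actual data; treat them as assumptions.

```python
# Sketch of the context-building step shown above. Dublin Core field names are
# assumptions about the actual metadata layout.
def build_context(obj: dict) -> str:
    """Collect the metadata fields the diagram lists into one prompt block."""
    parts = [
        ("Titel", obj.get("title")),
        ("Beschreibung", obj.get("description")),
        ("Schlagworte", obj.get("subject")),
        ("Epoche", obj.get("era")),        # field name for "Historical Era" is assumed
        ("Urheber", obj.get("creator")),
        ("Datum", obj.get("date")),
    ]
    return "\n".join(f"{label}: {value}" for label, value in parts if value)
```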
The prompt design follows these principles:
- Identifies image types:
  - Informative images (1-2 sentences, max 120 characters)
  - Complex content like diagrams/maps (max 200 characters, optional long description)
  - Text images (OCR-based alt text)
- Incorporates metadata context:
  - Title, description, subject terms
  - Historical era, creator, dates
  - Collection and relationship information
- Follows WCAG 2.2 guidelines:
  - Concise and descriptive
  - No redundant “Image of…” prefixes
  - German language output
  - Structured JSON response
Output Format
Enhanced metadata objects include:
```json
{
  "objectid": "unique-identifier",
  "alt_text": "Descriptive alternative text in German",
  "longdesc": "Optional detailed description for complex content"
}
```

Data Transformation Flow
```mermaid
graph LR
    A[Input: Dublin Core JSON] --> B[Extract Object Data]
    B --> C[Object Metadata]
    C --> D[objectid]
    C --> E[title]
    C --> F[description]
    C --> G[subject]
    C --> H[format/image URL]
    H --> I[Image Download & Analysis]
    D --> J[Context Building]
    E --> J
    F --> J
    G --> J
    I --> K[OpenAI GPT-5 Processing]
    J --> K
    K --> L[Generated Alt Text]
    K --> M[Optional Long Description]
    D --> N[Enhanced Object]
    L --> N
    M --> N
    N --> O[Output: Enhanced JSON]
    style A fill:#e1f5fe
    style O fill:#e8f5e8
    style K fill:#fff3e0
```
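The final assembly step of this flow might look roughly like the sketch below: parse the model's structured JSON reply, check the length limits from the classification rules, and merge the result into an enhanced object in the shape shown under Output Format. The parsing details and the truncation fallback are assumptions.

```python
# Sketch of turning a model reply into an enhanced object as described above.
# Length limits follow the classification rules; parsing details are assumptions.
import json


def build_enhanced_object(objectid: str, model_reply: str, max_len: int = 200) -> dict:
    """Parse the structured JSON reply and validate the alt text length."""
    data = json.loads(model_reply)  # the prompt asks for a JSON answer
    alt_text = data["alt_text"].strip()
    if len(alt_text) > max_len:
        # Simple fallback truncation; the real pipeline may handle this differently
        alt_text = alt_text[: max_len - 1].rstrip() + "…"
    enhanced = {"objectid": objectid, "alt_text": alt_text}
    if data.get("longdesc"):
        enhanced["longdesc"] = data["longdesc"]
    return enhanced
```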
Error Handling
The pipeline includes robust error handling for network, API, and processing issues:
Error Handling Flow
```mermaid
graph TD
    A[Process Object] --> B{Valid image URL?}
    B -->|No| C[Log Warning & Skip]
    B -->|Yes| D[Download Image]
    D --> E{Download Success?}
    E -->|No| F[Network Error<br/>Log & Skip]
    E -->|Yes| G[Process Image]
    G --> H{Valid Image Format?}
    H -->|No| I[Format Error<br/>Log & Skip]
    H -->|Yes| J[Send to OpenAI]
    J --> K{API Success?}
    K -->|No| L[API Error<br/>Log & Skip]
    K -->|Yes| M{Valid JSON Response?}
    M -->|No| N[Parse Error<br/>Log & Skip]
    M -->|Yes| O[Save Enhanced Object]
    C --> P[Continue Next Object]
    F --> P
    I --> P
    L --> P
    N --> P
    O --> P
```
The pipeline handles various error scenarios:
- Network connectivity issues
- Invalid image formats
- API rate limits and errors
- Malformed metadata
- Missing required fields
Failed objects are logged and skipped, allowing the pipeline to continue processing other objects.
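The skip-and-continue behaviour could be implemented roughly as follows. The helper names (download_image, generate_alt_text) and the exception types handled are placeholders for the real pipeline steps, not the project's actual code.

```python
# Sketch of the skip-and-continue behaviour described above; helper names are
# placeholders for the real pipeline steps.
import logging

import requests

logger = logging.getLogger("metadata_enhancer")


def process_objects(objects: list[dict], download_image, generate_alt_text) -> list[dict]:
    """Process every object, logging and skipping the ones that fail."""
    enhanced = []
    for obj in objects:
        url = obj.get("object_thumb")
        if not url:
            logger.warning("Skipping %s: no image URL", obj.get("objectid"))
            continue
        try:
            image = download_image(url)
            enhanced.append(generate_alt_text(obj, image))
        except requests.RequestException as exc:
            logger.error("Network error for %s: %s", obj.get("objectid"), exc)
        except (ValueError, KeyError) as exc:
            logger.error("Could not parse response for %s: %s", obj.get("objectid"), exc)
    return enhanced
```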
Testing
Unit tests cover all major components:
```bash
# Run tests with uv
uv run pytest test/ -v

# Run type checking
uvx ty check src/

# Run linting
uv run ruff check .

# Format code
uv run ruff format . && uv run ruff check --fix .
```

Tests use mocking to avoid API calls during development and validate:
- Metadata extraction and prompt building
- Image downloading logic (no resizing; handled by Omeka)
- Error handling scenarios
- CLI argument parsing
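A test following this mocking approach could look like the sketch below; the patch target, constructor signature, and response shape are assumptions about the project layout rather than the real test suite.

```python
# Sketch of the mocking pattern described above; module path, constructor
# signature, and response shape are assumptions, not the real test code.
from unittest.mock import MagicMock, patch


def test_alt_text_generation_without_real_api_calls():
    # Build a fake Chat Completions response object
    fake_completion = MagicMock()
    fake_completion.choices[0].message.content = '{"alt_text": "Historische Karte der Region"}'

    # Patch the OpenAI client used inside the enhancer module (path is assumed)
    with patch("src.metadata_enhancer.OpenAI") as mock_openai:
        mock_openai.return_value.chat.completions.create.return_value = fake_completion

        from src.metadata_enhancer import MetadataEnhancer  # assumed import path
        enhancer = MetadataEnhancer(api_key="test-key")
        # The real test would now call the alt-text method and assert on the
        # parsed result, without any network traffic during the test run.
```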
Performance Considerations
- Batch Processing: Process multiple objects in sequence
- Rate Limiting: Respect OpenAI API limits
- Image Handling: Uses optimized thumbnail images from Omeka (object_thumb field)
- Caching: Consider implementing caching for repeated images
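Two of these points, rate limiting and caching, could be prototyped as simply as in the sketch below; the delay value and cache size are assumptions, and the real pipeline may not implement caching at all.

```python
# Sketch of simple rate limiting and URL-keyed caching; delay and cache size
# are assumptions, not values taken from the project.
import time
from functools import lru_cache

import requests

REQUEST_DELAY_SECONDS = 1.0  # crude pacing between OpenAI calls (assumed value)


@lru_cache(maxsize=256)
def cached_download(url: str) -> bytes:
    """Avoid re-downloading a thumbnail that appears in several records."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.content


def throttled(call, *args, **kwargs):
    """Run one API call, then pause so sequential batch processing respects rate limits."""
    result = call(*args, **kwargs)
    time.sleep(REQUEST_DELAY_SECONDS)
    return result
```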
Security
- API keys are handled securely through environment variables
- No sensitive data is logged or stored in outputs
- Image data is processed in memory only