Metadata Enhancement Pipeline

The Dublin Core Metadata Enhancer includes an automated pipeline for generating WCAG 2.2-compliant alternative text for images in Dublin Core metadata records.

System Overview

graph TB
    subgraph "Input Sources"
        A[Dublin Core JSON<br/>Local File or URL]
        B[OpenAI API Key]
    end

    subgraph "Enhancement Pipeline"
        C[MetadataEnhancer Class]
        D[CLI Interface]
        E[Image Processing]
        F[AI Analysis GPT-5]
    end

    subgraph "Output"
        G[Enhanced JSON with Alt Text]
        H[Error Logs]
    end

    A --> C
    B --> C
    C --> D
    C --> E
    E --> F
    F --> G
    C --> H

    style A fill:#e1f5fe
    style G fill:#e8f5e8
    style F fill:#fff3e0

Overview

This pipeline uses the multimodal capabilities of OpenAI’s GPT-5 model to analyze images within their metadata context and generate appropriate alternative text descriptions in German.

Pipeline Architecture

graph TD
    A[Load Dublin Core Metadata<br/>Local File or URL] --> B[Extract object_thumb URLs]
    B --> C[Download Images from Omeka]
    C --> D[Build Context Prompts]
    D --> E[OpenAI GPT-5 Analysis]
    E --> F[Generate Alt Text]
    F --> G[Validate WCAG 2.2 Compliance]
    G --> H[Save Enhanced Metadata]

    E --> L{Image Type Classification}
    L -->|Informative| M[Generate 1-2 sentences<br/>Max 120 chars]
    L -->|Complex/Maps/Diagrams| N[Generate description<br/>Max 200 chars + longdesc]
    L -->|Text Images| O[OCR-based alt text]

    M --> F
    N --> F
    O --> F

Components

Core Module (src/metadata_enhancer.py)

The main MetadataEnhancer class provides (see the sketch after this list):

  • Metadata Loading: Fetch Dublin Core metadata from local JSON files or URLs
  • Image Processing: Download and prepare images for AI analysis
  • Prompt Generation: Create contextual prompts using metadata
  • AI Integration: Generate alt text using OpenAI GPT-5 API
  • Output Generation: Save enhanced metadata as JSON
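
The responsibilities above map onto the class roughly as follows. This is a minimal sketch; the method names and signatures are illustrative assumptions, not the exact interface of src/metadata_enhancer.py.

# Illustrative sketch only; method names and signatures are assumptions,
# not the exact interface defined in src/metadata_enhancer.py.
import json
import urllib.request

from openai import OpenAI


class MetadataEnhancer:
    def __init__(self, api_key: str) -> None:
        self.client = OpenAI(api_key=api_key)

    def load_metadata(self, source: str) -> dict:
        # Accept either a local JSON file path or an HTTP(S) URL.
        if source.startswith(("http://", "https://")):
            with urllib.request.urlopen(source) as response:
                return json.load(response)
        with open(source, encoding="utf-8") as handle:
            return json.load(handle)

    def download_image(self, url: str) -> bytes:
        # Fetch the object_thumb image bytes for AI analysis.
        with urllib.request.urlopen(url) as response:
            return response.read()

    def build_prompt(self, obj: dict) -> str:
        # Assemble a contextual German prompt from the object's metadata.
        ...

    def generate_alt_text(self, obj: dict) -> dict:
        # Call the OpenAI API and return alt_text plus an optional longdesc.
        ...

    def save_output(self, objects: list[dict], path: str) -> None:
        # Write the enhanced metadata objects as JSON.
        with open(path, "w", encoding="utf-8") as handle:
            json.dump(objects, handle, ensure_ascii=False, indent=2)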

CLI Interface (enhance_metadata.py)

Command-line tool for running the enhancement pipeline:

python enhance_metadata.py [options]

CLI Workflow

graph TD
    A[Start CLI] --> B[Parse Arguments]
    B --> C{API Key Set?}
    C -->|No| D[Error: Missing API Key]
    C -->|Yes| E[Initialize MetadataEnhancer]

    E --> F[Load Metadata from URL]
    F --> G{Valid Metadata?}
    G -->|No| H[Error: Invalid Metadata]
    G -->|Yes| I[Process Each Object]

    I --> J[Extract Image Information]
    J --> K[Generate Alt Text]
    K --> L[Collect Enhanced Objects]

    L --> M{More Objects?}
    M -->|Yes| I
    M -->|No| N[Save to Output File]

    N --> O[Success: Enhancement Complete]

    D --> P[Exit with Error]
    H --> P
    O --> Q[Exit Successfully]

    style A fill:#e3f2fd
    style O fill:#e8f5e8
    style D fill:#ffebee
    style H fill:#ffebee
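
The workflow above corresponds roughly to the entry-point sketch below. The flag defaults, the import path, and the top-level "objects" key in the metadata JSON are assumptions, not the exact code in enhance_metadata.py.

# Sketch of the CLI entry point implied by the workflow diagram; flag defaults,
# the import path, and the metadata JSON shape are assumptions.
import argparse
import os
import sys

from src.metadata_enhancer import MetadataEnhancer  # import path is an assumption


def main() -> int:
    parser = argparse.ArgumentParser(description="Generate alt text for Dublin Core metadata")
    parser.add_argument("--metadata-url", required=True, help="Local JSON file or URL")
    parser.add_argument("--output", required=True, help="Output file for enhanced metadata")
    parser.add_argument("--api-key", default=os.environ.get("OPENAI_API_KEY"))
    args = parser.parse_args()

    if not args.api_key:
        print("Error: OpenAI API key not provided", file=sys.stderr)
        return 1

    enhancer = MetadataEnhancer(api_key=args.api_key)
    metadata = enhancer.load_metadata(args.metadata_url)
    # The "objects" key is an assumed container for the individual records.
    enhanced = [enhancer.generate_alt_text(obj) for obj in metadata.get("objects", [])]
    enhancer.save_output(enhanced, args.output)
    return 0


if __name__ == "__main__":
    sys.exit(main())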

Options (an example invocation follows the list):

  • --metadata-url: Path to a local JSON file or URL of the Dublin Core metadata JSON
  • --output: Output file for enhanced metadata
  • --api-key: OpenAI API key (or use environment variable)
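
A typical invocation might look like this (the metadata URL and output filename are placeholders):

python enhance_metadata.py \
    --metadata-url https://example.org/collection/metadata.json \
    --output enhanced_metadata.json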

Configuration

Set your OpenAI API key:

export OPENAI_API_KEY="your-openai-api-key-here"

Or copy example.env to .env and configure:

cp example.env .env
# Edit .env with your API key
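
Inside the pipeline, the key can then be resolved from the environment, roughly as sketched below. The use of python-dotenv is an assumption suggested by the .env file, not confirmed by the project code.

# Sketch of API key resolution; python-dotenv usage is assumed from the
# presence of example.env/.env, not confirmed by the project code.
import os

from dotenv import load_dotenv

load_dotenv()  # reads OPENAI_API_KEY from .env if the file exists
api_key = os.environ.get("OPENAI_API_KEY")
if not api_key:
    raise SystemExit("OPENAI_API_KEY is not set")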

AI Prompt Design

The pipeline uses a carefully designed German prompt that follows a systematic decision process:

Image Classification Decision Tree

graph TD
  A[Image Analysis Start] --> B{Contains readable text?}
  B -->|Yes| C[Text Image Type]
  C --> D[OCR-based alt text<br/>Transcribe visible text]

  B -->|No| E{Complex visual content?}
  E -->|Yes - Maps/Diagrams| F[Complex Content Type]
  F --> G[Generate descriptive alt text<br/>Max 200 chars + optional longdesc]

  E -->|No - Simple image| H[Informative Image Type]
  H --> I[Generate concise description<br/>1-2 sentences, max 120 chars]

  D --> J[Apply WCAG 2.2 Guidelines]
  G --> J
  I --> J

  J --> K[German language output<br/>No 'Image of...' prefixes<br/>Contextual and descriptive]

Prompt Context Integration

graph LR
    A[Dublin Core Metadata] --> B[Extract Context]
    B --> C[Title]
    B --> D[Description]
    B --> E[Subject Terms]
    B --> F[Historical Era]
    B --> G[Creator Info]
    B --> H[Date Information]

    C --> I[Build Contextual Prompt]
    D --> I
    E --> I
    F --> I
    G --> I
    H --> I

    I --> J[Send to OpenAI GPT-5]
    J --> K[Generate Contextual Alt Text]

The prompt design follows these principles (see the sketch after the list):

  1. Identifies image types:
    • Informative images (1-2 sentences, max 120 characters)
    • Complex content like diagrams/maps (max 200 characters, optional long description)
    • Text images (OCR-based alt text)
  2. Incorporates metadata context:
    • Title, description, subject terms
    • Historical era, creator, dates
    • Collection and relationship information
  3. Follows WCAG 2.2 guidelines:
    • Concise and descriptive
    • No redundant “Image of…” prefixes
    • German language output
    • Structured JSON response
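
As a sketch, these principles might translate into a prompt builder like the following; the Dublin Core field names and the prompt wording are illustrative, not the project's actual German prompt.

# Illustrative prompt builder; the field names and the prompt wording are
# assumptions, not the project's actual German prompt.
def build_context_prompt(obj: dict) -> str:
    context_fields = {
        "Titel": obj.get("title", ""),
        "Beschreibung": obj.get("description", ""),
        "Schlagworte": obj.get("subject", ""),
        "Urheber": obj.get("creator", ""),
        "Datierung": obj.get("date", ""),
    }
    context = "\n".join(f"{label}: {value}" for label, value in context_fields.items() if value)
    return (
        "Erstelle einen WCAG 2.2-konformen Alternativtext auf Deutsch für das Bild.\n"
        "Kein Präfix wie 'Bild von ...'. Antworte als JSON mit den Feldern "
        "'alt_text' und optional 'longdesc'.\n\n"
        f"Kontext aus den Metadaten:\n{context}"
    )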

Output Format

Enhanced metadata objects include:

{
    "objectid": "unique-identifier",
    "alt_text": "Descriptive alternative text in German",
    "longdesc": "Optional detailed description for complex content"
}
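
A simple post-generation check against the length guidelines from the classification rules could look like this; the function name and the use of longdesc as a marker for complex content are assumptions.

# Illustrative validation of a generated object against the length guidelines;
# treating a present longdesc as the marker for complex content is an assumption.
def validate_alt_text(enhanced: dict) -> list[str]:
    problems = []
    alt_text = enhanced.get("alt_text", "")
    if not alt_text:
        problems.append("alt_text is missing")
    elif enhanced.get("longdesc"):
        if len(alt_text) > 200:
            problems.append("complex-content alt_text exceeds 200 characters")
    elif len(alt_text) > 120:
        problems.append("informative alt_text exceeds 120 characters")
    if alt_text.lower().startswith(("bild von", "abbildung von", "image of")):
        problems.append("alt_text starts with a redundant 'Bild von ...' prefix")
    return problems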

Data Transformation Flow

graph LR
    A[Input: Dublin Core JSON] --> B[Extract Object Data]
    B --> C[Object Metadata]

    C --> D[objectid]
    C --> E[title]
    C --> F[description]
    C --> G[subject]
    C --> H[format/image URL]

    H --> I[Image Download & Analysis]
    D --> J[Context Building]
    E --> J
    F --> J
    G --> J

    I --> K[OpenAI GPT-5 Processing]
    J --> K

    K --> L[Generated Alt Text]
    K --> M[Optional Long Description]

    D --> N[Enhanced Object]
    L --> N
    M --> N

    N --> O[Output: Enhanced JSON]

    style A fill:#e1f5fe
    style O fill:#e8f5e8
    style K fill:#fff3e0

Error Handling

The pipeline includes robust error handling for network, API, and processing issues:

Error Handling Flow

graph TD
    A[Process Object] --> B{Valid image URL?}
    B -->|No| C[Log Warning & Skip]
    B -->|Yes| D[Download Image]

    D --> E{Download Success?}
    E -->|No| F[Network Error<br/>Log & Skip]
    E -->|Yes| G[Process Image]

    G --> H{Valid Image Format?}
    H -->|No| I[Format Error<br/>Log & Skip]
    H -->|Yes| J[Send to OpenAI]

    J --> K{API Success?}
    K -->|No| L[API Error<br/>Log & Skip]
    K -->|Yes| M{Valid JSON Response?}

    M -->|No| N[Parse Error<br/>Log & Skip]
    M -->|Yes| O[Save Enhanced Object]

    C --> P[Continue Next Object]
    F --> P
    I --> P
    L --> P
    N --> P
    O --> P

The pipeline handles various error scenarios:

  • Network connectivity issues
  • Invalid image formats
  • API rate limits and errors
  • Malformed metadata
  • Missing required fields

Failed objects are logged and skipped, allowing the pipeline to continue processing other objects.
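
The log-and-skip pattern might look roughly like this; the specific exception types, helper names, and logger usage are assumptions about the implementation.

# Sketch of the log-and-skip pattern; exception types and helper names are
# assumptions, not the exact implementation.
import json
import logging
import urllib.error

logger = logging.getLogger(__name__)


def process_objects(enhancer, objects: list[dict]) -> list[dict]:
    enhanced = []
    for obj in objects:
        image_url = obj.get("object_thumb")
        if not image_url:
            logger.warning("Skipping %s: no image URL", obj.get("objectid"))
            continue
        try:
            enhanced.append(enhancer.generate_alt_text(obj))
        except urllib.error.URLError as exc:
            logger.error("Network error for %s: %s", obj.get("objectid"), exc)
        except json.JSONDecodeError as exc:
            logger.error("Could not parse model response for %s: %s", obj.get("objectid"), exc)
        except Exception as exc:  # API errors, invalid image formats, etc.
            logger.error("Failed to enhance %s: %s", obj.get("objectid"), exc)
    return enhanced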

Testing

Unit tests cover all major components:

# Run tests with uv
uv run pytest test/ -v

# Run type checking
uvx ty check src/

# Run linting
uv run ruff check .

# Format code  
uv run ruff format . && uv run ruff check --fix .

Tests use mocking to avoid API calls during development (see the sketch after this list) and validate:

  • Metadata extraction and prompt building
  • Image downloading logic (no resizing; handled by Omeka)
  • Error handling scenarios
  • CLI argument parsing
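
A minimal sketch of the mocking pattern; the patch target assumes that src/metadata_enhancer.py imports OpenAI at module level.

# Sketch of the mocking pattern used to keep tests offline; the patch target
# assumes src.metadata_enhancer imports OpenAI at module level.
from unittest.mock import patch

from src.metadata_enhancer import MetadataEnhancer  # import path is an assumption


def test_enhancer_initializes_without_real_api_calls():
    with patch("src.metadata_enhancer.OpenAI") as mock_client_cls:
        MetadataEnhancer(api_key="test-key")
    # The dummy key only ever reaches the mocked client, never the real API.
    mock_client_cls.assert_called_once()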

Performance Considerations

  • Batch Processing: Process multiple objects in sequence
  • Rate Limiting: Respect OpenAI API limits
  • Image Handling: Uses optimized thumbnail images from Omeka (object_thumb field)
  • Caching: Consider implementing caching for repeated images (see the sketch below)
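
A simple in-memory cache keyed by image URL could look like this; it is a suggested optimization, not an existing feature of the pipeline.

# Sketch of a simple in-memory cache for repeated image URLs; this is a
# suggested optimization, not an existing feature of the pipeline.
from functools import lru_cache
import urllib.request


@lru_cache(maxsize=256)
def download_image_cached(url: str) -> bytes:
    with urllib.request.urlopen(url) as response:
        return response.read()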

Security

  • API keys are handled securely through environment variables
  • No sensitive data is logged or stored in outputs
  • Image data is processed in memory only