Dublin Core Metadata Enhancer

This project provides tools and workflows to enhance Dublin Core metadata records with reproducible enrichment processes. It aims to improve the quality and completeness of Dublin Core metadata through automated enrichment pipelines.

Purpose

The Dublin Core Metadata Enhancer enables:

Automated Enhancement: Systematic improvement of Dublin Core metadata records
Reproducible Workflows: Documented and repeatable enhancement processes
Quality Assurance: Validation and verification of enhanced metadata
Open Science: Transparent and shareable enhancement methodologies

Scope

This repository contains the source code, documentation, and examples for enhancing Dublin Core metadata records. The enhancement workflows can be applied to various types of digital resources and collections.

For detailed documentation and usage examples, please see the full documentation in this repository.

dublin-core-metadata-enhancer

Enhance Dublin Core records with reproducible enrichment workflows. The data in this repository is openly available to everyone and is intended to support reproducible research.

Repository Structure

The structure of this repository follows the Advanced Structure for Data Analysis of The Turing Way and is organized as follows:

analysis/: scripts and notebooks used to analyze the data
assets/: images, logos, etc. used in the README and other documentation
build/: scripts and notebooks used to build the data
data/: data files
documentation/: documentation for the data and the repository
project-management/: project management documents (e.g., meeting notes, project plans, etc.)
src/: source code for the data (e.g., scripts used to collect or process the data)
test/: tests for the data and source code
report.md: a report describing the analysis of the data

Data Description

This repository contains Dublin Core metadata enhancement tools and workflows designed to improve the quality and completeness of Dublin Core metadata records. The repository includes:

Enhancement Workflows: Reproducible processes for enriching Dublin Core metadata
Validation Tools: Scripts and utilities for quality assurance of enhanced metadata
Documentation: Comprehensive guides and examples for using the enhancement pipelines
Test Data: Sample Dublin Core records for testing and validation purposes

All enhancement workflows are documented and version-controlled to ensure reproducibility. The tools support various Dublin Core metadata formats and can be adapted for different types of digital collections.

Data models and field mappings are documented in the documentation/ directory. All code is released under the AGPL-3.0 license, and data products are released under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

Use

Metadata Enhancement Pipeline

This repository includes an automated metadata enhancement pipeline that generates WCAG 2.2-compliant alternative text for images using OpenAI’s GPT-5 model.

Prerequisites

Python 3.8 or higher
OpenAI API key

Installation

# Install uv (modern Python package manager)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install Python dependencies
uv sync

# Set your OpenAI API key
export OPENAI_API_KEY="your-openai-api-key-here"

Usage

# Enhance metadata from the default source
uv run python enhance_metadata.py

# Run enhancement on remote metadata with a custom metadata URL and output file
uv run python enhance_metadata.py --metadata-url "https://example.com/metadata.json" --output "enhanced_metadata.json"

# Run enhancement on local metadata file
uv run python enhance_metadata.py --metadata-url "data/local_metadata.json" --output "enhanced_local.json"

# Use API key from command line
uv run python enhance_metadata.py --api-key "your-api-key"

# Development commands
uv run pytest                    # Run tests
uvx ty check src/                # Type checking
uv run ruff format .             # Format code with ruff
uv run ruff check .              # Lint code with ruff
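
The flags used in the commands above could be wired up roughly as follows. This is a minimal sketch, not the repository's actual enhance_metadata.py: the function name parse_args, the default output path, and the precedence of --api-key over the OPENAI_API_KEY environment variable are assumptions made for illustration.

```python
import argparse
import os


def parse_args(argv=None, env=None):
    """Parse the CLI flags shown in the usage examples (hypothetical sketch)."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="Enhance Dublin Core metadata")
    # Flag names come from the usage examples; the defaults are assumptions.
    parser.add_argument("--metadata-url", default=None,
                        help="URL or local path of the metadata JSON")
    parser.add_argument("--output", default="enhanced_metadata.json",
                        help="where to write the enhanced metadata")
    parser.add_argument("--api-key", default=None,
                        help="OpenAI API key (overrides OPENAI_API_KEY)")
    args = parser.parse_args(argv)
    # Assumed precedence: explicit flag first, then the environment variable.
    if args.api_key is None:
        args.api_key = env.get("OPENAI_API_KEY")
    if not args.api_key:
        parser.error("no API key: pass --api-key or set OPENAI_API_KEY")
    return args
```

Passing the environment as a parameter keeps the sketch easy to test without touching the real process environment.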

How it works

The enhancement pipeline:

Loads Dublin Core metadata from a JSON source (local file or URL)
Downloads thumbnail images (object_thumb field); the images are pre-optimized by Omeka
Analyzes images using GPT-5 with contextual metadata
Generates WCAG-compliant alternative text in German
Outputs enhanced metadata as JSON
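
The first two steps above can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the helper names load_metadata and records_with_thumbnails are hypothetical, and the sketch assumes the metadata source is a JSON array of record objects.

```python
import json
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def load_metadata(source, timeout=30):
    """Load JSON metadata from an http(s) URL or a local file path."""
    scheme = urlparse(source).scheme
    if scheme in ("http", "https"):
        # Remote source: fetch and decode the response body.
        with urlopen(source, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    # Anything else is treated as a local path.
    return json.loads(Path(source).read_text(encoding="utf-8"))


def records_with_thumbnails(records):
    """Keep only records with a non-empty object_thumb field (assumed list of dicts)."""
    return [record for record in records if record.get("object_thumb")]
```

Records without a thumbnail are filtered out here because there is no image to describe; whether the actual pipeline skips or reports such records is not stated in this README.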

The AI prompt is designed to:

Identify image types (informative, complex diagrams/maps, or text images)
Generate appropriate alt text (max 120-200 characters)
Create long descriptions for complex content when needed
Follow accessibility best practices
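
A length constraint like the one above can also be checked after generation. The following is a minimal sketch: the function check_alt_text is hypothetical, and treating 200 characters (the upper end of the stated range) as a hard limit is an assumption, since this README does not say how the pipeline enforces the constraint.

```python
# Upper end of the 120-200 character range; assumed here to be a hard limit.
MAX_ALT_TEXT_LENGTH = 200


def check_alt_text(alt_text, max_length=MAX_ALT_TEXT_LENGTH):
    """Return a list of problems found in a generated alt text (empty if none)."""
    problems = []
    text = alt_text.strip()
    if not text:
        problems.append("alt text is empty")
    if len(text) > max_length:
        problems.append(
            f"alt text has {len(text)} characters, limit is {max_length}"
        )
    return problems
```

Returning a list of problems rather than raising lets a caller log every issue for a record and decide whether to regenerate the text.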

Output Format

{
    "objectid": "example001",
    "alt_text": "Karte von Basel als befestigte Grenzstadt, umgeben von Breisgau und Sundgau.",
    "longdesc": ""
}
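
One way to produce records in this shape is sketched below. The class name EnhancedRecord and the helper to_json are hypothetical, and writing the output as a JSON array of such records is an assumption; only the three field names are taken from the example above.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class EnhancedRecord:
    """One enhanced record with the three fields shown in the output example."""
    objectid: str
    alt_text: str
    longdesc: str = ""  # empty unless the image needs a long description


def to_json(records):
    """Serialize records to JSON, keeping German umlauts readable."""
    return json.dumps([asdict(r) for r in records], ensure_ascii=False, indent=4)
```

Setting ensure_ascii=False keeps characters such as "ä" or "ü" in the German alt text as-is instead of escaping them.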

Testing

# Run the unittest suite inside the uv-managed environment
uv run python -m unittest test.test_metadata_enhancer

Citation and Data Access

These data are openly available to everyone and can be used for any research or educational purpose. If you use these data in your research, please cite them as specified in CITATION.cff. Further citation formats are available through Zenodo.

Zenodo provides an API (REST and OAI-PMH) to access the data. For example, the following command returns the metadata for the most recent version of the data:

curl -i https://zenodo.org/api/records/ZENODO_RECORD
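
The same request can be made from Python using only the standard library. This is a hedged sketch: the helper names record_url and fetch_record are hypothetical, and ZENODO_RECORD remains a placeholder for the actual record identifier, exactly as in the curl command above.

```python
import json
from urllib.request import urlopen

ZENODO_API = "https://zenodo.org/api/records"


def record_url(record_id):
    """Build the REST API URL for a Zenodo record."""
    return f"{ZENODO_API}/{record_id}"


def fetch_record(record_id, timeout=30):
    """Fetch a record's metadata as a dict (requires network access)."""
    with urlopen(record_url(record_id), timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Replace the placeholder with the real record identifier before running.
    print(record_url("ZENODO_RECORD"))
```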

Support

This project is maintained by @Stadt-Geschichte-Basel. Please understand that we can’t provide individual support via email. We also believe that help is much more valuable when it’s shared publicly, so more people can benefit from it.

Type	Platforms
🚨 Bug Reports	GitHub Issue Tracker
📊 Report bad data	GitHub Issue Tracker
📚 Docs Issue	GitHub Issue Tracker
🎁 Feature Requests	GitHub Issue Tracker
🛡 Report a security vulnerability	See SECURITY.md
💬 General Questions	GitHub Discussions

Roadmap

No changes are currently planned.

Contributing

All contributions to this repository are welcome! If you find errors or problems with the data, or if you want to add new data or features, please open an issue or pull request. Please read CONTRIBUTING.md for details on our code of conduct and the process for submitting pull requests.

Versioning

We use SemVer for versioning. The available versions are listed in the tags on this repository.

Authors and acknowledgment

Moritz Mähr - Initial work - maehr

See also the list of contributors who contributed to this project.

License

The data in this repository is released under the Creative Commons Attribution 4.0 International (CC BY 4.0) License - see the LICENSE-CCBY file for details. By using this data, you agree to give appropriate credit to the original author(s) and to indicate if any modifications have been made.

The code in this repository is released under the GNU Affero General Public License v3.0 - see the LICENSE-AGPL file for details. By using this code, you agree to make any modifications available under the same license.