Dublin Core Metadata Enhancer
This project provides tools and workflows to enhance Dublin Core metadata records with reproducible enrichment processes. It aims to improve the quality and completeness of Dublin Core metadata through automated enrichment pipelines.
Purpose
The Dublin Core Metadata Enhancer enables:
- Automated Enhancement: Systematic improvement of Dublin Core metadata records
- Reproducible Workflows: Documented and repeatable enhancement processes
- Quality Assurance: Validation and verification of enhanced metadata
- Open Science: Transparent and shareable enhancement methodologies
Scope
This repository contains the source code, documentation, and examples for enhancing Dublin Core metadata records. The enhancement workflows can be applied to various types of digital resources and collections.
For detailed documentation and usage examples, please see the full documentation in this repository.
dublin-core-metadata-enhancer
Enhance Dublin Core records with reproducible enrichment workflows. The data in this repository is openly available to everyone and is intended to support reproducible research.
Repository Structure
The structure of this repository follows the Advanced Structure for Data Analysis of The Turing Way and is organized as follows:
- analysis/: scripts and notebooks used to analyze the data
- assets/: images, logos, etc. used in the README and other documentation
- build/: scripts and notebooks used to build the data
- data/: data files
- documentation/: documentation for the data and the repository
- project-management/: project management documents (e.g., meeting notes, project plans, etc.)
- src/: source code for the data (e.g., scripts used to collect or process the data)
- test/: tests for the data and source code
- report.md: a report describing the analysis of the data
Data Description
This repository contains Dublin Core metadata enhancement tools and workflows designed to improve the quality and completeness of Dublin Core metadata records. The data includes:
- Enhancement Workflows: Reproducible processes for enriching Dublin Core metadata
- Validation Tools: Scripts and utilities for quality assurance of enhanced metadata
- Documentation: Comprehensive guides and examples for using the enhancement pipelines
- Test Data: Sample Dublin Core records for testing and validation purposes
All enhancement workflows are documented and version-controlled to ensure reproducibility. The tools support various Dublin Core metadata formats and can be adapted for different types of digital collections.
Data models and field mappings are documented in the documentation/ directory. All code is released under the AGPL-3.0 license, and data products are released under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.
Use
Metadata Enhancement Pipeline
This repository includes an automated metadata enhancement pipeline that generates WCAG 2.2-compliant alternative text for images using OpenAI’s GPT-5 model.
Prerequisites
- Python 3.8 or higher
- OpenAI API key
Installation
# Install uv (modern Python package manager)
curl -LsSf https://astral.sh/uv/install.sh | sh
# Install Python dependencies
uv sync
# Set your OpenAI API key
export OPENAI_API_KEY="your-openai-api-key-here"
Usage
# Enhance metadata from the default source
uv run python enhance_metadata.py
# Run enhancement on remote metadata with a custom output file
uv run python enhance_metadata.py --metadata-url "https://example.com/metadata.json" --output "enhanced_metadata.json"
# Run enhancement on local metadata file
uv run python enhance_metadata.py --metadata-url "data/local_metadata.json" --output "enhanced_local.json"
# Use API key from command line
uv run python enhance_metadata.py --api-key "your-api-key"
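The options used in the commands above could be parsed along the following lines. This is only a minimal sketch, not the repository's actual implementation: the flag names come from the usage examples, while the function name `parse_args`, the default output path, and the fallback to the `OPENAI_API_KEY` environment variable are assumptions made for illustration.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the command-line options shown in the usage examples."""
    parser = argparse.ArgumentParser(description="Enhance Dublin Core metadata")
    parser.add_argument(
        "--metadata-url",
        default=None,
        help="URL or local path of the metadata JSON",
    )
    # Assumption: the default output file name is a placeholder.
    parser.add_argument(
        "--output",
        default="enhanced_metadata.json",
        help="path of the enhanced JSON file",
    )
    # Assumption: a key given on the command line overrides the environment.
    parser.add_argument(
        "--api-key",
        default=os.environ.get("OPENAI_API_KEY"),
        help="OpenAI API key",
    )
    return parser.parse_args(argv)


# Passing an explicit list keeps the example independent of sys.argv.
args = parse_args(["--metadata-url", "data/local_metadata.json"])
print(args.metadata_url, args.output)
```

Reading the environment variable as the default means the `export OPENAI_API_KEY=...` step from the installation section would keep working without passing `--api-key`.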
# Development commands
uv run pytest # Run tests
uvx ty check src/ # Type checking
uv run ruff format . # Format code with ruff
uv run ruff check . # Lint code with ruff
How it works
The enhancement pipeline:
- Loads Dublin Core metadata from a JSON source (local file or URL)
- Downloads thumbnail images (object_thumb field); the images are pre-optimized by Omeka
- Analyzes images using GPT-5 with contextual metadata
- Generates WCAG-compliant alternative text in German
- Outputs enhanced metadata as JSON
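The first two steps above can be sketched as follows. This is an illustration under stated assumptions, not the pipeline's actual code: it assumes the metadata source is a JSON array of records, and the helper names `load_metadata` and `records_with_thumbnails` are hypothetical. The `object_thumb` and `objectid` field names are taken from this README.

```python
import json
import urllib.request
from pathlib import Path


def load_metadata(source):
    """Load JSON metadata from an http(s) URL or a local file path."""
    if source.startswith(("http://", "https://")):
        with urllib.request.urlopen(source, timeout=30) as response:
            return json.loads(response.read().decode("utf-8"))
    return json.loads(Path(source).read_text(encoding="utf-8"))


def records_with_thumbnails(records):
    """Yield (objectid, thumbnail URL) for records that have an object_thumb."""
    for record in records:
        thumb = record.get("object_thumb")
        if thumb:
            yield record.get("objectid"), thumb
```

Records without a thumbnail are skipped here, since there is no image to describe; how the real pipeline treats such records is not specified in this README.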
The AI prompt is designed to:
- Identify image types (informative, complex diagrams/maps, or text images)
- Generate appropriate alt text (max 120-200 characters)
- Create long descriptions for complex content when needed
- Follow accessibility best practices
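A length check on the generated alt text might look like the sketch below. It reads the "120-200 characters" guidance as a soft target of 120 and a hard limit of 200, which is an interpretation rather than something this README states; the function name `check_alt_text` and the constants are hypothetical.

```python
SOFT_LIMIT = 120  # assumed target length in characters
HARD_LIMIT = 200  # assumed upper bound in characters


def check_alt_text(alt_text):
    """Return 'empty', 'ok', 'long', or 'too_long' based on character count."""
    length = len(alt_text.strip())
    if length == 0:
        return "empty"
    if length <= SOFT_LIMIT:
        return "ok"
    if length <= HARD_LIMIT:
        return "long"
    return "too_long"


print(check_alt_text("Karte von Basel als befestigte Grenzstadt, umgeben von Breisgau und Sundgau."))
```

A result of `long` or `too_long` could be used to flag a record for review or to move detail into the long description.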
Output Format
{
"objectid": "example001",
"alt_text": "Karte von Basel als befestigte Grenzstadt, umgeben von Breisgau und Sundgau.",
"longdesc": ""
}
Testing
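As a sketch of the kind of check a test or a downstream consumer could run against the output format shown above, the following validates a single record. The key names come from the example record; the assumption that all three values are strings, and the helper name `validate_record`, are illustrative only.

```python
import json

REQUIRED_KEYS = {"objectid", "alt_text", "longdesc"}


def validate_record(record):
    """Return a list of problems found in one enhanced record (empty if none)."""
    problems = []
    missing = REQUIRED_KEYS - record.keys()
    if missing:
        problems.append("missing keys: " + ", ".join(sorted(missing)))
    # Assumption: every value is a string, as in the example record.
    for key in sorted(REQUIRED_KEYS & record.keys()):
        if not isinstance(record[key], str):
            problems.append(f"{key} is not a string")
    if not record.get("alt_text"):
        problems.append("alt_text is empty")
    return problems


sample = json.loads(
    '{"objectid": "example001", '
    '"alt_text": "Karte von Basel als befestigte Grenzstadt, '
    'umgeben von Breisgau und Sundgau.", '
    '"longdesc": ""}'
)
print(validate_record(sample))
```

An empty `longdesc` is accepted here because the example record leaves it empty for an image that needs no long description.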
# Run tests
python -m unittest test.test_metadata_enhancer
Citation and Data Access
This data is openly available to everyone and can be used for any research or educational purpose. If you use this data in your research, please cite it as specified in CITATION.cff. Additional citation formats are available through Zenodo.
Zenodo provides an API (REST & OAI-PMH) to access the data. For example, the following command returns the metadata for the most recent version of the data:
curl -i https://zenodo.org/api/records/ZENODO_RECORD
Support
This project is maintained by @Stadt-Geschichte-Basel. Please understand that we can’t provide individual support via email. We also believe that help is much more valuable when it’s shared publicly, so more people can benefit from it.
| Type | Platforms |
|---|---|
| 🚨 Bug Reports | GitHub Issue Tracker |
| 📊 Report bad data | GitHub Issue Tracker |
| 📚 Docs Issue | GitHub Issue Tracker |
| 🎁 Feature Requests | GitHub Issue Tracker |
| 🛡 Report a security vulnerability | See SECURITY.md |
| 💬 General Questions | GitHub Discussions |
Roadmap
No changes are currently planned.
Contributing
All contributions to this repository are welcome! If you find errors or problems with the data, or if you want to add new data or features, please open an issue or pull request. Please read CONTRIBUTING.md for details on our code of conduct and the process for submitting pull requests.
Versioning
We use SemVer for versioning. The available versions are listed in the tags on this repository.
License
The data in this repository is released under the Creative Commons Attribution 4.0 International (CC BY 4.0) License - see the LICENSE-CCBY file for details. By using this data, you agree to give appropriate credit to the original author(s) and to indicate if any modifications have been made.
The code in this repository is released under the GNU Affero General Public License v3.0 - see the LICENSE-AGPL file for details. By using this code, you agree to make any modifications available under the same license.