Biodiversity Heritage Library Open Data Collection
About the Biodiversity Heritage Library Open Data Collection
All BHL data is available for public use under the CC0 1.0 Universal (CC0 1.0) Public Domain Dedication license. This Creative Commons license allows anyone to reuse, modify, re-purpose, and distribute the data for all purposes including commercial and non-commercial, without the need to ask for permission.
Go ahead, take our data and do something creative with it! If you do repurpose BHL metadata, please share your story with us. We often feature stories of reuse on our BHL blog.
To use this data effectively, it is important to understand how it was cataloged, what format types are available, and what data exists for the named entities in the BHL database. Please consult the definitions below:
Hosted vs. Complete Versions
The Biodiversity Heritage Library Open Data Collection contains two versions of BHL data -- hosted and complete:
• Hosted: contains data that is hosted on BHL servers.
• Complete: contains data that is hosted on BHL servers, plus data that is externally hosted.
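Since the Complete version is described as the Hosted data plus externally hosted data, one way to isolate the externally hosted records is to take the identifiers present in Complete but absent from Hosted. The following is a minimal sketch only: the column name `TitleID` and the sample rows are assumptions made for illustration, so check the header row of the actual TSV export before relying on them.

```python
import csv
import io


def read_ids(tsv_text, id_column):
    """Return the set of values found in id_column of tab-separated text."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return {row[id_column] for row in reader}


def externally_hosted_ids(hosted_tsv, complete_tsv, id_column="TitleID"):
    """IDs present in the Complete export but missing from the Hosted export.

    The default id_column is a hypothetical name, not a documented BHL field.
    """
    return read_ids(complete_tsv, id_column) - read_ids(hosted_tsv, id_column)


# Made-up sample data standing in for the two downloaded TSV files.
hosted_sample = "TitleID\tFullTitle\n1\tFlora A\n2\tFauna B\n"
complete_sample = "TitleID\tFullTitle\n1\tFlora A\n2\tFauna B\n3\tExternal C\n"

external = externally_hosted_ids(hosted_sample, complete_sample)
print(sorted(external))
```

With real files, the same functions could be fed the text of each download, for example via `open(path, encoding="utf-8").read()`.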
Export Formats
The records in this collection are cataloged and made available for export by format type:
• MODS
• BibTeX
• RIS
• TSV
• OCR TXT
Each record contains several dataset distributions for the following named entities in the BHL database:
• BHL Titles: contains bibliographic metadata about the journals and monographs as extracted from the contributing library’s catalog at the time of digitization or applied post-digitization.
• BHL Items: contains information about each bound object (or “book”) digitized from a contributing library. For a serial, journal, or multi-volume monograph, an item represents a volume or multiple volumes bound together. For a single-volume monograph, an item represents the book.
• BHL Creators: contains the names of the authors of each journal and monograph.
• BHL Parts (Segments): contains information about articles, chapters, treatments, and similar parts. These parts may or may not be contained in material scanned by BHL.
• BHL Pages: contains the metadata about the scanned pages from an Item.
• BHL Subjects: contains information about subject headings assigned to each journal and monograph represented in the BHL web portal.
• BHL Names: contains all of the names that have been identified by Global Names Scientific Names Services and the pages on which those names are found.
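As the entity definitions above suggest, each Item belongs to a Title, so the TSV distributions can be combined once loaded. The sketch below groups items under their titles using Python's standard `csv` module. The column names (`TitleID`, `ItemID`, `FullTitle`, `Volume`) and the sample rows are hypothetical and exist only to make the example runnable; the real exports may use different headers.

```python
import csv
import io
from collections import defaultdict


def load_tsv(text):
    """Parse tab-separated text into a list of dicts keyed by header name."""
    return list(
        csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    )


def items_by_title(items, title_key="TitleID"):
    """Group item rows by the title they belong to (title_key is assumed)."""
    grouped = defaultdict(list)
    for item in items:
        grouped[item[title_key]].append(item)
    return dict(grouped)


# Made-up rows standing in for the Titles and Items TSV distributions.
titles_sample = "TitleID\tFullTitle\n10\tExample Journal\n11\tExample Monograph\n"
items_sample = "ItemID\tTitleID\tVolume\n100\t10\tv.1\n101\t10\tv.2\n102\t11\t\n"

titles = load_tsv(titles_sample)
grouped = items_by_title(load_tsv(items_sample))

# Count how many digitized items each title has.
items_per_title = {
    title["FullTitle"]: len(grouped.get(title["TitleID"], []))
    for title in titles
}
print(items_per_title)
```

The same grouping approach would apply to the other distributions, provided they share an identifier column with the entity they describe.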
Additional Information
• Data only includes records for the entities with the status of “Published” in the BHL database.
• Data is refreshed on the first of each month. Only the most recent versions are available due to figshare size constraints.
Data Disclaimer
The data in BHL’s collection is sourced and aggregated from its consortium partners and Internet Archive contributors. It is provided "as is," without express or implied warranty as to accuracy, reliability, or fitness for any particular application. Please see our Data Disclaimer for more information.