Biodiversity Heritage Library Open Data Collection
About the Biodiversity Heritage Library Open Data Collection
All BHL data is available for public use under the CC0 1.0 Universal (CC0 1.0) Public Domain Dedication license. This Creative Commons license allows anyone to reuse, modify, re-purpose, and distribute the data for all purposes including commercial and non-commercial, without the need to ask for permission.
Go ahead, take our data and do something creative with it! If you do repurpose BHL metadata, please share your story with us. We often feature stories of reuse on our BHL blog.
To use this data effectively, it is important to understand how it was cataloged, what format types are available, and what data exists for the named entities in the BHL database. Please consult the definitions below:
Hosted vs. Complete Versions
The Biodiversity Heritage Library Open Data Collection contains two versions of BHL data, hosted and complete:
• Hosted: contains data that is hosted on BHL servers.
• Complete: contains data that is hosted on BHL servers, plus data that is externally hosted.
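Because the complete version is described as the hosted data plus externally hosted data, one way to isolate the externally hosted records is a set difference between the two exports. The following is a minimal sketch only: the function name and the numeric identifiers are hypothetical, and it assumes both exports expose the same record identifier, which this description does not confirm.

```python
def externally_hosted_ids(hosted_ids, complete_ids):
    """Return identifiers found in the complete export but not in the hosted one."""
    return set(complete_ids) - set(hosted_ids)


# Hypothetical identifiers, for illustration only (not real BHL records).
hosted = {101, 102, 103}
complete = {101, 102, 103, 204, 205}

external = externally_hosted_ids(hosted, complete)
```

If the hosted export is not a strict subset of the complete export in practice, the difference will still run, but the result should be checked before relying on it.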
Export Formats
The records in this collection are cataloged and made available for export by format type:
• MODS
• BibTeX
• RIS
• TSV
• OCR TXT
Each record contains several dataset distributions for the following named entities in the BHL database:
• BHL Titles: contains bibliographic metadata about the journals and monographs as extracted from the contributing library’s catalog at the time of digitization or applied post-digitization.
• BHL Items: contains information about each bound object (or “book”) digitized from a contributing library. For a serial, journal, or multi-volume monograph, an item represents a volume or multiple volumes bound together. For a single-volume monograph, an item represents the book.
• BHL Creators: contains the names of the authors of each journal and monograph.
• BHL Parts (Segments): contains information about articles, chapters, treatments, and similar units. These parts may or may not be contained in material scanned by BHL.
• BHL Pages: contains the metadata about the scanned pages from an Item.
• BHL Subjects: contains information about subject headings assigned to each journal and monograph represented in the BHL web portal.
• BHL Names: contains all of the names that have been identified by Global Names Scientific Names Services and the pages on which those names are found.
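As an illustration of how the entity files relate, the sketch below reads two tab-separated tables and counts the items that belong to each title. It is a hedged example, not a description of the actual export layout: the column names (`TitleID`, `FullTitle`, `ItemID`, `Volume`), the sample rows, and the helper names are all assumptions made for illustration, so check the header row of the real TSV files before adapting it.

```python
import csv
import io
from collections import defaultdict


def read_tsv(handle):
    """Read a tab-separated file with a header row into a list of dicts."""
    # QUOTE_NONE keeps quotation marks inside titles as literal characters.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


def group_items_by_title(items, key="TitleID"):
    """Group item rows under the title identifier they refer to."""
    grouped = defaultdict(list)
    for row in items:
        grouped[row[key]].append(row)
    return dict(grouped)


# Hypothetical sample data standing in for the Titles and Items exports.
SAMPLE_TITLES = "TitleID\tFullTitle\n1\tExample journal\n2\tExample monograph\n"
SAMPLE_ITEMS = "ItemID\tTitleID\tVolume\n10\t1\tv.1\n11\t1\tv.2\n12\t2\t\n"

titles = read_tsv(io.StringIO(SAMPLE_TITLES))
items = read_tsv(io.StringIO(SAMPLE_ITEMS))
by_title = group_items_by_title(items)

# Number of items (bound volumes) per title.
volume_counts = {
    t["FullTitle"]: len(by_title.get(t["TitleID"], [])) for t in titles
}
```

With real exports, the `io.StringIO` objects would be replaced by files opened with `open(path, encoding="utf-8", newline="")`; the file names and encoding are likewise assumptions to verify against the downloaded data.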
Additional Information
• Data only includes records for the entities with the status of “Published” in the BHL database.
• Data is refreshed on the first day of each month. Only the most recent versions are available due to size limitations.
Data Disclaimer
BHL strives to be “the most comprehensive, reliable, reputable repository of data-rich biodiversity literature, and other original materials, to support a response to global challenges.” The data in BHL’s collection is sourced and aggregated from its consortium partners and Internet Archive contributors. It is provided "as is," without express or implied warranty as to accuracy, reliability, or fitness for any particular application. The data for digitized legacy materials is curated to the best of our ability to facilitate the discovery of BHL collection materials on its websites, and support a diverse array of downstream data consumers and discovery layers.
Because BHL is primarily a data aggregator and adaptor, rather than a creator, there may be omissions and inaccuracies in BHL data. Harmful language is also present in some BHL data due to prejudiced views inherent within historical cataloging practices. Because BHL aggregates bibliographic records created over many decades by hundreds of contributing institutions, and because historic cataloging practices relied on predominantly Western classification schemes, BHL’s data may contain reductive, offensive, biased, missing, and outdated terminology.
The processing required to improve data quality can be complex and resource-intensive. BHL’s ability to improve data quality depends on the resources available from our consortium partner institutions. In cases where BHL links out to selected content in external, trusted repositories, BHL does not create, operate, control, or endorse any data on third-party websites. For more information about externally hosted content, please see BHL’s collection development policy.
BHL makes its data available for public use under the CC0 1.0 Universal (CC0 1.0) Public Domain Dedication license. For additional information on how to use BHL data effectively, please refer to the BHL Open Data Collection on Smithsonian’s data repository.