The Smithsonian Institution

bhl-ocr-20231115.tar.bz2 (40.22 GB)

BHL Optical Character Recognition (OCR) - Full Text Export (new)

posted on 2023-11-18, 00:30 authored by Joel Richard, Jacqueline Dearborn

The dataset contains a full export of the 60+ million pages of OCR content in the Biodiversity Heritage Library, for items hosted by BHL.

For contextual information and key definitions about this dataset see the Biodiversity Heritage Library Open Data Collection and the data dictionary below.

  • Data Dictionary:
  • Release Date: the 17th of each month
  • Frequency: Monthly 
  • bureauCode: 452:11 
  • Access Level: public
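
Because the export ships as a single bzip2-compressed tar archive of about 40 GB, it can be more practical to read it as a stream than to unpack it to disk first. The following is a minimal sketch using Python's standard `tarfile` module. The function name `iter_ocr_files` is hypothetical, and the `.txt` suffix and UTF-8 encoding are assumptions: the internal layout of the archive is not described here, so check the data dictionary before relying on either.

```python
import tarfile
from typing import Iterator, Tuple


def iter_ocr_files(archive_path: str, suffix: str = ".txt") -> Iterator[Tuple[str, str]]:
    """Yield (member_name, text) for each regular file whose name ends in `suffix`.

    Assumptions (not confirmed by the dataset page): OCR pages are stored as
    plain-text members with a ".txt" suffix and are UTF-8 encoded.
    """
    # Mode "r|bz2" opens the archive as a forward-only stream, so members are
    # decompressed and read in order without extracting anything to disk.
    with tarfile.open(archive_path, mode="r|bz2") as archive:
        for member in archive:
            if not member.isfile() or not member.name.endswith(suffix):
                continue
            handle = archive.extractfile(member)
            if handle is None:
                continue
            # Undecodable bytes are replaced rather than raising an error.
            yield member.name, handle.read().decode("utf-8", errors="replace")
```

Calling `iter_ocr_files("bhl-ocr-20231115.tar.bz2")` in a `for` loop would then yield one name and text pair at a time. In stream mode each member has to be read before moving on to the next, which is why the function reads the content inside the loop.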