bhl-ocr-20231115.tar.bz2 (40.22 GB)
BHL Optical Character Recognition (OCR) - Full Text Export (new)
dataset
posted on 2023-11-18, 00:30 authored by Joel Richard, Jacqueline Dearborn
The dataset contains a full export of the 60+ million pages of OCR content in the Biodiversity Heritage Library, for items hosted by BHL.
For contextual information and key definitions about this dataset see the Biodiversity Heritage Library Open Data Collection and the data dictionary below.
- Data Dictionary: s.si.edu/bhlocrtxt
- Release Date: the 17th of each month
- Frequency: Monthly
- bureauCode: 452:11
- Access Level: public
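
Because the export is a single 40.22 GB bzip2-compressed tar archive, it may be more practical to read it as a stream than to unpack it fully. The following is a minimal Python sketch of that approach using the standard-library `tarfile` module. The function name `iter_text_members`, the `.txt` suffix, and the UTF-8 encoding are assumptions made for illustration; the actual layout of the archive should be checked against the data dictionary.

```python
import tarfile
from typing import Iterator, Tuple


def iter_text_members(archive_path: str, suffix: str = ".txt") -> Iterator[Tuple[str, str]]:
    """Yield (member_name, text) for each regular file ending in `suffix`.

    Hypothetical helper: the ".txt" suffix and UTF-8 encoding are assumptions,
    not documented properties of the BHL export.
    """
    # "r|bz2" opens the archive as a forward-only stream, so the data is
    # decompressed incrementally and nothing is written to disk.
    with tarfile.open(archive_path, mode="r|bz2") as archive:
        for member in archive:
            # Skip directories, links, and files with other extensions.
            if not member.isfile() or not member.name.endswith(suffix):
                continue
            handle = archive.extractfile(member)
            if handle is None:
                continue
            # In stream mode each member must be read before moving on.
            yield member.name, handle.read().decode("utf-8", errors="replace")
```

Calling `iter_text_members("bhl-ocr-20231115.tar.bz2")` returns a generator, so pages can be processed one at a time and iteration can stop early without reading the rest of the archive.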