The Smithsonian Institution

bhl-ocr-20240515.tar.bz2 (40.64 GB)

BHL Optical Character Recognition (OCR) - Full Text Export (new)

Version 18 2024-05-18, 05:29
Version 17 2024-04-19, 15:25
Version 16 2024-03-18, 12:51
Version 15 2024-02-18, 04:22
Version 14 2024-01-18, 18:57
Version 13 2023-12-17, 22:33
Version 12 2023-11-18, 00:30
Version 11 2023-08-30, 05:20
Version 10 2023-06-18, 18:34
Version 9 2023-05-20, 11:12
Version 8 2023-04-17, 00:40
Version 7 2023-03-16, 21:36
Version 6 2023-02-16, 23:44
Version 5 2023-01-27, 00:16
Version 4 2022-12-17, 19:44
Version 3 2022-11-27, 13:39
Version 2 2022-11-21, 15:49
Version 1 2022-11-16, 14:16
dataset
posted on 2024-05-18, 05:29, authored by Joel Richard, Jacqueline Dearborn

The dataset contains a full export of the 60+ million pages of OCR content in the Biodiversity Heritage Library, for items hosted by BHL.

For contextual information and key definitions about this dataset see the Biodiversity Heritage Library Open Data Collection and the data dictionary below.

  • Data Dictionary: s.si.edu/bhlocrtxt
  • Release Date: the 17th of each month
  • Frequency: Monthly 
  • bureauCode: 452:11 
  • Access Level: public
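
Because the export ships as a single bzip2-compressed tar archive (bhl-ocr-20240515.tar.bz2, 40.64 GB), it may be convenient to inspect it as a stream rather than unpack it in full. The following is a minimal sketch using Python's standard `tarfile` module. The helper name `iter_members` and its `limit` parameter are hypothetical, and the internal layout of the archive is not described on this page, so the code only reports member names and sizes; consult the data dictionary for the actual structure.

```python
import tarfile
from itertools import islice
from typing import Iterator, Optional, Tuple


def iter_members(path: str, limit: Optional[int] = None) -> Iterator[Tuple[str, int]]:
    """Yield (name, size in bytes) for regular files in a .tar.bz2 archive.

    Hypothetical helper for illustration. The "r|bz2" mode reads the
    archive sequentially as a stream, so nothing is extracted to disk.
    """
    with tarfile.open(path, mode="r|bz2") as archive:
        # Skip directories and other non-file entries.
        files = (member for member in archive if member.isfile())
        # islice stops after `limit` files; limit=None walks the whole archive.
        for member in islice(files, limit):
            yield member.name, member.size
```

Calling `iter_members("bhl-ocr-20240515.tar.bz2", limit=10)` would list the first ten files it encounters. Stream mode cannot seek backwards, so a full pass still decompresses the whole archive once.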

    Smithsonian Libraries and Archives