new_final_txt.tar.gz (111.07 MB)
Smithsonian Annual Reports Text files (not post processed)
dataset
Posted on 2023-01-11, 17:46. Authored by Rebecca Dikow and Michael Trizna.
Txt files resulting from Tesseract OCR of Smithsonian Annual Report documents. The JPGs used as input data were downloaded from https://library.si.edu/digital-library/collection/smithsonian-legacy-publications. These txt files were not post-processed.
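Since the files are distributed as a single gzip-compressed tar archive, a reader may want to stream the text out of it without unpacking everything to disk. The following is a minimal sketch using only the Python standard library. The helper name `iter_txt_files` is hypothetical, and the sketch assumes that the OCR output is stored in members ending in `.txt` and is mostly UTF-8; the archive's internal directory layout is not described in the record, so no particular paths are assumed.

```python
import tarfile
from typing import Iterator, Tuple


def iter_txt_files(archive_path: str) -> Iterator[Tuple[str, str]]:
    """Yield (member name, text) for each .txt file in a .tar.gz archive.

    Hypothetical helper: assumes OCR output lives in members ending in ".txt".
    """
    with tarfile.open(archive_path, mode="r:gz") as archive:
        for member in archive:
            # Skip directories, links, and anything that is not a .txt file.
            if not member.isfile() or not member.name.endswith(".txt"):
                continue
            handle = archive.extractfile(member)
            if handle is None:
                continue
            # Raw OCR output was not post-processed, so tolerate invalid bytes
            # (assumption: UTF-8 with occasional bad sequences).
            yield member.name, handle.read().decode("utf-8", errors="replace")
```

With the archive downloaded locally, `for name, text in iter_txt_files("new_final_txt.tar.gz"): ...` would visit one file at a time, which keeps memory use bounded by the largest single text file rather than by the whole archive.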