new_final_txt.tar.gz (111.07 MB)
Smithsonian Annual Reports Text files v2
dataset
Posted on 2022-09-26, 19:19, authored by Rebecca Dikow and Michael Trizna.

Plain-text (.txt) files produced by running Tesseract OCR on Smithsonian Annual Report documents. The JPG images used as input were downloaded from https://library.si.edu/digital-library/collection/smithsonian-legacy-publications. These text files have not been post-processed, so they contain raw OCR output.
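The text files are distributed as a single gzip-compressed tar archive. The sketch below reads them directly from the archive without extracting it to disk. It is not part of the dataset and makes some assumptions: the internal directory layout of `new_final_txt.tar.gz` is not documented here, so it simply yields every member ending in `.txt`, and it assumes the files are UTF-8. Because the OCR output is not post-processed, undecodable bytes are replaced rather than raising an error. The function name `iter_txt_files` is hypothetical.

```python
import tarfile


def iter_txt_files(archive_path):
    """Yield (member name, decoded text) for every .txt file in a .tar.gz archive.

    Hypothetical helper; assumes UTF-8 text and a .txt suffix on each OCR file.
    """
    with tarfile.open(archive_path, mode="r:gz") as tar:
        for member in tar:
            # Skip directories, links, and non-text members.
            if not (member.isfile() and member.name.endswith(".txt")):
                continue
            fh = tar.extractfile(member)
            if fh is None:
                continue
            with fh:
                # Raw OCR output may contain stray bytes; replace them instead of failing.
                text = fh.read().decode("utf-8", errors="replace")
            yield member.name, text
```

For example, `for name, text in iter_txt_files("new_final_txt.tar.gz"): print(name, len(text))` lists each file with its character count. Streaming members one at a time keeps memory use bounded even though the archive is about 111 MB compressed.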