Inspiring discovery through free access to biodiversity knowledge.

The Biodiversity Heritage Library improves research methodology by collaboratively making biodiversity literature openly available to the world as part of a global biodiversity community.
BHL also serves as the literature component of the Encyclopedia of Life.


Scanning Processes

Internet Archive Scanning

The vast majority of the content on the Biodiversity Heritage Library site has been scanned in partnership with the Internet Archive (IA). For an overview of the IA scanning process, see the Internet Archive Scanning Process documentation.

The BHL has partnered with the Internet Archive to use scanning facilities in the following locations for member libraries:
  • FedScan (Library of Congress, Washington, DC)
  • Northeast Regional Scanning Facility (Boston Public Library, Boston, MA)
  • San Francisco scanning center of Internet Archive (San Francisco, CA)

Additionally, the following locations maintain individual Internet Archive Scribe scanning stations:
  • Smithsonian Libraries (National Museum of Natural History, Washington, DC)
  • Natural History Museum (London, UK)

The University of Illinois, Urbana-Champaign has also provided BHL with the use of its Scribe machine to digitize material from The Field Museum (Chicago, IL).

The BHL has also harvested relevant content that other Internet Archive partners (e.g. California Digital Library, University of Toronto) have made available via the Internet Archive.
Scribe station (photo by J. Mignault)

Scribe Specifications
The majority of BHL scanning is done on a Scribe machine, developed by the Internet Archive. A Scribe incorporates a superstructure that supports two cameras on sliding tracks, lights (continuous, not flash), a bed and glass platen set at an angle, and a foot pedal for raising and lowering the platen. Cameras are generally (as of 2010) Canon EOS 5D bodies with Canon 100mm 1:2.8 EF Macro lenses. Images are processed using software developed at the Internet Archive (processing includes cropping, rotating/deskewing, and converting from RAW to JPEG2000) and then uploaded to the Internet Archive, where they are further converted to .pdf, .epub, etc.
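The processing steps named above can be sketched as a simple workflow model. This is only an illustration, not the Internet Archive's software: it does not touch pixel data, the order of the steps is assumed to follow the order in which they are listed, and the names PageImage, apply_step, process_page, derivative_names and the file stem "book_0001" are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class PageImage:
    """Record of one page capture (hypothetical model, no pixel data)."""
    name: str                    # hypothetical file stem
    fmt: str = "RAW"             # captures start out as camera RAW
    steps: Tuple[str, ...] = ()  # processing steps applied so far


def apply_step(page: PageImage, step: str, fmt: Optional[str] = None) -> PageImage:
    """Return a copy of the record with one more step logged."""
    return replace(page, fmt=fmt or page.fmt, steps=page.steps + (step,))


def process_page(page: PageImage) -> PageImage:
    """Apply the steps named in the text; the order is an assumption."""
    page = apply_step(page, "crop")
    page = apply_step(page, "deskew")
    page = apply_step(page, "convert", fmt="JPEG2000")
    return page


# Derivative formats named in the text (the text's "etc." is left open).
DERIVATIVES = ("pdf", "epub")


def derivative_names(book_id: str) -> list:
    """List the derivative file names for a book identifier."""
    return [f"{book_id}.{ext}" for ext in DERIVATIVES]


page = process_page(PageImage("book_0001"))
print(page.fmt, page.steps)
print(derivative_names("book_0001"))
```

Keeping each step as a separate logged entry makes it easy to see which processing a given page has already received before upload.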

Internet Archive Scanning Process documentation

Non-Internet Archive Scanning

The individual members of BHL have a long history of digitization. Currently, the BHL portal contains only a small portion of this content. BHL is actively working to integrate this content into the portal.

The Missouri Botanical Garden Library (MOBOT) is the largest supplier of content from non-Internet Archive scanning processes. MOBOT is actively scanning rare, fragile, and oversize material for direct deposit into the BHL portal.

Missouri Botanical Garden - Peter H. Raven Library
The Peter H. Raven Library at the Missouri Botanical Garden currently has five book scanning stations used for scanning and uploading content to BHL. The majority of the library's book scanning takes place on three Indus 5002 book scanning machines. The Indus 5002 is essentially an all-in-one copy stand: books are opened by hand on a flat surface while a continuous fluorescent light source illuminates the bed. A camera housing above the bed rapidly scans the book and transfers the images to the companion BCS-2 software for post-processing. The library also uses a Kodak i260 sheet-feed scanner. It is fast, but it can scan only unbound material, which limits its use. The latest addition to the library's scanning equipment is a Leaf Aptus II 10R digital camera back attached to a Cambo large-format camera, which is mounted on a large Kaiser motorized copy stand. This setup provides excellent image quality and instant image capture for oversize material that cannot be scanned on the Indus machines, and it is also used for imaging precious and fragile volumes from the library's extensive rare book collection.

Smithsonian Libraries
In 2010, Smithsonian Institution Libraries (SIL) began scanning oversize content (folios) that could not be scanned on a Scribe machine and uploading it to the Internet Archive. SIL is using a PhaseOne P65+ camera for imaging, CaptureOne software for image processing, and a locally developed web-based system called Macaw for workflow and metadata management. Macaw (among other things) processes and packages the images with the various types of metadata necessary for creating digital book objects in BHL. SIL hopes to make Macaw available to BHL partners, and anyone else who is interested, as open-source software sometime in 2011. A white paper describing the process is located here.
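The idea of packaging page images together with metadata can be illustrated with a small sketch. This is not Macaw's actual format or code: the function build_manifest, the field names (book_id, title, page_count, pages, sequence, image) and the sample file names are all hypothetical, and the sketch assumes that page order follows sorted file names.

```python
import json


def build_manifest(book_id, title, image_files):
    """Pair each page image with a sequence number (hypothetical layout)."""
    pages = [
        {"sequence": number, "image": name}
        # Assumption: sorted file names reflect reading order.
        for number, name in enumerate(sorted(image_files), start=1)
    ]
    return {
        "book_id": book_id,
        "title": title,
        "page_count": len(pages),
        "pages": pages,
    }


# Hypothetical sample data, deliberately out of order.
manifest = build_manifest("example-folio", "Example folio", ["p002.jp2", "p001.jp2"])
print(json.dumps(manifest, indent=2))
```

A single manifest of this kind keeps book-level and page-level metadata next to the image list, so a downstream system can build the digital book object from one file.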

Created: dukeg1 Jan 21, 2010 7:50 am
Revised: chapmanje Apr 27, 2015 11:40 am (20 revisions)
Contributions are licensed under a Creative Commons Attribution Share-Alike 3.0 License.