Inspiring discovery through free access to biodiversity knowledge.

The Biodiversity Heritage Library improves research methodology by collaboratively making biodiversity literature openly available to the world as part of a global biodiversity community.
BHL also serves as the literature component of the Encyclopedia of Life.

Purposeful Gaming grant awarded Sept 2013

Purposeful gamification of BHL

Here's a blog post from Chris Freeland, BHL Technical Director, on some basic parameters around the "purposeful gamification" of BHL:

Potential Tasks that gaming could be helpful with

  • Correcting OCR
    • BHL needs a mechanism for correcting the tens of millions of pages in its collection for better text indexing and more accurate data mining. BHL has uncorrected OCR from ABBYY FineReader for nearly all of its scanned pages. In some cases the OCR is quite good; in others it's completely unintelligible.
  • Table of Contents rekeying
    • BHL has more than 300,000 pages tagged as a "Table of Contents" (and more that aren't tagged), which list articles, chapters, and other structural boundaries in BHL scanned books. We'd like to have those pages keyed into a select number of fields so that we can index & find BHL content by article title and article author, and provide a more convenient way of browsing BHL texts online.
    • Example:
  • Rekeying scientific names
    • BHL has OCR for each of its scanned pages. We send that OCR to the TaxonFinder algorithm, which identifies strings that "look like" scientific names, then compares them to NameBank, a list of known names, which is incomplete (there does not exist a comprehensive list of all the world's scientific names). We have more than 90 million strings (as of Feb 2012) that have been identified by the algorithm as a possible scientific name, and 76 million of those candidate strings have been matched to a known name. The remaining 14 million candidates are where all the intriguing stuff resides - is it a name that's not in NameBank, is it a misspelling or misOCRed string that matches to a known name, or even is it a name that's only ever appeared in the published record once & been lost to science ever since?
    • Here's a simple and relatable way of envisioning the challenge: [image: ursuspaniculata.gif]
  • Article-ization
    • BHL already has a UI that allows users to select non-contiguous pages from a scanned volume and bundle those pages into a PDF that's created on the fly and delivered to them via e-mail notification. This is conceptually similar to the Table of Contents rekeying, but still different because not all published journals have Tables of Contents, and in historic literature not all pages for an article were printed together (plates were often printed at the end of an issue).
    • Example:, then enter your e-mail address & click next to select pages.
  • Pagination
    • BHL receives page-level metadata through its scanning collaboration with Internet Archive and the institutions that share their scans with BHL. In some cases, there are no page numbers expressed in the metadata, and no indications of the type of page (e.g. Table of Contents, Text, Cover, Illustration, Map, Blank), which makes navigating those books nearly impossible without a time-consuming page-by-page review.
    • Example with no page numbers:
  • Image identification & extraction
    • BHL has coordinate-based OCR for nearly all of its scanned pages. We'd like to automatically identify the objects within a scanned page that are a "visual resource" (e.g. figures, plates, illuminated texts, tables) and then provide a way to rekey the caption or other descriptive information.
    • Example:
  • Adding scientific names & common names
    • Related to the two tasks above, BHL staff manually identify illustrations and set page types for particular books of interest and upload those images to Flickr, where they can be tagged & indexed by others within the Flickr community. Of particular interest to our users is being able to find illustrations by scientific name and by common name. When we add a "machine tag" with a scientific name to an image in Flickr, that image is then indexed by the Encyclopedia of Life and made available to its large community of users.
    • Example: (be sure to click the "Show Machine Tags" toggle)
  • Authors matching & merging
  • Titles matching & merging
  • Geolocation (identifying place-name strings and contextualizing them)
  • Editing volume information & re-sequencing volumes
  • Adding/editing date information
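
To make the "Rekeying scientific names" task above a bit more concrete, here's a rough sketch of how unmatched candidate strings might be triaged before they're handed to players. This is only an illustration, not how TaxonFinder or NameBank actually work: the tiny KNOWN_NAMES list is a hypothetical stand-in for NameBank, the triage_candidate function and its cutoff value are our own assumptions, and the fuzzy matching uses Python's standard difflib module rather than anything BHL runs in production.

```python
from difflib import get_close_matches

# Hypothetical stand-in for NameBank: a tiny list of known scientific names.
KNOWN_NAMES = [
    "Ursus maritimus",
    "Ursus arctos",
    "Hydrangea paniculata",
    "Quercus alba",
]


def triage_candidate(candidate, known_names=KNOWN_NAMES, cutoff=0.85):
    """Sort a candidate name string into one of three buckets.

    Returns a (status, suggestion) tuple where status is:
      'matched'    - exact (case-insensitive) hit in the known list
      'near_match' - probably a misspelling or mis-OCRed known name
      'unmatched'  - nothing close; a human should take a look
    The cutoff of 0.85 is an assumed value, not a tuned one.
    """
    # Collapse runs of whitespace, which OCR output often contains.
    normalized = " ".join(candidate.split())
    lookup = {name.lower(): name for name in known_names}
    key = normalized.lower()

    if key in lookup:
        return ("matched", lookup[key])

    # difflib scores similarity between 0 and 1; keep only the best hit.
    close = get_close_matches(key, list(lookup), n=1, cutoff=cutoff)
    if close:
        return ("near_match", lookup[close[0]])

    return ("unmatched", None)


if __name__ == "__main__":
    for text in ["Ursus maritimus", "Ursus maritirnus", "Xyzzy plugh"]:
        print(text, "->", triage_candidate(text))
```

In a game, the 'near_match' bucket could become quick yes/no confirmation questions ("is this the same name?"), while the 'unmatched' bucket is where players would rekey the string from the page image.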

Supporting Documentation

Trish Rose-Sandler, BHL Data Analyst, created a one-page doc for a gamification event in St. Louis. It's a good overview of what tasks we want to accomplish with gaming, and it includes a QR code link to Chris' blog post for his ideas on gaming and BHL. Feel free to edit and reuse as you see fit in talking with other game designers!

Contact Us!

If you would like to engage with BHL about gaming ideas, please contact us via the main BHL Feedback, or via @BioDivLibrary or Facebook. We're interested in collaborating with game designers, UI/UX specialists, and other creative people on ways of using games to solve the data challenges made possible through BHL's mass digitization activities, and on finding new ways of expanding BHL's audience.

Revised: lipscombb Oct 21, 2013 6:11 am (10 revisions)
Contributions are licensed under a Creative Commons Attribution Share-Alike 3.0 License.