BHL and Gaming

=Purposeful gamification of BHL=
Here's a blog post from Chris Freeland, BHL Technical Director, on some basic parameters around the "purposeful gamification" of BHL: []
 * Purposeful Gaming grant awarded Sept 2013

=Potential Tasks that gaming could be helpful with=
BHL needs a mechanism for correcting the tens of millions of pages in its collection for better text indexing and more accurate data mining. BHL has uncorrected OCR from ABBYY FineReader for nearly all of its scanned pages. In some cases the OCR is quite good; in others it is completely unintelligible.
 * Correcting OCR
 * Good OCR: http://biodiversitylibrary.org/page/38386581 (Click View Text)
 * Bad OCR: http://biodiversitylibrary.org/page/358022 (Click View Text)
 * Table of Contents rekeying
 * BHL has more than 300,000 pages tagged as a "Table of Contents" (and more that aren't tagged), which list articles, chapters, and other structural boundaries in BHL scanned books. We'd like to have those pages keyed into a select number of fields so that we can index and find BHL content by article title and article author, and provide a more convenient way of browsing BHL texts online.
 * Example: http://www.biodiversitylibrary.org/page/28005704
 * Rekeying scientific names
 * BHL has OCR for each of its scanned pages. We send that OCR to the TaxonFinder algorithm, which identifies strings that "look like" scientific names, then compares them to NameBank, a list of known names, which is incomplete (no comprehensive list of all the world's scientific names exists). We have more than 90 million strings (as of Feb 2012) that have been identified by the algorithm as possible scientific names, and 76 million of those candidate strings have been matched to a known name. The remaining 14 million candidates are where all the intriguing stuff resides: is it a name that's not in NameBank, is it a misspelled or mis-OCRed string that should match a known name, or is it even a name that has appeared in the published record only once and been lost to science ever since?
 * Here's a simple and relatable way of envisioning the challenge:[[image:ursuspaniculata.gif]]
 * Article-ization
 * BHL already has a UI that allows users to select non-contiguous pages from a scanned volume and bundle those pages into a PDF that's created on the fly and delivered to them via e-mail notification. This is conceptually similar to the Table of Contents rekeying, but still different because not all published journals have Tables of Contents, and in historic literature not all pages for an article were printed together (plates were often printed at the end of an issue).
 * Example: http://www.biodiversitylibrary.org/pdfgen/89028, then enter your e-mail address & click next to select pages.
 * Pagination
 * BHL receives page-level metadata through its scanning collaboration with Internet Archive and the institutions that share their scans with BHL. In some cases, there are no page numbers expressed in the metadata and no indication of the type of page (e.g. Table of Contents, Text, Cover, Illustration, Map, Blank), which makes navigating those books nearly impossible without a time-consuming page-by-page review.
 * Example with no page numbers: http://biodiversitylibrary.org/page/38015709
 * Image identification & extraction
 * BHL has coordinate-based OCR for nearly all of its scanned pages. We'd like to automatically identify the objects within a scanned page that are a "visual resource" (e.g. figures, plates, illuminated texts, tables) and then provide a way to rekey the caption or other descriptive information.
 * Example: http://biodiversitylibrary.org/page/2010582
 * Adding scientific names & common names
 * Related to the two tasks above, BHL staff manually identify illustrations and set page types for particular books of interest and upload those images to Flickr, where they can be tagged & indexed by others within the Flickr community. Of particular interest to our users is being able to find illustrations by scientific name and by common name. When we add a "[|machine tag]" with a scientific name to an image in Flickr, that image is then indexed by the Encyclopedia of Life and made available to its large community of users.
 * Example: http://www.flickr.com/photos/biodivlibrary/6851427763/in/set-72157629257429981/ (be sure to click the "Show Machine Tags" toggle)
 * Authors matching & merging
 * BHL receives its author metadata from multiple library catalogues and systems. We would like to create "canonical" records within BHL for an author, ideally matched to [|VIAF], and then merge variants into that canonical record so that we can pull up all works by a given author, illustrator, or entity.
 * Example: http://www.biodiversitylibrary.org/search.aspx?SearchTerm=smith&SearchCat=A
 * Titles matching & merging
 * Similar to the above, we need to merge duplicate scans into a single title. Duplicate scanning occurs both accidentally (there is no international register of digitized works, and scanning projects are left to their own devices to avoid duplication of effort) and intentionally (two works with different hand-colored illustrations, or a volume with marginalia from a known scientist, as in [|this book which contains handwritten annotations from Charles Darwin]).
 * Example of duplication: http://biodiversitylibrary.org/Search.aspx?searchTerm=flora%20of%20colorado&searchCat=
 * Geolocation (identifying place-name strings and contextualizing them)
 * Similar to name-finding, BHL would like to find place names contained throughout all of its texts.
 * Existing map, built from geocoded strings in subject headings only: http://biodiversitylibrary.org/browse/map ([|Description] of how that map is generated.)
 * Editing volume information & re-sequencing volumes
 * Adding/editing date information
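
To make the "Rekeying scientific names" task above more concrete, here is a minimal sketch of how candidate strings might be triaged before they are handed to players. It is an illustration only, not how TaxonFinder or NameBank actually work: the tiny `KNOWN_NAMES` list, the `triage_candidate` function, and the 0.85 similarity cutoff are all hypothetical, and the fuzzy comparison uses Python's standard `difflib` module rather than any BHL service.

```python
from difflib import get_close_matches

# Hypothetical stand-in for a list of known names. A real list such as
# NameBank would be far larger and would be queried, not hard-coded.
KNOWN_NAMES = ["Ursus arctos", "Ursus maritimus", "Hydrangea paniculata"]


def triage_candidate(candidate, known_names=KNOWN_NAMES, cutoff=0.85):
    """Sort a candidate string into one of three buckets.

    Returns a (bucket, matched_name) tuple where bucket is:
      "exact"     - the string is already in the known-name list
      "near"      - the string closely resembles a known name
                    (possibly a misspelling or an OCR error)
      "unmatched" - nothing similar was found; the string may be a
                    name missing from the list, or may not be a name
    """
    # Collapse runs of whitespace, which OCR output often contains.
    normalized = " ".join(candidate.split())

    if normalized in known_names:
        return ("exact", normalized)

    # Ask for the single most similar known name above the cutoff
    # (an assumed threshold; it would need tuning on real data).
    close = get_close_matches(normalized, known_names, n=1, cutoff=cutoff)
    if close:
        return ("near", close[0])

    return ("unmatched", None)


if __name__ == "__main__":
    for text in ["Ursus arctos", "Ursus arctcs", "Xyzus qwertii"]:
        print(text, "->", triage_candidate(text))
```

Under these assumptions, only the "near" and "unmatched" buckets would need human attention, which is where a game could ask players to confirm a suggested correction or to rekey the name from the page image.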

=Supporting Documentation=
Trish Rose-Sandler, BHL Data Analyst, created a one-page doc for a gamification event in St. Louis. It's a good overview of the tasks we want to accomplish with gaming, and it includes a QR code link to Chris' blog post for his ideas on gaming and BHL. Feel free to edit and reuse it as you see fit in talking with other game designers!
 * [[file:biodivlib/gamification.ppt|gamification.ppt]]

=Contact Us!=
If you would like to engage with BHL about gaming ideas, please contact us via the main [|BHL Feedback], or via [|@BioDivLibrary] or [|Facebook]. We're interested in collaborating with game designers, UI/UX specialists, and other creative people on ways of using games to solve the data challenges made possible through BHL's mass digitization activities, and on finding novel ways of expanding BHL's audience.
