Nov 12-13th 2013


**//Art of Life 2nd Face to Face meeting//**

**//Agenda//**

**//Tuesday Nov 12th//**
Attendees: Trish Rose-Sandler, William Ulate, Mike Lichtenberg, Mike Blomberg, Kyle Jaebker, Guarav Vaidya, Doug Holland, Joel Richard, Chris Freeland
12-1pm Lunch, introductions, volunteers to take notes (William & Guarav)
1-3pm Algorithms
3-4:30pm Classifier
7-9pm Dinner

**//Wednesday Nov 13th//**
Attendees: Trish Rose-Sandler, William Ulate, Mike Lichtenberg, Mike Blomberg, Kyle Jaebker, Guarav Vaidya, Doug Holland, Joel Richard, Rob Guralnick
8-8:30am Breakfast (bagels, cream cheese, fruit, coffee)
8:30-10:00am Schema
10-11:30am Crowdsourcing Descriptions
11:30-1:00 Lunch at the garden (group picture!)
1:15-3pm Sharing descriptions and images
3-4:00pm Wrapup, review schedule of completion


**Action Items from Nov 2013 meeting**

**ALGORITHMS**

ACTION ITEM: Run algorithms on IA server (Kyle)
ACTION ITEM: If we are unable to get algorithms running on IA server, contact BHL Egypt to see if we can run them there (William)
ACTION ITEM: Develop levels of confidence (from high to low) for whether pages contain an illustration. Factors to consider: whether one or more algorithms say the page has an illustration; sum of block coverage; word count per page. E.g. high confidence: both algorithms agree that the page contains an illustration and the sum of block coverage is more than 10%, OR one algorithm says it contains an illustration and the word count is 50% less than the per-page average. We should be able to filter by these levels in the Classifier and they should display in the BHL portal (Trish, Kyle, Joel, Mike L.)
ACTION ITEM: Add word count per page to JSON exports for gold standard set (Kyle, Joel)
ACTION ITEM: Build functionality into BHL portal to allow users to view levels of confidence, particularly which algorithms said a page contains an illustration (Mike L.)
ACTION ITEM: Investigate how we could allow BHL users to give us feedback on which pages contain illustrations. E.g. could we utilize the current PDF generator functionality so that users could click all pages with illustrations and send that file to us as a PDF? Could we track how many users said a page has an illustration so that we convert the page type to illustration only after 2 or 3 users have confirmed that? (Mike L.)
ACTION ITEM: Implement process for identifying new items for future processing by algorithms. E.g. could compare differences between old list and new. (Mike L.)
ACTION ITEM: Investigate why some books are missing pages in MongoDB when they exist in BHL. Kyle will give Mike a dump of JSON from records processed at IMA so far to see if that group is missing pages. He could also give Mike the 40 million files he sent to MBL for upload and Mike could do page counts. (Kyle, Mike L.)
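The confidence example in the algorithms action items could be sketched roughly as follows. This is only an illustration, not an agreed design: the function name, the argument names, the reading of "50% less than the per-page average" as "at most half the average", and the "medium" and "low" fallbacks are all assumptions, since the notes only spell out the high-confidence case.

```python
def confidence_level(votes, block_coverage, word_count, avg_word_count):
    """Return 'high', 'medium' or 'low' for one page.

    votes           -- list of booleans, one per algorithm (True = illustration found)
    block_coverage  -- sum of block coverage as a fraction of the page (0.10 = 10%)
    word_count      -- words on this page
    avg_word_count  -- per-page average word count for the book
    """
    positives = sum(1 for v in votes if v)
    all_agree = positives > 0 and positives == len(votes)
    # Assumption: "50% less than the average" means at most half the average.
    sparse_text = word_count <= 0.5 * avg_word_count

    # High confidence, as described in the notes.
    if all_agree and block_coverage > 0.10:
        return "high"
    if positives >= 1 and sparse_text:
        return "high"

    # Hypothetical lower levels; the notes do not define them yet.
    if positives >= 1:
        return "medium"
    return "low"


print(confidence_level([True, True], 0.15, 300, 300))
print(confidence_level([True, False], 0.02, 290, 300))
```

A function of this shape would let the Classifier and the BHL portal filter pages on one stored value instead of re-deriving it from the raw algorithm output.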

**CLASSIFIER**

ACTION ITEM: Allow administrator to filter by level of confidence in Macaw in order to have users classify in stages (e.g. focus on pages with high confidence first) (Joel)
ACTION ITEM: Review functionality in Macaw related to overall stats and determine if sufficient for administrator needs (Trish)
ACTION ITEM: Continue testing size of JSON file imports into Macaw (Mike L., Joel)
ACTION ITEM: Add check box to Macaw for “no illustration on page” (Joel)
ACTION ITEM: Allow administrator to view pages across multiple books (Joel)
ACTION ITEM: Change list of Types to: map, photograph, illustration, diagram/chart. Determine if b&w and color should be included in type list or as separate checkbox. Add ability to choose types via keystrokes for efficiency (Joel)
ACTION ITEM: Investigate whether functionality should be added to MOBOT’s paginator to enable MOBOT staff to apply similar types as Macaw (so we don’t have classifiers redoing work MOBOT is already doing). It would require changes to Botanicus and we need to determine how BHL portal would pick up this data from IA. (Mike B., Mike L.)

**SCHEMA**

ACTION ITEM: Trish will review Suzanne Pilsk’s feedback and determine what needs to be incorporated into the schema draft.
ACTION ITEM: In order to better differentiate between dwc:scientificName and dwc:acceptedNameUsage we will change the current example from Trifolium ochroleucum Huds to Sepsis annulipes/Thermira annulipes/Enicita annulipes. This example also shows how you can have more than 1 accepted name depending on the source (Guarav)
ACTION ITEM: Add an example of how narrow subject terms can be linked to broader terms, e.g. “Melbourne, Victoria” to “Victoria, Australia” to “Australia” or “Sepsis annulipes” to “Sepsidae” to “Insecta” to “Animalia” (Guarav)
ACTION ITEM: Trish and Guarav will review definitions for VRA Core and Audubon Core to make sure we are matching their definitions (Trish, Guarav)
ACTION ITEM: Review Audubon Core to see if there are any elements we could use. AC has now been ratified by TDWG (Trish)
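As a rough illustration of the narrow-to-broader linking mentioned in the schema action items, the two example chains can be held in a simple lookup and walked upward. The `BROADER` mapping and `broader_chain` function are hypothetical names used only for this sketch; the notes do not say how the schema will actually encode these links.

```python
# Hypothetical lookup: each term points to its immediately broader term.
BROADER = {
    "Melbourne, Victoria": "Victoria, Australia",
    "Victoria, Australia": "Australia",
    "Sepsis annulipes": "Sepsidae",
    "Sepsidae": "Insecta",
    "Insecta": "Animalia",
}


def broader_chain(term, broader=BROADER):
    """Return the term followed by each successively broader term."""
    chain = [term]
    seen = {term}
    while chain[-1] in broader:
        parent = broader[chain[-1]]
        if parent in seen:  # guard against accidental cycles in the data
            break
        chain.append(parent)
        seen.add(parent)
    return chain


print(" > ".join(broader_chain("Melbourne, Victoria")))
print(" > ".join(broader_chain("Sepsis annulipes")))
```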

**CROWDSOURCING DESCRIPTIONS**

ACTION ITEM: Incorporate geographic subjects from BHL portal into records before they are sent to the description tools (Mike L.)
ACTION ITEM: Determine how geographic subjects should map in Flickr, e.g. map to a tag? (Trish, Guarav)
ACTION ITEM: Determine how geographic subjects should map in Wikimedia Commons, e.g. map to a note such as //“possible subjects that go along with this image include: Australia”//? (Trish, Guarav)
ACTION ITEM: When we have uploaded pages to Flickr and WC we want to indicate that in BHL portal (i.e. have hyperlinks directly there) (Mike L.)
ACTION ITEM: If there is a page in BHL that has not been uploaded to Flickr or WC we want to make it easy for BHL users to upload it themselves with the click of a button.
ACTION ITEM: The resulting bibliography page for a Species name search could add info about which pages have illustrations.
ACTION ITEM: Determine which files in Wikimedia Commons we will test extraction of metadata from. E.g. using Guarav’s list of all files in WC that reference BHL URLs, will we only use those which contain “page” in the URL so that we can always know which page it comes from? (Mike L., Trish)
ACTION ITEM: Test extraction of descriptive metadata from Wikimedia Commons via the API (Mike L., Trish)
ACTION ITEM: Determine if we want to also extract WC categories assigned to image files and/or captions added once they are included in articles (Mike L., Trish)
ACTION ITEM: Investigate bulk uploading options, e.g. GLAM Toolset Project; talk with active US chapters such as DC; general inquiries to info@wikimediadc.org
ACTION ITEM: Talk to “copyvio” folks at Wikimedia about a good approach for copyright to be safe. Copyvio are volunteers who oversee copyright problems on WC. (Trish)
ACTION ITEM: Investigate ways to increase interaction and description of BHL images on WC (Trish):
1) Publicize BHL content in WC. The Wikipedia Open Access newsletter goes out once a month and highlights what's new in Wikimedia Commons; we could promote when new pages are up.
2) Create a form for Wikimedia Commons to make it easier for non-Wikimedians to describe things.
3) Could put files into WC categories that indicate they need attention, e.g. “Media with insufficient description” (only 165 files in that category now) or “Media lacking a description” (currently 165,000 files in that category).
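For the action item on choosing which Wikimedia Commons files to test, the idea of keeping only files whose BHL URL contains “page” could be sketched as below. The URL pattern `biodiversitylibrary.org/page/<number>`, the function names, and the sample file titles are assumptions made for illustration; the real list of files is Guarav's and is not reproduced here.

```python
import re

# Assumed shape of a BHL page URL: .../page/<numeric page ID>
PAGE_URL = re.compile(r"biodiversitylibrary\.org/page/(\d+)")


def bhl_page_id(url):
    """Return the numeric BHL page ID in the URL, or None if it is not a page URL."""
    match = PAGE_URL.search(url)
    return int(match.group(1)) if match else None


def files_with_page_ids(file_urls):
    """Keep only files that reference at least one BHL page URL.

    file_urls -- dict mapping a WC file title to the BHL URLs it references
    """
    result = {}
    for title, urls in file_urls.items():
        ids = [pid for pid in (bhl_page_id(u) for u in urls) if pid is not None]
        if ids:
            result[title] = ids
    return result


# Made-up sample data, for illustration only.
sample = {
    "File:Example plate.jpg": ["http://www.biodiversitylibrary.org/page/12345"],
    "File:Example cover.jpg": ["http://www.biodiversitylibrary.org/item/678"],
}
print(files_with_page_ids(sample))
```

Files that only link to an item or title are dropped, because the page they came from cannot be recovered from the URL alone.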