Art of Life: Face to Face meeting

Oct 4-5th, 2012, STL


Center for Biodiversity Informatics, Missouri Botanical Garden, St. Louis
4651 Shaw Blvd, St. Louis, MO 63110


The Chase Park Plaza Hotel
212 N. Kingshighway, Saint Louis, MO 63108
*2 rooms have already been reserved for Charlie & Ed. Payment due upon checkout.

Trelease House
4466 Castleman Avenue (corner of Maury), St Louis MO 63110
*2 rooms have been reserved for Rob & Gaurav


Thurs Oct 4th

10:00am - 12:30pm Introductions, Art of Life workflow diagram and discussion
12:30 - 2:00pm Lunch and garden tour
2:00 - 5:00pm Schema update and discussion
5:00 - 7:00pm Charlie and Ed to hotel, Rob & Gaurav to Trelease House
7:15pm Dinner in Central West End

Fri Oct 5th

8:30am : Breakfast at CBI Office
9:00 - 11:00am : Algorithm update and discussion
11:00am - 12:00pm : Defining system requirements for deploying algorithm on BHL Cluster
12:00 - 1:30pm : Lunch (Skype call with Richard from Wikimedia)
1:30 - 3:00pm : Wrap-up/next steps

Meeting Notes

Action Items from Oct 2012 face to face meeting
ACTION ITEM: William will have a call with Martin & Nathan at MBL Woods Hole to discuss use of cluster.
ACTION ITEM: Look into long term hosting of algorithm and results. (William)
ACTION ITEM: Determine how JP2 images will be converted to JPEGs, whether it's Kakadu software, ImageMagick, JasPer, or some other tool (Mike and Ed)
ACTION ITEM: Give Ed list of titles that have already been paginated (Mike)
ACTION ITEM: Extract and display pixel info from the scandata file in the extraction algorithm analyzer UI (Ed)
ACTION ITEM: Test exporting algorithm results as JSON file and then ingesting into portal. (Mike and Ed)
ACTION ITEM: Investigate Orange (Python data mining toolkit) for improving recall (Ed)
ACTION ITEM: Check the compression ratio numbers. (Ed) [I’m not totally sure what this action item means, but it was in relation to the algorithm finding book covers]
ACTION ITEM: Add sum of block coverage for text to Algorithm analyzer UI (Ed)
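
Note on the JSON export/ingest item above: a minimal sketch of what the round trip could look like, assuming the algorithm emits one record per page. The field names (page_id, has_illustration, text_coverage) and the sample values are hypothetical placeholders, not the actual algorithm output or the portal schema.

```python
import json


def results_to_json(results):
    """Serialize a list of per-page result dicts to a JSON string."""
    return json.dumps(results, indent=2, sort_keys=True)


def json_to_index(text):
    """Parse exported JSON and index the records by page ID for ingest."""
    records = json.loads(text)
    return {rec["page_id"]: rec for rec in records}


# Hypothetical sample output for two pages (illustrative values only).
sample = [
    {"page_id": 1001, "has_illustration": True, "text_coverage": 0.12},
    {"page_id": 1002, "has_illustration": False, "text_coverage": 0.87},
]

exported = results_to_json(sample)   # what the algorithm side would write out
ingested = json_to_index(exported)   # what the portal side would read back
```

Indexing by page ID on ingest is one possible choice; it would make it easy to match each result to an existing page record.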

ACTION ITEM: Determine what metrics from extraction belong in the schema (Trish)
ACTION ITEM: Investigate further how to apply GUIDs and whether we need a GUID for the metadata record separate from the BHL page URL (Trish)
ACTION ITEM: Investigate which metadata should be stored in image header. (Trish)
ACTION ITEM: Investigate how we could utilize citizen scientists to complete the classify step (Trish)
ACTION ITEM: Add to classifier functionality the ability to flag a page to be sent to the description tool (Trish)
ACTION ITEM: Investigate further whether the page URL should be its own element as in the Core and update schema if needed (Trish)
ACTION ITEM: Determine which set of images to start with having the least amount of copyright issues in testing uploads to Wikimedia and Flickr (Trish)
ACTION ITEM: Generate list of unique values in MARC 260|a, then review the list for the most frequently recurring cities of publication. Note: should this only cover publications after 1840? Completion of this action item will depend on the copyright strategy for choosing images (Trish and Mike)
ACTION ITEM: Pull geographic subject value from 650|z into the descriptive metadata record within the Classifier UI (Trish)
ACTION ITEM: Get clarification from Chris as to what is meant by “Incorporate existing API to find scientific names on images”. A person tags subject: scientific name; once brought back into BHL, the name is run through uBio. Need to rethink when this step happens. (Trish and William)
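
Note on the MARC 260|a item above: a rough sketch of how the unique-values list and the city counts could be produced, assuming the 260|a strings have already been pulled out of the records into a plain list. The normalization rule (drop square brackets and trailing punctuation) is an assumption about how the values look, and the sample values are made up. Filtering by publication date (260|c) is not shown.

```python
import re
from collections import Counter


def normalize_place(value):
    """Drop square brackets and surrounding punctuation from a 260|a value."""
    return re.sub(r"[\[\]]", "", value).strip(" :;,")


def top_places(values, n=10):
    """Return the n most frequent normalized places as (place, count) pairs."""
    counts = Counter()
    for value in values:
        place = normalize_place(value)
        if place:  # skip values that are empty after cleanup
            counts[place] += 1
    return counts.most_common(n)


# Made-up sample values for illustration.
sample_values = ["London :", "[London] :", "Paris ;", "London,", "  "]
ranked = top_places(sample_values, n=2)
```

The keys of the counter give the list of unique values; the ranked output gives the most recurring cities for review.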

ACTION ITEM: Look into Flickr API to see if we could push full schema from Wikimedia Commons to Flickr machine tags. (Trish)
ACTION ITEM: Investigate further the tools available on the web to verify when an image was last updated on Description platforms and how that info can be used to update image record in BHL portal. (Trish and Mike)
ACTION ITEM: Determine how pages already in Flickr will be updated when those same pages are run through the algorithm and re-identified as having illustrations. (Trish and Mike)
ACTION ITEM: Look into Zooniverse as potential tool for crowdsourcing the description of the images (Trish)
ACTION ITEM: Work with local Wikipedians and others interested in BHL on batch uploading (Trish)
ACTION ITEM: Talk to other large uploaders to Wikimedia as to their experiences (Trish)
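
Note on the Flickr machine tags item above: machine tags take the form namespace:predicate=value. A small sketch of turning a schema record into machine tag strings, assuming a flat record of element name to value. The "artoflife" namespace and the element names are hypothetical, and the name check is a conservative assumption (lowercase letter first, then letters, digits, or underscores) that should be confirmed against the Flickr documentation. No Flickr API call is made here.

```python
import re

# Conservative pattern for namespace and predicate names (assumption).
_NAME = re.compile(r"^[a-z][a-z0-9_]*$")


def to_machine_tags(record, namespace="artoflife"):
    """Build namespace:predicate=value strings from a flat metadata record."""
    if not _NAME.match(namespace):
        raise ValueError(f"invalid namespace: {namespace!r}")
    tags = []
    for element, value in record.items():
        predicate = element.lower()
        if not _NAME.match(predicate):
            raise ValueError(f"invalid predicate: {element!r}")
        value = str(value)
        if " " in value:
            # Quote values containing spaces so they stay a single tag.
            value = f'"{value}"'
        tags.append(f"{namespace}:{predicate}={value}")
    return tags


# Hypothetical record for illustration.
tags = to_machine_tags({"type": "drawing", "subject": "Quercus alba"})
```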

ACTION ITEM: Determine how to pull the URL for an image from Flickr and Wikimedia when pushing the tagged metadata record back into the BHL portal. Find a place to display that info in the portal UI and add text such as “this BHL image has been described in Flickr. If you would like to add or update information about this image please do so at [image URL] and that information will update the record in BHL” (Trish and Mike)
ACTION ITEM: Look into how we can utilize taxonomic name web services to do query term expansion in the BHL portal so that an image tagged with a species name can also be found if a user is searching on higher taxonomic levels such as order or family. (Trish and Mike)
ACTION ITEM: Determine easiest way for ARTstor to pull our data (Trish)
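
Note on the query term expansion item above: a minimal sketch of the idea, assuming a lookup from species name to its higher taxa is available. The LINEAGE dict below is a hypothetical stand-in for whatever a taxonomic name web service would return, and the image IDs are made up.

```python
# Hypothetical stand-in for a taxonomic name service: species -> higher taxa.
LINEAGE = {
    "Quercus alba": ["Quercus", "Fagaceae", "Fagales"],
    "Rosa canina": ["Rosa", "Rosaceae", "Rosales"],
}


def expand_query(query, lineage=LINEAGE):
    """Expand a query to itself plus every species that falls under it."""
    q = query.strip().lower()
    terms = {q}
    for species, ancestors in lineage.items():
        if q in (a.lower() for a in ancestors):
            terms.add(species.lower())
    return terms


def search(image_tags, query, lineage=LINEAGE):
    """Return sorted image IDs whose species tag matches the expanded query."""
    terms = expand_query(query, lineage)
    return sorted(img for img, name in image_tags.items() if name.lower() in terms)


# Made-up tagged images for illustration.
image_tags = {"img1": "Quercus alba", "img2": "Rosa canina"}
family_hits = search(image_tags, "Fagaceae")
species_hits = search(image_tags, "rosa canina")
```

With this approach a search on the family name returns the image tagged only with the species name, which is the behavior the action item describes.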

ACTION ITEM: Review budget for money available for second face to face meeting. (Trish and William)

