Art of Life - schema work


=**Schema**=


 * A [|draft of the schema] was posted for public review on Aug 31, 2012. As of Feb 2013 we are still accepting feedback and welcome public input via our [|survey].

**Primary** Audiences for natural history illustrations
as identified in the original NEH proposal
 * 1) Artists
 * 2) Biologists
 * 3) Humanities Scholars
 * 4) Library staff and scholars they serve
 * 5) Education & Outreach

Existing schemas consulted:

 * [|Dublin Core Metadata Initiative] (dc)
 * [|VRA Core]
 * [|VRA schemas]
 * TDWG
 * [|DarwinCore] (dwc)
 * See particularly the [|list of terms]
 * [|Audubon Core]
 * [|CDWA]
 * [|LIDO]: this schema may actually be a better fit for natural history than CDWA. LIDO describes itself as meant for "descriptive information about museum objects. It can be used for all kinds of object, e.g. art, architecture, cultural history, history of technology, and natural history".
 * [|Introductory material]
 * [|International Image Interoperability Framework]
 * W3C:
 * W3C Image Annotation on the Semantic Web
 * Ontology for Media Resource

Things we might want to describe

 * Entire books (as a single PDF, for instance) - no, we determined that entire books are not a use case for this project
 * Plates (one page with multiple illustrations)
 * Illustrations (either by itself, on a page by itself, or as part of a page with other text on it)
 * Photographs (either by itself, on a page by itself, or as part of a page with other text on it)
 * Maps (either by itself, on a page by itself, or as part of a page with other text on it)
 * Graphs - http://www.biodiversitylibrary.org/page/1395071

Potential Fields and Controlled Values
Question: would it be overly optimistic/detailed to give each field a companion "notes" field indicating why that decision was taken? This is of limited but potentially valuable use for the high-level fields (e.g. "Type: illustration"; "TypeNote: Determined by user mrvaidya on June 1, 2012"), but could be //very// useful for some of the other fields (e.g. "Subject: verbatimScientificName: Sepsis annulipes; scientificName: Sepsis annulipes; acceptedNameUsage: Enicita annulipes"; "SubjectNote: The species is named as 'Sepsis annulipes' in the illustration; 'Enicita annulipes' is the current scientific name as per ITIS: http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=144995 downloaded on June 1, 2012"). That way, if anybody disagrees with this judgment call, they can either provide BHL with more recent information or go to ITIS to get the record corrected there.
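A minimal Python sketch of the "notes" idea, in case it helps the discussion. The class name `AnnotatedField` and the "<Field>Note" naming rule are assumptions made for illustration only; nothing here is a decided part of the schema.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class AnnotatedField:
    """One schema field plus an optional note explaining the decision."""

    name: str                   # field name, e.g. "Type"
    value: str                  # chosen value, e.g. "illustration"
    note: Optional[str] = None  # hypothetical "<Field>Note" companion

    def as_pairs(self) -> List[Tuple[str, str]]:
        """Flatten to (key, value) pairs; the note key is name + 'Note'."""
        pairs = [(self.name, self.value)]
        if self.note:
            pairs.append((f"{self.name}Note", self.note))
        return pairs


# Example taken from the question above.
type_field = AnnotatedField(
    name="Type",
    value="illustration",
    note="Determined by user mrvaidya on June 1, 2012",
)
print(type_field.as_pairs())
```

Keeping the note optional means fields without a recorded rationale produce no extra output, which may matter for display in Flickr or Wikimedia.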

To get an idea of what this schema would look like, I tried to put it up on the Wikimedia Commons on an existing BHL image there: http://commons.wikimedia.org/wiki/File:Britishentomologyvolume8Plate245.jpg (permanent link: http://commons.wikimedia.org/w/index.php?title=File:Britishentomologyvolume8Plate245.jpg&oldid=71932983) -- I like how it's turned out, especially with all the notes explaining why certain terms were chosen.

How do we allow for notes, indexing and display in a Flickr or Wikimedia environment? This will be more difficult in these environments, so we may need to reserve it for a specialist UI, i.e. generalists will use Flickr, but we may need another UI for more parsed and detailed data.

Type
All will be //visual resources//, but more specifically //illustration, map, photograph, etc//

Crosswalks to:
 * dc:type
 * vra:worktype
 * lido:objectClassification:objectWorkType:term

Values for type should come from controlled vocabularies, and we may also want to track the vocabulary source and id, e.g. type=paintings (visual works) source=aat id=300033618
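As a rough sketch of tracking the vocabulary source and id alongside the value, something like the following could work. The names `ControlledTerm` and `format_term` are hypothetical; the example values are the ones given above.

```python
from typing import NamedTuple


class ControlledTerm(NamedTuple):
    """A value drawn from a controlled vocabulary."""

    label: str    # display value, e.g. "paintings (visual works)"
    source: str   # vocabulary it came from, e.g. "aat"
    term_id: str  # identifier within that vocabulary


def format_term(field: str, term: ControlledTerm) -> str:
    """Render a term in the 'field=label source=... id=...' style used above."""
    return f"{field}={term.label} source={term.source} id={term.term_id}"


paintings = ControlledTerm("paintings (visual works)", "aat", "300033618")
print(format_term("type", paintings))
# prints: type=paintings (visual works) source=aat id=300033618
```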

Wikimedia Commons has the types listed below. e.g. http://biodiversitylibrary.org/page/32033266; an interesting challenge, is it a drawing or a map? http://biodiversitylibrary.org/page/32557627
 * [|Images]
 * [|Diagrams]
 * [|Drawings] http://biodiversitylibrary.org/page/39628975
 * e.g. of a drawing that is a figure: http://biodiversitylibrary.org/page/15492181
 * [|Maps] ([|Atlas]) Do we call this a map or drawing? http://biodiversitylibrary.org/page/21920173
 * http://www.biodiversitylibrary.org/page/13085795
 * [|Paintings] hand-colored etching http://archive.org/stream/mobot31753000814712#page/11/mode/1up
 * [|Photos] http://biodiversitylibrary.org/page/15492055
 * http://www.biodiversitylibrary.org/page/4686389 documentary photo of an expedition
 * http://www.biodiversitylibrary.org/page/4686866 shark
 * [|Symbols] there may be some in form of [|Monograms] like calligraphic letters at the beginning of a chapter http://www.biodiversitylibrary.org/page/7771352
 * other -
 * fossils http://biodiversitylibrary.org/page/40166251 [[image:http://ia601201.us.archive.org/BookReader/BookReaderImages.php?zip=/20/items/Geologieetpaleo00n/Geologieetpaleo00n_jp2.zip&file=Geologieetpaleo00n_jp2/Geologieetpaleo00n_0023.jp2&scale=37&rotate=0 width="138" height="197"]]

Getty Art and Architecture Thesaurus has:
 * [|Visual Works]
 * [|Drawing]
 * //TBD//

Agent
Would include any person or corporate entity involved in the creation/production of a visual resource. Will need a role qualifier here along with the name and possibly an id from a controlled vocab, e.g. illustrator, engraver, artist, printer, etc. (Collector could also be useful here.) AFAIK, the year of death of the creator is the most important factor in deciding the copyright status of the image, so that would be a great thing to track.

How do we handle the case where the illustrator is not explicitly mentioned, either because [|it's not clear who illustrated what] or because [|no illustrators are mentioned at all]? I think this will be very common, so we will leave the field blank when unknown.

Where do we put the name of the person who said this is species x?

Crosswalks to:
 * dc:creator
 * dc:contributor
 * vra:agent:name[@refid]
 * vra:agent:dates[@type=life]:earliestDate
 * vra:agent:dates[@type=life]:latestDate
 * vra:agent:role
 * lido:event:eventActor:actorInRole:actor:actorID
 * lido:event:eventActor:actorInRole:actor:nameActor:appellationValue
 * lido:event:eventActor:actorInRole:actor:vitalDatesActor:earliestDate
 * lido:event:eventActor:actorInRole:actor:vitalDatesActor:latestDate
 * lido:event:eventActor:actorInRole:roleActor:term
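A sketch of how the Agent crosswalk above might be held as a lookup table. The left-hand field names ("name", "id", "birth_year", "death_year", "role") are hypothetical internal names, the grouping of target paths under them is our reading of the list, and "vra:agent:name" for the bare name is inferred from "vra:agent:name[@refid]". The choice between dc:creator and dc:contributor is left out; the sample agent is made up.

```python
from typing import Dict

LIDO_ACTOR = "lido:event:eventActor:actorInRole:actor"

# Hypothetical internal field -> target path per scheme (paths from the list above).
AGENT_CROSSWALK: Dict[str, Dict[str, str]] = {
    "name": {
        "dc": "dc:creator",
        "vra": "vra:agent:name",  # inferred from vra:agent:name[@refid]
        "lido": f"{LIDO_ACTOR}:nameActor:appellationValue",
    },
    "id": {
        "vra": "vra:agent:name[@refid]",
        "lido": f"{LIDO_ACTOR}:actorID",
    },
    "birth_year": {
        "vra": "vra:agent:dates[@type=life]:earliestDate",
        "lido": f"{LIDO_ACTOR}:vitalDatesActor:earliestDate",
    },
    "death_year": {
        "vra": "vra:agent:dates[@type=life]:latestDate",
        "lido": f"{LIDO_ACTOR}:vitalDatesActor:latestDate",
    },
    "role": {
        "vra": "vra:agent:role",
        "lido": "lido:event:eventActor:actorInRole:roleActor:term",
    },
}


def crosswalk_agent(agent: Dict[str, str], scheme: str) -> Dict[str, str]:
    """Map an agent record to {target path: value} for one scheme.

    Fields with no target in that scheme are silently dropped.
    """
    out: Dict[str, str] = {}
    for field, value in agent.items():
        target = AGENT_CROSSWALK.get(field, {}).get(scheme)
        if target is not None:
            out[target] = value
    return out


# Made-up agent, for illustration only.
agent = {"name": "A. Illustrator", "role": "illustrator", "death_year": "1900"}
print(crosswalk_agent(agent, "vra"))
```

Dropping unmapped fields keeps the sketch short; a real export would probably want to report what was lost, since Dublin Core has no slot for role or life dates in this table.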

Title
May be nothing more than a species name, plate # or handwritten note if there is nothing else on the page. Often there will be no clear title for an image; it could simply be the species name, or the value could come from the Description field. Title is very important in Flickr for identifying which image to choose in a search result, so we will need to give best practices for when there is no obvious title: create the title from the species name or the source?
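One possible fallback rule, sketched in Python purely as a discussion aid. The function name `derive_title` and the priority order (printed title, then species name, then description, then source citation) are assumptions, not an agreed best practice.

```python
from typing import Optional


def derive_title(
    printed_title: Optional[str] = None,
    scientific_name: Optional[str] = None,
    description: Optional[str] = None,
    source_citation: Optional[str] = None,
) -> Optional[str]:
    """Return the first non-empty candidate, in an assumed priority order."""
    for candidate in (printed_title, scientific_name, description, source_citation):
        if candidate and candidate.strip():
            return candidate.strip()
    return None  # nothing usable; leave the title blank


print(derive_title(scientific_name="Zea mays"))
# prints: Zea mays
```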

We planned to keep inscriptions in the "inscriptions" element, but VRA Core 4 does allow us to use vra:title:@type to identify titles which are also inscriptions.

What is the title here? http://www.flickr.com/photos/biodivlibrary/6902381630/sizes/k/in/set-72157629748339153/

Crosswalks to:
 * dc:title
 * vra:title:@type
 * lido:objectIdentification:title:appellationValue

Description
(contextual description) - themes, iconography, allegories that would be described by expert scholars. Will probably want a source qualifier here so that we know whether the description was transcribed directly from the page of the book or added by someone.

Q. What formatting options should we allow in here, and in what formatting language (HTML, [|Markdown], [|something else]?)

Crosswalks to:
 * dc:description
 * vra:description
 * lido:event:description

Source
Bibliographic citation of the book or journal in BHL that contains the illustration. This will be generated automatically by BHL and associated with the image.

Also, we may want a type attribute to say the source is type=book or type=journal, as well as a place to put an identifier for the source, like an ISBN. See the Source element in VRA Core to see how they do it. Use a best practice like APA for formatting the citation.

URL - persistent link to the page of the BHL book where the image is found (we don't see a need to link to the title page of the book). This will also be generated automatically by BHL and associated with the image. Will need an attribute type of "url" or "href". May also want a DOI. What about the page number or plate the image came from? Do we put it in its own field or as part of Source?

VRA Core 4 doesn't support DOI refids. Can we add that for this schema?

Crosswalks to:
 * dc:identifier
 * dc:source
 * vra:source:name[@type] e.g. name=Gascoigne, Bamber, The Great Moghuls, New York: Harper & Row, 1971 type=book
 * vra:source:refid[@type] e.g. refid=06011673 type=ISBN
 * lido:administrativeMetadata/resourceWrap/resourceSet/resourceRepresentation/linkResource?

Date
Q: Do we need a scanned date? Yes for management, but it is more administrative metadata and gets generated automatically while scanning. Should the schema cover administrative metadata or just descriptive? The schema should be more about descriptive metadata.

Q: Is date created the same as date published? Is it the date the illustrator created the image or the date it was published in a book or journal? See this example of a portrait whose date differs from the date the book was published (http://archive.org/stream/mobot31753003125132#page/n18/mode/1up)

Need types = publication, engraved, illustrated, copyright

Should we use VRA Core 4's type="view" ("Date when the image being described was captured (image record only.)") to indicate the scanned date, or should we create our own type="scanned"?

Crosswalks to:
 * dc:dateCreated
 * dc:dateIssued
 * dc:dateScanned (Q: how would a user knowing the dateScanned be beneficial?)
 * vra:date[@type=creation]:earliestDate
 * vra:date[@type=creation]:latestDate
 * lido:event:eventDate:date:earliestDate
 * lido:event:eventDate:date:latestDate
 * lido:event:eventType:term
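To make the typed-date idea concrete, here is a small Python sketch. It uses the four types named above (publication, engraved, illustrated, copyright) and an earliest/latest pair as in the VRA and LIDO crosswalks. The class and function names are hypothetical, a "scanned" type is deliberately left out since that question is still open, and the years in the example are invented.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class DateType(Enum):
    PUBLICATION = "publication"
    ENGRAVED = "engraved"
    ILLUSTRATED = "illustrated"
    COPYRIGHT = "copyright"


@dataclass(frozen=True)
class TypedDate:
    """A date range qualified by what kind of event it records."""

    date_type: DateType
    earliest: int  # year; equal to latest when the date is exact
    latest: int

    def __post_init__(self) -> None:
        if self.earliest > self.latest:
            raise ValueError("earliest must not be after latest")


def dates_of_type(dates: List[TypedDate], wanted: DateType) -> List[TypedDate]:
    """Pick out all dates of one type from a record's date list."""
    return [d for d in dates if d.date_type is wanted]


# Invented years: an image drawn some time before the book that prints it.
dates = [
    TypedDate(DateType.ILLUSTRATED, 1740, 1745),
    TypedDate(DateType.PUBLICATION, 1750, 1750),
]
print(dates_of_type(dates, DateType.PUBLICATION))
```

Holding several typed dates per image is one way to handle the portrait example, where the creation date and the publication date differ.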

Subjects
Do we want to put this into a subjectSet like VRA Core does?

Can we identify parts of the image as containing a particular subject?
 * Like: subject at (15, 15, 30, 30) is {type = "illustrated_life", dwc:scientificName = "Canis lupus familiaris", dwc:vernacularName = "dog"}
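A sketch of what region-level subjects could look like in Python, using the example above. Reading (15, 15, 30, 30) as the pixel corners (x1, y1, x2, y2) is an assumption (it could equally be x, y, width, height), and `RegionSubject` and `subjects_at` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # assumed to mean (x1, y1, x2, y2) in pixels


@dataclass
class RegionSubject:
    """A subject attached to a rectangular part of an image."""

    box: Box
    terms: Dict[str, str]

    def contains(self, x: int, y: int) -> bool:
        """True if the point lies inside the box (edges included)."""
        x1, y1, x2, y2 = self.box
        return x1 <= x <= x2 and y1 <= y <= y2


def subjects_at(regions: List[RegionSubject], x: int, y: int) -> List[Dict[str, str]]:
    """Return the terms of every region covering the given point."""
    return [r.terms for r in regions if r.contains(x, y)]


dog = RegionSubject(
    box=(15, 15, 30, 30),
    terms={
        "type": "illustrated_life",
        "dwc:scientificName": "Canis lupus familiaris",
        "dwc:vernacularName": "dog",
    },
)
print(subjects_at([dog], 20, 20))
```

This is close in spirit to Flickr's Notes, which also attach a description to part of an image.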

 * Scientific names (and rank if possible- e.g. genus, order) for species
 * Both the one in the book (dwc:scientificName) as well as the currently accepted name (dwc:acceptedNameUsage). Q: Would dwc:taxonRemarks and/or dwc:taxonomicStatus be useful? Maybe not at this time.
 * Common names for species (will need a qualifier for language here)
 * dwc:vernacularName (if we're using XML or VRA, we can use "xml:lang" to identify the language here). We may not need to enter and store it if that common name is available somewhere else online.

Crosswalks to:
 * dc:subject
 * dwc:scientificName, dwc:acceptedNameUsage, dwc:vernacularName
 * We could also link to a species RDF document, but it'd probably be worth it to have dwc:scientificName and dwc:vernacularName; //Canis familiaris// can be:
 * http://lsid.tdwg.org/urn:lsid:zoobank.org:act:05C23FE8-F45D-4EA6-A309-46864DE24097
 * http://lsid.tdwg.org/urn:lsid:ubio.org:namebank:113727
 * http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=726821
 * Would we at the very least want an identifier and the source or type of id?
 * vra:subject
 * lido:relations:subject

VRA also has subject type attributes which would be useful, such as:
 * for names: corporateName, familyName, otherName, personalName, scientificName
 * for locations: builtworkPlace, geographicPlace, otherPlace
 * for descriptive, narrative, or thematic content: conceptTopic, descriptiveTopic, iconographicTopic, otherTopic

Some that we think will be applicable to BHL images include the following (example of an image that would contain some of them: http://archive.org/stream/mobot31753003125132#page/n52/mode/1up):
 * 1) personalName (e.g. Linnaeus, Carl)
 * 2) scientificName (e.g. Zea mays)
 * 3) I'm imagining something like: ``
 * 4) geographicPlace (e.g. Costa Rica)
 * 5) builtworkPlace (e.g. Taj Mahal)
 * 6) iconographicTopic (e.g. http://archive.org/stream/mobot31753003125132#page/n292/mode/1up) and (http://archive.org/stream/mobot31753003125132#page/n136/mode/1up) Can use classification systems like Iconclass to describe these themes http://www.iconclass.org/
 * personalName=Sir Tho Millington
 * builtworkPlace=Royal London College of Physicians
 * geographicPlace=London, England

Copyright
Would this be for the book or the original image? e.g. http://www.flickr.com/photos/biodivlibrary/6276431243/in/set-72157629748339153/ Note this image has a copyright date of 1901. What if the book was published after 1922? Which copyright date is applied? e.g. http://archive.org/stream/mobot31753003125132#page/n18/mode/1up

What determines the copyright date for an image? Date of illustration, life dates of the illustrator, date of the published book? We will need copyright fields, but maybe we populate them and not users (i.e. fields may be viewable but not editable). This field probably needs a longer discussion with others' input.

Europeana.eu has a nice public domain calculator to determine whether works are out of copyright in the EU: http://outofcopyright.eu/calculator.html

Inscription
All marks or written words added to the object at the time of production or in its subsequent history, including signatures, dates, dedications, texts, and colophons, as well as marks, such as the stamps of silversmiths, publishers, or printers. Type attributes could be added, such as: signature, mark, caption, date, text, translation, other

Needed? Probably - look at Darwin's annotations as use case

Crosswalks to:
 * vra:inscription
 * lido:objectIdentification:inscriptions
 * lido:objectIdentification:inscriptions:inscriptionsTranscription [lang variants]
 * lido:objectIdentification:inscriptions:inscriptionsDescription

Relationships between images

 * For example, it might be useful to link http://commons.wikimedia.org/wiki/File:Flickr_-_BioDivLibrary_-_n98_w1150_(2).jpg as a coloured version of http://commons.wikimedia.org/wiki/File:Flickr_-_BioDivLibrary_-_n96_w1150_(2).jpg

How do these fields relate to fields in the applications where BHL images will be shared?
Flickr fields (from http://www.flickr.com/services/api/flickr.photos.getInfo.html)
 * Title: Title
 * Description: Description
 * Tags
 * Notes (a description localized to parts of an image)
 * [|EOL machine tags]
 * License
 * It would be a //huge// help to the Wikimedia Commons and other resources if images older than a particular threshold (say, 1850) could be automatically categorized as "No copyright restrictions known" [|under the Flickr Commons scheme], which is (I think?) designed to shield partner organizations from legal risks in case of incorrect copyright assignment, so that might be a good model to follow. If possible.
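The threshold idea in the License item could be sketched like this in Python. It encodes only the "older than a particular year" rule suggested above (with 1850 as the example threshold), not any actual copyright determination; the function name and the choice to return nothing for newer or undated images are assumptions.

```python
from typing import Optional

NO_KNOWN_RESTRICTIONS = "No copyright restrictions known"


def flickr_license_label(
    publication_year: Optional[int],
    threshold_year: int = 1850,  # example threshold from the note above
) -> Optional[str]:
    """Return the Flickr Commons label for images older than the threshold.

    Returns None when the year is unknown or not older than the threshold,
    meaning the image needs a manual decision.
    """
    if publication_year is not None and publication_year < threshold_year:
        return NO_KNOWN_RESTRICTIONS
    return None


print(flickr_license_label(1840))
# prints: No copyright restrictions known
```

Which date to feed in (publication, illustration, or something derived from the illustrator's life dates) is exactly the open question in the Copyright section, so the parameter name here is only a placeholder.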

Wikimedia Commons
 * [|Template:Information] is the general template for Commons metadata
 * [|Template:Artwork] is "for the description of images of artworks, especially those residing in museums or galleries. Originally meant for pictorial art (paintings, and by extension: frescoes, pencils, engravings, stained glass, etc.), but now recommended for sculptures, archeological artifacts, artistic photographs, museum collections, etc."
 * [|Template:Book] is "used to provide formatting to the basic information about books and should be used in place of when a set of images (either multiple files or a single DjVu or PDF) comprise the full text of a book".
 * The Commons' organization is through categories. The category scheme is [|described here], but in a nutshell: any category can belong to any number of other categories. This encourages multiple categorization (so a single image would be categorized by image format, image source, copyright status, source country, source decade, and any number of other categorizations).
 * The Commons uses Creator templates to collect images photographed/illustrated by the same individual: http://commons.wikimedia.org/wiki/Template:Creator

Audiences to give feedback on schema before finalizing

 * Consumers of images: EOL, ARTstor, Wikimedia Commons, iTunes
 * Reps from the 5 primary audiences
 * current BHL users

Other
Q: To what extent do we want the schema to be linked to the semantic web? Identifiers are key to linking to other ids online.

Q: What is the end product for the schema? It would include:
 * A list of elements, subelements, attributes
 * Some best practice guidelines
 * An .xsd document (Gaurav can do)
 * Test environments: Wikimedia and Flickr, but what about a UI to handle the indexing, display and notes? Not sure yet. Do some tag testing in Wikimedia for now and see if we can develop a test interface.

Related projects

 * IMLS Digital Collections and Content - The IMLS DCC Flickr photostream (http://www.flickr.com/people/imlsdcc/) seeks to discover how users interact with DCC content on Flickr, as part of the DCC project's overarching goal of investigating the development of highly useful, meaningful, and usable digital collections.
 * http://commons.wikimedia.org/wiki/Commons:GLAMwiki_toolset_project -- a project with "the goal of providing a set of tools to get materials from GLAM institutions onto Wikimedia Commons in a way that reuse can easily be tracked, and that Commons materials can easily be integrated back into the collection of the original GLAM or even other GLAMs."

Meeting Notes


Do we need a scanned date? Yes for management, but it is more administrative metadata and gets generated automatically while scanning. Should the schema cover administrative metadata or just descriptive? The schema should be more about descriptive metadata. Q: Is date created the same as date published? Is it the date the illustrator created the image or the date it was published in a book or journal? Need types = publication, engraved, illustrated, copyright