Inspiring discovery through free access to biodiversity knowledge.

The Biodiversity Heritage Library improves research methodology by collaboratively making biodiversity literature openly available to the world as part of a global biodiversity community.
BHL also serves as the literature component of the Encyclopedia of Life.


Revised: trosesandler1 May 31, 2013 9:04 am (106 revisions)


A draft of the schema was posted for public review on Aug 31, 2012. As of Feb 2013 we are still accepting feedback and would welcome public input via our survey.

Primary Audiences for natural history illustrations

as identified in the original NEH proposal
  1. Artists
  2. Biologists
  3. Humanities Scholars
  4. Library staff and scholars they serve
  5. Education & Outreach

Existing schemas consulted:

Things we might want to describe

  • Entire books (as a single PDF, for instance): no, we determined that entire books are not a use case for this project
  • Plates (one page with multiple illustrations)
  • Illustrations (either by itself, on a page by itself, or as part of a page with other text on it)
  • Photographs (either by itself, on a page by itself, or as part of a page with other text on it)
  • Maps (either by itself, on a page by itself, or as part of a page with other text on it)
  • Graphs

Potential Fields and Controlled Values

Question: would it be overly optimistic or detailed to give each field a "notes" field indicating why that decision was made? This is of limited but potentially valuable use for the high-level fields (e.g. "Type: illustration"; "TypeNote: Determined by user mrvaidya on June 1, 2012"), but could be *very* useful for some of the other fields (e.g. "Subject: verbatimScientificName: Sepsis annulipes; scientificName: Sepsis annulipes; acceptedNameUsage: Enicita annulipes"; "SubjectNote: The species is named as 'Sepsis annulipes' in the illustration; 'Enicita annulipes' is the current scientific name as per ITIS: downloaded on June 1, 2012"). That way, if anybody disagrees with this judgment call, they can either provide BHL with more recent information or go to ITIS to get the record corrected there.
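A minimal sketch of the per-field notes idea, assuming each field value is stored together with an optional note. `AnnotatedValue` and `render` are hypothetical names, not part of any existing schema; the values are the ones from the examples above, and the "fieldNote" output line mirrors the "TypeNote"/"SubjectNote" pattern.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class AnnotatedValue:
    """A field value plus an optional note recording why it was chosen."""
    value: str
    note: Optional[str] = None


# Hypothetical record: field names come from the examples above, but the flat
# layout (one AnnotatedValue per field) is an assumption made for illustration.
record: Dict[str, AnnotatedValue] = {
    "type": AnnotatedValue("illustration",
                           "Determined by user mrvaidya on June 1, 2012"),
    "verbatimScientificName": AnnotatedValue("Sepsis annulipes"),
    "scientificName": AnnotatedValue("Sepsis annulipes"),
    "acceptedNameUsage": AnnotatedValue(
        "Enicita annulipes",
        "Named 'Sepsis annulipes' in the illustration; 'Enicita annulipes' "
        "is the current scientific name as per ITIS: downloaded on June 1, 2012"),
}


def render(rec: Dict[str, AnnotatedValue]) -> str:
    """Emit 'field: value' lines, each followed by 'fieldNote: ...' when a note exists."""
    lines = []
    for name, annotated in rec.items():
        lines.append(f"{name}: {annotated.value}")
        if annotated.note:
            lines.append(f"{name}Note: {annotated.note}")
    return "\n".join(lines)


print(render(record))
```

Keeping the note next to the value (rather than in a separate notes table) means a crosswalk that drops notes, such as a plain Flickr upload, can simply ignore the `note` attribute.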

To get an idea of what this schema would look like, I tried applying it to an existing BHL image on Wikimedia Commons (permanent link: ). I like how it turned out, especially with all the notes explaining why certain terms were chosen.

How do we allow for notes, indexing, and display in a Flickr or Wikimedia environment? This will be more difficult in those environments, so it may need to be reserved for a specialist UI; i.e., generalists will use Flickr, but a different UI may be needed for more parsed and detailed data.

Type

All will be visual resources, but more specifically illustrations, maps, photographs, etc.

Cross walks to:
  • dc:type
  • vra:worktype
  • lido:objectClassification:objectWorkType:term

Values for type should come from controlled vocabularies, and we may also want to track the vocabulary source and ID, e.g.:
type=paintings (visual works) source=aat id=300033618
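A minimal sketch of how the type value, vocabulary source, and ID from the example above could be carried together, assuming VRA Core 4's global `vocab` and `refid` attributes on `worktype` and assuming the namespace URI shown (check both against the published VRA XSD). `worktype_element` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# VRA Core 4 namespace URI (assumed here; check it against the published XSD).
VRA_NS = "http://www.vraweb.org/vracore4.htm"
ET.register_namespace("vra", VRA_NS)


def worktype_element(term: str, vocab: str, refid: str) -> ET.Element:
    """Build a <vra:worktype> whose text is the preferred term and whose
    vocab/refid attributes record which vocabulary entry it came from."""
    el = ET.Element(f"{{{VRA_NS}}}worktype", {"vocab": vocab, "refid": refid})
    el.text = term
    return el


worktype = worktype_element("paintings (visual works)", "AAT", "300033618")
print(ET.tostring(worktype, encoding="unicode"))
```

For the dc:type crosswalk only the term text would be copied over; the vocabulary and ID stay in the richer VRA/LIDO record.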

Wikimedia Commons has:

Getty Art and Architecture Thesaurus has:

Creator

Includes any person or corporate entity involved in the creation/production of a visual resource.
We will need a role qualifier (e.g. illustrator, engraver, artist, printer) along with the name and possibly an ID from a controlled vocabulary. Collector could also be a useful role.
AFAIK, the year of death of the creator is the most important factor in deciding the copyright status of the image, so that would be a great thing to track.

How do we handle the case where the illustrator is not explicitly mentioned, either because it's not clear who illustrated what or because no illustrators are mentioned at all? I think this will be very common, so we will leave the field blank when the illustrator is unknown.

Where do we put the name of the person who identified the subject as species x?

Crosswalks to:
  • dc:creator
  • dc:contributor
  • vra:agent:name[@refid]
  • vra:agent:dates[@type=life]:earliestDate
  • vra:agent:dates[@type=life]:latestDate
  • vra:agent:role
  • lido:event:eventActor:actorInRole:actor:actorID
  • lido:event:eventActor:actorInRole:actor:nameActor:appellationValue
  • lido:event:eventActor:actorInRole:actor:vitalDatesActor:earliestDate
  • lido:event:eventActor:actorInRole:actor:vitalDatesActor:latestDate
  • lido:event:eventActor:actorInRole:roleActor:term
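A minimal sketch of the VRA side of this crosswalk, assuming a simple `Agent` record (hypothetical) and the assumed VRA Core 4 namespace. Life dates are omitted entirely when unknown, matching the "leave blank when unknown" decision above. The dc:creator vs dc:contributor split in `dc_element_for` is a hypothetical rule for illustration, not something the group has decided; "Doe, Jane" is placeholder data.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

VRA_NS = "http://www.vraweb.org/vracore4.htm"  # assumed VRA Core 4 namespace
ET.register_namespace("vra", VRA_NS)


def vra(tag: str) -> str:
    """Qualify a tag name with the VRA namespace."""
    return f"{{{VRA_NS}}}{tag}"


@dataclass
class Agent:
    name: str
    role: str                          # e.g. illustrator, engraver, printer
    birth_year: Optional[int] = None
    death_year: Optional[int] = None   # relevant to the copyright question
    refid: Optional[str] = None        # ID from a name authority, if known


def agent_element(agent: Agent) -> ET.Element:
    """Build a <vra:agent> following the vra:agent crosswalk paths listed above."""
    el = ET.Element(vra("agent"))
    name = ET.SubElement(el, vra("name"))
    name.text = agent.name
    if agent.refid:
        name.set("refid", agent.refid)
    # Leave life dates out entirely when unknown instead of guessing.
    if agent.birth_year is not None or agent.death_year is not None:
        dates = ET.SubElement(el, vra("dates"), {"type": "life"})
        if agent.birth_year is not None:
            ET.SubElement(dates, vra("earliestDate")).text = str(agent.birth_year)
        if agent.death_year is not None:
            ET.SubElement(dates, vra("latestDate")).text = str(agent.death_year)
    ET.SubElement(el, vra("role")).text = agent.role
    return el


def dc_element_for(agent: Agent) -> str:
    """Hypothetical rule: illustrators become dc:creator, other roles dc:contributor."""
    return "dc:creator" if agent.role == "illustrator" else "dc:contributor"


engraver = Agent("Doe, Jane", "engraver", birth_year=1801, death_year=1880)
print(dc_element_for(engraver), ET.tostring(agent_element(engraver), encoding="unicode"))
```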

Title

The title may be nothing more than a species name, plate number, or handwritten note if there is nothing else on the page. Often there will be no clear title for an image; it could simply be the species name, or the value could come from the Description field. Title is very important in Flickr for identifying which image to choose in a search result, so we will need to give best practices for when there is no obvious title: should we create the title from the species name or from the source?
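One possible best-practice rule for the missing-title question, sketched as a function. The fallback order (explicit title, then species name plus plate, then source, then a generic placeholder) and the output wording are assumptions for discussion, not an agreed rule; `choose_title` is a hypothetical name.

```python
from typing import Optional


def choose_title(explicit_title: Optional[str],
                 scientific_name: Optional[str],
                 plate: Optional[str],
                 source_title: Optional[str]) -> str:
    """Pick a display title using a hypothetical fallback order:
    explicit title, species name (plus plate if known), source, placeholder."""
    if explicit_title and explicit_title.strip():
        return explicit_title.strip()
    if scientific_name:
        return f"{scientific_name} (Plate {plate})" if plate else scientific_name
    if source_title:
        return f"Illustration from {source_title}"
    return "Untitled illustration"


print(choose_title(None, "Zea mays", "12", None))
```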

We planned to keep inscriptions in the "inscriptions" element, but VRA Core 4 does allow us to use vra:title:@type to identify titles which are also inscriptions.

What is the title here?

Crosswalks to:
  • dc:title
  • vra:title:@type
  • lido:objectIdentification:title:appellationValue

Description

Contextual description: themes, iconography, and allegories, which would be supplied by expert scholars.
We will probably want a source qualifier here so that we know whether the description was transcribed directly from the page of the book or added by someone.

Q. What formatting options should we allow in here, and in what formatting language (HTML, Markdown, something else?)

Crosswalks to:
  • dc:description
  • vra:description
  • lido:event:description

Source

Bibliographic citation of the book or journal in BHL that contains the illustration. This will be generated automatically by BHL and associated with the image.

We may also want a type attribute to say whether the source is a book (type=book) or a journal (type=journal), as well as a place to put an identifier for the source, such as an ISBN. See the Source element in VRA Core to see how they do it. Use a best practice such as APA for formatting the citation.

URL: persistent link to the page of the BHL book where the image is found (we don't see a need to link to the title page of the book). This will also be generated automatically by BHL and associated with the image. It will need a type attribute of "url" or "href". We may also want a DOI. What about the page number or plate the image came from? Do we put it in its own field or as part of Source?

VRA Core 4 doesn't support DOI refids. Can we add that for this schema?

Crosswalks to:
  • dc:identifier
  • dc:source
  • vra:source:name[@type] e.g. name=Gascoigne, Bamber, The Great Moghuls, New York: Harper & Row, 1971 type=book
  • vra:source:refid[@type] e.g. refid=06011673 type=ISBN
  • lido:administrativeMetadata/resourceWrap/resourceSet/resourceRepresentation/linkResource?
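A minimal sketch of a VRA-style source element built from the citation and ISBN example above, assuming the VRA Core 4 namespace shown. Using a second `refid` with `type="URI"` for the persistent BHL page link is an assumption; DOI is left out because, as noted above, VRA Core 4 doesn't appear to support it. `source_element` and its `page_url` parameter are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

VRA_NS = "http://www.vraweb.org/vracore4.htm"  # assumed VRA Core 4 namespace
ET.register_namespace("vra", VRA_NS)


def vra(tag: str) -> str:
    """Qualify a tag name with the VRA namespace."""
    return f"{{{VRA_NS}}}{tag}"


def source_element(citation: str, source_type: str,
                   isbn: Optional[str] = None,
                   page_url: Optional[str] = None) -> ET.Element:
    """Build a <vra:source> with a typed citation and optional identifiers."""
    src = ET.Element(vra("source"))
    ET.SubElement(src, vra("name"), {"type": source_type}).text = citation
    if isbn:
        ET.SubElement(src, vra("refid"), {"type": "ISBN"}).text = isbn
    if page_url:
        # Persistent link to the BHL page; "URI" is assumed to be an acceptable refid type.
        ET.SubElement(src, vra("refid"), {"type": "URI"}).text = page_url
    return src


el = source_element("Gascoigne, Bamber, The Great Moghuls, New York: Harper & Row, 1971",
                    "book", isbn="06011673")
print(ET.tostring(el, encoding="unicode"))
```

ElementTree escapes the "&" in "Harper & Row" on output, so citations can be stored as plain text.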

Date

Q: Do we need a scanned date? Yes, for management, but it is more like administrative metadata and is often generated automatically during scanning. Should the schema cover administrative metadata or just descriptive metadata? The schema should focus on descriptive metadata.

Q: Is the Created date the same as the published date? Is it the date the illustrator created the image or the date it was published in a book or journal? See this example of a portrait whose date differs from the date the book was published:

We need date types: publication, engraved, illustrated, copyright.

Should we use VRA Core 4's type="view" ("Date when the image being described was captured (image record only.)") to indicate the scanned date, or should we create our own type="scanned"?

Crosswalks to:
  • dcterms:created
  • dcterms:issued
  • dc:dateScanned (Q: how would knowing the scanned date benefit a user?)
  • vra:date[@type=creation]:earliestDate
  • vra:date[@type=creation]:latestDate
  • lido:event:eventDate:date:earliestDate
  • lido:event:eventDate:date:latestDate
  • lido:event:eventType:term
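A minimal sketch of typed dates as earliest/latest ranges, following the vra:date crosswalk above and the assumed VRA Core 4 namespace. "creation", "view", and "publication" are, as far as I know, VRA Core 4 date types; "engraved", "illustrated", and "copyright" are the types proposed above and would be extensions. `date_element` is hypothetical, and the portrait dates are placeholders.

```python
import xml.etree.ElementTree as ET
from typing import Optional

VRA_NS = "http://www.vraweb.org/vracore4.htm"  # assumed VRA Core 4 namespace
ET.register_namespace("vra", VRA_NS)

# VRA types plus the proposed extensions (engraved, illustrated, copyright).
DATE_TYPES = {"creation", "view", "publication", "engraved", "illustrated", "copyright"}


def vra(tag: str) -> str:
    """Qualify a tag name with the VRA namespace."""
    return f"{{{VRA_NS}}}{tag}"


def date_element(date_type: str, earliest: str, latest: Optional[str] = None) -> ET.Element:
    """Build <vra:date type=...>; a single date is stored as a range with equal ends."""
    if date_type not in DATE_TYPES:
        raise ValueError(f"unknown date type: {date_type!r}")
    el = ET.Element(vra("date"), {"type": date_type})
    ET.SubElement(el, vra("earliestDate")).text = earliest
    ET.SubElement(el, vra("latestDate")).text = latest if latest is not None else earliest
    return el


# Hypothetical portrait drawn in the 1790s but published in 1801.
drawn = date_element("illustrated", "1790", "1799")
published = date_element("publication", "1801")
for el in (drawn, published):
    print(ET.tostring(el, encoding="unicode"))
```

Recording separate typed dates sidesteps the "Is Created the same as published?" question: both can be kept, and each crosswalk picks the type it needs.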

Subject

Do we want to put this into a subjectSet like VRA Core does?

Can we identify parts of the image as containing a particular subject?
  • Like: subject at (15, 15, 30, 30) is {type = "illustrated_life", dwc:scientificName = "Canis lupus familiaris", dwc:vernacularName = "dog"}

  • Scientific names (and rank if possible, e.g. genus, order) for species
    • Both the name used in the book (dwc:scientificName) and the currently accepted name (dwc:acceptedNameUsage). Q: Would dwc:taxonRemarks and/or dwc:taxonomicStatus be useful? Maybe not at this time.
  • Common names for species (will need a qualifier for language here)
    • dwc:vernacularName (if we're using XML or VRA, we can use "xml:lang" to identify the language). We may not need to enter and store the common name if it is available elsewhere online.
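A minimal sketch of region-level subjects like the "(15, 15, 30, 30)" example above. The coordinates are assumed to be (left, top, right, bottom) in pixels, since the notes don't say which convention is meant; `RegionSubject` and `subjects_at` are hypothetical names. A lookup like this is roughly what Flickr-style notes on parts of an image would need.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RegionSubject:
    """A subject tied to a rectangle of the image (hypothetical structure)."""
    box: Tuple[int, int, int, int]  # assumed (left, top, right, bottom) in pixels
    subject_type: str               # e.g. "illustrated_life"
    scientific_name: str            # dwc:scientificName
    vernacular_name: str = ""       # dwc:vernacularName
    vernacular_lang: str = "en"     # would become xml:lang in XML output

    def contains(self, x: int, y: int) -> bool:
        """True if the point lies inside the box, edges included."""
        left, top, right, bottom = self.box
        return left <= x <= right and top <= y <= bottom


def subjects_at(subjects: List[RegionSubject], x: int, y: int) -> List[str]:
    """Return the scientific names of every subject whose box covers the point."""
    return [s.scientific_name for s in subjects if s.contains(x, y)]


dog = RegionSubject((15, 15, 30, 30), "illustrated_life",
                    "Canis lupus familiaris", "dog")
print(subjects_at([dog], 20, 20))
```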

Crosswalks to:
  • dc:subject
  • dwc:scientificName, dwc:acceptedNameUsage, dwc:vernacularName
  • vra:subject
  • lido:relations:subject
VRA also has subject type attributes that would be useful, such as:
  • for names: corporateName, familyName, otherName, personalName, scientificName
  • for locations: builtworkPlace, geographicPlace, otherPlace
  • for descriptive, narrative, or thematic content: conceptTopic, descriptiveTopic, iconographicTopic, otherTopic

Some that we think will be applicable to BHL images include:
  1. personalName (e.g. Linnaeus, Carl);
  2. scientificName (e.g. Zea mays);
    1. I'm imagining something like: `<subject type="scientificName" name="Z. mays" dwc:scientificName="Zea mays Auth." dwc:vernacularName="Maize" />`
  3. geographicPlace (e.g. Costa Rica);
  4. builtworkPlace (e.g. Taj Mahal);
  5. iconographicTopic (e.g. ... and ...); classification systems such as Iconclass can be used to describe these themes.
Example of an image that would contain some of the above:
  • personalName=Sir Tho Millington
  • builtworkPlace=Royal London College of Physicians
  • geographicPlace=London, England
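A minimal sketch of the example above as a VRA-style subjectSet, assuming VRA Core 4's subjectSet/subject/term nesting with a `type` attribute on `term` (as I understand the VRA element description) and the assumed namespace URI. The allowed types are exactly those listed above. It does not attempt the dwc attributes idea from the scientificName item; `subject_set` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple

VRA_NS = "http://www.vraweb.org/vracore4.htm"  # assumed VRA Core 4 namespace
ET.register_namespace("vra", VRA_NS)

# Subject term types listed above.
SUBJECT_TYPES = {
    "corporateName", "familyName", "otherName", "personalName", "scientificName",
    "builtworkPlace", "geographicPlace", "otherPlace",
    "conceptTopic", "descriptiveTopic", "iconographicTopic", "otherTopic",
}


def vra(tag: str) -> str:
    """Qualify a tag name with the VRA namespace."""
    return f"{{{VRA_NS}}}{tag}"


def subject_set(terms: Iterable[Tuple[str, str]]) -> ET.Element:
    """Build <vra:subjectSet> with one <subject><term type=...> per (type, term) pair."""
    set_el = ET.Element(vra("subjectSet"))
    for term_type, text in terms:
        if term_type not in SUBJECT_TYPES:
            raise ValueError(f"unknown subject type: {term_type!r}")
        subject = ET.SubElement(set_el, vra("subject"))
        ET.SubElement(subject, vra("term"), {"type": term_type}).text = text
    return set_el


example = subject_set([
    ("personalName", "Sir Tho Millington"),
    ("builtworkPlace", "Royal London College of Physicians"),
    ("geographicPlace", "London, England"),
])
print(ET.tostring(example, encoding="unicode"))
```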

Copyright

Would this apply to the book or to the original image? For example, note that this image has a copyright date of 1901; what if the book was published after 1922? Which copyright date applies?

What determines the copyright date for an image: the date of the illustration, the life dates of the illustrator, or the publication date of the book?
We will need copyright fields, but maybe we populate them rather than users (i.e. the fields may be viewable but not editable). This field probably needs a longer discussion with input from others. ... has a nice public domain calculator for determining whether works are out of copyright in the EU:
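A rough sketch of how BHL-populated copyright fields could use the creator death years tracked under the creator field, assuming the general EU term of 70 years after the author's death. It treats the latest death year among the listed creators as governing (an assumption for plates with several contributors) and ignores anonymous works, national exceptions, and the US rules, which depend on publication dates. Both function names are hypothetical.

```python
from typing import Iterable, Optional

EU_TERM_YEARS = 70  # general EU term: life of the author plus 70 years


def eu_public_domain_year(death_years: Iterable[Optional[int]]) -> Optional[int]:
    """First calendar year in which the image would be out of copyright under the
    general EU rule, or None if any creator's death year is missing."""
    years = list(death_years)
    if not years or any(y is None for y in years):
        return None  # unknown creators: cannot decide automatically
    # Assumption: the last surviving creator's death year governs. Protection runs
    # to the end of that year + 70, so the work is free from 1 January of the next year.
    return max(years) + EU_TERM_YEARS + 1


def is_public_domain_in_eu(death_years: Iterable[Optional[int]], year: int) -> Optional[bool]:
    """True/False for the given calendar year, or None when it cannot be computed."""
    pd_year = eu_public_domain_year(death_years)
    return None if pd_year is None else year >= pd_year


print(eu_public_domain_year([1880, 1943]))  # 2014
print(is_public_domain_in_eu([1943], 2013))  # False
```

Returning `None` rather than guessing keeps unknown-creator images out of any automatic "public domain" labeling.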

Inscriptions

All marks or written words added to the object at the time of production or in its subsequent history, including signatures, dates, dedications, texts, and colophons, as well as marks such as the stamps of silversmiths, publishers, or printers. Type attributes could be added, such as: signature, mark, caption, date, text, translation, other.

Needed? Probably; look at Darwin's annotations as a use case.

Crosswalks to:
  • vra:inscription
  • lido:objectIdentification:inscriptions
  • lido:objectIdentification:inscriptions:inscriptionsTranscription [lang variants]
  • lido:objectIdentification:inscriptions:inscriptionsDescription
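A minimal sketch of an inscription record using the type values listed above, assuming VRA Core 4's inscription/author/position/text structure and the assumed namespace URI. `xml:lang` on the text element covers the language-variant case in the LIDO crosswalk. `inscription_element` is a hypothetical helper and the caption values are placeholders, not a transcription of a real plate.

```python
import xml.etree.ElementTree as ET
from typing import Optional

VRA_NS = "http://www.vraweb.org/vracore4.htm"  # assumed VRA Core 4 namespace
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # serialized as xml:lang
ET.register_namespace("vra", VRA_NS)

# Type values proposed above for inscription text.
TEXT_TYPES = {"signature", "mark", "caption", "date", "text", "translation", "other"}


def vra(tag: str) -> str:
    """Qualify a tag name with the VRA namespace."""
    return f"{{{VRA_NS}}}{tag}"


def inscription_element(text: str, text_type: str,
                        author: Optional[str] = None,
                        position: Optional[str] = None,
                        lang: Optional[str] = None) -> ET.Element:
    """Build <vra:inscription> with optional author/position and a typed text."""
    if text_type not in TEXT_TYPES:
        raise ValueError(f"unknown inscription type: {text_type!r}")
    el = ET.Element(vra("inscription"))
    if author:
        ET.SubElement(el, vra("author")).text = author
    if position:
        ET.SubElement(el, vra("position")).text = position
    text_el = ET.SubElement(el, vra("text"), {"type": text_type})
    if lang:
        text_el.set(XML_LANG, lang)
    text_el.text = text
    return el


# Placeholder values, not transcribed from a real plate.
caption = inscription_element("Plate 12", "caption", position="lower margin", lang="en")
print(ET.tostring(caption, encoding="unicode"))
```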

Relationships between images

How do these fields relate to fields in the applications where BHL images will be shared?

Flickr fields (from ...):
  • Title: Title
  • Description: Description
  • Tags
  • Notes (a description localized to parts of an image)
  • License
    • It would be a huge help to Wikimedia Commons and other resources if images older than a particular threshold (say, 1850) could be automatically categorized as "No known copyright restrictions" under the Flickr Commons scheme. That scheme is (I think?) designed to shield partner organizations from legal risks in case of an incorrect copyright assignment, so it might be a good model to follow, if possible.
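A minimal sketch of mapping a schema record onto the Flickr fields listed above, including the threshold idea. It assumes the publication year is used as the age of the image (the creator's death year may matter more, as noted earlier), uses 1850 only because it is the example threshold, and labels everything else for manual review. Flickr tag strings are space-separated with multi-word tags in double quotes. `flickr_fields` is a hypothetical name; it does not call the Flickr API.

```python
from typing import Dict, Iterable, Optional

# Example threshold from the discussion above; the actual cut-off is undecided.
PD_THRESHOLD_YEAR = 1850
NO_KNOWN_RESTRICTIONS = "No known copyright restrictions"


def flickr_tag_string(tags: Iterable[str]) -> str:
    """Flickr tags are space-separated; multi-word tags go in double quotes."""
    return " ".join(f'"{t}"' if " " in t else t for t in tags)


def flickr_fields(title: str, description: str, tags: Iterable[str],
                  publication_year: Optional[int]) -> Dict[str, str]:
    """Map a record onto Flickr's title/description/tags/license fields.
    Images published before the threshold get the Flickr Commons statement;
    everything else is left for manual copyright review."""
    if publication_year is not None and publication_year < PD_THRESHOLD_YEAR:
        license_label = NO_KNOWN_RESTRICTIONS
    else:
        license_label = "needs copyright review"
    return {
        "title": title,
        "description": description,
        "tags": flickr_tag_string(tags),
        "license": license_label,
    }


print(flickr_fields("Zea mays", "Plate of maize", ["maize", "Zea mays"], 1840))
```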

Wikimedia Commons
  • Template:Information is the general template for Commons metadata
    • Template:Artwork is "for the description of images of artworks, especially those residing in museums or galleries. Originally meant for pictorial art (paintings, and by extension: frescoes, pencils, engravings, stained glass, etc.), but now recommended for sculptures, archeological artifacts, artistic photographs, museum collections, etc."
    • Template:Book is "used to provide formatting to the basic information about books and should be used in place of information when a set of images (either multiple files or a single DjVu or PDF) comprise the full text of a book".
  • The Commons' organization is through categories. The category scheme is described here, but in a nutshell: any category can belong to any number of other categories. This encourages multiple categorization (so a single image would be categorized by image format, image source, copyright status, source country, source decade, and any number of other categorizations).
  • The Commons uses Creator templates to collect images photographed/illustrated by the same individual:
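A minimal sketch of generating a Commons file description from a schema record, assuming the Template:Information parameters Description, Date, Source, Author, Permission, and other_versions, an {{en|1=...}} language wrapper for the description, and a {{Creator:...}} template for the author when one exists. `commons_wikitext` is a hypothetical helper, and the values in the usage line are placeholders.

```python
from typing import Iterable, Optional


def commons_wikitext(description: str, date: str, source: str, author: str,
                     permission: str, categories: Iterable[str],
                     creator_template: Optional[str] = None) -> str:
    """Assemble a Commons file page: an {{Information}} block plus categories.

    If a Creator template name is given, it is used for the Author field so the
    image is collected with the creator's other works.
    """
    author_value = f"{{{{Creator:{creator_template}}}}}" if creator_template else author
    lines = [
        "{{Information",
        f"|Description={{{{en|1={description}}}}}",
        f"|Date={date}",
        f"|Source={source}",
        f"|Author={author_value}",
        f"|Permission={permission}",
        "|other_versions=",
        "}}",
    ]
    # Commons encourages multiple categorization; drop duplicates but keep order.
    lines += [f"[[Category:{c}]]" for c in dict.fromkeys(categories)]
    return "\n".join(lines)


print(commons_wikitext("Maize plate", "1840", "BHL page link", "Unknown",
                       "Public domain", ["Zea mays", "BHL"], creator_template="Jane Doe"))
```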

Audiences to give feedback on schema before finalizing

  • Consumers of images: EOL, ARTstor, Wikimedia Commons, iTunes
  • Representatives of the 5 primary audiences
  • Current BHL users


Q: To what extent do we want the schema to be linked to the semantic web? Identifiers are key to linking to other identifiers online.
Q: What is the end product for the schema? It would include a list of elements, subelements, and attributes; some best-practice guidelines; an .xsd document (Gaurav can do this); and test environments (Wikimedia and Flickr). But what about a UI to handle the indexing, display, and notes? Not sure yet.
For now, do some tag testing in Wikimedia and see if we can develop a test interface.

Related projects

Meeting Notes

Contributions are licensed under a Creative Commons Attribution Share-Alike 3.0 License.