Introduction

"The purpose of this dashboard is to give an overview of a multiple museum collections summarizing their items and their level of digitization. Currently two datasets are included from the Field Museum, Chicago and Naturalis, Leiden."

What does the dashboard do and not do?

  1. You cannot get to the data for a specific specimen/artifact.
  2. Backlog data is an estimate of the number of items that have not been individually digitized.
  3. Items are individual specimens/artifacts.
  4. Not all statistics apply equally across all collection types.
  5. The dashboard data is REAL, as are its flaws and virtues.
  6. Where possible we have used Darwin Core field standards.
  7. Not all combinations of search terms will return results.

Hover over any part of a chart to see the actual record count.

About backlog
We have used the data contained within our digital accession records to estimate both the amount of physical backlog and also the completeness of the data that we have about those items.

Backlog, for the purposes of this site, is defined as any item that does not have a digital record. Any cataloged non-digitized item is considered backlog; however, the completeness "ranking" indicates whether a non-digitized record is cataloged or non-cataloged.

Data that defines "backlog" collections

This is the record created during or after an acquisition to establish the institution's ownership (provenance) of the objects. Examples include accession records, inventories, invoices, and deeds of gift. The record should include the following information:

An identifier for the collection or dataset from which the record was derived. (e.g. Botany, Mammals, Invertebrate Zoology, Fossil Invertebrates, etc.)

A spatial region or named place: roughly, the place where the specimens/artifacts/objects originated. This can be anything from a specific region (Porter County, Indiana) to the world at large (Worldwide).

A brief summary of the collection of objects at the time of acquisition. This can include any of the other fields listed here.

An identifier (preferably unique) for the record within the data set or collection.

The total number of specimens/artifacts/objects present at the time of the Occurrence.

The number of records (specimens/lots/artifacts) present at the time of the Occurrence.

Internal record number. The automatic record number assigned by the database to the record.

Lots vs Individuals
Some collections traditionally catalogue at the individual specimen/object level, whilst others do so at the lot* level, where multiple items are tracked in a single catalogue record. The dashboard displays counts of individual items, to better reflect the number of physical items in the collection rather than the number of records in the database.

*lot = all specimens of the same species collected at the same time and place and by the same person(s).
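
As a rough illustration of the difference between record counts and item counts, the sketch below sums a per-record count over a few made-up catalogue records. The field name DarIndividualCount is the one used elsewhere on this page; the sample records and their "id" values are invented for illustration only.

```python
# Hypothetical catalogue records: one individual specimen and two lots.
records = [
    {"id": "A1", "DarIndividualCount": 1},   # single specimen
    {"id": "B7", "DarIndividualCount": 12},  # lot of 12 items
    {"id": "C3", "DarIndividualCount": 40},  # lot of 40 items
]

record_count = len(records)                                  # rows in the database
item_count = sum(r["DarIndividualCount"] for r in records)   # physical items

print(record_count)  # 3
print(item_count)    # 53
```

The dashboard reports the second kind of number (items), not the first (records).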

See the Readme for more detail on how backlog and item counts were calculated.

How the searches work

All searches are exact matches (e.g. searching for "snails" returns only those records with "snails" as a descriptor, not "land snail", "marine snail", or "worm snail"). You can select multiple search terms from each "bucket". The selection tool functions as "begins with", not "contains" (e.g. searching for everything in the Pacific Ocean requires choosing Pacific, Pacific Ocean, North Pacific, North Pacific Ocean, South Pacific, etc.).
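
The two behaviours can be sketched as follows. This is only an illustration of exact matching versus "begins with" lookup, not the site's actual code; the function names and the sample value lists are hypothetical.

```python
# Hypothetical values as they might appear in the lookup lists.
descriptors = ["snails", "land snail", "marine snail", "worm snail"]
places = ["Pacific", "Pacific Ocean", "North Pacific",
          "North Pacific Ocean", "South Pacific"]

def exact_search(term, values):
    """Return only the values identical to the search term."""
    return [v for v in values if v == term]

def lookup_suggestions(typed, values):
    """Return the values that begin with the typed text ("begins with", not "contains")."""
    return [v for v in values if v.startswith(typed)]

print(exact_search("snails", descriptors))    # ['snails']
print(lookup_suggestions("Pacific", places))  # ['Pacific', 'Pacific Ocean']
```

Typing "Pacific" does not offer "North Pacific" or "South Pacific", which is why each of those terms has to be chosen separately.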

The lookup lists do not filter based on the value in any other search field. They combine to create an "AND" query.
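
A minimal sketch of how an "AND" query across search fields might behave is shown below. The records are invented, and treating several terms selected within one field as alternatives is our assumption for the sketch, not a documented behaviour.

```python
# Hypothetical records with already-formed Where/What fields.
records = [
    {"Where": "Peru", "What": "Botany"},
    {"Where": "Peru", "What": "Mammals"},
    {"Where": "Kenya", "What": "Botany"},
]

def matches(record, query):
    """True when the record satisfies every search field in the query (AND).

    Assumption: several terms selected within one field are alternatives.
    """
    return all(record.get(field) in terms for field, terms in query.items())

results = [r for r in records
           if matches(r, {"Where": {"Peru"}, "What": {"Botany"}})]
print(results)  # [{'Where': 'Peru', 'What': 'Botany'}]

# Both terms exist in the lookup lists, but no single record has both.
empty = [r for r in records
         if matches(r, {"Where": {"Kenya"}, "What": {"Mammals"}})]
print(empty)  # []
```

Because the lookup lists are not filtered by the other fields, it is possible to pick a valid combination of terms that returns nothing, as in the second query.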

Searching "Where" for a locality more specific than country level will primarily return Accession records, and few (if any) Catalog records. This is because the Where field includes the more loosely defined "locality" and "geography" fields from Accession records, and the more rigidly defined "continent," "ocean" and "country" fields from Catalog records. (See below for the specific EMu fields in each dashboard search field.)


Where? Where did the specimen or artifact come from?

Search fields:

  • DwC: country; EMu: DarCountry (ecatalogue)
  • DwC: continent; EMu: DarContinent (ecatalogue)
  • EMu: DarContinentOcean (ecatalogue)
  • DwC: waterBody; EMu: DarWaterBody (ecatalogue)
  • EMu: AccLocality (efmnhtransactions)
  • EMu: AccGeography (efmnhtransactions)


What? What kind of specimen or artifact is it?

Search fields:

  • DwC: collectionCode; EMu: DarCollectionCode (ecatalogue)
  • EMu: DesEthnicGroupSubgroup_tab (ecatalogue)
  • DwC: order; EMu: DarOrder (ecatalogue)
  • DwC: scientificName; EMu: DarScientificName (ecatalogue)
  • EMu: IdeTaxonRef_tab.ComName_tab (ecatalogue)
  • EMu: CatProject_tab (ecatalogue)
  • EMu: DesKDescription (ecatalogue)
  • EMu: EcbNameOfObject (ecatalogue)
  • EMu: AccAccessionDescription (efmnhtransactions)


When? When did the specimen or artifact exist?

Search fields:

  • DwC: earliestAgeOrLowestStage; EMu: DarEarliestAge (ecatalogue)
  • DwC: earliestEonOrLowestEonothem; EMu: DarEarliestEon (ecatalogue)
  • DwC: earliestEpochOrLowestSeries; EMu: DarEarliestEpoch (ecatalogue)
  • DwC: earliestEraOrLowestErathem; EMu: DarEarliestEra (ecatalogue)
  • DwC: earliestPeriodOrLowestSystem; EMu: DarEarliestPeriod (ecatalogue)
  • EMu: AttPeriod_tab (ecatalogue)
  • DwC: year; EMu: DarYearCollected (ecatalogue)


Who? Who created the artifact?

Search fields:

  • EMu: DesEthnicGroupSubgroup_tab (ecatalogue)
  • EMu: EcbNameOfObject (ecatalogue)
  • EMu: AccDescription (efmnhtransactions)
  • EMu: AccAccessionDescription (efmnhtransactions)


Web Infrastructure

The infrastructure is powered by a LAPP stack: Linux, Apache (with mod_wsgi), PostgreSQL, and Python. The Python web framework Flask, together with a PostgreSQL database, serves the data. ChartJS is used for charting.

Data Prep

Data prep scripts are on GitHub.

See the Readme for further details and explanation.

These scripts clean a raw dataset (from EMu), and prepare it for the collections dashboard website, in the following steps:

  1. Combine Catalogue and Accession records into a single dataset.
  2. Calculate numbers of items catalogued and backlogged.
  3. Calculate record completeness scores (9 = best).
  4. Form Where, What, When, and Who fields. (See above for how Darwin Core and EMu fields are grouped.)
  5. Perform basic data cleanup and set up the Where/What/When/Who lookup tables (LUTs).
  6. Output CSV dataset for dashboard site.
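
Steps 1, 4 and 6 above can be sketched roughly as follows. This is not the code from the data prep scripts: the sample rows, the "irn" column name, the " | " separator and the function name are all assumptions made for illustration. Only the EMu field names and the "Catalog"/"Accession" record types come from this page.

```python
import csv
import io

# Hypothetical raw rows; real input would be exported from EMu.
catalogue_rows = [
    {"irn": "101", "DarCountry": "Peru", "DarContinent": "South America"},
]
accession_rows = [
    {"irn": "900", "AccLocality": "Porter County, Indiana",
     "AccGeography": "North America"},
]

# EMu fields grouped into the "Where" search field (see the list above).
WHERE_FIELDS = ["DarCountry", "DarContinent", "DarContinentOcean",
                "DarWaterBody", "AccLocality", "AccGeography"]

def form_where(row):
    """Join the non-empty Where source fields (the separator is an assumption)."""
    return " | ".join(row[f] for f in WHERE_FIELDS if row.get(f))

# Step 1: combine both record types into one dataset, tagging each row.
combined = (
    [dict(r, RecordType="Catalog") for r in catalogue_rows]
    + [dict(r, RecordType="Accession") for r in accession_rows]
)

# Step 4 (Where only): derive the grouped search field.
output_rows = [
    {"irn": r["irn"], "RecordType": r["RecordType"], "Where": form_where(r)}
    for r in combined
]

# Step 6: write the result as CSV (to an in-memory buffer here).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["irn", "RecordType", "Where"])
writer.writeheader()
writer.writerows(output_rows)
print(buffer.getvalue())
```

The What, When and Who fields would be formed the same way from their own field groupings.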

Catalogue & Accessions = Sample-raw input datasets

Notes on calculations:
Count of items backlogged = # of items accessioned - # of items catalogued
Count of items catalogued = DarIndividualCount
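
As a worked example of the backlog calculation, with made-up numbers and a hypothetical function name:

```python
def backlog_count(items_accessioned, items_catalogued):
    """Count of items backlogged = items accessioned - items catalogued."""
    return items_accessioned - items_catalogued

# Hypothetical accession of 500 items, of which 320 have been catalogued
# (the catalogued figure would come from DarIndividualCount).
print(backlog_count(500, 320))  # 180
```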

The script outputs a dataset with the following fields:
Where, What, WhenAge, Who - These fields broadly accommodate both cultural and natural history datasets, incorporating standard Darwin Core fields when possible. The input dataset groupings (listed above) indicate which input fields correspond to these output fields.
Record completeness - A completeness-rank for the record's data (poor = 1; good = 9)
– below 6 = backlog
– anything 6 or above = catalogued
RecordType - Indicates whether the record is "Catalog" or "Accession" data, and therefore part of the cataloged or backlogged items.
DarIndividualCount - The number of items cataloged, from the DarIndividualCount field of a catalogue record.
Backlog - The number of items backlogged, calculated by subtracting (number of items catalogued) from (number of items accessioned).
TaxIDRank - The level to which a specimen has been identified
HasMM - A binary value where "1" = has Multimedia attached, and "0" = no Multimedia attached.
DarCollectionCode & Department - The name of the collection and department to which a record belongs.
URL - Collections listed in summary stats will link to these URLs
WhenAgeFrom/To/Mid & DarYearCollected - Numeric values for age of geology specimens & anthropology artifacts, or for collection year for botany & zoology specimens.
WhenOrder - Ordinal values between 1 and 53 to group numeric ages into time-groups; necessary for chart to function.
WhenTimeLabel - Labels corresponding to the 53 "WhenOrder" groups, ranging from 4.6 billion years ago to 2020. Loosely, ranges are grouped by geologic periods/epochs/eras prior to ~18th century dates, and grouped by decade after 18th century dates.
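
The completeness thresholds listed above (below 6 = backlog; 6 or above = catalogued) can be expressed as a small function. This is a sketch only; the function name and the range check are our own additions.

```python
def classify(completeness):
    """Map a completeness rank (1 = poor, 9 = good) to a status."""
    if not 1 <= completeness <= 9:
        raise ValueError("completeness rank must be between 1 and 9")
    return "catalogued" if completeness >= 6 else "backlog"

print([classify(n) for n in (1, 5, 6, 9)])
# ['backlog', 'backlog', 'catalogued', 'catalogued']
```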

Acknowledgements

This website is brought to you by the Field Museum, Naturalis, the hard work of the museum's extended IT team, Sharon Grant, Pete Herbst, Janeen Jones, Marc Lambruschi, Kate Webbink, Rob Z., and additional collection staff: Caleb McMahon, Crystal Meier, Matt Von Konrat, Angie Morrow, and Rusty Russell.

“There are no right answers to wrong questions.” — Ursula K. Le Guin