Type
Guideline

Data curation in Dandjoo

Summary

Enhancing data quality through active curation.

About curation

Curation is central to BIO’s goal of providing high-quality biodiversity data. While some curation can be automated, automation is no substitute for human review.

BIO has a curatorial team of data engineers and subject-matter experts who are responsible for reviewing every dataset prior to publication, detecting possible quality issues, and working with data providers to resolve them.

This page explains how we curate data, both when it is ingested and as part of routine quality control across the entire Dandjoo database.

When new data arrives

When a new dataset is submitted, the next steps will depend on the type of data received:

  • Species observation data will pass through an automated taxonomic name check before being passed to our technical staff for human review and detailed curation.
  • Systematic survey data is currently transformed into a ‘species per row’ format by our technical staff before passing through curation. Ingesting this data in its original form is a feature we’re looking at adding in a future release.
  • Vegetation association data is currently reviewed by our staff on receipt and incorporated into our vegetation association overlay. (Searching and filtering of vegetation association data are also features we’re looking to incorporate in the future.)

Curation of new species observation data

We run a variety of checks on incoming species observation data.

For taxonomic names, we:

  • clean any extra whitespace and special characters;
  • append author and rank where these are missing;
  • check whether the name is known by the Western Australian Herbarium and/or Museum;
  • for names that are unknown, run checks for possible phonetic and non-phonetic spelling errors (any suggested corrections are forwarded to the data provider for approval before changes are made);
  • append ‘sp.’ where only genus is provided; and
  • archive any records with names that cannot be resolved to genus level (noting that accommodating higher taxonomic ranks is an enhancement that we’re considering for future releases).
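
As a rough illustration of some of the name checks above, the following Python sketch cleans whitespace and special characters, appends ‘sp.’ to genus-only names, and suggests a correction for unknown names. It is not BIO’s implementation: the reference list, the set of allowed characters, the similarity cutoff and the function names are all assumptions made for the example, and the spelling check uses simple string similarity only (a phonetic check is not shown).

```python
import difflib
import re

# Hypothetical reference list. In practice, names would be checked against
# the Western Australian Herbarium and Museum name lists.
KNOWN_NAMES = ["Banksia attenuata", "Banksia menziesii", "Eucalyptus marginata"]


def clean_name(raw):
    """Remove special characters, collapse whitespace, append 'sp.' to a lone genus."""
    # Assumed set of allowed characters: letters, digits, whitespace, . - × '
    name = re.sub(r"[^\w\s.\-×']", " ", raw)
    name = re.sub(r"\s+", " ", name).strip()
    # A single word is treated as a genus-only name.
    if name and " " not in name:
        name = f"{name} sp."
    return name


def suggest_correction(name, known=KNOWN_NAMES, cutoff=0.85):
    """Return the closest known name for an unknown name, or None.

    A suggestion is only a candidate: it would still go to the data
    provider for approval before any change is made.
    """
    if name in known:
        return None
    matches = difflib.get_close_matches(name, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

For example, `clean_name("Banksia")` gives a genus-level name with ‘sp.’ appended, and `suggest_correction("Banksia atenuata")` proposes the correctly spelled name from the reference list.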

We also perform spatial and temporal checks, including:

  • converting date and location variables to match Darwin Core syntax;
  • checking whether the location provided is within Western Australia (records located outside a bounding box with corners at 10° S 105° E and 38° S 130° E will be flagged for review); and
  • checking whether any dates occur in the future.
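
The bounding box and future-date checks can be sketched in a few lines of Python. This is a minimal example rather than the production check: it assumes coordinates are supplied in decimal degrees under the Darwin Core terms `decimalLatitude` and `decimalLongitude`, that `eventDate` is a single ISO 8601 date (YYYY-MM-DD), and the function and flag names are invented for illustration.

```python
from datetime import date
from typing import Optional

# Bounding box corners from the text: 10° S 105° E and 38° S 130° E.
# Southern latitudes are negative in decimal degrees.
LAT_MIN, LAT_MAX = -38.0, -10.0
LON_MIN, LON_MAX = 105.0, 130.0


def outside_bounding_box(lat, lon):
    """True if the point falls outside the Western Australia bounding box."""
    return not (LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX)


def is_future(event_date, today: Optional[date] = None):
    """True if the date is later than today (today can be fixed for testing)."""
    return event_date > (today or date.today())


def review_flags(record, today: Optional[date] = None):
    """Return a list of hypothetical flag names for a single record."""
    flags = []
    if outside_bounding_box(record["decimalLatitude"], record["decimalLongitude"]):
        flags.append("outside_bounding_box")
    if is_future(date.fromisoformat(record["eventDate"]), today):
        flags.append("future_date")
    return flags
```

A flagged record is not rejected here; the list of flags simply marks it for human review.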

Where there appears to be a material error in a record (that is, one that changes the meaning of a value rather than its syntax), we’ll consult the original provider and seek their approval before amending it.

To prepare data for publication, we also:

  • check for duplicate records (both within the dataset and against the Dandjoo database) based on a comparison of selected Darwin Core fields (duplicate records are archived so they can be retrieved and reviewed at a later date);
  • check that the Darwin Core mappings submitted with the data are valid, and whether any additional optional Darwin Core attributes can be mapped;
  • identify and append a current scientific name to each record (in cases where the current name is unclear due to a taxonomic split, the last name prior to the split will be applied in this field, unless it relates to a threatened or priority species, in which case BIO will seek guidance from experts in DBCA’s Species and Communities Branch as to how to treat the record); and
  • append a conservation code to records that relate to threatened and priority species, so visibility of these records can be limited to authorised viewers.
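
The duplicate check described above could look something like the following Python sketch. The text does not say which Darwin Core fields are compared, so the fields in `KEY_FIELDS` are an assumption, as are the function names. Records already seen (either earlier in the same dataset or in the existing database) are set aside in an archive list rather than discarded.

```python
# Assumed comparison fields; the actual selection of Darwin Core fields
# used by BIO is not specified in the text.
KEY_FIELDS = (
    "scientificName",
    "eventDate",
    "decimalLatitude",
    "decimalLongitude",
    "recordedBy",
)


def duplicate_key(record):
    """Build a normalised comparison key from the selected fields."""
    return tuple(str(record.get(field, "")).strip().lower() for field in KEY_FIELDS)


def split_duplicates(incoming, existing_keys=()):
    """Separate new records from duplicates.

    `existing_keys` stands in for keys already held in the database.
    Returns (kept, archived); archived records remain available for review.
    """
    seen = set(existing_keys)
    kept, archived = [], []
    for record in incoming:
        key = duplicate_key(record)
        if key in seen:
            archived.append(record)
        else:
            seen.add(key)
            kept.append(record)
    return kept, archived
```

Normalising case and whitespace before comparison is a design choice in this sketch; a stricter or looser match would change which records count as duplicates.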

Routine curation of species observation data

In addition to performing quality control checks when data is ingested for the first time, we also run routine curation processes over all species occurrence records in Dandjoo. This involves:

  • checking species names against the most current taxonomic names available from the Western Australian Herbarium and Western Australian Museum, and updating the current scientific name we appended to each record where there’s been a change; and
  • checking existing records against the most recent list of Western Australian conservation codes to ensure that the codes appended on ingestion are still correct.
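
One way to picture this routine pass is as a refresh of two appended fields against the latest lookup tables. The Python sketch below is illustrative only: the field names `currentScientificName` and `conservationCode`, the lookup dictionaries and the function name are all hypothetical, and real taxonomic updates (such as splits) need the expert review described earlier.

```python
def refresh_record(record, current_names, conservation_codes):
    """Re-derive the appended name and conservation code for one record.

    current_names: maps a supplied name to its current scientific name.
    conservation_codes: maps a current scientific name to its code.
    Returns (updated_record, changed).
    """
    updated = dict(record)
    supplied = record["scientificName"]
    # Keep the previously appended name if the lookup has no entry.
    updated["currentScientificName"] = current_names.get(
        supplied, record.get("currentScientificName")
    )
    # Species without a listing get no conservation code.
    updated["conservationCode"] = conservation_codes.get(
        updated["currentScientificName"]
    )
    return updated, updated != record
```

Returning a `changed` indicator makes it easy to log or review only the records whose name or code was actually updated.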
