Type
Guideline

Data curation in Dandjoo

Summary

Enhancing data quality through active curation.

About curation

Curation is central to BIO’s goal of providing high-quality biodiversity data. While some curation can be automated, automation is no substitute for human review.

BIO has a curatorial team of data engineers and subject-matter experts who are responsible for reviewing every dataset prior to publication, detecting possible quality issues, and working with data providers to resolve them.

This page explains how we curate data, both when it is ingested and as part of routine quality control across the entire Dandjoo database.

When new data arrives

When a new dataset is submitted, the next steps will depend on the type of data received:

  • Species observation data will pass through an automated taxonomic name check before arriving with our technical staff for human review and detailed curation.
  • Systematic survey data will pass through an automated taxonomic name check before undergoing human review and detailed curation, including checks of the bounding box and restriction of personally identifiable information.
  • Vegetation association data is reviewed by our staff on receipt and is currently incorporated into our vegetation association overlay. (Searchability and filtering for vegetation association data are features we’re looking to add in the future.)

Curation of new species observation data

We run a variety of checks on incoming species observation data.

For taxonomic names, we:

  • clean any extra whitespace and special characters;
  • append author and rank where these are missing;
  • check whether the name is known by the Western Australian Herbarium and/or Museum; and
  • for names that are unknown, run checks for possible phonetic and non-phonetic spelling errors (any suggested corrections are forwarded to the data provider for approval before changes are made).
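
As an illustration only, the name checks above could be sketched in Python as follows. This is not BIO's actual implementation: the reference list `KNOWN_NAMES`, the set of characters kept by `clean_name`, and the similarity cutoff are all assumptions made for the example. The sketch covers non-phonetic (edit-based) matching only, and it returns a suggestion rather than changing the name, since corrections are forwarded to the data provider for approval.

```python
import difflib
import re

# Hypothetical reference list. The real check uses names known to the
# Western Australian Herbarium and/or Museum.
KNOWN_NAMES = ["Banksia attenuata", "Banksia menziesii", "Macropus fuliginosus"]


def clean_name(raw: str) -> str:
    """Drop stray special characters and collapse extra whitespace."""
    # Assumption: only letters, spaces, full stops and hyphens are kept.
    kept = re.sub(r"[^A-Za-z.\- ]+", " ", raw)
    return " ".join(kept.split())


def suggest_correction(name: str, known=KNOWN_NAMES, cutoff: float = 0.85):
    """Return (status, suggestion). Suggestions are never applied automatically."""
    if name in known:
        return ("known", None)
    # Non-phonetic check: closest known name by character similarity.
    matches = difflib.get_close_matches(name, known, n=1, cutoff=cutoff)
    return ("unknown", matches[0] if matches else None)
```

A misspelt name such as "Banksia atenuata" would come back as unknown with "Banksia attenuata" suggested, ready to be sent to the provider for approval.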

We also perform spatial and temporal checks, including:

  • converting date and location variables to match Darwin Core syntax;
  • checking whether the location provided is within Western Australia (records located outside a bounding box with corners at 10° S 105° E and 38° S 130° E will be flagged for review); and
  • checking whether any dates occur in the future.
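
The spatial and temporal checks above could look something like the following Python sketch. The bounding box corners come from the text, and `eventDate`, `decimalLatitude` and `decimalLongitude` are Darwin Core terms. The provider field names (`date`, `lat`, `lon`), the day/month/year input format, and the flag labels are assumptions made for the example.

```python
from datetime import date, datetime

# Bounding box corners from the text: 10° S 105° E and 38° S 130° E.
# South latitudes are negative in decimal degrees.
LAT_MIN, LAT_MAX = -38.0, -10.0
LON_MIN, LON_MAX = 105.0, 130.0


def to_darwin_core(raw: dict) -> dict:
    """Map hypothetical provider fields onto Darwin Core terms."""
    # Assumption: the provider supplies dates as day/month/year.
    observed = datetime.strptime(raw["date"], "%d/%m/%Y").date()
    return {
        "eventDate": observed.isoformat(),  # ISO 8601, e.g. 2021-02-03
        "decimalLatitude": float(raw["lat"]),
        "decimalLongitude": float(raw["lon"]),
    }


def flag_record(dwc: dict, today: date) -> list:
    """Return a list of flags for human review; an empty list means no issues."""
    flags = []
    inside = (LAT_MIN <= dwc["decimalLatitude"] <= LAT_MAX
              and LON_MIN <= dwc["decimalLongitude"] <= LON_MAX)
    if not inside:
        flags.append("outside_bounding_box")
    if date.fromisoformat(dwc["eventDate"]) > today:
        flags.append("future_date")
    return flags
```

Records are flagged rather than rejected here, which matches the text: a location outside the box is a prompt for review, not proof of an error.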

Where there appears to be a material error in a record (that is, one that changes the meaning of a value, rather than its syntax), we’ll consult with the original provider to seek their approval before amending it.

To prepare data for publication, we also:

  • check that the data mappings submitted with the data are valid, and whether any additional optional data attributes can be mapped;
  • identify and append a current scientific name to each record (in cases where the current name is unclear due to a taxonomic split, the last name prior to the split will be applied in this field, unless the record relates to a threatened or priority species, in which case BIO will seek guidance from experts in DBCA’s Species and Communities Branch as to how to treat the record); and
  • append a conservation code to records that relate to threatened and priority species, so visibility of these records can be limited to authorised viewers.
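
To make the last step concrete, here is a minimal Python sketch of appending a conservation code and using it to limit visibility. Everything in it is hypothetical: the field names, the placeholder species names and code values, and the rule that unauthorised viewers see no coded records at all. How Dandjoo actually restricts these records is not described here.

```python
# Placeholder lookup from current scientific name to conservation code.
# Names and codes are invented for illustration only.
CONSERVATION_CODES = {"Genus alpha": "T", "Genus beta": "P1"}


def append_conservation_code(record: dict, codes: dict) -> dict:
    """Return a copy of the record with a conservation_code field.

    The field is None when the species is not on the list.
    """
    return {**record, "conservation_code": codes.get(record["current_name"])}


def visible_records(records: list, authorised: bool) -> list:
    """Assumed rule: unauthorised viewers only see records without a code."""
    return [r for r in records if authorised or r["conservation_code"] is None]
```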

Routine curation of species observation data

In addition to performing quality control checks when data is ingested for the first time, we also run routine curation processes over all species occurrence records in Dandjoo. This involves:

  • checking species names against the most current taxonomic names available from the Western Australian Herbarium and Western Australian Museum, and updating the current scientific name we appended to each record where there’s been a change; and
  • checking existing records against the most recent list of Western Australian conservation codes to ensure that the codes appended on ingestion are still correct.
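
A routine pass of this kind could be sketched in Python as below. This is an illustration under stated assumptions, not a description of BIO's pipeline: the record fields (`submitted_name`, `current_name`, `conservation_code`) and the two lookup dictionaries are hypothetical stand-ins for the latest taxonomic name lists and the latest conservation code list.

```python
def refresh_record(record: dict, latest_names: dict, latest_codes: dict):
    """Re-check one record against the latest lists.

    latest_names maps a submitted name to its current accepted name.
    latest_codes maps a current name to its conservation code.
    Returns (updated_record, changed_fields).
    """
    updated = dict(record)
    # Keep the existing current name when the latest list has no entry.
    updated["current_name"] = latest_names.get(
        record["submitted_name"], record["current_name"]
    )
    # Look the code up again, because it depends on the current name.
    updated["conservation_code"] = latest_codes.get(updated["current_name"])
    changed = [
        field for field in ("current_name", "conservation_code")
        if updated[field] != record.get(field)
    ]
    return updated, changed
```

Returning the list of changed fields alongside the record makes it easy to log or review what a routine run actually altered.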
