Data curation in Dandjoo
Enhancing data quality through active curation.
Curation is central to BIO’s goal of providing high-quality biodiversity data. While some curation can be automated, automation is no substitute for human review.
BIO has a curatorial team of data engineers and subject-matter experts who are responsible for reviewing every dataset prior to publication, detecting possible quality issues, and working with data providers to resolve them.
This page explains more about how we curate data, both when it is ingested, and as part of routine quality control across the entire Dandjoo database.
When new data arrives
When a new dataset is submitted, the next steps will depend on the type of data received:
- Species observation data will pass through an automated taxonomic name check before arriving with our technical staff for human review and detailed curation.
- Systematic survey data is currently transformed into a ‘species per row’ format by our technical staff before passing through curation. Ingestion of this data in its original form is one of the new features we’re looking at adding in a future release.
- Vegetation association data is reviewed by our staff on receipt and is currently incorporated into our vegetation association overlay. (Searchability and filtering for vegetation association data are also features we’re looking to incorporate in the future.)
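To picture the ‘species per row’ transformation applied to systematic survey data, here is a minimal Python sketch. It is an illustration only, not BIO’s actual tooling: the function name, the `plotID` and `species` keys, and the sample values are all assumptions made for the example.

```python
def plots_to_species_rows(plots):
    """Flatten plot-level survey records into one row per species per plot."""
    rows = []
    for plot in plots:
        # Site-level values are repeated on every row derived from the plot.
        site = {key: value for key, value in plot.items() if key != "species"}
        for name in plot["species"]:
            rows.append({**site, "scientificName": name})
    return rows


# Hypothetical input: one entry per plot, each listing the species observed.
survey = [
    {"plotID": "P01", "eventDate": "2021-09-14",
     "decimalLatitude": -31.95, "decimalLongitude": 115.86,
     "species": ["Banksia attenuata", "Eucalyptus marginata"]},
    {"plotID": "P02", "eventDate": "2021-09-15",
     "decimalLatitude": -31.96, "decimalLongitude": 115.87,
     "species": ["Banksia attenuata"]},
]

rows = plots_to_species_rows(survey)
for row in rows:
    print(row["plotID"], row["scientificName"])
```

Each output row carries the plot’s date and coordinates, so it can be treated like any other species observation record. Plot-level information that has no place on a single row is lost in this shape.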
Curation of new species observation data
We run a variety of checks on incoming species observation data.
For taxonomic names, we:
- clean any extra whitespace and special characters;
- append author and rank where these are missing;
- check whether the name is known by the Western Australian Herbarium and/or Museum;
- for names that are unknown, run checks for possible phonetic and non-phonetic spelling errors (any suggested corrections are forwarded to the data provider for approval before changes are made);
- append ‘sp.’ where only genus is provided; and
- archive any records with names that cannot be resolved to genus level (noting that accommodating higher taxonomic ranks is an enhancement that we’re considering for future releases).
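Some of the name checks above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not BIO’s implementation: `KNOWN_NAMES` is a hypothetical stand-in for the Herbarium and Museum name lists, the 0.85 similarity cutoff is arbitrary, and only non-phonetic spelling suggestions are shown (a phonetic check would need a separate algorithm).

```python
import difflib
import re

# Hypothetical stand-in for the names known to the Herbarium and Museum.
KNOWN_NAMES = ["Banksia attenuata", "Eucalyptus marginata", "Macropus fuliginosus"]


def clean_name(raw):
    """Remove stray characters and extra whitespace; add 'sp.' to a bare genus."""
    # Simplification: keep ASCII letters, full stops, hyphens and spaces only.
    text = re.sub(r"[^A-Za-z.\- ]", " ", raw)
    text = " ".join(text.split())
    if text and len(text.split()) == 1:
        text += " sp."
    return text


def suggest_correction(name, known=KNOWN_NAMES):
    """Return the closest known name for an unknown one, or None."""
    if name in known:
        return None  # already a known name, nothing to suggest
    matches = difflib.get_close_matches(name, known, n=1, cutoff=0.85)
    return matches[0] if matches else None


print(clean_name("  Banksia\tattenuata* "))
print(clean_name("Banksia"))
print(suggest_correction("Banksia atenuata"))
```

In line with the process described above, a suggestion like this would be forwarded to the data provider for approval and not applied automatically.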
We also perform spatial checks and temporal checks, including:
- converting date and location variables to match Darwin Core syntax;
- checking whether the location provided is within Western Australia (records located outside a bounding box with corners at 10° S 105° E and 38° S 130° E will be flagged for review); and
- checking whether any dates occur in the future.
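The bounding box and future-date checks can be expressed as a short Python sketch. It assumes coordinates are in decimal degrees (southern latitudes negative) and dates are ISO 8601 strings stored under the Darwin Core terms `decimalLatitude`, `decimalLongitude` and `eventDate`. The function names and flag wording are illustrative only.

```python
from datetime import date

# Bounding box from the corners given above: 10° S 105° E and 38° S 130° E.
LAT_MIN, LAT_MAX = -38.0, -10.0
LON_MIN, LON_MAX = 105.0, 130.0


def outside_wa_box(lat, lon):
    """True when the point falls outside the Western Australia bounding box."""
    return not (LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX)


def is_future(event_date, today=None):
    """True when the date is later than today (or a supplied reference date)."""
    return event_date > (today or date.today())


def flags_for(record, today=None):
    """Collect review flags for one record; an empty list means no issues found."""
    flags = []
    if outside_wa_box(record["decimalLatitude"], record["decimalLongitude"]):
        flags.append("outside bounding box")
    if is_future(date.fromisoformat(record["eventDate"]), today):
        flags.append("future date")
    return flags


record = {"decimalLatitude": -33.87, "decimalLongitude": 151.21,
          "eventDate": "2021-09-14"}
print(flags_for(record))
```

A bounding box is a coarse test: a point can sit inside the box and still fall outside the State boundary, which is one reason flagged and unflagged records alike still go to human review.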
Where there appears to be a material error in a record (that is, one that changes the meaning of a value, rather than its syntax) we’ll consult with the original provider to seek their approval before amending it.
To prepare data for publication, we also:
- check for duplicate records (both within the dataset and against the Dandjoo database) based on a comparison of selected Darwin Core fields - duplicate records are archived so they can be retrieved and reviewed at a later date;
- check that the Darwin Core mappings submitted with the data are valid, and whether any additional optional Darwin Core attributes can be mapped;
- identify and append a current scientific name to each record (in cases where the current name is unclear due to a taxonomic split, the last name prior to the split will be applied in this field, unless it relates to a threatened or priority species - in this case, BIO will seek guidance from experts in DBCA’s Species and Communities Branch as to how to treat the record); and
- append a conservation code to records that relate to threatened and priority species, so visibility of these records can be limited to authorised viewers.
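The duplicate check can be illustrated with a minimal Python sketch. The set of Darwin Core fields compared here (`KEY_FIELDS`) is an assumption, since the fields BIO actually selects are not listed on this page, and the function names are hypothetical.

```python
# Assumed comparison fields; the selection actually used by BIO is not stated.
KEY_FIELDS = ("scientificName", "eventDate",
              "decimalLatitude", "decimalLongitude", "recordedBy")


def record_key(record):
    """Build a comparison key, ignoring case and surrounding whitespace in text."""
    key = []
    for field in KEY_FIELDS:
        value = record.get(field)
        key.append(value.strip().lower() if isinstance(value, str) else value)
    return tuple(key)


def split_duplicates(incoming, existing_keys=frozenset()):
    """Separate records to keep from duplicates to archive.

    existing_keys holds keys already present in the database, so duplicates
    are caught both within the dataset and against earlier records.
    """
    seen = set(existing_keys)
    kept, archived = [], []
    for record in incoming:
        key = record_key(record)
        if key in seen:
            archived.append(record)  # archived, not deleted, for later review
        else:
            seen.add(key)
            kept.append(record)
    return kept, archived


first = {"scientificName": "Banksia attenuata", "eventDate": "2021-09-14",
         "decimalLatitude": -31.95, "decimalLongitude": 115.86,
         "recordedBy": "A. Collector"}
repeat = dict(first, scientificName=" banksia attenuata ")
kept, archived = split_duplicates([first, repeat])
print(len(kept), len(archived))
```

Comparing on a normalised key means that differences in capitalisation or spacing do not hide a duplicate, while archiving keeps the original record available for review.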
Routine curation of species observation data
In addition to performing quality control checks when data is ingested for the first time, we also run routine curation processes over all species occurrence records in Dandjoo. This involves:
- checking species names against the most current taxonomic names available from the Western Australian Herbarium and Western Australian Museum, and updating the current scientific name we appended to each record where there’s been a change; and
- checking existing records against the most recent list of Western Australian conservation codes to ensure that the codes appended on ingestion are still correct.
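As a rough illustration of these routine checks, the Python sketch below refreshes one record against two lookup tables. The tables, the `currentScientificName` and `conservationCode` field names, and the species names are all invented for the example and do not describe real taxa or BIO’s data model.

```python
def refresh_record(record, current_names, conservation_codes):
    """Return a copy of the record with its current name and code refreshed."""
    updated = dict(record)
    submitted = record["scientificName"]
    # Fall back to the submitted name when no newer name is listed.
    current = current_names.get(submitted, submitted)
    updated["currentScientificName"] = current
    # None means the species is not on the conservation list.
    updated["conservationCode"] = conservation_codes.get(current)
    return updated


# Hypothetical lookup tables standing in for the latest name and code lists.
current_names = {"Examplea antiqua": "Examplea nova"}
conservation_codes = {"Examplea nova": "P3"}

stored = {"scientificName": "Examplea antiqua",
          "currentScientificName": "Examplea antiqua",
          "conservationCode": None}
refreshed = refresh_record(stored, current_names, conservation_codes)
print(refreshed["currentScientificName"], refreshed["conservationCode"])
```

The name as originally submitted is left untouched; only the appended fields change.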
Frequently asked questions
Why is my species missing from Dandjoo?
If you are unable to find any data about a particular species, it may be a threatened or priority species whose records are restricted in line with the Biodiversity Conservation Act 2016.
For more information, see How can I see data about threatened and other sensitive species?
Does Dandjoo accept data for taxa that can’t be identified to genus level?
At present, we’re only ingesting records that relate to organisms that have been identified to a genus level. However, we’re aware that this poses some limitations for invertebrate observations, and it’s something we’re keen to enhance in future releases.
How do I attribute data I’ve sourced from Dandjoo?
When citing information retrieved via Dandjoo, you should attribute it to the Rights Holder. The Rights Holder for each record is identified in any data extracts downloaded from Dandjoo.
You should also reference Dandjoo as the source, citing DBCA as the publisher – for instance ‘Biodiversity, Conservation and Attractions [current year] Dandjoo search accessed on the [date of search]’.
How is the data in Dandjoo licensed?
The data in Dandjoo is generally provided under a CC BY 4.0 licence, except where:
- the record indicates it has been provided by the Department of Water and Environmental Regulation’s Index of Biodiversity Surveys for Assessment (IBSA) program (which allows for bespoke licensing arrangements); or
- the data relates to a threatened species or ecological community under the Biodiversity Conservation Act 2016, where limitations to data sharing apply.
If you’re uncertain about the licensing conditions that apply, contact us and we’ll help you out.
What do I do if I have a question about a specific record or dataset?
We’ll be happy to help you out if you send us a message. Make a note of the Record ID or dataset name you’re asking about and we’ll look into it for you. If we can’t give you an answer right away, we’ll get in touch with the original data provider on your behalf.
How can I see data about threatened and other sensitive species?
Information relating to threatened species and ecological communities is not publicly available via BIO. BIO is trialling the delivery of this functionality for approved internal users, but for now threatened species information still needs to be requested via DBCA’s Species and Communities Branch.
BIO is also working with other States and Territories to develop a national best-practice approach to sharing threatened species data with the public at reduced geographic precision. When complete, this approach will be implemented in Dandjoo, safely allowing public users to view threatened species records.
Who can submit data to Dandjoo?
At launch, we’re prioritising datasets collected from industry surveys and by the research sector. We recognise the value of all data sources, including citizen science data, and as Dandjoo matures we’ll explore ways to ingest data from a wider variety of sources while allowing users more control over the types of data they want to see.
Do get in touch if you’re interested in providing data – we’re keen to talk to you.
Does Dandjoo contain data from the Department of Water and Environmental Regulation’s Index of Biodiversity Surveys for Assessment (IBSA)?
Dandjoo has been pre-populated with data provided directly by the private sector - this data is considerably richer than that submitted for IBSA and covers a longer time period.
The BIO team is currently working on ingestion of the entire collection of historical IBSA datasets, and these will appear in Dandjoo as each is processed.
Is Dandjoo’s data the same data that DBCA used to provide on the NatureMap platform?
The datasets previously provided via NatureMap are now available in Dandjoo. (We’ve also updated some of these datasets where refreshed data is available, and will continue to work with data custodians to update them periodically.)
Dandjoo’s collection is considerably larger than that previously available in NatureMap, as it also includes new datasets from industry, researchers, and regulatory agencies.
Does Dandjoo contain both terrestrial and marine data?
Most records in Dandjoo relate to terrestrial species, since much of the data is generated by industry processes - for example surveys undertaken for regulatory approvals. However, marine data is not entirely absent - for example, many marine species are represented in records from the Western Australian Museum.
What kinds of data can I find in Dandjoo?
Dandjoo currently accepts three types of data:
Species occurrence data: This is data about where a species was observed. When these datasets are provided to BIO, they contain a list of records by species, with information about the date and place each was observed. (Each record in a dataset may refer to one individual of the species, or may include a count to indicate how many individuals were observed.)
Systematic survey data: This is data that relates to observations of multiple species in a systematic survey. When these datasets are provided to BIO, they generally contain a list of plots, and include information about all the species observed in each plot. In the leadup to the platform’s launch, we worked with data providers to restructure systematic survey data into species occurrence data where feasible. We appreciate that this approach results in the loss of rich site information - one of our priorities for the future is to enhance Dandjoo’s ability to ingest and visualise systematic survey data.
Vegetation association data: These datasets contain polygons that define the boundaries of vegetation associations. Currently we’re providing these as a simple overlay that can be viewed in the map interface. As with systematic survey data, we’re planning to enhance the way in which this data is presented in future releases.
Can I connect to Dandjoo via an API?
Yes, check out our API documentation for details, and do tell us about what you're working on - we’re keen to hear about how you’re using the platform, and how we can support your project.
I have an idea for a new feature – can you implement it in the next version of Dandjoo?
We want to make sure future development is informed by users, and are keen to have your input. Contact us to share your idea, or to find out more about our forthcoming User Consultation Committee and how your sector is represented.
How is Dandjoo different to other data sharing platforms?
For data providers, we’ve taken the approach that you shouldn’t need to use a template, provide a set number of fields, or - where possible - reformat the date and location information in your dataset to meet a prescribed format. We want to make it as easy as possible for you to submit data - if you’re providing species occurrence data, you can even use our self-service quality assurance tools to map columns in your dataset to those recognised by Dandjoo.
We’re also committed to maintaining the integrity of your data; if we have any questions about specific records in your dataset, we’ll let you know so you can decide whether you’d like us to make a correction or redact a record.
For data users, our map-based interface is designed to be user-friendly and provide a familiar experience for those who have used other biodiversity data platforms. In addition, it is underpinned by a number of data quality innovations.
Data is reviewed by our team of curatorial staff prior to publication, and mapped to 33 key fields from the Darwin Core data standard. Dandjoo also retains all the original data fields submitted by the data provider, so we can extend those mappings in the future and even extend the platform to include additional standards.
The platform also contains a number of datasets that have never been released before, including data from the private sector, and the data undergoes routine curation to ensure that taxonomic name and conservation code information is kept up to date.
Does my business have to submit data to Dandjoo as part of a regulatory process?
There are no requirements to submit data directly to Dandjoo.
We’re currently working with the Environment Online team at the Department of Water and Environmental Regulation on the implementation of an integrated data environment. This will mean that data submitted as part of a regulatory process will flow seamlessly into the platform. However, if your organisation has a collection of historical biodiversity data and would like to provide it to BIO, please do let us know.