CREED evaluation of environmental exposure data for risk assessments
- Graham Merrington, Arjen Marcus, Iain Wilson, Adam Peters, Carolina Di Paolo
- Jul 28, 2025
The Criteria for Reporting and Evaluating Exposure Datasets (CREED) evaluation facilitates the systematic and transparent evaluation of exposure data to improve consistency and reduce uncertainty and inaccuracy for both the hazard and exposure aspects of environmental risk assessments. It aims to ensure that the data used for an assessment are appropriate and, where there are limitations associated with them, these are properly considered in conclusions.
Development
The Society of Environmental Toxicology and Chemistry (SETAC) developed CREED at a technical workshop held in May 2022. Contributors were identified with experience and expertise in the collection, evaluation, and use of environmental exposure data for risk assessment. Panel members were assigned to one of three groups to consider issues relating to the reliability, relevance, or implementation of environmental exposure data in risk assessment. The groups met virtually for around six months before meeting face-to-face during a two-day workshop in Copenhagen to develop a common understanding and to frame an approach for the development of guidance for dataset evaluation.
The CREED approach is founded on the comparable procedures for evaluating the reliability of ecotoxicity testing, which have been available and applied in regulatory assessments for many years. The Klimisch1 approach uses four different categories (or scores) to which data can be assigned: reliable without restrictions, reliable with restrictions, not reliable, and not assignable. This reliability scoring approach has been updated for ecotoxicity data in the Criteria for Reporting and Evaluating Ecotoxicity Data (CRED) system to improve consistency between different assessors2, and to encourage transparent reporting of restrictions. Similar approaches to evaluating ecotoxicity data are applied in some specific regions for certain regulatory purposes3.
The relatively widespread adoption of the CRED system and comparable approaches resulted in a potential imbalance in the quality and suitability of data used for hazard versus exposure aspects of environmental risk assessments of chemicals. Therefore, a conceptually equivalent approach for evaluating the reliability and relevance of environmental exposure datasets was proposed.
Limitations
Developed by a committee of experts, the final CREED procedure represents a compromise between: (i) detailed and rigorous criteria for evaluating the analytical performance of measured exposure data; and (ii) concerns of stakeholders (e.g. database owners, users, and risk assessment practitioners) that some potentially desirable information is rarely, if ever, included within such datasets. As such, the CREED evaluation is not sufficient to satisfy the requirements of analytical method validation. Rather, it is intended to ensure that data are suitable for a range of assessment purposes. There are two levels of usability, depending upon whether all criteria are fulfilled or only the required ones.
Differences in the approaches used by different assessors have led to disagreements about issues such as the scale of potential environmental risks4.
Reliability by Arjen Marcus and Iain Wilson
Although some aspects of data reliability are already well covered by existing analytical accreditation schemes, there are also many aspects that are important to the reliability of data for use in an assessment but fall outside the scope of such schemes. Consequently, there is a need to ensure that these additional aspects are evaluated, and for a structured evaluation approach for data that are not accredited.
What is reliability?
Data reliability relates to the inherent quality of the data and provides assurance that they can be accepted as reported. The evaluation of reliability primarily focuses on the methods used for the collection and storage of samples, the analytical methods, and subsequent data processing. The reliability of a dataset is generally independent of the purpose for which it is assessed, which means that a reliability evaluation can usually be applied to all risk assessment purposes; defining the purpose of the risk assessment is therefore not critical at this stage. Although some data handling decisions could potentially be affected by the assessment purpose, these would also be identified by the relevance evaluation. Typically, a dataset that is evaluated for multiple risk assessment purposes would need only a single reliability evaluation, but a separate relevance evaluation for each purpose. Datasets can be screened for relevance to the purpose before being evaluated for reliability, to ensure that time is not wasted evaluating unsuitable data.
Gateway Criteria
The next stage of the CREED evaluation is comparing the dataset to basic “Gateway Criteria”, which are intended to ensure that all datasets meet at least a basic minimum standard of information. The Gateway Criteria cover the sample matrix, analyte, location, year, units, and citation, and if there is no information available for one of these criteria then the dataset does not have sufficient information available for an assessor to adequately evaluate its reliability and relevance. If the dataset meets the minimum Gateway Criteria, then it can be evaluated in detail for reliability and relevance. The Gateway Criteria screening step also identifies any missing information that must be located.
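As a sketch, the Gateway screen can be thought of as a simple metadata completeness check. The field names below are illustrative assumptions, not part of the CREED specification:

```python
# Hypothetical sketch of the Gateway Criteria screening step.
# Field names are illustrative, not taken from the CREED workbook.
GATEWAY_FIELDS = ("matrix", "analyte", "location", "year", "units", "citation")

def gateway_check(dataset: dict) -> list[str]:
    """Return the Gateway fields missing from a dataset's metadata.

    An empty list means the dataset passes the screen and can proceed
    to the detailed reliability and relevance evaluations.
    """
    return [f for f in GATEWAY_FIELDS if not dataset.get(f)]
```

Any missing field returned here is information that must be located before the detailed evaluation can begin.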
Reliability evaluation
There are 19 specific reliability criteria addressing information on six different aspects: specifically, sampling media, spatial, temporal, analytical, data handling and statistics, and supporting parameters. Of the 19 detailed reliability criteria, only nine apply to all datasets. Six of the criteria apply only to certain types of data, such as datasets containing summary statistics or censored data, and there are also four criteria that address laboratory method performance that are not required to be evaluated for datasets that used accredited methods. Some reliability criteria cover the same topics as Gateway Criteria, but in more detail. In each case, the reliability criterion may be “fully met”, “partly met” (i.e. some, but not all, conditions are met for some, or all, of the dataset), “not met” (the data or approach were flawed or inappropriate for the analyte or assessment purpose), “not reported” (insufficient information was available for evaluation), or “not applicable” (the criterion does not apply to the dataset under evaluation). For each criterion evaluated as less than fully met, the assessor records the reason for that decision.
The evaluation of the sampling matrix (such as filtered water, or whole organism tissue) requires that it is both reported in detail and appropriate to capture the analyte being measured. Information on the sample type and collection method are also required, and this may include details such as sampling depth and volume for water samples. Details of the sample handling are also important and include information about the sample containers, preservation methods, treatments such as sieving or filtration, and sample storage. Incomplete or missing information may result in these criteria being only partly met, not reported, or not met (if the information reported makes the dataset inappropriate for the measured analyte).
Spatial information requirements include a specific location, as well as other aspects such as the site name and site type. Similarly, temporal information requires the date and time of sampling. Because information on the country and year must be available as a minimum to pass the Gateway Criteria, these criteria should never be rated as not reported, although a lack of detailed information may result in a rating of less than fully met.
Analytical requirements relate to the analyte(s) measured, including the specificity with which they are identified, the limits of detection and/or quantification, and whether the analysis was performed according to an accredited method certified to the applicable standards. For analyses that have not been performed under an accredited and certified method, there are four additional criteria that must be evaluated, covering the methodological details, blanks, accuracy, precision, and field quality controls.
There are nine circumstance-specific criteria that may not be applicable to all datasets; these relate to data handling and statistics, significant figures for reporting, outliers, censored data, summary statistics, and supporting parameters.
Scoring procedure
Ultimately, the objective of the reliability evaluation is to assign the dataset to one of four categories aligned with the reliability of ecotoxicity data under Klimisch and CRED systems:
• Reliable without restrictions
• Reliable with restrictions
• Not reliable
• Not assignable
Further details on the overall evaluation are provided in the section on implementing the CREED approach.
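As an illustration only, one plausible way to aggregate per-criterion outcomes into the four categories might look like the sketch below; the actual decision rules are those of the CREED workbook, and this mapping is an assumption:

```python
# Illustrative aggregation of per-criterion outcomes into the four
# Klimisch-aligned reliability categories. The precedence used here
# (not met > not reported > partly met) is an assumption for the sketch.
def reliability_category(outcomes: list[str]) -> str:
    applicable = [o for o in outcomes if o != "not applicable"]
    if any(o == "not met" for o in applicable):
        return "not reliable"
    if any(o == "not reported" for o in applicable):
        return "not assignable"
    if all(o == "fully met" for o in applicable):
        return "reliable without restrictions"
    return "reliable with restrictions"  # at least one criterion partly met
```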
Relevance
Although the need for data to be relevant may seem obvious, having a structured approach to evaluating this is especially important in situations where the ideal data are not available and an assessment must be conducted using data that are not optimal for the purpose. In these situations, the CREED approach helps to ensure that any limitations of the data are both recorded and taken into account.
What is relevance?
Relevance is the degree to which a dataset is appropriate for addressing a specific purpose as explicitly defined (and recorded) by the assessor (or those generating the data). The systematic consideration of data relevance is beneficial both in the use of existing datasets and in the design of new monitoring studies.
Purpose statement
A key aspect of the CREED approach is a clear and specific statement of the purpose of the risk assessment for which the dataset will be used. This is analogous to the problem formulation stage of other scientific assessments. Experience suggests that ambiguous or poorly defined purpose statements can result in considerable uncertainty over whether criteria are fully met, partly met, or not met for a dataset. Ambiguous purpose statements are much more likely to require professional judgement on the part of the assessor in evaluating whether or not a criterion is met, and this can result in much greater inconsistencies in the evaluation of the relevance and reliability of a dataset by different assessors.
An effective purpose statement should define the type and level of information that is required for each individual criterion to be either fully met or partly met. The definition of the purpose statement is the stage of the CREED process at which professional judgement should be applied. This then ensures that if different assessors evaluate the same dataset for the same purpose, there is a high level of consistency in their conclusions. The use of professional judgement in the definition of the purpose statement therefore considerably reduces the need for it to be applied in the detailed evaluation of datasets.
The authors of the CREED relevance evaluation advocate defining the specific requirements for each individual relevance criterion to be both fully met and partly met. In some cases, a particular criterion may not be critical to the assessment purpose and a wide variety of data may suitably fulfil the criterion, whereas, in others, a very specific sample type or sampling regime may be required.
It is also important to note that although a particular kind of information may be desirable, the most relevant information may not always be available. In these cases, the assessor may need to consider compromising on relevance and accepting the limitations that such compromises introduce into the subsequent risk assessments. CREED supports this by allowing the assessor to pre-define goals in the assessment purpose statement and by recording data limitations during the evaluation of any criterion that is less than fully met. Ultimately, the assessor should identify the best available data for the purpose, even where these may not fulfil all of the criteria for the dataset to be relevant without restrictions. Identifying any restrictions arising from limitations relative to the purpose improves transparency and consistency and helps to ensure that they are properly accounted for when drawing conclusions during the risk assessment.
Relevance criteria
The eleven detailed relevance criteria are divided into six categories covering aspects relating to the sampling media, spatial, temporal, analytical, data handling and statistics, and supporting parameters. As with the evaluation of data reliability, the dataset is first required to meet the “Gateway Criteria” before it is subject to a more detailed evaluation to avoid performing assessments on datasets that are missing critical information. Some of the relevance criteria are designated as required, and others as recommended, so the latter may not be required for all purposes or assessments.
Relevance criteria ensure that the sample medium and collection method are both appropriate and adequate for the given purpose. Similarly, both the study area and types of sites sampled need to be suitable for the purpose. The evaluation of temporal aspects involves the time span covered, sampling frequency, and temporal conditions. Analytical aspects cover the suitability of the analyte(s) and the sensitivity of the analytical method. The final criteria cover the appropriateness of the data handling and statistics, and any supporting parameters that may be required.
Examples of limited situations could involve evaluations considering a specific toxic form of a contaminant, such as chromate (chromium(VI)), where data for total chromium may be used as a surrogate, or a particular taxon of organism (such as an endangered species) where data for similar species may be considered if no data are available for the specific species of concern (for which sampling may be prohibited). In both of these cases, although an optimal dataset may not be available, it may still be possible to draw useful conclusions from a more limited risk assessment based on data that are available.
It is important that the minimum standard for data to be useable within the CREED evaluation is clearly identified within the purpose statement. Ideally, the required information for a relevance criterion to be either fully met or partly met should be specified clearly and unambiguously in the purpose statement. This will ensure greater consistency between different assessors and also that limitations, for a given purpose, can be properly identified.
For example, assessing compliance against an annual average environmental quality standard (EQS) may require a minimum number of samples collected over a suitable time period and an analytical method sufficiently sensitive to determine compliance. Limits of detection can be a particular issue because both the limit of detection and the proportion of censored data may be important. However, the maximum acceptable proportion of censored data may depend upon the level of the detection limit, with a higher proportion of censored data being acceptable in conjunction with a lower detection limit.
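A sliding acceptance rule of this kind could be sketched as below. The thresholds are illustrative assumptions, not values from the CREED guidance (for context, EU water-quality QA/QC rules require a limit of quantification no greater than 30% of the EQS):

```python
# Sketch of a sliding acceptance rule for censored data when assessing
# compliance with an annual average EQS: the more sensitive the method
# (lower LOD relative to the EQS), the larger the tolerable proportion
# of non-detects. All threshold values here are invented for illustration.
def censored_data_acceptable(lod: float, eqs: float, frac_censored: float) -> bool:
    if lod <= 0.1 * eqs:          # LOD well below the standard
        return frac_censored <= 0.8
    if lod <= 0.3 * eqs:          # LOD at the EU-style 30%-of-EQS benchmark
        return frac_censored <= 0.5
    return frac_censored <= 0.2   # insensitive method: few non-detects tolerable
```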
Implementing the CREED approach
Ultimately, CREED was developed to establish a common level of understanding regarding the potential weaknesses in monitoring study design and to increase confidence in the conclusions of assessments that are based on monitoring data. The approach can be readily implemented through the use of a workbook template that guides the assessor through all of the framework components and provides a standardised output summarising the usability of the dataset and any restrictions associated with its use for the specified purpose.
The CREED approach is implemented using a scoring system that allows the outcomes of an evaluation to be summarised transparently, and it determines the usability of the dataset for a specific purpose. The 19 reliability and 11 relevance criteria, which are each designated as either required or recommended, are used to define the overall usability of the dataset for the specified purpose at two different levels, Silver (based on required criteria only) and Gold (based on all criteria, both required and recommended). This two-level scoring approach allows a pragmatic evaluation of dataset usability while highlighting information gaps that might be addressed in future exposure data collection.
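The two-level scoring can be sketched as a simple aggregation over criteria tagged as required or recommended; the boolean "met" flag below simplifies the five-way criterion outcomes:

```python
# Minimal sketch of the Silver/Gold usability scoring. Each criterion is
# tagged "required" or "recommended"; whether a criterion counts as met
# is reduced to a boolean for illustration.
def usability_level(criteria: dict[str, tuple[str, bool]]) -> str:
    """criteria maps a criterion name to (designation, met?).

    Silver: every required criterion is met.
    Gold: every criterion, required and recommended, is met.
    """
    required_met = all(met for tag, met in criteria.values() if tag == "required")
    all_met = all(met for _, met in criteria.values())
    if all_met:
        return "Gold"
    if required_met:
        return "Silver"
    return "not usable"
```

A dataset that reaches only Silver thereby also flags exactly which recommended criteria were unmet, pointing to information gaps for future monitoring.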
The CREED requirements for a dataset to be usable at the Silver level are similar to the minimum dataset requirements specified by the Organisation for Economic Co-operation and Development (OECD) in their metadata requirements to support monitoring data6. However, whilst the OECD guidance includes some relevance questions, the CREED approach requires a specific evaluation of relevance against the 11 detailed relevance criteria.
The full suite of CREED criteria, including both the required and recommended elements, is analogous to the standard of an ideal dataset identified by the OECD7.
The five components of the CREED framework are:
1. Purpose statement
2. Gateway Criteria
3. Reliability evaluation
4. Relevance evaluation
5. Data usability and reporting.
The CREED approach workbook template provides prompts for each specific evaluation step. A summary of the results of the evaluation is provided, including reporting any limitations associated with the use of the dataset for the specific purpose.
The authors tested a preliminary version of the CREED approach by providing a group of volunteer assessors with some example datasets to evaluate. The results highlighted the importance of having a clearly defined purpose statement to improve consistency among different assessors.
A much simpler form of purpose statement was used for the preliminary testing, and some vague language in the purpose statement led to ambiguity in its interpretation, with consequently different outcomes depending on how different assessors applied their professional judgement.
A detailed and specific purpose statement assigning specific information requirements for each individual criterion to be fully met or partly met is critical in reducing the need for professional judgement. This effectively shifts the need to apply professional judgement to the specification of the purpose statement (where it is predefined and transparent), rather than during the criteria evaluation stage (where it may be applied inconsistently by different assessors or for different criteria).
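One hypothetical way to make a purpose statement this explicit is to encode the fully met and partly met requirements per criterion up front; all names and thresholds below are invented for illustration:

```python
# Hypothetical encoding of a purpose statement as explicit per-criterion
# requirements, so that professional judgement is exercised once, up
# front, rather than during each evaluation. Entries are illustrative.
PURPOSE = {
    "time_span": {"fully_met": "12 consecutive months", "partly_met": "any 6+ months"},
    "matrix": {"fully_met": "filtered water (0.45 um)", "partly_met": "unfiltered water"},
}

def evaluate(criterion: str, observed: str) -> str:
    """Score an observed dataset property against the pre-defined purpose."""
    spec = PURPOSE[criterion]
    if observed == spec["fully_met"]:
        return "fully met"
    if observed == spec["partly_met"]:
        return "partly met"
    return "not met"
```

Because the acceptance conditions are fixed before any dataset is examined, two assessors applying this purpose statement to the same dataset should reach the same conclusions.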
References
1. Klimisch, H., Andreae, M., Tillmann, U. A., Regulatory Toxicology and Pharmacology, 1997, 25:1-5.
2. Moermond, C. et al., Environmental Toxicology and Chemistry, 2016, 35:1297-1309.
3. Warne, M. et al., Revised method for deriving Australian and New Zealand water quality guideline values for toxicants (updated May 2017). Council of Australian Governments' Standing Council on Environment and Water (SCEW), Department of Science, Information Technology and Innovation, 2018, p. 46.
4. Merrington, G. et al., Environmental Toxicology and Chemistry, 2021, 40:1237-1238.
5. Peters, A. et al., Integrated Environmental Assessment and Management, 2024, 20:1004-101.
6. Di Paolo, C. et al., Integrated Environmental Assessment and Management, 2024, 20:1019-1034. https://doi.org/10.1002/ieam.4909
7. OECD, Guidance document for exposure assessment based on environmental monitoring. Environment, Health and Safety Publications, Series on Testing and Assessment No. 185, ENV/JM/MONO(2013)7, 2013, p. 79.