Introduction
Critical appraisal is a process that helps you identify the strengths and weaknesses of a research paper. Understanding how appropriate the study design is for the question you are seeking to answer, how well the study was carried out, and how well the paper is reported helps you to assess whether the paper is likely to provide reliable evidence.
The purpose of this page is to help you assess the level of evidence from the many different types of research study that are being published.
This page will help you:
- Understand what we mean by levels of evidence
- Recognise different types of study design
- Understand how the study design relates to the level of evidence.
Research studies in veterinary science can be designed in a variety of ways, depending on the type of question they are trying to answer. These different study designs are often arranged into a hierarchy known as the ‘levels of evidence’ with practitioners encouraged to find the highest level of evidence possible to answer their clinical questions.
Many versions of the hierarchy of evidence concentrate on clinical studies, placing Randomised Controlled Trials (RCTs), and systematic reviews and meta-analyses of such studies, at the top of the evidence hierarchy.
However, we are increasingly seeing the publication of social science papers that look at the human factors involved in the delivery of veterinary care, as well as studies that use new technology to provide evidence based on large data sets, or that report on the development of computer models of animal disease or machine-learning-based diagnostics.
What are the levels of evidence?
The levels of evidence are often portrayed as a pyramid of evidence, which helps to convey that some types of study provide a higher level of evidence than others, and also that higher levels of evidence are often built on and supported by lower levels of evidence (Arlt & Heuwieser 2016; Cockcroft & Holmes 2008).
There are many different types of study design, so it is helpful to start by considering in broad terms what makes some types of study more likely than others to provide a higher level of evidence, that is, evidence that is more reliable and accurate.
Synthesised evidence
At the top of the pyramid, and considered to provide the highest level of evidence, we have synthesised evidence, which involves the critical evaluation and summary of previously published studies.
Intervention studies
Below this we have intervention studies, such as clinical trials, where the researcher controls the intervention (diagnostic test or treatment) that the animal is exposed to. The reliability of these types of studies can be increased by providing a suitable comparator or control for the intervention being assessed, randomising participants between the intervention and control arms of the study, and blinding participants and researchers to which participants received the intervention.
Observational studies
Observational studies are those where the researcher observes and compares the outcomes between two or more groups without having any control over which animals receive the intervention or exposure. While these studies may provide some evidence of differences between the groups, there may be many confounding factors which could influence the outcome other than the intervention under investigation.
Descriptive studies
Descriptive studies are those where the author records the details of individuals or groups of animals, but due to lack of a comparator group cannot analyse variables and causal relationships.
Background information
Background information does not count as strong evidence, but it may provide the information that enables you to formulate an answerable question or evaluate higher level published evidence.
Types of study design
What types of clinical research are there?
This table shows a broad categorisation of the different types of clinical research, arranged according to the level of evidence that they are likely to provide for clinical decision making.
The study types nearer the top of the table generally provide higher quality and more reliable evidence. The higher the level of evidence, the more confident you can be in the accuracy of the results, with less chance of error or bias.
| Level of evidence | Description | Examples |
|---|---|---|
| Synthesised evidence | Critical evaluation and summary of previously published studies. | Meta-analysis, Systematic review, Knowledge Summary, Critically Appraised Topics, Evidence-based guidelines |
| Intervention studies / Clinical trials | Studies where the researcher controls the intervention (diagnostic test or treatment) that the animal is exposed to. | Randomised controlled trials (RCTs), Non-randomised controlled trials, Crossover trials |
| Observational studies | Studies where the researcher observes and compares the outcomes between two or more groups without having any control over which animals receive the intervention or exposure. | Case-control studies, Cohort studies, Cross-sectional studies, Diagnostic test, validation studies |
| Descriptive studies | Studies where the author records details of individuals or groups of animals. | Case series, Case reports |
| Background information | Sources that provide a general overview of a subject. | Narrative reviews, Textbooks/lectures, Non-clinical studies |
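As a rough illustration, the table above can be written as a simple lookup that returns the level of evidence for a named study design. This is only a sketch: the function name `level_of_evidence`, the singular design names used as keys, and the numeric ranks (1 for the top row, 5 for the bottom) are assumptions made for this example, and the list of designs is not exhaustive.

```python
# Levels and example designs transcribed from the table above,
# ordered from the highest level of evidence (rank 1) to the lowest (rank 5).
# The singular, lower-case design names are an assumed normalisation.
LEVELS_OF_EVIDENCE = [
    ("Synthesised evidence", {
        "meta-analysis", "systematic review", "knowledge summary",
        "critically appraised topic", "evidence-based guideline",
    }),
    ("Intervention studies", {
        "randomised controlled trial", "non-randomised controlled trial",
        "crossover trial",
    }),
    ("Observational studies", {
        "case-control study", "cohort study", "cross-sectional study",
    }),
    ("Descriptive studies", {"case series", "case report"}),
    ("Background information", {
        "narrative review", "textbook", "lecture", "non-clinical study",
    }),
]


def level_of_evidence(design):
    """Return (rank, level name) for a study design, or None if it is not listed."""
    key = design.strip().lower()
    for rank, (level, designs) in enumerate(LEVELS_OF_EVIDENCE, start=1):
        if key in designs:
            return rank, level
    return None


print(level_of_evidence("Cohort study"))  # (3, 'Observational studies')
```

A lookup like this only reflects study design; it says nothing about how well an individual study was carried out or reported.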
Remember: Study design is not the only factor
The levels of evidence are designed to help you to concentrate your efforts on sources that are most likely to provide reliable evidence. However, it is important to remember that study design is not the only factor to consider when critically appraising a paper.
It should be noted that each type of study has its own strengths and limitations. For example, a case-control study is an appropriate way to study the aetiology of a disease, and a qualitative study would be appropriate to address questions regarding the perceived quality of life of a patient after an intervention.
Randomised controlled trials are often considered to provide the highest level of evidence, because their methodological design inherently reduces bias. However, a poorly designed Randomised Controlled Trial (RCT) may provide less reliable evidence than a good cohort study. The information below should help you to identify the different types of study and assess the level of evidence each is likely to provide.
Alongside the study design, it is also relevant to look at whether the data was collected prospectively, for the purpose of the study, or whether the study uses retrospective data that had been collected for another purpose, for example as part of the animal’s clinical records. For some types of study design, such as a Randomised Controlled Trial, data can only be collected prospectively; this has the benefit that the data is likely to be more complete. However, retrospective data, while it may be incomplete, can often allow data from a larger population of participants to be collected and analysed more quickly.
Levels of evidence and study design
Although the levels of evidence are usually presented as a pyramid showing synthesised evidence at the top, it is also important to realise that higher levels of evidence are built on and supported by lower levels of evidence, so this information will be presented from the bottom up.
Background information
Background information is generally considered low quality evidence for decision making. However, it can be useful for giving an overview of a subject and providing the knowledge you need to find and assess higher quality evidence.
In terms of the published veterinary literature the most common forms of background information are opinion pieces and narrative reviews. Textbooks and CPD notes may also fall in this category.
Examples of background information include:
- Opinion pieces: These are not based on a literature search. Instead, the authors give their opinions on a subject without any explicit appraisal of existing literature, though they may provide a couple of references to back up their claims.
- Narrative reviews: These provide an overview of a subject. They may include references, but they rarely include details of specific search protocols or explicit criteria by which papers are included or excluded. Instead, they rely on the author to draw conclusions based on the papers they find most relevant or interesting.
While background information is generally considered to provide low quality evidence, its usefulness may be improved if it provides references to a range of appropriate sources, contains explicit discussion of the evidence on which it is based, and acknowledges areas of uncertainty and knowledge gaps.
Further support in critically appraising Narrative Reviews can be found here. [add correct link]
Descriptive studies
Descriptive, or non-comparative, studies are designed to record what is seen. They provide a description of what has been seen in an individual or group of animals, but do not attempt any comparison to a control group. These studies have value if the aim of the paper is to highlight an unusual finding, such as a new disease, or report a rare occurrence. Descriptive studies will not be able to demonstrate causation, so when using this type of study care should be taken to avoid over-interpreting the findings by drawing conclusions about causal links.
Examples of descriptive studies include:
- Case reports: These are reports on a single patient. They describe the presentation and/or course of a disease.
- Case series: These are collections of case reports and can provide descriptive quantitative data.
Observational studies
Observational studies are those where the researcher examines the outcomes of an intervention within two groups without having any influence on which animals get the intervention.
For example, a researcher could consider the rate of complication following different types of surgery by looking back at all the surgical cases and analysing those that resulted in complications. Where two groups are compared it is possible to make inferences about the association of risk factors and outcomes. However, because there are many possible confounding factors which have not been controlled it is not possible to prove causation.
Examples of observational studies include:
- Case-control studies: These are where animals which have a disease condition are identified and any causal or risk factors are compared to a control group (without the disease). Information regarding the exposure is historical. These studies start with groups that already have the outcome (e.g. diabetes) and look back to examine what might have been the exposure factors (e.g. obesity). Large data sets can be used to undertake nested case-control studies, where cases of a disease are identified and, for each, a specified number of matched controls is selected from among those in the data set that have not developed the disease by the time of disease occurrence in the case.
Access our Case Control Checklist [link + button]
- Cohort studies: These identify a group of animals and follow them over time to see how their exposures affect their outcomes compared to another group that were not exposed to that factor, either the general population or another cohort of animals.
Access our Cohort Study Checklist. [link + button]
- Cross-sectional studies: These are studies that describe the characteristics of sample groups of animals. Data is collected at one point in time and two groups are identified – usually animals with a specified disease and those without. The relationship between group membership and given parameters is then considered, and typically expressed as an odds ratio. As the data is taken at one point in time, the direction of cause and effect cannot be established.
Access our Cross Sectional Study checklist [link + button]
- Controlled before-and-after/interrupted time series: These are studies that measure the characteristics of a group of animals before and after an event or intervention. The two sets of data are then compared to judge the effect of the event or intervention.
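The odds ratio mentioned above for cross-sectional studies, which is also commonly reported for case-control studies, can be calculated from a two-by-two table of exposure against disease status. The sketch below is a minimal illustration: the function name `odds_ratio`, its parameter names and the counts in the usage example are all hypothetical, and the confidence interval uses the common log (Woolf) approximation, which is an assumption rather than a method named in this page.

```python
import math


def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Return (odds ratio, lower 95% limit, upper 95% limit) from a 2x2 table.

    The interval uses the log (Woolf) approximation and needs all four
    cells to be greater than zero.
    """
    cells = (exposed_cases, unexposed_cases, exposed_controls, unexposed_controls)
    if min(cells) <= 0:
        raise ValueError("all four cells must be greater than zero")

    # Odds of exposure among cases divided by odds of exposure among controls.
    ratio = (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)

    # Standard error of the log odds ratio.
    se_log = math.sqrt(sum(1 / n for n in cells))
    lower = math.exp(math.log(ratio) - 1.96 * se_log)
    upper = math.exp(math.log(ratio) + 1.96 * se_log)
    return ratio, lower, upper


# Hypothetical counts: obesity (exposure) in cats with and without diabetes.
ratio, lower, upper = odds_ratio(
    exposed_cases=30, unexposed_cases=20,
    exposed_controls=40, unexposed_controls=110,
)
print(f"Odds ratio {ratio:.3f}, 95% CI {lower:.2f} to {upper:.2f}")
```

An odds ratio above 1 indicates that the exposure is more common among animals with the disease, but as noted above it describes an association and does not by itself establish causation.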
Interventional studies
Interventional studies are those where there is an intervention (e.g. treatment, drug therapy, surgical method, exposure to a chemical etc) with a researcher responsible for designing the intervention and deciding which animals are exposed to it. By controlling the conditions in which the intervention takes place the researcher can control for potential confounding variables and increase the likelihood that differences between the groups are the result of the intervention.
- Randomised controlled trials (RCTs): These can either be experimental laboratory studies or clinical trials. RCTs have two important features: there are at least two groups (a treatment group and a control group), and animals are randomly assigned to the groups. Randomisation of participants and blinding of the investigator to which animals are in which group reduce bias, and lead to randomised controlled trials being considered the ‘gold standard’ when assessing the efficacy of a treatment.
- Non-randomised controlled trials: Not every intervention can, or should, be randomised. Non-randomised controlled trials can detect associations between an intervention and an outcome, but they cannot rule out the possibility that the association was caused by a third factor linked to both intervention and outcome.
- Crossover trials: These involve the administration of two or more interventions one after the other in a specified or random order to the same group of patients. In these trials each participant acts as its own control, minimising the risk of confounding.
Access our Controlled Trial Checklist [button + link]
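The random assignment of animals to groups described above for RCTs can be sketched in a few lines. This is a deliberately simple illustration of equal allocation by shuffling; the function name `randomise` and the animal identifiers are hypothetical, and real trials often use more elaborate schemes (for example block or stratified randomisation) together with concealment of the allocation.

```python
import random


def randomise(animal_ids, seed=None):
    """Randomly split animals into a treatment group and a control group.

    A seed makes the allocation reproducible so that it can be audited.
    """
    rng = random.Random(seed)
    shuffled = list(animal_ids)
    rng.shuffle(shuffled)  # every ordering is equally likely
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}


# Hypothetical identifiers for eight enrolled animals.
groups = randomise(["A01", "A02", "A03", "A04", "A05", "A06", "A07", "A08"], seed=2024)
print(groups["treatment"])
print(groups["control"])
```

Because chance alone decides which group each animal joins, known and unknown confounding factors should, on average, be balanced between the groups.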
Synthesised evidence
These are studies which systematically review and critically evaluate already published literature. They may also be referred to as secondary sources of evidence, in contrast to the primary sources of evidence from published research studies.
When appraising secondary sources, it is still important to consider how reliable the evidence is, by looking for a clear description of how the evidence has been gathered and evaluated. With secondary sources of evidence, it is particularly important to consider whether any important or relevant evidence has been left out, such as newly published primary research.
Sources of synthesised evidence include:
- Systematic reviews: These are comprehensive surveys of a topic in which all the primary studies of the evidence have been systematically identified, those providing the most relevant evidence have been selected and critically appraised, and then graded and summarised according to explicit and reproducible methodologies. A systematic review should be performed in a highly structured way, with the question and methods clearly defined in advance to minimise any bias in the selection of papers and grading of evidence.
Access our systematic review checklist. [button + link]
- Meta-analyses: These are secondary studies in which the quantitative data from several similar research studies are combined and analysed as if they were a single study. Analyses of this type are normally accompanied by some sort of graphical representation, for example a forest plot.
- Knowledge summaries/critically appraised topics: A knowledge summary, or critically appraised topic, is a concise appraisal of the best available evidence on a defined clinical question. It should follow an explicit methodology to identify and critically appraise the evidence.
Access our Evidence Summary checklist. [button + link]
- Evidence-based guidelines: Guidelines can provide an accessible way for practitioners to access recommendations for the care of patients with specific conditions. However, if they are to provide a high level of evidence and a reliable basis for clinical decision making it is important that they are based on a systematic and critical evaluation of the available evidence.
Access the Clinical Guidelines checklist. [button + link]
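To make the idea of combining several studies "as if they were a single study" more concrete, the sketch below pools study estimates using fixed-effect inverse-variance weighting. This is one common approach chosen here for illustration; it is an assumption of this example, not a method prescribed by this page, and many meta-analyses use random-effects models instead. The function name `pooled_fixed_effect` and the numbers in the usage example are hypothetical.

```python
import math


def pooled_fixed_effect(estimates, standard_errors):
    """Combine study estimates with fixed-effect inverse-variance weights.

    Each study is weighted by 1 / SE^2, so more precise studies count for more.
    Returns (pooled estimate, standard error of the pooled estimate).
    """
    if len(estimates) != len(standard_errors) or not estimates:
        raise ValueError("need one standard error per estimate")
    if any(se <= 0 for se in standard_errors):
        raise ValueError("standard errors must be positive")

    weights = [1 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1 / total_weight)
    return pooled, pooled_se


# Hypothetical log odds ratios and standard errors from three studies.
pooled, se = pooled_fixed_effect([0.40, 0.25, 0.55], [0.20, 0.10, 0.30])
print(f"Pooled estimate {pooled:.3f} (SE {se:.3f})")
```

The per-study estimates with their intervals, alongside the pooled estimate, are what a forest plot typically displays.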
Social science studies
Social science studies seek to understand humans and their interactions. They may use quantitative methods, like those that have been described in the section on clinical research. However, they may also use qualitative methods which seek to understand the experiences of the human subjects involved.
Studies using social science methods may be relevant to the delivery of veterinary care or the experience of caring for animals with certain conditions. They may involve veterinary professionals or owners as participants.
Even if you are unfamiliar with the particular methodology being used, it should be possible to place these studies within the broad categories of the levels of evidence (Figure 1).
Access our Qualitative Study Checklist. [button + link]
Cost benefit analyses
Cost benefit analyses in veterinary healthcare seek to compare the economic costs and effectiveness of an intervention with one or more alternatives. They may use actual data or computer modelling, but in either case, it is important to critically evaluate the way that costs and outcomes have been assessed.
Computer-based studies
Technological advances in computing, machine learning, and artificial intelligence are opening new approaches to research and the development of clinical decision aids. As these are new approaches to evidence generation we are still learning their strengths and limitations. However, the basic principles of critical evaluation and assigning levels of evidence (descriptive, observational, intervention studies and synthesis of evidence) can still be applied.
Large data sets
Large data sets can provide access to much larger samples than can be achieved with conventional research methods. However, the reliability of research based on these data sets will always be dependent on the quality of data available for analysis.
- Clinical data sets. Computerised clinical records and large datasets can provide access to very large samples for analysis. It should be remembered that as the data has been recorded for clinical rather than research purposes it may not include all the information that the researcher would want. Most of the research carried out on this sort of data will be descriptive or observational. However, new techniques such as “Trial Emulation” can be used to mimic clinical trials. In all cases it is important to carefully evaluate the selection criteria for cases and controls and consider the limitations of using retrospective data (Hernán et al 2022).
- Genome Wide Association Studies (GWAS) are a type of observational study that screen for genetic variants within a population to look for associations with a particular trait or disease.
Generative AI
This is a type of artificial intelligence which can create new content, including text, images and audio. Although it is “trained” on a large data set it uses generative algorithms to generate new content which is based on, but not identical to, the data it was trained on.
Computer modelling
There are some areas of research where it is very difficult to control variables and interventions, for example in one-off events or complex systems such as disease outbreaks. In these cases, computer models can be used to examine the effect of different variables on outcomes. Until recently the variables were selected by those carrying out the research; however, tools based on artificial intelligence are now starting to be used in veterinary research. In all cases the accuracy of computer models will depend on the assumptions on which they are based.
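As an illustration of how a model's output depends on its assumptions, the sketch below implements a very simple susceptible-infected-recovered (SIR) outbreak model with daily time steps. The SIR structure is a standard textbook model chosen for this example and is not taken from any study discussed here; the function name `simulate_outbreak` and all parameter values are hypothetical.

```python
def simulate_outbreak(population, initial_infected, transmission_rate,
                      recovery_rate, days):
    """Run a basic SIR model in daily steps and summarise the outbreak.

    transmission_rate and recovery_rate are per-day rates between 0 and 1.
    The model assumes a closed, evenly mixing population.
    """
    if not (0 <= transmission_rate <= 1 and 0 <= recovery_rate <= 1):
        raise ValueError("rates must be between 0 and 1")

    susceptible = float(population - initial_infected)
    infected = float(initial_infected)
    recovered = 0.0
    peak_infected = infected

    for _ in range(days):
        new_infections = transmission_rate * susceptible * infected / population
        new_recoveries = recovery_rate * infected
        susceptible -= new_infections
        infected += new_infections - new_recoveries
        recovered += new_recoveries
        peak_infected = max(peak_infected, infected)

    return {
        "susceptible": susceptible,
        "infected": infected,
        "recovered": recovered,
        "peak_infected": peak_infected,
    }


# Same herd, two different assumptions about how easily the disease spreads.
low = simulate_outbreak(1000, 1, transmission_rate=0.3, recovery_rate=0.1, days=200)
high = simulate_outbreak(1000, 1, transmission_rate=0.5, recovery_rate=0.1, days=200)
print(round(low["peak_infected"]), round(high["peak_infected"]))
```

Changing a single assumed parameter changes the predicted size of the outbreak, which is why the assumptions behind any published model deserve close scrutiny.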
Machine learning
Machine learning is a type of artificial intelligence using algorithms and statistical models to enable computers to “learn” and make predictions without being explicitly programmed. The algorithms are “trained” on large data sets to identify patterns and associations and then use this information to make predictions about new data sets. Machine learning can be subdivided based on whether the data has already been labelled (supervised learning) or whether the machine is left to identify patterns in the data (unsupervised learning). Deep learning is a further type of machine learning which uses neural networks with multiple layers to analyse complex relationships and patterns in data (Hennessey et al 2022; Pereira et al 2023).
This type of research is now being carried out to develop tools which aid diagnosis, particularly in diagnostic imaging. It is important that these tools are validated not only on separate data sets but in a range of clinical settings.
Access our Diagnostic Test Study Checklist. EBVM Toolkit 12 – Diagnostic Test Study Checklist – RCVS Knowledge [button + link]
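When a diagnostic tool is validated on a separate data set, its predictions are usually compared against a reference diagnosis and summarised as sensitivity and specificity. The sketch below shows that calculation; the function name `diagnostic_accuracy`, the coding of results as 1 (disease present) and 0 (disease absent), and the example data are assumptions made for this illustration.

```python
def diagnostic_accuracy(predicted, actual):
    """Return (sensitivity, specificity) for binary predictions.

    predicted: the tool's results on a validation set (1 = disease, 0 = no disease)
    actual:    the reference-standard diagnoses for the same animals
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must be the same length")

    true_pos = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    false_neg = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 1)
    true_neg = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 0)
    false_pos = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 0)

    if true_pos + false_neg == 0 or true_neg + false_pos == 0:
        raise ValueError("validation set needs both diseased and healthy animals")

    sensitivity = true_pos / (true_pos + false_neg)  # diseased animals correctly flagged
    specificity = true_neg / (true_neg + false_pos)  # healthy animals correctly cleared
    return sensitivity, specificity


# Hypothetical results from a validation set the tool was not trained on.
tool_output = [1, 1, 0, 0, 1, 0]
reference = [1, 0, 0, 0, 1, 1]
sensitivity, specificity = diagnostic_accuracy(tool_output, reference)
print(f"Sensitivity {sensitivity:.2f}, specificity {specificity:.2f}")
```

Repeating the same calculation on data from several different clinics or populations gives an indication of whether the tool's performance holds up across clinical settings.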
References and further reading
- Arlt, S.P. and Heuwieser, W. (2016) The staircase of evidence–a new metaphor displaying the core principles of evidence-based veterinary medicine. Veterinary Evidence, 1 (1).
- Cardwell, J.M. (2008) An overview of study design. Journal of Small Animal Practice, 49 (5), pp. 217-218. https://doi.org/10.1111/j.1748-5827.2008.00594.x
- Cockcroft, P. and Holmes, M. (2008) Handbook of evidence-based veterinary medicine. John Wiley & Sons.
- Dean, R. (2013) How to read a paper and appraise the evidence. In Practice, 35 (5), pp. 282-285. https://doi.org/10.1136/inp.f1760
- Greenhalgh, T. (2014) How to read a paper: The basics of evidence-based medicine. 5th ed. Wiley Blackwell.
- Hennessey, E., DiFazio, M., Hennessey, R. and Cassel, N. (2022) Artificial intelligence in veterinary diagnostic imaging: A literature review. Veterinary Radiology & Ultrasound, 63, pp. 851-870.
- Hernán, M.A., Wang, W. and Leaf, D.E. (2022) Target trial emulation: a framework for causal inference from observational data. JAMA, 328 (24), pp. 2446-2447.
- Pereira, A.I., Franco-Gonçalo, P., Leite, P., Ribeiro, A., Alves-Pimenta, M.S., Colaço, B., Loureiro, C., Gonçalves, L., Filipe, V. and Ginja, M. (2023) Artificial Intelligence in Veterinary Imaging: An Overview. Veterinary Sciences, 10 (5), p. 320.