How do we judge the quality of evidence we use to make decisions? CHEW Trustee and Director of Salisbury Kuczkowska Consulting, David Salisbury, shares his thoughts.
Judging how confident we should be in the evidence available to inform our decisions is tricky. This is not least because it requires us to nail our colours to the mast of what we think makes for quality evidence. There are differing opinions on this within organisations, and they are often quite passionate. Indeed, I’m sure that there will be those of you who both agree and disagree with my interpretation of what makes quality evidence. However, if you are working in an organisation that doesn’t define and agree this then it will be an uphill struggle to develop a culture of using evidence, because you will lack a common language with which to discuss it.
Most of the literature on “evidence quality” centres around the quality of evidence on interventions. This is the base of evidence that would inform “testing decisions” that I mentioned in my previous blog. I think it’s broadly true to say that there is a split in this literature between those who believe in appropriateness of design versus those who believe in more rigid hierarchies (in which randomised controlled trials and meta-analysis / systematic review of RCTs are seen as a “gold standard”).
I’m not going to get too deep into that debate here, as it would probably need its own separate blog, which I might get to in the future. What I will say is that evidence standards and guidance that espouse a hierarchy of evidence (e.g. NESTA’s standards of evidence*) are probably better known across the third sector in the UK, and I don’t think they are particularly helpful.
To be totally truthful, I think that the notion of the traditional hierarchy in applied social research is a “fool’s gold” standard. I don’t believe that there is any one design that has a monopoly on producing good evidence. Any evaluation design needs to be chosen taking into account both the question that is being asked and the context and circumstances in which the answer will play out. I think the most recent iteration of the Magenta Book is actually pretty good. Rather than advocate one approach over all the others, it sets out when different evaluation designs are most appropriate (check out figure 2.1 on page 23). The supplement on complexity is a bit braver than the main document itself and makes for good reading too.
My position on quality is similar to that set out in this paper from the American Educational Research Association. This paper is a guide for how researchers report their work, but I think many of the principles in here are very useful when reading research and considering its quality.
The first two areas it covers are “problem formulation” (which is similar to the ideas I talked about in my previous blog) and “design and logic.” It states that “the design and logic of a study flows directly from the problem formulation” before setting out that different purposes of studies require different study designs. In other words, when considering the evidence base we should (be able to) interrogate whether the study design used was appropriate to the circumstance and question asked. In organisations we will typically be asking “is this the evidence I need to inform this decision?”
What standards like those adopted by NESTA provide is a useful shorthand for looking at the evidence base and making judgements. By being able to look at the description of the level of evidence in those standards, readers have a language for describing the quality of that evidence base.
Conversely, the logical conclusion of following the principles from the AERA paper (and I do note it’s not written for this purpose) would be to undertake a fairly robust critical appraisal of every paper and piece of research on the topic. That is not necessarily bad advice but it is quite unrealistic for most professionals.
I think what is needed for the third sector is a halfway house between the two, something that is more nuanced than the NESTA standards but that doesn’t expect professionals to get into the detail of each study the way the AERA paper does.
I have made a suggestion of what such a framework could look like below. I hope for this to be a practical tool, providing a clear and transparent language with which we can talk about the confidence we have in the evidence available when making decisions. Linking back to my first blog – the framework is an artifact bringing the language element of the culture to life in something tangible.
The framework I suggest takes as its starting point the types of decisions we make and gives a rule of thumb for what gives us more or less confidence in the body of evidence we have to inform that decision (and the types of questions that underpin it). It works at an aggregate level, considering the confidence in the body of evidence available on the topic rather than the detail of how to appraise the quality of individual studies.
There are four principles that are useful when considering how much confidence we can have in an evidence base. I’ve drawn these from two places – DFID’s 2014 guide on Assessing the Strength of Evidence and NESTA’s Using Research Evidence: A Practice Guide. They are:
Completeness – capturing the full range of evidence on any given topic
Consistency – the degree to which similar messages emerge from different pieces of research, analysis and evaluation
Quantity – volume of evidence on the topic
Quality – appropriately designed studies that are transparent in their reasoning and method
So without further ado – here’s a suggested framework. It could probably be improved so please don’t be shy in the comments! I've included a fairly crude summary version first - with more detail below that.
The summary version
The full table
*Whilst I don’t agree with the conclusion they come to in their standards of evidence, there is a lot that is very useful in NESTA’s guide to using research evidence (pretty much everything that comes before the bit where the document points you back to the rigid standards of evidence is very helpful).
This blog series has been reproduced with permission from Dave Salisbury's personal blog. You can follow Dave for further updates here