Tag Archives: randomised control trials

Key issues in evidence-based policymaking: comparability, control, and centralisation

In other posts on evidence-based policymaking I’m critical of the idea that the main barriers to getting evidence into policy relate to the presentation of scientific evidence, timing, and the scientific skills of policymakers. You may overcome these barriers without closing the ‘evidence-policy gap’: for example, you may spend too much effort trying to reduce scientific uncertainty about the size of a policy problem without addressing ambiguity, or the tendency of policymakers to consider only a limited range of solutions.

In this post, I try to reframe this discussion by describing the EBPM process as a series of political choices made as much by scientists as by policymakers. The choices associated primarily with policymakers are also made by academics, and they relate to inescapable trade-offs rather than policymaking problems that can somehow be solved with more evidence.

In this context, a key role of policy analysis is to improve policymaking by clarifying key elements of the process in which we produce, understand, and use evidence on policy solutions[i] (in part by encouraging scientists to understand the choices of policymakers by reflecting on their own).

The three Cs: comparability, control, and centralisation

When we focus on the evidence underpinning policy solutions or ‘interventions’, let’s start with three key terms and note the potential to identify a notional spectrum of approaches to the issues they raise. I’ll describe them as ideal-types for now:

  1. Control.

In randomised control trials (RCTs) we conduct experiments to determine the effect of an intervention by giving the dose to one group and a placebo to the other. This takes place in a controlled environment, in which we can isolate the effect of an intervention by ensuring that the only difference between the treatment and control groups is the (non)introduction of the intervention.

A key tenet of policy analysis is that it is difficult if not impossible to control real-world environments and, therefore, to be sure that an intervention works as intended.

Indeed, some scholars argue that complex policymaking systems defy control and, therefore, are not conducive to the use of RCTs. Instead, we use other methods such as case studies to highlight the interaction between a large number of factors which combine to produce an outcome (the result of which cannot be linked simply to the independent effects of each factor).

As a result, we have our first notional spectrum: at one end is complete confidence in RCTs to conduct policy-relevant experiments; at the other is a complete lack of confidence in their value.
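The RCT logic described above can be illustrated with a toy simulation. This is a rough sketch, not anything from the post: the outcome model, effect size, and function name are all invented for the example.

```python
import random
import statistics

def simulate_rct(n=1000, true_effect=2.0, seed=42):
    """Toy RCT: randomise units to treatment or control, then compare means.

    Illustrative assumptions: each unit has a noisy baseline outcome, and
    treatment adds a fixed effect. Randomisation makes the two groups
    comparable on average, so the difference in means estimates the effect.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        baseline = rng.gauss(10.0, 3.0)   # unobserved individual variation
        if rng.random() < 0.5:            # the coin flip provides 'control'
            treated.append(baseline + true_effect)
        else:
            control.append(baseline)
    return statistics.mean(treated) - statistics.mean(control)

print(simulate_rct())  # close to the true effect of 2.0
```

The point of the sketch is the one made in the text: the estimate is trustworthy only because the environment is fully controlled, which is exactly what real-world policymaking rarely permits.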

  2. Comparability.

This choice about how to address the issue of control feeds directly into our understanding of comparability. If you have complete confidence in RCTs you can produce a series of studies in different times/places/populations, based on the understanding that the results are directly comparable. You might even say that: if it worked there, and there, and there, it will work here. If you have no confidence in RCTs, you seek other ways to consider comparability, such as by producing case study analysis that is detailed enough to allow you to highlight patterns which might be apparent in multiple cases.
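If you do treat RCT results from different times and places as directly comparable, the standard way to combine them is meta-analysis. The sketch below, with invented numbers and a hypothetical function name, shows minimal fixed-effect (inverse-variance) pooling; its weighting scheme assumes precisely the comparability that is at stake here.

```python
def pool_fixed_effect(estimates):
    """Inverse-variance (fixed-effect) pooling of comparable trial results.

    `estimates` is a list of (effect, standard_error) pairs. Weighting each
    trial by 1/se^2 assumes all trials estimate one common effect, i.e. that
    results from different times/places/populations really are comparable.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    pooled = sum(w * eff for (eff, _), w in zip(estimates, weights)) / sum(weights)
    pooled_se = (1.0 / sum(weights)) ** 0.5
    return pooled, pooled_se

# Hypothetical effects from three sites: "it worked there, and there, and there"
trials = [(1.8, 0.4), (2.3, 0.5), (2.0, 0.3)]
effect, se = pool_fixed_effect(trials)  # pooled effect is close to 2.0
```

A sceptic of RCT comparability would reject the opening assumption rather than the arithmetic, which is why this is a political as well as a methodological choice.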

  3. Centralisation.

If a national central government identifies effective policy solutions, should it roll them out uniformly across the country or allow local public bodies the chance to tailor solutions to their areas? At one end of another notional spectrum is the idea that they should seek uniformity to ensure ‘fidelity’ to a successful programme which requires a specific dosage, or simply to minimise a ‘postcode’ lottery in which people receive different levels of service in different parts of the country. At the other end is the preference for localism built on arguments including: one size does not fit all, local authorities have their own mandates to make local policy choices, a policy will not succeed without consultation and local ‘ownership’, and/ or local public bodies need the ability to adapt to quickly changing local circumstances.

What happens when the three Cs come together?

Although we can separate these terms and their related choices analytically, in practice people make these choices simultaneously, or a choice in one category influences choices in the others.

In effect, two fundamental debates play out at the same time: epistemological and methodological disagreements on the nature of good evidence; and, practical disagreements regarding the best way for national policymakers to translate evidence into local policy and practice. Such disagreements present a major dilemma for policymakers, which cannot be solved by scientists or with reference to evidence. Instead, it involves political choices about which forms of evidence to prioritise and how to combine evidence with governance choices to inform practice.

Our new spectrum of choice may involve a range of options within the following extremes: oblige policy emulation/ uniformity and roll out policy interventions that require ‘fidelity’ to the policy intervention (minimal discretion to adapt interventions to local circumstances); or, encourage policy inspiration, as people tell detailed stories of their experiences and invite others to learn from them.

These approaches to policymaking are tied strongly to approaches to evidence gathering, such as when programmes based on RCTs require fidelity to ensure the correct dosage of an intervention during a continuous process of policy delivery and evaluation. They are also influenced by the need to adapt policies to local circumstances, to address (a) the ‘not invented here’ problem, in which local policymakers are sceptical about importing policies that were not developed in their area; and (b) normative arguments about the relative benefits of centralisation and localism: the extent to which we should value policy flexibility, local political autonomy, and normative principles guiding service delivery (e.g. including service users and communities in the design or ‘co-production’ of policy) as much as alleged effectiveness.

What is the value of such discussions?

First, elected policymakers are often portrayed as the villains of the piece because, for example, they don’t understand RCTs and the need for RCT-driven evaluations, they don’t recognise a hierarchy of evidence in which the systematic review of RCTs represents the gold standard, and/or they are unwilling to overcome ethical dilemmas about who gets to be in or out of the control group of a promising intervention.

Yet, there are also academics who remain sceptical of the value of RCTs, have different views on the hierarchy of evidence (many scholars value practitioner experience and service user feedback more highly) and/ or who promote different ways to gather and use comparable policy-relevant evidence (see for example this entertaining exchange between Axford and Pawson).

Second, EBPM is not just about the need for policymakers to understand how evidence is produced and should be used. It is also about the need for academics to reflect on, for example:

  • the assumptions they make about the best ways to gather evidence and put the results into practice (in a political environment where other people may not share, or even know about, their understanding of the world).
  • the difference between the identification of evidence on the success of an intervention, in one place and one point in time (or several such instances), and the political choice to roll it out, based on the assumption that national governments are best placed to spread that success throughout the country.

Third, I have largely discussed extremes or ideal-types. In practice, people may be willing to compromise or produce pragmatic responses to the need to adapt research methods to real world situations. In that context, this kind of discussion should help clarify why scientists may need to work with policymakers or practitioners to produce a solution that each actor can support.

Further reading: EBPM and best practice 5.11.15


[i] Note:  we can generate and use evidence on two elements – (1) the nature of a problem, and (2) the value of possible solutions – in very different ways. A conflation of the two leads to a lot of confused debate about how evidence-based a policy or policymaking process tends to be.



Filed under Evidence Based Policymaking (EBPM), public policy

The politics of evidence and randomised control trials: the symbolic importance of family nurse partnerships

We await the results of the randomised control trial (RCT) on family nurse partnerships in England. While it looks like an innocuous review of an internationally well-respected programme, and will likely receive minimal media attention, I think it has high-stakes symbolic value in relation to the role of RCTs in British government.

EBM versus EBPM?

We know a lot about the use of evidence in politics – and we hear that politicians play fast and loose with it. We also know that some professions have a very clear idea about what counts as evidence, and that this view is not shared by politicians and policymakers. Somehow, ‘politics’ gets in the way of the good production and use of evidence.

A key example is the ideal of ‘Evidence Based Medicine’ (EBM), which is associated with a hierarchy of evidence in which the status of the RCT is only exceeded by the systematic review of RCTs – particularly when the results of this work are peer reviewed and published in high-status journals or databases.

This contrasts with evidence-based policymaking (EBPM), in which there are competing notions of what counts as good evidence, competing sources of evidence (expertise, policymaker experience, professional opinion, service user feedback, etc.), and a greater sense that policymakers will beg, borrow and steal whatever evidence they can get their hands on quickly to address the specific problem they face – including reports that are not peer reviewed or published in outlets with recognised scientific status.

Policymakers also have to weigh up evidence on policies that are difficult (if not impossible) to compare with each other, and come up with ways of choosing between them – such as by assessing their value for money in relation to the benefits they provide.

A compromise between evidence and politics?

In some cases there may be a decent compromise between these practices. In health, expert bodies such as NICE have become responsible for combining the kinds of evidence consistent with EBM with economic and other methods (often including professional and user feedback) to produce guidance on policy choices. NICE does not quite take the politics out of health and social care choices (and nor should it) but it often acts as a standard which prompts policymakers to accept its advice or explain why they don’t.

There have also been important efforts to encourage the greater use of RCTs more widely in government, such as by the Cabinet Office’s Behavioural Insights Team and academics such as Peter John and Gerry Stoker.

Major obstacles to the uptake of RCTs

Yet, these developments do not guarantee a central role for the RCT in politics more generally (far from it). Rather, many politicians or policymakers exhibit uncertainty or scepticism about:

  1. The relevance of RCT evidence – they may argue that (a) an RCT does not answer their question fully or capture the complexity of a policy problem, and (b) that RCT evidence from somewhere else does not apply to their area.
  2. Practical and ethical concerns – an RCT could require cooperation across many levels and types of government, and randomisation is a ‘hard political sell’, at least to elected policymakers who (a) rely on an image of certainty when they propose policies (why would you need to test a policy’s value – are you trying something that might fail?), and (b) struggle with the idea of giving a good intervention to one group of people and not another (if the policy works, why don’t you give everyone the benefit?).

These political concerns may combine with academic criticisms about the assumptions behind RCTs (e.g. that you can produce what can meaningfully be called ‘control’ groups in complex social interactions) to produce major obstacles to the uptake of the evidence favoured by key groups of scientists.

The next best thing: importing policies based on RCTs

Perhaps the next best thing to conducting an RCT in the relative dark is to import a programme with an international reputation for well-evidenced and impressive results. That is where the family nurse partnership (FNP) comes in (box 1).


BOX 1: The Family Nurse Partnership

The FNP began in the US as the Nurse-Family Partnership – designed to engage nurses with first time mothers (deemed to be at relatively high risk of poor life chances) approximately once per month from pregnancy until the child is two. The criteria for inclusion relate to age (teenage), income (low), and partnership status (generally unmarried). Nurses give advice on how mothers can look after their own health, care for their child, minimise the chances of further unplanned pregnancy, and access education or employment. It combines intervention to address the immediate problems of mothers and early intervention to influence the longer term impact on children.

The US’s Coalition for Evidence-Based Policy gave it ‘top tier’ status, which describes ‘Interventions shown in well-designed and implemented randomized controlled trials, preferably conducted in typical community settings, to produce sizable, sustained benefits to participants and/or society’. Identifying three US-based RCTs, it describes common outcomes in at least two, including reductions in pre-natal smoking, child abuse and neglect, and second pregnancies, and improvements in children’s cognitive function and educational attainment (in follow-ups when the children reached 15-19) at a low cost. These trials have been conducted since the first project began in 1977, producing at least 18 peer-reviewed articles, including by its pioneer Professor David Olds, in elite academic journals (such as the Journal of the American Medical Association), and at least two which identify new results in non-US studies.

The programme was rolled out in England to 9000 mothers, with reference to its high cost effectiveness and ‘strong evidence base’, which would be enhanced by an RCT to evaluate its effect in a new country. The FNP requires ‘fidelity’ to the US programme (you can only access the programme if you agree to the licensing conditions) based on evaluation results which showed that the programme was most effective when provided by nurses/ midwives and using a licence ‘setting out core model elements covering clinical delivery, staff competencies and organisational standards to ensure it is delivered well’. Fidelity is a requirement because, ‘If evidence-based programmes are diluted or compromised when implemented, research shows that they are unlikely to replicate the benefits’.


Adopting the FNP doesn’t solve the ‘not invented here’ problem, but it helps reduce many concerns: we import a successful policy (with success demonstrated in multiple RCTs) and conduct an RCT to make sure that a programme that works somewhere else works here. Not everyone gets the programme but, unlike in the US, everyone still receives ‘universal’ NHS care. The use of an RCT can also be sold politically, as (a) part of the licence and (b) the kind of routine evidence gathering and evaluation that should be present in all policy interventions anyway. The RCT and programme also relate to a fairly contained group of recipients and healthcare professionals, with less need for ‘joined up government’ or ‘health and social care integration’ than in many other initiatives. It is praised by NICE and the Early Intervention Foundation.

In this context, the FNP is almost perfect

That’s what makes it seem so important symbolically. It is a trailblazer, showing all that is right with the use of multiple RCTs to perform meaningful tests of the effectiveness of a public policy. It is as much an advert for the value of the RCT as for the value of the programme.

The flip side of this coin is that, if this almost-perfect programme doesn’t produce meaningfully better results than the NHS programme it replaced, some people may get the sense that we went to a lot of bother and expense for very little reward. The idea of a ‘gold standard’ of research may take on a different connotation, particularly during a period of austerity in which governments may be reluctant to invest in new policies and their evaluation when they have to reduce public provision.

Therefore, I expect the release of the RCT results to be political, at least in the sense that they won’t be released without some thought given to how to present the findings in as positive a way as possible. That’s perhaps not in the spirit of the ideal of EBM, but it seems consistent with the reality of EBPM.


The first RCT results were published in October 2015 in the Lancet. A very short summary of these developments is as follows:

  • After publishing the results of the RCT, Robling et al argue that ‘Programme continuation is not justified on the basis of available evidence, but could be reconsidered should supportive longer-term evidence emerge’
  • David Olds’s reply is that the FNP could be more effective if directed more accurately to the most relevant target population
  • The Local Government Association, which recently became responsible for public health (alongside social services), made a broad statement about using the opportunity to look ‘closely at how to achieve maximum effectiveness for the Family Nurse Partnership programme and whether it can be adapted to achieve better value’
  • The Early Intervention Foundation commends the use of good evaluation and reinforces its view that there are many well-evidenced programmes from which to choose

Further Reading

This draft paper provides some further reading on the trainspotter’s guide to evidence/ policy, while this link takes you to a draft Palgrave Pivot book with a bibliography on EBPM.

See also: Ruth Kennedy What works. Can we know?


Filed under Evidence Based Policymaking (EBPM), public policy, UK politics and policy

Policy Concepts in 1000 Words: Success and Failure (Evaluation)

(podcast download)

Policy success is in the eye of the beholder. The evaluation of success is political in several ways. It can be party political, when election campaigns focus on the record of the incumbent government. Policy decisions produce winners and losers, prompting disputes about success between actors with different aims. Evaluation can also be political in subtler but equally important ways, involving scientific disputes about:

  • How long we wait to evaluate.
  • How well-resourced our evaluation should be.
  • The best way to measure and explain outcomes.
  • The ‘benchmarks’ to use – should we compare outcomes with the past or other countries?
  • How we can separate the effect of policy from other causes, in a complex world where randomised control trials are often difficult to use.
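The last point in the list, separating the effect of policy from other causes when an RCT is impractical, is often approached with a difference-in-differences comparison. A minimal sketch, with hypothetical numbers and a hypothetical function name:

```python
def diff_in_diff(treated_before, treated_after,
                 comparison_before, comparison_after):
    """Difference-in-differences: subtract the comparison area's change
    (the background trend) from the policy area's change. The key
    assumption is that, absent the policy, both areas would have moved
    in parallel -- exactly the kind of assumption an RCT avoids."""
    return (treated_after - treated_before) - (comparison_after - comparison_before)

# Hypothetical outcome levels (e.g. smoking rates, %) before/after a local policy
effect = diff_in_diff(treated_before=24.0, treated_after=19.0,
                      comparison_before=23.0, comparison_after=21.5)
# (19 - 24) - (21.5 - 23) = -3.5 percentage points attributed to the policy
```

The ‘parallel trends’ assumption is itself contestable, which is one reason why the choice of benchmark in the list above is political rather than purely technical.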

In this more technical-looking discussion, the trade-off is between selecting a large mixture of measures that are hard to work with and a small number of handpicked measures that represent no more than crude proxies for success.

Evaluation is political because we set the agenda with the measures we use, by prompting a focus on some aims at the expense of others. A classic example is the aim to reduce healthcare waiting times, which represent a small part of health service activity but generate disproportionate attention and action, partly because outcomes are relatively visible and easy to measure. Many policies are implemented and evaluated using such proxies: the government publishes targets to provide an expectation of implementer behaviour; and, regulatory bodies exist to monitor compliance.

Let’s consider success in terms of the aims of the person responsible for the policy. It raises four interesting issues:

  1. The aims of that policymaker may not be clear. For example, they may not say why they made particular choices, they may have many reasons, their reasons may not be specific enough to be meaningful, and/or they may not be entirely truthful.
  2. Policymaking is a group effort, which magnifies the problem of identifying a single, clear, aim.
  3. Aims are not necessarily noble. Marsh and McConnell describe three types of success. Process measures success in terms of a policy’s popularity among particular groups and its ease of passage through the legislature. Political describes its effect on the government’s popularity. Programmatic describes its implementation in terms of original aims, its effect in terms of intended outcomes, and the extent to which it represented an ‘efficient use of resources’. Elected policymakers may justify their actions in programmatic terms, but be more concerned with politics and process. Or, their aims may be unambitious. We could identify success in their terms but still feel that major problems remain unsolved.
  4. Responsibility is a slippery concept. In a Westminster system, we may hold ministers to be ultimately responsible but, in practice, responsibility is shared with a range of people in various types and levels of government. In multi-level political systems, responsibility may be shared with several elected bodies with their own mandates and claims to pursue distinctive aims.

Traditionally, these responsibility issues were played out in top-down and bottom-up discussions of policy implementation. For the sake of simplicity, the ‘top’ is the policymaker at the heart of central government and we try to explain success or failure according to the extent to which policy implementation met these criteria:

  1. The policy’s objectives are clear, consistent and well communicated and understood.
  2. The policy will work as intended when implemented.
  3. The required resources are committed to the programme.
  4. Policy is implemented by skilful and compliant officials.
  5. Success does not depend on cooperation from many bodies.
  6. Support from influential groups is maintained.
  7. Demographic and socioeconomic conditions, and unpredictable events beyond the control of policymakers, do not significantly undermine the process.

Such explanations for success still have some modern day traction, such as in recommendations by the Institute for Government:

  1. Understand the past and learn from failure.
  2. Open up the policy process.
  3. Be rigorous in analysis and use of evidence.
  4. Take time and build in scope for iteration and adaptation.
  5. Recognise the importance of individual leadership and strong personal relationships.
  6. Create new institutions to overcome policy inertia.
  7. Build a wider constituency of support.

Alternatively, ‘bottom-up’ studies prompted a shift of analysis, towards a larger number of organisations which made policy as they carried it out – and had legitimate reasons to diverge from the aims set at the ‘top’. Indeed, central governments might encourage a bottom up approach, by setting a broad strategy and accepting that other bodies will implement policy in their own way. However, this is difficult to do in Westminster systems, where government success is measured in terms of ministerial and party manifesto aims.

Examples of success and failure?

Many implementation studies focus on failure, including Pressman and Wildavsky’s ‘How Great Expectations in Washington are Dashed in Oakland’ and Marsh & Rhodes’ focus on the ‘implementation gap’ during the Thatcher Government era (1979-90).

In contrast, the IFG report focuses on examples of success, derived partly from a vote by UK political scientists, including: the national minimum wage, Scottish devolution, and privatisation.

Note the respondents’ reasons for declaring success, based on a mix of their personal values and their assessment of process, political and programmatic factors. They declare success in very narrow terms: successful delivery in the short term.

So, privatisation is a success because the government succeeded in raising money and boosting its popularity and international reputation – not because we have established that the nationalised industries work better in the private sector.

Similarly, devolution was declared a success because it solved a problem (local demand for self-government), not because devolved governments are better at making policy or their policies have improved the lives of the Scottish population (Neil McGarvey and I discuss this here).

Individual policy instruments like the smoking ban are often treated in similar ways – we declare instant success when the bill passes and public compliance is high, then consider the longer term successes (less smoking, less secondhand smoke) later.

Further reading and watching: (1) Can a Policy Fail and Succeed at the Same Time?

(2)  http://blogs.lse.ac.uk/politicsandpolicy/archives/34735

Why should you read and watch this case study? I hesitate to describe UK tobacco control as a success because it instantly looks like I am moralising, and because it is based on a narrow set of policymaking criteria rather than an outcome in the population (it is up to you to decide if the UK’s policies are appropriate and its current level of smoking and health marks policy success). However, it represents a way to explore success in terms of several ‘causal factors’ (Peter John) that arise in each 1000 Words post: institutions, networks, socioeconomic conditions and ideas. Long term tobacco control ‘success’ happened because:

  • the department of health took the policy lead (replacing trade and treasury departments);
  • tobacco is ‘framed’ as a pressing public health problem, not an economic good;
  • public health groups are consulted at the expense of tobacco companies;
  • socioeconomic conditions (including the value of tobacco taxation, and public attitudes to tobacco control) are conducive to policy change;
  • and, the scientific evidence on the harmful effects of smoking and secondhand smoke is ‘set in stone’ within governments.

The ‘take home’ message here is that ‘success’ depends as much on a policy environment conducive to change as the efficacy of political instruments and leadership qualities of politicians.

Update September 2019

I have now written up this UK tobacco discussion in this book:

Paul Cairney (2019) ‘The transformation of UK tobacco control’ in (eds) Mallory Compton and Paul ‘t Hart Great Policy Successes: How Governments Get It Right in a Big Way at Least Some of the Time (Oxford: Oxford University Press) Preview PDF

Each chapter is accompanied by a case study, such as the one on UK tobacco.



Filed under 1000 words, agenda setting, Evidence Based Policymaking (EBPM), public policy, UK politics and policy