Evaluation as a core component of Pay for Success projects (Urban Institute)

By Mayookha Mitra-Majumdar, Kelly Walsh

Standing up a new program, or scaling a promising one, can be challenging. Financial, reputational, and political consequences can make even the most forward-thinking local governments risk-averse. As a result, innovative and evidence-based programs often fall by the wayside in favor of the status quo.

Pay for success (PFS) aims to change that. By shifting financial risk from the government to private investors, PFS makes it more feasible for communities to scale evidence-based social programs, promotes evidence-based policymaking, and fosters a culture of outcome-setting and measurement. Evaluations, which ascertain the impact of program services on participants, are therefore a core component of PFS projects and provide the basis for key decisions on investor repayment.

A rigorous evaluation must address the counterfactual: what outcomes could have been expected in the absence of program services? Measuring the outcomes of an equivalent but untreated comparison group—called the “control” group in experimental evaluation designs—provides an answer to the counterfactual, and increases confidence in a causal link between program services and participant outcomes.

A new paper from the Pay for Success Initiative (PFSI) details three different evaluation design types, and their comparative strengths and weaknesses, available to PFS evaluators.

A companion paper argues that randomized controlled trials (RCTs) are the most rigorous way to measure program impact. In an RCT, independent evaluators randomly assign program participants to either the treatment group or the control group. Proper randomization maximizes the likelihood that measured outcomes can be causally attributed to the services provided. RCTs offer a high level of confidence and clarity in project results, a method of fairly distributing potentially scarce resources, and the opportunity to contribute to the program's evidence base.
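As a rough illustration (not drawn from the paper, and far simpler than a real evaluation protocol), the random assignment at the heart of an RCT can be sketched as a seeded shuffle of a participant roster:

```python
import random

def assign_groups(participants, seed=42):
    """Randomly split a participant roster into treatment and control groups.

    A fixed seed makes the assignment reproducible for auditing; real RCTs
    use more careful procedures (e.g., stratified or blocked randomization).
    """
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    midpoint = len(shuffled) // 2
    return shuffled[:midpoint], shuffled[midpoint:]

# Hypothetical ten-person roster, split 50/50 by chance alone
treatment, control = assign_groups([f"p{i}" for i in range(10)])
```

Because group membership is determined by chance rather than by any participant characteristic, observed and unobserved traits are balanced across groups in expectation, which is what licenses the causal claim.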

Quasi-experimental designs—such as regression analysis and propensity score matching—seek to construct a comparison group without randomization. These designs can control for observable factors, such as gender and income level, but are less equipped to control for unobservable traits, such as motivation or intrinsic ability, that also affect participant outcomes. This can result in non-equivalent comparison and treatment groups and can weaken the causal link between services and outcomes.
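To make the matching idea concrete, here is a hedged, minimal sketch of greedy nearest-neighbor matching on precomputed propensity scores (the scores themselves would normally come from a model such as logistic regression; the unit IDs and scores below are invented for illustration):

```python
def match_on_score(treated, pool):
    """Greedy nearest-neighbor matching on precomputed propensity scores.

    treated, pool: dicts mapping unit ID -> propensity score (probability
    of receiving treatment given observed characteristics).
    Returns a dict mapping each treated ID to its closest unmatched pool ID.
    """
    available = dict(pool)
    matches = {}
    for unit, score in treated.items():
        if not available:
            break  # comparison pool exhausted
        best = min(available, key=lambda u: abs(available[u] - score))
        matches[unit] = best
        del available[best]  # match without replacement
    return matches

treated_scores = {"t1": 0.80, "t2": 0.30}
pool_scores = {"c1": 0.75, "c2": 0.35, "c3": 0.50}
pairs = match_on_score(treated_scores, pool_scores)
```

Note that the match quality depends entirely on the observed covariates behind the scores: two units with identical propensity scores can still differ on unobserved traits like motivation, which is exactly the weakness described above.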

Non-experimental evaluations do not incorporate a comparison group. They only measure outcomes for those who receive program services. Participant outcomes are compared to expected outcomes absent services, or, similar to the United Kingdom’s “rate card” programs, benchmarks derived from historical outcomes for the target population. Because non-experimental designs do not include a comparison group, the impact of program services cannot be distinguished from the impact of broader trends. A causal link cannot be made.
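A non-experimental, rate-card-style comparison reduces to checking observed outcomes against a historical benchmark. A minimal sketch, with made-up numbers:

```python
def outcome_vs_benchmark(outcomes, benchmark_rate):
    """Compare participants' observed outcome rate to a historical benchmark.

    outcomes: list of 1/0 flags, one per served participant.
    benchmark_rate: historical rate for the target population (0-1).
    Returns the raw difference; note this difference cannot be read as a
    causal effect, since broader trends may move outcomes for everyone.
    """
    observed_rate = sum(outcomes) / len(outcomes)
    return observed_rate - benchmark_rate

# 3 of 5 participants achieved the outcome vs. a 50% historical benchmark
diff = outcome_vs_benchmark([1, 1, 0, 1, 0], 0.50)
```

The code makes the limitation visible: nothing in the calculation separates the program's contribution from, say, an improving labor market that lifted outcomes for the whole population.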

Of the 11 PFS projects implemented in the United States, six have incorporated an RCT, and a seventh is using an RCT companion study. Rigorous evaluations like RCTs build the evidence base behind an intervention, which generates more confidence in projects scaling that intervention and informs future PFS projects and policymaking.

As an organization, the Urban Institute does not take positions on issues. Scholars are independent and empowered to share their evidence-based views and recommendations shaped by research.