Evaluate Impact

Evaluation Design

Given the likely size and complexity of an evaluation that assesses not only health outcomes but also the value added by integration, programs should begin planning their impact evaluation at the earliest stages of project design. The best impact evaluations for any kind of program combine evidence from formative research, implementation monitoring, process evaluation, and post-intervention data collection to understand both the magnitude and the mechanisms of change. This is even more important for integrated programs, because of the greater number and diversity of inputs, the degree of cross-sectoral collaboration, and the often longer time frame involved in integrated programming.

Research designs will vary depending on what is being integrated, the time and funding available, the program's level of control over implementation processes, the potential (or mandate) to disseminate lessons from the program, and the practicality and acceptability of particular designs (e.g., randomized trials vs. quasi-experimental vs. observational designs), among other factors. Below are several research designs and examples of when each is appropriate. Because true experimental designs, which randomly assign participants to program interventions, are rarely feasible with large-scale, full-coverage programs, donors and key stakeholder groups must decide which evaluation designs are appropriate and sufficient to demonstrate the value of integration and the achievement of project objectives.

Tip

It is unlikely that a single type of evaluation design will tell the whole story of a complex, integrated SBCC program. Evaluating integrated SBCC programs almost always requires a mixed methods approach. Combining quantitative and qualitative methods often gives the most robust picture, bringing to light direct and indirect impact pathways at multiple levels. Quantitative methods offer the most rigorous means of measuring the magnitude of change and modeling the process of change at the population level, while qualitative methods provide in-depth, localized insights into synergies, unanticipated consequences, and contextual factors that help explain outcomes. The research designs that follow, therefore, are not mutually exclusive.

Evaluation Design: Randomized Controlled Trials

Randomized controlled trials (RCTs), sometimes called experimental designs, randomly assign individuals or groups to receive or not receive a particular intervention and then compare outcomes between those exposed and those unexposed. RCTs are generally considered to provide the strongest evidence of cause and effect, but they often have low external validity: they do not show how the intervention would work in the real world, where controlled conditions are not possible. The RCT approach also builds knowledge about successful interventions by replicating studies many times with minor variations, which is not feasible with population-based interventions at scale. In addition, programs that use mass media as part of their intervention strategy are difficult to randomize, because it is hard to prevent spillover between treatment and control locations. Integrated programs are even harder than vertical programs to study with RCTs, because they typically have too many components to randomize systematically. Facility-based interventions can sometimes be randomized, provided the populations served by different facilities do not overlap. Evaluators sample service delivery points (or providers), randomly assign some to implement an integrated program and others to implement a vertical program, and then compare outcomes across the two groups of facilities.
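The facility-level randomization described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the facility IDs, the `assign_facilities` and `difference_in_means` helpers, the fixed seed, and the simple 50/50 split are all hypothetical assumptions. The sketch also presumes, as the text requires, that the populations served by different facilities do not overlap.

```python
import random
from statistics import mean


def assign_facilities(facility_ids, seed=2024):
    """Randomly split facilities (hypothetical IDs) into integrated and vertical arms.

    Assumes the populations served by different facilities do not overlap.
    """
    ids = list(facility_ids)
    rng = random.Random(seed)  # fixed seed: the allocation can be reproduced and audited
    rng.shuffle(ids)
    half = len(ids) // 2  # with an odd count, the vertical arm gets the extra facility
    return {"integrated": sorted(ids[:half]), "vertical": sorted(ids[half:])}


def difference_in_means(outcome_by_facility, arms):
    """Naive comparison: mean outcome in integrated facilities minus vertical facilities.

    Ignores clustering and baseline differences; illustration only.
    """
    integrated = [outcome_by_facility[f] for f in arms["integrated"]]
    vertical = [outcome_by_facility[f] for f in arms["vertical"]]
    return mean(integrated) - mean(vertical)


# Hypothetical example: ten service delivery points.
facilities = [f"SDP-{i:02d}" for i in range(1, 11)]
arms = assign_facilities(facilities)
print("Integrated:", arms["integrated"])
print("Vertical:  ", arms["vertical"])
```

A fixed seed makes the allocation reproducible, so it can be documented and audited. Because facilities rather than individuals are randomized, a real analysis would also need to account for clustering (and typically baseline differences) instead of relying on a raw difference in facility means.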

A full factorial experimental design can help integrated programs determine whether integration contributed to amplified or synergistic effects. In this design, participants are randomized to one of: 1) a control group (no intervention), 2) a single-intervention arm for each activity included in the study, or 3) a multi-intervention arm for each combination of activities. The more components the integrated strategy has, the more study arms must be created and randomly assigned to cover every possible combination of components (two components require four arms; three require eight). FHI 360 observes, “The simplest full factorial design for integrated evaluation will include four arms: one control, one for the first activity, one for the second activity, and one for the integrated activities. In such a design, if true amplification is achieved, the integrated arm(s) should show a degree of change that is greater than the sum of change among all of the arms that are not integrated.”
(Source: FHI 360, Guidelines for Integrated Development Programs)
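The arm structure and the amplification check in the FHI 360 quote can be made concrete with a short Python sketch. The component labels "A" and "B", the change values, and the helper names `factorial_arms` and `shows_amplification` are hypothetical. The sketch also assumes one particular reading of the quote: each arm's effect is its change minus the control arm's change, and amplification means the integrated arm's effect exceeds the sum of the single-activity arms' effects. Other readings of the quote are possible.

```python
from itertools import combinations


def factorial_arms(components):
    """List every arm of a full factorial design as a frozenset of components.

    The empty set is the control arm, one-element sets are single-activity arms,
    and larger sets are integrated arms. k components yield 2**k arms.
    """
    arms = []
    for size in range(len(components) + 1):
        for combo in combinations(components, size):
            arms.append(frozenset(combo))
    return arms


def shows_amplification(change_by_arm, integrated_arm):
    """Return True if the integrated arm's effect exceeds the sum of its single-arm effects.

    Assumption: an arm's 'effect' is its observed change minus the control arm's change.
    """
    control_change = change_by_arm[frozenset()]
    integrated_effect = change_by_arm[integrated_arm] - control_change
    sum_of_single_effects = sum(
        change_by_arm[frozenset([component])] - control_change
        for component in integrated_arm
    )
    return integrated_effect > sum_of_single_effects


# Hypothetical percentage-point changes in an outcome, by arm.
observed_change = {
    frozenset(): 2,              # control
    frozenset({"A"}): 5,         # first activity only
    frozenset({"B"}): 6,         # second activity only
    frozenset({"A", "B"}): 12,   # integrated activities
}
print(len(factorial_arms(["A", "B"])))  # 4
print(len(factorial_arms(["A", "B", "C"])))  # 8
print(shows_amplification(observed_change, frozenset({"A", "B"})))  # True (10 > 3 + 4)
```

Comparing point estimates this way is only a first look. In practice, synergy in a factorial trial is usually assessed by testing an interaction term (for example, an A × B term in a regression model), and interaction effects typically need larger samples to detect than the effects of single components.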