3 key aspects of statistical power in impact assessments
Impact assessment results are usually translated into public policy decisions. Statistical power is one of the factors that help ensure that such results actually reflect the impact of the policy being assessed.
Is retrieving a scuba diving mask from a pool the same as retrieving it from the bottom of the sea? If our diving mask sinks into a pool, we can probably retrieve it ourselves or with the help of a partner. Losing the mask in the sea is more complicated. We will probably need several people to help us recover it, or we may end up considering it lost when, in fact, it is very close to us.
In this analogy, statistical power corresponds to the number of people we need to retrieve the diving mask, given that it has not actually been lost. From the statistical point of view, power is defined as the ability of a specific study to detect the effects of an intervention when it actually had an impact on the treatment group. One of the most important determinants of statistical power is the size of the experimental sample. The larger it is, the more precise the estimates will be, and the more likely researchers are to detect the impacts of the assessed intervention, if any.
Below, we describe three key aspects of the role of statistical power and its importance for impact assessment results.
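The link between sample size and power can be made concrete with a minimal sketch. It assumes a simple two-group comparison of means with equal group sizes, a two-sided test, and a normal approximation; the function name `power_two_sample` and the standardized effect size (difference in means divided by the standard deviation) are illustrative choices, not part of any specific assessment.

```python
from statistics import NormalDist


def power_two_sample(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    effect_size: true difference in means, in standard deviation units
                 (an assumption of this sketch).
    n_per_group: number of units in each of the two groups.
    alpha:       significance level of the test.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic under the true effect.
    shift = effect_size * (n_per_group / 2) ** 0.5
    # Probability that the statistic falls in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Same true effect, increasingly large samples.
    for n in (50, 100, 200, 400, 800):
        print(n, round(power_two_sample(0.2, n), 2))
```

With the same true effect, the probability of detecting it rises as the groups get larger, which is the "more people looking for the mask" idea in numeric form.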
1. An assessment with low statistical power is inconclusive
If the study sample is very small, it is possible that, although the intervention had effects, we cannot detect them in the outcome variable of interest. There is then a high risk that policymakers will rule out programs that were actually effective and that produced meaningful changes in the treatment group. At the same time, adding this misleading evidence to the literature could discourage the implementation of similar programs in the future. In this regard, an impact assessment without enough statistical power cannot reach conclusive results: if no effects are detected, it will not be possible to distinguish an ineffective policy (the mask really is lost) from a lack of statistical power (too few people to get it back).
2. Power calculations allow us to set the ideal sample size or define the minimum effect that we will be able to detect
In the world of public policy, budget constraints can limit the number of beneficiaries of a program or the scope of the data collection associated with the assessment. To ensure cost-effectiveness, it is possible to perform ex ante power calculations and define either the ideal sample size or the minimum effect we can detect with a given sample.
If the sample is already defined and cannot be expanded, power calculations help us estimate the minimum effect of the intervention that we can theoretically detect under those constraints. The smaller the minimum detectable effect, the greater the assessment's ability to pick up the intervention's impacts.
To illustrate, let's go back to the example: the more people help us look for the mask, the better equipped we will be to retrieve it, whether it was lost in a pool or at sea. If we only have two people, we will probably manage to retrieve it from a swimming pool, but we will almost certainly not be able to retrieve it from the sea. Likewise, if the true effect of our policy turns out to be small and we do not have enough statistical power, we may be unable to see it, and we run the risk of stating that there were no impacts when in fact there were some.
Power calculations tell us, before the intervention takes place, how much capacity we have to detect impacts. Based on this information, we can determine whether we have enough statistical power to identify only large effects or also very small ones.
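The two uses of a power calculation described in this section can be sketched as follows. This is a simplified illustration that assumes two equal-sized groups, individual-level randomization, a known common standard deviation, a two-sided test, and a normal approximation; the function names and the default values (5% significance, 80% power) are conventional choices made for this example only.

```python
import math
from statistics import NormalDist


def _multiplier(alpha, power):
    # Sum of the critical values for the significance level and target power.
    z = NormalDist()
    return z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)


def minimum_detectable_effect(n_per_group, sd=1.0, alpha=0.05, power=0.80):
    """Smallest true difference in means detectable with the given sample."""
    return _multiplier(alpha, power) * sd * math.sqrt(2 / n_per_group)


def required_n_per_group(effect, sd=1.0, alpha=0.05, power=0.80):
    """Units needed in each group to detect a true difference of `effect`."""
    return math.ceil(2 * (_multiplier(alpha, power) * sd / effect) ** 2)


if __name__ == "__main__":
    # Fixed sample: what is the smallest effect we could expect to detect?
    print(minimum_detectable_effect(n_per_group=150))
    # Fixed target effect: how many units per group would we need?
    print(required_n_per_group(effect=0.2))
```

The first function answers the question for a sample that is already fixed, and the second answers it when the sample can still be chosen. Real assessments usually need further adjustments (covariates, clustering, unequal group sizes) that this sketch leaves out.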
3. The sample size is key, but other elements should also be considered
Although they are determining factors, the sample size and the minimum effect we expect to detect are not the only relevant elements. Statistical power also depends on the assessment design (whether there are one or more treatment groups) and on the level of randomization, that is, whether assignment to treatment or control is made by group (schools, municipalities, hospitals) rather than at the individual level. In addition, there is a risk of noncompliance and contamination, or of a high attrition rate. All of these elements affect the effective sample size and, thus, the statistical power. We will later describe this relationship and its implications in detail.
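As a rough illustration of how group-level randomization and attrition reduce the effective sample size, the sketch below uses the standard design effect for equal-sized clusters, 1 + (m - 1) × ICC, where m is the number of units per cluster and ICC is the intracluster correlation. The function name, the assumption that attrition is uniform across clusters, and the example values are all hypothetical; noncompliance and contamination are not modeled here.

```python
def effective_sample_size(n_clusters, cluster_size, icc, attrition=0.0):
    """Approximate number of independent observations in a clustered design.

    n_clusters:   number of randomized groups (e.g. schools).
    cluster_size: units surveyed per group, assumed equal across groups.
    icc:          intracluster correlation, between 0 and 1.
    attrition:    share of units lost in every group (assumed uniform).
    """
    retained = cluster_size * (1 - attrition)
    # Design effect: variance inflation caused by similarity within groups.
    design_effect = 1 + (retained - 1) * icc
    return n_clusters * retained / design_effect


if __name__ == "__main__":
    # 50 schools with 20 students each: 1,000 students on paper.
    print(effective_sample_size(50, 20, icc=0.0))
    print(effective_sample_size(50, 20, icc=0.1))
    print(effective_sample_size(50, 20, icc=0.1, attrition=0.2))
```

Even a modest intracluster correlation can shrink a nominal sample considerably, which is why the level of randomization matters as much as the raw number of participants.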
The results of an impact assessment can have significant implications for public policy decisions. Ensuring that these findings really reflect the impact of the programs is therefore essential. In this context, statistical power is a key element for those in charge of the assessment to consider.