Commentary
Pharmacists are crucial for ensuring safe and effective medication therapy. A fundamental facet of this role is a solid understanding of the evidence generated by studies involving the use of a drug. In this context, being able to analyze these studies, including their design and statistical findings, is key. Although this task can seem intimidating, possessing literature evaluation skills is necessary for making evidence-based decisions.
Despite its importance, after graduating from pharmacy school, many pharmacists likely experience a decline in confidence in their ability to appropriately evaluate studies and interpret statistical findings. This article aims to offer pharmacists a brief refresher on key concepts relevant to performing critical analysis of such studies; it is not intended to serve as a comprehensive primer on trial design and statistics.
Studies involving drug use are commonly categorized as experimental (e.g., randomized controlled trial [RCT]) or observational. Generally, experimental studies measure drug safety and efficacy under ideal conditions, whereas observational studies evaluate its safety and/or effectiveness in real-world use.
Randomized controlled trials are considered the gold standard of clinical research due to their ability to minimize bias and evaluate the intended effects of an intervention. Key study design elements of RCTs include randomization, blinding, and the use of a control arm. Randomization is the process by which study participants are randomly assigned to receive either the study intervention or the control. Blinding refers to keeping key groups of individuals, such as participants and investigators, unaware of treatment assignments. In double-blind studies, which are most common, both participants and investigators are unaware of the treatment assignment. In contrast, in open-label studies, both the health care professionals and the participants are aware of treatment allocation. The control arm is used to compare the efficacy and safety of an intervention to a placebo or to standard of care, as is the case in active-control studies. When experimental studies are performed properly, the only difference between the study arms is the treatment administered. As such, when differences in outcomes are identified, one may conclude that they are likely caused by the intervention.
Observational studies, like cohort or case-control studies, typically focus on the unintended effects of a variable of interest (e.g., a drug or environmental/chemical exposure). In this case, investigators do not assign interventions to study participants; rather, they only observe the effects of the exposure to the variable of interest. These studies are useful for determining associations, but because they lack randomization, bias and confounding factors are more likely to exist. Therefore, they cannot be used to establish cause-and-effect relationships as experimental studies do.
The ability to apply the results of a study in clinical practice depends on 2 factors: internal validity and external validity. Internal validity is directly related to the study design, methods, and analysis of the results. When these elements are not done properly—for example, if participants were not appropriately randomized into the study arms—the results of the study may be skewed and not useful for answering the research question. In this case, the study’s internal validity will be low.
External validity refers to the ability to extrapolate the study results to the broader population. Because only a sample of the population is studied in most cases, it is important to ensure that the sample is representative of the population in which the intervention will be used. As such, close attention must be paid to the inclusion and exclusion criteria used in the study. When these criteria allow for generalizability, external validity will be high.
Outcomes are used to measure the efficacy and/or safety of an intervention. Clinical outcomes measure efficacy and safety parameters that matter to the patient. Examples include feeling better, time out of the hospital, and/or living longer. Measuring clinical outcomes may be difficult because it often necessitates a large number of study participants and a long study duration. As such, investigators sometimes use surrogate outcomes to predict the clinical effects an intervention will have on patients. For instance, looking at the effects of an intervention on lipid levels may serve as a surrogate for cardiovascular outcomes, such as stroke. Some studies may include both clinical and surrogate outcomes.
Outcomes are typically categorized as primary or secondary, each playing a distinct role in evaluating drug efficacy and safety. A primary outcome is the main outcome a study is designed to assess, and it directly reflects the study's objective. It is common to have a single primary outcome, whereas there may be multiple secondary outcomes. Although they are not the focus of the study, secondary outcomes can offer valuable complementary information about the potential benefits and/or risks of an intervention. It is important to pay attention to how outcomes are measured in clinical studies. Ideally, investigators should be using measurement tools that are considered gold standard.
Typically, in an RCT, data analysis can be performed using 2 complementary strategies: intention-to-treat analysis and per-protocol analysis. Intention-to-treat analysis examines data for all participants originally allocated to each treatment group, even if they did not complete the study according to the protocol. On the other hand, per-protocol analysis is conducted only for the subset of participants who completed the study according to the protocol.
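The distinction between the two strategies can be illustrated with a small, entirely hypothetical data set: intention-to-treat keeps every randomized participant in the analysis, while per-protocol first filters to those who completed the study as specified. A minimal Python sketch (all records and numbers below are invented for illustration):

```python
# Hypothetical participant records: assigned arm, protocol completion, and
# whether the outcome event occurred.
participants = [
    {"arm": "treatment", "completed": True,  "event": False},
    {"arm": "treatment", "completed": False, "event": True},
    {"arm": "control",   "completed": True,  "event": True},
    {"arm": "control",   "completed": True,  "event": False},
]

def event_rate(records, arm):
    """Proportion of participants in `arm` who experienced the event."""
    group = [p for p in records if p["arm"] == arm]
    return sum(p["event"] for p in group) / len(group)

# Intention-to-treat: analyze everyone as originally randomized.
itt_treat = event_rate(participants, "treatment")

# Per-protocol: analyze only those who completed the study per protocol.
per_protocol = [p for p in participants if p["completed"]]
pp_treat = event_rate(per_protocol, "treatment")

print(itt_treat, pp_treat)
```

Note how the dropout who experienced an event is counted under intention-to-treat but excluded under per-protocol, which is why the two analyses can yield different event rates.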
Studies may include descriptive and/or inferential statistics. Descriptive statistics simply describe the data sets (e.g., mean body weight), while inferential statistics analyze data to draw conclusions about the general population.
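To make the distinction concrete, the following Python sketch (using invented blood pressure readings) computes a descriptive statistic, the mean of each arm, and then an inferential one, a pooled two-sample t statistic used to judge whether the observed difference is likely due to chance:

```python
import statistics

# Hypothetical systolic blood pressure (mm Hg) readings from two study arms.
treatment = [128, 132, 125, 130, 127, 129, 131, 126]
control = [138, 135, 140, 137, 139, 136, 141, 134]

# Descriptive statistics: simply summarize each data set.
mean_t = statistics.mean(treatment)
mean_c = statistics.mean(control)
print(f"Treatment mean: {mean_t:.1f}, Control mean: {mean_c:.1f}")

# Inferential statistics: a two-sample t statistic (pooled variance) that
# would be compared against a t distribution to obtain a p value.
n_t, n_c = len(treatment), len(control)
var_t = statistics.variance(treatment)
var_c = statistics.variance(control)
pooled_sd = (((n_t - 1) * var_t + (n_c - 1) * var_c) / (n_t + n_c - 2)) ** 0.5
t_stat = (mean_t - mean_c) / (pooled_sd * (1 / n_t + 1 / n_c) ** 0.5)
print(f"t statistic: {t_stat:.2f}")
```

In practice a statistical package would also report the p value; the point here is only that the descriptive step summarizes the sample, while the inferential step is the basis for drawing a conclusion about the broader population.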
Statistical significance may not necessarily mean that a difference is clinically meaningful. Finding a statistically significant difference indicates that the difference identified between the treatment arms is unlikely to be due to chance; rather, it is likely due to the intervention. However, the difference may not be substantial enough to affect the outcomes that matter to the patient (i.e., a clinically significant difference). For instance, a statistically significant difference in blood pressure may not be large enough to produce an outcome that experts in the field believe would make patients’ lives better.
Conducting proper data analysis requires an adequate number of participants (sample size) to detect a difference between treatment arms. The statistical power that the investigators set, together with the expected effect size and the chosen significance level, determines the number of participants needed. A power of 80% (0.8) is commonly considered acceptable in clinical studies, meaning the study has an 80% chance of detecting a true difference of the specified size.
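As a rough illustration of how power feeds into sample size, the standard normal-approximation formula for comparing two means can be sketched in Python; the blood pressure numbers in the example are hypothetical:

```python
import math
from statistics import NormalDist

def sample_size_per_arm(delta, sigma, alpha=0.05, power=0.80):
    """Approximate participants needed per arm to detect a true difference
    in means of `delta`, given standard deviation `sigma`, with a
    two-sided test at significance level `alpha`."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)           # ~0.84 for 80% power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)

# Hypothetical example: detecting a 5 mm Hg blood pressure difference
# when the standard deviation is 10 mm Hg.
print(sample_size_per_arm(delta=5, sigma=10))
```

Note that halving the detectable difference roughly quadruples the required sample size, which is why trials of small effects or rare clinical outcomes must enroll so many participants.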
Effect size provides information on the magnitude of the difference between the study arms and therefore is indicative of the importance of the difference found. One common term associated with measures of effect size is relative risk (RR).
Relative risk is a measure used to compare the probability of an event occurring in the treatment group versus the control group. In practical terms, it is calculated by dividing the probability (usually expressed as a percentage) of the event occurring in the treatment group by the probability of it occurring in the control group. An RR of 1 implies that there is no difference in risk of the event between groups. An RR < 1 implies a lower risk of the event in the treatment group, whereas an RR > 1 implies a higher risk of the event in the treatment group. For example, an RR of 0.6 indicates that the treatment group has a 40% lower risk of the event compared with the control group.
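The arithmetic behind an RR is simple enough to sketch directly; the event counts below are hypothetical and chosen to reproduce the RR of 0.6 discussed above:

```python
def relative_risk(events_treat, n_treat, events_ctrl, n_ctrl):
    """Risk of the event in the treatment arm divided by the risk
    of the event in the control arm."""
    risk_treat = events_treat / n_treat
    risk_ctrl = events_ctrl / n_ctrl
    return risk_treat / risk_ctrl

# Hypothetical trial: 30 of 500 treated patients had the event (6%)
# versus 50 of 500 controls (10%).
rr = relative_risk(30, 500, 50, 500)
print(f"RR = {rr:.2f}")  # 0.06 / 0.10 = 0.60, a 40% lower relative risk
```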
Confidence interval (CI) is a term used in conjunction with a study’s findings (e.g., the RR) and refers to the range within which the true effect size is expected to lie. In most cases, a 95% CI is used. This means that if the study were repeated many times, 95% of the resulting CIs would be expected to contain the true value. The width of the CI is related to the sample size and variability of the sample; the narrower the CI, the greater the precision of the estimate. If the 95% CI for an RR crosses 1 (e.g., 0.4 to 1.2), the difference between study arms is not statistically significant. Likewise, if study findings are presented as an estimated difference, a 95% CI crossing 0 (e.g., -1.2 to 0.9) indicates that the difference is not statistically significant.
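For readers curious where a CI for an RR comes from, a common approach is to compute it on the log scale and then exponentiate. The sketch below uses the standard log-transform method with the same hypothetical counts as the RR example (30/500 events in the treatment arm vs 50/500 in the control arm):

```python
import math

def rr_confidence_interval(a, n1, c, n2, z=1.96):
    """Relative risk with its 95% CI via the log-transform method.
    a/n1 = events/total in the treatment arm; c/n2 = events/total
    in the control arm; z = 1.96 for a 95% interval."""
    rr = (a / n1) / (c / n2)
    se_log_rr = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n2)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper

rr, lower, upper = rr_confidence_interval(30, 500, 50, 500)
print(f"RR = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

Because the resulting interval lies entirely below 1, a reader would conclude the risk reduction in this hypothetical trial is statistically significant; had the interval crossed 1, it would not be.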
Basic skills in study analysis and critique are required for pharmacists to be able to interpret clinical studies. Applying study results to practice can help pharmacists improve patient outcomes.