For pharmacists looking to become a board-certified pharmacotherapy specialist, it is essential to study statistics in preparation for the exam. In this article two pharmacists who have passed the BCPS exam and have significant research experience provide a BCPS statistics review for pharmacists.
Authored By: Timothy P. Gauthier, Pharm.D., BCPS-AQ ID & Tristan T. Timbrook, Pharm.D., MBA, BCPS
[Last updated: 9 July 2018]
In recent years, data from the American Association of Colleges of Pharmacy have shown an increase in both the number of pharmacy schools and the number of people graduating with doctor of pharmacy degrees.
With more competition for jobs, pharmacists are seeking certifications to set themselves apart within the applicant pool. The Board of Pharmacy Specialties (BPS) is one popular place pharmacists now look to for this purpose. BPS offers certification examinations for numerous pharmacy specialty areas, and each has specific criteria for test eligibility. By far the most popular certification is that of a board-certified pharmacotherapy specialist (BCPS), a designation more than 21,000 pharmacists have achieved.
In speaking with anyone who has passed the BCPS exam, you will learn that a basic understanding of statistics is essential to succeed on test day. In the most recent BCPS content outline there are three domains, and domain #2 (drug information and evidence-based medicine) specifies that pharmacists should have knowledge of biostatistical methods and interpretation; clinical vs. statistical significance; and protocol design, methodology, and biostatistical methods.
As an additional study tool for those seeking the BCPS designation, here is a BCPS statistics review for pharmacists.
NOTE: Trial design is not described here in detail, but has a major impact on the validity of statistical test results and must always be considered. Additionally, this is a summary of select information and readers are referred to the suggested readings at the end of this article for more in-depth information.
VARIABLE TYPES
To select the proper statistical test it is necessary to first identify the types of variables (i.e., data) that have been used. Data can be nominal, ordinal, or continuous (interval or ratio). Here are notes on the different variable types.
1. Nominal
- Data without order or indication of relative severity
- Is a discrete variable
- Not eligible for parametric tests
- Examples: sex, mortality, presence of a disease state
Picking a Test for NOMINAL Data

| Dependence | Samples | # Independent Variables → Test |
|---|---|---|
| Independent samples (parallel design) | 2 | 0 → χ² test, or Fisher's exact test (if an expected cell count is <5); 1 → Mantel-Haenszel test; ≥2 → Logistic regression |
| Independent samples (parallel design) | ≥3 | 0 → χ² test*; 1 → χ² test*; ≥2 → Logistic regression |
| Dependent (paired) samples (cross-over design, related) | 2 | 0 → McNemar test; ≥1 → Repeated measures logistic regression |
| Dependent (paired) samples (cross-over design, related) | ≥3 | 0 → Cochran's Q*; ≥1 → Repeated measures logistic regression |

*Bonferroni correction applied
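As a rough illustration of the first row of the nominal table, the sketch below checks the expected cell counts of a 2x2 table and suggests either the χ² test or Fisher's exact test. The function names are hypothetical and the counts are made-up numbers, not study data; the "<5 expected count" cutoff is the usual rule of thumb, and the χ² statistic is computed without a continuity correction.

```python
from math import comb

def expected_counts(table):
    """Expected cell counts under independence for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    return [[r * k / n for k in col_totals] for r in row_totals]

def chi_square_statistic(table):
    """Pearson chi-square statistic (no continuity correction)."""
    exp = expected_counts(table)
    return sum((table[i][j] - exp[i][j]) ** 2 / exp[i][j]
               for i in range(2) for j in range(2))

def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact p-value: sum the probabilities of every table
    with the same margins that is no more probable than the observed one."""
    (a, b), (c, d) = table
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x):
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    p_obs = prob(a)
    lowest, highest = max(0, c1 - r2), min(r1, c1)
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_obs * (1 + 1e-9))

def suggest_test(table):
    """Rule of thumb: use Fisher's exact test if any expected count is below 5."""
    smallest = min(min(row) for row in expected_counts(table))
    return "Fisher's exact test" if smallest < 5 else "chi-square test"

small = [[3, 1], [1, 3]]      # hypothetical small sample
large = [[20, 30], [30, 20]]  # hypothetical larger sample
print(suggest_test(small), round(fisher_exact_two_sided(small), 4))
print(suggest_test(large), chi_square_statistic(large))
```

In practice a statistics package would be used for these tests; the point here is only to show why small expected counts push the choice toward Fisher's exact test.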
2. Ordinal
- Data ranked in a specific order, but lacking a constant difference in magnitude between ranks
- Is a discrete variable
- Not eligible for parametric tests
- Median is the most common measure of central tendency used to describe ordinal data
- Mean and standard deviation NOT used for reporting
- Commonly used in observational studies
- Examples: APACHE-II score, trauma score, NYHA class
Picking a Test for ORDINAL Data

| Dependence | Samples | # Independent Variables → Test |
|---|---|---|
| Independent samples (parallel design) | 2 | 0 → Wilcoxon rank sum or Mann-Whitney U; 1 → 2-way ANOVA ranks; ≥2 → ANCOVA ranks |
| Independent samples (parallel design) | ≥3 | 0 → Kruskal-Wallis*¥; 1 → 2-way ANOVA ranks; ≥2 → ANCOVA ranks |
| Dependent (paired) samples (cross-over design, related) | 2 | 0 → Wilcoxon signed rank test; 1 → 2-way repeated ANOVA ranks; ≥2 → Repeated measures regression |
| Dependent (paired) samples (cross-over design, related) | ≥3 | 0 → Friedman test; 1 → 2-way repeated ANOVA ranks; ≥2 → Repeated measures regression |

*Bonferroni correction applied; ¥Multiple Comparison Procedure applied
3. Interval
- Data ranked in a specific order with a constant difference in magnitude between units; zero is an arbitrary value
- A continuous variable
- Mean (average) is the most common measure of central tendency used to describe continuous data
- Eligible for parametric tests if data is normally distributed
- Example: temperature
4. Ratio
- Data ranked in a specific order with a constant difference in magnitude between units; zero is NOT an arbitrary value
- A continuous variable
- Mean (average) is the most common measure of central tendency used to describe continuous data
- Eligible for parametric tests if data is normally distributed
- Examples: blood pressure, heart rate, respiratory rate
Picking a Test for CONTINUOUS Data (Interval or Ratio Data)

| Dependence | Samples | # Independent Variables → Test |
|---|---|---|
| Independent samples (parallel design) | 2 | 0 → Student's t-test; 1 → 2-way ANOVA; ≥2 → ANCOVA |
| Independent samples (parallel design) | ≥3 | 0 → 1-way ANOVA¥; 1 → 2-way ANOVA; ≥2 → ANCOVA |
| Dependent (paired) samples (cross-over design, related) | 2 | 0 → Paired Student's t-test; 1 → 2-way repeated ANOVA ranks; ≥2 → Repeated measures regression |
| Dependent (paired) samples (cross-over design, related) | ≥3 | 0 → ANOVA for repeated measures; 1 → 2-way repeated ANOVA ranks; ≥2 → Repeated measures regression |

¥Multiple Comparison Procedure applied
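For study purposes, the simplest column of the three tables (zero additional independent variables) can be written as a small lookup. This is only a memory aid under that assumption: the dictionary and the `pick_test` function are hypothetical names, and the continuous entries assume the data are normally distributed.

```python
# Lookup for the "0 independent variables" column of the three tables above.
# Key: (variable type, paired?, sample bucket), where the bucket is 2 for
# exactly two samples and 3 for three or more samples.
TESTS_NO_COVARIATES = {
    ("nominal", False, 2): "Chi-square or Fisher's exact test",
    ("nominal", False, 3): "Chi-square test",
    ("nominal", True, 2): "McNemar test",
    ("nominal", True, 3): "Cochran's Q",
    ("ordinal", False, 2): "Wilcoxon rank sum or Mann-Whitney U",
    ("ordinal", False, 3): "Kruskal-Wallis",
    ("ordinal", True, 2): "Wilcoxon signed rank test",
    ("ordinal", True, 3): "Friedman test",
    ("continuous", False, 2): "Student's t-test",
    ("continuous", False, 3): "1-way ANOVA",
    ("continuous", True, 2): "Paired Student's t-test",
    ("continuous", True, 3): "ANOVA for repeated measures",
}

def pick_test(variable_type, paired, n_samples):
    """Return the test listed for this situation when there are no covariates."""
    if n_samples < 2:
        raise ValueError("at least two samples are needed for a comparison")
    # Interval and ratio data share the continuous table.
    if variable_type in ("interval", "ratio"):
        variable_type = "continuous"
    bucket = 2 if n_samples == 2 else 3
    return TESTS_NO_COVARIATES[(variable_type, paired, bucket)]

print(pick_test("nominal", paired=True, n_samples=2))
print(pick_test("ratio", paired=False, n_samples=4))
```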
REVIEW OF SELECT STATISTICS DEFINITIONS
Null hypothesis: The hypothesis that there is no difference between groups in a study.
P-value: the probability, calculated from sample data, of observing a difference between two estimates at least as large as the one found if the estimates being compared were really the same (i.e., if the null hypothesis were true).
Alpha (α): the chance of concluding there is a difference between groups when there is actually no difference.
Type I error: when you conclude there is a difference between groups, but there actually is no difference.
Beta (β): the chance of concluding there is no difference between groups when there actually is a difference.
Type II error: when you conclude there is no difference between groups, but there actually is a difference.
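Power: the probability that a study will detect a statistically significant difference between treatment groups when a true difference exists.
Confidence interval: a range of values, calculated from the study data, that is likely to contain the true treatment effect at a stated level of confidence (e.g., 95%).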
Intent-to-treat analysis: when the analysis includes all subjects randomized to a treatment arm, regardless of whether or not the subject completed the study.
Relative risk: compares the risk of an event in a group of individuals with a specific characteristic to the risk of that event in a group of individuals without that specific characteristic.
Absolute risk: the risk of developing a disease over a given time period.
Relative risk ratio: a ratio of the event rate in an intervention group versus the rate of that event in a control group.
Relative risk reduction: how much risk is reduced in the intervention group as compared to the control group.
Absolute risk reduction: the absolute difference in rates of an outcome between treatment and control groups.
Odds ratio: the odds that an outcome will occur in a group of subjects with an intervention, divided by the odds of that outcome occurring in a group of subjects without the intervention.
Number needed to treat: the number of subjects that must be treated in order to benefit one subject.
Number needed to harm: the number of subjects that must be treated in order to harm one subject.
Selection bias: when one group within a study is different than the other(s) due to the manner in which the subjects were selected.
Publication bias: when the available literature favors one outcome. This is typically the result of researchers and journals only reporting favorable study results.
Recall bias: when the subject remembers an event differently from how it actually occurred.
Clinical significance: when the data analysis from the study produces results that change clinical practice.
STUDY TYPE BASICS
Cross-sectional study: a snapshot of a population at a single point in time.
Observational study: a study that observes subjects and there is no active intervention or randomization of patients.
Case control study: a retrospective study that examines subjects with the outcome of interest (the cases) versus patients without the outcome of interest (the controls).
Cohort study: an observational study that follows one group of subjects with an exposure of interest and one group without it to compare how often an outcome occurs.
Crossover study: when the intervention group and control group switch and each subject serves as their own comparator.
Follow-up study: observation of subjects that have not yet experienced an outcome until that outcome occurs.
Non-inferiority study: aims to demonstrate the intervention is not worse than the comparator by more than a small pre-specified amount, M or delta (Δ).
Randomized controlled study: a prospective study in which subjects are randomized into intervention and control groups.
COMMON STATISTICS ABBREVIATIONS
P = p-value
CI = confidence interval
IQR = inter-quartile range
SD = standard deviation
OR = odds ratio
RR = relative risk
RRR = relative risk reduction
ARR = absolute risk reduction
NNT = number needed to treat
NNH = number needed to harm
STATISTICS EQUATIONS
Power = 1 – β
| | Outcome Yes | Outcome No |
|---|---|---|
| Intervention yes | A | B |
| Intervention no | C | D |
OR = (A/C) / (B/D) or (AD/BC)
RR = [A/(A+B)] / [C/(C+D)] …aka… (event rate in the intervention group) / (event rate in the control group)
RRR = 1 – RR
ARR = |A/(A+B) – C/(C+D)| …aka… the absolute difference between (event rate in the intervention group) and (event rate in the control group)
NNT = 1/ARR
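A worked example may help tie these equations together. The sketch below uses the A, B, C, D cell labels from the 2x2 table and made-up counts (10 of 100 events with the intervention, 20 of 100 without); the function name is hypothetical. ARR is taken here as control event rate minus intervention event rate so that a benefit comes out positive.

```python
def two_by_two_measures(a, b, c, d):
    """Effect measures from a 2x2 table.

    a, b = intervention group with / without the outcome
    c, d = control group with / without the outcome
    """
    risk_intervention = a / (a + b)
    risk_control = c / (c + d)
    rr = risk_intervention / risk_control
    arr = risk_control - risk_intervention  # positive when the intervention lowers risk
    return {
        "OR": (a * d) / (b * c),
        "RR": rr,
        "RRR": 1 - rr,
        "ARR": arr,
        "NNT": 1 / arr,
    }

# Hypothetical counts: 10/100 events with the intervention, 20/100 with control.
for name, value in two_by_two_measures(10, 90, 20, 80).items():
    print(f"{name} = {value:.3f}")
```

With these numbers the RR is 0.5, the RRR is 50%, the ARR is 10 percentage points, and the NNT is 10. When the NNT is not a whole number it is conventionally rounded up.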
SOME STATISTICS BASICS
P-value
- The smaller the p-value, the less likely it is that a difference as large as the one observed would arise by chance alone
- The larger the p-value, the more compatible the observed difference is with chance alone
- A statistically significant p-value (e.g., P = 0.005) does not always translate into a difference that is clinically significant
- A p-value of 0.01 means that, if there were truly no difference between groups, a difference at least as large as the one observed would be expected about 1 time in 100
- Does not describe the size of an effect, only the strength of the evidence against the null hypothesis
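The gap between statistical and clinical significance can be seen with a small calculation. The sketch below assumes a simple two-sided z-test for a difference in means with a known, common standard deviation (a simplification; a t-test would normally be used), and the numbers are invented: a 1 mmHg difference in blood pressure with an SD of 10 mmHg.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p(delta, sd, n_per_group):
    """Two-sided p-value for a difference in means `delta`, assuming a known
    common standard deviation `sd` and equal group sizes (z-test approximation)."""
    standard_error = sd * sqrt(2 / n_per_group)
    z = abs(delta) / standard_error
    return 2 * (1 - NormalDist().cdf(z))

# Same hypothetical 1 mmHg difference, very different sample sizes.
p_small_trial = two_sided_p(delta=1, sd=10, n_per_group=50)
p_large_trial = two_sided_p(delta=1, sd=10, n_per_group=5000)
print(f"n = 50 per group:   P = {p_small_trial:.3f}")
print(f"n = 5000 per group: P = {p_large_trial:.1e}")
```

The effect size is identical in both cases; only the sample size changes. The larger trial gives a very small p-value even though a 1 mmHg difference is unlikely to matter clinically.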
Confidence interval
- A narrow confidence interval = less uncertainty
- A wide confidence interval = more uncertainty
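To see why larger studies tend to report narrower intervals, the sketch below computes an approximate 95% confidence interval for a mean using the normal approximation (mean ± z × SD/√n). The function name and the numbers are illustrative assumptions; for small samples a t-distribution would normally be used instead.

```python
from math import sqrt
from statistics import NormalDist

def mean_ci(mean, sd, n, level=0.95):
    """Approximate confidence interval for a mean (normal approximation)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * sd / sqrt(n)
    return mean - half_width, mean + half_width

# Hypothetical data: mean 120, SD 10, with two different sample sizes.
for n in (25, 400):
    low, high = mean_ci(mean=120, sd=10, n=n)
    print(f"n = {n}: 95% CI {low:.2f} to {high:.2f}")
```

Going from 25 to 400 subjects (16 times as many) makes the interval four times narrower, since the width shrinks with the square root of the sample size.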
Power
- The greater the power, the lower the chance of a type II error (missing a difference that truly exists)
- A common way to increase the power of a study is to increase the sample size
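The link between sample size and power can be sketched with a normal approximation for comparing two means. The formula below assumes a known common standard deviation, equal group sizes, and a two-sided alpha, and it ignores the negligible chance of a significant result in the wrong direction; the function name and numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(delta, sd, n_per_group, alpha=0.05):
    """Approximate power (1 - beta) to detect a true difference in means `delta`."""
    z = NormalDist()
    standard_error = sd * sqrt(2 / n_per_group)
    z_critical = z.inv_cdf(1 - alpha / 2)
    return z.cdf(abs(delta) / standard_error - z_critical)

# Hypothetical true difference of 5 units with SD 10.
for n in (16, 64):
    power = approx_power(delta=5, sd=10, n_per_group=n)
    print(f"n = {n} per group: power = {power:.2f}, beta = {1 - power:.2f}")
```

With 16 subjects per group the study would miss this difference most of the time; with 64 per group the power is roughly 80%, a commonly used target.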
Sample size
- When an outcome is rare, a larger sample size is commonly needed to detect a difference between an intervention and a control group
- Magnitude of the association of the effect of treatment on outcome can decrease the needed sample size even in less common outcomes
- Sample size estimates should be calculated prior to starting a study
- Without an adequate sample size a study has insufficient power and may fail to detect a true difference
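A common textbook approximation for the sample size needed to compare two proportions illustrates why rare outcomes require larger trials. The sketch assumes equal group sizes, a two-sided alpha of 0.05, and 80% power; the function name and event rates are made up for illustration, and real protocols typically rely on dedicated software.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(p_control, p_intervention, alpha=0.05, power=0.80):
    """Approximate subjects per group to detect a difference between two proportions."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    variance_sum = (p_control * (1 - p_control)
                    + p_intervention * (1 - p_intervention))
    difference = p_control - p_intervention
    return ceil((z_alpha + z_beta) ** 2 * variance_sum / difference ** 2)

# Same relative effect (event rate halved), common vs. rare outcome.
print("20% vs 10%:", n_per_group(0.20, 0.10), "per group")
print(" 2% vs  1%:", n_per_group(0.02, 0.01), "per group")
```

Halving a 20% event rate needs about 200 subjects per group under these assumptions, while halving a 2% event rate needs well over 2,000.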
Standard deviation
- Used as a measure of dispersion around the mean, the usual measure of central tendency for continuous data
- The more the variability, the larger the standard deviation
- The less variability there is, the smaller the standard deviation
- When data are not normally distributed (e.g., skewed), the standard deviation is less reliable, so the median with inter-quartile range is commonly reported instead
- Cannot be used with nominal or ordinal data
- For normally distributed data, approximately 68% of data points are within 1 standard deviation of the mean and approximately 95% are within 2 standard deviations
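The effect of a single outlier on these summary statistics can be checked with Python's built-in `statistics` module. The data below are invented values; the only difference between the two lists is that the last value is changed from 9 to 90.

```python
from statistics import mean, median, quantiles, stdev

def summarize(values):
    """Return mean, SD, median, and inter-quartile range (Q1, Q3) for a sample."""
    q1, _, q3 = quantiles(values, n=4)
    return {
        "mean": mean(values),
        "sd": stdev(values),
        "median": median(values),
        "iqr": (q1, q3),
    }

typical = [4, 5, 5, 6, 6, 7, 7, 8, 9]        # hypothetical, roughly symmetric
with_outlier = [4, 5, 5, 6, 6, 7, 7, 8, 90]  # same data with one extreme value

for label, data in (("typical", typical), ("with outlier", with_outlier)):
    s = summarize(data)
    print(f"{label}: mean={s['mean']:.1f}, SD={s['sd']:.1f}, "
          f"median={s['median']}, IQR={s['iqr']}")
```

The mean and standard deviation jump when the outlier is introduced, while the median and inter-quartile range stay the same, which is why the latter pair is preferred for skewed data.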
Odds ratio
- Can be used in case-control studies and cohort studies
- Cannot be used to calculate a number needed to treat
- When the outcome is common, can exaggerate the risk
- An odds ratio < 1 means the outcome is less likely in the intervention group
- An odds ratio > 1 means the outcome is more likely in the intervention group
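The point that an odds ratio can exaggerate risk when the outcome is common is easy to verify numerically. In the sketch below (hypothetical counts, A/B/C/D as in the 2x2 table), both scenarios have a relative risk of 2, but the odds ratio only stays close to 2 when the outcome is rare.

```python
def rr_and_or(a, b, c, d):
    """Relative risk and odds ratio from 2x2 counts.

    a, b = intervention group with / without the outcome
    c, d = control group with / without the outcome
    """
    relative_risk = (a / (a + b)) / (c / (c + d))
    odds_ratio = (a * d) / (b * c)
    return relative_risk, odds_ratio

# Rare outcome: 2% vs 1%. Common outcome: 60% vs 30%.
for label, counts in (("rare", (2, 98, 1, 99)), ("common", (60, 40, 30, 70))):
    rr, odds_ratio = rr_and_or(*counts)
    print(f"{label} outcome: RR = {rr:.2f}, OR = {odds_ratio:.2f}")
```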
Relative risk
- A relative risk of 1 means there is no association between the intervention and the outcome
- A relative risk < 1 indicates a negative association between the intervention and the outcome
- A relative risk > 1 indicates a positive association between the intervention and the outcome
Relative risk reduction
- In the absence of an understanding of a subject's baseline risk for the outcome, presenting benefits using relative risk reduction can be deceiving
- Can make treatments seem more substantial
- Can make toxicities seem more substantial
- Cannot be used in case-control studies
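To illustrate why relative risk reduction can mislead without the baseline risk, the sketch below applies the same 50% RRR to two hypothetical baseline risks and reports the resulting absolute risk reduction and number needed to treat. The function name and figures are assumptions for illustration only.

```python
def arr_and_nnt(baseline_risk, rrr):
    """Absolute risk reduction and NNT implied by a baseline risk and an RRR."""
    treated_risk = baseline_risk * (1 - rrr)
    arr = baseline_risk - treated_risk
    return arr, 1 / arr

# The same 50% relative risk reduction at two different baseline risks.
for baseline in (0.20, 0.002):
    arr, nnt = arr_and_nnt(baseline, rrr=0.5)
    print(f"baseline risk {baseline:.1%}: ARR = {arr:.2%}, NNT = {nnt:.0f}")
```

A "50% reduction" corresponds to treating about 10 patients to prevent one event when the baseline risk is 20%, but about 1,000 patients when the baseline risk is 0.2%.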
Absolute risk reduction
- Inverse of number needed to treat
- Yields a less exaggerated risk reduction than relative risk reduction
- Expressed in units of baseline risk
Number needed to treat/harm
- Strong treatment effects on positive and negative outcomes lead to small NNT and NNH, respectively, and vice-versa.
- As NNT/NNH are derived from ARR estimates, they are also point effect estimates for a true population and therefore have confidence intervals, although not often reported.
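One simple way to attach a confidence interval to an NNT is to compute a Wald interval for the ARR and take the reciprocal of each limit. The sketch below assumes that approach (other interval methods exist), uses made-up counts, and only handles the case where the ARR interval excludes zero; the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def arr_ci(events_tx, n_tx, events_ctrl, n_ctrl, level=0.95):
    """ARR (control rate minus intervention rate) with a Wald confidence interval."""
    p_tx = events_tx / n_tx
    p_ctrl = events_ctrl / n_ctrl
    arr = p_ctrl - p_tx
    se = sqrt(p_tx * (1 - p_tx) / n_tx + p_ctrl * (1 - p_ctrl) / n_ctrl)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return arr, arr - z * se, arr + z * se

def nnt_ci(events_tx, n_tx, events_ctrl, n_ctrl, level=0.95):
    """NNT point estimate with limits taken as reciprocals of the ARR limits."""
    arr, lower, upper = arr_ci(events_tx, n_tx, events_ctrl, n_ctrl, level)
    if lower <= 0:
        raise ValueError("ARR interval includes zero; the NNT interval is not a simple range")
    return 1 / arr, 1 / upper, 1 / lower

# Hypothetical trial: 10/100 events with the intervention, 20/100 with control.
nnt, nnt_low, nnt_high = nnt_ci(10, 100, 20, 100)
print(f"NNT = {nnt:.0f} (95% CI {nnt_low:.1f} to {nnt_high:.0f})")
```

Here an NNT of 10 comes with an interval running from about 5 to about 500, which shows how much uncertainty a bare NNT can hide in a small trial.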
MISCELLANEOUS NOTES
- Parametric tests generally have greater statistical power than non-parametric tests. Data must be normally distributed and continuous (i.e., ratio or interval) to use parametric tests.
- Combining study endpoints with low event rates into a composite endpoint can allow for enrolling fewer patients while increasing a study’s power, but not all endpoints are appropriate to combine and a significant result does not mean all endpoints are significant.
- Type II errors are common in clinical trials due to low patient enrollment.
- Univariate logistic regression can only identify variables that are potential independent predictors of an outcome; these must be confirmed in multivariable logistic regression while taking into account the validity of the study design.
- Unlike a mean value, a median value is not strongly affected by extreme outliers.
- By definition, observational studies lack an active intervention.
- Randomized controlled trials have an active intervention, minimize effects of confounding, and have a greater publication impact.
HELPFUL RESOURCES
- ACCP Foundation Biostatistics Resources
- Choosing the correct statistical test
- Sample Size Calculator
- NNTonline.net
- MEDCALC: free statistics calculators
- GraphPad: free statistics calculators
- Chi-square calculator
- Learn EBM by BMJ: How to calculate risk
SUGGESTED READINGS
Kier KL. Biostatistical applications in epidemiology. Pharmacotherapy. 2011; 31(1): 9-22.