Step 3 Evidence Interpreter

The 2×2 Table

// everything flows from here. learn this once, use it forever.

Every board question about risk, treatment effect, or diagnostic tests is secretly asking you to fill in a 2×2 table and do arithmetic. Master the table, and the formulas become obvious.

	Disease + (sick)	Disease − (healthy)	Total
Exposed +	a = TP (exposed + sick)	b = FP (exposed + healthy)	a+b
Exposed −	c = FN (unexposed + sick)	d = TN (unexposed + healthy)	c+d
Total	a+c	b+d

KEY INSIGHT

"Exposed" = treated group, risk factor group, or test positive — depending on question type.
"Disease" = outcome, disease, whatever you're measuring.
The math is identical regardless of framing.

HOW TO READ A QUESTION

Step 1: Draw the 2×2 mentally.
Step 2: Fill in a, b, c, d from the stem or table.
Step 3: Ask — "what are they actually asking for?" and apply one formula.
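The three steps above can be sketched in Python. This is only an illustration: the `TwoByTwo` class and its `from_group_events` helper are hypothetical names, not part of the original page, and the numbers are taken from the ACE inhibitor cohort in Q06.

```python
from typing import NamedTuple


class TwoByTwo(NamedTuple):
    """Cell counts of a 2x2 table (hypothetical helper for illustration)."""
    a: int  # exposed + sick
    b: int  # exposed + healthy
    c: int  # unexposed + sick
    d: int  # unexposed + healthy

    @classmethod
    def from_group_events(cls, exposed_n, exposed_events, control_n, control_events):
        """Build the table from group sizes and event counts, as a stem usually gives them."""
        return cls(
            a=exposed_events,
            b=exposed_n - exposed_events,
            c=control_events,
            d=control_n - control_events,
        )


# Q06: 500 on ACE inhibitor (30 develop CKD), 500 controls (60 develop CKD)
table = TwoByTwo.from_group_events(500, 30, 500, 60)
print(table)  # TwoByTwo(a=30, b=470, c=60, d=440)
```

Once the four cells are named, every formula in the sections that follow is a one-liner on `a`, `b`, `c`, `d`.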

STUDY DESIGN MATTERS FOR WHICH CALC YOU CAN USE

RCT / Cohort | Can calculate RR and OR
Case-Control | Can only calculate OR (you selected cases and controls, so incidence is artificial)
Cross-Sectional / Dx Study | Sensitivity, Specificity, PPV, NPV

Risk Measures

// six numbers. this is the entire section.

🎯

Absolute Risk (AR)

a / (a+b)

"What fraction of exposed people got sick?"
Also called incidence or attack rate.
Unexposed AR = c/(c+d)

⚠ Case-control → cannot calculate AR (no true incidence data)

⚖️

Relative Risk (RR)

a/(a+b) ÷ c/(c+d)

= [a/(a+b)] / [c/(c+d)]
RR = 1: no difference | RR > 1: harmful | RR < 1: protective
Only valid for cohort/RCT.

⚠ 95% CI includes 1 → not statistically significant
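A minimal sketch of the AR and RR arithmetic, using the Q06 cohort cells. The function names are illustrative assumptions, and `Fraction` is used only to keep the arithmetic exact.

```python
from fractions import Fraction


def absolute_risk(events, total):
    """Risk = events / group size, kept exact with Fraction."""
    return Fraction(events, total)


def relative_risk(a, b, c, d):
    """RR = [a/(a+b)] / [c/(c+d)]; only meaningful for cohort/RCT data."""
    return absolute_risk(a, a + b) / absolute_risk(c, c + d)


# Cells from the Q06 cohort: a=30, b=470, c=60, d=440
rr = relative_risk(30, 470, 60, 440)
print(float(absolute_risk(30, 500)))  # 0.06
print(float(absolute_risk(60, 500)))  # 0.12
print(float(rr))                      # 0.5 -> RR < 1, protective
```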

🎲

Odds Ratio (OR)

(a×d) / (b×c)

"Diagonal cross multiply."
Use for case-control studies.
When disease is rare (<10%), OR ≈ RR.
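Read the direction like RR (1 = no association, > 1 = higher odds, < 1 = lower odds), but remember it compares odds, not risks.

⚠ OR is always farther from 1 than RR (unless both equal 1). Never mistake one for the other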
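To see both OR facts at once (OR ≈ RR when the outcome is rare, OR farther from 1 when it is common), here is a short sketch. Both sets of counts are made up for illustration and are treated as cohort data so that RR is defined.

```python
from fractions import Fraction


def odds_ratio(a, b, c, d):
    """OR = (a*d) / (b*c), the cross-product of the 2x2 table."""
    return Fraction(a * d, b * c)


def relative_risk(a, b, c, d):
    """RR = [a/(a+b)] / [c/(c+d)]."""
    return Fraction(a, a + b) / Fraction(c, c + d)


# Rare outcome (made-up counts): 2% vs 1% risk
rare = (20, 980, 10, 990)
# Common outcome (made-up counts): 60% vs 30% risk
common = (60, 40, 30, 70)

for label, cells in (("rare", rare), ("common", common)):
    rr = float(relative_risk(*cells))
    or_ = float(odds_ratio(*cells))
    print(f"{label}: RR={rr:.2f} OR={or_:.2f}")
# rare: RR=2.00 OR=2.02
# common: RR=2.00 OR=3.50
```

Same RR of 2 in both tables, but the OR drifts from 2.02 to 3.50 as the outcome becomes common.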

🛡️

Absolute Risk Reduction (ARR)

c/(c+d) − a/(a+b)

= [c/(c+d)] − [a/(a+b)]
Always control minus treatment.
This is the most clinically meaningful number.

⚠ Boards: when asked "best reflects clinical benefit" → choose ARR or NNT

📢

Relative Risk Reduction (RRR)

ARR / AR_control

Often cited in drug ads to make results look impressive.
RRR is misleading when baseline risk is very low.
"50% RRR" sounds great until ARR = 0.002%.

⚠ Pharma ads use RRR. The exam will ask you to see through it

💊

NNT / ☠️ NNH

1 / ARR

Number needed to treat = 1/ARR
Number needed to harm = 1/ARI (absolute risk increase)
Lower NNT = better treatment.
Higher NNH = safer drug. Always round NNT UP to the next whole patient.

⚠ NNT of 1 = perfect drug. NNT of 1000 = barely any benefit
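A hedged sketch of ARR, RRR, and NNT using the Q01 statin numbers. The function names are assumptions for illustration. Risks are passed as `Fraction` values because binary floats can leave 1/ARR a hair above a whole number, which would make the round-up step overshoot.

```python
import math
from fractions import Fraction


def arr(control_risk, treated_risk):
    """Absolute risk reduction: control minus treatment."""
    return control_risk - treated_risk


def rrr(control_risk, treated_risk):
    """Relative risk reduction: ARR as a fraction of the control risk."""
    return arr(control_risk, treated_risk) / control_risk


def nnt(control_risk, treated_risk):
    """Number needed to treat: 1/ARR, rounded up to a whole patient."""
    return math.ceil(1 / arr(control_risk, treated_risk))


# Q01 statin trial: 12% MI on placebo, 8% on statin
control, treated = Fraction(12, 100), Fraction(8, 100)
print(float(arr(control, treated)))  # 0.04
print(nnt(control, treated))         # 25
```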

BOARD EXAM PATTERN — MEMORIZE THIS

When a question gives you a drug trial and asks what "best describes the benefit":
→ They want ARR or NNT, almost always.
When they cite "the drug reduced risk by 40%" → that's RRR, and they're testing whether you can calculate the actual ARR.
When it's a case-control study → OR only. No RR possible.

THE MISLEADING AD TRAP

Drug trial: 2% placebo events, 1% drug events.
Ad claims: "50% reduction in events!" (That's RRR.)
The real ARR = 1%. NNT = 100.
The boards will ask which statistic "most accurately reflects clinical benefit." Answer: ARR or NNT.
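The trap is easier to see when the same 50% RRR is applied to different baseline risks. In this sketch only the 2% baseline comes from the example above; the other two baselines and the `nnt_for` helper are hypothetical.

```python
import math
from fractions import Fraction


def nnt_for(baseline_risk, rrr):
    """ARR and NNT implied by a fixed relative risk reduction at a given baseline risk."""
    arr = baseline_risk * rrr  # ARR = RRR x control risk
    return arr, math.ceil(1 / arr)


RRR = Fraction(1, 2)  # the "50% reduction" from the ad

for baseline in (Fraction(2, 100), Fraction(2, 1000), Fraction(2, 10000)):
    arr, nnt = nnt_for(baseline, RRR)
    print(f"baseline {float(baseline):.2%}  ARR {float(arr):.2%}  NNT {nnt}")
# baseline 2.00%  ARR 1.00%  NNT 100
# baseline 0.20%  ARR 0.10%  NNT 1000
# baseline 0.02%  ARR 0.01%  NNT 10000
```

The ad line never changes, yet the number of patients treated per event prevented grows a hundredfold.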

Live Calculator

// four cell counts in, every statistic out

a: exposed + disease (TP)
b: exposed + no disease (FP)
c: unexposed + disease (FN)
d: unexposed + no disease (TN)


RISK MEASURES (cohort/RCT interpretation)

🎯 AR (exposed) = a / (a+b)
🎯 AR (control) = c / (c+d)
⚖️ RR = AR(exp) / AR(ctrl)
🎲 OR = (a×d) / (b×c)
🛡️ ARR = AR(ctrl) − AR(exp)
📢 RRR = ARR / AR(ctrl)
💊 NNT = 1 / ARR

DIAGNOSTIC STATS (treat Exposed+ = Test+, Disease+ = True Disease)

🔍 Sensitivity = a / (a+c)
🔒 Specificity = d / (b+d)
✅ PPV = a / (a+b)
❌ NPV = d / (c+d)
📈 LR+ = Sens / (1−Spec)
📉 LR− = (1−Sens) / Spec
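As a stand-in for the calculator, the formulas can be collected into one function. This is a sketch under stated assumptions: `two_by_two_stats` is a hypothetical name, no zero denominators are handled, and NNT is left unrounded.

```python
def two_by_two_stats(a, b, c, d):
    """Return the risk and diagnostic statistics for cell counts a, b, c, d.

    Assumes no denominator is zero; a real tool would need to guard that.
    """
    ar_exp = a / (a + b)
    ar_ctrl = c / (c + d)
    arr = ar_ctrl - ar_exp
    sens = a / (a + c)
    spec = d / (b + d)
    return {
        "AR_exposed": ar_exp,
        "AR_control": ar_ctrl,
        "RR": ar_exp / ar_ctrl,
        "OR": (a * d) / (b * c),
        "ARR": arr,
        "RRR": arr / ar_ctrl,
        "NNT": 1 / arr,  # unrounded; negative when exposure raises risk
        "sensitivity": sens,
        "specificity": spec,
        "PPV": a / (a + b),
        "NPV": d / (c + d),
        "LR+": sens / (1 - spec),
        "LR-": (1 - sens) / spec,
    }


# Q03 pulmonary embolism table: a=90, b=40, c=10, d=260
for name, value in two_by_two_stats(90, 40, 10, 260).items():
    print(f"{name:12s} {value:.3f}")
```

For a diagnostic table like Q03 only the diagnostic entries are meaningful; the risk entries are computed anyway because the arithmetic is the same.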

Diagnostic Tests

// sensitivity, specificity, PPV, NPV — and why prevalence matters more than you think

🔍

Sensitivity (SnNout)

a / (a+c)

TP / all who have disease.
"How good is this test at finding disease?"
High Sens → good screening test.
Negative result on high-Sens test = rules OUT disease.

⚠ SeNsitivity: high → Negative rules out (SnNout). Does NOT change with prevalence.

🔒

Specificity (SpPin)

d / (b+d)

TN / all who don't have disease.
"How good is this test at correctly labeling healthy people?"
High Spec → good confirmatory test.
Positive result on high-Spec test = rules IN disease.

⚠ Specificity: high → Positive rules in (SpPin). Does NOT change with prevalence.

✅

PPV — Positive Predictive Value

a / (a+b)

"If the test is positive, what's the chance the patient actually has it?"
PPV depends heavily on prevalence.
Same test → lower PPV in low-prevalence population.

⚠ Low prevalence → lousy PPV even with a great test. Most positives = false positives.

❌

NPV — Negative Predictive Value

d / (c+d)

"If the test is negative, what's the chance they truly don't have it?"
NPV also depends on prevalence.
Higher prevalence → more false negatives → lower NPV.

⚠ High prevalence → lower NPV. More disease = more missed cases.

📈

LR+ (Likelihood Ratio +)

Sens / (1−Spec)

How much does a positive result shift probability?
LR+ > 10 = strong evidence for disease.
LR+ 2–5 = modest shift.
LR is independent of prevalence.

⚠ LR+ > 10 = strong rule-in. Post-test odds = pre-test odds × LR+

📉

LR− (Likelihood Ratio −)

(1−Sens) / Spec

How much does a negative result shift probability?
LR− < 0.1 = strong evidence against disease.
LR− 0.2–0.5 = modest shift down.
Post-test odds = Pre-test odds × LR

⚠ LR− < 0.1 = strong rule-out. LR is prevalence-independent, unlike PPV/NPV.
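The odds conversion behind "post-test odds = pre-test odds × LR" can be sketched as follows. Sensitivity and specificity are the Lyme test values from Q07; the 30% pre-test probability is an assumption chosen only for illustration.

```python
def post_test_probability(pre_test_prob, likelihood_ratio):
    """Convert probability to odds, multiply by the LR, convert back."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


sens, spec = 0.80, 0.95     # the Lyme test from Q07
lr_pos = sens / (1 - spec)  # about 16
lr_neg = (1 - sens) / spec  # about 0.21

pre_test = 0.30             # assumed pre-test probability, not from the source
print(round(post_test_probability(pre_test, lr_pos), 2))  # 0.87
print(round(post_test_probability(pre_test, lr_neg), 2))  # 0.08
```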

THE PREVALENCE TRAP — MOST COMMON WRONG ANSWER

A test has 99% sensitivity and 99% specificity. Prevalence = 1 in 1000.
Test is positive. What is the PPV?
Answer: only ~9%! (Per 1,000 people screened: 999 healthy → ~10 false positives; 1 sick → ~1 true positive. PPV ≈ 1/11.)
This is why mass screening programs with rare diseases have lousy PPV — and why confirmatory tests exist.
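A short sketch of how PPV and NPV move when only prevalence changes. The function name is an assumption, and only the 1-in-1,000 prevalence comes from the example above; the other prevalences are illustrative.

```python
def predictive_values(sens, spec, prevalence):
    """PPV and NPV from sensitivity, specificity, and prevalence (Bayes' rule)."""
    tp = sens * prevalence
    fn = (1 - sens) * prevalence
    fp = (1 - spec) * (1 - prevalence)
    tn = spec * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


# Same 99%/99% test at different prevalences
for prevalence in (0.001, 0.01, 0.10, 0.50):
    ppv, npv = predictive_values(0.99, 0.99, prevalence)
    print(f"prevalence {prevalence:.3f}  PPV {ppv:.1%}  NPV {npv:.3%}")
```

At a prevalence of 0.001 the PPV comes out near 9%, matching the hand calculation, and it climbs steeply as prevalence rises while NPV slowly falls.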

MNEMONIC

SnNout: High SeNsitivity → Negative rules out
SpPin: High Specificity → Positive rules in

PPV & NPV change with prevalence. Sens & Spec do NOT. LR does NOT.

CHANGING THE THRESHOLD

Lower threshold → more positives → Sensitivity ↑, Specificity ↓
Raise threshold → fewer positives → Sensitivity ↓, Specificity ↑
They always move in opposite directions — this is the ROC curve tradeoff.
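The tradeoff can be demonstrated on a toy dataset. The scores below are made up purely for illustration, and the convention that a result is positive when the score is at or above the threshold is an assumption of this sketch.

```python
def sens_spec(diseased_scores, healthy_scores, threshold):
    """Call a result positive when score >= threshold, then compute Sens and Spec."""
    tp = sum(score >= threshold for score in diseased_scores)
    tn = sum(score < threshold for score in healthy_scores)
    return tp / len(diseased_scores), tn / len(healthy_scores)


# Made-up test scores, purely illustrative
diseased = [4, 6, 7, 8, 9]
healthy = [1, 2, 3, 5, 6]

for threshold in (7, 5, 3):
    sens, spec = sens_spec(diseased, healthy, threshold)
    print(f"threshold {threshold}: Sens {sens:.0%}  Spec {spec:.0%}")
# threshold 7: Sens 60%  Spec 100%
# threshold 5: Sens 80%  Spec 60%
# threshold 3: Sens 100%  Spec 40%
```

Each (Sens, 1 − Spec) pair is one point on the ROC curve; sliding the threshold traces the curve.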

Board Question Drill

// 10 questions. full board format. worked solutions.

Q01 / 💊 NNT

A randomized controlled trial evaluates a new statin in patients with known coronary artery disease. Over 5 years, 8% of patients in the statin group experienced a myocardial infarction, compared to 12% of patients in the placebo group.

What is the number needed to treat (NNT) to prevent one MI over 5 years?

STEP 1 — ARR:
ARR = AR(control) − AR(treated) = 0.12 − 0.08 = 0.04

STEP 2 — NNT:
NNT = 1 / ARR = 1 / 0.04 = 25

Treat 25 patients for 5 years to prevent 1 MI. ✓

Q02 / 🎲 ODDS RATIO

A case-control study examines the association between smoking and bladder cancer. Among 100 cases (bladder cancer), 60 are smokers. Among 100 controls, 30 are smokers.

What is the odds ratio for the association between smoking and bladder cancer?

STEP 1 — Build the 2×2:
Cases: 60 smokers (a), 40 non-smokers (c)
Controls: 30 smokers (b), 70 non-smokers (d)

STEP 2 — OR formula:
OR = (a×d)/(b×c) = (60×70)/(30×40) = 4200/1200 = 3.5

Case-control → OR is the only valid measure. Cannot calculate RR. ✓

Q03 / 🔍🔒 SENSITIVITY & SPECIFICITY

A new rapid test for pulmonary embolism is evaluated in 400 patients. Results:

	PE Present	PE Absent
Test +	90	40
Test −	10	260

What is the sensitivity and specificity of this test?

a=90, b=40, c=10, d=260

Sensitivity = a/(a+c) = 90/100 = 90%

Specificity = d/(b+d) = 260/300 = 86.7% ≈ 87%

90% sensitivity → a negative result helps rule OUT PE (SnNout). ✓

Q04 / 📢 THE MISLEADING AD

A pharmaceutical rep presents data on a new anticoagulant. In a RCT of 10,000 patients: 1% of the treatment group and 2% of the placebo group had a stroke over 3 years. The rep claims the drug "reduces stroke risk by 50%."

Which of the following most accurately describes the clinical benefit of this drug?

STEP 1: The rep cited RRR = (2%-1%)/2% = 50%. Technically correct but misleading.

STEP 2 — ARR: 0.02 − 0.01 = 0.01 = 1%

STEP 3 — NNT: 1/0.01 = 100

Treat 100 patients for 3 years to prevent 1 stroke. The boards always choose ARR/NNT over RRR. ✓

Q05 / ✅ PPV AND PREVALENCE

A screening test for a rare genetic disorder has sensitivity 99% and specificity 99%. Disease prevalence = 1 in 1,000. A randomly selected patient tests positive.

What is the approximate PPV?

100,000 people: 100 have disease, 99,900 don't

Sens 99% → TP=99, FN=1 | Spec 99% → TN=98,901, FP=999

PPV = 99/(99+999) = 99/1098 ≈ 9%

Even with an excellent test, rare disease = most positives are false positives. ✓

Q06 / ⚖️🛡️💊 INTEGRATED

A cohort study follows 500 hypertensive patients on ACE inhibitor and 500 matched controls for 10 years. 30 ACE inhibitor patients develop CKD; 60 control patients develop CKD.

Which of the following BEST describes the effect of ACE inhibitors on CKD?

a=30, b=470, c=60, d=440

RR = (30/500)/(60/500) = 0.5

ARR = 0.12 − 0.06 = 6%

NNT = 1/0.06 = 16.7 → round up to 17

Choice A says NNT=16 (rounds wrong). Choice D is exact. ✓

Q07 / 📈 LIKELIHOOD RATIO

A bedside test for Lyme disease has sensitivity 80% and specificity 95%. A patient from an endemic area tests positive.

What is the likelihood ratio positive (LR+) for this test?

LR+ = Sens / (1 − Spec)

LR+ = 0.80 / (1 − 0.95) = 0.80 / 0.05 = 16

LR+ > 10 = strong evidence for disease. This is a good confirmatory test. E (0.21) is the LR−. ✓

Q08 / 🔍🔒 CHANGING THE THRESHOLD

A researcher lowers the positivity threshold for a urine dipstick test for UTI, capturing more patients as "positive." Which of the following is MOST LIKELY to occur?

Select the best answer.

Lowering the threshold = calling more results "positive."

More positives → more TP captured → Sensitivity ↑

But more FP also captured → Specificity ↓

Sens and Spec always move in opposite directions as threshold shifts. This is the ROC tradeoff. PPV and LR+ both fall (more false positives dilute them). ✓

Q09 / 🎲 STUDY DESIGN → WHICH STAT

Investigators identify 300 patients with newly diagnosed lung cancer and 300 age-matched patients without cancer. They survey both groups about lifetime asbestos exposure. Which measure of association is most appropriate?

Select the best answer.

This is a case-control study — you selected based on outcome (cancer vs no cancer).

Because you selected on outcome, you cannot calculate true incidence → no RR, no AR, no ARR, no NNT.

OR is the only valid association measure in case-control studies. ✓

Q10 / TYPE I / TYPE II ERROR

A randomized trial comparing two antibiotic regimens uses α = 0.05 and β = 0.20 (power = 80%). The trial ends with p = 0.18, showing no statistically significant difference. The researchers suspect the sample size was too small.

Which error is most likely occurring?

Type I error (α) = false positive: concluding there IS a difference when there isn't. Only possible when the result is significant (p < 0.05), so it is ruled out here.

Type II error (β) = false negative — MISSING a real difference. More likely with small samples.

p = 0.18 → not significant. But this could mean no effect OR the sample was too small to detect it → Type II error.

Power = 1 − β = 80%. Increasing sample size ↑ power ↓ Type II error risk. ✓
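To make the sample size and power link concrete, here is a rough sketch using a normal approximation for a two-sided comparison of two proportions. The cure rates of 70% and 80% are hypothetical (the question gives no event rates), the `approx_power` helper is an assumed name, and the approximation ignores the negligible opposite tail.

```python
from statistics import NormalDist


def approx_power(p1, p2, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two proportions.

    Normal approximation with unpooled variance; a teaching sketch only.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = (p1 * (1 - p1) / n_per_arm + p2 * (1 - p2) / n_per_arm) ** 0.5
    return NormalDist().cdf(abs(p1 - p2) / se - z_crit)


# Hypothetical cure rates of 70% vs 80%; the trial in Q10 gives no such numbers
for n in (50, 150, 300, 600):
    print(f"n per arm = {n:4d}  power ~ {approx_power(0.70, 0.80, n):.2f}")
```

Under these assumed rates, power rises steadily with the number of patients per arm, which is the mechanism behind "small sample → Type II error".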

Rapid Cheatsheet

// print this in your head before entering the exam room

Risk Measures

🎯 AR | a/(a+b) | incidence in exposed | Case-control → cannot calculate AR
⚖️ RR | AR(exp)/AR(ctrl) | cohort/RCT only | CI crosses 1 → not significant. RR<1 = protective
🎲 OR | (a×d)/(b×c) | case-control; rare disease OR≈RR | OR always farther from 1 than RR. Never confuse them.
🛡️ ARR | AR(ctrl)−AR(exp) | most clinically meaningful | Boards: "best reflects benefit" → ARR or NNT, not RRR
📢 RRR | ARR/AR(ctrl) | misleading at low baseline risk | Pharma uses this. Low baseline risk = tiny ARR despite large RRR
💊 NNT | 1/ARR | lower = better; round UP | NNT=1 perfect; NNT=1000 barely works
☠️ NNH | 1/ARI | higher = safer drug | ARI = absolute risk increase caused by exposure/drug

Diagnostic Tests

🔍 Sens | a/(a+c) = TP/all sick | SnNout: NEG rules OUT | Does NOT change with prevalence. Changes with threshold.
🔒 Spec | d/(b+d) = TN/all healthy | SpPin: POS rules IN | Does NOT change with prevalence. Changes with threshold.
✅ PPV | a/(a+b) | ↑ with prevalence | Low prevalence → terrible PPV even with 99% sens/spec
❌ NPV | d/(c+d) | ↓ with prevalence | High prevalence → lower NPV (more false negatives)
📈 LR+ | Sens/(1−Spec) | >10 = strong rule-in | Prevalence-INDEPENDENT. Post-test odds = pre-test odds × LR+
📉 LR− | (1−Sens)/Spec | <0.1 = strong rule-out | Prevalence-INDEPENDENT. Post-test odds = pre-test odds × LR−

Study Design Rules

RCT/Cohort | RR, OR, ARR, NNT | gold standard for causation
Case-ctrl | OR only | no incidence → no RR | Good for rare diseases; remember: selected on outcome
Cross-sect | prevalence, OR | snapshot in time

Interpretation & Errors

RR=1 / OR=1 | no association
CI crosses 1 | not significant | for RR and OR
CI crosses 0 | not significant | for differences (ARR)
p < 0.05 | statistically significant | ≠ clinically meaningful
Type I (α) | false positive; α = 0.05 | concluding effect when none | α = 0.05 caps Type I risk at 5%
Type II (β) | false negative; β = 0.20 | missing a real effect | Small sample size → ↑ Type II risk
Power | 1 − β = 80% | ↑ sample size → ↑ power

EXAM STRATEGY — 30 SECOND ALGORITHM

1. What study type? → determines which stats are valid
2. Build 2×2 (a, b, c, d) from the data given
3. What are they asking? → apply one formula
4. Does the answer make biologic sense?
If RRR vs ARR question → always choose ARR/NNT as "most meaningful"
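Step 1 of the algorithm can be sketched as a lookup that rejects a statistic the study design cannot support. The mapping follows the Study Design Rules above (with AR and RRR included for RCT/cohort because they derive from the same incidence data); the names `VALID_MEASURES` and `check_measure` are hypothetical.

```python
# Which measures each design supports, per the Study Design Rules above
VALID_MEASURES = {
    "rct": {"AR", "RR", "OR", "ARR", "RRR", "NNT"},
    "cohort": {"AR", "RR", "OR", "ARR", "RRR", "NNT"},
    "case-control": {"OR"},
}


def check_measure(design, measure):
    """Refuse a statistic the study design cannot support."""
    allowed = VALID_MEASURES[design]
    if measure not in allowed:
        raise ValueError(
            f"{measure} is not valid for a {design} study; use one of {sorted(allowed)}"
        )
    return measure


print(check_measure("cohort", "RR"))  # RR
try:
    check_measure("case-control", "RR")
except ValueError as err:
    print(err)  # RR is not valid for a case-control study; use one of ['OR']
```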

🃏 Flash Cards

🎯

AR — Absolute Risk

a / (a+b)

Fraction of exposed people who got sick. Also called incidence or attack rate.

⚠ Case-control → cannot calculate AR. No true incidence data.

🎯 Bullseye = direct hit rate in exposed group
