Epidemiology and Biostatistics
 All biostatistics (studies, tests, etc.) can be described using the predictor-outcome 2x2 table:
 predictor = cause, risk factor, therapy, test result, etc.
 outcome = effect, presence of disease, cure, etc.

                Outcome +    Outcome -
 Predictor +        a            b
 Predictor -        c            d
Clinical Epidemiology and Study Design
 To analyze a study, use the 2x2 table above with predictor = exposure; outcome is often a disease
 Three Types of Studies:
 (1) Case-Control Study: researchers control outcome (disease), look for exposure (risk factor, medical history, etc.)
 Procedure:
 (a) test subjects are divided into cases and controls based upon presence of outcome
 cases and controls on the top of the 2x2 table
 (b) look into the subjects' histories for presence of exposures; inherently a retrospective study
 easy to do and low cost, so one of the most common studies; but also most prone to bias and confounding
 Incidence of outcome in test population is artificially enriched (since outcome was one of the selection criteria)
 good for rare outcomes, since it does not require that a significant fraction of those exposed will develop the outcome; also good for outcomes with long latency periods from time of exposure, since it is not necessary to wait
 bad for risk calculation, since information about actual incidence of outcome has been eliminated; must use odds ratio
 Odds Ratio: an estimate of the relative risk (direct calculation of relative risk is impossible with this type of study)
 (odds that those with outcome have exposure) / (odds that those without outcome have exposure) = (a/c) / (b/d) = (a×d) / (b×c) = cross product
 =1 ⇒ exposure does not affect outcome; >1 ⇒ exposure increases risk of outcome; <1 ⇒ exposure decreases risk of outcome
 (2) Cohort Trial: researchers passively control exposure (risk factor), look for outcome
 Procedure:
 (a) test subjects are divided into cases (exposed) and controls (unexposed) based upon pre-existing presence of exposure
 cases and controls on the side of the 2x2 table
 (b) observe the subjects and see if the outcome develops; usually prospective
 good evidence of cause and effect, since it is possible to establish a temporal relationship
 good for rare exposures, multiple effects of a single exposure
 not as prone to bias as a case-control study; however, losses to follow-up can significantly affect results
 Relative Risk: measurement of the likelihood that exposure carries risk of outcome
 (fraction of those with exposure having outcome) / (fraction of those without exposure having outcome) = (a/(a+b)) / (c/(c+d))
 =1 ⇒ exposure does not affect outcome; >1 ⇒ exposure increases risk of outcome; <1 ⇒ exposure decreases risk of outcome
 Absolute Risk Reduction (ARR): measurement of treatment effect; equal to a/(a+b) - c/(c+d), taken as an absolute difference
 Number Needed to Treat (NNT): number of patients that must be treated to prevent 1 adverse outcome; equal to 1/ARR
 (3) Randomized Control Trial: researchers actively and deliberately control exposure (therapy), look for outcome (recovery)
 Procedure:
 (a) test subjects are randomly divided into cases and controls; cases are exposed to the therapy, controls can receive placebo or an alternative therapy; cases and controls on the side of the 2x2 table
 (b) observe the subjects and see if the outcome develops
 gold standard for any study: randomization balances potential confounders (known and unknown) across the groups, so on average they cancel out
 because of ethical considerations, this type of study is used in humans for therapies rather than risk factors
 Relative Risk (see above) is used here as well.
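
 The formulas above can be collected into one small calculation. The following Python sketch is illustrative only: the function name and the example counts are hypothetical, the cell names a, b, c, d follow the 2x2 table at the top of these notes, and ARR is assumed to be reported as an absolute difference.

```python
def two_by_two_measures(a, b, c, d):
    """Effect measures from a 2x2 table.

    Rows are predictor (+ then -), columns are outcome (+ then -):
        a = exposed with outcome,    b = exposed without outcome
        c = unexposed with outcome,  d = unexposed without outcome
    """
    odds_ratio = (a * d) / (b * c)        # cross product
    risk_exposed = a / (a + b)            # outcome rate among the exposed
    risk_unexposed = c / (c + d)          # outcome rate among the unexposed
    relative_risk = risk_exposed / risk_unexposed
    arr = abs(risk_exposed - risk_unexposed)     # assumption: absolute difference
    nnt = 1 / arr if arr > 0 else float("inf")   # no effect => no finite NNT
    return {
        "odds_ratio": odds_ratio,
        "relative_risk": relative_risk,
        "arr": arr,
        "nnt": nnt,
    }


# Hypothetical trial: 10 of 100 treated and 20 of 100 untreated have the outcome.
measures = two_by_two_measures(10, 90, 20, 80)
print(measures)
```

 With these made-up counts the relative risk is 0.5, the ARR is 0.1, and the NNT is 10, while the odds ratio (about 0.44) only approximates the relative risk because the outcome is not rare.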
 Other Definitions:
 blinding: keeping patients and/or investigators unaware of treatment assignments to prevent bias
 single = patient doesn't know; double = doctor doesn't know either; triple = even the data analyst doesn't know!
 Bias is a systematic error in the way that subjects are selected or information is collected; some types of bias:
 lead time bias: earlier diagnosis means greater survival time (period from diagnosis to death) without change in course of disease
 length bias: diseases that are more mild tend to last longer (more aggressive conditions kill patients sooner), so milder cases are more likely to be detected
 recall bias: patients remember bad things more often (e.g., cases recall past exposures more readily than controls)
 hindsight bias: knowledge that an event has occurred leads to an inflated estimate of the probability of that event
 performance bias: how well an intervention is applied affects its results
 Confounding is the presence of a third factor in addition to the two variables being studied (such as age, gender, socioeconomic status, smoking) that is associated with the predictor and independently affects the outcome
Diagnostic Testing
 To analyze a test, use a 2x2 table with predictor = test result and outcome = actual state of affairs (disease)

            Actually +              Actually -
 Test +     True Positive (TP)      False Positive (FP)
 Test -     False Negative (FN)     True Negative (TN)
 A perfect test would be able to reliably detect presence or absence of disease; in practice, no test is perfect
 Sensitivity and Specificity: measure intrinsic diagnostic discrimination of a test
 Sensitivity: ability of a test to detect presence of a disease; equal to TP / all actually positive (TP + FN)
 Specificity: ability of a test to detect absence of a disease; equal to TN / all actually negative (TN + FP)
 Predictive Value (PV): measures likelihood that a test result is correct
 PV takes into account prevalence of the disease across the entire population; for very rare diseases, the positive PV can be low even for an excellent test (positive PV rises and negative PV falls as pretest probability increases)
 predictive value positive (PVP) = probability that a positive result is correct; equal to TP / all testing positive (TP + FP)
 predictive value negative (PVN) = probability that a negative result is correct; equal to TN / all testing negative (TN + FN)
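
 To make the dependence of predictive value on prevalence concrete, here is a minimal Python sketch. The function names and example numbers are hypothetical; PVN is used as shorthand for the predictive value of a negative result, and the second function simply applies the definitions above to an imagined population with the given prevalence.

```python
def diagnostic_measures(tp, fp, fn, tn):
    """Sensitivity, specificity, PVP and PVN from the four cells of the 2x2 table."""
    sensitivity = tp / (tp + fn)   # TP / all actually positive
    specificity = tn / (tn + fp)   # TN / all actually negative
    pvp = tp / (tp + fp)           # TP / all testing positive
    pvn = tn / (tn + fn)           # TN / all testing negative
    return sensitivity, specificity, pvp, pvn


def predictive_values(sensitivity, specificity, prevalence):
    """PVP and PVN for a test applied to a population with the given prevalence."""
    tp = sensitivity * prevalence                # fraction of population: true positives
    fn = (1 - sensitivity) * prevalence          # false negatives
    tn = specificity * (1 - prevalence)          # true negatives
    fp = (1 - specificity) * (1 - prevalence)    # false positives
    return tp / (tp + fp), tn / (tn + fn)


# Hypothetical test with 99% sensitivity and specificity at two prevalences.
rare_pvp, rare_pvn = predictive_values(0.99, 0.99, 0.001)
common_pvp, common_pvn = predictive_values(0.99, 0.99, 0.5)
print(rare_pvp, common_pvp)
```

 In this made-up example the same test has a PVP of about 9% when the disease affects 1 in 1000 people, but 99% when the pretest probability is 50%.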
Hypothesis Testing a.k.a. Statistical Significance Testing
 Hypothesis testing is used to test for association between two or more variables or characteristics of interest
 the null hypothesis (H_{0}) states that there is no association; i.e., any differences are due to chance
 the alternative hypothesis (H_{1}) states that there is an association
 statistical testing is used to evaluate the null hypothesis; rejection of the null hypothesis means the differences are unlikely to be due to chance alone, supporting a real association
 p-value: the probability of observing an association at least as strong as the one measured if H_{0} were true (not the probability that H_{0} is true); when low (p<0.05), measured associations are probably not due to chance
 not all statistically significant results are clinically significant (must affect patient care)
 Errors of Hypothesis Testing
 2x2 table where predictor = inference and outcome = actual state of affairs; P = power, + = reject H_{0}, - = accept H_{0}

                 H_{0} False           H_{0} True
 Reject H_{0}    Power (1 - β)         Type I Error (α)
 Accept H_{0}    Type II Error (β)

 Type I Error (α): "false positive"; infer an association (H_{0} rejected) when there is, in fact, no association (H_{0} true)
 probability of a Type I Error is equal to the significance level α (the p-value cutoff for rejecting H_{0}); if α = 0.05, Type I errors will be made 5% of the time when H_{0} is true
 Type II Error (β): "false negative"; infer no association (H_{0} accepted) when there is, in fact, an association (H_{0} false)
 Power is the rate of "true positives": infer association (H_{0} rejected) when there is, in fact, an association (H_{0} false)
 analogous to sensitivity of a diagnostic test
 equal to 1 - β; since the Type II Error rate (β) is conventionally set at 20%, power is usually 0.8
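
 The link between the Type I Error rate, the Type II Error rate, and power can be illustrated numerically. The sketch below is not from these notes: it assumes the simplest possible setting, a two-sided z-test in which the measured difference is normally distributed with a known standard error, and the function and parameter names are hypothetical.

```python
from statistics import NormalDist


def z_test_power(effect, standard_error, alpha=0.05):
    """Probability of rejecting H0 in a two-sided z-test when the true difference is `effect`."""
    std_normal = NormalDist()                    # mean 0, standard deviation 1
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # rejection cutoff, about 1.96 for alpha = 0.05
    shift = effect / standard_error              # true difference in standard-error units
    # The test statistic is Normal(shift, 1); H0 is rejected when it falls beyond +/- z_crit.
    upper_tail = 1 - std_normal.cdf(z_crit - shift)
    lower_tail = std_normal.cdf(-z_crit - shift)
    return upper_tail + lower_tail


# With no true difference, the rejection rate is just the Type I Error rate.
print(z_test_power(0.0, 1.0))
# A true difference of about 2.8 standard errors gives roughly 80% power.
print(z_test_power(2.8, 1.0))
```

 Under these assumptions, a study designed for 80% power at a 5% significance level needs its expected difference to be about 2.8 times the standard error, which is why larger samples (smaller standard errors) increase power.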
 Association vs. Causality: hypothesis testing formally demonstrates association only
 association: statistical dependence between two variables (H_{0} false); can be observed
 to establish true association, must eliminate bias and confounding (see above)
 causality: association in which one variable actually affects the other; cannot be proven, so must be inferred
 not all associations are causal! e.g., both variables could be effects of a common cause
 judgement of causality involves consideration of several criteria: strength and consistency of association, biologic plausibility, temporal relationship, dose-response gradient
Biostatistics
 General Terms
 Prevalence: existing cases; # of cases with disease/total population
 Incidence: new cases; # of new cases/population at start of observation period
 Mortality: incidence of death; # of deaths/total population
 Case Fatality: mortality associated with a disease relative to those affected; # of deaths due to disease/# affected by disease
 Proportionate Fatality: mortality assoc. with a disease relative to all deaths; # of deaths due to disease/total deaths
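
 Because these rates share numerators but differ in their denominators, a small worked example may help keep them apart. The Python sketch below uses a hypothetical function name and invented counts; the incidence denominator is assumed to be the population still at risk at the start of the observation period.

```python
def population_rates(population, existing_cases, new_cases, at_risk_at_start,
                     total_deaths, deaths_from_disease, affected):
    """Compute the general rates defined above from raw counts."""
    return {
        "prevalence": existing_cases / population,
        "incidence": new_cases / at_risk_at_start,
        "mortality": total_deaths / population,
        "case_fatality": deaths_from_disease / affected,
        "proportionate_fatality": deaths_from_disease / total_deaths,
    }


# Invented counts for one town over one year.
rates = population_rates(
    population=10_000,
    existing_cases=200,
    new_cases=50,
    at_risk_at_start=9_800,   # assumption: population minus existing cases
    total_deaths=100,
    deaths_from_disease=10,
    affected=250,             # existing plus new cases
)
print(rates)
```

 Here the same 10 deaths give a case fatality of 4% (10 of 250 affected) but a proportionate fatality of 10% (10 of 100 deaths).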
 Types of Analysis
 Linear regression: describes relationship between two variables
 used for prediction or correlation (r = correlation coefficient; variables perfectly correlated if r = ±1, no correlation if r = 0)
 Multiple regression: linear regression using two or more independent variables; allows adjustment
 Logistic regression: for binary data (live vs. dead); allows for direct estimate of odds ratio for each indep. predictor
 Survival analysis: measures risk of outcome over time; S(t) is the proportion without outcome at time t
 although usually used to analyze disease mortality, can be used for any outcome
 median survival refers to the time at which 50% of subjects have reached the outcome
 censoring: if a subject is lost to follow-up, only data prior to the loss is analyzed (assuming outcome has not occurred)
 Kaplan-Meier: plot of S(t) vs. t; used to estimate survival; a "step function" in which each step represents an outcome
 log-rank statistic: used to compare two Kaplan-Meier plots (e.g., treatment vs. control); requires computer
 Cox proportional hazards regression: assesses simultaneous effect of multiple independent variables on survival
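
 The Kaplan-Meier step function and the handling of censored subjects can be sketched in a few lines of Python. This is an illustrative implementation of the standard product-limit estimate, not a replacement for a statistics package; the function names and the six-subject data set are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each time where at least one outcome occurred.

    times:  follow-up time for each subject
    events: True if the outcome occurred at that time, False if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)      # subjects still under observation
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        outcomes = 0
        leaving = 0
        # Group all subjects whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            outcomes += int(data[i][1])
            leaving += 1
            i += 1
        if outcomes:
            # Each step multiplies by the fraction of at-risk subjects surviving t.
            survival *= 1 - outcomes / at_risk
            curve.append((t, survival))
        # Censored subjects leave the risk set without producing a step.
        at_risk -= leaving
    return curve


def median_survival(curve):
    """First time at which S(t) drops to 50% or below, or None if it never does."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Invented follow-up data: subjects 2 and 5 are censored (lost to follow-up).
times = [1, 2, 2, 3, 4, 5]
events = [True, False, True, True, False, True]
curve = kaplan_meier(times, events)
print(curve, median_survival(curve))
```

 In this toy data set the curve steps down at t = 1, 2, 3, and 5, and the median survival is 3, the first time at which the estimated S(t) falls to one half or less.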