# Internal Validity and Experimental Design #

## February 18 ##

---

## Outline

1. Share examples from TESS
2. Talk about how to write experimental designs
3. Internal validity
  - Why randomization works
  - Threats to internal validity

---

## Examples from TESS

- What was the most interesting study you found on TESS?
- What was the topic (outcome concept and research question)?
- What was the design?

---

name: protocol

## Protocol

- Writing up experimental designs
- Once we know our hypotheses, the experimental conditions are easy

---

## Examples

What experimental conditions do we need?

1. Individuals exposed to expert endorsements are more likely to support a policy than when exposed to partisan endorsements.

--

2. Providing conditional cash transfers to women in rural Uganda is more effective at increasing their children's educational attainment than either microfinance loans to start businesses or unconditional grants of cash or goods (e.g., food).

--

3. The effect of a public health intervention is larger for native speakers of Danish than for non-native speakers of Danish.

---

template: protocol

- But there are still lots of decisions to make!

---

## Random Assignment

- Why do we need it?
- Why can't we just compare `\(t_2 - t_1\)` changes?

---

## Random Assignment

- Breaks the selection process
- This has benefits:
  1. Balances covariates between groups
  2. Balances potential outcomes between groups
  3. No confounding

---

# "Perfect Doctor" #

True potential outcomes (unobservable in reality)

| Unit   | Y(0) | Y(1) |
| ------ | ---- | ---- |
| 1      | 13   | 14   |
| 2      | 6    | 0    |
| 3      | 4    | 1    |
| 4      | 5    | 2    |
| 5      | 6    | 3    |
| 6      | 6    | 1    |
| 7      | 8    | 10   |
| 8      | 8    | 9    |
| *Mean* | *7*  | *5*  |

---

# "Perfect Doctor" #

Observational data with strong selection bias (the doctor gives each patient whichever treatment is better for them)

| Unit   | Y(0)  | Y(1) |
| ------ | ----- | ---- |
| 1      | ?     | 14   |
| 2      | 6     | ?    |
| 3      | 4     | ?    |
| 4      | 5     | ?    |
| 5      | 6     | ?    |
| 6      | 6     | ?    |
| 7      | ?     | 10   |
| 8      | ?     | 9    |
| *Mean* | *5.4* | *11* |
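
---

# "Perfect Doctor" #

A minimal simulation sketch (plain Python, illustrative, not from the readings): the true ATE from the first table is 5 - 7 = -2, but the doctor's self-selected comparison gives 11 - 5.4 = +5.6. Averaging the difference-in-means over many random assignments recovers the truth.

```python
import random

y0 = [13, 6, 4, 5, 6, 6, 8, 8]   # potential outcomes under control
y1 = [14, 0, 1, 2, 3, 1, 10, 9]  # potential outcomes under treatment

true_ate = sum(y1) / 8 - sum(y0) / 8   # 5 - 7 = -2

# The "perfect doctor" treats exactly the units that benefit (1, 7, 8)
treated = [0, 6, 7]                    # zero-based indices of units 1, 7, 8
control = [1, 2, 3, 4, 5]
naive = (sum(y1[i] for i in treated) / len(treated)
         - sum(y0[i] for i in control) / len(control))  # 11 - 5.4 = +5.6

# Complete random assignment (exactly 4 treated), many replications;
# simple coin-toss assignment is also unbiased, but group sizes vary
random.seed(42)
diffs = []
for _ in range(10_000):
    t = set(random.sample(range(8), 4))
    diffs.append(sum(y1[i] for i in t) / 4
                 - sum(y0[i] for i in range(8) if i not in t) / 4)

print(true_ate, round(naive, 2), round(sum(diffs) / len(diffs), 2))
# -2.0  5.6  approximately -2
```

No single random assignment guarantees balance, but the difference-in-means is unbiased: averaged across assignments, it equals the true ATE.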

---

## Random assignment

- We have to do it
- But how do we randomize?

???

Does randomization have to be by coin toss? Do an equal number of units need to be in each treatment group?

- Statistical efficiency
- Ethics
- Limited quantity of treatment to distribute

---

.left-column[
## Definition
]

.right-column[
> The observation of units after, and possibly before, a randomly assigned intervention in a controlled setting, which tests one or more precise causal expectations.
]

---

.left-column[
## Definition
]

.right-column[
> The observation of units **after, and possibly before,** a randomly assigned intervention in a controlled setting, which tests one or more precise causal expectations.
]

???

- All we have to do is observe outcomes after treatment exposure
- Why might we want to observe pretest outcomes?
  - What are the consequences of that?
- Why can't we just look at pretest-posttest differences and skip randomization entirely?
  - Doesn't break selection bias
  - Assumes that the counterfactual potential outcome for today is the observed outcome for yesterday

---

## Design considerations

- Single-factor versus crossed designs
- Control groups
- Pretest measurement
- Crossover (within-subjects) designs
- Follow-up outcome measurement

???

- SSC Ch.8 gives lots of examples

1. Factorial designs
  - Full versus partial factorial
  - Advantages
  - Sample size considerations
  - Interaction effects
2. Control groups and placebo groups
3. Post-test only versus pre-post designs
  - History, maturation
  - Precision in standard errors
4. Within-subjects designs
  - History, maturation concerns
  - Order-of-treatment concerns
  - e.g., we have to do two things; which is better to do first? Education curricula: is it better to teach research methods or political philosophy in the first semester?
5. When do we want to measure any effects?

---

## Threats to validity

- "Falsificationist" strategy*
- No experiment is perfect

.footnote[* SSC pp.41-42]

???

Simply because we have avoided or dealt with known threats doesn't mean we have a valid inference

---

## Threats to internal validity

1. Ambiguous temporal precedence
2. Selection
3. History
4. Maturation
5. Regression
6. Attrition
7. Testing (exposure to a test affects subsequent scores; measurement has an effect)
8. Instrumentation

.footnote[SSC Table 2.4 (p.55)]

???

1. How do we know that X caused Y? In an experiment, we are intervening, so this is solved automatically
  - We do have a new problem: How do we know that we manipulated the thing that we thought we did?
  - How do we know that we manipulated the treatment and not the outcome?
2. In observational studies, we have selection bias
  - As long as we randomize, this isn't a problem (in expectation)
3. Or "simultaneous changes": this shouldn't be a problem in an experiment, because of randomization
4. Maturation
  - This and history should be solved by randomization, but they could still crop up if we are underpowered
5. What units are we dealing with? It is easy for extreme cases to move closer to the center of their distribution, so we have to be cautious about inferring effects on extreme samples (see the simulation on the next slide)
6. Attrition is in general bad, because we have missing data on our outcomes, so we can't measure effects
  - This creates power problems
  - It is worse if it is asymmetric across groups
  - It is even worse if it is caused by the treatment
7. Repeated measures. Perhaps we want to compare a pretest to a posttest, or we want to measure an outcome repeatedly over time. Measurement may change units' responses (e.g., they remember their answers, or they figure out what outcome we're interested in, or they improve at taking the test simply because they've taken it before)
8. Changing measurement over time
  - Ex: policy changes. We decide that speeding is a problem, so we change the punishment for speeding (increasing the fine). We can compare pre-post differences (this is called an interrupted time series design). But what if we also lower speed limits? The definition of speeding has changed, so it becomes hard to compare levels of speeding before and after the policy change.
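
---

## Regression to the mean

A minimal simulation sketch (illustrative, not from SSC): take the most extreme units on a noisy measure at `\(t_1\)`, apply no treatment at all, and remeasure at `\(t_2\)`; their mean still drifts toward the population mean.

```python
import random

random.seed(1)
N = 10_000
trait = [random.gauss(0, 1) for _ in range(N)]  # stable true score
t1 = [s + random.gauss(0, 1) for s in trait]    # noisy measure at time 1
t2 = [s + random.gauss(0, 1) for s in trait]    # noisy measure at time 2

# "Extreme sample": the top 5% of units at time 1
cutoff = sorted(t1)[int(0.95 * N)]
extreme = [i for i in range(N) if t1[i] >= cutoff]

m1 = sum(t1[i] for i in extreme) / len(extreme)
m2 = sum(t2[i] for i in extreme) / len(extreme)
print(round(m1, 2), round(m2, 2))  # m2 is closer to 0 than m1, with no treatment
```

This is one reason a randomized control group matters for extreme samples: both groups carry the same expected regression artifact, so it cancels out of the treatment-control comparison.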

---

## Internal validity in experiments

Which of these threats is solved by randomized experimentation?

- All but attrition, testing, and instrumentation
- Assuming a well-designed protocol (which keeps instrumentation consistent), we just have to deal with testing and attrition

---

## Ethics of randomization

- Equipoise*
- Treatment preferences**
- When to randomize

.footnote[
* Freedman; SSC pp.272-273

** SSC pp.273-274
]

???

- Do you agree with Freedman that researchers need to be in a state of equipoise to conduct an experiment? Why or why not?
- What about subjects' preferences?
  - Should randomization only occur when they are indifferent?
  - How might preferences over treatments impact outcomes?
- SSC discuss when to randomize
  - They focus on resource scarcity
  - Is scarcity the only time that we should randomize? Why or why not?

---

## Next week

- Continue our discussion of designs
- Talk about experimental analysis
- No example study for next week (because there is a lot to cover in class)
- Do not read the Splawa-Neyman text
  - It's just there in case you're really interested in the statistics of experiments