# External Validity #

## March 25 ##

---

## Definition

How do we define external validity?

- Mundane realism (surface similarity)
- Generalizability of results

???

SSC p.353 and Table 11.1 (p.357), Five principles:

- Surface similarity
- Ruling out irrelevancies
    - Units differ in many ways but those differences may not be important
- Making discriminations
- Interpolation and extrapolation
- Causal explanation

---

## Discussion

Experiments have strong internal validity.

If that is why we do experiments, why do we care about the external validity of experiments?

???

---

## SUTO framework

- Setting
- Units
- Treatments
- Outcome

???

- SUTO is about surface similarity but it is also about formal generalizability; if surface similarities are what differentiate among effects, then SUTO is informative about formal generalizability
- Setting
    - Lab
    - Field
    - Survey
    - Broader contextual variations (Rune will talk about this next week)
    - When in the life histories of units does the experiment occur (pretreatment paper we read a few weeks ago)
- Units
    - Sampling techniques
    - Who are they? Are they representative of the population we care about? In what ways?
    - Not just demographic similarity, but also life histories:
        - E.g., studying the effect of university education on students in their 20s does not necessarily lead to valid inferences about students in university in their 30s
- Treatments
    - Does this treatment look like the treatment units would actually receive in the real world?
    - Is the treatment/experiment too obtrusive to be generalizable?
- Outcome
    - Construct: is this experiment about the same construct as we care about in the real world
    - Operationalization: is this experiment using the same operationalization as the treatment in the real world

---

## Assessing external validity

How do we assess the external validity of a single study?

???

We can't really.
We can only focus on SUTO features (mundane realism and surface similarity) but we can't really know if those features matter unless they're part of the experiment.

This isn't a problem if there is **effect homogeneity**. It's a big problem if there is **effect heterogeneity**, where effects vary across SUTO features.

---

## Assessing external validity

The most common multi-study method of assessing external validity is the *review*

1. Qualitative reviews
2. Quantitative/systematic reviews (meta-analysis)

???

- Qualitative reviews
    - Strengths?
        - Focus on variations and SUTO
        - Focus on mundane realism
    - Weaknesses?
        - File drawer problem
        - Hard to get measures of effect sizes
        - Do not account for size of studies
- Quantitative reviews
    - Strengths?
        - Precise measures of effect sizes
    - Weaknesses?
        - File drawer problem
        - Focus primarily on effect size rather than qualitative variations
        - Often not a lot of consideration of broader context
        - May be difficult to do because of lack of information in published studies

---

## Assessing external validity

Other strategies of assessing external validity include:

- Formal/exact replication
- Approximate replication
- Statistical generalization
- Parallel experimentation

???

- Formal replication requires detailed protocol
- Approximate replication is what Ansolabehere et al. do
- Statistical generalization is what Hedges is talking about
- Parallel experimentation is complex
    - Many labs project
    - My parallel experiment project

---

## Exact replication

- Recreate a previous experiment exactly
- Requires a complete protocol
- SUTO: What can we replicate?

???

- SUTO
    - Same setting possible, but different broader context
    - Units might be similar, but different individuals; might be difficult to recruit from same population
    - Treatments should be identical; this is easy with proper protocol
    - Outcomes should be identical; this is easy with proper protocol

---

## Approximate replication

- Model a new study off a previous study
- Possibly intentional changes in SUTO characteristics

???

- SUTO
    - Same setting possible, but different broader context; this may be intentionally varied
        - Lab/field comparisons common
        - We might test something again in a different historical context (does political deliberation have the same effects today as it did in the past?)
    - Units might be similar, but different individuals; might be difficult to recruit from same population
        - Convenience sample versus population sample
    - Treatments can be identical; this is easy with proper protocol
        - But we might intentionally change this (e.g., using a different kind of negative ad)
    - Outcomes can be identical; this is easy with proper protocol
        - But we might intentionally change this (e.g., measuring actual turnout as opposed to self-reported turnout intentions)

---

## Statistical Generalization

- Hedges is interested in the formal, statistical generalization of experiments
- How does this work?

???

Propensity score subclassification approach

TATE is a re-weighted function of conditional ATEs in our experimental sample

---

## Parallel Experiments

- Execute the same experiment with variations in some experimental feature
- Compare results across experiments
- SUTO: What can we replicate?

???
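
One way to make "compare results across experiments" concrete is to estimate the ATE in each experiment and test whether the two estimates differ. The following is a minimal sketch, not a prescribed analysis: the function names (`ate_and_se`, `compare_experiments`), the simulated lab and field data, and the effect sizes are all made up for illustration, and the two experiments are assumed to be independent.

```python
import math
import random
from statistics import mean, variance


def ate_and_se(treated, control):
    """Difference in means and its standard error (unequal variances)."""
    est = mean(treated) - mean(control)
    se = math.sqrt(variance(treated) / len(treated)
                   + variance(control) / len(control))
    return est, se


def compare_experiments(exp_a, exp_b):
    """Difference between two independent ATE estimates, its SE, and a z-statistic.

    Each experiment is a (treated_outcomes, control_outcomes) pair.
    """
    ate_a, se_a = ate_and_se(*exp_a)
    ate_b, se_b = ate_and_se(*exp_b)
    diff = ate_a - ate_b
    # Independence of the two experiments lets the variances add.
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    return diff, se_diff, diff / se_diff


def simulate(rng, n, effect):
    """Hypothetical data: n treated and n control units, unit-variance outcomes."""
    treated = [rng.gauss(effect, 1) for _ in range(n)]
    control = [rng.gauss(0, 1) for _ in range(n)]
    return treated, control


rng = random.Random(0)
lab = simulate(rng, 500, effect=0.5)    # assumed larger effect in the lab
field = simulate(rng, 500, effect=0.1)  # assumed smaller effect in the field

diff, se_diff, z = compare_experiments(lab, field)
print(f"difference in ATEs = {diff:.3f} (SE {se_diff:.3f}), z = {z:.2f}")
```

A large |z| suggests effect heterogeneity across the feature that was varied; a small one is consistent with homogeneity but does not establish it, especially with small samples.
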

- Expensive
- Difficult
- SUTO
    - Might vary geographical setting (usually trying to hold this context constant); might implement the study at different points in time controlling all else (Rune and I are doing this now)
    - Might vary units intentionally
    - Might vary treatment, controlling all else (real ads versus fabricated ads)
    - Might vary outcomes, controlling all else

---

## Discussion

Are these types of replication useful?

Why or why not?

---

## Discussion

How important is it for an experiment to have high external validity?

---

## Discussion

What is the point of a single experiment if it does not generalize?

???

- What does it mean to generalize?
    - Theory is wrong in first place
    - Theory is context-dependent
    - Original study or replication are flawed in some way
- To what should we expect a given study to generalize?

---

## Presentations

What are you thinking for your exam paper?

---

## Preview

- No class next week
- In 2 weeks:
    - Survey Experiments
    - Presentation by Rune Slothuus
    - Share a one-page synopsis of your project
        - Due via email to me: Monday April 7 12:00
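
---

## Statistical Generalization: a sketch

Returning to the note that the TATE is a re-weighted function of conditional ATEs in the experimental sample, the idea can be illustrated with a toy calculation. This is only a sketch under strong assumptions: strata are taken as given by a single hypothetical covariate (age group) rather than formed by propensity score subclassification, and the data and population shares are invented.

```python
from statistics import mean


def stratum_ates(sample):
    """Conditional ATE per stratum from (stratum, treated, outcome) records."""
    groups = {}
    for stratum, treated, y in sample:
        groups.setdefault(stratum, {True: [], False: []})[bool(treated)].append(y)
    return {s: mean(g[True]) - mean(g[False]) for s, g in groups.items()}


def sample_shares(sample):
    """Share of experimental units falling in each stratum."""
    counts = {}
    for stratum, _, _ in sample:
        counts[stratum] = counts.get(stratum, 0) + 1
    return {s: c / len(sample) for s, c in counts.items()}


def reweighted_ate(cates, shares):
    """Weighted average of stratum ATEs; shares must sum to 1."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return sum(shares[s] * cates[s] for s in shares)


# Invented experimental sample: mostly "young" units.
sample = [
    ("young", 1, 3), ("young", 1, 5), ("young", 1, 4),
    ("young", 0, 1), ("young", 0, 1), ("young", 0, 1),
    ("old", 1, 2), ("old", 0, 1),
]
# Invented target population: mostly "old" units.
population_shares = {"young": 0.25, "old": 0.75}

cates = stratum_ates(sample)  # young: 3, old: 1
print(reweighted_ate(cates, sample_shares(sample)))  # 2.5, weighted by sample shares
print(reweighted_ate(cates, population_shares))      # 1.5, weighted by population shares
```

The gap between the two numbers comes entirely from effect heterogeneity across strata; if the conditional ATEs were equal, reweighting would change nothing.
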