Mostly Harmless Econometrics: An Empiricist's Companion
Joshua D. Angrist
Massachusetts Institute of Technology
Jörn-Steffen Pischke
The London School of Economics
March 2008
Contents
Preface
Acknowledgments
Organization of this Book
I Introduction
1 Questions about Questions
2 The Experimental Ideal
2.1 The Selection Problem
2.2 Random Assignment Solves the Selection Problem
2.3 Regression Analysis of Experiments
II The Core
3 Making Regression Make Sense
3.1 Regression Fundamentals
3.1.1 Economic Relationships and the Conditional Expectation Function
3.1.2 Linear Regression and the CEF
3.1.3 Asymptotic OLS Inference
3.1.4 Saturated Models, Main Effects, and Other Regression Talk
3.2 Regression and Causality
3.2.1 The Conditional Independence Assumption
3.2.2 The Omitted Variables Bias Formula
3.2.3 Bad Control
3.3 Heterogeneity and Nonlinearity
3.3.2 Control for Covariates Using the Propensity Score
5.2.1 Regression DD
5.3 Fixed Effects versus Lagged Dependent Variables
5.4 Appendix: More on fixed effects and lagged dependent variables
III Extensions
List of Figures
3.1.1 Raw data and the CEF of average log weekly wages given schooling
3.1.2 Regression threads the CEF of average weekly wages given schooling
3.1.3 Micro-data and grouped-data estimates of returns to schooling
List of Tables
2.2.1 Comparison of treatment and control characteristics in the Tennessee STAR experiment
2.2.2 Experimental estimates of the effect of class-size assignment on test scores
7.1.1 Quantile regression coefficients for schooling in the 1970, 1980, and 2000 Censuses
7.2.1 Quantile regression estimates and quantile treatment effects from the JTPA experiment
8.1.1 Monte Carlo results for robust standard errors
8.2.1 Standard errors for class size effects in the STAR data
Preface
if in a randomized clinical trial. This view shapes our approach to all research questions. In the absence of a real experiment, we look for well-controlled comparisons and/or natural "quasi-experiments". Of course,
Acknowledgments
We had the benefit of comments from many friends and colleagues as this project progressed. Special thanks are due to Alberto Abadie, David Autor, Amitabh Chandra, Monica Chen, John DiNardo, Joe Doyle, Jerry
Organization of this Book
We begin with two introductory chapters. The first describes the type of research agenda for which the material in subsequent chapters is most likely to be useful. The second discusses the sense in which experiments, i.e., randomized trials of the sort used in medical research, provide an ideal benchmark for the questions we find most interesting. After this introduction, the three chapters of Part II present core material on regression, instrumental variables, and differences-in-differences. These chapters emphasize both the universal properties of the relevant estimators (e.g., regression always approximates the conditional mean function) and the assumptions necessary for a causal interpretation of results (the conditional independence assumption; instruments as good as randomly assigned; parallel worlds). We then turn to important extensions in Part III. Chapter 6 covers regression discontinuity designs, which can be seen as either a variation on regression-control strategies or a type of instrumental variables strategy. In Chapter 7, we discuss the use of quantile regression for estimating effects on distributions. The last chapter covers important inference problems that are missed by the textbook asymptotic approach. Some chapters include more technical or specialized sections that can be skimmed or skipped without missing out on the main ideas; these are indicated with a star. Notation, an acronym glossary, and an index to empirical examples are gathered at the back of the book.
Part I
Introduction

Chapter 1
Questions about Questions

'I checked it very thoroughly,' said the computer, 'and that quite definitely is the answer. I think the problem, to be quite honest with you, is that you've never actually known what the question is.'
Douglas Adams, The Hitchhiker's Guide to the Galaxy (1979)
As labor economists, we're most likely to study causal effects in samples of workers, but the unit of observation in causal research need not be an individual human being. Causal questions can be asked about firms, or, for that matter, countries. An example of the latter is Acemoglu, Johnson, and Robinson's (2001) research on the effect of colonial institutions on economic growth. This study is concerned with whether countries that inherited more democratic institutions from their colonial rulers later enjoyed higher economic growth as a consequence. The answer to this question has implications for our understanding of history and for the consequences of contemporary development policy. Today, for example, we might wonder whether newly forming democratic institutions are important for economic development in Iraq and Afghanistan. The case for democracy is far from clear-cut; at the moment, China is enjoying robust growth without the

[1] Milgram was later played by the actor William Shatner in a TV special, an honor that no economist has yet received, though Angrist is still hopeful.
The mechanics of an ideal experiment highlight the forces you'd like to manipulate and the factors you'd like to hold constant.
Research questions that cannot be answered by any experiment are FUQ'd: Fundamentally Unidentified Questions. What exactly does a FUQ'd question look like? At first blush, questions about the causal effect of race or gender seem like good candidates because these things are hard to manipulate in isolation ("imagine your chromosomes were switched at birth"). On the other hand, the issue economists care most about in the realm of race and sex, labor market discrimination, turns on whether someone treats you differently because they believe you to be black or white, male or female. The notion of a counterfactual world where men are perceived as women or vice versa has a long history and does not require Douglas-Adams-style outlandishness to entertain (Rosalind disguised as Ganymede fools everyone in Shakespeare's As You Like It). The idea of changing race is similarly near-fetched: In The Human Stain, Philip Roth imagines the world of Coleman Silk, a black Literature professor who passes as white in professional life.

[2] A recent example is Bertrand and Mullainathan (2004), who compared employers' responses to resumes with blacker-sounding and whiter-sounding first names, like Lakisha and Emily (though Fryer and Levitt, 2004, note that names may carry information about socioeconomic status as well as race).
The third and fourth research FAQs are concerned with the nuts-and-bolts elements that produce a specific study. Question Number 3 asks: what is your identification strategy? Angrist and Krueger (1999) used the term identification strategy to describe the manner in which a researcher uses observational data
significance gone.
As should be clear from the above discussion, the four research FAQs are part of a process of project
Chapter 2
The Experimental Ideal

It is an important and popular fact that things are not always what they seem. For instance,

[1] The Perry data continue to get attention, particularly as policy interest has returned to early education. A recent re-analysis by Michael Anderson (2006) confirms many of the findings from the original Perry study, though Anderson also shows that the overall positive effects of Perry are driven entirely by the impact on girls. The Perry intervention seems to have done nothing for boys.
2.1 The Selection Problem
We take a brief time-out for a more formal discussion of the role experiments play in uncovering causal effects.

Group          Sample Size    Mean health status    Std. Error
Hospital              7774                  2.79         0.014
No Hospital          90049                  2.07         0.003

The difference in the means is 0.71, a large and highly significant contrast in favor of the non-hospitalized, with a t-statistic of 58.9.
Taken at face value, this result suggests that going to the hospital makes people sicker. It's not impossible this is the right answer: hospitals are full of other sick people who might infect us, and dangerous machines and chemicals that might hurt us. Still, it's easy to see why this comparison should not be taken at face value: people who go to the hospital are probably less healthy to begin with. Moreover, even after
\[
\text{potential outcome} =
\begin{cases}
y_{1i} & \text{if } d_i = 1 \\
y_{0i} & \text{if } d_i = 0.
\end{cases}
\]
In other words, $y_{0i}$ is the health status of an individual had he not gone to the hospital, irrespective of whether he actually went, while $y_{1i}$ is the individual's health status if he goes. We would like to know
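\[
\underbrace{E[y_i \mid d_i = 1] - E[y_i \mid d_i = 0]}_{\text{Observed difference in average health}}
= \underbrace{E[y_{1i} \mid d_i = 1] - E[y_{0i} \mid d_i = 1]}_{\text{average treatment effect on the treated}}
+ \underbrace{E[y_{0i} \mid d_i = 1] - E[y_{0i} \mid d_i = 0]}_{\text{selection bias}}
\]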
The term
\[
E[y_{1i} \mid d_i = 1] - E[y_{0i} \mid d_i = 1] = E[y_{1i} - y_{0i} \mid d_i = 1]
\]
is the average causal effect of hospitalization on those who were hospitalized. This term captures the average difference between the health of the hospitalized, $E[y_{1i} \mid d_i = 1]$, and what would have happened to them had they not been hospitalized, $E[y_{0i} \mid d_i = 1]$. The observed difference in health status, however, adds to this causal effect a term called selection bias. This term is the difference in average $y_{0i}$ between those who were and were not hospitalized. Because the sick are more likely than the healthy to seek treatment, those who were hospitalized have worse $y_{0i}$'s, making selection bias negative in this example. The selection bias may be so large (in absolute value) that it completely masks a positive treatment effect. The goal of most

[2] The potential outcomes idea is a fundamental building block in modern research on causal effects. Important references developing this idea are Rubin (1974, 1977), and Holland (1986), who refers to a causal framework involving potential outcomes as the Rubin Causal Model.
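The decomposition of the observed difference into a treatment effect on the treated plus selection bias can be illustrated with a small simulation. This is only a sketch under invented assumptions: the health scale (higher is healthier), the constant effect of 0.3, and the selection rule are all hypothetical and are not taken from the survey data discussed above.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical potential outcomes on a scale where higher means healthier.
y0 = rng.normal(loc=3.0, scale=1.0, size=n)   # health without hospitalization
effect = 0.3                                  # assumed constant causal effect
y1 = y0 + effect                              # health with hospitalization

# Assumed self-selection: people with poor untreated health are more likely to go.
d = (y0 + rng.normal(scale=1.0, size=n) < 2.5).astype(int)

y = np.where(d == 1, y1, y0)                  # observed outcome

observed_diff = y[d == 1].mean() - y[d == 0].mean()
att = (y1[d == 1] - y0[d == 1]).mean()                   # effect on the treated
selection_bias = y0[d == 1].mean() - y0[d == 0].mean()   # difference in average y0

print(observed_diff, att, selection_bias)
```

In this setup the naive comparison is negative even though every individual benefits from treatment, because the negative selection bias outweighs the positive effect.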
2.2 Random Assignment Solves the Selection Problem
non-experimental comparisons of participants and non-participants often show that after training, the trainees earn less than plausible comparison groups (see, e.g., Ashenfelter, 1978; Ashenfelter and Card, 1985; Lalonde 1995). Here too, selection bias is a natural concern since subsidized training programs are meant to serve men and women with low earnings potential. Not surprisingly, therefore, simple comparisons of program participants with non-participants often show lower earnings for the participants. In contrast, evidence from randomized evaluations of training programs generates mostly positive effects (see, e.g., Lalonde, 1986; Orr,
with 22-25 children and a part-time teacher's aide, or regular classes with a full time teacher's aide. Schools with at least three classes in each grade could choose to participate in the experiment.
Students who entered STAR in kindergarten

Variable                               Small    Regular    Regular/Aide    Joint P-value
1. Free lunch                            .47        .48             .50              .09
2. White/Asian                           .68        .67             .66              .26
3. Age in 1985                          5.44       5.43            5.42              .32
4. Attrition rate                        .49        .52             .53              .02
5. Class size in kindergarten          15.10      22.40           22.80              .00
6. Percentile score in kindergarten    54.70      48.90           50.00              .00

Notes: Adapted from Krueger (1999), Table 1. The table shows means of variables by treatment status. The P-value in the last column is for the F-test of equality of variable means across all three groups. All variables except attrition are for the first year a student is observed. The free lunch variable is the fraction receiving a free lunch. The percentile score is the average percentile score on three Stanford Achievement Tests. The attrition rate is the proportion lost to follow up before completing third grade.
characteristics in the table are a free lunch variable, student race, and student age. Free lunch status is a good measure of family income, since only poor children qualify for a free school lunch. Differences in these characteristics across the three class types are small and none are significantly different from zero. This suggests the random assignment worked as intended.
Table 2.2.1 also presents information on average class size, the attrition rate, and test scores, measured here on a percentile scale. The attrition rate was lower in small kindergarten classrooms. This is potentially a problem, at least in principle. [3] Class sizes are significantly lower in the assigned-to-be-small classrooms, which means that the experiment succeeded in creating the desired variation. If many of the parents

[3] Krueger (1999) devotes considerable attention to the attrition problem. Differences in attrition rates across groups may result in a sample of students in higher grades that is not randomly distributed across class types. The kindergarten results, which were unaffected by attrition, are therefore the most reliable.
Because randomization eliminates selection bias, the difference in outcomes across treatment groups captures the average causal effect of class size (relative to regular classes with a part-time aide). In practice,
scores on dummies for each treatment group, a point we expand on below. The estimated treatment-control differences for kindergartners, reported in Table 2.2.2 (derived from Krueger, 1999, Table 5), show a small-class effect of about 5 to 6 percentile points. The effect size is about $.2\sigma$, where $\sigma$ is the standard deviation of the percentile score in kindergarten. The small-class effect is significantly different from zero, while the regular/aide effect is small and insignificant.

Table 2.2.2: Experimental estimates of the effect of class-size assignment on test scores

Explanatory variable        (1)       (2)       (3)       (4)
Small class                4.82      5.37      5.36      5.37
                          (2.19)    (1.26)    (1.21)    (1.19)
Regular/aide class          .12       .29       .53       .31
                          (2.23)    (1.13)    (1.09)    (1.07)
White/Asian (1 = yes)        –         –       8.35      8.44
                                              (1.35)    (1.36)
Girl (1 = yes)               –         –       4.48      4.39
                                               (.63)     (.63)
Free lunch (1 = yes)         –         –     -13.15    -13.07
                                               (.77)     (.77)
White teacher                –         –         –       -.57
                                                        (2.10)
Teacher experience           –         –         –        .26
                                                         (.10)
Master's degree              –         –         –      -0.51
                                                        (1.06)
School fixed effects        No       Yes       Yes       Yes
R^2                         .01       .25       .31       .31

Note: Adapted from Krueger (1999), Table 5. The dependent variable is the Stanford Achievement Test percentile score. Robust standard errors that allow for correlated residuals within classes are shown in parentheses. The sample size is 5681.
The STAR study, an exemplary randomized trial in the annals of social science, also highlights the logistical difficulty, long duration, and potentially high cost of randomized trials. In many cases, such trials are impractical. [4] In other cases, we would like an answer sooner rather than later. Much of the research we do, therefore, attempts to exploit cheaper and more readily available sources of variation. We hope to find natural or quasi-experiments that mimic a randomized trial by changing the variable of interest while other factors are kept balanced. Can we always find a convincing natural experiment? Of course not. Nevertheless, we take the position that a notional randomized trial is our benchmark. Not all researchers share this view, but many do. We heard it first from our teacher and thesis advisor, Orley Ashenfelter, a pioneering proponent of experiments and quasi-experimental research designs in social science. Here is Ashenfelter (1991) assessing the credibility of the observational studies linking schooling and income:

[4] Randomized trials are never perfect and STAR is no exception. Pupils who repeated or skipped a grade left the experiment. Students who entered an experimental school one grade later were added to the experiment and randomly assigned to one of the classes. One unfortunate aspect of the experiment is that students in the regular and regular/aide classes were reassigned after the kindergarten year, possibly due to protests of the parents with children in the regular classrooms. There was also some switching of children after the kindergarten year. Despite these problems, the STAR experiment seems to have been an extremely well implemented randomized trial. Krueger's (1999) analysis suggests that none of these implementation problems affected the main conclusions of the study.
[5] The Angrist-Lavy (1999) results turn up again in Chapter 6, as an illustration of the quasi-experimental regression-discontinuity research design.

2.3 Regression Analysis of Experiments
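constant treatment effects, we can rewrite equation (2.1.1) in the form
\[
y_i = \underbrace{\alpha}_{E(y_{0i})} + \underbrace{\rho}_{(y_{1i} - y_{0i})} d_i + \underbrace{\eta_i}_{y_{0i} - E(y_{0i})}, \tag{2.3.1}
\]
where $\eta_i$ is the random part of $y_{0i}$. Evaluating the conditional expectation of this equation with treatment status switched off and on gives
\begin{align*}
E[y_i \mid d_i = 1] &= \alpha + \rho + E[\eta_i \mid d_i = 1] \\
E[y_i \mid d_i = 0] &= \alpha + E[\eta_i \mid d_i = 0],
\end{align*}
so that,
\[
E[y_i \mid d_i = 1] - E[y_i \mid d_i = 0] = \underbrace{\rho}_{\text{treatment effect}} + \underbrace{E[\eta_i \mid d_i = 1] - E[\eta_i \mid d_i = 0]}_{\text{selection bias}}.
\]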
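A quick way to see the link between this equation and a comparison of means is to run the regression on simulated data. The sketch below assumes random assignment and invented values for the intercept and treatment effect (`alpha`, `rho`); it checks that the OLS slope on the treatment dummy coincides with the treatment-control difference in means.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

d = rng.integers(0, 2, size=n)          # randomly assigned treatment dummy
alpha, rho = 50.0, 5.0                  # hypothetical intercept and effect
y = alpha + rho * d + rng.normal(scale=10.0, size=n)

# OLS of y on a constant and the treatment dummy.
X = np.column_stack([np.ones(n), d])
alpha_hat, rho_hat = np.linalg.lstsq(X, y, rcond=None)[0]

# Simple difference in means between treated and controls.
diff_in_means = y[d == 1].mean() - y[d == 0].mean()

print(rho_hat, diff_in_means)
```

With random assignment the selection-bias term is zero in expectation, so both numbers estimate the assumed effect.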
fixed effects is rather minor, but we wouldn't know this without taking a look. We will have more to say about regression models with fixed effects in Chapter 5.
The other controls in Krueger's table describe student characteristics such as race, age, and free lunch status. We saw before that these individual characteristics are balanced across class types, i.e., they are not systematically related to the class-size assignment of the student. If these controls, call them $X_i$, are uncorrelated with the treatment $d_i$, then they will not affect the estimate of $\rho$. In other words, estimates of $\rho$ in the long regression,
\[
y_i = \alpha + \rho d_i + X_i'\gamma + \eta_i \tag{2.3.2}
\]
will be close to estimates of $\rho$ in the short regression, (2.3.1). This is a point we expand on in Chapter 3.
Nevertheless, inclusion of the variables $X_i$ may generate more precise estimates of the causal effect of interest. Notice that the standard error of the estimated treatment effects in column 3 is smaller than the corresponding standard error in column 2. Although the control variables, $X_i$, are uncorrelated with $d_i$, they have substantial explanatory power for $y_i$. Including these control variables therefore reduces the residual variance, which in turn lowers the standard error of the regression estimates. Similarly, the standard errors of the estimates of $\rho$ are reduced by the inclusion of school fixed effects because these too explain an important part of the variance in student performance. The last column adds teacher characteristics. Because teachers were randomly assigned to classes, and teacher characteristics appear to have little to do with student achievement in these data, both the estimated effect of small classes and its standard error are unchanged by the addition of teacher variables.
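The precision argument can be illustrated with a simulation. The following is a sketch, not a reanalysis of the STAR data: the covariate `x`, the coefficients, and the noise scale are invented, and the helper `ols` computes conventional (homoskedastic) standard errors only.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients and conventional (homoskedastic) standard errors."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)
    se = np.sqrt(sigma2 * np.diag(XtX_inv))
    return beta, se

rng = np.random.default_rng(2)
n = 5_000
d = rng.integers(0, 2, size=n).astype(float)   # randomized treatment
x = rng.normal(size=n)                         # covariate, independent of d
y = 1.0 + 5.0 * d + 8.0 * x + rng.normal(scale=4.0, size=n)

const = np.ones(n)
b_short, se_short = ols(np.column_stack([const, d]), y)      # short regression
b_long, se_long = ols(np.column_stack([const, d, x]), y)     # long regression

print(b_short[1], se_short[1])
print(b_long[1], se_long[1])
```

Because `x` is independent of `d`, the two treatment coefficients are close, while the long regression has a noticeably smaller standard error since `x` absorbs much of the residual variance.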
Regression plays an exceptionally important role in empirical economic research. Some regressions are simply descriptive tools, as in much of the research on earnings inequality. As we've seen in this chapter, regression is well-suited to the analysis of experimental data. In some cases, regression can also be used to
Part II
The Core

Chapter 3
Making Regression Make Sense

all that plausible, since voter turnout is low for the poor. I also remember arguing with Alan Meltzer over whether government expenditure on education should be classified as a public good

3.1 Regression Fundamentals
(iv) the sampling distribution of regression estimates

3.1.1 Economic Relationships and the Conditional Expectation Function
Empirical economic research in our field of Labor Economics is typically concerned with the statistical
interest before we can use data to study them. [1]
Figure 3.1.1 plots the CEF of log weekly wages given schooling for a sample of middle-aged white men from the 1980 Census. The distribution of earnings is also plotted for a few key values: 4, 8, 12, and 16 years of schooling. The CEF in the figure captures the fact that, the enormous variation in individual circumstances notwithstanding, people with more schooling generally earn more, on average. The average earnings gain associated with a year of schooling is typically about 10 percent.

Figure 3.1.1: Raw data and the CEF of average log weekly wages given schooling. The sample includes white men aged 40-49 in the 1980 IPUMS 5 percent file.
[1] Examples of pedagogical writing using the "population-first" approach to econometrics include Chamberlain (1984), Goldberger (1991), and Manski (1991).

An important complement to the CEF is the law of iterated expectations. This law says that an unconditional expectation can be written as the population average of the CEF. In other words
\[
E[y_i] = E\{E[y_i \mid X_i]\}, \tag{3.1.1}
\]
where the outer expectation uses the distribution of $X_i$. Here is proof of the law of iterated expectations for continuously distributed $(X_i, y_i)$ with joint density $f_{xy}(u, t)$, where $f_y(t \mid X_i = x)$ is the conditional distribution of $y_i$ given $X_i = x$ and $g_y(t)$ and $g_x(u)$ are the marginal densities:
\begin{align*}
E\{E[y_i \mid X_i]\} &= \int E[y_i \mid X_i = u]\, g_x(u)\, du \\
&= \int \left[ \int t f_y(t \mid X_i = u)\, dt \right] g_x(u)\, du \\
&= \int\!\!\int t f_y(t \mid X_i = u)\, g_x(u)\, du\, dt \\
&= \int t \left[ \int f_y(t \mid X_i = u)\, g_x(u)\, du \right] dt \\
&= \int t \left[ \int f_{xy}(u, t)\, du \right] dt \\
&= \int t\, g_y(t)\, dt.
\end{align*}
The integrals in this derivation run over the possible values of $X_i$ and $y_i$ (indexed by $u$ and $t$). We've laid out these steps because the CEF and its properties are central to the rest of this chapter.
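The law of iterated expectations also holds exactly in a sample when the CEF is estimated by cell means. The sketch below uses invented data (a discrete schooling variable and a made-up log wage equation) to check that averaging the cell means with the sample distribution of the conditioning variable reproduces the overall mean.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

x = rng.integers(8, 17, size=n)                      # hypothetical schooling, 8..16 years
y = 5.0 + 0.1 * x + rng.normal(scale=0.5, size=n)    # hypothetical log wage

values, counts = np.unique(x, return_counts=True)
cef = np.array([y[x == v].mean() for v in values])   # sample analog of E[y | X = v]
weights = counts / n                                 # sample distribution of X

lie_average = np.sum(cef * weights)                  # average of the CEF over X

print(lie_average, y.mean())
```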
The power of the law of iterated expectations comes from the way it breaks a random variable into two pieces.

Theorem 3.1.1 The CEF-Decomposition Property
\[
y_i = E[y_i \mid X_i] + \varepsilon_i,
\]
where (i) $\varepsilon_i$ is mean-independent of $X_i$, i.e., $E[\varepsilon_i \mid X_i] = 0$; and, therefore, (ii) $\varepsilon_i$ is uncorrelated with any function of $X_i$.

Proof. (i) $E[\varepsilon_i \mid X_i] = E[y_i - E[y_i \mid X_i] \mid X_i] = E[y_i \mid X_i] - E[y_i \mid X_i] = 0$;

This theorem says that any random variable, $y_i$, can be decomposed into a piece that's "explained by $X_i$", i.e., the CEF, and a piece left over which is orthogonal to (i.e., uncorrelated with) any function of $X_i$.
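The decomposition property can be checked numerically with a discrete conditioning variable, where the sample CEF is just a set of cell means. The data-generating process below (a nonlinear CEF with heteroskedastic noise) and the particular functions of `x` are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50_000

x = rng.integers(0, 5, size=n)
# Hypothetical outcome: nonlinear CEF and noise whose spread depends on x.
y = np.sin(x) + rng.normal(scale=1.0 + 0.2 * x, size=n)

cell_means = np.array([y[x == v].mean() for v in range(5)])
cef = cell_means[x]      # sample CEF evaluated at each observation
eps = y - cef            # CEF residual

# Sample cross-moments of eps with several arbitrary functions of x.
cross_moments = [np.mean(eps * h) for h in (x, x**2, np.exp(x), np.cos(x))]

print(cross_moments)
```

Each cross-moment is zero up to floating-point error, even though the CEF is nonlinear and the residual variance changes with `x`; mean-independence does not require homoskedasticity.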
Proof. Write
\begin{align*}
(y_i - m(X_i))^2 &= \big((y_i - E[y_i \mid X_i]) + (E[y_i \mid X_i] - m(X_i))\big)^2 \\
&= (y_i - E[y_i \mid X_i])^2 + 2\,(E[y_i \mid X_i] - m(X_i))(y_i - E[y_i \mid X_i]) \\
&\quad + (E[y_i \mid X_i] - m(X_i))^2.
\end{align*}
The first term doesn't matter because it doesn't involve $m(X_i)$. The second term can be written $h(X_i)\varepsilon_i$, where $h(X_i) \equiv 2(E[y_i \mid X_i] - m(X_i))$, and therefore has expectation zero by the CEF-decomposition property. The last term is minimized at zero when $m(X_i)$ is the CEF.
A final property of the CEF, closely related to both the CEF decomposition and prediction properties, is the Analysis-of-Variance (ANOVA) Theorem:

Theorem 3.1.3 The ANOVA Theorem
\[
V(y_i) = V(E[y_i \mid X_i]) + E[V(y_i \mid X_i)]
\]
where $V(\cdot)$ denotes variance and $V(y_i \mid X_i)$ is the conditional variance of $y_i$ given $X_i$.

Proof. The CEF-decomposition property implies the variance of $y_i$ is the variance of the CEF plus the variance of the residual, $\varepsilon_i \equiv y_i - E[y_i \mid X_i]$, since $\varepsilon_i$ and $E[y_i \mid X_i]$ are uncorrelated. The variance of $\varepsilon_i$ is
\[
E[\varepsilon_i^2] = E\big[E[\varepsilon_i^2 \mid X_i]\big] = E[V[y_i \mid X_i]]
\]
where $E[\varepsilon_i^2 \mid X_i] = V[y_i \mid X_i]$ because $\varepsilon_i \equiv y_i - E[y_i \mid X_i]$.
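The ANOVA theorem has an exact sample counterpart when the CEF is computed from cell means. The following sketch, on invented data, splits the total variance into the variance of the cell means and the average within-cell variance; variances use the population convention (dividing by the cell size).

```python
import numpy as np

rng = np.random.default_rng(5)
n = 50_000

x = rng.integers(0, 4, size=n)
# Hypothetical outcome with a nonlinear CEF and x-dependent spread.
y = 1.0 + 0.5 * x**2 + rng.normal(scale=1.0 + x, size=n)

cell_means = np.array([y[x == v].mean() for v in range(4)])
cell_vars = np.array([y[x == v].var() for v in range(4)])    # V(y | X = v)
shares = np.array([(x == v).mean() for v in range(4)])       # sample distribution of X

var_total = y.var()
var_of_cef = np.sum(shares * (cell_means - y.mean()) ** 2)   # V(E[y | X])
mean_cond_var = np.sum(shares * cell_vars)                   # E[V(y | X)]

print(var_total, var_of_cef + mean_cond_var)
```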
The two CEF properties and the ANOVA theorem may have a familiar ring. You might be used to seeing an ANOVA table in your regression output, for example. ANOVA is also important in research on inequality where labor economists decompose changes in the income distribution into parts that can be accounted for by changes in worker characteristics and changes in what's left over after accounting for these factors (see, e.g., Autor, Katz, and Kearney, 2005). What may be unfamiliar is the fact that the CEF properties and ANOVA variance decomposition work in the population as well as in samples, and do not turn on the assumption of a linear CEF. In fact, the validity of linear regression as an empirical tool does not turn on linearity either.

3.1.2 Linear Regression and the CEF
So what's the regression you want to run?
In our world, this question or one like it is heard almost every day. Regression estimates provide a valuable baseline for almost all empirical research because regression is tightly linked to the CEF, and the CEF provides a natural summary of empirical relationships. The link between regression functions (i.e., the best-fitting line generated by minimizing expected squared errors) and the CEF can be explained in at least 3 ways. To lay out these explanations precisely, it helps to be precise about the regression function we have in mind. This chapter is concerned with the vector of population regression coefficients, defined as the solution to a population least squares problem. At this point, we are not worried about causality. Rather,
$V(x_i)$, and the intercept is $\alpha = E[y_i] - \beta_1 E[X_i]$. In the multivariate case, i.e., with more than one non-constant regressor, the slope coefficient for the $k$-th regressor is given below:

REGRESSION ANATOMY
\[
\beta_k = \frac{Cov(y_i, \tilde{x}_{ki})}{V(\tilde{x}_{ki})}, \tag{3.1.3}
\]
where $\tilde{x}_{ki}$ is the residual from a regression of $x_{ki}$ on all the other covariates.

In other words, $E[X_i X_i']^{-1} E[X_i y_i]$ is the $K \times 1$ vector with $k$-th element $Cov(y_i, \tilde{x}_{ki}) / V(\tilde{x}_{ki})$. This important formula is said to describe the "anatomy of a multivariate regression coefficient" because it reveals much more than the matrix formula $\beta = E[X_i X_i']^{-1} E[X_i y_i]$. It shows us that each coefficient in a multivariate regression is the bivariate slope coefficient for the corresponding regressor, after "partialling out" all the other variables in the model.
To verify the regression-anatomy formula, substitute
\[
y_i = \beta_0 + \beta_1 x_{1i} + \dots + \beta_k x_{ki} + \dots + \beta_K x_{Ki} + e_i
\]
in the numerator of (3.1.3). Since $\tilde{x}_{ki}$ is a linear combination of the regressors, it is uncorrelated with $e_i$. Also, since $\tilde{x}_{ki}$ is a residual from a regression on all the other covariates in the model, it must be uncorrelated with these covariates. Finally, for the same reason, the covariance of $\tilde{x}_{ki}$ with $x_{ki}$ is just the variance of $\tilde{x}_{ki}$. We therefore have that
\[
Cov(y_i, \tilde{x}_{ki}) = \beta_k V(\tilde{x}_{ki}). \ [2]
\]
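The regression-anatomy formula (3.1.3) is easy to confirm numerically. The sketch below uses two correlated regressors with invented coefficients; `ols_coef` is a small helper defined here, not a library function. It also computes the ratio obtained when the other covariates are partialled out of the dependent variable only, which does not recover the multivariate coefficient.

```python
import numpy as np

def ols_coef(X, y):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(6)
n = 2_000
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)                  # correlated regressors
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)  # hypothetical outcome

const = np.ones(n)
beta = ols_coef(np.column_stack([const, x1, x2]), y)   # full multivariate fit

# Residual of x2 after regressing it on the other covariates (constant and x1).
Z = np.column_stack([const, x1])
x2_tilde = x2 - Z @ ols_coef(Z, x2)

# Regression anatomy: bivariate slope of y on the residualized regressor.
anatomy = np.cov(y, x2_tilde)[0, 1] / np.var(x2_tilde, ddof=1)

# Partialling the other covariates out of y only is not enough.
y_tilde = y - Z @ ols_coef(Z, y)
y_only = np.cov(y_tilde, x2)[0, 1] / np.var(x2, ddof=1)

print(beta[2], anatomy, y_only)
```

The first two numbers agree exactly; the third is shrunk toward zero by the factor `np.var(x2_tilde, ddof=1) / np.var(x2, ddof=1)`.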
The regression-anatomy formula is probably familiar to you from a regression or statistics course, perhaps with one twist: the regression coefficients defined in this section are not estimators, but rather they are non-stochastic features of the joint distribution of dependent and independent variables. The joint distribution
The linear CEF theorem raises the question of under what circumstances a CEF is linear. The classic scenario is joint Normality, i.e., the vector $(y_i, x_i')'$ has a multivariate Normal distribution. This is the scenario considered by Galton (1886), father of regression, who was interested in the intergenerational link

[2] The regression-anatomy formula is usually attributed to Frisch and Waugh (1933). You can also do regression anatomy this way:
\[
\beta_k = \frac{Cov(\tilde{y}_{ki}, \tilde{x}_{ki})}{V(\tilde{x}_{ki})},
\]
where $\tilde{y}_{ki}$ is the residual from a regression of $y_i$ on every covariate except $x_{ki}$. This works because the fitted values removed from $\tilde{y}_{ki}$ are uncorrelated with $\tilde{x}_{ki}$. Often it's useful to plot $\tilde{y}_{ki}$ against $\tilde{x}_{ki}$; the slope of the least-squares fit in this scatterplot is your estimate of the multivariate $\beta_k$, even though the plot is two-dimensional. Note, however, that it's not enough to partial the other covariates out of $y_i$ only. That is,
\[
\frac{Cov(\tilde{y}_{ki}, x_{ki})}{V(x_{ki})} = \left[ \frac{Cov(\tilde{y}_{ki}, \tilde{x}_{ki})}{V(\tilde{x}_{ki})} \right] \left[ \frac{V(\tilde{x}_{ki})}{V(x_{ki})} \right] \neq \beta_k,
\]
unless $x_{ki}$ is uncorrelated with the other covariates.
The following two reasons for focusing on regression are relevant when the linear CEF theorem does not apply.

Theorem 3.1.5 The Best Linear Predictor Theorem (Regression-justification II)
The function $X_i'\beta$ is the best linear predictor of $y_i$ given $X_i$ in a MMSE sense.

Proof. $\beta = E[X_i X_i']^{-1} E[X_i y_i]$ solves the population least squares problem, (3.1.2).

In other words, just as the CEF, $E[y_i \mid X_i]$, is the best (i.e., MMSE) predictor of $y_i$ given $X_i$ in the class of all functions of $X_i$, the population regression function is the best we can do in the class of linear functions.

Theorem 3.1.6 The Regression-CEF Theorem (Regression-justification III)
The function $X_i'\beta$ provides the MMSE linear approximation to $E[y_i \mid X_i]$, that is,
\[
\beta = \arg\min_b E\{(E[y_i \mid X_i] - X_i'b)^2\}. \tag{3.1.4}
\]

Proof. Write
\begin{align*}
(y_i - X_i'b)^2 &= \{(y_i - E[y_i \mid X_i]) + (E[y_i \mid X_i] - X_i'b)\}^2 \\
&= (y_i - E[y_i \mid X_i])^2 + (E[y_i \mid X_i] - X_i'b)^2 \\
&\quad + 2\,(y_i - E[y_i \mid X_i])(E[y_i \mid X_i] - X_i'b).
\end{align*}
The first term doesn't involve $b$ and the last term has expectation zero by the CEF-decomposition property (ii). The CEF-approximation problem, (3.1.4), therefore has the same solution as the population least squares problem, (3.1.2).

These two theorems show us two more ways to view regression. Regression provides the best linear predictor for the dependent variable in the same way that the CEF is the best unrestricted predictor of the dependent variable. On the other hand, if we prefer to think about approximating $E[y_i \mid X_i]$, as opposed to predicting $y_i$, the Regression-CEF theorem tells us that even if the CEF is nonlinear, regression provides the best linear approximation to it.
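One practical consequence of the Regression-CEF theorem can be checked on simulated data: a regression of cell means on the regressor, weighted by cell size, reproduces the micro-data regression coefficients exactly. The schooling variable and the mildly nonlinear wage equation below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 20_000

s = rng.integers(8, 21, size=n).astype(float)     # hypothetical schooling, 8..20 years
# Hypothetical log wage with a nonlinear CEF in schooling.
y = 4.0 + 0.12 * s - 0.002 * (s - 12) ** 2 + rng.normal(scale=0.6, size=n)

# Micro-data regression of y on a constant and schooling.
X = np.column_stack([np.ones(n), s])
beta_micro = np.linalg.lstsq(X, y, rcond=None)[0]

# Grouped-data regression: cell means of y on schooling, weighted by cell size.
values, counts = np.unique(s, return_counts=True)
cell_means = np.array([y[s == v].mean() for v in values])
Xg = np.column_stack([np.ones(len(values)), values])
W = np.diag(counts.astype(float))
beta_grouped = np.linalg.solve(Xg.T @ W @ Xg, Xg.T @ W @ cell_means)

print(beta_micro, beta_grouped)
```

The weights matter: an unweighted regression on the cell means would treat sparse and dense schooling cells alike and would generally give a different line.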
The regression-CEF theorem is our favorite way to motivate regression. The statement that regression approximates the CEF lines up with our view of empirical work as an effort to describe the essential features of statistical relationships, without necessarily trying to pin them down exactly. The linear CEF theorem is for special cases only. The best linear predictor theorem is satisfyingly general, but it encourages an overly clinical view of empirical research. We're not really interested in predicting individual $y_i$; it's the distribution of $y_i$ that we care about.
Figure 3.1.2 illustrates the CEF approximation property for the same schooling CEF plotted in Figure 3.1.1. The regression line fits the somewhat bumpy and nonlinear CEF as if we were estimating a model for $E[y_i \mid X_i]$ instead of a model for $y_i$. In fact, that is exactly what's going on. An implication of the regression-CEF theorem is that regression coefficients can be obtained by using $E[y_i \mid X_i]$ as a dependent variable instead of $y_i$ itself. To see this, suppose that $X_i$

Figure 3.1.2: Regression threads the CEF of average weekly wages given schooling
estimates should be interpreted. Whatever a regression coefficient may mean, it has a sampling distribution that is easy to describe and use for statistical inference. [3]
We are interested in the distribution of the sample analog of
\[
\beta = E[X_i X_i']^{-1} E[X_i y_i]
\]
in repeated samples. Suppose the vector $W_i \equiv [y_i, X_i']'$ is independently and identically distributed in a sample of size $N$. A natural estimator of the first population moment, $E[W_i]$, is the sum, $\frac{1}{N} \sum_{i=1}^{N} W_i$. By
$\frac{1}{N} \sum_{i=1}^{N} W_i W_i'$. Following this principle, the

Figure 3.1.3: Micro-data and grouped-data estimates of returns to schooling. Source: 1980 Census-IPUMS, 5 percent sample. Sample is limited to white men, age 40-49. Derived from Stata regression output. Old-

The asymptotic sampling distribution of $\hat{\beta}$ depends solely on the definition of the estimand (i.e., the nature of the thing we're trying to estimate, $\beta$) and the assumption that the data constitute a random sample. Before deriving this distribution, it helps to record the general asymptotic distribution theory that covers our needs. This basic theory can be stated mostly in words. For the purposes of these statements, we assume the reader is familiar with the core terms and concepts of statistical theory (e.g., moments, mathematical expectation, probability limits, and asymptotic distributions). For definitions of these terms and a formal mathematical statement of the theoretical propositions given below, see, e.g., Knight (2000).

THE LAW OF LARGE NUMBERS Sample moments converge in probability to the corresponding population moments. In other words, the probability that the sample mean is close to the population mean can be made as high as you like by taking a large enough sample.

THE CENTRAL LIMIT THEOREM Sample moments are asymptotically Normally distributed (after subtracting the corresponding population moment and multiplying by the square root of the sample size). The covariance matrix is given by the variance of the underlying random variable. In other words, in large enough samples, appropriately normalized sample moments are approximately Normally distributed.

SLUTSKY'S THEOREM
(a) Consider the sum of two random variables, one of which converges in distribution and the other converges in probability to a constant: the asymptotic distribution of this sum is unaffected by replacing the

[3] The discussion of asymptotic OLS inference in this section is largely a condensation of material in Chamberlain (1984). Important pitfalls and problems with this asymptotic theory are covered in the last chapter.
[4] Econometricians like to use matrices because the notation is so compact. Sometimes (not very often) we do too. Suppose $X$ is the matrix whose rows are given by $X_i'$ and $y$ is the vector with elements $y_i$, for $i = 1, \dots, N$. The sample moment $\frac{1}{N} \sum X_i X_i'$ is $X'X/N$ and the sample moment $\frac{1}{N} \sum X_i y_i$ is $X'y/N$. Then we can write $\hat{\beta} = (X'X)^{-1} X'y$, a familiar matrix formula.
\[
\sqrt{N}(\hat{\beta} - \beta) = N \left[ \sum X_i X_i' \right]^{-1} \frac{1}{\sqrt{N}} \sum X_i e_i.
\]
By the Slutsky theorem, this has the same asymptotic distribution as $E[X_i X_i']^{-1} \frac{1}{\sqrt{N}} \sum X_i e_i$. Since $E[X_i e_i] = 0$, $\frac{1}{\sqrt{N}} \sum X_i e_i$ is a root-$N$-normalized and centered sample moment. By the central limit theorem, this is asymptotically Normally distributed with mean zero and covariance matrix $E[X_i X_i' e_i^2]$, since this fourth moment is the covariance matrix of $X_i e_i$. Therefore, $\hat{\beta}$ has an asymptotic Normal distribution, with probability limit $\beta$, and covariance matrix
\[
E[X_i X_i']^{-1} E[X_i X_i' e_i^2] E[X_i X_i']^{-1}. \tag{3.1.7}
\]
The standard errors used to construct $t$-statistics are the square roots of the diagonal elements of this matrix. In practice these standard errors are estimated by substituting sums for expectations, and using the estimated residuals, $\hat{e}_i = y_i - X_i'\hat{\beta}$, to form the empirical fourth moment, $\sum [X_i X_i' \hat{e}_i^2] / N$.

[5] For a derivation of the delta method formula using the Slutsky and continuous mapping theorems, see, e.g., Knight, 2000, pp. 120-121.
[6] Residuals defined in this way are not necessarily mean-independent of $X_i$; for mean-independence, we need a linear CEF.
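The sandwich formula (3.1.7) translates directly into a few lines of matrix code. The sketch below builds the sample analogs of $E[X_i X_i']$ and $E[X_i X_i' e_i^2]$ on invented heteroskedastic data and compares the resulting robust standard errors with conventional ones; it applies no degrees-of-freedom or small-sample corrections, which applied software often adds.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 5_000

x = rng.normal(size=n)
e = rng.normal(size=n) * (0.5 + np.abs(x))     # hypothetical heteroskedastic errors
y = 1.0 + 2.0 * x + e

X = np.column_stack([np.ones(n), x])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat

# Sample analogs of E[X X'] and E[X X' e^2].
Sxx = X.T @ X / n
meat = (X * resid[:, None] ** 2).T @ X / n

# Estimated asymptotic covariance of sqrt(N)(beta_hat - beta), as in (3.1.7).
Sxx_inv = np.linalg.inv(Sxx)
avar = Sxx_inv @ meat @ Sxx_inv
robust_se = np.sqrt(np.diag(avar) / n)

# Conventional standard errors, appropriate only under homoskedasticity.
sigma2 = resid @ resid / n
conventional_se = np.sqrt(np.diag(sigma2 * Sxx_inv) / n)

print(robust_se, conventional_se)
```

Because the error variance in this example rises with the magnitude of `x`, the robust standard error on the slope exceeds the conventional one.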
7 The cross-product term resulting from an expansion of the quadratic in the middle of 3.1.9 is zero because $y_i - E[y_i|X_i]$ is mean-independent of $X_i$.
not a sufficient condition for homoskedasticity. Our favorite example in this context is the linear probability model (LPM). A linear probability model is any regression where the dependent variable is zero-one, i.e., a dummy variable such as an indicator for labor force participation. Suppose the regression model is saturated, so the CEF is linear. Because the CEF is linear, the residual variance is also the conditional variance, $V[y_i|X_i]$. But the dependent variable is a Bernoulli trial and the variance of a Bernoulli trial is $P[y_i = 1|X_i](1 - P[y_i = 1|X_i])$.
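A small simulation can make the Bernoulli-variance point concrete. The sketch below assumes a single discrete covariate with three cells and made-up participation probabilities (`p_by_cell` is hypothetical); in a saturated LPM the fitted value in each cell is the cell mean, so the residual variance in each cell is the sample analog of $P(1 - P)$ and differs across cells.

```python
import random

random.seed(5)
# Hypothetical P[y = 1 | X = x] for three covariate cells.
p_by_cell = {0: 0.2, 1: 0.5, 2: 0.9}
summary = {}
for cell, p in p_by_cell.items():
    ys = [1 if random.random() < p else 0 for _ in range(20000)]
    p_hat = sum(ys) / len(ys)  # fitted value of a saturated LPM in this cell
    resid_var = sum((y - p_hat) ** 2 for y in ys) / len(ys)
    summary[cell] = (p_hat, resid_var)
    # The last two printed columns agree: residual variance equals p_hat * (1 - p_hat).
    print(cell, round(p_hat, 3), round(resid_var, 4), round(p_hat * (1 - p_hat), 4))
```

Because the residual variance changes with the cell probability, the residuals are heteroskedastic whenever the probability varies with the covariates.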
CEF perfectly because the CEF is linear in the dummy regressors used to saturate. This is an important special case of the regression-CEF theorem.
If there are two explanatory variables, say one dummy indicating college graduates and one dummy indicating sex, the model is saturated by including these two dummies, their product, and a constant. The coefficients on the dummies are known as main effects, while the product is called an interaction term. This
8 With a third dummy variable in the model, say $x_{3i}$, a saturated model includes 3 main effects, 3 second-order interaction terms $\{x_{1i}x_{2i}, x_{2i}x_{3i}, x_{1i}x_{3i}\}$ and one third-order term, $x_{1i}x_{2i}x_{3i}$.
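The claim that a saturated model fits the CEF perfectly can be checked numerically. The following sketch regresses a simulated outcome on a college dummy, a sex dummy, their product, and a constant, then compares fitted values with raw cell means. The helper names (`solve_linear_system`, `ols`), the cell means, and the sample size are all assumptions made for this illustration.

```python
import random

def solve_linear_system(a, b):
    """Solve a * beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (m[r][n] - tail) / m[r][r]
    return beta

def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

random.seed(1)
# Hypothetical, deliberately non-additive cell means for (college, female).
true_cell_mean = {(0, 0): 2.0, (1, 0): 2.6, (0, 1): 1.8, (1, 1): 2.9}
data = []
for _ in range(2000):
    college, female = random.randint(0, 1), random.randint(0, 1)
    y = true_cell_mean[(college, female)] + random.gauss(0, 0.5)
    data.append((college, female, y))

# Saturated specification: constant, two main effects, one interaction.
rows = [[1.0, c, f, c * f] for c, f, _ in data]
outcome = [y for _, _, y in data]
b0, b_college, b_female, b_inter = ols(rows, outcome)

def cell_mean(c, f):
    ys = [y for ci, fi, y in data if (ci, fi) == (c, f)]
    return sum(ys) / len(ys)

for c in (0, 1):
    for f in (0, 1):
        fitted = b0 + b_college * c + b_female * f + b_inter * c * f
        print((c, f), round(fitted, 4), round(cell_mean(c, f), 4))
```

Dropping the interaction column would generally break this equality, since the additive model has only three parameters for four cell means.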
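has $\tau$ main effects for schooling, one main effect for sex, and $\tau$ sex-schooling interactions:
$$y_i = \alpha_0 + \sum_{j=1}^{\tau} \beta_j d_{ji} + \gamma x_{2i} + \sum_{j=1}^{\tau} \delta_j (d_{ji} x_{2i}) + \varepsilon_i. \qquad (3.1.10)$$
The interaction terms, $\delta_j$, tell us how each of the schooling effects differ by sex. The CEF in this case takes on $2(\tau + 1)$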
3.2.REGRESSIONANDCAUSALITY
39
in a series of episodes where the decision-maker might realistically go one way or another, even if certain choices are more likely than others. For example, in the middle of junior year, restless and unhappy, Angrist glumly considered his options: dropping out of high school and hopefully getting a job, staying in school but taking easy classes that lead to a quick and dirty high school diploma, or plowing on in an academic track that leads to college. Although the consequences of such choices are usually unknown in advance, the idea of alternative paths leading to alternative outcomes for a given individual seems uncontroversial. Philosophers have argued over whether this personal notion of potential outcomes is precise enough to be scientifically useful, but individual decision-makers seem to have no trouble thinking about their lives and choices in this manner (as in Robert Frost's celebrated The Road Not Taken: the traveller-narrator sees himself looking back on a moment of choice. He believes that the decision to follow the road less traveled "has made all the difference," though he also recognizes that counterfactual outcomes are unknowable).
Chapter 2. To address this question, we imagine two potential earnings variables:
$$\text{potential outcome} = \begin{cases} y_{1i} & \text{if } c_i = 1 \\ y_{0i} & \text{if } c_i = 0. \end{cases}$$
In this case, $y_{0i}$ is $i$'s earnings without college, while $y_{1i}$ is $i$'s earnings if he goes. We would like to know the difference between $y_{1i}$ and $y_{0i}$, which is the causal effect of college attendance on individual $i$. This is what we would measure if we could go back in time and nudge $i$ onto the road not taken. The observed outcome, $y_i$, can be written in terms of potential outcomes as
$$y_i = y_{0i} + (y_{1i} - y_{0i}) c_i.$$
We get to see one of $y_{1i}$ or $y_{0i}$, but never both. We therefore hope to measure the average of $y_{1i} - y_{0i}$, or the average for some group, such as those who went to college. This is
$$E[y_{1i} - y_{0i} | c_i = 1].$$
In general, comparisons of those who do and don't go to college are likely to be a poor measure of the causal effect of college attendance. Following the logic in Chapter 2, we have
$$\underbrace{E[y_i|c_i = 1] - E[y_i|c_i = 0]}_{\text{Observed difference in earnings}} = \underbrace{E[y_{1i} - y_{0i}|c_i = 1]}_{\text{average treatment effect on the treated}} + \underbrace{E[y_{0i}|c_i = 1] - E[y_{0i}|c_i = 0]}_{\text{selection bias}}. \qquad (3.2.1)$$
It seems likely that those who go to college would have earned more anyway. If so, selection bias is positive, and the naive comparison, $E[y_i|c_i = 1] - E[y_i|c_i = 0]$, exaggerates the benefits of college attendance.
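The decomposition in (3.2.1) holds as an identity in any sample where both potential outcomes are known, which is only possible in a simulation. The sketch below assumes a made-up data-generating process (an unobserved `ability` variable that raises both earnings and the chance of attending college, and a constant causal effect of 1.5); none of these values come from the text.

```python
import random

random.seed(2)
people = []
for _ in range(10000):
    ability = random.gauss(0, 1)
    y0 = 10 + 2 * ability + random.gauss(0, 1)  # earnings without college
    y1 = y0 + 1.5                               # earnings with college (constant effect)
    c = 1 if ability + random.gauss(0, 1) > 0 else 0  # high-ability people select in
    people.append((y0, y1, c))

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Only one potential outcome is observed for each person.
observed = [(y1 if c == 1 else y0, c) for y0, y1, c in people]
naive = mean(y for y, c in observed if c == 1) - mean(y for y, c in observed if c == 0)
att = mean(y1 - y0 for y0, y1, c in people if c == 1)
selection_bias = (mean(y0 for y0, _, c in people if c == 1)
                  - mean(y0 for y0, _, c in people if c == 0))
print(naive, att, selection_bias)
```

In this design the naive comparison exceeds the true effect because college-goers would have earned more even without college, which is the positive selection bias described above.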
The CIA asserts that conditional on observed characteristics, $X_i$, selection bias disappears. In this example, the CIA says,
$$\{y_{0i}, y_{1i}\} \perp\!\!\!\perp c_i \mid X_i. \qquad (3.2.2)$$
Given the CIA, conditional-on-$X_i$ comparisons of average earnings across schooling levels have a causal
to denote the potential earnings that person $i$ would receive after obtaining $s$ years of education. If $s$ takes on only two values, 12 and 16, then we are back to the college/no college example:
$$y_{0i} = f_i(12); \quad y_{1i} = f_i(16).$$
More generally, the function $f_i(s)$ tells us what $i$ would earn for any value of schooling, $s$. In other words, $f_i(s)$
graduating high school on high school graduates is the population average high school graduation effect:
$$E[f_i(12) - f_i(11) | X_i, s_i = 12] = E[f_i(12) - f_i(11) | X_i].$$
This is important... but less important than the elimination of selection bias in (3.2.1).
So far, we have constructed separate causal effects for each value taken on by the conditioning variable, $X_i$. This leads to as many causal effects as there are values of $X_i$, an embarrassment of riches. Empiricists
A third consideration is that the two-way contrast at the heart of this subsection (high school or college completers versus dropouts) does not do full justice to the problem at hand. Since $s_i$ takes on many values, there are separate average causal effects for each possible increment in $s_i$, which also must be summarized in some way.$^{9}$ These considerations lead us back to regression.
Regression provides an easy-to-use empirical strategy that automatically turns the CIA into causal effects. Two routes can be traced from the CIA to regression. One assumes that $f_i(s)$ is both linear in $s$ and the same for everyone except for an additive error term, in which case linear regression is a natural tool to estimate the features of $f_i(s)$. A more general but somewhat longer route recognizes that $f_i(s)$ almost certainly differs for different people, and, moreover, need not be linear in $s$. Even so, allowing for random variation in $f_i(s)$ across people, and for non-linearity for a given person, regression can be thought of as a strategy for the estimation of a weighted average of the individual-specific difference, $f_i(s) - f_i(s - 1)$. In fact, regression can be seen as a particular sort of matching estimator, capturing an average causal effect much like 3.2.3 or 3.2.5.
At this point, we want to focus on the conditions required for regression to have a causal interpretation
9 For example, we might construct the average effect over $s$ using the distribution of $s_i$. In other words, estimate $E[f_i(s) - f_i(s - 1)]$ for each $s$ by matching, and then compute the average difference
$$\sum_s E[f_i(s) - f_i(s - 1)] P(s),$$
where $P(s)$ is the probability mass function for $s_i$. This is a discrete approximation to the average derivative, $E[f_i'(s_i)]$.
Suppose now that the CIA holds given a vector of observed covariates, $X_i$. In addition to the functional form assumption for potential outcomes embodied in (3.2.8), we decompose the random part of potential earnings, $\eta_i$, into a linear function of observable characteristics, $X_i$, and an error term, $v_i$:
$$\eta_i = X_i'\gamma + v_i,$$
where $\gamma$ is a vector of population regression coefficients that is assumed to satisfy $E[\eta_i|X_i] = X_i'\gamma$. Because $\gamma$ is defined by the regression of $\eta_i$ on $X_i$, the residual $v_i$ and $X_i$ are uncorrelated by construction. Moreover, by virtue of the CIA, we have
$$E[f_i(s)|X_i, s_i] = E[f_i(s)|X_i] = \alpha + \rho s + E[\eta_i|X_i] = \alpha + \rho s + X_i'\gamma.$$
Because mean-independence implies orthogonality, the residual in the linear causal model
$$y_i = \alpha + \rho s_i + X_i'\gamma + v_i \qquad (3.2.9)$$
is uncorrelated with the regressors, $s_i$ and $X_i$, and the regression coefficient $\rho$ is the causal effect of interest. It bears emphasizing once again that the key assumption here is that the observable characteristics, $X_i$, are the only reason why $\eta_i$ and $s_i$ (equivalently, $f_i(s)$ and $s_i$) are correlated. This is the selection-on-observables assumption for regression models discussed over a quarter century ago by Barnow, Cain, and Goldberger (1981). It remains the basis of most empirical work in Economics.
3.2.2 The Omitted Variables Bias Formula
wages on schooling, $s_i$, controlling for ability can be written as
$$y_i = \alpha + \rho s_i + A_i'\gamma + \varepsilon_i, \qquad (3.2.10)$$
where $\alpha$, $\rho$, and $\gamma$ are population regression coefficients, and $\varepsilon_i$ is a regression residual that is uncorrelated with all regressors by definition. If the CIA applies given $A_i$, then $\rho$ can be equated with the coefficient in the linear causal model, 3.2.7, while the residual $\varepsilon_i$ is the random part of potential earnings that is left over after controlling for $A_i$.
In practice, ability is hard to measure. For example, the American Current Population Survey (CPS), a
$$\frac{Cov(y_i, s_i)}{V(s_i)} = \rho + \gamma'\delta_{As}, \qquad (3.2.11)$$
where $\delta_{As}$ is the vector of coefficients from regressions of the elements of $A_i$ on $s_i$. To paraphrase, the OVB formula says
Short equals long plus the effect of omitted times the regression of omitted on included.
This formula is easy to derive: plug the long regression into the short regression formula, $\frac{Cov(y_i, s_i)}{V(s_i)}$. Not surprisingly, the OVB formula is closely related to the regression anatomy formula, 3.1.3, from Section 3.1.2. Both the OVB and regression anatomy formulas tell us that short and long regression coefficients are the same whenever the omitted and included variables are uncorrelated.$^{10}$
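"Short equals long plus the effect of omitted times the regression of omitted on included" is an exact algebraic identity for OLS estimates, so it can be verified on any dataset. The sketch below uses simulated data with a single scalar ability measure; the variable names and coefficient values (a schooling return of 0.08 and an ability effect of 0.10) are assumptions chosen for illustration only.

```python
import random

def cov(u, v):
    """Sample covariance (divides by n, which cancels in every ratio below)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

random.seed(3)
n = 5000
ability = [random.gauss(0, 1) for _ in range(n)]
schooling = [12 + 2 * a + random.gauss(0, 2) for a in ability]
log_wage = [1.0 + 0.08 * s + 0.10 * a + random.gauss(0, 0.3)
            for s, a in zip(schooling, ability)]

# Short regression: log wage on schooling only.
short = cov(log_wage, schooling) / cov(schooling, schooling)

# Long regression: log wage on schooling and ability (two-regressor closed form).
v_s, v_a = cov(schooling, schooling), cov(ability, ability)
c_sa = cov(schooling, ability)
det = v_s * v_a - c_sa ** 2
rho_long = (v_a * cov(log_wage, schooling) - c_sa * cov(log_wage, ability)) / det
gamma_long = (v_s * cov(log_wage, ability) - c_sa * cov(log_wage, schooling)) / det

# Regression of the omitted variable (ability) on the included one (schooling).
delta_as = c_sa / v_s

print(short, rho_long + gamma_long * delta_as)
```

Because ability is positively related to both wages and schooling in this design, the short coefficient overstates the long one, as in the schooling example.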
10 Here is the multivariate generalization of OVB: Let $\beta_1^s$ denote the coefficient vector on a $k_1 \times 1$ vector of variables, $X_{1i}$, in a (short) regression that has no other variables and let $\beta_1^l$ denote the coefficient vector on these variables in a (long) regression that includes a $k_2 \times 1$ vector of control variables, $X_{2i}$, with coefficient vector $\beta_2^l$. Then $\beta_1^s = \beta_1^l + E[X_{1i}X_{1i}']^{-1}E[X_{1i}X_{2i}']\beta_2^l$.
11 As highly educated people, we like to assume that ability and schooling are positively correlated. This is not a foregone conclusion, however: Mick Jagger dropped out of the London School of Economics and Bill Gates dropped out of Harvard, perhaps because the opportunity cost of schooling for these high-ability guys was high (of course, they may also be a couple of very lucky college dropouts).
Table 3.2.1 illustrates these points using data from the NLSY. The first three entries in the table show that the schooling coefficient decreases from .132 to .114 when family background variables, in this case parents' education, as well as a few basic demographic characteristics (age, race, census region of residence), are included as controls. Further control for individual ability, as proxied by the Armed Forces Qualification Test (AFQT) test score, reduces the schooling coefficient to .087 (AFQT is used by the military to select soldiers). The omitted variables bias formula tells us that these reductions are a result of the fact that the additional controls are positively correlated with both wages and schooling.$^{12}$
            (1)       (2)        (3)            (4)            (5)
Controls:   None      Age        Col. (2) and   Col. (3) and   Col. (4), with
                      dummies    additional     AFQT score     occupation
                                 controls*                     dummies
            0.132     0.131      0.114          0.087          0.066
            (0.007)   (0.007)    (0.007)        (0.009)        (0.010)
Notes: Data are from the National Longitudinal Survey of Youth (1979 cohort, 2002 survey). The table reports the coefficient on years of schooling in a regression of log wages on years of schooling and the indicated controls. Standard errors are shown in parentheses. The sample is restricted to men and weighted by NLSY sampling weights. The sample size is 2434.
* Additional controls are mother's and father's years of schooling and dummy variables for race and Census region.
Although simple, the OVB formula is one of the most important things to know about regression. The importance of the OVB formula stems from the fact that if you claim an absence of omitted variables bias, then typically you're also saying that the regression you've got is the one you want. And the regression you
12 A large empirical literature investigates the consequences of omitting ability variables from schooling equations. Key early references include Griliches and Mason (1972), Taubman (1976), Griliches (1977), and Chamberlain (1978).
a job. When some of the mandatory training groups contained more workers than training slots, training opportunities were distributed by lottery. Hence, training requirements were randomly assigned conditional on the covariates used to assign workers to groups. A regression on a dummy for training plus the personal characteristics, past unemployment variables, and job history variables used to classify workers seems very likely to provide reliable estimates of the causal effect of training.$^{13}$
13 This program appears to raise earnings, primarily because workers in the training group went back to work more quickly.
14 Lotteries have been used to distribute private school tuition subsidies; see, e.g., Angrist, et al. (2002).
the selection bias discussed in Chapter 2 and Section 3.2. To illustrate, suppose we are interested in the effects of a college degree on earnings and that people can work in one of two occupations, white collar and blue collar. A college degree clearly opens the door to higher-paying white collar jobs. Should occupation therefore be seen as an omitted variable in a regression of wages on schooling? After all, occupation is highly correlated with both education and pay. Perhaps it's best to look at the effect of college on wages for those within an occupation, say white collar only. The problem with this argument is that once we acknowledge the fact that college affects occupation, comparisons of wages by college degree status within an occupation are no longer apples-to-apples,
15 The same problem arises in "conditional-on-positive" comparisons, discussed in detail in Section 3.4.2, below.
By the joint independence of $\{y_{1i}, w_{1i}, y_{0i}, w_{0i}\}$ and $c_i$, we have
$$E[y_{1i}|w_{1i} = 1, c_i = 1] - E[y_{0i}|w_{0i} = 1, c_i = 0] = E[y_{1i}|w_{1i} = 1] - E[y_{0i}|w_{0i} = 1].$$
This expression illustrates the apples-to-oranges nature of the bad-control problem:
$$E[y_{1i}|w_{1i} = 1] - E[y_{0i}|w_{0i} = 1] = \underbrace{E[y_{1i} - y_{0i}|w_{1i} = 1]}_{\text{causal effect on college grads}} + \underbrace{\{E[y_{0i}|w_{1i} = 1] - E[y_{0i}|w_{0i} = 1]\}}_{\text{selection bias}}.$$
In other words, the difference in wages between those with and without a college degree conditional on working in a white collar job equals the causal effect of college on those with $w_{1i} = 1$ (people who work at a white collar job when they have a college degree) and a selection-bias term which reflects the fact that college changes the composition of the pool of white collar workers.
16 In this example, selection bias is probably negative, that is $E[y_{0i}|w_{1i} = 1] < E[y_{0i}|w_{0i} = 1]$. It seems reasonable to think that any college graduate can get a white collar job, so $E[y_{0i}|w_{1i} = 1]$ is not too far from $E[y_{0i}]$. But someone who gets a white collar job without benefit of a college degree (i.e., $w_{0i} = 1$) is probably special, i.e., has a better than average $y_{0i}$.
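The bad-control problem can be illustrated with a simulation in which college is randomly assigned, so the unconditional comparison is causal, yet the comparison within white collar jobs is not. Everything in the sketch below is an assumption made for the example: the ability thresholds for white collar work, the wage equation, and the constant causal effect of 1.0.

```python
import random

random.seed(6)
workers = []
for _ in range(20000):
    ability = random.gauss(0, 1)
    college = random.randint(0, 1)      # randomly assigned, independent of ability
    w0 = 1 if ability > 1.0 else 0      # white collar without a degree: only the very able
    w1 = 1 if ability > -1.0 else 0     # white collar with a degree: almost everyone
    y0 = 10.0 + 2.0 * ability           # potential wage without college
    y1 = y0 + 1.0                       # potential wage with college
    w = w1 if college else w0           # observed occupation
    y = y1 if college else y0           # observed wage
    workers.append((college, w, y))

def mean(values):
    values = list(values)
    return sum(values) / len(values)

unconditional = (mean(y for c, w, y in workers if c == 1)
                 - mean(y for c, w, y in workers if c == 0))
within_white_collar = (mean(y for c, w, y in workers if c == 1 and w == 1)
                       - mean(y for c, w, y in workers if c == 0 and w == 1))
print(unconditional, within_white_collar)
```

The unconditional difference is close to the true effect, while the within-occupation difference is pulled down (here, below zero) because white collar workers without a degree are a highly selected group. This matches the negative selection bias suggested in the footnote.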

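$$\ldots\,\pi_2\big) + \left(\rho - \gamma\frac{\pi_1}{\pi_2}\right) s_i + \frac{\gamma}{\pi_2} a_{li} + \varepsilon_i. \qquad (3.2.15)$$
In this scenario, $\gamma$, $\pi_1$, and $\pi_2$ are all positive, so $\rho - \gamma\frac{\pi_1}{\pi_2}$ is too small unless $\pi_1$ turns out to be zero. In other words, use of a proxy control that is increased by the variable of interest generates a coefficient below the desired effect. Importantly, $\pi_1$ can be investigated to some extent: if the regression of $a_{li}$ on $s_i$ is zero,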
3.3.HETEROGENEITYANDNONLINEARITY
51
be outcomes in the causal nexus. In many cases, however, the timing is uncertain or unknown. In such cases, clear reasoning about causal channels requires explicit assumptions about what happened first, or the assertion that none of the control variables are themselves caused by the regressor of interest.$^{17}$

3.3 Heterogeneity and Nonlinearity

As we saw in the previous section, a linear causal model in combination with the CIA leads to a linear CEF with a causal interpretation. Assuming the CEF is linear, the population regression is it. In practice, however, the assumption of a linear CEF is not really necessary for a causal interpretation of regression. For one thing, as discussed in Section 3.1.2, we can think of the regression of $y_i$ on $X_i$ and $s_i$ as providing the best linear approximation to the underlying CEF, regardless of its shape. Therefore, if the CEF is causal, the fact that regression approximates it gives regression coefficients a causal flavor. This claim is a little
17 Griliches and Mason (1972) is a seminal exploration of the use of early and late ability controls in schooling equations. See also Chamberlain (1977, 1978) for closely related studies. Rosenbaum (1984) offers an alternative discussion of the proxy control idea using very different notation, outside of a regression framework.
a particular weighted average of contrasts or comparisons across cells defined by covariates. This is easiest
$E[y_{0i}|d_i = 1]$. Simply comparing the
of the right-hand-side of (3.3.1). In the discrete case, the matching estimand can be written
$$E[y_{1i} - y_{0i}|d_i = 1] = \sum_x \delta_x P(X_i = x|d_i = 1), \qquad (3.3.2)$$
where $P(X_i = x|d_i = 1)$ is the probability mass function for $X_i$ given $d_i = 1$.$^{18}$
In this case, $X_i$ takes on
18 This matching estimator is discussed by Rubin (1977) and used by Card and Sullivan (1988) to estimate the effect of subsidized training on employment.
19 With continuous covariates, exact matching is impossible and some sort of approximation is required, a fact that leads to bias. See Abadie and Imbens (2006), who derive the implications of approximate matching for the limiting distribution of matching estimators.
Table 3.3.1: Uncontrolled, matching, and regression estimates of the effects of voluntary military service on earnings

Race         Average    Differences   Matching    Regression   Regression minus
             earnings   in means      estimates   estimates    matching
Whites       14537      1233.4        -197.2      -88.8        108.4
                        (60.3)        (70.5)      (62.5)       (28.5)
Non-whites   11664      2449.1        839.7       1074.4       234.7
                        (47.4)        (62.7)      (50.7)       (32.5)
Notes: Adapted from Angrist (1998, Tables II and V). Standard errors are reported in parentheses. The table shows estimates of the effect of voluntary military service on the 1988-1991 Social Security-taxable earnings of men who applied to enter the armed forces between 1979 and 1982. The matching and regression estimates control for applicants' year of birth, education at the time of application, and AFQT score. There are 128,968 whites and 175,262 non-whites in the sample.

Table 3.3.1 also shows regression estimates of the effect of voluntary military service, controlling for
To see this, start by using the regression anatomy formula to write the coefficient on $d_i$ in the regression of $y_i$ on $X_i$ and $d_i$ as
$$\delta_R = \frac{Cov(y_i, \tilde{d}_i)}{V(\tilde{d}_i)} \qquad (3.3.5)$$
$$= \frac{E[(d_i - E[d_i|X_i]) y_i]}{E[(d_i - E[d_i|X_i])^2]} = \frac{E\{(d_i - E[d_i|X_i]) E[y_i|d_i, X_i]\}}{E[(d_i - E[d_i|X_i])^2]}. \qquad (3.3.6)$$
$$\delta_R = \frac{E[(d_i - E[d_i|X_i])^2 \delta_X]}{E[(d_i - E[d_i|X_i])^2]} = \frac{E\{E[(d_i - E[d_i|X_i])^2 | X_i]\,\delta_X\}}{E\{E[(d_i - E[d_i|X_i])^2 | X_i]\}} = \frac{E[\sigma_D^2(X_i)\,\delta_X]}{E[\sigma_D^2(X_i)]}, \qquad (3.3.7)$$
where
$$\sigma_D^2(X_i) = E[(d_i - E[d_i|X_i])^2 | X_i]$$
is the conditional variance of $d_i$ given $X_i$. This establishes that the regression model, (3.3.4), produces a treatment-variance weighted average of $\delta_X$.
Because the regressor of interest, $d_i$, is a dummy variable, one last step can be taken. In this case, $\sigma_D^2(X_i) = P(d_i = 1|X_i)(1 - P(d_i = 1|X_i))$, so
$$\delta_R = \frac{\sum_x \delta_x [P(d_i = 1|X_i = x)(1 - P(d_i = 1|X_i = x))] P(X_i = x)}{\sum_x [P(d_i = 1|X_i = x)(1 - P(d_i = 1|X_i = x))] P(X_i = x)}.$$
This shows that the regression estimand weights each covariate-specific treatment effect by $[P(d_i = 1|X_i = x)(1 - P(d_i = 1|X_i = x))] P(X_i = x)$. In contrast, the matching estimand for the effect of treatment on the treated can be written
$$E[y_{1i} - y_{0i}|d_i = 1] = \sum_x \delta_x P(X_i = x|d_i = 1) = \frac{\sum_x \delta_x P(d_i = 1|X_i = x) P(X_i = x)}{\sum_x P(d_i = 1|X_i = x) P(X_i = x)},$$
because
$$P(X_i = x|d_i = 1) = \frac{P(d_i = 1|X_i = x) \cdot P(X_i = x)}{P(d_i = 1)}.$$
So the weights used to construct $E[y_{1i} - y_{0i}|d_i = 1]$ are proportional to the probability of treatment at each value of the covariates.
The point of this derivation is that the treatment-on-the-treated estimand puts the most weight on covariate cells containing those who are most likely to be treated. In contrast, regression puts the most weight on covariate cells where the conditional variance of treatment status is largest. As a rule, this variance is maximized when $P(d_i = 1|X_i = x) = \frac{1}{2}$, in other words, for cells where there are equal numbers of treated and control observations. Of course, the difference in weighting schemes is of little importance if $\delta_x$ does not vary across cells (though weighting still affects the statistical efficiency of estimators). In this example, however, men who were most likely to serve in the military appear to benefit least from their service. This is probably because those most likely to serve were most qualified, but therefore also had the highest civilian earnings potential and so benefited least from military service. This fact leads matching estimates of the effect of military service to be smaller than regression estimates based on the same vector of control variables.$^{20}$
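The two weighting schemes can be compared directly by evaluating the population formulas for a small discrete example. The three covariate cells below, with their cell shares, treatment probabilities, and cell-specific effects, are made-up numbers chosen so that the cells most likely to be treated have the smallest effects, as in the military service example.

```python
cells = [
    # (P(X = x), P(d = 1 | X = x), covariate-specific effect delta_x) -- all hypothetical
    (0.3, 0.2, 3.0),
    (0.4, 0.5, 2.0),
    (0.3, 0.9, 1.0),
]

def weighted_average(weight_fn):
    """Average delta_x over cells using weights proportional to weight_fn(P(x), p)."""
    total_weight = sum(weight_fn(px, p) for px, p, _ in cells)
    return sum(weight_fn(px, p) * delta for px, p, delta in cells) / total_weight

# Regression: weights proportional to the conditional variance of treatment times P(x).
delta_regression = weighted_average(lambda px, p: p * (1 - p) * px)
# Effect on the treated: weights proportional to the treatment probability times P(x).
delta_treated = weighted_average(lambda px, p: p * px)
# Unconditional average effect: weights proportional to P(x) alone.
delta_average = weighted_average(lambda px, p: px)
print(delta_regression, delta_treated, delta_average)
```

With these numbers the treatment-on-the-treated average is smaller than the regression-weighted average, because the former leans on the high-probability, low-effect cell while the latter favors the cell where treatment probability is one half.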
20 It's no surprise that regression gives the most weight to cells where $P(d_i = 1|X_i = x) = 1/2$ since regression is efficient for a homoskedastic constant-effects linear model. We should expect an efficient estimator to give the most weight to cells where the common treatment effect is estimated most precisely. With homoskedastic residuals, the most precise treatment effects
Importantly, neither the regression nor the covariate-matching estimands give any weight to covariate cells that do not contain both treated and control observations. Consider a value of $X_i$, say $x^*$, where either no one is treated or everyone is treated. Then, $\delta_{x^*}$ is undefined, while the regression weights, $[P(d_i = 1|X_i = x^*)(1 - P(d_i = 1|X_i = x^*))]$, are zero. In the language of the econometric literature on matching, both the regression and matching estimands impose common support, that is, they are limited to covariate values where both treated and control observations are found.$^{21}$
The step from estimand to estimator is a little more complicated. In practice, both regression and matching estimators are implemented using modelling assumptions that implicitly involve a certain amount of extrapolation across cells. For example, matching estimators often combine covariates cells with few observations. This violates common support if the cells being combined do not each have both treated and non-treated observations. Regression models that are not saturated in $X_i$ may also violate common support, since covariate cells without both treated and control observations can end up contributing to the estimates
come from cells where the probability of treatment equals $1/2$.
21 The support of a random variable is the set of realizations that occur with positive probability. See Heckman, Ichimura, Smith, and Todd (1998) and Smith and Todd (2001) for a discussion of common support in matching.
22 Matching problems involving finely distributed $X$-variables are often solved by aggregating values to make coarser groupings or by pairing observations that have similar, though not necessarily identical values. See Cochran (1965), Rubin (1973), or Rosenbaum (1995, Chapter 3) for discussions of this approach. With continuously-distributed covariates, matching estimators are biased because matches are imperfect. Abadie and Imbens (2008) have recently shown that a regression-based bias correction can eliminate the (asymptotic) bias from imperfect matches.
For the purposes of this discussion, the treatment intensity, $s_i$, is assumed to be a continuously distributed random variable, not necessarily non-negative. Suppose that the CEF of interest can be written $h(t) \equiv E[y_i|s_i = t]$ with derivative $h'(t)$. Then
$$\frac{E[y_i(s_i - E[s_i])]}{E[s_i(s_i - E[s_i])]} = \frac{\int h'(t)\,\mu_t\,dt}{\int \mu_t\,dt} \qquad (3.3.8)$$
where
$$\mu_t \equiv \{E[s_i|s_i \geq t] - E[s_i|s_i < t]\}\{P(s_i \geq t)[1 - P(s_i \geq t)]\}, \qquad (3.3.9)$$
and the integrals in (3.3.8) run over the possible values of $s_i$. This formula weights each possible value of $s_i$ in proportion to the difference in the conditional mean of $s_i$ above and below that value. More weight is also given to points close to the median of $s_i$ since $P(s_i \geq t) \cdot [1 - P(s_i \geq t)]$ is maximized at $P(s_i \geq t) = 1/2$.
With covariates, $X_i$, the weights in (3.3.8) become $X$-specific. A covariate-averaged version of the same formula applies to the multivariate regression coefficient of $y_i$ on $s_i$, after partialling out $X_i$. In particular,
$$\frac{E[y_i(s_i - E[s_i|X_i])]}{E[s_i(s_i - E[s_i|X_i])]} = \frac{E\left[\int h_X'(t)\,\mu_{tX}\,dt\right]}{E\left[\int \mu_{tX}\,dt\right]}, \qquad (3.3.10)$$
where $h_X'(t) \equiv \frac{\partial E[y_i|X_i, s_i = t]}{\partial t}$ and $\mu_{tX} \equiv \{E[s_i|X_i, s_i \geq t] - E[s_i|X_i, s_i < t]\}\{P(s_i \geq t|X_i)[1 - P(s_i \geq t|X_i)]\}$.
It bears emphasizing that equation (3.3.10) reflects two types of averaging: an integral that averages along the length of a nonlinear CEF at fixed covariate values, and an expectation that averages across covariate cells. An important point in this context is that population regression coefficients contain no information about the effect of $s_i$ on the CEF for values of $X_i$ where $P(s_i \geq t|X_i)$ equals 0 or 1. This includes values of $X_i$ where $s_i$ is fixed. In the same spirit, it's worth noting that if $s_i$ is a dummy variable, we can extract equation (3.3.7) from the more general formula, (3.3.10).
Angrist and Krueger (1999) construct the average weighting function for a schooling regression with state of birth and year of birth covariates. Although equations (3.3.8) and (3.3.10) may seem arcane or at least non-obvious, in this example the average weights, $E[\mu_{tX}]$,
... $\sigma_s$, where $\sigma_s$ is the standard deviation of $s_i$, so that $z_i$ is standard Normal. Then
$$E[s_i|s_i \geq t] = E(s_i) + \sigma_s E\left[z_i \,\Big|\, z_i \geq \frac{t - E(s_i)}{\sigma_s}\right] = E(s_i) + \sigma_s E[z_i|z_i \geq t^*].$$
From truncated Normal formulas (see, e.g., Johnson and Kotz, 1970), we know that
$$E[z_i|z_i > t^*] = \frac{\phi(t^*)}{[1 - \Phi(t^*)]} \quad \text{and} \quad E[z_i|z_i < t^*] = \frac{-\phi(t^*)}{\Phi(t^*)},$$
where $\phi(\cdot)$ and $\Phi(\cdot)$ are the standard Normal density and distribution function. Substituting in the formula for $\mu_t$, (3.3.9), we have
$$\mu_t = \sigma_s \left\{\frac{\phi(t^*)}{[1 - \Phi(t^*)]} - \frac{-\phi(t^*)}{\Phi(t^*)}\right\}[1 - \Phi(t^*)]\Phi(t^*) = \sigma_s \phi(t^*).$$
We have therefore shown that
$$\frac{Cov(y_i, s_i)}{V(s_i)} = E[h'(s_i)].$$
In other words, the regression of $y_i$ on $s_i$ is the (unweighted!) population average derivative, $E[h'(s_i)]$, when $s_i$ is Normally distributed. Of course, this result is a special case of a special case.$^{23}$ Still, it seems reasonable to imagine that Normality might not matter very much. And in our empirical experience, the average derivatives (also called "marginal effects") constructed from parametric nonlinear models for limited dependent variables (e.g., Probit or Tobit) are usually indistinguishable from the corresponding regression coefficients, regardless of the distribution of regressors. We expand on this point in Section 3.4.2, below.
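The Normal-regressor result can be checked numerically for a specific nonlinear CEF. The sketch below assumes $h(s) = s^3 + s^2$ and $s_i \sim N(1, 2^2)$ (both arbitrary choices for illustration) and evaluates the relevant expectations with a simple trapezoid rule; the helper `normal_expectation` is a hypothetical name, not a library function.

```python
import math

def normal_expectation(g, mu, sigma, n_grid=20001, width=10.0):
    """Approximate E[g(s)] for s ~ N(mu, sigma^2) with the trapezoid rule."""
    lo, hi = mu - width * sigma, mu + width * sigma
    step = (hi - lo) / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        s = lo + k * step
        density = math.exp(-0.5 * ((s - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
        weight = 0.5 if k in (0, n_grid - 1) else 1.0  # trapezoid end-point weights
        total += weight * g(s) * density
    return total * step

mu, sigma = 1.0, 2.0

def h(s):
    return s ** 3 + s ** 2          # assumed nonlinear CEF

def h_prime(s):
    return 3 * s ** 2 + 2 * s       # its derivative

mean_s = normal_expectation(lambda s: s, mu, sigma)
var_s = normal_expectation(lambda s: (s - mean_s) ** 2, mu, sigma)
cov_hs = normal_expectation(lambda s: h(s) * (s - mean_s), mu, sigma)

regression_slope = cov_hs / var_s                       # Cov(y, s) / V(s) with y replaced by its CEF
average_derivative = normal_expectation(h_prime, mu, sigma)
print(regression_slope, average_derivative)
```

Replacing the Normal density with a skewed one would generally break the equality, in which case the weighted-average formula (3.3.8) still applies but the weights are no longer proportional to the density.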
3.3.2 Control for Covariates Using the Propensity Score
The most important result in regression theory is the omitted variables bias formula: coefficients on included variables are unaffected by the omission of variables when the variables omitted are uncorrelated with the variables included. The propensity score theorem, due to Rosenbaum and Rubin (1983), extends this idea to estimation strategies that rely on matching instead of regression, where the causal variable of interest is a treatment dummy.$^{24}$
The propensity score theorem states that if potential outcomes are independent of treatment status conditional on a multivariate covariate vector, $X_i$, then potential outcomes are independent of treatment status conditional on a scalar function of covariates, the propensity score, defined as $p(X_i) \equiv E[d_i|X_i]$. Formally, we have
Theorem 3.3.1 The Propensity-Score Theorem.
Suppose the CIA holds for $y_{ji}$; $j = 0, 1$. Then $y_{ji} \perp\!\!\!\perp d_i \mid p(X_i)$.
23 More specialized results in this spirit appear in Ruud (1986), who considers distribution-free estimation of limited-dependent-variable models with Normally distributed regressors.
24 Propensity-score methods can be adapted to multi-valued treatments, though this has yet to catch on. See Imbens (2000) for an effort in this direction.
Proof. The claim is true if $P[d_i = 1|y_{ji}, p(X_i)]$ does not depend on $y_{ji}$.
$$P[d_i = 1|y_{ji}, p(X_i)] = E[d_i|y_{ji}, p(X_i)]$$
$$= E\{E[d_i|y_{ji}, p(X_i), X_i] \,|\, y_{ji}, p(X_i)\}$$
$$= E\{E[d_i|y_{ji}, X_i] \,|\, y_{ji}, p(X_i)\}$$
$$= E\{E[d_i|X_i] \,|\, y_{ji}, p(X_i)\}, \text{ by the CIA.}$$
But $E\{E[d_i|X_i] \,|\, y_{ji}, p(X_i)\} = E\{p(X_i) \,|\, y_{ji}, p(X_i)\}$, which is clearly just $p(X_i)$.
Like the OVB formula for regression, the propensity score theorem says you need only control for covariates that affect the probability of treatment. But it also says something more: the only covariate you really need to control for is the probability of treatment itself. In practice, the propensity score theorem is usually used for estimation in two steps: first, $p(X_i)$
... $p(X_i)$ and substituting conditional sample averages for expectations or by matching each treated observation to controls with the same or similar values of the propensity score (both of these approaches were used by Dehejia and Wahba, 1999). Alternately, a model-based or non-parametric estimate of $E[y_i|p(X_i), d_i]$ can be substituted for these conditional mean functions and the outer expectation replaced with a sum (as in Heckman, Ichimura, and Todd, 1998).
The somewhat niftier weighting approach to propensity-score estimation skips the cumbersome matching step by exploiting the fact that the CIA implies $E\left[\frac{y_i d_i}{p(X_i)}\right] = E[y_{1i}]$ and $E\left[\frac{y_i(1 - d_i)}{(1 - p(X_i))}\right] = E[y_{0i}]$. Therefore, given a scheme for estimating $p(X_i)$, we can construct estimates of the average treatment effect from the sample analog of
$$E[y_{1i} - y_{0i}] = E\left[\frac{y_i d_i}{p(X_i)} - \frac{y_i(1 - d_i)}{1 - p(X_i)}\right] = E\left[\frac{(d_i - p(X_i)) y_i}{p(X_i)(1 - p(X_i))}\right]. \qquad (3.3.11)$$
This last expression is an estimand of the form suggested by Newey (1990) and Robins, Mark, and Newey (1992). We can similarly calculate the effect of treatment on the treated from the sample analog of:
$$E[y_{1i} - y_{0i}|d_i = 1] = E\left[\frac{(d_i - p(X_i)) y_i}{(1 - p(X_i)) P(d_i = 1)}\right]. \qquad (3.3.12)$$
The idea that you can correct for non-random sampling by weighting by the reciprocal of the probability of selection dates back to Horvitz and Thompson (1952). Of course, to make this approach feasible, and for the resulting estimates to be consistent, we need a consistent estimator for $p(X_i)$.
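The sample analogs of (3.3.11) and (3.3.12) can be sketched with a single discrete covariate, where the propensity score is estimated by the share treated in each cell. The data-generating process below (three cells, treatment probabilities `true_p`, and cell-specific effects `effect`) is entirely hypothetical; with continuous covariates one would need some other consistent estimator of $p(X_i)$.

```python
import random

random.seed(4)
true_p = {0: 0.2, 1: 0.5, 2: 0.8}     # hypothetical P(d = 1 | X = x)
effect = {0: 3.0, 1: 2.0, 2: 1.0}     # hypothetical cell-specific treatment effects
sample = []
for _ in range(6000):
    x = random.choice([0, 1, 2])
    d = 1 if random.random() < true_p[x] else 0
    y0 = 5.0 + 2.0 * x + random.gauss(0, 1)
    y = y0 + effect[x] * d
    sample.append((x, d, y))

n = len(sample)
# Estimate p(X) by the share treated in each covariate cell (a saturated model).
p_hat = {}
for cell in (0, 1, 2):
    treated_flags = [d for x, d, _ in sample if x == cell]
    p_hat[cell] = sum(treated_flags) / len(treated_flags)
share_treated = sum(d for _, d, _ in sample) / n  # sample analog of P(d = 1)

# Sample analog of (3.3.11): average treatment effect.
ate_ipw = sum((d - p_hat[x]) * y / (p_hat[x] * (1 - p_hat[x]))
              for x, d, y in sample) / n
# Sample analog of (3.3.12): effect of treatment on the treated.
att_ipw = sum((d - p_hat[x]) * y / ((1 - p_hat[x]) * share_treated)
              for x, d, y in sample) / n
print(ate_ipw, att_ipw)
```

In this design the population average effect is 2.0 and the effect on the treated is 1.6, so the two weighted estimates should land near those values. Cells with estimated propensity of exactly 0 or 1 would make the weights undefined, which is the common support issue discussed earlier.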
The Horvitz-Thompson version of the propensity-score approach is appealing since the estimator is essentially automated, with no cumbersome matching required. The Horvitz-Thompson approach also highlights
$$\delta_R = \frac{E[(d_i - p(X_i)) y_i]}{E[p(X_i)(1 - p(X_i))]}. \qquad (3.3.13)$$
The two Horvitz-Thompson matching estimands and the regression estimand are all members of the class of weighted average estimands considered by Hirano, Imbens, and Ridder (2003):
$$E\left\{g(X_i)\left[\frac{y_i d_i}{p(X_i)} - \frac{y_i(1 - d_i)}{(1 - p(X_i))}\right]\right\}, \qquad (3.3.14)$$
where $g(X_i)$ is a known weighting function (To go from estimand to estimator, replace $p(X_i)$ with a consistent
... $P(d_i)$ ... $E[p(X_i)(1 - p(X_i))]$.
This similarity highlights once again the fact that regression and matching, including propensity score matching, are not really different animals, at least not until we specify a model for the propensity score.
A big question here is how best to model and estimate $p(X_i)$, or how much smoothing or stratification to use when estimating $E[y_i|p(X_i), d_i]$, especially if the covariates are continuous. The regression analog of this
propensity score. First, from the point of view of asymptotic efficiency, there is usually a cost to matching on the propensity score instead of full covariate matching. We can get lower asymptotic standard errors by
of treatment effects under the CIA, with and without knowledge of the propensity score. For example, in Angrist (1998), there is an efficiency gain from matching on year of birth, even if the probability of serving in the military is unrelated to birth year, because earnings are related to birth year. A regression analog for this point is the result that even in a scenario with no omitted variables bias, the long regression generates more precise estimates of the coefficients on the variables included in a short regression whenever these variables have some predictive power for outcomes because these covariates lead to a smaller residual variance (see Section 3.1.3).
Hahn's (1998) results raise the question of why we should ever bother with estimators that use the propensity score. A philosophical argument is that the propensity score rightly focuses researcher attention on models for treatment assignment, something about which we may have reasonably good information, instead
are both key for the Hirano, Imbens, and Ridder conclusions.
Do the Hirano, Imbens, and Ridder (2003) results resolve the propensity-score paradox? For the moment, we prefer the finite-sample resolution given by Angrist and Hahn (2004). Their results highlight the fact that it is the researchers' willingness to impose some restrictions on the score which gives propensity-score-based inference its conceptual and statistical power. In Angrist (1998), for example, an application with high-
where propensity score matching provides more reliable estimates of average causal effects. The first reason we don't find ourselves on the propensity-score bandwagon is practical: there are many details to be filled in when implementing propensity-score matching - such as how to model the score and how to do inference -
of propensity-score weighting, so the difference is mostly in the implementation. In practice you may be far from saturation, but with the right covariates this shouldn't matter.
25 A similar but more extended propensity-score face-off appears in the exchange between Smith and Todd (2005) and Dehejia (2005).
estimation with CPS-1, a large and coarsely-selected initial sample. The estimates in screened CPS-1 are as good as unscreened CPS-3. We note, however, that the standard errors for estimates using propensity-score-screened samples have not been adjusted to reflect sampling variance in our estimates of the score. An advantage of pre-screening using prior information, as in the step from CPS-1 to CPS-3, is that no such adjustment is necessary.

3.4 Regression Details

3.4.1 Weighting Regression

Few things are as confusing to applied researchers as the role of sample weights. Even now, 20 years post-Ph.D., we read the section of the Stata manual on weighting with some dismay. Weights can be used in a
Table 3.3.2: Covariate means in the NSW and observational control samples

                            NSW             Full Samples     P-score Screened Samples
Variable              Treated  Control     CPS-1    CPS-3       CPS-1    CPS-3
                        (1)      (2)        (3)      (4)         (5)      (6)
Age                    25.82    25.05      33.23    28.03       25.63    25.97
Years of schooling     10.35    10.09      12.03    10.24       10.49    10.42
Black                   0.84     0.83       0.07     0.20        0.96     0.52
Hispanic                0.06     0.11       0.07     0.14        0.03     0.20
Dropout                 0.71     0.83       0.30     0.60        0.60     0.63
Married                 0.19     0.15       0.71     0.51        0.26     0.29
1974 earnings          2,096    2,107     14,017    5,619       2,821    2,969
1975 earnings          1,532    1,267     13,651    2,466       1,950    1,859
Number of Obs.           185      260     15,992      429         352      157

Notes: Adapted from Dehejia and Wahba (1999), Table 1. The samples in the first four columns are as described in Dehejia and Wahba (1999). The samples in the last two columns are limited
Table 3.3.3: Regression estimates of NSW training effects using alternate controls

                                                   Full Samples        P-Score Screened Samples
Specification                              NSW     CPS-1    CPS-3         CPS-1       CPS-3
                                           (1)      (2)      (3)           (4)         (5)
Raw Difference                            1,794   -8,498     -635
                                          (633)    (712)    (657)
Demographic controls                      1,670   -3,437      771        -3,361         890
                                          (639)    (710)    (837)         (811)       (884)
                                                                       [139/497]   [154/154]
1975 Earnings                             1,750      -78      -91        no obs.        166
                                          (632)    (537)    (641)                     (644)
                                                                          [0/0]    [183/427]
Demographics, 1975 Earnings               1,636      623    1,010         1,201       1,050
                                          (638)    (558)    (822)         (722)       (861)
                                                                       [149/357]   [157/162]
Demographics, 1974 and 1975 Earnings      1,676      794    1,369         1,362         649
                                          (639)    (548)    (809)         (708)       (853)
                                                                       [151/352]   [147/157]

Notes: The table reports regression estimates of training effects using the Dehejia-Wahba (1999) data with alternative sets of controls. The demographic controls are age, years of schooling, and dummies for Black, Hispanic, high school dropout, and married.
26
Altonji and Segal (1996) discuss this point in a generalized method-of-moments context.
Section 3.4.2 in this chapter and in the chapter on instrumental variables, below. This study is concerned with the causal effects of childbearing on parents' work and earnings. Because childbearing is likely to be correlated with potential earnings, the study reports instrumental variables estimates based on sibling-sex composition and multiple births, as well as OLS estimates. Almost every outcome in this study is either binary (like employment status) or non-negative (like hours worked, weeks worked, and earnings).
                       Face-to-     Outpatient                 Prob. Any    Prob. Any     Total
                       face         Expenses      Admis-       Medical      Inpatient     Expenses
Plan                   visits       (1984$)       sions        (%)          (%)           (1984$)
Free                    4.55          340          .128         86.8         10.3           749
                       (.168)       (10.9)        (.0070)      (.817)       (.45)          (39)
Individual Deductible   3.02          235          .115         72.3          9.6           608
                       (.171)       (11.9)        (.0076)      (1.54)       (.55)          (46)
Deductible minus free  -1.53         -105         -0.013       -14.5         -0.7          -141
                       (.240)       (16.1)        (.0103)      (1.74)       (.71)          (60)

Notes: Adapted from Manning (1987), Table 2. All standard errors (shown in parentheses) are corrected for intertemporal and intrafamily correlations. Amounts are in June 1984 dollars. Visits are face-to-face contacts with MD, DO, or other health providers; excludes visits only for radiology, anesthesiology or pathology services. Visits and expenses exclude dental care and outpatient psychotherapy.
27
The HIE was considerably more complicated than described here. There were 14 different treatments, including assignment to a prepaid HMO-like service. The experimental design did not use simple random assignment, but rather a more complicated assignment scheme meant to ensure covariate balance across groups.
Since the outcome here is a Bernoulli trial, we have

$$E[y_{1i} - y_{0i}] = E[y_{1i}] - E[y_{0i}] = P[y_{1i} = 1] - P[y_{0i} = 1]. \qquad (3.4.2)$$

This relation might affect the language we use to describe the results but not the underlying calculation. In the HIE, for example, comparisons across experimental groups, as on the left hand side of (3.4.1), show that 87 percent of those assigned to the free-care group used at least some care in a given year, while only 72 percent of those assigned to the deductible plan used care. The relatively modest $150 deductible therefore

]
;
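where $\Phi[\cdot]$ is the Normal CDF. Therefore

$$E[y_i \mid d_i] = \Phi\left[\frac{\beta_0^*}{\sigma_\nu}\right] + \left\{ \Phi\left[\frac{\beta_0^* + \beta_1^*}{\sigma_\nu}\right] - \Phi\left[\frac{\beta_0^*}{\sigma_\nu}\right] \right\} d_i.$$

This is a linear function of the regressor, $d_i$, so the slope coefficient in the regression of $y_i$ on $d_i$ is exactly the difference in Probit fitted values, $\Phi\left[\frac{\beta_0^* + \beta_1^*}{\sigma_\nu}\right] - \Phi\left[\frac{\beta_0^*}{\sigma_\nu}\right]$. Note, however, that the Probit coefficients, $\frac{\beta_0^*}{\sigma_\nu}$ and $\frac{\beta_1^*}{\sigma_\nu}$, do not give us the size of the effect of $d_i$ on participation until we feed them back into the Normal CDF (though they do have the right sign).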
One of the most important outcomes in the HIE is gross medical expenditure, in other words, health care costs. Did subjects who faced a deductible use less care, as measured by the cost? In the HIE, the average difference in expenditures between the deductible and free-care groups was -141 dollars (s.e. = 60), about
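$$E[y_i \mid d_i = 1] - E[y_i \mid d_i = 0] = \underbrace{\{P[y_i > 0 \mid d_i = 1] - P[y_i > 0 \mid d_i = 0]\}}_{\text{participation effect}} \, E[y_i \mid y_i > 0, d_i = 1] + \underbrace{\{E[y_i \mid y_i > 0, d_i = 1] - E[y_i \mid y_i > 0, d_i = 0]\}}_{\text{COP effect}} \, P[y_i > 0 \mid d_i = 0].$$
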
So the overall difference in average expenditure can be broken up into two parts: the difference in the probability that expenditures are positive (often called a participation effect), and the difference in means conditional on participation, a conditional-on-positive (COP) effect. Again, however, this has no special implications for the estimation of causal effects; equation (3.4.1) remains true: the regression of $y_i$ on $d_i$ gives the population average treatment effect for expenditures.
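
The two-part split is an accounting identity, which a small numerical check makes concrete. The sketch below uses made-up expenditure data (the participation rates 0.87 and 0.72 echo the HIE figures quoted earlier, but the lognormal spending distribution and the helper name `two_part_decomposition` are assumptions for illustration only).

```python
import random


def two_part_decomposition(y_treated, y_control):
    """Split the difference in mean outcomes into participation and COP parts."""
    def summarize(y):
        positive = [v for v in y if v > 0]
        p_pos = len(positive) / len(y)            # P[y > 0 | d]
        mean_pos = sum(positive) / len(positive)  # E[y | y > 0, d]
        return p_pos, mean_pos

    p1, m1 = summarize(y_treated)
    p0, m0 = summarize(y_control)
    participation = (p1 - p0) * m1   # {P[y>0|d=1] - P[y>0|d=0]} * E[y|y>0,d=1]
    cop = (m1 - m0) * p0             # {E[y|y>0,d=1] - E[y|y>0,d=0]} * P[y>0|d=0]
    total = sum(y_treated) / len(y_treated) - sum(y_control) / len(y_control)
    return total, participation, cop


rng = random.Random(0)
# Hypothetical non-negative expenditures: zero for non-users, lognormal for users.
y_control = [rng.lognormvariate(6, 1) if rng.random() < 0.87 else 0.0 for _ in range(5000)]
y_treated = [rng.lognormvariate(6, 1) if rng.random() < 0.72 else 0.0 for _ in range(5000)]

total, participation, cop = two_part_decomposition(y_treated, y_control)
print(f"total={total:.1f} participation={participation:.1f} COP={cop:.1f}")
```

The two pieces always add up to the raw difference in means, whatever the data; the decomposition by itself says nothing about which piece has a causal interpretation.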
Good COP, Bad COP: Conditional-on-positive effects

Because the effect on a non-negative random variable like expenditure has two parts, some applied researchers feel they should look at these parts separately. In fact, many use a "two-part model," where the first part is an evaluation of effect on participation and the second part looks at the COP effects (see, e.g., Duan,
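$$E[y_{1i} \mid y_{1i} > 0] - E[y_{0i} \mid y_{0i} > 0] = \underbrace{E[y_{1i} - y_{0i} \mid y_{1i} > 0]}_{\text{causal effect}} + \underbrace{\{E[y_{0i} \mid y_{1i} > 0] - E[y_{0i} \mid y_{0i} > 0]\}}_{\text{selection bias}}.$$
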
This decomposition shows that the COP effect is composed of two terms: a causal effect for the subpopulation that uses medical care when it is free and the difference in $y_{0i}$
Here selection bias arises because the experiment changes the composition of the group with positive expenditures. The $y_{0i} > 0$ population probably includes some low-cost users who would opt out of care if they had to pay a deductible. In other words, it is larger and probably has lower costs on average than the $y_{1i} > 0$ group. The selection bias term is therefore positive, with the result that COP effects are closer to zero than the negative causal effect, $E[y_{1i} - y_{0i} \mid y_{1i} > 0]$. This is a version of the bad control problem from Section 3.2.3: in a causal-effects setting, $y_i > 0$ is an outcome variable and therefore unkosher for conditioning unless the treatment has no effect on the likelihood that $y_i$ is positive.
One resolution of the non-causality of COP effects relies on censored regression models like Tobit. These models postulate a latent expenditure outcome for nonparticipants (e.g., Hay and Olsen, 1984). A traditional Tobit formulation for the expenditure problem stipulates that the observed $y_i$ is generated by

$$y_i = 1[y_i^* > 0] \, y_i^*$$

where $y_i^*$ is a Normally distributed latent expenditure variable that can take on negative values. Because $y_i^*$ is not an LDV, Tobit proponents feel comfortable linking this to $d_i$ with a traditional linear model, say, equation (3.4.3). In this case, $\beta_1^*$ is the causal effect of $d_i$ on latent expenditure, $y_i^*$. This equation is defined

$$E[y_i \mid d_i] = \Phi\left[\frac{\beta_0^* + \beta_1^* d_i}{\sigma_\nu}\right] [\beta_0^* + \beta_1^* d_i] + \sigma_\nu \, \phi\left[\frac{\beta_0^* + \beta_1^* d_i}{\sigma_\nu}\right] \qquad (3.4.6)$$

where $\sigma_\nu$ is the standard deviation of $\nu_i$ (see, e.g. McDonald and Moffitt, 1980). This expression involves the assumed Normality and homoskedasticity of $\nu_i$ and the assumption that $y_i$ can be represented as $1[y_i^* > 0] \, y_i^*$, as well as the latent coefficients.
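28
A generalization of Tobit is the sample selection model, where the latent variable determining participation is not the same as the latent expenditure variable. See, e.g., Maddala (1983). The same conceptual problems related to the interpretation of effects on latent variables arise in the sample selection model as with Tobit.

$$\left\{ \Phi\left[\frac{\beta_0^* + \beta_1^*}{\sigma_\nu}\right] [\beta_0^* + \beta_1^*] + \sigma_\nu \, \phi\left[\frac{\beta_0^* + \beta_1^*}{\sigma_\nu}\right] \right\} - \left\{ \Phi\left[\frac{\beta_0^*}{\sigma_\nu}\right] [\beta_0^*] + \sigma_\nu \, \phi\left[\frac{\beta_0^*}{\sigma_\nu}\right] \right\}$$

a rather daunting expression. But since the only conditioning variable is a dummy variable, $d_i$, none of this is necessary for the estimation of $E[y_i \mid d_i = 1] - E[y_i \mid d_i = 0]$. The slope coefficient from an OLS regression of $y_i$ on $d_i$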
29
We should note that our favorite regression example - a regression of log wages on schooling - may have a COP problem since the sample of log wages naturally omits those with zero earnings. This leads to COP-style selection bias if education affects the probability of working. In practice, therefore, we focus on samples of prime-age males where participation rates are high and reasonably stable across schooling groups (e.g., white men aged 40-49 in Figure 3.1.1).
Covariates lead to nonlinearity

True censoring as with the CPS topcode is rare, a fact that leaves limited scope for constructive applications of Tobit-type models in applied work. At this point, however, we have to hedge a bit. Part of the neatness in the discussion of experiments comes from the fact that $E[y_i \mid d_i]$ is necessarily a linear function of $d_i$ so that regression and the CEF are one and the same. In fact, this CEF is linear for any function of $y_i$, including the distribution indicators, $1[y_i > c]$. In practice, of course, the explanatory variable of interest isn't always a dummy, and there are usually additional covariates in the CEF, in which case, $E[y_i \mid X_i, d_i]$
@
d
i
o
:
Most people use derivatives when dealing with continuous or multi-valued regressors as well.
How close do OLS regression estimates come to the marginal effects induced by a nonlinear model like


:
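The average finite difference is therefore

$$E\left\{ \Phi\left[\frac{X_i'\beta_0^* + \beta_1^*}{\sigma_\nu}\right] - \Phi\left[\frac{X_i'\beta_0^*}{\sigma_\nu}\right] \right\}. \qquad (3.4.8)$$

In practice, this can also be approximated by the average derivative,

$$E\left\{ \phi\left[\frac{X_i'\beta_0^* + \beta_1^* d_i}{\sigma_\nu}\right] \right\} \cdot \left(\frac{\beta_1^*}{\sigma_\nu}\right)$$

(Stata computes marginal effects both ways but defaults to (3.4.8) for dummy regressors).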
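
To see how close the two Probit marginal-effect calculations typically are, here is a minimal sketch that evaluates both for given (not estimated) coefficients. The simulated index values, the coefficient of -0.4, and the function name `probit_marginal_effects` are all hypothetical; in practice the scaled coefficients would come from a fitted Probit.

```python
import random
from statistics import NormalDist

std_normal = NormalDist()  # standard Normal: cdf is Phi, pdf is phi


def probit_marginal_effects(index_without_d, d, beta1_scaled):
    """Average finite difference (3.4.8) and average derivative for a Probit.

    index_without_d: values of X_i'beta_0*/sigma for each observation
    d:               the dummy regressor for each observation
    beta1_scaled:    beta_1*/sigma, the Probit coefficient on d
    """
    n = len(index_without_d)
    finite_diff = sum(
        std_normal.cdf(x + beta1_scaled) - std_normal.cdf(x) for x in index_without_d
    ) / n
    avg_density = sum(
        std_normal.pdf(x + beta1_scaled * di) for x, di in zip(index_without_d, d)
    ) / n
    return finite_diff, avg_density * beta1_scaled


rng = random.Random(1)
index = [rng.gauss(0.2, 0.5) for _ in range(10000)]       # hypothetical X_i'beta_0*/sigma
d = [1 if rng.random() < 0.4 else 0 for _ in range(10000)]
fd, ad = probit_marginal_effects(index, d, beta1_scaled=-0.4)
print(f"average finite difference={fd:.4f}  average derivative={ad:.4f}")
```

Both numbers are much smaller in magnitude than the raw coefficient, which is why the coefficient has to be fed back through the Normal CDF or density before it can be compared with an OLS slope.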
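Similarly, generalizing equation (3.4.6) to a model with covariates, we have

$$E[y_i \mid X_i, d_i] = \Phi\left[\frac{X_i'\beta_0^* + \beta_1^* d_i}{\sigma_\nu}\right] [X_i'\beta_0^* + \beta_1^* d_i] + \sigma_\nu \, \phi\left[\frac{X_i'\beta_0^* + \beta_1^* d_i}{\sigma_\nu}\right]$$

for a non-negative LDV. Tobit marginal effects are almost always cast in terms of the average derivative, which can be shown to be the surprisingly simple expression

$$E\left\{ \Phi\left[\frac{X_i'\beta_0^* + \beta_1^* d_i}{\sigma_\nu}\right] \right\} \cdot \beta_1^*. \qquad (3.4.9)$$

See, e.g., Wooldridge (2006). One immediate implication of (3.4.9) is that the Tobit coefficient, $\beta_1^*$, is always too big relative to the effect of $d_i$ on $y_i$. Intuitively, this is because - given the linear model for latent $y_i^*$ - the latent outcome always changes when $d_i$ switches on or off. But real $y_i$ need not change: for many people, it's zero either way.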
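
The derivative result can be checked numerically in the simplest case with no covariates, as in (3.4.6). The sketch below differentiates the Tobit conditional mean by central differences and compares it with the Normal CDF times the latent coefficient; the parameter values and function names are hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()


def tobit_mean(latent_mean, sigma):
    """E[y] when y = 1[y* > 0] * y* and y* ~ Normal(latent_mean, sigma^2)."""
    z = latent_mean / sigma
    return std_normal.cdf(z) * latent_mean + sigma * std_normal.pdf(z)


def tobit_derivative(latent_mean, sigma, beta1):
    """Analytic derivative of the Tobit mean with respect to d: Phi(.) * beta1."""
    return std_normal.cdf(latent_mean / sigma) * beta1


beta0, beta1, sigma = 1.0, -0.8, 2.0  # hypothetical latent coefficients
d, h = 0.5, 1e-6

# Central-difference derivative of E[y | d] with respect to d.
numeric = (
    tobit_mean(beta0 + beta1 * (d + h), sigma)
    - tobit_mean(beta0 + beta1 * (d - h), sigma)
) / (2 * h)
analytic = tobit_derivative(beta0 + beta1 * d, sigma, beta1)

# Effect of switching the dummy from 0 to 1 on the observed outcome.
finite_diff = tobit_mean(beta0 + beta1, sigma) - tobit_mean(beta0, sigma)
print(numeric, analytic, finite_diff)
```

Because the CDF factor is below one, the effect on observed $y_i$ is smaller in magnitude than the latent coefficient, in line with the "too big" remark above.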
Table 3.4.2 compares regression and nonlinear marginal effects for a regression of female employment and hours of work, both LDVs, on measures of fertility. The estimates were constructed using one of the 1980 Census samples used by Angrist and Evans (1998). This sample includes married women aged 21-35 with at least two children. The childbearing variables consist of either a dummy indicating additional childbearing beyond two, or the total number of births. The covariates include linear terms in mothers' age, age at first birth, race dummies (black and Hispanic), and mother's education (dummies for high school graduates, some college, and college graduates). The covariate model is not saturated, rather there are linear effects and no interactions, so the underlying CEF in this example is surely nonlinear.
Probit marginal effects for the effect of a dummy variable indicating more than two children are indistinguishable from OLS estimates of the same relation. This can be seen in columns 2, 3, and 4 of Table 3.4.2,





X
0
i


0


j
d
i
=1

inthesecond(hence,amarginale¤ectonthetreated).
Table 3.4.2: Comparison of alternative estimates of the effect of childbearing on LDVs

Right-hand side variable: More than two children (columns 2-6); Number of children (columns 7-10)

Column key:
 (1) Mean
 (2) OLS
 (3) Probit, avg effect, full sample
 (4) Probit, avg effect on treated
 (5) Tobit, avg effect, full sample
 (6) Tobit, avg effect on treated
 (7) OLS
 (8) Probit MFX, avg effect, full sample
 (9) Tobit MFX, avg effect, full sample
(10) Tobit MFX, avg effect on treated

Dependent variable    (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)     (9)     (10)

Panel A: Full Sample
Employment           .528   -.162   -.163   -.162     -       -     -.113   -.114     -       -
                    (.499)  (.002)  (.002)  (.002)                  (.001)  (.001)
Hours worked         16.7   -5.92     -       -     -6.56   -5.87   -4.07     -     -4.66   -4.23
                    (18.3)  (.074)                  (.081)  (.073)  (.047)          (.054)  (.049)

Panel B: Non-white College Attendees over 30, first birth before age 20
Employment           .832   -.061   -.064   -.070     -       -     -.054   -.048     -       -
                    (.374)  (.028)  (.028)  (.031)                  (.016)  (.013)
Hours worked         30.8   -4.69     -       -     -4.97   -4.90   -2.83     -     -3.20   -3.15
                    (16.0)  (1.18)                  (1.33)  (1.31)  (.645)          (.670)  (.659)

Notes: The table reports OLS estimates, average treatment effects, and marginal effects (MFX) for the effect of childbearing on mothers' labor supply. The sample in Panel A includes 254,654 observations and is the same as the married-women-1980-Census sample used by Angrist and Evans (1998). Covariates include age, age at first birth, and dummies for boys at first and second birth. The sample in Panel B includes 746 nonwhites with at least some college aged over 30 whose first birth was before age 20. Standard deviations are reported in parentheses in column 1. Standard errors are shown in parentheses in other columns. The sample used to estimate average effects on the treated includes women with more than two children.
The upshot of this discussion is that while a nonlinear model may fit the CEF for LDVs more closely than a linear model, when it comes to marginal effects this probably matters little. This optimistic conclusion is not a theorem, but as in the empirical example here, it seems to be fairly robustly true.
Why then, should we bother with nonlinear models and marginal effects? One answer is that the marginal effects are easy enough to compute now that they are automated in packages like Stata. But there are a number of decisions to make along the way (e.g., the weighting scheme, derivatives versus finite differences) while OLS is standardized. Nonlinear life also promises to get considerably more complicated when we start to think about IV and panel data. Finally, extra complexity comes into the inference step as well, since we need standard errors for marginal effects. The principle of Occam's razor advises, "Entities should not be multiplied unnecessarily." In this spirit, we quote our former teacher, Angus Deaton (1997), pondering the nonlinear regression function generated by Tobit-type models:

    Absent knowledge of F [the distribution of the errors], this regression function does not even identify the $\beta$'s [Tobit coefficients] - see Powell (1989) - but more fundamentally, we should ask how it has come about that we have to deal with such an awkward, difficult, and non-robust object.

3.4.3 Why is Regression Called Regression and What Does Regression-to-the-mean Mean?

The term regression originates with Francis Galton's (1886) study of height. Galton, who worked with samples of roughly-normally-distributed data on parents and children, noted that the CEF of a child's height
$V(x_i)$, and the intercept is $\alpha = E[y_i] - \beta_1 E[x_i]$. But because height is not changing across generations, the mean and variance of $y_i$ and $x_i$ are the same. Therefore,

$$\beta_1 = \frac{Cov(y_i, x_i)}{V(x_i)} = \frac{Cov(y_i, x_i)}{\sqrt{V(x_i)}\sqrt{V(y_i)}} = \rho_{xy}$$

$$\alpha = E[y_i] - \beta_1 E[x_i] = \mu(1 - \beta_1) = \mu(1 - \rho_{xy})$$

where $\rho_{xy}$ is the intergenerational correlation coefficient in height and $\mu = E[y_i] = E[x_i]$ is population average height. From this we get the linear CEF

$$E[y_i \mid x_i] = \mu(1 - \rho_{xy}) + \rho_{xy} x_i,$$

3.5.APPENDIX:DERIVATIONOFTHEAVERAGEDERIVATIVEFORMULA
81
so the height of a child given his parents' height is therefore a weighted average of his parents' height and the population average height. The child of tall parents will therefore not be as tall as they are, on average. Likewise, for the short. To be specific, Pischke, who is 6'3", can expect his children to be tall, though not as tall as he is. Thankfully, however, Angrist, who is 5'6", can expect his children to be taller than he is. Galton called this property, "regression toward mediocrity in hereditary stature." Today, we call this "regression to the mean."
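
Galton's point can be reproduced in a short simulation. The sketch below draws parent and child heights with a common mean and variance and a chosen correlation, then fits the regression line; the mean of 68 inches, standard deviation of 3, and correlation of 0.5 are invented numbers, not Galton's estimates.

```python
import random


def ols_line(x, y):
    """Intercept and slope from a bivariate least-squares regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    slope = cov_xy / var_x
    return mean_y - slope * mean_x, slope


rng = random.Random(2)
mu, sd, rho = 68.0, 3.0, 0.5  # hypothetical mean, std. dev., and correlation
parents, children = [], []
for _ in range(20000):
    p = rng.gauss(mu, sd)
    # The child has the same marginal mean and variance as the parent.
    c = mu + rho * (p - mu) + rng.gauss(0, sd * (1 - rho ** 2) ** 0.5)
    parents.append(p)
    children.append(c)

intercept, slope = ols_line(parents, children)
predicted_child_of_tall = intercept + slope * 75.0  # parent who is 6'3"
print(f"slope={slope:.3f} intercept={intercept:.2f} prediction={predicted_child_of_tall:.2f}")
```

With equal variances in both generations, the fitted slope lands near the correlation coefficient and the intercept near $\mu(1 - \rho_{xy})$, so the predicted height of a tall parent's child sits between the parent's height and the population mean.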
$$\frac{Cov(y_i, s_i)}{V(s_i)} = \frac{E[h(s_i)(s_i - E[s_i])]}{E[s_i(s_i - E[s_i])]}.$$

30
Yule's first applied paper on the poor laws was published in 1895 in the Economic Journal, where Pischke is proud to serve as co-editor. The theory of multiple regression that goes along with this appears in Yule (1897).
$V(s_i)$; equation (3.3.8) in the text. A similar formula for a regression with covariates, $X_i$, is derived in the appendix to Angrist and Krueger (1999).
Chapter 4

Instrumental Variables in Action: Sometimes You Get What You Need

Anything that happens, happens.

1
Recent years have seen an increased willingness by statisticians to discuss statistical models for observational data in an explicitly causal framework; see, for example, Freedman's (2005) review.
84
CHAPTER4.INSTRUMENTALVARIABLESINACTION
to do that as well.
Studying agricultural markets in the 1920s, the father and son research team of Philip and Sewall Wright were interested in a challenging problem of causal inference: how to estimate the slope of supply and demand curves when observed data on prices and quantities are determined by the intersection of these two curves. In other words, equilibrium prices and quantities (the only ones we get to observe) solve these two stochastic equations at the same time. Upon which curve, therefore, does the observed scatterplot of prices and quantities lie? The fact that population regression coefficients do not capture the slope of any
2
Key historical references here are Wald (1940) and Durbin (1954), both discussed below.
3
See Angrist and Krueger (2001) for a brief exposition of the history and uses of IV; Stock and Trebbi (2003) for a detailed account of the birth of IV; and Morgan (1990) for an extended history of econometric ideas, including the simultaneous equations model.
4.1.IVANDCAUSALITY
85
minimum of fuss.
the discussion when we consider IV models with covariates).
Given the exclusion restriction, it follows from equation (4.1.2) that

$$\rho = \frac{Cov(y_i, z_i)}{Cov(s_i, z_i)} = \frac{Cov(y_i, z_i)/V(z_i)}{Cov(s_i, z_i)/V(z_i)}. \qquad (4.1.3)$$

The second equality in (4.1.3) is useful because it's usually easier to think in terms of regression coefficients than in terms of covariances. The coefficient of interest, $\rho$, is the ratio of the population regression of $y_i$ on $z_i$ (the reduced form) to the population regression of $s_i$ on $z_i$ (the first stage). The IV estimator is the sample analog of expression (4.1.3). Note that the IV estimand is predicated on the notion that the first stage is not zero, but this is something you can check in the data. As a rule, if the first stage is only marginally significantly different from zero, the resulting IV estimates are unlikely to be informative, a point we return to later.
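
The ratio in (4.1.3) is easy to compute directly. The following sketch simulates schooling and earnings with an omitted "ability" term and a binary instrument, then compares OLS with the IV ratio; every number in the data-generating process (a true return of 0.10, a first stage of one year) is a made-up assumption chosen so the contrast is visible.

```python
import random


def cov(a, b):
    """Sample covariance (divides by n, which cancels in every ratio below)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n


rng = random.Random(3)
rho = 0.10  # hypothetical causal effect of schooling on log earnings
s, y, z = [], [], []
for _ in range(50000):
    ability = rng.gauss(0, 1)             # omitted variable
    zi = 1 if rng.random() < 0.5 else 0   # instrument, unrelated to ability
    si = 12 + 1.0 * zi + ability + rng.gauss(0, 1)
    yi = 5 + rho * si + 0.5 * ability + rng.gauss(0, 0.5)
    s.append(si)
    y.append(yi)
    z.append(zi)

ols = cov(y, s) / cov(s, s)             # contaminated by omitted ability
reduced_form = cov(y, z) / cov(z, z)    # regression of y on z
first_stage = cov(s, z) / cov(z, z)     # regression of s on z
iv = reduced_form / first_stage         # sample analog of (4.1.3)
print(f"OLS={ols:.3f}  first stage={first_stage:.3f}  IV={iv:.3f}")
```

In this setup OLS overstates the return because ability raises both schooling and earnings, while the ratio of reduced form to first stage comes out close to the value used to generate the data.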
It's worth recapping the assumptions needed for the ratio of covariances in (4.1.3) to equal the causal effect, $\rho$. First, the instrument must have a clear effect on $s_i$. This is the first stage. Second, the only
2
. Furthermore, because compulsory schooling laws typically require students to remain in school only until their 16th birthday, these groups of students will be in different grades or through a given grade to different degree, when they reach the legal dropout age. In essence, the combination of school start age policies and compulsory schooling laws
creates a natural experiment in which children are compelled to attend school for different lengths of time depending on their birthdays.
Angrist and Krueger looked at the relationship between educational attainment and quarter of birth using US census data. Panel A of Figure 4.1.1 (adapted from Angrist and Krueger, 2001) displays the education-quarter-of-birth pattern for men in the 1980 Census who were born in the 1930s. The figure clearly shows that men born earlier in the calendar year tend to have lower average schooling levels. Panel A of Figure 4.1.1 is a graphical representation of the first-stage. The first-stage in a general IV framework is the regression of the causal variable of interest on covariates and the instrument(s). The plot summarizes
4
Other explanations are possible, the most likely being some sort of family background effect associated with season of birth (see, e.g., Bound, Jaeger, and Baker, 1995). Weighing against the possibility of omitted family background effects is the fact that the quarter of birth pattern in average schooling is much more pronounced at the schooling levels most affected by compulsory attendance laws. Another possible concern is a pure age-at-entry effect which operates through channels other than highest grade completed (e.g., achievement). The causal effect of age-at-entry on learning is difficult, if not impossible, to separate from pure age effects, as noted in Chapter 1. A recent study by Elder and Lubotsky (2008) argues that the evolution of putative age-at-entry effects over time is more consistent with effects due to age differences per se than to a within-school learning advantage for older students.
X
i

11
. To see this, note that the denominators of the reduced-form and first-stage effects are the same. Hence, their ratio is

$$\rho = \frac{\pi_{21}}{\pi_{11}} = \frac{Cov(y_i, \tilde{z}_i)}{Cov(s_i, \tilde{z}_i)}, \qquad (4.1.5)$$

where $\tilde{z}_i$ is the residual from a regression of $z_i$ on the exogenous covariates, $X_i$. The right-hand side of (4.1.5) therefore swaps $\tilde{z}_i$ for $z_i$
Foradirectproofthat(4.1.5)equals

in(4.1.6),use(4.1.6)tosubstitutefor
y
i
in
Cov
(
y
i
;
~
z
i
)
Cov
(
s
i
;
~
z
i
)
.
where $\pi_{20} \equiv \alpha + \rho\pi_{10}$, $\pi_{21} \equiv \rho\pi_{11}$, and $\xi_{2i} \equiv \rho\xi_{1i} + \eta_i$ in equation (4.1.4b). Equation (4.1.7) again shows why $\rho = \frac{\pi_{21}}{\pi_{11}}$. Note also that a slight re-arrangement of (4.1.7) gives

$$y_i = \alpha' X_i + \rho[X_i'\pi_{10} + \pi_{11} z_i] + \xi_{2i}, \qquad (4.1.8)$$

where $[X_i'\pi_{10} + \pi_{11} z_i]$ is the population fitted value from the first-stage regression of $s_i$ on $X_i$ and $z_i$. Because $z_i$ and $X_i$ are uncorrelated with the reduced-form error, $\xi_{2i}$, the coefficient on $[X_i'\pi_{10} + \pi_{11} z_i]$ in the population regression of $y_i$ on $X_i$ and $[X_i'\pi_{10} + \pi_{11} z_i]$ equals $\rho$.
In practice, of course, we almost always work with data from samples. Given a random sample, the first-stage fitted values in the population are consistently estimated by

$$\hat{s}_i = X_i'\hat{\pi}_{10} + \hat{\pi}_{11} z_i,$$

where $\hat{\pi}_{10}$ and $\hat{\pi}_{11}$ are OLS estimates from equation (4.1.4a). The coefficient on $\hat{s}_i$ in the regression of $y_i$ on $X_i$ and $\hat{s}_i$ is called the Two-Stage Least Squares (2SLS) estimator of $\rho$. In other words, 2SLS estimates can be constructed by OLS estimation of the "second-stage equation,"

$$y_i = \alpha' X_i + \rho\hat{s}_i + [\eta_i + \rho(s_i - \hat{s}_i)], \qquad (4.1.9)$$

This is called 2SLS because it can be done in two steps, the first estimating $\hat{s}_i$ using equation (4.1.4a), and the second estimating equation (4.1.9). The resulting estimator is consistent for $\rho$ because (a) first-stage estimates are consistent; and, (b) the covariates, $X_i$, and instruments, $z_i$, are uncorrelated with both $\eta_i$ and $(s_i - \hat{s}_i)$.
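
A minimal sketch of the two-step construction follows, for the special case in which the only covariate is a constant. The simulated data and the helper `ols` are hypothetical. The example only illustrates the point estimate: standard errors read off a manually run second step are generally not the correct 2SLS standard errors.

```python
import random


def mean(v):
    return sum(v) / len(v)


def ols(x, y):
    """Intercept and slope from a bivariate regression of y on x."""
    mean_x, mean_y = mean(x), mean(y)
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sum(
        (a - mean_x) ** 2 for a in x
    )
    return mean_y - slope * mean_x, slope


rng = random.Random(4)
z = [rng.randint(0, 1) for _ in range(2000)]   # single binary instrument
u = [rng.gauss(0, 1) for _ in z]               # unobserved confounder
s = [10 + 0.8 * zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
y = [1 + 0.1 * si + 0.5 * ui + rng.gauss(0, 1) for si, ui in zip(s, u)]

# Step 1: first stage, regress s on z and form fitted values.
pi10_hat, pi11_hat = ols(z, s)
s_hat = [pi10_hat + pi11_hat * zi for zi in z]

# Step 2: second stage, regress y on the fitted values.
_, rho_2sls = ols(s_hat, y)

# Indirect least squares: reduced-form coefficient over first-stage coefficient.
_, pi21_hat = ols(z, y)
rho_ils = pi21_hat / pi11_hat
print(f"two-step 2SLS={rho_2sls:.6f}  ILS={rho_ils:.6f}")
```

With one endogenous variable and one instrument the two numbers agree up to floating-point error, which is the just-identified equivalence between 2SLS and the ratio of reduced form to first stage.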
The 2SLS name notwithstanding, we don't usually construct 2SLS estimates in two-steps. For one thing,
$Cov(s_i, \hat{s}_i^*)$, where $\hat{s}_i^*$ is the residual from a regression of $\hat{s}_i$ on $X_i$. This follows from the multivariate regression anatomy formula and the fact that $Cov(s_i, \hat{s}_i^*) = V(\hat{s}_i^*)$. It is also easy to show that, in a model with a single endogenous variable and a single instrument, the 2SLS estimator is the same as the corresponding ILS estimator.
6
6
Note that $\hat{s}_i^* = \tilde{z}_i \hat{\pi}_{11}$, where $\tilde{z}_i$ is the residual from a regression of $z_i$ on $X_i$, so that the 2SLS estimator is therefore the sample analog of $\left[\frac{Cov(y_i, \tilde{z}_i)}{V(\tilde{z}_i)}\right] / \hat{\pi}_{11}$. But the sample analog of the numerator, $\frac{Cov(y_i, \tilde{z}_i)}{V(\tilde{z}_i)}$, is the OLS estimate of $\pi_{21}$ in the reduced form, (4.1.4b), while $\hat{\pi}_{11}$ is the OLS estimate of the first-stage effect, $\pi_{11}$, in (4.1.4a). Hence, 2SLS with a single instrument is ILS, i.e., the ratio of the reduced form-effect of the instrument to the corresponding first-stage effect where both the first-stage and reduced-form include covariates.
                                     OLS                                  2SLS
                               (1)       (2)       (3)      (4)      (5)      (6)      (7)      (8)
Years of education            0.075     0.072     0.103    0.112    0.106    0.108    0.089    0.061
                            (0.0004)  (0.0004)   (0.024)  (0.021)  (0.026)  (0.019)  (0.016)  (0.031)
Covariates:
Age (in quarters)                                                                                 X
Age (in quarters) squared                                                                         X
9 year of birth dummies                   X                           X        X        X         X
50 state of birth dummies                 X                           X        X        X         X

Instruments:
(3) dummy for QOB=1
(4) dummy for QOB=1 or QOB=2
(5) dummy for QOB=1
(6) full set of QOB dummies
(7) full set of QOB dummies int. with year of birth dummies
(8) full set of QOB dummies int. with year of birth dummies
The first pair of IV estimates, reported in columns 3 and 4, are from models without controls. The instrument used to construct the estimates in column 3 is a single dummy for first quarter births, while the instruments used to construct the estimates in column 4 are a pair of dummies indicating first and second quarter births. The resulting estimates range from .10 to .11. The results from models including year of birth and state of birth dummies as control variables are similar, not surprisingly, since quarter of birth is not closely related to either of these controls. Overall, the 2SLS estimates are mostly a bit larger than the corresponding OLS estimates. This suggests that the observed association between schooling and earnings is not driven by omitted variables like ability and family background.
Column 7 in Table 4.1.1 shows the results of adding interaction terms to the instrument list. In particular, each specification adds interaction with 9 dummies for year of birth (the sample includes cohorts born 1930-39), for a total of 30 excluded instruments. The first stage equation becomes
$$s_i = X_i'\pi_{10} + \pi_{11} z_{1i} + \pi_{12} z_{2i} + \pi_{13} z_{3i} + \sum_j (b_{ij} z_{1i})\kappa_{1j} + \sum_j (b_{ij} z_{2i})\kappa_{2j} + \sum_j (b_{ij} z_{3i})\kappa_{3j} + \xi_{1i} \qquad (4.1.10b)$$

where $b_{ij}$ is a dummy equal to one if individual $i$ was born in year $j$ for $j$ equal to 1931-39. The coefficients $\kappa_{1j}$, $\kappa_{2j}$, $\kappa_{3j}$ are the corresponding year-of-birth interactions. These interaction terms capture differences in
7
This gain may not be without cost, as the use of many additional instruments opens up the possibility of increased bias, an issue discussed in Chapter 8, below.
is manifest by perfect multicollinearity. 2SLS estimates with quadratic age exist. But the variability "left over" in the first-stage fitted values is reduced when the covariates include variables like age in quarters, that are closely related to the instruments (quarter of birth dummies). Because this variability is the primary determinant of 2SLS standard errors, the estimate in column 8 is markedly less precise than that in column 7, though it is still close to the corresponding OLS estimate.
Recap of IV and 2SLS Lingo

As we've seen, the endogenous variables are the dependent variable and the independent variable(s) to be

$$\rho = \frac{E[y_i \mid z_i = 1] - E[y_i \mid z_i = 0]}{E[s_i \mid z_i = 1] - E[s_i \mid z_i = 0]}. \qquad (4.1.12)$$

A direct route to this result uses (4.1.11) and the fact that $E[\eta_i \mid z_i] = 0$, so we have

$$E[y_i \mid z_i] = \alpha + \rho E[s_i \mid z_i]. \qquad (4.1.13)$$

Solving this equation for $\rho$ produces (4.1.12).
Equation (4.1.12) is the population analog of the landmark Wald (1940) estimator for a bivariate regression with mismeasured regressors.8 The Wald estimator is the sample analog of this expression. In our context, the Wald formula provides an appealingly transparent implementation of the IV strategy for the elimination of omitted variables bias. The principal claim that motivates IV estimation of causal effects is that the only reason for any relation between the dependent variable and the instrument is the effect of the instrument on the causal variable of interest. In the context of a binary instrument, it therefore seems natural to divide (or rescale) the reduced-form difference in means by the corresponding first-stage difference in means.
8
As noted in the introduction to this chapter, measurement error in regressors tends to shrink regression coefficients towards zero. To eliminate this bias, Wald (1940) suggested that the data be divided in a manner independent of the measurement error, and the coefficient of interest estimated as a ratio of differences in means as in (4.1.12). Durbin (1954) showed that Wald's method of fitting straight lines is an IV estimator where the instrument is a dummy marking Wald's division of the data. Hausman (2001) provides an overview of econometric strategies for dealing with measurement error.
Table 4.1.2: Wald estimates of the returns to schooling using quarter of birth instruments

                          (1)                  (2)                  (3)
                     Born in the 1st      Born in the 3rd       Difference
                     or 2nd quarter       or 4th quarter        (std. error)
                     of year              of year               (1)-(2)
ln(weekly wage)          5.8916               5.9051             -0.01349
                                                                 (0.00337)
Years of education      12.6881              12.8394             -0.1514
                                                                 (0.0162)
Wald estimate of

Notes: Adapted from a re-analysis of Angrist and Krueger (1991) by Angrist and Imbens (1995). The sample includes native-born men with positive earnings from the 1930-39 birth cohorts in the 1980 Census 5 percent file. The sample size is 329,509.
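
As a worked illustration of the Wald calculation, the sketch below simply divides the reduced-form difference in means by the first-stage difference in means, using the two differences printed in column (3) of Table 4.1.2. The function name `wald_estimate` is a hypothetical helper, and the example computes only the point estimate, not its standard error.

```python
def wald_estimate(reduced_form_diff, first_stage_diff):
    """Ratio of the difference in mean outcomes to the difference in mean treatment."""
    if first_stage_diff == 0:
        raise ValueError("first stage is zero: the Wald ratio is undefined")
    return reduced_form_diff / first_stage_diff


# Differences in means between men born in quarters 1-2 and quarters 3-4,
# taken from column (3) of Table 4.1.2.
log_wage_diff = -0.01349     # ln(weekly wage)
schooling_diff = -0.1514     # years of education

wald = wald_estimate(log_wage_diff, schooling_diff)
print(round(wald, 4))
```

The ratio is about 0.089, that is, roughly a 9 percent earnings difference per year of schooling implied by these two differences in means.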
Table 4.1.3: Wald estimates of the effects of military service on the earnings of white men born in 1950

Earnings year      Mean      Eligibility      Mean      Eligibility
                             Effect                     Effect
                   (1)       (2)              (3)       (4)            (5)
1981              16,461     -435.8           0.267     0.159         -2,741
                             (210.5)                    (0.040)       (1,324)
1971               3,338     -325.9                                   -2050
                             (46.6)                                   (293)
1969               2,299     -2.0
                             (34.5)

Notes: Adapted from Angrist (1990), Tables 2 and 3. Standard errors are shown in parentheses. Earnings data are from Social Security administrative records. Fig-
Table 4.1.4: Wald estimates of labor supply effects

                                                 IV Estimates using:
                                           Twins                    Sex-composition
Dependent        Mean      OLS      First stage   Wald        First stage   Wald
variable                                          estimates                 estimates
                 (1)       (2)      (3)           (4)         (5)           (6)
Employment       0.528    -0.167    0.625         -0.083      0.067         -0.135
                          (0.002)   (0.011)       (0.017)     (0.002)       (0.029)
Weeks worked     19.0     -8.05     "             -3.83       "             -6.23
                          (0.09)    "             (0.758)     "             (1.29)
Hours/week       16.7     -6.02     "             -3.39       "             -5.54
                          (0.08)    "             (0.637)     "             (1.08)

Note: The table reports OLS and Wald estimates of the effects of a third birth on labor supply using twins and sex-composition instruments. Data are from the Angrist and Evans (1998) extract including married women aged 21-35 with at least two children in the 1980 Census. OLS models include controls for mother's age, age at first birth, dummies for the sex of first and second births, and dummies for race.
4.1.3 Grouped Data and 2SLS

The Wald estimator is the mother of all instrumental variables estimators because more complicated 2SLS
9
An exception is the classical measurement error model, where both the variable to be instrumented and the instrument are assumed to be continuous. Here, we have in mind IV scenarios involving omitted variables bias.
The possibility of constructing multiple Wald estimators for the same causal effect naturally raises the question of what to do with all of them. We would like to come up with a single estimate that somehow combines the information in the individual Wald estimates efficiently. As it turns out, the most efficient linear combination of a full set of linearly independent Wald estimates is produced by fitting a line through the group means used to construct these estimates.
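
Fitting a line through group means can be sketched as follows. The example groups simulated data by the value of a binary instrument and fits a weighted least-squares line through the group means, weighting each group by its size (an assumption that corresponds to equal error variance in every group). The data-generating numbers and helper names are hypothetical.

```python
import random
from collections import defaultdict


def group_means(y, d, z):
    """Map each instrument value to (n_j, mean of y, mean of d) for that group."""
    cells = defaultdict(lambda: [0, 0.0, 0.0])
    for yi, di, zi in zip(y, d, z):
        cell = cells[zi]
        cell[0] += 1
        cell[1] += yi
        cell[2] += di
    return {key: (n, sum_y / n, sum_d / n) for key, (n, sum_y, sum_d) in cells.items()}


def weighted_slope(groups):
    """Slope of the line through (dbar_j, ybar_j), weighting each group by n_j."""
    total = sum(n for n, _, _ in groups)
    y_mean = sum(n * ybar for n, ybar, _ in groups) / total
    d_mean = sum(n * dbar for n, _, dbar in groups) / total
    numerator = sum(n * (dbar - d_mean) * (ybar - y_mean) for n, ybar, dbar in groups)
    denominator = sum(n * (dbar - d_mean) ** 2 for n, _, dbar in groups)
    return numerator / denominator


rng = random.Random(5)
z = [rng.randint(0, 1) for _ in range(4000)]
d = [1 if rng.random() < 0.3 + 0.4 * zi else 0 for zi in z]
y = [2.0 + 1.5 * di + rng.gauss(0, 1) for di in d]

cells = group_means(y, d, z)
grouped_estimate = weighted_slope(list(cells.values()))

# With only two groups, the fitted line passes through both points, so its
# slope is exactly the Wald ratio of differences in means.
wald = (cells[1][1] - cells[0][1]) / (cells[1][2] - cells[0][2])
print(f"grouped WLS={grouped_estimate:.6f}  Wald={wald:.6f}")
```

With an instrument that takes more than two values, the same `weighted_slope` call combines the several pairwise Wald comparisons into a single estimate.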
The grouped data estimator can be motivated directly as follows. As in (4.1.11), we work with a bivariate constant-effects model, which in this case can be written

$$y_i = \alpha + \rho d_i + \eta_i, \qquad (4.1.14)$$

where $\rho = y_{1i} - y_{0i}$ is the causal effect of interest and $y_{0i} = \alpha + \eta_i$. Because $r_i$ was randomly assigned and
$n_j$, where $n_j$ is the group size.
The GLS (or weighted least squares) estimator of $\rho$ in equation (4.1.16) is especially important in this context for two reasons. First, the GLS slope estimate constructed from $J$ grouped observations is an asymptotically efficient linear combination of any full set of $J - 1$ linearly independent Wald estimators
(Angrist, 1991). This can be seen without any mathematics: GLS and any linear combination of pairwise Wald estimators are both linear combinations of the grouped dependent variable. Moreover, GLS is the
10
Continuous instruments recoded as dummies can be seen as providing a parsimonious non-parametric model for the underlying first-stage relation, $E[d_i \mid z_i]$. In homoskedastic models with constant coefficients, the asymptotically efficient instrument is $E[d_i \mid z_i]$ (Newey, 1990).
4.2.ASYMPTOTIC2SLSINFERENCE
103
11
See, e.g., the preface to Borjas (2005).
stage, equation (4.1.9). The 2SLS estimator can then be written

$$\hat{\Gamma}_{2SLS} \equiv \left[\sum_i V_i V_i'\right]^{-1} \sum_i V_i y_i,$$

where $\Gamma \equiv [\alpha' \ \rho]'$ is the corresponding coefficient vector. Note that

$$\hat{\Gamma}_{2SLS} = \Gamma + \left[\sum_i V_i V_i'\right]^{-1} \sum_i V_i [\eta_i + \rho(s_i - \hat{s}_i)] = \Gamma + \left[\sum_i V_i V_i'\right]^{-1} \sum_i V_i \eta_i \qquad (4.2.1)$$

where the second equality comes from the fact that the first-stage residuals, $(s_i - \hat{s}_i)$, are orthogonal to $V_i$ in the sample. The limiting distribution of the 2SLS coefficient vector is therefore the limiting distribution of $\left[\sum_i V_i V_i'\right]^{-1} \sum_i V_i \eta_i$. This quantity is a little harder to work with than the corresponding OLS quantity, because the regressors in this case involve estimated fitted values, $\hat{s}_i$. A Slutsky-type argument shows,
you avoid other common 2SLS mistakes.

4.2.2 Over-identification and the 2SLS Minimand

Constant-effects models with more instruments than endogenous regressors are said to be over-identified. Because there are more instruments than needed to identify the parameters of interest, these models impose a set of restrictions that can be evaluated as part of a process of specification testing. This process amounts
$$\frac{1}{N} \sum Z_i \eta_i(\Gamma) \equiv m_N(\Gamma). \qquad (4.2.3)$$

2SLS can be understood as a generalized method of moments (GMM) estimator that chooses a value for $\Gamma$ by making the sample analog of (4.2.2) as close to zero as possible.
By the central limit theorem, the sample moment vector $\sqrt{N} m_N(\Gamma)$ has an asymptotic covariance matrix equal to $E[Z_i Z_i' \eta_i(\Gamma)^2]$, a matrix we'll call $\Lambda$. Although somewhat intimidating at first blush, this is just a matrix of 4th moments, as in the sandwich formula used to construct robust standard errors, (3.1.7). As shown by Hansen (1982), the optimal GMM estimator based on (4.2.2) minimizes a quadratic form in the sample moment vector, $m_N(\hat{g})$, where $\hat{g}$ is a candidate estimator of $\Gamma$.13 The optimal weighting matrix in
12
With a single endogenous variable and more than one instrument, $\Gamma$ is $[k+1] \times 1$, while $Z_i$ is $[k+q] \times 1$ for $q > 1$. Hence the resulting linear system cannot be solved unless there is a linear dependency that makes some of the instruments redundant.
13
"Quadratic form" is matrix language for a weighted sum of squares. Suppose $v$ is an $N \times 1$ vector and $M$ is an $N \times N$
the middle of the GMM quadratic form is $\Lambda^{-1}$. In practice, of course, $\Lambda$ is unknown and must be estimated. A feasible version of the GMM procedure uses a consistent estimator of $\Lambda$ in the weighting matrix. Since the estimators using known and estimated $\Lambda$ have the same limiting distribution, we'll ignore this distinction for now. The quadratic form to be minimized can therefore be written,

$$J_N(\hat{g}) \equiv N m_N(\hat{g})' \Lambda^{-1} m_N(\hat{g}), \qquad (4.2.4)$$

where the $N$-term out front comes from $\sqrt{N}$ normalization of the sample moments. As shown immediately below, when the residuals are conditionally homoskedastic, the minimizer of $J_N(\hat{g})$ is the 2SLS estimator. Without homoskedasticity, the GMM estimator that minimizes (4.2.4) is White's (1982) Two-Stage IV (a generalization of 2SLS) so that it makes sense to call $J_N(\hat{g})$ the "2SLS minimand".
for $E[Z_i Z_i']$, we have
$$\hat J_N(\hat g) = (1/\sigma_\eta^2) \times (y - W\hat g)' P_Z (y - W\hat g),$$
where $P_Z = Z(Z'Z)^{-1}Z'$. From here, we get the solution
$$\hat g = \hat\Gamma_{2SLS} = [W'P_Z W]^{-1} W'P_Z y.$$
Since the projection operator, $P_Z$, produces fitted values, and $P_Z$ is an idempotent matrix, this can be seen to be the OLS estimator of the second-stage equation, (4.1.9), written in matrix notation. More generally, even without homoskedasticity we can obtain a feasible efficient 2SLS-type estimator by minimizing (4.2.4) and using a consistent estimator of $E[Z_i Z_i' \eta_i(\hat g)^2]$ to form $\hat J_N(\hat g)$. Typically, we'd use the empirical fourth moments, $\sum Z_i Z_i' \hat\eta_i^2$, where $\hat\eta_i$
matrix. A quadratic form in $v$ is $v'Mv$. If $M$ is an $N\times N$ diagonal matrix with diagonal elements $m_i$, then $v'Mv = \sum_i m_i v_i^2$.
14 Much more detailed explanations can be found in Newey (1985), Newey and West (1987), and the original Hansen (1982) GMM paper.
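The closed-form solution $\hat\Gamma_{2SLS} = [W'P_Z W]^{-1}W'P_Z y$ and the homoskedastic minimand can be sketched numerically. The following is a minimal illustration on simulated data, not a production estimator: the function names `tsls` and `overid_stat`, the data-generating process, and the true coefficient of 2 are all assumptions made for the example.

```python
import numpy as np

def tsls(y, W, Z):
    """2SLS: [W'P_Z W]^{-1} W'P_Z y, with P_Z = Z (Z'Z)^{-1} Z'."""
    # Fitted values P_Z W, computed without forming the N x N projection matrix.
    W_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ W)
    return np.linalg.solve(W_hat.T @ W, W_hat.T @ y)

def overid_stat(y, W, Z):
    """Minimized homoskedastic minimand: e'P_Z e / sigma^2, with sigma^2 = e'e/N."""
    e = y - W @ tsls(y, W, Z)
    Pe = Z @ np.linalg.solve(Z.T @ Z, Z.T @ e)
    return (e @ Pe) / (e @ e / len(y))

rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=(n, 3))            # three instruments, one endogenous regressor
u = rng.normal(size=n)                 # unobserved confounder (assumed)
s = z @ np.array([0.5, 0.3, 0.2]) + u + rng.normal(size=n)
y = 1.0 + 2.0 * s + u + rng.normal(size=n)

Z = np.column_stack([np.ones(n), z])
W = np.column_stack([np.ones(n), s])
gamma_hat = tsls(y, W, Z)

# Second-stage OLS on first-stage fitted values gives the same coefficients.
W_fit = Z @ np.linalg.lstsq(Z, W, rcond=None)[0]
gamma_ols = np.linalg.lstsq(W_fit, y, rcond=None)[0]
J = overid_stat(y, W, Z)
```

With three instruments and one endogenous regressor, `J` is the over-identification statistic for this simulated model under the homoskedasticity assumption.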
4.2. ASYMPTOTIC 2SLS INFERENCE
The over-identification test statistic is given by the minimized 2SLS minimand. Intuitively, this statistic
$$\cdots \left(\frac{\cdots}{\sigma_j^2}\right)\left(\bar y_j - \hat g'\bar W_j\right)^2; \tag{4.2.7}$$
where $\sigma_j^2$ is the variance of $\eta_i$ in group $j$. Estimation using (4.2.7) is feasible because we can estimate $\sigma_j^2$ in a first-step, say, using inefficient-but-still-consistent 2SLS that ignores heteroskedasticity. Efficient two-step IV estimators are constructed in Angrist (1990, 1991).
The GLS structure of the 2SLS minimand allows us to see the over-identification test statistic for dummy instruments as a simple measure of the goodness of fit of the line connecting $\bar y_j$ and $\bar W_j$. In other words, this is the chi-square goodness of fit statistic for the line in a VIV plot like figure 4.1.2. The chi-square degrees
15 If, for example, the instrument takes on three values, one of which is assigned to the constant, and the model includes a constant and a single endogenous variable only, the test statistic has 1 degree of freedom.
identification test statistic discussed in many widely-used econometric references on simultaneous equations models. For example, this statistic features in Hausman's (1983) chapter on simultaneous equations in
the
^

0
^

i
,
where $\hat\eta = y - W\hat\Gamma_{2SLS}$ is the vector of 2SLS residuals.
Second, it's worth emphasizing that the essence of over-identification can be said to be "more than one
16 The Wald estimator and Wald test are named after the same statistician, Abraham Wald, but the latter reference is Wald (1943).
17 The fact that Wald and LM testing procedures for the same null are equivalent in linear models was established by Newey and West (1987). Angrist (1991) gives a formal statement of the argument in this paragraph.
4.3. TWO-SAMPLE IV AND SPLIT-SAMPLE IV ★
hand, in cases where the underlying IV estimates are quite precise, the fact that the over-ID statistic rejects
$$\frac{Z'y}{N} = \frac{Z'W}{N}\Gamma + \frac{Z'\eta}{N} \tag{4.3.1}$$
GLS estimates of $\Gamma$ in (4.3.1) are consistent because $E\left[\frac{Z'y}{N}\right] = E\left[\frac{Z'W}{N}\right]\Gamma$.
The 2SLS minimand can be thought of as GLS applied to equation (4.3.1), after multiplying by $\sqrt{N}$ to
N
and
$\frac{Z'W}{N}$. The dependent variable, $\frac{Z'y}{N}$, is a vector of dimension $[k+q]\times 1$. The regressor matrix, $\frac{Z'W}{N}$, is of dimension $[k+q]\times[k+1]$. The second-moment equation cannot be solved exactly unless $q = 1$, so it makes sense to make the fit as good as possible by minimizing a quadratic form in the residuals. The most efficient weighting matrix for this purpose is the inverse of the asymptotic covariance matrix of $\frac{Z'\eta}{\sqrt{N}}$. This again produces the 2SLS minimand, $\hat J_N(\hat g)$.
A related insight is the fact that the moment matrices on the left and right hand side of the equals sign in equation (4.3.1) need not come from the same data sets provided these data sets are drawn from the same population. This observation leads to the two-sample instrumental variables (TSIV) estimator used by Angrist (1990) and developed formally in Angrist and Krueger (1992).19
18 A quadratic form is the matrix-weighted product, $x'Ax$, where $x$ is a random vector of, say, dimension $k$ and $A$ is a $k\times k$ matrix of constants.
19 Applications of TSIV include Bjorklund and Jantti (1997), Jappelli, Pischke, and Souleles (1998), Currie and Yelowitz (2000), and Dee and Evans (2003). In a recent paper, Inoue and Solon (2005) compare the asymptotic distributions of alternative TSIV estimators, and introduce a maximum likelihood (LIML-type) version of TSIV. They also correct a mistake in the distribution theory in Angrist and Krueger (1995), discussed further, below.
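$\operatorname{plim}\left(\frac{Z_2'W_2}{N_2}\right) = \operatorname{plim}\left(\frac{Z_1'W_1}{N_1}\right)$, GLS estimates of the two-sample moment equation
$$\frac{Z_1'y_1}{N_1} = \frac{Z_2'W_2}{N_2}\Gamma + \left\{\left[\frac{Z_1'W_1}{N_1} - \frac{Z_2'W_2}{N_2}\right]\Gamma + \frac{Z_1'\eta_1}{N_1}\right\}$$
are also consistent for $\Gamma$. The limiting distribution of this estimator is obtained by normalizing by $\sqrt{N_1}$ and assuming $\operatorname{plim}\left(\frac{N_2}{N_1}\right)$ is a constant.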
The utility of TSIV comes from the fact that it widens the scope for IV estimation to situations where observations on dependent variables, instruments, and the endogenous variable of interest are hard to find
N
2
,the…rst-stagecorrelationbetween
N
1
.
Two further simplifications make TSIV especially easy to use. First, as noted previously, when the instruments consist of a full set of mutually exclusive dummy variables, as in Angrist (1990) and Angrist and Krueger (1992), the second moment equation, (4.3.1), simplifies to a model for conditional means. In particular, the 2SLS minimand for the two-sample problem becomes
$$\hat J_N(\hat g) = \sum_j \omega_j \left(\bar y_{1j} - \hat g'\bar W_{2j}\right)^2; \tag{4.3.2}$$
where $\bar y_{1j}$ is the mean of the dependent variable at instrument/covariate value $j$ in one sample, $\bar W_{2j}$ is the mean of endogenous variables and covariates at instrument/covariate value $j$ in a second sample, and $\omega_j$ is an appropriate weight. This amounts to weighted least squares estimation of the VIV equation, except that the dependent and independent variables do not come from the same sample. Again, Angrist (1990) and Angrist and Krueger (1992) provide illustrations. The optimal weights for asymptotically efficient TSIV are given by the variance of $\bar y_{1j} - \hat g'\bar W_{2j}$. This variance is affected by the fact that moments come from different samples, as are the TSIV standard errors, which are easy to compute in the dummy-instrument case since the estimator is equivalent to weighted least squares.
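The grouped-means form of the two-sample minimand (4.3.2) can be sketched as a weighted least squares fit of cell means from one sample on cell means from another. This is only an illustration on simulated data: the helper `draw`, the five instrument cells, the true effect of 2, and the use of cell sizes as weights (rather than the efficient variance-based weights) are all assumptions of the example.

```python
import numpy as np

rng = np.random.default_rng(1)

def draw(n):
    """One sample from a common population; j indexes the dummy-instrument cell."""
    j = rng.integers(0, 5, size=n)
    u = rng.normal(size=n)
    s = 0.5 * j + u + rng.normal(size=n)          # first stage varies by cell
    y = 1.0 + 2.0 * s + u + rng.normal(size=n)    # true effect assumed to be 2
    return j, s, y

j1, _, y1 = draw(40_000)   # sample 1: outcome and instrument only
j2, s2, _ = draw(40_000)   # sample 2: endogenous variable and instrument only

cells = np.arange(5)
ybar1 = np.array([y1[j1 == c].mean() for c in cells])
sbar2 = np.array([s2[j2 == c].mean() for c in cells])
w = np.array([(j1 == c).sum() for c in cells], dtype=float)   # cell-size weights

# WLS of ybar1 on (1, sbar2): minimizes sum_j w_j (ybar_1j - g'Wbar_2j)^2.
X = np.column_stack([np.ones(len(cells)), sbar2])
g_tsiv = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * ybar1))
```

No individual appears in both samples, yet `g_tsiv[1]` should land close to the assumed effect because both sets of cell means come from the same population.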
20 Angrist and Krueger called this estimator SSIV because they were concerned with a scenario where a single data set is deliberately split in two. As discussed in Section (4.6.4), the resulting estimator may have less bias than conventional 2SLS. Inoue and Solon (2005) refer to the estimator Angrist and Krueger (1995) called SSIV as Two-sample 2SLS or TS2SLS.
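21 This shortcut formula uses the standard errors from the manual SSIV second stage. The correct asymptotic covariance matrix formula, from Inoue and Solon (2005), is
$$\{B[(\sigma_{11} + \kappa\Gamma'\Sigma_{22}\Gamma)A]^{-1}B\}^{-1}$$
where $B = \operatorname{plim}\left(\frac{Z_2'W_2}{N_2}\right) = \operatorname{plim}\left(\frac{Z_1'W_1}{N_1}\right)$, $A = \operatorname{plim}\left(\frac{Z_1'Z_1}{N_1}\right) = \operatorname{plim}\left(\frac{Z_2'Z_2}{N_2}\right)$, $\operatorname{plim}\left(\frac{N_2}{N_1}\right) = \kappa$, $\sigma_{11}$ is the variance of the reduced-form residual in data set 1, and $\Sigma_{22}$ is the variance of the first-stage residual in data set 2. In principle, these pieces are easy enough to calculate. Other approaches to SSIV inference include those of Dee and Evans (2003), who calculate standard errors for just-identified models using the delta-method, and Bjorklund and Jantti (1997), who use a bootstrap.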
22 The distinction between internal and external validity is relatively new to applied econometrics but has a long history in social science. See, for example, the chapter-length discussion in Shadish, Cook, and Campbell (2002), the successor to a classic text on research methods by Campbell and Stanley (1963).
23 Hirano, Imbens, Rubin and Zhou (2000) note that the exclusion restriction that $y_i(d,z)$ equals $y_i(d,z')$ can be weakened to require only that the distributions of $y_i(d,z)$ and $y_i(d,z')$ be the same.
24 As it turns out, there is not much of a relationship between schooling and lottery numbers in the Angrist and Krueger (1992) data, probably because educational deferments were phased out during the lottery period.
using the single-index $(y_{1i}, y_{0i})$ notation we have been using all along. In particular,
$$y_{1i} \equiv y_i(1,1) = y_i(1,0); \qquad y_{0i} \equiv y_i(0,1) = y_i(0,0). \tag{4.4.3}$$
The observed outcome, $y_i$, can therefore be written in terms of potential outcomes as:
$$y_i = y_i(0,z_i) + [y_i(1,z_i) - y_i(0,z_i)]d_i \tag{4.4.4}$$
$$= y_{0i} + (y_{1i} - y_{0i})d_i.$$
A random-coefficients notation for this is
$$y_i = \alpha_0 + \rho_i d_i + \eta_i,$$
a compact version of (4.4.4) with $\alpha_0 \equiv E[y_{0i}]$ and $\rho_i \equiv y_{1i} - y_{0i}$.
A final assumption needed for heterogeneous IV models is that either $\pi_{1i} \geq 0$ for all $i$ or $\pi_{1i} \leq 0$ for all $i$. This monotonicity assumption, introduced by Imbens and Angrist (1994), means that while the instrument may have no effect on some people, all of those who are affected are affected in the same way. In other words, either $d_{1i} \geq d_{0i}$ or $d_{1i} \leq d_{0i}$ for all $i$. In what follows, we assume monotonicity holds with $d_{1i} \geq d_{0i}$. In the draft-lottery example, this means that although draft-eligibility may have had no effect on the probability of military service for some men, there is no one who was actually kept out of the military by being draft-eligible. Without monotonicity, instrumental variables estimators are not guaranteed to estimate a weighted average of the underlying individual causal effects, $y_{1i} - y_{0i}$.
Given the exclusion restriction, the independence of instruments and potential outcomes, the existence
$$\frac{E[y_i|z_i=1] - E[y_i|z_i=0]}{E[d_i|z_i=1] - E[d_i|z_i=0]} = E[y_{1i} - y_{0i}|d_{1i} > d_{0i}] = E[\rho_i|\pi_{1i} > 0].$$
Proof. Use the exclusion restriction to write $E[y_i|z_i=1] = E[y_{0i} + (y_{1i} - y_{0i})d_i|z_i=1]$, which equals
This theorem says that an instrument which is as good as randomly assigned, affects the outcome through a single known channel, has a first-stage, and affects the causal channel of interest only in one direction, can be used to estimate the average causal effect on the affected group. Thus, IV estimates of effects of military service using the draft lottery estimate the effect of military service on men who served because they were draft-eligible, but would not otherwise have served. This obviously excludes volunteers and men who were exempted from military service for medical reasons, but it includes men for whom draft policy was binding.
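A small simulation can make the LATE theorem concrete. The sketch below assumes a population of always-takers, never-takers, and compliers (no defiers) with type-specific effects; the shares and effect sizes are invented for illustration and are not taken from any study discussed here.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Compliance types (assumed shares); monotonicity holds because there are no defiers.
kind = rng.choice(["always", "never", "complier"], size=n, p=[0.2, 0.5, 0.3])
d0 = (kind == "always").astype(int)      # treatment status with z = 0
d1 = (kind != "never").astype(int)       # treatment status with z = 1

# Heterogeneous causal effects by type (assumed values).
effect = np.select([kind == "always", kind == "never"], [3.0, 0.0], default=1.0)
y0 = rng.normal(size=n)
y1 = y0 + effect

z = rng.integers(0, 2, size=n)           # instrument, as good as randomly assigned
d = np.where(z == 1, d1, d0)
y = np.where(d == 1, y1, y0)

reduced_form = y[z == 1].mean() - y[z == 0].mean()
first_stage = d[z == 1].mean() - d[z == 0].mean()
wald = reduced_form / first_stage
late = (y1 - y0)[d1 > d0].mean()         # average effect on compliers
ate = (y1 - y0).mean()                   # population average effect
```

In this setup the Wald ratio tracks the complier effect (`late`), not the always-taker effect or the population average (`ate`).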
How useful is LATE? No theorem answers this question, but it's always worth discussing. Part of the interest in the effects of Vietnam-era service revolves around the question of whether veterans (especially, conscripts) were adequately compensated for their service. Internally valid draft lottery estimates answer
25 Angrist (1990) interprets draft lottery estimates as the penalty for lost labor market experience. This suggests draft lottery estimates should have external validity for the effects of conscription in other periods, a conjecture borne out by the results for WWII draftees in Angrist and Krueger (1994).
Without monotonicity, this is equal to
$$E[y_{1i} - y_{0i}|d_{1i} > d_{0i}]P[d_{1i} > d_{0i}] - E[y_{1i} - y_{0i}|d_{1i} < d_{0i}]P[d_{1i} < d_{0i}].$$
We might therefore have a scenario where treatment effects are positive for everyone yet the reduced form is zero because effects on compliers are canceled out by effects on defiers. This doesn't come up in a constant-effects model because the reduced form is always the constant effect times the first stage regardless
26 With a constant effect, $\rho$,
$$E[y_{1i} - y_{0i}|d_{1i} > d_{0i}]P[d_{1i} > d_{0i}] - E[y_{1i} - y_{0i}|d_{1i} < d_{0i}]P[d_{1i} < d_{0i}]$$
$$= \rho\{P[d_{1i} > d_{0i}] - P[d_{1i} < d_{0i}]\} = \rho\{E[d_{1i} - d_{0i}]\}.$$
So a zero reduced form effect means either the first stage is zero or $\rho = 0$.
the experimental treatment even though they were not supposed to. Those who don't take their medicine when randomly assigned to do so are never-takers while those who take the medicine even when put into the control group are always-takers. Without adding further assumptions (e.g., constant causal effects), LATE is not informative about effects on never-takers and always-takers because, by definition, treatment status for these two groups is unchanged by the instrument (random assignment). The analogy between IV and a randomized trial with partial compliance is more than allegorical: IV solves the problem of causal inference in a randomized trial with partial compliance. This important point merits a separate subsection, below.
Before turning to this important special case, we make a few general points. First, the average causal effect on compliers is not usually the same as the average treatment effect on the treated. From the simple fact that $d_i = d_{0i} + (d_{1i} - d_{0i})z_i$, we learn that the treated population consists of two non-overlapping groups. By monotonicity, we cannot have both $d_{0i} = 1$ and $d_{1i} - d_{0i} = 1$ since $d_{0i} = 1$ implies $d_{1i} = 1$. The treated therefore have either $d_{0i} = 1$ or $d_{1i} - d_{0i} = 1$ and $z_i = 1$, and hence $d_i$ can be written as the sum of two mutually-exclusive dummies, $d_{0i}$ and $(d_{1i} - d_{0i})z_i$. The treated consist of either always-takers or compliers with the instrument switched on. Since the instrument is as good as randomly assigned, compliers with the
$$\underbrace{E[y_{1i} - y_{0i}|d_i = 1]}_{\text{effect on the treated}} \tag{4.4.5}$$
$$= E[y_{1i} - y_{0i}|d_{0i} = 1]\,P[d_{0i} = 1|d_i = 1] + E[y_{1i} - y_{0i}|d_{1i} > d_{0i}, z_i = 1]\,P[d_{1i} > d_{0i}, z_i = 1|d_i = 1]$$
$$= \underbrace{E[y_{1i} - y_{0i}|d_{0i} = 1]}_{\text{effect on always-takers}} P[d_{0i} = 1|d_i = 1] + \underbrace{E[y_{1i} - y_{0i}|d_{1i} > d_{0i}]}_{\text{effect on compliers}} P[d_{1i} > d_{0i}, z_i = 1|d_i = 1]$$
Since $P[d_{0i} = 1|d_i = 1]$ and $P[d_{1i} > d_{0i}, z_i = 1|d_i = 1]$ add up to one, this means that the effect of treatment on the treated is a weighted average of effects on always-takers and compliers.
Likewise, LATE is not the average causal effect of treatment on the non-treated, $E[y_{1i} - y_{0i}|d_i = 0]$. In the draft-lottery example, the average effect on the non-treated is the average causal effect of military service
$$\underbrace{E[y_{1i} - y_{0i}|d_i = 0]}_{\text{effect on the non-treated}} \tag{4.4.6}$$
$$= \underbrace{E[y_{1i} - y_{0i}|d_{1i} = 0]}_{\text{effect on never-takers}} P[d_{1i} = 0|d_i = 0] + \underbrace{E[y_{1i} - y_{0i}|d_{1i} > d_{0i}]}_{\text{effect on compliers}} P[d_{1i} > d_{0i}, z_i = 0|d_i = 0];$$
where we use the fact that, by monotonicity, those with $d_{1i} = 0$ must be never-takers.
Finally, averaging (4.4.5) and (4.4.6) using
$$E[y_{1i} - y_{0i}] = E[y_{1i} - y_{0i}|d_i = 1]P[d_i = 1] + E[y_{1i} - y_{0i}|d_i = 0]P[d_i = 0]$$
shows the overall population average treatment effect to be a weighted average of effects on compliers, always-takers, and never-takers. Of course, this is a conclusion we could have reached directly given monotonicity and the definition at the beginning of this subsection.
Because an instrumental variable is not directly informative about effects on always-takers and never-takers, instruments do not usually capture the average causal effect on all of the treated or on all of the non-treated. There are important exceptions to this rule, however: instrumental variables that allow no always-takers or no never-takers. Although this scenario is not typical, it is an important special case. One example is the twins instrument for fertility, used by Rosenzweig and Wolpin (1980), Bronars and Grogger (1994), Angrist and Evans (1998), and Angrist, Lavy, and Schlosser (2006). Another is Oreopoulos' (2006) recent study using changes in compulsory attendance laws as instruments for schooling in Britain.
more year of high school on all those who leave school at 14. This turns on the fact that British teens are remarkably law-abiding people: Oreopoulos' IV strategy wouldn't estimate the effect of treatment on the non-treated in, say, Israel, where teenagers get more leeway when it comes to compulsory school attendance. Israeli econometricians using changes in compulsory attendance laws as instruments must therefore make do with LATE.
4.4.3 IV in Randomized Trials
The language of the LATE framework is based on an analogy
of interest among participants and the failure of program operators to encourage participation. Since the compliance problem in this case is largely confined to the treatment group, LATE using random assignment, $z_i$, as an instrument for treatment received, $d_i$, is the effect of treatment on the treated.
This use of IV to solve the compliance problem is illustrated in Table 4.4.1, which presents results from the JTPA experiment. The outcome variable of primary interest in the JTPA experiment is total earnings in the 30-month period after random assignment. Columns 1-2 of the table show the difference in earnings between those who were trained and those who were not (the estimates in column 2 are from a regression model that adjusts for a number of individual characteristics measured at the beginning of the experiment). The contrast reported in columns 1-2 is on the order of $4,000 for men and $2,200 for women, in both cases a large treatment effect that amounts to about 20 percent of average earnings. But these estimates are misleading because they compare individuals according to $d_i$, the actual treatment received. Since individuals assigned to the treatment group were free to decline (and 40% did so), this comparison throws away the random assignment unless the decision to accept treatment is itself independent of potential outcomes. This seems unlikely.
Table 4.4.1: Results from the JTPA experiment: OLS and IV estimates of training impacts

            Comparisons by            Comparisons by            Instrumental Variable
            Training Status           Assignment Status         Estimates
            Without      With         Without      With         Without      With
            Covariates   Covariates   Covariates   Covariates   Covariates   Covariates
            (1)          (2)          (3)          (4)          (5)          (6)
A. Men      3,970        3,754        1,117        970          1,825        1,593
            (555)        (536)        (569)        (546)        (928)        (895)
B. Women    2,133        2,215        1,243        1,139        1,942        1,780
            (345)        (334)        (359)        (341)        (560)        (532)

Notes: The table reports OLS, reduced-form, and IV estimates of the effect of subsidized training on earnings in the JTPA experiment. Columns (1) and (2) show differences in earnings by training status; columns (3) and (4) show differences by random-assignment status. Columns (5) and (6) report the result of using random-assignment status as an instrument for training. The covariates used in columns (2), (4) and (6) are High school or GED, Black, Hispanic, Married, Worked less than 13 weeks in past year, AFDC (for women), plus indicators for the service strategy recommended, age group and second follow-up survey. Robust standard errors are shown in parentheses.
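The columns of Table 4.4.1 are linked by simple arithmetic: with a single binary instrument, the IV estimate is the assignment-status contrast divided by the first stage. As a rough check, the ratio of column (3) to column (5) therefore backs out the implied difference in training rates by assignment status. The variable names below are ours; the numbers are the table entries.

```python
# Entries from Table 4.4.1, columns (3) and (5) (no covariates).
itt = {"men": 1117.0, "women": 1243.0}   # differences by assignment status
iv = {"men": 1825.0, "women": 1942.0}    # IV estimates

# IV = ITT / first stage, so ITT / IV recovers the implied first stage.
implied_first_stage = {k: itt[k] / iv[k] for k in itt}
```

The implied first stages, a little over 0.6 for both men and women, are in line with the roughly 60 percent take-up among those assigned to treatment.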
$$\frac{E[y_i|z_i=1] - E[y_i|z_i=0]}{E[d_i|z_i=1]} = E[y_{1i} - y_{0i}|d_i = 1].$$
Proof. $E[y_i|z_i=1] = E[y_{0i} + (y_{1i} - y_{0i})d_i|z_i=1]$, while $E[y_i|z_i=0] = E[y_{0i}|z_i=0]$ because $E[d_i|z_i=0] = 0$. Therefore
$$E[y_i|z_i=1] - E[y_i|z_i=0] = E[(y_{1i} - y_{0i})d_i|z_i=1]$$
by independence. But
$$E[(y_{1i} - y_{0i})d_i|z_i=1] = E[y_{1i} - y_{0i}|d_i=1, z_i=1]\,P[d_i=1|z_i=1]$$
while $E[d_i|z_i=0] = 0$ means $d_i = 1$ implies $z_i = 1$. Hence, $E[y_{1i} - y_{0i}|d_i=1, z_i=1] = E[y_{1i} - y_{0i}|d_i=1]$.
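The result just proved can be checked by simulation: when nobody in the control group gets treated, the intention-to-treat effect divided by the compliance rate recovers the effect of treatment on the treated, even if take-up is related to gains. The data-generating process below is hypothetical and chosen only to make selection on gains visible.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
z = rng.integers(0, 2, size=n)                  # random assignment
gain = rng.normal(loc=1.0, scale=1.0, size=n)   # individual effect y1 - y0 (assumed)
# Take-up when offered is more likely for those with larger gains (assumed).
would_take = (gain + rng.normal(size=n) > 1.0).astype(int)
d = z * would_take                              # no one in the control group is treated
y0 = rng.normal(size=n)
y = y0 + gain * d

itt = y[z == 1].mean() - y[z == 0].mean()
compliance = d[z == 1].mean()
bloom = itt / compliance
tot = gain[d == 1].mean()                       # effect of treatment on the treated
ate = gain.mean()                               # population average effect
```

Here `bloom` should be close to `tot`, and both exceed `ate` because those who take up treatment have larger gains.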
In addition to telling us how to analyze randomized trials with non-compliance, the LATE framework
27 Another application of IV to data from a randomized trial is Krueger (1999). This study uses randomly assigned class size as an instrument for actual class size with data from the Tennessee STAR experiment. For students in first grade and higher, actual class size differs from randomly assigned class size in the STAR experiment because parents and teachers move students around in years after the experiment began. Krueger (1999) also illustrates 2SLS applied to a model with variable treatment intensity, as discussed in section 4.5.3.
instruments that allow for perfect compliance on one side or the other). Although different IV estimates are "weighted-up" by 2SLS to produce a single average causal effect, over-identification testing of the sort
28 In fact, maintaining the hypothesis that all instruments in an over-identified model are valid, the traditional over-identification test statistic becomes a formal test for treatment-effect heterogeneity.
$$P[d_{1i} > d_{0i}|d_i = 1] = \frac{P[d_i = 1|d_{1i} > d_{0i}]\,P[d_{1i} > d_{0i}]}{P[d_i = 1]} \tag{4.4.7}$$
$$= \frac{P[z_i = 1]\,(E[d_i|z_i=1] - E[d_i|z_i=0])}{P[d_i = 1]}.$$
The second equality uses the fact that $P[d_i = 1|d_{1i} > d_{0i}] = P[z_i = 1|d_{1i} > d_{0i}]$ and that $P[z_i = 1|d_{1i} > d_{0i}] = P[z_i = 1]$ by Independence. In other words, the proportion of the treated who are compliers is given by the first stage, times the probability the instrument is switched on, divided by the proportion treated.
366
, and the marginal probability of treatment is .267. From these statistics, we compute that the compliant subpopulation is .32 of the veteran population in this
Table 4.4.2: Probabilities of compliance in instrumental variables studies

Source             Endogenous         Instrument (z)             Sample                        P[d=1]  1st Stage,  P[z=1]  P[d1>d0|d=1]  P[d1>d0|d=0]
                   Variable (d)                                                                        P[d1>d0]
(1)                (2)                (3)                        (4)                           (5)     (6)         (7)     (8)           (9)
Angrist (1990)     Veteran status     Draft eligibility          White men born in 1950        0.267   0.159       0.534   0.318         0.101
                                                                 Non-white men born in 1950    0.163   0.060       0.534   0.197         0.033
Angrist and        More than 2        Twins at second birth      Married women aged 21-35      0.381   0.603       0.008   0.013         0.966
Evans (1998)       children                                      with two or more children
                                                                 in 1980
                                      First two children are     Married women aged 21-35      0.381   0.060       0.506   0.080         0.048
                                      of the same sex            with two or more children
                                                                 in 1980
Angrist and        High school        Third or fourth quarter    Men born between 1930         0.770   0.016       0.509   0.011         0.034
Krueger (1991)     graduate           birth                      and 1939
Acemoglu and       High school        State requires 11 or       White men aged 40-49          0.617   0.037       0.300   0.018         0.068
Angrist (2000)     graduate           more years of school
                                      attendance

Notes: The table shows an analysis of the absolute and relative size of the complier population for a number of instrumental variables. The first-stage, reported in column 6, gives the absolute size of the complier group. Columns 8 and 9 show the size of the complier population relative to the treated and untreated populations.
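Columns 8 and 9 of Table 4.4.2 can be reproduced from columns 5-7. The sketch below applies the rule stated earlier for the treated (first stage times $P[z=1]$, divided by $P[d=1]$) and, by the same Bayes-rule argument applied to the untreated, first stage times $P[z=0]$ divided by $P[d=0]$. The function name and dictionary labels are ours; the inputs are table entries.

```python
def complier_shares(p_d, first_stage, p_z):
    """Share of compliers among the treated and among the untreated."""
    among_treated = first_stage * p_z / p_d
    among_untreated = first_stage * (1 - p_z) / (1 - p_d)
    return among_treated, among_untreated

# (P[d=1], first stage, P[z=1]) from columns 5-7 of Table 4.4.2.
rows = {
    "draft eligibility, white men": (0.267, 0.159, 0.534),
    "twins at second birth": (0.381, 0.603, 0.008),
    "same-sex first two children": (0.381, 0.060, 0.506),
}
shares = {name: complier_shares(*vals) for name, vals in rows.items()}
```

The twins row shows how an instrument with a large first stage can still describe a tiny share of the treated, because twin births are rare.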
$$\frac{P[x_{1i} = 1|d_{1i} > d_{0i}]}{P[x_{1i} = 1]} = \frac{P[d_{1i} > d_{0i}|x_{1i} = 1]}{P[d_{1i} > d_{0i}]} = \frac{E[d_i|z_i=1, x_{1i}=1] - E[d_i|z_i=0, x_{1i}=1]}{E[d_i|z_i=1] - E[d_i|z_i=0]}. \tag{4.4.8}$$
In other words, the relative likelihood a complier is a college graduate is given by the ratio of the first stage for college graduates to the overall first stage.29
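The ratio in (4.4.8) is easy to compute: run the first stage in the subgroup and divide by the overall first stage. The simulation below is a hypothetical illustration in which we can also see who the compliers are, so the estimate can be compared with the truth; the binary characteristic `x` and all probabilities are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 400_000
x = rng.binomial(1, 0.3, size=n)                         # binary characteristic (hypothetical)
# Compliance is less common when x = 1 (assumed): P[complier | x] is 0.2 or 0.4.
complier = rng.binomial(1, np.where(x == 1, 0.2, 0.4))
always = (1 - complier) * rng.binomial(1, 0.25, size=n)  # some non-compliers are always-takers
z = rng.integers(0, 2, size=n)
d = np.where(z == 1, complier + always, always)

def first_stage(mask):
    """Difference in treatment rates by instrument status within a subsample."""
    return d[mask & (z == 1)].mean() - d[mask & (z == 0)].mean()

everyone = np.ones(n, dtype=bool)
ratio_hat = first_stage(x == 1) / first_stage(everyone)  # right-hand side of (4.4.8)
ratio_true = x[complier == 1].mean() / x.mean()          # left-hand side, using latent types
```

In real data only `ratio_hat` is available, since compliers cannot be individually identified.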
This calculation is illustrated in Table 4.4.3, which reports compliers' characteristics ratios for age at
29 A general method for constructing the mean or other features of the distribution of covariates for compliers uses Abadie's (2003) kappa-weighting scheme. For example,
$$E[X_i|d_{1i} > d_{0i}] = \frac{E[\kappa_i X_i]}{E[\kappa_i]},$$
where
$$\kappa_i = 1 - \frac{d_i(1 - z_i)}{1 - P(z_i = 1|X_i)} - \frac{(1 - d_i)z_i}{P(z_i = 1|X_i)}.$$
This works because the weighting function, $\kappa_i$, "finds compliers," in a sense discussed in Section (4.5.2), below.
Table 4.4.3: Complier-characteristics ratios for twins and sex-composition instruments

                                            Twins at second birth            First two children are same sex
Variable                         E[x]       E[x|d1>d0]   P[x|d1>d0]/P[x]     E[x|d1>d0]   P[x|d1>d0]/P[x]
                                 (1)        (2)          (3)                 (4)          (5)
Age 30 or older at first birth   0.00291    0.00404      1.39                0.00233      0.995
                                                         (0.0201)                         (0.374)
Black or hispanic                0.125      0.103        0.822               0.102        0.814
                                                         (0.00421)                        (0.0775)
High school graduate             0.822      0.861        1.048               0.815        0.998
                                                         (0.000772)                       (0.0140)
College graduate                 0.132      0.151        1.14                0.0904       0.704
                                                         (0.00376)                        (0.0692)

Notes: The table reports an analysis of complier characteristics for twins and sex-composition instruments. The ratios in columns 3 and 5 give the relative likelihood compliers have the characteristic indicated in each row. Data are from the 1980 Census 5% sample, including married mothers age 21-35 with at least two children, as in Angrist and Evans (1998). The sample size is 254,654 for all columns.
4.5 Generalizing LATE
The LATE theorem applies to a stripped-down causal model where a single dummy instrument is used to estimate the impact of a dummy treatment with no covariates. We can generalize this in three important
$$\rho_j = \frac{Cov(y_i, z_{ji})}{Cov(d_i, z_{ji})}; \quad j = 1, 2$$
denote the two IV estimands using $z_{1i}$ and $z_{2i}$.
The (population) first stage fitted values for 2SLS are $\hat d_i = \pi_{11}z_{1i} + \pi_{12}z_{2i}$. By virtue of the IV
$$\rho_{2SLS} = \frac{Cov(y_i, \hat d_i)}{Cov(d_i, \hat d_i)} = \frac{\pi_{11}Cov(y_i, z_{1i})}{Cov(d_i, \hat d_i)} + \frac{\pi_{12}Cov(y_i, z_{2i})}{Cov(d_i, \hat d_i)}$$
$$= \left[\frac{\pi_{11}Cov(d_i, z_{1i})}{Cov(d_i, \hat d_i)}\right]\left[\frac{Cov(y_i, z_{1i})}{Cov(d_i, z_{1i})}\right] + \left[\frac{\pi_{12}Cov(d_i, z_{2i})}{Cov(d_i, \hat d_i)}\right]\left[\frac{Cov(y_i, z_{2i})}{Cov(d_i, z_{2i})}\right]$$
$$= \psi\rho_1 + (1 - \psi)\rho_2,$$
where
$$\psi = \frac{\pi_{11}Cov(d_i, z_{1i})}{\pi_{11}Cov(d_i, z_{1i}) + \pi_{12}Cov(d_i, z_{2i})}$$
is a number between zero and one that depends on the relative strength of each instrument in the first stage. Thus, we have shown that 2SLS is a weighted average of causal effects for instrument-specific compliant subpopulations. Suppose, for example, that $z_{1i}$ denotes twins births and $z_{2i}$ indicates same-sex sibships in families with two or more children, both instruments for family size as in Angrist and Evans (1998). A multiple second birth increases the likelihood of having a third child by about .6 while a same-sex sibling pair increases the likelihood of a third birth by about .07
30 Using twins instruments alone, the IV estimate of the effect of a third child on female labor force participation is -.084 (s.e.=.017). The corresponding samesex estimate is -.138 (s.e.=.029). Using both instruments produces a 2SLS estimate of -.098 (.015). The 2SLS weight in this case is .74 for twins, .26 for samesex, due to the much stronger twins first stage.
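The weighted-average representation of 2SLS with two instruments holds as an exact identity in any sample, which makes it easy to verify. The sketch below uses simulated data with hypothetical instruments `z1` and `z2`; the last line simply redoes the arithmetic in the footnote above using the quoted weights and estimates.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 50_000
z1 = rng.binomial(1, 0.1, size=n)       # rare instrument with a strong first stage
z2 = rng.binomial(1, 0.5, size=n)       # common instrument with a weak first stage
u = rng.normal(size=n)
d = (0.6 * z1 + 0.1 * z2 + 0.3 * u + rng.uniform(size=n) > 0.8).astype(float)
y = 1.0 - 0.1 * d + u + rng.normal(size=n)

def cov(a, b):
    return np.mean((a - a.mean()) * (b - b.mean()))

# First stage on both instruments plus a constant.
Z = np.column_stack([np.ones(n), z1, z2])
pi = np.linalg.lstsq(Z, d, rcond=None)[0]
d_hat = Z @ pi

rho_2sls = cov(y, d_hat) / cov(d, d_hat)
rho_1 = cov(y, z1) / cov(d, z1)         # IV using z1 alone
rho_2 = cov(y, z2) / cov(d, z2)         # IV using z2 alone
psi = pi[1] * cov(d, z1) / (pi[1] * cov(d, z1) + pi[2] * cov(d, z2))

# Arithmetic check with the estimates and weights quoted in the footnote.
footnote_check = 0.74 * (-0.084) + 0.26 * (-0.138)
```

The identity `rho_2sls == psi * rho_1 + (1 - psi) * rho_2` holds up to floating-point error because the constant drops out of every covariance.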
More formally, IV estimation with covariates may be justified by a conditional independence assumption
$$\{y_{1i}, y_{0i}, d_{1i}, d_{0i}\} \perp\!\!\!\perp z_i \,|\, X_i \tag{4.5.1}$$
In other words, we think of the instrumental variables as being "as good as randomly assigned," conditional on covariates, $X_i$ (here we are implicitly maintaining the exclusion restriction as well). A second reason for incorporating covariates is that conditioning on covariates may reduce some of the variability in the dependent variable. This leads to more precise 2SLS estimates under constant conditional effects.
The simplest causal model with covariates is the constant-effects model, with functional form restrictions as follows:
$$E[y_{0i}|X_i] = X_i'\alpha^* \text{ for a } k\times 1 \text{ vector of coefficients, } \alpha^*;$$
$$y_{1i} - y_{0i} = \rho.$$
In combination with (4.5.1), this motivates 2SLS estimation of an equation like (4.1.6) as discussed in Section 4.1.
A straightforward generalization of the constant-effects model allows
$$y_{1i} - y_{0i} = \rho(X_i),$$
where $\rho(X_i)$
$X_i$, we define covariate-specific LATE,
$$\lambda(X_i) \equiv E[y_{1i} - y_{0i}|d_{1i} > d_{0i}, X_i].$$
The "saturate and weight" approach to estimation with covariates is spelled out in the following theorem (from Angrist and Imbens, 1995).
Theorem 4.5.1 SATURATE AND WEIGHT. Suppose the assumptions of the LATE theorem hold conditional on $X_i$. That is,
(CA1, Independence) $\{y_i(d_{1i}, 1), y_i(d_{0i}, 0), d_{1i}, d_{0i}\} \perp\!\!\!\perp z_i \,|\, X_i$;
(CA2, Exclusion) $P[y_i(d, 0) = y_i(d, 1)|X_i] = 1$ for $d = 0, 1$;
(CA3, First-stage) $E[d_{1i} - d_{0i}|X_i] \neq 0$.
We also assume monotonicity (A4) holds as before. Consider the 2SLS estimand based on the first stage equation
$$d_i = \pi_X + \pi_{1X}z_i + \xi_{1i} \tag{4.5.3}$$
and the second stage equation
$$y_i = \alpha_X + \rho_c d_i + \eta_i$$
where $\pi_X$ and $\alpha_X$
$$\omega(X_i) = \frac{V\{E[d_i|X_i, z_i]\,|\,X_i\}}{E[V\{E[d_i|X_i, z_i]\,|\,X_i\}]} \tag{4.5.4}$$
$$= \frac{E\{P[d_i = 1|X_i, z_i](1 - P[d_i = 1|X_i, z_i])\,|\,X_i\}}{E[E[d_i|X_i, z_i](1 - P[d_i = 1|X_i, z_i])]}.$$
This theorem says that 2SLS with a fully saturated first stage and a saturated model for covariates in the second stage produces a weighted average of covariate-specific LATEs. The weights are proportional to the average conditional variance of the population first-stage fitted value, $E[d_i|X_i, z_i]$, at each value of $X_i$.31 The theorem comes from the fact that the first stage coincides with $E[d_i|X_i, z_i]$ when (4.5.3) is saturated (i.e., the first-stage regression recovers the CEF).
31 Note that the variability in $E[d_i|X_i, z_i]$ conditional on $X_i$ comes from $z_i$. So the weighting formula gives more weight to covariate values where the instrument creates more variation in fitted values. The first line of the weight formula, (4.5.4), holds for any endogenous variable in a 2SLS setup. The second is a consequence of the fact that here the endogenous variable is a dummy.
CHAPTER4.INSTRUMENTALVARIABLESINACTION
1

P
(
z
i
=1
j
X
i
)

(1

d
i
)
z
i
P
(
z
i
=1
j
X
i
)
:
Then
E
[
g
(
y
i
;
d
i
;
X
i
)
j
d
1
i

d
0
i
]=
E
[

i
g
(
y
i
;
d
i
;
X
i
)]
E
[

i
]
:
32
Forcompliers,
P
[
d
i
=1
jf
y
1
i
;
y
0
i
g
;
X
i
;
d
1
i

d
0
i
]
=
P
[
z
i
=1
jf
y
1
i
;
y
0
i
g
;
X
i
;
d
1
i

d
0
i
]
:
Andbyconditionalindependence,
P
[
z
i
=1
jf
y
1
i
;
y
0
i
g
;
X
i
;
d
1
i

d
0
i
]
=
P
[
z
i
=1
j
X
i
;
d
1
i

d
0
i
]
:
This can be proved by direct calculation using the fact that, given the assumptions of the LATE theorem, any expectation is a weighted average of means for always-takers, never-takers, and compliers. By monotonicity, those with $d_i(1 - z_i) = 1$ are always-takers because they have $d_{0i} = 1$, while those with $(1 - d_i)z_i = 1$ are never-takers because they have $d_{1i} = 0$. Hence, the compliers are the left-out group.
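The sense in which $\kappa_i$ "finds compliers" can be seen in a simulation where compliance types are known. In the sketch below, the covariate `x`, the type probabilities, and the constant instrument probability `p_z` are all assumptions; with a randomly assigned instrument, $P(z_i = 1|X_i)$ is just `p_z`.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 400_000
x = rng.normal(size=n)                             # a covariate (hypothetical)
# Compliers are more common at high x (assumed logistic relationship).
complier = rng.binomial(1, 1 / (1 + np.exp(-x)))
always = (1 - complier) * rng.binomial(1, 0.3, size=n)
p_z = 0.5                                          # P(z = 1 | X), known by design here
z = rng.binomial(1, p_z, size=n)
d = np.where(z == 1, complier + always, always)

# Kappa averages to zero for always-takers and never-takers and equals one for compliers.
kappa = 1 - d * (1 - z) / (1 - p_z) - (1 - d) * z / p_z

share_hat = kappa.mean()                           # estimates P[d1 > d0]
mean_x_hat = (kappa * x).mean() / kappa.mean()     # estimates E[x | d1 > d0]
share_true = complier.mean()
mean_x_true = x[complier == 1].mean()
```

Individual values of `kappa` can be negative; only its averages have an interpretation.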
The Abadie theorem has a number of important implications; for example, it crops up again in the discussion of quantile treatment effects. Here, we use it to approximate $E[y_i|d_i, X_i, d_{1i} > d_{0i}]$ by linear regression. Specifically, let $\alpha_a$ and $\beta_a$ solve
$$(\alpha_a, \beta_a) = \arg\min_{a,b} E\{(E[y_i|d_i, X_i, d_{1i} > d_{0i}] - a d_i - X_i'b)^2\,|\,d_{1i} > d_{0i}\}.$$
In other words, $\alpha_a d_i + X_i'\beta_a$ gives the MMSE approximation to $E[y_i|d_i, X_i, d_{1i} > d_{0i}]$, or fits it exactly if it's linear. A consequence of Abadie's theorem is that this approximating function can be obtained by solving
$$(\alpha_a, \beta_a) = \arg\min_{a,b} E\{\kappa_i(y_i - a d_i - X_i'b)^2\}, \tag{4.5.5}$$
the kappa-weighted least-squares minimand.33
33
Abadieproposesanestimationstrategy(anddevelopsdistributiontheory)foraprocedurewhichinvolves
…rst-stepestimationof

i
usingparametricorsemiparametricmodelsforthefunction,
p
(
X
i
)=
P
(
z
i
=1
j
X
i
)
.
Theestimatesfromthe…rststeparethenpluggedintothesampleanalogof(4.5.5)inthesecondstep.Not
surprisingly,whentheonlycovariateisaconstant,Abadie’sproceduresimpli…estotheWaldestimator.
Moresurprisingly,minimizationof(4.5.5)producesthetraditional2SLSestimatoraslongasalinearmodel
isusedfor
p
(
X
i
)
intheconstructionof

i
.Inotherwords,if
P
(
z
i
=1
j
X
i
)=
X
0
i

isusedwhenconstructing
anestimateof

i
,theAbadieestimandis2SLS.Thus,wecanconcludethatwhenever
p
(
X
i
)
canbe…tor
closelyapproximatedbyalinearmodel,itmakessensetoview2SLSasanapproximationtothecomplier
causalresponsefunction,
E
[
y
i
j
d
i
;
X
i
;
d
1
i

d
0
i
]
.Ontheotherhand,

a
isnot,ingeneral,the2SLSestimand
and

a
isnot,ingeneral,thevectorofcovariatee¤ectsproducedby2SLS.Still,theequivalenceto2SLS
forlinear
P
(
z
i
=1
j
X
i
)
33
Theclassofapproximatingfunctionsneedn’tbelinear.Insteadof
a
d
i
+
X
0
i
b
,itmightmakesensetouseanonlinearfunction
likeanexponential(ifthedependentvariableisnon-negative)orprobit(ifthedependentvariableiszero-one).Wereturn
tothispointattheendofthischapter.AsnotedinSection(4.4.4),thekappa-weightingscemecanbeusedtocharacterize
covariatedistributionsforcompliersaswellastoestimateoutcomedistributions.
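The claim that the kappa-weighted procedure collapses to the Wald estimator when the only covariate is a constant can be checked directly. The following sketch uses simulated data (the data-generating process is an assumption) and estimates $P(z_i = 1)$ by the sample mean of $z_i$.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 20_000
z = rng.binomial(1, 0.4, size=n)
u = rng.normal(size=n)
d = ((0.8 * z + u) > 0.5).astype(float)            # treatment responds to z and to u
y = 2.0 + 1.5 * d + u + rng.normal(size=n)

p_hat = z.mean()                                   # constant-only model for P(z = 1)
kappa = 1 - d * (1 - z) / (1 - p_hat) - (1 - d) * z / p_hat

# Kappa-weighted least squares of y on (1, d). Some weights are negative, so we
# solve the weighted normal equations directly instead of rescaling by sqrt(kappa).
X = np.column_stack([np.ones(n), d])
coef = np.linalg.solve(X.T @ (kappa[:, None] * X), X.T @ (kappa * y))

wald = (y[z == 1].mean() - y[z == 0].mean()) / (d[z == 1].mean() - d[z == 0].mean())
```

The coefficient on `d` matches the Wald ratio up to floating-point error, not just in large samples.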
are identical at -3.55 (s.e.=.617). This is not a strike against Abadie's procedure. Rather, it supports the notion, which we hold dear, that 2SLS approximates the causal relation of interest.34
4.5.3 Average Causal Response with Variable Treatment Intensity ★
s
yearsofeducation.
Note that the function $f_i(s)$ has an "$i$" subscript on it while $s$ does not. The function $f_i(s)$ tells us what $i$ would earn for any value of schooling, $s$, and not just for the realized value, $s_i$. In other words, $f_i(s)$ answers causal "what if" questions for multinomial $s_i$.
Suppose that $s_i$ takes on values in the set $\{0, 1, \ldots, \bar s\}$. Then there are $\bar s$ unit causal effects, $Y_{si} - Y_{s-1,i}$. A linear causal model assumes these are the same for all $s$ and for all $i$, obviously unrealistic assumptions. But we need not take these assumptions literally. Rather, 2SLS provides a computational device that generates a weighted average of unit causal effects, with a weighting function we can estimate and study, so as to learn where the action is coming from with a particular instrument. This weighting function tells us how the compliers are distributed over the range of $s_i$.
34 Abadie (2003) gives formulas for standard errors and Alberto Abadie has posted software to compute them. The bootstrap provides a simple alternative, which we used to construct standard errors for the Abadie estimates mentioned in this paragraph.
(ACR1, Independence and Exclusion) $\{Y_{0i}, Y_{1i}, \ldots, Y_{\bar s i}; s_{0i}, s_{1i}\} \perp\!\!\!\perp z_i$;
(ACR2, First-stage) $E[s_{1i} - s_{0i}] \neq 0$;
(ACR3, Monotonicity) $s_{1i} - s_{0i} \geq 0\ \forall i$, or vice versa; assume the first.
Then
$$\frac{E[y_i|z_i=1] - E[y_i|z_i=0]}{E[s_i|z_i=1] - E[s_i|z_i=0]} = \sum_{s=1}^{\bar s}\omega_s E[Y_{si} - Y_{s-1,i}|s_{1i} \geq s > s_{0i}]$$
where
$$\omega_s = \frac{P[s_{1i} \geq s > s_{0i}]}{\sum_{j=1}^{\bar s}P[s_{1i} \geq j > s_{0i}]}.$$
The weights $\omega_s$ are non-negative and sum to one.
The average causal response (ACR) theorem says that the Wald estimator with variable treatment intensity is a weighted average of the unit causal response along the length of the potentially nonlinear causal relation described by $f_i(s)$. The unit causal response, $E[Y_{si} - Y_{s-1,i}|s_{1i} \geq s > s_{0i}]$, is the average difference in potential outcomes for compliers at point $s$, i.e., individuals driven by the instrument from a treatment intensity less than $s$ to at least $s$. For example, the quarter of birth instruments used by Angrist and Krueger (1991) push some people from 11th grade to finishing 12th or higher, and others from 10th grade to finishing 11th or higher. The Wald estimator using quarter of birth instruments combines all of these effects into a single average causal response.
The relative size of the group of compliers at point $s$ is $P[s_{1i} \geq s > s_{0i}]$. By monotonicity, this must be non-negative and is given by the difference in the CDF of $s_i$ at point $s$. To see this, note that
$$P[s_{1i} \geq s > s_{0i}] = P[s_{1i} \geq s] - P[s_{0i} \geq s] = P[s_{0i} < s] - P[s_{1i} < s],$$
which is non-negative since monotonicity requires $s_{1i} \geq s_{0i}$. Moreover,
$$P[s_{0i} < s] - P[s_{1i} < s] = P[s_i < s \mid z_i = 0] - P[s_i < s \mid z_i = 1]$$
by Independence. Finally, note that because the mean of a non-negative random variable is the sum, over its support, of one minus the CDF, we have
$$E[s_i \mid z_i = 1] - E[s_i \mid z_i = 0] = \sum_{j=1}^{\bar{s}} \left( P[s_i \geq j \mid z_i = 1] - P[s_i \geq j \mid z_i = 0] \right) = \sum_{j=1}^{\bar{s}} P[s_{1i} \geq j > s_{0i}].$$
Thus, the ACR weighting function can be consistently estimated by comparing the CDFs of the endogenous variables (treatment intensity) with the instrument switched off and on. The weighting function is normalized by the first-stage.
CHAPTER 4. INSTRUMENTAL VARIABLES IN ACTION
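The CDF comparison described above can be sketched numerically. The following is a minimal illustration on simulated data, not the authors' code: the function name `acr_weights`, the simulated schooling process, and the 30 percent "bump" share are all assumptions made for the example. It estimates each weight $\omega_s$ as the instrument-induced change in $P[s_i \geq s]$, divided by the first stage.

```python
import numpy as np

def acr_weights(s, z, s_max):
    """Estimate ACR weights from CDF differences by instrument status.

    s: observed integer treatment intensity, z: binary instrument.
    Returns the grid 1..s_max and one estimated weight per grid point.
    """
    s = np.asarray(s)
    z = np.asarray(z)
    grid = np.arange(1, s_max + 1)
    # P[s_i >= j | z_i = 1] - P[s_i >= j | z_i = 0] at each point j
    diff = np.array([(s[z == 1] >= j).mean() - (s[z == 0] >= j).mean()
                     for j in grid])
    # Normalize by the first stage, E[s | z = 1] - E[s | z = 0]
    first_stage = s[z == 1].mean() - s[z == 0].mean()
    return grid, diff / first_stage

# Hypothetical data: the instrument pushes 30% of people up one grade,
# capped at grade 12, so monotonicity holds by construction.
rng = np.random.default_rng(0)
n = 200_000
z = rng.integers(0, 2, n)
s0 = rng.integers(8, 13, n)            # potential schooling with z = 0
bump = rng.random(n) < 0.3
s1 = np.minimum(s0 + bump, 12)         # potential schooling with z = 1
s = np.where(z == 1, s1, s0)

grid, w = acr_weights(s, z, 12)
```

In this design the compliers are spread evenly over grades 9 to 12, so the estimated weights should be close to one quarter at each of those points and zero elsewhere. The weights sum to one exactly in any sample with integer-valued, non-negative $s_i$, because the first stage equals the sum of the CDF differences.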
The ACR theorem helps us understand what we are learning from a 2SLS estimate. For example, instrumental variables derived from compulsory attendance and child labor laws capture the causal effect of increases in schooling in the 6-12 grade range, but not from post-secondary schooling. This is illustrated in Figure 4.5.1, taken from Acemoglu and Angrist (2000).
The figure plots differences in the probability that educational attainment is at or exceeds the grade level on the X-axis (i.e., one minus the CDF). The differences are between men exposed to different child labor laws and compulsory schooling laws in a sample of white men aged 40-49 drawn from the 1960, 1970, and 1980 censuses. The instruments are coded as the number of years of schooling required either to work (Panel A) or leave school (Panel B) in the year the respondent was aged 14. Men exposed to the least restrictive laws are the reference group. Each instrument (e.g., a dummy for 7 years of schooling required before work is allowed) can be used to construct a Wald estimator by making comparisons with the reference group.
Panel A of Figure 4.5.1 shows that men exposed to more restrictive child labor laws were 1-6 percentage
Figure 4.5.1: The effect of compulsory schooling instruments on the probability of schooling (from Acemoglu and Angrist 2000). The figures show the difference in the probability of schooling at or exceeding the grade level on the x-axis. The reference group is 6 or fewer years of required schooling in the top panel, and 8 or fewer years in the bottom panel. The top panel shows the CDF difference by severity of child labor laws. The bottom panel shows the CDF difference by severity of compulsory attendance laws.
slope of this demand curve is $q_i'(p)$; if quantity and price are measured in logs, this is an elasticity.
The instruments in Angrist, Graddy, and Imbens (2000) are derived from data on weather conditions off the coast of Long Island, not too far from major commercial fishing grounds. Stormy weather makes it hard to catch fish, driving up the price, and reducing quantity demanded. Angrist, Graddy, and Imbens use dummy variables such as $\mathit{stormy}_i$, a dummy indicating periods with high wind and waves, to estimate the demand for fish. The data consist of daily observations on wholesale purchases of Whiting, a cheap fish used for fish cakes and things like that.
The Wald estimator using the $\mathit{stormy}_i$ instrument can be represented as
$$\frac{E[q_i \mid \mathit{stormy}_i = 1] - E[q_i \mid \mathit{stormy}_i = 0]}{E[p_i \mid \mathit{stormy}_i = 1] - E[p_i \mid \mathit{stormy}_i = 0]} \tag{4.5.6}$$
$$= \frac{\int E[q_i'(t) \mid p_{1i} \geq t > p_{0i}]\, P[p_{1i} \geq t > p_{0i}]\, dt}{\int P[p_{1i} \geq t > p_{0i}]\, dt}, \tag{4.5.7}$$
where $p_i$ is the price in market (day) $i$ and $p_{1i}$ and $p_{0i}$ are potential prices indexed by $\mathit{stormy}_i$. This is a weighted average derivative with weighting function $P[p_{1i} \geq t > p_{0i}] = P[p_i \leq t \mid z_i = 0] - P[p_i \leq t \mid z_i = 1]$ at price $t$. In other words, IV estimation using $\mathit{stormy}_i$ produces an average of the derivative $q_i'(t)$, with weight given to each possible price (indexed by $t$) in proportion to the instrument-induced change in the cumulative distribution function (CDF) of prices at that point. This is the same sort of averaging as in the ACR theorem except that now the underlying causal response is a derivative instead of a one-unit difference.
The average causal response formula, (4.5.6), comes from the fact that
$$E[q_i \mid \mathit{stormy}_i = 1] - E[q_i \mid \mathit{stormy}_i = 0] = E\left[\int_{p_{0i}}^{p_{1i}} q_i'(t)\, dt\right], \tag{4.5.8}$$
by the fundamental theorem of calculus. Two interesting special cases fall neatly out of equation (4.5.8). The first is when the causal response function is linear, i.e., $q_i(p) = \alpha_{0i} + \alpha_{1i} p$, for some random coefficients, $\alpha_{0i}$ and $\alpha_{1i}$. Then, we have
$$\frac{E[q_i \mid \mathit{stormy}_i = 1] - E[q_i \mid \mathit{stormy}_i = 0]}{E[p_i \mid \mathit{stormy}_i = 1] - E[p_i \mid \mathit{stormy}_i = 0]} = \frac{E[\alpha_{1i}(p_{1i} - p_{0i})]}{E[p_{1i} - p_{0i}]}, \tag{4.5.9}$$
a weighted average of the random coefficient, $\alpha_{1i}$. The weights are proportional to the price change induced
4.6. IV DETAILS
$\int P[p_{1i} \geq r > p_{0i}]\, dr$.
These special cases highlight the two types of averaging wrapped up in the ACR theorem and its continuous corollary, (4.5.6). First, there is averaging across markets, with weights proportional to the first-stage
constructing 2SLS estimates of a wage equation that treats AFQT scores (an ability test used by the armed forces) as an endogenous control variable to be instrumented. The instruments for AFQT are early schooling
therefore, use a nonlinear first stage in an attempt to come closer to the CEF. Suppose that we use Probit to model $E[d_i \mid X_i, Z_i]$. The Probit first stage is $\Phi[X_i'\pi_{p0} + \pi_{p1}'Z_i]$, where $\pi_{p0}$ and $\pi_{p1}$ are Probit coefficients, and the fitted values are $\hat{d}_{pi} = \Phi[X_i'\hat{\pi}_{p0} + \hat{\pi}_{p1}'Z_i]$. The forbidden regression in this case is the second stage equation created by substituting $\hat{d}_{pi}$ for $d_i$:
$$y_i = \alpha'X_i + \rho\hat{d}_{pi} + [\eta_i + \rho(d_i - \hat{d}_{pi})]. \tag{4.6.3}$$
The problem with (4.6.3) is that only OLS estimation of (4.6.2) is guaranteed to produce first-stage residuals that are uncorrelated with fitted values and covariates. If $E[d_i \mid X_i, Z_i] = \Phi[X_i'\pi_{p0} + \pi_{p1}'Z_i]$, then residuals from the nonlinear model will be asymptotically uncorrelated with $X_i$ and $\hat{d}_{pi}$, but who is to say that the
35 The insight that consistency of 2SLS estimates in a traditional SEM does not depend on correct specification of the first-stage CEF goes back to Kelejian (1971). Use of a nonlinear plug-in first-stage may not do too much damage in practice (a probit first-stage can be pretty close to linear), but why take a chance when you don't have to?
$$\ldots\ \bar{S}_{jt} + \rho s_i + u_{jt} + \varepsilon_{ijt}, \tag{4.6.6}$$
where $Y_{ijt}$ is the log weekly wage of individual $i$ in state $j$ in year $t$, $u_{jt}$ is a state-year error component, and $\varepsilon_i$ is an individual error term. The controls $\mu_j$ and $\lambda_t$ are state-of-residence and year effects. The coefficient … $\bar{S}_{jt}$, in state $j$ and year $t$.
In addition to the usual concerns about $s_i$, the most important identification problem raised by equation (4.6.6) is omitted variables bias from correlation between average schooling and other state-year effects embodied in the error component $u_{jt}$. For example, public university systems may expand during cyclical upturns, generating a common trend in state average schooling levels and state average earnings. Acemoglu and Angrist (2000) attempt to solve this problem using instrumental variables derived from historical compulsory attendance laws that are correlated with $\bar{S}_{jt}$ but uncorrelated with contemporary $u_{jt}$ and $\varepsilon_i$.
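While omitted state-year effects are the primary concern motivating Acemoglu and Angrist's (2000) instrumental variables estimation, the fact that one regressor, $\bar{S}_{jt}$, is the average of another regressor, $s_i$, …
$$\ldots\ \bar{S}_j + \nu_i, \quad \text{where } E[\nu_i s_i] = E[\nu_i \bar{S}_j] \equiv 0, \tag{4.6.7}$$
where $Y_{ij}$ is the log weekly wage of individual $i$ in state $j$ and $\bar{S}_j$ is average schooling in the state. Now, let $\psi_0$ denote the coefficient from a bivariate regression of $Y_{ij}$ on $s_i$ only, and let $\psi_1$ denote the coefficient from a bivariate regression of $Y_{ij}$ on $\bar{S}_j$ only. From the discussion of grouping and 2SLS earlier in this chapter, it's clear that $\psi_1$ is the 2SLS estimate of the coefficient on $s_i$ in a bivariate regression of $Y_{ij}$ on $s_i$ using a full set of state dummies as instruments. The Appendix uses this fact to show that the parameters in equation (4.6.7) can be written in terms of $\psi_0$ and $\psi_1$ as
$$\pi_0 = \psi_1 + \phi(\psi_0 - \psi_1) \tag{4.6.8}$$
$$\pi_1 = \phi(\psi_1 - \psi_0)$$
where $\phi = \frac{1}{1 - R^2} \geq 1$, and $R^2$ is the first-stage R-squared.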
The upshot of (4.6.8) is that if, for any reason, OLS estimates of the bivariate regression of wages on individual schooling differ from 2SLS estimates using state-dummy instruments, the coefficient on average schooling in (4.6.7) will be nonzero. For example, if instrumenting with state dummies corrects for attenuation bias due to measurement error in $s_i$, we have $\pi_1 > 0$ and the spurious appearance of positive external
36 The coefficient on average schooling in an equation with individual schooling can be interpreted as the Hausman (1978)
equation like 4.6.6, though more sophisticated strategies where both the individual and group averages are treated as endogenous may work.
A second and even more difficult peer effect to uncover is the effect of the group average of a variable on the individual level of this same variable. This is not really an IV problem; it takes us back to basic regression issues. To see this point, suppose that $\bar{S}_j$ is the high-school graduation rate in school $j$, and we …
$$\ldots\ \bar{S}_j + \nu_{ij}, \tag{4.6.9}$$
where $s_{ij}$ is individual $i$'s high school graduation status and $\bar{S}_j$ is the average high school graduation rate in school $j$, which $i$ attends.
At first blush, equation (4.6.9) seems like a sensible formulation of a well-defined causal question, but in fact it is nonsense. The regression of $s_{ij}$ on $\bar{S}_j$ always has a coefficient of 1, a conclusion that can be drawn immediately once you recognize $\bar{S}_j$ as the first-stage fitted value from a regression of $s_{ij}$ on a full set of school dummies.$^{37}$ Thus, an equation like (4.6.9) cannot possibly be informative about causal effects.
A modestly improved version of the bad peer regression changes (4.6.9) to
$$s_{ij} = \mu + \pi_4 \bar{S}_{(i)j} + \nu_{ij}, \tag{4.6.10}$$
where $\bar{S}_{(i)j}$ is the mean of $s_{ij}$ in school $j$, excluding student $i$. This is a step in the right direction (by definition, $i$ is not in the group used to construct $\bar{S}_{(i)j}$) but still problematic because $s_{ij}$ and $\bar{S}_{(i)j}$ are both affected by school-level random shocks. The presence of random effects in the error term raises important issues for statistical inference, issues discussed at length in Chapter 8. But in an equation like (4.6.10), group-level random shocks are more than a problem for standard errors: any shock common to the group (school) creates spurious peer effects. For example, particularly effective school principals may raise graduation rates for everyone in the schools at which they work. This looks like a peer effect since it induces … $\bar{S}_{(i)j}$
test statistic for the equality of OLS estimates and 2SLS estimates of private returns to schooling using state dummies as instruments. Borjas (1992) discusses a similar problem affecting the estimation of ethnic-background effects.
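37 Here is a direct proof that the coefficient from the regression of $s_{ij}$ on $\bar{S}_j$ is always unity, where $\bar{S}$ denotes the overall mean and $n_j$ the number of students in school $j$:
$$\frac{\sum_j \sum_i s_{ij}(\bar{S}_j - \bar{S})}{\sum_j n_j(\bar{S}_j - \bar{S})^2} = \frac{\sum_j (\bar{S}_j - \bar{S}) \sum_i s_{ij}}{\sum_j n_j(\bar{S}_j - \bar{S})^2} = \frac{\sum_j (\bar{S}_j - \bar{S})(n_j \bar{S}_j)}{\sum_j n_j(\bar{S}_j - \bar{S})^2} = 1.$$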
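The algebra in this proof is easy to confirm numerically. The sketch below uses made-up data (30 hypothetical schools, 2,000 students, an arbitrary graduation probability) and a small helper named `slope`, all of which are assumptions for illustration only. It regresses an individual outcome on its own group mean and recovers a slope of one regardless of how the data were generated.

```python
import numpy as np

rng = np.random.default_rng(1)
n_students, n_schools = 2000, 30
school = rng.integers(0, n_schools, size=n_students)   # school id j for each student
# Hypothetical graduation indicator s_ij; the probability varies by school
s = (rng.random(n_students) < 0.4 + 0.01 * school).astype(float)

# School means S_bar_j, assigned back to each student
counts = np.bincount(school, minlength=n_schools)
sums = np.bincount(school, weights=s, minlength=n_schools)
S_bar = (sums / counts)[school]

def slope(y, x):
    """Bivariate OLS slope of y on x (with an intercept)."""
    xc = x - x.mean()
    return (xc @ y) / (xc @ xc)

slope_on_group_mean = slope(s, S_bar)
```

Because the result is an algebraic identity rather than a statistical one, changing the seed or the data-generating process leaves `slope_on_group_mean` equal to one up to floating-point error.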
achievement. We therefore prefer not to see regressions like (4.6.10) either.
The best shot at a causal investigation of peer effects focuses on variation in ex ante peer characteristics, that is, some measure of peer quality which predates the outcome variable and is therefore unaffected by common shocks. A recent example is Ammermueller and Pischke (2006), who study the link between classmates' family background, as measured by the number of books in their homes, and student achievement in European primary schools. The Ammermueller and Pischke regressions are versions of
$$s_{ij} = \tilde{\mu} + \pi_4 \bar{B}_{(i)j} + \nu_{ij},$$
where $\bar{B}_{(i)j}$ is the average number of books in the home of student $i$'s peers. This looks like (4.6.10), but with an important difference. The variable $\bar{B}_{(i)j}$ is a feature of the home environment that predates test scores and is therefore unaffected by school-level random shocks.
Angrist and Lang (2004) provide another example of an attempt to link student achievement with the ex ante characteristics of peers. The Angrist and Lang study looks at the impact of bused-in low-achieving newcomers on high-achieving residents' test scores. The regression of interest in this case is a version of
$$s_{ij} = \mu + \pi_3 m_j + \nu_{ij}, \tag{4.6.11}$$
where $m_j$ is the number of bused-in low-achievers in school $j$ and $s_{ij}$ is resident-student $i$'s test score. Spurious correlation due to common shocks is not a concern in this context for two reasons. First, $m_j$ is a feature of the school population determined by students outside the sample used to estimate (4.6.11). Second, the number of low-achievers is an ex ante variable based on prior information about where the students come from and not the outcome variable, $s_{ij}$. School-level random effects remain an important issue for inference, however, since $m_j$ is a group-level variable.
4.6.3 Limited Dependent Variables Reprise
In Section 3.4.2, we discussed the consequences of limited dependent variables for regression models. When the dependent variable is binary or non-negative, say, employment status or hours worked, the CEF is typically nonlinear. Most nonlinear LDV models are built around a non-linear transformation of a linear latent index. Examples include Probit, Logit, and Tobit. These models capture features of the associated CEFs (e.g., Probit fitted values are guaranteed to be between zero and one, while Tobit fitted values are
across studies. Nonlinear latent-index models are more like GLS: they provide an efficiency gain when taken literally, but they require a commitment to functional form and distributional assumptions about which we do not usually feel strongly.$^{38}$
38 The analogy between nonlinear LDV models and GLS is more than rhetorical. Consider a Probit model with nonlinear CEF $E[y_i \mid X_i] = \Phi[X_i'\beta^*] \equiv p_i$. The first-order conditions for maximum likelihood estimation of this model are
$$\sum \frac{(y_i - p_i) X_i}{p_i(1 - p_i)} = 0.$$
Thus, maximum likelihood is the same as GLS estimation of the nonlinear model
$$y_i = \Phi[X_i'\beta^*] + \varepsilon_i.$$
Consistency of the maximum likelihood estimator turns on the assumption that the conditional variance of $y_i$ is $p_i(1 - p_i)$. It's worth noting that we can dispense with this assumption and simply fit $y_i$ to $\Phi[X_i'\beta^*]$ by nonlinear least squares (NLLS). This sort of agnostic NLLS shares the robustness properties of OLS; it gives the best MMSE fit in a class of approximating functions.
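of a third child in families with same-sex sibships.
An outcome of primary interest in this context is employment status, a Bernoulli random variable with …
$$\ldots,\ \frac{X_i'\gamma_0 + \gamma_1 z_i}{\sigma_v};\ \rho_{\varepsilon v}\Big) \tag{4.6.14}$$
$$+\ (1 - y_i) \ln\left[1 - \Phi_b\left(\frac{X_i'\beta_0 + \beta_1 d_i}{\sigma_\varepsilon},\ \frac{X_i'\gamma_0 + \gamma_1 z_i}{\sigma_v};\ \rho_{\varepsilon v}\right)\right],$$
where $\Phi_b(\cdot, \cdot; \rho_{\varepsilon v})$ is the bivariate normal distribution function with correlation coefficient $\rho_{\varepsilon v}$. Note, however, that we can multiply the latent index coefficients by a positive constant without changing the likelihood. The object of estimation is therefore the ratio of the index coefficients to the standard deviation of the error terms (e.g., $\beta_1/\sigma_\varepsilon$).
The potential outcomes defined by the bivariate Probit model are
$$y_{0i} = 1[X_i'\beta_0 > \varepsilon_i] \quad \text{and} \quad y_{1i} = 1[X_i'\beta_0 + \beta_1 > \varepsilon_i],$$
while potential treatment assignments are
$$d_{0i} = 1[X_i'\gamma_0 > v_i] \quad \text{and} \quad d_{1i} = 1[X_i'\gamma_0 + \gamma_1 > v_i].$$
As usual, only one potential outcome and one potential assignment is observed for any one person. It's also clear from this representation that correlation between $v_i$ and $\varepsilon_i$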
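while the average effect on the treated is
$$E[y_{1i} - y_{0i} \mid d_i = 1] = E\{1[X_i'\beta_0 + \beta_1 > \varepsilon_i] - 1[X_i'\beta_0 > \varepsilon_i] \mid X_i'\gamma_0 + \gamma_1 z_i > v_i\}.$$
Given alternative distributional assumptions for $v_i$ and $\varepsilon_i$, these can be anything (If the error terms are …
$$\ldots\ -\ \Phi\left[\frac{X_i'\beta_0}{\sigma_\varepsilon}\right]\Big\},$$
where $\Phi[\cdot]$ is the normal CDF. The effect on the treated is a little more complicated since it involves the bivariate normal CDF:
$$E[y_{1i} - y_{0i} \mid d_i = 1] \tag{4.6.16}$$
$$= E\left\{\frac{\Phi_b\left(\frac{X_i'\beta_0 + \beta_1}{\sigma_\varepsilon},\ \frac{X_i'\gamma_0 + \gamma_1 z_i}{\sigma_v};\ \rho_{\varepsilon v}\right) - \Phi_b\left(\frac{X_i'\beta_0}{\sigma_\varepsilon},\ \frac{X_i'\gamma_0 + \gamma_1 z_i}{\sigma_v};\ \rho_{\varepsilon v}\right)}{\Phi\left[\frac{X_i'\gamma_0 + \gamma_1 z_i}{\sigma_v}\right]}\right\}.$$
Since the bivariate normal CDF is a canned function in many software packages, this is easy enough to calculate in practice.
Bivariate Probit probably qualifies as harmless in the sense that it's not very complicated, and easy
39 Suppose the latent error term has an unknown distribution, with CDF $\Lambda[\cdot]$. The average causal effect in this case is
$$E\{\Lambda[X_i'\beta_0 + \beta_1] - \Lambda[X_i'\beta_0]\} = E\{\Lambda'[X_i'\beta_0 + \tilde{\beta}_1]\}\,\beta_1,$$
where $\tilde{\beta}_1$ is in $[0, \beta_1]$. This always depends on the shape of $\Lambda[\cdot]$.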
write LATE as
$$E[y_{1i} - y_{0i} \mid d_{1i} > d_{0i}] = E\{1[X_i'\beta_0 + \beta_1 > \varepsilon_i] - 1[X_i'\beta_0 > \varepsilon_i] \mid X_i'\gamma_0 + \gamma_1 > v_i \geq X_i'\gamma_0\},$$
which, like (4.6.16), can be evaluated using joint normality of $v_i$ and $\varepsilon_i$. But you needn't bother using normality to evaluate $E[y_{1i} - y_{0i} \mid d_{1i} > d_{0i}]$, since LATE can be estimated by IV for each $X_i$ and averaged … is -.137, reported in column 3. This is not surprising since the model without covariates imposes no functional form assumptions.
Perhaps more surprising is the fact that marginal effects and the average treatment effects calculated using (4.6.15) and (4.6.16) are also the same as the 2SLS and Abadie estimates. These results are reported in columns 4-6. The marginal effect calculated using a derivative to approximate the finite difference in (4.6.15) is -.138 (in column 4, labelled MFX for marginal effects), while both average treatment effects are -.139 in columns 5 and 6. Adding a few covariates has little effect on the estimates, as can be seen in Panel
Table 4.6.1: 2SLS, Abadie, and bivariate probit estimates of the effects of a third child on female labor supply

                          Abadie Estimates       Bivariate probit
              2SLS       Linear     Probit      MFX       ATE       TOT
              (1)        (2)        (3)         (4)       (5)       (6)
A. No Covariates
Employment    -0.138     -0.138     -0.137      -0.138    -0.139    -0.139
              (0.029)    (0.030)    (0.030)     (0.029)   (0.029)   (0.029)
B. Some covariates (no age controls)
Employment    -0.132     -0.132     -0.131      -0.135    -0.135    -0.135
              (0.029)    (0.029)    (0.028)     (0.028)   (0.028)   (0.028)
C. Some covariates plus age at first birth
Employment    -0.129     -0.129     -0.129      -0.133    -0.133    -0.133
              (0.028)    (0.028)    (0.028)     (0.026)   (0.026)   (0.026)
D. Some covariates plus age at first birth and a dummy for age > 30
Employment    -0.124     -0.125     -0.125      -0.131    -0.131    -0.131
              (0.028)    (0.029)    (0.029)     (0.025)   (0.025)   (0.025)
E. Some covariates plus age at first birth and age
Employment    -0.120     -0.121     -0.121      -0.171    -0.171    -0.171
              (0.028)    (0.026)    (0.026)     (0.023)   (0.023)   (0.023)

Notes: Adapted from Angrist (2001). The table compares 2SLS estimates to alternative IV-type estimates of the effect of childbearing on labor supply using nonlinear models. Standard errors for the Abadie estimates were bootstrapped using 100 replications of subsamples of size 20,000. MFX denotes marginal effects; ATE is the average treatment effect; TOT is the average effect of treatment on the treated.
B. In this case, the covariates are all dummy variables, three for race (black, Hispanic, and other), and two indicating first and second-born boys (the excluded instrument is the interaction of these two). Panels C and D show that adding a linear term in age at first birth and a dummy for maternal age also leaves the estimates unchanged.
The invariance to covariates seems desirable: since the same-sex instrument is essentially independent of the covariates, control for covariates is unnecessary to eliminate bias and should primarily affect precision. Yet, as Panel E shows, the marginal effects generated by bivariate Probit are sensitive to the list of covariates. Swapping a dummy indicating mothers over 30 with a linear age term increases the bivariate Probit estimates markedly, to -.171, while leaving 2SLS and the Abadie estimators unchanged. This probably reflects the fact that the linear age change induces an extrapolation into cells where there is little data. Although there is no harm in reporting the results in Panel E, it's hard to see why the more robust 2SLS and Abadie estimators should not be featured as most likely more reliable.$^{40}$
40 Angrist (2001) makes the same point using twins instruments, and reports a similar pattern in a comparison of 2SLS,
4.6.4 The Bias of 2SLS ★
It is a fortunate fact that the OLS estimator is not only consistent, it is also unbiased. This means that in a sample of any size, the estimated OLS coefficient vector has a distribution that is centered on the population coefficient vector.$^{41}$ The 2SLS estimator, in contrast, is consistent, but biased. This means that the 2SLS estimator only promises to be close to the causal effect of interest in large samples. In small samples, the 2SLS estimator can differ systematically from the population estimand.
For many years, applied researchers have lived with the knowledge that 2SLS is biased without losing too much sleep. Neither of us heard much about the bias of 2SLS in our graduate econometrics classes. A series of papers in the early 1990s changed this, however. These papers show that 2SLS estimates can be highly misleading in cases relevant for empirical practice.$^{42}$
The 2SLS estimator is most biased when the instruments are "weak," meaning the correlation with endogenous regressors is low, and when there are many over-identifying restrictions. When the instruments are both many and weak, the 2SLS estimator is biased towards the probability limit of the corresponding OLS estimate. In the worst-case scenario for many weak instruments, when the instruments are so weak that there really is no first-stage in the population, the 2SLS sampling distribution is centered on the probability limit of OLS. The theory behind this result is a little technical but the basic idea is easy to see. The source of the bias in 2SLS estimates is the randomness in estimates of the first-stage fitted values. In practice, the first-stage estimates reflect some of the randomness in the endogenous variable since the first-stage coefficients come from a regression of the endogenous variable on the instruments. If the population first-stage is zero, then all of the randomness in the first stage is due to the endogenous variable. This randomness turns into finite-sample correlation between first-stage fitted values and the second-stage errors, since the endogenous variable is correlated with the second-stage errors (or else you wouldn't be instrumenting in the first place).
A more formal derivation of 2SLS bias goes like this. To streamline the discussion we use matrices and
Abadie, and nonlinear structural estimates of models for hours worked. Angrist (1991) compares 2SLS and bivariate Probit estimates in sampling experiments.
41 A more precise statement is that OLS is unbiased when, either (a) the CEF is linear or, (b) the regressors are non-stochastic, i.e., fixed in repeated samples. In practice, these qualifications do not seem to matter much. As a rule, the sampling distribution of $\hat{\beta} = \left[\sum_i X_i X_i'\right]^{-1} \sum_i X_i y_i$ tends to be centered on the population analog, $\beta = E[X_i X_i']^{-1} E[X_i y_i]$, in samples of any size, whether or not the CEF is linear or the regressors are stochastic.
42 Key references are Nelson and Startz (1990a,b); Buse (1992); Bekker (1994); and especially Bound, Jaeger, and Baker (1995).
The $N \times q$ matrix of instrumental variables is $Z$, with the associated first-stage equation
$$x = Z\pi + \xi. \tag{4.6.18}$$
OLS estimates of (4.6.17) are biased because $\eta_i$ is correlated with $\xi_i$. The instruments, $Z_i$, are uncorrelated with $\xi_i$ by construction and uncorrelated with $\eta_i$ by assumption.
The 2SLS estimator is
$$\hat{\beta}_{2SLS} = (x'P_Z x)^{-1} x'P_Z y = \beta + (x'P_Z x)^{-1} x'P_Z \eta,$$
where $P_Z = Z(Z'Z)^{-1}Z'$ is the projection matrix that produces fitted values from a regression of $x$ on $Z$. Substituting for $x$ in $x'P_Z\eta$ …
$$\ldots\ \frac{\sigma_{\eta\xi}}{\sigma_\xi^2}\left[\frac{E(\pi'Z'Z\pi)/q}{\sigma_\xi^2} + 1\right]^{-1}$$
43 See Bekker (1994) and Angrist and Krueger (1995). This is also called a group-asymptotic approximation because it can be derived from an asymptotic sequence that lets the number of instruments go to infinity at the same time as the number of observations goes to infinity, thereby keeping the number of observations per instrument constant.
(see the appendix for a derivation). The term $(1/\sigma_\xi^2)E(\pi'Z'Z\pi)/q$ is the F-statistic for the joint significance of all regressors in the first stage regression.$^{44}$ Call this statistic $F$, so that we can write
$$E[\hat{\beta}_{2SLS} - \beta] \approx \frac{\sigma_{\eta\xi}}{\sigma_\xi^2}\,\frac{1}{F + 1}. \tag{4.6.21}$$
… $\sigma_{\eta\xi}/\sigma_\xi^2$. The bias of the OLS estimator is $\sigma_{\eta\xi}/\sigma_x^2$, which also equals $\sigma_{\eta\xi}/\sigma_\xi^2$ if $\pi = 0$. Thus, we have shown that 2SLS is centered on the same point as OLS when the first stage is zero. More generally, we can say 2SLS estimates are "biased towards" OLS estimates when there isn't much of a first stage. On the other hand, the bias of 2SLS vanishes when $F$
44 Sort of; the actual F-statistic is $(1/\hat{\sigma}_\xi^2)\hat{\pi}'Z'Z\hat{\pi}/q$, where hats denote estimates. $(1/\sigma_\xi^2)E(\pi'Z'Z\pi)/q$ is therefore sometimes called the population F-statistic since it's the F-statistic we'd get in an infinitely large sample. In practice, the distinction between population and sample F matters little in this context.
large-sample distribution as 2SLS (under constant effects) while providing finite-sample bias reduction. A number of estimators reduce the bias in overidentified 2SLS models. But an extensive Monte Carlo study by Flores-Lagunes (2007) suggests that LIML does at least as well as the alternatives in a wide range of circumstances (in terms of bias, mean absolute error, and the empirical rejection rates for $t$-tests). Another advantage of LIML is that many statistical packages compute it while other estimators typically require some programming.$^{45}$
45 LIML is available in SAS and in Stata 10. With weak instruments, LIML standard errors are not quite right, but Bekker (1994) gives a simple fix for this. Why is LIML unbiased? Expression (4.6.21) shows that the approximate bias of 2SLS is proportional to the bias of OLS. From this we conclude that there is a linear combination of OLS and 2SLS that is approximately unbiased. LIML turns out to be just such a "combination estimator". Like the bias of 2SLS, the approximate unbiasedness of LIML can be shown using a Bekker-style group-asymptotic sequence that fixes the ratio of instruments to sample size. It's worth mentioning, however, that LIML is biased in models with a certain type of heteroskedasticity; see Hausman, Newey, and Woutersen (2006) for details.
perform well and is centered around $\beta = 1$, with a bit more dispersion than in the $q = 2$ case.
Finally, Figure 4.6.3 reports simulation results from a model that is truly unidentified. In this case, we set $\pi_j = 0$; $j = 1, \ldots, 20$. Not surprisingly, all the sampling distributions are centered around the same value as OLS. On the other hand, the 2SLS sampling distribution is much tighter than the LIML distribution. We would say advantage-LIML in this case because the widely dispersed LIML sampling distribution correctly reflects the fact that the sample is uninformative about the parameter of interest.
46 A recent paper by Chernozhukov and Hansen (2007) formalizes this maxim.
Table 4.6.2: Alternative IV estimates of the economic returns to schooling

                                      (1)       (2)       (3)       (4)       (5)       (6)
2SLS                                  0.105     0.435     0.089     0.076     0.093     0.091
                                      (0.020)   (0.450)   (0.016)   (0.029)   (0.009)   (0.011)
LIML                                  0.106     0.539     0.093     0.081     0.106     0.110
                                      (0.020)   (0.627)   (0.018)   (0.041)   (0.012)   (0.015)
F-statistic (excluded instruments)    32.27     0.42      4.91      1.61      2.58      1.97
Controls
  Year of birth                       X         X         X         X         X         X
  State of birth                                                              X         X
  Age, Age squared                              X                   X                   X
Excluded Instruments
  Quarter of birth                    X         X
  Quarter of birth*year of birth                          X         X         X         X
  Quarter of birth*state of birth                                             X         X
Number of excluded instruments        3         2         30        28        180       178
159
The…rstcolumninthetablereports2SLSandLIMLestimatesofamodelusingthreequarterofbirth
dummiesasinstrumentswithyearofbirthdummiesascovariates.TheOLSestimateforthisspeci…cation
is0.071,whilethe2SLSestimateisabithigherat0.105.The…rst-stageF-statisticisover32,wellabove
thedangerzone.Notsurprisingly,theLIMLestimateisalmostidenticalto2SLSinthiscase.
AngristandKrueger(1991)experimentedwithmodelsthatincludeageandagesquaredmeasuredin
quartersasadditionalcontrols.Thesecontrolsaremeanttopickupomittedagee¤ectsthatmightconfound
thequarter-of-birthinstruments.Theadditionofageandagesquaredreducesthenumberofinstruments
totwo,sinceageinquarters,yearofbirth,andquarterofbirtharelinearlydependent.Asshownincolumn
2,the…rststageF-statisticdropsto0.4whenageandagesquaredareincludedascontrols,asuresignof
trouble.Butthe2SLSstandarderrorishighenoughthatwewouldnotdrawanysubstantiveconclusions
fromthisestimate.TheLIMLestimateisevenlessprecise.Thismodelise¤ectivelyunidenti…ed.
47
CruzandMoreira(2005)similarlyconcludethat,lowF-statisticsnotwithstanding,thereislittlebiasintheAngristand
Krueger(1991)180-instrumentspeci…cations.
Figure 4.6.1: Distribution of the OLS, IV, 2SLS, and LIML estimators. IV uses one instrument, while 2SLS and LIML use two instruments.
4.7 Appendix
Derivation of Equation (4.6.8)
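Rewrite equation (4.6.7) as follows
$$Y_{ij} = \mu + \pi_0 \tau_i + (\pi_0 + \pi_1)\bar{S}_j + \nu_i,$$
where $\tau_i \equiv s_i - \bar{S}_j$. Since $\tau_i$ and $\bar{S}_j$ are uncorrelated by construction, we have:
$$\psi_1 = \pi_0 + \pi_1,$$
$$\pi_0 = \frac{C(\tau_i, Y_{ij})}{V(\tau_i)}.$$
Simplifying the second line,
$$\pi_0 = \frac{C[(s_i - \bar{S}_j), Y_{ij}]}{V(s_i) - V(\bar{S}_j)} = \left[\frac{C(s_i, Y_{ij})}{V(s_i)}\right]\left[\frac{V(s_i)}{V(s_i) - V(\bar{S}_j)}\right] - \left[\frac{C(\bar{S}_j, Y_{ij})}{V(\bar{S}_j)}\right]\left[\frac{V(\bar{S}_j)}{V(s_i) - V(\bar{S}_j)}\right]$$
$$= \psi_0\phi + \psi_1(1 - \phi) = \psi_1 + \phi(\psi_0 - \psi_1),$$
where $\phi \equiv \frac{V(s_i)}{V(s_i) - V(\bar{S}_j)}$. Solving for $\pi_1$, we have
$$\pi_1 = \psi_1 - \pi_0 = \phi(\psi_1 - \psi_0).$$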
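Since the derivation above uses only the algebra of least squares, the two identities hold exactly in any sample, and this can be checked directly. The sketch below is an illustration on simulated data; the data-generating process, the sample sizes, and the helper `ols` are assumptions of the example, not part of the original text. It computes $\pi_0$ and $\pi_1$ from the regression on both $s_i$ and $\bar{S}_j$, and compares them with the expressions built from the two bivariate slopes and $\phi = 1/(1 - R^2)$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, J = 5000, 40
state = rng.integers(0, J, n)                      # state j of each individual
# Hypothetical schooling and log wages with state-level components
s = 12 + rng.normal(0, 1, J)[state] + rng.normal(0, 2, n)
Y = 1.0 + 0.08 * s + rng.normal(0, 0.1, J)[state] + rng.normal(0, 0.5, n)

# State average schooling, assigned back to individuals
S_bar = (np.bincount(state, weights=s, minlength=J)
         / np.bincount(state, minlength=J))[state]

def ols(y, *cols):
    """OLS coefficients of y on a constant and the given regressors."""
    X = np.column_stack([np.ones(len(y)), *cols])
    return np.linalg.lstsq(X, y, rcond=None)[0]

pi0, pi1 = ols(Y, s, S_bar)[1:]    # regression on s_i and S_bar_j together
psi0 = ols(Y, s)[1]                # bivariate regression on s_i only
psi1 = ols(Y, S_bar)[1]            # bivariate regression on S_bar_j only

# First-stage R-squared from regressing s_i on state dummies
R2 = S_bar.var() / s.var()
phi = 1.0 / (1.0 - R2)

pi0_formula = psi1 + phi * (psi0 - psi1)
pi1_formula = phi * (psi1 - psi0)
```

In this simulated design the state-level wage component is unrelated to schooling, yet `pi1` is generally nonzero in any finite sample whenever `psi0` and `psi1` differ, which is the point the text makes about spurious external returns.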
Figure 4.6.2: Distribution of the OLS, 2SLS, and LIML estimators with 20 instruments
Figure 4.6.3: Distribution of the OLS, 2SLS, and LIML estimators with 20 worthless instruments
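Derivation of the approximate bias of 2SLS
Start from the last equality in (4.6.20):
$$E[\hat{\beta}_{2SLS} - \beta] \approx \left[E(\pi'Z'Z\pi) + E(\xi'P_Z\xi)\right]^{-1} E(\xi'P_Z\eta).$$
The magic of linear algebra helps us simplify this expression: the term $\xi'P_Z\xi$ is a scalar and therefore equal to its trace; the trace is a linear operator which passes through expectations and is invariant to cyclic permutations; finally, the trace of $P_Z$, an idempotent matrix, is equal to its rank, $q$. Using these facts, we have
$$E(\xi'P_Z\xi) = E[\mathrm{tr}(\xi'P_Z\xi)] = E[\mathrm{tr}(P_Z\xi\xi')] = \mathrm{tr}(P_Z E[\xi\xi']) = \mathrm{tr}(P_Z\sigma_\xi^2 I) = \sigma_\xi^2\,\mathrm{tr}(P_Z) = \sigma_\xi^2 q,$$
where we have assumed that $\xi_i$ is homoskedastic. Similarly, applying the trace trick to $\xi'P_Z\eta$ shows that this term is equal to $\sigma_{\eta\xi}q$. Therefore,
$$E[\hat{\beta}_{2SLS} - \beta] \approx \left[E(\pi'Z'Z\pi) + \sigma_\xi^2 q\right]^{-1} E[\mathrm{tr}(\xi'P_Z\eta)] = \sigma_{\eta\xi}q\left[E(\pi'Z'Z\pi) + \sigma_\xi^2 q\right]^{-1} = \frac{\sigma_{\eta\xi}}{\sigma_\xi^2}\left[\frac{E(\pi'Z'Z\pi)/q}{\sigma_\xi^2} + 1\right]^{-1}.$$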
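The two facts doing the work in this derivation, $\mathrm{tr}(P_Z) = q$ and $E(\xi'P_Z\xi) = \sigma_\xi^2 q$, can be checked with a few lines of linear algebra. The sketch below is illustrative only: the instrument matrix is random noise, and the function name `approx_2sls_bias` is an assumed label for the right-hand side of (4.6.21), not something defined in the original text.

```python
import numpy as np

rng = np.random.default_rng(3)
N, q = 200, 5
Z = rng.normal(size=(N, q))                      # hypothetical instrument matrix
P_Z = Z @ np.linalg.solve(Z.T @ Z, Z.T)          # projection matrix Z (Z'Z)^{-1} Z'

trace_PZ = np.trace(P_Z)                         # equals the rank, q

# Monte Carlo check of E[xi' P_Z xi] = sigma_xi^2 * q under homoskedasticity
sigma_xi = 2.0
draws = rng.normal(0.0, sigma_xi, size=(4000, N))
quad_forms = ((draws @ P_Z) * draws).sum(axis=1)  # xi' P_Z xi for each draw
mc_mean = quad_forms.mean()                       # should be near 4 * 5 = 20

def approx_2sls_bias(sigma_eta_xi, sigma_xi2, F):
    """Approximate 2SLS bias from (4.6.21): (sigma_eta_xi / sigma_xi^2) / (F + 1)."""
    return (sigma_eta_xi / sigma_xi2) / (F + 1.0)

bias_no_first_stage = approx_2sls_bias(0.5, 1.0, 0.0)   # equals sigma_eta_xi / sigma_xi^2
bias_f_of_nine = approx_2sls_bias(0.5, 1.0, 9.0)        # one tenth of that
```

Reading the last two lines against the text: with a population F of zero the approximate bias is the full $\sigma_{\eta\xi}/\sigma_\xi^2$, and it shrinks in proportion to $1/(F+1)$ as the first stage strengthens.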
Multivariate first-stage F-statistics
Assume any exogenous covariates have been partialled out of the instrument list and that there are two endogenous variables, $x_1$ and $x_2$, with coefficients $\beta_1$ and $\beta_2$. We are interested in the bias of the 2SLS estimator of $\beta_2$ when $x_1$ is also treated as endogenous. The second stage equation is
$$y = P_Z x_1\beta_1 + P_Z x_2\beta_2 + [\eta + (x_1 - P_Z x_1)\beta_1 + (x_2 - P_Z x_2)\beta_2], \tag{4.7.1}$$
where $P_Z x_1$ and $P_Z x_2$ are the first-stage fitted values from regressions of $x_1$ and $x_2$ on $Z$. By the usual anatomy formula for multivariate regression, $\beta_2$ in (4.7.1) is the bivariate regression of $y$ on the residual from a regression of $P_Z x_2$ on $P_Z x_1$. This residual is
$$[I - P_Z x_1(x_1'P_Z x_1)^{-1}x_1'P_Z]P_Z x_2 = M_{1z}P_Z x_2,$$
where $M_{1z} = [I - P_Z x_1(x_1'P_Z x_1)^{-1}x_1'P_Z]$ is the relevant residual-maker matrix. In addition, note that
$$M_{1z}P_Z x_2 = P_Z[M_{1z}x_2].$$
From here we conclude that the 2SLS estimator of $\beta_2$ is the OLS regression on $P_Z[M_{1z}x_2]$, in other words, OLS on the fitted values from a regression of $M_{1z}x_2$ on $Z$. This is the same as 2SLS using $P_Z$ to instrument $M_{1z}x_2$. So the 2SLS estimator of $\beta_2$ can be written
$$[x_2'M_{1z}P_Z M_{1z}x_2]^{-1}x_2'M_{1z}P_Z y = \beta_2 + [x_2'M_{1z}P_Z M_{1z}x_2]^{-1}x_2'M_{1z}P_Z\eta.$$
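The partialling-out representation above can be verified against a direct 2SLS calculation. The following sketch uses simulated data with two endogenous regressors and four instruments; every number in the data-generating process is an arbitrary assumption chosen for the example. It compares the second element of the usual matrix 2SLS estimator with the expression $[x_2'M_{1z}P_Z M_{1z}x_2]^{-1}x_2'M_{1z}P_Z y$.

```python
import numpy as np

rng = np.random.default_rng(4)
N, q = 500, 4
Z = rng.normal(size=(N, q))                       # instruments (covariates already partialled out)
u = rng.normal(size=(N, 3))
x1 = Z @ np.array([1.0, 0.5, 0.0, 0.2]) + u[:, 0]
x2 = Z @ np.array([0.0, 0.3, 1.0, -0.4]) + 0.5 * u[:, 0] + u[:, 1]
eta = 0.6 * u[:, 0] + 0.6 * u[:, 1] + u[:, 2]     # correlated with both x1 and x2
y = 1.5 * x1 - 0.7 * x2 + eta

P_Z = Z @ np.linalg.solve(Z.T @ Z, Z.T)

# Direct 2SLS with both endogenous regressors
X = np.column_stack([x1, x2])
beta_2sls = np.linalg.solve(X.T @ P_Z @ X, X.T @ P_Z @ y)

# Partialled-out version: residual-maker built from the fitted values P_Z x1
Pz_x1 = P_Z @ x1
M_1z = np.eye(N) - np.outer(Pz_x1, Pz_x1) / (x1 @ Pz_x1)
beta2_partial = (x2 @ M_1z @ P_Z @ y) / (x2 @ M_1z @ P_Z @ M_1z @ x2)
```

The agreement is exact up to rounding because $M_{1z}$ and $P_Z$ commute and $M_{1z}P_Z x_1 = 0$, which is what removes $\beta_1$ from the expression.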
Chapter 5
Parallel Worlds: Fixed Effects, Differences-in-differences, and Panel Data
The first thing to realize about parallel universes ... is that they are not parallel.
Douglas Adams, Mostly Harmless (1995)
The key to causal inference in chapter 3 is control for observed confounding factors. If important
CHAPTER 5. FIXED EFFECTS, DD, AND PANEL DATA
i.e. union status is as good as randomly assigned conditional on unobserved worker ability, $A_i$, and other observed covariates $X_{it}$, like age and schooling.
The key to fixed-effects estimation is the assumption that the unobserved $A_i$ appears without a time subscript in a linear model for $E(y_{0it} \mid A_i, X_{it}, t)$:
$$E(y_{0it} \mid A_i, X_{it}, t) = \alpha + \lambda_t + A_i'\gamma + X_{it}\beta. \tag{5.1.1}$$
Finally, we assume that the causal effect of union membership is additive and constant:
$$E(y_{1it} \mid A_i, X_{it}, t) = E(y_{0it} \mid A_i, X_{it}, t) + \rho.$$
This implies
$$E(y_{it} \mid A_i, X_{it}, t, d_{it}) = \alpha + \lambda_t + \rho d_{it} + A_i'\gamma + X_{it}\beta, \tag{5.1.2}$$
where $\rho$ is the causal effect of interest. The set of assumptions leading to (5.1.2) is more restrictive than those we used to motivate regression in Chapter 3; we need the linear, additive functional form to make headway on the problem of unobserved confounders using panel data with no instruments.$^{1}$
Equation (5.1.2) implies
$$y_{it} = \alpha_i + \lambda_t + \rho d_{it} + X_{it}\beta + \varepsilon_{it}, \tag{5.1.3}$$
where $\alpha_i \equiv \alpha + A_i'\gamma$. This is a fixed-effects model. Given panel data, i.e., repeated observations on individuals, the causal effect of union status on wages can be estimated by treating $\alpha_i$
1 In some cases, we can allow heterogeneous treatment effects so that $E(y_{1it} - y_{0it} \mid A_i, X_{it}, t) = \rho_i$. See, e.g., Wooldridge (2005), who discusses estimators for the average of $\rho_i$.
2 An alternative to the fixed-effects specification is "random effects" (See, e.g., Wooldridge, 2006). The random-effects model assumes that $\alpha_i$ is uncorrelated with the regressors. Because the omitted variable in a random-effects model is uncorrelated with included regressors there is no bias from ignoring it; in effect, it becomes part of the residual. The most important consequence of random effects is that the residuals for a given person are correlated across periods. Chapter 8 discusses the implications of this for standard errors. An alternative approach is GLS, which promises to be more efficient if the assumptions of the random-effects model are satisfied (linear CEF, homoskedasticity). We prefer OLS/fix-the-standard-errors to GLS under random-effects assumptions. As discussed in Section 3.4.1, GLS requires stronger assumptions than those we are comfortable with and the resulting efficiency gain is likely to be modest.
5.1. INDIVIDUAL FIXED EFFECTS
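$$\bar{y}_i = \alpha_i + \bar{\lambda} + \rho\bar{d}_i + \bar{X}_i\beta + \bar{\varepsilon}_i.$$
Subtracting this from (5.1.3) gives
$$y_{it} - \bar{y}_i = \lambda_t - \bar{\lambda} + \rho(d_{it} - \bar{d}_i) + (X_{it} - \bar{X}_i)\beta + (\varepsilon_{it} - \bar{\varepsilon}_i), \tag{5.1.4}$$
so deviations from means kills the unobserved individual effects.$^{3}$
An alternative to deviations from means is differencing. In other words, we estimate,
$$\Delta y_{it} = \Delta\lambda_t + \rho\Delta d_{it} + \Delta X_{it}\beta + \Delta\varepsilon_{it}, \tag{5.1.5}$$
where the $\Delta$ prefix denotes the change from one year to the next. For example, $\Delta y_{it} = y_{it} - y_{it-1}$. With two periods, differencing is algebraically the same as deviations from means, but not otherwise. Both should work, although with homoskedastic and serially uncorrelated $\varepsilon_{it}$ deviations from means is more efficient. You might find differencing more convenient if you have to do it by hand, though the differenced standard errors should be adjusted for the fact that the differenced residuals are serially correlated.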
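The equivalences described here are easy to see in a small simulated panel. The sketch below is a toy example under stated assumptions: 100 hypothetical workers observed for two periods, a true union effect of 0.15, and union status that is more likely for workers with a high unobserved fixed effect. It estimates the coefficient on $d_{it}$ three ways: deviations from person means, a regression with a full set of person dummies, and first differences.

```python
import numpy as np

rng = np.random.default_rng(5)
N, T = 100, 2
person = np.repeat(np.arange(N), T)               # rows ordered person by person
period = np.tile(np.arange(T), N).astype(float)   # period indicator (0 or 1)
alpha = rng.normal(size=N)                        # unobserved individual effects
d = (rng.random(N * T) < 0.3 + 0.2 * (alpha[person] > 0)).astype(float)
y = alpha[person] + 0.1 * period + 0.15 * d + rng.normal(0, 0.3, N * T)

def demean(v):
    """Subtract each person's own mean from v."""
    means = np.bincount(person, weights=v) / np.bincount(person)
    return v - means[person]

# (a) Deviations from means: regress demeaned y on demeaned d and period
X_within = np.column_stack([demean(d), demean(period)])
rho_within = np.linalg.lstsq(X_within, demean(y), rcond=None)[0][0]

# (b) Dummy-variable regression: d, period, and one dummy per person
person_dummies = (person[:, None] == np.arange(N)[None, :]).astype(float)
X_lsdv = np.column_stack([d, period, person_dummies])
rho_lsdv = np.linalg.lstsq(X_lsdv, y, rcond=None)[0][0]

# (c) First differences: with T = 2 the period effect becomes the constant
dy = y[period == 1] - y[period == 0]
dd = d[period == 1] - d[period == 0]
X_fd = np.column_stack([dd, np.ones(N)])
rho_fd = np.linalg.lstsq(X_fd, dy, rcond=None)[0][0]
```

All three estimates coincide here because $T = 2$; with more periods the first two would still agree with each other, while the first-difference estimate would generally differ, as the text notes.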
Someregressionpackagesautomatethedeviations-from-meansestimator,withanappropriatestandard-
erroradjustmentforthedegreesoffreedomslostinestimating
N
individualmeans.Thisisallthat’sneeded
3 Why is deviations from means the same as estimating each fixed effect in (5.1.3)? Because, by the regression anatomy formula, (3.1.3), any set of multivariate regression coefficients can be estimated in two steps. To get the multivariate coefficient on one set of variables, first regress them on all the other included variables, then regress the original dependent variable on the residuals from this first step. The residuals from a regression on a full set of person-dummies in a person-year panel are deviations from person means.
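The regression-anatomy argument in this footnote can be illustrated with a short simulation. This is a sketch under assumed values (40 people, 4 periods, a true effect of 0.3); it compares the coefficient from a regression with a full set of person dummies to the two-step version that uses deviations from person means.

```python
import numpy as np

rng = np.random.default_rng(1)
n, T = 40, 4                                   # assumed panel dimensions
person = np.repeat(np.arange(n), T)
alpha = rng.normal(size=n)[person]             # individual effects, one per person
d = (rng.normal(size=n * T) + alpha > 0).astype(float)
y = alpha + 0.3 * d + rng.normal(size=n * T)   # 0.3 is a hypothetical effect

# (a) Regress y on d and a full set of person dummies.
dummies = (person[:, None] == np.arange(n)).astype(float)
b_dummies = np.linalg.lstsq(np.column_stack([d, dummies]), y, rcond=None)[0][0]

# (b) Regression anatomy: residualize d on the dummies (i.e., subtract person
#     means), then regress y on that residual.
def demean(v):
    means = np.bincount(person, weights=v) / T
    return v - means[person]

d_tilde = demean(d)
b_anatomy = (d_tilde @ y) / (d_tilde @ d_tilde)

print(b_dummies, b_anatomy)
```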
4 The fixed effects are not estimated consistently in a panel where the number of periods \(T\) is fixed while \(N \to \infty\). This is called the "incidental parameters problem," a name which reflects the fact that the number of parameters grows with the sample size. Nevertheless, other parameters in the fixed effects model, the ones we care about, are consistently estimated.
fixed effects estimates (ranging from .10-.20). This may indicate positive selection bias in the cross-section estimates, though selection bias is not the only explanation for the lower fixed-effects estimates.
Table 5.1.1: Estimated effects of union status on log wages

Survey                                                Cross section estimate   Fixed effects estimate
May CPS, 1974-75                                      0.19                     0.09
National Longitudinal Survey of Young Men, 1970-78    0.28                     0.19
Michigan PSID, 1970-79                                0.23                     0.14
QES, 1973-77                                          0.14                     0.16

Notes: Adapted from Freeman (1984). The table reports cross-section and panel estimates of the union relative wage effect. The estimates were calculated using the surveys listed at left. The cross-section estimates include controls for demographic and human capital variables.
Although they control for a certain type of omitted variable, fixed-effects estimates are notoriously susceptible to attenuation bias from measurement error. On one hand, economic variables like union status tend to be persistent (a worker who is a union member this year is most likely a union member next year). On the other hand, measurement error often changes from year-to-year (union status may be misreported or miscoded this year but not next year). Therefore, while union status may be misreported or miscoded for only a few workers in any single year, the observed year-to-year changes in union status may be mostly noise. In other words, there is more measurement error in the regressors in an equation like (5.1.5) or (5.1.4) than in the levels of the regressors. This fact may account for smaller fixed-effects estimates.⁵
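The attenuation argument can be made concrete with a small simulation. The sketch below assumes a persistent binary status (5 percent of workers switch between years), a 3 percent misclassification rate that is independent across years, and a true effect of 0.2; none of these numbers come from the studies discussed here. The individual effect is drawn independently of status so that any shortfall in the estimates reflects measurement error alone.

```python
import numpy as np

rng = np.random.default_rng(2)
n, rho = 200_000, 0.2                       # assumed sample size and true effect
alpha = rng.normal(scale=0.3, size=n)       # individual effect, independent of status here

# Persistent true status: 30% members in year 1, only 5% switch in year 2.
d1 = rng.random(n) < 0.3
switch = rng.random(n) < 0.05
d2 = np.where(switch, ~d1, d1)
d_true = np.column_stack([d1, d2]).astype(float)
y = alpha[:, None] + rho * d_true + rng.normal(scale=0.3, size=(n, 2))

# Status is misreported for 3% of workers each year, independently across years.
flip = rng.random((n, 2)) < 0.03
d_obs = np.where(flip, 1 - d_true, d_true)

def slope(x, y):
    """Bivariate regression slope of y on x."""
    x = x - x.mean()
    return (x @ (y - y.mean())) / (x @ x)

b_levels = slope(d_obs[:, 0], y[:, 0])                          # cross-section
b_diff = slope(d_obs[:, 1] - d_obs[:, 0], y[:, 1] - y[:, 0])    # first differences

print(b_levels, b_diff)
```

In this setup the levels estimate is only mildly attenuated, while the differenced estimate falls well below the true effect because most observed changes in status are reporting errors.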
A variant on the measurement-error problem arises from the fact that the differencing and deviations-from-means estimators used to control for fixed effects typically remove both good and bad variation. In other words, these transformations may kill some of the omitted-variables-bias bathwater, but they also remove much of the useful information in the baby, the variable of interest. An example is the use of twins to estimate the causal effect of schooling on wages. Although there is no time dimension to this problem, the basic idea is the same as the union problem discussed above: twins have similar but largely unobserved family and genetic backgrounds. We can therefore control for their common family background by including a family fixed effect in samples of pairs of twins.
5 See Griliches and Hausman (1986) for a more complete analysis of measurement error in panel data.
neither is the difference in their schooling. Hence, a small amount of unobserved ability differences among twins could be responsible for substantial bias in the resulting estimates.

What should be done about measurement error and related problems in models with fixed effects? A possible fix-up for measurement error is instrumental variables. Ashenfelter and Krueger (1994) use cross-sibling reports to construct instruments for schooling differences across twins. For example, they use each twin's report of his brother's schooling as an instrument for self-reports. A second approach is to bring in external information on the extent of measurement error and adjust naive estimates accordingly. In a study of union wage effects, Card (1996) uses external information from a separate validation survey to adjust panel-data estimates for measurement error in reported union status. But data from multiple reports and repeated measures of the sort used by Ashenfelter and Krueger (1994) and Card (1996) are unusual. At a
DD is a version of fixed-effects estimation using aggregate data.⁶

6 The DD idea is at least as old as IV. Kennan (1995) references a 1915 BLS report using DD to study the employment effects of the minimum wage (Obenauer and von der Nienburg, 1915).
is the causal effect of interest. This is easily estimated using the sample analog of the population means.

Table 5.2.1: Average employment per store before and after the New Jersey minimum wage increase

                                   PA        NJ        Difference, NJ-PA
Variable                           (i)       (ii)      (iii)
1. FTE employment before,          23.33     20.44     -2.89
   all available observations      (1.35)    (0.51)    (1.44)
2. FTE employment after,           21.17     21.03     -0.14
   all available observations      (0.94)    (0.52)    (1.07)
3. Change in mean FTE              -2.16     0.59      2.76
   employment                      (1.25)    (0.54)    (1.36)

Notes: Adapted from Card and Krueger (1994), Table 3. The table reports average full-time equivalent (FTE) employment at restaurants in Pennsylvania and New Jersey before and after a minimum wage increase in New Jersey. The sample consists of all stores with data on employment. Employment at six closed stores is set to zero. Employment at four temporarily closed stores is treated as missing. Standard errors are reported in parentheses.
Table 5.2.1 (based on Table 3 in Card and Krueger, 1994) shows average employment at fast food restaurants in New Jersey and Pennsylvania before and after the change in the New Jersey minimum wage. There are four cells in the first two rows and columns, while the margins show state differences in each period, the changes over time in each state, and the difference-in-differences. Employment in Pennsylvania restaurants is somewhat higher than in New Jersey in February but falls by November. Employment in New Jersey, in contrast, increases slightly. These two changes produce a positive difference-in-differences, the opposite of what we might expect if a higher minimum wage pushes businesses up the labor demand curve.
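The difference-in-differences in Table 5.2.1 can be reproduced from the four cell means. The short calculation below uses the rounded means printed in the table, so it gives about 2.75 rather than the reported 2.76, which was presumably computed from unrounded data.

```python
# Mean FTE employment per store, from the first two rows of Table 5.2.1.
means = {
    ("PA", "before"): 23.33, ("PA", "after"): 21.17,
    ("NJ", "before"): 20.44, ("NJ", "after"): 21.03,
}

# Route 1: change over time in each state, then difference the changes.
change_nj = means[("NJ", "after")] - means[("NJ", "before")]
change_pa = means[("PA", "after")] - means[("PA", "before")]
dd_from_changes = change_nj - change_pa

# Route 2: NJ-PA gap in each period, then difference the gaps.
gap_before = means[("NJ", "before")] - means[("PA", "before")]
gap_after = means[("NJ", "after")] - means[("PA", "after")]
dd_from_gaps = gap_after - gap_before

print(round(dd_from_changes, 2), round(dd_from_gaps, 2))
```

Both routes give the same number because the difference-in-differences is symmetric in the order of differencing.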
How convincing is this evidence against the standard labor-demand story? The key identifying assumption here is that employment trends would be the same in both states in the absence of treatment. Treatment induces a deviation from this common trend, as illustrated in Figure 5.2.1. Although the treatment and control states can differ, this difference is captured by the state fixed effect, which plays the same role as the unobserved individual effect in (5.1.3).⁷

The common trends assumption can be investigated using data on multiple periods. In an update of their
7 The common trends assumption can be applied to transformed data, for example, \( E(\log y_{0ist} \mid s, t) = \gamma_s + \lambda_t \). Note, however, that if there is a common trend in logs, there will not be one in levels and vice versa. Athey and Imbens (2006) introduce a semi-parametric DD estimator that allows for common trends after an unknown transformation, which they propose to use the data to estimate. Poterba, Venti and Wise (1995) and Meyer, Viscusi, and Durbin (1995) discuss DD-type models for quantiles.
Figure 5.2.1: Causal effects in the differences-in-differences model

original minimum wage study, Card and Krueger (2000) obtained administrative payroll data for restaurants in New Jersey and Pennsylvania for a number of years. These data are shown here in Figure 5.2.2, similar to Figure 2 in their follow-up study. The vertical lines indicate the dates when their original surveys were conducted, and the third vertical line denotes the increase in the federal minimum wage to $4.75 in October 1996, which affected Pennsylvania but not New Jersey. These data give us an opportunity to look at a new minimum wage "experiment".

Like the original Card and Krueger survey, the administrative data show a slight decline in employment from February to November 1992 in Pennsylvania, and little change in New Jersey over the same period. However, the data also reveal fairly substantial year-to-year employment variation in other periods. These swings often seem to differ substantially in the two states. In particular, while employment levels in New Jersey and Pennsylvania were similar at the end of 1991, employment in Pennsylvania fell relative to employment in New Jersey over the next three years (especially in the 14-county group), mostly before the 1996 change in Federal minimum. So Pennsylvania may not provide a very good measure of counterfactual employment rates in New Jersey in the absence of a policy change, and vice versa.

A more encouraging example comes from Pischke (2007), who looks at the effect of school term length on student performance using variation generated by a sharp policy change in Germany. Until the 1960s, children in all German states except Bavaria started school in the Spring. Beginning in the 1966-67 school year, the Spring-starters moved to start school in the Fall. The transition to a Fall start required two short school years for affected cohorts, 24 weeks long instead of 37. Students in these cohorts effectively had their time in school compressed relative to cohorts on either side and relative to students in Bavaria, which
Figure 5.2.2: Employment in New Jersey and Pennsylvania fast-food restaurants, October 1991 to September 1997 (from Card and Krueger 2000). Vertical lines indicate dates of the original Card and Krueger (1994) survey and the October 1996 federal minimum-wage increase.

already had a Fall start.
in November (i.e., after the minimum wage change). Then

\[ y_{ist} = \alpha + \gamma NJ_s + \lambda d_t + \delta (NJ_s \cdot d_t) + \varepsilon_{ist} \tag{5.2.3} \]

is the same as (5.2.2) where \( NJ_s \cdot d_t = d_{st} \). In the language of Section 3.1.4, this model includes two main effects for state and year and an interaction term that marks observations from New Jersey in November.
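A regression of the form (5.2.3) can be checked against the four cell means directly. The sketch below simulates store-level data with assumed parameter values (they are illustrative, not the Card and Krueger estimates) and confirms that the OLS coefficient on the interaction term equals the difference-in-differences of sample means.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400                                           # hypothetical number of store-period observations
nj = rng.integers(0, 2, size=n).astype(float)     # 1 = New Jersey, 0 = Pennsylvania
post = rng.integers(0, 2, size=n).astype(float)   # 1 = November, 0 = February
y = 20 - 3 * nj - 2 * post + 2.5 * nj * post + rng.normal(scale=5, size=n)

# Regression DD: constant, two main effects, and the interaction.
X = np.column_stack([np.ones(n), nj, post, nj * post])
alpha_hat, gamma_hat, lam_hat, delta_hat = np.linalg.lstsq(X, y, rcond=None)[0]

# The same number from the four cell means.
def cell(s, t):
    return y[(nj == s) & (post == t)].mean()

dd = (cell(1, 1) - cell(1, 0)) - (cell(0, 1) - cell(0, 0))

print(delta_hat, dd)
```

The two numbers agree to numerical precision because the model has one parameter for each of the four state-by-period cells.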
This is a saturated model since the conditional mean function \( E(y_{ist} \mid s, t) \) takes on four possible values and
labor force earning less than $3.80.

As in the New Jersey/Pennsylvania study, Card (1992) works with data from two periods, before and after, in this case 1989 and 1992. But this study uses 51 states (including the District of Columbia), for a total of 102 state-year observations. Since there are no individual-level covariates in (5.2.4), this is the same as estimation with micro data (provided the group-level estimates are weighted by cell size). Note that \( fa_s \cdot d_t \) is an interaction term, like \( NJ_s \cdot d_t \) in (5.2.3), though here the interaction term takes on a distinct value for each observation in the data set. Finally, because Card (1992) analyzes data for only two periods, the reported estimates are from an equation in first-differences:

\[ \Delta \bar{y}_s = \lambda^* + \delta\, fa_s + \Delta\bar{\varepsilon}_s, \]

where \( \Delta\bar{y}_s \) is the change in average teen employment in state \(s\) and \( \Delta\bar{\varepsilon}_s \) is the error term in the differenced equation.⁸
Table 5.2.2, based on Table 3 in Card (1992), shows that wages increased more in states where the minimum wage increase is likely to have had more bite (see the estimate of .15 in column 1). This is an important step in Card's analysis: it verifies the notion that the fraction affected variable is a good predictor of the wage changes induced by an increase in the federal minimum. Employment, on the other hand, seems largely unrelated to fraction affected, as can be seen in column 3. Thus, the results in Card (1992) are in line with the results from the New Jersey/Pennsylvania study.
Table 5.2.2: Regression-DD estimates of minimum wage effects on teens, 1989 to 1992

                           Equations for Change       Equations for Change in Teen
                           in Mean Log Wage:          Employment-Population Ratio:
Explanatory Variable       (1)        (2)             (3)        (4)
1. Fraction of             0.15       .14             0.02       -.01
   Affected Teens          (0.03)     (0.04)          (0.03)     (0.03)
2. Change in Overall       -          0.46            -          1.24
   Emp./Pop. Ratio                    (0.60)                     (0.60)
3. R-squared               0.30       0.31            0.01       0.09

Notes: Adapted from Card (1992). The table reports estimates from a regression of the change in average teen employment by state on the fraction of teens affected by a change in the federal minimum wage in each state. Data are from the 1989 and 1992 CPS. Regressions are weighted by the CPS sample size by state and year.
Card's (1992) analysis illustrates a further advantage of regression-DD: it's easy to add additional covariates in this framework. For example, we might like to control for adult employment as a source of omitted
8 Card weights estimates of (5.2.4) by the sample size used to construct averages for each state. Other specifications in the spirit of (5.2.4) put a normalized function of state and federal minimum wages on the right hand side instead of \( fa_s \cdot d_t \). See, for example, Neumark and Wascher (1992), who work with the difference between state and federal minima, adjusted for minimum-wage coverage provisions, and normalized by state average hourly wages.
state-specific trends. In other words, we can model counterfactual employment in the absence of a change in the minimum wage as

\[ E[y_{0ist} \mid s, t, X_{st}] = \gamma_s + \lambda_t + X_{st}'\beta, \]

where \(X_{st}\) is a vector of state-and-time-varying covariates, including adult employment (though this may not be kosher if adult employment also responds to the minimum wage change, in which case it's bad control; see Section 3.2.3). As it turns out, the addition of an adult employment control has little effect on Card's estimates, as can be seen in columns 2 and 4 in Table 5.2.2.
It's worth emphasizing the fact that Card (1992) analyzes state averages instead of individual data. He might have used a pooled multi-year sample of micro data from the CPS to estimate an equation like

\[ y_{ist} = \gamma_s + \lambda_t + \delta (fa_s \cdot d_t) + X_{ist}'\beta + \varepsilon_{ist}, \tag{5.2.5} \]

where \(X_{ist}\) can include individual level characteristics such as race. The covariate vector might also include time-varying variables measured at the state level. Only the latter are likely to be a source of omitted variables bias, but individual-level controls can increase precision, a point we noted in Section 2.3. Inference is a little more complicated in a framework that combines micro data on dependent variables with group-level regressors, however. The key issue is how best to adjust for possible group-level random effects, as we discuss in Chapter 8, below.
When the sample includes many years, the regression-DD model lends itself to a test for causality in the

just cause or no cause, at the employer's whim. But some state courts have allowed a number of exceptions to the employment-at-will doctrine, leading to lawsuits for "unjust dismissal". Autor is interested in whether fear of employee lawsuits makes firms more likely to use temporary workers for tasks for which they would otherwise have increased their workforce. Temporary workers work for someone else besides the firm for which they are executing tasks. As a result, the firm using them cannot be sued for unjust dismissal when

Figure 5.2.4: Estimated impact of state courts' adoption of an implied-contract exception to the employment-at-will doctrine on use of temporary workers (from Autor 2003). The dependent variable is the log of state temporary help employment in 1979-1995. Estimates are from a model that allows for effects before, during, and after adoption.
Table 5.2.3: Effect of labor regulation on the performance of firms in Indian states

                                   (1)        (2)        (3)        (4)
Labor regulation (lagged)          -0.186     -0.185     -0.104     0.0002
                                   (.0641)    (.0507)    (.039)     (.02)
Log development                               0.240      0.184      0.241
expenditure per capita                        (.1277)    (.1187)    (.1057)
Log installed electricity                     0.089      0.082      0.023
capacity per capita                           (.0605)    (.0543)    (.0333)
Log state population                          0.720      0.310      -1.419
                                              (.96)      (1.1923)   (2.3262)
Congress majority                                        -0.0009    0.020
                                                         (.01)      (.0096)
Hard left majority                                       -0.050     -0.007
                                                         (.0168)    (.0091)
Janata majority                                          0.008      -0.020
                                                         (.0235)    (.0333)
Regional majority                                        0.006      0.026
                                                         (.0086)    (.0234)
State-specific trends              NO         NO         NO         YES
Adjusted R-squared                 0.93       0.93       0.94       0.95

Notes: Adapted from Besley and Burgess (2004), Table IV. The table reports regression-DD estimates of the effects of labor regulation on productivity. The dependent variable is log manufacturing output per capita. All models include state and year effects. Robust standard errors clustered at the state level are reported in parentheses. State amendments to the Industrial Disputes Act are coded 1 = pro-worker, 0 = neutral, -1 = pro-employer and then cumulated over the period to generate the labor regulation measure. Log of installed electrical capacity is measured in kilowatts, and log development expenditure is real per capita state spending on social and economic services. Congress, hard left, Janata, and regional majority are counts of the number of years for which these political groupings held a majority of the seats in the state legislatures. The data are for the sixteen main states for the period 1958-1992. There are 552 observations.
wage study. The addition of controls affects the Besley and Burgess estimates little. But the addition of state-specific trends kills the labor-regulation effect, as can be seen in column 4. Apparently, labor regulation in India increases in states where output is declining anyway. Control for this trend therefore drives the estimated regulation effect to zero.

Picking Controls

policies in Spain. Likewise, instead of time, we might group data by cohort or other types of characteristics. An example is Angrist and Evans (1999), who study the effect of changes in state abortion laws on teen pregnancy using variation by state and year of birth. Implicitly, however, DD designs always set up an
age effects (\(\lambda_{at}\)), and state-specific age effects (\(\theta_{as}\)). The regressor of interest, \(d_{ast}\), indicates children in affected age groups in states and periods where coverage is provided. This triple-differences model may
where the causal effect of training is \(\delta\). To make this more general, \(y_{it-h}\) can be a vector including lagged earnings for multiple periods.⁹

Applied researchers using panel data are often faced with the challenge of choosing between fixed-effects and lagged-dependent variables models, i.e., between causal inferences based on (5.3.1) and (5.3.2). One solution to this dilemma is to work with a model that includes both lagged dependent variables and unobserved individual effects. In other words, identification might be based on a weaker conditional independence assumption:

\[ E(y_{0it} \mid \alpha_i, y_{it-h}, X_{it}, d_{it}) = E(y_{0it} \mid \alpha_i, y_{it-h}, X_{it}), \tag{5.3.4} \]

which requires conditioning on both \(\alpha_i\) and \(y_{it-h}\). We can then try to estimate causal effects using a specification like

\[ y_{it} = \alpha_i + \theta y_{it-h} + \lambda_t + \delta d_{it} + X_{it}\beta + \varepsilon_{it}. \tag{5.3.5} \]

Unfortunately, the conditions for consistent estimation of \(\delta\) in equation (5.3.5) are much more demanding than those required with fixed effects or lagged dependent variables alone. This can be seen in a simple example where the lagged dependent variable is \(y_{it-1}\). We kill the fixed effect by differencing, which produces

\[ \Delta y_{it} = \theta \Delta y_{it-1} + \Delta\lambda_t + \delta \Delta d_{it} + \Delta X_{it}\beta + \Delta\varepsilon_{it}. \tag{5.3.6} \]

The problem here is that the differenced residual, \(\Delta\varepsilon_{it}\), is necessarily correlated with the lagged dependent variable, \(\Delta y_{it-1}\), because both are a function of \(\varepsilon_{it-1}\). Consequently, OLS estimates of (5.3.6) are not consistent for the parameters in (5.3.5), a problem first noted by Nickell (1981). This problem can be solved, though the solution requires strong assumptions. The easiest solution is to use \(y_{it-2}\) as an instrument for \(\Delta y_{it-1}\) in (5.3.6).¹⁰ But this requires that \(y_{it-2}\) be uncorrelated with the differenced residuals, \(\Delta\varepsilon_{it}\). This seems unlikely since residuals are the part of earnings left over after accounting for covariates. Most people's earnings are highly correlated from one year to the next, so that past earnings are an excellent predictor of future earnings and earnings growth. If \(\varepsilon_{it}\) is serially correlated, there may be no consistent estimator for (5.3.6). (Note also that the IV strategy using \(y_{it-2}\) as an instrument requires at least three periods to obtain data for \(t\), \(t-1\), and \(t-2\).)
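The inconsistency of OLS in the differenced equation, and the instrumental-variables fix, can be seen in a simulation. The sketch below assumes the favorable case in which the errors are serially uncorrelated and treatment is randomly assigned each period; the values 0.5 for the lag coefficient and 1.0 for the treatment effect are hypothetical. It is meant to show the mechanics, not to suggest that these assumptions hold in earnings data.

```python
import numpy as np

rng = np.random.default_rng(4)
n, theta, delta = 50_000, 0.5, 1.0          # hypothetical sample size and parameters
alpha = rng.normal(size=n)                  # individual effects

# Simulate y_it = alpha_i + theta*y_it-1 + delta*d_it + e_it with iid errors.
y = alpha / (1 - theta)                     # start near each person's long-run level
hist_y, hist_d = [], []
for _ in range(10):                         # a few periods of burn-in
    d = (rng.random(n) < 0.5).astype(float)
    y = alpha + theta * y + delta * d + rng.normal(size=n)
    hist_y.append(y)
    hist_d.append(d)

y1, y2, y3 = hist_y[-3:]                    # periods t-2, t-1, t
d2, d3 = hist_d[-2:]
dy, dy_lag, dd = y3 - y2, y2 - y1, d3 - d2

# OLS on the differenced equation: the lagged difference is correlated with the error.
X = np.column_stack([dy_lag, dd, np.ones(n)])
b_ols = np.linalg.lstsq(X, dy, rcond=None)[0]

# Just-identified IV: y_it-2 replaces the lagged difference in the instrument list.
Z = np.column_stack([y1, dd, np.ones(n)])
b_iv = np.linalg.solve(Z.T @ X, Z.T @ dy)

print("OLS theta:", b_ols[0], "IV theta:", b_iv[0], "IV delta:", b_iv[1])
```

If the errors were serially correlated, \(y_{it-2}\) would no longer be a valid instrument and the IV estimate in this sketch would also be inconsistent.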
9 Abadie, Diamond, and Hainmueller (2007) develop a semiparametric version of the lagged-dependent variables model, more flexible than the traditional regression setup. As with our regression setup, the key assumption in this model is conditional independence of potential outcomes conditional on lagged earnings, i.e., assumption (5.3.2).

10 See Holtz-Eakin, Newey and Rosen (1988), Arellano and Bond (1991), Blundell and Bond (1998) for details and examples.
\( V(\tilde{d}_{it}) \), where \( \tilde{d}_{it} = d_{it} - \gamma y_{it-1} \) is the residual from a regression of \(d_{it}\) on \(y_{it-1}\).
11 In particular, setting \(\theta = 1\) in (5.3.3) does not produce the fixed-effects model as a special case of the lagged dependent variables model. Instead we get

\[ \Delta y_{it} = \alpha + \lambda_t + \delta d_{it} + X_{it}\beta + \varepsilon_{it}, \]

i.e., a differenced dependent variable with regressors in levels. This is not the model with first differences on both the right and left side needed to kill the fixed effect.
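Now substitute \( \alpha_i = y_{it-1} - \varepsilon_{it-1} \):

\[ \frac{Cov(y_{it}, \tilde{d}_{it})}{V(\tilde{d}_{it})} = \delta - \frac{Cov(\varepsilon_{it-1}, \tilde{d}_{it})}{V(\tilde{d}_{it})} = \delta - \frac{Cov(\varepsilon_{it-1}, d_{it} - \gamma y_{it-1})}{V(\tilde{d}_{it})} = \delta + \gamma \frac{\sigma^2_\varepsilon}{V(\tilde{d}_{it})}, \]

where \(\sigma^2_\varepsilon\) is the variance of \(\varepsilon_{it-1}\). Since trainees have low \(y_{it-1}\), \(\gamma < 0\) and the resulting estimate of \(\delta\) is too small.

\[ \frac{Cov(y_{it} - y_{it-1}, d_{it} - d_{it-1})}{V(d_{it} - d_{it-1})} = \frac{Cov(y_{it} - y_{it-1}, d_{it})}{V(d_{it})}. \tag{5.4.3} \]

Subtracting \(y_{it-1}\) from both sides of (5.4.2), we have

\[ y_{it} - y_{it-1} = \alpha + (\theta - 1) y_{it-1} + \delta d_{it} + \varepsilon_{it}. \]

Substituting this in (5.4.3), the inappropriately differenced model yields

\[ \frac{Cov(y_{it} - y_{it-1}, d_{it})}{V(d_{it})} = \delta + (\theta - 1) \left[ \frac{Cov(y_{it-1}, d_{it})}{V(d_{it})} \right]. \]

In general, we think \(\theta\) is a number between zero and one, otherwise \(y_{it}\) is non-stationary (i.e., an explosive time series process). Therefore, since trainees have low \(y_{it-1}\), the estimate of \(\delta\) in first differences is too big.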
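The two results in this appendix can be illustrated by simulation. The sketch below uses assumed data-generating processes with a true effect of 1.0 and, in the second case, a lag coefficient of 0.6. In case (a) selection into training depends on a fixed effect; in case (b) it depends on lagged earnings. Each dataset is then estimated both ways.

```python
import numpy as np

rng = np.random.default_rng(5)
n, delta, theta = 200_000, 1.0, 0.6         # hypothetical values

def coef_on_d(y, d, *controls):
    """OLS coefficient on d in a regression of y on d, a constant, and any controls."""
    X = np.column_stack([d, np.ones(len(y)), *controls])
    return np.linalg.lstsq(X, y, rcond=None)[0][0]

# (a) Fixed effects are the truth: trainees have a low individual effect.
a = rng.normal(size=n)
d = (a + rng.normal(size=n) < 0).astype(float)      # training occurs in period t only
y_lag = a + rng.normal(size=n)
y = a + delta * d + rng.normal(size=n)
fe_truth_fd = coef_on_d(y - y_lag, d)               # first differences: on target
fe_truth_ldv = coef_on_d(y, d, y_lag)               # lagged dependent variable: too small

# (b) The lagged dependent variable model is the truth: trainees have low lagged earnings.
y_lag = rng.normal(size=n)
d = (y_lag + rng.normal(size=n) < 0).astype(float)
y = theta * y_lag + delta * d + rng.normal(size=n)
ldv_truth_ldv = coef_on_d(y, d, y_lag)              # lagged dependent variable: on target
ldv_truth_fd = coef_on_d(y - y_lag, d)              # first differences: too big

print(fe_truth_fd, fe_truth_ldv, ldv_truth_ldv, ldv_truth_fd)
```

Under these assumptions the mistaken estimator errs in opposite directions in the two cases, so the two estimates bracket the true effect.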
Part III

Extensions

Chapter 6

Getting a Little Jumpy: Regression Discontinuity Designs

But when you start exercising those rules, all sorts of processes start to happen and you start to find out all sorts of stuff about people... It's just a way of thinking about a problem, which

school juniors, especially those who will later take the SAT. The question that motivated the first discussions

1 The basic structure of RD designs appears to have emerged simultaneously in a number of disciplines but has only recently become important in applied econometrics. Cook (2008) gives an intellectual history. In a recent paper using Lalonde (1986) style within-study comparisons, Cook and Wong (2008) find that RD generally does a good job of reproducing the results from randomized trials.

Figure 6.1.1: The sharp regression discontinuity design
But what if the trend relation, \(E[y_{0i} \mid x_i]\), is nonlinear? To be precise, suppose that \(E[y_{0i} \mid x_i] = f(x_i)\) for some reasonably smooth function, \(f(x_i)\). Panel B in Figure 6.1.1 suggests there is still hope even in this more general case. Now we can construct RD estimates by fitting

\[ y_i = f(x_i) + \rho d_i + \eta_i, \tag{6.1.3} \]

where again, \(d_i = 1(x_i \geq x_0)\) is discontinuous in \(x_i\) at \(x_0\). As long as \(f(x_i)\) is continuous in a neighborhood of \(x_0\), it should be possible to estimate a model like (6.1.3), even with a flexible functional form for \(f(x_i)\). For example, modeling \(f(x_i)\) with a \(p\)th-order polynomial, RD estimates can be constructed from the regression

\[ y_i = \alpha + \beta_1 x_i + \beta_2 x_i^2 + \ldots + \beta_p x_i^p + \rho d_i + \eta_i. \tag{6.1.4} \]
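A regression like (6.1.4) is straightforward to set up. The sketch below simulates a sharp design with an assumed cubic trend, a cutoff at 0.5, and a jump of 0.4 (all hypothetical), then estimates the jump with a linear and a cubic control for the running variable.

```python
import numpy as np

rng = np.random.default_rng(6)
n, x0, rho = 5_000, 0.5, 0.4                # hypothetical sample size, cutoff, and jump
x = rng.random(n)                           # running variable on [0, 1)
d = (x >= x0).astype(float)                 # sharp RD: treatment is a deterministic function of x
# A smooth but nonlinear (cubic) trend, chosen only for illustration.
y = 0.5 + 1.2 * x - 2.0 * x**2 + 1.5 * x**3 + rho * d + rng.normal(scale=0.1, size=n)

def rd_poly(p):
    """Coefficient on d from a regression of y on a p-th order polynomial in x and d."""
    X = np.column_stack([np.ones(n)] + [x**k for k in range(1, p + 1)] + [d])
    return np.linalg.lstsq(X, y, rcond=None)[0][-1]

rho_linear, rho_cubic = rd_poly(1), rd_poly(3)
print(rho_linear, rho_cubic)
```

The cubic specification matches the simulated trend and should recover the jump up to sampling error; the linear specification is included to show how the estimate can move when the polynomial order changes.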
A generalization of RD based on (6.1.4) allows different trend functions for \(E[y_{0i} \mid x_i]\) and \(E[y_{1i} \mid x_i]\). Modeling both of these CEFs with \(p\)th-order polynomials, we have

\[ E[y_{0i} \mid x_i] = f_0(x_i) = \alpha + \beta_{01}\tilde{x}_i + \beta_{02}\tilde{x}_i^2 + \ldots + \beta_{0p}\tilde{x}_i^p \]
\[ E[y_{1i} \mid x_i] = f_1(x_i) = \alpha + \rho + \beta_{11}\tilde{x}_i + \beta_{12}\tilde{x}_i^2 + \ldots + \beta_{1p}\tilde{x}_i^p, \]

where \(\tilde{x}_i \equiv x_i - x_0\). Centering \(x_i\) at \(x_0\) is just a normalization; it ensures that the treatment effect at \(x_i = x_0\) is still the coefficient on \(d_i\) in the regression model with interactions.

To derive a regression model that can be used to estimate the effects of interest in this case, we use the fact that \(d_i\)
The validity of RD estimates based on (6.1.4) or (6.1.6) turns on whether polynomial models provide an adequate description of \(E[y_{0i} \mid x_i]\). If not, then what looks like a jump due to treatment might simply be an unaccounted-for nonlinearity in the counterfactual conditional mean function. This possibility is illustrated in Panel C of Figure 6.1.1, which shows how a sharp turn in \(E[y_{0i} \mid x_i]\) might be mistaken for a jump from one regression line to another. To reduce the likelihood of such mistakes, we can look only at data in a neighborhood around the discontinuity, say the interval \([x_0 - \Delta, x_0 + \Delta]\) for some small number \(\Delta\). Then we have

\[ E[y_i \mid x_0 - \Delta < x_i < x_0] \simeq E[y_{0i} \mid x_i = x_0] \]
\[ E[y_i \mid x_0 \leq x_i < x_0 + \Delta] \simeq E[y_{1i} \mid x_i = x_0], \]

so that

\[ \lim_{\Delta \to 0} E[y_i \mid x_0 \leq x_i < x_0 + \Delta] - E[y_i \mid x_0 - \Delta < x_i < x_0] = E[y_{1i} - y_{0i} \mid x_i = x_0]. \tag{6.1.7} \]

In other words, comparisons of average outcomes in a small enough neighborhood to the left and right of \(x_0\) should provide an estimate of the treatment effect that does not depend on the correct specification of a model for \(E[y_{0i} \mid x_i]\). Moreover, the validity of this nonparametric estimation strategy does not turn on the constant effects assumption, \(y_{1i} - y_{0i} = \rho\); the estimand in (6.1.7) is the average causal effect, \(E[y_{1i} - y_{0i} \mid x_i = x_0]\).

neighborhood of the cutoff means that you don't have much data. Also, the sample average is biased for the population average in the neighborhood of a boundary (in this case, \(x_0\)). Solutions to these problems include the use of a non-parametric version of regression called local linear regression (Hahn, Todd, and van der Klaauw, 2001) and the partial-linear and local-polynomial estimators developed by Porter (2003). Local linear regression amounts to weighted least squares estimation of an equation like (6.1.6), with linear terms only and more weight given to points close to the cutoff.
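The description of local linear regression as weighted least squares can be sketched in a few lines. The function below is an illustration, not the estimator of any particular paper: the triangular kernel, the bandwidth of 0.1, and the simulated data are all assumptions, and no data-driven bandwidth choice is attempted.

```python
import numpy as np

def local_linear_rd(x, y, x0, h):
    """Weighted least squares of y on d, (x - x0), and d*(x - x0), using triangular
    kernel weights that fall to zero at distance h from the cutoff."""
    xt = x - x0
    w = np.clip(1 - np.abs(xt) / h, 0, None)       # more weight near the cutoff
    keep = w > 0
    xt, yk, w = xt[keep], y[keep], w[keep]
    d = (xt >= 0).astype(float)
    X = np.column_stack([np.ones(xt.size), d, xt, d * xt])
    sw = np.sqrt(w)                                 # WLS via rescaled OLS
    beta = np.linalg.lstsq(X * sw[:, None], yk * sw, rcond=None)[0]
    return beta[1]                                  # estimated jump at x0

rng = np.random.default_rng(7)
n, x0, rho = 20_000, 0.5, 0.4                       # hypothetical design
x = rng.random(n)
y = np.sin(3 * x) + rho * (x >= x0) + rng.normal(scale=0.1, size=n)
rho_hat = local_linear_rd(x, y, x0, h=0.1)
print(rho_hat)
```

Because the running variable is centered at the cutoff and the slope is allowed to differ on each side, the coefficient on the indicator measures the jump at \(x_0\) itself.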
pre-treatment variables near the discontinuity. Since pre-treatment variables are unaffected by treatment, there should be no jump in the CEF of these variables at \(x_0\).

Lee's (2008) study of the effect of party incumbency on re-election probabilities illustrates the sharp RD

2 Hoxby (2000) also uses this idea to check RD estimates of class size effects. A fully nonparametric approach requires data-driven rules for selection of the width of the discontinuity-sample window, also known as "bandwidth". The bandwidth must shrink with the sample size at a rate sufficiently slow so as to ensure consistent estimation of the underlying conditional mean functions. See Imbens and Lemieux (2007) for details. We prefer to think of estimation using (6.1.4) or (6.1.6) as essentially parametric: in any given sample, the estimates are only as good as the model for \(E[y_{0i} \mid x_i]\) that you happen to be using. Promises about how you might change the model if you had more data should be irrelevant.

3 The fitted values in this figure are from a Logit model for the probability of winning as a function of the cutoff indicator \(d_i = 1(x_i \geq 0)\), a 4th-order polynomial in \(x_i\), and interactions between the polynomial terms and \(d_i\).
Figure 6.1.2: Probability of winning an election by past and future vote share (from Lee, 2008). (a) Candidate's probability of winning election \(t+1\), by margin of victory in election \(t\): local averages and parametric fit. (b) Candidate's accumulated number of past election victories, by margin of victory in election \(t\): local

6.2 Fuzzy RD is IV

Fuzzy RD exploits discontinuities in the probability or expected value of treatment conditional on a covariate. The result is a research design where the discontinuity becomes an instrumental variable for treatment status

4 The idea of using jumps in the probability of assignment as a source of identifying information appears to originate with Trochim (1984), although the IV interpretation came later. Not everyone agrees that fuzzy RD is IV, but this view is catching on. In a recent history of the RD idea, Cook (2008) writes about the fuzzy design: "In many contexts, the cutoff value can function as an IV and engender unbiased causal conclusions... fuzzy assignment does not seem as serious a problem today as earlier."
interaction terms in the instrument list, we might also like to allow for interactions in the second stage as in (6.1.6)). The resulting just-identified IV estimator has the virtues of transparency and good finite-sample properties. The first stage in this case is

\[ d_i = \gamma_0 + \gamma_1 x_i + \gamma_2 x_i^2 + \ldots + \gamma_p x_i^p + \pi t_i + \xi_{1i}, \tag{6.2.2} \]

where \(t_i\) is the excluded instrument that provides identifying power with a first-stage effect given by \(\pi\).

The fuzzy RD reduced form is obtained by substituting (6.2.2) into (6.1.4):

\[ y_i = \mu + \kappa_1 x_i + \kappa_2 x_i^2 + \ldots + \kappa_p x_i^p + \rho\pi t_i + \xi_{2i}, \tag{6.2.3} \]

where \(\mu = \alpha + \rho\gamma_0\) and \(\kappa_j = \beta_j + \rho\gamma_j\) for \(j = 1, \ldots, p\). As with sharp RD, identification in the fuzzy case

5 van der Klaauw's original working paper circulated in 1997. Note that the fact that (6.2.2) is only an approximation of \(E[d_i \mid x_i]\) is not very important; second-stage estimates are still consistent.

6 Alternately, center neither the first nor the second stage. In this case, however, \(\rho\) no longer captures the treatment effect at the cutoff.
\[ E[y_i \mid x_0 \leq x_i < x_0 + \Delta] - E[y_i \mid x_0 - \Delta < x_i < x_0] \simeq \rho\pi. \]

Similarly, for the first stage for \(d_i\), we have

\[ E[d_i \mid x_0 \leq x_i < x_0 + \Delta] - E[d_i \mid x_0 - \Delta < x_i < x_0] \simeq \pi. \]

Therefore

\[ \lim_{\Delta \to 0} \frac{E[y_i \mid x_0 \leq x_i < x_0 + \Delta] - E[y_i \mid x_0 - \Delta < x_i < x_0]}{E[d_i \mid x_0 \leq x_i < x_0 + \Delta] - E[d_i \mid x_0 - \Delta < x_i < x_0]} = \rho. \tag{6.2.5} \]

The sample analog of (6.2.5) is a Wald estimator of the sort discussed in Section ??, in this case using \(t_i\) as an instrument for \(d_i\) in a \(\Delta\)-neighborhood of \(x_0\). As with other dummy-variable instruments, the result is a local average treatment effect. In particular, the Wald estimand for fuzzy RD captures the causal effect on compliers defined as individuals whose treatment status changes as we move the value of \(x_i\) from just to the left of \(x_0\) to just to the right of \(x_0\). This interpretation of fuzzy RD was introduced by Hahn, Todd, and van der Klaauw (2001). Note, however, that there is another sense in which this version of LATE is local: the estimates are for compliers with \(x_i = x_0\)
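The sample analog of the Wald ratio in (6.2.5) can be sketched as follows. The simulation is an illustration under assumed values: a cutoff at zero, a window of width 0.02 on each side, a constant treatment effect of 1.0, and a population made up of 20 percent always-takers, 50 percent compliers, and 30 percent never-takers, with always-takers given higher baseline outcomes.

```python
import numpy as np

rng = np.random.default_rng(8)
n, x0, rho, width = 400_000, 0.0, 1.0, 0.02     # hypothetical design
x = rng.uniform(-1, 1, n)
t = x >= x0                                      # instrument: being to the right of the cutoff
u = rng.random(n)                                # latent type
always = u < 0.2                                 # treated regardless of t
complier = (u >= 0.2) & (u < 0.7)                # treated only when t is on
d = always | (complier & t)
# Always-takers have higher baseline outcomes, so treated/untreated contrasts are biased.
y = 0.5 * x + rho * d + 0.8 * always + rng.normal(scale=0.5, size=n)

left = (x >= x0 - width) & (x < x0)
right = (x >= x0) & (x < x0 + width)

reduced_form = y[right].mean() - y[left].mean()  # jump in the outcome
first_stage = d[right].mean() - d[left].mean()   # jump in the treatment probability
wald = reduced_form / first_stage

print(first_stage, wald)
```

The first-stage jump estimates the share of compliers at the cutoff, and the ratio estimates their average treatment effect, subject to the small bias that comes from using a window of positive width.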
and Lavy call this "Maimonides Rule" since a maximum class size of 40 was first proposed by the medieval

\[ m_{sc} = \frac{e_s}{\operatorname{int}\left[\frac{e_s - 1}{40}\right] + 1}, \]

where \(\operatorname{int}(x)\) is the integer part of a real number, \(x\). This function, plotted with dotted lines in Figure 6.2.1 for fourth and fifth graders, has a sawtooth pattern with discontinuities (in this case, sharp drops in predicted class size) at integer multiples of 40. At the same time, \(m_{sc}\) is clearly an increasing function of enrollment, \(e_s\), making the enrollment variable an important control.
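The sawtooth shape of predicted class size is easy to see by evaluating the rule at a few enrollment levels. The helper below is a hypothetical illustration of the formula; the function name and the `cap` parameter are not from the original study.

```python
def maimonides_class_size(enrollment: int, cap: int = 40) -> float:
    """Predicted class size when a grade is split into equal classes as soon as
    any class would exceed `cap` pupils."""
    n_classes = (enrollment - 1) // cap + 1     # int[(e - 1) / cap] + 1
    return enrollment / n_classes

for e in (40, 41, 80, 81, 120, 121):
    print(e, maimonides_class_size(e))
```

Predicted class size rises one-for-one with enrollment up to 40, drops to 20.5 at an enrollment of 41, climbs back to 40 at 80, drops to 27 at 81, and so on.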
Angrist and Lavy exploit the discontinuities in Maimonides Rule by constructing 2SLS estimates of an equation like

\[ y_{isc} = \alpha_0 + \alpha_1 pd_s + \beta_1 e_s + \beta_2 e_s^2 + \ldots + \beta_p e_s^p + \rho n_{sc} + \eta_{isc}, \tag{6.2.6} \]

where \(y_{isc}\) is \(i\)'s test score in school \(s\) and class \(c\), \(n_{sc}\) is the size of this class, and \(e_s\) is enrollment. In this version of fuzzy RD, \(m_{sc}\) plays the role of \(t_i\), \(e_s\) plays the role of \(x_i\), and class size, \(n_{sc}\), plays the role of \(d_i\). Angrist and Lavy also include a non-enrollment covariate, \(pd_s\), to control for the proportion of students in the school from a disadvantaged background. This is not necessary for RD, since the only source of omitted variables bias in the RD model is \(e_s\), but it makes the specification comparable to the model used to construct a corresponding set of OLS estimates.⁷

Figure 6.2.1 from Angrist and Lavy (1999) plots the average of actual and predicted class sizes against enrollment in fourth and fifth grade. Maimonides' Rule does not predict class size perfectly because some schools split grades at enrollments lower than 40. This is what makes the RD design fuzzy. Still, there are clear drops in class size at enrollment levels of 40, 80, and 120. Note also that the \(m_{sc}\) instrument neatly combines both discontinuities and slope-discontinuity interactions such as \(\tilde{x}_i t_i\) in (6.2.4)

7 The Angrist and Lavy (1999) study differs modestly from the description here in that the data used to estimate equation (6.2.6) are class averages. But since the covariates are all defined at the class or school level, the only difference between student-level and class-level estimation is the implicit weighting by number of students in the student-level estimates.

Figure 6.2.1: The fuzzy-RD first-stage for regression-discontinuity estimates of the effect of class size on pupils' test scores (from Angrist and Lavy, 1999)
In contrast with the OLS estimates in column 3, 2SLS estimates of similar specification using \(m_{sc}\) as an instrument for \(n_{sc}\) strongly suggest that smaller classes increase test scores. These results, reported in column 4 for models that include a linear enrollment control and in column 5 for models that include a quadratic enrollment control, range from -.23 to -.26 with standard error around .1. These results suggest a 7-student reduction in class size (as in Tennessee STAR) raises Math scores by about 1.75 points, for an effect size of \(.18\sigma\), where \(\sigma\) is the standard deviation of class average scores. This is not too far from the Tennessee estimates.
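The effect-size arithmetic in this paragraph can be traced step by step. The calculation below takes a class-size coefficient of -.25 as a round number between the two full-sample 2SLS estimates (an assumption made for illustration) and uses the full-sample standard deviation of 9.6 from Table 6.2.1.

```python
class_size_coef = -0.25     # roughly the 2SLS estimates in columns 4 and 5 (assumed round number)
reduction = 7               # pupils removed from the class, as in Tennessee STAR
sd_scores = 9.6             # s.d. of class average math scores, full sample

gain = class_size_coef * -reduction     # points gained from a 7-pupil reduction
effect_size = gain / sd_scores          # gain in standard deviation units

print(gain, round(effect_size, 2))
```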
Importantly, the functional form of the enrollment control does not seem to matter very much (though estimates with no controls, not reported in the table, come out much smaller and insignificant). Columns 6 and 7 check the robustness of the main findings using a +/-5 discontinuity sample. Not surprisingly, these results are much less precise than those reported in columns 4 and 5 since they were estimated with only about one-quarter of the data used to construct the full-sample estimates. Still, they bounce around the -.25 mark. Finally, the last column shows the results of estimation using an even narrower discontinuity sample limited to schools with plus or minus an enrollment of 3 students around the discontinuities at 40, 80, and 120 (with dummy controls for which of these discontinuities is relevant). These are Wald estimates in the spirit of Hahn, Todd, and van der Klaauw (2001) and formula (6.2.5); the instrument used to construct these estimates is a dummy for being in a school with enrollment just to the right of the relevant discontinuity. The result is an imprecise -.270 (s.e. = .281), but still strikingly similar to the other estimates in the table.
Table 6.2.1: OLS and fuzzy RD estimates of the effects of class size on fifth grade math scores

                           OLS                          2SLS
                                                        Full sample        Discontinuity samples
                                                                           +/-5               +/-3
                           (1)      (2)      (3)        (4)      (5)      (6)      (7)      (8)
Mean score                 67.3                         67.3              67.0              67.0
(s.d.)                     (9.6)                        (9.6)             (10.2)            (10.6)
Regressors
Class size                 .322     .076     .019       -.230    -.261    -.185    -.443    -.270
                           (.039)   (.036)   (.044)     (.092)   (.113)   (.151)   (.236)   (.281)
Percent disadvantaged               -.340    -.332      -.350    -.350    -.459    -.435
                                    (.018)   (.018)     (.019)   (.019)   (.049)   (.049)
Enrollment                                   .017       .041     .062              .079
                                             (.009)     (.012)   (.037)            (.036)
Enrollment squared/100                                           -.010
                                                                 (.016)
Segment 1                                                                                   -12.6
(enrollment 36-45)                                                                          (3.80)
Segment 2                                                                                   -2.89
(enrollment 76-85)                                                                          (2.41)
Root MSE                   9.36     8.32     8.30       8.40     8.42     8.79     9.10     10.2
R-squared                  .048     .249     .252
N                          2,018                        2,018             471               302

Notes: Adapted from Angrist and Lavy (1999). The table reports estimates of equation (6.2.6) in the text using class averages. Standard errors, reported in parentheses, are corrected for within-school correlation. Mean score, s.d., and N are shown once per sample, under the first column of columns (1)-(3), (4)-(5), (6)-(7), and (8).
Chapter7
QuantileRegression
Here’saprayerforyou.Gotapencil?...‘ProtectmefromknowingwhatIdon’tneedto
know.ProtectmefromevenknowingthattherearethingstoknowthatIdon’tknow.Protect
mefromknowingthatIdecidednottoknowaboutthethingsIdecidednottoknowabout.
Amen.’There’sanotherprayerthatgoeswithit.‘Lord,lord,lord.Protectmefromthe
consequencesoftheaboveprayer.’
DouglasAdams,
MostlyHarmless
(1995)
variable is a dummy for something like employment, the mean describes the entire distribution. But many variables, like earnings and test scores, have continuous distributions. These distributions can change in ways not revealed by an examination of averages; for example, they can spread out or become more compressed. Applied economists increasingly want to know what’s happening to an entire distribution, to the relative winners and losers, as well as to averages.
Policy-makers and labor economists have been especially concerned with changes in the wage distribution. We know, for example, that flat average real wages are only a small part of what’s been going on in the labor market for the past 25 years. Upper earnings quantiles have been increasing, while lower quantiles have
$\tfrac{1}{2}(\operatorname{sign} u)\,u = \tfrac{1}{2}|u|$. In this case, $Q_\tau(y_i|X_i)$ is the conditional median since the conditional median minimizes absolute deviations. Otherwise,
1 More generally, we can define the CQF for discrete random variables and random variables with less-than-well-behaved densities as $Q_\tau(y_i|X_i) = \inf\{y : F_Y(y|X_i) \geq \tau\}$.
7.1. THE QUANTILE REGRESSION MODEL
Table 7.1.1: Quantile regression coefficients for schooling in the 1980, 1990, and 2000 Censuses

          Desc. Stats.            Quantile Regression Estimates             OLS Estimates
Census    Obs.    Mean   SD       0.1     0.25    0.5     0.75    0.9       Coeff.   Root MSE
1980      65023   6.4    0.67     .074    .074    .068    .070    .079      .072     0.63
                                  (.002)  (.001)  (.001)  (.001)  (.001)    (.001)
1990      86785   6.46   0.06     .112    .110    .106    .111    .137      .114     0.64
                                  (.003)  (.001)  (.001)  (.001)  (.003)    (.001)
2000      97397   6.5    0.75     .092    .105    .111    .120    .157      .114     0.69
                                  (.002)  (.001)  (.001)  (.001)  (.004)    (.001)

Notes: Adapted from Angrist, Chernozhukov, and Fernandez-Val (2006). The table reports quantile regression estimates of the returns to schooling, with OLS estimates shown at the right for comparison. The sample includes US-born white and black men aged 40-49. Standard errors are reported in parentheses. All models control for race and potential experience. Sampling weights were used for the 2000 Census estimates.
Table 7.1.1 reports schooling coefficients from quantile regressions estimated using the 1980, 1990, and 2000 Censuses. The models used to construct these estimates control for race and a quadratic function of
2 See Card and Lemieux (1996) for an empirical example of a regression model with this sort of heteroskedasticity. Koenker and Portnoy (1996) call this a linear location-scale model.
3 The results in Table 7.1.1 include two sets of standard errors. The first are conventional standard errors, of the sort reported by Stata’s qreg command (also specifying "robust"). These presume the CQF is truly linear. The formula for these is
$$\tau(1-\tau)\left\{E[f_{u_\tau}(0|X_i)X_iX_i']\right\}^{-1}E[X_iX_i']\left\{E[f_{u_\tau}(0|X_i)X_iX_i']\right\}^{-1},$$
where $f_{u_\tau}(0|X_i)$ is the conditional density of the quantile-regression residual at zero. If the residuals are homoskedastic this simplifies to
$$\frac{\tau(1-\tau)}{f_{u_\tau}^2(0)}E[X_iX_i']^{-1}.$$
The second set are robust to misspecification, computed using formulas in Angrist, Chernozhukov, and Fernandez-Val (2006). In this example, the impact of nonlinearity on standard errors is minor.
where
$y_{i,obs}$
censored; they commonly take on the value zero by their nature, just as dummy variables like employment status do.
When dealing with censored dependent variables, quantile regression can be used to estimate the effect of covariates on conditional quantiles that are below the censoring point (assuming censoring is from above). This reflects the fact that recoding earnings above the upper decile to be equal to the upper decile has no effect on the median. So if CPS topcoding affects relatively few people (as is often true), censoring has no effect on estimates of the conditional median or even $\beta_\tau$ for $\tau = .75$. Likewise, if less than 10 percent of the sample is censored conditional on all values of $X_i$, then when estimating $\beta_\tau$ for $\tau$ up to $.9$ you can simply ignore it. Alternately, you can limit the sample to values of $X_i$ where $Q_\tau(y_i|X_i)$ is below $c$ (or above, if censoring is from the bottom with $y_{i,obs} = y_i \cdot 1[y_i > c]$).
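The claim that top-coding leaves quantiles below the censoring point untouched is easy to check numerically. The following is a minimal sketch under stated assumptions: the log-normal "earnings" and the 95th-percentile top-code are hypothetical choices for illustration, and `quantile` is a helper defined here, not a library routine.

```python
import math
import random
import statistics


def quantile(values, tau):
    """Smallest observation whose empirical CDF is at least tau."""
    ordered = sorted(values)
    k = max(1, math.ceil(tau * len(ordered)))  # 1-based rank
    return ordered[k - 1]


rng = random.Random(0)
# Hypothetical log-normal "earnings"; the parameters are illustrative only.
earnings = [math.exp(rng.gauss(10.0, 0.7)) for _ in range(10_000)]

# Top-code the highest 5 percent at the 95th percentile (censoring from above).
c = quantile(earnings, 0.95)
topcoded = [min(y, c) for y in earnings]

# Quantiles below the censoring point are unchanged by top-coding...
unchanged = {tau: quantile(earnings, tau) == quantile(topcoded, tau)
             for tau in (0.5, 0.75, 0.9)}
# ...but the mean is pulled down.
mean_drop = statistics.mean(earnings) - statistics.mean(topcoded)

print(unchanged)
print(mean_drop > 0)
```

The same logic is what allows quantile regression to ignore censoring when the conditional quantile of interest lies below the censoring point for all covariate values.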
Powell (1986) formalizes this idea with the censored quantile regression estimator. Because we may not know which conditional quantiles are below the censoring point (continuing to think of topcodes), Powell proposes we work with
$$Q_\tau(y_i|X_i) = \min(c, X_i'\beta_\tau^c).$$
The quantile regression approximation theorem looks complicated but the big picture is simple. We can think of quantile regression as approximating $Q_\tau(y_i|X_i)$, just as OLS approximates $E[y_i|X_i]$. The OLS weighting function is the histogram of $X_i$, which we denote $\pi(X_i)$. The quantile regression weighting function, implicitly given by $w_\tau(X_i, \beta_\tau) \cdot \pi(X_i)$, is more elaborate than $\pi(X_i)$ alone (the histogram is implicitly part of the quantile regression weighting function because the expectation in (7.1.7) is over the distribution of $X_i$). The term $w_\tau(X_i, \beta_\tau)$ involves the quantile regression vector, $\beta_\tau$, but can be rewritten with $\beta_\tau$ partialled out so that it is a function of $X_i$ only (see Angrist, Chernozhukov, and Fernandez-Val, 2006, for details). In any case, the quantile regression weights are approximately proportional to the density of $y_i$ in the neighborhood of the CQF.
The quantile regression approximation property is illustrated in Figure 7.1.1, which plots the conditional quantile function of log wages given highest grade completed using 1980 Census data. Here we take advantage of the discreteness of schooling and large census samples to estimate the CQF non-parametrically by computing the quantile of wages for each schooling level. Panels A-C plot a nonparametric estimate of $Q_\tau(y_i|X_i)$ along with the linear quantile regression fit for the 0.10, 0.50, and 0.90 quantiles, where $X_i$
$$\tilde{\beta}_\tau = \arg\min_{b \in R^d} E\left[(Q_\tau(y_i|X_i) - X_i'b)^2\right] = \arg\min_{b \in R^d} E\left[\Delta_\tau^2(X_i, b)\right].$$
In other words, $\tilde{\beta}_\tau$ is the slope of the linear regression of $Q_\tau(y_i|X_i)$ on $X_i$, weighted by the histogram of $X_i$. In contrast with quantile regression, which requires only one pass through the data, MD relies on the ability to estimate $Q_\tau(y_i|X_i)$
Figure 7.1.1: The quantile regression approximation property (adapted from Angrist, Chernozhukov, and Fernandez-Val, 2006). The figure shows alternative estimates of the conditional quantile function of log wages given highest grade completed using 1980 Census data, along with the implied weighting function.
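With a discrete regressor, the two-pass idea described above (compute the quantile in each cell, then regress the cell quantiles on the regressor using cell counts as weights) can be sketched in a few lines. This is an illustration only: the simulated location-scale "wage" data, the helper names `cell_quantile` and `md_fit`, and the parameter values are all assumptions made here, not the Census data or the authors' code.

```python
import math
import random
from collections import defaultdict


def cell_quantile(values, tau):
    """Smallest observation whose empirical CDF is at least tau."""
    ordered = sorted(values)
    k = max(1, math.ceil(tau * len(ordered)))
    return ordered[k - 1]


def md_fit(x, y, tau):
    """Regress cell-level tau-quantiles of y on x, weighting by cell counts.

    Returns (intercept, slope)."""
    cells = defaultdict(list)
    for xi, yi in zip(x, y):
        cells[xi].append(yi)
    xs = sorted(cells)
    q = [cell_quantile(cells[v], tau) for v in xs]   # nonparametric CQF
    w = [len(cells[v]) for v in xs]                  # histogram of x
    n = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, xs)) / n
    qbar = sum(wi * qi for wi, qi in zip(w, q)) / n
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, xs))
    sxq = sum(wi * (xi - xbar) * (qi - qbar) for wi, xi, qi in zip(w, xs, q))
    slope = sxq / sxx
    return qbar - slope * xbar, slope


rng = random.Random(0)
schooling = [rng.randint(8, 16) for _ in range(20_000)]
# Hypothetical location-scale design: the spread of log wages rises with
# schooling, so the true CQF is linear with slope 0.08 + 0.02 * z_tau.
log_wage = [5.0 + 0.08 * s + (0.3 + 0.02 * s) * rng.gauss(0, 1) for s in schooling]

fits = {tau: md_fit(schooling, log_wage, tau) for tau in (0.1, 0.5, 0.9)}
for tau, (a, b) in fits.items():
    print(tau, round(a, 3), round(b, 3))
```

Because the simulated spread increases with schooling, the fitted slope rises across quantiles, which is the pattern a location-scale model produces.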
7.1.3 Tricky Points
The language of conditional quantiles is tricky. Sometimes we talk about "quantile regression coefficients at the median," or "effects on those at the lower decile." But it’s important to remember that quantile coefficients tell us about effects on distributions and not on individuals. If we discover, for example, that a training program raises the lower decile of the wage distribution, this does not necessarily mean that someone who would have been poor (i.e. at the lower decile without training) is now less poor. It only means that those who are poor in the regime with training are less poor than the poor would be in a regime without training.
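A deliberately extreme toy example makes the distinction concrete. Assume, purely for illustration, that training reverses everyone's rank and shifts the whole distribution up by 10. The effect on the lower decile of the distribution is then exactly 10, yet no individual gains 10, and the person who sits at the lower decile under training is someone who lost a great deal. The data and names below are hypothetical.

```python
n = 1000
# Potential earnings without training: person i earns i.
y0 = list(range(n))
# Potential earnings with training: ranks are reversed and the
# whole distribution is shifted up by 10 (an artificial assumption).
y1 = [(n - 1 - v) + 10 for v in y0]


def lower_decile(values):
    """Value below which 10 percent of observations fall."""
    ordered = sorted(values)
    return ordered[len(ordered) // 10 - 1]


# Effect on the distribution: the lower decile rises by 10.
decile_effect = lower_decile(y1) - lower_decile(y0)

# Effects on individuals: nobody gains exactly 10.
individual_gains = [b - a for a, b in zip(y0, y1)]

# The person at the lower decile of the with-training distribution.
person = y1.index(lower_decile(y1))
gain_of_that_person = individual_gains[person]

print(decile_effect)
print(min(individual_gains), max(individual_gains))
print(gain_of_that_person)
```

Quantile comparisons describe the two marginal distributions only; recovering individual effects would require knowing how ranks map across regimes.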
In other words, the proportion of the population below $y$ conditional on $X_i$ is the same as the proportion of conditional quantiles that are below $y$.⁴ Substituting for the CQF inside the integral,
$$F_Y(y|X_i) = \int_0^1 1[X_i'\beta_\tau \leq y]\,d\tau.$$
Next, we use the CDF of $X_i$, $F_X(x)$
4 For example, if $y$ is the conditional median, then $F_Y(y|X_i) = .5$ and half of all conditional quantiles are below $y$. The relation (7.1.9) can be proved formally using the change of variables formula.
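The relation between a CDF and the share of quantiles lying below a point can be checked numerically for a single conditional distribution. The sketch below assumes a Normal distribution with illustrative parameters and approximates the integral over $\tau$ with a midpoint grid; `share_of_quantiles_below` is a helper written for this example.

```python
from statistics import NormalDist

# Hypothetical conditional distribution of the outcome at one value of X_i.
dist = NormalDist(mu=6.4, sigma=0.67)


def share_of_quantiles_below(y, grid=10_000):
    """Approximate the integral over tau in (0, 1) of 1[Q_tau <= y]."""
    taus = [(j + 0.5) / grid for j in range(grid)]
    return sum(dist.inv_cdf(t) <= y for t in taus) / grid


checks = {y: (dist.cdf(y), share_of_quantiles_below(y)) for y in (5.5, 6.4, 7.0)}
for y, (cdf_value, share) in checks.items():
    print(y, round(cdf_value, 4), round(share, 4))
```

At the median (6.4 here) both numbers are one half, matching the example given in the footnote.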
7.2. QUANTILE TREATMENT EFFECTS
program on average earnings while quantile regression estimates can be used to measure the impact of the
5 For an alternative approach, see Chernozhukov and Hansen (2005), which allows for regressors of any type (i.e., not just dummies), but invokes a rank-invariance assumption that is unnecessary in the QTE framework.
6 See, for example, Heckman, Smith, and Clements (1997).
$(y_{1i} - y_{0i})$. This does not matter for average treatment effects since the mean of a difference is the difference in means. But all other features of the distribution of $y_{1i} - y_{0i}$
$$\kappa_i = 1 - \frac{d_i(1 - z_i)}{1 - P(z_i = 1|X_i)} - \frac{(1 - d_i)z_i}{P(z_i = 1|X_i)},$$
as before. The QTE estimator is the sample analog of (7.2.3).
There are a number of practical issues that arise when implementing QTE. First, $\kappa_i$ must be estimated and the sampling variance induced by this first-step estimation should be reflected in the relevant asymptotic distribution theory. Abadie, Angrist, and Imbens (2002) derive the limiting distribution of the sample analog of (7.2.3) when $\kappa_i$ is estimated nonparametrically. In practice, however, it is easier to bootstrap the whole procedure (i.e., beginning with the construction of estimated kappas) than to use the asymptotic formulas.
Second, $\kappa_i$ is negative when $d_i \neq z_i$. The kappa-weighted quantile regression minimand is therefore non-convex and no longer has a linear programming representation. This problem can be solved by working with the following minimization problem instead:
$$\min_{a,b} E\left\{E[\kappa_i|y_i, d_i, X_i]\,\rho_\tau(y_i - a d_i - X_i'b)\right\} \tag{7.2.4}$$
$$E[\kappa_i|y_i, d_i, X_i] = 1 - \frac{d_i(1 - E[z_i|y_i, d_i = 1, X_i])}{1 - P(z_i = 1|X_i)} - \frac{(1 - d_i)E[z_i|y_i, d_i = 0, X_i]}{P(z_i = 1|X_i)}. \tag{7.2.5}$$
Angrist (2001) uses this to implement QTE via a Probit first step to estimate $E[z_i|y_i, d_i, X_i]$ separately in the $d_i = 0$ and $d_i = 1$ subsamples, constructing $E[\kappa_i|y_i, d_i, X_i]$ using (7.2.5), and then trimming any of the resulting estimates of $E[\kappa_i|y_i, d_i, X_i]$ that are outside the unit interval. The resulting first-step estimates of $E[\kappa_i|y_i, d_i, X_i]$ can simply be plugged in as weights when constructing quantile regression estimates in a second step using Stata’s qreg command.⁸
Estimates of the Effect of Training on the Quantiles of Trainee Earnings
The Job Training Partnership Act was a large federal program that provided subsidized training to disadvantaged American workers in the 1980s. JTPA services were delivered at 649 sites, also called Service
7 Intuitively, this is because $\kappa_i$ "finds compliers". A formal statement of this result appears in Abadie, Angrist, and Imbens (2002; Lemma 3.2).
8 Step-by-step, it goes like this:
1. Probit $z_i$ on $y_i$ and $X_i$ separately in the $d_i = 0$ and $d_i = 1$ subsamples. Save these fitted values.
2. Probit $z_i$ on $X_i$ in the whole sample. Save these fitted values.
3. Construct $E[\kappa_i|y_i, d_i, X_i]$ by plugging the two sets of fitted values into (7.2.5). Set anything less than zero to zero and anything greater than one to one.
4. Use these kappas to weight quantile regressions.
5. Bootstrap this whole procedure to construct standard errors.
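Step 3 of the recipe above is simple arithmetic once the two sets of fitted probabilities are in hand. The sketch below covers only that step and takes the fitted values as given; the function name `expected_kappa`, its argument names, and the numbers in the example are hypothetical, and the Probit fits and the weighted quantile regression are not shown.

```python
def expected_kappa(d, pz_given_ydx, pz_given_x):
    """Plug fitted probabilities into (7.2.5) and trim to the unit interval.

    d             -- treatment indicators d_i (0 or 1)
    pz_given_ydx  -- fitted P(z_i = 1 | y_i, d_i, X_i), from step 1
    pz_given_x    -- fitted P(z_i = 1 | X_i), from step 2 (strictly in (0, 1))
    """
    weights = []
    for di, p_ydx, p_x in zip(d, pz_given_ydx, pz_given_x):
        k = 1 - di * (1 - p_ydx) / (1 - p_x) - (1 - di) * p_ydx / p_x
        # Set anything below zero to zero and anything above one to one.
        weights.append(min(1.0, max(0.0, k)))
    return weights


# Made-up fitted values for four observations.
d = [1, 0, 1, 0]
pz_given_ydx = [0.9, 0.1, 0.2, 0.8]
pz_given_x = [0.5, 0.5, 0.5, 0.5]
kappas = expected_kappa(d, pz_given_ydx, pz_given_x)
print(kappas)
```

Observations whose treatment status lines up with the instrument get weights near one, while those that look like always-takers or never-takers are trimmed toward zero.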
9 See Bloom et al (1997).
outcomes. Moreover, because of the very low percentage of individuals receiving JTPA services in the control
of effects on male earnings above the median are large and statistically significant (though still smaller than the corresponding quantile regression estimates).
The result that JTPA training for adult men did not raise the lower quantiles of their earnings is the most interesting finding arising from this analysis. This suggests that the quantile regression estimates in the top half of Table 7.2.1 are contaminated by positive selection bias. One response to this finding might be that few JTPA applicants were very well off, so that distributional effects within applicants are of less concern than the fact that the program helped many applicants overall. However, the upper quantiles of earnings were reasonably high for adults who participated in the National JTPA Study. Increasing earnings in this upper tail is therefore unlikely to have been a high priority.
Table 7.2.1: Quantile regression estimates and quantile treatment effects from the JTPA experiment

A. OLS and Quantile Regression Estimates
                                             Quantile
                          OLS       0.15      0.25      0.50      0.75      0.85
Training                  3,754     1,187     2,510     4,420     4,678     4,806
                          (536)     (205)     (356)     (651)     (937)     (1,055)
% Impact of Training      21.20     135.56    75.20     34.50     17.24     13.43
High school or GED        4,015     339       1,280     3,665     6,045     6,224
                          (571)     (186)     (305)     (618)     (1,029)   (1,170)
Black                     -2,354    -134      -500      -2,084    -3,576    -3,609
                          (626)     (194)     (324)     (684)     (1,087)   (1,331)
Hispanic                  251       91        278       925       -877      -85
                          (883)     (315)     (512)     (1,066)   (1,769)   (2,047)
Married                   6,546     587       1,964     7,113     10,073    11,062
                          (629)     (222)     (427)     (839)     (1,046)   (1,093)
Worked less than 13       -6,582    -1,090    -3,097    -7,610    -9,834    -9,951
weeks in past year        (566)     (190)     (339)     (665)     (1,000)   (1,099)
Constant                  9,811     -216      365       6,110     14,874    21,527
                          (1,541)   (468)     (765)     (1,403)   (2,134)   (3,896)

B. 2SLS and QTE Estimates
                                             Quantile
                          2SLS      0.15      0.25      0.50      0.75      0.85
Training                  1,593     121       702       1,544     3,131     3,378
                          (895)     (475)     (670)     (1,073)   (1,376)   (1,811)
% Impact of Training      8.55      5.19      11.99     9.64      10.69     9.02
High school or GED        4,075     714       1,752     4,024     5,392     5,954
                          (573)     (429)     (644)     (940)     (1,441)   (1,783)
Black                     -2,349    -171      -377      -2,656    -4,182    -3,523
                          (625)     (439)     (626)     (1,136)   (1,587)   (1,867)
Hispanic                  335       328       1,476     1,499     379       1,023
                          (888)     (757)     (1,128)   (1,390)   (2,294)   (2,427)
Married                   6,647     1,564     3,190     7,683     9,509     10,185
                          (627)     (596)     (865)     (1,202)   (1,430)   (1,525)
Worked less than 13       -6,575    -1,932    -4,195    -7,009    -9,289    -9,078
weeks in past year        (567)     (442)     (664)     (1,040)   (1,420)   (1,596)
Constant                  10,641    -134      1,049     7,689     14,901    22,412
                          (1,569)   (1,116)   (1,655)   (2,361)   (3,292)   (7,655)

Notes: The table reports OLS, quantile regression, 2SLS, and QTE estimates of the effect of training on earnings (adapted from Abadie, Angrist, and Imbens (2002)). Assignment status is used as an instrument for training status in Panel B. All models include as covariates dummies for service strategy recommended and age group, and a dummy indicating data from a second follow-up survey. Robust standard errors are reported in parentheses.
Chapter 8
Nonstandard Standard Error Issues
We have normality. I repeat, we have normality. Anything you still can’t cope with is therefore your own problem.
Douglas Adams, The Hitchhiker’s Guide to the Galaxy (1979)
Today, software packages routinely compute asymptotic standard errors derived under weak assumptions
related but distinct from the Moulton problem.
Researchers plagued by clustering and serial correlation also have to confront the fact that the simplest fix-ups for these problems, like Stata’s "cluster" option, may not be very good. The asymptotic approximation relevant for clustered or serially correlated data relies on a large number of clusters or time series observations. Alas, we are rarely blessed with many clusters or long time series. The resulting inference problems are
$$\sqrt{N}(\hat{\beta} - \beta) \rightarrow N(0, \Omega)$$
where $\Omega$ is the asymptotic covariance matrix. Repeating (3.1.7), the formula for $\Omega$ in this case is
$$\Omega_r = E[X_iX_i']^{-1}E[X_iX_i'e_i^2]E[X_iX_i']^{-1}, \tag{8.1.1}$$
where $e_i = y_i - X_i'\beta$. When residuals are homoskedastic, $\Omega$ simplifies to $\Omega_c = \sigma^2E[X_iX_i']^{-1}$ where $\sigma^2 = E[e_i^2]$.
We are concerned here with the bias of robust standard errors in independent samples (i.e., no clustering or serial correlation). To simplify the derivation of bias, we assume that the regressor vector can be treated as fixed in repeated samples, as it would be if we sampled stratifying on $X_i$. Non-stochastic regressors give a benchmark sampling model that is often used to look at finite-sample distributions. It turns out that we miss little by making this assumption, while simplifying the derivations considerably.
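With fixed regressors, we have
$$\Omega_r = \left(\frac{X'X}{N}\right)^{-1}\left(\frac{X'\Psi X}{N}\right)\left(\frac{X'X}{N}\right)^{-1} \tag{8.1.2}$$
where $\Psi = E[ee'] = \operatorname{diag}(\psi_i)$ is the variance matrix of residuals. Under homoskedasticity, $\psi_i = \sigma^2$ for all $i$, and (8.1.2) simplifies to
$$\Omega_c = \sigma^2\left(\frac{X'X}{N}\right)^{-1}.$$
Asymptotic standard errors are given by the square root of the diagonal elements of $\Omega_r$ and $\Omega_c$, after removing the asymptotic normalization by dividing by $N$.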
In practice, the asymptotic covariance matrix must be estimated. The old-fashioned or conventional variance matrix estimator is
$$\hat{\Omega}_c = (X'X)^{-1}\hat{\sigma}^2 = (X'X)^{-1}\left[\frac{\sum \hat{e}_i^2}{N}\right],$$
where $\hat{e}_i = y_i - X_i'\hat{\beta}$ is the estimated regression residual, and
$$\hat{\sigma}^2 = \frac{\sum \hat{e}_i^2}{N}$$
estimates the residual variance. The corresponding robust variance matrix estimator is
$$\hat{\Omega}_r = N(X'X)^{-1}\left[\frac{\sum X_iX_i'\hat{e}_i^2}{N}\right](X'X)^{-1}. \tag{8.1.3}$$
We can think of the middle term as an estimator of the form $\frac{\sum X_iX_i'\hat{\psi}_i}{N}$, where $\hat{\psi}_i = \hat{e}_i^2$ estimates $\psi_i$.
By the law of large numbers and Slutsky theorems, $N\hat{\Omega}_c$ converges in probability to $\Omega_c$ while $N\hat{\Omega}_r$ converges to $\Omega_r$. But in finite samples, both variance estimators are biased. The bias in $\hat{\Omega}_c$ is well-known from classical least-squares theory and easy to correct. Less appreciated is the fact that if the residuals are homoskedastic, the robust estimator is more biased than the conventional, perhaps a lot more. From this we conclude that robust standard errors can be more misleading than conventional standard errors in situations where heteroskedasticity is modest. We also propose a rule-of-thumb that uses the maximum of old-fashioned and robust standard errors to avoid gross misjudgments of precision.
With non-stochastic regressors, we have
$$E[\hat{\Omega}_c] = (X'X)^{-1}E[\hat{\sigma}^2] = (X'X)^{-1}\left[\frac{\sum E(\hat{e}_i^2)}{N}\right].$$
To analyze $E[\hat{e}_i^2]$, start by expanding $\hat{e} = y - X\hat{\beta}$:
$$\hat{e} = y - X(X'X)^{-1}X'y = \left[I_N - X(X'X)^{-1}X'\right](X\beta + e) = Me$$
where $e$ is the vector of population residuals, $M = I_N - X(X'X)^{-1}X'$ is a non-stochastic residual-maker matrix with $i$th row $m_i'$, and $I_N$ is the $N \times N$ identity matrix. Then $\hat{e}_i = m_i'e$, and
$$E(\hat{e}_i^2) = E(m_i'ee'm_i) = m_i'\Psi m_i.$$
To simplify further, write $m_i = \ell_i - h_i$ where $\ell_i$ is the $i$th column of $I_N$ and $h_i = X(X'X)^{-1}X_i$, the $i$th column of the projection matrix $H = X(X'X)^{-1}X'$. Then
$$E(\hat{e}_i^2) = (\ell_i - h_i)'\Psi(\ell_i - h_i) = \psi_i - 2\psi_ih_{ii} + h_i'\Psi h_i \tag{8.1.4}$$
where $h_{ii}$, the $i$th diagonal element of $H$, satisfies
$$h_{ii} = h_i'h_i = X_i'(X'X)^{-1}X_i. \tag{8.1.5}$$
$$h_{ii} = \frac{1}{N} + \frac{(x_i - \bar{x})^2}{\sum_j (x_j - \bar{x})^2}. \tag{8.1.7}$$
This shows that leverage increases when $x_i$ is far from the mean. In addition to (8.1.6), we know that $h_{ii}$ is a number that lies in the interval $[0, 1]$ and that $\sum_{i=1}^{N} h_{ii} = k$, the number of regressors (see, e.g., Hoaglin and Welsch, 1978).¹
Suppose residuals are homoskedastic, so that $\psi_i = \sigma^2$. Then (8.1.4) simplifies to
$$E(\hat{e}_i^2) = \sigma^2[1 - 2h_{ii} + h_i'h_i] = \sigma^2(1 - h_{ii}) \leq \sigma^2.$$
1 The property $\sum_{i=1}^{N} h_{ii} = k$ comes from the fact that $H$ is idempotent. You can also use (8.1.7) to verify that in a bivariate regression, $\sum_{i=1}^{N} h_{ii} = 2$.
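The leverage properties just listed can be verified directly in a bivariate regression using (8.1.7). The sketch below uses made-up regressor values with one deliberate outlier; the function name `leverage` and the data are assumptions for illustration.

```python
import random


def leverage(x):
    """Leverage h_ii in a regression of y on a constant and x, from (8.1.7)."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return [1 / n + (xi - xbar) ** 2 / sxx for xi in x]


rng = random.Random(1)
# 29 ordinary draws plus one observation far from the mean.
x = [rng.gauss(0, 1) for _ in range(29)] + [6.0]
h = leverage(x)

# Leverages sum to k = 2 (constant plus one regressor) and lie in [0, 1].
print(round(sum(h), 10))
print(min(h) >= 0, max(h) <= 1)

# Under homoskedasticity E[e_hat_i^2] = sigma^2 * (1 - h_ii), so the
# squared residual of the high-leverage point is the most biased downward.
shrink = [1 - hi for hi in h]
print(round(shrink[-1], 3), round(max(shrink), 3))
```

The outlier has by far the largest leverage, so its squared residual understates the error variance the most.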
8.1. THE BIAS OF ROBUST STANDARD ERRORS
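So $\hat{\Omega}_c$ tends to be too small. Using the properties of $h_{ii}$, we can go one step further:
$$\frac{\sum E(\hat{e}_i^2)}{N} = \sigma^2\frac{\sum (1 - h_{ii})}{N} = \sigma^2\left[\frac{N - k}{N}\right].$$
Thus, the bias in $\hat{\Omega}_c$ can be fixed by a simple degrees-of-freedom correction: divide by $N - k$ instead of $N$ in the formula for $\hat{\sigma}^2$, the default in most empirical variance computations.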
We now want to show that under homoskedasticity the bias in $\hat{\Omega}_r$ is likely to be worse than the bias in $\hat{\Omega}_c$. The expectation of the robust covariance matrix estimator is
$$E[\hat{\Omega}_r] = N(X'X)^{-1}\left[\frac{\sum X_iX_i'E(\hat{e}_i^2)}{N}\right](X'X)^{-1}, \tag{8.1.8}$$
where $E(\hat{e}_i^2)$ is given by (8.1.4). Under homoskedasticity, $\psi_i = \sigma^2$ and we have $E(\hat{e}_i^2) = \sigma^2(1 - h_{ii})$ as in $\hat{\Omega}_c$. It’s clear, therefore, that the bias in $\hat{e}_i^2$ tends to pull robust standard errors down. The general expression, (8.1.8), is hard to evaluate, however. Chesher and Jewitt (1987) show that as long as there is
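$\sum \tilde{x}_i^2$ and the leverage is $h_{ii} = \frac{\tilde{x}_i^2}{\sum \tilde{x}_i^2}$ (we lose the $\frac{1}{N}$
$N$. For the conventional covariance estimator, we have
$$E[\hat{\Omega}_c] = \frac{\sigma^2}{Ns_x^2}\left[\frac{\sum (1 - h_{ii})}{N}\right] = \frac{\sigma^2}{Ns_x^2}\left[1 - \frac{1}{N}\right],$$
so the bias here is small. A simple calculation using (8.1.8) shows that under homoskedasticity, the robust estimator has expectation:
$$E[\hat{\Omega}_r] = \frac{\sigma^2}{Ns_x^2}\sum \frac{(1 - h_{ii})}{N}\frac{\tilde{x}_i^2}{s_x^2} = \frac{\sigma^2}{Ns_x^2}\sum (1 - h_{ii})h_{ii} = \frac{\sigma^2}{Ns_x^2}\left[1 - \sum h_{ii}^2\right].$$
The bias of $\hat{\Omega}_r$ is therefore worse than the bias of $\hat{\Omega}_c$ if $\sum h_{ii}^2 > \frac{1}{N}$, as it is by Jensen’s inequality unless the regressor has constant leverage, in which case $h_{ii} = \frac{1}{N}$ for all $i$.³
We can reduce the bias in $\hat{\Omega}_r$
2 In particular, as long as the ratio of the largest $\psi_i$ to the smallest $\psi_i$ is less than 2, robust standard errors are biased downwards.
3 Think of $h_{ii}$ as a random variable with a uniform distribution in the sample. Then $E[h_{ii}] = \frac{\sum h_{ii}}{N} = \frac{1}{N}$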
of the proposals explored in MacKinnon and White (1985):

$HC_0$: $\hat{\psi}_i = \hat{e}_i^2$
$HC_1$: $\hat{\psi}_i = \frac{N}{N - k}\hat{e}_i^2$
$HC_2$: $\hat{\psi}_i = \frac{1}{1 - h_{ii}}\hat{e}_i^2$
$HC_3$: $\hat{\psi}_i = \frac{1}{(1 - h_{ii})^2}\hat{e}_i^2$.

$HC_1$ is a simple degrees of freedom correction as is used for $\hat{\Omega}_c$. $HC_2$ uses the leverage to give an unbiased estimate of the variance of the $i$th residual when the residuals are homoskedastic, while $HC_3$ approximates a jackknife estimator.⁴
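To make the four corrections concrete, the sketch below computes the conventional and $HC_0$ through $HC_3$ variance estimates for the slope of a bivariate regression. It relies on the fact that, with a constant and one regressor, the slope element of the sandwich reduces to $\sum \tilde{x}_i^2\hat{\psi}_i / (\sum \tilde{x}_i^2)^2$ with $\tilde{x}_i = x_i - \bar{x}$. The simulated data and the function name are assumptions; the conventional estimate here uses the $N - k$ degrees-of-freedom correction.

```python
import random


def slope_variances(x, y):
    """Conventional and HC0-HC3 variance estimates for the OLS slope."""
    n, k = len(x), 2
    xbar, ybar = sum(x) / n, sum(y) / n
    xt = [xi - xbar for xi in x]                       # deviations from the mean
    sxx = sum(v * v for v in xt)
    b1 = sum(v * (yi - ybar) for v, yi in zip(xt, y)) / sxx
    b0 = ybar - b1 * xbar
    e = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]    # estimated residuals
    h = [1 / n + v * v / sxx for v in xt]              # leverage, as in (8.1.7)

    def sandwich(psi):
        return sum(v * v * p for v, p in zip(xt, psi)) / sxx ** 2

    return {
        "conventional": sum(r * r for r in e) / (n - k) / sxx,
        "HC0": sandwich([r * r for r in e]),
        "HC1": sandwich([n / (n - k) * r * r for r in e]),
        "HC2": sandwich([r * r / (1 - hi) for r, hi in zip(e, h)]),
        "HC3": sandwich([r * r / (1 - hi) ** 2 for r, hi in zip(e, h)]),
    }


rng = random.Random(2)
x = [rng.gauss(0, 1) for _ in range(30)]
y = [1.0 + 0.5 * xi + rng.gauss(0, 1) for xi in x]     # homoskedastic errors
v = slope_variances(x, y)
for name, value in v.items():
    print(name, round(value ** 0.5, 4))                # standard errors
```

Because $0 < h_{ii} < 1$, the ordering $HC_0 < HC_2 < HC_3$ holds in every sample, while $HC_1$ is just $HC_0$ scaled by $N/(N-k)$.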
In the applications we’ve seen, the estimated standard errors tend to
and $E[h_{ii}^2] = \frac{\sum h_{ii}^2}{N} > (E[h_{ii}])^2 = \left(\frac{1}{N}\right)^2$ by Jensen’s inequality unless $h_{ii}$ is constant. Therefore $\sum h_{ii}^2 > \frac{1}{N}$. The constant-leverage case occurs when $\tilde{x}_i = \pm c$, for some constant, $c$.
4 A jackknife variance estimator estimates sampling variance from the empirical distribution generated by omitting one observation at a time. Stata computes $HC_1$, $HC_2$, and $HC_3$. You can also use a trick suggested by Messer and White (1984): divide $y_i$ and $X_i$ by $\sqrt{\hat{\psi}_i}$ and instrument the transformed model by $X_i/\sqrt{\hat{\psi}_i}$ for your preferred choice of $\hat{\psi}_i$.
Bootstrapping is useful for two reasons. First, in some cases the asymptotic distribution of an estimator can be hard to compute (e.g., the asymptotic distributions of quantile regression estimates involve unknown densities). Bootstrapping provides a computer-intensive but otherwise straightforward computational strategy. Not all asymptotic distributions are approximated by the bootstrap, but it seems to work well for the simple estimators we care about. Second, under some circumstances, the sampling distribution obtained via bootstrap may be closer to the finite-sample distribution of interest than the asymptotic approximation - statisticians call this property asymptotic refinement.
Here, we are mostly interested in the bootstrap because of asymptotic refinement. The asymptotic distribution of regression estimates is easy enough to compute, but we worry that the estimators $HC_0$-$HC_3$ are biased. As a rule, bootstrapping provides an asymptotic refinement when applied to test statistics that have asymptotic distributions which do not depend on any unknown parameters (see, e.g., Horowitz, 2001). Such test statistics are said to be asymptotically pivotal. An example is a $t$-statistic: this is asymptotically standard normal. Regression coefficients are not asymptotically pivotal; they have an asymptotic distribution which depends on the unknown residual variance.
The upshot is that if you want better finite-sample inference for regression coefficients, you should bootstrap $t$-statistics. That is, you calculate the $t$-statistic in each bootstrap sample and compare the analogous $t$-statistic from your original sample to this bootstrap “$t$”-distribution. A hypothesis is rejected if the absolute value of the original $t$-statistic is above, say, the 95th percentile of the absolute values from the bootstrap distribution.
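The bootstrap-$t$ recipe in the last paragraph can be sketched as follows. Several choices here are assumptions made for illustration rather than taken from the text: observations are resampled in pairs, the standard error inside each $t$-statistic is $HC_1$, each bootstrap $t$-statistic is centered at the original estimate (the usual way of imposing the null in the bootstrap world), and the data are simulated. The helper names are hypothetical.

```python
import math
import random


def slope_and_se(x, y):
    """OLS slope and its HC1 standard error in a bivariate regression."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    xt = [xi - xbar for xi in x]
    sxx = sum(v * v for v in xt)
    b1 = sum(v * (yi - ybar) for v, yi in zip(xt, y)) / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    var_hc1 = n / (n - 2) * sum(v * v * e * e for v, e in zip(xt, resid)) / sxx ** 2
    return b1, math.sqrt(var_hc1)


def bootstrap_t_test(x, y, beta_null=0.0, reps=999, level=0.95, seed=0):
    """Compare |t| from the original sample with the bootstrap |t| distribution."""
    rng = random.Random(seed)
    n = len(x)
    b_hat, se_hat = slope_and_se(x, y)
    t_orig = (b_hat - beta_null) / se_hat
    t_star = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]     # resample (x_i, y_i) pairs
        b, se = slope_and_se([x[i] for i in idx], [y[i] for i in idx])
        t_star.append(abs((b - b_hat) / se))           # centered at the original estimate
    t_star.sort()
    crit = t_star[math.ceil(level * (reps + 1)) - 1]   # bootstrap critical value
    return t_orig, crit, abs(t_orig) > crit


rng = random.Random(3)
x = [rng.gauss(0, 1) for _ in range(40)]
y = [1.0 + 2.0 * xi + rng.gauss(0, 1) for xi in x]
t_orig, crit, reject = bootstrap_t_test(x, y, beta_null=0.0)
print(round(t_orig, 2), round(crit, 2), reject)
```

In small samples the bootstrap critical value typically differs from the Normal value of 1.96, and that difference is where any refinement comes from.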
where $d_i$ is a dummy variable. The OLS estimate of $\beta_1$ is the difference in the means between those with $d_i$ switched on and off. Denoting these subsamples by the subscripts 1 and 0, we have
$$\hat{\beta}_1 = \bar{y}_1 - \bar{y}_0.$$
For the purposes of this derivation we think of $d_i$ as non-random, so that $\sum d_i = N_1$ and $\sum (1 - d_i) = N_0$
$\bar{y}_j)^2$ for $j = 0, 1$. The leverage in this example is
$$h_{ii} = \begin{cases} 1/N_0 & \text{if } d_i = 0 \\ 1/N_1 & \text{if } d_i = 1. \end{cases}$$
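Using this, it’s straightforward to show that the five variance estimators we’ve been discussing are

Conventional: $\frac{N}{N_0N_1}\left[\frac{S_0^2 + S_1^2}{N - 2}\right] = \frac{1}{Nr(1 - r)}\left[\frac{S_0^2 + S_1^2}{N - 2}\right]$
$HC_0$ (White, 1980): $\frac{S_0^2}{N_0^2} + \frac{S_1^2}{N_1^2}$
$HC_1$: $\frac{N}{N - 2}\left[\frac{S_0^2}{N_0^2} + \frac{S_1^2}{N_1^2}\right]$
$HC_2$: $\frac{S_0^2}{N_0(N_0 - 1)} + \frac{S_1^2}{N_1(N_1 - 1)}$
$HC_3$: $\frac{S_0^2}{(N_0 - 1)^2} + \frac{S_1^2}{(N_1 - 1)^2}$.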
The conventional estimator pools subsamples: this is efficient when the two variances are the same. The White (1980a) estimator, $HC_0$, adds separate estimates of the sampling variances of the means, using the consistent (but biased) variance estimators, $\frac{S_j^2}{N_j}$. The $HC_2$ estimator uses unbiased estimators of the sample variance for each group, since it makes the correct degrees of freedom correction. $HC_1$ makes a degrees of freedom correction outside the sum, which will help but is generally not quite correct. Since we know $HC_2$ to be the unbiased estimate of the sampling variance under homoskedasticity, $HC_3$ must be too big. Note that with $r = 0.5$, a case where the regression design is said to be balanced, the conventional estimator equals $HC_1$ and all five estimators differ little.
5 This is known as the Behrens-Fisher problem (see e.g. DeGroot and Schervish, 2001, ch. 8).
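The closed forms above are easy to compute and to cross-check against the general leverage-based formula. In the sketch below, $S_j^2$ is taken to be the within-group sum of squared deviations from the group mean, which is the reading under which the conventional formula reproduces the usual OLS variance; that reading, the simulated data, and the function names are assumptions for illustration.

```python
import random


def two_group_variances(y0, y1):
    """Closed-form variance estimates for the difference in means y1bar - y0bar."""
    n0, n1 = len(y0), len(y1)
    n = n0 + n1
    m0, m1 = sum(y0) / n0, sum(y1) / n1
    s0 = sum((v - m0) ** 2 for v in y0)    # within-group sums of squares
    s1 = sum((v - m1) ** 2 for v in y1)
    return {
        "conventional": n / (n0 * n1) * (s0 + s1) / (n - 2),
        "HC0": s0 / n0 ** 2 + s1 / n1 ** 2,
        "HC1": n / (n - 2) * (s0 / n0 ** 2 + s1 / n1 ** 2),
        "HC2": s0 / (n0 * (n0 - 1)) + s1 / (n1 * (n1 - 1)),
        "HC3": s0 / (n0 - 1) ** 2 + s1 / (n1 - 1) ** 2,
    }


def hc2_from_leverage(d, y):
    """HC2 slope variance from the general formula, with the dummy d as regressor."""
    n = len(d)
    dbar, ybar = sum(d) / n, sum(y) / n
    dt = [di - dbar for di in d]
    sxx = sum(v * v for v in dt)
    b1 = sum(v * (yi - ybar) for v, yi in zip(dt, y)) / sxx
    b0 = ybar - b1 * dbar
    e = [yi - b0 - b1 * di for di, yi in zip(d, y)]
    h = [1 / n + v * v / sxx for v in dt]
    return sum(v * v * r * r / (1 - hi) for v, r, hi in zip(dt, e, h)) / sxx ** 2


rng = random.Random(4)
y0 = [rng.gauss(0, 1) for _ in range(27)]           # unbalanced: 27 versus 3
y1 = [rng.gauss(0, 1) for _ in range(3)]
unbalanced = two_group_variances(y0, y1)
general_hc2 = hc2_from_leverage([0] * 27 + [1] * 3, y0 + y1)

balanced = two_group_variances(y0[:15], y0[15:] + y1)   # 15 versus 15
print({k: round(v, 4) for k, v in unbalanced.items()})
print(round(general_hc2, 4))
print({k: round(v, 4) for k, v in balanced.items()})
```

In the balanced split the conventional and $HC_1$ values coincide, while in the unbalanced split the corrections for the small group drive the estimates apart.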
A small Monte Carlo study based on (8.1.9) illustrates the pluses and minuses of the estimators and the extent to which a simple rule of thumb goes a long way towards ameliorating the bias of the $HC$ class. We choose $N = 30$ to highlight small sample issues, and $r = 0.9$, which implies $h_{ii} = 10/N = 1/3$ if $d_i = 1$. This is a highly unbalanced design. We draw
$$\varepsilon_i \sim \begin{cases} N(0, \sigma^2) & \text{if } d_i = 0 \\ N(0, 1) & \text{if } d_i = 1 \end{cases}$$
6 Notice that $HC_2$ is an unbiased estimator of the sampling variance, while the mean of the $HC_2$ standard errors across sampling experiments (0.52) is still below the standard deviation of $\hat{\beta}$ (0.59). This comes from the fact that the standard error is the square root of the sampling variance, the sampling variance is itself estimated and hence has sampling variability, and the square root is a concave function.
7 The large sampling variance of robust standard error estimators is noted by Chesher and Austin (1991). Kauermann and Carroll (2001) propose an adjustment to confidence intervals to correct for this.
empirical rejection rates are higher for these two than for conventional standard errors. This means the
$Max(HC_j, Conventional)$.
The $HC_1$ standard errors below conventional OLS standard errors when working with school-level data. As a rule, even if you start with the micro data on individuals, when the regressor of interest varies at a higher level of aggregation - a school, state, or some other group or cluster - effective sample sizes are much closer to the number of clusters than to the number of individuals.
8 Yang, Hsu, and Zhao (2005) formalize the notion of test procedures based on the maximum of a set of test statistics with differing efficiency and robustness properties.
8.2 Clustering and Serial Correlation in Panels
8.2.1 Clustering and the Moulton Factor
9 This sort of residual correlation structure is also a consequence of stratified sampling (see, e.g., Wooldridge, 2003). Most of the samples that we work with are close enough to random that we typically worry more about the dependence due to a group structure than clustering due to stratification.
zeitgeist until about 15 years ago.
Given the error structure, (8.2.3), the intra-class correlation coefficient becomes
$$\rho = \frac{\sigma_v^2}{\sigma_v^2 + \sigma_\eta^2},$$
where $\sigma_v^2$ is the variance of $v_g$ and $\sigma_\eta^2$ is the variance of $\eta_{ig}$. A word on terminology: $\rho$ is called the intra-class correlation coefficient even when the groups of interest are not classrooms.
$$\frac{V(\hat{\beta}_1)}{V_c(\hat{\beta}_1)} = 1 + (n - 1)\rho, \tag{8.2.4}$$
a formula derived in the appendix to this chapter. We call the square root of this ratio the Moulton factor, after Moulton’s (1986) influential study. Equation (8.2.4) tells us how much we over-estimate precision by ignoring intra-class correlation. Conventional standard errors become increasingly misleading as $n$ and $\rho$ increase. Suppose, for example, that $\rho = 1$. In this case, all the errors within a group are the same, so the $y_{ig}$
$$\frac{V(\hat{\beta}_1)}{V_c(\hat{\beta}_1)} = 1 + \left[\frac{V(n_g)}{\bar{n}} + \bar{n} - 1\right]\rho_x\rho, \tag{8.2.5}$$
10 With non-stochastic regressors and homoscedastic residuals, the Moulton factor is a finite-sample result. Survey statisticians call the Moulton factor the design effect because it tells us how much to adjust standard errors in stratified samples for deviations from simple random sampling (Kish, 1965).
where $\bar{n}$ is the average group size, and $\rho_x$ is the intra-class correlation of $x_{ig}$:
$$\rho_x = \frac{\sum_g \sum_{i \neq k}(x_{ig} - \bar{x})(x_{kg} - \bar{x})}{V(x_{ig})\sum_g n_g(n_g - 1)}.$$
Note that $\rho_x$ does not impose a variance-components structure like (8.2.3) - here, $\rho_x$ is a generic measure of the correlation of regressors within groups. The general Moulton formula tells us that clustering has a bigger impact on standard errors with variable group sizes and when $\rho_x$ is large. The impact vanishes when $\rho_x = 0$. In other words, if the $x_{ig}$’s are uncorrelated within groups, the grouped error structure does not matter for the estimation of standard errors. That’s why we worry most about clustering when the regressor of interest is fixed within groups.
We illustrate formula (8.2.5) using the Tennessee STAR example. A regression of Kindergartners’ percentile score on class size yields an estimate of -0.62 with a robust ($HC_1$) standard error of 0.09. In this case, $\rho_x = 1$ because class size is fixed within classes while $V(n_g)$ is positive because classes vary in size (in this case, $V(n_g) = 17.1$). The intra-class correlation coefficient for residuals is .31 and the average class size is 19.4. Plugging these numbers into (8.2.5) gives a value of about 7 for $\frac{V(\hat{\beta}_1)}{V_c(\hat{\beta}_1)}$, so that conventional standard errors should be multiplied by a factor of $2.65 = \sqrt{7}$. The corrected standard error is therefore about 0.24.
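The arithmetic in the STAR illustration can be reproduced in a few lines. The function name and argument names below are hypothetical; the inputs are the figures quoted in the paragraph above.

```python
import math


def moulton_variance_ratio(rho, rho_x, n_bar, var_n=0.0):
    """Ratio of the correct to the conventional sampling variance, as in (8.2.5).

    With equal group sizes (var_n = 0) and rho_x = 1 this reduces to (8.2.4).
    """
    return 1 + (var_n / n_bar + n_bar - 1) * rho_x * rho


# Figures quoted for the Tennessee STAR example.
ratio = moulton_variance_ratio(rho=0.31, rho_x=1.0, n_bar=19.4, var_n=17.1)
moulton_factor = math.sqrt(ratio)
corrected_se = 0.09 * moulton_factor

print(round(ratio, 2), round(moulton_factor, 2), round(corrected_se, 2))
```

Setting `rho_x=0` returns a ratio of exactly 1, which is the sense in which clustering stops mattering when the regressor is uncorrelated within groups.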
The Moulton factor works similarly with 2SLS except that $\rho_x$ should be computed for the instrumental variable and not the regressor. In particular, use (8.2.5) replacing $\rho_x$ with $\rho_z$, where $\rho_z$ is the intra-class correlation coefficient of the instrumental variable (Shore-Sheppard, 1996) and $\rho$ is the intra-class correlation of the second-stage residuals. To understand why this works, recall that conventional standard errors for 2SLS are derived from the residual variance of the second-stage equation divided by the variance of the first-stage fitted values. This is the same asymptotic variance formula as for OLS, with first-stage fitted values playing the role of regressor.¹¹
Here are some solutions to the Moulton problem:
11 Clustering can also be a problem in regression-discontinuity designs if the variable that determines treatment assignment varies only at a group level (see Card and Lee, 2008, for details).
12 Use Stata’s loneway command, for example.
$\bar{y}_g$ be the mean of $y_{ig}$ in group $g$. Estimate
$$\bar{y}_g = \beta_0 + \beta_1x_g + \bar{e}_g$$
by weighted least squares using the group size as weights. This is equivalent to OLS using micro data but the standard errors are asymptotically correct given the group structure, (8.2.3). Again, the asymptotics here are based on the number of groups and not the group size. Importantly, however, because the group means are close to Normally distributed with modest group sizes, we can expect the good finite-sample properties of regression with Normal errors to kick in. The standard errors that come out of grouped estimation are therefore likely to be more reliable than clustered standard errors in samples with few clusters.
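The equivalence between grouped weighted least squares and OLS on the micro data is easy to confirm on simulated data. In the sketch below the data-generating process, group sizes, and function name are assumptions for illustration; only the point estimates are compared, since the text's argument is that the two procedures differ in their standard errors.

```python
import random


def wls(x, y, w=None):
    """Weighted least squares of y on a constant and x; returns (b0, b1)."""
    w = w or [1.0] * len(x)
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


rng = random.Random(5)
micro_x, micro_y = [], []
group_x, group_ybar, group_n = [], [], []
for g in range(20):
    n_g = rng.randint(5, 40)                 # group sizes vary
    x_g = rng.gauss(0, 1)                    # regressor fixed within group
    v_g = rng.gauss(0, 0.5)                  # common group shock
    ys = [1.0 + 0.5 * x_g + v_g + rng.gauss(0, 1) for _ in range(n_g)]
    micro_x += [x_g] * n_g
    micro_y += ys
    group_x.append(x_g)
    group_ybar.append(sum(ys) / n_g)
    group_n.append(float(n_g))

micro_fit = wls(micro_x, micro_y)                       # OLS on individuals
grouped_fit = wls(group_x, group_ybar, w=group_n)       # WLS on group means
print(micro_fit)
print(grouped_fit)
```

The grouped regression has only 20 observations, which makes plain that the effective sample size for inference is the number of groups.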
Grouped-data estimation can be generalized to models with micro covariates using a two-step procedure. Suppose the equation of interest is
$$y_{ig} = \beta_0 + \beta_1x_g + w_{ig}'\delta + e_{ig}, \tag{8.2.7}$$
where $w_{ig}$ is a vector of covariates that varies within groups. In step 1, construct the covariate-adjusted group effects, $\mu_g$, by estimating
$$y_{ig} = \mu_g + w_{ig}'\delta + \eta_{ig}.$$
The $\mu_g$
13 See, e.g., Angrist and Lavy (2007) for an example of the latter two weighting schemes.
8.2.2 Serial Correlation in Panels and Difference-in-Difference Models
Serial correlation - the tendency for one observation to be correlated with those that have gone before - used to be Somebody Else’s Problem, specifically, the unfortunate souls who make their living out of time series
$$(\bar{y}_{s=NJ,t=Nov} - \bar{y}_{s=NJ,t=Feb}) - (\bar{y}_{s=PA,t=Nov} - \bar{y}_{s=PA,t=Feb}).$$
This estimator is unbiased since $E[v_{st}] = E[\eta_{ist}] = 0$. On the other hand, assuming we think of probability limits as increasing group size while keeping the choice of states and periods fixed, state-year shocks render $\hat{\delta}_{CK}$ inconsistent:
$$plim\ \hat{\delta}_{CK} = \delta + \left\{(v_{s=NJ,t=Nov} - v_{s=NJ,t=Feb}) - (v_{s=PA,t=Nov} - v_{s=PA,t=Feb})\right\}.$$
14 The Somebody Else’s Problem (SEP) Field, first identified as a natural phenomenon in Adams’ Life, the Universe, and Everything, is, according to Wikipedia, "a generated energy field that affects perception... Entities within the field will be perceived by an outside observer as ’Somebody Else’s Problem’, and will therefore be effectively invisible unless the observer is specifically looking for the entity."
Averaging larger and larger samples within New Jersey or Pennsylvania in a given period does nothing to eliminate the regional shocks specific to a given location and period. With only two states and years, we have no way to distinguish the differences-in-differences generated by a policy change from the difference-in-differences due to the fact that, say, the New Jersey economy was holding steady in 1992 while Pennsylvania was experiencing a mild cyclical downturn. We can think of the presence of $v_{st}$ as a failure of the common trends assumption discussed in Section 5.2.
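A small simulation illustrates why larger within-cell samples do not help when there are state-period shocks. The design below is an assumption for illustration: two states, two periods, a true effect of 1, individual noise with unit variance, and, as a shortcut, the cell mean of the individual noise drawn directly from its Normal sampling distribution rather than averaging individual draws. State and period effects are omitted because they difference out.

```python
import math
import random
import statistics


def dd_estimates(n, sigma_v, reps=2000, effect=1.0, seed=0):
    """Simulate 2x2 differences-in-differences estimates with cell size n."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        cell = {}
        for s in ("NJ", "PA"):
            for t in ("Feb", "Nov"):
                treated = effect if (s == "NJ" and t == "Nov") else 0.0
                shock = rng.gauss(0, sigma_v) if sigma_v > 0 else 0.0  # v_st
                mean_noise = rng.gauss(0, 1 / math.sqrt(n))            # cell mean of individual noise
                cell[s, t] = treated + shock + mean_noise
        estimates.append((cell["NJ", "Nov"] - cell["NJ", "Feb"])
                         - (cell["PA", "Nov"] - cell["PA", "Feb"]))
    return estimates


sd_no_shocks = {n: statistics.stdev(dd_estimates(n, sigma_v=0.0)) for n in (10, 10_000)}
sd_shocks = {n: statistics.stdev(dd_estimates(n, sigma_v=0.5)) for n in (10, 10_000)}
print(sd_no_shocks)   # shrinks toward zero as n grows
print(sd_shocks)      # levels off near sqrt(4 * 0.5**2) = 1
```

Without shocks the spread of the estimates collapses as the cells grow; with shocks it stalls at a floor set by the shock variance.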
The solution to the inconsistency induced by random shocks in differences in differences models is to have either multiple time periods or many states (or both). For example, Card (1992) uses 51 states to study minimum wage changes while Card and Krueger (2000) take another look at the New Jersey-Pennsylvania experiment with a longer monthly time series of payroll data. With multiple states and/or periods, we can hope that the $v_{st}$ average out to zero. As in the first part of this chapter on the Moulton problem, the inference framework in this context relies on asymptotic distribution theory with many groups and not on group size (or, at least, not on group size alone). The most important inference issue then becomes the behavior of $v_{st}$. In particular, if we are prepared to assume that shocks are independent across states and over time - i.e., they are serially uncorrelated - we are back to the plain-vanilla Moulton problem in Section 8.2.1, in which case we would cluster by state $\times$ year. But in most cases, the assumption that $v_{st}$ is serially uncorrelated is hard to defend. Almost certainly, for example, regional shocks are highly serially correlated: if things are bad in Pennsylvania in one month, they are likely to be just about as bad in the next.
The consequences of serial correlation for clustered panels are highlighted by Bertrand, Duflo, and Mullainathan (2004) and Kézdi (2004). Any research design with a group structure where the group means are correlated can be said to have the serial correlation problem. The upshot of recent work on serial correlation in data with a group structure is that, just as we must adjust our standard errors for the correlation within groups induced by the presence of $v_{st}$, we must further adjust for serial correlation in the $v_{st}$ themselves. There are a number of ways to do this, not all equally effective in all situations. It seems fair to say that the question of how best to approach the serial correlation problem is currently under study and a consensus
[...] $\bar{v}_s$. Nevertheless, $v_{st} - \bar{v}_s$ is probably still serially correlated. Clustering at the state level takes account of this since the one-level-up clustered covariance estimator allows for completely non-parametric residual correlation within clusters, including the time series correlation in $v_{st} - \bar{v}_s$. This is a quick and easy fix. The problem here, as you might have guessed, is that passing the buck up one level reduces the number of clusters. And asymptotic inference supposes we have a large number of clusters because we need a lot of [...] $\bar{v}_s$ and $v_{st-1} - \bar{v}_s$ reasonably well. Few clusters means biased standard errors and misleading inferences.
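The difference between clustering by state $\times$ year and passing the buck up to the state level can be sketched in a few lines. The code below is a minimal illustration, not the estimator of any particular package: it computes the slope of a bivariate regression and a basic cluster-robust standard error (sandwich form, no small-sample correction). The function name `slope_and_cluster_se` and the toy panel are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def slope_and_cluster_se(x, y, clusters):
    """OLS slope of y on x (with intercept) and its cluster-robust SE.

    Basic sandwich formula with no small-sample correction;
    `clusters` holds one hashable label per observation.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    x_dev = [xi - x_bar for xi in x]          # regressor net of the intercept
    sxx = sum(dev * dev for dev in x_dev)
    slope = sum(dev * yi for dev, yi in zip(x_dev, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Sum the scores x_dev * resid within each cluster, then square the sums.
    score_sums = defaultdict(float)
    for dev, e, c in zip(x_dev, resid, clusters):
        score_sums[c] += dev * e
    meat = sum(s * s for s in score_sums.values())
    return slope, sqrt(meat) / sxx


# Hypothetical toy panel: 4 states observed in 3 years, treatment varies by state-year.
states = ["A", "A", "A", "B", "B", "B", "C", "C", "C", "D", "D", "D"]
years = [1, 2, 3] * 4
d = [0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1]
y = [1.0, 1.2, 2.1, 0.4, 1.9, 2.2, 0.9, 1.1, 0.8, 2.5, 2.4, 2.9]

state_year = list(zip(states, years))
b, se_state_year = slope_and_cluster_se(d, y, state_year)   # one cluster per state-year cell
_, se_state = slope_and_cluster_se(d, y, states)            # pass the buck up one level
print(b, se_state_year, se_state)
```

Clustering by state lets the residuals of a state be correlated in any way across years, at the cost of having only four clusters in this toy example. That is exactly the few-clusters concern raised in the text, so the state-level number here should not be taken as reliable.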
8.2.3 Fewer than 42 clusters
Bias from few clusters is a risk in both the Moulton and the serial correlation contexts because in both cases inference is cluster-based. With few clusters, we tend to underestimate either the serial correlation in a random shock like $v_{st}$ or the intra-class correlation, $\rho$, in the Moulton problem. The relevant dimension for counting clusters in the Moulton problem is the number of groups, $G$. In a differences-in-differences scenario where you'd like to cluster on state (or some other cross-sectional dimension), the relevant dimension for counting clusters is the number of states or cross-sectional groups. Therefore, following Douglas Adams's dictum that the ultimate answer to life, the universe, and everything is 42, we believe the question is: How many clusters are enough for reliable inference using a standard cluster adjustment derived from (8.2.6)?
If 42 is enough for the standard cluster adjustment to be reliable, and less is too few, then what should
correlation problem.[15]
2. Recognizing that the fundamental unit of observation is a cluster and not an individual unit within clusters, Bell and McCaffrey (2002) and Donald and Lang (2007) suggest that inference be based on a $t$-distribution with $G - k$ degrees of freedom rather than on the standard Normal distribution. For small $G$, this makes a big difference: confidence intervals will be much wider, thereby avoiding some mistakes. Cameron, Gelbach, and Miller (2008) report Monte Carlo examples where the combination of a BRL adjustment and use of $t$-tables works well.
3. Donald and Lang (2007) argue that estimation using group means works well with small $G$ in the
[15] The matrix $A_g$ is not unique; there are many such decompositions. Bell and McCaffrey (2002) use the symmetric square root of $(I - H_g)^{-1}$, or $A_g = P \Lambda^{1/2}$, where $P$ is the matrix of eigenvectors of $(I - H_g)^{-1}$, $\Lambda$ is the diagonal matrix of the corresponding eigenvalues, and $\Lambda^{1/2}$ is the diagonal matrix of the square roots of the eigenvalues. One problem with the Bell and McCaffrey adjustment is that $(I - H_g)$ may not be of full rank, and hence the inverse may not exist for all designs. This happens, for example, when one of the regressors is a dummy variable which is one for exactly one of the clusters, and zero otherwise. This includes the panel DD model discussed by Bertrand et al. (2004), where you include a full set of state dummies and cluster by state. Moreover, the eigenvalue decomposition is implemented for matrices which are the size of the groups. In many applications, group sizes are large enough that this becomes computationally intractable.
[16] Donald and Lang (2007) discuss serial correlation examples where the regressor is fixed within the clustering dimension, but this is not the typical differences-in-differences setup.
4. Cameron, Gelbach, and Miller (2008) report that some forms of a block bootstrap work well with small numbers of groups, and that the block bootstrap typically outperforms Stata-clustered standard errors without the bias correction. This appears to be true both for the Moulton and serial correlation problems. But Cameron, Gelbach, and Miller (2008) focus on rejection rates using (pivotal) test statistics, while we like to see standard errors.
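The block bootstrap mentioned in item 4 can be sketched as follows. This is a plain pairs version that resamples whole clusters with replacement and reports the standard deviation of the bootstrap slopes. It is only one of several block bootstrap variants and is not claimed to be the one that performs best in Cameron, Gelbach, and Miller (2008). The function names and the toy data are hypothetical.

```python
import random
from collections import defaultdict
from statistics import stdev


def ols_slope(x, y):
    """Slope from a bivariate OLS regression of y on x with an intercept."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx


def block_bootstrap_se(x, y, clusters, reps=999, seed=0):
    """Bootstrap SE of the slope, resampling whole clusters with replacement."""
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for xi, yi, c in zip(x, y, clusters):
        blocks[c].append((xi, yi))
    labels = sorted(blocks)
    draws = []
    while len(draws) < reps:
        # Draw as many clusters as the original sample has, keeping each block intact.
        sample = [pair for _ in labels for pair in blocks[rng.choice(labels)]]
        xs = [p[0] for p in sample]
        ys = [p[1] for p in sample]
        if max(xs) == min(xs):
            continue  # regressor has no variation in this draw; redraw
        draws.append(ols_slope(xs, ys))
    return stdev(draws)


# Hypothetical toy panel: 4 states, 3 years each.
states = ["A", "A", "A", "B", "B", "B", "C", "C", "C", "D", "D", "D"]
d = [0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1]
y = [1.0, 1.2, 2.1, 0.4, 1.9, 2.2, 0.9, 1.1, 0.8, 2.5, 2.4, 2.9]
print(ols_slope(d, y), block_bootstrap_se(d, y, states))
```

Because each state's observations are kept together, any serial correlation within a state carries over into every bootstrap sample. With very few clusters the bootstrap distribution is built from few distinct blocks, so the caveat in the text about small numbers of groups still applies.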
8.3 Appendix: Derivation of the simple Moulton factor
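Write

$$
y_g = \begin{bmatrix} y_{1g} \\ y_{2g} \\ \vdots \\ y_{n_g g} \end{bmatrix}, \qquad
e_g = \begin{bmatrix} e_{1g} \\ e_{2g} \\ \vdots \\ e_{n_g g} \end{bmatrix}
$$

and

$$
y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_G \end{bmatrix}, \qquad
X = \begin{bmatrix} \iota_1 x_1' \\ \iota_2 x_2' \\ \vdots \\ \iota_G x_G' \end{bmatrix}, \qquad
e = \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_G \end{bmatrix},
$$

where $\iota_g$ is a column vector of $n_g$ ones and $G$ is the number of groups. Note that

$$
E(ee') = \Psi = \begin{bmatrix} \Psi_1 & 0 & \cdots & 0 \\ 0 & \Psi_2 & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & \Psi_G \end{bmatrix}
$$

$$
\Psi_g = \sigma_e^2 \begin{bmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & & \vdots \\ \vdots & & \ddots & \rho \\ \rho & \cdots & \rho & 1 \end{bmatrix}
= \sigma_e^2 \left[ (1 - \rho) I + \rho \iota_g \iota_g' \right],
$$

where $\rho = \dfrac{\sigma_v^2}{\sigma_v^2 + \sigma_\eta^2}$. Now

$$
X'X = \sum_g n_g x_g x_g'
$$

$$
X' \Psi X = \sum_g x_g \iota_g' \Psi_g \iota_g x_g'.
$$

But

$$
x_g \iota_g' \Psi_g \iota_g x_g'
= \sigma_e^2 x_g \iota_g' \begin{bmatrix} 1 + (n_g - 1)\rho \\ 1 + (n_g - 1)\rho \\ \vdots \\ 1 + (n_g - 1)\rho \end{bmatrix} x_g'
= \sigma_e^2 n_g \left[ 1 + (n_g - 1)\rho \right] x_g x_g'.
$$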
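The last equality can be checked numerically. The sketch below assumes a scalar regressor $x_g$, so that $x_g x_g'$ is just $x_g^2$. It builds the equicorrelated block $\Psi_g$ explicitly and compares the brute-force quadratic form with the closed form. It also includes the standard variance inflation factor $1 + (n - 1)\rho$ for the special case of equal group sizes $n$ and a regressor fixed within groups, which is a well-known result stated here rather than derived from the lines above. All function names are hypothetical.

```python
def psi_block(n_g, sigma2_e, rho):
    """Equicorrelated n_g x n_g block: sigma2_e * [(1 - rho) * I + rho * ones]."""
    return [[sigma2_e * (1.0 if i == j else rho) for j in range(n_g)]
            for i in range(n_g)]


def quad_form_direct(x_g, n_g, sigma2_e, rho):
    """x_g * (iota' Psi_g iota) * x_g for a scalar x_g, summing every entry of Psi_g."""
    psi = psi_block(n_g, sigma2_e, rho)
    return x_g * sum(sum(row) for row in psi) * x_g


def quad_form_closed(x_g, n_g, sigma2_e, rho):
    """Closed form from the derivation: sigma2_e * n_g * [1 + (n_g - 1) * rho] * x_g**2."""
    return sigma2_e * n_g * (1.0 + (n_g - 1) * rho) * x_g ** 2


def moulton_factor(n, rho):
    """Variance inflation with equal group sizes n and a regressor fixed within groups."""
    return 1.0 + (n - 1) * rho


print(quad_form_direct(1.5, 4, 2.0, 0.3), quad_form_closed(1.5, 4, 2.0, 0.3))
print(moulton_factor(20, 0.1))
```

Even a modest intra-class correlation matters once groups are large: with 20 observations per group and $\rho = 0.1$, the sampling variance is understated by a factor of about 2.9 if the group structure is ignored.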
Table 8.1.1: Monte Carlo results for robust standard errors

                                                      Empirical 5%
                                                     Rejection Rates
                              Mean     Standard      Normal      t
                                       Deviation
                              (1)        (2)          (3)       (4)

C. No Heteroskedasticity
$\hat{\beta}_1$              -0.003     0.611
Standard Errors:
Conventional                  0.604     0.081        0.061     0.05
HC0                           0.453     0.19         0.209     0.193
HC1                           0.486     0.203        0.185     0.171
HC2                           0.557     0.247        0.15      0.136
HC3                           0.667     0.309        0.11      0.1
max(Conventional, HC0)        0.629     0.109        0.055     0.045
max(Conventional, HC1)        0.64      0.122        0.053     0.044
max(Conventional, HC2)        0.679     0.166        0.047     0.039
max(Conventional, HC3)        0.754     0.237        0.039     0.031

Note: The table reports results from a sampling experiment with 25,000 replications.
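For readers who want to see what the row labels in Table 8.1.1 refer to, the following sketch computes conventional and HC0 to HC3 standard errors for the slope of a bivariate regression. It uses the usual textbook definitions: squared residuals left as they are (HC0), scaled by $N/(N-k)$ (HC1), divided by $1 - h_{ii}$ (HC2), or divided by $(1 - h_{ii})^2$ (HC3), where $h_{ii}$ is the leverage. The conventional estimator here uses an $N - k$ degrees-of-freedom correction, which is an assumption of this sketch. The function name and data are hypothetical, and this is not the code behind the table.

```python
from math import sqrt


def robust_slope_ses(x, y):
    """Conventional and HC0-HC3 standard errors for the slope in y = a + b*x + e."""
    n, k = len(x), 2                      # k counts the intercept and the slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    x_dev = [xi - x_bar for xi in x]
    sxx = sum(dev * dev for dev in x_dev)
    slope = sum(dev * yi for dev, yi in zip(x_dev, y)) / sxx
    resid = [yi - y_bar - slope * dev for dev, yi in zip(x_dev, y)]
    lev = [1.0 / n + dev * dev / sxx for dev in x_dev]   # leverage h_ii

    def sandwich(weights):
        # Slope variance with each squared residual rescaled by its weight.
        return sum(w * (dev * e) ** 2
                   for w, dev, e in zip(weights, x_dev, resid)) / sxx ** 2

    hc0 = sandwich([1.0] * n)
    variances = {
        "conventional": sum(e * e for e in resid) / (n - k) / sxx,
        "HC0": hc0,
        "HC1": hc0 * n / (n - k),
        "HC2": sandwich([1.0 / (1.0 - h) for h in lev]),
        "HC3": sandwich([1.0 / (1.0 - h) ** 2 for h in lev]),
    }
    return {name: sqrt(v) for name, v in variances.items()}


# Hypothetical data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.1, 1.9, 3.2, 3.8, 5.3, 5.9, 7.4, 7.6]
ses = robust_slope_ses(x, y)
print(ses)
print(max(ses["conventional"], ses["HC2"]))   # the max(Conventional, HC2) rule
```

The last line mirrors the "max" rows of the table: report whichever of the conventional and robust standard errors is larger.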
Table 8.2.1: Standard errors for class size effects in the STAR data

                 Standard Error
Robust (HC1)     0.09

Note: The table reports estimates from a regression of average percentile scores on
Last words
Acronyms
Technical terms
2SLS
Two Stage Least Squares, an Instrumental Variables (IV) estimator (89)
ACR
Average Causal Response, the weighted average causal response to an ordered treatment (136)
ANOVA
Analysis of Variance, a decomposition of total variance into the variance of the Conditional Expectation Function (CEF) and the average conditional variance (26)
BRL
Bias Reduced Linearization estimator, a bias-corrected covariance matrix estimator for clustered data (238)
CDF
Cumulative Distribution Function, the probability that a random variable takes on a value less than or equal to a given number (72)
CEF
Conditional Expectation Function, the population average of $y_i$ with $X_i$ held fixed (23)
CIA
GMM
SSIV
Split-Sample Instrumental Variables estimator, a version of the Two-Sample Instrumental Variables (TSIV) estimator (111)
TSIV
Two-Sample Instrumental Variables estimator, an Instrumental Variables (IV) estimator that can
IPUMS
Integrated Public Use Microdata Series, consistently coded samples of census records from the US and other countries (24)
NHIS
National Health Interview Survey, a large American survey with many questions related to health (10)
NLSY
National Longitudinal Survey of Youth, a long-running panel survey that started with a high-school-aged cohort (46)
PSAT
Preliminary SAT, qualifies American high school sophomores for a National Merit Scholarship (189)
PSID
Panel Study of Income Dynamics, a panel survey of American households begun in 1968 (64)
QOB
Quarter of Birth (92)
RSN
Random Sequence Numbers, draft lottery numbers randomly assigned to dates of birth in the Vietnam-era draft lotteries held from 1970-73 (95)
SDA
Service Delivery Area, one of the 649 sites where Job Training Partnership Act (JTPA) services were delivered (217)
SSA
Social Security Administration (110)
Study Names
HIE
Health Insurance Experiment conducted by the RAND Corporation, a randomized trial in which participants were exposed to insurance programs with different features (70)
HRT
Hormone Replacement Therapy, an intervention designed to reduce the symptoms of menopause (12)
JSA
Job Search Assistance, part of the Job Training Partnership Act (JTPA) (218)
JTPA
Job Training Partnership Act, a large federal training program which included a randomized evaluation (119)
MDVE
Minneapolis Domestic Violence Experiment, a randomized trial in which police response to a do-
Empirical Studies Index
Note: Page numbers below refer to text locations where key elements of the study are described.
Abadie, Angrist, and Imbens (2002)
Constructs QTE (IV) estimates of the effect of subsidized JTPA training on the distribution of trainee earnings. Discussed in Section 7.2.1. Results appear in Table 7.2.1.
Acemoglu and Angrist (2000)
Uses compulsory schooling laws and quarter of birth to construct IV
Angrist, Chernozhukov, Fernandez-Val (2006)
Shows that quantile regression estimates a MMSE approximation to a nonlinear CQF, and illustrates the quantile regression approximation property by estimating the effects of schooling on the distribution of wages. Discussed in Section 7.1.2. Results appear in Table 7.1.1 and Figure 7.1.1.
Autor (2003)
Uses state variation in employment protection laws to construct DD estimates of the effect
Notation
$X_i$: a $k \times 1$ vector of covariates, with elements $x_{ki}$; $k = 1, \ldots, k$
$x_i$: the single regressor in a bivariate regression
$y_i$: an outcome or dependent variable
$\varepsilon_i \equiv y_i - E[y_i | X_i]$, the CEF residual
$\beta \equiv \arg\min_b E[(y_i - X_i'b)^2]$, the population regression vector; $\beta = E[X_i X_i']^{-1} E[X_i y_i]$
$e_i \equiv y_i - X_i'\beta$, a population regression residual
$\tilde{x}_{ki}$: the residual from a regression of regressor $x_{ki}$ on all other covariates in the model
$w_i$: the inverse probability of sampling observation $i$
$\hat{\beta} \equiv [\sum_i X_i X_i']^{-1} \sum_i X_i y_i$, the OLS estimator
$\hat{e}_i \equiv y_i - X_i'\hat{\beta}$, the estimated residual
$f_i(s)$
$\Gamma \equiv [\alpha' \; \rho]'$, the vector of coefficients in a 2SLS second stage equation, where the coefficient of interest is $\rho$
$\hat{\Gamma}_{2SLS} \equiv [\sum_i V_i V_i']^{-1} \sum_i V_i y_i = [W'P_Z W]^{-1} W'P_Z y$, a 2SLS estimator
$y_i(d, z)$: the potential outcome of individual $i$ were this person to have treatment status $d_i = d$ and instrument value $z_i = z$
$\rho_i \equiv y_{1i} - y_{0i}$, the individual treatment effect in a random coefficients setup with a binary treatment $d_i$
$\pi_{1i}$: heterogeneous causal effect of an instrument on $d_i$
$\kappa_i \equiv 1 - \frac{d_i (1 - z_i)}{1 - P(z_i = 1 | X_i)} - \frac{(1 - d_i) z_i}{P(z_i = 1 | X_i)}$, the weight used to find the expectation of any function of the data for compliers
$\eta_i$: error term in a causal model, e.g., $y_i = \rho x_i + \eta_i$
$\xi_i$: error term in a 1st stage regression, e.g., $x_i = Z_i'\pi + \xi_i$
$\varepsilon_{it}, \varepsilon_{ist}$: population regression errors in panel data in chapter 5
$\Phi[\cdot]$: standard normal cumulative distribution function (CDF)
$\phi[\cdot]$: standard normal density
$\Phi_b(\cdot, \cdot, \rho_{\varepsilon})$: bivariate standard normal CDF with correlation coefficient $\rho_{\varepsilon}$
$y_{it-h}$: observation on the dependent variable $h$ periods ago
$\Delta$: difference operator, e.g. $\Delta y_{it} = y_{it} - y_{it-1}$
$F_Y(y | X_i)$: the distribution function for $y_i$ conditional on $X_i$
$Q_\tau(y_i | X_i) \equiv F_Y^{-1}(\tau | X_i)$, conditional quantile function (CQF)
$\rho_\tau(u) = (\tau - 1(u \le 0)) u$, check function, the expectation of which is minimized by the CQF
$\beta_\tau \equiv \arg\min_b E[\rho_\tau(y_i - X_i'b)]$, population quantile regression vector
$\Delta_\tau(X_i, \beta_\tau) \equiv X_i'\beta_\tau - Q_\tau(y_i | X_i)$, quantile regression specification error
$\Omega$: asymptotic covariance matrix of the OLS estimator
$\Psi \equiv E[ee']$, variance matrix of residuals, with diagonal elements $\psi_i$
$e_{ig} \equiv v_g + \eta_{ig}$, error with a group structure in chapter 8
$\hat{\Omega}_c = (X'X)^{-1} \left( \frac{\sum \hat{e}_i^2}{N} \right)$, conventional variance estimator
$\hat{\Omega}_r = (X'X)^{-1} \left( \frac{\sum X_i X_i' \hat{e}_i^2}{N} \right) (X'X)^{-1}$, robust variance estimator
$H \equiv X (X'X)^{-1} X'$, covariate projection matrix
$h_{ii} \equiv X_i' (X'X)^{-1} X_i$, the leverage of the $i$th observation, the $i$th diagonal element of $H$
$M \equiv I_N - H$, the residual maker matrix
References
Abadie, Alberto (2008): “Bias-Corrected Matching Estimators for Average Treatment Effects,” Harvard University, Department of Economics, mimeo.
Acemoglu, Daron, and Joshua Angrist
(1990):
(1995): Mostly Harmless. Harmony Books, New York.
Altonji, Joseph G., and Lewis M. Segal (1996): “Small-Sample Bias in GMM Estimation of Covariance Structures,” Journal of Business and Economic Statistics, 14, 353–366.
Ammermueller, Andreas, and Jörn-Steffen Pischke (2006): “Peer Effects in European Primary Schools: Evidence from PIRLS,” Institute for the Study of Labor (IZA), Discussion Paper No. 2077.
Ananat, Elizabeth, and Guy Michaels (2008): “The Effect of Marital Breakup on the Income Distribution of Women with Children,” Journal of Human Resources, forthcoming.
Anderson, Michael (2008): “Multiple Inference and Gender Differences in the Effect of Early Intervention: A Reevaluation of the Abecedarian, Perry Preschool, and Early Training Projects,” Journal of the American Statistical Association, forthcoming.
(1991): “Grouped Data Estimation and Testing in Simple Labor Supply Models,” Journal of
(1998): “Estimating the Labor Market Impact of Voluntary Military Service Using Social Security Data on Military Applicants,” Econometrica, 66, 249–288.
(2001): “Estimation of Limited Dependent Variable Models with Dummy Endogenous Regressors: Simple Strategies for Empirical Practice,” Journal of Business and Economic Statistics, 19, 2–16.
(2004): “American Education Research Changes Track,” Oxford Review of Economic Policy, 20, 198–212.
Angrist, Joshua D., and Jinyong Hahn (2004): “When to Control for Covariates? Panel Asymptotics for Estimates of Treatment Effects,” Review of Economics and Statistics, 86, 58–72.
Angrist, Joshua D., Guido Imbens, and Donald B. Rubin (1996): “Identification of Causal Effects Using Instrumental Variables,” Journal of the American Statistical Association, 91, 444–472.
Angrist, Joshua D., and Guido W. Imbens (1995): “Two-Stage Least Squares Estimation of Average Causal Effects in Models with Variable Treatment Intensity,” Journal of the American Statistical Association, 90, 430–442.
Angrist, Joshua D., and Alan B. Krueger (1991): “Does Compulsory School Attendance Affect Schooling and Earnings?,” Quarterly Journal of Economics, 106, 976–1014.
(1992): “The Effect of Age at School Entry on Educational Attainment: An Application of Instrumental Variables with Moments from Two Samples,” Journal of the American Statistical Association, 418, 328–36.
(1995): “Split-Sample Instrumental Variables Estimates of the Return to Schooling,” Journal of Business and Economic Statistics, 13, 225–35.
(1999): “Empirical Strategies in Labor Economics,” in Handbook of Labor Economics, ed. by Orley C. Ashenfelter, and David Card, vol. 3. North Holland, Amsterdam.
(2001): “Instrumental Variables and the Search for Identification: From Supply and Demand to Natural Experiments,” Journal of Economic Perspectives, 15, 69–85.
Angrist, Joshua D., and Guido Kuersteiner
(2007): “The Effects of High Stakes High School Achievement Awards: Evidence from a Group-
Arellano, Manuel, and Stephen Bond (1991): “Some Tests of Specification for Panel Data: Monte Carlo Evidence and an Application to Employment Equations,” The Review of Economic Studies, 58, 277–297.
Ashenfelter, Orley A. (1978): “Estimating the Effect of Training Programs on Earnings,” Review of Economics and Statistics, 60, 47–57.
Bennedsen, Morten, Kasper M. Nielsen, Francisco Pérez-González, and Daniel Wolfenzon (2007): “Inside the Family Firm: The Role of Families in Succession Decisions and Performance,” The Quarterly Journal of Economics, 122, 647–692.
Bertrand, Marianne, Esther Duflo, and Sendhil Mullainathan (2004): “How Much Should We Trust Differences-in-Differences Estimates?,” Quarterly Journal of Economics, 119, 249–275.
Bertrand, Marianne, and Sendhil Mullainathan (2004): “Are Emily and Greg More Employable than Lakisha and Jamal? A Field Experiment on Labor Market Discrimination,” The American Economic Review, 94, 991–1013.
Besley, Timothy, and Robin Burgess
(2005): Labor Economics, 3rd edn. McGraw-Hill/Irwin, New York.
Bound, John, David Jaeger, and Regina Baker (1995): “Problems with Instrumental Variables Estimation when the Correlation between the Instruments and the Endogenous Variables is Weak,” Journal of the American Statistical Association, 90, 443–450.
Bound, John, and Gary Solon (1999): “Double Trouble: On the Value of Twins-based Estimation of
Bronars, Stephen G., and Jeff Grogger (1994): “The Economic Consequences of Unwed Motherhood: Using Twin Births as a Natural Experiment,” American Economic Review, 84, 1141–1156.
Buchinsky, Moshe (1994): “Changes in the U.S. Wage Structure 1963-1987: Application of Quantile Regression,”
(1995): “Earnings, Schooling and Ability Revisited,” in Research in Labor Economics, ed. by Solomon W. Polachek, vol. 14, pp. 23–48. JAI Press, Greenwich, Connecticut.
(1999): “The Causal Effect of Education on Earnings,” in Handbook of Labor Economics, ed. by Orley C. Ashenfelter, and David Card, vol. 3. North Holland, Amsterdam.
Card, David, and Alan Krueger (1994): “Minimum Wages and Employment: A Case Study of the Fast Food Industry in New Jersey and Pennsylvania,” American Economic Review, 84, 772–784.
(2000): “Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania: Reply,” American Economic Review, 90, 1397–420.
Card, David, and David S. Lee (2008): “Regression Discontinuity Inference with Specification Error,”
Chamberlain, Gary (1977): “Education, Income, and Ability Revisited,”
(1984): “Panel Data,” in
(1994): “Quantile Regression, Censoring and the Structure of Wages,” in Proceedings of the Sixth
(2007): “A Simple Approach to Heteroskedasticity and Autocorrelation Robust Inference with Weak
Elder, Todd E., and Darren H. Lubotsky (2008): “Kindergarten Entrance Age and Children’s Achievement: Impacts of State Policies, Family Background, and Peers,” Journal of Human Resources, forthcoming.
Finn, Jeremy D., and Charles M. Achilles (1990): “Answers and Questions About Class Size: A Statewide Experiment,” American Educational Research Journal, 28, 557–77.
Firpo, Sergio
(1991):
Griliches, Zvi, and Jerry A. Hausman (1986): “Errors in variables in panel data,” Journal of Econometrics, 31, 93–118.
Griliches, Zvi, and William M. Mason (1972): “Education, Income, and Ability,” Journal of Political Economy, 80, S74–S103.
Grumbach, Kevin, Dennis Keane, and Andrew Bindman (1993): “Primary Care and Public Emergency Department Overcrowding,” American Journal of Public Health, 83, 372–378.
Guryan, Jonathan (2004): “Desegregation and Black Dropout Rates,” American Economic Review, 94, 919–943.
Haavelmo, Trygve
(2007b): “Generalized Least Squares Inference in Panel and Multilevel Models with Serial Correlation and Fixed Effects,”
(1983): “Specification and Estimation of Simultaneous Equation Models,” in Handbook of Econometrics, ed. by Zvi Griliches, and Michael Intriligator, vol. 1, pp. 391–448. North Holland, Amsterdam.
Heckman, James J. (1978): “Dummy Endogenous Variables in a Simultaneous Equations System,” Econometrica, 46, 695–712.
Heckman, James J., Hidehiko Ichimura, and
(2001): “The Bootstrap,” in
Imbens, Guido (2000): “The Role of the Propensity Score in Estimating Dose-Response Functions,” Biometrika, 87, 706–10.
Koenker, Roger, and Stephen Portnoy (1996): “Quantile Regression,” University of Illinois at Urbana-Champaign, College of Commerce and Business Administration, Office of Research, Working Paper No. 97-0100.
Krueger, Alan B. (1999): “Experimental Estimates of Education Production Functions,” Quarterly Journal of Economics, 114, 497–532.
Kugler, Adriana, Juan F. Jimeno, and Virginia Hernanz (2005): “Employment Consequences of
Mariano, Roberto S. (2001): “Simultaneous Equation Model Estimators: Statistical Properties,” in A
(1990b): “Some Further Results on the Exact Small-Sample Properties of the Instrumental Variable Estimator,”
(1990): “Semiparametric Efficiency Bounds,”
Prais, Sig J., and John Aitchison (1954): “The Grouping of Observations in Regression Analysis,” Revue de l’Institut International de Statistique / Review of the International Statistical Institute, 22, 1–22.
Reiersol, Olav
(1995): Observational Studies. Springer-Verlag, New York.
Rosenbaum, Paul R., and Donald B. Rubin (1983): “The Central Role of the Propensity Score in Observational Studies for Causal Effects,”
(1985): “The Bias Due to Incomplete Matching,”
(1974): “Estimating the Causal Effects of Treatments in Randomized and Non-Randomized Studies,” Journal of Educational Psychology, 66, 688–701.
(1977): “Assignment to a Treatment Group on the Basis of a Covariate,” Journal of Educational Statistics, 2, 1–26.
(1991): “Practical Implications of Modes of Statistical Inference for Causal Effects and the Critical Role of the Assignment Mechanism,”
Smith, Jeffrey A., and
(2005): “Does Matching Overcome LaLonde’s Critique of Nonexperimental Estimators?,” Journal
(1980b): “Using Least Squares to Approximate Unknown Regression Functions,” International Economic Review, 21, 149–170.
(1982): “Instrumental Variables Regression with Independent Observations,”
Wooldridge, Jeffrey
(2005): “Fixed-Effects and Related Estimators for Correlated Random-Coefficient and Treatment-Effect Panel Data Models,” The Review of Economics and Statistics, 87, 385–390.
(2006): Introductory Econometrics: A Modern Approach. Thomson/South-Western, Mason, OH.
Wright, Philip G.
(1928):
(1897): “On the Theory of Correlation,”
(1899): “An Investigation into the Causes of Changes in Pauperism in England, Chiefly During the Last Two Intercensal Decades (Part I.),”