Inference in Linear Regression Models with Many Covariates and Heteroscedasticity

Matias D. Cattaneo*, Michael Jansson, Whitney K. Newey

*Corresponding author for this work

Research output: Contribution to journal/Conference contribution in journal/Contribution to newspaper; Journal article; Research; peer-review

Abstract

The linear regression model is widely used in empirical work in economics, statistics, and many other disciplines. Researchers often include many covariates in their linear model specification in an attempt to control for confounders. We give inference methods that allow for many covariates and heteroscedasticity. Our results are obtained using high-dimensional approximations, where the number of included covariates is allowed to grow as fast as the sample size. We find that all of the usual versions of Eicker–White heteroscedasticity consistent standard error estimators for linear models are inconsistent under these asymptotics. We then propose a new heteroscedasticity consistent standard error formula that is fully automatic and robust to both (conditional) heteroscedasticity of unknown form and the inclusion of possibly many covariates. We apply our findings to three settings: parametric linear models with many covariates, linear panel models with many fixed effects, and semiparametric semi-linear models with many technical regressors. Simulation evidence consistent with our theoretical results is provided, and the proposed methods are also illustrated with an empirical application. Supplementary materials for this article are available online.
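Why the usual Eicker–White estimators break down can be illustrated with a small sketch. It assumes a model y = x*beta + W*gamma + u with one regressor of interest x, K covariates W (including an intercept column), and errors that are independent across observations with E[u | x, W] = 0. The code computes the standard HC0 variance for beta_hat after partialling W out of x (Frisch–Waugh–Lovell). It also computes a reweighted variant. If M is the annihilator of the full design, the residuals satisfy u_hat = M u, so E[u_hat_j^2] = sum_k M_jk^2 sigma_k^2. Under homoscedasticity this equals M_jj sigma^2 < sigma^2, which is why HC0 is biased downward when K is a sizable fraction of n. Solving (M elementwise-squared) s = u_hat^2 undoes that mixing. This reweighting is our own illustration of the bias mechanism. It is an assumption that it matches the estimator proposed in the article, so consult the paper for the exact formula and its conditions. All function names (`solve`, `annihilator`, `matvec`, `variance_estimates`) and the simulated data are hypothetical.

```python
import random


def solve(A, b):
    """Solve the square system A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    aug = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))  # pivot row
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(n):
            if r != c:
                f = aug[r][c] / aug[c][c]
                for k in range(c, n + 1):
                    aug[r][k] -= f * aug[c][k]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def annihilator(Z):
    """Return M = I - Z (Z'Z)^{-1} Z' for an n x p design given as a list of rows."""
    n, p = len(Z), len(Z[0])
    G = [[sum(Z[i][a] * Z[i][b] for i in range(n)) for b in range(p)] for a in range(p)]
    B = [solve(G, Z[j]) for j in range(n)]  # B[j] = (Z'Z)^{-1} z_j
    return [[(1.0 if i == j else 0.0) - sum(Z[i][a] * B[j][a] for a in range(p))
             for j in range(n)] for i in range(n)]


def matvec(M, v):
    """Matrix-vector product for list-of-rows matrices."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def variance_estimates(y, x, W):
    """Return (beta_hat, V_hc0, V_reweighted) for the coefficient on x in y = x*beta + W*gamma + u."""
    Z = [[xi] + list(wi) for xi, wi in zip(x, W)]
    v = matvec(annihilator(W), x)  # x with W partialled out (Frisch-Waugh-Lovell)
    MZ = annihilator(Z)
    u = matvec(MZ, y)              # full-model OLS residuals, u_hat = MZ u
    S = sum(vi * vi for vi in v)
    beta = sum(vi * yi for vi, yi in zip(v, y)) / S
    # Standard Eicker-White (HC0) variance for beta_hat.
    v_hc0 = sum(vi ** 2 * ui ** 2 for vi, ui in zip(v, u)) / S ** 2
    # E[u_hat_j^2] = sum_k MZ[j][k]^2 sigma_k^2, so solving (MZ o MZ) s = u_hat^2 yields
    # estimates of sigma_k^2 that are unbiased given the regressors (entries may be negative).
    MM = [[m * m for m in row] for row in MZ]
    s = solve(MM, [ui * ui for ui in u])
    v_rw = sum(vi ** 2 * si for vi, si in zip(v, s)) / S ** 2
    return beta, v_hc0, v_rw


if __name__ == "__main__":
    # Made-up illustrative data: n = 60, K = 20 covariates, error scale depending on x.
    random.seed(0)
    n, K = 60, 20
    W = [[1.0] + [random.gauss(0, 1) for _ in range(K - 1)] for _ in range(n)]
    x = [random.gauss(0, 1) + 0.5 * w[1] for w in W]
    y = [xi + 0.1 * sum(w) + (1 + abs(xi)) * random.gauss(0, 1) for xi, w in zip(x, W)]
    beta, v_hc0, v_rw = variance_estimates(y, x, W)
    print(f"beta_hat={beta:.3f}  HC0 var={v_hc0:.4f}  reweighted var={v_rw:.4f}")
```

The dependency-free Gauss-Jordan solver keeps the sketch self-contained, but it is O(n^3) and only suited to small examples. A real analysis would use a numerical linear-algebra library. The reweighting step also requires the elementwise-squared annihilator to be invertible, which can fail when the number of covariates is a large fraction of the sample size.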

Original language: English
Journal: Journal of the American Statistical Association
Volume: 113
Issue: 523
Pages (from-to): 1350-1361
Number of pages: 12
ISSN: 0162-1459
Publication status: Published, 3 Jul 2018

Keywords

  • Heteroscedasticity
  • High-dimensional models
  • Linear regression
  • Many regressors
  • Standard errors
