Research report: Safely Expanding Research Access to Administrative Tax Data: Creating a Synthetic Public Use File and a Validation Server
Leonard E. Burman, Alex Engler, Surachai Khitatrakun, James R. Nunns, Sarah Armstrong, John Iselin, Graham MacDonald

Administrative tax data contain a wealth of information that is potentially valuable for research and analysis. However, the legal and ethical imperative to protect taxpayer privacy has restricted access to these data to a small number of government analysts and select researchers. We propose to develop, in consultation with experts at the Statistics of Income Division of the Internal Revenue Service (IRS), a fully synthetic tax database: a file that preserves many of the statistical characteristics of the restricted data without containing any identifiable tax return information. Working with the IRS, we also hope to develop a procedure by which researchers can submit statistical programs that have been tested on the synthetic data to run on IRS computers, subject to a review to guarantee that the output satisfies disclosure avoidance protocols. This paper discusses the current methodology used to produce public use datasets, surveys the literature on synthetic data and privacy protection, outlines our proposed plan to produce a synthetic file, and discusses special challenges.
Primary topic: Individual Taxes
Research area: Federal Budget and Economy; Individual Taxes