On June 4, the US Commerce Department issued a Department Administrative Order (DAO) limiting the statistical data privacy methods that the US Census Bureau and Bureau of Economic Analysis (BEA) could use to protect their publicly released statistical products. The directive would move the disclosure review process closer to the standards used for the 1980 Census. The Census Bureau has previously said those standards are insufficient, and has demonstrated as much through simulated reconstruction and reidentification attacks, especially in today’s world of smartphones, social media, cloud computing, and AI.
The directive would also represent a significant departure from federal regulations that assign federal statistical agencies responsibility for selecting the data protection tools they judge most effective. Subject matter experts note that narrowing these options will likely result in the Commerce Department agencies releasing less detailed public data, making these statistical products less useful for governments, businesses, researchers, and the public.
Federal agencies must protect confidential data
Confidential data provided by individuals, businesses, non-profits, and governments—such as federal survey responses and administrative data like tax and wage information—are foundational for nearly all federal statistics.
Agencies are required to protect these data by securing data systems, limiting access to individual- or household-level records, and applying statistical disclosure limitation methods (tools that help prevent people, businesses, or governments from being identified) to public data releases such as datasets, tables, and reports.
Federal guidance directs agencies to minimize disclosure risk while preserving the data’s usefulness. For example, the Census Bureau promises to “…use every technology, statistical methodology, and physical security procedure at [its] disposal to protect [data providers’] information.”
Data privacy tools include suppression, which redacts values or omits categories of data; coarsening, which reduces detail through rounding, grouping, or ranges; and noise infusion, which adds small random changes to data so individual records are harder to identify. In selecting data privacy methods, agencies must balance the availability of relevant statistics, accuracy of those statistics, and confidentiality. This is the triple tradeoff of official statistics.
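To make the three families of tools concrete, here is a minimal Python sketch that applies each one to a small table of counts. The table, the threshold of 5, the rounding base of 10, and the noise range are all made-up values chosen for illustration; they are not rules used by the Census Bureau or BEA, and the function names are hypothetical.

```python
import random

# Hypothetical counts of residents by age group in one small area (made-up numbers).
counts = {"0-17": 42, "18-34": 57, "35-64": 3, "65+": 18}


def suppress(table, threshold=5):
    """Suppression: withhold (None) any cell whose count is below the threshold."""
    return {k: (v if v >= threshold else None) for k, v in table.items()}


def coarsen(table, base=10):
    """Coarsening: round each count to a multiple of `base`, reducing detail."""
    return {k: base * round(v / base) for k, v in table.items()}


def infuse_noise(table, spread=2, seed=0):
    """Noise infusion: add a small random integer in [-spread, spread], floored at 0."""
    rng = random.Random(seed)
    return {k: max(0, v + rng.randint(-spread, spread)) for k, v in table.items()}


if __name__ == "__main__":
    print("original  ", counts)
    print("suppressed", suppress(counts))
    print("coarsened ", coarsen(counts))
    print("noisy     ", infuse_noise(counts))
```

Each method protects the small "35-64" cell in a different way: suppression removes it entirely, coarsening hides it along with some detail in every other cell, and noise infusion keeps every cell but makes each one slightly inexact. That is the availability, accuracy, and confidentiality tradeoff in miniature.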
Statistical disclosure limitation methods have evolved over time
In the 1980s, suppression and coarsening were the primary tools used to protect confidentiality, although noise infusion was already an accepted practice. But as computing power and the amount of personal data in the public domain grew, so did the risk of disclosure.
For example, in preparation for the 2020 Census, the US Census Bureau simulated an attack on the published 2010 Census tabulations. It demonstrated that, despite the use of a mix of data privacy methods, it was possible to reconstruct the household records underlying nearly two-thirds of estimates at the Census block level.
The Census Bureau could have responded to this finding by greatly reducing the detail released for 2020. Instead, it adopted a noise infusion framework that quantifies the amount of disclosure risk, providing a privacy guarantee based on a worst-case scenario. That approach enabled more detailed tabular releases and allowed the Census Bureau to report the effect of the updated statistical disclosure limitation framework on data quality more transparently than in past decennial census releases.
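Frameworks that quantify disclosure risk in this way are commonly formalized as differential privacy, where a parameter (often called epsilon) sets how much any one record can influence a published statistic. The sketch below uses the textbook Laplace mechanism for a single count. It is an illustration under stated assumptions, not the Census Bureau's production algorithm, and the function names, the count of 120, and the epsilon values are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def noisy_count(true_count, epsilon, rng):
    """Release a count with Laplace noise.

    Adding or removing one person changes a count by at most 1,
    so the noise scale is 1 / epsilon.
    """
    return true_count + laplace_noise(1.0 / epsilon, rng)


def mean_abs_error(true_count, epsilon, draws=20000, seed=0):
    """Average absolute error of the noisy release over many simulated draws."""
    rng = random.Random(seed)
    total = sum(abs(noisy_count(true_count, epsilon, rng) - true_count)
                for _ in range(draws))
    return total / draws


if __name__ == "__main__":
    for eps in (0.1, 1.0):
        print(f"epsilon={eps}: mean absolute error ~ {mean_abs_error(120, eps):.2f}")
```

A smaller epsilon means a stronger privacy guarantee and more noise, while a larger epsilon means the reverse. Because the noise distribution is known and published, data users can account for it in their analyses, which is part of what makes the effect on data quality more transparent.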
More recently, noise infusion and related data privacy methods, such as data swapping and replacing outliers with imputed values, have been mainstays of products based on the decennial census, American Community Survey, the Survey of Income and Program Participation, and others.
But the Commerce Department directive would limit data privacy tools to suppression and coarsening—barring noise infusion for statistical products.
Detailed data are vital to evaluating and developing tax policy
Detailed statistical products based on Census data provide foundational inputs for the microsimulation models that organizations like the Urban Institute use to estimate how demographic, behavioral, and policy changes might affect individual outcomes. One such model is TPC’s large-scale microsimulation model, which produces estimates of how current and proposed tax policies will affect federal revenues and the distribution of tax burdens by income.
Current Census Bureau public data releases contain enough detail to support reliable tax model estimates at national and state levels, and for large cities. Less detail in certain key Census Bureau data sources, such as the Current Population Survey or Survey of Income and Program Participation, could hamper the maintenance and updating of tax models at TPC and at other research organizations and universities, limiting the types of analyses these models traditionally support. For example, less detailed public data may reduce tax models’ ability to produce reliable estimates for smaller population groups, geographic areas, or other policy-relevant subpopulations, even where national-level estimates remain feasible and accurate.
Even federal agencies that have access to confidential tax microdata would be impacted by reduced detail in Census Bureau data releases. Tax models used by the Treasury Department and Congressional Joint Committee on Taxation use tax return data as inputs, but they still rely on Census Bureau public-use files to add economic and demographic information not available through the tax system.
Reductions in Census Bureau public releases due to the new statistical disclosure limitation directive would diminish the accuracy and usefulness of analyses on the budget and behavioral impacts of tax law changes, especially for relatively small population segments and less densely populated geographic regions. The Commerce Department’s mandate would make developing effective, fair tax policies more difficult.
Are there alternatives to Commerce Department data products?
If official public data releases from BEA and the Census Bureau cease to meet user needs, there are a couple of alternatives, but they are imperfect.
First, synthetic data derived from confidential records, and combined with a validation process, could provide a useful alternative if permitted under the new directive. The Census Bureau has a history of producing synthetic data and has indicated this approach might be appropriate for some American Community Survey data products.
Second, researchers with approved projects and appropriate security clearances could still gain restricted-use access to Census Bureau and some BEA individual-level data. Having these researchers work directly with confidential data can also result in new products and insights that might not otherwise be possible with existing staff levels.
But current hiring restrictions and tight budgets may limit these opportunities. In addition, some types of research, including tax policy modeling, are not well suited to a restricted-use environment, so there is still a need for high-quality public data.
What happens if the directive stands?
If the Commerce Department’s directive remains in place, the Census Bureau and BEA will have fewer tools to protect privacy while keeping public data useful. That will likely mean less detailed data releases for researchers, policymakers, businesses, and the public—including those trying to understand how tax policy affects different people and places. If so, the agencies will need to redouble efforts to make reliable data available while protecting confidentiality through other mechanisms.