The recent data-sharing Memorandum of Understanding (MOU) between the IRS and Immigration and Customs Enforcement (ICE) runs a serious risk of causing US citizens and legal US residents to be mistakenly identified as undocumented immigrants. They could then be wrongly arrested, detained, and even deported, suffering potentially irreparable harm.
The IRS and ICE should both take immediate steps to ensure taxpayer rights are protected.
The MOU requires sharing of data
In a major break with precedent, the MOU requires the agencies to exchange data about individuals. It cites an Internal Revenue Code section that requires the IRS to disclose information when it pertains to the enforcement of a specifically designated Federal criminal statute.
Under the terms of the MOU, ICE must provide the IRS with the names and addresses of individuals under investigation, the tax periods for which information is being requested, and a description of the criminal statutes under which they are being investigated.
The IRS will match this information against its administrative tax data and return unspecified information to ICE to aid its criminal investigations.
Putting aside questions about whether this use of IRS data is legal, there is a high risk that information about taxpayers other than those ICE has identified will be inadvertently disclosed. There are at least two reasons for this.
Matching administrative records requires high-quality, standardized data
All datasets, no matter how carefully constructed, contain errors and inconsistencies, and these make it harder to link records across sources. Some discrepancies reflect differences in style. For example, a street might be listed as “North Street” in one dataset and abbreviated as “N St.” in another. Likewise, people may use the names on their birth certificates (for example, Jonathan or Juan) on some forms and abbreviated or anglicized names (Jon or John) on others.
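To illustrate the kind of standardization this requires, here is a minimal Python sketch. The abbreviation table and name-variant groups are hypothetical and deliberately tiny; real systems would rely on much larger curated lists, and nothing here describes how the IRS or ICE actually process their data.

```python
import re

# Hypothetical lookup tables for illustration only; production systems
# would use much larger standardization lists.
ADDRESS_ABBREVIATIONS = {
    "n": "north", "s": "south", "e": "east", "w": "west",
    "st": "street", "ave": "avenue", "rd": "road",
}
# Each set groups first names that might refer to the same person.
NAME_VARIANT_GROUPS = [{"jonathan", "jon"}, {"juan", "john"}]


def normalize_address(text: str) -> str:
    """Lowercase, drop punctuation, and expand known abbreviations."""
    tokens = re.sub(r"[^\w\s]", "", text.lower()).split()
    return " ".join(ADDRESS_ABBREVIATIONS.get(tok, tok) for tok in tokens)


def may_be_same_first_name(a: str, b: str) -> bool:
    """True if the names are identical or appear in the same variant group."""
    a, b = a.strip().lower(), b.strip().lower()
    return a == b or any(a in group and b in group for group in NAME_VARIANT_GROUPS)


print(normalize_address("North Street"))        # north street
print(normalize_address("N St."))               # north street
print(may_be_same_first_name("Juan", "John"))   # True
print(may_be_same_first_name("Jon", "John"))    # False
```

Note the tradeoff built into the variant table: once Juan and John are grouped, every John in one file becomes a possible match for every Juan in the other, which widens the pool of candidates that later steps must sort out.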
Transcription or processing errors occur, too. No matter how conscientious, employees will sometimes make typos when they enter information from paper documents into databases, a labor-intensive process that is still common at the IRS. (Perhaps at ICE too, but as a former career IRS employee, I am most familiar with the workings of our tax administrators.)
In addition, the IRS’s legacy databases may limit the number of characters a name field can store, so long names (like Christopherson, Konstantinidou, or Garcíagonzález) can be truncated. Likewise, the IRS’s naming conventions may include rules that keep or drop spaces, suffixes, or special characters such as hyphens or apostrophes (as in López-Hernández or O’Brien).
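The effect of fixed-width fields and punctuation rules can be sketched the same way. In this hypothetical Python example, the 10-character width is an assumption chosen for illustration, not the IRS’s actual limit, and `stored_form` and `comparison_key` are made-up helper names.

```python
import unicodedata

FIELD_WIDTH = 10  # hypothetical limit; not the IRS's actual field length


def stored_form(name: str, width: int = FIELD_WIDTH) -> str:
    """Mimic a legacy fixed-width field: uppercase, then cut to `width`."""
    return name.upper()[:width]


def comparison_key(name: str, width: int = FIELD_WIDTH) -> str:
    """Drop accents, spaces, hyphens, and apostrophes, then truncate.

    NFKD splits an accented letter such as "é" into "e" plus a combining
    mark; the combining mark is not alphabetic, so the filter removes it.
    """
    decomposed = unicodedata.normalize("NFKD", name)
    letters = "".join(ch for ch in decomposed if ch.isalpha())
    return letters.upper()[:width]


# The raw stored values differ, so an exact comparison fails...
print(stored_form("López-Hernández"), stored_form("Lopez Hernandez"))
# ...while a shared comparison key lines them up.
print(comparison_key("López-Hernández") == comparison_key("Lopez Hernandez"))  # True
# Truncation can also make two different surnames look identical.
print(stored_form("Christopherson") == stored_form("Christophersen"))  # True
```

A shared comparison key helps only if both agencies apply the same rules, and, as the last line shows, truncation can make two different surnames indistinguishable.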
Matching techniques will sometimes produce incorrect results
ICE does not appear to be providing the IRS with any unique, person-level identifier, so the agencies will need to rely on probabilistic, or “fuzzy,” matching techniques. These techniques all have one thing in common: matches between files are based on a “probability score” that an IRS record and an ICE record refer to the same person. A higher score indicates closer agreement between the records. But scores vary from pair to pair, and even a high score does not guarantee that two records describe the same person.
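The MOU does not say which matching technique the agencies would use. As one common example, the Fellegi-Sunter model scores a pair of records by adding a weight for each field: a positive weight when the field agrees and a negative one when it does not. The sketch below assumes three fields and made-up m- and u-probabilities (the chances that a field agrees when the records are, or are not, the same person); real values would have to be estimated from the data.

```python
import math

# Hypothetical m- and u-probabilities for each compared field:
#   m = P(field agrees | the two records are the same person)
#   u = P(field agrees | the two records are different people)
FIELD_PARAMS = {
    "last_name": (0.95, 0.01),
    "first_name": (0.90, 0.05),
    "zip_code": (0.80, 0.001),
}


def match_weight(rec_a: dict, rec_b: dict, params: dict = FIELD_PARAMS) -> float:
    """Sum Fellegi-Sunter log2 likelihood ratios; higher means more alike."""
    total = 0.0
    for field, (m, u) in params.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # a missing value contributes no evidence either way
        if a == b:
            total += math.log2(m / u)  # agreement raises the score
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return total


ice_record = {"last_name": "GARCIA", "first_name": "JUAN", "zip_code": "20001"}
same_address = {"last_name": "GARCIA", "first_name": "JUAN", "zip_code": "20001"}
moved_record = {"last_name": "GARCIA", "first_name": "JUAN", "zip_code": "60601"}

print(round(match_weight(ice_record, same_address), 2))
print(round(match_weight(ice_record, moved_record), 2))
```

Both pairs share a name, but the pair whose ZIP code differs scores noticeably lower, which is how a change of address, for example, can pull a true match down toward an ambiguous score.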
Good probabilistic matching requires extensive data cleaning, such as standardizing abbreviations and spacing and correcting spelling errors. If the matching relies on artificial intelligence (AI), it also requires training and testing a model.
Data cleaning and AI training and testing both require staff time and partnership with subject-matter experts who review potential matches. Those experts are best suited to confirm correct matches or identify false positives, and their feedback ultimately improves the matching methodology.
Recent staff cuts at the IRS leave few, if any, personnel to take on this role. And pressure to deliver actionable results quickly may lead the agencies to shortchange these steps.
Another issue is that ICE and IRS datasets have few variables in common that can be used to link them. Consider that one apparent purpose of data sharing is to help ICE validate or update address information. If the address ICE supplies is outdated, the records may agree on little more than a name, and a match based solely on a name could result in the unlawful disclosure of the wrong person’s sensitive data to ICE, especially for individuals with common names. (There were two Barry Johnsons in my college dorm building and at least two working at the IRS for many years.)
Consequently, there will always be a chance that two records that match only by name actually identify two separate people.
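A minimal Python sketch of a name-only lookup shows the problem. The records are hypothetical toy data (addresses reduced to invented ZIP codes), and the helper functions are illustrative, not part of any agency system.

```python
from collections import defaultdict

# Hypothetical toy records; the names echo the example above and the
# ZIP codes are invented for illustration.
tax_records = [
    {"first_name": "Barry", "last_name": "Johnson", "zip_code": "20001"},
    {"first_name": "Barry", "last_name": "Johnson", "zip_code": "20744"},
    {"first_name": "Maria", "last_name": "Lopez", "zip_code": "20001"},
]


def build_name_index(records):
    """Group records under an uppercase (first, last) name key."""
    index = defaultdict(list)
    for rec in records:
        key = (rec["first_name"].upper(), rec["last_name"].upper())
        index[key].append(rec)
    return index


def candidates_by_name(index, first, last):
    """Return every record whose name matches; .get avoids adding empty keys."""
    return index.get((first.upper(), last.upper()), [])


index = build_name_index(tax_records)
matches = candidates_by_name(index, "barry", "johnson")
print(len(matches))  # 2: the name alone cannot say which person is meant
```

With only a name to go on, both Barry Johnson records come back, and nothing in the query says which one, if either, is the person under investigation.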
ICE must confirm what it learns from the IRS before using it for enforcement actions
The MOU does not specify what probability score will be sufficient to permit the IRS to disclose an individual’s confidential data to ICE, nor is it clear that the Internal Revenue Code would permit that threshold to be much lower than 100 percent.
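Record linkage systems often turn scores into decisions using two cutoffs: pairs above an upper threshold are treated as matches, pairs below a lower threshold as non-matches, and pairs in between are sent to a person for review. The sketch below shows that three-way rule in Python; the threshold values are hypothetical, since the MOU specifies none.

```python
# Hypothetical thresholds on a match score; the MOU specifies none.
UPPER_THRESHOLD = 15.0
LOWER_THRESHOLD = 5.0


def classify_pair(score: float, upper: float = UPPER_THRESHOLD,
                  lower: float = LOWER_THRESHOLD) -> str:
    """Three-way decision rule in the style of Fellegi-Sunter linkage."""
    if lower > upper:
        raise ValueError("lower threshold must not exceed upper threshold")
    if score >= upper:
        return "match"
    if score <= lower:
        return "non-match"
    return "clerical review"  # left for a human expert to decide


for s in (20.4, 8.4, -9.9):
    print(s, classify_pair(s))
```

Raising the upper threshold reduces the chance of a false match but pushes more pairs into clerical review, which requires exactly the kind of expert staff time described earlier.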
Policymakers need to consider what level of uncertainty is acceptable when the consequences for individuals, including US citizens and legal US residents, could include arrest, detention, and even deportation.
Given the potential consequences, ICE should take additional steps to validate any IRS-provided information, once received, before taking an enforcement action. This might include using data from other sources, such as a third-party verification service.
It should also ensure that any enforcement actions based on tax data allow affected individuals to review the data and challenge its accuracy, as the Privacy Act of 1974 and due process require. Hopefully, incautious actions will not trample the rights of innocent taxpayers.