Opposites Agree on Data Mining's Importance and the Need for Controls


WASHINGTON - Data mining, critical in the fight against terror, needs both an image makeover and a government-wide policy to ensure transparency and protection of civil liberties, according to a panel of experts from across the ideological spectrum.

This week The Constitution Project—a nonprofit that brings together self-described “unlikely allies,” like conservative libertarian ex-congressman Bob Barr and leaders of the American Civil Liberties Union—issued a report titled Principles of Government Data Mining: Preserving Civil Liberties in the Information Age.

The report’s authors recommend that the government impose familiar privacy and transparency controls, such as those in many freedom-of-information laws and intelligence-led policing regulations, on the practice of data mining, which the report defines as “any use of computing technology to examine large amounts of data to reveal relationships, classifications, or patterns.”

Speaking on the report’s recommendations during an event at the National Press Club in Washington, D.C., panelist Jim Harper of the libertarian Cato Institute explained the roots and fundamentals of data mining, which is a common and noncontroversial practice.

Data mining emerged in the early 1990s along with advancements in computing capabilities as a means for supermarket companies to spot trends to improve logistics and marketing. For example, while most shoppers may buy milk, eggs, and bread on every visit, a spike in unusual ingredients bought together, like collard greens and walnuts, may indicate the popularity of a new recipe or diet, Harper explained. Today, the companies track purchases primarily through loyalty discount card programs.

Harper divided data mining into two practices, link analysis and predictive analysis, and gave an example of each.

A popular example of link analysis is the common hypothetical of a police officer examining the “pocket litter” of an arrested suspect, say for a narcotics arrest. A piece of paper may show a phone number, which, when searched in a law enforcement database, is found to belong to another member of the drug trade. A link between the two can then be established.
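The pocket-litter hypothetical can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names, phone numbers, and the `find_links` helper are invented for the example and do not describe any real law enforcement database.

```python
# Hypothetical records on file: phone number -> person already known to police.
# All names and numbers here are made up for illustration.
KNOWN_NUMBERS = {
    "555-0101": "Suspect B",
    "555-0199": "Suspect C",
}


def find_links(suspect, pocket_litter, known_numbers):
    """Return (suspect, other person, number) for each number already on file."""
    links = []
    for number in pocket_litter:
        owner = known_numbers.get(number)
        # A link exists only if the number belongs to someone else on file.
        if owner is not None and owner != suspect:
            links.append((suspect, owner, number))
    return links


# Two numbers found on the arrested suspect; only the first is on file.
links = find_links("Suspect A", ["555-0101", "555-0142"], KNOWN_NUMBERS)
print(links)
```

A match here is only a lead connecting two records; it says nothing by itself about why the number was in the suspect's pocket.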

People who have received a phone call from their credit card company about potential suspicious activity on their account may have benefited from predictive analysis, Harper explained. Credit card companies have found a common pattern whereby a small purchase, such as $5 worth of gasoline, followed immediately by a very large purchase, such as a $3,000 flat-screen television, often indicates a thief testing to see if a stolen card is still active before making a large fraudulent purchase.
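The card-testing pattern Harper described can be written as a simple rule. The sketch below is a rough illustration, not how any card issuer actually works: the `Purchase` record, the function name, and the dollar and time thresholds are all assumptions chosen to match the $5 and $3,000 figures in the example.

```python
from dataclasses import dataclass


@dataclass
class Purchase:
    minute: int    # minutes since the start of the day
    amount: float  # purchase amount in dollars


def flag_test_then_splurge(purchases, small_max=10.0, large_min=1000.0,
                           window_minutes=60):
    """Flag a small purchase followed directly by a very large one.

    The thresholds are illustrative assumptions, not industry values.
    """
    flagged = []
    ordered = sorted(purchases, key=lambda p: p.minute)
    # Compare each purchase with the one that immediately follows it.
    for earlier, later in zip(ordered, ordered[1:]):
        close_in_time = later.minute - earlier.minute <= window_minutes
        if (earlier.amount <= small_max and later.amount >= large_min
                and close_in_time):
            flagged.append((earlier, later))
    return flagged


# A normal purchase, then $5 of gasoline, then a $3,000 television.
history = [Purchase(600, 42.50), Purchase(900, 5.00), Purchase(915, 3000.00)]
alerts = flag_test_then_splurge(history)
print(alerts)
```

A flagged pair is a prompt for a follow-up call to the cardholder, not proof of fraud; a legitimate customer could produce the same sequence.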

“These are fairly simple concepts that may be obscured by the term ‘data mining,’ which confuses people and sometimes makes them think something worse is going on,” Harper said.

The practice is a prisoner of its name, which implies digging beneath the surface, ostensibly for private data. It also carries the risk of implicating innocent people through false positives, and it is associated with at least two controversial federal initiatives, both at the Department of Defense (DoD): Total Information Awareness and a project called Able Danger.

Total Information Awareness was the goal of the Defense Advanced Research Projects Agency’s Information Awareness Office (IAO), which sought to mine all available data on Americans to spot threats. The program’s seal bore the Eye of Providence, God’s all-seeing eye from the $1 bill, atop a pyramid and casting its gaze on the entire planet. Congress eliminated funding for the IAO in 2003, a year after its establishment.

In 2005, then-U.S. Rep. Curt Weldon of Pennsylvania alleged that Able Danger, an older, secret DoD data mining program, had identified 9/11 hijacker Mohamed Atta as a potential threat well before the attacks. The Pentagon ordered personnel not to testify before the Senate Judiciary Committee, which investigated the case. Weldon changed his story, and the Senate committee found that the program had not identified Atta.

Addressing data mining’s baggage, Paul Pillar, a former CIA officer and a member of The Constitution Project’s Liberty and Security Committee, which issued the report, explained that anyone who wants governments to “connect the dots” to fight terrorism supports data mining.

“‘Connecting the dots’ is link analysis,” Pillar said, warning that even robust, constitutionally sound data mining is not a panacea against terrorism. “It’s a matter of improving the odds. It’s not a matter of certainty, it’s not a matter of predicting outcomes.”

Panelist Mary Ellen Callahan, chief privacy officer of the Department of Homeland Security, said that the critical protection against abuse and false positives in data mining is the moment of human intervention, when an investigator follows up a lead that mining software has generated. She also expressed the agency’s support for disclosure and transparency in data mining programs.

“I think that these are key elements for people to keep working on, and to keep improving,” Callahan said.

At its best, Pillar said, data mining identifies the innocent so law enforcement can focus on those who bear risk.

“We’re looking to improve the odds for the good guys in a business where there’s no certainty,” Pillar said.

The report’s recommendations include:

  • Planning of government data mining programs, including identification of the programs’ purposes and uses of data, with the process open to public review and comment
  • Notification of individuals subject to action or classification by data mining programs where possible
  • Establishment of standards and procedures for operations, including appeals processes for persons affected by the programs, and penalties for abuse
  • Data security measures, including training and access limits
  • Data minimization, including set retention periods for unused data and limits on database aggregation