
Data Mining: Rarely Focuses on Individuals

Story Courtesy of The San Francisco Chronicle
By Matthew B. Stannard
Chronicle Staff Writer

Somewhere in America, powerful computers ingest crumbs of data about your personal life. Your income level. The kind of car you drive. Your home address. Your credit rating. All input, assimilated and analyzed at lightning speed.

The result: A piece of paper arrives in your mailbox offering you 10 percent off an oil change at your local service station.

That, in a nutshell, is data mining as practiced for more than a decade by companies around the world to target current and potential customers. The methods have changed since the old days of reverse telephone directories and mailing lists, but the basic objective is the same. And data mining of some type, experts agree, is almost certainly what is behind the National Security Agency's reportedly successful efforts to obtain the phone records of tens of millions of Americans from private telecommunications companies.

President Bush, commenting on the program -- which administration officials say is aimed at identifying and tracking suspected terrorists -- said that the government is not "mining or trolling through the personal lives of millions of innocent Americans." But security experts say the program virtually fits the dictionary definition of data mining -- a technique for analyzing large sets of data that American intelligence agencies have long been developing. "The interest has been around for years and decades," said Richard Forno, principal consultant for KRvW Associates, a Washington security consultancy. "That's part of what NSA was chartered to do."

The fundamental use of data mining is to detect patterns -- in shopping habits or the activities of the nation's enemies.

"Data mining is going through data from the past, historical data, and predicting what is likely to happen in the future based on patterns in the data," said Ken Bendix, president of North American operations at KXEN Inc., a company headquartered in San Francisco that develops data mining software for business applications.

It is used by credit card companies to spot spending patterns that suggest a card has been stolen and by marketing companies that use enormous databases to target advertising.

The technique has been gaining in popularity in the private sector thanks to advancements in computing technology and the mathematics underlying the software, Bendix said.

"The data is very rarely at the individual level," Bendix said. "When people are doing these data mining analyses, they don't care that you are you. They don't care what your name is or what your social security number is. All they care about is what group you fit into and how you relate to everybody else out there."

Government interest in data mining increased sharply after the Sept. 11 attacks. Unlike their private-sector counterparts, intelligence officials began exploring ways to use the technique to identify and track individuals suspected of terrorist links. In 2002, the Department of Defense, through the Defense Advanced Research Projects Agency (DARPA), launched the "Total Information Awareness" project -- later renamed "Terrorism Information Awareness" (TIA) to counter the impression that the program would spy on U.S. citizens.

The goal of TIA, its now-defunct Web site explained, was to link certain transactions -- applications for passports, visas, work permits and driver's licenses, automotive rentals, airline ticket purchases, receipts for chemical purchases -- to arrests or suspicious activities.

The program, the brainchild of President Ronald Reagan's national security adviser John Poindexter, collapsed under public and political criticism in 2003. But the idea lived on, said Forno, who lectured on information warfare at the National Defense University from 2001 to 2003 and participated in the 2000 White House Office of Science and Technology Policy Information Security Education Research Project.

"TIA may have died on paper," he said. "But it got parceled out to various other agencies, including the NSA."

The NSA's interest in what are essentially copies of tens of millions of old phone bills is not hard to understand, Forno and other analysts said.

In theory, a powerful computer could process all those numbers and find a link between a phone in, say, Iowa and a phone in an al Qaeda training camp on the Pakistan-Afghanistan border -- even by way of dozens of other phones, linkages far too scattered for a human eye to notice. And the search wouldn't necessarily stop there.

"You have these phone numbers, you might also at a minimum run them against credit reporting companies," Forno said. "Local state DMV records. Tax records. Business employment records. All those other resources might help you narrow down your search."
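The link-finding described above, tracing a chain from one phone to another through intermediate phones, can be illustrated with a breadth-first search over a graph of call records. What follows is only a minimal sketch under stated assumptions: the function name, the record format and the phone labels are all invented for illustration, and nothing here reflects how any agency's software actually works.

```python
from collections import deque


def find_call_chain(call_records, source, target):
    """Return the shortest chain of phones linking source to target, or None.

    call_records is a hypothetical list of (caller, callee) pairs.
    """
    # Treat each call as an undirected link between two numbers.
    neighbors = {}
    for caller, callee in call_records:
        neighbors.setdefault(caller, set()).add(callee)
        neighbors.setdefault(callee, set()).add(caller)

    # Breadth-first search, remembering how each phone was first reached.
    previous = {source: None}
    queue = deque([source])
    while queue:
        phone = queue.popleft()
        if phone == target:
            # Walk back through the recorded predecessors to rebuild the chain.
            chain = []
            while phone is not None:
                chain.append(phone)
                phone = previous[phone]
            return chain[::-1]
        for other in neighbors.get(phone, ()):
            if other not in previous:
                previous[other] = phone
                queue.append(other)
    return None


# Made-up records: labels stand in for phone numbers.
records = [
    ("iowa-1", "relay-a"),
    ("relay-a", "relay-b"),
    ("relay-b", "camp-9"),
    ("iowa-2", "relay-c"),
]

print(find_call_chain(records, "iowa-1", "camp-9"))
# ['iowa-1', 'relay-a', 'relay-b', 'camp-9']
print(find_call_chain(records, "iowa-2", "camp-9"))
# None
```

Note that a search like this reports any chain that exists, whether or not the people along it have anything to do with one another beyond a phone call.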

But while the program's defenders insist it is a crucial instrument in the U.S. war on terror, some private security experts question its usefulness.

"We're looking for a needle in a haystack," said Bruce Schneier, a security technologist and chief technology officer of Counterpane Internet Security Inc. in Mountain View. "Dumping more hay on the pile doesn't necessarily get you anywhere."

Even before Sept. 11, Forno noted, the NSA intercepted information suggesting a terrorist attack was imminent -- but failed to connect the dots in time. The New York Times reported in January that most of the leads generated by NSA surveillance of phone calls in the months after Sept. 11 led nowhere.

In addition, said Forno, with multiple government agencies now using data mining techniques, the temptation exists for them to use information gathered to fight terror for completely unrelated criminal investigations.

"I don't want to see that data mission creep," he said. "I think that is a very real potential problem."

In August, the Government Accountability Office reported that of five data mining efforts used by federal agencies, none fully complied with Office of Management and Budget guidance for assessing privacy impacts.

But data mining experts also say the technique can greatly benefit government agencies -- if it is used right and the agencies are mindful of privacy issues.

"I just concluded an audit of the Department of Homeland Security for the Office of Inspector General," said Jesus Mena, an Alameda-based data mining consultant. "We frankly found that the DHS is not doing enough (data mining)."

While he was concerned by the kind of privacy compromises suggested by media reports on the NSA program, Mena said, more and better use of data mining could be especially useful for terrorism-related countermeasures like monitoring shipboard cargo and border security.

"It would mean that there would be a safer environment, and I think we are heading in that direction," he said. "It's just a matter of time."

But the problem with applying data mining techniques to terrorism, Schneier argued, is that terrorism is so rare, and the databases being mined are so large, that false positives are inevitable and often more common than truly accurate results.
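Schneier's argument about rare events and large databases can be made concrete with a little arithmetic. The sketch below uses entirely hypothetical figures (the population size, the number of real targets and both accuracy rates are assumptions chosen for illustration, not numbers from the article or from any real system) to show how false alarms can swamp true hits even when a screening method looks very accurate.

```python
def screening_outcomes(population, true_targets, detection_rate, false_alarm_rate):
    """Estimate true hits, false hits and the share of flags that are real.

    All inputs are hypothetical illustration values.
    """
    innocent = population - true_targets
    true_hits = true_targets * detection_rate      # real targets correctly flagged
    false_hits = innocent * false_alarm_rate       # innocent people wrongly flagged
    flagged = true_hits + false_hits
    precision = true_hits / flagged if flagged else 0.0
    return true_hits, false_hits, precision


# Assumed numbers: 200 million people, 1,000 real targets,
# 99 percent detection, and a 0.1 percent false-alarm rate.
true_hits, false_hits, precision = screening_outcomes(
    population=200_000_000,
    true_targets=1_000,
    detection_rate=0.99,
    false_alarm_rate=0.001,
)

print(f"true hits:  {true_hits:,.0f}")
print(f"false hits: {false_hits:,.0f}")
print(f"share of flagged people who are real targets: {precision:.2%}")
```

Under these assumed figures, roughly 200,000 innocent people would be flagged alongside about 990 real targets, so fewer than 1 in 100 flags would point at an actual target.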

And unlike using data mining to spot credit card fraud, where at most a false positive triggers a worried call from Visa to a cardholder and perhaps a temporary suspension of the card's use, a false positive in a terror investigation can put an innocent person in jail, he said.

"If you believe in this nonsense, the goal is to get everything," he said. "They're looking for these fanciful connections. So if there's a bad guy who walks down the street and 1,000 people walk next to them, are they all under investigation?"

Despite administration assurances that the NSA program is both legal and mindful of civil liberties, Schneier said he also fears the government may at least be tempted to approach cell phone companies, credit card companies, internet service providers -- almost any industry with a major database.

"Because more and more of our daily lives are mediated by computers ... we leave electronic footprints everywhere we go," he said. "What the government is doing is sucking up all those footprints."

Despite the concerns and criticisms, data mining as a counterterrorist tool is probably here to stay, said Steven Aftergood, who directs the Federation of American Scientists' Project on Government Secrecy.

"I think it is a technology and an approach that has enormous potential, and one that is likely to be a continuing part of the toolkit," he said. But, he added, "Nothing is more important than preserving constitutional protections. And it is disturbing to learn that the intelligence community is far out in front of what the public has consented to."