Why does the NSA want records of every phone call made in the United States? Tim Lee provides the usual answer:
Since the program is secret, it’s hard to say for sure. But the NSA is probably using a software technique called data mining to look for patterns that could be a sign of terrorist activity. The idea is that NSA researchers can build a profile of “typical” terrorist activity and then use calling records — and other data such as financial transactions and travel records — to look for individuals or groups of people who fit the pattern. Many businesses use similar techniques, building profiles of their customers to help decide who is most likely to respond to targeted advertising.
Some critics question the effectiveness of these techniques. For example, in a 2006 Cato study, an IBM computer scientist argued that we simply don’t have enough examples of real terrorists to build a profile of the “typical” terrorist.
Actually, this isn’t the only possible use of data mining. Here’s David Ignatius, back in 2006, with a somewhat more concrete way that these records could be used:
Let’s take a hypothetical problem: An al-Qaeda operative decides to switch cellphones to prevent the National Security Agency from monitoring his calls. How does the NSA identify his new cellphone number? How does it winnow down a haystack with several hundred million pieces of straw so that it can find the deadly needle?
The problem may seem hopelessly complex, but if you use common sense, you can see how the NSA has tried to solve it. Suppose you lost your own cellphone and bought a new one, and people really needed to find out that new number. If they could search all calling records, they would soon find a number with the same pattern of traffic as your old one — calls to your spouse, your kids, your office, your golf buddies. They wouldn’t have to listen to the calls themselves to know it was your phone. Simple pattern analysis would be adequate — so long as they had access to all the records.
This, in simple terms, is what I suspect the NSA has done in tracking potential sleeper cells in the United States. The agency can sift through the haystack, if (and probably only if) it can search all the phone and e-mail records for links to numbers on a terrorist watch list. The computers do the work: They can examine hundreds of millions of calls to find the few red-hot links — which can then be investigated under existing legal procedures.
Is this what NSA is doing? Does it work? I have no idea, though I assume Ignatius had a little professional help dreaming up this example.
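To make the Ignatius example a little more concrete, here is a minimal sketch of that kind of pattern matching. It is only an illustration, not a description of anything the NSA actually runs: it assumes call records are simple (caller, callee) pairs, it scores each number by the Jaccard overlap between its contacts and the old phone's contacts, and every phone number and contact name in it is made up.

```python
from collections import defaultdict

def contact_sets(call_records):
    """Map each number to the set of numbers it exchanged calls with."""
    contacts = defaultdict(set)
    for caller, callee in call_records:
        contacts[caller].add(callee)
        contacts[callee].add(caller)
    return contacts

def jaccard(a, b):
    """Overlap between two contact sets: 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def best_match(old_contacts, contacts):
    """Return (score, number) for the number whose contacts look most like the old phone's."""
    scored = [(jaccard(old_contacts, peers), number)
              for number, peers in contacts.items()]
    return max(scored, default=(0.0, None))

# Hypothetical data: who the lost phone used to call...
old_contacts = {"spouse", "kids", "office", "golf"}

# ...and the calling records seen after the phone was replaced.
records_after = [
    ("555-0199", "spouse"),
    ("555-0199", "kids"),
    ("555-0199", "office"),
    ("555-0142", "pizza"),
    ("555-0142", "office"),
]

score, number = best_match(old_contacts, contact_sets(records_after))
print(number, score)
```

The point of the sketch is the one Ignatius makes: nothing here looks at the content of a call, but it only works if the records being searched include the new phone, which is to say, more or less all of them.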
The idea that data mining is used solely to build profiles of “average” terrorists is fairly widespread, and there are lots of reasons to think that such profiling isn’t especially effective. Just for starters, the sheer number of false positives it generates is probably immense. But that’s not the only way this data can be used. There are other, more concrete ways to use it too, and probably plenty of other data it can be linked up to. It would be a mistake to assume that crude profiling is the only possible use of data mining and then go from there.
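For a rough sense of why the false positive problem is so bad, here is a back-of-the-envelope sketch. All four inputs are hypothetical numbers picked purely for illustration (nobody outside the program knows the real ones): a population of 300 million, 3,000 actual targets, a profile that catches 99 percent of them, and a 1 percent false alarm rate.

```python
def flagged_counts(population, true_targets, hit_rate, false_alarm_rate):
    """Return (true positives, false positives, share of flags that are real)."""
    true_pos = true_targets * hit_rate
    false_pos = (population - true_targets) * false_alarm_rate
    precision = true_pos / (true_pos + false_pos)
    return true_pos, false_pos, precision

# Hypothetical inputs, chosen only to illustrate the base-rate effect.
true_pos, false_pos, precision = flagged_counts(
    population=300_000_000,
    true_targets=3_000,
    hit_rate=0.99,
    false_alarm_rate=0.01,
)

print(f"real targets flagged:    {true_pos:,.0f}")
print(f"innocent people flagged: {false_pos:,.0f}")
print(f"share of flags that are real: {precision:.4%}")
```

Even with a profile that accurate, which is far better than anyone is likely to build, the innocent people flagged outnumber the real targets by roughly a thousand to one under these assumptions.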
Before I’m asked, I’m bringing this up strictly for informational purposes, not because I want to defend NSA surveillance. I didn’t like this stuff when Bush was doing it, and I don’t like it now. Either way, though, it’s worth knowing what it’s capable of.
UPDATE: Shane Harris, author of The Watchers, has more here.
UPDATE 2: How much data is this, anyway? How does it get to the NSA each day? Well, if you figure there are roughly 4 billion phone calls per day, and about 100 bytes of metadata per call, that’s 400 gigabytes, or a little over 3 terabits, of data. The big carriers are responsible for maybe a quarter of that each, or a bit under 1 terabit per day. A single DS3 line can carry about 4 terabits per day, and a DS3 line is nothing special. In other words, the actual physical transmission of this data is no big deal.
The database that holds all this stuff, on the other hand, is a whole different story…
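For anyone who wants to check the transmission arithmetic in the second update, here it is spelled out. The call volume, bytes per call, and quarter share per carrier are the rough guesses from the text. The DS3 figure uses the nominal line rate of 44.736 megabits per second, so the usable payload would be somewhat lower once framing overhead is counted.

```python
CALLS_PER_DAY = 4_000_000_000      # rough guess from the text
BYTES_PER_CALL = 100               # rough guess from the text
DS3_BITS_PER_SECOND = 44_736_000   # nominal DS3 line rate, framing included
SECONDS_PER_DAY = 24 * 60 * 60

total_bits = CALLS_PER_DAY * BYTES_PER_CALL * 8
per_carrier_bits = total_bits // 4          # assumes four equal-sized carriers
ds3_bits_per_day = DS3_BITS_PER_SECOND * SECONDS_PER_DAY

TERA = 10**12
print(f"all call metadata: {total_bits / TERA:.2f} terabits per day")
print(f"one big carrier:   {per_carrier_bits / TERA:.2f} terabits per day")
print(f"one DS3 line:      {ds3_bits_per_day / TERA:.2f} terabits per day")
print(f"share of a DS3 one carrier needs: {per_carrier_bits / ds3_bits_per_day:.0%}")
```

On these assumptions a single carrier's daily feed would use only about a fifth of one DS3 line, and the entire country's call metadata would nearly fit on one.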