Reports about the National Security Agency's program to collect vast amounts of data on personal electronic communications have created an uproar about the implications for privacy. But some statisticians and security experts have raised another objection: as a terror-fighting tool, it is highly inefficient and has some serious downsides.
Their reasoning: any automated approach to spotting something rare necessarily produces false positives. That means that for every correctly identified target, many more of the alarms that go off will prove to be incorrect. So if there are vastly more innocent people than would-be terrorists among those whose communications are monitored, even an extremely accurate test would ensnare many non-terrorists.
Kaiser Fung, in his terrific book Numbers Rule Your World, points out that statisticians tend to be much more interested in variability than in actual predictions. Analyzing data based on samples is fundamentally an imprecise science, and statisticians have diligently worked to formalize a system for expressing uncertainty. Take an introductory statistics class, and almost half of your time will be taken up by calculating variances, standard deviations and confidence intervals. All of these are simply ways of saying how much (and how little) you actually know.
This obsession with variance matters because statistical inference is not actually built around binary outcomes. In fact, any yes-or-no projection or estimate has four possible outcomes: a true positive (correctly guessing something is there), a true negative (correctly guessing that something isn't there), a false positive (saying something is there when it isn't) and a false negative (saying something isn't there when it is). The latter two are labeled Type I and Type II errors, respectively. While better algorithms and more data can, to an extent, reduce the total number of errors, it is essentially impossible to be perfectly accurate. Instead, a statistician must always accept a trade-off between false positives and false negatives. Reducing one means accepting a greater prevalence of the other.
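The trade-off between the two kinds of error can be sketched in a few lines of Python. This is only an illustration: the two bell-curve "suspicion score" distributions, the names INNOCENT and TARGET, and the thresholds are all hypothetical assumptions, not anything drawn from a real system.

```python
from statistics import NormalDist

# Hypothetical assumption: a classifier gives every message a "suspicion score".
# Innocent messages score around 0, genuine targets around 2, with overlap.
INNOCENT = NormalDist(mu=0.0, sigma=1.0)
TARGET = NormalDist(mu=2.0, sigma=1.0)


def error_rates(threshold):
    """Return (false_positive_rate, false_negative_rate) when every
    score above `threshold` is flagged."""
    # Type I error: an innocent message scores above the threshold.
    false_positive_rate = 1.0 - INNOCENT.cdf(threshold)
    # Type II error: a genuine target scores at or below the threshold.
    false_negative_rate = TARGET.cdf(threshold)
    return false_positive_rate, false_negative_rate


def sweep(thresholds):
    """Compute both error rates for each candidate threshold."""
    return [(t, *error_rates(t)) for t in thresholds]


if __name__ == "__main__":
    for t, fpr, fnr in sweep([0.0, 1.0, 2.0, 3.0]):
        print(f"threshold={t:.1f}  false positives={fpr:.4f}  false negatives={fnr:.4f}")
```

Under these assumed distributions, raising the threshold always lowers the false positive rate and raises the false negative rate. No setting drives both to zero, because the two score distributions overlap.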
Why does this matter? Terrorism is an inherently rare activity and terrorists are very rare among any population. This makes it extremely difficult to establish any robust model for identifying terrorist activity within communication data. Moreover, the incentive system for an NSA agent is badly stacked against false negatives. Let one suspected terrorist go, and you have another 9/11 on your hands (or so they're all trained to think).
In response, the data mining algorithms used by PRISM have to cast an extremely wide net. Because reducing false negatives is the priority, false positives multiply: a basic Bayesian model developed by Corey Chivers estimates that there will be about 10,000 false positives for every successfully identified piece of terrorism-related communication. The government has tried to trot out a handful of terrorist plots that it has interfered with as evidence of the effectiveness of these broad surveillance tactics. For the statistician, this is a tacit admission from the government that hundreds of thousands of Americans have had their constitutionally protected civil liberties shockingly violated by agents engaged in a hopeless wild goose chase.
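To see how a ratio on the order of 10,000 to 1 can arise, here is a minimal Bayes' theorem sketch. The three inputs (prevalence, sensitivity, false positive rate) are illustrative assumptions chosen for this example. They are not figures confirmed by this article, and they should not be read as the exact parameters of the model cited above.

```python
def posterior_given_flag(prevalence, sensitivity, false_positive_rate):
    """P(genuine target | flagged), by Bayes' theorem."""
    true_positives = prevalence * sensitivity
    false_positives = (1.0 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)


def false_alarms_per_hit(prevalence, sensitivity, false_positive_rate):
    """Expected number of innocent flags for every correct flag."""
    true_positives = prevalence * sensitivity
    false_positives = (1.0 - prevalence) * false_positive_rate
    return false_positives / true_positives


# Hypothetical inputs (assumptions, for illustration only):
PREVALENCE = 1e-6           # one genuine target per million people monitored
SENSITIVITY = 0.99          # 99% of genuine targets are flagged
FALSE_POSITIVE_RATE = 0.01  # 1% of innocent people are flagged

if __name__ == "__main__":
    ratio = false_alarms_per_hit(PREVALENCE, SENSITIVITY, FALSE_POSITIVE_RATE)
    posterior = posterior_given_flag(PREVALENCE, SENSITIVITY, FALSE_POSITIVE_RATE)
    print(f"False alarms per correct flag: {ratio:,.0f}")
    print(f"Chance a flagged person is a genuine target: {posterior:.6f}")
```

With these assumed inputs, a test that is right 99 percent of the time still produces roughly 10,000 false alarms for each genuine hit. The rarity of the target dominates the accuracy of the test.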
The fundamental case against torture, beyond ethics and human rights, is that it simply does not work. It has never proven to be an effective intelligence-gathering tool. Even worse, we have lapsed towards torture despite the fact that much more humane tactics have proven to be incredibly productive.
The case against the surveillance state is exactly the same. Comprehensive data collection accompanied by data mining algorithms is a terrible way to actually investigate and pursue terrorists. Agents will spend most of their time dealing with false positives. At the same time, massive direct costs will be incurred by maintaining such an elaborate system, and even larger indirect costs will mount from the systematic violation of civil liberties.
This isn't a particularly novel concept, as the Cato Institute was already tearing apart the fundamentally flawed ideas behind mass surveillance seven years ago. As Jeff Jonas and Jim Harper wrote in 2006,
Though data mining has many valuable uses, it is not well suited to the terrorist discovery problem. It would be unfortunate if data mining for terrorism discovery had currency within national security, law enforcement, and technology circles because pursuing this use of data mining would waste taxpayer dollars, needlessly infringe on privacy and civil liberties, and misdirect the valuable time and energy of the men and women in the national security community.
Living with terrorism means accepting living in a world that is fundamentally unpredictable and having the courage to face the day knowing that something bad might happen, no matter how hard you try to stop it. There will never be a surveillance system finely tuned enough to actually accomplish what the government wants from PRISM. Instead, a much different approach is necessary: one that modestly accepts the limits of our ability to control the world and holds the value of personal sovereignty to the highest degree.
What the 9/11 story most clearly calls for is a sharper focus on the part of our national security agencies—their focus had undoubtedly sharpened by the end of the day on September 11, 2001—along with the ability to efficiently locate, access, and aggregate information about specific suspects.
Both at home and abroad, the actions of the US government violate these basic precepts. There is little regard for the well-being of American and foreign citizens, and a hopelessly irrational obsession with trying to control the uncontrollable. A drastically more modest approach has the potential both to reaffirm the importance of human rights and to offer a greater opportunity for meaningful security. It's badly needed.