The National Security Agency has been collecting massive amounts of data on our phone and internet usage. The Guardian newspaper reported that in March 2013 alone, the NSA collected 97 billion pieces of intelligence from around the world. Kim Schuske talked with University of Utah cyber security researcher Matthew Might about what it means.
KIM SCHUSKE: What can you possibly do with 97 billion pieces of data from a single month?
MATT MIGHT: That's really what data mining is all about: it's when you have way more information than human analysts can handle. How do you extract knowledge from that? What you want to do is turn data into information and then present that information in a comprehensible fashion to human analysts.
KIM SCHUSKE: Then how can they use this?
MATT MIGHT: I've seen maps where they did an analysis of connections of connections, where they took two known terrorists who attended a meeting in Malaysia and they found that they were connected to people who were associated with the USS Cole bombing. And then if you went to connections of connections, so two degrees out, you had all nineteen 9/11 hijackers with Mohamed Atta as the central node.
And yet, we still didn't catch this plot. I think part of it is that if you're looking for shapes like that - you know, this sort of hub and spoke shape - you're going to find that all over the place. Look at the mailman. The mailman is connected to everybody. And there are lots of people like that throughout any individual's life. And so if you're looking for specific shapes, they could show up all over the place, so you're going to get tremendous amounts of false positives.
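To make the "connections of connections" idea concrete, here is a minimal Python sketch of a two-degree search over a contact graph. The graph, the names in it, and the helper `within_hops` are all hypothetical and invented for illustration; nothing here is drawn from the analysis Might describes. It shows how a well-connected but innocent node (the "mailman") pulls unrelated people into the two-degree set.

```python
from collections import deque


def within_hops(graph, start, max_hops):
    """Breadth-first search: map each person reachable from `start`
    in at most `max_hops` steps to their hop distance."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if seen[person] == max_hops:
            continue  # do not expand beyond the hop limit
        for contact in graph.get(person, ()):
            if contact not in seen:
                seen[contact] = seen[person] + 1
                queue.append(contact)
    return seen


# Hypothetical toy contact graph (undirected); every name is made up.
edges = [
    ("suspect", "associate"),
    ("associate", "planner"),
    ("planner", "recruit"),
    ("suspect", "mailman"),
    ("mailman", "neighbor_a"),
    ("mailman", "neighbor_b"),
    ("mailman", "neighbor_c"),
]
graph = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

two_degrees = within_hops(graph, "suspect", 2)
flagged = sorted(p for p, hops in two_degrees.items() if hops > 0)

# The node with the most direct contacts: the "hub" of a hub-and-spoke shape.
hub = max(graph, key=lambda p: len(graph[p]))

print(flagged)
print(hub)
```

In this toy graph, half of the people within two degrees of the suspect are only there because they share a mailman, and the mailman is also the biggest hub. That is the false-positive problem in miniature.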
KIM SCHUSKE: So are there computer programs, or modeling, that can get past that and get down to the more relevant players?
MATT MIGHT: I think we're getting better at throwing out the noise. There's a lot of research going on in smart sampling and sampling with low amounts of data to try and figure out what is truly relevant. But ultimately it's a matter of how much data you have. If you just don't have enough data to be statistically significant, then there's not a whole lot you can do to draw out these associations without implicating huge volumes of other people.
KIM SCHUSKE: So you sound a little skeptical that this is going to be a worthwhile endeavor at least at this level.
MATT MIGHT: Right, I do think I'm a bit skeptical that this will be used to stop any large number of terror plots. I think it might catch a few here or there, but again I think just given the volumes of data and the huge numbers of false positives that can show up, that just investigating all of the false positives, all of the things that look like some sort of security threat but actually aren't, I mean, that could chew up any anti-terrorism task force's entire time.
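The false-positive concern can be illustrated with a little base-rate arithmetic. The sketch below is a hedged example only: the population size, the number of real threats, and both accuracy rates are hypothetical figures chosen for illustration, not numbers from the interview or from any real system, and `flag_counts` is a made-up helper.

```python
def flag_counts(population, true_threats, sensitivity, false_positive_rate):
    """Return (true positives, false positives, precision) for a screening
    process applied to a whole population."""
    true_positives = true_threats * sensitivity
    false_positives = (population - true_threats) * false_positive_rate
    precision = true_positives / (true_positives + false_positives)
    return true_positives, false_positives, precision


# All four inputs are assumptions for illustration only.
tp, fp, precision = flag_counts(
    population=300_000_000,     # people screened
    true_threats=1_000,         # actual threats among them
    sensitivity=0.99,           # share of real threats that get flagged
    false_positive_rate=0.001,  # share of innocent people wrongly flagged
)

print(f"real threats flagged:    {tp:,.0f}")
print(f"innocent people flagged: {fp:,.0f}")
print(f"share of flags that are real: {precision:.2%}")
```

Under these assumed numbers, a detector that is wrong about only one innocent person in a thousand still flags roughly 300,000 innocent people for about 990 real threats, so well under 1 percent of the flags are worth investigating.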
KIM SCHUSKE: But then, another potentially useful thing is not maybe preventing a terrorist attack. But after the fact, going and finding other players, or something like that. Do you think that might be one thing they might try to do?
MATT MIGHT: Well certainly, after the fact. You know, if you've got this data then you can go and start tracking down other people who might be involved. And certainly I think that has been done effectively, but prevention I think is very difficult.
KIM SCHUSKE: So where does this go from here? Do people have to be worried that their particular information is being collected and they could be targeted?
MATT MIGHT: Honestly, it's unclear. We have an awful lot of trust in our government, with good reason. We are a functioning democracy, but you do have to worry anytime you start collecting these enormous amounts of data. You can build startling predictive models of people's behavior.
I mean, if you look at a company like Amazon and how well they learn what you like after you buy only a few things, I mean, it's remarkable. Or look at Netflix and their ability to recommend the movies you want to watch. With the amount of data that the government is collecting they could know an awful lot about you, like probably who you vote for.
KIM SCHUSKE: So, it's already happening. It's happening with corporations doing that to try and see if they can effectively advertise to you. So is this really any different? Or why is this more scary, at least to some people?
MATT MIGHT: I think it's scary if you look at countries where the surveillance has sort of gone awry, or is sort of deliberately used to stop people. I mean, you can use these sorts of technologies to aggressively curtail legitimate political action. Again, I'm not saying this is going to happen in the U.S., but it certainly enables the possibility of it someday happening, which is a bit frightening.
Actually one of my broader concerns here is, even if the U.S. government never abuses all of the information it is collecting, let's suppose that never happens. I worry deeply about the security of the system doing all the collection. I think it's highly plausible that foreign governments could break into the system that we built and use that to spy on us. I mean basically they could know what we know, better than we know it because they don't have to get a court order.
KIM SCHUSKE: Being that you work on computer science and you know a lot of people in the field, what are you hearing from people that you work with?
MATT MIGHT: A lot of people I know right now are like, "Well, I'm glad I keep all of my data encrypted, I'm glad I encrypt my email, my communications." Because I think that's really the only way to avoid being collected at this point, is to make sure everything is encrypted.