Of all the NSA surveillance documents Edward Snowden leaked, some of the most important exposed the spy agency’s so-called XKEYSCORE program, a massive system for vacuuming up and sifting through emails, chats, voice calls, images, videos, online search activity, social media updates, usernames, passwords, and other private digital data from core fiber-optic cables around the world. XKEYSCORE, which the NSA calls its “widest reaching” surveillance program, was established around 2008 and consists of more than 700 servers that store data sucked from the Internet’s backbone and mine this data for patterns and connections.
Only a well-resourced party like the NSA could deploy such a grandiose surveillance program and even share so-called derived intelligence with like-minded spy agencies and cronies worldwide, obviously in exchange for payment in kind or in cash. But if your spy needs are more modest, a number of existing tools offer similar surveillance capabilities, albeit at a smaller scale, says Nicholas Weaver.
Weaver, a senior researcher at the International Computer Science Institute at UC Berkeley who focuses on network surveillance and security issues, developed a little hobby after the Snowden leaks in 2013: to build a bulk surveillance system in miniature that would be capable of performing all the primary tasks of an NSA spy system—but on a small, 100 Mbps-size network. Those capabilities had to include bulk data collection, search functionality, the ability to track cookies and identify anonymous users, a method for injecting malware into a surveillance target’s computer for more directed surveillance, and a friendly web interface. Luckily, Weaver realized, he already had off-the-shelf equipment that met the criteria.
Weaver talked to Wired about his system at the Enigma security conference in San Francisco, where he described the components needed to emulate the spy agency. Although the US intelligence community likes to operate under the notion that its systems are NOBUS (Nobody But Us), meaning its technologies are unique to the United States and its select cronies, Weaver says the reality is the opposite when it comes to surveillance technology. “It’s very banal and very basic, it’s very well-understood technology, and … there’s really nothing new,” he says.
The NSA’s super-secret surveillance system, in fact, works very much the way off-the-shelf intrusion detection systems (IDS) function. With these systems, when a data packet arrives at a network, a high-volume filter separates garbage traffic from the important traffic and passes the latter to a load balancer, which distributes the data across a number of servers. In this case, those servers are network intrusion detection nodes or devices. The IDS nodes then parse the traffic to determine whether it’s benign or malicious and decide what to do based on those conclusions, such as blocking malicious traffic and issuing an alert to administrators.
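The filter, load balancer, and IDS node stages described above can be sketched in a few lines of Python. This is only an illustrative toy, not code from Weaver's system or any real IDS: the Packet record, the function names, the "empty payload is garbage" rule, and the signature list are all assumptions made for the example.

```python
import zlib
from collections import namedtuple

# Hypothetical packet record; the field names are illustrative only.
Packet = namedtuple("Packet", "src dst sport dport payload")


def is_interesting(pkt):
    """High-volume filter: treat empty payloads as 'garbage' (an assumed rule)."""
    return len(pkt.payload) > 0


def pick_node(pkt, num_nodes):
    """Load balancer: hash the flow so both directions reach the same node."""
    # Sorting the two endpoints makes the key identical for A->B and B->A.
    endpoints = sorted([(pkt.src, pkt.sport), (pkt.dst, pkt.dport)])
    return zlib.crc32(repr(endpoints).encode()) % num_nodes


def inspect(pkt, signatures):
    """IDS node: label the payload by simple byte-signature matching."""
    if any(sig in pkt.payload for sig in signatures):
        return "malicious"
    return "benign"


def run_pipeline(packets, num_nodes, signatures):
    """Run every packet through filter -> balancer -> node; return alerts."""
    alerts = []
    for pkt in packets:
        if not is_interesting(pkt):
            continue  # dropped by the filter stage
        node = pick_node(pkt, num_nodes)
        if inspect(pkt, signatures) == "malicious":
            alerts.append((node, pkt.src, pkt.dst))
    return alerts
```

Hashing on the sorted pair of endpoints is one common way to keep a whole conversation on a single node, so that node sees both the request and the response. A real deployment would do this in switching hardware rather than in Python.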
Following the same general design, Weaver developed a home-grown surveillance system that took less than a week to construct. To approximate a filter and load balancer, he used OpenFlow, a protocol for managing and directing traffic among routers and switches on a network. For his intrusion detection system, he used the Bro Network Security Monitor, an open-source framework developed by Vern Paxson, a fellow computer scientist at UC Berkeley. He had to write scripts to do things like extract the cookies in web traffic and parse out usernames from traffic, but this was minimal work.
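To give a sense of how little work that kind of extraction takes, here is a rough sketch in Python rather than in Bro's own scripting language. It is not Weaver's code: the function names and the list of form fields treated as usernames are assumptions, and it only handles a plain, unencrypted HTTP/1.x request passed in as text.

```python
from http.cookies import SimpleCookie
from urllib.parse import parse_qs

# Hypothetical form-field names treated as usernames; real sites vary.
USERNAME_FIELDS = ("username", "user", "login", "email")


def parse_http_request(raw):
    """Split a raw HTTP/1.x request into (headers dict, body string)."""
    head, _, body = raw.partition("\r\n\r\n")
    headers = {}
    for line in head.split("\r\n")[1:]:  # skip the request line
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers, body


def extract_cookies(headers):
    """Return the cookies in the Cookie header as a name -> value dict."""
    cookie = SimpleCookie()
    cookie.load(headers.get("cookie", ""))
    return {name: morsel.value for name, morsel in cookie.items()}


def extract_usernames(headers, body):
    """Return values of username-like fields from a URL-encoded form body."""
    content_type = headers.get("content-type", "")
    if "application/x-www-form-urlencoded" not in content_type:
        return []
    fields = parse_qs(body)
    return [v for f in USERNAME_FIELDS for v in fields.get(f, [])]
```

Everything here comes from Python's standard library, which is the point Weaver makes about the technology being basic. Note that traffic protected by HTTPS would not be readable this way.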
Those looking to do more robust backbone monitoring and data parsing like the NSA does could opt instead for Vortex, an IDS that the US defense contractor Lockheed Martin developed and released for free on GitHub. Weaver thinks, in fact, that the NSA’s XKEYSCORE system probably began its life as Lockheed Martin’s Vortex, based on XKEYSCORE system features described in the Snowden documents.
To search through the data his DIY system collected, Weaver just did local searches. But if someone wants to do broader federated searches, they could use Hadoop, an open-source framework for storing and processing large amounts of data spread among multiple systems. Hadoop can parse similar sets of data into so-called buckets to make processing or searching data more efficient. For example, IP addresses can be parsed out and categorized in one bucket, and cookies and usernames can be categorized in other buckets. To find, for example, every IP address that visited a certain web page, a search would only need to focus on data in the IP bucket. “Hadoop will allow me to search all the data [simultaneously], but most of my searches actually only need to look at a couple of buckets,” Weaver says.
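The bucket idea can be illustrated with a toy in-memory index in Python. This is not Hadoop and not Weaver's setup. The BucketStore class, the ips_for_url helper, and the record fields ("ip", "url", "cookie") are all hypothetical names chosen for the example, and the lookup here is keyed on the page bucket and then reads the IP addresses out of the matching records.

```python
from collections import defaultdict


class BucketStore:
    """Toy in-memory stand-in for bucketed storage; not Hadoop itself."""

    def __init__(self):
        # bucket name -> value -> set of record ids holding that value
        self.buckets = defaultdict(lambda: defaultdict(set))
        self.records = {}

    def add(self, record_id, record):
        """Store a record and file each of its fields in that field's bucket."""
        self.records[record_id] = record
        for field, value in record.items():
            self.buckets[field][value].add(record_id)

    def search(self, field, value):
        """Look only in the bucket for `field`; other buckets are not scanned."""
        ids = self.buckets.get(field, {}).get(value, set())
        return [self.records[i] for i in sorted(ids)]


def ips_for_url(store, url):
    """Every distinct IP address seen requesting `url`, in sorted order."""
    return sorted({rec["ip"] for rec in store.search("url", url)})
```

A query such as ips_for_url(store, "/a") consults a single bucket instead of scanning every stored record, which is the efficiency Weaver describes. Hadoop would spread the same kind of partitioned lookup across many machines.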
Advanced, Targeted Spying
Weaver’s surveillance solution isn’t complete without a way to conduct targeted surveillance. That’s because bulk surveillance is all about trying to find needles in a haystack: those few data points among billions that merit further scrutiny. But once spies home in on those targets, they need to conduct more efficient and pinpointed intelligence-gathering. They do this by hacking a target’s system. The NSA and its British spy partner GCHQ use a system called QUANTUM Insert that involves a man-on-the-side attack and code injection. The system works by hijacking a browser as it’s trying to access a web page and forcing it to visit a malicious web page instead, where malware gets secretly downloaded to the target’s computer.
Weaver notes that his surveillance system can actually be made more compact and portable by using off-the-shelf ARM/Wi-Fi embedded systems, which would be perfect for nation-state spies looking to target government workers. The spies could easily take the system to a Starbucks frequented by State Department employees, lawmakers or military personnel and use it to extract metadata belonging to customers who use the cafe’s wireless network. The metadata can help identify targets worthy of further surveillance, who can then be tracked online after they’ve left Starbucks, through this and other metadata. Such a system could easily be disguised as a plug-in air freshener inserted into an electrical outlet, Weaver notes. It could also be designed to erase itself automatically if someone unplugs it from the socket to examine it.