The NSA has published a paper claiming that it “touches” only 1.6% of Internet traffic. Now, 1.6% of something the size of the Internet is already enormous, but let’s demonstrate that the number represents a much larger share of human communications than one might intuit.

First, let’s assume that the NSA is interested only in “novel” traffic — human-created, unique data that depicts an individual’s activities or intentions.

My assumption is that the NSA is not interested in inspecting the actual bits that comprise Arrested Development or a cat video; it’s precisely the same data being transferred over and over, month after month. The transferred bits contain no novel information.

The fact that a person played the video might be informative, but that fact can be thoroughly conveyed in 1K of metadata; the video is many orders of magnitude larger. Assuming the NSA’s intention is to spy on humans, analyzing each MPEG stream of Gangnam Style is not something on which they expend cycles.

So to get a real picture, we’d have to estimate what percentage of Internet traffic actually consists of communications that would interest the NSA in the first place, and compare that to 1.6%.

This Cisco paper breaks down Internet traffic by type (see table 10). It categorizes all individual, human-generated traffic under “Web, email and data”, and puts the share at around 20%. The other 80% is consumer video, peer-to-peer file sharing, and gaming, all of which can be considered non-novel, impersonal, or duplicative [1].

By this analysis, if the NSA is “touching” 1.6% of Internet traffic, but only 20% of the traffic can be considered communication between humans, then in fact the NSA is touching 8% of actual human communications online (1.6 divided by 20). That is five times the figure in their misleading statement, and an enormous number in real terms.
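For anyone who wants to check the arithmetic, here is a minimal sketch. The function name is my own, and it bakes in the assumption this post is already making: that everything the NSA “touches” falls inside the human-generated 20%.

```python
def share_of_human_traffic(touched_pct, human_pct):
    """Touched traffic expressed as a percentage of human-generated traffic.

    Assumes all touched traffic lies inside the human-generated category.
    """
    return touched_pct / human_pct * 100


touched = 1.6   # NSA's stated share of all Internet traffic, in percent
human = 20.0    # approximate "Web, email and data" share cited above, in percent

share = share_of_human_traffic(touched, human)
multiple = share / touched  # how many times larger than the headline figure

print(f"{share:.1f}% of human communications, {multiple:.1f}x the headline figure")
```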

Further, Cisco’s categorization is probably too broad: every download of a Perez Hilton blog post would qualify, which, again, is not novel. The fact that a user accessed the article might constitute novel data, but the article’s HTML, which has traversed the Internet 100,000 times before, does not. Remember, most of us download 10 times more data than we upload.

So only a small percentage of the Internet’s moving bits can be defined as personal communication; certainly less than 20%, and perhaps considerably less. “1.6% of the Internet” represents a lot more spying than it sounds like.
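To see how quickly the implied share grows as the “personal” slice shrinks, here is a rough sketch. The 10% and 5% values are purely hypothetical figures I picked for illustration, not numbers from Cisco or the NSA, and the calculation again assumes that all touched traffic is personal communication.

```python
TOUCHED_PCT = 1.6  # NSA's stated share of all Internet traffic, in percent


def implied_share(novel_pct):
    """Implied percentage of personal communication touched, capped at 100."""
    return min(100.0, TOUCHED_PCT / novel_pct * 100)


# 20% is the Cisco-based figure; 10% and 5% are hypothetical smaller slices.
for novel_pct in (20.0, 10.0, 5.0):
    print(f"if {novel_pct:.0f}% of traffic is personal: "
          f"{implied_share(novel_pct):.0f}% of it is touched")
```

Each time the personal slice halves, the implied share doubles.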

1 To be fair, Cisco’s characterizations of “Internet video” and “peer to peer” could include some personal communications.


Update: Jeff Jarvis beat me to it. :)