AT THE END of a semester teaching an undergraduate math course a few years ago, Cornell Tech researcher and crypto professor Rafael Pass asked his students to fill out the usual anonymous online course evaluation. One of his brighter students stayed after class to ask him a question: Was the survey truly anonymous? Or could someone, whether a determined professor or even the university’s survey service itself, dig up the identity of an individual respondent?
As a cryptographer, Pass had to confess that no, the survey wasn’t cryptographically anonymous. Students had to blindly trust that the university wouldn’t access their identifying information. “The data is there,” Pass says he admitted.
In fact, on the web, anonymous surveys usually aren’t, according to Pass and Shelat, his fellow cryptography researcher at Cornell Tech. To prevent ballot stuffing and spam responses, surveys often require a unique identifier like an email address. And the anonymity of the survey depends entirely on the survey service—or any hacker who can access its servers—choosing not to reveal the links between its supposedly anonymous responses and those identifiers.
“When you use SurveyMonkey, you just have to hope that it ensures your anonymity. It’s a very dangerous assumption,” says Pass, referring to the popular online survey service. “When you ask people to tell you a lot of personal things about themselves in a non-anonymous way that could be leaked, that’s getting close to unethical.”
So Pass and Shelat have built a free alternative called Anonize, designed to enable fully, cryptographically anonymous surveys. Their scheme promises that survey respondents can speak their minds with the assurance that it’s mathematically impossible for anyone, even those with access to Anonize’s servers, to identify them. And their system, which they and two other researchers presented at the IEEE Security and Privacy conference last year and have since built into working software, still allows only a chosen group of respondents to submit answers, and only one response per person. “We set out to do these seemingly contradictory things, anonymity and accountability, without trusting a third party,” says Shelat.1
SurveyMonkey responded in a statement to WIRED that it offers “best-in-class security and anonymity controls, and extremely clear options for survey creators to use these controls to ensure a great – and safe – respondent experience.” The company argues that it encrypts responses between the respondent and the server, gives respondents the option to turn off IP address collection, and meets HIPAA compliance standards for healthcare surveys. But Cornell’s Pass counters that despite those features, the company still collects enough data to link respondents with their answers.2
Anonize pulls off its more rigorous level of anonymity, which means not collecting any such identifying data in the first place, through a series of cryptographic sleights of hand. Respondents download the Anonize app to their smartphone, and the app generates a secret key derived from their email address that will never leave their device. When a survey administrator (say, a class professor) creates a survey, the Anonize server generates a PGP-style public key that’s derived from the email addresses of all the authorized respondents (in this example, her students). The respondents write their answers in the Anonize app and then submit them either from the phone or from a desktop by scanning a QR code.
When a student makes that submission, the app uses the survey public key and respondent secret key together to “sign” the text, converting it into a string of data that has some special properties: First, it includes a trace of the respondent’s private key, like a kind of pseudonym. The survey admin can check if the respondent is on her list of approved respondents generated from email addresses. And if the respondent writes and submits another answer, it’ll still have that proof of his or her private key, and the survey can recognize it as a duplicate response from the same person and either reject it or replace the original.
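To make the pseudonym idea concrete, here is a minimal Python sketch. It is not Anonize’s actual construction; it only imitates the one-token-per-person-per-survey behavior described above by using an HMAC. Every name in it (`survey_token`, `ResponseBox`, the sample key and survey IDs) is hypothetical and chosen purely for illustration.

```python
import hashlib
import hmac


def survey_token(secret_key: bytes, survey_id: bytes) -> str:
    """Derive a per-survey pseudonym from a respondent's secret key.

    Illustrative stand-in only: the same (key, survey) pair always yields
    the same token, while a different survey yields a different token.
    """
    return hmac.new(secret_key, survey_id, hashlib.sha256).hexdigest()


class ResponseBox:
    """Hypothetical server-side store that keeps one answer per token."""

    def __init__(self) -> None:
        self.answers: dict[str, str] = {}

    def submit(self, token: str, answer: str) -> bool:
        """Store the answer; return True if it replaced an earlier one."""
        is_duplicate = token in self.answers
        self.answers[token] = answer  # assumed policy: replace the original
        return is_duplicate


# Assumed sample values; in the article's scheme the key stays on the phone.
alice_key = b"alice-secret-that-never-leaves-her-phone"
box = ResponseBox()

token = survey_token(alice_key, b"math-course-evaluation")
first = box.submit(token, "Lectures were too fast.")
second = box.submit(token, "Lectures were too fast, but office hours helped.")

# The same key produces a different token for a different survey.
other_survey_token = survey_token(alice_key, b"cafeteria-survey")
```

In this simplification the server can spot a repeat submission because the token is identical, yet the token for a second survey does not match the first. What the sketch cannot do is show that the token came from someone on the approved list without revealing who; that is the part the real system handles cryptographically.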
But more importantly, the string of data that the person submits doesn’t offer any hint of their actual email address. Because the response string also incorporates the survey’s public key, it changes with every survey to prevent survey creators from matching users between email lists. And the string is created using what cryptographers call a “zero-knowledge proof,” a method of proving a mathematical statement is true without revealing anything beyond the fact that it is true. The server can check for proof that someone is authorized without learning anything of their identity. That link exists only on their phone, which is altogether inaccessible to the admin. “The data carries no information about who it came from,” says Pass. “With just that string of data, it’s unconditionally secure.”
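For readers curious what a zero-knowledge proof looks like in code, the Python sketch below implements a textbook Schnorr proof of knowledge of a discrete logarithm, made non-interactive with the Fiat-Shamir heuristic. It is a generic classroom example, not the proof system Anonize uses, and the parameters `P` and `G` are toy values assumed for readability rather than security.

```python
import hashlib
import secrets

# Toy parameters (assumptions for illustration only, not secure choices).
P = 2**127 - 1   # a Mersenne prime used as the modulus
G = 3            # base element
ORDER = P - 1    # exponents are reduced modulo P - 1 (Fermat's little theorem)


def challenge(public: int, commitment: int) -> int:
    """Fiat-Shamir: hash the public values to stand in for a verifier's challenge."""
    data = f"{G}|{public}|{commitment}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % ORDER


def prove(secret: int) -> tuple[int, int, int]:
    """Return (public, commitment, response) proving knowledge of `secret`."""
    public = pow(G, secret, P)
    nonce = secrets.randbelow(ORDER)   # fresh randomness masks the secret
    commitment = pow(G, nonce, P)
    c = challenge(public, commitment)
    response = (nonce + c * secret) % ORDER
    return public, commitment, response


def verify(public: int, commitment: int, response: int) -> bool:
    """Check that G^response == commitment * public^c (mod P)."""
    c = challenge(public, commitment)
    return pow(G, response, P) == (commitment * pow(public, c, P)) % P


secret = secrets.randbelow(ORDER)
public, commitment, response = prove(secret)
valid = verify(public, commitment, response)

# Tampering with the response makes verification fail.
forged = verify(public, commitment, (response + 1) % ORDER)
```

The verifier only ever handles `public`, `commitment`, and `response`; the secret itself is never transmitted, which is the property the article describes in broad terms.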
Of course, anyone who gets hold of a survey respondent’s phone can access their private key and identify them. But that’s still far better than merely trusting the owner of the survey server or any hacker who breaks into it not to identify respondents. “To see who you are, they’d have to get access to both your phone and the server,” says Pass.
Pass and Shelat have already made Anonize available at Anonize.org, and they plan to open-source its code in the coming months so that others can audit and verify their security claims. They’ve also field-tested it. Earlier this year they implemented it at Cornell Tech for all course evaluations, and they hope to try it again soon at the University of Virginia. At Cornell, they followed the course evaluations with a second survey (also using Anonize, naturally) to ask the students if the cryptographic anonymity of the evaluation had changed their responses from those they would have given in a normal anonymous survey. Of those who responded to the second survey (Pass admits the small number of respondents makes it an unscientific test), around a quarter said it had. “Why would you be honest when the answers could be linked back to you?” asks Pass.
Of course, whether Anonize sees any real adoption depends on whether people actually question or care about the anonymity of the surveys they take today. “That difference we saw still [in the course evaluations] isn’t as big as it should be,” says Pass. “The problem is that people think surveys they do are already anonymous.”
But Shelat and Pass argue that increasingly high-profile data breaches, from Sony to Ashley Madison, may be educating people that supposedly private data often doesn’t stay private for long. (They point out that even their own university, Cornell, experienced a data breach in 2009, when a computer was stolen that contained 45,000 social security numbers of students, faculty and staff.) The solution, at least in any case where anonymity is possible, is a system that doesn’t keep the data that ties private information to real identities in the first place, says Shelat. “After the Sony hack, people should know that they need to be more careful about what they put in a digital form,” says Shelat. “If you eliminate collecting that data altogether, you have a safer system.”
Read the full details of how Anonize works in the researchers’ paper below:
1Correction 9/18/2015 4:17pm EST: An earlier version of this story stated that the Anonize paper had won a “best paper” award at the IEEE conference. Instead, it was selected for publication in an IEEE Security and Privacy magazine.
2Updated 9/18/2015 4:18pm EST with a response from SurveyMonkey.