When TIME magazine let people vote on the world’s most influential person, they could not have expected good results. They probably thought, “Polls are popular and will drive traffic to our site. Let’s do it.” The slight loss to their credibility (haha) was outweighed by increased visibility. Lots of unworthy people have won in the past (Shigeru Miyamoto or Rain), but it was this year’s results that really got out of hand. After a concerted ballot-stuffing effort, moot, aka Christopher Poole (4chan’s founder), won with 8 times more votes than the next closest person. The 4chaners also stuffed the other boxes so that the first letters of each name, read downwards, spelled “MARBLECAKEALSOTHEGAME”. TIME’s poll is never going to work again unless they invest in some anti-ballot-stuffing technology.
I think online polls can be done right. I think online voting could be more legitimate than electronic voting. Of course, in an online poll you can’t ensure a random sample of the population, but you can at least try to keep it to 1 man 1 vote, or 1 computer 1 vote (or, in the worst case, 1 cycle 1 vote). Since there is such a need for this sort of technology, I think a company could provide it as a service and host legitimate online polls or even online voting. It’s really not that hard to stop the least ambitious of hackers, and you can stop some of the more ambitious with a little machine learning.
A common strategy is to lay down a series of booby traps to trip up hackers. A user that submits the form but does not load the next page is clearly a bot. A user with a missing UA is a bot. A user with odd headers of any kind is a bot. A user that submits fields that aren’t visible onscreen gets flagged at once. All of these things make it hard for people to write bots for the poll. At the first slip-up you block, but you never notify the user of the block. Silent blocks are much more difficult to deal with: the user has little way to know whether they have been blocked, especially if vote totals are only updated every minute or so (best to do it at random intervals). Plus, of course, we give users cookies and then read them back after they submit the form to ensure that they kept them. We could even read two cookies from two different domains (pointing to the same box) and ensure that they always show up as a pair. These sorts of techniques let us stall for time and annoy the least aggressive attackers. If we keep changing the protocols, the hackers might never catch up.
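Here is a rough sketch of a few of those traps plus the silent block, just to show how little code it takes. Everything named here is made up for illustration: the `Submission` shape, the hidden field `email_confirm`, the cookie names `poll_a` and `poll_b`, and the `SilentBlocker` class. The "did not load the next page" check is left out because it needs information that only arrives after the vote.

```python
from dataclasses import dataclass, field

# Hypothetical names: a hidden form field that real users never see or fill in,
# and two cookies (set from two domains) that should always come back together.
HONEYPOT_FIELD = "email_confirm"
PAIRED_COOKIES = ("poll_a", "poll_b")


@dataclass
class Submission:
    """One poll submission as the server sees it (assumed shape)."""
    headers: dict = field(default_factory=dict)
    form: dict = field(default_factory=dict)
    cookies: dict = field(default_factory=dict)


def tripped_traps(sub):
    """Return the names of the traps this submission set off."""
    tripped = []
    if not sub.headers.get("User-Agent", "").strip():
        tripped.append("missing_user_agent")
    if sub.form.get(HONEYPOT_FIELD):
        tripped.append("hidden_field_filled")
    if not all(name in sub.cookies for name in PAIRED_COOKIES):
        tripped.append("cookie_pair_broken")
    return tripped


class SilentBlocker:
    """Blocks on the first slip-up but never tells the client."""

    def __init__(self):
        self.blocked = set()
        self.counted = 0

    def handle_vote(self, client_id, sub):
        if tripped_traps(sub):
            self.blocked.add(client_id)
        # Only unblocked clients are counted; a block is permanent.
        if client_id not in self.blocked:
            self.counted += 1
        # Everyone gets the same response, blocked or not.
        return "Thanks for voting!"
```

A blocked client sees the exact same "Thanks for voting!" as everyone else, so the only way to notice the block is to watch the totals, which is why updating them at random intervals helps.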
Obviously none of that trickery will stop a clever hacker. It’ll waste their time and resources, but nothing more. An open source bot project would let even the least clever hacker fix one of our polls. Heck, even a Greasemonkey script could defeat all of those measures. I still cannot believe that some polls do not show CAPTCHAs. I assume they want to keep the UI clean and friendly. CAPTCHAs aren’t that big of a hassle though. If I had to protect a high-profile site, I think I’d license the CAPTCHA produced by the guys I spoke about in my old blog post. It’d be really cool if we could render these visually intensive CAPTCHAs using graphics cards (we can finally start putting video cards in our web servers). But I digress. Good CAPTCHAs stop everyone except those with an army of slaves or websites that can put up our CAPTCHAs as their own. Remember though, TIME did eventually put up a CAPTCHA to stop the 4chaners, but the 4chan users started manually cracking reCAPTCHA and still rigged the vote.
The next step is IP checking. First, maintain a list of proxies and block all of those addresses. Tor is right out. Maybe just block all non-US IPs (depending on who is running the poll and for what audience). Shared IPs (like AOL IPs) cannot be blocked, though, so an attacker on that network could repeatedly submit to our poll. This is the tactic that could have stopped 4chan. A stream of votes from a single address should be blocked unless it is a shared IP. If it is a shared IP, then we can estimate the probability that the votes are coming from different people rather than the same person over and over again. First of all, we can look at the UA strings and OS strings. Next, we can look at the times between votes. We expect more votes at certain times in certain time zones. If independent voters arrive as a Poisson process, the number of votes per interval follows a Poisson distribution and the time between votes is roughly exponentially distributed. Now all of this can be faked, but we should check for it anyway. Checking shouldn’t even be that expensive, since there aren’t that many IPs this will apply to.
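One cheap way to run the timing check on a shared IP is to look at how much the gaps between votes vary. Exponentially distributed gaps have a standard deviation about equal to their mean, so the ratio of the two (the coefficient of variation) sits near 1, while a script firing on a timer produces a ratio near 0. The sketch below is only that, a sketch: the function names, the 20-vote minimum, and the 0.5 cutoff are my own guesses and not tuned on real data.

```python
import statistics


def gap_variation(timestamps):
    """Std dev / mean of the gaps between vote times, or None if too few votes."""
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if len(gaps) < 2:
        return None
    mean_gap = statistics.mean(gaps)
    if mean_gap == 0:
        # Every vote landed at the same instant: as regular as it gets.
        return 0.0
    return statistics.pstdev(gaps) / mean_gap


def looks_scripted(timestamps, min_votes=20, low_cv=0.5):
    """Flag a vote stream whose gaps are far more regular than random arrivals.

    min_votes and low_cv are assumed thresholds, chosen only for illustration.
    """
    if len(timestamps) < min_votes:
        return False  # not enough evidence either way
    cv = gap_variation(timestamps)
    return cv is not None and cv < low_cv


# A bot voting exactly every 2 seconds has zero variation in its gaps.
bot_times = [i * 2.0 for i in range(50)]
print(looks_scripted(bot_times))  # True
```

A bot author can defeat this by sleeping for random exponential intervals, of course, but that is one more thing they have to figure out without any feedback from us.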
I have more ideas on this front, but this post is already too long and too rambling.