The World’s Most Influential Person?

When TIME magazine let people vote on the world’s most influential person, they could not have expected good results. They probably thought, “Polls are popular and will drive traffic to our site. Let’s do it.” The slight loss to their credibility (haha) was outweighed by increased visibility. Lots of unworthy people have won in the past (Shigeru Miyamoto or Rain), but it was this year’s results that really got out of hand. After a concerted ballot-stuffing effort, moot, aka Christopher Poole (4chan’s founder), won with 8 times more votes than the next closest person. The 4chaners also stuffed the other boxes so that the first letters of each name, read downwards, spell “MARBLECAKEALSOTHEGAME”. TIME’s poll is never going to work again unless they invest in some anti-ballot-stuffing technology.

Note the clearly fake results

I think online polls can be done right. I think online voting could be more legitimate than electronic voting. Of course, in an online poll you can’t ensure a random sample of the population, but at least you can try to keep it to 1 man 1 vote, or at least 1 computer 1 vote (or, in the worst case, 1 cycle 1 vote). Since there is such a need for this sort of technology, I think a company could provide it as a service and host legitimate online polls or even online voting. It’s really not that hard to stop the least ambitious hackers, and you can stop some of the more ambitious ones with a little machine learning.

So first of all, you can require the user to waste some processing time before voting. The way I see doing this is you have a JavaScript snippet that does something computationally hard but which can be verified quickly (the idea is identical to hashcash: find an input whose hash has a certain number of leading zero bits). Solutions that don’t verify are discarded and the incident is logged. An incorrect solution is treated as an attempted break-in and that IP should simply be banned (we should be careful not to let them deny service to others, though). The hacker then has to either parse and run the JavaScript or (probably) reimplement the code in whatever language they are using. That’ll slow down any concerted effort by wasting their time. You could also further frustrate their efforts by handing out harder tasks the more requests come from a single IP. Now they have to get more clock cycles in order to get more votes.
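Here is a rough sketch of what the server side of such a hashcash-style puzzle could look like. It is written in Python for brevity (the client half would be ported to JavaScript), and everything in it is an assumption of mine rather than a tested design: the SHA-256 choice, the 12-bit baseline, the cap, and the function names are all made up for illustration.

```python
import hashlib
import secrets

BASE_BITS = 12  # assumed baseline: about 4,096 hashes per vote on average
MAX_BITS = 24   # assumed cap so a busy shared IP is slowed down, not locked out


def difficulty_for(requests_from_ip: int) -> int:
    """Add one bit (double the work) each time the request count doubles."""
    extra = max(requests_from_ip, 1).bit_length() - 1
    return min(BASE_BITS + extra, MAX_BITS)


def new_challenge() -> str:
    """Random string the server hands out with the voting form."""
    return secrets.token_hex(16)


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the front of a hash digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def verify(challenge: str, nonce: int, bits: int) -> bool:
    """Cheap check: a single hash, no matter how hard the puzzle was."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= bits


def solve(challenge: str, bits: int) -> int:
    """Expensive search the client has to run before it may vote."""
    nonce = 0
    while not verify(challenge, nonce, bits):
        nonce += 1
    return nonce
```

A real deployment would also have to remember which challenges it issued and expire each one after a single use, otherwise one solved puzzle could be replayed for many votes.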

A common strategy is to lay a series of booby traps to trip up hackers. A user that submits the form but does not load the next page is clearly a bot. A user with a missing User-Agent (UA) is a bot. A user with odd headers of any kind is a bot. A user that submits fields that aren’t visible onscreen gets flagged at once. All of these things make it hard for people to write bots for the poll. At the first sign of a slip-up you block, but you never notify the user of the block. Silent blocks are much more difficult to deal with: the user has little way to know whether they have been blocked, especially if vote totals are only updated every minute or so (best to do it at random intervals). Plus, of course, we give users cookies and then read them back in after they submit the form to ensure that they kept them. We could even read in two cookies from two different domains (pointing to the same box) and ensure that they always show up in pairs. These sorts of techniques allow us to stall for time and annoy the least aggressive attackers. If we keep changing the protocols, the hackers might never catch up.
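A few of these traps, plus the silent block, can be sketched in a handful of lines. This is only an illustration under my own assumptions: the hidden field name, the cookie names, and the simplification that both cookies carry the same token are hypothetical, and headers are assumed to arrive already normalized to the spelling "User-Agent".

```python
from dataclasses import dataclass
from typing import Dict, Set

HONEYPOT_FIELD = "email_confirm"  # hypothetical field hidden from humans with CSS
COOKIE_A = "poll_a"               # hypothetical cookie set via the first domain
COOKIE_B = "poll_b"               # hypothetical cookie set via the second domain
THANKS = "Thanks for voting!"


@dataclass
class VoteRequest:
    ip: str
    headers: Dict[str, str]
    form: Dict[str, str]
    cookies: Dict[str, str]


def looks_like_bot(req: VoteRequest) -> bool:
    """Return True as soon as any trap is tripped."""
    if not req.headers.get("User-Agent", "").strip():
        return True  # missing UA
    if req.form.get(HONEYPOT_FIELD):
        return True  # filled in a field no human can see
    a, b = req.cookies.get(COOKIE_A), req.cookies.get(COOKIE_B)
    if not a or a != b:
        return True  # the two cookies did not come back as a pair
    return False


class Poll:
    def __init__(self) -> None:
        self.counts: Dict[str, int] = {}
        self.blocked: Set[str] = set()

    def submit(self, req: VoteRequest, choice: str) -> str:
        """Count the vote, or silently drop it; the reply is identical."""
        if req.ip in self.blocked or looks_like_bot(req):
            self.blocked.add(req.ip)
            return THANKS
        self.counts[choice] = self.counts.get(choice, 0) + 1
        return THANKS
```

The important part is that both branches of submit return the same page, so a blocked client gets no signal that its votes stopped counting.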

Obviously none of that trickery will stop a clever hacker. It’ll waste their time and resources, but nothing more. An open source bot project would let even the least clever hacker fix one of our polls. Heck, even a Greasemonkey script could defeat all of those measures. I still cannot believe that some polls do not show CAPTCHAs. I assume they want to keep the UI clean and friendly. CAPTCHAs aren’t that big of a hassle though. If I had to protect a high profile site I think I’d license the CAPTCHA produced by the guys I spoke about in my old blog post. It’d be really cool if we could render these visually intensive CAPTCHAs using graphics cards (we can finally start putting video cards in our web servers). But I digress. Good CAPTCHAs stop everyone except for those with an army of slaves or websites that can put up our CAPTCHAs as their own. Remember though, TIME did eventually put up a CAPTCHA to stop the 4chaners, but the 4chan users started manually cracking reCAPTCHA and still rigged the vote.

reCAPTCHA

The next step is IP checking. First, maintain a list of proxies and block all of those addresses. Tor is right out. Maybe just block all non-US IPs (depending on who is running the poll and for what audience). Shared IPs (like AOL IPs) cannot be blocked though, so an attacker on that network could repeatedly submit to our poll. This kind of checking is the tactic that could have stopped 4chan. A stream of votes from a single IP should be blocked unless it is a shared IP. If it is a shared IP, then we can estimate the probability that the votes are coming from different people rather than from the same person over and over again. First of all, we can look at the UA strings and the operating systems they report. Next, we can look at the times between votes. We expect more votes at certain times in certain time zones. If votes arrive independently, the number of votes per interval should follow a Poisson distribution, which means the time between votes should be roughly exponentially distributed. Now, all of this can be faked, but we should check for it anyway. Checking shouldn’t even be that expensive, since there aren’t that many IPs this will apply to.
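One cheap way to look at the times between votes is the coefficient of variation of the gaps (standard deviation divided by mean). If independent people are voting, arrivals look like a Poisson process, the gaps are roughly exponential, and that ratio sits near 1; a script firing on a timer pushes it toward 0. The sketch below is only that, a sketch: the thresholds and names are guesses of mine, and the UA share is reported as a secondary signal rather than used for the decision, since a whole office can share one browser build.

```python
import statistics
from collections import Counter
from typing import NamedTuple, Optional, Sequence


class Signals(NamedTuple):
    gap_cv: Optional[float]  # std/mean of gaps between votes; about 1 if exponential
    top_ua_share: float      # fraction of votes carrying the most common UA string
    flagged: bool


def interarrival_cv(timestamps: Sequence[float]) -> Optional[float]:
    """Coefficient of variation of the gaps, or None with too little data."""
    ts = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    if len(gaps) < 2:
        return None
    mean = statistics.fmean(gaps)
    if mean == 0:
        return 0.0  # every vote arrived at the same instant
    return statistics.pstdev(gaps) / mean


def assess_shared_ip(
    timestamps: Sequence[float],
    user_agents: Sequence[str],
    min_votes: int = 20,     # assumed: do not judge an IP on a handful of votes
    cv_floor: float = 0.5,   # assumed: below this the timing looks mechanical
) -> Signals:
    cv = interarrival_cv(timestamps)
    if user_agents:
        top_share = Counter(user_agents).most_common(1)[0][1] / len(user_agents)
    else:
        top_share = 0.0
    flagged = bool(len(timestamps) >= min_votes and cv is not None and cv < cv_floor)
    return Signals(cv, top_share, flagged)
```

An attacker who adds random jitter to the timer defeats this particular check, which is exactly the "all of this can be faked" caveat; it only raises the cost a little.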

I have more ideas on this front, but this post is already too long and too rambling.
