Election pollsters must wake up to the data

November 9, 2016

By Nathan Snyder and Chris Burke

The people have done it again and proved the pollsters wrong. Trump’s unexpected victory in the US election mirrors the Brexit vote in the UK, where pollsters predicted victory for the Remain campaign but the people voted the UK out of the EU. Trump also defied the odds – an ABC/Washington Post poll had him 12 points behind just a week before the election and 5 points behind on polling day – and is now President-elect of the US.

Right up until Election Day, Hillary Clinton was said to be the frontrunner. The polling companies failed to predict the outcome, however, and must now question both the collation and the analysis of the data on which they based their predictions. They seem to have missed a critical dimension in their data. Trumpmania and Brexit both showed the complete determination of supporters to come out and vote for their cause, undeterred by speculation and media comment. Had that dimension been included in the polls, it might have helped predict the outcome accurately.

This complete error by the pollsters – the second in five months – shows the weakness of polling as a means of prediction. Representative samples fail when new groups of voters turn out and the electorate no longer resembles the turnout of previous years. Voter responses to polls can no longer be taken at face value, and opinions may change in the privacy of the voting booth.

The tools for correct prediction exist and are the subject of much investment in financial services. Mining and combining unstructured data sets is the essence of Data Science. Why ask someone’s opinion directly when their web presence will give a much more nuanced view of their political state of mind? Correct collation and analysis of the data would remove much of the guesswork from predictions.

The Brexit pollsters failed to take into account the fanatical vote of those people who felt lost and demoralised and who saw the ‘leave’ option as an anti-establishment protest. Similarly, Clinton was seen as an extension of the current establishment, so a vote against her was a backlash against a situation in which many Americans feel they are not listened to or adequately represented. In both cases there was an element of voting against the status quo.

So why do we still rely on polls? The data sets exist and are largely in the public domain. Knowledge Discovery in Databases (KDD) techniques and machine learning could give a much more accurate picture. In a race where the candidate with the smallest campaign investment won the day, pollsters and Democrats alike need to wake up to the use and interpretation of fifth estate data sets.

We live in an age of misinformation and great data. Trump knows how to use it. Financial services know how to use it. Time for pollsters to up their game.