How to call an election using corpus data

One of the original aims of the project that eventually became the Language Spy political corpus analyser was to create a system that could make predictions based on data: to predict the course of events, either by comparing current and historical data or by extrapolating from immediate trends.

Of course, that's an impossible task. Ask a meteorologist: just as with the weather, there are so many variables in play that any prediction is likely to veer off course before long. But even if you can't predict with long-term accuracy, we hope you can still use the corpus data to make an educated guess.

To help you in this endeavour, we've gathered all major UK elections and parliamentary by-elections since 2009 on one page, with handy links to the corpus for research. Sorry, American friends: our US corpus doesn't go back far enough to do this for you.

So how would you set about calling an election with this data? Imagine for a moment you don't know that the 2010 general election resulted in a Conservative/LibDem coalition. Imagine, even, that you've been on Mars for a decade and are unaware that the Labour administrations of Tony Blair and then Gordon Brown became progressively more unpopular, or that David Cameron's Conservatives had just emerged from a bruising period, riven by splits in the aftermath of the Thatcher and Major years. You have only the data to go on; you don't even know about the issues of the day, such as the parliamentary expenses scandal.

Then pull up the graph for "david cameron", "gordon brown" and "nick clegg" over the campaign period immediately before the election. We can safely ignore UKIP and the SNP in 2010; British politics was still just about a three-horse race.

Party leaders in the run-up to the 2010 election

To get an idea of who is performing better in the conversation, estimate the percentage of the time each player spends in the upper part of the graph, and work out why they're there. In the graph above, for instance, David Cameron is doing quite well in the second half of April, only being supplanted by Gordon Brown around the 29th. This looks good for Brown until an examination of his collocates during the period reveals his unfortunate "bigot" gaffe with an angry voter.
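If you'd rather count than eyeball the graph, the "time spent on top" idea can be roughed out in a few lines of Python. This is only a sketch: it assumes you have copied daily frequencies out of the corpus by hand into a dictionary, the function name `share_of_days_leading` is our own invention rather than anything the site provides, and the numbers in `sample` are made up purely to show the shape of the data. They are not real 2010 figures.

```python
from collections import Counter


def share_of_days_leading(daily_counts):
    """Return, for each name, the fraction of days on which it had the
    highest frequency.

    daily_counts maps a day label to a dict of {name: frequency}.
    On a tie, the name listed first for that day is counted as leader.
    """
    days_on_top = Counter()
    for counts in daily_counts.values():
        leader = max(counts, key=counts.get)
        days_on_top[leader] += 1

    total_days = len(daily_counts)
    names = {name for counts in daily_counts.values() for name in counts}
    return {name: days_on_top[name] / total_days for name in sorted(names)}


# Hypothetical, invented frequencies for illustration only.
sample = {
    "day 1": {"david cameron": 30, "gordon brown": 22, "nick clegg": 18},
    "day 2": {"david cameron": 28, "gordon brown": 25, "nick clegg": 27},
    "day 3": {"david cameron": 24, "gordon brown": 41, "nick clegg": 20},
    "day 4": {"david cameron": 26, "gordon brown": 23, "nick clegg": 29},
}

shares = share_of_days_leading(sample)
for name, share in shares.items():
    print(f"{name}: on top {share:.0%} of days")
```

A count like this only tells you who is being talked about most, not whether the talk is favourable. A spike such as "day 3" in the invented data is exactly the kind of thing you'd want to check against the collocates before reading it as good news.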

We know the outcome of the 2010 general election was a somewhat inconclusive win for the Conservatives with no overall majority. But what about a campaign that secured a more convincing majority? In the 2014 Euro elections, UKIP scored an emphatic victory over the other UK parties.

Parties in the run-up to the 2014 Euro election

Straight away you can see that UKIP are performing more strongly than their competitors. Even if you were to take the aggregate of "conservative" and "tory", this is still the case. Europe is their core issue, and they have done a pretty good job of cornering it in the campaign. In the final week before the poll their exposure clearly leads the field, so it is hardly surprising that they did so well.
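The same comparison can be sketched in code: add together the terms that refer to one party, then total each party's exposure over the final week. As before, this is an illustration under assumptions. The helper names `combine_terms` and `final_window_totals` are hypothetical, the daily counts are invented rather than taken from the corpus, and you would need to export the real series yourself.

```python
def combine_terms(series, aliases):
    """Sum the daily counts of every search term that belongs to a party.

    series maps a search term to a list of daily counts (all the same length).
    aliases maps a party label to the list of terms that refer to it.
    """
    combined = {}
    for party, terms in aliases.items():
        per_day = zip(*(series[term] for term in terms))
        combined[party] = [sum(day) for day in per_day]
    return combined


def final_window_totals(combined, days=7):
    """Total each party's counts over the last `days` entries."""
    return {party: sum(counts[-days:]) for party, counts in combined.items()}


# Hypothetical, invented daily counts for illustration only (8 days).
series = {
    "ukip": [10, 12, 15, 14, 18, 20, 22, 25],
    "conservative": [6, 5, 7, 6, 8, 7, 9, 8],
    "tory": [3, 4, 3, 5, 4, 6, 5, 7],
    "labour": [9, 8, 10, 9, 11, 10, 12, 11],
}
aliases = {
    "UKIP": ["ukip"],
    "Conservative": ["conservative", "tory"],
    "Labour": ["labour"],
}

combined = combine_terms(series, aliases)
totals = final_window_totals(combined, days=7)
front_runner = max(totals, key=totals.get)
print(totals, front_runner)
```

Treating "conservative" and "tory" as one party is a judgment call, since both words also turn up in contexts that have nothing to do with the party, so the combined figure is best read as an upper bound.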

These two examples should show some of the potential in using a political corpus to examine the run-up to an election. As always, though, the only thing that matters on the day is what happens in the voting booths. If we'd had this corpus for the April 1992 general election, for example, it might have prompted us to call it for Neil Kinnock's Labour party. Kinnock's Sheffield moment, though, gave the keys to Number 10 to John Major.

Tags for this post: election,psephology