
January 25, 2008

On the accuracy of opinion polls

A talk today by a sociologist, on the French elections (Spring 2007). One interesting comment: French pollsters massage their results in ways that are not totally clear. Pr. Durand referred to the 2002 "debacle", when polls put Chirac (a conservative) and Jospin (a socialist) clearly in the lead. One thesis is that "reverse strategic voting" happened, i.e. supporters of the front-runners felt free to "send a message" and voted for someone else, that someone being Le Pen, a candidate from the far right. Another thesis is that the polls were just plain wrong. Pr. Durand made comments to the effect that some pollsters had raw data suggesting that Le Pen was ahead of Jospin close to election time, but "calibrated" (or fudged?) the data, thinking that it was just not a reasonable hypothesis.

As we are in the process of comparing web trends with opinion polls, it is important to assess the accuracy of opinion polls. So, back in the office, I finished assembling the data available from this site for analysis.

Opinion polls are a national pastime in France. The dataset has no fewer than 105 polls published by 7 leading organizations, starting in October 2006 and running up to April 20th (election day being April 22nd).

Polls included voting intentions for most leading candidates. I'll report only on the top 2 (Royal and Sarkozy).

The figure below shows their scores over time.

[Figure Rs1: Royal and Sarkozy scores over time]

I prefer to work on the ratio of scores, in order to net out the effect of other candidates (i.e. both Royal's and Sarkozy's shares are going down because of the strong showing by Bayrou; in the end the other candidates do not matter, as only the first and the runner-up make it to the final electoral round). So here is the ratio of Royal's score to Sarkozy's, over time. I've also added a moving average (5-period centered) for convenience.
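For readers who want to reproduce this step, here is a minimal sketch of the ratio and the 5-period centered moving average. The function names and the numbers are made up for illustration; they are not the actual poll data, and the plots in this post were not necessarily produced this way.

```python
def score_ratio(royal, sarkozy):
    """Return Royal's score divided by Sarkozy's, poll by poll."""
    return [r / s for r, s in zip(royal, sarkozy)]


def centered_moving_average(values, window=5):
    """Centered moving average.

    Positions that do not have a full window around them
    (the first and last window // 2 points) are left as None.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd for a centered average")
    half = window // 2
    out = [None] * len(values)
    for i in range(half, len(values) - half):
        out[i] = sum(values[i - half:i + half + 1]) / window
    return out


# Made-up voting intentions (in %), NOT the actual poll data.
royal = [30.0, 29.0, 31.0, 27.0, 26.0, 25.0, 26.0]
sarkozy = [32.0, 33.0, 31.0, 30.0, 29.0, 28.0, 29.0]

ratios = score_ratio(royal, sarkozy)
smoothed = centered_moving_average(ratios, window=5)
```

Note that a common shock hitting both candidates proportionally (say, both losing a tenth of their support to a third candidate) leaves the ratio unchanged, which is the point of working with it.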

[Figure Rs2: ratio of Royal's score to Sarkozy's over time, with 5-period centered moving average]

The raw series is not smooth at all. The jerky motion is not random -- there is a systematic pollster impact. Now, we know that there IS a true voting intention, measured at various points in time by various pollsters using different calibration approaches. So the goal here is to try to isolate pollster bias to recover the "true" voting intent, which we will eventually relate to web data.

So I've set up a quick and dirty model using my favorite functional form, a + b*T^d/(T^d + g), replacing "a" with pollster-specific parameters.
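To make the functional form concrete, here is a rough sketch of how such a model could be fitted. Everything in it is an assumption for illustration: T is taken to be a period index counted from the start of the series, the estimation is a crude grid search rather than whatever routine one would use in practice, and the data are synthetic. The one exact piece is that, for fixed (b, d, g), the least-squares intercept of each pollster is simply the mean of that pollster's residuals.

```python
def curve(t, d, g):
    """Saturating term T^d / (T^d + g): 0 at t = 0, approaching 1 as t grows."""
    td = t ** d
    return td / (td + g)


def fit_intercepts(polls, b, d, g):
    """Least-squares pollster intercepts for fixed (b, d, g).

    polls is a list of (pollster, t, y) tuples. The best intercept for a
    pollster is the mean of y - b * curve(t) over that pollster's polls.
    """
    sums, counts = {}, {}
    for pollster, t, y in polls:
        resid = y - b * curve(t, d, g)
        sums[pollster] = sums.get(pollster, 0.0) + resid
        counts[pollster] = counts.get(pollster, 0) + 1
    return {p: sums[p] / counts[p] for p in sums}


def sse(polls, intercepts, b, d, g):
    """Sum of squared errors of the model a_pollster + b * curve(t)."""
    return sum((y - intercepts[p] - b * curve(t, d, g)) ** 2
               for p, t, y in polls)


def grid_fit(polls, b_grid, d_grid, g_grid):
    """Crude grid search over (b, d, g); intercepts are solved exactly."""
    best = None
    for b in b_grid:
        for d in d_grid:
            for g in g_grid:
                a = fit_intercepts(polls, b, d, g)
                err = sse(polls, a, b, d, g)
                if best is None or err < best[0]:
                    best = (err, a, b, d, g)
    return best


# Synthetic, noise-free data from known (hypothetical) parameters,
# with two made-up pollsters "A" and "B" alternating over 10 periods.
true_a = {"A": 1.00, "B": 1.05}
polls = []
for t in range(1, 11):
    p = "A" if t % 2 else "B"
    polls.append((p, t, true_a[p] - 0.15 * curve(t, 2, 4)))

err, a_hat, b_hat, d_hat, g_hat = grid_fit(
    polls, b_grid=[-0.25, -0.15, -0.05], d_grid=[1, 2, 3], g_grid=[2, 4, 8])
```

On real polls one would add noise handling and a proper nonlinear optimizer; the grid search is only there to keep the sketch dependency-free.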

First, results show surprisingly large discrepancies between pollsters. Systematic differences range from -.06 to +.08. Now, this is for the ratio of shares; translated back into voting intentions, these differences look like 49%/51% on the one hand and 45%/55% on the other. Such differences would usually be dismissed as falling close to the margin of error. That is inappropriate in the case of systematic runs, but we do not want to become technical here, so let's move on. It is these systematic differences that produce the jagged yellow prediction line (see graph below).
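The translation from a ratio back to head-to-head shares is simple arithmetic: if r is Royal's score divided by Sarkozy's, then Royal's two-way share is r/(1 + r) and Sarkozy's is 1/(1 + r). A small sketch follows; the baseline ratio of 0.90 is a hypothetical value picked for illustration, not an estimate from the model.

```python
def two_way_shares(ratio):
    """Convert a Royal/Sarkozy ratio into head-to-head shares summing to 1."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    royal = ratio / (1.0 + ratio)
    return royal, 1.0 - royal


BASELINE = 0.90  # hypothetical baseline ratio, for illustration only

# Apply the two extreme pollster offsets quoted in the text.
for offset in (-0.06, 0.08):
    royal, sarkozy = two_way_shares(BASELINE + offset)
    print(f"offset {offset:+.2f}: Royal {royal:.1%}, Sarkozy {sarkozy:.1%}")
```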

Second, the model predicted results that are very close to the actual outcome. In other words, averaging polls ended up being a reasonably accurate predictor. Moreover, the nonlinear model reaches its plateau as early as the second time period. No surprise. (Now, this model *fits* the data. I am not saying that the outcome was known from the beginning. More work to do. But at first glance, people in the know probably knew the outcome in February, barring the unexpected.)

[Figure Rs3: model prediction (jagged yellow line)]

To summarize:

1) French pollsters use calibration rules that yield systematic bias.
2) Their results can be used to generate what appears to be an accurate forecast.
3) The "data generating process" appeared to be stable in 2007.
