Since October 2004 the dataset has been released on a quarterly basis.
The data managers test the data for completeness and uniqueness. An
observation is complete when it has values for six critical variables:
education, occupation, industry, wage, sex, and year of birth. This
test also aims to detect technical failures. An observation is unique
when duplicate cases have been excluded; the data managers check for
this as well.
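
The two tests can be sketched as follows. This is only an illustration,
not the data managers' actual procedure: the field names and the rule
that a duplicate is a record identical in every field are assumptions.

```python
# Hypothetical field names; the text names the six variables only in prose.
CRITICAL_VARIABLES = (
    "education", "occupation", "industry", "wage", "sex", "birth_year",
)


def is_complete(record):
    """Return True when every critical variable has a non-empty value."""
    return all(record.get(name) not in (None, "") for name in CRITICAL_VARIABLES)


def drop_duplicates(records):
    """Keep the first occurrence of each record.

    Assumption: two records are duplicates when all their fields match.
    """
    seen = set()
    unique = []
    for record in records:
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique
```

A record that lacks any of the six values fails is_complete, and a
second copy of an identical record is dropped by drop_duplicates.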
Our programmers have largely solved the problem of out-of-range values,
which had seemed unlikely to occur in a web survey. A major cause turned
out to be the data from text boxes: semicolons typed by visitors caused
cell overflow when the data were converted for statistical processing.
That problem now seems to have been resolved.
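
A small sketch shows how a semicolon in free text can shift values into
the wrong cells. It assumes a semicolon-delimited export, which the text
does not confirm, and the function names are hypothetical. Quoting the
field is one common remedy, not necessarily the one the programmers
chose.

```python
import csv
import io


def naive_row(values):
    """Join values with semicolons without any quoting (assumed export format)."""
    return ";".join(values)


def quoted_row(values):
    """Write the same row with the csv module, which quotes fields containing ';'."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter=";", quoting=csv.QUOTE_MINIMAL).writerow(values)
    return buffer.getvalue().rstrip("\r\n")


def parse_row(line):
    """Read one semicolon-delimited line back into a list of fields."""
    return next(csv.reader([line], delimiter=";"))


# Hypothetical answer row: an id, a free-text occupation, a wage.
answers = ["1001", "teacher; part-time", "2500"]
```

Splitting naive_row(answers) on semicolons yields four fields instead of
three, so the wage ends up in the wrong column. Reading quoted_row(answers)
back with parse_row returns the original three values.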
For the data from 31.09.04 to 14.09.05, 2.7% of cases were invalid in
total. The improved tests for missing values, duplicate cases and
out-of-range values should lead to lower percentages of invalid cases.
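
As an illustration of an out-of-range test and of how a percentage of
invalid cases could be derived from it, consider the sketch below. The
bounds and field names are hypothetical; the text does not state the
actual limits or how the 2.7% figure was computed.

```python
# Hypothetical plausibility bounds; the actual limits are not given in the text.
BOUNDS = {"birth_year": (1900, 2005), "wage": (0, 1_000_000)}


def out_of_range(record):
    """Return the names of variables whose value lies outside its bounds."""
    return [
        name
        for name, (low, high) in BOUNDS.items()
        if name in record and not low <= record[name] <= high
    ]


def invalid_share(records):
    """Percentage of records with at least one out-of-range value."""
    if not records:
        return 0.0
    invalid = sum(1 for record in records if out_of_range(record))
    return 100.0 * invalid / len(records)
```

Here a record counts as invalid as soon as one checked variable falls
outside its bounds; missing values and duplicates would need the
separate tests described earlier.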
The latest change in data quality control concerns incomplete
questionnaires. Until November 2005, only completed questionnaires were
registered. Completed means that the respondent has pressed the SEND
button at the end of the questionnaire. From November 2005 onwards,
incomplete questionnaires are also registered. A variable will
indicate whether the data come from a completed or an incomplete
questionnaire.