Introducing web-surveys
Download the document Introducing
web-surveys (January 15, 2007), (PDF 105kB)
This page has been updated January 1, 2008
Understanding the Internet and web traffic
The digital divide
Volunteer web surveys: recruitment
Volunteer web surveys: understanding web traffic
Volunteer web surveys: trust and high wage quality information
Volunteer web surveys: data quality
Volunteer web surveys: tackling self-selection
Volunteer web surveys: advantages of large sample sizes
Understanding the Internet and web traffic
Since the start of the Internet in the early 1990s, the world has seen a
continuous rise in the number of websites, web visitors and applications.
Today, the Internet has become a major medium worldwide, both for providing
and finding information and for communication. For the former it is gradually
replacing television, radio and printed media; for the latter, telephone, fax
and post.
With many millions of visitors every day, firms have shifted parts of their
advertising budgets from other media to the Internet, leading to
advertisement revenues for frequently visited websites. Consequently, web
marketing is increasingly deployed in attracting large numbers of visitors.
In contrast to a few years ago, the major players on the Internet now aim
deliberately at attracting large masses of web visitors, and increasingly they
know how to do so. Moreover, insight into web traffic has increased
quickly.
Providing free information for the public at large is a major backbone of the
Internet, in recent years encouraged by advertising revenues. This has
widened the variety of free information provided, a tendency that Google in
particular has recently strengthened. The free-content encyclopedia project Wikipedia
contains more than 3 million articles in over 200 languages. It is written
collaboratively by volunteers under the slogan: 'Imagine a world in which
every single person is given free access to the sum of all human
knowledge.'
In addition, the Internet is used for eliciting reactions from the public at
large. Large audiences use eBay for trading items, special websites for
congratulations or mourning messages, websites for voting in polls on singers
or others, and sites on many other topics. Each time, the numbers of visitors
are astonishingly high, both within and across national borders.
Input from the public is increasingly used for scientific data collection too.
For example, a game with a prize incentive was enough to make tremendously
large numbers of web visitors willing to help science by naming
photographs, leading to the labeling of nearly 20 million images so that
search engines can find them. This was supervised by Professor Luis von Ahn,
Carnegie Mellon University, Pittsburgh, USA, who is currently developing an
exciting plan for the translation of all languages into all languages, with
the help of large masses of web visitors worldwide.
So, if these cases show that free information has added value for audiences
all over the world and that web visitors are willing to provide information
voluntarily, with or without a prize incentive, could this be applied to
gathering and providing information about wages? Yes, it can.
The WageIndicator websites have proven able to provide information on
wages and, in return, to ask visitors to complete a questionnaire on work and
wages. The public at large in these countries has shown a desire for
information about wages, and has proven willing to complete a web survey
voluntarily in return for this free information.
The digital divide
The low Internet access rates in some countries are a major argument
against worldwide web surveys. In 2004, Internet access rates ranged from 788
per 1,000 inhabitants in New Zealand to 2 per 1,000 inhabitants in Egypt.
Particularly in Africa, access rates are low. Internet access rates can be
expected to grow for two reasons. First, urbanization is a continuous process
and the Internet is more accessible in urban areas. In Sub-Saharan African
countries, for example, average annual growth of Internet dial-up traffic is
currently driven mainly by increases in the number of customers through both
residential and public access in cyber cafes, and not by increasing mainline
telephone services. (Rogy, Michel (2005) Broadband Technologies and Services
in Sub Saharan Africa: The Case of ADSL, Opportunities for Operators and
Challenges for Regulators, Communications and Strategies, November 2005, pp.
95-105.) Therefore, Internet access rates will increasingly underestimate
Internet use.
Second, the so-called digital divide across countries reflects a persistent
gap in the availability of mainline telephone services. Yet, cell phones may
be a promising new platform for Internet access. Currently, major computer
and telecom firms are developing technologies to make Internet accessible
through cell phones, aiming at markets in developing countries. Compared to
mainline telephone services, the diffusion rates of cell phones are much
higher in many developing countries. (Dasgupta, Susmita; Lall,
Somik; Wheeler, David (2005) Policy Reform, Economic Growth and the Digital
Divide, Oxford Development Studies, 33/2, pp. 229-43.) Particularly in Asian
and Latin American countries, Internet access rates may increase fast in the
coming years. In African countries, prospects may be slightly different, but
growth can be expected here too, notably in wireless Internet.
With an increasing share of the worldwide population using the Internet, it
has become an interesting medium to collect volunteer survey data worldwide,
while at the same time providing free information.
Volunteer web surveys: recruitment
Sampling methods are classified as either probability or non-probability.
In probability samples, each member of the target population has a known
non-zero probability of being selected. This method requires a sampling frame,
which should preferably match the target population fully. In most countries
a range of sampling frames is available, such as lists of dwellings, address
lists or lists of telephone numbers. Sampling from the latter is called
random digit dialing. If no sampling frame is available that provides
adequate coverage of the target population, a probability sample
can be drawn from a sampling frame covering a larger population. This sample
is then surveyed with only one question, the so-called screening question, to
identify whether the sampled individual is part of the target population. If
so, the full questionnaire can start. The major advantage of probability
sampling is that sampling error can be calculated. Yet, in industrialized
countries, probability samples increasingly face difficulties with
non-response, which may be troublesome when the non-response is
selective.
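To illustrate what "sampling error can be calculated" means in practice, the
following is a minimal sketch, assuming a simple random sample, a normal
approximation and no finite population correction. The function name
mean_with_margin and the wage figures are hypothetical and serve only as an
illustration.

```python
import math

def mean_with_margin(values, z=1.96):
    """Return (mean, margin), where margin = z * standard error of the mean.

    Assumes a simple random sample; z=1.96 gives an approximate 95% interval.
    """
    n = len(values)
    mean = sum(values) / n
    # Unbiased sample variance (divides by n - 1).
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    standard_error = math.sqrt(variance / n)
    return mean, z * standard_error

# Made-up hourly wages, for illustration only.
hourly_wages = [12.0, 15.5, 9.8, 20.1, 14.3, 11.7, 17.9, 13.2]
mean, margin = mean_with_margin(hourly_wages)
print(f"mean = {mean:.2f}, approximate 95% interval = +/- {margin:.2f}")
```

For a volunteer sample the same arithmetic can be carried out, but the result
has no comparable interpretation, because the selection probabilities are
unknown.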
Sampling methods for web surveys are also classified as either probability or
non-probability. Probability sampling is used by market research companies,
who draw random or stratified samples from so-called access panels,
consisting of email addresses of volunteers for whom a number of
socio-demographic variables are known. Apart from the debate about this
sampling method, such companies exist in industrialized countries
only. Non-probability methods, in particular so-called convenience sampling,
are used in open web surveys. These volunteer web surveys are primarily
held in marketing and in voting research, and hardly used in the field of
work and employment, except for the WageIndicator project.
In non-probability sampling, members are selected from the population in some
nonrandom manner, among others by means of convenience sampling. In
non-probability sampling, the degree to which the sample differs from the
population remains unknown. Non-probability sampling is used in preliminary
research efforts to get a rough estimate of the results, without incurring
the cost or time required to select a random sample. It is also used in cases
where sampling frames are absent or where the target population is a rare
group, which could otherwise be reached only through expensive screening
questions.
In the scientific literature about web surveys, little attention has been
paid to recruitment for volunteer surveys, in contrast to the extensive
attention paid to panel recruitment in sampled surveys. Lee (2006), in the paper
'Propensity Score Adjustment as a Weighting Scheme for Volunteer Panel Web
Surveys', merely notes that panel recruitment in volunteer web surveys is
done via some type of advertisement, such as banner ads, pop-up ads, or
e-mails. In recent years, however, insight into how to reach large masses of
web visitors has increased tremendously, and web marketing may aim for a
heterogeneous public. Volunteer web surveys on websites with content that is
attractive to the public at large may attract large masses of visitors.
Moreover, through targeted marketing methods tailored to web surveys,
underrepresented groups can be addressed. In conclusion, as Faas and Schoen
(2006) phrase it, 'Putting a Questionnaire on the Web is not Enough', and
this certainly holds for a volunteer survey. When recruitment for
volunteer web surveys is based on web marketing efforts that aim at a large
and heterogeneous public, the distinction between probability and
non-probability sampling may become less sharp.
When web marketing is a critical feature of recruitment for
volunteer web surveys, it takes time before a marketing effort pays off.
Once the website is frequently visited, it is wise to continue and to profit
from the initial investments. Thus, it becomes profitable to run
continuous web surveys. In the field of work and employment, continuous
surveys are primarily found among statistical agencies running
continuous labor force surveys, mostly in order to estimate seasonal
fluctuations in participation rates. This insight challenges the prevailing
view of surveys, in which the once-only survey is the most common form.
Volunteer web surveys: understanding web traffic
Since the start of the Internet in the early 1990s, the world has seen a
continuous rise in the number of websites, web visitors and applications.
Today, the Internet has become the major medium worldwide, both for providing
and finding information and for communication. For the former it is gradually
replacing television, radio and printed media; for the latter, telephone, fax
and post.
With many millions of visitors every day, firms have shifted parts of their
advertising budgets from other media to the Internet, leading to
advertisement revenues for frequently visited websites. Consequently, web
marketing is increasingly deployed in attracting large numbers of visitors.
In contrast to a few years ago, the major players on the Internet now know
how to attract large masses of web visitors. Moreover, insight into web
traffic has increased quickly. Providing free information for the public at
large is the backbone of the Internet, in recent years encouraged by
advertising revenues. This has tremendously widened the variety of free
information provided, a tendency that Google in particular has recently
strengthened.
In addition, the Internet is used for eliciting reactions from the public at
large. Large audiences use eBay for trading items, MSN for meeting
friends, special websites for congratulations or mourning messages, websites
for voting in polls on singers or others, and sites on many other topics.
Each time, the numbers of visitors are astonishingly high, both within and
across national borders. The free-content encyclopedia project Wikipedia contains more than 3
million articles in over 200 languages. It is written collaboratively by
volunteers under the slogan: 'Imagine a world in which every single person is
given free access to the sum of all human knowledge.'
Input from the public is increasingly used for scientific data collection
too. For example, a game with a prize incentive was enough to make
tremendously large numbers of web visitors willing to help science by
naming photographs, leading to the labeling of nearly 20 million images so
that search engines can find them. This was supervised by Professor Luis von
Ahn, Carnegie Mellon University, Pittsburgh, USA, who is currently developing
an exciting plan for the translation of all languages into all languages, with
the help of large masses of web visitors worldwide.
So, if these cases show that free information has added value for audiences
all over the world and that web visitors are in return willing to provide
information voluntarily, with or without a prize incentive, could this be
applied to gathering and providing information about wages? Yes, it can. The
WageIndicator websites have proven able to provide information on wages and,
in return, to ask visitors to complete a questionnaire on work and wages. The
public at large in these countries has shown a desire for information about
wages, and is also willing to complete a web survey.
Volunteer web surveys: data quality
It is sometimes assumed that a web survey cannot be taken seriously,
because the Internet is associated with one-second-a-page visitors, who have
neither time nor patience to complete a 20-minute questionnaire. From the
consistency of the data, however, it can be concluded that this hardly
applies to the WageIndicator survey. Item non-response is mostly below 5
percent. From the respondents' emails and their comments at the end of the
questionnaire, we have learned that the vast majority of respondents
answer the questions with great care. Many report that they enjoyed
completing the questionnaire.
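The item non-response figure quoted above is, per question, the share of
respondents who left that question unanswered. A minimal sketch of the
calculation follows, assuming unanswered items are stored as None. The
question names and the four records are made up for illustration and do not
come from the WageIndicator dataset.

```python
def item_nonresponse_rates(responses, questions):
    """Share of respondents who left each question unanswered (None = missing)."""
    n = len(responses)
    return {
        q: sum(1 for r in responses if r.get(q) is None) / n
        for q in questions
    }

# Hypothetical records, for illustration only.
responses = [
    {"gross_wage": 2100, "weekly_hours": 38, "education": "secondary"},
    {"gross_wage": None, "weekly_hours": 40, "education": "tertiary"},
    {"gross_wage": 1850, "weekly_hours": 24, "education": None},
    {"gross_wage": 3200, "weekly_hours": 40, "education": "tertiary"},
]
rates = item_nonresponse_rates(
    responses, ["gross_wage", "weekly_hours", "education"]
)
for question, rate in rates.items():
    print(f"{question}: {rate:.0%} missing")
```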
In addition, some people have pointed to the risk that an occupational group
may systematically report earning higher wages than in reality. By doing so,
the Salary Check in their country would come up with higher wages, which may
in the long run lead to higher wages for the occupational group at large.
This argument assumes that respondents within an occupation will act
collectively. Of course this may happen, but we assume it to be rare,
because it requires a highly organized, geographically concentrated
occupational group that takes action to report higher wages. When
extreme wages are reported, these are excluded from the analyses for the
Salary Check. Another threat might be more serious. Some countries are known
for macho bluffing about earnings to family, friends and others. If this
behavior extends to wage reporting in questionnaires, it may lead to serious
problems. In those countries, surveys, whatever the survey mode, are not
an adequate tool for collecting wage data. Whenever a country is likely to
show this behavior, the data need to be checked against other sources. As for
the assumption that the Salary Check may influence national wage setting
processes, two counter-arguments apply. First, it has been known for many
years that national wage setting processes depend primarily upon factors
related to the economy and industrial relations. Second, both employees
and employers can and do use the Salary Check. Therefore, if any influence
could be traced, it could run both ways.
In countries with a sizeable non-registered labor force,
Internet-based surveys may yield better information than traditional
surveys. Where wages are under-declared or not declared at all when
the respondent, whether worker or boss, has to answer a questionnaire face
to face with an interviewer, the right information is more likely to be given
when the respondent is sitting alone in front of a computer. Thus web surveys
may capture some part of the informal labor market that traditional surveys
do not cover, or cover with wrong information.
The traditional way of testing the quality of a questionnaire is a pilot
study to identify potential problems with the survey's design while there is
still time to fix them. Continuous volunteer web surveys offer two advantages
in this respect, because questions can be adapted while the survey is
running, and because the test population is much larger than usual in any
other survey mode. Whenever an additional country joins, the Questionnaire
Management System is updated, or new questions are added to the survey,
the questionnaire is tested using the public at large for a couple of days,
of course only after being tested by the WageIndicator team.
Continuous volunteer web surveys may use multiple client-side feedback
systems for improving the questionnaire. In the case of the WageIndicator,
visitors' emails to the national web managers provide feedback on the
questionnaire and its technical functioning. In addition, the text box at
the end of the questionnaire, 'what did you like about the questionnaire',
allows additional input. Finally, dropout during completion provides insight
into difficulties in filling out the questionnaire. In the past few years,
web visitors' comments have led to several improvements in the questionnaire.
Moreover, when continuous volunteer web surveys lead to large sample sizes,
it is particularly profitable to invest in survey quality.
Volunteer web surveys: tackling self-selection
Despite its many advantages, the WageIndicator questionnaire has one
major flaw: it is a volunteer web survey. Individuals in the target
population, i.e. the labor force, do not have an equal chance to access the
survey and therefore the data are not representative of the population. The
selectivity is threefold. The first selectivity is associated with Internet
access, as this may be related to wages, the key variable in the data. Second,
although the numbers of visitors to the WageIndicator websites are large and
growing, they are still a minority of the extremely large and heterogeneous
population visiting the Internet. The self-selection into the WageIndicator
websites may be related to interest in wages, and therefore high-wage
earners may be under-represented. Third, once visiting the WageIndicator
website, the visitor still has to decide whether or not to complete the
questionnaire. This self-selection into the web survey may be related to
availability of time, satisfaction with the website or an altruistic wish to
contribute to the project, all factors which may be related to the key
variable. The exact nature of these three steps of self-selection needs to be
studied in the years to come.
Studies in a number of countries have investigated the bias of the national
WageIndicator data with respect to socio-demographic variables. (Kevätsalo, K. (2006)
WOLIWEB national report - Some preliminary results from Finland. Amsterdam:
WageIndicator, PDF 152 kB; Tijdens, K.G. (2006) How biased is the
Netherlands WageIndicator dataset? Amsterdam: WageIndicator; Depedraza, P.
(2007) Weights for WageIndicator data. Amsterdam: WageIndicator) The
distribution of socio-demographic variables that were assumed to be subject
to bias, notably gender, age, working hours and education, in the Netherlands
WageIndicator dataset 2000-2004 was compared in detail to the distribution of
these variables in the labor force, based on annual aggregate data of the
national Labor Force Survey. The main conclusions are as follows. As for
working hours, the part-time labor force is under-represented in the
WageIndicator sample, particularly respondents working less than 20 hours
a week. As for age, older workers are under-represented in the sample. This
particularly applies to individuals aged 55 and over. From 2001 to 2004,
however, the under-representation was reduced. In addition, the share of
young workers in the labor force decreased, whereas it fluctuated in the
WageIndicator survey, so that their over-representation did not decrease. As
for gender, the male labor force is under-represented in the sample, which
may be due to the fact that the survey initially addressed women only. From
2002 to 2004, however, the under-representation declined. As for education,
the low-skilled labor force is under-represented in the sample. From 2002 to
2004, however, this under-representation also declined.
WageIndicator employs several strategies to cope with the self-selection in
the volunteer sample. The first strategy relates to the web marketing,
whereby a broad target population is defined, including marketing aiming at
sub-populations, such as women or youth. The second strategy relates to the
special routing through the questionnaire, designed to address marginal
groups in the labor force, as they have a higher likelihood of dropping out
during completion. The third strategy relates to weighting the dataset for
under- and over-represented groups, currently developed for the European
countries in the project, using aggregate national labor force survey data.
The fourth strategy relates to asking a few questions in the web survey
that are similar to those asked in other major surveys, such as the United
States Labor Force Survey, the World Values Survey, and the European Survey on
Working Conditions, allowing the micro-data of these surveys to be compared
with WageIndicator data and used for subsequent weighting. The fifth strategy
relates to a full reference survey in one country, controlling for
self-selection effects beyond socio-demographic variables. Such a study is
foreseen in the Netherlands. These strategies combined are assumed to lead to
a sample sufficiently corrected, so that it may confidently be used for
analyses.
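The third strategy, weighting for under- and over-represented groups with
aggregate labor force data, can be sketched as simple post-stratification:
each group's weight is its share in the labor force divided by its share in
the sample. This is an illustrative sketch, not the WageIndicator weighting
procedure itself. The function name, the two working-hours groups and all
figures are assumptions made up for the example.

```python
def poststratification_weights(sample_counts, population_shares):
    """Weight per group = population share / sample share."""
    total = sum(sample_counts.values())
    return {
        group: population_shares[group] / (count / total)
        for group, count in sample_counts.items()
    }

# Made-up figures: part-timers are under-represented in the volunteer sample.
sample_counts = {"under 20 hours": 100, "20 hours or more": 900}
population_shares = {"under 20 hours": 0.25, "20 hours or more": 0.75}

weights = poststratification_weights(sample_counts, population_shares)
for group, weight in weights.items():
    print(f"{group}: weight {weight:.2f}")
```

Under-represented groups receive weights above 1 and over-represented groups
weights below 1. Such weights correct only for the variables used to form the
groups, which is why the text treats self-selection beyond socio-demographic
variables as a separate strategy.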
As for the fifth strategy, volunteer web surveys are primarily conducted in
marketing and political voting research. According to the standard
methodological literature, volunteer samples can never be representative.
Therefore a number of commercial market research agencies (Harris Online,
Bloomerce, McKinsey) apply a correction technique to their volunteer web
surveys using an extra, so-called reference survey. This reference survey is a
real probability sample in which each member of the population has the same
probability of selection. The volunteer sample is adjusted to the probability
sample using Propensity Score Adjustment (PSA), a statistical approach to
correcting for self-selection. (Taylor, 2000; Schonlau et al 2002 and 2004;
Danielsson, 2004; Varedian and Forsman, 2003; Biffignandi et al 2005;
Varedian, 2005; Isaksson and Lee, 2005; Lee, 2006.) This methodology
definitely needs further exploration, but if successful, volunteer web
surveys with reference surveys offer unprecedented opportunities in terms of
sample sizes and country coverage.
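The idea behind adjusting a volunteer sample to a reference survey can be
sketched as follows. This is a simplified illustration, not the procedure of
the cited authors or agencies: it pools both samples, fits a logistic model
for the probability of being a volunteer given one covariate, and weights
each volunteer by the odds of belonging to the reference sample. Published
applications typically use more covariates and often group propensity scores
into classes instead. All names and figures below are made up.

```python
import math

def predict(coef, x):
    """Logistic model: probability that a case with covariates x is a volunteer."""
    z = coef[0] + sum(c * xj for c, xj in zip(coef[1:], x))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, labels, steps=2000, rate=0.5):
    """Fit the logistic model by plain gradient ascent on the log-likelihood."""
    coef = [0.0] * (len(rows[0]) + 1)  # intercept first
    for _ in range(steps):
        grad = [0.0] * len(coef)
        for x, y in zip(rows, labels):
            err = y - predict(coef, x)
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        coef = [c + rate * g / len(rows) for c, g in zip(coef, grad)]
    return coef

# Made-up data. Covariate: 1 = works part-time, 0 = works full-time.
reference = [[1]] * 4 + [[0]] * 6          # probability (reference) sample
volunteers = [[1]] * 2 + [[0]] * 8         # volunteer web sample
volunteer_wages = [10.0] * 2 + [20.0] * 8  # hypothetical hourly wages

# Pool both samples; label 1 marks membership of the volunteer sample.
rows = reference + volunteers
labels = [0] * len(reference) + [1] * len(volunteers)
coef = fit_logistic(rows, labels)

# Weight each volunteer by the odds of belonging to the reference sample.
weights = [(1 - predict(coef, x)) / predict(coef, x) for x in volunteers]

unweighted_mean = sum(volunteer_wages) / len(volunteer_wages)
adjusted_mean = sum(w * y for w, y in zip(weights, volunteer_wages)) / sum(weights)
print(round(unweighted_mean, 2), round(adjusted_mean, 2))
```

In this toy example part-timers are scarcer among volunteers than in the
reference sample, so they receive larger weights and the adjusted mean wage
moves below the unweighted one. The adjustment can only correct for
self-selection that is captured by the covariates measured in both surveys.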
Volunteer web surveys: advantages of large sample sizes
Because the Internet reaches millions of people, a major advantage of
volunteer web surveys follows directly: they allow for extremely large
sample sizes, unmatched by other survey modes. Such sample sizes are not
reached in probability-sampled web surveys either, where the sample size is
limited to the size of the access panel.
Large sample sizes are advantageous for several reasons. First, the data from
large-scale web surveys allow for exploring small-scale units such as
regional intersections or occupations, because each unit still has
sufficient data. In the field of work and employment studies, the typical
small-scale studies address an organization, an industry, an occupation or a
region. In large-scale data, even rare groups still have sufficient numbers
of observations. These small-scale units either lack adequate sampling frames,
so the research could not have been undertaken otherwise, or it is simply too
expensive to use a wider sampling frame with a screening question. When the
screened population is relatively small in comparison to the sampled
population, the costs involved in the screening may be larger than the
costs of the survey itself.
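The cost argument can be made concrete with a little arithmetic. The sketch
below assumes that every sampled person is asked the screening question, that
everyone in the target group then completes the full questionnaire, and that
there is no non-response. The function name and all cost figures are
hypothetical.

```python
def screening_costs(target_share, completes_needed,
                    cost_per_screen, cost_per_interview):
    """Return (total screening cost, total interviewing cost).

    target_share is the fraction of the sampled population that belongs to
    the target group, so completes_needed / target_share people must be
    screened to find enough respondents.
    """
    people_to_screen = completes_needed / target_share
    return people_to_screen * cost_per_screen, completes_needed * cost_per_interview

# Made-up example: the target group is 1% of the sampled population.
screening, interviewing = screening_costs(
    target_share=0.01, completes_needed=500,
    cost_per_screen=2.0, cost_per_interview=30.0,
)
print(f"screening: {screening:.0f}, interviewing: {interviewing:.0f}")
```

With these assumed figures the screening costs several times more than the
interviews themselves, even though a single screening question is far cheaper
than a full interview.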
Second, whereas the possibility of analysing small-scale units in large-scale
data is a passive advantage of large sample sizes, large-scale surveys
also allow for an active form of addressing small-scale units, as they allow
for screening questions in the survey itself. Subsequently, follow-up
questions in the web survey can address the sub-populations, e.g. asking
respondents in some occupations about early signs of specific occupational
diseases.
A third major advantage of volunteer web surveys is that they can be
continuous. Of course, other survey modes can be used continuously as well,
as is done for example in labor force surveys conducted by
statistical agencies. Continuous volunteer web surveys are particularly
profitable because the initial investments in the survey are relatively high,
but the running costs are relatively low, even when the marketing
efforts are taken into account.
Fourth, large-scale web surveys are advantageous as they offer multiple
client-side feedback systems, both active, through visitors' emails and
comments in the questionnaire's text boxes, and passive, through their dropout
behavior. For improving web surveys, client-side feedback is extremely helpful.
Finally, web surveys offer great opportunities for temporary plug-in modules
on specific items, related to the issue of work and wages.