9 Collecting data

So far, you have learnt to ask an RQ and design the study. In this chapter, you will learn how to:

record the important steps in data collection.
describe study protocols.
write questionnaire questions.

9.1 Protocols

If the RQ is well constructed, terms are clearly defined, and the study is well designed and explained, then the process for collecting the data should be easy to describe. Because data collection is often time-consuming, tedious and expensive, the data need to be collected correctly the first time, so an accurate description of the data collection process is important.

Data collection is often tedious, time-consuming and expensive: you usually get one chance to collect data. In contrast, data (once collected) can be analysed as many times as necessary. Design the study properly the first time!

Before collecting the data, a plan should be established and documented that explains exactly how the data will be obtained, which will include operational definitions (Sect. 2.10). This plan is called a protocol.

Definition 9.1 (Protocol) A protocol is a document detailing the design and implementation of a study, including how the data are to be collected.

Unforeseen complications are common, so a pilot study (or practice run) is often conducted before the actual data collection, to:

determine the feasibility of the data collection protocol.
identify unforeseen challenges.
obtain data to determine appropriate sample sizes (Sect. 29).
identify ways to potentially save time and money.

The pilot study may suggest changes to the protocol.

Definition 9.2 (Pilot study) A pilot study is a small test run of the study protocol used to check that the protocol is appropriate and practical, and to identify (and hence fix) possible problems with the research design or protocol.

The data can be collected once the protocol has been finalised. Protocols ensure studies are repeatable (Sect. 5.3), so that others can confirm or compare results, and understand exactly what was done, and how. Protocols should indicate how design features (such as blinding individuals, random allocation of treatments, etc.) will be implemented. The final protocol, without pedantic detail, should be reported; diagrams can be useful to support explanations. All studies should have a well-established protocol describing how the study was done.

A protocol usually has at least three components that describe:

how individuals are chosen from the population (i.e., external validity).
how data are collected from the individuals (i.e., internal validity).
the types of analyses and software (including version) used.

Data collection often encounters problems or challenges, which should also be documented.
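The third component of a protocol asks for the software, including version, used for the analysis. One way to capture this automatically is sketched below, assuming the analysis is done in Python (the book does not prescribe any language). The function name `software_record` and the package names passed to it are illustrative only.

```python
import platform
from datetime import date
from importlib import metadata


def software_record(packages=()):
    """Return a dictionary describing the software environment, for the protocol."""
    record = {
        "date": date.today().isoformat(),
        "python": platform.python_version(),   # e.g. '3.12.1'
        "platform": platform.platform(),       # operating system details
    }
    for name in packages:
        try:
            record[name] = metadata.version(name)  # installed version of the package
        except metadata.PackageNotFoundError:
            record[name] = "not installed"
    return record


if __name__ == "__main__":
    # Hypothetical package list; replace with the packages actually used.
    for key, value in software_record(["numpy", "pandas"]).items():
        print(f"{key}: {value}")
```

Pasting this output into the protocol records exactly which versions produced the reported results.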

Example 9.1 (Protocol) Romanchik-Cerpovicz, Jeffords, and Onyenwoke (2018) made cookies using pureed green peas in place of margarine (to increase the nutritional value of cookies). They assessed the acceptance of these cookies among college students.

The protocol discussed how the individuals were chosen (p. 4):

... through advertisement across campus from students attending a university in the southeastern United States.

This voluntary sample comprised \(80.6\) % women, a higher percentage of women than in either the general population or the college population. (Other extraneous variables were also recorded.)

Exclusion criteria were also applied, excluding people 'with an allergy or sensitivity to an ingredient used in the preparation of the cookies' (p. 5). The researchers also described how the data were obtained (p. 5):

... panelists were seated at individual tables. Each cookie was presented one at a time on a disposable white plate. Samples were previously coded and randomized. The presentation order for all samples was \(25\) %, \(0\) %, \(50\) %, \(100\) % and \(75\) % substitution of fat with puree of canned green peas. To maintain standard procedures for sensory analysis [...], panelists cleansed their palates between cookie samples with distilled water (\(25\)°C) [...] characteristics of color, smell, moistness, flavor, aftertaste, and overall acceptability, for each sample of cookies [were recorded].

Thus, internal validity was managed using random allocation (to manage confounding), blinding (to partially manage the Hawthorne effect), and washouts (to manage the carry-over effect). Subjects were not blinded to being in a study, but were blinded to which substitution percentage was in each cookie. The protocol also details how the cookies were prepared, and how objective measurements (such as moisture content) were determined.

The types of analyses and software used were also reported.
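The quoted protocol says the samples were "previously coded and randomized", but does not say how. The sketch below is one hypothetical way to do this, not the authors' procedure: each substitution level receives a distinct random three-digit code (so panelists see only the code), and the serving order is shuffled. The function name `blind_samples` and the seed value are assumptions; fixing the seed means the allocation can be reproduced and documented in the protocol.

```python
import random

# Substitution levels from the quoted protocol (% of fat replaced by pea puree).
SUBSTITUTION_LEVELS = [0, 25, 50, 75, 100]


def blind_samples(levels, seed):
    """Give each level a distinct random 3-digit code and a random serving order."""
    rng = random.Random(seed)                            # fixed seed: reproducible
    codes = rng.sample(range(100, 1000), k=len(levels))  # distinct codes, 100 to 999
    key = dict(zip(codes, levels))                       # code -> level; withheld from panelists
    order = codes[:]                                     # copy, then shuffle the serving order
    rng.shuffle(order)
    return key, order


key, order = blind_samples(SUBSTITUTION_LEVELS, seed=2018)
for position, code in enumerate(order, start=1):
    print(f"Serve {position}: sample {code}")
```

The quoted study used one presentation order for all panelists; a separate order per panelist could be generated the same way, using a different seed for each.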

Consider this partial protocol, which is refreshingly honest about how the data were obtained:

Fresh cow dung was obtained from free-ranging, grass fed, and antibiotic-free Milking Shorthorn cows (Bos taurus) in the Tilden Regional Park in Berkeley, CA. Resting cows were approached with caution and startled by loud shouting, whereupon the cows rapidly stood up, defecated, and moved away from the source of the annoyance. Dung was collected in ZipLoc bags (\(1\) gallon), snap-frozen and stored at \(-80\)°C.

--- Hare et al. (2008), p. 10

9.2 Collecting data using questionnaires

9.2.1 Writing questions

Collecting data using questionnaires is common for both observational and experimental studies. Questionnaires are very difficult to do well: question wording is crucial, and getting it right is surprisingly difficult (Fink 1995). Pilot testing questionnaires is essential.

Definition 9.3 (Questionnaire) A questionnaire is a set of questions for respondents to answer.

A questionnaire is a set of questions used to obtain information from individuals. A survey is an entire methodology that includes gathering data using a questionnaire, finding a sample, and other components.

Questions in a questionnaire may be open-ended (respondents write their own answers) or closed (respondents select from a small number of possible answers, as in multiple-choice questions). Open-ended and closed questions both have advantages and disadvantages. Answers to open-ended questions lend themselves more easily to qualitative analysis, and answers to closed questions to quantitative analysis. This section briefly discusses writing questions.

Example 9.2 (Open and closed questions) Raab and Bogner (2021) asked German students a series of questions about microplastics, including:

Name sources of microplastics in the household.
In which ecosystems are microplastics in Germany? Tick the answer (multiple ticks are possible). Options: (a) sea; (b) rivers; (c) lakes; (d) groundwater.
Assess the potential danger posed by microplastics. Options: (a) very dangerous; (b) dangerous; (c) hardly dangerous; (d) not dangerous.

The first question is open-ended: respondents provide their own answers. The second question is closed, where multiple options can be selected. The third question is closed, where only one option can be selected.

Writing a good questionnaire question is difficult. Some issues to consider include:

Avoid leading questions: these may lead respondents to answer a certain way.
Avoid ambiguity: ensure each question can be interpreted in only one way, and avoid unfamiliar terms.
Avoid asking the uninformed: avoid asking respondents about issues they don't know about. Many people will give a response even if they do not understand the question (such responses are worthless). For example, people may give directions to places that do not even exist (Collett and O’Shea 1976).
Avoid complex and double-barrelled questions: these are hard to understand and the answers hard to interpret.
Avoid problems with ethics: avoid questions about people breaking laws, or revealing confidential or private information. In special cases and with justification, ethics committees may allow such questions.
Ensure clarity in question wording.
For closed questions: ensure options are mutually exclusive, so each answer fits into only one category.
For closed questions: ensure options are exhaustive, so the categories cover all possible answers.

Example 9.3 (Poor question wording) Consider a questionnaire asking these questions:

Because bottles from bottled water create enormous amounts of non-biodegradable landfill and hence threaten native wildlife, do you support banning bottled water?
Do you drink more water now?
Are you more concerned about Coagulase-negative Staphylococcus or Neisseria pharyngis in bottled water?
Do you drink water in plastic and/or glass bottles?
Do you have a water tank installed illegally, without permission?
Do you avoid purchasing water in plastic bottles unless it is carbonated, unless the bottles are plastic but not necessarily if the lid is recyclable?

Question 1 is leading because the expected response is obvious. Better would be: 'Do you support or not support banning bottled water?'

Question 2 is ambiguous: it is unclear what 'more water now' is being compared to.

Question 3 is unlikely to give sensible answers, as most people will be uninformed. Many people will still give an opinion, but the data will be effectively useless (though the researcher may not realise).

Question 4 is double-barrelled, and would be better asked as two separate questions (one asking about plastic bottles, and one about glass bottles).

Question 5 is unlikely to be given ethical approval or to obtain truthful answers, as respondents are unlikely to admit to breaking rules.

Question 6 is unclear: it is hard to know what a 'yes' or a 'no' answer would mean.

Example 9.4 (Question wording) Question wording can be important. In the 2014 General Social Survey (https://gss.norc.org), when white Americans were asked for their opinion of the amount America spends on welfare, \(58\) % of respondents answered 'Too much' (Jardina 2018).

However, when white Americans were asked for their opinion of the amount America spends on assistance to the poor, only \(16\) % of respondents answered 'Too much'.

Example 9.5 (Mutually exclusive options) In a study to determine the time doctors spent on patients (Chan et al. 2008), doctors were given the options:

\(0\) -- \(5\) mins;
\(5\) -- \(10\) mins; or
more than \(10\) mins.

This is a poor question, because a respondent does not know which option to select for an answer of ' \(5\) minutes'. The options are not mutually exclusive.
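Checking closed-question options for overlaps and gaps can be done by testing some plausible answers against every option: an answer fitting several options reveals options that are not mutually exclusive, and an answer fitting none reveals options that are not exhaustive. The sketch below does this for the options in this example. Reading '0 -- 5 mins' as including both endpoints is an assumption (and is exactly the ambiguity a respondent faces); the revised `FIXED` options are a hypothetical improvement. The check only tests the answers supplied, so it can flag problems but cannot prove there are none.

```python
import math

# Each option: (label, lower, upper, lower_inclusive, upper_inclusive), in minutes.
OPTIONS = [
    ("0-5 mins", 0, 5, True, True),
    ("5-10 mins", 5, 10, True, True),
    ("more than 10 mins", 10, math.inf, False, False),
]

# Hypothetical revision: each boundary belongs to exactly one option.
FIXED = [
    ("less than 5 mins", 0, 5, True, False),
    ("5 to less than 10 mins", 5, 10, True, False),
    ("10 mins or more", 10, math.inf, True, False),
]


def matches(value, option):
    """True if the answer falls inside the option's interval."""
    _, low, high, low_inc, high_inc = option
    above = value >= low if low_inc else value > low
    below = value <= high if high_inc else value < high
    return above and below


def check_options(options, answers):
    """Map each answer fitting no option, or several options, to the options it fits."""
    problems = {}
    for answer in answers:
        fits = [opt[0] for opt in options if matches(answer, opt)]
        if len(fits) != 1:
            problems[answer] = fits
    return problems


print(check_options(OPTIONS, [0, 2.5, 5, 7, 10, 12]))  # {5: ['0-5 mins', '5-10 mins']}
print(check_options(FIXED, [0, 2.5, 5, 7, 10, 12]))    # {}
```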

The following (humorous) video shows how questions can be manipulated by those willing to act unethically:

9.2.2 Challenges using questionnaires

Using questionnaires presents many challenges, including:

Non-response bias (Sect. 6.7): Non-response bias is common with questionnaires, as they are often used with voluntary-response samples. The people who do not respond to the survey may be different from those who do respond.
Response bias (Sect. 6.7): People do not always answer truthfully; for example, what people say may not correspond with what people do (Example 8.6). Sometimes this is unintentional (e.g., due to poor question wording); sometimes it is due to embarrassment, or because questions are controversial. Sometimes, respondents repeatedly provide the same answer (without reading the question) to a series of multiple-choice questions.
Recall bias: People may not accurately recall past events, or when they happened.
Question order: The order of the questions can influence the responses.
Interpretation: Phrases and words such as 'Sometimes' and 'Somewhat disagree' may mean different things to different people.

Many of these can be managed with careful questionnaire design, but discussing the methods is beyond the scope of this book.

9.2.3 Preparing software for questionnaire data

Care is needed when preparing software for data collected using a questionnaire. Sometimes, of course, the data are collected by computer (e.g., online) and are supplied to the researchers already formatted.

Data from open questions are often text-based (such as words, sentences or paragraphs of text). These can generally be included in the data worksheet (though there may be a limit to the length of such data), but cannot be analysed using the quantitative methods described in this book.

Closed questions are easily included in a data worksheet. In closed questions where respondents can select one option only, one column is needed for the question, recording which option is selected.

In closed questions where respondents can select all options that apply, each option requires its own column, recording each respondent's answer for that option.

Example 9.6 (Open and closed questions: software) In Example 9.2, three questionnaire questions are given that were asked of German students about microplastics (Raab and Bogner 2021). Some (example) data are shown entered in Fig. 9.1.

The first question requires open-ended, text-based answers (Sources). For the second (closed) question, students could select multiple options, so each option needs its own column in the data worksheet (WhereSeas to WhereGroundwater). The third (closed) question required students to select one option from a given list, so one column (Danger) is needed to record responses. As usual (Sect. 2.12), each row represents one unit of analysis (a student).


FIGURE 9.1: The data worksheet for some example data, for the microplastics study.
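The worksheet layout described above can also be produced programmatically, for instance when responses arrive as lists of ticked options. This is a minimal sketch using Python's standard `csv` module. The column names `Sources`, `WhereSeas`, `WhereGroundwater` and `Danger` come from the text, while `WhereRivers` and `WhereLakes` are assumed names for the two middle columns. The responses are made up for illustration and are not data from the study.

```python
import csv
import io

# Multi-select question: one column per option.
WHERE_COLUMNS = {
    "sea": "WhereSeas",
    "rivers": "WhereRivers",        # assumed column name
    "lakes": "WhereLakes",          # assumed column name
    "groundwater": "WhereGroundwater",
}

# Made-up responses, for illustration only.
responses = [
    {"sources": "Washing clothes; cosmetics", "where": ["sea", "rivers"], "danger": "dangerous"},
    {"sources": "Packaging", "where": ["groundwater"], "danger": "very dangerous"},
]


def to_row(response):
    """Convert one student's responses into one worksheet row."""
    row = {"Sources": response["sources"]}            # open question: free text
    for option, column in WHERE_COLUMNS.items():      # multi-select: Yes/No per option
        row[column] = "Yes" if option in response["where"] else "No"
    row["Danger"] = response["danger"]                # single-select: one column
    return row


fieldnames = ["Sources", *WHERE_COLUMNS.values(), "Danger"]
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()
for response in responses:
    writer.writerow(to_row(response))                 # one row per student
print(buffer.getvalue())
```

Recording each multi-select option as its own Yes/No column means, for example, the proportion of students selecting 'groundwater' can be computed directly from one column.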

9.3 Chapter summary

Having a detailed procedure for collecting the data (the protocol) is important. Using a pilot study to trial the protocol can often reveal unexpected changes needed for a good protocol. Writing good questionnaire questions is difficult, but important.

9.4 Quick review questions

What is the biggest problem with this question: 'Do you have bromodosis?'

It is double-barrelled
It is a leading question
It uses language that may not be understood
It is ambiguous

It allows the researchers to make the study externally valid.
It ensures that others know exactly what was done.
It ensures that the study is repeatable for others.

Do you, or do you not, believe that permeable pavements are a viable alternative to traditional pavements?
Do you support a ban on drinks sold in unrecyclable plastic bottles?
Do you believe that double-gloving by paramedics reduces the risk of infection, increases the risk of infection, or makes no difference to the risk of infection?
Should Ireland ban breakfast cereals with unhealthy sugar levels?

9.5 Exercises

Answers to odd-numbered exercises are available in App. E.

Exercise 9.1 What is the problem with this question?

Under \(18\)
Over \(18\)

Exercise 9.2 What is the problem with this question?

None
1 or 2
2 or 3
More than 4

Exercise 9.3 Which of these questionnaire questions is better? Why?

Should concerned cat owners vaccinate their pets?
Should domestic cats be required to be vaccinated or not?
Do you agree that pet-owners should have their cats vaccinated?

Exercise 9.4 Which of these questionnaire questions is better? Why?

Do you own an environmentally-friendly electric vehicle?
Do you own an electric vehicle?
Do you own or do you not own an electric vehicle?

Exercise 9.5 Falk and Anderson (2013) studied sunscreen use, and asked participants questions, including these:

How often do you sun bathe with the intention to tan during the summer in Sweden? (Possible answers: never, seldom, sometimes, often, always).
How long do you usually stay in the sun between \(11\) am and \(3\) pm, during a typical day-off in the summer (June--August)? (Possible answers: \(3\) h).

Critique these questions. What biases may be present?

Exercise 9.6 Morón-Monge, Hamed, and Morón Monge (2021) studied primary-school children's knowledge of their natural environment. The children were asked three questions:

Do you usually visit Guadaira Park?
- No, I don’t like parks.
- No, I don’t usually visit it.
- Yes, once per week.
- Yes, more than once a week.
How many times have you visited nature (the beach, countryside, mountains, etc.) in the last month?
- Never
- Once
- Two to three times
- More than three times
Which is your favorite natural place?
- Write a story
- Draw a picture

Which questions are open and which are closed? Which questions will produce qualitative data? Critique the questions.