9 Collecting data

So far, you have learnt to ask a RQ and design the study. In this chapter, you will learn how to:

9.1 Protocols

If the RQ is well-constructed, terms are clearly defined, and the study is well designed and explained, then the process for collecting the data should be easy to describe. Data collection is often time-consuming, tedious and expensive, so collecting the data correctly the first time is important. An accurate description of the data collection process is therefore essential.

Data collection is often tedious, time-consuming and expensive: you usually get one chance to collect data. In contrast, data (once collected) can be analysed as many times as necessary. Design the study properly the first time!

Before collecting the data, a plan should be established and documented that explains exactly how the data will be obtained, which will include operational definitions (Sect. 2.10). This plan is called a protocol.

Definition 9.1 (Protocol) A protocol is a procedure documenting the details of the design and implementation of a study, including how the data will be collected.

Unforeseen complications are not unusual, so often a pilot study (or a practice run) is conducted before the actual data collection, to:

The pilot study may suggest changes to the protocol.

Definition 9.2 (Pilot study) A pilot study is a small test run of the study protocol used to check that the protocol is appropriate and practical, and to identify (and hence fix) possible problems with the research design or protocol.

The data can be collected once the protocol has been finalised. Protocols ensure studies are repeatable (Sect. 5.3) so others can confirm or compare results, and others can understand exactly what was done, and how. Protocols should indicate how design aspects (such as blinding the individuals, random allocation of treatments, etc.) will happen. The final protocol, without pedantic detail, should be reported. Diagrams can be useful to support explanations. All studies should have a well-established protocol for describing how the study was done.

A protocol usually has at least three components that describe:

  1. how individuals are chosen from the population (i.e., external validity).
  2. how data are collected from the individuals (i.e., internal validity).
  3. the types of analyses and software (including version) used.

Data collection often encounters problems or challenges, which should also be documented.

Example 9.1 (Protocol) Romanchik-Cerpovicz, Jeffords, and Onyenwoke (2018) made cookies using pureed green peas in place of margarine (to increase the nutritional value of cookies). They assessed the acceptance of these cookies to college students.

The protocol discussed how the individuals were chosen (p. 4):

... through advertisement across campus from students attending a university in the southeastern United States.

This voluntary sample comprised \(80.6\) % women, a higher percentage of women than in both the general population and the college population. (Other extraneous variables were also recorded.)

Exclusion criteria were also applied, excluding people 'with an allergy or sensitivity to an ingredient used in the preparation of the cookies' (p. 5). The researchers also described how the data were obtained (p. 5):

... panelists were seated at individual tables. Each cookie was presented one at a time on a disposable white plate. Samples were previously coded and randomized. The presentation order for all samples was \(25\) %, \(0\) %, \(50\) %, \(100\) % and \(75\) % substitution of fat with puree of canned green peas. To maintain standard procedures for sensory analysis [...], panelists cleansed their palates between cookie samples with distilled water (\(25^\circ\)C) [...] characteristics of color, smell, moistness, flavor, aftertaste, and overall acceptability, for each sample of cookies [was recorded].

Thus, internal validity was managed using random allocation (to manage confounding), blinding individuals (to partially manage the Hawthorne effect), and washouts (to manage the carry-over effect). Details are also given of how the cookies were prepared, and how objective measurements (such as moisture content) were determined. Subjects were not blinded to being in a study, but were blinded to which substitution percentage was in each cookie.

The type of analyses and software used were also given.

Consider this partial protocol, which is honest in describing how the data were collected:

Fresh cow dung was obtained from free-ranging, grass fed, and antibiotic-free Milking Shorthorn cows (Bos taurus) in the Tilden Regional Park in Berkeley, CA. Resting cows were approached with caution and startled by loud shouting, whereupon the cows rapidly stood up, defecated, and moved away from the source of the annoyance. Dung was collected in ZipLoc bags (\(1\) gallon), snap-frozen and stored at \(-80^\circ\)C.

--- Hare et al. (2008), p. 10

9.2 Collecting data using questionnaires

9.2.1 Writing questions

Collecting data using questionnaires is common for both observational and experimental studies. Writing a good questionnaire is very difficult: question wording is crucial, and surprisingly hard to get right (Fink 1995). Pilot testing questionnaires is essential.

Definition 9.3 (Questionnaire) A questionnaire is a set of questions for respondents to answer.

A questionnaire is a set of questions used to obtain information from individuals. A survey is an entire methodology, which includes gathering data using a questionnaire, finding a sample, and other components.

Questions in a questionnaire may be open-ended (respondents can write their own answers) or closed (respondents select from a small number of possible answers, as in multiple-choice questions). Open-ended and closed questions both have advantages and disadvantages. Answers to open-ended questions lend themselves more easily to qualitative analysis, and answers to closed questions to quantitative analysis. This section briefly discusses writing questions.

Example 9.2 (Open and closed questions) Raab and Bogner (2021) asked German students a series of questions about microplastics, including:

  1. Name sources of microplastics in the household.
  2. In which ecosystems are microplastics in Germany? Tick the answer (multiple ticks are possible). Options: (a) sea; (b) rivers; (c) lakes; (d) groundwater.
  3. Assess the potential danger posed by microplastics. Options: (a) very dangerous; (b) dangerous; (c) hardly dangerous; (d) not dangerous.

The first question is open-ended: respondents provide their own answers. The second question is closed, where multiple options can be selected. The third question is closed, where only one option can be selected.

Writing a good questionnaire question is difficult. Some issues to consider include:

Example 9.3 (Poor question wording) Consider a questionnaire asking these questions:

  1. Because bottles from bottled water create enormous amounts of non-biodegradable landfill and hence threaten native wildlife, do you support banning bottled water?
  2. Do you drink more water now?
  3. Are you more concerned about Coagulase-negative Staphylococcus or Neisseria pharyngis in bottled water?
  4. Do you drink water in plastic and/or glass bottles?
  5. Do you have a water tank installed illegally, without permission?
  6. Do you avoid purchasing water in plastic bottles unless it is carbonated, unless the bottles are plastic but not necessarily if the lid is recyclable?

Question 1 is leading because the expected response is obvious. Better would be: 'Do you support or not support banning bottled water?'

Question 2 is ambiguous: it is unclear what 'more water now' is being compared to.

Question 3 is unlikely to give sensible answers, as most people will be uninformed. Many people will still give an opinion, but the data will be effectively useless (though the researcher may not realise).

Question 4 is double-barrelled, and would be better asked as two separate questions (one asking about plastic bottles, and one about glass bottles).

Question 5 is unlikely to be given ethical approval or to obtain truthful answers, as respondents are unlikely to admit to breaking rules.

Question 6 is unclear: the multiple conditions make it hard to know what a yes or no answer would mean.

Example 9.4 (Question wording) Question wording can be important. In the 2014 General Social Survey (https://gss.norc.org), when white Americans were asked for their opinion of the amount America spends on welfare, \(58\) % of respondents answered 'Too much' (Jardina 2018).

However, when white Americans were asked for their opinion of the amount America spends on assistance to the poor, only \(16\) % of respondents answered 'Too much'.

Example 9.5 (Mutually exclusive options) In a study to determine the time doctors spent on patients (from Chan et al. (2008)), doctors were given the options:

This is a poor question, because a respondent does not know which option to select for an answer of '\(5\) minutes'. The options are not mutually exclusive.
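Whether a set of numerical options is mutually exclusive can be checked mechanically. The following is a minimal sketch only: the options shown are hypothetical (they are not the options used in the study), each option is assumed to be an inclusive range of minutes, and the function names are invented for illustration.

```python
# Hypothetical answer options (in minutes), NOT the options used in the study.
# Each option is an inclusive range: (low, high).
options = [(0, 5), (5, 10), (10, 15)]


def matching_options(answer, options):
    """Return every option whose inclusive range contains the answer."""
    return [(low, high) for (low, high) in options if low <= answer <= high]


def are_mutually_exclusive(options, answers):
    """Return True if no candidate answer falls into more than one option."""
    return all(len(matching_options(a, options)) <= 1 for a in answers)


# Candidate answers: whole minutes from 0 to 15 (an assumption).
candidate_answers = range(0, 16)

# An answer of 5 minutes matches two options, so the options overlap.
print(matching_options(5, options))
print(are_mutually_exclusive(options, candidate_answers))

# A revised (also hypothetical) set of options with no shared endpoints.
revised = [(0, 4), (5, 9), (10, 15)]
print(are_mutually_exclusive(revised, candidate_answers))
```

The revised options are mutually exclusive only for whole-minute answers; if fractional answers were possible, the options would need wording such as 'at least 5 but less than 10 minutes'.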

The following (humorous) video shows how questions can be manipulated by those not wanting to be ethical:

9.2.2 Challenges using questionnaires

Using questionnaires presents myriad challenges.

Many of these can be managed with careful questionnaire design, but discussing the methods is beyond the scope of this book.

9.2.3 Preparing software for questionnaire data

Care is needed when preparing software for data collected using a questionnaire. Sometimes, of course, the data are collected by computer (e.g., online) and are supplied to the researchers already formatted.

Data from open questions are often text-based (such as words, sentences or paragraphs of text). These can generally be included in the data worksheet (though there may be a limit to the length of such data), but cannot be analysed using quantitative methods (as described in this book).

Closed questions are easily included in a data worksheet. In closed questions where respondents can select one option only, one column is needed for the question, recording which option is selected.

In closed questions where respondents can select all options that apply, each option requires its own column, recording each respondent's answer for that option.

Example 9.6 (Open and closed questions: software) In Example 9.2, three questionnaire questions are given that were asked of German students about microplastics (Raab and Bogner 2021). Some (example) data are shown entered in Fig. 9.1.

The first question requires open-ended, text-based answers (Sources). For the second (closed) question, students could select multiple options, so each option needs one column in the data worksheet (WhereSeas to WhereGroundwater). The third (closed) question required students to select one option from a given list, so one column (Danger) is needed to record responses. As usual (Sect. 2.12), each row represents one unit of analysis (a student).


FIGURE 9.1: The data worksheet for some example data, for the microplastics study.
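The worksheet layout for the three microplastics questions can also be sketched in code. This is an illustration only, not the software used in this book: the responses are hypothetical (they are not data from the study), the column names WhereRivers and WhereLakes are assumed names for the two middle options, and recording each option as Yes or No is one possible coding.

```python
import csv
import io

# Column names follow the worksheet described in the text; WhereRivers and
# WhereLakes are assumed names for the two middle options.
columns = ["Sources", "WhereSeas", "WhereRivers", "WhereLakes",
           "WhereGroundwater", "Danger"]
where_options = {"sea": "WhereSeas", "rivers": "WhereRivers",
                 "lakes": "WhereLakes", "groundwater": "WhereGroundwater"}


def to_row(sources, ticked, danger):
    """Turn one student's answers into one worksheet row (a dict)."""
    # Open question: free text. Single-choice closed question: one column.
    row = {"Sources": sources, "Danger": danger}
    # Tick-all-that-apply question: one Yes/No column per option.
    for option, column in where_options.items():
        row[column] = "Yes" if option in ticked else "No"
    return row


# Hypothetical responses from two students (not data from the study).
rows = [
    to_row("Cosmetics; clothing fibres", {"sea", "rivers"}, "dangerous"),
    to_row("Plastic packaging",
           {"sea", "rivers", "lakes", "groundwater"}, "very dangerous"),
]

# Write the worksheet as CSV text: one row per student.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=columns)
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

Each student occupies exactly one row, however many options they ticked, which keeps one row per unit of analysis.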

9.3 Chapter summary

Having a detailed procedure for collecting the data (the protocol) is important. Using a pilot study to trial the protocol can often reveal unexpected changes needed for a good protocol. Creating good questionnaire questions is difficult, but important.

9.4 Quick review questions

  1. What is the biggest problem with this question: 'Do you have bromodosis?'

Options: (a) It is double-barrelled; (b) It is a leading question; (c) It uses language that may not be understood; (d) It is ambiguous.

9.5 Exercises

Answers to odd-numbered exercises are available in App. E.

Exercise 9.1 What is the problem with this question?

Exercise 9.2 What is the problem with this question?

Exercise 9.3 Which of these questionnaire questions is best? Why?

  1. Should concerned cat owners vaccinate their pets?
  2. Should domestic cats be required to be vaccinated or not?
  3. Do you agree that pet-owners should have their cats vaccinated?

Exercise 9.4 Which of these questionnaire questions is best? Why?

  1. Do you own an environmentally-friendly electric vehicle?
  2. Do you own an electric vehicle?
  3. Do you own or do you not own an electric vehicle?

Exercise 9.5 Falk and Anderson (2013) studied sunscreen use, and asked participants questions, including these:

Critique these questions. What biases may be present?

Exercise 9.6 Morón-Monge, Hamed, and Morón Monge (2021) studied primary-school children's knowledge of their natural environment. They were asked three questions:

  1. Do you usually visit Guadaira Park?
  2. How many times have you visited nature (the beach, countryside, mountains, etc.) in the last month?
  3. Which is your favorite natural place?

Which questions are open and which are closed? Which questions will produce qualitative data? Critique the questions.