### 4.1 Collection of data and sampling

**Content-specific conceptual understandings**

**Concepts of population, sample, random sample, discrete and continuous data.**

This topic is designed to cover the key questions that students should ask when they see a data set or an analysis.

**Reliability of data sources and bias in sampling.**

Dealing with missing data and errors in the recording of data.

**Interpretation of outliers.**

An outlier is defined as a data item that is more than 1.5 × interquartile range (IQR) from the nearest quartile, that is, below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR.

Awareness that, in context, some outliers are a valid part of the sample but some outlying data items may be an error in the sample.
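As an illustration of the 1.5 × IQR rule, the following is a minimal Python sketch. The function name `iqr_outliers` and the sample `hours` data are hypothetical, and the quartiles are computed with the `"inclusive"` method of `statistics.quantiles`, which is an assumption: calculators and statistical packages use slightly different quartile conventions, so the fences may differ a little from those obtained by other methods.

```python
import statistics


def iqr_outliers(data):
    """Return (lower_fence, upper_fence, outliers) under the 1.5 x IQR rule."""
    # Quartile convention is an assumption; other methods give slightly different values.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    # A data item is an outlier if it lies strictly beyond either fence.
    outliers = [x for x in data if x < lower or x > upper]
    return lower, upper, outliers


# Hypothetical teaching hours, including one suspicious value.
hours = [120, 135, 150, 160, 170, 185, 200, 4]
lower, upper, outliers = iqr_outliers(hours)
print(lower, upper, outliers)
```

The rule only flags a value. Whether the flagged value is a recording error or a valid part of the sample still has to be judged in context.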

**Sampling techniques and their effectiveness.**

Simple random, convenience, systematic, quota and stratified sampling methods.
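Three of these methods can be sketched in Python under some simplifying assumptions. The function names, the numbered population and the year-group strata below are all hypothetical. The systematic sample assumes the population list is in no meaningful order, and the stratified sample uses rounded proportional allocation, which may not add up to exactly the requested size for every choice of strata. Convenience and quota sampling are not shown because they depend on who happens to be accessible rather than on a random mechanism.

```python
import random


def simple_random_sample(population, k, rng):
    """Every member has an equal chance of selection; no repeats."""
    return rng.sample(population, k)


def systematic_sample(population, k, rng):
    """Pick a random starting point, then take every step-th member."""
    step = len(population) // k
    start = rng.randrange(step)
    return [population[start + i * step] for i in range(k)]


def stratified_sample(strata, k, rng):
    """Sample each stratum in proportion to its share of the population."""
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        share = round(k * len(members) / total)  # proportional allocation (rounded)
        sample[name] = rng.sample(members, share)
    return sample


rng = random.Random(0)  # fixed seed so the sketch is repeatable
population = list(range(100))  # hypothetical pupils numbered 0 to 99
strata = {"Year 12": population[:60], "Year 13": population[60:]}

simple = simple_random_sample(population, 10, rng)
systematic = systematic_sample(population, 10, rng)
stratified = stratified_sample(strata, 10, rng)
print(simple, systematic, stratified, sep="\n")
```

With 60 pupils in one stratum and 40 in the other, a stratified sample of 10 takes 6 from the first and 4 from the second, so each year group is represented in proportion to its size.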

**Exercises**

- Alana was analyzing data from questionnaires asking schools for the time in hours they spend each year teaching mathematics. Alana’s statistical package flagged up one item as an outlier. All the rest of her data were between 120 and 200 hours. What would you suggest Alana should do if the outlier has the value:

a) −175

b) 4

c) 240?

- Anke wants to find out the proportion of households in Germany who have a pet. For her investigation, she decides to ask her friends from school, which is located in the centre of a large city, whether their family owns a pet.

a) What is the relevant population for Anke’s investigation?

b) Name the sampling method that Anke is using.

c) State one reason why Anke’s sample may not be representative of the population.

- Leonie wants to collect information on the length of time pupils at her school spend on homework each evening. She thinks that this depends on the school year, so her sample should contain some pupils from each year group.

a) What information does Leonie need in order to be able to select a stratified sample?

Leonie decides to ask pupils in the lunch queue until she has responses from at least 10 pupils in each year group.

b) Name this sampling method.

c) Having collected and analyzed the data, Leonie found two outliers. For each value, suggest whether it should be kept or discarded.

i) 10 minutes

ii) 20 hours

- A student wants to conduct an investigation into attitudes to environmental issues among the residents of his village. He decides to talk to the first 20 people who arrive at his local bus stop.

a) Name this sampling technique.

b) State the population relevant to his investigation.