Unit 03 - Descriptive and Inferential Statistics Notes

 

This presentation concerns descriptive and inferential statistics.

 

The discipline of biostatistics can be divided into two parts: descriptive statistics and inferential statistics.

 

The purpose of Descriptive Statistics is to describe data. Raw data have little meaning in themselves.

 

Consider the data that are shown on this slide.  A list of data such as this has little meaning. One could perhaps say that the smallest number is 7 and the largest 99, but beyond this, it is just a collection of numbers. With a larger collection of numbers, it may even be difficult to identify the smallest and largest numbers.

 

If you were told that the mean of these numbers, which are scores on an examination, is 55, you would immediately know that the examination was very difficult.

 

We could also present the test scores in the form of a distribution, represented by a table or a graph, and this would give us even more information. This is what descriptive statistics is about – making sense of data.
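As a rough illustration of these ideas, the sketch below summarizes a list of examination scores and tabulates their distribution. The scores are hypothetical (the slide's actual data are not reproduced here); they were chosen only so that the smallest is 7, the largest is 99, and the mean is 55. The function names `describe` and `frequency_table` and the bin width of 20 are likewise assumptions made for this example.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical examination scores, NOT the data from the slide.
scores = [7, 23, 35, 41, 48, 52, 55, 58, 61, 68, 79, 89, 99]


def describe(values):
    """Return a few basic descriptive statistics for a list of numbers."""
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
    }


def frequency_table(values, width=20):
    """Count how many values fall into each bin of the given width."""
    counts = Counter((v // width) * width for v in values)
    return {(lo, lo + width - 1): counts[lo] for lo in sorted(counts)}


summary = describe(scores)
table = frequency_table(scores)

print(summary)
# A crude text histogram: one asterisk per score in each bin.
for (lo, hi), count in table.items():
    print(f"{lo:3d}-{hi:3d}: {'*' * count}")
```

The summary reduces thirteen raw numbers to a handful of values, and the table shows at a glance where most scores fall, which is the kind of sense-making that descriptive statistics provides.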

 

Let’s assume that we were writing an article about the association of smoking with lung cancer and all we tell the reader is that we carried out some kind of complex statistical test and found that smokers did not appear to be significantly more likely to get lung cancer.

 

If we failed to mention that our sample consisted exclusively of students in their early 20s, we would be misleading the public into thinking that smoking was not associated with lung cancer.

 

For this reason, the first table that appears in most scientific publications describes the sample – their mean age, their gender, their education and other characteristics.

 

The purpose of Inferential Statistics is to make inferences about populations from studying samples.  In the real world, we rarely have access to populations.  For example, it would be difficult, though not impossible, to obtain the blood pressures of all students at a large university.  However, it would be relatively easy to measure the blood pressures of a sample of students in one class.
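The blood pressure example can be sketched in code. The population below is simulated and entirely hypothetical (20,000 students with systolic pressures drawn from a normal distribution with mean 118 and standard deviation 12 mmHg), and the sketch assumes that one class of 40 students behaves like a simple random sample of the university, which a real class may not. The interval uses the common normal approximation, the sample mean plus or minus 1.96 standard errors, as one simple way of making the inference.

```python
import random
from statistics import mean, stdev

random.seed(3)  # fixed seed so the sketch is repeatable

# Hypothetical population: systolic blood pressures (mmHg) of all students.
population = [random.gauss(118, 12) for _ in range(20_000)]


def estimate_mean(sample, z=1.96):
    """Return the sample mean and an approximate 95% confidence interval."""
    m = mean(sample)
    se = stdev(sample) / len(sample) ** 0.5  # standard error of the mean
    return m, (m - z * se, m + z * se)


# Assumption: the class is treated as a random sample of 40 students.
sample = random.sample(population, 40)
estimate, (low, high) = estimate_mean(sample)

print(f"sample mean: {estimate:.1f} mmHg")
print(f"approximate 95% interval: {low:.1f} to {high:.1f} mmHg")
print(f"population mean (normally unknown): {mean(population):.1f} mmHg")
```

In practice the last line could not be computed, because the population is not available; it is printed here only to show how close an estimate from 40 measurements can come to a value based on 20,000.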

 

Pollsters trying to predict who will win an election don’t call everyone in the United States to ask them how they voted. They interview a relatively small sample of people leaving the polls and infer who the winner will be.  If they had access to the entire population, there would be no need for statistical tests.
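A minimal sketch of the pollster's inference follows. The counts are hypothetical (520 of 1,000 interviewed voters choosing candidate A), and the function name `poll_estimate` is an assumption made for this example. It uses the standard normal-approximation margin of error for a proportion and assumes the interviewed voters form a simple random sample, which real exit polls only approximate.

```python
from math import sqrt


def poll_estimate(votes_for, sample_size, z=1.96):
    """Return the sample proportion and its approximate 95% margin of error."""
    p = votes_for / sample_size
    margin = z * sqrt(p * (1 - p) / sample_size)
    return p, margin


# Hypothetical exit poll: 520 of 1,000 voters chose candidate A.
p, margin = poll_estimate(520, 1000)
low, high = p - margin, p + margin

print(f"estimated share for A: {p:.1%} (margin of error {margin:.1%})")
print(f"approximate 95% interval: {low:.1%} to {high:.1%}")
# The pollster would call the race only if the whole interval is above 50%.
print("call for A" if low > 0.5 else "too close to call")
```

With these made-up numbers the interval extends below 50%, so a sample of this size would not be enough to name a winner with confidence; a larger sample narrows the margin.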

 

Inferential statistics provides the link between samples and populations and permits us to infer the characteristics of large populations from small samples.