Biostatistics I: Unit 01 - Populations and Samples Notes

Slide 1
In this learning unit, we will cover populations and samples.

Slide 2
The most important organizing concept of biostatistics is the notion of populations and samples. 
Using samples to make inferences about populations can save us inordinate amounts of time and effort.
Therefore, an understanding of the differences between populations and samples is essential.

Slide 3
I mentioned earlier that we could learn about the attitudes of students at a university about a rise
in tuition by asking every one of the students to respond to a questionnaire or alternatively by
taking a small sample of students and asking the same questions. The first approach is a population
approach – that is, we gather information on everyone in the population. The second is a sampling
approach. If we had access to everyone in every population and could obtain data from them,
there would be little need for inferential statistics. We wouldn’t have to estimate anything, because we
could describe the population with certainty.
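
The contrast between the two approaches can be sketched in a few lines of Python. This is only an illustration under assumed numbers: the 20,000 students, the 60% who oppose the rise, and the sample size of 200 are hypothetical and are not figures from this course.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: 20,000 students, 1 = opposes the tuition rise, 0 = does not.
# The 60% figure is an assumption made up for this illustration.
population = [1] * 12000 + [0] * 8000

# Population approach: ask everyone, so the proportion is known with certainty.
population_proportion = sum(population) / len(population)

# Sampling approach: ask 200 students chosen at random and estimate the proportion.
sample = random.sample(population, 200)
sample_proportion = sum(sample) / len(sample)

print("Population proportion opposed:", population_proportion)
print("Sample estimate from 200 students:", sample_proportion)
```

The population figure is exact because everyone was asked; the sample figure is an estimate that will usually be close to it but will change from one sample to the next.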

Slide 4
After this unit, you will understand the differences between populations and samples as well as
the different ways populations and samples are described in biostatistics.

Slide 5
Let’s start with populations. There are two types of populations: popular and statistical.
Each has unique characteristics.

Slide 6
Even statisticians disagree about what a population is. In common use, the term population refers
to a set of persons or things. Populations defined in this way are called popular populations.
They might include the population of students at a university, the population of seniors in the
United States, the population of persons in Florida who if tested would be positive for HIV,
the population of alligators in Florida, or the population of deer in Michigan that carry
the tick responsible for Lyme disease.

Slide 7
The second kind of population, called a statistical population, is made up of characteristics
of persons or things. Statistical populations, for example, could include the set of all blood
pressures of students at a university, the set of antibody titers against HIV of persons living
in Florida, the scores of seniors living in Tampa on a test of memory, or the ages of alligators
living in Florida.
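
One way to picture the distinction is as two different data structures. The sketch below is only an illustration: the three student records and the field name systolic_bp are hypothetical.

```python
# Hypothetical records for a tiny popular population: the students themselves.
students = [
    {"name": "Student A", "systolic_bp": 118},
    {"name": "Student B", "systolic_bp": 126},
    {"name": "Student C", "systolic_bp": 109},
]

# The corresponding statistical population: one characteristic of those students.
blood_pressures = [student["systolic_bp"] for student in students]

print(len(students), "students in the popular population")
print("Statistical population of blood pressures:", blood_pressures)
```

The first list holds the persons; the second holds only the measured characteristic, which is what an analysis would actually work with.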

Slide 8   Self-assessment

Slide 9
Both uses of the word population, popular and statistical, are correct, but there
are important reasons to distinguish them. When we are analyzing a characteristic such as the titer
of HIV antibodies, we might know some more things about the persons with different titers of
antibodies, such as their age or place of residence, but generally there are a lot of things about
them that we don’t know. We might not know their sexual history or sexual preference, for example.
The analyses we conduct in biostatistics are based on what is observed. Other characteristics that
we do not observe are extraneous to our analyses and tests. Therefore, often what we are studying
is not the people themselves, but rather some observable characteristics of these people. In that sense,
we are studying statistical rather than popular populations.
 
While we usually study statistical populations, it is permissible to use the popular population to
describe the entity being studied, as long as it is understood that the characteristics of individuals
in this population, rather than the individuals themselves, are the object of study.

Slide 10
Before we move on, we need to introduce two new terms: data and variables. We use the word data to
refer to recordings of measurements made on characteristics. For example, the names of people and their
blood pressures constitute data. While a characteristic could be constant (for example, the presence of a
brain in all humans), it is more likely to take on different values (for example, names or blood pressures
differ from person to person), in which case we refer to it as a variable, meaning simply an entity that can
vary. Note that within a single person the name does not vary and in that sense is not a variable, but his or
her blood pressure does vary from time to time and therefore is a variable.

Slide 11
Now that we have introduced populations, data and variables, we can turn to samples. A sample is simply
any subset of a population. For example, it could be the sample of patients with lung cancer seen
at a regional cancer center this past year.

Slide 12
Such a sample may have unique characteristics that distinguish it from the entire population of people
with lung cancer. For example, the patterns of referral to the regional cancer center may yield
a group of patients that is younger, older, more affluent, or less affluent than lung cancer patients
in general. Such samples are said to be biased; that is, they do not reflect the population of all such
patients fairly. For the purposes of this course we will focus on simple random samples, where the members
of the sample can be thought of as being drawn from a hat containing all possible members.
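
The difference between a biased sample and a simple random sample can be sketched in Python. Everything numeric here is an assumption for illustration only: the 10,000 patients, the age distribution (mean 68, standard deviation 10), and the rule that only patients under 60 are referred are all hypothetical.

```python
import random
import statistics

random.seed(2)  # fixed seed so the sketch is repeatable

# Hypothetical population: ages of 10,000 lung cancer patients (assumed mean 68, SD 10).
population_ages = [random.gauss(68, 10) for _ in range(10000)]

# Simple random sample: every patient has the same chance of being drawn "from the hat".
simple_random_sample = random.sample(population_ages, 100)

# Biased sample: assume referral patterns send only patients under 60 to the center.
referred_patients = [age for age in population_ages if age < 60]
biased_sample = random.sample(referred_patients, 100)

population_mean = statistics.mean(population_ages)
srs_mean = statistics.mean(simple_random_sample)
biased_mean = statistics.mean(biased_sample)

print("Population mean age:", round(population_mean, 1))
print("Simple random sample mean age:", round(srs_mean, 1))
print("Biased (referral) sample mean age:", round(biased_mean, 1))
```

Under these assumptions the simple random sample should land near the population mean, while the referral sample sits below it by construction, because only younger patients could enter it.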

Slide 13  Self-assessment

Slide 14
Most of the populations we concern ourselves with in epidemiology are very large, and for purposes of the
analyses we conduct can be considered to be infinite in size; most of the samples we use are small.

Slide 15
Populations in some cases may not be well defined or able to be enumerated. We can still make inferences
from a sample of a population to that population even though the population is not well defined.

Slide 16
We can define the following population: all people currently living in Florida over age 65 who will ever develop
Alzheimer’s disease. Although we know that such a population exists, we cannot enumerate it, because
we don’t know who these people are. However, we could select a sample of 100 individuals over 65
to whom we can administer a medication to prevent Alzheimer’s disease. The sample is well defined,
because we can enumerate who is and who is not in the sample. We don’t need to be able to identify
everyone in a population to make inferences about the efficacy of a preventive medication in that population.

Slide 17  Self-assessment

Slide 18
There are two types of populations: popular and statistical. Often what we are studying is not the people
themselves, but rather some observable characteristics of these people. Samples are subsets of populations.
Most populations are very large; most samples are small. We do not need to be able to enumerate a
population to be able to make inferences about its characteristics from samples.