Mathwizurd.com is created by David Witten, a mathematics and computer science student at Stanford University. For more information, see the "About" page.

Types of Samples and Bias

Simple Random Sample

This is the simplest type of sample. Everyone in the population has an equal chance of being chosen, and a group of people is selected at random. More precisely, every possible group of the chosen size is equally likely to be selected.

For example, a simple random sample could be randomly polling 100 voters in Maryland about their vote in an election. This is a subset of all Maryland voters, and each voter has an equal chance of getting chosen.
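Here is a minimal sketch of that poll in Python. The voter roll is hypothetical (ID numbers standing in for Maryland voters), and the function name `simple_random_sample` is made up for illustration; the only library call is the standard `random.sample`, which draws without replacement.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members; every group of size n is equally likely."""
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    return rng.sample(population, n)

# Hypothetical voter roll: IDs 0..9999 stand in for Maryland voters.
voters = list(range(10_000))
polled = simple_random_sample(voters, 100, seed=0)

print(len(polled))       # how many voters were polled
print(sorted(polled)[:5])  # a peek at a few of the chosen IDs
```

Because the draw is without replacement, no voter can be polled twice.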

Stratified Random Sample

Let's say we're polling Florida voters about who they are voting for in an election. We want to be sure that every racial group is accounted for (for example, Black, Asian, and white voters), so we take a simple random sample within each group and combine them. Next, we weight each group to reflect its predicted participation in the election.

That is a stratified random sample: you split the entire population into smaller groups (strata), take a simple random sample within each one, and combine the results.
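The sample-then-weight idea can be sketched as follows. Everything here is an assumption for illustration: the group names, the support rates, and the turnout weights are invented, and `stratified_estimate` is a hypothetical helper, not a standard function.

```python
import random

def stratified_estimate(strata, n_per_stratum, weights, seed=None):
    """Estimate overall support from a stratified random sample.

    strata:  dict of group name -> list of 0/1 answers (1 = supports candidate A)
    weights: dict of group name -> predicted share of the electorate (sums to 1)
    """
    rng = random.Random(seed)
    estimate = 0.0
    for name, members in strata.items():
        sample = rng.sample(members, n_per_stratum)  # SRS within the group
        group_share = sum(sample) / len(sample)      # support inside the group
        estimate += weights[name] * group_share      # weight by predicted turnout
    return estimate

# Invented data: three groups with different levels of support.
strata = {
    "group_1": [1] * 600 + [0] * 400,
    "group_2": [1] * 300 + [0] * 700,
    "group_3": [1] * 500 + [0] * 500,
}
weights = {"group_1": 0.5, "group_2": 0.3, "group_3": 0.2}

print(stratified_estimate(strata, 50, weights, seed=0))
```

The weights matter: if one group is over-represented in the sample relative to its expected turnout, weighting pulls its influence back down.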

Cluster Sample

Let's say you wanted to get information about all of the PE classes at school. Because there's no inherent difference between PE classes, you could randomly choose a few of them and take simple random samples within them (or simply survey everyone in the chosen classes). This is usually done to save money for the researchers.
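As a rough sketch of the PE example, assuming eight hypothetical classes of thirty students each (the class names, sizes, and the `cluster_sample` helper are all invented here):

```python
import random

def cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Randomly pick a few clusters, then take an SRS inside each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # stage 1: pick classes
    return {
        name: rng.sample(clusters[name], n_per_cluster)  # stage 2: pick students
        for name in chosen
    }

# Hypothetical school: 8 PE classes, 30 students in each.
pe_classes = {
    f"PE-{c}": [f"PE-{c}-student-{s}" for s in range(30)]
    for c in range(1, 9)
}

sample = cluster_sample(pe_classes, n_clusters=3, n_per_cluster=5, seed=0)
for class_name, students in sample.items():
    print(class_name, students)
```

Only three classes have to be visited, which is where the savings come from.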

Systematic Sample

This is used when you systematically poll people in a line. For example, when conducting an exit poll, you can ask every tenth person. This addresses two issues:

  1. You can't take a simple random sample, because people leave at different times and there is no complete list to draw from
  2. It helps with independence
    1. Generally, the answer of the person ten places ahead of you in line is independent of yours

This can also be used in polling a supermarket as people leave, for example.
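A minimal sketch of "every tenth person", assuming people arrive one at a time as a stream. The random starting position is a common convention that I am adding here, and `systematic_sample` is a hypothetical name.

```python
import random

def systematic_sample(stream, k, seed=None):
    """Pick every k-th person from a stream, starting at a random offset."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # random position among the first k people
    # Works on any iterable, so no complete list is needed in advance.
    return [person for i, person in enumerate(stream) if i % k == start]

# Hypothetical exit poll: 95 voters leave in order, numbered 0..94.
arrivals = range(95)
picked = systematic_sample(arrivals, k=10, seed=0)
print(picked)
```

Note that the function never needs to know how many people will leave in total, which is exactly why this method suits an exit poll.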

What is bias?

Bias means a sample is systematically different from the population. This is usually due to an error in how the sample was collected, and here are a few common ones.

Selection Bias

This means one group has a higher chance of being chosen than another. A common cause is a convenience sample, meaning one doesn't sample randomly, but only does what is easiest. For example, suppose you want to sample all of NYC, and you ask people you see on the street. If you're on Fifth Avenue, those people will be wildly different from those living in poorer areas.
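To see how far off a convenience sample can be, here is a small simulation. The income figures are entirely invented (they are not real NYC data); the point is only the gap between the two estimates.

```python
import random
import statistics

rng = random.Random(1)

# Invented incomes: a small wealthy area and a much larger rest of the city.
fifth_avenue = [rng.gauss(250_000, 40_000) for _ in range(1_000)]
elsewhere = [rng.gauss(60_000, 15_000) for _ in range(9_000)]
city = fifth_avenue + elsewhere

convenience = rng.sample(fifth_avenue, 200)  # only people on one street
random_sample = rng.sample(city, 200)        # simple random sample of the city

true_mean = statistics.mean(city)
convenience_mean = statistics.mean(convenience)
random_mean = statistics.mean(random_sample)

print(f"true mean:          {true_mean:,.0f}")
print(f"convenience sample: {convenience_mean:,.0f}")
print(f"random sample:      {random_mean:,.0f}")
```

Both samples have the same size, so the difference comes from who was allowed into the sample, not from how many people were asked.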

Nonresponse Bias

Usually when a company conducts a poll, they send it by mail or call people at home. However, fewer than 10% of people respond, meaning an enormous number of people are excluded. If the people who respond differ from those who don't, this is a source of bias.

Here is a funny example of why this is bad. Let's say you have a poll that says "Agree/Disagree: I have enough time to answer this question." (I found this on Wikipedia.) Those who would have disagreed didn't answer, and those who would have agreed did answer. This means you will get an overwhelming percentage of "Agree" responses, even though many people did not have time to answer.
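That example is easy to simulate. In this sketch I assume, purely for illustration, that only 30% of people have time and that only those people respond; `run_poll` is a hypothetical helper.

```python
import random

def run_poll(n_people, share_with_time, seed=None):
    """Return (true share with time, share of respondents answering 'Agree')."""
    rng = random.Random(seed)
    has_time = [rng.random() < share_with_time for _ in range(n_people)]

    # Assumption: people without time never respond at all,
    # and people with time truthfully answer "Agree".
    responses = ["Agree" for t in has_time if t]

    true_share = sum(has_time) / n_people
    observed_share = responses.count("Agree") / len(responses)
    return true_share, observed_share

true_share, observed_share = run_poll(10_000, share_with_time=0.3, seed=0)
print(f"people who really have time: {true_share:.0%}")
print(f"'Agree' among respondents:   {observed_share:.0%}")
```

The poll reports unanimous agreement while most of the population would have disagreed, because the disagreeing group never shows up in the data.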

Response Bias

This refers to misleading wording in the question, or anything else that would lead the participant to give an untruthful answer.

Here is an example from a real poll conducted by Bloomberg. The question asks "Agree/Disagree: Campaign finance should be reformed so that a rich person does not have more influence than a person without money." The way the question frames the issue makes "Agree" the obvious answer. A question shouldn't be phrased that way, and the wording surely contributed some bias.
