Sampling techniques are methods used to select a smaller group of individuals or items from a larger population or dataset. The idea behind sampling is to study the smaller group, called a “sample,” to make predictions or draw conclusions about the entire population without having to examine every single individual or item in it.
Imagine you have a huge jar of candies, and you want to know what flavors are most popular. Instead of eating all the candies in the jar, which would take a lot of time and effort, you can simply take a small handful of candies from the jar. This small group represents your “sample.” By tasting the candies in the sample, you can get a good idea of the flavors that are popular in the entire jar.
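To make the candy jar idea concrete, here is a minimal sketch in Python. The flavor names, the jar size, and the handful size of 50 are all made up for illustration; the point is only that the flavor shares in a random handful tend to land close to the shares in the whole jar.

```python
import random
from collections import Counter

# Hypothetical jar: flavor names and counts are invented for this example.
jar = ["cherry"] * 500 + ["lemon"] * 300 + ["mint"] * 200

rng = random.Random(42)  # fixed seed so the run is repeatable

# Take a "handful" of 50 candies without putting any back.
handful = rng.sample(jar, 50)

# Compare flavor shares in the handful with the true shares in the jar.
sample_share = {f: c / len(handful) for f, c in Counter(handful).items()}
jar_share = {f: c / len(jar) for f, c in Counter(jar).items()}

for flavor in sorted(jar_share):
    print(f"{flavor}: jar {jar_share[flavor]:.2f}, "
          f"handful {sample_share.get(flavor, 0.0):.2f}")
```

A larger handful usually gives shares closer to the jar's, at the cost of tasting more candies.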
Two Common Types of Sampling
- Random Sampling
- Stratified Sampling
Random Sampling
- Imagine you have a large bag of marbles in many different colours. You want to pick a few marbles to study their colours and make generalisations about all the marbles in the bag. In random sampling, you would close your eyes, reach into the bag, and pull out marbles without looking. The marbles you pick make up your random sample.
- In simple random sampling, each individual or item in the population has an equal chance of being included in the sample. This keeps personal preferences out of the selection process, which reduces selection bias and lets researchers generalise from the characteristics of the sample to the entire population.
import random
# Sample data (a list of elements)
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
# Specify the number of elements you want to sample
sample_size = 5
# Perform random sampling
sampled_data = random.sample(data, sample_size)
print("Random Sampled Data:", sampled_data)
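One way to see the "equal chance" property is to repeat the draw many times and count how often each item is picked. The sketch below is only an illustration: the number of trials and the seed are arbitrary choices, and it reuses the same ten items as the example above.

```python
import random
from collections import Counter

population = list(range(1, 11))  # the same ten items as in the example above
sample_size = 5
trials = 10_000  # arbitrary number of repetitions, chosen for illustration

rng = random.Random(0)  # fixed seed so the run is repeatable
counts = Counter()
for _ in range(trials):
    # Each trial draws 5 distinct items without replacement.
    counts.update(rng.sample(population, sample_size))

# Each item should appear in roughly sample_size / len(population) = 50% of trials.
for item in population:
    print(item, round(counts[item] / trials, 3))
```

If one item showed up far more often than the others, that would suggest the selection process is biased.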
Stratified Sampling
- Imagine you have a classroom with students of different ages, and you want to understand their opinions on a particular topic. Instead of selecting a random sample from the entire class, you decide to use stratified sampling. First, you divide the students into groups based on their age, creating separate strata. Then, from each age group, you randomly select a few students to be part of the sample. Finally, you combine the selected students from each age group to form your stratified sample.
- The goal of stratified sampling is to ensure that each subgroup or stratum is adequately represented in the sample. This helps in obtaining a more accurate and precise estimation of population characteristics, especially when there is significant variability or differences among different subgroups.
import random

# Sample data with a categorical attribute ('age') that defines the strata
data = [
    {'age': '18-25', 'score': 85},
    {'age': '26-35', 'score': 90},
    {'age': '18-25', 'score': 78},
    {'age': '36-45', 'score': 92},
    {'age': '26-35', 'score': 88},
    {'age': '18-25', 'score': 80},
    # Add more data as needed...
]

# Stratify the data based on the 'age' attribute
strata = {}
for item in data:
    strata.setdefault(item['age'], []).append(item)

# Number of samples to take from each stratum
sample_size_per_stratum = 2

# Perform stratified sampling
stratified_sample = []
for stratum, stratum_data in strata.items():
    # If a stratum has fewer items than requested, take all of them,
    # so that small strata (such as '36-45' here) are not silently dropped
    k = min(sample_size_per_stratum, len(stratum_data))
    stratified_sample.extend(random.sample(stratum_data, k))

print("Stratified Sampled Data:", stratified_sample)
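The example above takes a fixed number of items from each age group. A common variant, sketched here as an assumption rather than as part of the original example, takes the same fraction from every stratum so that the sample keeps the population's proportions. The function name `proportional_stratified_sample`, the generated records, and the 20% fraction are all hypothetical.

```python
import random
from collections import Counter, defaultdict

def proportional_stratified_sample(records, key, fraction, rng=random):
    """Sample about `fraction` of each stratum, keeping at least one record per stratum."""
    strata = defaultdict(list)
    for record in records:
        strata[record[key]].append(record)

    sample = []
    for members in strata.values():
        # Round to the nearest whole record, but never drop a stratum entirely.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical data: 60 / 30 / 10 records across three age groups.
records = (
    [{"age": "18-25", "score": 70 + i % 20} for i in range(60)]
    + [{"age": "26-35", "score": 75 + i % 20} for i in range(30)]
    + [{"age": "36-45", "score": 80 + i % 15} for i in range(10)]
)

sample = proportional_stratified_sample(
    records, key="age", fraction=0.2, rng=random.Random(1)
)
counts = Counter(record["age"] for record in sample)
print(counts)
```

With these numbers the sample holds 12, 6, and 2 records from the three groups, which mirrors the 60/30/10 split in the full data.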