Day 1 of this challenge series consists of three parts/challenges which cover topics such as Quartiles, Interquartile Range, and Standard Deviation.
Let’s begin with the first challenge named “Quartiles”.
Quartiles / IQR
In statistics, quartiles are values that divide a dataset into four equal parts, each containing 25% of the data. Quartiles are used to measure the spread and distribution of a dataset and provide insights into the central tendency of the data.
Three cut points mark the boundaries between these four parts:
- First Quartile (Q1): This is the 25th percentile of the data. It marks the point below which 25% of the data falls. It is also referred to as the lower quartile.
- Second Quartile (Q2): This is the 50th percentile of the data, which is equivalent to the median. It marks the point below which 50% of the data falls. In other words, half of the data values are below the second quartile.
- Third Quartile (Q3): This is the 75th percentile of the data. It marks the point below which 75% of the data falls. It is also referred to as the upper quartile.
Quartiles are particularly useful in understanding the distribution of skewed or non-normally distributed datasets. They help identify the central tendency and spread of the data, as well as detect outliers or extreme values.
#!/bin/python3
import os
from statistics import median

# The function is expected to return an INTEGER_ARRAY.
# The function accepts INTEGER_ARRAY arr as parameter.
def quartiles(arr: list, percentiles=[0.25, 0.5, 0.75]) -> list:
    # Note: the percentiles argument is never used in the body.
    arr.sort()                             # quartiles need sorted data
    q1 = median(arr[:len(arr)//2])         # median of the lower half
    q2 = median(arr)                       # median of the whole list
    q3 = median(arr[len(arr)//2 * -1:])    # median of the upper half
    return list(map(int, [q1, q2, q3]))

if __name__ == '__main__':
    fptr = open(os.environ['OUTPUT_PATH'], 'w')
    n = int(input().strip())
    data = list(map(int, input().rstrip().split()))
    res = quartiles(data)
    fptr.write('\n'.join(map(str, res)))
    fptr.write('\n')
    fptr.close()
- The code defines a function named quartiles that takes two parameters: arr (a list of numbers) and an optional list of percentiles that defaults to [0.25, 0.5, 0.75]. The function body never uses the second parameter.
- The function sorts the input list arr in ascending order, in place, using the sort() method. Quartiles are defined on ordered data, so this step is required.
- The median() function is imported from Python's standard statistics module and does the actual work of computing each quartile.
- The variable q1 is calculated as the median of the first half of the sorted list, arr[:len(arr)//2]. This corresponds to the lower quartile (25th percentile). Because of the integer division, the middle element of an odd-length list is left out of both halves.
- The variable q2 is calculated as the median of the entire sorted list arr. This corresponds to the median (50th percentile).
- The variable q3 is calculated as the median of the second half of the sorted list, arr[len(arr)//2 * -1:]. This corresponds to the upper quartile (75th percentile).
- The function returns a list [q1, q2, q3] containing the calculated quartiles, each converted to an integer with int().
- The if __name__ == '__main__': block is the entry point of the script when executed directly.
- It opens the file named by the OUTPUT_PATH environment variable in write mode using open(os.environ['OUTPUT_PATH'], 'w'), which requires the os module to be imported.
- It reads an integer n from the input, the number of elements in the list. The value is not used afterwards, because the list carries its own length.
- It reads a space-separated list of integers data from the input. This is the actual data.
- It calls the quartiles function, passing data as an argument, and stores the result in the variable res.
- It writes the quartiles in the res list to the file, converting each value to a string using map(str, res) and joining them with newline characters using '\n'.join(...).
- It writes an additional newline character to the file using fptr.write('\n').
- Finally, it closes the file using fptr.close().
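To make the median-of-halves idea concrete, here is a small sketch that runs it on one odd-length and one even-length list. The helper name quartiles_sketch and the sample numbers are made up for illustration, and unlike the solution above it returns the raw medians without casting to int. It assumes the list has at least two values.

```python
from statistics import median

def quartiles_sketch(data):
    """Return (q1, q2, q3) using the median-of-halves approach.

    Hypothetical helper for illustration: it works on a sorted copy
    and does not cast the results to int.
    """
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]                  # first half
    upper = ordered[len(ordered) - half:]   # last half, same length as lower
    return median(lower), median(ordered), median(upper)

odd = [3, 7, 8, 5, 12, 14, 21, 13, 18]       # 9 values: the middle one (12) is in neither half
even = [3, 7, 8, 5, 12, 14, 21, 13, 18, 23]  # 10 values: the halves split cleanly

print(quartiles_sketch(odd))    # (6.0, 12, 16.0)
print(quartiles_sketch(even))   # (7, 12.5, 18)
```

In the even case Q2 is 12.5. Passing that through int(), as the solution does, would turn it into 12, so the integer cast only gives exact answers when every quartile is a whole number.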
Challenge #2
IQR
The interquartile range (IQR) is the difference between the third and first quartiles (Q3 - Q1). It represents the spread of the middle 50% of the dataset and is often used as a measure of variability.
#!/bin/python3
from statistics import median

# The function accepts following parameters:
#  1. INTEGER_ARRAY values
#  2. INTEGER_ARRAY freqs
def interQuartile(values, freqs):
    # Expand the (value, frequency) pairs into the full dataset.
    s = []
    for value, freq in zip(values, freqs):
        s += [value] * freq
    s.sort()
    q1 = median(s[:len(s)//2])          # median of the lower half
    q3 = median(s[len(s)//2 * -1:])     # median of the upper half
    result = q3 - q1
    print(f'{result:.1f}')

if __name__ == '__main__':
    n = int(input().strip())
    val = list(map(int, input().rstrip().split()))
    freq = list(map(int, input().rstrip().split()))
    interQuartile(val, freq)
- The code defines a function called interQuartile that accepts two parameters: values (a list of integers representing data values) and freqs (a list of integers giving the frequency of each corresponding value).
- The code imports the median function from the statistics module. This function is used to calculate the median of a given list.
- Within the interQuartile function, a list s is created to hold the expanded dataset. The loop uses zip() to pair each value with its frequency and appends the value to s that many times.
- The list s is then sorted in ascending order using the sort() method. Sorting is necessary to calculate quartiles accurately.
- The lower quartile (q1) is calculated as the median of the first half of the sorted list, s[:len(s)//2].
- The upper quartile (q3) is calculated as the median of the second half of the sorted list, s[len(s)//2 * -1:]. Multiplying by -1 makes the start index negative, so the slice counts from the end of the list.
- The interquartile range (result) is computed as the difference between the upper and lower quartiles (q3 - q1).
- The interquartile range is printed with one decimal place using an f-string with the :.1f format specifier.
- The if __name__ == '__main__': block is the entry point of the script when executed directly.
- It reads an integer n from the input, the number of elements in the values list.
- It reads a space-separated list of integers val from the input, representing the data values.
- It reads another space-separated list of integers freq from the input, representing the frequencies of the corresponding values.
- It calls the interQuartile function, passing val and freq as arguments.
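The frequency expansion is the only new step compared with the first challenge, so here is a minimal sketch of it on a small input. The helper name iqr_from_freqs and the numbers are assumptions for illustration, and the helper returns the value rather than printing it so the result is easy to check.

```python
from statistics import median

def iqr_from_freqs(values, freqs):
    """Return Q3 - Q1 for data given as (value, frequency) pairs.

    Hypothetical helper for illustration: it returns the number
    instead of printing it.
    """
    # Repeat each value f times, then sort the expanded dataset.
    expanded = sorted(v for v, f in zip(values, freqs) for _ in range(f))
    half = len(expanded) // 2
    q1 = median(expanded[:half])                   # lower half
    q3 = median(expanded[len(expanded) - half:])   # upper half
    return q3 - q1

values = [6, 12, 8, 10, 20, 16]
freqs = [5, 4, 3, 2, 1, 5]   # 20 data points after expansion

print(f"{iqr_from_freqs(values, freqs):.1f}")   # 9.0
```

Here the lower half has a median of 7.0 and the upper half a median of 16.0, which gives the IQR of 9.0.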
Challenge #3
Standard Deviation
The standard deviation measures how far the values in a dataset typically lie from the mean.
Step 1: Find the mean.
Step 2: Find the variance (the average of the squared distances from the mean).
Step 3: Find the standard deviation (the square root of the variance).
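Written as formulas, the three steps look roughly like this. The symbols are assumed notation, not taken from the challenge statement: x_1 ... x_N are the data values, N is how many there are, mu is the mean and sigma is the standard deviation. Dividing by N (rather than N - 1) gives the population form, which is what the code in this post computes.

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2, \qquad
\sigma = \sqrt{\sigma^2}
```

For example, for the values 10, 40, 30, 50, 20 the mean is 30, the squared distances are 400, 100, 0, 400, 100, the variance is 1000 / 5 = 200, and the standard deviation is about 14.1.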
from statistics import mean

def stdDev(arr) -> None:
    mu = mean(arr)
    # Population variance: average of the squared deviations from the mean.
    variance = sum((el - mu)**2 for el in arr) / len(arr)
    sigma = pow(variance, 0.5)
    print(f"{sigma:.1f}")

if __name__ == '__main__':
    n = int(input().strip())
    vals = list(map(int, input().rstrip().split()))
    stdDev(vals)
- The code imports the mean function from the statistics module. This function is used to calculate the mean (average) of a given list.
- The code defines a function called stdDev that takes a single parameter arr (a list of numbers).
- Within the stdDev function, the mean (mu) of the input list arr is calculated using the mean function.
- The variance is calculated by summing the squared differences between each element (el) in the input list and the mean, then dividing by the length of the list (len(arr)). Dividing by the full length gives the population variance.
- The standard deviation (sigma) is computed as the square root of the variance, using the pow() function with exponent 0.5.
- The standard deviation is printed with one decimal place using an f-string with the :.1f format specifier.
- The if __name__ == '__main__': block is the entry point of the script when executed directly.
- It reads an integer n from the input, the number of elements in the list.
- It reads a space-separated list of integers vals from the input, representing the actual data.
- It calls the stdDev function, passing vals as an argument.
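As a quick sanity check, the same arithmetic can be compared with the standard library. This is only a sketch with made-up sample values: statistics.pstdev divides by N like the code above, while statistics.stdev divides by N - 1 and therefore gives a slightly larger number.

```python
from statistics import pstdev, stdev

vals = [10, 40, 30, 50, 20]

# Manual population standard deviation, same arithmetic as stdDev above.
mu = sum(vals) / len(vals)
manual = (sum((x - mu) ** 2 for x in vals) / len(vals)) ** 0.5

print(f"{manual:.1f}")         # 14.1
print(f"{pstdev(vals):.1f}")   # 14.1 (population form, divides by N)
print(f"{stdev(vals):.1f}")    # 15.8 (sample form, divides by N - 1)
```

If a result is off by a small amount, mixing up these two forms is a likely cause.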
“Stay tuned for my upcoming blogs!”