Day 1: 10 Days of Statistics (HackerRank)

Celestial
5 min read · Jun 11, 2023

Day 1 of this challenge series consists of three challenges covering quartiles, the interquartile range, and standard deviation.

Let’s begin with the first challenge named “Quartiles”.

Quartiles/ IQR

In statistics, quartiles are values that divide a dataset into four equal parts, each containing 25% of the data. Quartiles are used to measure the spread and distribution of a dataset and provide insights into the central tendency of the data.

The three quartiles are the points at which the sorted dataset is split:

  1. First Quartile (Q1): This is the 25th percentile of the data. It marks the point below which 25% of the data falls. It is also referred to as the lower quartile.
  2. Second Quartile (Q2): This is the 50th percentile of the data, which is equivalent to the median. It marks the point below which 50% of the data falls. In other words, half of the data values are below the second quartile.
  3. Third Quartile (Q3): This is the 75th percentile of the data. It marks the point below which 75% of the data falls. It is also referred to as the upper quartile.

Quartiles are particularly useful in understanding the distribution of skewed or non-normally distributed datasets. They help identify the central tendency and spread of the data, as well as detect outliers or extreme values.

#!/bin/python3
import math
import os
from statistics import median


# The function is expected to return an INTEGER_ARRAY.
# The function accepts INTEGER_ARRAY arr as parameter.
def quartiles(arr: list, percentiles=[0.25, 0.5, 0.75]) -> list:
    # percentiles is never used below; the quartiles are medians of halves.
    arr.sort()  # sorts the caller's list in place
    q1 = median(arr[:len(arr)//2])       # median of the lower half
    q2 = median(arr)                     # median of the whole list
    q3 = median(arr[len(arr)//2 * -1:])  # median of the upper half

    return list(map(int, [q1, q2, q3]))


if __name__ == '__main__':
    fptr = open(os.environ['OUTPUT_PATH'], 'w')
    n = int(input().strip())  # element count; not needed by quartiles()
    data = list(map(int, input().rstrip().split()))
    res = quartiles(data)
    fptr.write('\n'.join(map(str, res)))
    fptr.write('\n')
    fptr.close()
  1. The code defines a function named quartiles that takes two parameters: arr (a list of numbers) and percentiles (an optional list defaulting to [0.25, 0.5, 0.75]). The second parameter is never used in the function body.
  2. The function sorts the input list arr in ascending order using the sort() method. This ensures that the calculations of quartiles are accurate.
  3. The median() function is imported from Python's built-in statistics module. It returns the middle value of a list, or the average of the two middle values when the list has an even number of elements.
  4. The variable q1 is calculated as the median of the lower half of the sorted list, arr[:len(arr)//2]. When the list has an odd number of elements, the middle element is left out of this half. This corresponds to the lower quartile (25th percentile).
  5. The variable q2 is calculated as the median of the entire sorted list arr. This corresponds to the median (50th percentile).
  6. The variable q3 is calculated as the median of the upper half of the sorted list, arr[len(arr)//2 * -1:]. The negative start index takes the last len(arr)//2 elements, so the middle element is again left out when the length is odd. This corresponds to the upper quartile (75th percentile).
  7. The function returns the list [q1, q2, q3] after converting each value with int(), which drops any fractional part.
  8. The if __name__ == '__main__': block is the entry point of the script when executed directly.
  9. It reads an integer n from the input. This likely represents the number of elements in the list.
  10. It reads a space-separated list of integers data from the input. This represents the actual data.
  11. It calls the quartiles function, passing data as an argument, and stores the result in the variable res.
  12. It opens a file specified by the OUTPUT_PATH environment variable in write mode using open(os.environ['OUTPUT_PATH'], 'w').
  13. It writes the quartiles in the res list to the file, converting each value to a string using map(str, res), and joining them with newline characters using '\n'.join(...).
  14. It writes an additional newline character to the file using fptr.write('\n').
  15. Finally, it closes the file using fptr.close().
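
To make the median-of-halves rule concrete, here is a minimal sketch that can be run without HackerRank's input/output scaffolding. The name quartiles_sketch and both datasets are made up for illustration, and the sketch assumes the list has at least two values. Unlike the solution above, it works on a sorted copy and skips the int() conversion.

```python
from statistics import median


def quartiles_sketch(data):
    """Return [Q1, Q2, Q3] using the median-of-halves rule (hypothetical helper)."""
    ordered = sorted(data)        # sorted copy; the caller's list is left untouched
    half = len(ordered) // 2      # size of each half (middle element excluded when odd)
    q1 = median(ordered[:half])   # median of the lower half
    q2 = median(ordered)          # median of the whole dataset
    q3 = median(ordered[-half:])  # median of the upper half
    return [q1, q2, q3]


# Odd length (9 values): the middle value 12 belongs to neither half.
print(quartiles_sketch([3, 7, 8, 5, 12, 14, 21, 13, 18]))  # [6.0, 12, 16.0]

# Even length (8 values): the halves are the first four and last four values.
print(quartiles_sketch([1, 2, 3, 4, 5, 6, 7, 8]))          # [2.5, 4.5, 6.5]
```

The second call shows that quartiles are not always whole numbers, so converting them with int() is only safe when the input is known to produce integer quartiles.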

Challenge #2

IQR

The interquartile range (IQR) is the difference between the third and first quartiles (Q3 - Q1). It represents the spread of the middle 50% of the dataset and is often used as a measure of variability.
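
Because the IQR looks only at the middle 50% of the data, a single extreme value barely affects it. The sketch below illustrates this by comparing the IQR with the full range. It assumes the same median-of-halves rule used in the Quartiles challenge; iqr_sketch, value_range, and both datasets are made up for illustration.

```python
from statistics import median


def iqr_sketch(data):
    """Interquartile range Q3 - Q1 via the median-of-halves rule (hypothetical helper)."""
    ordered = sorted(data)
    half = len(ordered) // 2
    return median(ordered[-half:]) - median(ordered[:half])


def value_range(data):
    """Full range: largest value minus smallest value (hypothetical helper)."""
    return max(data) - min(data)


typical = [1, 2, 3, 4, 5, 6, 7, 8]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 100]  # same data, last value replaced by an extreme one

print(iqr_sketch(typical), value_range(typical))            # 4.0 7
print(iqr_sketch(with_outlier), value_range(with_outlier))  # 4.0 99
```

The range jumps from 7 to 99, while the IQR stays at 4.0 in both cases.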

#!/bin/python3
import math
from statistics import median


# The function accepts following parameters:
#  1. INTEGER_ARRAY values
#  2. INTEGER_ARRAY freqs
def interQuartile(values, freqs):
    # Expand the frequency table: repeat each value freq times.
    s = []
    for value, freq in zip(values, freqs):
        s += [value] * freq
    s.sort()
    q1 = median(s[: len(s)//2])     # median of the lower half
    q3 = median(s[len(s)//2 *-1:])  # median of the upper half
    result = q3 - q1
    print(f'{result:.1f}')


if __name__ == '__main__':
    n = int(input().strip())
    val = list(map(int, input().rstrip().split()))
    freq = list(map(int, input().rstrip().split()))
    interQuartile(val, freq)
  1. The code defines a function called interQuartile that accepts two parameters: values (a list of integers representing data values) and freqs (a list of integers representing the frequencies of corresponding values).
  2. The code imports the median function from the statistics module. This function is used to calculate the median of a given list.
  3. Within the interQuartile function, a list s is created to hold the expanded dataset. The loop uses zip() to pair each value with its frequency and appends that many copies of the value to s.
  4. The list s is then sorted in ascending order using the sort() method. Sorting is necessary to calculate quartiles accurately.
  5. The lower quartile (q1) is calculated as the median of the first half of the sorted list s[: len(s)//2].
  6. The upper quartile (q3) is calculated as the median of the upper half of the sorted list, s[len(s)//2 *-1:]. Multiplying by -1 makes the start index negative, so the slice takes the last len(s)//2 elements of the list.
  7. The interquartile range (result) is computed as the difference between the upper and lower quartiles (q3 - q1).
  8. The interquartile range is printed with one decimal place using an f-string with the :.1f format specifier.
  9. The if __name__ == '__main__': block is the entry point of the script when executed directly.
  10. It reads an integer n from the input, which likely represents the number of elements in the values list.
  11. It reads a space-separated list of integers val from the input, representing the data values.
  12. It reads another space-separated list of integers freq from the input, representing the frequencies of the corresponding values.
  13. It calls the interQuartile function, passing val and freq as arguments.
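
The frequency-expansion step is the only new idea in this challenge, so it may help to see it in isolation. The following is a minimal sketch under the same median-of-halves assumption; expand, iqr_from_frequencies, and the input lists are illustrative names and values, not part of the solution above.

```python
from statistics import median


def expand(values, freqs):
    """Repeat each value by its frequency and return the sorted result (hypothetical helper)."""
    expanded = []
    for value, freq in zip(values, freqs):
        expanded.extend([value] * freq)
    return sorted(expanded)


def iqr_from_frequencies(values, freqs):
    """IQR of the expanded dataset via the median-of-halves rule (hypothetical helper)."""
    data = expand(values, freqs)
    half = len(data) // 2
    return median(data[-half:]) - median(data[:half])


values = [6, 12, 8, 10, 20, 16]
freqs = [5, 4, 3, 2, 1, 5]

print(len(expand(values, freqs)))                    # 20 (the sum of the frequencies)
print(f"{iqr_from_frequencies(values, freqs):.1f}")  # 9.0
```

Here the lower ten values have a median of 7.0 and the upper ten have a median of 16.0, which gives the IQR of 9.0.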

Challenge #3

Standard Deviation

Standard deviation measures how far the values in a dataset typically lie from the mean.

Step 1: Find the mean.

Step 2: Find the variance (the average of the squared distances from the mean).

Step 3: Find the standard deviation (the square root of the variance).

import math
from statistics import mean


def stdDev(arr) -> None:
    mu = mean(arr)
    variance = sum((el - mu) ** 2 for el in arr) / len(arr)
    sigma = pow(variance, 0.5)
    print(f"{sigma:.1f}")


if __name__ == '__main__':
    n = int(input().strip())
    vals = list(map(int, input().rstrip().split()))
    stdDev(vals)
  1. The code imports the mean function from the statistics module. This function is used to calculate the mean (average) of a given list.
  2. The code defines a function called stdDev that takes in a single parameter arr (a list of numbers).
  3. Within the stdDev function, the mean (mu) of the input list arr is calculated using the mean function.
  4. The variance is calculated by summing the squared differences between each element (el) and the mean, then dividing that sum by the length of the list (len(arr)). Dividing by len(arr) gives the population variance; the sample variance would divide by len(arr) - 1 instead.
  5. The standard deviation (sigma) is computed as the square root of the variance, using the pow() function with exponent 0.5.
  6. The standard deviation is printed with one decimal place using an f-string with the :.1f format specifier.
  7. The if __name__ == '__main__': block is the entry point of the script when executed directly.
  8. It reads an integer n from the input, which likely represents the number of elements in the list.
  9. It reads a space-separated list of integers vals from the input, representing the actual data.
  10. It calls the stdDev function, passing vals as an argument.
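
The three steps can also be traced by hand on a small dataset. This is a minimal sketch with made-up values; it uses math.sqrt instead of pow() and cross-checks the result against statistics.pstdev, the standard library's population standard deviation.

```python
import math
from statistics import mean, pstdev

data = [10, 40, 30, 50, 20]                   # illustrative values

mu = mean(data)                               # Step 1: mean = 30
squared_devs = [(x - mu) ** 2 for x in data]  # [400, 100, 0, 400, 100]
variance = sum(squared_devs) / len(data)      # Step 2: 1000 / 5 = 200.0
sigma = math.sqrt(variance)                   # Step 3: square root of the variance

print(f"{sigma:.1f}")                         # 14.1

# Cross-check against the standard library's population standard deviation.
assert math.isclose(sigma, pstdev(data))
```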

“Stay tuned for my upcoming blogs!”
