Skewness and Kurtosis

Jul 6, 2023

Skewness and kurtosis are fundamental components of descriptive statistics that play a crucial role in understanding the distributional characteristics of data.

While they may appear complex at first, a little exploration is enough to get a solid grasp of how to calculate skewness and kurtosis. Before we delve into these measures, let’s establish a foundation with some basic concepts.

NORMAL DISTRIBUTION

A continuous probability distribution; the standard normal has mean = 0 and standard deviation = 1
Symmetric about the mean
Bell shaped
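
As a rough illustration of these properties, the sketch below draws samples from a standard normal distribution using only the Python standard library and checks that the mean and median nearly coincide and that the standard deviation is close to 1. The sample size and seed are arbitrary choices made for this example.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable (arbitrary choice)

# Draw samples from a standard normal distribution (mean 0, standard deviation 1).
samples = [random.gauss(0.0, 1.0) for _ in range(100_000)]

sample_mean = statistics.fmean(samples)
sample_median = statistics.median(samples)
sample_std = statistics.pstdev(samples)

# For a symmetric distribution the mean and the median should nearly coincide.
print(f"mean={sample_mean:.3f}, median={sample_median:.3f}, std={sample_std:.3f}")
```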

Skewness

A measure of the asymmetry (lack of symmetry) of a distribution
Skewness for a normal distribution is zero
In a skewed dataset, the median typically sits off-centre between the first quartile and third quartile
Skewness comes into the picture when the data is asymmetric
Types of skewness

Positively skewed (long right tail): typically Mean > Median > Mode
Negatively skewed (long left tail): typically Mean < Median < Mode
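
To make the ordering concrete, here is a small sketch on a made-up sample. The `values` list and the `skew_direction` helper are hypothetical and exist only for this illustration; comparing mean and median is a quick heuristic, not a formal test.

```python
import statistics

# Hypothetical right-skewed sample: most values are small, a few are large.
values = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]

mean = statistics.mean(values)
median = statistics.median(values)
mode = statistics.mode(values)
print(f"mean={mean}, median={median}, mode={mode}")

def skew_direction(data):
    """Guess the direction of skew by comparing the mean with the median."""
    m, med = statistics.mean(data), statistics.median(data)
    if m > med:
        return "positive"
    if m < med:
        return "negative"
    return "symmetric"

# Mirroring the sample around zero flips the direction of the skew.
mirrored = [-v for v in values]
print(skew_direction(values), skew_direction(mirrored))
```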

Pearson’s second coefficient of skewness (median skewness)

Skewness = 3 * (mean - median) / standard deviation
This coefficient always lies between -3 and 3
Between -1 and -0.5 (negatively skewed) or between 0.5 and 1 (positively skewed): slightly skewed
Between -0.5 and 0.5: the data are nearly symmetrical
0 for a symmetric distribution such as the normal
Lower than -1 or greater than 1: extremely skewed
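
The formula and the rule of thumb above can be sketched in a few lines of standard-library Python. The function names and the sample data are made up for this example, and using the sample standard deviation (`statistics.stdev`) rather than the population one is an assumption.

```python
import statistics

def pearson_median_skewness(data):
    """3 * (mean - median) / standard deviation, as in the formula above."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    std = statistics.stdev(data)  # sample standard deviation (assumption)
    return 3 * (mean - median) / std

def describe_skew(value):
    """Map a skewness value onto the rule-of-thumb labels used in the text."""
    if value < -1 or value > 1:
        return "extremely skewed"
    if value < -0.5 or value > 0.5:
        return "slightly skewed"
    return "nearly symmetrical"

right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]  # hypothetical data
symmetric = [1, 2, 3, 4, 5]

for sample in (right_skewed, symmetric):
    coefficient = pearson_median_skewness(sample)
    print(f"{coefficient:.3f} -> {describe_skew(coefficient)}")
```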

Kurtosis

Measures the tailedness of a distribution
Often described as the degree to which the data values are concentrated around the mean
Three types of kurtosis

Leptokurtic or heavy-tailed distribution (kurtosis more than the normal distribution). Kurtosis > 3
Mesokurtic (kurtosis same as the normal distribution). Kurtosis = 3
Platykurtic or short-tailed distribution (kurtosis less than the normal distribution). Kurtosis < 3
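
A minimal sketch of how these three labels could be assigned, computing kurtosis as the fourth standardized moment so that a normal distribution gives 3. The function names, the `tolerance` band around 3, and both samples are assumptions made for illustration.

```python
import statistics

def pearson_kurtosis(data):
    """Fourth standardized moment, m4 / m2**2 (a normal distribution gives 3)."""
    n = len(data)
    mean = statistics.fmean(data)
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    return m4 / m2 ** 2

def kurtosis_type(k, tolerance=0.1):
    """Label a kurtosis value; `tolerance` is an arbitrary band around 3."""
    if k > 3 + tolerance:
        return "leptokurtic"
    if k < 3 - tolerance:
        return "platykurtic"
    return "mesokurtic"

flat = list(range(1, 101))     # evenly spread values, very light tails
spiky = [0] * 98 + [-10, 10]   # mostly identical values plus two extremes

print(kurtosis_type(pearson_kurtosis(flat)))
print(kurtosis_type(pearson_kurtosis(spiky)))
```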

📌 When data is skewed, values in the tail can behave like outliers for a statistical model, and outliers can severely degrade a model’s performance, especially for regression-based models.

How to check kurtosis and skewness in a dataset?

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns 
from scipy.stats import skew
from scipy.stats import kurtosis

df = pd.read_csv('/kaggle/input/laptop-price-dataset/laptop_data.csv')
df.head()
df['Price'].describe()

Here we can see that the mean (59870) is greater than the median (52054.5).
The maximum is about 3.5 times the 75th percentile, so the distribution is positively skewed.
Positively skewed: Mean > Median > Mode
We can say that most of the prices are below the average.

plt.figure(figsize=(12,6))
sns.histplot(df['Price'], kde=True, color="r")  # histplot with kde=True replaces the deprecated distplot
plt.show()

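price = df['Price'].dropna()  # drop missing values so scipy does not return NaN
print("Skew (scipy): %f" % skew(price))  # scipy skewness (biased estimator by default)
print("Skew of raw data: %f" % price.skew())  # pandas skewness (bias-corrected)
print("Kurtosis (fisher=False, normal = 3): %f" % kurtosis(price, fisher=False))  # Pearson definition
print("Kurtosis (fisher=True, normal = 0): %f" % kurtosis(price, fisher=True))  # excess kurtosis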

Here, the skewness of the raw data is positive and greater than 1, and the kurtosis with fisher=False is greater than 3. So the data is positively skewed, with a long right tail, and leptokurtic.

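The fisher argument of scipy’s kurtosis selects the definition being used. With fisher=True (the default), scipy uses Fisher’s definition, which subtracts 3 from the result so that a normal distribution has a kurtosis of 0; this is also called excess kurtosis. With fisher=False, it uses Pearson’s definition, under which a normal distribution has a kurtosis of 3. Correcting for bias when we don’t have much data is a separate setting, controlled by the bias argument.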
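
A quick sketch of what the `fisher` flag of `scipy.stats.kurtosis` does, run on a small made-up sample rather than the laptop dataset: the two settings differ by exactly 3, and small-sample bias correction is requested separately through `bias=False`.

```python
from scipy.stats import kurtosis

data = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]  # hypothetical sample

pearson = kurtosis(data, fisher=False)  # Pearson definition: normal = 3
excess = kurtosis(data, fisher=True)    # Fisher definition: normal = 0

# Bias correction for small samples is a different switch from `fisher`.
excess_corrected = kurtosis(data, fisher=True, bias=False)

print("Pearson: %f" % pearson)
print("Fisher (excess): %f" % excess)
print("Difference: %f" % (pearson - excess))
print("Fisher, bias-corrected: %f" % excess_corrected)
```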

Stay connected for upcoming blog articles! Follow me to be the first to know when they’re released.

Written by Celestial