Central Tendency and Variation

I have two data sets consisting of 7 observations each:

Set#1: 10, 2, 3, 2, 4, 2, 5

Set#2: 20, 12, 13, 12, 14, 12, 15

For each, I will compute measures of central tendency and variation.

These measures are fundamental statistical concepts that I need to master on my journey to understand advanced analytics.

For my sample datasets, I will demonstrate how to obtain those measures discussed above (I will not include the coefficient of variation at this time).

First, I will convert those datasets into vectors: ‘set1’ and ‘set2’

> set1 <- c(10, 2, 3, 2, 4, 2, 5)

> set2 <- c(20, 12, 13, 12, 14, 12, 15)

> mean(set1)

[1] 4

> mean(set2)

[1] 14

> median(set1)

[1] 3

> median(set2)

[1] 13

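R's built-in mode() reports how an object is stored ("numeric" here), not its most frequent value, so I tabulate the values and pick the most frequent one instead:

> as.numeric(names(which.max(table(set1))))

[1] 2

> as.numeric(names(which.max(table(set2))))

[1] 12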
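Base R has no built-in function for the statistical mode; `mode()` returns the storage type instead. Below is a minimal sketch of a helper that finds the most frequent value. The name `stat_mode` is my own (hypothetical, not part of R), and it returns every value tied for the highest count.

```r
# Hypothetical helper (not part of base R): most frequent value(s) of a vector.
stat_mode <- function(x) {
  counts <- table(x)  # frequency of each distinct value
  # Keep every value whose count equals the highest count (handles ties).
  top <- names(counts)[as.vector(counts) == max(counts)]
  as.numeric(top)
}

set1 <- c(10, 2, 3, 2, 4, 2, 5)
set2 <- c(20, 12, 13, 12, 14, 12, 15)

stat_mode(set1)  # 2
stat_mode(set2)  # 12
```

Returning all tied values is a design choice; a data set with two equally frequent values would yield a vector of length two.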

> summary(set1)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    2.0     2.0     3.0     4.0     4.5    10.0

> summary(set2)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   12.0    12.0    13.0    14.0    14.5    20.0

> range(set1)

[1] 2 10

> range(set2)

[1] 12 20

> IQR(set1)

[1] 2.5

> IQR(set2)

[1] 2.5
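The IQR is the third quartile minus the first quartile. As a small sketch, the same number can be rebuilt from `quantile()`, which by default uses the same interpolation rule (type 7) as `IQR()`:

```r
set1 <- c(10, 2, 3, 2, 4, 2, 5)

# quantile() uses interpolation type 7 by default, the same rule IQR() uses.
q <- quantile(set1, probs = c(0.25, 0.75))
q                               # 25% is 2, 75% is 4.5
iqr_manual <- unname(q[2] - q[1])
iqr_manual                      # 2.5, matches IQR(set1)
```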

> var(set1)

[1] 8.333333

> var(set2)

[1] 8.333333

> sd(set1)

[1] 2.886751

> sd(set2)

[1] 2.886751
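To see where 8.333333 and 2.886751 come from, here is a sketch that rebuilds the sample variance and standard deviation of set1 by hand. The variable names are my own; the only assumption is that `var()` and `sd()` divide by n - 1, which is how R defines them.

```r
set1 <- c(10, 2, 3, 2, 4, 2, 5)

n <- length(set1)              # 7 observations
dev <- set1 - mean(set1)       # deviations from the mean: 6 -2 -1 -2 0 -2 1
ss <- sum(dev^2)               # sum of squared deviations: 50
manual_var <- ss / (n - 1)     # sample variance: 50 / 6
manual_sd <- sqrt(manual_var)  # sample standard deviation

all.equal(manual_var, var(set1))  # TRUE
all.equal(manual_sd, sd(set1))    # TRUE
```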

My two cents:

Both data sets have exactly the same standard deviation (about 2.89), variance, and IQR, so their spread is identical. That is no coincidence: every value in Set#2 is the matching value in Set#1 plus 10, and adding a constant shifts the data without changing how far the points sit from their mean. The mean, the median, and the other location statistics (minimum, quartiles, maximum) are all 10 higher in Set#2.

The standard deviation measures the dispersion of data points around the mean, so it says nothing about where the data are located. The two sets therefore have the same variability but a different central tendency.
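The equal spread can be checked directly: each value in Set#2 equals the corresponding value in Set#1 plus 10, and adding a constant moves the center without changing the standard deviation. A quick sketch of that check:

```r
set1 <- c(10, 2, 3, 2, 4, 2, 5)
set2 <- c(20, 12, 13, 12, 14, 12, 15)

all(set2 == set1 + 10)                 # TRUE: set2 is set1 shifted by 10
mean(set2) - mean(set1)                # 10: the center moves with the shift
isTRUE(all.equal(sd(set2), sd(set1)))  # TRUE: the spread does not
```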

In a normal distribution, the mean, median, and standard deviation are related in a specific way:

  1. The mean and median are equal.
  2. The standard deviation determines the spread or width of the distribution.

In this case, the mean exceeds the median in both sets (4 vs. 3 and 14 vs. 13), pulled up by a single large value (10 and 20). That gap hints at right skew, which does not match the symmetry of a normal distribution. And because the two sets have different means, they could not come from one normal distribution with the same parameters, even though their standard deviations agree.

To assess whether they closely follow a normal distribution, I would typically use graphical methods like histograms and Q-Q plots, along with formal statistical tests. With only 7 observations per set any such check is weak, but the gap between mean and median suggests the data are unlikely to be normally distributed.
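As a rough sketch of those checks in base R, the snippet below draws a histogram and a normal Q-Q plot for set1 and runs a Shapiro-Wilk test on both sets. Choosing Shapiro-Wilk as the formal test is my assumption; other tests would serve as well.

```r
set1 <- c(10, 2, 3, 2, 4, 2, 5)
set2 <- c(20, 12, 13, 12, 14, 12, 15)

# Graphical checks: a histogram and a normal Q-Q plot with a reference line.
hist(set1, main = "Histogram of set1")
qqnorm(set1, main = "Normal Q-Q plot of set1")
qqline(set1)

# Formal check: Shapiro-Wilk test (null hypothesis: the data are normal).
sw1 <- shapiro.test(set1)
sw2 <- shapiro.test(set2)
sw1$p.value  # a small p-value is evidence against normality
sw2$p.value
```

With only 7 points per set, the plots are coarse and the test has little power, so I would treat the results as a hint rather than a verdict.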
