Hello, I have enjoyed the assignment regarding ANOVA.
In simple terms, ANOVA (Analysis of Variance) is a statistical method for testing whether the means (average values) of three or more groups or categories differ significantly. It helps us judge whether the observed differences reflect real differences between the groups or could plausibly result from random chance.
Here are two exercises where I have used ANOVA.
A researcher is interested in the effects of a drug on stress reactions. She gives a reaction time test to three different groups of subjects: one group that is under a great deal of stress, one group under a moderate amount of stress, and a third group that is under almost no stress. The subjects of the study were instructed to take the drug test during their next stress episode and to report their stress on a scale of 1 to 10 (10 being the most pain).
Report on drug and stress level using R. Provide a full summary report of the ANOVA results and explain what they mean. More specifically, report the following values from the R output: Df, Sum Sq, Mean Sq, F value, Pr(>F).
# Create a data frame with the given data
data <- data.frame(
  Stress_Level = factor(rep(c("High Stress", "Moderate Stress", "Low Stress"), each = 6)),
  Reaction_Time = c(10, 9, 8, 9, 10, 8, 8, 10, 6, 7, 8, 8, 4, 6, 6, 4, 2, 2)
)
# The stats package provides aov() and is attached by default, so this call is optional
library(stats)
# Perform the ANOVA
anova_result <- aov(Reaction_Time ~ Stress_Level, data = data)
# Summarize the ANOVA results
summary(anova_result)
##              Df Sum Sq Mean Sq F value   Pr(>F)
## Stress_Level  2  82.11   41.06   21.36 4.08e-05 ***
## Residuals    15  28.83    1.92
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
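The individual columns of this table can also be pulled out of the summary object, which is handy when quoting them in a report. This is a minimal sketch using the same data as above; the names `stress_data`, `fit`, and `tab` are introduced here only for illustration.

```r
# Same data as in the exercise, rebuilt so the snippet runs on its own
stress_data <- data.frame(
  Stress_Level = factor(rep(c("High Stress", "Moderate Stress", "Low Stress"), each = 6)),
  Reaction_Time = c(10, 9, 8, 9, 10, 8, 8, 10, 6, 7, 8, 8, 4, 6, 6, 4, 2, 2)
)

fit <- aov(Reaction_Time ~ Stress_Level, data = stress_data)

# summary() on an aov fit returns a list; its first element is the ANOVA table
tab <- summary(fit)[[1]]

# Each reported quantity is a column of that table (row 1 = Stress_Level, row 2 = Residuals)
tab[["Df"]]
tab[["Sum Sq"]]
tab[["Mean Sq"]]
tab[["F value"]][1]
tab[["Pr(>F)"]][1]
```

Pulling the values from the object rather than retyping them avoids transcription errors in the written summary.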
Degrees of Freedom (Df): The table reports two Df values. Stress_Level has 2 degrees of freedom (3 groups minus 1), and the residuals have 15 (18 observations minus 3 groups).
Sum of Squares (Sum Sq): This represents the variation in the data. In this case, 82.11 units of variation can be attributed to differences between the stress levels, while 28.83 units of variation within the groups remain unexplained.
Mean Square (Mean Sq): This is the sum of squares divided by its degrees of freedom. Here 82.11 / 2 = 41.06 measures the variation between stress levels, and 28.83 / 15 = 1.92 measures the unexplained variation within the groups.
F-Value: This is the ratio of the Stress_Level mean square to the residual mean square (41.06 / 1.92, which gives 21.36 when computed from the unrounded values). A higher F-value means the variation between groups is large relative to the variation within groups.
p-value (Pr(>F)): This is the probability of obtaining an F-value at least this large if all group means were truly equal (the null hypothesis).
The notation “4.08e-05” is in scientific notation and means 4.08 multiplied by 10 to the power of -5 (0.0000408), which is a very small number close to zero.
***: Three asterisks indicate a very high level of significance. According to the significance codes printed under the table, they mean that the p-value is less than 0.001, and “4.08e-05” falls into this category.
So, in this ANOVA test, a p-value of “4.08e-05 ***” means that the probability of observing the results obtained (or more extreme results) under the assumption that there is no real difference between the groups (null hypothesis) is extremely low. This suggests strong evidence against the null hypothesis and indicates that there is likely a significant difference among the groups being compared in the above analysis.
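To see where the numbers in the table come from, the quantities can be recomputed by hand. This is a sketch, not part of the original assignment; the variable names (`reaction`, `stress`, `ss_between`, and so on) are chosen here for illustration.

```r
# Data from the exercise
reaction <- c(10, 9, 8, 9, 10, 8, 8, 10, 6, 7, 8, 8, 4, 6, 6, 4, 2, 2)
stress <- factor(rep(c("High Stress", "Moderate Stress", "Low Stress"), each = 6))

k <- nlevels(stress)    # number of groups
n <- length(reaction)   # number of observations

group_means <- tapply(reaction, stress, mean)
group_sizes <- tapply(reaction, stress, length)
grand_mean <- mean(reaction)

# Sum Sq: variation between group means, and variation left within groups
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)
ss_within <- sum((reaction - ave(reaction, stress))^2)  # ave() returns each row's group mean

# Df: groups minus 1, and observations minus groups
df_between <- k - 1
df_within <- n - k

# Mean Sq = Sum Sq / Df, and F is the ratio of the two mean squares
ms_between <- ss_between / df_between
ms_within <- ss_within / df_within
f_value <- ms_between / ms_within

# Pr(>F): upper tail of the F distribution with (df_between, df_within) degrees of freedom
p_value <- pf(f_value, df_between, df_within, lower.tail = FALSE)

round(c(ss_between, ss_within, ms_between, ms_within, f_value), 2)
p_value
```

The results should agree with the `summary()` output up to rounding.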
Conclusion:
In simple terms, the ANOVA test shows a statistically significant difference in reaction times between the three groups of subjects under different stress levels. In other words, the response measured under the drug varies significantly depending on the stress level. The “F value” and “p-value” together suggest that the differences in reaction times are unlikely to be due to random chance. Note that ANOVA only tells us that at least one group mean differs from the others; it does not say which groups differ.
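A significant ANOVA does not identify which pairs of groups differ. One common follow-up, not requested in the assignment and shown here only as a sketch, is Tukey's honest significant difference test via `TukeyHSD()` from the stats package. The names `stress_data`, `fit`, and `tukey` are illustrative.

```r
# Same data as in the exercise, rebuilt so the snippet runs on its own
stress_data <- data.frame(
  Stress_Level = factor(rep(c("High Stress", "Moderate Stress", "Low Stress"), each = 6)),
  Reaction_Time = c(10, 9, 8, 9, 10, 8, 8, 10, 6, 7, 8, 8, 4, 6, 6, 4, 2, 2)
)

fit <- aov(Reaction_Time ~ Stress_Level, data = stress_data)

# Pairwise comparisons of all group means with a family-wise 95% confidence level
tukey <- TukeyHSD(fit, conf.level = 0.95)
print(tukey)

# The comparison table has one row per pair: difference, interval bounds, adjusted p-value
tukey$Stress_Level
```

Each row compares two stress levels; an adjusted p-value below 0.05 suggests that pair of means differs.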
########################################################################/
########################################################################/
# 2. From our textbook, Introductory Statistics with R, Chapter 7, Exercise 7.1
# The zelazo data (taken from the textbook's R package, ISwR) are in the form of a list of vectors, one for each of the four groups.
# Convert the data to a form suitable for the use of lm, and calculate the relevant test. Consider t tests comparing selected subgroups
# or obtained by combining groups.
# 2.1 Consider an ANOVA test (one-way or two-way) for this dataset (zelazo)
# install.packages("ISwR")
library(ISwR)
## Warning: package ‘ISwR’ was built under R version 4.2.3
data("zelazo")
zelazo
## $active
## [1] 9.00 9.50 9.75 10.00 13.00 9.50
##
## $passive
## [1] 11.00 10.00 10.00 11.75 10.50 15.00
##
## $none
## [1] 11.50 12.00 9.00 11.50 13.25 13.00
##
## $ctr.8w
## [1] 13.25 11.50 12.00 13.50 11.50
# Combine data and create a data frame
zelazo_data <- data.frame(
  Group = rep(c("Active", "Passive", "None", "Ctr.8w"),
              times = c(length(zelazo$active), length(zelazo$passive),
                        length(zelazo$none), length(zelazo$ctr.8w))),
  Value = c(zelazo$active, zelazo$passive, zelazo$none, zelazo$ctr.8w)
)
# Fit a linear model
ze_model <- lm(Value ~ Group, data = zelazo_data)
# Calculate t-test between Active and Passive groups
t_test_active_passive <- t.test(zelazo_data$Value[zelazo_data$Group == "Active"],
                                zelazo_data$Value[zelazo_data$Group == "Passive"])
# Print the t-test result
print(t_test_active_passive)
##
##  Welch Two Sample t-test
##
## data:  zelazo_data$Value[zelazo_data$Group == "Active"] and zelazo_data$Value[zelazo_data$Group == "Passive"]
## t = -1.2839, df = 9.3497, p-value = 0.2301
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.4399668  0.9399668
## sample estimates:
## mean of x mean of y
##    10.125    11.375
# Perform one-way ANOVA
anova_result <- aov(Value ~ Group, data = zelazo_data)
# Summarize the ANOVA results
summary(anova_result)
##             Df Sum Sq Mean Sq F value Pr(>F)
## Group        3  14.78   4.926   2.142  0.129
## Residuals   19  43.69   2.299
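The exercise asks for a form suitable for `lm`, and the same F test can be read from the linear model with `anova()`. The sketch below types the data in from the printed `zelazo` output so it runs without the ISwR package, and uses `stack()` as an alternative way to reshape the list; the names `zelazo_list`, `walk`, `fit`, and `tab` are illustrative.

```r
# Values copied from the printed zelazo list (normally loaded with data("zelazo"))
zelazo_list <- list(
  active  = c(9.00, 9.50, 9.75, 10.00, 13.00, 9.50),
  passive = c(11.00, 10.00, 10.00, 11.75, 10.50, 15.00),
  none    = c(11.50, 12.00, 9.00, 11.50, 13.25, 13.00),
  ctr.8w  = c(13.25, 11.50, 12.00, 13.50, 11.50)
)

# stack() turns a named list of vectors into a data frame
# with columns `values` (numeric) and `ind` (group factor)
walk <- stack(zelazo_list)

# One-way ANOVA expressed as a linear model
fit <- lm(values ~ ind, data = walk)
tab <- anova(fit)
tab
```

With a p-value of about 0.129, this test does not give evidence at the 5% level that the four group means differ.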
To compare more groups, just repeat the t-test with another pair of groups.
# Calculate t-test between Active and None groups
t_test_active_none <- t.test(zelazo_data$Value[zelazo_data$Group == "Active"],
                             zelazo_data$Value[zelazo_data$Group == "None"])
# Print the t-test result
print(t_test_active_none)
##
##  Welch Two Sample t-test
##
## data:  zelazo_data$Value[zelazo_data$Group == "Active"] and zelazo_data$Value[zelazo_data$Group == "None"]
## t = -1.8481, df = 9.9759, p-value = 0.09442
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.4929271  0.3262604
## sample estimates:
## mean of x mean of y
##  10.12500  11.70833
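Repeating t-tests one pair at a time inflates the chance of a false positive, and the exercise also mentions combining groups. The sketch below shows one possible way to handle both: `pairwise.t.test()` with a Holm adjustment for all pairs, and a single t-test of the active group against the three other groups pooled. The data are typed in from the printed `zelazo` output; `zelazo_list`, `walk`, `pairs_holm`, `trained`, and `combined` are illustrative names, and the choice of Holm adjustment is an assumption rather than something the textbook prescribes.

```r
# Values copied from the printed zelazo list (normally loaded with data("zelazo"))
zelazo_list <- list(
  active  = c(9.00, 9.50, 9.75, 10.00, 13.00, 9.50),
  passive = c(11.00, 10.00, 10.00, 11.75, 10.50, 15.00),
  none    = c(11.50, 12.00, 9.00, 11.50, 13.25, 13.00),
  ctr.8w  = c(13.25, 11.50, 12.00, 13.50, 11.50)
)
walk <- stack(zelazo_list)  # columns: values, ind

# All pairwise comparisons at once, with p-values adjusted for multiple testing
pairs_holm <- pairwise.t.test(walk$values, walk$ind, p.adjust.method = "holm")
pairs_holm

# Combining groups: active training versus everyone else (Welch t-test)
walk$trained <- walk$ind == "active"
combined <- t.test(values ~ trained, data = walk)
combined
```

Note that `pairwise.t.test()` uses a pooled standard deviation by default, so its unadjusted results differ slightly from the Welch tests above.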