Chapter 5 Design of Experiments

5.1 Introduction to experimental design

An experimental design is a plan and a structure to test hypotheses in which the researcher either controls or manipulates one or more variables.
It contains independent and dependent variables.
In an experimental design, an independent variable may be either a treatment variable or a classification variable.
A treatment variable is a variable the experimenter controls or modifies in the experiment.
A classification variable is some characteristic of the experimental subjects that was present prior to the experiment and is not a result of the experimenter’s manipulations or control.
Independent variables are sometimes referred to as factors.
Each independent variable has two or more levels, or classifications.
Levels, or classifications, of independent variables are the subcategories of the independent variables used by the researcher in the experimental design.
The other type of variable in an experimental design is a dependent variable.
A dependent variable is the response to the different levels of the independent variables.
It is the measurement taken under the conditions of the experimental design that reflects the effects of the independent variable(s).
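
As a small illustration of these terms, the sketch below stores an independent variable as an R factor and a dependent variable as a numeric vector. The machine labels and output values are hypothetical and serve only to show how factors and levels map onto R objects.

```r
# Hypothetical independent variable (factor) with three levels
machine <- factor(c("A", "B", "C", "A", "B", "C"))

# Hypothetical dependent variable: one measurement per observation
output <- c(10.2, 11.5, 9.8, 10.4, 11.1, 9.9)

# The levels (classifications) of the independent variable
levels(machine)

# Number of levels
nlevels(machine)

# Mean of the dependent variable at each level of the factor
tapply(output, machine, mean)
```

Here `machine` plays the role of a treatment variable if the experimenter decides which machine is used, and of a classification variable if the machine is a pre-existing characteristic of each unit.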

5.2 Analysis of Variance (ANOVA)

The experimental designs in this chapter are analysed statistically by a group of techniques referred to as analysis of variance (ANOVA).
The analysis of variance concept begins with the notion that the individual items being studied, such as employees, machine-produced products, district offices, hospitals and so on, are not all the same.
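
One common way to formalise this idea, sketched here for a single factor, is to split the total variation of the observations into a part between treatment levels and a part within them. The symbols are assumptions of this sketch rather than notation from the chapter: \(x_{ij}\) is observation \(i\) at treatment level \(j\), \(n_j\) is the number of observations at level \(j\), \(\bar{x}_j\) is the mean at level \(j\), \(\bar{x}\) is the overall mean, \(C\) is the number of levels and \(N\) is the total number of observations.

```latex
\[
\underbrace{\sum_{j=1}^{C}\sum_{i=1}^{n_j} (x_{ij}-\bar{x})^2}_{SS_{\text{total}}}
=
\underbrace{\sum_{j=1}^{C} n_j\,(\bar{x}_j-\bar{x})^2}_{SS_{\text{between}}}
+
\underbrace{\sum_{j=1}^{C}\sum_{i=1}^{n_j} (x_{ij}-\bar{x}_j)^2}_{SS_{\text{within}}}
\]

\[
F = \frac{SS_{\text{between}}/(C-1)}{SS_{\text{within}}/(N-C)}
\]
```

A large value of \(F\) indicates that the variation between treatment levels is large relative to the variation within them.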

5.3 The completely randomized design (one-way ANOVA)

One of the simplest experimental designs is the completely randomized design.
In the completely randomized design, subjects are assigned randomly to treatments.
The completely randomized design contains only one independent variable, with two or more treatment levels, or classifications.
If only two treatment levels, or classifications, of the independent variable are present, the design is the same one used to test the difference in means of two independent populations presented in Chapter 3.
In this section we will focus on completely randomized designs with three or more classification levels.
Analysis of variance, or ANOVA, will be used to analyse the data that result from the treatments.
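
The random assignment of subjects to treatments can be sketched in base R with `sample()`. The subject labels, the three treatment names and the group size of four are hypothetical, and the seed is fixed only so that the sketch is reproducible.

```r
# Fixed seed (an assumption of this sketch) so the assignment can be reproduced
set.seed(42)

# Hypothetical subjects and treatment levels: 12 subjects, 3 treatments, 4 each
subjects   <- paste0("S", 1:12)
treatments <- rep(c("A", "B", "C"), each = 4)

# Shuffle the treatment labels so each subject receives one at random
assignment <- data.frame(
  subject   = subjects,
  treatment = factor(sample(treatments))
)

assignment

# Check that every treatment level received the same number of subjects
table(assignment$treatment)
```

Shuffling a fixed vector of labels keeps the group sizes under the experimenter's control while leaving the allocation of individual subjects to chance.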

Example

As an example of a completely randomized design, suppose a researcher decided to analyse the response time of four autonomous mobile robots.

What is the independent variable in this design? The autonomous mobile robot.

The four autonomous mobile robots are the treatment levels, or classifications, of the independent variable.

What is the dependent variable in this design? The response time.

Is there a significant difference in the mean response time of 24 tasks carried out by the four autonomous mobile robots?

library(tidyverse)

data <- data.frame(
  Robot = as.factor(c(rep(1,5), rep(2,8), rep(3,7), rep(4,4) )), 
  Reponse_time = c(6.33, 6.26, 6.31, 6.29, 6.40, 6.26,
                   6.36, 6.23, 6.27, 6.19, 6.50, 6.19,
                   6.22, 6.44, 6.38, 6.58, 6.54, 6.56,
                   6.34, 6.58, 6.29, 6.23, 6.19, 6.21 ) )

data

##    Robot Reponse_time
## 1      1         6.33
## 2      1         6.26
## 3      1         6.31
## 4      1         6.29
## 5      1         6.40
## 6      2         6.26
## 7      2         6.36
## 8      2         6.23
## 9      2         6.27
## 10     2         6.19
## 11     2         6.50
## 12     2         6.19
## 13     2         6.22
## 14     3         6.44
## 15     3         6.38
## 16     3         6.58
## 17     3         6.54
## 18     3         6.56
## 19     3         6.34
## 20     3         6.58
## 21     4         6.29
## 22     4         6.23
## 23     4         6.19
## 24     4         6.21

# Count the number of tasks observed for each robot
data %>%
    group_by(Robot) %>%
    summarise(
        count_robots = n())

## # A tibble: 4 x 2
##   Robot count_robots
##   <fct>        <int>
## 1 1                5
## 2 2                8
## 3 3                7
## 4 4                4

Compute the mean and standard deviation for each level of the treatment.

# You can check the levels of Robot with the
# following code. You should see four character
# values because we converted Robot to a factor variable

levels(data$Robot)

## [1] "1" "2" "3" "4"

data %>%
    group_by(Robot) %>%
    summarise(
        count_robots = n(),
        mean_time = mean(Reponse_time, na.rm = TRUE),
        sd_time = sd(Reponse_time, na.rm = TRUE)
    )

## # A tibble: 4 x 4
##   Robot count_robots mean_time sd_time
##   <fct>        <int>     <dbl>   <dbl>
## 1 1                5      6.32  0.0526
## 2 2                8      6.28  0.105 
## 3 3                7      6.49  0.101 
## 4 4                4      6.23  0.0432

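Graphically check whether the distributions of response time differ between the robots.

# Boxplot of response time per robot, with the raw observations overlaid
p <- ggplot(data, aes(x = Robot, y = Reponse_time, fill = Robot)) +
    geom_boxplot() +
    geom_point()

# Overall mean response time across all 24 tasks
overall_mean <- mean(data$Reponse_time)

# Add the overall mean as a horizontal reference line
p + geom_hline(yintercept = overall_mean)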

Use a one-way ANOVA test to determine whether there is a significant difference in the mean response time of the four autonomous mobile robots.

The basic syntax for an ANOVA test is

aov(formula, data)

Arguments:
- formula: The equation you want to estimate
- data: The dataset used

The syntax of the formula is:

y ~ X1 + X2 + ... + Xn

Here y is the dependent variable and X1 + X2 + ... + Xn refers to the independent variables.
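
As a brief sketch of the formula syntax, a formula is an ordinary R object that can be created and inspected before any data are involved. The variable names below are the column names used in this chapter's robot example.

```r
# A formula with one dependent variable (left) and one independent variable (right)
f <- Reponse_time ~ Robot

# Formulas are objects of class "formula"
class(f)

# Names of the variables that appear in the formula
all.vars(f)
```

Because the formula only names the variables, `aov()` also needs the `data` argument to know where to find them.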

Write the hypothesis to be tested
Use the p-value to reach a statistical conclusion
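
For the robot example, the hypotheses can be written as follows. The notation is an assumption of this sketch: \(\mu_j\) denotes the population mean response time of robot \(j\), and \(\alpha\) is the chosen significance level, taken here to be 0.05.

```latex
\[
H_0:\ \mu_1 = \mu_2 = \mu_3 = \mu_4
\]
\[
H_a:\ \text{at least one of the means differs from the others}
\]
\[
\text{Reject } H_0 \text{ if } p\text{-value} < \alpha, \qquad \alpha = 0.05
\]
```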

# Run the ANOVA test 
anova_one_way <- aov(Reponse_time~Robot, data = data)

# Print the summary of the test
summary(anova_one_way)

##             Df Sum Sq Mean Sq F value   Pr(>F)    
## Robot        3 0.2366 0.07886   10.18 0.000279 ***
## Residuals   20 0.1549 0.00775                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The p-value (0.000279) is lower than the usual threshold of 0.05.
We can therefore conclude that there is a statistically significant difference between the group means, as indicated by the *** significance code.
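
To see where the numbers in the ANOVA table come from, the F value can be recomputed by hand from the sums of squares. This is a minimal base R sketch using the same observations as above; the object names are chosen for illustration only, and the results should agree with the `summary()` output up to rounding.

```r
# Same observations as in the example, rebuilt so the sketch runs on its own
robot <- factor(c(rep(1, 5), rep(2, 8), rep(3, 7), rep(4, 4)))
time  <- c(6.33, 6.26, 6.31, 6.29, 6.40, 6.26,
           6.36, 6.23, 6.27, 6.19, 6.50, 6.19,
           6.22, 6.44, 6.38, 6.58, 6.54, 6.56,
           6.34, 6.58, 6.29, 6.23, 6.19, 6.21)

grand_mean  <- mean(time)
group_means <- tapply(time, robot, mean)
group_sizes <- tapply(time, robot, length)

# Between-group sum of squares: group means around the overall mean
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)

# Within-group sum of squares: observations around their own group mean
ss_within <- sum((time - ave(time, robot))^2)

# Degrees of freedom
df_between <- nlevels(robot) - 1
df_within  <- length(time) - nlevels(robot)

# F statistic and its upper-tail p-value
f_value <- (ss_between / df_between) / (ss_within / df_within)
p_value <- pf(f_value, df_between, df_within, lower.tail = FALSE)

c(ss_between = ss_between, ss_within = ss_within,
  f_value = f_value, p_value = p_value)
```

The between-group and within-group sums of squares correspond to the Robot and Residuals rows of the ANOVA table.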

5.4 Pairwise comparison

The one-way ANOVA test does not tell us which robot (group) has a different mean.
To find out, we can perform a Tukey test with the function TukeyHSD().

comparison <- TukeyHSD(anova_one_way)

#There are print and plot methods for class "TukeyHSD"
print(comparison)

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Reponse_time ~ Robot, data = data)
## 
## $Robot
##           diff         lwr         upr     p adj
## 2-1 -0.0405000 -0.18093243  0.09993243 0.8502854
## 3-1  0.1705714  0.02633255  0.31481031 0.0169205
## 4-1 -0.0880000 -0.25324639  0.07724639 0.4613461
## 3-2  0.2110714  0.08358107  0.33856179 0.0008519
## 4-2 -0.0475000 -0.19834863  0.10334863 0.8144408
## 4-3 -0.2585714 -0.41296992 -0.10417294 0.0007541

plot(comparison)
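
The pairs that differ significantly can also be pulled out of the Tukey result programmatically. The sketch below rebuilds the data so that it runs on its own and assumes a 0.05 threshold for the adjusted p-values; the object names `tukey_table` and `significant` are chosen for illustration.

```r
# Rebuild the chapter's data so the sketch is self-contained
data <- data.frame(
  Robot = as.factor(c(rep(1, 5), rep(2, 8), rep(3, 7), rep(4, 4))),
  Reponse_time = c(6.33, 6.26, 6.31, 6.29, 6.40, 6.26,
                   6.36, 6.23, 6.27, 6.19, 6.50, 6.19,
                   6.22, 6.44, 6.38, 6.58, 6.54, 6.56,
                   6.34, 6.58, 6.29, 6.23, 6.19, 6.21))

# The $Robot component is a matrix with columns diff, lwr, upr and p adj
tukey_table <- TukeyHSD(aov(Reponse_time ~ Robot, data = data))$Robot

# Keep only the pairs whose adjusted p-value is below 0.05
significant <- tukey_table[tukey_table[, "p adj"] < 0.05, , drop = FALSE]

rownames(significant)
```

In line with the printed table, the pairs that remain are 3-1, 3-2 and 4-3, which suggests that robot 3 has a different mean response time from the other three robots.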

Multiple comparisons

Multiple comparisons are to be used only when the analysis of variance yields an overall significant difference in the treatment means.

Tukey’s test

In analysis of variance, this technique is used for pairwise a posteriori multiple comparisons to determine whether there is a significant difference between the means of any pair of treatment levels in an experimental design.

CM 2110 Calculus and Statistical Distributions
