Professor, Department of Statistics
2026-04-27
\[\begin{array}{lll} SSTO & = & \sum_{i=1}^{g}\sum_{j=1}^{n_i}\left(Y_{ij}-\bar{Y}_{..}\right)^2 = \sum_{i=1}^{g}\sum_{j=1}^{n_i}[(Y_{ij}-\bar{Y}_{i.})+(\bar{Y}_{i.}-\bar{Y}_{..})]^2 \\ & = &\sum_{i=1}^{g}\sum_{j=1}^{n_i}(Y_{ij}-\bar{Y}_{i.})^2+\sum_{i=1}^{g}n_i(\bar{Y}_{i.}-\bar{Y}_{..})^2 + \\ &&2 \sum_{i=1}^{g}\sum_{j=1}^{n_i}[(Y_{ij}-\bar{Y}_{i.})(\bar{Y}_{i.}-\bar{Y}_{..})]\\ &=&\sum_{i=1}^{g}\sum_{j=1}^{n_i}(Y_{ij}-\bar{Y}_{i.})^2+\sum_{i=1}^{g}n_i(\bar{Y}_{i.}-\bar{Y}_{..})^2 + \\ &&2 \sum_{i=1}^{g}[(\bar{Y}_{i.}-\bar{Y}_{..})\sum_{j=1}^{n_i}(Y_{ij}-\bar{Y}_{i.})]\\ &=&\sum_{i=1}^{g}\sum_{j=1}^{n_i}(Y_{ij}-\bar{Y}_{i.})^2+\sum_{i=1}^{g}n_i(\bar{Y}_{i.}-\bar{Y}_{..})^2 + 2 \sum_{i=1}^{g}[(\bar{Y}_{i.}-\bar{Y}_{..})\cdot 0]\\ &=&\underset{SSE}{\underbrace{\sum_{i=1}^{g}\sum_{j=1}^{n_i}(Y_{ij}-\bar{Y}_{i.})^2}}+\underset{SSTR}{\underbrace{\sum_{i=1}^{g}n_i(\bar{Y}_{i.}-\bar{Y}_{..})^2}} \end{array} \]
\(SSTO=SSE+SSTR=SSW+SSB\):
\(SSE\) (i.e., \(SSW\)): sum of squares of error, also known as the sum of squares within (within groups). \[SSE=\sum_{i=1}^{g}\sum_{j=1}^{n_i}(Y_{ij}-\bar{Y}_{i.})^2=\sum_{i=1}^g (n_i-1)s_i^2\] where \(s_i^2\) is the sample variance of the \(i\)th group.
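To make the partition concrete, the following R sketch verifies \(SSTO=SSE+SSTR\) numerically on the sepal length variable of the iris data analyzed later in these notes (the names `y1` and `grp` are illustrative, not from the code below):

#numerical check of SSTO = SSE + SSTR on iris sepal length
y1=c(iris3[,1,1],iris3[,1,2],iris3[,1,3])               #n = 150 observations
grp=rep(c("Setosa","Versicolor","Virginica"),each=50)   #g = 3 groups
SSTO=sum((y1-mean(y1))^2)
SSE=sum(tapply(y1,grp,function(v) sum((v-mean(v))^2)))
SSTR=sum(tapply(y1,grp,function(v) length(v)*(mean(v)-mean(y1))^2))
all.equal(SSTO, SSE+SSTR)                               #TRUE: the cross term vanishes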
Recall that \(E[s_i^2]=\sigma^2\). Therefore, \[E[SSE]=\sum_{i=1}^g (n_i-1)\sigma^2=(n-g)\sigma^2\]
\[\begin{array}{lll} E[\bar{Y}_{i.}^2]&=& Var[\bar{Y}_{i.}] + \mu_i^2=\frac{\sigma^2}{n_i} +\mu_i^2 \\ E[\bar{Y}_{..}^2]&=& Var[\bar{Y}_{..}] + \bar\mu_{..}^2=\frac{\sigma^2}{n} +\bar\mu_{..}^2\\ E[\bar{Y}_{i.}\bar{Y}_{..}] &=&\frac{1}{n}E[\bar{Y}_{i.}\sum_{j=1}^g n_j \bar Y_{j.}]= \frac{1}{n}E[\bar{Y}_{i.}(n_i\bar{Y}_{i.} + \sum_{j\ne i}^g n_j \bar Y_{j.})]\\ &=& \frac{n_i}{n}E[\bar{Y}_{i.}^2] + \mu_i \sum_{j\ne i} \frac{n_j}{n}\mu_j= \frac{1}{n}\sigma^2 + \frac{n_i}{n}\mu_i^2 + \mu_i \sum_{j\ne i} \frac{n_j}{n}\mu_j\\ &=& \frac{1}{n}\sigma^2+ \mu_i \bar \mu_{..} \end{array} \] where \(\bar \mu_{..}=\frac{1}{n} \sum_{i=1}^g n_i \mu_i\); the second line uses \(E[\bar Y_{..}]=\bar\mu_{..}\), and the cross-expectation uses the independence of \(\bar Y_{i.}\) and \(\bar Y_{j.}\) for \(j\ne i\).
\[\begin{array}{lll} E[SSTR] &=& \sum_{i=1}^g n_i E[(\bar{Y}_{i.}-\bar{Y}_{..})^2]\\ &=& \sum_{i=1}^g n_i E[\bar{Y}_{i.}^2+\bar{Y}_{..}^2 - 2\bar{Y}_{i.}\bar{Y}_{..} ]\\ &=& \sum_{i=1}^g n_i \left(\frac{\sigma^2}{n_i} +\mu_i^2 + \frac{\sigma^2}{n} +\bar\mu_{..}^2 -2\left[\frac{1}{n}\sigma^2+ \mu_i \bar \mu_{..}\right]\right)\\ &=& (g-1)\sigma^2 + \sum_{i=1}^g n_i (\mu_i - \bar \mu_{..})^2 \end{array} \]
\(SSE\) has \(n-g\) degrees of freedom, because we have \(n\) observations and we have estimated \(g\) group means.
\(SSTR\) has \(g-1\) degrees of freedom, because we have \(g\) group means and we have estimated the overall mean \(\bar Y_{..}\), which is a linear combination of the group means.
\(SSTO\) has \(n-1\) degrees of freedom, because we have \(n\) observations and we have estimated the overall mean \(\bar Y_{..}\).
Note that \(n-1=(n-g)+(g-1)\), which is consistent with the partition of the sum of squares.
\(MSE=\frac{1}{n-g}SSE\), \(E(MSE)=\sigma^2\)
\(MSTR=\frac{1}{g-1}SSTR\), \(E(MSTR)=\sigma^2 + \frac{1}{g-1}\sum_{i=1}^g n_i (\mu_i - \bar \mu_{..})^2\)
\[\frac{E(MSTR)}{E(MSE)}=\frac{\sigma^2 + \frac{1}{g-1}\sum_{i=1}^g n_i (\mu_i - \bar \mu_{..})^2}{\sigma^2} = 1 + \frac{\frac{1}{g-1}\sum_{i=1}^g n_i (\mu_i - \bar \mu_{..})^2}{\sigma^2}\]
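Both expectations can be checked by simulation. The sketch below (all names illustrative) draws repeatedly from the one-way model with unequal group sizes and compares the Monte Carlo averages of \(MSE\) and \(MSTR\) with the formulas above:

#Monte Carlo check of E(MSE) and E(MSTR)
set.seed(1)
g=3; ni=c(10,15,20); mu=c(0,0.5,1); sigma=2
n=sum(ni); mubar=sum(ni*mu)/n
sims=replicate(5000, {
  y=rnorm(n, mean=rep(mu,times=ni), sd=sigma)
  grp=rep(1:g, times=ni)
  ybar=tapply(y, grp, mean)
  MSE=sum((y-ybar[grp])^2)/(n-g)
  MSTR=sum(ni*(ybar-mean(y))^2)/(g-1)
  c(MSE, MSTR)
})
rowMeans(sims)                                      #simulated E(MSE), E(MSTR)
c(sigma^2, sigma^2+sum(ni*(mu-mubar)^2)/(g-1))      #theoretical values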
The null hypothesis \[H_0\colon \mu_1 = \mu_2 = \dots = \mu_g\]
The alternative hypothesis \[H_a\colon \mu_i \ne \mu_j \mbox{ for at least one pair } (i,j)\]
\(\frac{E(MSTR)}{E(MSE)}\overset{H_0}=1\)
In practice we compute the ratio \(\frac{MSTR}{MSE}\); large values are evidence against the null hypothesis.
\[F=\frac{MSTR}{MSE}\]
\[F=\frac{\frac{SSTR}{\sigma^2}/(g-1)}{\frac{SSE}{\sigma^2}/(n-g)}\overset{H_0}\sim F_{g-1, n-g}\]
| Source | SS | MS | DF | F |
|---|---|---|---|---|
| Tr | \(SSTR=\sum_{i=1}^g n_i (\bar Y_{i.}-\bar Y_{..})^2\) | \(MSTR=\frac{SSTR}{g-1}\) | \(g-1\) | \(F=\frac{MSTR}{MSE}\) |
| Error | \(SSE=\sum_{i=1}^g\sum_{j=1}^{n_i} (Y_{ij}-\bar Y_{i.})^2\) | \(MSE=\frac{SSE}{n-g}\) | \(n-g\) | |
| Total | \(SSTO=\sum_{i=1}^g\sum_{j=1}^{n_i} (Y_{ij}-\bar Y_{..})^2\) | | \(n-1\) | |
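Using the sepal length data from the earlier sketch, this table can be reproduced with `aov`, or the F statistic computed by hand from the sums of squares (again a sketch; `y1`, `grp`, `SSTR`, and `SSE` are the illustrative names defined above):

#one-way ANOVA table for sepal length by species
summary(aov(y1~factor(grp)))
#or compute F and its p-value directly
Fstat=(SSTR/(3-1))/(SSE/(150-3))
pf(Fstat, df1=3-1, df2=150-3, lower.tail=FALSE)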
Each \(\mathbf Y_{ij}\in \mathbb R^p\), i.e., \[\mathbf Y_{ij}=\left(\begin{array}{c}Y_{ij1}\\Y_{ij2}\\\vdots\\Y_{ijp}\end{array}\right)\]
The null hypothesis vs the alternative:
\[\begin{align} H_0\colon & \boldsymbol \mu_1 = \boldsymbol \mu_2 = \dots = \boldsymbol \mu_g\\ H_a\colon & \boldsymbol \mu_i \ne \boldsymbol \mu_j \mbox{ for at least one pair } (i,j) \end{align}\]
\[\begin{array}{lll} \mathbf{T} & = & \sum_{i=1}^{g}\sum_{j=1}^{n_i}(\mathbf Y_{ij}- \bar {\mathbf Y}_{..})(\mathbf Y_{ij}-\bar {\mathbf Y}_{..})' \\ & = & \sum_{i=1}^{g}\sum_{j=1}^{n_i}\{(\mathbf Y_{ij}-\bar {\mathbf Y}_{i.})+(\bar{\mathbf Y}_{i.}-\bar{\mathbf Y}_{..})\}\{(\mathbf Y_{ij}-\bar{\mathbf Y}_{i.})+(\bar{\mathbf Y}_{i.}-\bar{\mathbf Y}_{..})\}' \\ & = & \underset{W}{\underbrace{\sum_{i=1}^{g}\sum_{j=1}^{n_i}(\mathbf Y_{ij}-\bar{\mathbf Y}_{i.})(\mathbf Y_{ij}-\bar{ \mathbf Y}_{i.})'}}+\underset{B}{\underbrace{\sum_{i=1}^{g}n_i(\bar{ \mathbf Y}_{i.}-\bar{\mathbf Y}_{..})(\bar{\mathbf Y}_{i.}-\bar{\mathbf Y}_{..})'}} + \mathbf 0 \end{array}\] where the cross term vanishes exactly as in the univariate decomposition.
\[\mathbf B=\sum_{i=1}^{g}n_i(\bar{\mathbf Y}_{i.}-\bar{\mathbf Y}_{..})(\bar{\mathbf Y}_{i.}-\bar{\mathbf Y}_{..})'\]
\[\mathbf X= \begin{pmatrix} \mathbf 1_{n_1} & \mathbf 0_{n_1} & \cdots & \mathbf 0_{n_1}\\ \mathbf 0_{n_2} & \mathbf 1_{n_2} & \cdots & \mathbf 0_{n_2}\\ \cdots & \cdots & \cdots & \cdots \\ \mathbf 0_{n_g} & \mathbf 0_{n_g} & \cdots & \mathbf 1_{n_g} \end{pmatrix}, \mathbf P= \begin{pmatrix} \frac{1}{n_1}\mathbf 1_{n_1}\mathbf 1_{n_1}^T & \mathbf 0 & \cdots & \mathbf 0\\ \mathbf 0 & \frac{1}{n_2}\mathbf 1_{n_2}\mathbf 1_{n_2}^T& \cdots & \mathbf 0\\ \cdots & \cdots & \cdots & \cdots \\ \mathbf 0 & \mathbf 0 & \cdots & \frac{1}{n_g}\mathbf 1_{n_g}\mathbf 1_{n_g}^T \end{pmatrix} \] It can be shown that \[\mathbf W = \mathbf Y^T (\mathbf I - \mathbf P)\mathbf Y \mbox{ , } \mathbf B = \mathbf Y^T (\mathbf P-\frac{1}{n}\mathbf 1_n \mathbf 1_n^T)\mathbf Y \]
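Both identities can be verified numerically. The sketch below builds \(\mathbf X\) and \(\mathbf P\) for the iris data and recomputes \(\mathbf W\) and \(\mathbf B\) (it assumes the 150-by-4 matrix `Y` constructed in the R code further down; `W.check` and `B.check` are illustrative names):

#numerical check of W = Y'(I-P)Y and B = Y'(P-(1/n)11')Y
ni=c(50,50,50); n=sum(ni); g=3
X=matrix(0,n,g)
X[cbind(1:n, rep(1:g,times=ni))]=1       #group-indicator design matrix
P=X%*%solve(t(X)%*%X)%*%t(X)             #block-diagonal projection matrix
W.check=t(Y)%*%(diag(n)-P)%*%Y           #should equal W computed below
B.check=t(Y)%*%(P-matrix(1/n,n,n))%*%Y   #should equal B computed below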
| Source | Sample Cov | DF |
|---|---|---|
| Treatment | \(\mathbf{B}=\sum_{i=1}^{g}n_i(\bar{\mathbf{Y}}_{i.}-\bar{\mathbf{Y}}_{..})(\bar{\mathbf{Y}}_{i.}-\bar{\mathbf{Y}}_{..})'\) | \(g-1\) |
| Error | \(\mathbf{W}=\sum_{i=1}^{g}\sum_{j=1}^{n_i}(\mathbf{Y}_{ij}-\bar{\mathbf{Y}}_{i.})(\mathbf{Y}_{ij}-\bar{\mathbf{Y}}_{i.})'\) | \(n-g\) |
| Total | \(\mathbf{T}=\sum_{i=1}^{g}\sum_{j=1}^{n_i}(\mathbf{Y}_{ij}- \bar{\mathbf{Y}}_{..})(\mathbf{Y}_{ij}-\bar{\mathbf{Y}}_{..})'\) | \(n-1\) |
#rearrange the data such that the response matrix is
#an n-by-p matrix
Y=cbind(SepalL=c(iris3[,1,1],iris3[,1,2],iris3[,1,3]),
        SepalW=c(iris3[,2,1],iris3[,2,2],iris3[,2,3]),
        PetalL=c(iris3[,3,1],iris3[,3,2],iris3[,3,3]),
        PetalW=c(iris3[,4,1],iris3[,4,2],iris3[,4,3]))
#the response must be a numeric matrix, so cbind works where
#data.frame would not
#alternatively, we can define Y as follows:
#Y=aperm(iris3,c(1,3,2)); dim(Y)=c(150,4)
#define the grouping variable, a vector of species labels
iris.type=rep(c("Setosa","Versicolor","Virginica"),each=50)
#total SSCP matrix: T = (n-1)*cov(Y)
T=(150-1)*cov(Y)
#within-groups SSCP matrix: W = sum of (n_i - 1)*S_i over groups
W=(50-1)*cov(iris3[,,1])+(50-1)*cov(iris3[,,2])+(50-1)*cov(iris3[,,3])
#between-groups SSCP matrix
B=T-W
B

          SepalL    SepalW   PetalL    PetalW
SepalL  63.21213 -19.95267 165.2484  71.27933
SepalW -19.95267  11.34493 -57.2396 -22.93267
PetalL 165.24840 -57.23960 437.1028 186.77400
PetalW  71.27933 -22.93267 186.7740  80.41333

W

         Sepal L. Sepal W. Petal L. Petal W.
Sepal L.  38.9562  13.6300  24.6246   5.6450
Sepal W.  13.6300  16.9620   8.1208   4.8084
Petal L.  24.6246   8.1208  27.2226   6.2718
Petal W.   5.6450   4.8084   6.2718   6.1566

T

           SepalL     SepalW   PetalL    PetalW
SepalL 102.168333  -6.322667 189.8730  76.92433
SepalW  -6.322667  28.306933 -49.1188 -18.12427
PetalL 189.873000 -49.118800 464.3254 193.04580
PetalW  76.924333 -18.124267 193.0458  86.56993
summary.manova {stats} R Documentation
Summary Method for Multivariate Analysis of Variance
Description
A summary method for class "manova".
Usage
## S3 method for class 'manova'
summary(object,
test = c("Pillai", "Wilks", "Hotelling-Lawley", "Roy"),
intercept = FALSE, tol = 1e-7, ...)
Pillai's trace: \[V = trace(\mathbf B(\mathbf B+\mathbf W)^{-1})=trace(\mathbf B \mathbf T^{-1})\]
Wilks' Lambda distribution: Let \(\mathbf A\sim Wishart_p(m_1, \mathbf I)\) and \(\mathbf B\sim Wishart_p(m_2, \mathbf I)\) be independent with \(m_1>p\). We say \[\Lambda=\dfrac{|\mathbf{A}|}{|\mathbf{A+B}|}\sim \Lambda (p, m_1, m_2)\]
Test statistic \[\Lambda^* = \dfrac{|\mathbf{W}|}{|\mathbf{B+W}|}=\dfrac{|\mathbf{W}|}{|\mathbf{T}|}=\dfrac{|\boldsymbol \Sigma^{-1/2} \mathbf W \boldsymbol \Sigma^{-1/2} |}{|\boldsymbol \Sigma^{-1/2} (\mathbf B+\mathbf W)\boldsymbol \Sigma^{-1/2} |}\]
Under \(H_0\), \(\boldsymbol\Sigma^{-1/2}\mathbf W\boldsymbol\Sigma^{-1/2}\sim Wishart_p(n-g, \mathbf I)\) and \(\boldsymbol\Sigma^{-1/2}\mathbf B\boldsymbol\Sigma^{-1/2}\sim Wishart_p(g-1, \mathbf I)\) independently, so by the definition of the Wilks' Lambda distribution, \[\Lambda^* \overset{H_0}\sim \Lambda (p, n-g, g-1)\]
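For the iris data, \(\Lambda^*\) can be computed directly from the `W` and `T` matrices obtained in the R code above:

#Wilks' Lambda for the iris data
det(W)/det(T)    #about 0.0234, matching the summary.manova output below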
Lawley-Hotelling trace: \[T^2 = trace(\mathbf{BW}^{-1})\]
One interesting observation is that all four test statistics can be expressed in terms of the eigenvalues of \(\mathbf B \mathbf W^{-1}\). Let \(\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p\) denote these eigenvalues, ordered from largest to smallest; since \(\mathbf B\) has rank at most \(g-1\), at most \(\min(p, g-1)\) of them are nonzero. We have
Pillai trace \[tr(\mathbf B (\mathbf B + \mathbf W)^{-1})=\sum_{i=1}^{\min(p, g-1)}\frac{\lambda_i}{1+\lambda_i}\]
Wilks' Lambda \[\Lambda^*= \prod_{i=1}^{\min(p, g-1)}\frac{1}{1+\lambda_i}\]
Lawley-Hotelling trace \[trace(\mathbf B \mathbf W^{-1}) = \sum_{i=1}^{\min(p, g-1)}\lambda_i\]
Roy's largest root \[ \begin{aligned} \lambda_{\max}(\mathbf B\mathbf W^{-1}) &= \lambda_1\\ \lambda_{\max}(\mathbf B\mathbf {(B+W)}^{-1})&= \frac{\lambda_1}{1+\lambda_1} \end{aligned} \]
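The sketch below computes all four statistics from the eigenvalues of \(\mathbf B\mathbf W^{-1}\), using the `B` and `W` matrices computed earlier; the values agree with the `summary.manova` output shown at the end of this section:

#all four test statistics from the eigenvalues of B W^{-1}
lam=sort(Re(eigen(B%*%solve(W))$values), decreasing=TRUE)
sum(lam/(1+lam))    #Pillai trace, about 1.1919
prod(1/(1+lam))     #Wilks' Lambda, about 0.0234
sum(lam)            #Lawley-Hotelling trace, about 32.477
lam[1]              #Roy's largest root, about 32.192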
#visual checking
par(mfrow=c(2,2))
#box plot of SepalL by species
boxplot(Y[,1]~iris.type, main="SepalL")
#alternatively, you may use: boxplot(iris3[,1,],
#main="SepalL")
#normal Q-Q plots of SepalL within each species
qqnorm(iris3[,1,1], main="Q-Q Plot: Setosa")
qqline(iris3[,1,1])
qqnorm(iris3[,1,2], main="Q-Q Plot: Versicolor")
qqline(iris3[,1,2])
qqnorm(iris3[,1,3], main="Q-Q Plot: Virginica")
qqline(iris3[,1,3])
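The MANOVA fit and the four test summaries shown below were presumably produced along these lines (a sketch; the object name `fit` is illustrative and not from the original code):

fit=manova(Y~iris.type)   #one-way MANOVA of the four responses on species
fit                       #prints the per-response SS decomposition
summary(fit, test="Pillai")
summary(fit, test="Wilks")
summary(fit, test="Hotelling-Lawley")
summary(fit, test="Roy")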
Call:
   manova(Y ~ iris.type)

Terms:
                iris.type Residuals
SepalL            63.2121   38.9562
SepalW            11.3449   16.9620
PetalL           437.1028   27.2226
PetalW            80.4133    6.1566
Deg. of Freedom         2       147

Residual standard errors: 0.5147894 0.3396877 0.4303345 0.20465
Estimated effects may be unbalanced
           Df Pillai approx F num Df den Df    Pr(>F)
iris.type   2 1.1919   53.466      8    290 < 2.2e-16 ***
Residuals 147
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

           Df    Wilks approx F num Df den Df    Pr(>F)
iris.type   2 0.023439   199.15      8    288 < 2.2e-16 ***
Residuals 147
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

           Df Hotelling-Lawley approx F num Df den Df    Pr(>F)
iris.type   2           32.477   580.53      8    286 < 2.2e-16 ***
Residuals 147
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

           Df    Roy approx F num Df den Df    Pr(>F)
iris.type   2 32.192     1167      4    145 < 2.2e-16 ***
Residuals 147
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1