Professor, Department of Statistics
2026-04-27
\[\begin{array}{lll} SSTO & = & \sum_{i=1}^{g}\sum_{j=1}^{n_i}\left(Y_{ij}-\bar{Y}_{..}\right)^2 = \sum_{i=1}^{g}\sum_{j=1}^{n_i}[(Y_{ij}-\bar{Y}_{i.})+(\bar{Y}_{i.}-\bar{Y}_{..})]^2 \\ & = &\sum_{i=1}^{g}\sum_{j=1}^{n_i}(Y_{ij}-\bar{Y}_{i.})^2+\sum_{i=1}^{g}n_i(\bar{Y}_{i.}-\bar{Y}_{..})^2 + \\ &&2 \sum_{i=1}^{g}\sum_{j=1}^{n_i}[(Y_{ij}-\bar{Y}_{i.})(\bar{Y}_{i.}-\bar{Y}_{..})]\\ &=&\sum_{i=1}^{g}\sum_{j=1}^{n_i}(Y_{ij}-\bar{Y}_{i.})^2+\sum_{i=1}^{g}n_i(\bar{Y}_{i.}-\bar{Y}_{..})^2 + \\ &&2 \sum_{i=1}^{g}[(\bar{Y}_{i.}-\bar{Y}_{..})\sum_{j=1}^{n_i}(Y_{ij}-\bar{Y}_{i.})]\\ &=&\sum_{i=1}^{g}\sum_{j=1}^{n_i}(Y_{ij}-\bar{Y}_{i.})^2+\sum_{i=1}^{g}n_i(\bar{Y}_{i.}-\bar{Y}_{..})^2 + 2 \sum_{i=1}^{g}[(\bar{Y}_{i.}-\bar{Y}_{..})\cdot 0]\\ &=&\underset{SSE}{\underbrace{\sum_{i=1}^{g}\sum_{j=1}^{n_i}(Y_{ij}-\bar{Y}_{i.})^2}}+\underset{SSTR}{\underbrace{\sum_{i=1}^{g}n_i(\bar{Y}_{i.}-\bar{Y}_{..})^2}} \end{array} \]
\(SSTO=SSE+SSTR=SSW+SSB\):
\(SSE\) (i.e., \(SSW\)): sum of squares of error, also known as the sum of squares within (within groups). \[SSE=\sum_{i=1}^{g}\sum_{j=1}^{n_i}(Y_{ij}-\bar{Y}_{i.})^2=\sum_{i=1}^g (n_i-1)s_i^2\] where \(s_i^2\) is the sample variance of the \(i\)th group.
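To make the partition concrete, the following R sketch verifies \(SSTO=SSE+SSTR\) numerically on the sepal length variable of the iris data analyzed later in these notes (the names `y1` and `grp` are illustrative, not from the code below):

#numerical check of SSTO = SSE + SSTR on iris sepal length
y1=c(iris3[,1,1],iris3[,1,2],iris3[,1,3])               #n = 150 observations
grp=rep(c("Setosa","Versicolor","Virginica"),each=50)   #g = 3 groups
SSTO=sum((y1-mean(y1))^2)
SSE=sum(tapply(y1,grp,function(v) sum((v-mean(v))^2)))
SSTR=sum(tapply(y1,grp,function(v) length(v)*(mean(v)-mean(y1))^2))
all.equal(SSTO, SSE+SSTR)                               #TRUE: the cross term vanishes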
Recall that \(E[s_i^2]=\sigma^2\). Therefore, \[E[SSE]=\sum_{i=1}^g (n_i-1)\sigma^2=(n-g)\sigma^2\]
\[\begin{array}{lll} E[\bar{Y}_{i.}^2]&=& Var[\bar{Y}_{i.}] + \mu_i^2=\frac{\sigma^2}{n_i} +\mu_i^2 \\ E[\bar{Y}_{..}^2]&=& Var[\bar{Y}_{..}] + \bar\mu_{..}^2=\frac{\sigma^2}{n} +\bar\mu_{..}^2\\ E[\bar{Y}_{i.}\bar{Y}_{..}] &=&\frac{1}{n}E[\bar{Y}_{i.}\sum_{j=1}^g n_j \bar Y_{j.}]= \frac{1}{n}E[\bar{Y}_{i.}(n_i\bar{Y}_{i.} + \sum_{j\ne i}^g n_j \bar Y_{j.})]\\ &=& \frac{n_i}{n}E[\bar{Y}_{i.}^2] + \mu_i \sum_{j\ne i} \frac{n_j}{n}\mu_j= \frac{1}{n}\sigma^2 + \frac{n_i}{n}\mu_i^2 + \mu_i \sum_{j\ne i} \frac{n_j}{n}\mu_j\\ &=& \frac{1}{n}\sigma^2+ \mu_i \bar \mu_{..} \end{array} \] where \(\bar \mu_{..}=\frac{1}{n} \sum_{i=1}^g n_i \mu_i\); the second line uses \(E[\bar Y_{..}]=\bar\mu_{..}\), and the cross-expectation uses the independence of \(\bar Y_{i.}\) and \(\bar Y_{j.}\) for \(j\ne i\).
\[\begin{array}{lll} E[SSTR] &=& \sum_{i=1}^g n_i E[(\bar{Y}_{i.}-\bar{Y}_{..})^2]\\ &=& \sum_{i=1}^g n_i E[\bar{Y}_{i.}^2+\bar{Y}_{..}^2 - 2\bar{Y}_{i.}\bar{Y}_{..} ]\\ &=& \sum_{i=1}^g n_i \left(\frac{\sigma^2}{n_i} +\mu_i^2 + \frac{\sigma^2}{n} +\bar\mu_{..}^2 -2\left[\frac{1}{n}\sigma^2+ \mu_i \bar \mu_{..}\right]\right)\\ &=& (g-1)\sigma^2 + \sum_{i=1}^g n_i (\mu_i - \bar \mu_{..})^2 \end{array} \]
\(SSE\) has \(n-g\) degrees of freedom, because we have \(n\) observations and we have estimated \(g\) group means.
\(SSTR\) has \(g-1\) degrees of freedom, because we have \(g\) group means and we have estimated the overall mean \(\bar Y_{..}\), which is a linear combination of the group means.
\(SSTO\) has \(n-1\) degrees of freedom, because we have \(n\) observations and we have estimated the overall mean \(\bar Y_{..}\).
Note that \(n-1=(n-g)+(g-1)\), which is consistent with the partition of the sum of squares.
\(MSE=\frac{1}{n-g}SSE\), \(E(MSE)=\sigma^2\)
\(MSTR=\frac{1}{g-1}SSTR\), \(E(MSTR)=\sigma^2 + \frac{1}{g-1}\sum_{i=1}^g n_i (\mu_i - \bar \mu_{..})^2\)
\[\frac{E(MSTR)}{E(MSE)}=\frac{\sigma^2 + \frac{1}{g-1}\sum_{i=1}^g n_i (\mu_i - \bar \mu_{..})^2}{\sigma^2} = 1 + \frac{\frac{1}{g-1}\sum_{i=1}^g n_i (\mu_i - \bar \mu_{..})^2}{\sigma^2}\]
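Both expectations can be checked by simulation. The sketch below (all names illustrative) draws repeatedly from the one-way model with unequal group sizes and compares the Monte Carlo averages of \(MSE\) and \(MSTR\) with the formulas above:

#Monte Carlo check of E(MSE) and E(MSTR)
set.seed(1)
g=3; ni=c(10,15,20); mu=c(0,0.5,1); sigma=2
n=sum(ni); mubar=sum(ni*mu)/n
sims=replicate(5000, {
  y=rnorm(n, mean=rep(mu,times=ni), sd=sigma)
  grp=rep(1:g, times=ni)
  ybar=tapply(y, grp, mean)
  MSE=sum((y-ybar[grp])^2)/(n-g)
  MSTR=sum(ni*(ybar-mean(y))^2)/(g-1)
  c(MSE, MSTR)
})
rowMeans(sims)                                      #simulated E(MSE), E(MSTR)
c(sigma^2, sigma^2+sum(ni*(mu-mubar)^2)/(g-1))      #theoretical values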
The null hypothesis \[H_0\colon \mu_1 = \mu_2 = \dots = \mu_g\]
The alternative hypothesis \[H_a\colon \mu_i \ne \mu_j \mbox{ for at least one pair } (i,j)\]
\(\frac{E(MSTR)}{E(MSE)}\overset{H_0}=1\)
In practice we compute the ratio \(\frac{MSTR}{MSE}\); large values are evidence against the null hypothesis.
\[F=\frac{MSTR}{MSE}\]
\[F=\frac{\frac{SSTR}{\sigma^2}/(g-1)}{\frac{SSE}{\sigma^2}/(n-g)}\overset{H_0}\sim F_{g-1, n-g}\]
| Source | SS | MS | DF | F |
|---|---|---|---|---|
| Tr | \(SSTR=\sum_{i=1}^g n_i (\bar Y_{i.}-\bar Y_{..})^2\) | \(MSTR=\frac{SSTR}{g-1}\) | \(g-1\) | \(F=\frac{MSTR}{MSE}\) |
| Error | \(SSE=\sum_{i=1}^g\sum_{j=1}^{n_i} (Y_{ij}-\bar Y_{i.})^2\) | \(MSE=\frac{SSE}{n-g}\) | \(n-g\) | |
| Total | \(SSTO=\sum_{i=1}^g\sum_{j=1}^{n_i} (Y_{ij}-\bar Y_{..})^2\) | | \(n-1\) | |
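Using the sepal length data from the earlier sketch, this table can be reproduced with `aov`, or the F statistic computed by hand from the sums of squares (again a sketch; `y1`, `grp`, `SSTR`, and `SSE` are the illustrative names defined above):

#one-way ANOVA table for sepal length by species
summary(aov(y1~factor(grp)))
#or compute F and its p-value directly
Fstat=(SSTR/(3-1))/(SSE/(150-3))
pf(Fstat, df1=3-1, df2=150-3, lower.tail=FALSE)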
Each \(\mathbf Y_{ij}\in \mathbb R^p\), i.e., \[\mathbf Y_{ij}=\left(\begin{array}{c}Y_{ij1}\\Y_{ij2}\\\vdots\\Y_{ijp}\end{array}\right)\]
The null hypothesis vs the alternative:
\[\begin{align} H_0\colon & \boldsymbol \mu_1 = \boldsymbol \mu_2 = \dots = \boldsymbol \mu_g\\ H_a\colon & \boldsymbol \mu_i \ne \boldsymbol \mu_j \mbox{ for at least one pair } (i,j) \end{align}\]
\[\begin{array}{lll} \mathbf{T} & = & \sum_{i=1}^{g}\sum_{j=1}^{n_i}(\mathbf Y_{ij}- \bar {\mathbf Y}_{..})(\mathbf Y_{ij}-\bar {\mathbf Y}_{..})' \\ & = & \sum_{i=1}^{g}\sum_{j=1}^{n_i}\{(\mathbf Y_{ij}-\bar {\mathbf Y}_{i.})+(\bar{\mathbf Y}_{i.}-\bar{\mathbf Y}_{..})\}\{(\mathbf Y_{ij}-\bar{\mathbf Y}_{i.})+(\bar{\mathbf Y}_{i.}-\bar{\mathbf Y}_{..})\}' \\ & = & \underset{W}{\underbrace{\sum_{i=1}^{g}\sum_{j=1}^{n_i}(\mathbf Y_{ij}-\bar{\mathbf Y}_{i.})(\mathbf Y_{ij}-\bar{ \mathbf Y}_{i.})'}}+\underset{B}{\underbrace{\sum_{i=1}^{g}n_i(\bar{ \mathbf Y}_{i.}-\bar{\mathbf Y}_{..})(\bar{\mathbf Y}_{i.}-\bar{\mathbf Y}_{..})'}} + \mathbf 0 \end{array}\] where the cross term vanishes exactly as in the univariate decomposition.
\[\mathbf B=\sum_{i=1}^{g}n_i(\bar{\mathbf Y}_{i.}-\bar{\mathbf Y}_{..})(\bar{\mathbf Y}_{i.}-\bar{\mathbf Y}_{..})'\]
\[\mathbf X= \begin{pmatrix} \mathbf 1_{n_1} & \mathbf 0_{n_1} & \cdots & \mathbf 0_{n_1}\\ \mathbf 0_{n_2} & \mathbf 1_{n_2} & \cdots & \mathbf 0_{n_2}\\ \cdots & \cdots & \cdots & \cdots \\ \mathbf 0_{n_g} & \mathbf 0_{n_g} & \cdots & \mathbf 1_{n_g} \end{pmatrix}, \mathbf P= \begin{pmatrix} \frac{1}{n_1}\mathbf 1_{n_1}\mathbf 1_{n_1}^T & \mathbf 0 & \cdots & \mathbf 0\\ \mathbf 0 & \frac{1}{n_2}\mathbf 1_{n_2}\mathbf 1_{n_2}^T& \cdots & \mathbf 0\\ \cdots & \cdots & \cdots & \cdots \\ \mathbf 0 & \mathbf 0 & \cdots & \frac{1}{n_g}\mathbf 1_{n_g}\mathbf 1_{n_g}^T \end{pmatrix} \] It can be shown that \[\mathbf W = \mathbf Y^T (\mathbf I - \mathbf P)\mathbf Y \mbox{ , } \mathbf B = \mathbf Y^T (\mathbf P-\frac{1}{n}\mathbf 1_n \mathbf 1_n^T)\mathbf Y \]
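Both identities can be verified numerically. The sketch below builds \(\mathbf X\) and \(\mathbf P\) for the iris data and recomputes \(\mathbf W\) and \(\mathbf B\) (it assumes the 150-by-4 matrix `Y` constructed in the R code further down; `W.check` and `B.check` are illustrative names):

#numerical check of W = Y'(I-P)Y and B = Y'(P-(1/n)11')Y
ni=c(50,50,50); n=sum(ni); g=3
X=matrix(0,n,g)
X[cbind(1:n, rep(1:g,times=ni))]=1       #group-indicator design matrix
P=X%*%solve(t(X)%*%X)%*%t(X)             #block-diagonal projection matrix
W.check=t(Y)%*%(diag(n)-P)%*%Y           #should equal W computed below
B.check=t(Y)%*%(P-matrix(1/n,n,n))%*%Y   #should equal B computed below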
| Source | Sample Cov | DF |
|---|---|---|
| Treatment | \(\mathbf{B}=\sum_{i=1}^{g}n_i(\bar{\mathbf{Y}}_{i.}-\bar{\mathbf{Y}}_{..})(\bar{\mathbf{Y}}_{i.}-\bar{\mathbf{Y}}_{..})'\) | \(g-1\) |
| Error | \(\mathbf{W}=\sum_{i=1}^{g}\sum_{j=1}^{n_i}(\mathbf{Y}_{ij}-\bar{\mathbf{Y}}_{i.})(\mathbf{Y}_{ij}-\bar{\mathbf{Y}}_{i.})'\) | \(n-g\) |
| Total | \(\mathbf{T}=\sum_{i=1}^{g}\sum_{j=1}^{n_i}(\mathbf{Y}_{ij}- \bar{\mathbf{Y}}_{..})(\mathbf{Y}_{ij}-\bar{\mathbf{Y}}_{..})'\) | \(n-1\) |
#rearrange the data such that the response matrix is
#an n-by-p matrix
Y=cbind(SepalL=c(iris3[,1,1],iris3[,1,2],iris3[,1,3]),
        SepalW=c(iris3[,2,1],iris3[,2,2],iris3[,2,3]),
        PetalL=c(iris3[,3,1],iris3[,3,2],iris3[,3,3]),
        PetalW=c(iris3[,4,1],iris3[,4,2],iris3[,4,3]))
#the response must be a numeric matrix, so cbind works where
#data.frame would not
#alternatively, we can define Y as follows:
#Y=aperm(iris3,c(1,3,2)); dim(Y)=c(150,4)
#define the grouping variable, a vector of species labels
iris.type=rep(c("Setosa","Versicolor","Virginica"),each=50)
#total SSCP matrix: T = (n-1)*cov(Y)
T=(150-1)*cov(Y)
#within-groups SSCP matrix: W = sum of (n_i - 1)*S_i over groups
W=(50-1)*cov(iris3[,,1])+(50-1)*cov(iris3[,,2])+(50-1)*cov(iris3[,,3])
#between-groups SSCP matrix
B=T-W
B

          SepalL    SepalW   PetalL    PetalW
SepalL  63.21213 -19.95267 165.2484  71.27933
SepalW -19.95267  11.34493 -57.2396 -22.93267
PetalL 165.24840 -57.23960 437.1028 186.77400
PetalW  71.27933 -22.93267 186.7740  80.41333

W

         Sepal L. Sepal W. Petal L. Petal W.
Sepal L.  38.9562  13.6300  24.6246   5.6450
Sepal W.  13.6300  16.9620   8.1208   4.8084
Petal L.  24.6246   8.1208  27.2226   6.2718
Petal W.   5.6450   4.8084   6.2718   6.1566

T

           SepalL     SepalW   PetalL    PetalW
SepalL 102.168333  -6.322667 189.8730  76.92433
SepalW  -6.322667  28.306933 -49.1188 -18.12427
PetalL 189.873000 -49.118800 464.3254 193.04580
PetalW  76.924333 -18.124267 193.0458  86.56993
summary.manova {stats} R Documentation
Summary Method for Multivariate Analysis of Variance
Description
A summary method for class "manova".
Usage
## S3 method for class 'manova'
summary(object,
test = c("Pillai", "Wilks", "Hotelling-Lawley", "Roy"),
intercept = FALSE, tol = 1e-7, ...)
Pillai's trace: \[V = trace(\mathbf B(\mathbf B+\mathbf W)^{-1})=trace(\mathbf B \mathbf T^{-1})\]
Wilks' Lambda distribution: Let \(\mathbf A\sim Wishart_p(m_1, \mathbf I)\) and \(\mathbf B\sim Wishart_p(m_2, \mathbf I)\) be independent with \(m_1>p\). We say \[\Lambda=\dfrac{|\mathbf{A}|}{|\mathbf{A+B}|}\sim \Lambda (p, m_1, m_2)\]
Test statistic \[\Lambda^* = \dfrac{|\mathbf{W}|}{|\mathbf{B+W}|}=\dfrac{|\mathbf{W}|}{|\mathbf{T}|}=\dfrac{|\boldsymbol \Sigma^{-1/2} \mathbf W \boldsymbol \Sigma^{-1/2} |}{|\boldsymbol \Sigma^{-1/2} (\mathbf B+\mathbf W)\boldsymbol \Sigma^{-1/2} |}\]
Under \(H_0\), \(\boldsymbol\Sigma^{-1/2}\mathbf W\boldsymbol\Sigma^{-1/2}\sim Wishart_p(n-g, \mathbf I)\) and \(\boldsymbol\Sigma^{-1/2}\mathbf B\boldsymbol\Sigma^{-1/2}\sim Wishart_p(g-1, \mathbf I)\) independently, so by the definition of the Wilks' Lambda distribution, \[\Lambda^* \overset{H_0}\sim \Lambda (p, n-g, g-1)\]
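For the iris data, \(\Lambda^*\) can be computed directly from the `W` and `T` matrices obtained in the R code above:

#Wilks' Lambda for the iris data
det(W)/det(T)    #about 0.0234, matching the summary.manova output below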
Lawley-Hotelling trace: \[T^2 = trace(\mathbf{BW}^{-1})\]
One interesting observation is that all four test statistics can be expressed in terms of the eigenvalues of \(\mathbf B \mathbf W^{-1}\). Let \(\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p\) denote these eigenvalues, ordered from largest to smallest; since \(\mathbf B\) has rank at most \(g-1\), at most \(\min(p, g-1)\) of them are nonzero. We have
Pillai trace \[tr(\mathbf B (\mathbf B + \mathbf W)^{-1})=\sum_{i=1}^{\min(p, g-1)}\frac{\lambda_i}{1+\lambda_i}\]
Wilks' Lambda \[\Lambda^*= \prod_{i=1}^{\min(p, g-1)}\frac{1}{1+\lambda_i}\]
Lawley-Hotelling trace \[trace(\mathbf B \mathbf W^{-1}) = \sum_{i=1}^{\min(p, g-1)}\lambda_i\]
Roy's largest root \[ \begin{aligned} \lambda_{\max}(\mathbf B\mathbf W^{-1}) &= \lambda_1\\ \lambda_{\max}(\mathbf B\mathbf {(B+W)}^{-1})&= \frac{\lambda_1}{1+\lambda_1} \end{aligned} \]
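The sketch below computes all four statistics from the eigenvalues of \(\mathbf B\mathbf W^{-1}\), using the `B` and `W` matrices computed earlier; the values agree with the `summary.manova` output shown at the end of this section:

#all four test statistics from the eigenvalues of B W^{-1}
lam=sort(Re(eigen(B%*%solve(W))$values), decreasing=TRUE)
sum(lam/(1+lam))    #Pillai trace, about 1.1919
prod(1/(1+lam))     #Wilks' Lambda, about 0.0234
sum(lam)            #Lawley-Hotelling trace, about 32.477
lam[1]              #Roy's largest root, about 32.192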
#visual checking
par(mfrow=c(2,2))
#box plot of SepalL by species
boxplot(Y[,1]~iris.type, main="SepalL")
#alternatively, you may use: boxplot(iris3[,1,],
#main="SepalL")
#normal Q-Q plots of SepalL within each species
qqnorm(iris3[,1,1], main="Q-Q Plot: Setosa")
qqline(iris3[,1,1])
qqnorm(iris3[,1,2], main="Q-Q Plot: Versicolor")
qqline(iris3[,1,2])
qqnorm(iris3[,1,3], main="Q-Q Plot: Virginica")
qqline(iris3[,1,3])
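The MANOVA fit and the four test summaries shown below were presumably produced along these lines (a sketch; the object name `fit` is illustrative and not from the original code):

fit=manova(Y~iris.type)   #one-way MANOVA of the four responses on species
fit                       #prints the per-response SS decomposition
summary(fit, test="Pillai")
summary(fit, test="Wilks")
summary(fit, test="Hotelling-Lawley")
summary(fit, test="Roy")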
Call:
   manova(Y ~ iris.type)

Terms:
                iris.type Residuals
SepalL            63.2121   38.9562
SepalW            11.3449   16.9620
PetalL           437.1028   27.2226
PetalW            80.4133    6.1566
Deg. of Freedom         2       147

Residual standard errors: 0.5147894 0.3396877 0.4303345 0.20465
Estimated effects may be unbalanced
           Df Pillai approx F num Df den Df    Pr(>F)
iris.type   2 1.1919   53.466      8    290 < 2.2e-16 ***
Residuals 147
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

           Df    Wilks approx F num Df den Df    Pr(>F)
iris.type   2 0.023439   199.15      8    288 < 2.2e-16 ***
Residuals 147
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

           Df Hotelling-Lawley approx F num Df den Df    Pr(>F)
iris.type   2           32.477   580.53      8    286 < 2.2e-16 ***
Residuals 147
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

           Df    Roy approx F num Df den Df    Pr(>F)
iris.type   2 32.192     1167      4    145 < 2.2e-16 ***
Residuals 147
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1