Lecture 6: Wishart Distribution

Zhaoxia Yu

Professor, Department of Statistics

2026-04-27

Code
knitr::opts_chunk$set(echo = TRUE)
library(tidyr) #the pipe (%>%) tool is extremely useful
library(MASS)
library(corrplot)#for visualizing the corr matrix of the iris data
library(car)

The Big Picture

Univariate vs Multivariate

  • Review: A random sample, denoted by \(X_1, \cdots, X_n\), from a (univariate) normal distribution \(N(\mu, \sigma^2)\)
    • What are the distributions of \(\bar X, s^2\)? What useful statistics can be constructed?
  • New material: A random sample, denoted by \(\mathbf X_1, \cdots, \mathbf X_n\), from a multivariate normal distribution \(N(\boldsymbol \mu, \boldsymbol \Sigma)\)
    • What are the distributions of \(\bar{\mathbf X}, \mathbf S\)? What useful statistics can be constructed?

Univariate

  • A random sample, denoted by \(X_1, \cdots, X_n\), from a normal distribution \(N(\mu, \sigma^2)\)
  • Let \(\mathbf X_{n\times 1}=(X_1, \cdots, X_n)^T\). It is a random vector with a multivariate normal distribution, i.e., \[\mathbf X_{n\times 1}=(X_1, \cdots, X_n)^T \sim \mathbf N(\mu\mathbf 1, \sigma^2\mathbf I)\]
  1. \(\bar X \sim N(\mu, \sigma^2/n)\)
  2. \(\frac{(n-1)s^2}{\sigma^2} \sim \chi_{n-1}^2\)
  3. Independence between \(\bar X\) and \(s^2\).
  4. T-statistic:

\[\frac{\frac{\bar X-\mu}{\sqrt{\sigma^2/n}}}{\sqrt{\frac{(n-1)s^2/\sigma^2}{n-1}}}=\frac{\sqrt{n}(\bar X-\mu)}{s}\sim t_{n-1}.\]
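A minimal simulation sketch of result 4 (the sample size and parameters below are arbitrary): the studentized mean should behave like a \(t_{n-1}\) random variable.

Code
set.seed(1)
n <- 10; mu <- 5; sigma <- 2
# Simulate the t-statistic sqrt(n)*(xbar - mu)/s many times
tstat <- replicate(5000, {
  x <- rnorm(n, mean = mu, sd = sigma)
  sqrt(n) * (mean(x) - mu) / sd(x)
})
# Empirical quantiles should be close to the t_{n-1} quantiles
round(quantile(tstat, c(0.025, 0.5, 0.975)), 2)
round(qt(c(0.025, 0.5, 0.975), df = n - 1), 2)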

Multivariate

  • A random sample \(\mathbf X_1, \cdots, \mathbf X_n\) from a MVN \(\mathbf N(\boldsymbol \mu, \boldsymbol \Sigma)\).
  • Let \[\mathbf X_{n\times p}=\begin{pmatrix} \mathbf X_1^T \\ \vdots \\\mathbf X_n^T \end{pmatrix}\] \(\mathbf X\) follows a matrix normal distribution.
  1. Sample mean vector follows a multivariate normal, i.e., \(\bar{\mathbf X} \sim \mathbf N(\boldsymbol \mu, \boldsymbol \Sigma/n)\)

  2. The scaled sample covariance matrix \((n-1)\mathbf S\) follows a Wishart distribution, i.e., \((n-1)\mathbf S \sim Wishart_p (n-1, \boldsymbol \Sigma)\)

  3. Independence between \(\bar {\mathbf X}\) and \(\mathbf S\).

  4. Hotelling’s \(T^2\): \(T^2 = (\bar{\mathbf X} - \boldsymbol \mu)^T\left(\frac{\mathbf S}{n}\right)^{-1} (\bar{\mathbf X} - \boldsymbol \mu)\)

Outline

  • Sample variance and chi-squared distribution
  • Sample covariance matrix and Wishart distribution
  • Hotelling’s \(T^2\)
  • Maximum likelihood estimate

Sample Variance

Sample Variance and Chi-squared Distribution

  • Let \(\mathbf X=(X_1, \cdots, X_n)^T\) denote a random sample from \(N(\mu, \sigma^2)\).
  • Equivalently, \(\mathbf X \sim N(\mu \mathbf 1, \sigma^2 \mathbf I)\).
  • Let \(s^2=\frac{1}{n-1}\sum_{i=1}^n (X_i-\bar X)^2\) denote the sample variance.
  • We would like to show that \[\frac{(n-1)s^2}{\sigma^2}\sim \chi_{n-1}^2\]

Outline of proof

  1. Projection matrices
  2. Chi-squared distribution
  3. Rewrite \((n-1)s^2/\sigma^2\) as the sum of squared \(N(0,1)\) random variables

Projection Matrices

  • A projection matrix is a square matrix that is both idempotent and symmetric

\[\mathbf{P}^2 = \mathbf{P} \mbox{, }\mathbf{P}=\mathbf{P}^T \]

  • Suppose \(\mathbf P\) is a projection matrix. We have
    • The eigenvalues of \(\mathbf{P}\) are either 0 or 1, and the number of 1’s equals the rank of the projection matrix.
    • \(tr(\mathbf P) = rank(\mathbf P)\)
  • The spectral decomposition of \(\mathbf P\) is \[\mathbf P=\sum_{j=1}^r \gamma_j\gamma_j^T\] where \(r=rank(\mathbf P)\), and \((\gamma_1, \cdots, \gamma_r)\) are orthogonal vectors of norm 1, i.e.,

\[ \gamma_i^T\gamma_j = \left\{ \begin{array}{ll} 1 & \mbox{if } i=j\\ 0 & \mbox{if } i\not=j \end{array} \right. \]

Example: Project onto a Plane in \(\mathbb{R}^3\)

  • The goal is to find the projection matrix that projects any vector in \(\mathbb{R}^3\) onto the plane defined by \(x + y + z = 0\).

  • Steps:

    • Step 1: Define the subspace (the plane) and find a basis for it.
    • Step 2: Compute the projection matrix using the basis.
    • Step 3: Use the projection matrix to project a vector onto the plane and verify the result.

Step 1: Define the Subspace

The plane equation is \(x + y + z = 0\).

We need two basis vectors \(\mathbf{a}_1, \mathbf{a}_2\) that span the plane:

Code
a1 <- c(1, -1, 0)   # Satisfies x + y + z = 0
a2 <- c(0, 1, -1)    # Also satisfies x + y + z = 0
A <- cbind(a1, a2)   # Basis matrix
print(A)
     a1 a2
[1,]  1  0
[2,] -1  1
[3,]  0 -1

Step 2: Compute Projection Matrix \(P = A(A^TA)^{-1}A^T\)

Code
P <- A %*% solve(t(A) %*% A) %*% t(A)
print(P)
           [,1]       [,2]       [,3]
[1,]  0.6666667 -0.3333333 -0.3333333
[2,] -0.3333333  0.6666667 -0.3333333
[3,] -0.3333333 -0.3333333  0.6666667
  • Key Property: Verify \(P^2 = P\) (idempotent):
Code
all.equal(P, P %*% P)  # Should return TRUE
[1] TRUE
  • A different basis for the same plane would yield the same projection matrix. Example:
Code
b1 <- c(2, -2, 0)   # Same direction as a1
b2 <- c(0, 2, -2)    # Same direction as a2
B <- cbind(b1, b2)
P2 <- B %*% solve(t(B) %*% B) %*% t(B)
print(P2)
           [,1]       [,2]       [,3]
[1,]  0.6666667 -0.3333333 -0.3333333
[2,] -0.3333333  0.6666667 -0.3333333
[3,] -0.3333333 -0.3333333  0.6666667
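As a further check of the projection-matrix properties stated earlier (this sketch reuses P from the chunk above): the eigenvalues should be 0 or 1, and \(tr(P)=rank(P)=2\), the dimension of the plane.

Code
# Eigenvalues of a projection matrix are 0 or 1
round(eigen(P)$values, 10)
# Trace equals the rank (here 2, the dimension of the plane)
sum(diag(P))
qr(P)$rank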

Step 3: Project a Vector onto the Plane

Let’s project \(\mathbf{v} = [1,1,2]^T\): \(v_{\text{proj}}=P \mathbf{v}\).

Code
v <- c(1, 1, 2)
v_proj <- P %*% v
print(v_proj)  # Result should satisfy x + y + z = 0
           [,1]
[1,] -0.3333333
[2,] -0.3333333
[3,]  0.6666667

Verification: Check if \(v_{\text{proj}}\) lies on the plane:

Code
sum(v_proj)  # Should be 0 (or very close due to floating-point)
[1] 0

Example: the Centering Matrix

  • The centering matrix: \(\mathbb C=\mathbf I - \frac{1}{n} \mathbf 1\mathbf 1^T\).
  • It is a projection matrix:
    • \(\mathbb C^T=\mathbb C\) (symmetric)
    • \(\mathbb C^2= \mathbb C\) (idempotent)
  • One important result about a projection matrix is that its eigenvalues are either zero or one.
  • By properties of projection matrices, we have
    • \(rank(\mathbb C) = tr(\mathbb C)=n-1\)
    • \(\mathbb C = \sum_{j=1}^{n-1}\gamma_j\gamma_j^T\)

Verify the properties of the centering matrix

Code
set.seed(123)
# Generate a random sample
X=rnorm(10, mean=5, sd=1)
# the centering matrix
C=diag(10)-1/10*matrix(1,10,10)
# Check if C is symmetric
isSymmetric(C)
[1] TRUE
Code
# Check if C is idempotent
C2=C%*%C
all.equal(C, C2) # should be TRUE
[1] TRUE

Original vs Centered Data

Code
# Verify that CX has mean 0
mean(C%*%X)
[1] -1.776465e-16
Code
plot(X, C%*%X, xlab="X", ylab="CX", main="Original vs Centered")
abline(a=-mean(X), b=1, col="red") # the line y = x - mean(X)

Spectral decomposition of the centering matrix

Code
# Check the eigenvalues of C
eigen(C)$values
 [1] 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00
 [6] 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 8.881784e-16
Code
# Check the eigenvectors of C
round(eigen(C)$vectors, 2)
       [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9] [,10]
 [1,]  0.00  0.00  0.00  0.00  0.00  0.95  0.00  0.00  0.00 -0.32
 [2,]  0.02  0.37  0.00 -0.09  0.24 -0.11 -0.04  0.83 -0.05 -0.32
 [3,]  0.00  0.79  0.00 -0.02  0.04 -0.11 -0.01 -0.51 -0.01 -0.32
 [4,]  0.87 -0.14  0.00 -0.15 -0.30 -0.11  0.10 -0.01  0.04 -0.32
 [5,] -0.25 -0.14  0.00  0.02 -0.25 -0.11  0.45 -0.01 -0.74 -0.32
 [6,] -0.25 -0.14  0.00 -0.04  0.03 -0.11  0.63 -0.01  0.64 -0.32
 [7,]  0.05 -0.35  0.00 -0.28  0.77 -0.11 -0.14 -0.24 -0.15 -0.32
 [8,]  0.05 -0.14  0.00  0.91  0.09 -0.11 -0.17 -0.01  0.04 -0.32
 [9,] -0.25 -0.14 -0.71 -0.18 -0.31 -0.11 -0.41 -0.01  0.11 -0.32
[10,] -0.25 -0.14  0.71 -0.18 -0.31 -0.11 -0.41 -0.01  0.11 -0.32
Code
Gamma=eigen(C)$vectors
# Check Gamma is orthogonal
Gamma%*%t(Gamma)
               [,1]          [,2]          [,3]          [,4]          [,5]
 [1,]  1.000000e+00 -2.775558e-17 -1.387779e-17 -1.387779e-17 -1.387779e-17
 [2,] -2.775558e-17  1.000000e+00  4.996004e-16 -9.714451e-17 -9.714451e-17
 [3,] -1.387779e-17  4.996004e-16  1.000000e+00 -1.665335e-16  1.387779e-17
 [4,] -1.387779e-17 -9.714451e-17 -1.665335e-16  1.000000e+00 -2.498002e-16
 [5,] -1.387779e-17 -9.714451e-17  1.387779e-17 -2.498002e-16  1.000000e+00
 [6,] -1.387779e-17 -9.714451e-17 -1.387779e-17 -1.249001e-16 -6.938894e-17
 [7,] -1.387779e-17  1.110223e-16 -2.775558e-17 -6.938894e-17 -2.775558e-17
 [8,] -1.387779e-17 -1.249001e-16  8.326673e-17  2.775558e-17  2.775558e-17
 [9,] -1.387779e-17 -1.110223e-16  4.163336e-17 -2.775558e-17  9.714451e-17
[10,] -1.387779e-17 -1.110223e-16  4.163336e-17 -2.775558e-17  2.081668e-16
               [,6]          [,7]          [,8]          [,9]         [,10]
 [1,] -1.387779e-17 -1.387779e-17 -1.387779e-17 -1.387779e-17 -1.387779e-17
 [2,] -9.714451e-17  1.110223e-16 -1.249001e-16 -1.110223e-16 -1.110223e-16
 [3,] -1.387779e-17 -2.775558e-17  8.326673e-17  4.163336e-17  4.163336e-17
 [4,] -1.249001e-16 -6.938894e-17  2.775558e-17 -2.775558e-17 -2.775558e-17
 [5,] -6.938894e-17 -2.775558e-17  2.775558e-17  9.714451e-17  2.081668e-16
 [6,]  1.000000e+00  1.804112e-16  1.387779e-17 -1.249001e-16 -1.387779e-17
 [7,]  1.804112e-16  1.000000e+00 -6.938894e-17 -6.938894e-17 -1.387779e-17
 [8,]  1.387779e-17 -6.938894e-17  1.000000e+00  1.110223e-16 -2.775558e-17
 [9,] -1.249001e-16 -6.938894e-17  1.110223e-16  1.000000e+00 -1.387779e-17
[10,] -1.387779e-17 -1.387779e-17 -2.775558e-17 -1.387779e-17  1.000000e+00
Code
t(Gamma)%*%Gamma
               [,1]          [,2]          [,3]          [,4]          [,5]
 [1,]  1.000000e+00 -1.249001e-16 -2.775558e-17 -5.551115e-17 -1.387779e-16
 [2,] -1.249001e-16  1.000000e+00 -1.387779e-17  5.551115e-17  8.326673e-17
 [3,] -2.775558e-17 -1.387779e-17  1.000000e+00 -1.110223e-16  5.551115e-17
 [4,] -5.551115e-17  5.551115e-17 -1.110223e-16  1.000000e+00 -9.714451e-17
 [5,] -1.387779e-16  8.326673e-17  5.551115e-17 -9.714451e-17  1.000000e+00
 [6,]  0.000000e+00  1.040834e-17  0.000000e+00 -6.938894e-18  2.775558e-17
 [7,] -2.775558e-17 -2.775558e-17  1.110223e-16 -2.775558e-17  2.220446e-16
 [8,]  7.546047e-17  2.949030e-17 -6.938894e-18 -2.602085e-18 -1.092876e-16
 [9,]  1.665335e-16 -2.081668e-17 -1.387779e-17  6.938894e-18  1.804112e-16
[10,]  5.551115e-17  1.387779e-16  2.775558e-17  0.000000e+00  5.551115e-17
               [,6]          [,7]          [,8]          [,9]         [,10]
 [1,]  0.000000e+00 -2.775558e-17  7.546047e-17  1.665335e-16  5.551115e-17
 [2,]  1.040834e-17 -2.775558e-17  2.949030e-17 -2.081668e-17  1.387779e-16
 [3,]  0.000000e+00  1.110223e-16 -6.938894e-18 -1.387779e-17  2.775558e-17
 [4,] -6.938894e-18 -2.775558e-17 -2.602085e-18  6.938894e-18  0.000000e+00
 [5,]  2.775558e-17  2.220446e-16 -1.092876e-16  1.804112e-16  5.551115e-17
 [6,]  1.000000e+00  2.775558e-17  8.630249e-17  6.938894e-18 -1.387779e-17
 [7,]  2.775558e-17  1.000000e+00  2.255141e-17  1.526557e-16  1.110223e-16
 [8,]  8.630249e-17  2.255141e-17  1.000000e+00 -6.938894e-18  2.931683e-16
 [9,]  6.938894e-18  1.526557e-16 -6.938894e-18  1.000000e+00  1.387779e-17
[10,] -1.387779e-17  1.110223e-16  2.931683e-16  1.387779e-17  1.000000e+00

Note that \[\mathbb C=\sum_{j=1}^{n-1} \gamma_j\gamma_j^T\]

Code
Total_mat=matrix(0,10,10)
for(i in 1:9){
  Total_mat=Total_mat+Gamma[,i]%*%t(Gamma[,i])
}
all.equal(Total_mat, C)
[1] TRUE

Center A Random Sample: Univariate

  • The centering matrix centers data
  • Univariate: Let \(\mathbf X_{n\times 1}\) be a random sample from \(N(\mu, \sigma^2)\), i.e., \[\mathbf X_{n\times 1}\sim N(\mu\mathbf 1, \sigma^2 \mathbf I)\]

Since \(\mathbb C\mathbf X\) is a linear function of \(\mathbf X\) and it can be verified that \(\mathbb C\mathbf 1=\mathbf 0\), we have \[E[\mathbb C\mathbf X]=\mu \mathbb C\mathbf 1=\mathbf 0\]

Center A Random Sample: Multivariate

  • Multivariate: Let \(\mathbf X_{n\times p}\) be the data matrix of a random sample from \(N(\boldsymbol\mu, \boldsymbol \Sigma)\). Similarly, it can be shown that \(\mathbb C \mathbf X\) has mean \(\mathbf 0_{n\times p}\). We have verified this numerically.

  • In either situation, we have \(\mathbb C \mathbf X = \mathbb C (\mathbf X-E[\mathbf X])\) because \(\mathbb C E[\mathbf X]=\mathbf 0\). This fact will be used later.
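A quick numeric check of this fact, reusing X (generated with population mean 5) and C from the earlier chunk, so that \(E[\mathbf X]=5\cdot \mathbf 1\):

Code
# C X should equal C (X - E[X]) because C 1 = 0; here E[X] = 5 * 1
all.equal(C %*% X, C %*% (X - 5))   # expected: TRUE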

Chi-squared distribution

Definition

Definition. Let \(Z_1, ..., Z_k \overset{iid}\sim N(0,1)\). Then, the sum of squares \(Q = Z_1^2 + ... + Z_k^2\) has a chi-squared distribution with \(k\) degrees of freedom, denoted by \(\chi_k^2\).

  • Alternatively, let \(\mathbf Z_{k\times 1} \sim N(\mathbf 0, \mathbf I)\). We say \(||\mathbf Z||^2=\mathbf Z^T \mathbf Z\) follows \(\chi_k^2\).

  • \(Z_1^2, \cdots, Z_k^2 \overset{iid}\sim \chi_1^2\).

  • The sum of independent chi-squared random variables is also chi-squared. Specifically, if \(Q_1\sim \chi_{k_1}^2\) and \(Q_2\sim \chi_{k_2}^2\) are independent, then \(Q_1+Q_2\sim \chi_{k_1+k_2}^2\).
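A small simulation sketch of the definition and the additivity property (the degrees of freedom below are arbitrary):

Code
set.seed(2)
k1 <- 3; k2 <- 4
# Sums of squared iid N(0,1) draws
Q1 <- replicate(5000, sum(rnorm(k1)^2))
Q2 <- replicate(5000, sum(rnorm(k2)^2))
# Q1 + Q2 should behave like a chi-squared with k1 + k2 df
round(quantile(Q1 + Q2, c(0.25, 0.5, 0.75)), 2)
round(qchisq(c(0.25, 0.5, 0.75), df = k1 + k2), 2)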

Chi-squared vs Gamma

  • The PDF of a chi-squared random variable with \(k\) degrees of freedom is given by:

\[ f(x) = \frac{1}{2^{k/2}\Gamma(k/2)} x^{k/2-1} e^{-x/2} \mbox{, } x>0 \]

where \(\Gamma(\cdot)\) is the gamma function.

  • Remark: \(\chi_k^2\) = Gamma(shape=\(k/2\), rate=1/2).
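A quick numeric check of this remark (the grid of \(x\) values and the choice \(k=5\) are arbitrary):

Code
x <- seq(0.1, 20, by = 0.1); k <- 5
# The chi-squared(k) density equals the Gamma(shape = k/2, rate = 1/2) density
all.equal(dchisq(x, df = k), dgamma(x, shape = k / 2, rate = 1 / 2))   # expected: TRUE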

MGF and Moments

  • The MGF of a chi-squared random variable with \(k\) degrees of freedom is: \[ M_X(t) = (1-2t)^{-k/2} \mbox{, } t<1/2 \]

  • The mean and variance of a chi-squared random variable with \(k\) degrees of freedom are:

\[\text{E}[X] = k \mbox{, }\text{Var}[X] = 2k\]
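A short simulation sketch checking these moments (\(k=7\) is arbitrary):

Code
set.seed(3)
k <- 7
x <- rchisq(1e5, df = k)
# Sample mean and variance should be close to k and 2k
c(mean = mean(x), var = var(x))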

Construct Chi-squared R.V.s from Projection Matrices and Independent Normal R.V.s

  • Let \(\mathbf P_{n\times n}\) be a projection matrix with rank \(r\) and let \(\mathbf Z_{n\times 1}\sim N(\mathbf 0, \mathbf I)\).

  • Claim: \(\mathbf Z^T \mathbf P \mathbf Z\sim \chi_r^2\).

  • Proof: By the spectral decomposition of \(\mathbf P\), we have

\[\mathbf P= \sum_{i=1}^r \gamma_i \gamma_i^T,\] where \(\gamma_1, \cdots, \gamma_r\) are orthogonal vectors of norm 1, i.e., \(\gamma_i^T \gamma_j = 0\) for \(i\not=j\) and \(\gamma_i^T \gamma_i=1\) for \(i=1, \cdots, r\).

\[ \begin{aligned} \mathbf Z^T \mathbf P \mathbf Z &= \mathbf Z^T \sum_{i=1}^r \gamma_i \gamma_i^T \mathbf Z= \sum_{i=1}^r \mathbf Z^T \gamma_i \gamma_i^T \mathbf Z\\ &= \sum_{i=1}^r (\gamma_i^T \mathbf Z)^T(\gamma_i^T \mathbf Z) \end{aligned} \]

  • Let \(Y_i = \gamma_i^T \mathbf Z\).
  • \(Y_i\) is univariate and it is a linear combination of \(\mathbf Z\), therefore it follows a normal distribution (univariate).

\[Y_i \sim N(\gamma_i^T \mathbf 0, \gamma_i^T \mathbf I \gamma_i)=N(0,1)\]

  • \(Cov(Y_i, Y_j)=Cov(\gamma_i^T \mathbf Z, \gamma_j^T \mathbf Z)= \gamma_i^T\mathbf I \gamma_j=\gamma_i^T \gamma_j=0\) for \(i\not=j\). For MVN, zero covariance implies independence. Thus,

\[Y_1, \cdots, Y_r\overset{iid}\sim N(0,1).\]

  • Consequently, \(Y_i^2\overset{iid}\sim \chi_1^2\).
  • Note that \(\mathbf Z^T \mathbf P \mathbf Z=\sum_{i=1}^r Y_i^2\). By the definition of chi-squared distribution, we have \[\mathbf Z^T \mathbf P \mathbf Z\sim \chi_r^2.\]
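A simulation sketch of this claim, using the \(5\times 5\) centering matrix as the projection matrix \(\mathbf P\) (rank 4):

Code
set.seed(4)
n <- 5
Pc <- diag(n) - matrix(1 / n, n, n)     # projection matrix with rank n - 1 = 4
q <- replicate(5000, {
  z <- rnorm(n)                         # Z ~ N(0, I)
  as.numeric(t(z) %*% Pc %*% z)         # quadratic form Z' P Z
})
# Compare simulated quantiles with chi-squared(n-1) quantiles
round(quantile(q, c(0.25, 0.5, 0.75)), 2)
round(qchisq(c(0.25, 0.5, 0.75), df = n - 1), 2)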

The Sample Variance

  • Let \(\mathbb C=\mathbf I - \frac{1}{n} \mathbf 1\mathbf 1^T\) be the centering matrix.

  • We have shown that \(\mathbb C\) is a projection matrix with rank \(n-1\):

    • \(\mathbb C^T=\mathbb C\), \(\mathbb C^2=\mathbb C\).
    • \(\mathbb C = \sum_{j=1}^{n-1}\gamma_j\gamma_j^T\).
  • \((n-1)s^2=\mathbf X^T\mathbb C \mathbf X\) because

    \[\begin{aligned} (n-1)s^2&=\sum_{i=1}^n (X_i-\bar X)^2 = (\mathbb C \mathbf X)^T (\mathbb C \mathbf X) = \mathbf X^T \mathbb C \mathbf X. \end{aligned}\]
  • \(\mathbb C \mathbf X = \mathbb C (\mathbf X - E[\mathbf X])\) because
    • \(E[\mathbf X] = \mu \mathbf 1\).
    • \(\mathbb C \mathbf 1 = \mathbf 0\).
  • Therefore, \[ \begin{aligned} \frac{(n-1)s^2}{\sigma^2}&=\frac{\mathbf X^T \mathbb C \mathbf X}{\sigma^2}=\frac{\mathbf X^T \mathbb C^T \mathbb C \mathbb C \mathbf X}{\sigma^2}\\ &=\frac{(\mathbb C\mathbf X)^T \mathbb C \mathbb C\mathbf X}{\sigma^2}\\ &=\frac{(\mathbf X - E[\mathbf X])^T}{\sigma}\mathbb C \frac{(\mathbf X - E[\mathbf X])}{\sigma} \end{aligned}\]
  • Let \(\mathbf Z=\frac{(\mathbf X - E[\mathbf X])}{\sigma}\).

  • Easy to see that \(\mathbf Z\sim N(\mathbf 0, \mathbf I)\). Thus,

\[\frac{(n-1)s^2}{\sigma^2}=\mathbf Z^T \mathbb C \mathbf Z\]

  • Using the result derived above, we have \[\frac{(n-1)s^2}{\sigma^2}=\mathbf Z^T \mathbb C \mathbf Z \sim \chi_{n-1}^2\]
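A simulation sketch of this result (the normal parameters below are arbitrary):

Code
set.seed(5)
n <- 10; mu <- 5; sigma <- 2
# Simulate (n-1) s^2 / sigma^2 from many normal samples
w <- replicate(5000, (n - 1) * var(rnorm(n, mean = mu, sd = sigma)) / sigma^2)
# Mean and variance should be close to n-1 = 9 and 2(n-1) = 18
round(c(mean(w), var(w)), 2)
# Simulated vs theoretical 95th percentile of chi-squared(n-1)
round(c(quantile(w, 0.95), qchisq(0.95, df = n - 1)), 2)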

Sample Covariance

The Sample Covariance from a MVN Random Sample

  • Let \(\mathbf X_1, \cdots,\mathbf X_n \overset{iid} \sim N(\boldsymbol \mu, \boldsymbol \Sigma)\).
  • Recall that the sample covariance matrix is defined as \[\mathbf S =\frac{1}{n-1} \sum_{i=1}^n (\mathbf X_i - \bar{\mathbf X})(\mathbf X_i - \bar{\mathbf X})^T\]
  • We have shown that \[(n-1)\mathbf S = \mathbf X ^T \mathbb C \mathbf X\] where \(\mathbf X\) is the \(n\times p\) random matrix.

The Distribution of the Sample Covariance

  • The goal is to show that \((n-1)\mathbf S\) follows a Wishart distribution. More specifically, we would like to show that \[(n-1)\mathbf S \sim Wishart_p(n-1, \boldsymbol \Sigma)\]
  • Outline of proof
    1. Rewrite \((n-1)\mathbf S\)
    2. Apply properties of a projection matrix
    3. Use the definition of Wishart distribution

Wishart Distribution

  • The Wishart distribution is named after the British statistician John Wishart, who introduced it in his 1928 paper published in Biometrika.

  • Wishart was interested in the problem of estimating the covariance matrix of a multivariate normal distribution.

  • Wishart showed that the sample covariance matrix follows a particular probability distribution that we now call the Wishart distribution.

  • The Wishart distribution has become a fundamental tool in multivariate statistical analysis

Definition

  • A Wishart distribution can be defined in the following way

  • Let \(\mathbf W\) be a \(p\times p\) random matrix.

  • We say \(\mathbf W\) follows \(Wishart_{p}(k, \boldsymbol \Sigma)\) if \(\mathbf W\) can be written as \(\mathbf W=\mathbf X^T \mathbf X\) where \(\mathbf X\) denotes the random matrix formed by a random sample of size \(k\) from MVN \(N(\mathbf 0, \boldsymbol \Sigma)\).

  • Remark: \(E[\mathbf W]=k\boldsymbol \Sigma\).

  • The definition indicates that if

\[\mathbf X_1, \cdots \mathbf X_k \overset{iid}\sim N(\mathbf 0, \boldsymbol \Sigma),\]

then \[\mathbf X^T \mathbf X=\sum_{i=1}^k \mathbf X_i \mathbf X_i^T \sim Wishart_p(k, \boldsymbol \Sigma).\]
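A simulation sketch of the definition, using mvrnorm() from the MASS package loaded above (the choices of \(k\) and \(\boldsymbol \Sigma\) are arbitrary); base R also provides rWishart() for drawing Wishart matrices directly.

Code
set.seed(6)
k <- 50
Sigma <- matrix(c(1, 0.5, 0.5, 2), 2, 2)
# Average of W = X'X over many replicates should be close to k * Sigma
W_list <- replicate(2000, {
  Xs <- mvrnorm(k, mu = c(0, 0), Sigma = Sigma)   # k iid rows from N(0, Sigma)
  t(Xs) %*% Xs
}, simplify = FALSE)
round(Reduce(`+`, W_list) / length(W_list), 2)
k * Sigma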

Wishart vs Chi-squared

  • Wishart: If \(\mathbf X_1, \cdots \mathbf X_k \overset{iid}\sim N(\mathbf 0, \boldsymbol \Sigma)\), then \[\mathbf X^T \mathbf X =\sum_{i=1}^k \mathbf X_i\mathbf X_i^T \sim Wishart_p(k, \boldsymbol \Sigma) \mbox{, where } \mathbf X_{k\times p}=\begin{pmatrix} X_1^T\\ \vdots\\ X_k^T \end{pmatrix} \]

  • Chi-squared: If \(X_1, \cdots, X_k \overset{iid}\sim N(0,1)\), then
    \[\mathbf X^T\mathbf X=\sum_{i=1}^k X_i^2\sim \chi_k^2 \mbox{, where } \mathbf X_{k\times 1}= \begin{pmatrix} X_1 \\ \vdots \\ X_k \end{pmatrix}\]

Wishart vs Chi-squared (continued)

  • When \(p=1\), \[W=\sum_{i=1}^k X_i^2 = \sigma^2 \sum_{i=1}^k \left(\frac{X_i}{\sigma} \right)^2\sim \sigma^2 \chi_k^2 \]

Sample Covariance

  • The sample covariance satisfies \((n-1)\mathbf S=\mathbf X^T \mathbb C \mathbf X\). The definition of the Wishart distribution is not immediately applicable.

  • Rewrite \((n-1)\mathbf S\):

\[ \begin{aligned} (n-1)\mathbf S&=\mathbf X^T \mathbb C^T\mathbb C\mathbb C \mathbf X=(\mathbb C \mathbf X)^T(\mathbb C \mathbf X)\\ &=(\mathbb C \mathbf X)^T\mathbb C(\mathbb C \mathbf X)\\ &=(\mathbb C \mathbf X)^T\sum_{i=1}^{n-1}\gamma_i \gamma_i^T (\mathbb C \mathbf X)\\ &=\sum_{i=1}^{n-1} (\gamma_i^T \mathbb C \mathbf X)^T (\gamma_i^T \mathbb C \mathbf X) \end{aligned}\]

  • Let \(Y_i= (\gamma_i^T \mathbb C \mathbf X)^T=(\gamma_i^T \sum_j \gamma_j \gamma_j^T \mathbf X)^T = \mathbf X^T\gamma_i\). We have
    • \(Y_i\) is a \(p\times 1\) vector; it is a linear function of \(\mathbf X\). Thus it follows a MVN.
    • \(E[Y_i]=\mathbf 0\) because \(\mathbb C\) is the centering matrix.
    • The \(j\)th element of \(Y_i\) is a linear function of the \(j\)th column of \(\mathbf X\).
  • In the following, we show that \(Y_i\) and \(Y_j\) are uncorrelated for \(i\not=j\):

\[\begin{aligned} Cov[Y_i, Y_j]&=E[(Y_i-\mathbf 0 )(Y_j-\mathbf 0)^T]\\ &= E[Y_iY_j^T] \\ &= E[(\gamma_i^T \mathbb C \mathbf X)^T(\gamma_j^T \mathbb C \mathbf X)]\\ &= E[\mathbf X^T \mathbb C \gamma_i \gamma_j^T \mathbb C \mathbf X] \\ &=\mathbf 0 \end{aligned}\]

The last step is true because, for \(i\not=j\), \(\gamma_i^T \gamma_j=0\): since \(\mathbb C\gamma_i=\gamma_i\), the matrix inside the expectation is \(\mathbf X^T\gamma_i \gamma_j^T\mathbf X\), and taking expectations entry by entry (the rows of \(\mathbf X\) are independent and \(\mathbf 1^T\gamma_i=0\)) gives \((\gamma_i^T\gamma_j)\boldsymbol \Sigma=\mathbf 0\).

  • Since \(Y_i\) and \(Y_j\) are jointly normal (both are linear functions of the same MVN-distributed random matrix, or of its vectorized version) and uncorrelated, \(Y_i\) and \(Y_j\) are independent for \(i\not=j\).

  • We understand that \(Y_i\) follows a MVN with mean \(\mathbf 0\). How about its covariance matrix? Next, we introduce matrix normal distribution.

Matrix Normal Distribution

  • Let \(\mathbf X_1, \cdots \mathbf X_n\) be a random sample (therefore iid) from \(N(\boldsymbol \mu, \boldsymbol \Sigma)\).

  • If we stack \(\mathbf X_1, \cdots, \mathbf X_n\) into an \(n\times p\) random matrix \(\mathbf X\), we say \(\mathbf X\) follows a matrix normal distribution: \[\mathbf X \sim N(\mathbf 1_n \boldsymbol \mu^T, \boldsymbol \Sigma, \mathbf I_n)\]

  • Consider the linear function \(\mathbf X^T \mathbf a\) for a fixed \(n\times 1\) vector \(\mathbf a\). It can be shown that \[\mathbf X^T \mathbf a \sim N((\mathbf 1_n \boldsymbol \mu^T)^T \mathbf a, (\mathbf a^T \mathbf a) \boldsymbol \Sigma)= N(\boldsymbol \mu\, \mathbf 1_n^T \mathbf a, (\mathbf a^T \mathbf a) \boldsymbol \Sigma)\]

  • Thus, taking \(\mathbf a=\gamma_i\) (so that \(\mathbf 1_n^T\gamma_i=0\) and \(\gamma_i^T\gamma_i=1\)), we have \(Y_i \sim N(\mathbf 0, \boldsymbol \Sigma)\).
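A simulation sketch of the linear-function result (the dimensions, \(\boldsymbol \mu\), \(\boldsymbol \Sigma\), and the fixed vector \(\mathbf a\) below are arbitrary):

Code
set.seed(7)
n <- 20
mu <- c(1, -1)
Sigma <- matrix(c(1, 0.3, 0.3, 1), 2, 2)
a <- rnorm(n)                           # a fixed n x 1 vector
# Draw X' a repeatedly and compare moments with the formula
draws <- t(replicate(5000, as.numeric(t(mvrnorm(n, mu, Sigma)) %*% a)))
round(colMeans(draws), 2); round(sum(a) * mu, 2)      # mean: (1' a) * mu
round(cov(draws), 2); round(sum(a^2) * Sigma, 2)      # covariance: (a' a) * Sigma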

Wishart Distribution and Sample Covariance

  • So far, we have the following:
    • \((n-1)\mathbf S=\sum_{i=1}^{n-1} Y_i Y_i^T\).
    • \(Y_1, \cdots, Y_{n-1} \overset{iid} \sim N(\mathbf 0, \boldsymbol \Sigma)\).
  • By the definition of Wishart, we can conclude that \[(n-1)\mathbf S\sim Wishart_p(n-1, \boldsymbol \Sigma)\]
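A simulation sketch of this conclusion: averaged over many samples, \((n-1)\mathbf S\) should be close to \((n-1)\boldsymbol \Sigma\), the mean of \(Wishart_p(n-1, \boldsymbol \Sigma)\). The parameters below are arbitrary.

Code
set.seed(8)
n <- 15
Sigma <- matrix(c(2, 0.8, 0.8, 1), 2, 2)
# Average of (n-1) * S over many samples, compared with (n-1) * Sigma
W_bar <- Reduce(`+`, replicate(2000, {
  Xs <- mvrnorm(n, mu = c(3, -2), Sigma = Sigma)
  (n - 1) * cov(Xs)
}, simplify = FALSE)) / 2000
round(W_bar, 2)
(n - 1) * Sigma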

Some Interesting Results

  • Consider a random sample from MVN \(N(\boldsymbol \mu, \boldsymbol \Sigma)\). Let \(\mathbf S\) denote the sample covariance matrix.

  • We have already shown that \((n-1)\mathbf S \sim Wishart_p(n-1, \boldsymbol \Sigma)\)

  • What is the distribution of a diagonal element of \((n-1)\mathbf S\)?

  • What is the distribution of the sum of all elements of \((n-1)\mathbf S\)? Note that this is a special case of the next question, with \(\mathbf B=(1, \cdots, 1)\).

  • What is the distribution of \((n-1)\mathbf B \mathbf S \mathbf B^T\), where \(\mathbf B\) is a fixed \(q\times p\) matrix?

  • If time permits, we will run some simulations

  • If you cannot get the answer to the last question, let’s use the definition of Wishart distribution.

  • Let \(\mathbf W = (n-1)\mathbf S\). Because it follows \(Wishart_p(n-1, \boldsymbol\Sigma)\), we know that \(\mathbf W=\sum_{j=1}^{n-1} \mathbf Z_j \mathbf Z_j^T\) where the \(\mathbf Z_j\)’s are iid from \(N(\mathbf 0, \boldsymbol\Sigma)\).

  • Then \[ \begin{aligned} (n-1)\mathbf B \mathbf S \mathbf B^T &= \mathbf B\left(\sum_{j=1}^{n-1} \mathbf Z_j \mathbf Z_j^T\right)\mathbf B^T = \sum_{j=1}^{n-1} \mathbf B \mathbf Z_j \mathbf Z_j^T\mathbf B^T\\ &= \sum_{j=1}^{n-1} (\mathbf B \mathbf Z_j)(\mathbf B \mathbf Z_j)^T \end{aligned} \]

Let \(\mathbf Y_j=\mathbf B \mathbf Z_j\). Note that it is a linear function of \(\mathbf Z_j\); therefore \[\mathbf Y_j\sim N(\mathbf 0, \mathbf B \boldsymbol \Sigma \mathbf B^T)\] and the \(\mathbf Y_j\)’s are iid (because …).

By the definition of Wishart distribution, we have

\[(n-1)\mathbf B \mathbf S \mathbf B^T\sim Wishart_q(n-1, \mathbf B \boldsymbol \Sigma \mathbf B^T)\]
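A simulation sketch for the questions above, taking \(p=2\) and \(\mathbf B=(1, 1)\) so that \((n-1)\mathbf B\mathbf S\mathbf B^T\) is the sum of all elements of \((n-1)\mathbf S\); by the result just derived it should behave like \((\mathbf 1^T\boldsymbol \Sigma\mathbf 1)\,\chi_{n-1}^2\). The parameter choices below are arbitrary.

Code
set.seed(9)
n <- 12
Sigma <- matrix(c(1, 0.5, 0.5, 2), 2, 2)
B <- matrix(1, 1, 2)                      # B = (1, 1): sums all elements of S
w <- replicate(5000, {
  Xs <- mvrnorm(n, mu = c(0, 0), Sigma = Sigma)
  as.numeric(B %*% ((n - 1) * cov(Xs)) %*% t(B))
})
# Wishart_1(n-1, B Sigma B') is (B Sigma B') * chi-squared(n-1)
c(simulated_mean = mean(w),
  theoretical_mean = (n - 1) * as.numeric(B %*% Sigma %*% t(B)))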