Lecture 6: Wishart Distribution

Zhaoxia Yu

Professor, Department of Statistics

2026-04-27

Code
knitr::opts_chunk$set(echo = TRUE)
library(tidyr) #the pipe (%>%) tool is extremely useful
library(MASS)
library(corrplot)#for visualizing the corr matrix of the iris data
library(car)

The Big Picture

Univariate vs Multivariate

  • Review: A random sample, denoted by \(X_1, \cdots, X_n\), from a (univariate) normal distribution \(N(\mu, \sigma^2)\)
    • What are the distributions of \(\bar X, s^2\)? What useful statistics can be constructed?
  • New material: A random sample, denoted by \(\mathbf X_1, \cdots, \mathbf X_n\), from a multivariate normal distribution \(N(\boldsymbol \mu, \boldsymbol \Sigma)\)
    • What are the distributions of \(\bar{\mathbf X}, \mathbf S\)? What useful statistics can be constructed?

Univariate

  • A random sample, denoted by \(X_1, \cdots, X_n\), from a normal distribution \(N(\mu, \sigma^2)\)
  • Let \(\mathbf X_{n\times 1}=(X_1, \cdots, X_n)^T\). It is a random vector with a multivariate normal distribution, i.e., \[\mathbf X_{n\times 1}=(X_1, \cdots, X_n)^T \sim \mathbf N(\mu\mathbf 1, \sigma^2\mathbf I)\]
  1. \(\bar X \sim N(\mu, \sigma^2/n)\)
  2. \(\frac{(n-1)s^2}{\sigma^2} \sim \chi_{n-1}^2\)
  3. Independence between \(\bar X\) and \(s^2\).
  4. T-statistic:

\[\frac{\frac{\bar X-\mu}{\sqrt{\sigma^2/n}}}{\sqrt{\frac{(n-1)s^2/\sigma^2}{n-1}}}=\frac{\sqrt{n}(\bar X-\mu)}{s}\sim t_{n-1}.\]
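A minimal simulation sketch of result 4 (the sample size and parameters below are arbitrary): the studentized mean should behave like a \(t_{n-1}\) random variable.

Code
set.seed(1)
n <- 10; mu <- 5; sigma <- 2
# Simulate the t-statistic sqrt(n)*(xbar - mu)/s many times
tstat <- replicate(5000, {
  x <- rnorm(n, mean = mu, sd = sigma)
  sqrt(n) * (mean(x) - mu) / sd(x)
})
# Empirical quantiles should be close to the t_{n-1} quantiles
round(quantile(tstat, c(0.025, 0.5, 0.975)), 2)
round(qt(c(0.025, 0.5, 0.975), df = n - 1), 2)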

Multivariate

  • A random sample \(\mathbf X_1, \cdots, \mathbf X_n\) from a MVN \(\mathbf N(\boldsymbol \mu, \boldsymbol \Sigma)\).
  • Let \[\mathbf X_{n\times p}=\begin{pmatrix} \mathbf X_1^T \\ \vdots \\\mathbf X_n^T \end{pmatrix}\] \(\mathbf X\) follows a matrix normal distribution.
  1. Sample mean vector follows a multivariate normal, i.e., \(\bar{\mathbf X} \sim \mathbf N(\boldsymbol \mu, \boldsymbol \Sigma/n)\)

  2. The scaled sample covariance matrix \((n-1)\mathbf S\) follows a Wishart distribution, i.e., \((n-1)\mathbf S \sim Wishart_p (n-1, \boldsymbol \Sigma)\)

  3. Independence between \(\bar {\mathbf X}\) and \(\mathbf S\).

  4. Hotelling’s \(T^2\): \(T^2 = (\bar{\mathbf X} - \boldsymbol \mu)^T\left(\frac{\mathbf S}{n}\right)^{-1} (\bar{\mathbf X} - \boldsymbol \mu)\)

Outline

  • Sample variance and chi-squared distribution
  • Sample covariance matrix and Wishart distribution
  • Hotelling’s \(T^2\)
  • Maximum likelihood estimate

Sample Variance

Sample Variance and Chi-squared Distribution

  • Let \(\mathbf X=(X_1, \cdots, X_n)^T\) denote a random sample from \(N(\mu, \sigma^2)\).
  • Equivalently, \(\mathbf X \sim N(\mu \mathbf 1, \sigma^2 \mathbf I)\).
  • Let \(s^2=\frac{1}{n-1}\sum_{i=1}^n (X_i-\bar X)^2\) denote the sample variance.
  • We would like to show that \[\frac{(n-1)s^2}{\sigma^2}\sim \chi_{n-1}^2\]

Outline of proof

  1. Projection matrices
  2. Chi-squared distribution
  3. Rewrite \((n-1)s^2/\sigma^2\) as the sum of squared \(N(0,1)\) random variables

Projection Matrices

  • A projection matrix is a square matrix that is both idempotent and symmetric

\[\mathbf{P}^2 = \mathbf{P} \mbox{, }\mathbf{P}=\mathbf{P}^T \]

  • Suppose \(\mathbf P\) is a projection matrix. We have
    • The eigenvalues of \(\mathbf{P}\) are either 0 or 1, and the number of 1’s equals the rank of the projection matrix.
    • \(tr(\mathbf P) = rank(\mathbf P)\)
  • The spectral decomposition of \(\mathbf P\) is \[\mathbf P=\sum_{j=1}^r \gamma_j\gamma_j^T\] where \(r=rank(\mathbf P)\), and \((\gamma_1, \cdots, \gamma_r)\) are orthogonal vectors of norm 1, i.e.,

\[ \gamma_i^T\gamma_j = \left\{ \begin{array}{ll} 1 & \mbox{if } i=j\\ 0 & \mbox{if } i\not=j \end{array} \right. \]

Example: Project onto a Plane in \(\mathbb{R}^3\)

  • The goal is to find the projection matrix that projects any vector in \(\mathbb{R}^3\) onto the plane defined by \(x + y + z = 0\).

  • Steps:

    • Step 1: Define the subspace (the plane) and find a basis for it.
    • Step 2: Compute the projection matrix using the basis.
    • Step 3: Use the projection matrix to project a vector onto the plane and verify the result.

Step 1: Define the Subspace

The plane equation is \(x + y + z = 0\).

We need two basis vectors \(\mathbf{a}_1, \mathbf{a}_2\) that span the plane:

Code
a1 <- c(1, -1, 0)   # Satisfies x + y + z = 0
a2 <- c(0, 1, -1)    # Also satisfies x + y + z = 0
A <- cbind(a1, a2)   # Basis matrix
print(A)
     a1 a2
[1,]  1  0
[2,] -1  1
[3,]  0 -1

Step 2: Compute Projection Matrix \(P = A(A^TA)^{-1}A^T\)

Code
P <- A %*% solve(t(A) %*% A) %*% t(A)
print(P)
           [,1]       [,2]       [,3]
[1,]  0.6666667 -0.3333333 -0.3333333
[2,] -0.3333333  0.6666667 -0.3333333
[3,] -0.3333333 -0.3333333  0.6666667
  • Key Property: Verify \(P^2 = P\) (idempotent):
Code
all.equal(P, P %*% P)  # Should return TRUE
[1] TRUE
  • A different basis for the same plane would yield the same projection matrix. Example:
Code
b1 <- c(2, -2, 0)   # Same direction as a1
b2 <- c(0, 2, -2)    # Same direction as a2
B <- cbind(b1, b2)
P2 <- B %*% solve(t(B) %*% B) %*% t(B)
print(P2)
           [,1]       [,2]       [,3]
[1,]  0.6666667 -0.3333333 -0.3333333
[2,] -0.3333333  0.6666667 -0.3333333
[3,] -0.3333333 -0.3333333  0.6666667
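As a further check of the projection-matrix properties stated earlier (this sketch reuses P from the chunk above): the eigenvalues should be 0 or 1, and \(tr(P)=rank(P)=2\), the dimension of the plane.

Code
# Eigenvalues of a projection matrix are 0 or 1
round(eigen(P)$values, 10)
# Trace equals the rank (here 2, the dimension of the plane)
sum(diag(P))
qr(P)$rank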

Step 3: Project a Vector onto the Plane

Let’s project \(\mathbf{v} = [1,1,2]^T\): \(v_{\text{proj}}=P \mathbf{v}\).

Code
v <- c(1, 1, 2)
v_proj <- P %*% v
print(v_proj)  # Result should satisfy x + y + z = 0
           [,1]
[1,] -0.3333333
[2,] -0.3333333
[3,]  0.6666667

Verification: Check if \(v_{\text{proj}}\) lies on the plane:

Code
sum(v_proj)  # Should be 0 (or very close due to floating-point)
[1] 0

Example: the Centering Matrix

  • The centering matrix: \(\mathbb C=\mathbf I - \frac{1}{n} \mathbf 1\mathbf 1^T\).
  • It is a projection matrix:
    • \(\mathbb C^T=\mathbb C\) (symmetric)
    • \(\mathbb C^2= \mathbb C\) (idempotent)
  • One important result about a projection matrix is that its eigenvalues are either zero or one.
  • By properties of projection matrices, we have
    • \(rank(\mathbb C) = tr(\mathbb C)=n-1\)
    • \(\mathbb C = \sum_{j=1}^{n-1}\gamma_j\gamma_j^T\)

Verify the properties of the centering matrix

Code
set.seed(123)
# Generate a random sample
X=rnorm(10, mean=5, sd=1)
# the centering matrix
C=diag(10)-1/10*matrix(1,10,10)
# Check if C is symmetric
isSymmetric(C)
[1] TRUE
Code
# Check if C is idempotent
C2=C%*%C
all.equal(C, C2) # should be TRUE
[1] TRUE

Original vs Centered Data

Code
# Verify that CX has mean 0
mean(C%*%X)
[1] -1.776465e-16
Code
plot(X, C%*%X, xlab="X", ylab="CX", main="Original vs Centered")
abline(a=-mean(X), b=1, col="red") # the line y = x - mean(X)

Spectral decomposition of the centering matrix

Code
# Check the eigenvalues of C
eigen(C)$values
 [1] 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00
 [6] 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 8.881784e-16
Code
# Check the eigenvectors of C
round(eigen(C)$vectors, 2)
       [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9] [,10]
 [1,]  0.00  0.00  0.00  0.00  0.00  0.95  0.00  0.00  0.00 -0.32
 [2,]  0.02  0.37  0.00 -0.09  0.24 -0.11 -0.04  0.83 -0.05 -0.32
 [3,]  0.00  0.79  0.00 -0.02  0.04 -0.11 -0.01 -0.51 -0.01 -0.32
 [4,]  0.87 -0.14  0.00 -0.15 -0.30 -0.11  0.10 -0.01  0.04 -0.32
 [5,] -0.25 -0.14  0.00  0.02 -0.25 -0.11  0.45 -0.01 -0.74 -0.32
 [6,] -0.25 -0.14  0.00 -0.04  0.03 -0.11  0.63 -0.01  0.64 -0.32
 [7,]  0.05 -0.35  0.00 -0.28  0.77 -0.11 -0.14 -0.24 -0.15 -0.32
 [8,]  0.05 -0.14  0.00  0.91  0.09 -0.11 -0.17 -0.01  0.04 -0.32
 [9,] -0.25 -0.14 -0.71 -0.18 -0.31 -0.11 -0.41 -0.01  0.11 -0.32
[10,] -0.25 -0.14  0.71 -0.18 -0.31 -0.11 -0.41 -0.01  0.11 -0.32
Code
Gamma=eigen(C)$vectors
# Check Gamma is orthogonal
Gamma%*%t(Gamma)
               [,1]          [,2]          [,3]          [,4]          [,5]
 [1,]  1.000000e+00 -2.775558e-17 -1.387779e-17 -1.387779e-17 -1.387779e-17
 [2,] -2.775558e-17  1.000000e+00  4.996004e-16 -9.714451e-17 -9.714451e-17
 [3,] -1.387779e-17  4.996004e-16  1.000000e+00 -1.665335e-16  1.387779e-17
 [4,] -1.387779e-17 -9.714451e-17 -1.665335e-16  1.000000e+00 -2.498002e-16
 [5,] -1.387779e-17 -9.714451e-17  1.387779e-17 -2.498002e-16  1.000000e+00
 [6,] -1.387779e-17 -9.714451e-17 -1.387779e-17 -1.249001e-16 -6.938894e-17
 [7,] -1.387779e-17  1.110223e-16 -2.775558e-17 -6.938894e-17 -2.775558e-17
 [8,] -1.387779e-17 -1.249001e-16  8.326673e-17  2.775558e-17  2.775558e-17
 [9,] -1.387779e-17 -1.110223e-16  4.163336e-17 -2.775558e-17  9.714451e-17
[10,] -1.387779e-17 -1.110223e-16  4.163336e-17 -2.775558e-17  2.081668e-16
               [,6]          [,7]          [,8]          [,9]         [,10]
 [1,] -1.387779e-17 -1.387779e-17 -1.387779e-17 -1.387779e-17 -1.387779e-17
 [2,] -9.714451e-17  1.110223e-16 -1.249001e-16 -1.110223e-16 -1.110223e-16
 [3,] -1.387779e-17 -2.775558e-17  8.326673e-17  4.163336e-17  4.163336e-17
 [4,] -1.249001e-16 -6.938894e-17  2.775558e-17 -2.775558e-17 -2.775558e-17
 [5,] -6.938894e-17 -2.775558e-17  2.775558e-17  9.714451e-17  2.081668e-16
 [6,]  1.000000e+00  1.804112e-16  1.387779e-17 -1.249001e-16 -1.387779e-17
 [7,]  1.804112e-16  1.000000e+00 -6.938894e-17 -6.938894e-17 -1.387779e-17
 [8,]  1.387779e-17 -6.938894e-17  1.000000e+00  1.110223e-16 -2.775558e-17
 [9,] -1.249001e-16 -6.938894e-17  1.110223e-16  1.000000e+00 -1.387779e-17
[10,] -1.387779e-17 -1.387779e-17 -2.775558e-17 -1.387779e-17  1.000000e+00
Code
t(Gamma)%*%Gamma
               [,1]          [,2]          [,3]          [,4]          [,5]
 [1,]  1.000000e+00 -1.249001e-16 -2.775558e-17 -5.551115e-17 -1.387779e-16
 [2,] -1.249001e-16  1.000000e+00 -1.387779e-17  5.551115e-17  8.326673e-17
 [3,] -2.775558e-17 -1.387779e-17  1.000000e+00 -1.110223e-16  5.551115e-17
 [4,] -5.551115e-17  5.551115e-17 -1.110223e-16  1.000000e+00 -9.714451e-17
 [5,] -1.387779e-16  8.326673e-17  5.551115e-17 -9.714451e-17  1.000000e+00
 [6,]  0.000000e+00  1.040834e-17  0.000000e+00 -6.938894e-18  2.775558e-17
 [7,] -2.775558e-17 -2.775558e-17  1.110223e-16 -2.775558e-17  2.220446e-16
 [8,]  7.546047e-17  2.949030e-17 -6.938894e-18 -2.602085e-18 -1.092876e-16
 [9,]  1.665335e-16 -2.081668e-17 -1.387779e-17  6.938894e-18  1.804112e-16
[10,]  5.551115e-17  1.387779e-16  2.775558e-17  0.000000e+00  5.551115e-17
               [,6]          [,7]          [,8]          [,9]         [,10]
 [1,]  0.000000e+00 -2.775558e-17  7.546047e-17  1.665335e-16  5.551115e-17
 [2,]  1.040834e-17 -2.775558e-17  2.949030e-17 -2.081668e-17  1.387779e-16
 [3,]  0.000000e+00  1.110223e-16 -6.938894e-18 -1.387779e-17  2.775558e-17
 [4,] -6.938894e-18 -2.775558e-17 -2.602085e-18  6.938894e-18  0.000000e+00
 [5,]  2.775558e-17  2.220446e-16 -1.092876e-16  1.804112e-16  5.551115e-17
 [6,]  1.000000e+00  2.775558e-17  8.630249e-17  6.938894e-18 -1.387779e-17
 [7,]  2.775558e-17  1.000000e+00  2.255141e-17  1.526557e-16  1.110223e-16
 [8,]  8.630249e-17  2.255141e-17  1.000000e+00 -6.938894e-18  2.931683e-16
 [9,]  6.938894e-18  1.526557e-16 -6.938894e-18  1.000000e+00  1.387779e-17
[10,] -1.387779e-17  1.110223e-16  2.931683e-16  1.387779e-17  1.000000e+00

Note that \[\mathbb C=\sum_{j=1}^{n-1} \gamma_j\gamma_j^T\]

Code
Total_mat=matrix(0,10,10)
for(i in 1:9){
  Total_mat=Total_mat+Gamma[,i]%*%t(Gamma[,i])
}
all.equal(Total_mat, C)
[1] TRUE

Center A Random Sample: Univariate

  • The centering matrix centers data
  • Univariate: Let \(\mathbf X_{n\times 1}\) be a random sample from \(N(\mu, \sigma^2)\), i.e., \[\mathbf X_{n\times 1}\sim N(\mu\mathbf 1, \sigma^2 \mathbf I)\]

Since \(\mathbb C\mathbf X\) is a linear function of \(\mathbf X\) and it can be verified that \(\mathbb C\mathbf 1=\mathbf 0\), we have \[E[\mathbb C\mathbf X]=\mu \mathbb C\mathbf 1=\mathbf 0\]

Center A Random Sample: Multivariate

  • Multivariate: Let \(\mathbf X_{n\times p}\) be the data matrix of a random sample from \(N(\boldsymbol\mu, \boldsymbol \Sigma)\). Similarly, it can be shown that \(\mathbb C \mathbf X\) has mean \(\mathbf 0_{n\times p}\). We have verified this numerically.

  • In either situation, we have \(\mathbb C \mathbf X = \mathbb C (\mathbf X-E[\mathbf X])\) because \(\mathbb C E[\mathbf X]=\mathbf 0\). This fact will be used later.
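A quick numeric check of this fact, reusing X (generated with population mean 5) and C from the earlier chunk, so that \(E[\mathbf X]=5\cdot \mathbf 1\):

Code
# C X should equal C (X - E[X]) because C 1 = 0; here E[X] = 5 * 1
all.equal(C %*% X, C %*% (X - 5))   # expected: TRUE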

Chi-squared distribution

Definition

Definition. Let \(Z_1, ..., Z_k \overset{iid}\sim N(0,1)\). Then, the sum of squares \(Q = Z_1^2 + ... + Z_k^2\) has a chi-squared distribution with \(k\) degrees of freedom, denoted by \(\chi_k^2\).

  • Alternatively, let \(\mathbf Z_{k\times 1} \sim N(\mathbf 0, \mathbf I)\). We say \(||\mathbf Z||^2=\mathbf Z^T \mathbf Z\) follows \(\chi_k^2\).

  • \(Z_1^2, \cdots, Z_k^2 \overset{iid}\sim \chi_1^2\).

  • The sum of independent chi-squared random variables is also chi-squared. Specifically, if \(Q_1\sim \chi_{k_1}^2\) and \(Q_2\sim \chi_{k_2}^2\) are independent, then \(Q_1+Q_2\sim \chi_{k_1+k_2}^2\).
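A small simulation sketch of the definition and the additivity property (the degrees of freedom below are arbitrary):

Code
set.seed(2)
k1 <- 3; k2 <- 4
# Sums of squared iid N(0,1) draws
Q1 <- replicate(5000, sum(rnorm(k1)^2))
Q2 <- replicate(5000, sum(rnorm(k2)^2))
# Q1 + Q2 should behave like a chi-squared with k1 + k2 df
round(quantile(Q1 + Q2, c(0.25, 0.5, 0.75)), 2)
round(qchisq(c(0.25, 0.5, 0.75), df = k1 + k2), 2)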

Chi-squared vs Gamma

  • The PDF of a chi-squared random variable with \(k\) degrees of freedom is given by:

\[ f(x) = \frac{1}{2^{k/2}\Gamma(k/2)} x^{k/2-1} e^{-x/2} \mbox{, } x>0 \]

where \(\Gamma(\cdot)\) is the gamma function.

  • Remark: \(\chi_k^2\) = Gamma(shape=\(k/2\), rate=1/2).
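A quick numeric check of this remark (the grid of \(x\) values and the choice \(k=5\) are arbitrary):

Code
x <- seq(0.1, 20, by = 0.1); k <- 5
# The chi-squared(k) density equals the Gamma(shape = k/2, rate = 1/2) density
all.equal(dchisq(x, df = k), dgamma(x, shape = k / 2, rate = 1 / 2))   # expected: TRUE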

MGF and Moments

  • The MGF of a chi-squared random variable with \(k\) degrees of freedom is: \[ M_X(t) = (1-2t)^{-k/2} \mbox{, } t<1/2 \]

  • The mean and variance of a chi-squared random variable with \(k\) degrees of freedom are:

\[\text{E}[X] = k \mbox{, }\text{Var}[X] = 2k\]
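A short simulation sketch checking these moments (\(k=7\) is arbitrary):

Code
set.seed(3)
k <- 7
x <- rchisq(1e5, df = k)
# Sample mean and variance should be close to k and 2k
c(mean = mean(x), var = var(x))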

Construct Chi-squared R.V.s from Projection Matrices and Independent Normal R.V.s

  • Let \(\mathbf P_{n\times n}\) be a projection matrix with rank \(r\) and let \(\mathbf Z_{n\times 1}\sim N(\mathbf 0, \mathbf I)\).

  • Claim: \(\mathbf Z^T \mathbf P \mathbf Z\sim \chi_r^2\).

  • Proof: By the spectral decomposition of \(\mathbf P\), we have

\[\mathbf P= \sum_{i=1}^r \gamma_i \gamma_i^T,\] where \(\gamma_1, \cdots, \gamma_r\) are orthogonal vectors of norm 1, i.e., \(\gamma_i^T \gamma_j = 0\) for \(i\not=j\) and \(\gamma_i^T \gamma_i=1\) for \(i=1, \cdots, r\).

\[ \begin{aligned} \mathbf Z^T \mathbf P \mathbf Z &= \mathbf Z^T \sum_{i=1}^r \gamma_i \gamma_i^T \mathbf Z= \sum_{i=1}^r \mathbf Z^T \gamma_i \gamma_i^T \mathbf Z\\ &= \sum_{i=1}^r (\gamma_i^T \mathbf Z)^T(\gamma_i^T \mathbf Z) \end{aligned} \]

  • Let \(Y_i = \gamma_i^T \mathbf Z\).
  • \(Y_i\) is univariate and it is a linear combination of \(\mathbf Z\), therefore it follows a normal distribution (univariate).

\[Y_i \sim N(\gamma_i^T \mathbf 0, \gamma_i^T \mathbf I \gamma_i)=N(0,1)\]

  • \(Cov(Y_i, Y_j)=Cov(\gamma_i^T \mathbf Z, \gamma_j^T \mathbf Z)= \gamma_i^T\mathbf I \gamma_j=\gamma_i^T \gamma_j=0\) for \(i\not=j\). For MVN, zero covariance implies independence. Thus,

\[Y_1, \cdots, Y_r\overset{iid}\sim N(0,1).\]

  • Consequently, \(Y_i^2\overset{iid}\sim \chi_1^2\).
  • Note that \(\mathbf Z^T \mathbf P \mathbf Z=\sum_{i=1}^r Y_i^2\). By the definition of chi-squared distribution, we have \[\mathbf Z^T \mathbf P \mathbf Z\sim \chi_r^2.\]
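A simulation sketch of this claim, using the \(5\times 5\) centering matrix as the projection matrix \(\mathbf P\) (rank 4):

Code
set.seed(4)
n <- 5
Pc <- diag(n) - matrix(1 / n, n, n)     # projection matrix with rank n - 1 = 4
q <- replicate(5000, {
  z <- rnorm(n)                         # Z ~ N(0, I)
  as.numeric(t(z) %*% Pc %*% z)         # quadratic form Z' P Z
})
# Compare simulated quantiles with chi-squared(n-1) quantiles
round(quantile(q, c(0.25, 0.5, 0.75)), 2)
round(qchisq(c(0.25, 0.5, 0.75), df = n - 1), 2)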

The Sample Variance

  • Let \(\mathbb C=\mathbf I - \frac{1}{n} \mathbf 1\mathbf 1^T\) be the centering matrix.

  • We have shown that \(\mathbb C\) is a projection matrix with rank \(n-1\):

    • \(\mathbb C^T=\mathbb C\), \(\mathbb C^2=\mathbb C\).
    • \(\mathbb C = \sum_{j=1}^{n-1}\gamma_j\gamma_j^T\).
  • \((n-1)s^2=\mathbf X^T\mathbb C \mathbf X\) because

    \[\begin{aligned} (n-1)s^2&=\sum_{i=1}^n (X_i-\bar X)^2 = (\mathbb C \mathbf X)^T (\mathbb C \mathbf X) = \mathbf X^T \mathbb C \mathbf X. \end{aligned}\]
  • \(\mathbb C \mathbf X = \mathbb C (\mathbf X - E[\mathbf X])\) because
    • \(E[\mathbf X] = \mu \mathbf 1\).
    • \(\mathbb C \mathbf 1 = \mathbf 0\).
  • Therefore, \[ \begin{aligned} \frac{(n-1)s^2}{\sigma^2}&=\frac{\mathbf X^T \mathbb C \mathbf X}{\sigma^2}=\frac{\mathbf X^T \mathbb C^T \mathbb C \mathbb C \mathbf X}{\sigma^2}\\ &=\frac{(\mathbb C\mathbf X)^T \mathbb C \mathbb C\mathbf X}{\sigma^2}\\ &=\frac{(\mathbf X - E[\mathbf X])^T}{\sigma}\mathbb C \frac{(\mathbf X - E[\mathbf X])}{\sigma} \end{aligned}\]
  • Let \(\mathbf Z=\frac{(\mathbf X - E[\mathbf X])}{\sigma}\).

  • Easy to see that \(\mathbf Z\sim N(\mathbf 0, \mathbf I)\). Thus,

\[\frac{(n-1)s^2}{\sigma^2}=\mathbf Z^T \mathbb C \mathbf Z\]

  • Using the result derived above, we have \[\frac{(n-1)s^2}{\sigma^2}=\mathbf Z^T \mathbb C \mathbf Z \sim \chi_{n-1}^2\]
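A simulation sketch of this result (the normal parameters below are arbitrary):

Code
set.seed(5)
n <- 10; mu <- 5; sigma <- 2
# Simulate (n-1) s^2 / sigma^2 from many normal samples
w <- replicate(5000, (n - 1) * var(rnorm(n, mean = mu, sd = sigma)) / sigma^2)
# Mean and variance should be close to n-1 = 9 and 2(n-1) = 18
round(c(mean(w), var(w)), 2)
# Simulated vs theoretical 95th percentile of chi-squared(n-1)
round(c(quantile(w, 0.95), qchisq(0.95, df = n - 1)), 2)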

Sample Covariance

The Sample Covariance from a MVN Random Sample

  • Let \(\mathbf X_1, \cdots,\mathbf X_n \overset{iid} \sim N(\boldsymbol \mu, \boldsymbol \Sigma)\).
  • Recall that the sample covariance matrix is defined as \[\mathbf S =\frac{1}{n-1} \sum_{i=1}^n (\mathbf X_i - \bar{\mathbf X})(\mathbf X_i - \bar{\mathbf X})^T\]
  • We have shown that \[(n-1)\mathbf S = \mathbf X ^T \mathbb C \mathbf X\] where \(\mathbf X\) is the \(n\times p\) random matrix.

The Distribution of the Sample Covariance

  • The goal is to show that \((n-1)\mathbf S\) follows a Wishart distribution. More specifically, we would like to show that \[(n-1)\mathbf S \sim Wishart_p(n-1, \boldsymbol \Sigma)\]
  • Outline of proof
    1. Rewrite \((n-1)\mathbf S\)
    2. Apply properties of a projection matrix
    3. Use the definition of Wishart distribution

Wishart Distribution

  • The Wishart distribution is named after the British statistician John Wishart, who introduced it in his 1928 paper published in Biometrika.

  • Wishart was interested in the problem of estimating the covariance matrix of a multivariate normal distribution.

  • Wishart showed that the sample covariance matrix follows a particular probability distribution that we now call the Wishart distribution.

  • The Wishart distribution has become a fundamental tool in multivariate statistical analysis

Definition

  • A Wishart distribution can be defined in the following way

  • Let \(\mathbf W\) be a \(p\times p\) random matrix.

  • We say \(\mathbf W\) follows \(Wishart_{p}(k, \boldsymbol \Sigma)\) if \(\mathbf W\) can be written as \(\mathbf W=\mathbf X^T \mathbf X\) where \(\mathbf X\) denotes the random matrix formed by a random sample of size \(k\) from MVN \(N(\mathbf 0, \boldsymbol \Sigma)\).

  • Remark: \(E[\mathbf W]=k\boldsymbol \Sigma\).

  • The definition indicates that if

\[\mathbf X_1, \cdots \mathbf X_k \overset{iid}\sim N(\mathbf 0, \boldsymbol \Sigma),\]

then \[\mathbf X^T \mathbf X=\sum_{i=1}^k \mathbf X_i \mathbf X_i^T \sim Wishart_p(k, \boldsymbol \Sigma).\]
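A simulation sketch of the definition, using mvrnorm() from the MASS package loaded above (the choices of \(k\) and \(\boldsymbol \Sigma\) are arbitrary); base R also provides rWishart() for drawing Wishart matrices directly.

Code
set.seed(6)
k <- 50
Sigma <- matrix(c(1, 0.5, 0.5, 2), 2, 2)
# Average of W = X'X over many replicates should be close to k * Sigma
W_list <- replicate(2000, {
  Xs <- mvrnorm(k, mu = c(0, 0), Sigma = Sigma)   # k iid rows from N(0, Sigma)
  t(Xs) %*% Xs
}, simplify = FALSE)
round(Reduce(`+`, W_list) / length(W_list), 2)
k * Sigma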

Wishart vs Chi-squared

  • Wishart: If \(\mathbf X_1, \cdots \mathbf X_k \overset{iid}\sim N(\mathbf 0, \boldsymbol \Sigma)\), then \[\mathbf X^T \mathbf X =\sum_{i=1}^k \mathbf X_i\mathbf X_i^T \sim Wishart_p(k, \boldsymbol \Sigma) \mbox{, where } \mathbf X_{k\times p}=\begin{pmatrix} X_1^T\\ \vdots\\ X_k^T \end{pmatrix} \]

  • Chi-squared: If \(X_1, \cdots, X_k \overset{iid}\sim N(0,1)\), then
    \[\mathbf X^T\mathbf X=\sum_{i=1}^k X_i^2\sim \chi_k^2 \mbox{, where } \mathbf X_{k\times 1}= \begin{pmatrix} X_1 \\ \vdots \\ X_k \end{pmatrix}\]

Wishart vs Chi-squared (continued)

  • When \(p=1\), \[W=\sum_{i=1}^k X_i^2 = \sigma^2 \sum_{i=1}^k \left(\frac{X_i}{\sigma} \right)^2\sim \sigma^2 \chi_k^2 \]

Sample Covariance

  • The sample covariance satisfies \((n-1)\mathbf S=\mathbf X^T \mathbb C \mathbf X\). The definition of the Wishart distribution is not immediately applicable.

  • Rewrite \((n-1)\mathbf S\):

\[ \begin{aligned} (n-1)\mathbf S&=\mathbf X^T \mathbb C^T\mathbb C\mathbb C \mathbf X=(\mathbb C \mathbf X)^T(\mathbb C \mathbf X)\\ &=(\mathbb C \mathbf X)^T\mathbb C(\mathbb C \mathbf X)\\ &=(\mathbb C \mathbf X)^T\sum_{i=1}^{n-1}\gamma_i \gamma_i^T (\mathbb C \mathbf X)\\ &=\sum_{i=1}^{n-1} (\gamma_i^T \mathbb C \mathbf X)^T (\gamma_i^T \mathbb C \mathbf X) \end{aligned}\]

  • Let \(Y_i= (\gamma_i^T \mathbb C \mathbf X)^T=(\gamma_i^T \sum_j \gamma_j \gamma_j^T \mathbf X)^T = \mathbf X^T\gamma_i\). We have
    • \(Y_i\) is a \(p\times 1\) vector; it is a linear function of \(\mathbf X\). Thus it follows a MVN.
    • \(E[Y_i]=\mathbf 0\) because \(\mathbb C\) is the centering matrix.
    • The \(j\)th element of \(Y_i\) is a linear function of the \(j\)th column of \(\mathbf X\).
  • In the following, we show that \(Y_i\) and \(Y_j\) are uncorrelated for \(i\not=j\):

\[\begin{aligned} Cov[Y_i, Y_j]&=E[(Y_i-\mathbf 0 )(Y_j-\mathbf 0)^T]\\ &= E[Y_iY_j^T] \\ &= E[(\gamma_i^T \mathbb C \mathbf X)^T(\gamma_j^T \mathbb C \mathbf X)]\\ &= E[\mathbf X^T \mathbb C \gamma_i \gamma_j^T \mathbb C \mathbf X] \\ &=\mathbf 0 \end{aligned}\]

The last step is true because, for \(i\not=j\), \(\gamma_i^T \gamma_j=0\): since \(\mathbb C\gamma_i=\gamma_i\), the matrix inside the expectation is \(\mathbf X^T\gamma_i \gamma_j^T\mathbf X\), and taking expectations entry by entry (the rows of \(\mathbf X\) are independent and \(\mathbf 1^T\gamma_i=0\)) gives \((\gamma_i^T\gamma_j)\boldsymbol \Sigma=\mathbf 0\).

  • Since \(Y_i\) and \(Y_j\) are jointly normal (both are linear functions of the same MVN-distributed random matrix, or of its vectorized version) and uncorrelated, \(Y_i\) and \(Y_j\) are independent for \(i\not=j\).

  • We understand that \(Y_i\) follows a MVN with mean \(\mathbf 0\). How about its covariance matrix? Next, we introduce matrix normal distribution.

Matrix Normal Distribution

  • Let \(\mathbf X_1, \cdots \mathbf X_n\) be a random sample (therefore iid) from \(N(\boldsymbol \mu, \boldsymbol \Sigma)\).

  • If we stack \(\mathbf X_1, \cdots, \mathbf X_n\) into an \(n\times p\) random matrix \(\mathbf X\), we say \(\mathbf X\) follows a matrix normal distribution: \[\mathbf X \sim N(\mathbf 1_n \boldsymbol \mu^T, \boldsymbol \Sigma, \mathbf I_n)\]

  • Consider the linear function \(\mathbf X^T \mathbf a\) for a fixed \(n\times 1\) vector \(\mathbf a\). It can be shown that \[\mathbf X^T \mathbf a \sim N((\mathbf 1_n \boldsymbol \mu^T)^T \mathbf a, (\mathbf a^T \mathbf a) \boldsymbol \Sigma)= N(\boldsymbol \mu\, \mathbf 1_n^T \mathbf a, (\mathbf a^T \mathbf a) \boldsymbol \Sigma)\]

  • Thus, taking \(\mathbf a=\gamma_i\) (so that \(\mathbf 1_n^T\gamma_i=0\) and \(\gamma_i^T\gamma_i=1\)), we have \(Y_i \sim N(\mathbf 0, \boldsymbol \Sigma)\).
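A simulation sketch of the linear-function result (the dimensions, \(\boldsymbol \mu\), \(\boldsymbol \Sigma\), and the fixed vector \(\mathbf a\) below are arbitrary):

Code
set.seed(7)
n <- 20
mu <- c(1, -1)
Sigma <- matrix(c(1, 0.3, 0.3, 1), 2, 2)
a <- rnorm(n)                           # a fixed n x 1 vector
# Draw X' a repeatedly and compare moments with the formula
draws <- t(replicate(5000, as.numeric(t(mvrnorm(n, mu, Sigma)) %*% a)))
round(colMeans(draws), 2); round(sum(a) * mu, 2)      # mean: (1' a) * mu
round(cov(draws), 2); round(sum(a^2) * Sigma, 2)      # covariance: (a' a) * Sigma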

Wishart Distribution and Sample Covariance

  • So far, we have the following:
    • \((n-1)\mathbf S=\sum_{i=1}^{n-1} Y_i Y_i^T\).
    • \(Y_1, \cdots, Y_{n-1} \overset{iid} \sim N(\mathbf 0, \boldsymbol \Sigma)\).
  • By the definition of Wishart, we can conclude that \[(n-1)\mathbf S\sim Wishart_p(n-1, \boldsymbol \Sigma)\]
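A simulation sketch of this conclusion: averaged over many samples, \((n-1)\mathbf S\) should be close to \((n-1)\boldsymbol \Sigma\), the mean of \(Wishart_p(n-1, \boldsymbol \Sigma)\). The parameters below are arbitrary.

Code
set.seed(8)
n <- 15
Sigma <- matrix(c(2, 0.8, 0.8, 1), 2, 2)
# Average of (n-1) * S over many samples, compared with (n-1) * Sigma
W_bar <- Reduce(`+`, replicate(2000, {
  Xs <- mvrnorm(n, mu = c(3, -2), Sigma = Sigma)
  (n - 1) * cov(Xs)
}, simplify = FALSE)) / 2000
round(W_bar, 2)
(n - 1) * Sigma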

Some Interesting Results

  • Consider a random sample from MVN \(N(\boldsymbol \mu, \boldsymbol \Sigma)\). Let \(\mathbf S\) denote the sample covariance matrix.

  • We have already shown that \((n-1)\mathbf S \sim Wishart_p(n-1, \boldsymbol \Sigma)\)

  • What is the distribution of a diagonal element of \((n-1)\mathbf S\)?

  • What is the distribution of the sum of all elements of \((n-1)\mathbf S\)? Note that this is a special case of the next question, with \(\mathbf B=(1, \cdots, 1)\).

  • What is the distribution of \((n-1)\mathbf B \mathbf S \mathbf B^T\), where \(\mathbf B\) is a fixed \(q\times p\) matrix?

  • If time permits, we will run some simulations

  • If you cannot get the answer to the last question, let’s use the definition of Wishart distribution.

  • Let \(\mathbf W = (n-1)\mathbf S\). Because it follows \(Wishart_p(n-1, \boldsymbol\Sigma)\), we know that \(\mathbf W=\sum_{j=1}^{n-1} \mathbf Z_j \mathbf Z_j^T\) where the \(\mathbf Z_j\)’s are iid from \(N(\mathbf 0, \boldsymbol\Sigma)\).

  • Then \[ \begin{aligned} (n-1)\mathbf B \mathbf S \mathbf B^T &= \mathbf B\left(\sum_{j=1}^{n-1} \mathbf Z_j \mathbf Z_j^T\right)\mathbf B^T = \sum_{j=1}^{n-1} \mathbf B \mathbf Z_j \mathbf Z_j^T\mathbf B^T\\ &= \sum_{j=1}^{n-1} (\mathbf B \mathbf Z_j)(\mathbf B \mathbf Z_j)^T \end{aligned} \]

Let \(\mathbf Y_j=\mathbf B \mathbf Z_j\). Note that it is a linear function of \(\mathbf Z_j\); therefore \[\mathbf Y_j\sim N(\mathbf 0, \mathbf B \boldsymbol \Sigma \mathbf B^T)\] and the \(\mathbf Y_j\)’s are iid (because …).

By the definition of Wishart distribution, we have

\[(n-1)\mathbf B \mathbf S \mathbf B^T\sim Wishart_q(n-1, \mathbf B \boldsymbol \Sigma \mathbf B^T)\]
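A simulation sketch for the questions above, taking \(p=2\) and \(\mathbf B=(1, 1)\) so that \((n-1)\mathbf B\mathbf S\mathbf B^T\) is the sum of all elements of \((n-1)\mathbf S\); by the result just derived it should behave like \((\mathbf 1^T\boldsymbol \Sigma\mathbf 1)\,\chi_{n-1}^2\). The parameter choices below are arbitrary.

Code
set.seed(9)
n <- 12
Sigma <- matrix(c(1, 0.5, 0.5, 2), 2, 2)
B <- matrix(1, 1, 2)                      # B = (1, 1): sums all elements of S
w <- replicate(5000, {
  Xs <- mvrnorm(n, mu = c(0, 0), Sigma = Sigma)
  as.numeric(B %*% ((n - 1) * cov(Xs)) %*% t(B))
})
# Wishart_1(n-1, B Sigma B') is (B Sigma B') * chi-squared(n-1)
c(simulated_mean = mean(w),
  theoretical_mean = (n - 1) * as.numeric(B %*% Sigma %*% t(B)))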