MVN and Quadratic Forms


1 Multivariate Normal Distribution

1.1 Univariate normal distribution

Recall the pdf for the univariate normal distribution \(Y\sim N(\mu,\sigma^2)\): \[f(y)=\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left\{-\frac{(y-\mu)^2}{2\sigma^2}\right\}.\]

1.2 Bivariate normal distribution

Consider a bivariate normal distribution with mean \(\vec{\mu}=(\mu_1,\mu_2)^T\) and covariance matrix \[\Sigma= \begin{pmatrix} \sigma_{1}^2 & \rho \sigma_1\sigma_2\\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}, \] where \(cov(Y_1, Y_2)= \rho\sigma_1\sigma_2\) and \(cor(Y_1, Y_2)=\rho\).

The (joint) pdf is \[f(y_1,y_2)=\frac{1}{\sqrt{(2\pi)^2\sigma_1^2\sigma_2^2(1-\rho^2)}} \exp\left[-\frac{1}{2(1-\rho^2)} \left\{ \frac{(y_1-\mu_1)^2}{\sigma_1^2} -\frac{2\rho(y_1-\mu_1)(y_2-\mu_2)}{\sigma_1\sigma_2} +\frac{(y_2-\mu_2)^2}{\sigma_2^2} \right\}\right].\]
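As a quick numerical sanity check, the closed-form density above can be compared against scipy's multivariate_normal. This is a minimal sketch; the parameter values below are arbitrary illustrative choices, not from the text.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Illustrative parameter values (assumptions, chosen arbitrarily)
mu1, mu2, s1, s2, rho = 1.0, -2.0, 1.5, 0.8, 0.6

def f_biv(y1, y2):
    """Closed-form bivariate normal pdf from the formula above."""
    q = ((y1 - mu1)**2 / s1**2
         - 2 * rho * (y1 - mu1) * (y2 - mu2) / (s1 * s2)
         + (y2 - mu2)**2 / s2**2)
    norm_const = np.sqrt((2 * np.pi)**2 * s1**2 * s2**2 * (1 - rho**2))
    return np.exp(-q / (2 * (1 - rho**2))) / norm_const

Sigma = np.array([[s1**2, rho * s1 * s2],
                  [rho * s1 * s2, s2**2]])
mvn = multivariate_normal(mean=[mu1, mu2], cov=Sigma)

print(f_biv(0.5, -1.0))        # closed form
print(mvn.pdf([0.5, -1.0]))    # scipy; the two values should agree
```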

Next, we generalize this to the multivariate normal distribution.

1.3 MVN

MVN can be defined in several equivalent ways. We will show the equivalence of these definitions later.

  1. A random vector \(Y\) has a multivariate normal distribution if \(Y_{p\times 1}=A_{p\times r}Z_{r\times 1}+\mu_{p\times 1}\) (\(p<r\)), where \(A\) is a fixed \(p\times r\) matrix and \(Z\) is an \(r\)-vector of independent univariate normal random variables.

  2. A random vector \(Y\) has a multivariate normal distribution \(N(\mu,\Sigma)\) if its density is \[f_Y(y) = \frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}}\exp\left\{-\frac{1}{2}(y-\mu)^T\Sigma^{-1} (y-\mu)\right\}\]

  3. A random vector \(Y\) has a multivariate normal distribution \(N(\mu,\Sigma)\) if its moment generating function is \[M_Y(t)=E(e^{t^TY})=\exp\left\{t^T\mu+\frac{1}{2} t^T\Sigma t \right\}\]

  4. A random vector \(Y\) with variance-covariance matrix \(\Sigma\) and mean vector \(\mu\) has a \(N(\mu,\Sigma)\) distribution if and only if \(a^TY\) has a univariate normal distribution for every vector \(a\). In other words, a random vector has a multivariate normal distribution if and only if every linear combination of the components of \(Y\) is normal.
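The constructive definition (1) is easy to check by simulation: with \(Y=AZ+\mu\), the sample mean and covariance of the draws should approach \(\mu\) and \(AA^T\). A minimal sketch; the matrix \(A\) and vector \(\mu\) below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
p, r, n = 2, 3, 200_000

A = rng.normal(size=(p, r))      # arbitrary p x r matrix (assumption)
mu = np.array([1.0, -1.0])

Z = rng.normal(size=(r, n))      # r-vector of iid N(0,1), n replicates
Y = A @ Z + mu[:, None]          # definition (1): Y = AZ + mu

print(Y.mean(axis=1))            # should be close to mu
print(np.cov(Y))                 # should be close to A A^T
print(A @ A.T)
```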

Before proving the equivalence of the above definitions, we need to review some properties of moment generating functions (mgf).

  • The mgf of a random variable \(X\) is defined as \(M_X(t)=E(e^{tX})\).

  • For a random vector \(Y\), the mgf is defined as \(M_Y(t)=E(e^{t^TY})\) where \(t\) is a vector of the same dimension as \(Y\). Note that the mgf of a random vector is a function of a vector, not a scalar. For example, for a univariate random variable \(X\), the mgf \(M_X(t)\) is a function of the scalar \(t\), whereas for an \(n\)-vector \(Y\) the mgf is a function of an \(n\)-vector \(t\).

  • Theorem: Suppose that the moment-generating function of \((X_1,\cdots,X_n)^T\) exists in some open set. Then these random variables are independent if and only if \(M(t_1,\cdots,t_n)=M_{X_1}(t_1)\cdots M_{X_n}(t_n)\).

We say \(S\) is an open set if for any \(x\in S\), there is a neighborhood of \(x\) that lies entirely within \(S\).

Proof of Equivalence

The equivalence of the four definitions can be proved. We show \((1)\Rightarrow (4)\) and \((4)\Rightarrow (3)\).

Proof.

  • \((1)\Rightarrow (4)\): If \(Y\sim N(\mu,\Sigma)\), then by definition (1), \(Y=AZ+\mu\).

Therefore, \(a^TY=a^TAZ+a^T\mu=(a^TA)Z+a^T\mu\) for any \(a\). This is a linear combination of independent univariate normals plus a constant, and hence has a univariate normal distribution.

  • \((4)\Rightarrow (3)\): If \(a^TY\) has a univariate normal distribution for all \(a\), its mean is \(a^T\mu\) and variance is \(a^T\Sigma a\). Then, using the formula for the mgf of the univariate normal, we have \[E\{\exp[t(a^TY)]\}=\exp[t(a^T\mu)+\tfrac{1}{2}t^2(a^T\Sigma a)].\] Setting \(t=1\) gives \[E\{\exp[a^TY]\}=\exp[a^T\mu+\tfrac{1}{2}a^T\Sigma a]=M_Y(a),\] i.e., the mgf of \(Y\) in Definition (3).

Note that the distribution of a random vector is completely determined by the set of all one-dimensional distributions of linear combinations of its components; this is known as the Cramér-Wold theorem.

1.4 Linear transformations of MVN

Suppose \(X\sim (\mu, \Sigma)\), i.e., \(X\) has mean vector \(\mu\) and covariance matrix \(\Sigma\) (normality is not assumed). Then \(AX+B \sim (A\mu+B, A\Sigma A^T)\).

Proof. \[\begin{align*} E(AX+B) &= AE(X)+B=A\mu+B\\ Cov(AX+B) &= Cov(AX)=E[(AX-E(AX))(AX-E(AX))^T]\\ &= E[A(X-\mu)(X-\mu)^TA^T]\\ &= AE[(X-\mu)(X-\mu)^T]A^T=A\Sigma A^T. \end{align*}\]

Theorem 1 If \(Y\sim N(\mu, \Sigma)\), then \[AY+B \sim N(A\mu+B, A\Sigma A^T),\] where \(dim(A)=r\times n\).

Proof. Hint: Either definition (1) or (3) can be used to prove the above result.
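Theorem 1 can likewise be checked numerically. A minimal sketch; \(A\), \(B\), \(\mu\), and \(\Sigma\) below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n_draws = 200_000

mu = np.array([0.0, 2.0, -1.0])
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])
A = np.array([[1.0, -1.0, 0.0],
              [0.0,  2.0, 1.0]])   # arbitrary r x n matrix (assumption)
B = np.array([5.0, -3.0])

Y = rng.multivariate_normal(mu, Sigma, size=n_draws).T
W = A @ Y + B[:, None]             # W = AY + B

print(W.mean(axis=1), A @ mu + B)  # empirical mean vs A mu + B
print(np.cov(W))                   # empirical covariance
print(A @ Sigma @ A.T)             # theoretical A Sigma A^T
```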

If \(Y \sim N(\mu, \Sigma)\) then every subset of elements of \(Y\) is MVN.

Proof. WLOG consider the first \(r\) elements of \(Y\), \(W=(Y_1,...,Y_r)^T\). Consider the partition of the mean vector and covariance matrix of \(Y\) as follows:

\[ \mu=\begin{pmatrix}\mu_1\\\mu_2\end{pmatrix}, \quad \Sigma= \begin{pmatrix} \Sigma_{11} & \Sigma_{12}\\ \Sigma_{21} & \Sigma_{22}\\ \end{pmatrix}, \] where \(dim(\Sigma_{11})=r\times r\).

Note that \(W=(I_r,0)Y\). So \(W\) follows a MVN and the mean vector and covariance matrix of \(W\) are

\[E(W)=(I_r,0)\mu=\mu_1, Cov(W)=(I_r,0)\Sigma (I_r,0)^T=\Sigma_{11}\]

Theorem 2 Suppose \(Y=\begin{pmatrix}Y_1\\Y_2\end{pmatrix}\sim N(\mu,\Sigma)\), partitioned conformably with

\[\Sigma=\begin{pmatrix} \Sigma_{11} & \Sigma_{12}\\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}. \] Then \(Y_1\) and \(Y_2\) are independent iff \(\Sigma_{12}=0\).

Proof. If \(Y_1\) and \(Y_2\) are independent, \(Cov(Y_1,Y_2)=\Sigma_{12}=0\).

The other direction can be proved using the moment generating function and the factorization theorem. The m.g.f. of \(Y\) is \(\exp(t^T\mu+\tfrac{1}{2}t^T\Sigma t)\). Partition \(t\) conformably with \(Y\). Then the exponent of the m.g.f. above is \[t_1^T\mu_1 + t_2^T\mu_2 + \tfrac{1}{2}t_1^T\Sigma_{11}t_1 + \tfrac{1}{2}t_2^T\Sigma_{22}t_2 + t_1^T\Sigma_{12}t_2.\] If \(\Sigma_{12}=0\), the exponent can be written as a function of \(t_1\) alone plus a function of \(t_2\) alone, so the m.g.f. factorizes into a term in \(t_1\) times a term in \(t_2\). This implies that \(Y_1\) and \(Y_2\) are independent.

1.5 Algebraic explanation of the inverse of a partitioned matrix

Let \(M\) be a block matrix partitioned as: \[ M = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}\]

where \(A_{11}\) is invertible. Define: \[B_{22} = A_{22} - A_{21}A_{11}^{-1}A_{12},\quad B_{12} = A_{11}^{-1}A_{12},\quad B_{21} = A_{21}A_{11}^{-1}.\]

The matrix can be decomposed as (a block LDU decomposition):

\[ M = \underbrace{\begin{pmatrix} I & 0 \\ B_{21} & I \end{pmatrix}}_{L} \underbrace{\begin{pmatrix} A_{11} & 0 \\ 0 & B_{22} \end{pmatrix}}_{D} \underbrace{\begin{pmatrix} I & B_{12} \\ 0 & I \end{pmatrix}}_{U}\]

where:

  • \(L\) is lower block triangular (Gaussian elimination matrix)
  • \(D\) is block diagonal containing pivots
  • \(U\) is upper block triangular

The inverse is \(M^{-1} = U^{-1}D^{-1}L^{-1}\) with:

\[\begin{align*} U^{-1} &= \begin{pmatrix} I & -B_{12} \\ 0 & I \end{pmatrix}, \\ D^{-1} &= \begin{pmatrix} A_{11}^{-1} & 0 \\ 0 & B_{22}^{-1} \end{pmatrix}, \\ L^{-1} &= \begin{pmatrix} I & 0 \\ -B_{21} & I \end{pmatrix}. \end{align*}\]

Finally, \[M^{-1} = \boxed{ \begin{pmatrix} A_{11}^{-1} + B_{12}B_{22}^{-1}B_{21} & -B_{12}B_{22}^{-1} \\ -B_{22}^{-1}B_{21} & B_{22}^{-1} \end{pmatrix} }\]

This method is known as the "sweep operator". Most software uses the QR decomposition method to solve least squares; SAS has an option to use the sweep operator.
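A quick numpy check of the boxed block-inverse formula against a direct inverse. The matrix \(M\) below is a randomly generated positive definite matrix, used only for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Random symmetric positive definite M, partitioned into 2x2 blocks
n, r = 5, 2
X = rng.normal(size=(n, n))
M = X @ X.T + n * np.eye(n)

A11, A12 = M[:r, :r], M[:r, r:]
A21, A22 = M[r:, :r], M[r:, r:]

A11_inv = np.linalg.inv(A11)
B22 = A22 - A21 @ A11_inv @ A12      # Schur complement of A11
B12 = A11_inv @ A12
B21 = A21 @ A11_inv
B22_inv = np.linalg.inv(B22)

M_inv_block = np.block([
    [A11_inv + B12 @ B22_inv @ B21, -B12 @ B22_inv],
    [-B22_inv @ B21,                 B22_inv]
])

print(np.allclose(M_inv_block, np.linalg.inv(M)))  # expect True
```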

1.6 Conditional distributions

We will derive the conditional distribution of \(Y_1|Y_2=v\). The first method is based on the definition of conditional distribution, i.e., we compute \(f(y_1,y_2)/f(y_2)\). The following inverse of a partitioned matrix is useful. Let \[\Sigma= \begin{pmatrix} \Sigma_{11} & \Sigma_{12}\\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}. \] Then \[\Sigma^{-1}= \begin{pmatrix} \Sigma^{11} & \Sigma^{12}\\ \Sigma^{21} & \Sigma^{22} \end{pmatrix}, \] where \[\begin{eqnarray*} \Sigma^{11}&=&(\Sigma_{11}-\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21})^{-1} \\ \Sigma^{22}&=&(\Sigma_{22}-\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12})^{-1} \\ \Sigma^{12}&=&-\Sigma^{11}\Sigma_{12}\Sigma_{22}^{-1}\\ \Sigma^{21}&=&-\Sigma^{22}\Sigma_{21}\Sigma_{11}^{-1} \end{eqnarray*}\]

Another method uses a linear transformation. Let \[A= \begin{pmatrix} I_p & -\Sigma_{12}\Sigma_{22}^{-1}\\ 0 & I_q \end{pmatrix} \]

Then define \(X=AY\), we have \(X\sim N(A\mu, A\Sigma A^T)\). Note that \[AY= \begin{pmatrix} Y_1-\Sigma_{12}\Sigma_{22}^{-1}Y_2\\ Y_2 \end{pmatrix} \] \[A\mu= \begin{pmatrix} \mu_1-\Sigma_{12}\Sigma_{22}^{-1}\mu_2\\ \mu_2 \end{pmatrix}\]

\[A\Sigma A^T= \begin{pmatrix} \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} & 0\\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \begin{pmatrix} I_p & 0\\ -\Sigma_{22}^{-1}\Sigma_{21} & I_q \end{pmatrix}= \begin{pmatrix} \Sigma_{11.2} & 0 \\ 0 & \Sigma_{22} \end{pmatrix}, \] where \(\Sigma_{11.2}=\Sigma_{11}-\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\).

Hence \(Y_1-\Sigma_{12}\Sigma_{22}^{-1}Y_2\) and \(Y_2\) are independent. Therefore, \(\{Y_1-\Sigma_{12}\Sigma_{22}^{-1}Y_2 \,|\, Y_2=v\} \sim N_p (\mu_1-\Sigma_{12}\Sigma_{22}^{-1}\mu_2, \Sigma_{11.2} )\), i.e., \(\{Y_1|Y_2=v\}\sim N_p(\mu_1+\Sigma_{12}\Sigma_{22}^{-1}(v-\mu_2), \Sigma_{11.2} )\).

The matrix \(\Sigma_{11.2}\) is called the "partial covariance matrix of \(Y_1\) adjusted for \(Y_2\)". It is the covariance matrix for the residuals of the regression of \(y_1\) on \(y_2\), conditional on the values of \(y_2\). The partial correlation matrix is obtained by standardizing the covariances by the appropriate standard deviations, i.e., \[\rho_{ij.2}=\frac{\sigma_{ij.2}}{\sqrt{\sigma_{ii.2}}\sqrt{\sigma_{jj.2}}}.\] Note that the conditional mean \(E(Y_1|Y_2=v)=\mu_1+\Sigma_{12}\Sigma_{22}^{-1}(v-\mu_2)\) is called the "best linear regression of \(y_1\) on \(y_2\)".
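The conditional mean and variance formulas can be checked crudely by simulation, keeping only draws whose \(Y_2\) falls in a small window around \(v\). A minimal sketch; the parameter values are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
mu = np.array([0.0, 1.0, -1.0])
Sigma = np.array([[2.0, 0.8, 0.5],
                  [0.8, 1.5, 0.4],
                  [0.5, 0.4, 1.0]])

# Partition: Y1 = first component, Y2 = last two components
S11, S12 = Sigma[:1, :1], Sigma[:1, 1:]
S21, S22 = Sigma[1:, :1], Sigma[1:, 1:]
S22_inv = np.linalg.inv(S22)

Y = rng.multivariate_normal(mu, Sigma, size=1_000_000)
v = np.array([1.2, -0.5])

# Keep draws with Y2 near v and compare the conditional moments
mask = np.all(np.abs(Y[:, 1:] - v) < 0.1, axis=1)
cond_mean = mu[:1] + S12 @ S22_inv @ (v - mu[1:])
cond_var = S11 - S12 @ S22_inv @ S21     # Sigma_{11.2}

print(Y[mask, 0].mean(), cond_mean)      # should be close
print(Y[mask, 0].var(), cond_var)        # should be close
```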

Multiple Correlation: Here we define the correlation between a scalar variable, say \(y_1\), and a vector, say \(y_2\), with the above type of partitioning for the mean vector and covariance matrix, but without the assumption of normality. Let \(a\in R^{k-1}\) be a fixed vector; for convenience, write \(a=(a_2,\cdots,a_k)^T\). Then consider the correlation between \(y_1\) and \(a^Ty_2\): \[\rho(a)=corr(y_1,a^Ty_2)=\frac{cov(y_1,a^Ty_2)}{\sqrt{\sigma_{11}}\sqrt{a^T\Sigma_{22}a}}= \frac{\sum_{i=2}^ka_icov(y_1,y_i)}{\sqrt{\sigma_{11}}\sqrt{a^T\Sigma_{22}a}} =\frac{a^T\Sigma_{21}}{\sqrt{\sigma_{11}}\sqrt{a^T\Sigma_{22}a}}. \] The squared multiple correlation between \(y_1\) and \(y_2\) is defined as the maximum value of \(\rho^2(a)\) over \(a\). Note that \[\rho^2_{max}=\max_{a}\frac{a^T\Sigma_{21}\Sigma_{12}a}{\sigma_{11}a^T\Sigma_{22}a}.\] By the extended Cauchy-Schwarz inequality (the maximization result for ratios of quadratic forms), \[\rho^2_{max}=\frac{\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}}{\sigma_{11}},\] and \(\rho_{max}\) is attained when \(a=\Sigma_{22}^{-1}\Sigma_{21}\).
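A small numerical check that \(a=\Sigma_{22}^{-1}\Sigma_{21}\) attains the maximum and that no random direction does better. The covariance matrix here is randomly generated, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)

# Arbitrary covariance matrix for (y1, y2) with y2 of dimension 3
k = 4
X = rng.normal(size=(k, k))
Sigma = X @ X.T
s11, S12 = Sigma[0, 0], Sigma[:1, 1:]
S21, S22 = Sigma[1:, :1], Sigma[1:, 1:]

rho2_max = (S12 @ np.linalg.inv(S22) @ S21 / s11).item()

def rho2(a):
    """rho^2(a) = (a' Sigma_21)^2 / (sigma_11 a' Sigma_22 a)."""
    num = (a @ S21).item() ** 2
    return num / (s11 * a @ S22 @ a)

a_star = (np.linalg.inv(S22) @ S21).ravel()   # claimed maximizer
print(rho2(a_star), rho2_max)                 # should match

# No random direction should beat the claimed maximum
best = max(rho2(rng.normal(size=k - 1)) for _ in range(10_000))
print(best <= rho2_max + 1e-9)                # expect True
```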

If \(Y\sim N(\mu, \Sigma)\), then \(AY\) and \(BY\) are independent iff \(A\Sigma B^T =0\).

If \(Y\sim N(\mu, \Sigma)\), \(U=AY\), and \(V=BY\), then if \(A\Sigma B^T =0\):

  1. \(U\) is independent of \(V\)
  2. \(U\) is independent of \(V^TV\) (think of \(\bar Y\) and \(S^2\))
  3. \(U^TU\) is independent of \(V^TV\)

Proof.

Part (1) follows directly from the two propositions we just proved. Parts (2) and (3) are true because if \(X\) and \(Y\) are independent, then \(g(X)\) and \(h(Y)\) are also independent.

If \(X\) and \(Y\) are independent, then \(g(X)\) and \(h(Y)\) are also independent.

Proof. Consider the mgf of the vector \((g(X), h(Y))\): \[\begin{align*} M_{(g(X), h(Y))}(t_1, t_2)&=\iint \exp\{ t_1g(x) + t_2 h(y)\}f(x,y)dxdy\\ &\overset{X\perp Y}= \iint \exp\{ t_1g(x) + t_2 h(y)\}f_X(x)f_Y(y)dxdy\\ &= \iint \exp\{ t_1g(x)\}\exp\{ t_2 h(y)\}f_X(x)f_Y(y)dxdy\\ &=\int \exp\{ t_1g(x)\} f_X(x) dx \int \exp\{ t_2 h(y)\} f_Y(y) dy\\ &=M_{g(X)}(t_1) M_{h(Y)}(t_2), \end{align*}\] which implies independence.

Note, if \(\Sigma=I\), the condition becomes \(AB^T=0\).

Let \(Y \sim N(\mu, \sigma^2 I_n)\), let \(1_n\) be an n-vector of 1’s, and let \(J_n=1_n1_n^{'}\). Then the sample mean \(\bar{Y}=\frac{1}{n}\sum_{i=1}^nY_i\) is independent of the sample variance \(S^2=\frac{1}{n-1}\sum_{i}(Y_i-\bar{Y})^2\). This is because \[\bar{Y}=n^{-1}1_n^{'}Y,\] \[S^2=\frac{1}{n-1}(Y-1_n\bar{Y})^T(Y-1_n\bar{Y})=\frac{1}{n-1}Y^T(I_n-n^{-1}J_n)^T(I_n-n^{-1}J_n)Y,\] and \(n^{-1}1_n^{'}\cdot \sigma^2 I_n\cdot(I_n-n^{-1}J_n)^T=0\) since \(1_n^{'}(I_n-n^{-1}J_n)=0\).
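Both the algebraic condition and an empirical consequence can be checked in a few lines. Note the empirical check (zero correlation between \(\bar Y\) and \(S^2\)) is only a necessary condition for independence; the values of \(n\), \(\mu\), and \(\sigma\) are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 10, 100_000

# Algebraic check: 1_n' (I_n - J_n/n) = 0
ones = np.ones(n)
A = np.eye(n) - np.outer(ones, ones) / n
print(np.allclose(ones @ A, 0))          # expect True

# Empirical check: sample mean and sample variance are uncorrelated
Y = rng.normal(loc=2.0, scale=3.0, size=(reps, n))
ybar = Y.mean(axis=1)
s2 = Y.var(axis=1, ddof=1)
print(np.corrcoef(ybar, s2)[0, 1])       # should be near 0
```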

Let \(Y=X\beta+\epsilon\), where \(\epsilon\sim N(0,\sigma^2 I_n)\), and let \(\hat Y=X\hat \beta\). Then \(Y-\hat Y\) and \(\hat Y\) are independent. This is true because \[\hat Y= X\hat \beta = P_X Y \quad\text{and}\quad Y-\hat Y=(I-P_X)Y,\] where \(P_X=X(X^TX)^{-1}X^T\), and \(P_X\,\sigma^2 I_n\,(I-P_X)^T=\sigma^2(P_X-P_X)=0\).

Similarly, you can show that \(\hat Y_G\) and \(Y-\hat Y_G\) are independent for \(Y=X\beta +\epsilon\), where \(\epsilon \sim N(0, \Sigma)\).
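A short check that the hat matrix \(P_X\) satisfies the independence condition \(P_X(I-P_X)^T=0\), using a randomly generated design matrix chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 50, 3

X = rng.normal(size=(n, p))
P = X @ np.linalg.solve(X.T @ X, X.T)    # hat matrix P_X

# Condition of the theorem with Sigma = sigma^2 I: P (I - P)^T = 0
print(np.allclose(P @ (np.eye(n) - P).T, 0))   # expect True

# One simulated fit: residuals are orthogonal to fitted values
beta = rng.normal(size=p)
y = X @ beta + rng.normal(size=n)
yhat = P @ y
print(np.dot(yhat, y - yhat))                   # numerically ~ 0
```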

2 Chi-Square Distributions and Quadratic Forms

2.1 Chi-Square Distributions

Let \(X=\sum_{i=1}^p Z_i^2=Z^{T}Z\), where \(Z_i \overset{iid} \sim N(0,1)\) (equivalently, \(Z\sim N(0, I_p)\)); then \(X\sim \chi_p^2\).

  • pdf \[f(x;p)=\frac{1}{2^{p/2}\Gamma (p/2)} x^{p/2-1}e^{-x/2}\] Note: \(\chi_p^2=Gamma(p/2,2)\), where \(p/2\) and \(2\) are the shape and scale parameters, respectively.

Recall that \(Y\sim Gamma(k,\theta)\), then \(f_Y(y)=\frac{y^{k-1}e^{-y/\theta}}{\Gamma (k) \theta^{k}}\)

  • mgf: \((1-2t)^{-p/2}\), for \(t<1/2\). The mgf for the Gamma distribution can be calculated easily.

Let \(X=\sum_{i=1}^pZ_i^2\), where the \(Z_i\) are independent with \(Z_i \sim N(\mu_i,1)\). Then \[X\sim \chi_p^2(\lambda=\sum_{i=1}^p\mu_i^2)\]

\[f(x;p,\lambda)=\sum_{k=0}^\infty \frac{e^{-\lambda/2}(\lambda/2)^k}{k!}f_{p+2k}(x),\] where \(f_{p+2k}\) is the (central) \(\chi_{p+2k}^2\) density.

It can be shown that if

  • \(K\sim Poisson(\lambda/2)\)
  • \(X|K=k\sim \chi_{p+2k}^2\)

then \(X\sim \chi_{p}^2(\lambda)\).
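The Poisson-mixture representation can be verified by simulation against both the direct construction \(\sum Z_i^2\) and scipy's noncentral chi-square ncx2. The values of \(p\) and \(\lambda\) below are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
p, lam, n = 4, 3.0, 200_000

# Draw from the Poisson mixture: K ~ Poisson(lam/2), X|K=k ~ chi2_{p+2k}
K = rng.poisson(lam / 2, size=n)
X = rng.chisquare(df=p + 2 * K)

# Compare with the direct construction sum Z_i^2, Z_i ~ N(mu_i, 1)
mu = np.sqrt(lam / p) * np.ones(p)       # so that sum mu_i^2 = lam
Z = rng.normal(loc=mu, scale=1.0, size=(n, p))
X2 = (Z**2).sum(axis=1)

# Both should match the noncentral chi-square ncx2(p, lam)
print(X.mean(), X2.mean(), stats.ncx2.mean(p, lam))  # all ~ p + lam
print(stats.ks_2samp(X, X2).pvalue)                  # should be large
```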

[A homework problem.] Suppose \(K\sim Poisson(\lambda/2)\) and \(X|K \sim \chi_{p+2k}^2\). Show that \(X\sim \chi_p^2 (\lambda)\). Hint: Let \(Z_i\)’s be independent and \(Z_i\sim N(\mu_i,1)\). Compute the mgf of \(Z_i^2\) then the mgf of \(\sum Z_i^2\). By doing this, you obtain the mgf for a non-central chi-square distribution with \(p\) df and \(\lambda=\sum \mu_i^2\). Next compute the mgf of \(X\) and compare it to that of \(\chi_p^2(\lambda)\).

Solution:

Let \(Z_i \sim N(\mu_i, 1)\) and suppose that the \(Z_i\)’s are independent. Thus \[\begin{align*} M_{Z_i^2}(t)&= E[\exp(tZ_i^2)]=\int \frac{1}{\sqrt{2\pi}} \exp\{tz_i^2 - \frac{z_i^2-2\mu_i z_i + \mu_i^2}{2}\}dz_i\\ &= \int \frac{1}{\sqrt{2\pi}} \exp\{-\frac{(1-2t)(z_i-\mu_i/(1-2t))^2 +\mu_i^2 -\mu_i^2/(1-2t)}{2} \} dz_i\\ &= e^{\frac{t\mu_i^2}{1-2t}} (1-2t)^{-1/2}\int \frac{1}{\sqrt{2\pi}}(1-2t)^{1/2} \exp\{-\frac{(1-2t)(z_i-\mu_i/(1-2t))^2}{2} \} dz_i\\ &= e^{\frac{t\mu_i^2}{1-2t}}(1-2t)^{-1/2} \end{align*}\]

Because the \(Z_i\)’s are independent, the mgf of the sum is \(e^{\frac{t\sum \mu_i^2}{1-2t}}(1-2t)^{-p/2}\). Let \(\lambda=\sum \mu_i^2\). By the definition of \(\chi_p^2(\lambda)\), \(\sum Z_i^2\) follows \(\chi_p^2(\lambda)\) and the corresponding mgf is \(e^{\frac{t\lambda}{1-2t}}(1-2t)^{-p/2}\).

Next we compute the mgf of \(X\). \[\begin{align*} M_X(t) &= E[e^{tX}]=E[E(e^{tX}|K)]=E[(1-2t)^{-\frac{p}{2}-K}]\\ &= (1-2t)^{-p/2}\sum_{k=0}^{\infty} (1-2t)^{-k}\frac{e^{-\lambda/2}(\lambda/2)^k}{k!}\\ &= (1-2t)^{-p/2}e^{-\frac{\lambda}{2}+\frac{\lambda}{2(1-2t)}} \sum_{k=0}^{\infty} \frac{e^{-\lambda/[2(1-2t)]}(\lambda/[2(1-2t)])^k}{k!}\\ &= (1-2t)^{-p/2}e^{\frac{\lambda t}{1-2t}} \end{align*}\] The last step uses the fact that the sum is over a \(Poisson(\lambda/[2(1-2t)])\) pmf and hence equals 1.

This shows that the mgf of \(X\) is the same as that of \(\chi_p^2(\lambda)\), which implies that \(X\sim \chi_p^2(\lambda)\).

2.2 Quadratic Forms

Recall two properties of symmetric matrices:

  1. If \(A\) is symmetric, then \(Rank(A)\) is equal to the number of nonzero eigenvalues of \(A\).

  2. If \(A\) is symmetric of rank \(r\), then \(AA=A\) (i.e., \(A\) is idempotent) IFF \(A\) has \(r\) eigenvalues equal to 1 and \(n-r\) eigenvalues equal to 0.

Proof. By the spectral decomposition of a symmetric matrix, \(\exists\) orthogonal matrix \(T\) s.t. \(A=T\Lambda T^{'}\) where \(\Lambda\) is diagonal and the diagonal elements are the eigenvalues of \(A\). \(rank(A)=rank(T\Lambda T^{'})=rank(\Lambda)\)=number of nonzero eigenvalues.

Proof. Suppose that \(A\) is symmetric of rank \(r\). \(\Rightarrow\): suppose that \(A\) is idempotent. Then \(Ax=\lambda x\) for nonzero \(x\) implies that \[\begin{align*} \lambda x^Tx &= x^TAx\\ &= x^T A^2 x\\ &= (Ax)^T(Ax)\\ &= \lambda ^2 x^Tx, \end{align*}\] so \(\lambda(\lambda-1)=0\). This proves that the eigenvalues of an idempotent matrix are either 0 or 1. By the previous property, \(r\) of the eigenvalues equal 1 and \(n-r\) equal 0.

\(\Leftarrow\): WLOG, we assume the first \(r\) eigenvalues of \(A\) are 1 and the remaining are 0. By the principal axis theorem (Seber A.1.4) \(\exists\) an orthogonal matrix \(T\) s.t. \[T^TAT=\Lambda= \begin{pmatrix} I_r & 0\\ 0 & 0 \end{pmatrix} \] which is equivalent to \(A=T\Lambda T^T\). Therefore, \[A^2=T\Lambda T^T T \Lambda T^T = T \Lambda^2 T^T=A.\]

Note: a symmetric idempotent matrix is called a projection matrix.
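A quick eigenvalue check on a projection matrix; here the projection is onto the column space of a random \(n\times r\) matrix, chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(8)
n, r = 6, 3

# A projection matrix of rank r: project onto the column space of X
X = rng.normal(size=(n, r))
A = X @ np.linalg.solve(X.T @ X, X.T)

print(np.allclose(A, A.T), np.allclose(A @ A, A))  # symmetric, idempotent
eigvals = np.linalg.eigvalsh(A)
print(np.round(eigvals, 10))   # r eigenvalues equal 1, n - r equal 0
```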

Theorem 3 If \(Y\sim N(\mu, I_p)\) and \(A\) is a \(p\times p\) symmetric matrix of rank \(r\), then \[Q=(Y-\mu)^TA(Y-\mu)\sim \chi_r^2\] IFF \(A\) is idempotent.

Proof. (idempotent \(\Rightarrow \chi_r^2\) )

Let \(A\) be a symmetric idempotent matrix of rank \(r\). By the spectral decomposition, \(\exists\) an orthogonal matrix \(T\) s.t. \[T^TAT=\Lambda=diag(\lambda_1,\cdots,\lambda_p)= \begin{pmatrix} I_r & 0\\ 0 & 0 \end{pmatrix}. \] Then \(Z=T^T(Y-\mu)\sim N(0,I_p)\) and \[\begin{align*} Q &= (Y-\mu)^TA(Y-\mu)\\ &= [T^T(Y-\mu)]^{T} \Lambda [T^T(Y-\mu)]\\ &= Z^T\Lambda Z\\ &= \sum_{i=1}^rZ_i^2 \sim \chi_r^2 \end{align*}\]

The converse (\(\chi_r^2\Rightarrow\) idempotent) is also true. Now assume \(Q\) follows \(\chi_r^2\). Write \[Q=(Y-\mu)^T T\Lambda T^T (Y-\mu) =\sum_i \lambda_i Z_i^2,\] where \(Z_i \overset{iid}\sim N(0,1)\). Recall that the mgf of \(Z_i^2\) is \((1-2t)^{-1/2}\). Thus the mgf of \(Q\) is \(\prod_{i=1}^p (1-2\lambda_it)^{-1/2}\), which must equal \((1-2t)^{-r/2}\). By the unique factorization of polynomials, \(r\) of the eigenvalues are 1 and the rest are zero, so \(A\) is idempotent.

Suppose \(Y_i\overset{iid}\sim N(\mu,\sigma^2)\). Then \(\bar Y\sim N(\mu, \sigma^2/n)\) and \[\frac{(n-1)s^2}{\sigma^2} = \frac{(Y-1_n\bar{Y})^T(Y-1_n\bar{Y})}{\sigma^2}\sim \chi_{n-1}^2.\]

Proof. Consider \(A=I_n-\frac{1}{n}1_n1_n^T\). It is easy to verify that \(A\) is symmetric and idempotent. Because \(A\) is symmetric and idempotent, by Seber A.6.2, \(Rank(A)=tr(A)=n-1\). In addition, \[A(Y-1_n\mu)=AY-A1_n\mu=AY-(1_n-\frac{1}{n}1_n1_n^{T}1_n)\mu=AY-(1_n-1_n)\mu=AY.\] Therefore, \[\frac{(n-1)s^2}{\sigma^2}=\frac{(AY)^{T}AY}{\sigma^2}=\frac{(Y-1_n\mu)^T A^TA(Y-1_n\mu)}{\sigma^2}=\frac{(Y-1_n\mu)^TA(Y-1_n\mu)}{\sigma^2}.\]

By Theorem 3, \((n-1)s^2/\sigma^2\sim \chi_{n-1}^2.\)
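A simulation check that \((n-1)s^2/\sigma^2\) matches the \(\chi_{n-1}^2\) distribution; the parameter values are arbitrary illustrative choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
n, mu, sigma, reps = 8, 5.0, 2.0, 200_000

Y = rng.normal(mu, sigma, size=(reps, n))
q = (n - 1) * Y.var(axis=1, ddof=1) / sigma**2

# Compare simulated draws of (n-1)s^2/sigma^2 with chi2_{n-1}
print(q.mean(), n - 1)                                   # mean of chi2 is df
print(stats.kstest(q, stats.chi2(df=n - 1).cdf).pvalue)  # should be large
```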

Suppose that \(A\) is symmetric and \(Y\sim N(0, I_n)\). If \(Y^TAY\sim \chi_r^2\), then \(Y^T(I_n-A)Y\sim \chi_{n-r}^2\).

Proof. This follows from the fact that if \(A\) is idempotent of rank \(r\), then \(I_n-A\) is idempotent with rank \(n-r\).

If \(Y\sim N(\mu, \Sigma)\) with \(Rank(\Sigma)=n\) (full rank), then \[Q=(Y-\mu)^T\Sigma^{-1}(Y-\mu)\sim \chi _n^2\]

Proof. Let \(Z=\Sigma^{-1/2}(Y-\mu)\), then \(Z\sim N(0,I)\). We have \[Q=Z^TZ\sim \chi_n^2\]

Theorem 4 Suppose that \(Y \sim N(0,\Sigma)\), and \(A\) is a symmetric matrix. Then \(Y^TAY\) is \(\chi_r^2\) if and only if \(r\) of the eigenvalues of \(A\Sigma\) are 1 and the rest are zero.

Proof. Because \(\Sigma\ge0\), it can be written as \(\Sigma=RR^T\) (by the spectral decomposition of a symmetric matrix and the fact that the eigenvalues of a p.s.d. matrix are nonnegative).

Suppose \(Y^TAY\sim \chi_r^2\). Writing \(Y=RZ\) with \(Z\sim N(0,I)\), we have \(Y^TAY=Z^TR^TARZ\sim \chi_r^2\), which implies that \(R^TAR\) is symmetric and idempotent with \(r\) of its eigenvalues equal to 1 and \(n-r\) equal to 0. Thus, \[r=rank(R^TAR)=tr(R^TAR)=tr(ARR^T)=tr(A\Sigma).\] We also know that \(R^TAR\) and \(ARR^T=A\Sigma\) have the same eigenvalues. Therefore, the eigenvalues of \(A\Sigma\) are either 1 or 0. Because the rank is \(r\), exactly \(r\) of its eigenvalues are 1.

The converse argument is just the reverse of the one above.

Let \(Y\sim N(0, \Sigma)\), where \(\Sigma\) is p.d., and suppose that \(A\) is symmetric. Then \(Y^TAY\sim \chi_r^2\) if and only if \(A\Sigma\) is idempotent and has rank \(r\).

Here is the difference between the above theorem and corollary: for nonsymmetric matrices, idempotence implies that the eigenvalues are zero or 1, but the converse is not true.

When \(\Sigma\) (hence \(R\)) has full rank, the fact that \(R^TAR\) is idempotent implies that \(A\Sigma\) is idempotent. This is because the equation \[R^TARR^TAR=R^TAR\] can be premultiplied by \((R^T)^{-1}\) and postmultiplied by \(R^T\) to give \[A\Sigma A\Sigma=A\Sigma.\]

If \(Y\sim N(\mu, \Sigma)\) with \(Rank(\Sigma)=n\), then \[Y^T\Sigma^{-1}Y\sim \chi_n^2 (\lambda=\mu^T\Sigma^{-1}\mu)\] where \(\chi_n^2(\lambda)\) denotes the non-central chi-squared distribution with \(n\) d.f. and non-centrality parameter \(\lambda\).

Note, \(E[Y^T\Sigma^{-1}Y]=tr(\Sigma\Sigma^{-1})+\mu^T\Sigma^{-1}\mu=n+\mu^T \Sigma^{-1}\mu\). The power of various statistical tests of hypotheses depends on the noncentrality parameter.

Theorem 5 If \(Y\sim (\mu,\Sigma)\), then \(E[Y^TAY]=tr(A\Sigma)+\mu^TA\mu\).

Proof. \[\begin{align*} E[Y^TAY] &= E[tr(Y^TAY)]=E(tr[AYY^T]) \\ &= tr(A E[YY^T]) = tr(A [\Sigma + \mu\mu^T]) \\ &= tr(A\Sigma) + tr(A\mu\mu^T)= tr(A\Sigma)+tr(\mu^TA\mu) \\ &= tr(A\Sigma) + \mu^TA\mu \end{align*}\]
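Theorem 5 is easy to verify by simulation; the symmetric matrix \(A\), mean \(\mu\), and covariance \(\Sigma\) below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(10)
n_draws = 500_000

mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[1.0, 0.3, 0.1],
                  [0.3, 2.0, 0.4],
                  [0.1, 0.4, 1.5]])
A = np.array([[2.0, 0.0, 1.0],
              [0.0, 1.0, 0.0],
              [1.0, 0.0, 3.0]])   # arbitrary symmetric matrix (assumption)

Y = rng.multivariate_normal(mu, Sigma, size=n_draws)
q = np.einsum('ij,jk,ik->i', Y, A, Y)      # Y^T A Y for each draw

print(q.mean())                            # empirical E[Y^T A Y]
print(np.trace(A @ Sigma) + mu @ A @ mu)   # tr(A Sigma) + mu^T A mu
```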

Let \(X_1,\cdots,X_n \overset{iid}\sim N(\mu, \sigma^2)\). We will compute the expectation of \[Q(X)=(X_1-X_2)^2 + \cdots + (X_{n-1}-X_{n})^2=\sum_{i=1}^{n-1}(X_i-X_{i+1})^2.\] To compute \(E[Q(X)]\), we write \(Q(X)\) as a quadratic form \(Q(X)=X^TAX\), with the tridiagonal matrix \[A= \begin{pmatrix} 1 & -1 & 0 & \cdots & 0 & 0\\ -1& 2 & -1 & \cdots & 0 & 0\\ 0 & -1 & 2 & \cdots & 0 & 0\\ \vdots & & & \ddots & & \vdots\\ 0 & 0 & 0 & \cdots & 2 & -1\\ 0 & 0 & 0 & \cdots & -1 & 1 \end{pmatrix},\] whose diagonal is \((1,2,\cdots,2,1)\), so \(tr(A)=2n-2\). Then \(E[Q(X)]=tr(A)\sigma^2+\mu^TA\mu=(2n-2)\sigma^2+\sum_{i=1}^{n-1}(\mu-\mu)^2=(2n-2)\sigma^2\).
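Finally, a simulation check of \(E[Q(X)]=(2n-2)\sigma^2\), with arbitrary illustrative values of \(n\), \(\mu\), and \(\sigma\).

```python
import numpy as np

rng = np.random.default_rng(11)
n, mu, sigma, reps = 6, 3.0, 1.5, 500_000

X = rng.normal(mu, sigma, size=(reps, n))
Q = (np.diff(X, axis=1) ** 2).sum(axis=1)   # sum of (X_i - X_{i+1})^2

print(Q.mean())                # empirical E[Q(X)]
print((2 * n - 2) * sigma**2)  # (2n - 2) sigma^2
```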