A Brief Review of Linear Algebra

The lecture covers the following linear algebra topics:

  1. vectors and linear independence
  2. rank
  3. vector spaces
  4. column spaces
  5. eigenvalues and trace
  6. special square matrices: symmetric, positive definite, and positive semidefinite matrices
  7. inverse and generalized inverse
  8. matrix differentiation

1 Vectors and linear independence

1.1 Vectors

A vector (more specifically, a column vector) is a quantity whose entries are stacked vertically.

Examples:

\[ a = \begin{pmatrix}1 \\ 2 \\ 0.5\end{pmatrix}, \qquad u = \begin{pmatrix}u_1 \\ u_2 \\ u_3\end{pmatrix}, \qquad u = (u_1, \ldots, u_p)^T \in \mathbb{R}^p. \]

1.2 Linear dependence

Let \(x_1, \ldots, x_k\) be vectors in \(\mathbb{R}^p\).

We say that \(x_1, \ldots, x_k\) are linearly dependent if there exist scalars \(\lambda_1, \ldots, \lambda_k\), not all zero, such that

\[ \lambda_1 x_1 + \cdots + \lambda_k x_k = 0. \]

Equivalently, if one coefficient is nonzero, then one vector can be written as a linear combination of the others. For example, if \(\lambda_1 \neq 0\), then

\[ x_1 = -\frac{1}{\lambda_1} \sum_{i=2}^k \lambda_i x_i. \]

1.3 Linear independence

We say that \(x_1, \ldots, x_k\) are linearly independent if they are not linearly dependent.

Equivalently, \(x_1, \ldots, x_k\) are linearly independent if

\[ \lambda_1 x_1 + \cdots + \lambda_k x_k = 0 \]

implies

\[ \lambda_1 = \cdots = \lambda_k = 0. \]

1.3.1 Example

Let

\[ x_1 = \begin{pmatrix}1 \\ 1\end{pmatrix}, \qquad x_2 = \begin{pmatrix}1 \\ 0\end{pmatrix}. \]

Suppose

\[ \lambda_1 x_1 + \lambda_2 x_2 = 0. \]

Then

\[ \lambda_1 \begin{pmatrix}1 \\ 1\end{pmatrix} + \lambda_2 \begin{pmatrix}1 \\ 0\end{pmatrix} = \begin{pmatrix}0 \\ 0\end{pmatrix}, \]

so

\[ \begin{pmatrix} \lambda_1 + \lambda_2 \\ \lambda_1 \end{pmatrix} = \begin{pmatrix}0 \\ 0\end{pmatrix}. \]

Therefore \(\lambda_1 = 0\) and \(\lambda_2 = 0\). Hence \(x_1\) and \(x_2\) are linearly independent.
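This kind of check is easy to automate. As a quick numerical sanity check (a NumPy sketch, not part of the original notes), stack the vectors as columns and compare the matrix rank with the number of vectors:

```python
import numpy as np

# Columns are x1 = (1, 1)^T and x2 = (1, 0)^T from the example.
X = np.array([[1.0, 1.0],
              [1.0, 0.0]])

# The columns are linearly independent iff rank(X) equals the number of columns.
assert np.linalg.matrix_rank(X) == 2
```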

1.4 Orthogonality

Two vectors \(x, y \in \mathbb{R}^p\) are orthogonal if their inner product is zero:

\[ \langle x, y \rangle = x^T y = y^T x = \sum_i x_i y_i = 0. \]

Example:

\[ y_1 = \begin{pmatrix}1 \\ 0\end{pmatrix}, \qquad y_2 = \begin{pmatrix}0 \\ 0.5\end{pmatrix}. \]

These are orthogonal because \(y_1^T y_2 = 0\).

1.5 Fact: orthogonal vectors are linearly independent

Claim. If \(x_1\) and \(x_2\) are orthogonal nonzero vectors, then they are linearly independent.

1.5.1 Proof

Assume, for contradiction, that \(x_1\) and \(x_2\) are linearly dependent. Then there exist scalars \(\lambda_1, \lambda_2\), not both zero, such that

\[ \lambda_1 x_1 + \lambda_2 x_2 = 0. \]

Left-multiply by \(x_1^T\):

\[ \lambda_1 x_1^T x_1 + \lambda_2 x_1^T x_2 = 0. \]

Because \(x_1\) and \(x_2\) are orthogonal, \(x_1^T x_2 = 0\), so

\[ \lambda_1 x_1^T x_1 = 0. \]

Since \(x_1 \neq 0\), we have \(x_1^T x_1 > 0\), hence \(\lambda_1 = 0\).

Similarly, left-multiplying by \(x_2^T\) gives \(\lambda_2 = 0\).

This contradicts the assumption that \(\lambda_1\) and \(\lambda_2\) are not both zero. Therefore \(x_1\) and \(x_2\) are linearly independent.

1.5.2 Converse does not hold

Linear independence does not imply orthogonality.

Example:

\[ x_1 = \begin{pmatrix}1 \\ 1\end{pmatrix}, \qquad x_2 = \begin{pmatrix}1 \\ 0\end{pmatrix}. \]

These vectors are linearly independent, but they are not orthogonal because

\[ x_1^T x_2 = 1. \]

1.6 Norm

Let

\[ a = \begin{pmatrix}a_1 \\ \vdots \\ a_p\end{pmatrix} \in \mathbb{R}^p. \]

Its Euclidean norm is

\[ \|a\| = \sqrt{a^T a} = \sqrt{\langle a, a \rangle} = \sqrt{\sum_{i=1}^p a_i^2}. \]

1.7 Orthogonal matrices

A square matrix

\[ Q = (q_1, \ldots, q_n) \]

is an orthogonal matrix if

\[ Q^TQ = QQ^T = I_n, \]

where each column \(q_i \in \mathbb{R}^n\). Equivalently, the columns are orthonormal: \(q_i^T q_j = 0\) for \(i \neq j\) and \(\|q_i\| = 1\).

Examples:

\[ Q_1 = \begin{pmatrix} \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}} \\ \tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{2}} \end{pmatrix}, \qquad Q_2 = \begin{pmatrix} \tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{2}} \\ \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}} \end{pmatrix}. \]
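A small numerical check (NumPy sketch, not part of the notes) confirms that both example matrices satisfy \(Q^TQ = QQ^T = I_2\):

```python
import numpy as np

s = 1 / np.sqrt(2)
Q1 = np.array([[s,  s],
               [s, -s]])
Q2 = np.array([[s, -s],
               [s,  s]])

# Orthogonality: Q^T Q = Q Q^T = I.
for Q in (Q1, Q2):
    assert np.allclose(Q.T @ Q, np.eye(2))
    assert np.allclose(Q @ Q.T, np.eye(2))
```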

2 Rank

2.1 Definition

For a matrix \(A \in \mathbb{R}^{n \times p}\), the rank of \(A\) is the maximum number of linearly independent rows of \(A\). Equivalently, it is the maximum number of linearly independent columns of \(A\).

2.2 Example

Consider

\[ A = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 2 \end{pmatrix}. \]

The first two rows are identical, so the three rows are not linearly independent; the first and third rows, however, are. The matrix therefore has rank

\[ \operatorname{rank}(A) = 2. \]
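The rank can be computed directly (a NumPy sketch, not part of the notes):

```python
import numpy as np

A = np.array([[1, 1, 0],
              [1, 1, 0],
              [1, 0, 2]])

# Rows 1 and 2 coincide, so only two rows are linearly independent.
assert np.linalg.matrix_rank(A) == 2
```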

2.3 Properties of rank

  1. \(0 \leq \operatorname{rank}(A) \leq \min(n,p)\).
  2. \(\operatorname{rank}(A) = \operatorname{rank}(A^T)\).
  3. \(\operatorname{rank}(A + B) \leq \operatorname{rank}(A) + \operatorname{rank}(B)\).
  4. \(\operatorname{rank}(AB) \leq \min\bigl(\operatorname{rank}(A), \operatorname{rank}(B)\bigr)\).
  5. \[ \operatorname{rank}(A^TA) = \operatorname{rank}(AA^T) = \operatorname{rank}(A). \]
  6. For any matrix \(A\), if \(B\) and \(C\) are conformable and nonsingular, then \[ \operatorname{rank}(BAC) = \operatorname{rank}(A). \] In particular, if \(A\) is \(n \times p\), then \(B\) is \(n \times n\) and \(C\) is \(p \times p\).
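Properties 2-6 can be spot-checked numerically. The sketch below (NumPy, with randomly generated matrices; not part of the notes) exercises each one:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((4, 3))
rank = np.linalg.matrix_rank

assert rank(A) == rank(A.T)                        # property 2
assert rank(A + B) <= rank(A) + rank(B)            # property 3
assert rank(A.T @ B) <= min(rank(A.T), rank(B))    # property 4
assert rank(A.T @ A) == rank(A @ A.T) == rank(A)   # property 5

# Property 6: square Gaussian matrices are nonsingular almost surely.
P = rng.standard_normal((4, 4))
Q = rng.standard_normal((3, 3))
assert rank(P @ A @ Q) == rank(A)
```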

2.4 Proof sketch for property 6

The handwritten notes give the following partial argument:

\[ \operatorname{rank}(A) \ge \operatorname{rank}(AC) \ge \operatorname{rank}(ACC^{-1}) = \operatorname{rank}(A), \]

which implies

\[ \operatorname{rank}(A) = \operatorname{rank}(AC). \]

Applying the same idea to left multiplication by \(B\) yields

\[ \operatorname{rank}(BA) = \operatorname{rank}(A), \]

and therefore

\[ \operatorname{rank}(BAC) = \operatorname{rank}(A). \]

The original notes end with: Please complete the proof.

3 Vector spaces

3.1 Definition

A vector space is a set of vectors that is closed under:

  • vector addition, and
  • scalar multiplication.

Examples:

  1. the set of all vectors in \(\mathbb{R}^p\) is a vector space.

  2. the set of all vectors in \(\mathbb{R}^p\) with zero in the last entry is a vector space.

  3. the set of all vectors in \(\mathbb{R}^p\) with the sum of the entries equal to zero is a vector space.

  4. the set of all vectors in \(\mathbb{R}^p\) with zero in the last entry or zero in the second-to-last entry is not a vector space, because it is not closed under vector addition.

Next, we will see that the column space of a matrix is a vector space.

4 Column space

Let \(A\) be an \(n \times p\) matrix with columns

\[ A = (a_1, \ldots, a_p), \qquad a_i \in \mathbb{R}^n. \]

The column space of \(A\), denoted \(C(A)\), is the vector space generated by the columns of \(A\):

\[ C(A) = \left\{ y : y = \sum_{i=1}^p c_i a_i \text{ for scalars } c_i,\ i=1,\ldots,p \right\}. \]

Equivalently,

\[ y \in C(A) \iff \exists c \in \mathbb{R}^p \text{ such that } y = Ac. \] So (1) \(\mathcal{C}(A)\) is the set of all linear combinations of the columns of \(A\), (2) multiplying \(A\) by a coefficient vector \(c\) produces a linear combination of the columns of \(A\).

4.1 Example

Consider

\[ A = \begin{bmatrix} 1 & 1 \\ 1 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} a_1 & a_2 \end{bmatrix}, \qquad a_1 = \begin{bmatrix} 1\\ 1\\ 1 \end{bmatrix}, \qquad a_2 = \begin{bmatrix} 1\\ 1\\ 0 \end{bmatrix}. \]

Since

\[ a_1 = 1 \cdot a_1 + 0 \cdot a_2, \]

we have \(a_1 \in \mathcal{C}(A)\).

Similarly,

\[ a_2 = 0 \cdot a_1 + 1 \cdot a_2, \]

so \(a_2 \in \mathcal{C}(A)\).

Question:

\[ \begin{bmatrix} 0\\ 0\\ 1 \end{bmatrix} \in \mathcal{C}(A)? \]

Yes, because

\[ a_1 - a_2 = \begin{bmatrix} 1\\ 1\\ 1 \end{bmatrix} - \begin{bmatrix} 1\\ 1\\ 0 \end{bmatrix} = \begin{bmatrix} 0\\ 0\\ 1 \end{bmatrix}. \]

Therefore,

\[ \begin{bmatrix} 0\\ 0\\ 1 \end{bmatrix} \in \mathcal{C}(A). \]

Question: \[ \begin{bmatrix} 1\\ 0\\ 0 \end{bmatrix} \in \mathcal{C}(A)? \]

If it were, then we could find scalars \(c_1, c_2\) such that

\[ \begin{bmatrix} 1\\ 0\\ 0 \end{bmatrix} = c_1 \begin{bmatrix} 1\\ 1\\ 1 \end{bmatrix} + c_2 \begin{bmatrix} 1\\ 1\\ 0 \end{bmatrix} = \begin{bmatrix} c_1 + c_2\\ c_1 + c_2\\ c_1 \end{bmatrix}. \]

Matching coordinates gives

\[ 1 = c_1 + c_2,\qquad 0 = c_1 + c_2,\qquad 0 = c_1, \]

which has no solution. Hence

\[ \begin{bmatrix} 1\\ 0\\ 0 \end{bmatrix} \notin \mathcal{C}(A). \]

In this example, \(\mathcal{C}(A)\) is a subspace of \(\mathbb{R}^3\).
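Membership in \(\mathcal{C}(A)\) can also be tested numerically: \(y \in \mathcal{C}(A)\) iff the least-squares residual \(\min_c \|Ac - y\|\) is zero. A NumPy sketch (not part of the notes):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0],
              [1.0, 0.0]])

def in_col_space(A, y, tol=1e-10):
    # y is in C(A) iff the least-squares residual ||A c - y|| is zero.
    c, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.linalg.norm(A @ c - y) < tol

assert in_col_space(A, np.array([0.0, 0.0, 1.0]))      # a1 - a2, so yes
assert not in_col_space(A, np.array([1.0, 0.0, 0.0]))  # no solution exists
```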

4.2 A proposition about column spaces

4.2.1 Proposition

For conformable matrices \(A\), \(B\), and \(C\),

\[ \text{there exists a matrix } C \text{ such that } A = BC \quad \Longleftrightarrow \quad \mathcal{C}(A) \subseteq \mathcal{C}(B). \]

4.2.2 Proof of the forward direction

Assume \(A = BC\).

Take any \(x \in \mathcal{C}(A)\). By definition of column space, there exists a vector \(d\) such that

\[ x = Ad. \]

Since \(A = BC\),

\[ x = Ad = BCd. \]

But \(Cd\) is just another vector, so \(x\) has the form \(B(\text{something})\). Therefore \(x \in \mathcal{C}(B)\).

Hence every vector in \(\mathcal{C}(A)\) is also in \(\mathcal{C}(B)\), so

\[ \mathcal{C}(A) \subseteq \mathcal{C}(B). \]

4.2.3 Proof of the reverse direction

Now assume

\[ \mathcal{C}(A) \subseteq \mathcal{C}(B). \]

Write

\[ A = (a_1, \ldots, a_p). \]

Each column \(a_i\) belongs to \(\mathcal{C}(A)\), so by assumption each \(a_i \in \mathcal{C}(B)\). Therefore, for each \(i\) there exists a vector \(c_i\) such that

\[ a_i = Bc_i. \]

Collect these vectors into a matrix

\[ C = (c_1, \ldots, c_p). \]

Then

\[ A = (a_1, \ldots, a_p) = (Bc_1, \ldots, Bc_p) = B(c_1, \ldots, c_p) = BC. \]

So there exists a matrix \(C\) such that \(A = BC\).

4.2.4 Example

Consider again

\[ A = \begin{bmatrix} 1 & 1 \\ 1 & 1 \\ 1 & 0 \end{bmatrix} \in \mathbb{R}^{3 \times 2}, \qquad B = I_3 = \begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{bmatrix}. \]

Then

\[ \operatorname{rank}(A) = 2, \qquad \operatorname{rank}(B) = 3. \]

Since \(B = I_3\), clearly \(\mathcal{C}(A) \subseteq \mathcal{C}(B) = \mathbb{R}^3\), and the proposition implies that there exists a matrix \(C\) such that \(A = BC\).

4.3 Theorem

\[ \mathcal{C}(A) = \mathcal{C}(AA^T), \qquad \mathcal{C}(A^T) = \mathcal{C}(A^TA). \]

Practice: show that \(\mathcal{C}(AA^T) \subseteq \mathcal{C}(A)\).

5 Eigenvalues and trace

5.1 Definitions: eigenvector, eigenvalue, and trace

Let \(A \in \mathbb{R}^{n \times n}\).

  1. A nonzero vector \(x \in \mathbb{R}^n\) is called an eigenvector of \(A\) if

\[ Ax = \lambda x \]

for some scalar \(\lambda\). The scalar \(\lambda\) is the eigenvalue of \(A\) corresponding to the eigenvector \(x\).

  2. The trace of \(A\) is the sum of the diagonal entries of \(A\): \[ \operatorname{tr}(A) = \sum_{i=1}^n A_{ii}. \]

Facts

If the eigenvalues of \(A\) are \(\lambda_1, \ldots, \lambda_n\), then

\[ \det(A) = |A| = \prod_{i=1}^n \lambda_i, \]

and

\[ \operatorname{tr}(A) = \sum_{i=1}^n \lambda_i. \]
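Both facts are easy to verify numerically (a NumPy sketch with a random matrix; not from the notes):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
eig = np.linalg.eigvals(A)   # eigenvalues, possibly in complex conjugate pairs

assert np.isclose(np.sum(eig).real, np.trace(A))        # tr(A) = sum of eigenvalues
assert np.isclose(np.prod(eig).real, np.linalg.det(A))  # det(A) = product of eigenvalues
```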

5.2 Properties of trace

  1. \[ \operatorname{tr}(A + B) = \operatorname{tr}(A) + \operatorname{tr}(B). \]

  2. \[ \operatorname{tr}(AB) = \operatorname{tr}(BA). \]

A useful example from the notes is

\[ \operatorname{tr}(Axx^T) = \operatorname{tr}(x^TAx) = x^TAx. \]

  3. \[ \operatorname{tr}(A^s) = \sum_{i=1}^n \lambda_i^s. \]

Also,

\[ \det(A) = \prod_{i=1}^n \lambda_i. \]

  4. If \(P\) is nonsingular (invertible), then

\[ \operatorname{tr}(P^{-1}AP) = \operatorname{tr}(A). \]

  5. If \(C\) is orthogonal, then

\[ \operatorname{tr}(C^TAC) = \operatorname{tr}(A). \]

Proof of (2): (skip the proof in class) Let \(A\) be \(n\times p\) and \(B\) be \(p\times n\). Then \[(AB)_{ii}=\sum_{k=1}^p a_{ik}b_{ki} \;\Rightarrow\; \operatorname{tr}(AB)= \sum_{i=1}^n \sum_{k=1}^p a_{ik}b_{ki},\] \[(BA)_{jj}=\sum_{k=1}^n b_{jk}a_{kj} \;\Rightarrow\; \operatorname{tr}(BA)= \sum_{j=1}^p \sum_{k=1}^n b_{jk}a_{kj} = \sum_{i=1}^n \sum_{k=1}^p a_{ik}b_{ki}.\]

Example. \(A=\begin{pmatrix} 1\\ \vdots \\ 1 \end{pmatrix}, B=(1,\cdots,1)\). Then \(AB\) is the \(n\times n\) matrix of all ones and \(BA=(n)\), so \(tr(AB)=tr(BA)=n\).

Proof of (3): (skip the proof in class) By the definition of eigenvalues, the \(\lambda_i\)'s are the \(n\) roots of \(\det(\lambda I_n-A)=0\). In other words, \(\det(\lambda I_n-A)=\prod_{i=1}^n (\lambda-\lambda_i)=\lambda^n -\lambda^{n-1}(\lambda_1+\cdots+\lambda_n) + \cdots + (-1)^n\lambda_1\cdots \lambda_n\).

Note that \[\det(\lambda I_n -A) = \begin{vmatrix} \lambda-a_{11} & -a_{12} & \cdots & -a_{1n}\\ -a_{21} & \lambda-a_{22} & \cdots & -a_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ -a_{n1} & -a_{n2} & \cdots & \lambda -a_{nn} \end{vmatrix}\] So we have
\[\lambda^n -\lambda^{n-1}(\lambda_1+\cdots+\lambda_n) + \cdots + (-1)^n\lambda_1\cdots \lambda_n= \begin{vmatrix} \lambda-a_{11} & -a_{12} & \cdots & -a_{1n}\\ -a_{21} & \lambda-a_{22} & \cdots & -a_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ -a_{n1} & -a_{n2} & \cdots & \lambda -a_{nn} \end{vmatrix}\]

Using cofactor expansion and comparing the coefficients of \(\lambda^{n-1}\) on both sides, we conclude that \[-(\lambda_1+\cdots+\lambda_n)=-(a_{11}+\cdots+a_{nn}),\] indicating that \(tr(A)=\sum \lambda_i\).

Setting \(\lambda=0\) and comparing both sides, we have \[(-1)^n\lambda_1\cdots\lambda_n=\det(-A)=(-1)^n \det(A),\] indicating that \(\det(A)=\prod \lambda_i\).

Proof of (4): (show the proof in class) \(tr(P^{-1}AP)=tr(APP^{-1})=tr(A)\).

6 Special Square Matrices

6.1 Symmetric Matrices

A matrix \(A\) is symmetric if \(A = A^T\).

6.1.1 Theorem: Spectral Decomposition of Symmetric Matrices

  1. If \(A_{n\times n}\) is a symmetric matrix, then there exists an orthogonal matrix \(\Gamma=(\gamma_1,\cdots,\gamma_n)\) and a diagonal matrix \(\Lambda=diag(\lambda_1,\cdots,\lambda_n)\) such that \(A=\Gamma \Lambda \Gamma^T\). This factorization is known as the spectral decomposition of a symmetric matrix.

  2. \(A=\sum_{i=1}^n \lambda_i \gamma_i \gamma_i^T\).

  3. \(A\Gamma=\Gamma \Lambda\), which implies that \(A\gamma_i = \lambda_i \gamma_i\). Thus, the \(\lambda_i\)'s are eigenvalues and the \(\gamma_i\)'s are the corresponding eigenvectors.

  4. \(rank(A)\) is equal to the number of non-zero eigenvalues.

  5. \(tr(A^s)=\sum \lambda_i^s\)

  6. If \(A\) is also non-singular, then \(tr(A^{-1})=\sum \lambda_i^{-1}\)

Proof of (1): The proof is lengthy and requires basic knowledge of eigenvalues and eigenvectors. We omit the proof.

Proof of (4): Recall that for any matrix \(A\) and conformable and nonsingular \(B\) and \(C\) we have \(rank(BAC)=rank(A)\). Therefore, \(rank(A)=rank(\Gamma\Lambda \Gamma^T)=rank(\Lambda)=\sum I(\lambda_i\not=0)\)

Proof of (5): \(tr(A^s)=tr((\Gamma\Lambda \Gamma^T)\cdots (\Gamma\Lambda \Gamma^T))=tr(\Gamma(\Lambda)^s\Gamma^T)=\sum \lambda_i^s\)

Proof of (6): When \(A\) is nonsingular, the eigenvalues of \(A^{-1}\) are \(\lambda_i^{-1}\). Thus \(A^{-1}=\Gamma\Lambda^{-1}\Gamma^T\) with the same \(\Gamma\) as in the spectral decomposition of \(A\), and the result follows immediately.

Example. \(A=\begin{pmatrix}2.2 & 0.4 \\ 0.4 & 2.8\end{pmatrix}= \begin{pmatrix}1/\sqrt{5} & 2/\sqrt{5} \\ 2/\sqrt{5} & -1/\sqrt{5}\end{pmatrix} \begin{pmatrix}3 & 0 \\ 0 & 2\end{pmatrix} \begin{pmatrix} 1/\sqrt{5} & 2/\sqrt{5} \\ 2/\sqrt{5} & -1/\sqrt{5} \end{pmatrix}\)
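The decomposition in the example can be reproduced with `numpy.linalg.eigh`, which returns the eigenvalues in ascending order together with orthonormal eigenvectors (a sketch, not from the notes):

```python
import numpy as np

A = np.array([[2.2, 0.4],
              [0.4, 2.8]])

# eigh handles symmetric matrices: ascending eigenvalues, orthonormal columns.
lam, Gamma = np.linalg.eigh(A)

assert np.allclose(lam, [2.0, 3.0])                    # eigenvalues 2 and 3
assert np.allclose(Gamma @ np.diag(lam) @ Gamma.T, A)  # A = Gamma Lambda Gamma^T
assert np.allclose(Gamma.T @ Gamma, np.eye(2))         # Gamma is orthogonal
```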

6.1.2 Theorem: SVD

Singular value decomposition: If \(A\) is an \(n\times p\) matrix of rank \(r\), then \(A\) can be written as \[A_{n\times p}=U_{n\times r}L_{r\times r}V^T_{r\times p}\] where \(U\ (n\times r)\) and \(V\ (p\times r)\) are column-orthogonal matrices (\(U^TU=V^TV=I_r\)) and \(L\) is a diagonal matrix with positive diagonal entries (the singular values of \(A\)).

Note: \(AA^T=UL^2U^T\) and \(A^TA=VL^2V^T\).
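`numpy.linalg.svd` with `full_matrices=False` returns exactly this thin factorization when \(A\) has full column rank, as in the earlier \(3\times 2\) example (a sketch, not from the notes):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0],
              [1.0, 0.0]])   # rank 2 = min(n, p), so the thin SVD is the rank-r SVD

U, s, Vt = np.linalg.svd(A, full_matrices=False)
L = np.diag(s)
r = len(s)

assert np.allclose(U @ L @ Vt, A)             # A = U L V^T
assert np.allclose(U.T @ U, np.eye(r))        # U^T U = I_r
assert np.allclose(Vt @ Vt.T, np.eye(r))      # V^T V = I_r
assert np.allclose(A @ A.T, U @ L**2 @ U.T)   # AA^T = U L^2 U^T
```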

6.2 Positive Definite and Positive-Semidefinite

6.2.1 Quadratic Forms

A quadratic form of a vector \(x\) is a function of the form \[Q(x)=x^TAx=\sum_{i=1}^p\sum_{j=1}^pa_{ij}x_ix_j\] where \(A\) is a symmetric matrix; that is \[Q(x)=a_{11}x_1^2+\cdots+a_{pp}x_p^2+2a_{12}x_1x_2+\cdots+2a_{p-1,p}x_{p-1}x_p\]

Examples.

  1. \(A=\begin{pmatrix}1 & 1\\ 1 & 1\end{pmatrix}, x^TAx=x_1^2+2x_1x_2+x_2^2\). Note that for any \(x\not= 0\), \(x^TAx\ge 0\).

  2. \(A=\begin{pmatrix}1 & -1/2\\ -1/2 & 1\end{pmatrix}, x^TAx=x_1^2-x_1x_2+x_2^2=(x_1-x_2/2)^2+3x_2^2/4\). Note that for any \(x\not= 0\), \(x^TAx> 0\).

  3. \(A=\begin{pmatrix}1 & -2\\ -2 & 1\end{pmatrix}, x^TAx=x_1^2-4x_1x_2+x_2^2=(x_1-x_2)^2-2x_1x_2\).

6.2.2 Theorem: Diagonalization of Quadratic Forms

For any symmetric matrix \(A\), there exists an orthogonal transformation \(y=\Gamma^T x\) such that \[x^TAx=\sum \lambda_iy_i^2\]

Proof: Consider the spectral decomposition \(A=\Gamma \Lambda \Gamma^T\) and the transformation \(y=\Gamma^Tx\). Then \[x^TAx=x^T\Gamma\Lambda\Gamma^T x=y^T\Lambda y=\sum \lambda_i y_i^2.\]

6.2.3 Definition: Positive Definite and Positive Semi-definite

  • \(Q(x)\) is called a positive definite (p.d.) quadratic form if \(Q(x)>0\) for all \(x\not=0\); it is called positive semi-definite (p.s.d.) if \(Q(x)\ge 0\) for all \(x\not=0\).

  • A symmetric matrix \(A\) is called p.d. (or p.s.d.) if \(Q(x)\) is p.d. (or p.s.d.). We use the notation \(>0\) for positive definite and \(\ge 0\) for positive semi-definite.

6.2.4 Properties of PSD matrices.

Let \(A_{p\times p}\) be p.s.d. Then:

  1. \(\lambda_i\ge 0\) for \(i=1,\cdots,p\). This result implies that if \(A\ge 0\), then \(tr(A)\ge 0\). Proof: \(A\ge 0\) implies that \(0\le x^TAx=x^T\Gamma\Lambda\Gamma^Tx=\lambda_1y_1^2+\cdots+\lambda_p y_p^2\), where \(y=\Gamma^Tx\) (Thm 1.5). Taking \(y_1=1, y_2=0,\cdots, y_p=0\) gives \(\lambda_1\ge0\), and the other \(\lambda_i\) follow similarly.

  2. There exists a unique p.s.d. matrix \(B\) such that \(A=B^2\). Proof of existence: consider the spectral decomposition of \(A\): \(A=\Gamma \Lambda \Gamma^T\), and let \(B=\Gamma \Lambda^{1/2} \Gamma^T\). \(B\) is called the square root matrix of \(A\).

Example. \(A=\begin{pmatrix}1 & 1\\ 1 & 1\end{pmatrix}, A^{1/2}=\begin{pmatrix}1/\sqrt{2} & 1/\sqrt{2}\\ 1/\sqrt{2} & 1/\sqrt{2}\end{pmatrix}\)

  3. Let \(C\) be any \(p\times n\) matrix. We have \(C^TAC\ge 0\). Proof: Since \(A\ge 0\), for any \(n\)-vector \(x\not= 0\), \[x^TC^TACx=(Cx)^TA(Cx)\ge 0\] so \(C^TAC\ge 0\).

  4. \(a_{ii}\ge 0\). Proof: take \(x=(1,0,...,0)^T\); because \(A\) is p.s.d., \(x^TAx\ge 0\). But \(a_{11}=x^TAx\), so \(a_{11}\ge 0\). Similarly, \(a_{ii}\ge 0\) for \(i=1,\cdots,p\).
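The square-root construction \(B=\Gamma\Lambda^{1/2}\Gamma^T\) can be carried out numerically on the example matrix above (NumPy sketch, not from the notes):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0]])

# B = Gamma Lambda^{1/2} Gamma^T, built from the spectral decomposition.
lam, Gamma = np.linalg.eigh(A)
lam = np.clip(lam, 0.0, None)   # guard against tiny negative round-off
B = Gamma @ np.diag(np.sqrt(lam)) @ Gamma.T

assert np.allclose(B @ B, A)                            # B^2 = A
assert np.allclose(B, np.full((2, 2), 1 / np.sqrt(2)))  # matches the example
```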

6.2.5 Properties of PD Matrices.

Let \(A\) be p.d. Then the following properties hold:

  1. \(rank(CAC^T)=rank(C).\)

Proof: \(rank(CAC^T)=rank(CBBC^T)=rank(CB(CB)^T)=rank(CB)=rank(C)\). The last step is true because we know that for any \(A\), when \(P\) and\(Q\) are conformable and nonsingular, \(rank(PAQ)=rank(A)=rank(PA)=rank(AQ)\).

  2. If \(C(n\times p)\) has rank \(n\), then \(CAC^T\) is p.d.

Proof: \(x^TCAC^Tx=y^TAy\ge0\), where \(y=C^Tx\), with equality iff \(y=0\) iff \(C^Tx=0\) iff \(x=0\) (since the columns of \(C^T\) are linearly independent). Hence \(x^TCAC^Tx>0\) for all \(x\not=0\).

  3. If \(B\ (n\times p)\) has rank \(p\), then \(B^TB\) is p.d.

Proof: \(x^TB^TBx=y^Ty\ge 0\), where \(y=Bx\), with equality iff \(Bx=0\) iff \(x=0\) (since the columns of \(B\) are linearly independent).

  4. The diagonal elements of a p.d. matrix are all positive.

Proof: For \(i=1\), take \(x=(1,0,...,0)^T\); then \(x^TAx=a_{11}> 0\). The other \(i\)'s are similar.

  5. \(x^TAx=0\) iff \(x=0\). Proof: \(A\) can be written as \(B^2\). Thus \(x^TAx=(Bx)^T(Bx)\), equal to 0 iff \(Bx=0\), iff \(x=0\).

Practice Problems:

  • Let \(A_{p\times p}\) be a positive definite matrix and let \(B_{k\times p}\) be a matrix of rank \(k\le p\). Then \(BAB^T\) is positive definite.

  • Let \(A_{p\times p}\) be a positive definite matrix and let \(B_{k\times p}\) be a matrix. If \(k>p\) or if \(rank(B)=r<p\), then \(BAB^T\) is positive semidefinite.

6.2.6 Lemma: Gram factorization / Gram representation

A symmetric matrix \(A\) is p.d. if and only if there exists a nonsingular matrix \(P\) such that \(A=P^TP\).

Proof: \((\Rightarrow)\) By the spectral decomposition, \(A=\Gamma \Lambda \Gamma^T\), where \(\Gamma\) is orthogonal and \(\Lambda\) is diagonal. Because \(A\) is p.d., \(\Gamma^TA\Gamma=\Lambda\) is also p.d., so we can write \(\Lambda=\Lambda^{1/2}\Lambda^{1/2}\). Taking \(P=\Lambda^{1/2}\Gamma^T\) leads to \(A=P^TP\).

\((\Leftarrow)\) Since \(P\) is nonsingular, its columns are linearly independent. Therefore, for any \(x\not=0\), \(Px\not=0\), and \(x^TAx=x^TP^TPx=(Px)^T (Px)>0\), i.e., \(A\) is positive definite.

Remarks:

  1. The result is known as the Gram factorization of a p.d. matrix, and \(P\) is called the Gram representation of \(A\). Note that the Gram representation is not unique. For example, if \(P\) is a Gram representation of \(A\), then so is \(CP\) for any orthogonal matrix \(C\).

  2. Related factorizations:

  • Cholesky factorization: If \(A\) is p.d., then there exists a unique lower triangular matrix \(L\) with positive diagonal entries such that \(A=LL^T\).

  • Square root factorization: If \(A\) is p.d., then there exists a unique p.d. matrix \(B\) such that \(A=B^2\).

6.2.7 Lemma: Eigenvalues of \(AB\) and \(BA\)

For conformable matrices, the nonzero eigenvalues of \(AB\) are the same as those of \(BA\). If \(A\) and \(B\) are both square of the same size, the full sets of eigenvalues (including zeros) are identical.

Proof: Let \(\lambda\) be a nonzero eigenvalue of \(AB\). Then there exists \(u (u\not=0)\) such that \(ABu=\lambda u\); that is, \(BABu=\lambda Bu\). Hence \(BAv=\lambda v\), where \(v=Bu\not=0\) (as \(ABu\not=0\)), and \(\lambda\) is an eigenvalue of \(BA\). The argument reverses by interchanging the roles of \(A\) and \(B\). For square matrices \(AB\) and \(BA\) have the same number of zero eigenvalues.

6.2.8 Theorem: Maximization of Quadratic Forms

Let \(A\) and \(B\) be two symmetric matrices. Suppose that \(B>0\). Then the maximum (minimum) of \(x^TAx\) given \(x^TBx=1\) is attained when \(x\) is the eigenvector of \(B^{-1}A\) corresponding to the largest (smallest) eigenvalue of \(B^{-1}A\). Thus, if \(\lambda_1\) and \(\lambda_p\) are the largest and the smallest eigenvalues of \(B^{-1}A\), respectively, then subject to the constraint \(x^TBx=1\), \[\max_x x^TAx=\lambda_{max} (B^{-1}A), \qquad \min_x x^TAx = \lambda_{min}(B^{-1}A).\]

This is a homework problem. Hint: consider \(y=B^{1/2}x\).

The above theorem can also be written as \[\max_{x\not=0} \frac{x^TAx}{x^TBx}=\lambda_{max} (B^{-1}A), \qquad \min_{x\not=0} \frac{x^TAx}{x^TBx}=\lambda_{min} (B^{-1}A).\]

A useful result from the theorem: Let \(B=I\), then \[ \begin{align*} \max_{x,\;x\ne 0}\left\{ \frac{x^T A x}{x^T x}\right\} &= \lambda_{\max}\\ \min_{x,\;x\ne 0}\left\{ \frac{x^T A x}{x^T x}\right\} &= \lambda_{\min} \end{align*} \]

where \(\lambda_{min}\) and \(\lambda_{max}\) are the minimum and maximum eigenvalues of \(A\), respectively. These values occur when \(x\) is the eigenvector corresponding to the eigenvalues \(\lambda_{min}\) and \(\lambda_{max}\), respectively.
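The \(B=I\) case is easy to check numerically: evaluate the Rayleigh quotient \(x^TAx/x^Tx\) at the extreme eigenvectors of a random symmetric matrix (NumPy sketch, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((3, 3))
A = M + M.T                       # a symmetric test matrix

lam, Gamma = np.linalg.eigh(A)    # eigenvalues in ascending order

def rayleigh(A, x):
    return (x @ A @ x) / (x @ x)

# The extreme values are attained at the corresponding eigenvectors.
assert np.isclose(rayleigh(A, Gamma[:, -1]), lam[-1])  # maximum
assert np.isclose(rayleigh(A, Gamma[:, 0]), lam[0])    # minimum

# Any other vector gives a value in between.
x = rng.standard_normal(3)
assert lam[0] - 1e-12 <= rayleigh(A, x) <= lam[-1] + 1e-12
```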

6.2.9 Key Lemma for Scheffé’s Method (Maximization of Linear Contrasts)

If \(L\) is p.d., then for any \(b\), \[\max_{h,h\not=0}\left\{\frac{(h^Tb)^2}{h^TLh}\right\}=b^TL^{-1}b\]

Proof: The result follows from the previous theorem: rewriting \((h^Tb)^2\) as \(h^Tbb^Th\) turns the objective into the ratio of quadratic forms \(\frac{h^T(bb^T)h}{h^TLh}\), with \(A=bb^T\) and \(B=L\).

Thus, the maximum is \[\lambda_{max}(L^{-1}bb^T)=\lambda_{max}(b^TL^{-1}b)=b^TL^{-1}b\] The last step is true because \(b^TL^{-1}b\) is a scalar; the second to the last step is true because the nonzero eigenvalues of \(AB\) are the same as those of \(BA\).

6.3 Idempotent matrices and projection matrices

Definition. A matrix \(P\) is idempotent if \(P^2=P\).

Note. If \(P\) is idempotent, so is \(I-P\).

Example. \(A=\begin{pmatrix}1 & 1\\ 0 & 0\end{pmatrix}, A^2=A\). Verify that \((I-A)^2=I-A\).

Definition. A symmetric and idempotent matrix is called a projection matrix.

Example. \(A=\begin{pmatrix}1/2 & 1/2\\ 1/2 & 1/2\end{pmatrix}, A=A^T, A^2=A\).
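A quick check on this example (NumPy sketch, not from the notes) confirms that it is symmetric and idempotent, that its eigenvalues are 0 and 1, and that its trace equals its rank:

```python
import numpy as np

P = np.full((2, 2), 0.5)   # the projection matrix from the example

assert np.allclose(P, P.T)     # symmetric
assert np.allclose(P @ P, P)   # idempotent
assert np.isclose(np.trace(P), np.linalg.matrix_rank(P))  # tr(P) = rank(P) = 1
assert np.allclose(np.linalg.eigvalsh(P), [0.0, 1.0])     # eigenvalues 0 and 1
```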

6.3.1 Properties of projection matrices:

  1. If \(P\) is symmetric, then \(P\) is idempotent with rank \(r\) if and only if it has \(r\) eigenvalues equal to 1 and \(n-r\) eigenvalues equal to zero.

  2. If \(P\) is a projection matrix, then \(tr(P)=rank(P)\).

  3. Projection matrices are positive-semidefinite.

  4. If \(P_i (i=1,2)\) is a projection matrix and \(P_1-P_2\) is p.s.d., then

  • \(P_1P_2=P_2P_1=P_2\) (note, both \(P_1P_2\) and \(P_2P_1\) are projection matrices).
  • \(P_1-P_2\) is a projection matrix.

Proof of (1): Let \(\lambda\) be an eigenvalue of \(P\), then we have \(Px=\lambda x\), where \(x\) is the corresponding eigenvector. Since \(P^2=P\), we have \(\lambda x= Px=P(Px)=\lambda Px= \lambda^2 x\), which implies that \(\lambda\) equals either zero or 1. For a symmetric matrix, its rank equals the number of nonzero eigenvalues. Thus \(rank(P)\) equals the number of 1’s.

On the other hand, suppose \(P\) has \(r\) eigenvalues equal to 1 and \(n-r\) eigenvalues equal to 0. Without loss of generality, we assume that the first \(r\) eigenvalues are 1. By the spectral decomposition of the symmetric matrix \(P\), we have \(P=\Gamma \Lambda \Gamma^T\), where the first \(r\) diagonal elements of \(\Lambda\) are 1 and all the other elements are zero. It is obvious that \(\Lambda^2=\Lambda\). Thus \(P^2=\Gamma \Lambda \Gamma^T\Gamma \Lambda \Gamma^T=\Gamma \Lambda^2 \Gamma^T=\Gamma \Lambda \Gamma^T=P\).

Proof of (2): Based on (1), \(P\) has \(rank(P)\) eigenvalues equal to 1 and \(n-rank(P)\) eigenvalues equal to 0. We also know that \(tr(P)=\sum \lambda_i=rank(P)\).

Proof of (3): \(x^TPx=x^TPPx=x^TP^TPx=(Px)^T(Px)\ge 0\), using \(P^2=P\) and \(P^T=P\).

Proof of (4): Clearly the following are true:

\[\begin{eqnarray*} ((I-P_1)y)^T P_2 ((I-P_1)y) & \ge & 0\\ ((I-P_1)y)^T P_1 ((I-P_1)y) & = & 0 \end{eqnarray*}\]

which implies that \([(I-P_1)y]^T(P_1-P_2)[(I-P_1)y] \le 0\). However, we also know that \(P_1-P_2\ge 0\). Therefore, it must be true that \[[(I-P_1)y]^T(P_1-P_2)[(I-P_1)y] = 0 \quad \text{for all } y,\] which implies that \[\begin{eqnarray*} & & (I-P_1)P_2(I-P_1)=0\\ &\Rightarrow & (I-P_1)P_2 P_2 (I-P_1)=0 \quad (\text{since } P_2^2=P_2)\\ &\Rightarrow & (I-P_1)P_2=0 \quad (MM^T=0 \text{ implies } M=0, \text{ with } M=(I-P_1)P_2)\\ &\Rightarrow & P_2=P_1P_2 \end{eqnarray*}\]

A projection matrix must be a symmetric matrix. Therefore \[P_2=(P_2)^T=(P_1P_2)^T=(P_2)^T(P_1)^T=P_2P_1\]

Last, \((P_1-P_2)^2=P_1-2P_1P_2+P_2=P_1-2P_2+P_2=P_1-P_2\).

7 Inverse and Generalized Inverse

Definition. A full-rank square matrix is called a nonsingular matrix. A nonsingular matrix \(A\) has a unique inverse, denoted by \(A^{-1}\): \[AA^{-1}=A^{-1}A=I.\]

Some properties of \(A^{-1}\):

  • It is unique.
  • \((AB)^{-1}=B^{-1}A^{-1}\).
  • \((A^T)^{-1}=(A^{-1})^T\).

Definition. For a matrix \(A_{n\times p}\), \(A^{-}\) is called a g-inverse (generalized inverse) of \(A\) if \[AA^{-}A=A\]

A generalized inverse always exists although in general it is not unique.
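One concrete g-inverse that always exists is the Moore-Penrose pseudoinverse, available as `numpy.linalg.pinv`; it satisfies the defining relation even for singular matrices (a sketch, not from the notes):

```python
import numpy as np

A = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 0.0],
              [1.0, 0.0, 2.0]])   # rank 2, hence singular

A_g = np.linalg.pinv(A)           # Moore-Penrose pseudoinverse, one g-inverse of A

assert np.allclose(A @ A_g @ A, A)   # defining property: A A^- A = A
```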

8 Matrix Differentiation

Definition. We define the derivative of \(f(X)\) with respect to \(X_{n\times p}\) as \[\frac{\partial f(X)}{\partial X}=\left(\frac{\partial f(X)}{\partial x_{ij}} \right)_{n\times p}\]

The following results are useful:

  • \[\frac{\partial a^Tx}{\partial x}=a\]
  • \[\frac{\partial x^Tx}{\partial x}=2x, \frac{\partial x^TAx}{\partial x}=(A+A^T)x, \frac{\partial x^TAy}{\partial x}=Ay\]

Proof of (2): \[\begin{eqnarray*} \frac{\partial x^TAx}{ \partial x_k} &=& \frac{\partial}{\partial x_k}\left( \sum_{i}\sum_{j} a_{ij}x_{i}x_{j}\right) \\ &=& \frac{\partial}{\partial x_k} (a_{kk}x_k^2 + \sum_{j\not=k}a_{kj}x_kx_j +\sum_{i\not=k} a_{ik} x_i x_k )\\ &=&2a_{kk}x_k + \sum_{j\not=k}a_{kj}x_j + \sum_{i\not=k}a_{ik}x_i\\ &=&\sum_i a_{ki}x_i + \sum_i a_{ik}x_i\\ &=& (Ax)_k + (A^Tx)_k\\ &=& ((A+A^T)x)_k \end{eqnarray*}\]
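The identity \(\partial x^TAx/\partial x=(A+A^T)x\) proved above can be confirmed with central finite differences (NumPy sketch, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 3))
x = rng.standard_normal(3)

f = lambda v: v @ A @ v        # the quadratic form v^T A v
grad = (A + A.T) @ x           # the claimed derivative at x

# Central finite differences approximate each partial derivative.
eps = 1e-6
num = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                for e in np.eye(3)])

assert np.allclose(num, grad, atol=1e-5)
```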