A Brief Review of Linear Algebra

The lecture covers the following linear algebra topics:

  1. vectors and linear independence
  2. rank
  3. vector spaces
  4. column spaces
  5. eigenvalues and trace
  6. special square matrices: symmetric, positive definite, and positive semidefinite matrices
  7. inverse and generalized inverse
  8. matrix differentiation

1 Vectors and linear independence

1.1 Vectors

A vector (more specifically, a column vector) is a quantity whose entries are stacked vertically.

Examples:

\[ a = \begin{pmatrix}1 \\ 2 \\ 0.5\end{pmatrix}, \qquad u = \begin{pmatrix}u_1 \\ u_2 \\ u_3\end{pmatrix}, \qquad u = (u_1, \ldots, u_p)^T \in \mathbb{R}^p. \]

1.2 Linear dependence

Let \(x_1, \ldots, x_k\) be vectors in \(\mathbb{R}^p\).

We say that \(x_1, \ldots, x_k\) are linearly dependent if there exist scalars \(\lambda_1, \ldots, \lambda_k\), not all zero, such that

\[ \lambda_1 x_1 + \cdots + \lambda_k x_k = 0. \]

Equivalently, if one coefficient is nonzero, then one vector can be written as a linear combination of the others. For example, if \(\lambda_1 \neq 0\), then

\[ x_1 = -\frac{1}{\lambda_1} \sum_{i=2}^k \lambda_i x_i. \]

1.3 Linear independence

We say that \(x_1, \ldots, x_k\) are linearly independent if they are not linearly dependent.

Equivalently, \(x_1, \ldots, x_k\) are linearly independent if

\[ \lambda_1 x_1 + \cdots + \lambda_k x_k = 0 \]

implies

\[ \lambda_1 = \cdots = \lambda_k = 0. \]

1.3.1 Example

Let

\[ x_1 = \begin{pmatrix}1 \\ 1\end{pmatrix}, \qquad x_2 = \begin{pmatrix}1 \\ 0\end{pmatrix}. \]

Suppose

\[ \lambda_1 x_1 + \lambda_2 x_2 = 0. \]

Then

\[ \lambda_1 \begin{pmatrix}1 \\ 1\end{pmatrix} + \lambda_2 \begin{pmatrix}1 \\ 0\end{pmatrix} = \begin{pmatrix}0 \\ 0\end{pmatrix}, \]

so

\[ \begin{pmatrix} \lambda_1 + \lambda_2 \\ \lambda_1 \end{pmatrix} = \begin{pmatrix}0 \\ 0\end{pmatrix}. \]

Therefore \(\lambda_1 = 0\) and \(\lambda_2 = 0\). Hence \(x_1\) and \(x_2\) are linearly independent.
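This kind of check is easy to automate. As a quick numerical sanity check (a NumPy sketch, not part of the original notes), stack the vectors as columns and compare the matrix rank with the number of vectors:

```python
import numpy as np

# Columns are x1 = (1, 1)^T and x2 = (1, 0)^T from the example.
X = np.array([[1.0, 1.0],
              [1.0, 0.0]])

# The columns are linearly independent iff rank(X) equals the number of columns.
assert np.linalg.matrix_rank(X) == 2
```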

1.4 Orthogonality

Two vectors \(x, y \in \mathbb{R}^p\) are orthogonal if their inner product is zero:

\[ \langle x, y \rangle = x^T y = y^T x = \sum_i x_i y_i = 0. \]

Example:

\[ y_1 = \begin{pmatrix}1 \\ 0\end{pmatrix}, \qquad y_2 = \begin{pmatrix}0 \\ 0.5\end{pmatrix}. \]

These are orthogonal because \(y_1^T y_2 = 0\).

1.5 Fact: orthogonal vectors are linearly independent

Claim. If \(x_1\) and \(x_2\) are orthogonal nonzero vectors, then they are linearly independent.

1.5.1 Proof

Assume, for contradiction, that \(x_1\) and \(x_2\) are linearly dependent. Then there exist scalars \(\lambda_1, \lambda_2\), not both zero, such that

\[ \lambda_1 x_1 + \lambda_2 x_2 = 0. \]

Left-multiply by \(x_1^T\):

\[ \lambda_1 x_1^T x_1 + \lambda_2 x_1^T x_2 = 0. \]

Because \(x_1\) and \(x_2\) are orthogonal, \(x_1^T x_2 = 0\), so

\[ \lambda_1 x_1^T x_1 = 0. \]

Since \(x_1 \neq 0\), we have \(x_1^T x_1 > 0\), hence \(\lambda_1 = 0\).

Similarly, left-multiplying by \(x_2^T\) gives \(\lambda_2 = 0\).

This contradicts the assumption that \(\lambda_1\) and \(\lambda_2\) are not both zero. Therefore \(x_1\) and \(x_2\) are linearly independent.

1.5.2 Converse does not hold

Linear independence does not imply orthogonality.

Example:

\[ x_1 = \begin{pmatrix}1 \\ 1\end{pmatrix}, \qquad x_2 = \begin{pmatrix}1 \\ 0\end{pmatrix}. \]

These vectors are linearly independent, but they are not orthogonal because

\[ x_1^T x_2 = 1. \]

1.6 Norm

Let

\[ a = \begin{pmatrix}a_1 \\ \vdots \\ a_p\end{pmatrix} \in \mathbb{R}^p. \]

Its Euclidean norm is

\[ \|a\| = \sqrt{a^T a} = \sqrt{\langle a, a \rangle} = \sqrt{\sum_{i=1}^p a_i^2}. \]

1.7 Orthogonal matrices

A square matrix

\[ Q = (q_1, \ldots, q_n) \]

is an orthogonal matrix if

\[ Q^TQ = QQ^T = I_n, \]

where each column \(q_i \in \mathbb{R}^n\). Equivalently, the columns are orthonormal: \(q_i^T q_j = 0\) for \(i \neq j\) and \(\|q_i\| = 1\).

Examples:

\[ Q_1 = \begin{pmatrix} \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}} \\ \tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{2}} \end{pmatrix}, \qquad Q_2 = \begin{pmatrix} \tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{2}} \\ \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}} \end{pmatrix}. \]
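A small numerical check (NumPy sketch, not part of the notes) confirms that both example matrices satisfy \(Q^TQ = QQ^T = I_2\):

```python
import numpy as np

s = 1 / np.sqrt(2)
Q1 = np.array([[s,  s],
               [s, -s]])
Q2 = np.array([[s, -s],
               [s,  s]])

# Orthogonality: Q^T Q = Q Q^T = I.
for Q in (Q1, Q2):
    assert np.allclose(Q.T @ Q, np.eye(2))
    assert np.allclose(Q @ Q.T, np.eye(2))
```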

2 Rank

2.1 Definition

For a matrix \(A \in \mathbb{R}^{n \times p}\), the rank of \(A\) is the maximum number of linearly independent rows of \(A\). Equivalently, it is the maximum number of linearly independent columns of \(A\).

2.2 Example

Consider

\[ A = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 2 \end{pmatrix}. \]

The first two rows are identical, so the three rows are not linearly independent; the first and third rows, however, are. The matrix therefore has rank

\[ \operatorname{rank}(A) = 2. \]
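The rank can be computed directly (a NumPy sketch, not part of the notes):

```python
import numpy as np

A = np.array([[1, 1, 0],
              [1, 1, 0],
              [1, 0, 2]])

# Rows 1 and 2 coincide, so only two rows are linearly independent.
assert np.linalg.matrix_rank(A) == 2
```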

2.3 Properties of rank

  1. \(0 \leq \operatorname{rank}(A) \leq \min(n,p)\).
  2. \(\operatorname{rank}(A) = \operatorname{rank}(A^T)\).
  3. \(\operatorname{rank}(A + B) \leq \operatorname{rank}(A) + \operatorname{rank}(B)\).
  4. \(\operatorname{rank}(AB) \leq \min\bigl(\operatorname{rank}(A), \operatorname{rank}(B)\bigr)\).
  5. \[ \operatorname{rank}(A^TA) = \operatorname{rank}(AA^T) = \operatorname{rank}(A). \]
  6. For any matrix \(A\), if \(B\) and \(C\) are conformable and nonsingular, then \[ \operatorname{rank}(BAC) = \operatorname{rank}(A). \] In particular, if \(A\) is \(n \times p\), then \(B\) is \(n \times n\) and \(C\) is \(p \times p\).
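Properties 2-6 can be spot-checked numerically. The sketch below (NumPy, with randomly generated matrices; not part of the notes) exercises each one:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((4, 3))
rank = np.linalg.matrix_rank

assert rank(A) == rank(A.T)                        # property 2
assert rank(A + B) <= rank(A) + rank(B)            # property 3
assert rank(A.T @ B) <= min(rank(A.T), rank(B))    # property 4
assert rank(A.T @ A) == rank(A @ A.T) == rank(A)   # property 5

# Property 6: square Gaussian matrices are nonsingular almost surely.
P = rng.standard_normal((4, 4))
Q = rng.standard_normal((3, 3))
assert rank(P @ A @ Q) == rank(A)
```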

2.4 Proof sketch for property 6

The handwritten notes give the following partial argument:

\[ \operatorname{rank}(A) \ge \operatorname{rank}(AC) \ge \operatorname{rank}(ACC^{-1}) = \operatorname{rank}(A), \]

which implies

\[ \operatorname{rank}(A) = \operatorname{rank}(AC). \]

Applying the same idea to left multiplication by \(B\) yields

\[ \operatorname{rank}(BA) = \operatorname{rank}(A), \]

and therefore

\[ \operatorname{rank}(BAC) = \operatorname{rank}(A). \]

The original notes end with: Please complete the proof.

3 Vector spaces

3.1 Definition

A vector space is a set of vectors that is closed under:

  • vector addition, and
  • scalar multiplication.

Examples:

  1. the set of all vectors in \(\mathbb{R}^p\) is a vector space.

  2. the set of all vectors in \(\mathbb{R}^p\) with zero in the last entry is a vector space.

  3. the set of all vectors in \(\mathbb{R}^p\) with the sum of the entries equal to zero is a vector space.

  4. the set of all vectors in \(\mathbb{R}^p\) with zero in the last entry or zero in the second-to-last entry is not a vector space, because it is not closed under vector addition.

Next, we will see that the column space of a matrix is a vector space.

4 Column space

Let \(A\) be an \(n \times p\) matrix with columns

\[ A = (a_1, \ldots, a_p), \qquad a_i \in \mathbb{R}^n. \]

The column space of \(A\), denoted \(C(A)\), is the vector space generated by the columns of \(A\):

\[ C(A) = \left\{ y : y = \sum_{i=1}^p c_i a_i \text{ for scalars } c_i,\ i=1,\ldots,p \right\}. \]

Equivalently,

\[ y \in C(A) \iff \exists c \in \mathbb{R}^p \text{ such that } y = Ac. \] So (1) \(\mathcal{C}(A)\) is the set of all linear combinations of the columns of \(A\), (2) multiplying \(A\) by a coefficient vector \(c\) produces a linear combination of the columns of \(A\).

4.1 Example

Consider

\[ A = \begin{bmatrix} 1 & 1 \\ 1 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} a_1 & a_2 \end{bmatrix}, \qquad a_1 = \begin{bmatrix} 1\\ 1\\ 1 \end{bmatrix}, \qquad a_2 = \begin{bmatrix} 1\\ 1\\ 0 \end{bmatrix}. \]

Since

\[ a_1 = 1 \cdot a_1 + 0 \cdot a_2, \]

we have \(a_1 \in \mathcal{C}(A)\).

Similarly,

\[ a_2 = 0 \cdot a_1 + 1 \cdot a_2, \]

so \(a_2 \in \mathcal{C}(A)\).

Question:

\[ \begin{bmatrix} 0\\ 0\\ 1 \end{bmatrix} \in \mathcal{C}(A)? \]

Yes, because

\[ a_1 - a_2 = \begin{bmatrix} 1\\ 1\\ 1 \end{bmatrix} - \begin{bmatrix} 1\\ 1\\ 0 \end{bmatrix} = \begin{bmatrix} 0\\ 0\\ 1 \end{bmatrix}. \]

Therefore,

\[ \begin{bmatrix} 0\\ 0\\ 1 \end{bmatrix} \in \mathcal{C}(A). \]

Question: \[ \begin{bmatrix} 1\\ 0\\ 0 \end{bmatrix} \in \mathcal{C}(A)? \]

If it were, then we could find scalars \(c_1, c_2\) such that

\[ \begin{bmatrix} 1\\ 0\\ 0 \end{bmatrix} = c_1 \begin{bmatrix} 1\\ 1\\ 1 \end{bmatrix} + c_2 \begin{bmatrix} 1\\ 1\\ 0 \end{bmatrix} = \begin{bmatrix} c_1 + c_2\\ c_1 + c_2\\ c_1 \end{bmatrix}. \]

Matching coordinates gives

\[ 1 = c_1 + c_2,\qquad 0 = c_1 + c_2,\qquad 0 = c_1, \]

which has no solution. Hence

\[ \begin{bmatrix} 1\\ 0\\ 0 \end{bmatrix} \notin \mathcal{C}(A). \]

In this example, \(\mathcal{C}(A)\) is a subspace of \(\mathbb{R}^3\).
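Membership in \(\mathcal{C}(A)\) can also be tested numerically: \(y \in \mathcal{C}(A)\) iff the least-squares residual \(\min_c \|Ac - y\|\) is zero. A NumPy sketch (not part of the notes):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0],
              [1.0, 0.0]])

def in_col_space(A, y, tol=1e-10):
    # y is in C(A) iff the least-squares residual ||A c - y|| is zero.
    c, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.linalg.norm(A @ c - y) < tol

assert in_col_space(A, np.array([0.0, 0.0, 1.0]))      # a1 - a2, so yes
assert not in_col_space(A, np.array([1.0, 0.0, 0.0]))  # no solution exists
```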

4.2 A proposition about column spaces

4.2.1 Proposition

For conformable matrices \(A\), \(B\), and \(C\),

\[ \text{there exists a matrix } C \text{ such that } A = BC \quad \Longleftrightarrow \quad \mathcal{C}(A) \subseteq \mathcal{C}(B). \]

4.2.2 Proof of the forward direction

Assume \(A = BC\).

Take any \(x \in \mathcal{C}(A)\). By definition of column space, there exists a vector \(d\) such that

\[ x = Ad. \]

Since \(A = BC\),

\[ x = Ad = BCd. \]

But \(Cd\) is just another vector, so \(x\) has the form \(B(\text{something})\). Therefore \(x \in \mathcal{C}(B)\).

Hence every vector in \(\mathcal{C}(A)\) is also in \(\mathcal{C}(B)\), so

\[ \mathcal{C}(A) \subseteq \mathcal{C}(B). \]

4.2.3 Proof of the reverse direction

Now assume

\[ \mathcal{C}(A) \subseteq \mathcal{C}(B). \]

Write

\[ A = (a_1, \ldots, a_p). \]

Each column \(a_i\) belongs to \(\mathcal{C}(A)\), so by assumption each \(a_i \in \mathcal{C}(B)\). Therefore, for each \(i\) there exists a vector \(c_i\) such that

\[ a_i = Bc_i. \]

Collect these vectors into a matrix

\[ C = (c_1, \ldots, c_p). \]

Then

\[ A = (a_1, \ldots, a_p) = (Bc_1, \ldots, Bc_p) = B(c_1, \ldots, c_p) = BC. \]

So there exists a matrix \(C\) such that \(A = BC\).

4.2.4 Example

Consider again

\[ A = \begin{bmatrix} 1 & 1 \\ 1 & 1 \\ 1 & 0 \end{bmatrix} \in \mathbb{R}^{3 \times 2}, \qquad B = I_3 = \begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{bmatrix}. \]

Then

\[ \operatorname{rank}(A) = 2, \qquad \operatorname{rank}(B) = 3. \]

Since \(B = I_3\), clearly \(\mathcal{C}(A) \subseteq \mathcal{C}(B) = \mathbb{R}^3\), and the proposition implies that there exists a matrix \(C\) such that \(A = BC\).

4.3 Theorem

\[ \mathcal{C}(A) = \mathcal{C}(AA^T), \qquad \mathcal{C}(A^T) = \mathcal{C}(A^TA). \]

Practice: show that \(\mathcal{C}(AA^T) \subseteq \mathcal{C}(A)\).

5 Eigenvalues and trace

5.1 Definitions: eigenvector, eigenvalue, and trace

Let \(A \in \mathbb{R}^{n \times n}\).

  1. A nonzero vector \(x \in \mathbb{R}^n\) is called an eigenvector of \(A\) if

\[ Ax = \lambda x \]

for some scalar \(\lambda\). The scalar \(\lambda\) is the eigenvalue of \(A\) corresponding to the eigenvector \(x\).

  2. The trace of \(A\) is the sum of the diagonal entries of \(A\): \[ \operatorname{tr}(A) = \sum_{i=1}^n A_{ii}. \]

Facts

If the eigenvalues of \(A\) are \(\lambda_1, \ldots, \lambda_n\), then

\[ \det(A) = |A| = \prod_{i=1}^n \lambda_i, \]

and

\[ \operatorname{tr}(A) = \sum_{i=1}^n \lambda_i. \]
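Both facts are easy to verify numerically (a NumPy sketch with a random matrix; not from the notes):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
eig = np.linalg.eigvals(A)   # eigenvalues, possibly in complex conjugate pairs

assert np.isclose(np.sum(eig).real, np.trace(A))        # tr(A) = sum of eigenvalues
assert np.isclose(np.prod(eig).real, np.linalg.det(A))  # det(A) = product of eigenvalues
```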

5.2 Properties of trace

  1. \[ \operatorname{tr}(A + B) = \operatorname{tr}(A) + \operatorname{tr}(B). \]

  2. \[ \operatorname{tr}(AB) = \operatorname{tr}(BA). \]

A useful example from the notes is

\[ \operatorname{tr}(Axx^T) = \operatorname{tr}(x^TAx) = x^TAx. \]

  3. \[ \operatorname{tr}(A^s) = \sum_{i=1}^n \lambda_i^s. \]

Also,

\[ \det(A) = \prod_{i=1}^n \lambda_i. \]

  4. If \(P\) is nonsingular (invertible), then

\[ \operatorname{tr}(P^{-1}AP) = \operatorname{tr}(A). \]

  5. If \(C\) is orthogonal, then

\[ \operatorname{tr}(C^TAC) = \operatorname{tr}(A). \]

Proof of (2): (skip the proof in class) Let \(A\) be \(n\times p\) and \(B\) be \(p\times n\). Then \[(AB)_{ii}=\sum_{k=1}^p a_{ik}b_{ki} \;\Rightarrow\; \operatorname{tr}(AB)= \sum_{i=1}^n \sum_{k=1}^p a_{ik}b_{ki},\] \[(BA)_{jj}=\sum_{k=1}^n b_{jk}a_{kj} \;\Rightarrow\; \operatorname{tr}(BA)= \sum_{j=1}^p \sum_{k=1}^n b_{jk}a_{kj} = \sum_{i=1}^n \sum_{k=1}^p a_{ik}b_{ki}.\]

Example. \(A=\begin{pmatrix} 1\\ \vdots \\ 1 \end{pmatrix}, B=(1,\cdots,1)\). Then \(AB\) is the \(n\times n\) matrix of all ones and \(BA=(n)\), so \(tr(AB)=tr(BA)=n\).

Proof of (3): (skip the proof in class) By the definition of eigenvalues, the \(\lambda_i\)'s are the \(n\) roots of \(\det(\lambda I_n-A)=0\). In other words, \(\det(\lambda I_n-A)=\prod_{i=1}^n (\lambda-\lambda_i)=\lambda^n -\lambda^{n-1}(\lambda_1+\cdots+\lambda_n) + \cdots + (-1)^n\lambda_1\cdots \lambda_n\).

Note that \[\det(\lambda I_n -A) = \begin{vmatrix} \lambda-a_{11} & -a_{12} & \cdots & -a_{1n}\\ -a_{21} & \lambda-a_{22} & \cdots & -a_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ -a_{n1} & -a_{n2} & \cdots & \lambda -a_{nn} \end{vmatrix}\] So we have
\[\lambda^n -\lambda^{n-1}(\lambda_1+\cdots+\lambda_n) + \cdots + (-1)^n\lambda_1\cdots \lambda_n= \begin{vmatrix} \lambda-a_{11} & -a_{12} & \cdots & -a_{1n}\\ -a_{21} & \lambda-a_{22} & \cdots & -a_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ -a_{n1} & -a_{n2} & \cdots & \lambda -a_{nn} \end{vmatrix}\]

Using cofactor expansion and comparing the coefficients of \(\lambda^{n-1}\) on both sides, we conclude that \[-(\lambda_1+\cdots+\lambda_n)=-(a_{11}+\cdots+a_{nn}),\] indicating that \(tr(A)=\sum \lambda_i\).

Setting \(\lambda=0\) and comparing both sides, we have \[(-1)^n\lambda_1\cdots\lambda_n=\det(-A)=(-1)^n \det(A),\] indicating that \(\det(A)=\prod \lambda_i\).

Proof of (4): (show the proof in class) \(tr(P^{-1}AP)=tr(APP^{-1})=tr(A)\).

6 Special Square Matrices

6.1 Symmetric Matrices

A matrix \(A\) is symmetric if \(A = A^T\).

6.1.1 Theorem: Spectral Decomposition of Symmetric Matrices

  1. If \(A_{n\times n}\) is a symmetric matrix, then there exists an orthogonal matrix \(\Gamma=(\gamma_1,\cdots,\gamma_n)\) and a diagonal matrix \(\Lambda=diag(\lambda_1,\cdots,\lambda_n)\) such that \(A=\Gamma \Lambda \Gamma^T\). This factorization is known as the spectral decomposition of a symmetric matrix.

  2. \(A=\sum_{i=1}^n \lambda_i \gamma_i \gamma_i^T\).

  3. \(A\Gamma=\Gamma \Lambda\), which implies that \(A\gamma_i = \lambda_i \gamma_i\). Thus, the \(\lambda_i\)'s are eigenvalues and the \(\gamma_i\)'s are the corresponding eigenvectors.

  4. \(rank(A)\) is equal to the number of non-zero eigenvalues.

  5. \(tr(A^s)=\sum \lambda_i^s\)

  6. If \(A\) is also non-singular, then \(tr(A^{-1})=\sum \lambda_i^{-1}\)

Proof of (1): The proof is lengthy and requires basic knowledge of eigenvalues and eigenvectors. We omit the proof.

Proof of (4): Recall that for any matrix \(A\) and conformable and nonsingular \(B\) and \(C\) we have \(rank(BAC)=rank(A)\). Therefore, \(rank(A)=rank(\Gamma\Lambda \Gamma^T)=rank(\Lambda)=\sum I(\lambda_i\not=0)\)

Proof of (5): \(tr(A^s)=tr((\Gamma\Lambda \Gamma^T)\cdots (\Gamma\Lambda \Gamma^T))=tr(\Gamma(\Lambda)^s\Gamma^T)=\sum \lambda_i^s\)

Proof of (6): When \(A\) is nonsingular, the eigenvalues of \(A^{-1}\) are \(\lambda_i^{-1}\). Thus \(A^{-1}=\Gamma\Lambda^{-1}\Gamma^T\) with the same \(\Gamma\) as in the spectral decomposition of \(A\), and the result follows immediately.

Example. \(A=\begin{pmatrix}2.2 & 0.4 \\ 0.4 & 2.8\end{pmatrix}= \begin{pmatrix}1/\sqrt{5} & 2/\sqrt{5} \\ 2/\sqrt{5} & -1/\sqrt{5}\end{pmatrix} \begin{pmatrix}3 & 0 \\ 0 & 2\end{pmatrix} \begin{pmatrix} 1/\sqrt{5} & 2/\sqrt{5} \\ 2/\sqrt{5} & -1/\sqrt{5} \end{pmatrix}\)
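The decomposition in the example can be reproduced with `numpy.linalg.eigh`, which returns the eigenvalues in ascending order together with orthonormal eigenvectors (a sketch, not from the notes):

```python
import numpy as np

A = np.array([[2.2, 0.4],
              [0.4, 2.8]])

# eigh handles symmetric matrices: ascending eigenvalues, orthonormal columns.
lam, Gamma = np.linalg.eigh(A)

assert np.allclose(lam, [2.0, 3.0])                    # eigenvalues 2 and 3
assert np.allclose(Gamma @ np.diag(lam) @ Gamma.T, A)  # A = Gamma Lambda Gamma^T
assert np.allclose(Gamma.T @ Gamma, np.eye(2))         # Gamma is orthogonal
```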

6.1.2 Theorem: SVD

Singular value decomposition: If \(A\) is an \(n\times p\) matrix of rank \(r\), then \(A\) can be written as \[A_{n\times p}=U_{n\times r}L_{r\times r}V^T_{r\times p}\] where \(U\ (n\times r)\) and \(V\ (p\times r)\) are column-orthogonal matrices (\(U^TU=V^TV=I_r\)) and \(L\) is a diagonal matrix with positive diagonal entries (the singular values of \(A\)).

Note: \(AA^T=UL^2U^T\) and \(A^TA=VL^2V^T\).
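`numpy.linalg.svd` with `full_matrices=False` returns exactly this thin factorization when \(A\) has full column rank, as in the earlier \(3\times 2\) example (a sketch, not from the notes):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0],
              [1.0, 0.0]])   # rank 2 = min(n, p), so the thin SVD is the rank-r SVD

U, s, Vt = np.linalg.svd(A, full_matrices=False)
L = np.diag(s)
r = len(s)

assert np.allclose(U @ L @ Vt, A)             # A = U L V^T
assert np.allclose(U.T @ U, np.eye(r))        # U^T U = I_r
assert np.allclose(Vt @ Vt.T, np.eye(r))      # V^T V = I_r
assert np.allclose(A @ A.T, U @ L**2 @ U.T)   # AA^T = U L^2 U^T
```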

6.2 Positive Definite and Positive-Semidefinite

6.2.1 Quadratic Forms

A quadratic form of a vector \(x\) is a function of the form \[Q(x)=x^TAx=\sum_{i=1}^p\sum_{j=1}^pa_{ij}x_ix_j\] where \(A\) is a symmetric matrix; that is \[Q(x)=a_{11}x_1^2+\cdots+a_{pp}x_p^2+2a_{12}x_1x_2+\cdots+2a_{p-1,p}x_{p-1}x_p\]

Examples.

  1. \(A=\begin{pmatrix}1 & 1\\ 1 & 1\end{pmatrix}, x^TAx=x_1^2+2x_1x_2+x_2^2\). Note that for any \(x\not= 0\), \(x^TAx\ge 0\).

  2. \(A=\begin{pmatrix}1 & -1/2\\ -1/2 & 1\end{pmatrix}, x^TAx=x_1^2-x_1x_2+x_2^2=(x_1-x_2/2)^2+3x_2^2/4\). Note that for any \(x\not= 0\), \(x^TAx> 0\).

  3. \(A=\begin{pmatrix}1 & -2\\ -2 & 1\end{pmatrix}, x^TAx=x_1^2-4x_1x_2+x_2^2=(x_1-x_2)^2-2x_1x_2\).

6.2.2 Theorem: Diagonalization of Quadratic Forms

For any symmetric matrix \(A\), there exists an orthogonal transformation \(y=\Gamma^T x\) such that \[x^TAx=\sum \lambda_iy_i^2\]

Proof: Consider the spectral decomposition \(A=\Gamma \Lambda \Gamma^T\) and the transformation \(y=\Gamma^Tx\). Then \[x^TAx=x^T\Gamma\Lambda\Gamma^T x=y^T\Lambda y=\sum \lambda_i y_i^2.\]

6.2.3 Definition: Positive Definite and Positive Semi-definite

  • \(Q(x)\) is called a positive definite (p.d.) quadratic form if \(Q(x)>0\) for all \(x\not=0\); it is called positive semi-definite (p.s.d.) if \(Q(x)\ge 0\) for all \(x\not=0\).

  • A symmetric matrix \(A\) is called p.d. (or p.s.d.) if \(Q(x)\) is p.d. (or p.s.d.). We use the notation \(>0\) for positive definite and \(\ge 0\) for positive semi-definite.

6.2.4 Properties of PSD matrices.

Let \(A_{p\times p}\) be p.s.d. Then:

  1. \(\lambda_i\ge 0\) for \(i=1,\cdots,p\). This result implies that if \(A\ge 0\), then \(tr(A)\ge 0\). Proof: \(A\ge 0\) implies that \(0\le x^TAx=x^T\Gamma\Lambda\Gamma^Tx=\lambda_1y_1^2+\cdots+\lambda_p y_p^2\), where \(y=\Gamma^Tx\) (Thm 1.5). Taking \(y_1=1, y_2=0,\cdots, y_p=0\) gives \(\lambda_1\ge0\), and the other \(\lambda_i\) follow similarly.

  2. There exists a unique p.s.d. matrix \(B\) such that \(A=B^2\). Proof of existence: consider the spectral decomposition of \(A\): \(A=\Gamma \Lambda \Gamma^T\), and let \(B=\Gamma \Lambda^{1/2} \Gamma^T\). \(B\) is called the square root matrix of \(A\).

Example. \(A=\begin{pmatrix}1 & 1\\ 1 & 1\end{pmatrix}, A^{1/2}=\begin{pmatrix}1/\sqrt{2} & 1/\sqrt{2}\\ 1/\sqrt{2} & 1/\sqrt{2}\end{pmatrix}\)

  3. Let \(C\) be any \(p\times n\) matrix. We have \(C^TAC\ge 0\). Proof: Since \(A\ge 0\), for any \(n\)-vector \(x\not= 0\), \[x^TC^TACx=(Cx)^TA(Cx)\ge 0\] so \(C^TAC\ge 0\).

  4. \(a_{ii}\ge 0\). Proof: take \(x=(1,0,...,0)^T\); because \(A\) is p.s.d., \(x^TAx\ge 0\). But \(a_{11}=x^TAx\), so \(a_{11}\ge 0\). Similarly, \(a_{ii}\ge 0\) for \(i=1,\cdots,p\).
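The square-root construction \(B=\Gamma\Lambda^{1/2}\Gamma^T\) can be carried out numerically on the example matrix above (NumPy sketch, not from the notes):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0]])

# B = Gamma Lambda^{1/2} Gamma^T, built from the spectral decomposition.
lam, Gamma = np.linalg.eigh(A)
lam = np.clip(lam, 0.0, None)   # guard against tiny negative round-off
B = Gamma @ np.diag(np.sqrt(lam)) @ Gamma.T

assert np.allclose(B @ B, A)                            # B^2 = A
assert np.allclose(B, np.full((2, 2), 1 / np.sqrt(2)))  # matches the example
```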

6.2.5 Properties of PD Matrices.

Let \(A\) be p.d. Then the following properties hold:

  1. \(rank(CAC^T)=rank(C).\)

Proof: \(rank(CAC^T)=rank(CBBC^T)=rank(CB(CB)^T)=rank(CB)=rank(C)\). The last step is true because we know that for any \(A\), when \(P\) and\(Q\) are conformable and nonsingular, \(rank(PAQ)=rank(A)=rank(PA)=rank(AQ)\).

  2. If \(C(n\times p)\) has rank \(n\), then \(CAC^T\) is p.d.

Proof: \(x^TCAC^Tx=y^TAy\ge0\), where \(y=C^Tx\), with equality iff \(y=0\) iff \(C^Tx=0\) iff \(x=0\) (since the columns of \(C^T\) are linearly independent). Hence \(x^TCAC^Tx>0\) for all \(x\not=0\).

  3. If \(B\ (n\times p)\) has rank \(p\), then \(B^TB\) is p.d.

Proof: \(x^TB^TBx=y^Ty\ge 0\), where \(y=Bx\), with equality iff \(Bx=0\) iff \(x=0\) (since the columns of \(B\) are linearly independent).

  4. The diagonal elements of a p.d. matrix are all positive.

Proof: For \(i=1\), take \(x=(1,0,...,0)^T\); then \(x^TAx=a_{11}> 0\). The other \(i\)'s are similar.

  5. \(x^TAx=0\) iff \(x=0\). Proof: \(A\) can be written as \(B^2\). Thus \(x^TAx=(Bx)^T(Bx)\), equal to 0 iff \(Bx=0\), iff \(x=0\).

Practice Problems:

  • Let \(A_{p\times p}\) be a positive definite matrix and let \(B_{k\times p}\) be a matrix of rank \(k\le p\). Then \(BAB^T\) is positive definite.

  • Let \(A_{p\times p}\) be a positive definite matrix and let \(B_{k\times p}\) be a matrix. If \(k>p\) or if \(rank(B)=r<p\), then \(BAB^T\) is positive semidefinite.

6.2.6 Lemma: Gram factorization / Gram representation

A symmetric matrix \(A\) is p.d. if and only if there exists a nonsingular matrix \(P\) such that \(A=P^TP\).

Proof: \((\Rightarrow)\) By the spectral decomposition, \(A=\Gamma \Lambda \Gamma^T\), where \(\Gamma\) is orthogonal and \(\Lambda\) is diagonal. Because \(A\) is p.d., \(\Gamma^TA\Gamma=\Lambda\) is also p.d., so we can write \(\Lambda=\Lambda^{1/2}\Lambda^{1/2}\). Taking \(P=\Lambda^{1/2}\Gamma^T\) leads to \(A=P^TP\).

\((\Leftarrow)\) Since \(P\) is nonsingular, its columns are linearly independent. Therefore, for any \(x\not=0\), \(Px\not=0\), and \(x^TAx=x^TP^TPx=(Px)^T (Px)>0\), i.e., \(A\) is positive definite.

Remarks:

  1. The result is known as the Gram factorization of a p.d. matrix, and \(P\) is called the Gram representation of \(A\). Note that the Gram representation is not unique. For example, if \(P\) is a Gram representation of \(A\), then so is \(CP\) for any orthogonal matrix \(C\).

  2. Related factorizations:

  • Cholesky factorization: If \(A\) is p.d., then there exists a unique lower triangular matrix \(L\) with positive diagonal entries such that \(A=LL^T\).

  • Square root factorization: If \(A\) is p.d., then there exists a unique p.d. matrix \(B\) such that \(A=B^2\).

6.2.7 Lemma: Eigenvalues of \(AB\) and \(BA\)

For conformable matrices, the nonzero eigenvalues of \(AB\) are the same as those of \(BA\). If \(A\) and \(B\) are both square of the same size, the full sets of eigenvalues (including zeros) are identical.

Proof: Let \(\lambda\) be a nonzero eigenvalue of \(AB\). Then there exists \(u (u\not=0)\) such that \(ABu=\lambda u\); that is, \(BABu=\lambda Bu\). Hence \(BAv=\lambda v\), where \(v=Bu\not=0\) (as \(ABu\not=0\)), and \(\lambda\) is an eigenvalue of \(BA\). The argument reverses by interchanging the roles of \(A\) and \(B\). For square matrices \(AB\) and \(BA\) have the same number of zero eigenvalues.

6.2.8 Theorem: Maximization of Quadratic Forms

Let \(A\) and \(B\) be two symmetric matrices. Suppose that \(B>0\). Then the maximum (minimum) of \(x^TAx\) given \(x^TBx=1\) is attained when \(x\) is the eigenvector of \(B^{-1}A\) corresponding to the largest (smallest) eigenvalue of \(B^{-1}A\). Thus, if \(\lambda_1\) and \(\lambda_p\) are the largest and the smallest eigenvalues of \(B^{-1}A\), respectively, then subject to the constraint \(x^TBx=1\), \[\max_x x^TAx=\lambda_{max} (B^{-1}A), \qquad \min_x x^TAx = \lambda_{min}(B^{-1}A).\]

This is a homework problem. Hint: consider \(y=B^{1/2}x\).

The above theorem can also be written as \[\max_{x\not=0} \frac{x^TAx}{x^TBx}=\lambda_{max} (B^{-1}A), \qquad \min_{x\not=0} \frac{x^TAx}{x^TBx}=\lambda_{min} (B^{-1}A).\]

A useful result from the theorem: Let \(B=I\), then \[ \begin{align*} \max_{x,\;x\ne 0}\left\{ \frac{x^T A x}{x^T x}\right\} &= \lambda_{\max}\\ \min_{x,\;x\ne 0}\left\{ \frac{x^T A x}{x^T x}\right\} &= \lambda_{\min} \end{align*} \]

where \(\lambda_{min}\) and \(\lambda_{max}\) are the minimum and maximum eigenvalues of \(A\), respectively. These values occur when \(x\) is the eigenvector corresponding to the eigenvalues \(\lambda_{min}\) and \(\lambda_{max}\), respectively.
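The \(B=I\) case is easy to check numerically: evaluate the Rayleigh quotient \(x^TAx/x^Tx\) at the extreme eigenvectors of a random symmetric matrix (NumPy sketch, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((3, 3))
A = M + M.T                       # a symmetric test matrix

lam, Gamma = np.linalg.eigh(A)    # eigenvalues in ascending order

def rayleigh(A, x):
    return (x @ A @ x) / (x @ x)

# The extreme values are attained at the corresponding eigenvectors.
assert np.isclose(rayleigh(A, Gamma[:, -1]), lam[-1])  # maximum
assert np.isclose(rayleigh(A, Gamma[:, 0]), lam[0])    # minimum

# Any other vector gives a value in between.
x = rng.standard_normal(3)
assert lam[0] - 1e-12 <= rayleigh(A, x) <= lam[-1] + 1e-12
```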

6.2.9 Key Lemma for Scheffé’s Method (Maximization of Linear Contrasts)

If \(L\) is p.d., then for any \(b\), \[\max_{h,h\not=0}\left\{\frac{(h^Tb)^2}{h^TLh}\right\}=b^TL^{-1}b\]

Proof: The result follows from the previous theorem: rewriting \((h^Tb)^2\) as \(h^Tbb^Th\) turns the objective into the ratio of quadratic forms \(\frac{h^T(bb^T)h}{h^TLh}\), with \(A=bb^T\) and \(B=L\).

Thus, the maximum is \[\lambda_{max}(L^{-1}bb^T)=\lambda_{max}(b^TL^{-1}b)=b^TL^{-1}b\] The last step is true because \(b^TL^{-1}b\) is a scalar; the second to the last step is true because the nonzero eigenvalues of \(AB\) are the same as those of \(BA\).

6.3 Idempotent matrices and projection matrices

Definition. A matrix \(P\) is idempotent if \(P^2=P\).

Note. If \(P\) is idempotent, so is \(I-P\).

Example. \(A=\begin{pmatrix}1 & 1\\ 0 & 0\end{pmatrix}, A^2=A\). Verify that \((I-A)^2=I-A\).

Definition. A symmetric and idempotent matrix is called a projection matrix.

Example. \(A=\begin{pmatrix}1/2 & 1/2\\ 1/2 & 1/2\end{pmatrix}, A=A^T, A^2=A\).
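A quick check on this example (NumPy sketch, not from the notes) confirms that it is symmetric and idempotent, that its eigenvalues are 0 and 1, and that its trace equals its rank:

```python
import numpy as np

P = np.full((2, 2), 0.5)   # the projection matrix from the example

assert np.allclose(P, P.T)     # symmetric
assert np.allclose(P @ P, P)   # idempotent
assert np.isclose(np.trace(P), np.linalg.matrix_rank(P))  # tr(P) = rank(P) = 1
assert np.allclose(np.linalg.eigvalsh(P), [0.0, 1.0])     # eigenvalues 0 and 1
```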

6.3.1 Properties of projection matrices:

  1. If \(P\) is symmetric, then \(P\) is idempotent with rank \(r\) if and only if it has \(r\) eigenvalues equal to 1 and \(n-r\) eigenvalues equal to zero.

  2. If \(P\) is a projection matrix, then \(tr(P)=rank(P)\).

  3. Projection matrices are positive-semidefinite.

  4. If \(P_i (i=1,2)\) is a projection matrix and \(P_1-P_2\) is p.s.d., then

  • \(P_1P_2=P_2P_1=P_2\) (note, both \(P_1P_2\) and \(P_2P_1\) are projection matrices).
  • \(P_1-P_2\) is a projection matrix.

Proof of (1): Let \(\lambda\) be an eigenvalue of \(P\), then we have \(Px=\lambda x\), where \(x\) is the corresponding eigenvector. Since \(P^2=P\), we have \(\lambda x= Px=P(Px)=\lambda Px= \lambda^2 x\), which implies that \(\lambda\) equals either zero or 1. For a symmetric matrix, its rank equals the number of nonzero eigenvalues. Thus \(rank(P)\) equals the number of 1’s.

On the other hand, suppose \(P\) has \(r\) eigenvalues equal to 1 and \(n-r\) eigenvalues equal to 0. Without loss of generality, we assume that the first \(r\) eigenvalues are 1. By the spectral decomposition of the symmetric matrix \(P\), we have \(P=\Gamma \Lambda \Gamma^T\), where the first \(r\) diagonal elements of \(\Lambda\) are 1 and all the other elements are zero. It is obvious that \(\Lambda^2=\Lambda\). Thus \(P^2=\Gamma \Lambda \Gamma^T\Gamma \Lambda \Gamma^T=\Gamma \Lambda^2 \Gamma^T=\Gamma \Lambda \Gamma^T=P\).

Proof of (2): Based on (1), \(P\) has \(rank(P)\) eigenvalues equal to 1 and \(n-rank(P)\) eigenvalues equal to 0. We also know that \(tr(P)=\sum \lambda_i=rank(P)\).

Proof of (3): \(x^TPx=x^TPPx=x^TP^TPx=(Px)^T(Px)\ge 0\), using \(P^2=P\) and \(P^T=P\).

Proof of (4): Clearly the following are true:

\[\begin{eqnarray*} ((I-P_1)y)^T P_2 ((I-P_1)y) & \ge & 0\\ ((I-P_1)y)^T P_1 ((I-P_1)y) & = & 0 \end{eqnarray*}\]

which implies that \([(I-P_1)y]^T(P_1-P_2)[(I-P_1)y] \le 0\). However, we also know that \(P_1-P_2\ge 0\). Therefore, it must be true that \[[(I-P_1)y]^T(P_1-P_2)[(I-P_1)y] = 0 \quad \text{for all } y,\] which implies that \[\begin{eqnarray*} & & (I-P_1)P_2(I-P_1)=0\\ &\Rightarrow & (I-P_1)P_2 P_2 (I-P_1)=0 \quad (\text{since } P_2^2=P_2)\\ &\Rightarrow & (I-P_1)P_2=0 \quad (MM^T=0 \text{ implies } M=0, \text{ with } M=(I-P_1)P_2)\\ &\Rightarrow & P_2=P_1P_2 \end{eqnarray*}\]

A projection matrix must be a symmetric matrix. Therefore \[P_2=(P_2)^T=(P_1P_2)^T=(P_2)^T(P_1)^T=P_2P_1\]

Last, \((P_1-P_2)^2=P_1-2P_1P_2+P_2=P_1-2P_2+P_2=P_1-P_2\).

7 Inverse and Generalized Inverse

Definition. A full-rank square matrix is called a nonsingular matrix. A nonsingular matrix \(A\) has a unique inverse, denoted by \(A^{-1}\): \[AA^{-1}=A^{-1}A=I.\]

Some properties of \(A^{-1}\):

  • It is unique.
  • \((AB)^{-1}=B^{-1}A^{-1}\).
  • \((A^T)^{-1}=(A^{-1})^T\).

Definition. For a matrix \(A_{n\times p}\), \(A^{-}\) is called a g-inverse (generalized inverse) of \(A\) if \[AA^{-}A=A\]

A generalized inverse always exists although in general it is not unique.
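One concrete g-inverse that always exists is the Moore-Penrose pseudoinverse, available as `numpy.linalg.pinv`; it satisfies the defining relation even for singular matrices (a sketch, not from the notes):

```python
import numpy as np

A = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 0.0],
              [1.0, 0.0, 2.0]])   # rank 2, hence singular

A_g = np.linalg.pinv(A)           # Moore-Penrose pseudoinverse, one g-inverse of A

assert np.allclose(A @ A_g @ A, A)   # defining property: A A^- A = A
```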

8 Matrix Differentiation

Definition. We define the derivative of \(f(X)\) with respect to \(X_{n\times p}\) as \[\frac{\partial f(X)}{\partial X}=\left(\frac{\partial f(X)}{\partial x_{ij}} \right)_{n\times p}\]

The following results are useful:

  • \[\frac{\partial a^Tx}{\partial x}=a\]
  • \[\frac{\partial x^Tx}{\partial x}=2x, \frac{\partial x^TAx}{\partial x}=(A+A^T)x, \frac{\partial x^TAy}{\partial x}=Ay\]

Proof of (2): \[\begin{eqnarray*} \frac{\partial x^TAx}{ \partial x_k} &=& \frac{\partial}{\partial x_k}\left( \sum_{i}\sum_{j} a_{ij}x_{i}x_{j}\right) \\ &=& \frac{\partial}{\partial x_k} (a_{kk}x_k^2 + \sum_{j\not=k}a_{kj}x_kx_j +\sum_{i\not=k} a_{ik} x_i x_k )\\ &=&2a_{kk}x_k + \sum_{j\not=k}a_{kj}x_j + \sum_{i\not=k}a_{ik}x_i\\ &=&\sum_i a_{ki}x_i + \sum_i a_{ik}x_i\\ &=& (Ax)_k + (A^Tx)_k\\ &=& ((A+A^T)x)_k \end{eqnarray*}\]
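The identity \(\partial x^TAx/\partial x=(A+A^T)x\) proved above can be confirmed with central finite differences (NumPy sketch, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 3))
x = rng.standard_normal(3)

f = lambda v: v @ A @ v        # the quadratic form v^T A v
grad = (A + A.T) @ x           # the claimed derivative at x

# Central finite differences approximate each partial derivative.
eps = 1e-6
num = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                for e in np.eye(3)])

assert np.allclose(num, grad, atol=1e-5)
```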