2025-05-14

The maximization problem is \[\operatorname*{argmax}_a \frac{a^T(\bar {\mathbf X}_1 - \bar {\mathbf X}_2)(\bar {\mathbf X}_1 - \bar {\mathbf X}_2)^Ta}{a^T \boldsymbol \Sigma a}\]
By an argument similar to the one used for PCA, the maximizing \(a\) is the first eigenvector of \(\boldsymbol \Sigma ^{-1}(\bar {\mathbf X}_1 - \bar {\mathbf X}_2)(\bar {\mathbf X}_1 - \bar {\mathbf X}_2)^T\).
We can show that \(a=\mathbf S_p^{-1}(\bar {\mathbf X}_1 - \bar {\mathbf X}_2)\) (up to a scalar multiple), where \(\mathbf S_p\) is the pooled sample covariance matrix.
The linear function \[f(x)=a^T x \mbox{ where } a=\mathbf S_p^{-1}(\bar {\mathbf X}_1 - \bar {\mathbf X}_2)\] is called Fisher’s linear discriminant function.
Consider an observation \(X_0\). We compute \[f(X_0)=a^T X_0\] where \(a=\mathbf S_p^{-1}(\bar {\mathbf X}_1 - \bar {\mathbf X}_2)\)
Let \[m=a^T \frac{\bar {\mathbf X}_1 + \bar {\mathbf X}_2}{2}=(\bar {\mathbf X}_1 - \bar {\mathbf X}_2)^T \mathbf S_p^{-1}\frac{\bar {\mathbf X}_1 + \bar {\mathbf X}_2}{2}\]
Allocate \(X_0\) to class 1 if \(f(X_0)\ge m\), and to class 2 otherwise.
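Below is a minimal R sketch of this two-group rule, assuming `X1` and `X2` are numeric data matrices with one observation per row and `x0` is a new observation (all object names here are illustrative):

```r
# Fisher's linear discriminant rule for two groups (illustrative sketch)
fisher_rule <- function(X1, X2, x0) {
  n1 <- nrow(X1); n2 <- nrow(X2)
  xbar1 <- colMeans(X1); xbar2 <- colMeans(X2)
  # pooled sample covariance matrix S_p
  Sp <- ((n1 - 1) * cov(X1) + (n2 - 1) * cov(X2)) / (n1 + n2 - 2)
  a  <- solve(Sp, xbar1 - xbar2)        # a = S_p^{-1}(xbar1 - xbar2)
  m  <- sum(a * (xbar1 + xbar2) / 2)    # midpoint cutoff
  if (sum(a * x0) >= m) 1 else 2        # allocate to class 1 or class 2
}
```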
The first linear discriminant is the linear function that maximizes \(F(a)\). It can also be shown that the first linear discriminant is given by the first eigenvector of \(\mathbf W ^{-1} \mathbf B\), i.e., \[Y_{ij}^{(1)}=\gamma_1^T X_{ij}\] where \(\gamma_1\) is the first eigenvector of \(\mathbf W ^{-1} \mathbf B\).
Similarly, for \(k=1, \cdots, rank(\mathbf B)\), the \(k\)th linear discriminant is given by the \(k\)th eigenvector of \(\mathbf W ^{-1} \mathbf B\)
\[Y_{ij}^{(k)}=\gamma_k^T X_{ij}\]
Let \(X_0\) be a new observation. We allocate it to the group with the minimum Euclidean distance in the space spanned by the linear discriminants.
Calculate \(Y_0^{(k)}=\gamma_k^T X_0\), the projection of \(X_0\) onto the \(k\)th linear discriminant, for \(k=1, \cdots, rank(B)\).
Calculate the distance between \((Y_0^{(1)}, \cdots, Y_0^{(rank(B))})\) and \((\bar Y_{i.}^{(1)}, \cdots, \bar Y_{i.}^{(rank(B))})\)
\[D^2(X_0, i) = \sum_{k=1}^{rank(B)} [Y_0^{(k)} - \bar Y_{i.}^{(k)}]^2\]
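A sketch of this construction in R, assuming `X` is an \(n \times p\) data matrix, `grp` is a factor of group labels, and `x0` is a new observation (the names are illustrative; in practice one would typically use `MASS::lda`):

```r
# Linear discriminants via eigenvectors of W^{-1} B (illustrative sketch)
lda_discriminants <- function(X, grp, x0) {
  xbar  <- colMeans(X)
  grps  <- levels(grp)
  means <- t(sapply(grps, function(g) colMeans(X[grp == g, , drop = FALSE])))
  # between-group and within-group SSCP matrices
  B <- Reduce(`+`, lapply(grps, function(g) {
    d <- means[g, ] - xbar
    sum(grp == g) * outer(d, d)
  }))
  W <- Reduce(`+`, lapply(grps, function(g) {
    Xg <- X[grp == g, , drop = FALSE]
    crossprod(sweep(Xg, 2, means[g, ]))
  }))
  r  <- qr(B)$rank                      # number of linear discriminants
  ev <- eigen(solve(W) %*% B)
  # Re() drops negligible imaginary parts that eigen() may return
  G  <- Re(ev$vectors[, seq_len(r), drop = FALSE])   # gamma_1, ..., gamma_r
  y0 <- drop(crossprod(G, x0))          # projection of the new observation
  ym <- means %*% G                     # group means in discriminant space
  d2 <- rowSums(sweep(ym, 2, y0)^2)     # squared Euclidean distances D^2(X_0, i)
  grps[which.min(d2)]                   # allocate to the closest group
}
```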
Let’s consider a two-class classification problem with \(n_1\) and \(n_2\) observations in classes 1 and 2, respectively.
Suppose we have two independent random samples
Sample mean vectors: \[\bar {\mathbf X}_1=\frac{1}{n_1}\sum_{j=1}^{n_1}X_{1j}, \bar {\mathbf X}_2=\frac{1}{n_2}\sum_{j=1}^{n_2}X_{2j}\]
Remark: the sample mean vectors are the MLEs of the corresponding population mean vectors.
MLE of covariance matrices \[\hat {\boldsymbol\Sigma}_1 = \frac{n_1-1}{n_1}S_1, \hat {\boldsymbol\Sigma}_2 = \frac{n_2-1}{n_2}S_2\] where \(S_i\) is the sample covariance matrix for sample \(i\).
Likelihood functions \[\begin{aligned} L_1(\boldsymbol \mu_1, \boldsymbol \Sigma_1) \propto |\boldsymbol\Sigma_1|^{-1/2} exp\{-\frac{1}{2} (x-\boldsymbol \mu_1)^T \boldsymbol \Sigma_1^{-1} (x-\boldsymbol \mu_1)\}\\ L_2(\boldsymbol \mu_2, \boldsymbol \Sigma_2) \propto |\boldsymbol\Sigma_2|^{-1/2} exp\{-\frac{1}{2} (x-\boldsymbol \mu_2)^T \boldsymbol \Sigma_2^{-1} (x-\boldsymbol \mu_2)\} \end{aligned}\]
\[l_1 - l_2=-\frac{1}{2}log(\frac{|\boldsymbol\Sigma_1|}{|\boldsymbol\Sigma_2|}) - \frac{1}{2} [(x-\boldsymbol \mu_1)^T \boldsymbol \Sigma_1^{-1} (x-\boldsymbol \mu_1) - (x-\boldsymbol \mu_2)^T \boldsymbol \Sigma_2^{-1} (x-\boldsymbol \mu_2)]\]
\[(x-\boldsymbol \mu_1)^T \boldsymbol \Sigma_1^{-1} (x-\boldsymbol \mu_1) - (x-\boldsymbol \mu_2)^T \boldsymbol \Sigma_2^{-1} (x-\boldsymbol \mu_2) = log(\frac{|\boldsymbol\Sigma_2|}{|\boldsymbol\Sigma_1|})\]
For the \(i\)th group, we compute a quadratic score, which is defined as
\[Q_i(x)=(x-\bar {\mathbf X}_i)^T \mathbf S _i^{-1} (x-\bar {\mathbf X}_i)+ log(|\mathbf S_i|)\]
Allocate \(x\) to the class with the minimum quadratic score
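A small sketch of these quadratic scores in R, assuming `Xlist` is a list of per-group data matrices and `x0` is a new observation (names are illustrative; in practice one would typically use `MASS::qda`):

```r
# Quadratic score Q_i(x) for each group (illustrative sketch)
quadratic_scores <- function(Xlist, x0) {
  sapply(Xlist, function(Xi) {
    Si <- cov(Xi)                          # sample covariance S_i
    d  <- x0 - colMeans(Xi)                # x - xbar_i
    drop(t(d) %*% solve(Si) %*% d) + log(det(Si))
  })
}
# allocate x0 to the class with the smallest score:
# which.min(quadratic_scores(Xlist, x0))
```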
            True
Pred         setosa versicolor virginica
  setosa         50          0         0
  versicolor      0         48         1
  virginica       0          2        49

            True
Pred         setosa versicolor virginica
  setosa         50          0         0
  versicolor      0         48         1
  virginica       0          2        49
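The class labels suggest these confusion tables come from the iris data. A short sketch that produces tables of this form with the MASS package, assuming (the text does not state it explicitly) that `lda` and `qda` were fit to `iris`:

```r
library(MASS)

fit_lda <- lda(Species ~ ., data = iris)   # linear discriminant analysis
fit_qda <- qda(Species ~ ., data = iris)   # quadratic discriminant analysis

# resubstitution confusion tables (predicted vs. true class)
table(Pred = predict(fit_lda)$class, True = iris$Species)
table(Pred = predict(fit_qda)$class, True = iris$Species)
```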
Risk of a classifier \(\delta\) \[\begin{aligned} R(\delta, z)&=\Pr [\delta(X)\not= Z|Z=z]=\mathbb E_{X|z} [\mathbb I_{\delta(X)\not= Z}|Z=z]\\ &=\left\{ \begin{array}{cc} \Pr[\delta(X)=0|z=1] & \mbox{ if } z=1\\ \Pr[\delta(X)=1|z=0] & \mbox{ if } z=0 \end{array}\right. \end{aligned}\]
The posterior risk of \(\delta\) \[\begin{aligned} PR(\delta(x)) &= \Pr[\delta(x)\not= Z|x]=\mathbb E_{Z|x} [\mathbb I_{\delta(X)\not= Z}|X=x]\\ &=\left\{ \begin{array}{cc} \Pr[Z=0|x] & \mbox{ if } \delta(x)=1\\ \Pr[Z=1|x] & \mbox{ if } \delta(x)=0 \end{array}\right. \end{aligned}\]
Bayes risk \(B(\delta)=\Pr [\delta(X)\not= Z]\).
Note that \[B(\delta)=\Pr [\delta(X)\not= Z]=\mathbb E_{XZ} [\mathbb I_{\delta(X)\not= Z}]=E_X[PR(\delta, X)]=E_Z[R(\delta, Z)]\]
Rewrite the Bayes risk \[\begin{aligned} B(\delta) &=\Pr [\delta(X)\not= Z]\\ &=\Pr [\delta(X)=1, Z=0] + \Pr [\delta(X)=0, Z=1]\\ &=\Pr [\delta(X)=1| Z=0]\Pr[Z=0] + \Pr [\delta(X)=0| Z=1] \Pr[Z=1]\\ &=\pi \Pr [\delta(X)=0| Z=1]+ (1-\pi)\Pr [\delta(X)=1| Z=0] \end{aligned}\]
The above expression is based on the fact \[B(\delta)=\mathbb E_{XZ} [\mathbb I_{\delta(X)\not= Z}]\]
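A quick Monte Carlo sketch of this decomposition, using an arbitrary rule \(\delta(x)=\mathbb I(x>0.5)\) and normal class-conditional distributions chosen purely for illustration:

```r
# Monte Carlo check of B(delta) = pi*Pr[delta=0|Z=1] + (1-pi)*Pr[delta=1|Z=0]
set.seed(1)
n     <- 1e5
prior <- 0.3                                  # pi = Pr[Z = 1]
z     <- rbinom(n, 1, prior)
x     <- rnorm(n, mean = z, sd = 1)           # X | Z = z ~ N(z, 1)
delta <- as.integer(x > 0.5)                  # an arbitrary rule delta(x)
overall <- mean(delta != z)                   # estimates Pr[delta(X) != Z]
decomp  <- prior * mean(delta[z == 1] == 0) +
           (1 - prior) * mean(delta[z == 0] == 1)
c(overall = overall, decomposition = decomp)  # both estimate B(delta); should roughly agree
```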
Minimizing the posterior risk at each \(x\), and hence the Bayes risk, gives \[\delta^*(x)=\left\{ \begin{array}{cc} 1 & \mbox{ if } \frac{\Pr(Z=1|x)}{\Pr(Z=0|x)}>1\\ 0 & \mbox{ if } \frac{\Pr(Z=1|x)}{\Pr(Z=0|x)}<1 \end{array}\right.\] We say \(\delta^*(x)\) is the Bayes classification rule.
Computation \[\begin{aligned} \frac{\Pr(Z=1|x)}{\Pr(Z=0|x)} & \overset{\mbox{Bayes' theorem}} = \frac{\frac{f(x|z=1)\Pr(Z=1)}{f(x)}}{\frac{f(x|z=0)\Pr(Z=0)}{f(x)}}\\ & = \frac{f(x|z=1)}{f(x|z=0)}\frac{\pi}{1-\pi} \end{aligned}\]
A short review of Bayes’ theorem is on the next slide. Feel free to skip it if you are already very familiar with it.
\[\begin{aligned} \frac{f(x|z=1)\pi}{f(x|z=0)(1-\pi)} &=\frac{f(x|\mu_1=1,\sigma^2)\pi}{f(x|\mu_0=0, \sigma^2)(1-\pi)}\\ &=\frac{\frac{1}{\sigma\sqrt{2\pi}}exp\{-\frac{1}{2\sigma^2}(x-1)^2\}} {\frac{1}{\sigma\sqrt{2\pi}}exp\{-\frac{1}{2\sigma^2}(x-0)^2\}}\frac{\pi}{1-\pi}\\ &= exp\{\frac{1}{2\sigma^2} (2x-1) \}\frac{\pi}{1-\pi} \end{aligned}\]
The classification boundary is \[\begin{aligned} exp\{\frac{1}{2\sigma^2} (2x-1) \} =(1-\pi)/\pi &\Leftrightarrow \frac{1}{2\sigma^2} (2x-1)=log((1-\pi)/\pi)\\ &\Leftrightarrow x=\sigma^2 log((1-\pi)/\pi)+0.5 \end{aligned}\]
The boundary is linear!
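A quick numeric check of this cutoff in R (the values of \(\sigma^2\) and \(\pi\) below are arbitrary choices for illustration):

```r
# Bayes boundary for N(1, sigma^2) vs N(0, sigma^2) with prior pi
sigma2 <- 0.5; prior <- 0.3
x_cut <- sigma2 * log((1 - prior) / prior) + 0.5
# the log posterior odds at the cutoff should be (numerically) zero
log_odds <- (2 * x_cut - 1) / (2 * sigma2) + log(prior / (1 - prior))
c(boundary = x_cut, log_odds_at_boundary = log_odds)
```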
Suppose the two underlying distributions are \(N(\boldsymbol\mu_1, \boldsymbol \Sigma)\) and \(N(\boldsymbol\mu_2, \boldsymbol \Sigma)\).
The boundary is \[-\frac{1}{2}(x-\boldsymbol\mu_1)^T \boldsymbol\Sigma^{-1}(x-\boldsymbol\mu_1) + \frac{1}{2}(x-\boldsymbol\mu_2)^T \boldsymbol\Sigma^{-1}(x-\boldsymbol\mu_2)\\ =log(\frac{1-\pi}{\pi})\] which is equivalent to
\[(\boldsymbol\mu_1 -\boldsymbol\mu_2)^T \boldsymbol\Sigma^{-1}x = (\boldsymbol\mu_1 -\boldsymbol\mu_2)^T\boldsymbol\Sigma^{-1}\frac{\boldsymbol\mu_1 + \boldsymbol\mu_2}{2} + log(\frac{1-\pi}{\pi})\]
In practice, we substitute the unknown parameters by their estimates (the sample mean vectors and the pooled covariance matrix \(\mathbf S_p\)) \[(\bar {\mathbf X}_{1.} -\bar {\mathbf X}_{2.})^T\mathbf S_p^{-1}x = (\bar {\mathbf X}_{1.} -\bar {\mathbf X}_{2.})^T \mathbf S_p^{-1}\frac{\bar {\mathbf X}_{1.} + \bar {\mathbf X}_{2.}}{2} + log(\frac{1-\pi}{\pi})\]
Recall that in LDA the linear boundary is \[a^Tx=a^T\frac{\bar {\mathbf X}_{1.} + \bar {\mathbf X}_{2.}}{2}\] Therefore, Bayes’ classification is the same as the LDA when \(\pi=1/2\).
Similarly, in a \(g\)-class problem, LDA is the same as Bayes classification under the assumptions of (1) multivariate normality, (2) equal covariance matrices, and (3) uniform prior probabilities.
The classification boundary is \[(\frac{1}{2\sigma_0^2}-\frac{1}{2\sigma_1^2})x^2 + \frac{x}{\sigma_1^2}-\frac{1}{2\sigma_1^2}=log[ \frac{1-\pi}{\pi}\frac{\sigma_1}{\sigma_0}] \]
It is quadratic!
```r
# Visualize the quadratic classification boundary for N(1, 0.25) vs N(0, 0.64)
mean1 <- 1      # class 1 mean (the boundary formula above assumes mean1 = 1)
var1  <- 0.25   # class 1 variance
mean0 <- 0      # class 0 mean (the boundary formula above assumes mean0 = 0)
var0  <- 0.64   # class 0 variance
weight1 <- 0.5         # prior pi = Pr[Z = 1]
weight0 <- 1 - weight1

x  <- seq(-2, 3, length.out = 1000)
# log posterior odds: left-hand side minus right-hand side of the boundary equation
fx <- (1/var0 - 1/var1)/2 * x^2 + x/var1 - 1/(2*var1) -
  log(weight0 * sqrt(var1) / (weight1 * sqrt(var0)))
plot(x, fx, type = "n")
lines(x[fx > 0], fx[fx > 0], col = "blue")   # region allocated to class 1
lines(x[fx < 0], fx[fx < 0], col = "green")  # region allocated to class 0
abline(h = 0)                                # the classification boundary
```

Risk of a classifier \(\delta\) \[R(\delta, z)=\mathbb E_{X|Z=z} [L(\delta(X), z)]=\left\{ \begin{array}{cc} C(0|1)\Pr[\delta(X)=0|z=1] & \mbox{ if } z=1\\ C(1|0)\Pr[\delta(X)=1|z=0] & \mbox{ if } z=0 \end{array}\right.\]
The posterior risk of \(\delta\) \[\begin{aligned} PR(\delta(x)) &= \mathbb E_{Z|x}[L(\delta(x), Z)]\\ &=\left\{ \begin{array}{cc} C(1|0)\Pr[Z=0|x] & \mbox{ if } \delta(x)=1\\ C(0|1)\Pr[Z=1|x] & \mbox{ if } \delta(x)=0 \end{array}\right. \end{aligned}\]
\[\begin{aligned} B(\delta) &=\mathbb E_{XZ} [L(\delta(X), Z)]\\ &=L(1, 0)\Pr[\delta(X)=1, Z=0] + L(0, 1)\Pr[\delta(X)=0, Z=1]\\ &=C(1|0)\Pr [\delta(X)=1, Z=0] + C(0|1)\Pr [\delta(X)=0, Z=1]\\ &=C(1|0)\Pr [\delta(X)=1| Z=0]\Pr[Z=0] + C(0|1)\Pr [\delta(X)=0| Z=1] \Pr[Z=1]\\ &=C(0|1)\pi \Pr [\delta(X)=0| Z=1]+ (1-\pi)C(1|0)\Pr [\delta(X)=1| Z=0] \end{aligned}\]
\[\begin{aligned} &PR(\delta(x)=0)>PR(\delta(x)=1) \\ &\Leftrightarrow C(0|1)\Pr(Z=1|x)> C(1|0)\Pr(Z=0|x)\\ &\Leftrightarrow \frac{\Pr(Z=1|x)}{\Pr(Z=0|x)}>\frac{C(1|0)}{C(0|1)}\\ &\Leftrightarrow \frac{f(x|z=1)}{f(x|z=0)}>\frac{C(1|0)}{C(0|1)}\frac{1-\pi}{\pi} \end{aligned}\]
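A sketch of this cost-weighted rule for a univariate normal example (the means, costs, and prior below are assumptions for illustration only):

```r
# Cost-weighted Bayes rule: classify as 1 when the likelihood ratio exceeds
# the threshold C(1|0)/C(0|1) * (1-pi)/pi
classify_with_costs <- function(x, prior, c10, c01, mu1 = 1, mu0 = 0, sd = 1) {
  lr  <- dnorm(x, mu1, sd) / dnorm(x, mu0, sd)   # f(x|z=1) / f(x|z=0)
  thr <- (c10 / c01) * (1 - prior) / prior
  as.integer(lr > thr)
}

# example: missing class 1 is five times as costly as a false alarm
classify_with_costs(x = c(0.2, 0.8), prior = 0.5, c10 = 1, c01 = 5)
```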