A person’s well-being: social, economic, psychological, medical, physical, etc.
A person’s annual physical exam report
What is Multivariate Analysis?
The term “multivariate analysis” implies a broader scope than univariate analysis.
Certain approaches, such as simple linear regression and multiple regression, are typically not considered multivariate analysis because they focus on the conditional distribution of a single response variable rather than on multiple variables jointly.
Multivariate analysis focuses on the joint behavior of several variables simultaneously to identify patterns and relationships.
Learning Objectives
Matrix algebra, distributions
Visualization
Inference about a mean vector or multiple mean vectors
Multivariate analysis of variance (MANOVA) and multivariate regression
Linear discriminant analysis (LDA)
Principal component analysis (PCA)
Cluster analysis
Factor analysis
Milestones in the history of multivariate analysis
1901: PCA was invented by Karl Pearson; independently developed by Harold Hotelling in the 1930s.
1904: Charles Spearman introduced factor analysis to identify underlying factors that explain the correlation between multiple variables.
1928: Wishart presented the distribution of the covariance matrix of a random sample from a multivariate normal distribution.
1936: Ronald Fisher developed discriminant analysis.
Milestones in the history of multivariate analysis
1932: Cluster analysis by Driver and Kroeber.
1936: Canonical analysis by Harold Hotelling.
1960s: Multidimensional scaling.
1970s: Multivariate regression.
1980s: Structural equation modeling; the idea dates back to Sewall Wright (1920–1921).
Matrix Algebra
Vectors: We begin with a little bit of matrix algebra
Vectors in R
There are many ways to create or define a vector in R
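For instance, a few common ways (a minimal illustration; the values are arbitrary):

```r
x <- c(0.4, 0.2, 0.5)       # combine values into a vector
y <- seq(1, 9, by = 4)      # a regular sequence: 1, 5, 9
z <- rep(1, 3)              # repeat a value: 1, 1, 1
x1 <- matrix(c(0.4, 0.2, 0.5), nrow = 3, ncol = 1)  # a 3x1 column vector
length(x)                   # 3
```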
Let \(x=\begin{pmatrix}x_1\\ \vdots\\ x_n\end{pmatrix}, y=\begin{pmatrix}y_1\\ \vdots\\ y_n\end{pmatrix}\). The inner product of \(x\) and \(y\) is \(\langle x,y\rangle=x_1y_1 + \cdots + x_ny_n=\sum_{i=1}^n x_iy_i\)
Note, the two vectors must have the same length
The norm / Euclidean norm / length of \(x\) is \(||x||=\sqrt{\langle x,x\rangle}\).
The Euclidean distance between \(x\) and \(y\) is \[D(x,y)=||x-y||=\sqrt{(x_1-y_1)^2 + \cdots + (x_n-y_n)^2}\]
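These definitions translate directly into R; a small sketch with made-up vectors:

```r
x <- c(1, 2, 3)
y <- c(4, 5, 6)
inner <- sum(x * y)              # <x, y> = 1*4 + 2*5 + 3*6 = 32
norm_x <- sqrt(sum(x * x))       # ||x|| = sqrt(14)
dist_xy <- sqrt(sum((x - y)^2))  # D(x, y) = sqrt(27)
```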
Inner Product and Norm
Distance: 1d and 2d
Distance: 3d
Example: Norm
x1 = matrix(c(0.4, 0.2, 0.5), 3, 1)
# the norm/length of x1
sqrt(sum(x1^2))
Motivating example. Consider bivariate random vectors. The standard deviations are 2 and 1, respectively.
What is the distance between (-2,0) and (2,0)? 4.
What is the distance between (0, -2) and (0,2)? 4.
Example: (Euclidean) Distance
# R code
library(MASS)      # for mvrnorm()
library(magrittr)  # for the pipe %>%
set.seed(20230404)
par(pty = "s")  # to make sure the shape of the figure is a square
mvrnorm(n = 1000, c(0, 0), matrix(c(4, 0, 0, 1), 2, 2)) %>%
  plot(xlab = "x", ylab = "y", xlim = c(-4, 4), ylim = c(-4, 4))
points(x = c(-2, 0, 0, 2), y = c(0, -2, 2, 0), pch = c(15, 16, 16, 15), col = c(2, 3, 3, 2), cex = 3)
Both pairs have a distance of 4.
But we notice that pairs with a y-distance greater than 4 are very rare; by comparison, many more pairs have an x-distance greater than 4.
A Homework Problem of Euclidean Distances
Suppose \(X_1, X_2, Y_1, Y_2\) are mutually independent.
\(X_1\) and \(X_2\) are iid from \(N(\mu=0, \sigma_x^2=2^2)\)
\(Y_1\) and \(Y_2\) are iid from \(N(\mu=0, \sigma_y^2=1^2)\)
Consider the two pairs \((X_1, X_2)\) and \((Y_1, Y_2)\). Which pair tends to have a larger difference?
To answer the question, we can calculate or estimate the following two probabilities: \[P(|X_1-X_2|>4), P(|Y_1-Y_2|>4)\]
Calculate \(P(|X_1-X_2|>4)\)
First, find the distribution of \(X_1-X_2\) and standardize it to have mean 0 and SD 1.
Second, express the probability as \(P(|Z|>z)\), where \(Z\sim N(0,1)\).
Next, express the probability in terms of \(\Phi(\cdot)\), the CDF of the standard normal distribution.
Last, use the “pnorm” function in R to find the numerical value.
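Following these steps: \(X_1-X_2\sim N(0, 2^2+2^2)=N(0,8)\), so \(P(|X_1-X_2|>4)=P(|Z|>4/\sqrt{8})=2\Phi(-\sqrt{2})\). A minimal sketch in R (the analogous value for \(Y_1-Y_2\sim N(0,2)\) is shown for comparison):

```r
# X1 - X2 ~ N(0, 8), so its SD is sqrt(8)
p_x <- 2 * pnorm(-4 / sqrt(8))   # = 2 * Phi(-sqrt(2)) ~ 0.157
# Y1 - Y2 ~ N(0, 2), so its SD is sqrt(2)
p_y <- 2 * pnorm(-4 / sqrt(2))   # ~ 0.0047
```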
Estimate \(P(|X_1-X_2|>4)\)
The probability can be estimated by doing simulations/sampling.
If you sample many (say 10,000) pairs of \(X_1\) and \(X_2\), count how many pairs satisfy \(|X_1-X_2|>4\). The resulting proportion estimates \(P(|X_1-X_2|>4)\)
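A minimal simulation sketch (the seed and sample size are arbitrary choices):

```r
set.seed(1)
n <- 10000
x1 <- rnorm(n, mean = 0, sd = 2)
x2 <- rnorm(n, mean = 0, sd = 2)
est <- mean(abs(x1 - x2) > 4)  # proportion of pairs with |X1 - X2| > 4
est
```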
Statistical / Mahalanobis Distance
The two probabilities \(P(|X_1-X_2|>4)\) and \(P(|Y_1-Y_2|>4)\) are quite different.
Euclidean distance might be misleading.
In the example we have examined, the x-values and y-values are independent but have different variations.
Statistical / Mahalanobis Distance
The variation along \(x\) is greater than along \(y\). Let \(X_1\) and \(X_2\) be two random points along the \(x\) direction, \(Y_1\) and \(Y_2\) be two random points along the \(y\) direction.
One simple idea is to standardize both. Because the SD of \(Y\) is 1, we don't need to change the y-values. Because the SD of \(X\) is 2, we shrink the x-values by 50%.
point (-2,0) becomes (-1,0)
point (2, 0) becomes (1,0)
The distance between the red pair is 2; the distance between the green pair is 4.
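The standardization above can be checked in R (a sketch; `sweep` divides each coordinate by the corresponding SD):

```r
red   <- rbind(c(-2, 0), c(2, 0))   # the pair along the x direction
green <- rbind(c(0, -2), c(0, 2))   # the pair along the y direction
s <- c(2, 1)                        # SDs of x and y
red_s   <- sweep(red, 2, s, "/")    # becomes (-1, 0) and (1, 0)
green_s <- sweep(green, 2, s, "/")  # unchanged: (0, -2) and (0, 2)
d_red   <- sqrt(sum((red_s[1, ] - red_s[2, ])^2))    # 2
d_green <- sqrt(sum((green_s[1, ] - green_s[2, ])^2)) # 4
```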
Standardized Observations
Original vs Standardized Observations
Statistical Distance
In the example above, \(X\) and \(Y\) are independent, so the covariance is zero. Statistical distance can also be defined when the covariance matrix \(\Sigma\) is not diagonal.
We will introduce a type of statistical distance known as the Mahalanobis distance.
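As a preview, R's built-in `mahalanobis()` returns the *squared* statistical distance \((x-\mu)'\Sigma^{-1}(x-\mu)\); with the diagonal covariance matrix from the example above, it reproduces the standardized distances:

```r
Sigma <- diag(c(4, 1))  # covariance matrix from the example (SDs 2 and 1, independent)
# distance between (-2, 0) and (2, 0): matches the standardized distance 2
d_red   <- sqrt(mahalanobis(c(-2, 0), center = c(2, 0), cov = Sigma))
# distance between (0, -2) and (0, 2): matches the standardized distance 4
d_green <- sqrt(mahalanobis(c(0, -2), center = c(0, 2), cov = Sigma))
```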