Multivariate Analysis Lecture 17: Factor Analysis

Zhaoxia Yu | Professor, Department of Statistics

2025-05-27

Required packages

Code
knitr::opts_chunk$set(echo = TRUE)
library(knitr)
library(tidyr) #the pipe (%>%) tool is extremely useful
library(MASS)
library(ggplot2)
library(kableExtra)

library(corrplot) #for visualizing the correlation matrix of the exam data

library(png)
library(grid)
library(gridExtra)

#install.packages("psych")
#install.packages("semPlot")
#install.packages("lavaan")
#install.packages("lavaanPlot")
library(psych)
library(semPlot)  
library(lavaan)
library(lavaanPlot)

Outline

  • Review of PCA
  • A motivating example: exam scores
  • The factor analysis (FA) model
  • Communality and specific variance
  • Non-uniqueness of loadings and factor rotation
  • Computation: PCA-based estimation and MLE
  • EFA vs CFA, with a CFA example in lavaan

Motivating Example

A Motivating Example: Exam Scores

  • Simulated data
  • 60 students
  • six subjects: Latin, English, History, Arithmetic, Algebra, Geometry
Code
set.seed(6)
mu=rnorm(60, sd=1)
mu1=mu+rnorm(60)
mu2=mu+rnorm(60)
exam=cbind(mu1+rnorm(60, sd=0.5), mu1+rnorm(60, sd=0.5), mu1+rnorm(60, sd=0.5), 
          mu2+rnorm(60, sd=0.5), mu2+rnorm(60, sd=0.5), mu2+rnorm(60, sd=0.5))
colnames(exam)=c("Latin","English","History",
                 "Arithmetic","Algebra","Geometry")
exam=data.frame(exam)

Pairwise Correlation

Code
corrplot(cor(exam), method="number")

PCA: Visualize Eigenvalues

Code
obj=princomp(exam, cor=TRUE)
cumsum(obj$sdev^2)/sum(obj$sdev^2)
   Comp.1    Comp.2    Comp.3    Comp.4    Comp.5    Comp.6 
0.6986579 0.9361760 0.9583366 0.9742750 0.9873617 1.0000000 
Code
plot(obj, type="lines", main="Scree Plot")

PCA Loadings

Code
obj$loadings

Loadings:
           Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
Latin       0.394  0.456  0.176  0.149  0.756  0.107
English     0.426  0.360 -0.127 -0.187 -0.465  0.650
History     0.417  0.393 -0.139        -0.323 -0.740
Arithmetic  0.414 -0.368  0.806        -0.193       
Algebra     0.396 -0.437 -0.456  0.659              
Geometry    0.402 -0.427 -0.275 -0.708  0.265       

               Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
SS loadings     1.000  1.000  1.000  1.000  1.000  1.000
Proportion Var  0.167  0.167  0.167  0.167  0.167  0.167
Cumulative Var  0.167  0.333  0.500  0.667  0.833  1.000
  • The 1st PC is roughly an average of all six subjects
  • The 2nd PC contrasts Latin/English/History with Arithmetic/Algebra/Geometry

PCA

  • 1st PC = 0.39 Latin + 0.43 English + 0.42 History + 0.41 Arithmetic + 0.40 Algebra + 0.40 Geometry

  • 2nd PC = 0.46 Latin + 0.36 English + 0.39 History - 0.37 Arithmetic - 0.44 Algebra - 0.43 Geometry

Code
obj$scores[,1:2] #first two PCs
           Comp.1       Comp.2
 [1,] -0.12510081  0.387885389
 [2,] -0.24313883  0.565843287
 [3,]  0.61706648 -1.169012826
 [4,]  1.57192571 -1.061544977
 [5,] -1.01259642  0.511535779
 [6,]  0.08159618  0.709633832
 [7,] -3.97028171  1.616681077
 [8,]  1.40252349 -1.718247894
 [9,] -0.87166984 -0.676249351
[10,] -2.83899249  1.873213808
[11,]  2.72771614 -0.850526753
[12,] -1.70338268 -1.375732667
[13,]  0.45223712  1.335051512
[14,] -0.73935793 -0.391321467
[15,]  0.36383804 -1.679191292
[16,] -0.40033457 -0.951024825
[17,]  4.07653952 -1.202123539
[18,]  0.61591640  0.133077527
[19,] -0.12488695  0.259977199
[20,]  3.32163279  2.091207138
[21,]  2.00387753  3.054062570
[22,] -1.33374043 -0.320546882
[23,]  5.52664746  0.816383883
[24,] -0.33635132 -1.217102103
[25,] -0.10498766 -1.814458635
[26,] -4.21673559 -0.146734355
[27,]  1.65883546  1.052902505
[28,]  2.16635789  1.786264476
[29,]  2.73200731 -1.478791682
[30,] -1.27463938 -0.621599441
[31,] -2.37991077 -1.616443270
[32,]  1.37425859  0.969212378
[33,] -2.47024496  2.191557493
[34,] -0.02205375  0.198973539
[35,]  1.27620254  1.394858358
[36,] -1.69777512 -1.082925687
[37,]  1.21645111  1.935638299
[38,]  2.17466798 -0.001283984
[39,] -2.26900283  1.182184153
[40,] -4.47651168  0.610396065
[41,] -1.16212454 -1.013488801
[42,] -0.46563689 -0.170207403
[43,] -0.25281534 -0.155695673
[44,] -0.64299833 -0.617060300
[45,]  1.60383026 -1.065000051
[46,] -0.49803511 -1.390479937
[47,] -0.42374873 -0.582368919
[48,]  2.95636926  1.190798983
[49,] -1.27521832  0.505207025
[50,] -0.14644013 -0.618323482
[51,]  0.29360813 -1.258124936
[52,] -1.53084891  0.644636778
[53,] -1.16723274 -1.853200992
[54,]  2.73613797  0.871310350
[55,] -1.90680059 -0.110926424
[56,]  3.14836985 -0.234451019
[57,] -3.98034871  1.828724734
[58,] -1.63857531  0.287901854
[59,]  0.03921813 -1.629983051
[60,]  1.56468801  0.069052627
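
  • As a sanity check (a sketch, not part of the original output): the PC scores are just the loadings applied to the standardized data; princomp() stores the centering and scaling vectors it used
Code
Z=scale(exam, center=obj$center, scale=obj$scale) #standardize exactly as princomp did
max(abs(Z %*% obj$loadings - obj$scores)) #should be ~0 up to floating-point error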

Visualize PCs (biplot)

Code
biplot(obj, main="Biplot")

Introduction to Factor Analysis

  • Factor analysis (FA) is a statistical model used to identify underlying relationships between variables.
  • It seeks to identify the latent factors that give rise to the observed correlations among a set of variables.

PCA vs FA

Code
knitr::include_graphics("img/PCA_FA.png")

https://livebook.manning.com/book/r-in-action-second-edition/chapter-14/6

PCA vs FA

  • Both reduce dimensionality
  • Different goals:
    • PCA: aims to reduce the number of variables while preserving as much variance as possible
    • FA: aims to identify latent factors that explain the correlations among observed variables
  • PCA: finds principal components (e.g., PC1, PC2). Focuses on total variance.
  • FA: finds latent factors (e.g., F1, F2). Focuses on correlations.

FA Model

Factor Model

  • Consider a random vector \(\mathbf X \in \mathbb R^p\)
  • Let \(\boldsymbol \mu \in \mathbb R^p\) denote the population mean
  • Let \(\mathbf F\in \mathbb R^m\) denote a vector of \(m\) common factors

\[\mathbf{X} = \left(\begin{array}{c}X_1\\X_2\\\vdots\\X_p\end{array}\right), \boldsymbol{\mu} = \left(\begin{array}{c}\mu_1\\\mu_2\\\vdots\\\mu_p\end{array}\right), \mathbf{F} = \left(\begin{array}{c}f_1\\f_2\\\vdots\\f_m\end{array}\right)\]

Factor model

  • \(X_j\), which is observable, is assumed to be a linear function of the unobservable \(f_1, \cdots, f_m\) plus specific errors. \[\begin{aligned} X_1 & = \mu_1 + l_{11}f_1 + l_{12}f_2 + \dots + l_{1m}f_m + \epsilon_1\\ X_2 & = \mu_2 + l_{21}f_1 + l_{22}f_2 + \dots + l_{2m}f_m + \epsilon_2 \\ & \vdots \\ X_p & = \mu_p + l_{p1}f_1 + l_{p2}f_2 + \dots + l_{pm}f_m + \epsilon_p \end{aligned}\] where \(\epsilon_j\) is called the specific factor for feature \(j\).
  • The means \(\mu_1, \cdots, \mu_p\) are parameters
  • The coefficients in the factor loading matrix are also parameters

Factor model

  • Let \(\mathbf L\) denote the \(p\times m\) matrix of factor loadings \[\mathbf{L} = \left(\begin{array}{cccc}l_{11}& l_{12}& \dots & l_{1m}\\l_{21} & l_{22} & \dots & l_{2m}\\ \vdots & \vdots & & \vdots \\l_{p1} & l_{p2} & \dots & l_{pm}\end{array}\right) \]
  • A compact expression of the factor model is \[\mathbf X_{p\times 1} = \boldsymbol \mu_{p\times 1} + \mathbf L_{p\times m} \mathbf F_{m\times 1} + \boldsymbol \epsilon_{p\times 1}\]
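
  • A small simulation sketch to make the compact form concrete (the loadings and specific variances below are hypothetical, chosen for illustration): data generated as \(\mathbf X = \mathbf L \mathbf F + \boldsymbol \epsilon\) should have a sample covariance close to \(\mathbf L\mathbf L^T + \boldsymbol \Psi\)
Code
set.seed(1)
L.true=matrix(c(0.9,0.8,0.7,0,0,0,
                0,0,0,0.9,0.8,0.7), 6, 2) #hypothetical loadings (p=6, m=2)
Psi.true=diag(rep(0.3,6)) #hypothetical specific variances
n=5000
F.mat=matrix(rnorm(n*2), n, 2) #Cov(F)=I
eps=matrix(rnorm(n*6, sd=sqrt(0.3)), n, 6) #Cov(eps)=Psi.true
X=F.mat%*%t(L.true)+eps #mu=0 for simplicity
round(cov(X)-(L.true%*%t(L.true)+Psi.true), 2) #entries should be near 0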

Example: Loading

Code
obj=factanal(exam, factors=2, scores="regression")
obj$loadings

Loadings:
           Factor1 Factor2
Latin      0.937   0.195  
English    0.905   0.321  
History    0.920   0.280  
Arithmetic 0.290   0.878  
Algebra    0.207   0.923  
Geometry   0.220   0.931  

               Factor1 Factor2
SS loadings      2.718   2.710
Proportion Var   0.453   0.452
Cumulative Var   0.453   0.905
  • How to interpret the loadings and the two factors?

Example: Scores

Code
obj.heatmap=heatmap(obj$scores, main="Factor Scores", col=heat.colors(256, rev=TRUE), scale="none")
  • How to interpret the scores?

Example: Original Scores (sorted)

Code
tmp=as.matrix(exam)[obj.heatmap$rowInd,]
row.names(tmp)=obj.heatmap$rowInd
heatmap(tmp, main="Exam Scores",  col=heat.colors(256, rev=TRUE), Rowv=NA)

Mathematical Details: Assumptions

  • \(\mathbf F\) and \(\boldsymbol \epsilon\) are uncorrelated

  • The \(m\) common factors are uncorrelated \[\mathbb E(\mathbf F)=0, Cov(\mathbf F)=\mathbf I_m\]

  • The specific factors are uncorrelated \[\mathbb E(\boldsymbol\epsilon)=0, Cov(\boldsymbol\epsilon)=\Psi\] where \(\Psi\) is a diagonal matrix with non-negative values, i.e.,

\[\boldsymbol{\Psi} = \left(\begin{array}{cccc}\psi_1 & 0 & \dots & 0 \\ 0 & \psi_2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \dots & \psi_p \end{array}\right)\]

The Covariance

  • By the factor model and its assumptions, we have

\[\begin{aligned} \Sigma&=cov(\mathbf X)\\ &= cov(\mathbf L\mathbf F + \boldsymbol \epsilon)\\ &=\mathbf L Cov(\mathbf F) \mathbf L^T + \Psi\\ &=\mathbf L \mathbf L^T + \Psi \end{aligned} \] The last step is due to our assumption that \(cov(\mathbf F)=\mathbf I\)

The Covariance

  • For \(i\not=j\), the covariance between \(X_i\) (feature \(i\)) and \(X_j\) (feature \(j\)) is \[\sigma_{ij}=cov(X_i, X_j)=\sum_{k=1}^m l_{ik}l_{jk}\]
  • The variance of \(X_i\) is \[\sigma_{ii} = \sum_{k=1}^m l_{ik}^2 + \psi_i\]

Communality

Communality and Specific Variance

  • From last slide \[\sigma_{ii} = \sum_{k=1}^m l_{ik}^2 + \psi_i\]

  • We say that the variance of \(X_i\) is partitioned into communality and specific variance where

    • the communality, defined as \(h_i^2=\sum_{k=1}^m l_{ik}^2\), is the part of the variance of \(X_i\) explained by the common factors
    • the specific variance \(\psi_i\) is the part of the variance of \(X_i\) not explained by the common factors

Example of Communality

Code
obj=factanal(exam, factors=2)
L=obj$loadings[,1:2] 
Psi=diag(obj$uniquenesses)
#uniquenesses
obj$uniquenesses
     Latin    English    History Arithmetic    Algebra   Geometry 
0.08437518 0.07777364 0.07581204 0.14534743 0.10513511 0.08395685 
Code
#communality
1-obj$uniquenesses
     Latin    English    History Arithmetic    Algebra   Geometry 
 0.9156248  0.9222264  0.9241880  0.8546526  0.8948649  0.9160432 
  • For Geometry, the communality is ____ and the uniqueness (specific variance) is ____.
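  • Equivalently (a check): compute the communalities directly from the loading matrix L extracted above; for standardized variables, communality plus uniqueness is approximately 1
Code
h2=rowSums(L^2) #communality: h_i^2 = sum_k l_ik^2
h2
h2+obj$uniquenesses #approximately 1 for each variable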

Non-uniqueness

Non-uniqueness of Factor Loadings

  • The factor loading matrix is NOT unique.
  • Suppose \(\textbf{X} = \boldsymbol{\mu} + \textbf{LF}+ \boldsymbol{\epsilon}\)
  • Consider any \(m\times m\) orthogonal matrix \(\Gamma\), which satisfies \(\Gamma \Gamma^T=\Gamma^T \Gamma=\mathbf I\).
  • Let \(\tilde {\mathbf L}=\mathbf L \Gamma\). The model \(\textbf{X} = \boldsymbol{\mu} + \tilde {\textbf L} \mathbf F+ \boldsymbol{\epsilon}\)

gives the same \(\Sigma\) because \[cov(\tilde {\textbf L} \mathbf F)= \tilde {\textbf L} cov (\mathbf F)\tilde {\textbf L}^T=\tilde {\textbf L} \tilde {\textbf L}^T=\mathbf L \Gamma \Gamma^T \mathbf L^T=\mathbf L \mathbf L^T=cov(\mathbf {LF})\]

Non-uniqueness of Factor Loadings

Code
#Estimated Sigma
L%*%t(L) + Psi
               Latin   English   History Arithmetic   Algebra  Geometry
Latin      1.0000002 0.9104285 0.9161363  0.4431684 0.3743267 0.3881987
English    0.9104285 1.0000000 0.9222470  0.5448653 0.4843854 0.4989042
History    0.9161363 0.9222470 1.0000001  0.5128914 0.4493253 0.4636887
Arithmetic 0.4431684 0.5448653 0.5128914  1.0000007 0.8702984 0.8814692
Algebra    0.3743267 0.4843854 0.4493253  0.8702984 0.9999997 0.9053334
Geometry   0.3881987 0.4989042 0.4636887  0.8814692 0.9053334 1.0000002

Non-uniqueness of Factor Loadings

  • Consider a rotation matrix \(R\) and define \(\tilde {\mathbf L}=\mathbf L R\)
Code
theta=pi/6
R=matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), 2,2)
L.tilde=L%*%R

L.tilde %*% t(L.tilde) + Psi
               Latin   English   History Arithmetic   Algebra  Geometry
Latin      1.0000002 0.9104285 0.9161363  0.4431684 0.3743267 0.3881987
English    0.9104285 1.0000000 0.9222470  0.5448653 0.4843854 0.4989042
History    0.9161363 0.9222470 1.0000001  0.5128914 0.4493253 0.4636887
Arithmetic 0.4431684 0.5448653 0.5128914  1.0000007 0.8702984 0.8814692
Algebra    0.3743267 0.4843854 0.4493253  0.8702984 0.9999997 0.9053334
Geometry   0.3881987 0.4989042 0.4636887  0.8814692 0.9053334 1.0000002

Factor Rotation

Rotation for Better Interpretation

  • Interpretation of the final results is easier for some choices of \(\mathbf L\) than others.
  • We often rotate the factors to gain insight or improve interpretability
  • This is one advantage of factor analysis
  • In practice,
    • Step 1: fit a factor model by imposing conditions that lead to a unique solution
    • Step 2: rotate the loading matrix \(\mathbf L\) (multiply it by an orthogonal matrix) so that the result has a good interpretation, often by trial and error
  • Well-known rotation criteria exist

Two Major Types of Rotation

  • An orthogonal rotation
    • maintains the perpendicularity between factors after rotation.
    • assumes that factors are unrelated or independent of each other.
    • Varimax is the most commonly used method of orthogonal rotation.

Two Major Types of Rotation

  • An oblique rotation
    • allows factors to be correlated and does not maintain a 90-degree angle between them.
    • assumes that factors are related or dependent on each other.
    • One popular method is Promax.
  • Orthogonal rotations are mathematically appealing/convenient
  • There is, however, no reason that factors must be uncorrelated

Varimax and Promax

  • https://onlinelibrary.wiley.com/doi/pdf/10.1111/j.1745-3984.2006.00003.x
  • Consider a rotation matrix with angle \(\psi\)

\[\begin{pmatrix} \cos \psi & -\sin \psi\\ \sin \psi & \cos \psi \end{pmatrix}\]

  • The Varimax method looks for \(\psi\) that maximizes the Varimax criterion \[\frac{1}{p}\sum_{j} \left[\sum_i (l_{ij}/h_i)^4 - \frac{1}{p}\left(\sum_i (l_{ij}/h_i)^2\right)^2 \right],\] i.e., the total (over factors) variance of the squared communality-normalized loadings
  • Promax builds on Varimax: it raises the varimax loadings to a power, which shrinks small loadings toward zero, and then rotates obliquely toward that target
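  • A sketch: stats::varimax() performs this search for us; starting from the unrotated MLE loadings, the result should agree (up to sign and column order) with factanal's default varimax-rotated loadings
Code
L.unrot=factanal(exam, factors=2, rotation="none")$loadings[,1:2]
vm=varimax(L.unrot) #Kaiser-normalized varimax by default
vm$loadings #compare with factanal's default (varimax) loadings
vm$rotmat #the orthogonal rotation matrix actually used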

No Rotation

Code
principal(exam,nfactors=2, rotate="none")
Principal Components Analysis
Call: principal(r = exam, nfactors = 2, rotate = "none")
Standardized loadings (pattern matrix) based upon correlation matrix
            PC1   PC2   h2    u2 com
Latin      0.81 -0.54 0.95 0.052 1.8
English    0.87 -0.43 0.95 0.054 1.5
History    0.85 -0.47 0.95 0.052 1.6
Arithmetic 0.85  0.44 0.91 0.090 1.5
Algebra    0.81  0.52 0.93 0.070 1.7
Geometry   0.82  0.51 0.94 0.064 1.7

                       PC1  PC2
SS loadings           4.19 1.43
Proportion Var        0.70 0.24
Cumulative Var        0.70 0.94
Proportion Explained  0.75 0.25
Cumulative Proportion 0.75 1.00

Mean item complexity =  1.6
Test of the hypothesis that 2 components are sufficient.

The root mean square of the residuals (RMSR) is  0.02 
 with the empirical chi square  0.79  with prob <  0.94 

Fit based upon off diagonal values = 1

No Rotation

Code
par(pty="s")
plot(principal(exam,nfactors=2, rotate="none"), main="No Rotation")

Factor Rotation: varimax (an orthogonal rotation)

Code
principal(exam,nfactors=2)
Principal Components Analysis
Call: principal(r = exam, nfactors = 2)
Standardized loadings (pattern matrix) based upon correlation matrix
            RC1  RC2   h2    u2 com
Latin      0.96 0.17 0.95 0.052 1.1
English    0.93 0.30 0.95 0.054 1.2
History    0.94 0.25 0.95 0.052 1.1
Arithmetic 0.30 0.90 0.91 0.090 1.2
Algebra    0.22 0.94 0.93 0.070 1.1
Geometry   0.24 0.94 0.94 0.064 1.1

                       RC1  RC2
SS loadings           2.86 2.76
Proportion Var        0.48 0.46
Cumulative Var        0.48 0.94
Proportion Explained  0.51 0.49
Cumulative Proportion 0.51 1.00

Mean item complexity =  1.1
Test of the hypothesis that 2 components are sufficient.

The root mean square of the residuals (RMSR) is  0.02 
 with the empirical chi square  0.79  with prob <  0.94 

Fit based upon off diagonal values = 1

Factor Rotation: varimax (an orthogonal rotation)

Code
par(pty="s")
plot(principal(exam, nfactors=2),xlim=c(-0.2,1),ylim=c(-0.2,1), main="varimax (default)")
abline(h=0, v=0)

Factor Rotation: an oblique (non-orthogonal) rotation

Code
principal(exam,nfactors=2, rotate="promax")
Principal Components Analysis
Call: principal(r = exam, nfactors = 2, rotate = "promax")
Standardized loadings (pattern matrix) based upon correlation matrix
             RC1   RC2   h2    u2 com
Latin       1.01 -0.08 0.95 0.052   1
English     0.93  0.07 0.95 0.054   1
History     0.96  0.02 0.95 0.052   1
Arithmetic  0.07  0.92 0.91 0.090   1
Algebra    -0.03  0.98 0.93 0.070   1
Geometry   -0.02  0.98 0.94 0.064   1

                       RC1  RC2
SS loadings           2.84 2.78
Proportion Var        0.47 0.46
Cumulative Var        0.47 0.94
Proportion Explained  0.50 0.50
Cumulative Proportion 0.50 1.00

 With component correlations of 
     RC1  RC2
RC1 1.00 0.49
RC2 0.49 1.00

Mean item complexity =  1
Test of the hypothesis that 2 components are sufficient.

The root mean square of the residuals (RMSR) is  0.02 
 with the empirical chi square  0.79  with prob <  0.94 

Fit based upon off diagonal values = 1

Factor Rotation: an oblique (non-orthogonal) rotation

Code
par(pty="s")
plot(principal(exam,nfactors=2, rotate="promax"), main="promax (oblique)")

Factor Rotation

  • The following articles provide nice descriptions of the two major types of rotations:

https://scholarworks.umass.edu/cgi/viewcontent.cgi?article=1251&context=pare

https://www.theanalysisfactor.com/rotations-factor-analysis/

Computation

Method 1: Use PCA

  • By the spectral decomposition of \(\Sigma\), we have \[\Sigma=\Gamma \Lambda \Gamma^T\] where \(\Gamma=(\gamma_1, \cdots, \gamma_p)\) is an orthogonal matrix and \(\Lambda=diag (\lambda_1, \cdots, \lambda_p)\) is the diagonal matrix of eigenvalues.

  • The spectral decomposition can be rewritten to \[\boldsymbol \Sigma=\sum_{i=1}^p \lambda_i \gamma_i\gamma_i^T=\sum_{i=1}^p (\sqrt{\lambda_i}\gamma_i)(\sqrt{\lambda_i}\gamma_i)^T\]

Method 1: Use PCA

  • Suppose that \(\lambda_{m+1}, \cdots, \lambda_p\) are small. Then

\[\boldsymbol \Sigma\approx \sum_{i=1}^m (\sqrt{\lambda_i}\gamma_i)(\sqrt{\lambda_i}\gamma_i)^T\]

  • Let \(\mathbf L=\left(\sqrt{\lambda_1}\gamma_1, \cdots, \sqrt{\lambda_m}\gamma_m\right)\)

  • Let \(\boldsymbol \Psi=\mathrm{diag}\left(\boldsymbol \Sigma-\mathbf L \mathbf L^T\right)\), i.e., keep only the diagonal of \(\boldsymbol \Sigma-\mathbf L \mathbf L^T\) so that \(\boldsymbol \Psi\) is diagonal
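
  • A minimal sketch of this estimator for the exam data with \(m=2\) (eigenvector signs are arbitrary, so columns may be flipped relative to principal())
Code
S=cor(exam)
eig=eigen(S)
m=2
L.pca=eig$vectors[,1:m]%*%diag(sqrt(eig$values[1:m])) #L=(sqrt(lambda_1)gamma_1, sqrt(lambda_2)gamma_2)
rownames(L.pca)=colnames(exam)
Psi.pca=diag(diag(S-L.pca%*%t(L.pca))) #keep only the diagonal
round(L.pca, 2) #compare with principal(exam, nfactors=2, rotate="none")
round(S-(L.pca%*%t(L.pca)+Psi.pca), 3) #off-diagonal residual misfit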

R code

Code
principal(exam, nfactors=2, rotate = "none")
Principal Components Analysis
Call: principal(r = exam, nfactors = 2, rotate = "none")
Standardized loadings (pattern matrix) based upon correlation matrix
            PC1   PC2   h2    u2 com
Latin      0.81 -0.54 0.95 0.052 1.8
English    0.87 -0.43 0.95 0.054 1.5
History    0.85 -0.47 0.95 0.052 1.6
Arithmetic 0.85  0.44 0.91 0.090 1.5
Algebra    0.81  0.52 0.93 0.070 1.7
Geometry   0.82  0.51 0.94 0.064 1.7

                       PC1  PC2
SS loadings           4.19 1.43
Proportion Var        0.70 0.24
Cumulative Var        0.70 0.94
Proportion Explained  0.75 0.25
Cumulative Proportion 0.75 1.00

Mean item complexity =  1.6
Test of the hypothesis that 2 components are sufficient.

The root mean square of the residuals (RMSR) is  0.02 
 with the empirical chi square  0.79  with prob <  0.94 

Fit based upon off diagonal values = 1
Code
principal(exam, nfactors=2)
Principal Components Analysis
Call: principal(r = exam, nfactors = 2)
Standardized loadings (pattern matrix) based upon correlation matrix
            RC1  RC2   h2    u2 com
Latin      0.96 0.17 0.95 0.052 1.1
English    0.93 0.30 0.95 0.054 1.2
History    0.94 0.25 0.95 0.052 1.1
Arithmetic 0.30 0.90 0.91 0.090 1.2
Algebra    0.22 0.94 0.93 0.070 1.1
Geometry   0.24 0.94 0.94 0.064 1.1

                       RC1  RC2
SS loadings           2.86 2.76
Proportion Var        0.48 0.46
Cumulative Var        0.48 0.94
Proportion Explained  0.51 0.49
Cumulative Proportion 0.51 1.00

Mean item complexity =  1.1
Test of the hypothesis that 2 components are sufficient.

The root mean square of the residuals (RMSR) is  0.02 
 with the empirical chi square  0.79  with prob <  0.94 

Fit based upon off diagonal values = 1

Method 2: MLE

  • We impose multivariate normality on the common and specific factors

\[\mathbf F\sim N(\mathbf 0, \mathbf I), \epsilon\sim N(\mathbf 0, \Psi)\]

  • The log-likelihood is \[\begin{aligned} l(\mu,\mathbf L, \Psi) &= - \dfrac{np}{2}\log{2\pi}- \dfrac{n}{2}\log{|\mathbf{LL}^T + \Psi|} -\\ &\dfrac{1}{2}\sum_{i=1}^{n}(\mathbf X_i-\mu)^T(\mathbf L \mathbf L^T+\Psi)^{-1}(\mathbf X_i-\mu) \end{aligned}\]

where \(\mathbf {X}_i\in \mathbb R^p\) denotes the \(i\)th observation (not the \(i\)th feature).
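
  • A minimal sketch of this log-likelihood in R, assuming \(\boldsymbol \mu\) is estimated by the sample mean so the quadratic term reduces to \(\frac{n}{2}tr\left((\mathbf L\mathbf L^T+\Psi)^{-1}\mathbf S_n\right)\); for standardized data the correlation matrix plays the role of \(\mathbf S_n\)
Code
loglik.fa=function(L, Psi, S, n){
  p=nrow(S)
  Sigma=L%*%t(L)+Psi #implied covariance
  -n*p/2*log(2*pi)-n/2*log(det(Sigma))-n/2*sum(diag(solve(Sigma)%*%S))
}
#example: evaluate at the factanal estimates
fit=factanal(exam, factors=2)
loglik.fa(fit$loadings[,1:2], diag(fit$uniquenesses), cor(exam), nrow(exam))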

The Number of Common Factors

  • The \(p\times p\) covariance matrix \(\Sigma\) is symmetric. As a result, it has \(\frac{p(p+1)}{2}\) free parameters.
  • A factor model imposes a structure on \(\Sigma\)
  • A FA model with a small number of common factors, i.e., with \(m\) small, uses fewer parameters
    • the model is more parsimonious
    • but the model might not be adequate if \(m\) is too small
  • One can test whether a given \(m\) is large enough

Choosing the Number of Factors \(m\) When Using PCA

  • The cumulative variance explained should be reasonably large, such as >\(80\%\)

  • Look for an elbow in the scree plot

A Goodness of Fit Test for the Adequacy of the Number of Common Factors

\[\begin{align*} H_0: \quad \boldsymbol{\Sigma}_{(p \times p)} &= \mathbf{L}_{(p \times m)} \, \mathbf{L}'_{(m \times p)} + \boldsymbol{\Psi}_{(p \times p)} \\ \\ -2 \ln \Lambda &= -2 \ln \left[ \frac{\text{maximized likelihood under } H_0} {\text{maximized likelihood}} \right] \\ \\ &= -2 \ln \left( \frac{|\hat{\boldsymbol{\Sigma}}|}{|\mathbf{S}_n|} \right)^{-n/2} + n \left[ \operatorname{tr}(\hat{\boldsymbol{\Sigma}}^{-1} \mathbf{S}_n) - p \right] \\ \\ & \text{It can be shown that } \operatorname{tr}(\hat{\boldsymbol{\Sigma}}^{-1} \mathbf{S}_n) - p = 0 \\ \\ &= n \ln \left( \frac{|\hat{\boldsymbol{\Sigma}}|}{|\mathbf{S}_n|} \right) \end{align*}\]

Test for the Adequacy of the Number of Common Factors

  • The number of parameters for covariance in the full model is \[\frac{p(p+1)}{2}\]
  • The number of parameters for covariance in the reduced model is \[mp + p - \frac{m(m-1)}{2}\]

Note: \(- \frac{m(m-1)}{2}\) is due to the nonuniqueness of \(\mathbf L\).

Degrees of freedom

  • The difference in numbers of parameters between the two models is

\[\begin{aligned} df&=\frac{p(p+1)}{2}-[mp+p-\frac{m(m-1)}{2}]\\ &= \frac{1}{2}[(p-m)^2 -p-m] \end{aligned}\]

  • Under the null hypothesis (the \(m\)-factor model is adequate), the test statistic approximately follows a chi-squared distribution with the above degrees of freedom when the sample size is large.
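
  • A hand computation for the exam data with \(m=2\) (a sketch; factanal() applies a Bartlett-type correction to the statistic, so its reported value differs slightly)
Code
p=6; m=2; n=nrow(exam)
df=((p-m)^2-p-m)/2 #degrees of freedom; equals 4 here
fit=factanal(exam, factors=m)
Sigma.hat=fit$loadings[,1:m]%*%t(fit$loadings[,1:m])+diag(fit$uniquenesses)
S=cor(exam)
stat=n*log(det(Sigma.hat)/det(S)) #n*ln(|Sigma.hat|/|S_n|)
c(df=df, stat=stat, p.value=pchisq(stat, df, lower.tail=FALSE))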

Test for the Adequacy of the Number of Common Factors

  • The result below indicates that 1 factor is not adequate, as the p-value is very small.
  • The p-value assesses whether the correlation structure specified by the proposed model differs significantly from that of the full (unstructured) model
Code
print("p-value for 1 factor")
[1] "p-value for 1 factor"
Code
factanal(exam, factors=1)$PVAL
   objective 
2.264759e-30 

Test for the Adequacy of the Number of Common Factors

  • The result indicates that 2 factors are adequate because the fit is not significantly different from that of the full model.
Code
print("p-value for 2 factors")
[1] "p-value for 2 factors"
Code
factanal(exam, factors=2)$PVAL
objective 
 0.930055 

CFA vs EFA

Exploratory Factor Analysis

  • The FA approach we have discussed is exploratory in nature.
  • In fact, we can perform EFA and identify latent factors using only the correlation matrix, not the raw data
  • The purpose of EFA is to explore the possible underlying structure that can explain the observed pattern of correlations
  • EFA is used when researchers do not have a specific idea about the underlying structure of data
  • EFA tries to identify the factor configuration (model)
  • EFA is hypothesis-generating

Exploratory or Confirmatory Factor Analysis

  • Confirmatory Factor Analysis (CFA) is used when a researcher has specific hypotheses or theories about the factor structure of their data.
  • It is a “theory-driven” approach.
  • In CFA, the researcher specifies the number of factors and which variables load onto which factors.
  • CFA is typically used in later stages of research to test or confirm the factor structure suggested by EFA
  • CFA is hypothesis testing. A pre-specified model is required

Exploratory or Confirmatory Factor Analysis

  • Use EFA when:
    • You are unsure about the underlying structure.
    • You aim to uncover complex patterns.
    • You need to form hypotheses and develop theory.
  • Use CFA when:
    • You have a predetermined theory or model.
    • You aim to test the hypothesis about the factor structure.
    • You need to confirm or disconfirm theories.

CFA: Example

R Packages for CFA

  • CFA can be conducted in R
  • Useful R packages:
    • sem
    • OpenMx
    • lavaan

An Example using “lavaan”

  • CFA can be performed using the latent variable analysis (“lavaan”) package in R
Code
model1 <- '
verbal =~ Latin + English + History
math =~ Arithmetic + Algebra + Geometry'

obj=cfa(model1, data=data.frame(exam))
summary(obj)
lavaan 0.6-19 ended normally after 51 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        13

  Number of observations                            60

Model Test User Model:
                                                      
  Test statistic                                 8.586
  Degrees of freedom                                 8
  P-value (Chi-square)                           0.378

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  verbal =~                                           
    Latin             1.000                           
    English           0.993    0.059   16.762    0.000
    History           0.952    0.056   17.055    0.000
  math =~                                             
    Arithmetic        1.000                           
    Algebra           0.976    0.071   13.740    0.000
    Geometry          1.035    0.073   14.253    0.000

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
  verbal ~~                                           
    math              1.151    0.342    3.360    0.001

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .Latin             0.285    0.072    3.928    0.000
   .English           0.210    0.063    3.349    0.001
   .History           0.175    0.056    3.140    0.002
   .Arithmetic        0.341    0.084    4.060    0.000
   .Algebra           0.234    0.068    3.423    0.001
   .Geometry          0.206    0.071    2.916    0.004
    verbal            2.505    0.509    4.920    0.000
    math              2.015    0.429    4.693    0.000

The User model

Code
semPaths(obj, what="std", layout="tree", edge.label.cex=1.2)

The User model

  • The model fits well: the chi-square p-value (0.378) suggests that the model does not significantly deviate from the observed data.
Model Test User Model:
                                                      
  Test statistic                                 8.586
  Degrees of freedom                                 8
  P-value (Chi-square)                           0.378

Understand the Baseline model

Code
model0 <- '
Latin ~~ Latin
English ~~ English
History ~~ History
Arithmetic ~~ Arithmetic
Algebra ~~ Algebra
Geometry ~~ Geometry
'
obj0=cfa(model0, data=data.frame(exam))
semPaths(obj0, what="est")

Understand the Baseline model

  • The baseline (null) model assumes no relationships among the variables; it estimates only their variances. The output indicates that the baseline model fits poorly, which is expected since it does not account for any correlations.
Code
summary(obj0)
lavaan 0.6-19 ended normally after 8 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                         6

  Number of observations                            60

Model Test User Model:
                                                      
  Test statistic                               462.072
  Degrees of freedom                                15
  P-value (Chi-square)                           0.000

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
    Latin             2.790    0.509    5.477    0.000
    English           2.679    0.489    5.477    0.000
    History           2.445    0.446    5.477    0.000
    Arithmetic        2.356    0.430    5.477    0.000
    Algebra           2.153    0.393    5.477    0.000
    Geometry          2.366    0.432    5.477    0.000

Model Fit Statistics

  • We often need to compare a FA (reduced) model to the full model \[H_0: \boldsymbol \Sigma(\mathbf L, \Psi)=\boldsymbol \Sigma\]

  • Chi-square test: derived from the likelihood ratio test; sensitive to sample size.

  • RMSEA: root mean square error of approximation compares the sample correlation matrix and the model correlation matrix. <0.06 is good. \[RMSEA = \sqrt{\frac{\delta}{df(N-1)}}\] where \(\delta=\chi^2 - df\).

Model Fit Statistics

  • CFI: the comparative fit index measures the relative improvement of the user model over a baseline model; it is not strongly affected by sample size; it ranges between 0 and 1; the larger the better, with >0.9 considered good

\[CFI= \frac{\delta(\mbox{Baseline}) - \delta(\mbox{User})}{\delta(\mbox{Baseline})}\]

  • More can be found in wikipedia and

    • https://doi.org/10.1016/B978-0-444-53737-9.50010-4
    • Bentler PM. Comparative fit indexes in structural models. Psychological bulletin. 1990 Mar;107(2):238.
    • http://www.davidakenny.net/cm/fit.htm
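
  • As a check of the two formulas above, RMSEA and CFI can be recomputed by hand from the lavaan fits (obj is the user model, obj0 the baseline model fitted earlier)
Code
fm=fitMeasures(obj, c("chisq","df"))
fm0=fitMeasures(obj0, c("chisq","df"))
N=60 #number of observations
delta=max(fm["chisq"]-fm["df"], 0) #user model
delta0=max(fm0["chisq"]-fm0["df"], 0) #baseline model
sqrt(delta/(fm["df"]*(N-1))) #RMSEA by hand, about 0.035
(delta0-delta)/delta0 #CFI by hand, about 0.999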

SEM Results: CFI and TLI

CFI: 0.999
TLI: 0.998
  • The Comparative Fit Index (CFI) and Tucker-Lewis Index (TLI) are both close to 1, indicating a good fit of the two-factor model to the data. Common guidelines:
    • acceptable: >0.9
    • excellent: >0.95

CFI and TLI for user model and baseline model

Code
fitMeasures(obj0, c("chisq", "cfi", "rmsea"))
  chisq     cfi   rmsea 
462.072   0.000   0.705 
Code
fitMeasures(obj, c("chisq", "cfi", "rmsea"))
chisq   cfi rmsea 
8.586 0.999 0.035 

Compute Factor Scores

Code
objscores <- lavPredict(obj)
plot(objscores)

Helpful Resources

  • https://quantdev.ssri.psu.edu/tutorials/intro-basic-confirmatory-factor-analysis
  • https://lavaan.ugent.be/tutorial/tutorial.pdf
  • https://stats.oarc.ucla.edu/r/seminars/rcfa/
