Multivariate Analysis Lecture 17: Factor Analysis

Zhaoxia Yu | Professor, Department of Statistics

2025-05-27

Required packages

Code
knitr::opts_chunk$set(echo = TRUE)
library(knitr)
library(tidyr) #the pipe (%>%) tool is extremely useful
library(MASS)
library(ggplot2)
library(kableExtra)

library(corrplot) #for visualizing the correlation matrix of the exam data

library(png)
library(grid)
library(gridExtra)

#install.packages("psych")
#install.packages("semPlot")
#install.packages("lavaan")
#install.packages("lavaanPlot")
library(psych)
library(semPlot)  
library(lavaan)
library(lavaanPlot)

Outline

  • Review of PCA
  • A motivating example: exam scores
  • The factor analysis (FA) model
  • Communality and specific variance
  • Non-uniqueness of loadings and factor rotation
  • Computation: PCA-based estimation and MLE
  • EFA vs CFA, with a CFA example in lavaan

Motivating Example

A Motivating Example: Exam Scores

  • Simulated data
  • 60 students
  • six subjects: Latin, English, History, Arithmetic, Algebra, Geometry
Code
set.seed(6)
mu=rnorm(60, sd=1)
mu1=mu+rnorm(60)
mu2=mu+rnorm(60)
exam=cbind(mu1+rnorm(60, sd=0.5), mu1+rnorm(60, sd=0.5), mu1+rnorm(60, sd=0.5), 
          mu2+rnorm(60, sd=0.5), mu2+rnorm(60, sd=0.5), mu2+rnorm(60, sd=0.5))
colnames(exam)=c("Latin","English","History",
                 "Arithmetic","Algebra","Geometry")
exam=data.frame(exam)

Pairwise Correlation

Code
corrplot(cor(exam), method="number")

PCA: Visualize Eigenvalues

Code
obj=princomp(exam, cor=TRUE)
cumsum(obj$sdev^2)/sum(obj$sdev^2)
   Comp.1    Comp.2    Comp.3    Comp.4    Comp.5    Comp.6 
0.6986579 0.9361760 0.9583366 0.9742750 0.9873617 1.0000000 
Code
plot(obj, type="lines", main="Scree Plot")

PCA Loadings

Code
obj$loadings

Loadings:
           Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
Latin       0.394  0.456  0.176  0.149  0.756  0.107
English     0.426  0.360 -0.127 -0.187 -0.465  0.650
History     0.417  0.393 -0.139        -0.323 -0.740
Arithmetic  0.414 -0.368  0.806        -0.193       
Algebra     0.396 -0.437 -0.456  0.659              
Geometry    0.402 -0.427 -0.275 -0.708  0.265       

               Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
SS loadings     1.000  1.000  1.000  1.000  1.000  1.000
Proportion Var  0.167  0.167  0.167  0.167  0.167  0.167
Cumulative Var  0.167  0.333  0.500  0.667  0.833  1.000
  • The 1st PC is roughly an average of all six subjects
  • The 2nd PC contrasts Latin/English/History with Arithmetic/Algebra/Geometry

PCA

  • 1st PC = 0.39 Latin + 0.43 English + 0.42 History + 0.41 Arithmetic + 0.40 Algebra + 0.40 Geometry

  • 2nd PC = 0.46 Latin + 0.36 English + 0.39 History - 0.37 Arithmetic - 0.44 Algebra - 0.43 Geometry

Code
obj$scores[,1:2] #first two PCs
           Comp.1       Comp.2
 [1,] -0.12510081  0.387885389
 [2,] -0.24313883  0.565843287
 [3,]  0.61706648 -1.169012826
 [4,]  1.57192571 -1.061544977
 [5,] -1.01259642  0.511535779
 [6,]  0.08159618  0.709633832
 [7,] -3.97028171  1.616681077
 [8,]  1.40252349 -1.718247894
 [9,] -0.87166984 -0.676249351
[10,] -2.83899249  1.873213808
[11,]  2.72771614 -0.850526753
[12,] -1.70338268 -1.375732667
[13,]  0.45223712  1.335051512
[14,] -0.73935793 -0.391321467
[15,]  0.36383804 -1.679191292
[16,] -0.40033457 -0.951024825
[17,]  4.07653952 -1.202123539
[18,]  0.61591640  0.133077527
[19,] -0.12488695  0.259977199
[20,]  3.32163279  2.091207138
[21,]  2.00387753  3.054062570
[22,] -1.33374043 -0.320546882
[23,]  5.52664746  0.816383883
[24,] -0.33635132 -1.217102103
[25,] -0.10498766 -1.814458635
[26,] -4.21673559 -0.146734355
[27,]  1.65883546  1.052902505
[28,]  2.16635789  1.786264476
[29,]  2.73200731 -1.478791682
[30,] -1.27463938 -0.621599441
[31,] -2.37991077 -1.616443270
[32,]  1.37425859  0.969212378
[33,] -2.47024496  2.191557493
[34,] -0.02205375  0.198973539
[35,]  1.27620254  1.394858358
[36,] -1.69777512 -1.082925687
[37,]  1.21645111  1.935638299
[38,]  2.17466798 -0.001283984
[39,] -2.26900283  1.182184153
[40,] -4.47651168  0.610396065
[41,] -1.16212454 -1.013488801
[42,] -0.46563689 -0.170207403
[43,] -0.25281534 -0.155695673
[44,] -0.64299833 -0.617060300
[45,]  1.60383026 -1.065000051
[46,] -0.49803511 -1.390479937
[47,] -0.42374873 -0.582368919
[48,]  2.95636926  1.190798983
[49,] -1.27521832  0.505207025
[50,] -0.14644013 -0.618323482
[51,]  0.29360813 -1.258124936
[52,] -1.53084891  0.644636778
[53,] -1.16723274 -1.853200992
[54,]  2.73613797  0.871310350
[55,] -1.90680059 -0.110926424
[56,]  3.14836985 -0.234451019
[57,] -3.98034871  1.828724734
[58,] -1.63857531  0.287901854
[59,]  0.03921813 -1.629983051
[60,]  1.56468801  0.069052627
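
  • As a sanity check (a sketch, not part of the original output): the PC scores are just the loadings applied to the standardized data; princomp() stores the centering and scaling vectors it used
Code
Z=scale(exam, center=obj$center, scale=obj$scale) #standardize exactly as princomp did
max(abs(Z %*% obj$loadings - obj$scores)) #should be ~0 up to floating-point error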

Visualize PCs (biplot)

Code
biplot(obj, main="Biplot")

Introduction to Factor Analysis

  • Factor analysis (FA) is a statistical model used to identify underlying relationships between variables.
  • It seeks to identify the latent factors that give rise to the observed correlations among a set of variables.

PCA vs FA

Code
knitr::include_graphics("img/PCA_FA.png")

https://livebook.manning.com/book/r-in-action-second-edition/chapter-14/6

PCA vs FA

  • Both reduce dimensionality
  • Different goals:
    • PCA: aims to reduce the number of variables while preserving as much variance as possible
    • FA: aims to identify latent factors that explain the correlations among observed variables
  • PCA: finds principal components (e.g., PC1, PC2). Focuses on total variance.
  • FA: finds latent factors (e.g., F1, F2). Focuses on correlations.

FA Model

Factor Model

  • Consider a random vector \(\mathbf X \in \mathbb R^p\)
  • Let \(\boldsymbol \mu \in \mathbb R^p\) denote the population mean
  • Let \(\mathbf F\in \mathbb R^m\) denote a vector of \(m\) common factors

\[\mathbf{X} = \left(\begin{array}{c}X_1\\X_2\\\vdots\\X_p\end{array}\right), \boldsymbol{\mu} = \left(\begin{array}{c}\mu_1\\\mu_2\\\vdots\\\mu_p\end{array}\right), \mathbf{F} = \left(\begin{array}{c}f_1\\f_2\\\vdots\\f_m\end{array}\right)\]

Factor model

  • \(X_j\), which is observable, is assumed to be a linear function of the unobservable \(f_1, \cdots, f_m\) plus specific errors. \[\begin{aligned} X_1 & = \mu_1 + l_{11}f_1 + l_{12}f_2 + \dots + l_{1m}f_m + \epsilon_1\\ X_2 & = \mu_2 + l_{21}f_1 + l_{22}f_2 + \dots + l_{2m}f_m + \epsilon_2 \\ & \vdots \\ X_p & = \mu_p + l_{p1}f_1 + l_{p2}f_2 + \dots + l_{pm}f_m + \epsilon_p \end{aligned}\] where \(\epsilon_j\) is called the specific factor for feature \(j\).
  • The means \(\mu_1, \cdots, \mu_p\) are parameters
  • The coefficients in the factor loading matrix are also parameters

Factor model

  • Let \(\mathbf L\) denote the \(p\times m\) matrix of factor loadings \[\mathbf{L} = \left(\begin{array}{cccc}l_{11}& l_{12}& \dots & l_{1m}\\l_{21} & l_{22} & \dots & l_{2m}\\ \vdots & \vdots & & \vdots \\l_{p1} & l_{p2} & \dots & l_{pm}\end{array}\right) \]
  • A compact expression of the factor model is \[\mathbf X_{p\times 1} = \boldsymbol \mu_{p\times 1} + \mathbf L_{p\times m} \mathbf F_{m\times 1} + \boldsymbol \epsilon_{p\times 1}\]
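
  • A small simulation sketch to make the compact form concrete (the loadings and specific variances below are hypothetical, chosen for illustration): data generated as \(\mathbf X = \mathbf L \mathbf F + \boldsymbol \epsilon\) should have a sample covariance close to \(\mathbf L\mathbf L^T + \boldsymbol \Psi\)
Code
set.seed(1)
L.true=matrix(c(0.9,0.8,0.7,0,0,0,
                0,0,0,0.9,0.8,0.7), 6, 2) #hypothetical loadings (p=6, m=2)
Psi.true=diag(rep(0.3,6)) #hypothetical specific variances
n=5000
F.mat=matrix(rnorm(n*2), n, 2) #Cov(F)=I
eps=matrix(rnorm(n*6, sd=sqrt(0.3)), n, 6) #Cov(eps)=Psi.true
X=F.mat%*%t(L.true)+eps #mu=0 for simplicity
round(cov(X)-(L.true%*%t(L.true)+Psi.true), 2) #entries should be near 0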

Example: Loading

Code
obj=factanal(exam, factors=2, scores="regression")
obj$loadings

Loadings:
           Factor1 Factor2
Latin      0.937   0.195  
English    0.905   0.321  
History    0.920   0.280  
Arithmetic 0.290   0.878  
Algebra    0.207   0.923  
Geometry   0.220   0.931  

               Factor1 Factor2
SS loadings      2.718   2.710
Proportion Var   0.453   0.452
Cumulative Var   0.453   0.905
  • How to interpret the loadings and the two factors?

Example: Scores

Code
obj.heatmap=heatmap(obj$scores, main="Factor Scores", col=heat.colors(256, rev=TRUE), scale="none")
  • How to interpret the scores?

Example: Original Scores (sorted)

Code
tmp=as.matrix(exam)[obj.heatmap$rowInd,]
row.names(tmp)=obj.heatmap$rowInd
heatmap(tmp, main="Exam Scores",  col=heat.colors(256, rev=TRUE), Rowv=NA)

Mathematical Details: Assumptions

  • \(\mathbf F\) and \(\boldsymbol \epsilon\) are uncorrelated

  • The \(m\) common factors are uncorrelated \[\mathbb E(\mathbf F)=0, Cov(\mathbf F)=\mathbf I_m\]

  • The specific factors are uncorrelated \[\mathbb E(\boldsymbol\epsilon)=0, Cov(\boldsymbol\epsilon)=\Psi\] where \(\Psi\) is a diagonal matrix with non-negative values, i.e.,

\[\boldsymbol{\Psi} = \left(\begin{array}{cccc}\psi_1 & 0 & \dots & 0 \\ 0 & \psi_2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \dots & \psi_p \end{array}\right)\]

The Covariance

  • By the factor model and its assumptions, we have

\[\begin{aligned} \Sigma&=cov(\mathbf X)\\ &= cov(\mathbf L\mathbf F + \boldsymbol \epsilon)\\ &=\mathbf L Cov(\mathbf F) \mathbf L^T + \Psi\\ &=\mathbf L \mathbf L^T + \Psi \end{aligned} \] The last step is due to our assumption that \(cov(\mathbf F)=\mathbf I\)

The Covariance

  • For \(i\not=j\), the covariance between \(X_i\) (feature \(i\)) and \(X_j\) (feature \(j\)) is \[\sigma_{ij}=cov(X_i, X_j)=\sum_{k=1}^m l_{ik}l_{jk}\]
  • The variance of \(X_i\) is \[\sigma_{ii} = \sum_{k=1}^m l_{ik}^2 + \psi_i\]

Communality

Communality and Specific Variance

  • From last slide \[\sigma_{ii} = \sum_{k=1}^m l_{ik}^2 + \psi_i\]

  • We say that the variance of \(X_i\) is partitioned into communality and specific variance where

    • the communality, defined as \(h_i^2=\sum_{k=1}^m l_{ik}^2\), is the part of the variance of \(X_i\) explained by the common factors
    • the specific variance \(\psi_i\) is the part of the variance of \(X_i\) not explained by the common factors

Example of Communality

Code
obj=factanal(exam, factors=2)
L=obj$loadings[,1:2] 
Psi=diag(obj$uniquenesses)
#uniquenesses
obj$uniquenesses
     Latin    English    History Arithmetic    Algebra   Geometry 
0.08437518 0.07777364 0.07581204 0.14534743 0.10513511 0.08395685 
Code
#communality
1-obj$uniquenesses
     Latin    English    History Arithmetic    Algebra   Geometry 
 0.9156248  0.9222264  0.9241880  0.8546526  0.8948649  0.9160432 
  • For Geometry, the communality is ____ and the uniqueness (specific variance) is ____.
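  • Equivalently (a check): compute the communalities directly from the loading matrix L extracted above; for standardized variables, communality plus uniqueness is approximately 1
Code
h2=rowSums(L^2) #communality: h_i^2 = sum_k l_ik^2
h2
h2+obj$uniquenesses #approximately 1 for each variable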

Non-uniqueness

Non-uniqueness of Factor Loadings

  • The factor loading matrix is NOT unique.
  • Suppose \(\textbf{X} = \boldsymbol{\mu} + \textbf{LF}+ \boldsymbol{\epsilon}\)
  • Consider any \(m\times m\) orthogonal matrix \(\Gamma\), which satisfies \(\Gamma \Gamma^T=\Gamma^T \Gamma=\mathbf I\).
  • Let \(\tilde {\mathbf L}=\mathbf L \Gamma\). The model \(\textbf{X} = \boldsymbol{\mu} + \tilde {\textbf L} \mathbf F+ \boldsymbol{\epsilon}\)

gives the same \(\Sigma\) because \[cov(\tilde {\textbf L} \mathbf F)= \tilde {\textbf L} cov (\mathbf F)\tilde {\textbf L}^T=\tilde {\textbf L} \tilde {\textbf L}^T=\mathbf L \Gamma \Gamma^T \mathbf L^T=\mathbf L \mathbf L^T=cov(\mathbf {LF})\]

Non-uniqueness of Factor Loadings

Code
#Estimated Sigma
L%*%t(L) + Psi
               Latin   English   History Arithmetic   Algebra  Geometry
Latin      1.0000002 0.9104285 0.9161363  0.4431684 0.3743267 0.3881987
English    0.9104285 1.0000000 0.9222470  0.5448653 0.4843854 0.4989042
History    0.9161363 0.9222470 1.0000001  0.5128914 0.4493253 0.4636887
Arithmetic 0.4431684 0.5448653 0.5128914  1.0000007 0.8702984 0.8814692
Algebra    0.3743267 0.4843854 0.4493253  0.8702984 0.9999997 0.9053334
Geometry   0.3881987 0.4989042 0.4636887  0.8814692 0.9053334 1.0000002

Non-uniqueness of Factor Loadings

  • Consider a rotation matrix \(R\) and define \(\tilde {\mathbf L}=\mathbf L R\)
Code
theta=pi/6
R=matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), 2,2)
L.tilde=L%*%R

L.tilde %*% t(L.tilde) + Psi
               Latin   English   History Arithmetic   Algebra  Geometry
Latin      1.0000002 0.9104285 0.9161363  0.4431684 0.3743267 0.3881987
English    0.9104285 1.0000000 0.9222470  0.5448653 0.4843854 0.4989042
History    0.9161363 0.9222470 1.0000001  0.5128914 0.4493253 0.4636887
Arithmetic 0.4431684 0.5448653 0.5128914  1.0000007 0.8702984 0.8814692
Algebra    0.3743267 0.4843854 0.4493253  0.8702984 0.9999997 0.9053334
Geometry   0.3881987 0.4989042 0.4636887  0.8814692 0.9053334 1.0000002

Factor Rotation

Rotation for Better Interpretation

  • Interpretation of the final results is easier for some choices of \(\mathbf L\) than others.
  • We often rotate the factors to gain insight or improve interpretability
  • This is one advantage of factor analysis
  • In practice,
    • Step 1: fit a factor model by imposing conditions that lead to a unique solution
    • Step 2: rotate the loading matrix \(\mathbf L\) (multiply it by an orthogonal matrix) so that the result has a good interpretation, often by trial and error
  • Well-known rotation criteria exist

Two Major Types of Rotation

  • An orthogonal rotation
    • maintains the perpendicularity between factors after rotation.
    • assumes that factors are unrelated or independent of each other.
    • Varimax is the most commonly used method of orthogonal rotation.

Two Major Types of Rotation

  • An oblique rotation
    • allows factors to be correlated and does not maintain a 90-degree angle between them.
    • assumes that factors are related or dependent on each other.
    • One popular method is Promax.
  • Orthogonal rotations are mathematically appealing/convenient
  • There is, however, no reason that factors must be uncorrelated

Varimax and Promax

  • https://onlinelibrary.wiley.com/doi/pdf/10.1111/j.1745-3984.2006.00003.x
  • Consider a rotation matrix with angle \(\psi\)

\[\begin{pmatrix} \cos \psi & -\sin \psi\\ \sin \psi & \cos \psi \end{pmatrix}\]

  • The Varimax method looks for \(\psi\) that maximizes the Varimax criterion \[\frac{1}{p}\sum_{j} \left[\sum_i (l_{ij}/h_i)^4 - \frac{1}{p}\left(\sum_i (l_{ij}/h_i)^2\right)^2 \right],\] i.e., the total (over factors) variance of the squared communality-normalized loadings
  • Promax builds on Varimax: it raises the varimax loadings to a power, which shrinks small loadings toward zero, and then rotates obliquely toward that target
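  • A sketch: stats::varimax() performs this search for us; starting from the unrotated MLE loadings, the result should agree (up to sign and column order) with factanal's default varimax-rotated loadings
Code
L.unrot=factanal(exam, factors=2, rotation="none")$loadings[,1:2]
vm=varimax(L.unrot) #Kaiser-normalized varimax by default
vm$loadings #compare with factanal's default (varimax) loadings
vm$rotmat #the orthogonal rotation matrix actually used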

No Rotation

Code
principal(exam,nfactors=2, rotate="none")
Principal Components Analysis
Call: principal(r = exam, nfactors = 2, rotate = "none")
Standardized loadings (pattern matrix) based upon correlation matrix
            PC1   PC2   h2    u2 com
Latin      0.81 -0.54 0.95 0.052 1.8
English    0.87 -0.43 0.95 0.054 1.5
History    0.85 -0.47 0.95 0.052 1.6
Arithmetic 0.85  0.44 0.91 0.090 1.5
Algebra    0.81  0.52 0.93 0.070 1.7
Geometry   0.82  0.51 0.94 0.064 1.7

                       PC1  PC2
SS loadings           4.19 1.43
Proportion Var        0.70 0.24
Cumulative Var        0.70 0.94
Proportion Explained  0.75 0.25
Cumulative Proportion 0.75 1.00

Mean item complexity =  1.6
Test of the hypothesis that 2 components are sufficient.

The root mean square of the residuals (RMSR) is  0.02 
 with the empirical chi square  0.79  with prob <  0.94 

Fit based upon off diagonal values = 1

No Rotation

Code
par(pty="s")
plot(principal(exam,nfactors=2, rotate="none"), main="No Rotation")

Factor Rotation: varimax (an orthogonal rotation)

Code
principal(exam,nfactors=2)
Principal Components Analysis
Call: principal(r = exam, nfactors = 2)
Standardized loadings (pattern matrix) based upon correlation matrix
            RC1  RC2   h2    u2 com
Latin      0.96 0.17 0.95 0.052 1.1
English    0.93 0.30 0.95 0.054 1.2
History    0.94 0.25 0.95 0.052 1.1
Arithmetic 0.30 0.90 0.91 0.090 1.2
Algebra    0.22 0.94 0.93 0.070 1.1
Geometry   0.24 0.94 0.94 0.064 1.1

                       RC1  RC2
SS loadings           2.86 2.76
Proportion Var        0.48 0.46
Cumulative Var        0.48 0.94
Proportion Explained  0.51 0.49
Cumulative Proportion 0.51 1.00

Mean item complexity =  1.1
Test of the hypothesis that 2 components are sufficient.

The root mean square of the residuals (RMSR) is  0.02 
 with the empirical chi square  0.79  with prob <  0.94 

Fit based upon off diagonal values = 1

Factor Rotation: varimax (an orthogonal rotation)

Code
par(pty="s")
plot(principal(exam, nfactors=2),xlim=c(-0.2,1),ylim=c(-0.2,1), main="varimax (default)")
abline(h=0, v=0)

Factor Rotation: an oblique (non-orthogonal) rotation

Code
principal(exam,nfactors=2, rotate="promax")
Principal Components Analysis
Call: principal(r = exam, nfactors = 2, rotate = "promax")
Standardized loadings (pattern matrix) based upon correlation matrix
             RC1   RC2   h2    u2 com
Latin       1.01 -0.08 0.95 0.052   1
English     0.93  0.07 0.95 0.054   1
History     0.96  0.02 0.95 0.052   1
Arithmetic  0.07  0.92 0.91 0.090   1
Algebra    -0.03  0.98 0.93 0.070   1
Geometry   -0.02  0.98 0.94 0.064   1

                       RC1  RC2
SS loadings           2.84 2.78
Proportion Var        0.47 0.46
Cumulative Var        0.47 0.94
Proportion Explained  0.50 0.50
Cumulative Proportion 0.50 1.00

 With component correlations of 
     RC1  RC2
RC1 1.00 0.49
RC2 0.49 1.00

Mean item complexity =  1
Test of the hypothesis that 2 components are sufficient.

The root mean square of the residuals (RMSR) is  0.02 
 with the empirical chi square  0.79  with prob <  0.94 

Fit based upon off diagonal values = 1

Factor Rotation: an oblique (non-orthogonal) rotation

Code
par(pty="s")
plot(principal(exam,nfactors=2, rotate="promax"), main="promax (oblique)")

Factor Rotation

  • The following articles provide nice descriptions of the two major types of rotations:

https://scholarworks.umass.edu/cgi/viewcontent.cgi?article=1251&context=pare

https://www.theanalysisfactor.com/rotations-factor-analysis/

Computation

Method 1: Use PCA

  • By the spectral decomposition of \(\Sigma\), we have \[\Sigma=\Gamma \Lambda \Gamma^T\] where \(\Gamma=(\gamma_1, \cdots, \gamma_p)\) is an orthogonal matrix and \(\Lambda=diag (\lambda_1, \cdots, \lambda_p)\) is the diagonal matrix of eigenvalues.

  • The spectral decomposition can be rewritten to \[\boldsymbol \Sigma=\sum_{i=1}^p \lambda_i \gamma_i\gamma_i^T=\sum_{i=1}^p (\sqrt{\lambda_i}\gamma_i)(\sqrt{\lambda_i}\gamma_i)^T\]

Method 1: Use PCA

  • Suppose that \(\lambda_{m+1}, \cdots, \lambda_p\) are small. Then

\[\boldsymbol \Sigma\approx \sum_{i=1}^m (\sqrt{\lambda_i}\gamma_i)(\sqrt{\lambda_i}\gamma_i)^T\]

  • Let \(\mathbf L=\left(\sqrt{\lambda_1}\gamma_1, \cdots, \sqrt{\lambda_m}\gamma_m\right)\)

  • Let \(\boldsymbol \Psi=\mathrm{diag}\left(\boldsymbol \Sigma-\mathbf L \mathbf L^T\right)\), i.e., keep only the diagonal of \(\boldsymbol \Sigma-\mathbf L \mathbf L^T\) so that \(\boldsymbol \Psi\) is diagonal
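
  • A minimal sketch of this estimator for the exam data with \(m=2\) (eigenvector signs are arbitrary, so columns may be flipped relative to principal())
Code
S=cor(exam)
eig=eigen(S)
m=2
L.pca=eig$vectors[,1:m]%*%diag(sqrt(eig$values[1:m])) #L=(sqrt(lambda_1)gamma_1, sqrt(lambda_2)gamma_2)
rownames(L.pca)=colnames(exam)
Psi.pca=diag(diag(S-L.pca%*%t(L.pca))) #keep only the diagonal
round(L.pca, 2) #compare with principal(exam, nfactors=2, rotate="none")
round(S-(L.pca%*%t(L.pca)+Psi.pca), 3) #off-diagonal residual misfit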

R code

Code
principal(exam, nfactors=2, rotate = "none")
Principal Components Analysis
Call: principal(r = exam, nfactors = 2, rotate = "none")
Standardized loadings (pattern matrix) based upon correlation matrix
            PC1   PC2   h2    u2 com
Latin      0.81 -0.54 0.95 0.052 1.8
English    0.87 -0.43 0.95 0.054 1.5
History    0.85 -0.47 0.95 0.052 1.6
Arithmetic 0.85  0.44 0.91 0.090 1.5
Algebra    0.81  0.52 0.93 0.070 1.7
Geometry   0.82  0.51 0.94 0.064 1.7

                       PC1  PC2
SS loadings           4.19 1.43
Proportion Var        0.70 0.24
Cumulative Var        0.70 0.94
Proportion Explained  0.75 0.25
Cumulative Proportion 0.75 1.00

Mean item complexity =  1.6
Test of the hypothesis that 2 components are sufficient.

The root mean square of the residuals (RMSR) is  0.02 
 with the empirical chi square  0.79  with prob <  0.94 

Fit based upon off diagonal values = 1
Code
principal(exam, nfactors=2)
Principal Components Analysis
Call: principal(r = exam, nfactors = 2)
Standardized loadings (pattern matrix) based upon correlation matrix
            RC1  RC2   h2    u2 com
Latin      0.96 0.17 0.95 0.052 1.1
English    0.93 0.30 0.95 0.054 1.2
History    0.94 0.25 0.95 0.052 1.1
Arithmetic 0.30 0.90 0.91 0.090 1.2
Algebra    0.22 0.94 0.93 0.070 1.1
Geometry   0.24 0.94 0.94 0.064 1.1

                       RC1  RC2
SS loadings           2.86 2.76
Proportion Var        0.48 0.46
Cumulative Var        0.48 0.94
Proportion Explained  0.51 0.49
Cumulative Proportion 0.51 1.00

Mean item complexity =  1.1
Test of the hypothesis that 2 components are sufficient.

The root mean square of the residuals (RMSR) is  0.02 
 with the empirical chi square  0.79  with prob <  0.94 

Fit based upon off diagonal values = 1

Method 2: MLE

  • We impose multivariate normality on the common and specific factors

\[\mathbf F\sim N(\mathbf 0, \mathbf I), \epsilon\sim N(\mathbf 0, \Psi)\]

  • The log-likelihood is \[\begin{aligned} l(\mu,\mathbf L, \Psi) &= - \dfrac{np}{2}\log{2\pi}- \dfrac{n}{2}\log{|\mathbf{LL}^T + \Psi|} -\\ &\dfrac{1}{2}\sum_{i=1}^{n}(\mathbf X_i-\mu)^T(\mathbf L \mathbf L^T+\Psi)^{-1}(\mathbf X_i-\mu) \end{aligned}\]

where \(\mathbf {X}_i\in \mathbb R^p\) denotes the \(i\)th observation (not the \(i\)th feature).
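
  • A minimal sketch of this log-likelihood in R, assuming \(\boldsymbol \mu\) is estimated by the sample mean so the quadratic term reduces to \(\frac{n}{2}tr\left((\mathbf L\mathbf L^T+\Psi)^{-1}\mathbf S_n\right)\); for standardized data the correlation matrix plays the role of \(\mathbf S_n\)
Code
loglik.fa=function(L, Psi, S, n){
  p=nrow(S)
  Sigma=L%*%t(L)+Psi #implied covariance
  -n*p/2*log(2*pi)-n/2*log(det(Sigma))-n/2*sum(diag(solve(Sigma)%*%S))
}
#example: evaluate at the factanal estimates
fit=factanal(exam, factors=2)
loglik.fa(fit$loadings[,1:2], diag(fit$uniquenesses), cor(exam), nrow(exam))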

The Number of Common Factors

  • The \(p\times p\) covariance matrix \(\Sigma\) is symmetric. As a result, it has \(\frac{p(p+1)}{2}\) free parameters.
  • A factor model imposes a structure on \(\Sigma\)
  • A FA model with a small number of common factors, i.e., with \(m\) small, uses fewer parameters
    • the model is more parsimonious
    • but the model might not be adequate if \(m\) is too small
  • One can test whether a given \(m\) is large enough

Choosing the Number of Factors \(m\) When Using PCA

  • The cumulative variance explained should be reasonably large, such as >\(80\%\)

  • Look for an elbow in the scree plot

A Goodness of Fit Test for the Adequacy of the Number of Common Factors

\[\begin{align*} H_0: \quad \boldsymbol{\Sigma}_{(p \times p)} &= \mathbf{L}_{(p \times m)} \, \mathbf{L}'_{(m \times p)} + \boldsymbol{\Psi}_{(p \times p)} \\ \\ -2 \ln \Lambda &= -2 \ln \left[ \frac{\text{maximized likelihood under } H_0} {\text{maximized likelihood}} \right] \\ \\ &= -2 \ln \left( \frac{|\hat{\boldsymbol{\Sigma}}|}{|\mathbf{S}_n|} \right)^{-n/2} + n \left[ \operatorname{tr}(\hat{\boldsymbol{\Sigma}}^{-1} \mathbf{S}_n) - p \right] \\ \\ & \text{It can be shown that } \operatorname{tr}(\hat{\boldsymbol{\Sigma}}^{-1} \mathbf{S}_n) - p = 0 \\ \\ &= n \ln \left( \frac{|\hat{\boldsymbol{\Sigma}}|}{|\mathbf{S}_n|} \right) \end{align*}\]

Test for the Adequacy of the Number of Common Factors

  • The number of parameters for covariance in the full model is \[\frac{p(p+1)}{2}\]
  • The number of parameters for covariance in the reduced model is \[mp + p - \frac{m(m-1)}{2}\]

Note: \(- \frac{m(m-1)}{2}\) is due to the nonuniqueness of \(\mathbf L\).

Degrees of freedom

  • The difference in numbers of parameters between the two models is

\[\begin{aligned} df&=\frac{p(p+1)}{2}-[mp+p-\frac{m(m-1)}{2}]\\ &= \frac{1}{2}[(p-m)^2 -p-m] \end{aligned}\]

  • Under the null hypothesis (the \(m\)-factor model is adequate), the test statistic approximately follows a chi-squared distribution with the above degrees of freedom when the sample size is large.
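
  • A hand computation for the exam data with \(m=2\) (a sketch; factanal() applies a Bartlett-type correction to the statistic, so its reported value differs slightly)
Code
p=6; m=2; n=nrow(exam)
df=((p-m)^2-p-m)/2 #degrees of freedom; equals 4 here
fit=factanal(exam, factors=m)
Sigma.hat=fit$loadings[,1:m]%*%t(fit$loadings[,1:m])+diag(fit$uniquenesses)
S=cor(exam)
stat=n*log(det(Sigma.hat)/det(S)) #n*ln(|Sigma.hat|/|S_n|)
c(df=df, stat=stat, p.value=pchisq(stat, df, lower.tail=FALSE))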

Test for the Adequacy of the Number of Common Factors

  • The result below indicates that 1 factor is not adequate, as the p-value is very small.
  • The p-value assesses whether the correlation structure specified by the proposed model differs significantly from that of the full (unstructured) model
Code
print("p-value for 1 factor")
[1] "p-value for 1 factor"
Code
factanal(exam, factors=1)$PVAL
   objective 
2.264759e-30 

Test for the Adequacy of the Number of Common Factors

  • The result indicates that 2 factors are adequate because the fit is not significantly different from that of the full model.
Code
print("p-value for 2 factors")
[1] "p-value for 2 factors"
Code
factanal(exam, factors=2)$PVAL
objective 
 0.930055 

CFA vs EFA

Exploratory Factor Analysis

  • The FA approach we have discussed is exploratory in nature.
  • In fact, we can perform EFA and identify latent factors using only the correlation matrix, not the raw data
  • The purpose of EFA is to explore the possible underlying structure that can explain the observed pattern of correlations
  • EFA is used when researchers do not have a specific idea about the underlying structure of data
  • EFA tries to identify the factor configuration (model)
  • EFA is hypothesis-generating

Exploratory or Confirmatory Factor Analysis

  • Confirmatory Factor Analysis (CFA) is used when a researcher has specific hypotheses or theories about the factor structure of their data.
  • It is a “theory-driven” approach.
  • In CFA, the researcher specifies the number of factors and which variables load onto which factors.
  • CFA is typically used in later stages of research to test or confirm the factor structure suggested by EFA
  • CFA is hypothesis testing. A pre-specified model is required

Exploratory or Confirmatory Factor Analysis

  • Use EFA when:
    • You are unsure about the underlying structure.
    • You aim to uncover complex patterns.
    • You need to form hypotheses and develop theory.
  • Use CFA when:
    • You have a predetermined theory or model.
    • You aim to test the hypothesis about the factor structure.
    • You need to confirm or disconfirm theories.

CFA: Example

R Packages for CFA

  • CFA can be conducted in R
  • Useful R packages:
    • sem
    • OpenMx
    • lavaan

An Example using “lavaan”

  • CFA can be performed using the latent variable analysis (“lavaan”) package in R
Code
model1 <- '
verbal =~ Latin + English + History
math =~ Arithmetic + Algebra + Geometry'

obj=cfa(model1, data=data.frame(exam))
summary(obj)
lavaan 0.6-19 ended normally after 51 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        13

  Number of observations                            60

Model Test User Model:
                                                      
  Test statistic                                 8.586
  Degrees of freedom                                 8
  P-value (Chi-square)                           0.378

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  verbal =~                                           
    Latin             1.000                           
    English           0.993    0.059   16.762    0.000
    History           0.952    0.056   17.055    0.000
  math =~                                             
    Arithmetic        1.000                           
    Algebra           0.976    0.071   13.740    0.000
    Geometry          1.035    0.073   14.253    0.000

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
  verbal ~~                                           
    math              1.151    0.342    3.360    0.001

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .Latin             0.285    0.072    3.928    0.000
   .English           0.210    0.063    3.349    0.001
   .History           0.175    0.056    3.140    0.002
   .Arithmetic        0.341    0.084    4.060    0.000
   .Algebra           0.234    0.068    3.423    0.001
   .Geometry          0.206    0.071    2.916    0.004
    verbal            2.505    0.509    4.920    0.000
    math              2.015    0.429    4.693    0.000

The User model

Code
semPaths(obj, what="std", layout="tree", edge.label.cex=1.2)

The User model

  • The model fits well: the chi-square p-value (0.378) suggests that the model does not significantly deviate from the observed data.
Model Test User Model:
                                                      
  Test statistic                                 8.586
  Degrees of freedom                                 8
  P-value (Chi-square)                           0.378

Understand the Baseline model

Code
model0 <- '
Latin ~~ Latin
English ~~ English
History ~~ History
Arithmetic ~~ Arithmetic
Algebra ~~ Algebra
Geometry ~~ Geometry
'
obj0=cfa(model0, data=data.frame(exam))
semPaths(obj0, what="est")

Understand the Baseline model

  • The baseline (null) model assumes no relationships among the variables; it estimates only their variances. The output indicates that the baseline model fits poorly, which is expected since it does not account for any correlations.
Code
summary(obj0)
lavaan 0.6-19 ended normally after 8 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                         6

  Number of observations                            60

Model Test User Model:
                                                      
  Test statistic                               462.072
  Degrees of freedom                                15
  P-value (Chi-square)                           0.000

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
    Latin             2.790    0.509    5.477    0.000
    English           2.679    0.489    5.477    0.000
    History           2.445    0.446    5.477    0.000
    Arithmetic        2.356    0.430    5.477    0.000
    Algebra           2.153    0.393    5.477    0.000
    Geometry          2.366    0.432    5.477    0.000

Model Fit Statistics

  • We often need to compare a FA (reduced) model to the full model \[H_0: \boldsymbol \Sigma(\mathbf L, \Psi)=\boldsymbol \Sigma\]

  • Chi-square test: derived from the likelihood ratio test; sensitive to sample size.

  • RMSEA: root mean square error of approximation compares the sample correlation matrix and the model correlation matrix. <0.06 is good. \[RMSEA = \sqrt{\frac{\delta}{df(N-1)}}\] where \(\delta=\chi^2 - df\).

Model Fit Statistics

  • CFI: the comparative fit index measures the relative improvement of the user model over a baseline model; it is not strongly affected by sample size; it ranges between 0 and 1; the larger the better, with >0.9 considered good

\[CFI= \frac{\delta(\mbox{Baseline}) - \delta(\mbox{User})}{\delta(\mbox{Baseline})}\]

  • More can be found in wikipedia and

    • https://doi.org/10.1016/B978-0-444-53737-9.50010-4
    • Bentler PM. Comparative fit indexes in structural models. Psychological bulletin. 1990 Mar;107(2):238.
    • http://www.davidakenny.net/cm/fit.htm
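
  • As a check of the two formulas above, RMSEA and CFI can be recomputed by hand from the lavaan fits (obj is the user model, obj0 the baseline model fitted earlier)
Code
fm=fitMeasures(obj, c("chisq","df"))
fm0=fitMeasures(obj0, c("chisq","df"))
N=60 #number of observations
delta=max(fm["chisq"]-fm["df"], 0) #user model
delta0=max(fm0["chisq"]-fm0["df"], 0) #baseline model
sqrt(delta/(fm["df"]*(N-1))) #RMSEA by hand, about 0.035
(delta0-delta)/delta0 #CFI by hand, about 0.999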

SEM Results: CFI and TLI

CFI: 0.999
TLI: 0.998
  • The Comparative Fit Index (CFI) and Tucker-Lewis Index (TLI) are both close to 1, indicating a good fit of the two-factor model to the data. Common guidelines:
    • acceptable: >0.9
    • excellent: >0.95

CFI and TLI for user model and baseline model

Code
fitMeasures(obj0, c("chisq", "cfi", "rmsea"))
  chisq     cfi   rmsea 
462.072   0.000   0.705 
Code
fitMeasures(obj, c("chisq", "cfi", "rmsea"))
chisq   cfi rmsea 
8.586 0.999 0.035 

Compute Factor Scores

Code
objscores <- lavPredict(obj)
plot(objscores)

Helpful Resources

  • https://quantdev.ssri.psu.edu/tutorials/intro-basic-confirmatory-factor-analysis
  • https://lavaan.ugent.be/tutorial/tutorial.pdf
  • https://stats.oarc.ucla.edu/r/seminars/rcfa/
